Brigham Young University
BYU ScholarsArchive
Theses and Dissertations
2021-12-13
A Geometric Approach to Multiple Target Tracking Using Lie Groups
Mark E. Petersen, Brigham Young University
Follow this and additional works at: https://scholarsarchive.byu.edu/etd
Part of the Engineering Commons
BYU ScholarsArchive Citation: Petersen, Mark E., "A Geometric Approach to Multiple Target Tracking Using Lie Groups" (2021). Theses and Dissertations. 9354. https://scholarsarchive.byu.edu/etd/9354
This Dissertation is brought to you for free and open access by BYU ScholarsArchive. It has been accepted for inclusion in Theses and Dissertations by an authorized administrator of BYU ScholarsArchive. For more information, please contact [email protected].
A Geometric Approach to Multiple Target Tracking
Using Lie Groups
Mark E. Petersen
A dissertation submitted to the faculty of Brigham Young University
in partial fulfillment of the requirements for the degree of
Doctor of Philosophy
Randal W. Beard, Chair
Cammy K. Peterson
Tim W. McLain
Marc D. Killpack
Department of Electrical and Computer Engineering
Brigham Young University
Copyright © 2021 Mark E. Petersen
All Rights Reserved
ABSTRACT
A Geometric Approach to Multiple Target Tracking Using Lie Groups
Mark E. Petersen
Department of Electrical and Computer Engineering, BYU
Doctor of Philosophy
Multiple target tracking (MTT) is the process of localizing targets in an environment using sensors that perceive the environment. MTT has many applications such as wildlife monitoring, air traffic monitoring, and surveillance. These applications motivate further research in the different challenging aspects of MTT. One of these challenges that we will focus on in this dissertation is constructing a high fidelity target model.
A common approach to target modeling is to use linear models or other simplified models that do not properly describe the target's pose (position and orientation), motion, and uncertainty. These simplified models are typically used because they are easy to implement and computationally efficient. A more accurate approach that improves tracking performance is to define the target model using a geometric representation of the target's natural configuration manifold. In essence, this geometric approach seeks to define a target model that can express every pose and motion of the target while preserving geometric properties such as distances and angles.
We restrict our discussion of MTT to objects that move in physical space and can be modeled as a rigid body. This restriction allows us to construct generic geometric target models defined on Lie groups. Since not every Lie group has additional structure that permits vector space arithmetic like Euclidean space, many components of MTT such as data association, track initialization, track propagation and updating, track association and fusing, etc., must be adapted to work with Lie groups.
The main contribution of this dissertation is the presentation of a novel MTT algorithm that implements the different MTT components to work with target models defined on Lie groups. We call this new algorithm Geometric Multiple Target Tracking (G-MTT). This dissertation also serves as a guide on how other MTT algorithms can be modified to work with geometric target models. As part of the presentation there are various experimental results that strengthen the argument that a geometric approach to target modeling improves tracking performance.
Keywords: multiple target tracking, data association, PDA, IPDA, LG-IPDA, Lie group,track fusion, track association, centralized measurement fusion, Recursive RANSAC
ACKNOWLEDGMENTS
I am grateful for the support of the Center for Unmanned Aircraft Systems (C-UAS),
a National Science Foundation Industry/University Cooperative Research Center (I/UCRC)
under NSF award Numbers IIP-1161036 and CNS-1650547, and the support of the C-UAS
industry members who have made possible my research on geometric target tracking.
I am thankful for the mentorship, guidance and support of Dr. Randal Beard, whose
expertise and patience empowered me to become a more competent student and researcher.
His contagious enthusiasm for knowledge and theory inspired me to broaden and deepen
my own understanding, which greatly influenced my research. I am also grateful for the
other professors on my committee who have helped me through their teachings, and have
charitably given of their time and expertise to guide me in my research.
My journey as a researcher would not have been nearly as enriching and rewarding
without my fellow students who have helped and encouraged me immensely throughout
these past few years. Specifically, Parker Lusk, who helped me greatly with programming
and other theoretical concepts, Devon Morris who I have conversed with numerous times
on mathematical theory and other subjects, Jacob Johnson who was willing to review and
provide feedback on my work, and Jaron Ellingson for collaborating with me on the project
Multiple Target Tracking on SE (3) presented in Chapter 13.
I am grateful for the love and support of my family who have encouraged me during
this undertaking. I am especially grateful for my wife Nicole Petersen, for her patience and
time that she has given me to pursue my PhD, for the love and encouragement she freely
offers me, and for all acts of kindness that I never knew. Above all, I am indebted to God
who has given me all that I have and has helped me accomplish all that I have done.
TABLE OF CONTENTS

Title Page
Abstract
Acknowledgements
Table of Contents
List of Tables
List of Figures
Nomenclature

Chapter 1  Introduction
  1.1  Target Model
    1.1.1  Why Non-Geometric, Linear Models Sometimes Fail
    1.1.2  Why Non-Geometric Non-Linear Models Sometimes Fail
    1.1.3  Interactive Multiple Models
    1.1.4  Lie Groups
  1.2  Tracking Algorithms
    1.2.1  Nearest Neighbor Filter
    1.2.2  Global Nearest Neighbor Filter
    1.2.3  Probabilistic Data Association Filter
    1.2.4  Joint Probabilistic Data Association Filter
    1.2.5  Multiple Hypothesis Tracking
    1.2.6  Probabilistic Multi-Hypothesis Tracker
    1.2.7  Probabilistic Hypothesis Density Algorithm
    1.2.8  KALMANSAC
    1.2.9  R-RANSAC
  1.3  Summary of Contributions
  1.4  Dissertation Outline

Chapter 2  Overview of Geometric MTT
  2.1  Sensor
  2.2  Data Management: Items 1-5 in Fig. 2.1
  2.3  Track Initialization: Items 6-8 in Fig. 2.1
  2.4  Track Management: Items 9-12 in Fig. 2.1
  2.5  Summary of Assumptions

Chapter 3  Lie Group Review
  3.1  Group
    3.1.1  Group Action
  3.2  Topology
  3.3  Riemannian Manifold
  3.4  Lie Group
    3.4.1  Adjoints
    3.4.2  Exponential Map at Identity
    3.4.3  Jacobians of the Exponential Map
    3.4.4  Direct Product Group
    3.4.5  First Order Taylor Series and Partial Derivatives
    3.4.6  Expressing Uncertainty on Lie Groups

Chapter 4  System Model
  4.1  System Affinization
  4.2  Transforming Measurements and States
  4.3  Extending the Observation Function with State Transformations

Chapter 5  Measurement Management

Chapter 6  Lie Group Integrated Probabilistic Data Association Filter
  6.1  Overview
  6.2  Prediction Step
  6.3  Data Association
  6.4  Update Step
    6.4.1  Summary
  6.5  Experiment and Results
  6.6  Conclusion

Chapter 7  Centralized Measurement Fusion
  7.1  Parallel Centralized Measurement Fusion with the LG-IPDAF Overview
  7.2  Augmented System Model
  7.3  MS-LG-IPDAF Update Step
    7.3.1  Summary
  7.4  Experiment
    7.4.1  System Model
    7.4.2  Simulated Experiments
  7.5  Conclusion

Chapter 8  Track Initialization
  8.1  Preliminaries
  8.2  RANSAC
    8.2.1  Generating a State Hypothesis
    8.2.2  Finding the Inlier Measurements to a State Hypothesis
    8.2.3  Computing the Score of a State Hypothesis
    8.2.4  Initializing a New Track

Chapter 9  Track-to-Track Association and Fusion
  9.1  Track-to-Track Association
  9.2  Track-to-Track Fusion

Chapter 10  Measurement Latency
  10.1  Preliminaries
  10.2  Updating With a Latent Measurement

Chapter 11  End-To-End MTT Framework
  11.1  Architecture
  11.2  Visual Front End
    11.2.1  KLT Feature Tracker
    11.2.2  Estimate Homography
    11.2.3  Estimate Essential Matrix
    11.2.4  Moving / Non-moving Segmentation
  11.3  R-RANSAC Multiple Target Tracker
    11.3.1  Data Management
    11.3.2  Track Initialization
    11.3.3  Track Management
  11.4  Track Selection
  11.5  Target Following Controller
    11.5.1  Following Multiple Targets
  11.6  Results
  11.7  Conclusion

Chapter 12  Tracking on SE(2) Using a Monocular Camera
  12.1  Normalized Virtual Image Plane
  12.2  System Model
    12.2.1  System Affinization
  12.3  Proving Local Observability
  12.4  Seeding Track Initialization
  12.5  Transforming Measurements and Tracks
  12.6  Experiment
  12.7  Conclusion

Chapter 13  Tracking on SE(3) Using a Monocular Camera
  13.1  System Model
    13.1.1  System Model SE(3)-CV
    13.1.2  System Model LTI
  13.2  Experiment
  13.3  Conclusion

Chapter 14  Conclusion and Future Research
  14.1  Future Research
    14.1.1  Higher Order System Model
    14.1.2  Measurement Management

References

Appendix A  Proofs For The LG-IPDAF
  A.1  Proof of Lemma 6.2.2: Prediction Step
  A.2  Proof of Lemma 6.4.1: Association Events
  A.3  Proof of Lemma 6.4.2: Split Track Update
  A.4  Proof of Lemma 6.4.3: Track Likelihood

Appendix B  Proofs For The MS-LG-IPDAF
  B.1  Proof of Lemma 7.3.2
  B.2  Proof of Lemma 7.3.3
  B.3  Proof of Lemma 7.3.4

Appendix C  Common Lie Groups
  C.1  R^n
    C.1.1  Adjoint
    C.1.2  Exponential Map and Jacobians
  C.2  SO(2)
    C.2.1  Adjoint
    C.2.2  Exponential Map and Jacobians
  C.3  SO(3)
    C.3.1  Adjoint
    C.3.2  Exponential Map and Jacobians
  C.4  SE(2)
    C.4.1  Adjoint
    C.4.2  Exponential Map and Jacobians
  C.5  SE(3)
    C.5.1  Adjoint
    C.5.2  Exponential Map and Jacobians

Appendix D  Glossary
LIST OF TABLES

6.1   Statistical measures from the experiment
13.1  The performance measures of the experiment
13.2  Linear Least Squares Regression Results
LIST OF FIGURES

1.1   A depiction showing how a pendulum can be geometrically represented by the unit circle.
1.2   A depiction of tracking a car moving in a fairly straight line using the LTI-CV model.
1.3   A depiction of tracking a car transitioning into a turn using the LTI-CV model.
1.4   A depiction of why the state estimate of the target drifts from the true state when using the LTI-CV model to track a target that is turning.
1.5   A depiction of how the size of the error covariance (green ellipse) relative to the measurement noise covariance (red ellipse) affects the update to the state estimate depicted by the arrow.
1.6   A depiction of the negative effect of increasing the process noise covariance. The larger the process noise is, the more challenging it is to associate the proper measurement to the target.
1.7   A depiction of the CTRV model configuration where the yellow arrow denotes the orientation of the car with respect to the x and y axes.
1.8   A depiction showing the relation between the unit circle S^1 and the local representation φ(U).
1.9   A depiction of a Gaussian distribution of φ(U) and the corresponding Gaussian distribution on S^1.
1.10  A depiction showing how the CTRV model does not preserve the shape and size of the error state's probability density function.
2.1   The architecture of the G-MTT algorithm.
3.1   A depiction of two compatible charts (φ1, U1) and (φ2, U2) where T is a topological manifold, U1 and U2 are not disjoint sets, and R^n denotes Euclidean space.
3.2   A depiction of the smooth manifold M and tangent spaces at the points m1, m2, and m3.
3.3   A depiction of the two curves γ1 : [0, 1] → M and γ2 : [0, 1] → M that begin at the point m1 ∈ M and end at the point m2 ∈ M, and their respective tangent vectors at different points along the curve.
3.4   A depiction of the relationship between a Lie group G, its Lie algebra g, and its corresponding Cartesian algebraic space R^G.
3.5   A depiction of a geodesic from g1 ∈ G to g2 ∈ G in the direction of v ∈ R^G using the exponential map.
3.6   A depiction of the Gaussian distribution.
5.1   A depiction of the trajectory of three targets being observed in two dimensional space.
5.2   An illustration of possible clusters pertaining to the scenario depicted in Fig. 5.1.
6.1   A depiction of the challenge of identifying which track represents the target.
6.2   A depiction of a single iteration of the LG-IPDAF.
6.3   A depiction of geodesic track fusion.
6.4   Plots of the zoomed-in circular trajectories for the LTI-CV and SE(2)-CV models.
6.5   Plots of the meandering trajectories for the LTI-CV and SE(2)-CV models.
6.6   Plots of linear trajectories for the LTI-CV and SE(2)-CV models.
7.1   A depiction of centralized fusion.
7.2   A depiction of the parallel and sequential fusion.
7.3   Illustration of the experiment with the MS-LG-IPDAF.
7.4   A comparison between the parallel and sequential fusion.
7.5   The average Euclidean error and the probability of detection as a function of the number of false measurements per sensor scan from each sensor.
7.6   The average Euclidean error and the probability of detection as a function of the relative angle variance in the measurement.
8.1   An overview of the track initialization process.
8.2   A depiction of a cluster.
8.3   An overview of the RANSAC algorithm.
8.4   A depiction of the inliers to the current state hypothesis.
9.1   A depiction of two coalescing tracks that represent the same target.
9.2   A depiction of the track fusion process.
10.1  A depiction of measurement latency.
11.1  A depiction of the end-to-end MTT tracking scenario.
11.2  Target tracking and following architecture.
11.3  A depiction of tracking a car using a monocular camera fixed to a UAV.
11.4  A depiction showing how the good features to track and the KLT feature tracker work to find matching points.
11.5  The geometry for the derivation of the homography matrix between two camera poses.
11.6  A depiction of applying the homography to the scenario in Fig. 11.4c.
11.7  Epipolar geometry.
11.8  Motion detection using the homography matrix. Matching features are shown in red and blue. The set M^in_k is shown in blue, and the set M^out_k is shown in red.
11.9  Motion detection using the essential matrix. Matching pairs in M^out_k are shown in blue and red, where the red features are in M^moving_k.
11.10 A depiction of the first few steps of RANSAC.
11.11 A depiction of generating multiple state hypotheses and their corresponding trajectory hypotheses.
11.12 A notional depiction of the camera frame and the virtual camera frame.
11.13 The X and Y errors are in the normalized virtual image plane in units of meters and the yaw error is in units of radians.
12.1  A depiction of three targets being tracked by a UAV via a monocular camera.
12.2  An illustration of the relationship between the image plane and the virtual image plane.
12.3  The image shows the number of valid tracks and the number of missed targets at each time step of the simulation. The x-axis is time in seconds.
13.1  A visualization of the experiment tracking targets on SE(3).
A.1   A depiction of the state estimate update conditioned on θ_{k,j} by using µ^−_{k|k,j} to form a geodesic from x_{k|k−} to x_{k|k,j}.
NOMENCLATURE

General
  t_i                          An instant in time in seconds
  t_{i:j}                      Time interval from time t_i to time t_j
  T_W                          Time window interval
  I_{n×n}                      The n × n identity matrix
  0_{m×n}                      The m × n zero matrix
  1_{m×n}                      The m × n ones matrix
  End(y)                       An endomorphism on the space y

Lie Group
  G                            A generic Lie group
  g                            A generic Lie algebra corresponding to G
  R                            A generic algebraic Cartesian space corresponding to G
  G^y                          The Lie group of object y
  g^y                          The Lie algebra of object y
  R^y                          The algebraic Cartesian space of object y
  Exp_I^{G^y} : R^y → G^y      The exponential map at identity of the Lie group G^y
  Log_I^{G^y} : G^y → R^y      The logarithm map at identity of the Lie group G^y
  J_r^{G^y} : R^y → End(R^y)   The right Jacobian corresponding to G^y
  J_l^{G^y} : R^y → End(R^y)   The left Jacobian corresponding to G^y
  Ad_g^{G^y} : R^y → R^y       The matrix adjoint of g ∈ G^y on R^y
  ad_v^{G^y} : R^y → R^y       The matrix adjoint of v ∈ R^y on R^y

Sensor
  S = {1, 2, . . . , N_S}      The indexing set of sensors
  S_i ⊆ S                      The indexing set of sensors that produced measurements at time t_i
  s_i ∈ S                      The ith sensor
  P_D^{s_i}                    The probability of detection for sensor s_i
  λ^{s_i}                      The spatial density of false measurements for sensor s_i
  P_G^{s_i}                    The gate probability for sensor s_i
  V_i^{s_i}                    The volume of the validation region at time t_i for sensor s_i

System Model
  x_i = (g_i, v_i) ∈ G^x ≜ G × R          The target's state at time t_i
  q_{i:j} = (q^g_{i:j}, q^v_{i:j}) ∈ R^x  The process noise during the time interval t_{i:j}
  Q(t_{i:j})                   Process noise covariance during the time interval t_{i:j}
  f : G^x × R^x × R → G^x      State transition function
  F_{i:j} : R^x → R^x          Derivative of f w.r.t. the state over the time interval t_{i:j}
  G_{i:j} : R^x → R^x          Derivative of f w.r.t. the process noise during t_{i:j}
  z_i^{s_i} ∈ G^{s_i}          A measurement from sensor s_i obtained at time t_i
  r_i^{s_i} ∈ R^{s_i}          Measurement noise from sensor s_i at time t_i
  R^{s_i}                      Measurement noise covariance for sensor s_i
  h^{s_i} : G^x × R^{s_i} → G^{s_i}       Observation function for sensor s_i
  H_i^{s_i} : R^x → R^{s_i}    Derivative of h^{s_i} w.r.t. the state at time t_i
  V_i^{s_i} : R^{s_i} → R^{s_i}           Derivative of h^{s_i} w.r.t. the measurement noise at time t_i

Measurements
  m_i^{s_i} ∈ R                Number of track associated measurements from s_i at time t_i
  M_i^{s_i} = {1, 2, . . . , m_i^{s_i}}   Indexing set of track associated measurements from sensor s_i at time t_i
  z_{i,j}^{s_i}                The jth track associated measurement from s_i at time t_i
  Z_i^{s_i} = {z_{i,j}}_{j=1}^{m_i^{s_i}} The set of track associated measurements from s_i at time t_i
  Z_{0:i}^{s_i} = {Z_j^{s_i}}_{j=0}^{i}   The set of track associated measurements from sensor s_i from the initial time up to time t_i

Metrics
  d_{CD}^{s_i,s_j} : G^{s_i} × G^{s_j} → R       Cluster distance metric between measurements from sensors s_i and s_j
  d_{CT} : R × R → R           Cluster time metric
  d_{CV} : G^{s_i} × G^{s_j} × R^2 → R           Cluster velocity metric
  d_V^{s_i} : G^{s_i} × G^x × R^{n×n} → R        The validation region metric
  d_{TA} : (G^x)^2 × (R^{n×n})^2 → R             Track association metric
  d_{RI}^{s_i} : G^{s_i} × G^x → R               RANSAC inlier metric for sensor s_i

Thresholds
  τ_{CD}                       The cluster distance metric threshold
  τ_{CT}                       The cluster time metric threshold
  τ_{CV}                       The cluster velocity metric threshold
  τ_{CM}                       The number of measurements a cluster needs to be used with RANSAC
  τ_V^{s_i}                    The validation region metric threshold for s_i
  τ_{TA}                       The track association metric threshold
  τ_{TR}                       Track rejection threshold
  τ_{TC}                       Track confirmation threshold
  τ_{RI}                       RANSAC inlier threshold
  τ_{RMI}                      RANSAC maximum iterations
  τ_{RSS}                      RANSAC score stopping criteria
  τ_{RmS}                      RANSAC minimum score criteria
  τ_{RSC}                      RANSAC subset cardinality

Estimation
  N(χ, Σ)                      Gaussian distribution with mean χ and covariance Σ
  x̂ ∈ G^x                      Estimated state
  P ∈ R^{n×n}                  Error covariance of the estimated state
  x̃ ∼ N(µ, P) ∈ R^x            Error state
  p(ϵ)                         The track likelihood

Subscripts
  [·]_i                        The value of an object at time t_i
  [·]_{i|j}                    The value of an object at time t_i conditioned on measurements up to time t_j
  [·]_{i,j}                    The value of the jth object at time t_i
  [·]_{i|j,k}                  The value of the kth object at time t_i conditioned on measurements up to time t_j
CHAPTER 1. INTRODUCTION
Consider a multirotor flying overhead with the objective of identifying, locating, and
tracking objects that are moving below with a camera. We refer to these objects as targets,
which can be people, vehicles, wildlife, etc. The task of tracking the targets is called multiple
target tracking (MTT). While we will mainly focus on using a fixed camera mounted to a
multirotor as our motivating example, there is a wide variety of other MTT applications
such as wildlife monitoring [1], [2], battlefield surveillance, search and rescue operations [3]–
[5], missile defense, traffic and airspace monitoring [6], [7], and many more. These different
applications might require different frameworks (hardware, sensors, etc.); however, the core
components of MTT are the same.
To facilitate our discussion of MTT algorithms, we break the algorithm down into
four main components. The first component is composed of the mathematical representation
of the target’s state (e.g. position, orientation, velocity, etc.) and dynamics (i.e. motion),
which we refer to as the target model1, and the mathematical representation of the sensor(s)
that observe the target(s), which we refer to as the sensor model . Together, these two models
form the first component called the system model . MTT algorithms store information about
a target such as the state estimate and error covariance in a representation of the target
called a track .
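As a concrete illustration, a track's stored information can be sketched as a small data structure. This is a hypothetical container, not the dissertation's implementation; the field names and the default existence likelihood are assumptions for illustration.

```python
from dataclasses import dataclass, field
import numpy as np

@dataclass
class Track:
    """Hypothetical per-target container: a state estimate, its error
    covariance, a likelihood that the track represents a real target,
    and the history of measurements associated with it."""
    state: np.ndarray                 # state estimate (e.g. pose and velocity)
    covariance: np.ndarray            # error covariance of the state estimate
    likelihood: float = 0.5           # assumed initial existence likelihood
    measurements: list = field(default_factory=list)  # associated measurements

track = Track(state=np.zeros(4), covariance=np.eye(4))
```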
We refer to the second component as data management. Data management consists of
three processes: data association, track propagation, and track updating. Data association
is the process of associating measurements to tracks or a data structure if the measurement
does not belong to a track. Track propagation is the process of propagating the track forward
in time to reflect the progression of time, and track updating is the process of incorporating
1. Many of the italicized terms are definitions that can be found in Appendix D.
track associated measurements to refine the track’s state estimate, error covariance and other
target related information.
The third component of MTT is the process of using non-track associated measurements to create new tracks. This process is called track initialization. The fourth and final component is track management. This component is responsible for merging, confirming,
and rejecting tracks. Merging tracks is the process of combining the information from two or
more tracks deemed to represent the same target into a single track. A confirmed track is a
track with a high probability of representing an actual target, and a rejected track is a track
with a sufficiently low probability of representing a target that it is removed (i.e. deleted).
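The confirm/reject logic just described amounts to thresholding each track's probability of representing a target. A minimal sketch, with hypothetical threshold names (merging is omitted here for brevity):

```python
def manage_tracks(tracks, tau_confirm, tau_reject):
    """Confirm tracks whose existence likelihood exceeds tau_confirm and
    delete those at or below tau_reject; the threshold names are
    illustrative stand-ins for the confirmation and rejection thresholds."""
    confirmed = [t for t in tracks if t["likelihood"] >= tau_confirm]
    kept = [t for t in tracks if t["likelihood"] > tau_reject]  # survivors
    return confirmed, kept
```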
A target model is used to express the target’s state, motion, and the corresponding
model uncertainty due to inaccuracies in the model. Two challenges in modeling a target are
accounting for the forces acting on the target and capturing the wide variety of the target’s
motions such as straight lines, curvy paths, rotations, or any combination of the three. For
example, consider tracking a car restricted to a plane using a camera. The car can move in a
straight line down the road or a curve as it makes a turn. As the car undergoes these different
motions, forces acting on it cause it to accelerate and decelerate, and these forces cannot be
directly measured by the camera. Since the forces are unknown, they are modeled as input
noise or part of the process noise. Because of the unknown forces acting on the target, the
target model is often designed to be a constant velocity, acceleration, or higher-order model
that is driven by a white, Gaussian process representing the unknown forces.
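To make this concrete, a white-noise-driven constant-velocity model can be sketched in a few lines of Python. The function name, the state layout [px, py, vx, vy], and the noise level are illustrative choices, not the models used later in the dissertation:

```python
import random

def lti_cv_predict(x, dt, accel_sigma=0.0):
    """One prediction step of a 2D LTI constant-velocity model.

    The unknown forces enter only through white, zero-mean Gaussian
    acceleration noise (a hypothetical stand-in for the process noise).
    """
    px, py, vx, vy = x
    ax = random.gauss(0.0, accel_sigma)
    ay = random.gauss(0.0, accel_sigma)
    return [px + vx * dt + 0.5 * ax * dt**2,
            py + vy * dt + 0.5 * ay * dt**2,
            vx + ax * dt,
            vy + ay * dt]

# With zero process noise the motion is exactly linear:
x = [0.0, 0.0, 1.0, 0.5]
for _ in range(10):
    x = lti_cv_predict(x, dt=0.1)
print(x)  # position advances to roughly (1.0, 0.5); velocity is unchanged
```

With `accel_sigma > 0`, repeated propagation produces the drift and covariance growth that the prediction step must account for.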
Many of the common target models used only capture a single motion. For example,
the linear, time-invariant, constant velocity (LTI-CV) model [8] describes linear motion, and
the constant turn rate and velocity model (CTRV) describes turning motions that have a
constant turn rate and velocity [9]. The LTI-CV model works well in tracking a car moving
in a straight line down the road, but it struggles when the car turns. Likewise, the CTRV
model works well in tracking a turning car, but it struggles when a car moves in a straight
line down the road. A more accurate approach would be to use a single target model that
describes all of the target’s motions.
To define a target model that describes all of the target’s motions, we must con-
sider the space that the target naturally resides in and represent it mathematically. The
representation of the target’s natural space must be able to represent every possible state
and motion of the target and must maintain all geometric relations between the different
states such as angles and distances. For example, the natural space of a planar pendulum
can be represented geometrically using a circle with a fixed radius where a point on the
circle represents the position of the pendulum as shown in Fig. 1.1. The circle preserves
the geometry of the planar pendulum since we can measure angles and distances between
points on a circle. Another example that is harder to visualize is a car, but it too can be
described using a geometric representation which we will discuss later in more detail. We
refer to target models that preserve these geometric properties as geometric target models
and all other target models as non-geometric target models. For a more detailed definition of
geometric target models, we refer the reader to [10], which applies geometric target models
to geometric mechanics.
Figure 1.1: A depiction showing how a pendulum can be geometrically represented by the unit circle.
We further motivate the use of geometric target models in Section 1.1. In
Section 1.2 we review several tracking algorithms that can be adapted to geometric target
models, and in Section 1.3 we present the objective of the dissertation and a summary of
contributions. There are a lot of technical terms in target tracking literature. When we
introduce one of these terms, we will write the term in bold font. Many of these terms can
be found in the glossary at the end of the document for future reference.
1.1 Target Model
There has been an increasing amount of research in modeling targets using a geometric
approach for a significant improvement in both control and estimation since geometric target
models are a more accurate representation than non-geometric target models [11]–[16].
To motivate the use of geometric target models, we will discuss some of the short-
comings and failures of non-geometric target models that are overcome by geometric target
models. We restrict our discussion to discrete, time-invariant target models whose uncer-
tainty in the target’s state estimate and motion is represented using a Gaussian distribution.
The uncertainty in motion is modeled using a white, zero mean, Gaussian distribution called
the process noise, and the uncertainty in the target’s state estimate is modeled using a
white, zero-mean, Gaussian distribution whose covariance is referred to as the error covari-
ance. Even though Gaussian distributions do not always accurately depict the uncertainty,
it is often a good enough assumption provided that the signal-to-noise ratio (SNR) is high enough.
This is the assumption used with the Kalman filter or any of its variations. For an in depth
survey on different target models used for MTT, we refer the reader to the series Survey of
Maneuvering Target Tracking parts I-VI [8], [17]–[21].
We proceed to first discuss several issues with non-geometric, linear models that
cause these target models to sometimes fail, and then we will discuss several issues with
non-geometric, non-linear models that cause these target models to sometimes fail.
1.1.1 Why Non-Geometric, Linear Models Sometimes Fail
To demonstrate several issues with non-geometric, linear, time-invariant models, we
will apply the linear, time-invariant, constant-velocity (LTI-CV) model to our motivating
example of tracking a car. The LTI-CV model assumes that the target’s state can be rep-
resented as position and translational velocity and that the target’s motion is linear. The
LTI-CV model has no notion of orientation or angular velocity. Due to this limitation, it
cannot express every state and motion of the car; hence, it is a non-geometric model in this
case.
The LTI-CV model works well when the car moves in a fairly straight line, but
struggles when the car turns. To demonstrate this, we will illustrate tracking a single car in
two scenarios. In the first scenario, the car moves in a straight line. In the second scenario,
the car transitions from moving in a straight line to turning. We will break each scenario
into two steps: the prediction step and the update step. In the prediction step, the target
model is used to propagate the target’s state estimate and error covariance forward in time.
In this step, the volume of the error covariance increases to reflect the inaccuracies in the
target model (e.g. the forces are unknown), and the propagated state estimate drifts farther
away from the target’s true state due to inaccuracies in the target model. In the update
step, sensor measurements, which contain information about the target’s true state, are used
to improve the state estimate and decrease the error covariance.
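The two steps can be sketched for a scalar toy system. The numbers here are illustrative and the model is a trivial constant-position one; a real tracker uses the full matrix Kalman equations:

```python
def predict(x, P, q):
    """Prediction step: the state estimate is propagated (here unchanged,
    constant-position model) and the error covariance grows by the
    process noise q to reflect model uncertainty."""
    return x, P + q

def update(x, P, z, r):
    """Update step: a measurement z with noise covariance r pulls the
    estimate toward the target's true state and shrinks the covariance."""
    K = P / (P + r)  # Kalman gain: relative trust in the measurement
    return x + K * (z - x), (1 - K) * P

x, P = 0.0, 1.0
x, P = predict(x, P, q=0.5)        # covariance grows: P = 1.5
x, P = update(x, P, z=1.0, r=0.5)  # covariance shrinks below 1.5
print(round(x, 3), round(P, 3))    # -> 0.75 0.375
```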
The first scenario is depicted in Fig. 1.2. In this figure, the target is represented by
the car, the measurement is represented by a red circle, the target’s state estimate and error
covariance are represented by the green ellipse with the center of the ellipse denoting the
state estimate and the area of the ellipse representing one standard deviation of the error
covariance. In the top left image, the car is moving down the road in a straight line, and
the state estimate is aligned with the target. In the top right image, the car moves forward,
and the state estimate and error covariance are propagated forward in time according to the
target model. Notice that since the target model captures linear motion well, the center of
the green ellipse drifts only slightly from the target’s true position. The volume of the error
covariance increases to reflect the uncertainty in the target’s motion. In the bottom left
image, a point measurement is received from the camera. In the bottom right image, the
measurement is used to improve the state estimate. This is shown by the ellipse shrinking
and being more centered around the target.
The second scenario is depicted in Fig. 1.3. In the top left image, the target and state
estimate are aligned. In the top right image, the car begins to turn and the state estimate and
error covariance are propagated forward in time. Notice that the state estimate continues
to move straight according to the LTI-CV model and is starting to drift significantly from
the target. In the bottom left image, the camera observes the car and produces a point
measurement. In the bottom right image, the point measurement is used to update the
Figure 1.2: A depiction of tracking a car moving in a fairly straight line using the LTI-CV model.
state estimate. Notice how the center of the state estimate shifts closer to the car but is not
centered nor oriented with the car. In addition, notice that the error covariance shrinks in
volume just as much as it did in the first scenario, implying that the tracking algorithm is just
as certain about the state estimate as in the first scenario even though it is worse. Essentially,
since the LTI-CV model cannot properly express the car’s turning motion, the state estimate
may drift farther and farther away from the target to the point that the tracking algorithm
loses the target. This drifting over several time steps is depicted in Fig. 1.4 where the black
curve represents the turning trajectory of the target, and the disjoint, green, straight arrows
show the trajectory of the state estimate over multiple propagation and update steps. The
green arrows are straight to depict the linear model prediction, and disjoint because the state
estimate jumps when it is updated with new measurements.
There are some common techniques used to improve tracking performance when using
a linear time invariant (LTI) target model to track non-linear, agile motions such as a car
turning. It was reported in [22] and [23] that higher-order target models such as LTI constant
Figure 1.3: A depiction of tracking a car transitioning into a turn using the LTI-CV model.
Figure 1.4: A depiction of why the state estimate of the target drifts from the true state when using the LTI-CV model to track a target that is turning.
acceleration and jerk models are able to track agile targets better than the LTI constant
velocity model. One of the reasons for the improved behavior is due to the increased volume
of the error covariance pertaining to the target’s position when using a constant acceleration
and jerk model as opposed to a constant velocity model. The volume of the error covariance
pertaining to the target’s position is greater since the uncertainty in velocity, acceleration,
and jerk are all integrated into the uncertainty in the target’s position. Another reason
is that constant acceleration and jerk models can describe some turning motions, but not
constant-turn-rate motions. To clarify, when a target has a constant turn-rate motion and is
being represented in Cartesian coordinates, the magnitude of the translational acceleration
is constant, but the direction of the acceleration in the inertial frame, in which tracking is
done, changes as the target turns; hence, the target’s acceleration from the perspective of
the tracking algorithm is not constant. This is why an LTI constant-acceleration model in
Cartesian coordinates cannot model constant-turn-rate motions. An LTI-CV model in polar
coordinates can be used to model constant turn-rate motions; however, it cannot model the
target’s straight motions. In other words, a single LTI-CV model cannot be used to model
both turning and straight motions.
Another common technique used to improve tracking performance when using an LTI-
CV model is to increase the process noise covariance in the target model. To understand this
technique we need to understand the significance of the size of the process noise covariance
relative to the size of the measurement noise covariance. The size of the process noise
covariance directly affects the size of the error covariance. The larger the process noise
covariance is, the larger the error covariance will be after the prediction step and during the
update step. In the update step, the MTT algorithm takes into account the size of the error
covariance relative to the size of the measurement noise covariance in order to determine
how much the associated measurement should influence the state estimate.
A large error covariance relative to the measurement noise covariance tells the MTT algorithm
to trust the measurement more than the target model. This will cause the state estimate to
be heavily influenced by the measurement and move more towards the measurement. This
scenario is depicted in the left image of Fig. 1.5, where the error covariance is depicted as a
green ellipse, the measurement noise covariance is depicted as a red ellipse, and the change
in the state estimate is depicted using an arrow. Note that the state estimate will move
greatly towards the measurement, but maybe not all the way to the measurement’s center.
On the other hand, a small error covariance relative to the measurement noise covari-
ance tells the MTT algorithm to trust the target model more than the measurement. This will cause
the state estimate to be slightly influenced by the measurement and move slightly towards
the measurement. This scenario is depicted in the right image of Fig. 1.5. Note that the
state estimate will move only slightly towards the measurement since the target model is
trusted more than the measurement.
Figure 1.5: A depiction of how the size of the error covariance (green ellipse) relative to the measurement noise covariance (red ellipse) affects the update to the state estimate depicted by the arrow.
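In the scalar case, the behavior depicted in Fig. 1.5 reduces to the size of the Kalman gain. A small illustrative sketch (numbers are hypothetical):

```python
def gain(P, R):
    """Scalar Kalman gain: the fraction of the innovation applied to the
    state estimate. P is the error covariance, R the measurement noise."""
    return P / (P + R)

# Error covariance large relative to measurement noise: trust the measurement,
# so the estimate moves most of the way toward it.
print(gain(P=10.0, R=1.0))  # close to 1

# Error covariance small relative to measurement noise: trust the model,
# so the estimate barely moves.
print(gain(P=0.1, R=1.0))   # close to 0
```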
The technique of increasing the process noise covariance can help, but it has the
disadvantage of the error covariance being larger. This can be an issue when associating
measurements to tracks when tracking multiple targets and when the sensor produces a lot
of measurements that are not target-originated, called false measurements or clutter. To
illustrate the issue, we return to the second scenario depicted in Fig. 1.3. Suppose that
there is another car following the first, that both cars are detected by the camera, and that
the camera returns a false measurement as shown in Fig. 1.6. Notice that the green ellipse
corresponding to the state estimate of the turning car is larger since we assume that the
process noise is larger to compensate for the car’s turning motion. Also notice that all three
measurements are within the green ellipse corresponding to the uncertainty of the first car’s
state estimate. In this case, discerning which measurement originated from the turning car
is difficult if not impossible, and associating the wrong measurement can cause the tracking
algorithm to lose the turning car.
We do not wish to discredit the value of LTI models. They have many advantages
including being simple to implement and being computationally efficient. In addition, there
are some cases in which LTI models are appropriate and sufficient. For example, in the
case that the target’s motion is constrained to be linear, using an LTI model is actually a
geometric approach and can be used to track targets with only linear motion well. Also,
in the case that the target’s motion is not constrained to be linear, there are only one or a few
Figure 1.6: A depiction of the negative effect of increasing the process noise covariance. The larger the process noise is, the more challenging it is to associate the proper measurement to the target.
targets, and the sensors do not produce a lot of false measurements, the LTI models work
fairly well and may be sufficient. If an LTI model is not sufficient, a non-linear model should
be considered.
1.1.2 Why Non-Geometric Non-Linear Models Sometimes Fail
To demonstrate some of the issues with non-geometric, non-linear models, we will
look at the constant turn rate and velocity (CTRV) model with our motivating example
scenario of tracking a car using a camera. Unlike the LTI-CV model, the CTRV model takes
into account both the position and orientation of the car and their derivatives to express
both rotational and nearly linear motions. The CTRV model is defined as
xk = xi +
viωi(sin (θi + ωti:k)− sin (θi))
viωi(cos (θi)− cos (θi + ωiti:k))
ωδ
0
0
+ qi:k, (1.1)
where xi = [px, py, θ, v, ω]⊤ denotes the state of the target at time ti, px and py are the
components of the position of the target, θ is the heading of the target measured from the
x-axis with counter-clockwise rotation being positive, v is the velocity of the target
in the forward direction, ω is the angular velocity, ti:k is the time interval from time ti to time
tk, and qi:k ∼ N (0, Q (ti:k)) denotes the zero-mean, Gaussian process noise with covariance
Q (ti:k) modeled as a Wiener process [24]. A depiction of the CTRV model is shown in
Fig. 1.7.
Figure 1.7: A depiction of the CTRV model configuration where the yellow arrow denotes the orientation of the car with respect to the x and y axes.
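The noise-free mean of the propagation in equation (1.1) can be sketched as follows. The function name is hypothetical, and note the division by ω, which previews the singularity discussed next:

```python
import math

def ctrv_predict(x, dt):
    """Noise-free CTRV propagation over an interval dt = t_{i:k}.

    State x = [px, py, theta, v, omega]; undefined when omega == 0,
    mirroring the singularity of the discrete model.
    """
    px, py, th, v, w = x
    return [px + (v / w) * (math.sin(th + w * dt) - math.sin(th)),
            py + (v / w) * (math.cos(th) - math.cos(th + w * dt)),
            th + w * dt,
            v,
            w]

# A quarter turn at a constant turn rate of pi/2 rad/s:
x = ctrv_predict([0.0, 0.0, 0.0, 1.0, math.pi / 2], dt=1.0)
print(x)  # heading advances by pi/2; position moves along the arc
```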
From the definition of the CTRV model in equation (1.1), it is apparent that the
target model is not defined if the target’s angular velocity is zero; thus, the target model is
not geometric since it cannot represent every possible configuration of the target. However,
that is a byproduct of the discretization of the model, and is not an issue in the continuous
time version of the CTRV model. That being said, we use the discrete CTRV model to
demonstrate a more subtle issue with some non-geometric, non-linear models regarding the
representation of orientation.
Suppose for a moment that the car can only rotate. The orientation of the car can
then be represented geometrically using the unit circle, denoted S1, where each point on
the circle represents an orientation. The CTRV model in equation (1.1) represents the car’s
orientation using θ ∈ φ (U) = (−π, π) ⊊ R where U ⊊ S1 and φ (U) is a local representation
of S1. The relation between the set φ (U) and the unit circle is shown in Fig. 1.8 where θ1, θ2
and θ3 denote three different orientations, and φ−1 : φ (U) → U ⊊ S1 is a bijection between
the open subset φ (U) and the open set U . Since U is a proper subset of S1, the set φ (U)
cannot represent all of S1; thus, it is only a local representation of S1.
Figure 1.8: A depiction showing the relation between the unit circle S1 and the local representationφ (U).
To be more precise, the unit circle S1 is a Riemannian manifold that represents the
car’s orientation provided that the car does not translate. A manifold is simply a space
that looks locally Euclidean everywhere (i.e. flat), and a Riemannian manifold is a manifold
with a differentiable inner product defined everywhere [25]. The inner product is a metric
used to measure lengths, distances, and angles. It is the inner product that ensures that
geometric properties are preserved on Riemannian manifolds. We provide more details about
a Riemannian manifold in Chapter 3.
As a local representation of the unit circle, the subset φ (U) maintains the geometric
properties of the unit circle provided that we look in a small enough region, but it does not
maintain the global geometric properties. To help illustrate how φ (U) is a local representa-
tion that preserves local geometric properties, we will look at the two orientations represented
by θ1 and θ3 in Fig. 1.8. According to the local representation φ (U), the shortest path from
θ1 to θ3 corresponds to the car rotating clockwise, and according to the global representation
S1, the shortest path from φ−1 (θ1) to φ−1 (θ3) corresponds to the car rotating clockwise. In
this case the local representation φ (U) is able to maintain the geometric relations between
the two orientations represented by θ1 and θ3.
To help illustrate how φ (U) fails to maintain the global geometric properties, we will
look at the two orientations represented by θ1 and θ2 in Fig. 1.8. According to the local
representation φ (U) the orientations represented by θ1 and θ2 are far apart, and the shortest
path from θ1 to θ2 corresponds to a clockwise rotation that passes through the origin. On
the other hand, according to the global representation S1, the two orientations represented
by φ−1 (θ1) and φ−1 (θ2) are close together, and the shortest path from φ−1 (θ1) to φ−1 (θ2)
corresponds to a counter-clockwise rotation. Thus, the local representation φ (U) is able to
maintain local geometric properties but not global geometric properties.
The inability of the local representation φ (U) to maintain geometric properties can
be a significant issue when designing controllers and estimators. For example, suppose
that the car’s true orientation is θ = π, the estimated orientation is θ1 = 9π/10 radians, and
the sensor measurement is θ2 = −9π/10 radians; then the error in the local representation is
e = 9π/10 − (−9π/10) = 18π/10. A naive estimator using the CTRV model would try to reduce the
error by moving the estimated orientation closer to the measurement clockwise through the
origin. If the error covariance is small relative to the measurement noise covariance, the
estimated orientation will only move slightly clockwise towards θ2, possibly resulting in the
new estimated orientation being θ3, which is a worse estimate.
Another issue with the local representation φ (U) is how uncertainty is represented.
Using Fig. 1.9, suppose that the car’s estimated orientation is θ1 and the error covariance is
portrayed by the Gaussian distribution centered at θ1. The obvious issue is that the Gaussian
distribution extends past the point π. The more subtle issue with the Gaussian distribution
defined on φ (U) is that it suggests that the orientation θ = −π is several standard
deviations away from the estimate when in reality it is only about two.
Typically, this issue is compensated for by extending the space φ (U) beyond π, which requires
other methods such as angle wrapping to account for this extension. Angle wrapping is the
mapping of an angle θ to any of its equivalent multiples 2πk + θ where k ∈ Z. Returning
to Fig. 1.9, the angle θ = −π is several standard deviations away from θ1, but one of its
multiples, π, is only two standard deviations away. There are many cases in which angle
wrapping is sufficient, but it turns a continuous system into a disjoint system, and many
properties are lost.
Figure 1.9: A depiction of a Gaussian distribution on φ (U) and the corresponding Gaussian distribution on S1.
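Angle wrapping as described above can be sketched as follows; using atan2 is one standard way to map a raw angular error back into (−π, π], so that the correction follows the shortest rotation on the circle:

```python
import math

def wrap(a):
    """Map an angle (radians) to its equivalent value in (-pi, pi]."""
    return math.atan2(math.sin(a), math.cos(a))

# The example from the text: estimate 9*pi/10, measurement -9*pi/10.
th_est, th_meas = 9 * math.pi / 10, -9 * math.pi / 10
naive_error = th_est - th_meas     # 18*pi/10: the long way around the circle
wrapped_error = wrap(naive_error)  # -2*pi/10: the short way around
print(naive_error, wrapped_error)
```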
Some other common target models that use local representations include models used
to express the target’s orientation in three dimensional space such as Euler angles, axis-angle,
Rodrigues parameters, etc.; for a survey of attitude representations, see [26]. All of these
have similar issues as the ones already described. For example, one of the common issues
with Euler angles is gimbal lock. Gimbal lock occurs when the target pitches at an angle of
π/2. At this angle, there is no difference between the yaw and the roll angles. This problem
is a result of the Euler angles being a local representation where the pitch of π/2 is outside
of the local representation.
A challenge with any nonlinear system is preserving the shape and volume of the
estimated state’s Gaussian probability density function. This is because Gaussian density
functions are preserved under affine transformations and not under nonlinear transforma-
tions. To preserve the Gaussian structure when a nonlinear target model is used to propagate
the uncertainty in the state estimate, the target model is made affine about the current state
estimate, and the affine model is used to propagate the uncertainty. This affinization is a
first-order approximation and results in information about the true uncertainty distribution
being lost.
To illustrate the challenge of preserving the error state’s probability density func-
tion, we ran three Monte Carlo simulations consisting of 500 iterations each. In each
simulation we used the CTRV model defined in equation (1.1) with an initial state of
x = [0, 0, π/2, 1, 1e−4]⊤, an initial error covariance of zero, and the following different process
noise covariances
Q1 = diag(5e−2, 5e−2, 0, 0, 0),
Q2 = diag(5e−2, 5e−2, 5e−2, 0, 0),
Q3 = diag(5e−3, 5e−3, 5e−2, 0, 0),
for the three simulations.
In each iteration, the target model was propagated forward in time with a time step
of 0.1 seconds for a total of 6 seconds. We then created three plots for each simulation
consisting of the target’s final position at the end of each iteration (depicted as black dots),
a standard deviation of the position portion of the error covariance centered at the mean
position (depicted by a green ellipse), and the trajectory of the target without any process
noise (depicted by a blue dotted line). These plots are shown in Fig. 1.10 where the left
image was generated using Q1, the middle image was generated using Q2, and the right
image was generated using Q3. Notice that when the process noise covariance for θ is small,
the target model can be approximated as affine, and the shape and size of the uncertainty
in the state estimate is preserved. However, when the process noise covariance for θ is large,
the shape and size of the error state’s probability density function are not well preserved due
to propagating the uncertainty using an affinized target model.
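A minimal sketch of one such simulation follows. It is simplified relative to the thesis setup: noise is injected only on the heading (cf. Q2/Q3), per step rather than as a Wiener process over the interval, and the ω = 0 singularity is crudely guarded:

```python
import math
import random

def ctrv_step(x, dt, th_sigma):
    """One noisy CTRV step with Gaussian noise on the heading only."""
    px, py, th, v, w = x
    w = w if abs(w) > 1e-9 else 1e-9  # guard the omega = 0 singularity
    px += (v / w) * (math.sin(th + w * dt) - math.sin(th))
    py += (v / w) * (math.cos(th) - math.cos(th + w * dt))
    th += w * dt + random.gauss(0.0, th_sigma)
    return [px, py, th, v, w]

random.seed(0)
finals = []
for _ in range(500):  # 500 Monte Carlo iterations
    x = [0.0, 0.0, math.pi / 2, 1.0, 1e-4]
    for _ in range(60):  # 6 s at dt = 0.1 s
        x = ctrv_step(x, 0.1, th_sigma=math.sqrt(5e-2))
    finals.append((x[0], x[1]))
print(len(finals))
# With large heading noise, the cloud of endpoints bends around the nominal
# path, so a single Gaussian ellipse fits it poorly.
```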
The inability to preserve the size and shape of the error covariance can have a negative
impact in estimation depending on the tracking algorithm. This is because some target
tracking algorithms take into account the probability of a measurement given the current
state estimate in order to determine if a measurement should be associated to the track, or
how much a measurement should influence the state estimate during the update step. A given
Figure 1.10: A depiction showing how the CTRV model does not preserve the shape and size of the error state’s probability density function.
measurement might seem unlikely given the approximated error covariance when in reality
it is likely. This could result in the target-originated measurement not being associated to
the track or having little influence in the update step. Fortunately, the geometric approach
is better able to preserve the size and shape of Gaussian probability density functions as
discussed in [27].
There are other problems with non-geometric, non-linear target models that we will
not discuss, but the main idea that we wanted to convey is that many non-geometric, non-
linear target models are only local representations of an underlying Riemannian manifold.
These local representations fail to preserve the global geometric properties which results in
a degradation of tracking performance. These failures can be overcome by modeling the
target from a geometric perspective. That being said, we do not wish to discredit the value
of non-geometric, non-linear target models. They have been used for years because they are
often sufficient and appropriate for certain applications.
1.1.3 Interactive Multiple Models
Before concluding this section, we want to briefly discuss the interactive multiple
model algorithm (IMM) [28], and how it can improve tracking performance. The IMM
algorithm uses multiple target models that describe different modes. These modes can be
different motions such as linear or turning motions, or different levels of process noise.
The IMM assigns a probability (i.e. weight) to each target model indicating how
well the target model currently represents the target relative to other target models. As the
target moves, there is a probability that the target transitions from one mode to another.
These probabilities are called transition probabilities. When a target model is propagated
forward in time, its new state estimate and error covariance are a blend of the state estimates
and error covariances of all of the target models according to each target model’s assigned
weight and transitioning probabilities. Measurements are then used to update the target
model’s state estimate, error covariance and weight.
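The weight-update logic can be sketched as follows. This is a simplified version showing only the mode probabilities; a full IMM also mixes the state estimates and covariances, which is omitted here, and all numbers are illustrative:

```python
def imm_weights(mu, Pi, likelihoods):
    """One cycle of IMM mode-probability bookkeeping.

    mu: current model weights; Pi[i][j]: transition probability from mode
    i to mode j; likelihoods[j]: how well model j explains the measurement.
    """
    n = len(mu)
    # Predicted mode probabilities blended through the transition matrix.
    pred = [sum(Pi[i][j] * mu[i] for i in range(n)) for j in range(n)]
    # Re-weight by each model's measurement likelihood and normalize.
    post = [likelihoods[j] * pred[j] for j in range(n)]
    s = sum(post)
    return [p / s for p in post]

mu = [0.9, 0.1]                    # e.g. LTI-CV model currently dominant
Pi = [[0.95, 0.05], [0.05, 0.95]]  # "sticky" mode transitions
# A turn begins: the turning model explains the measurement far better,
# so the weight shifts toward it.
print(imm_weights(mu, Pi, likelihoods=[0.01, 0.5]))
```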
An example application of the IMM algorithm is using the LTI-CV and CTRV models
to track a car. A car exhibits straight and turning motions that can be expressed using the
LTI-CV model and the CTRV model. When the car is moving straight down a road, the
LTI-CV model will better express the target’s motion and have a higher weight than the CTRV
model. Similarly, when the car turns, the CTRV will better express the turning motion and
have a higher weight than the LTI-CV model. Simply stated, the IMM algorithm switches
between the different target models depending on the motion of the car, ultimately increasing
tracking performance.
The IMM algorithm is only one example of a multiple model algorithm. Another
common multiple model algorithm is the autonomous multiple models algorithm [29], [30].
These algorithms are commonly used when one target model is unable to fully express the
target’s motion. While these algorithms can improve tracking performance and are appropri-
ate and sufficient in some applications, they have several drawbacks. One drawback is that
multiple model algorithms require a bank of filters, one filter for each target model; thus, it
is more computationally demanding. Two other drawbacks, as stated in [20], are selecting
the target models to use to represent the target and the determination of the transition
probabilities. The transition probabilities are determined by the user and require tuning.
Poor transition probabilities might result in the multiple model algorithm not transitioning
from one target model to the other at the proper time resulting in losing the target.
Since geometric target models are able to express all of the target’s motions, a
single geometric target model is sufficient to track the target, making the multiple model
algorithm unnecessary.
1.1.4 Lie Groups
In this section we have discussed several shortcomings of non-geometric target models
that are overcome by using geometric target models. Geometric target models are defined
on Riemannian manifolds since Riemannian manifolds preserve geometric properties. The
Riemannian manifolds of particular interest to us are Lie groups [31]. These smooth Rieman-
nian manifolds have the additional structure of a group and are commonly used to model
rigid-body motion, which is typically what targets are assumed to be in target tracking.
An example of a Lie group is the set of complex numbers of unit length with the
group operator being complex multiplication. This set is defined as
S1 = { x ∈ C | xx∗ = 1 },
and is used to represent the unit circle to express orientation. Another representation of the
unit circle is the special orthogonal group of two dimensions defined as the set
SO (2) = { R ∈ R2×2 | RR⊤ = I, det (R) = 1 }
with the group operation being matrix multiplication. This Lie group is not only used to
indicate orientation, but is the set of all rotations in two dimensional space. For more
examples of Lie groups see Appendix C.
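The defining properties of SO(2) can be checked numerically in a few lines (pure Python, no linear algebra library assumed; helper names are hypothetical):

```python
import math

def rot(th):
    """The planar rotation matrix for angle th, an element of SO(2)."""
    return [[math.cos(th), -math.sin(th)],
            [math.sin(th),  math.cos(th)]]

def matmul(A, B):
    """2x2 matrix product: the group operation of SO(2)."""
    return [[sum(A[i][k] * B[k][j] for k in range(2)) for j in range(2)]
            for i in range(2)]

R = rot(0.3)
Rt = [[R[0][0], R[1][0]], [R[0][1], R[1][1]]]  # transpose of R
RRt = matmul(R, Rt)                            # should be the identity
det = R[0][0] * R[1][1] - R[0][1] * R[1][0]    # should be 1
print(RRt, det)
```

Composition of two rotations stays in the group, since rot(a) composed with rot(b) equals rot(a + b), which is what makes SO(2) a group under matrix multiplication.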
Lie groups are of particular interest to us since they have already been shown to improve
the accuracy and performance of tracking a single target [32]–[36]. Although literature
regarding tracking multiple targets on Lie groups is sparse, we found one MTT scheme
presented in [37], in which the authors use the joint integrated probabilistic data association
filter (JIPDAF) [38] to track multiple targets. While these authors have contributed much
to the field of target tracking on Lie groups, their works focus on very specific applications.
In this dissertation, we will present MTT on Lie groups generically and with sufficient detail
so that others can easily apply it to different applications and different tracking algorithms.
In the following section, we will discuss some of the more common tracking algorithms that
can be adapted to work with target models defined on Lie groups.
1.2 Tracking Algorithms
Selecting the proper tracking algorithm is just as important as selecting the proper
target model. There are many different tracking algorithms that can be adapted to work with
Lie groups. We will restrict our discussion of tracking algorithms that use Bayesian filters to
maximize the a posteriori probability (MAP) of the target’s states given the measurements.
These specific tracking algorithms have propagation and update steps that trace back to the traditional Kalman filter [39] or its variants [40]. This similarity is what
allows our approach to target tracking on Lie groups to be generalized to other algorithms.
In this section we present some of the more common Bayesian tracking algorithms that
can be adapted to Lie groups. For a good review of target tracking algorithms we refer the
reader to [41].
1.2.1 Nearest Neighbor Filter
The nearest neighbor filter (NNF) is one of the simplest and most computationally efficient target tracking algorithms [42]. It associates with each track the closest measurement and uses
it to update the track’s state estimate. Typically, the update and propagation steps are
done via a Kalman filter or one of its variants. In this method, it is possible for a single
measurement to be used to update multiple tracks, especially if a target was not detected
at a certain time step. This method is not robust to outliers and target occlusions. This
method also requires prior knowledge of the number of targets and cannot merge, initialize,
confirm or reject tracks.
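As a sketch of the association rule just described (hypothetical names, with plain Euclidean distance standing in for a proper statistical distance):

```python
def nnf_associate(tracks, measurements):
    """For each track, pick the closest measurement (Euclidean distance).
    Note that a measurement may be claimed by several tracks, which is one
    of the NNF weaknesses discussed above."""
    def dist(a, b):
        return sum((x - y) ** 2 for x, y in zip(a, b)) ** 0.5
    return {tid: min(measurements, key=lambda z: dist(est, z))
            for tid, est in tracks.items()}

tracks = {"T1": (0.0, 0.0), "T2": (0.2, 0.1)}
measurements = [(0.1, 0.0), (5.0, 5.0)]
assoc = nnf_associate(tracks, measurements)
# Both tracks grab the same nearby measurement -- the shared-measurement issue.
assert assoc["T1"] == (0.1, 0.0) and assoc["T2"] == (0.1, 0.0)
```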
1.2.2 Global Nearest Neighbor Filter
The global nearest neighbor filter (GNNF) is similar to the NNF except that it assigns each measurement to at most one track, choosing the assignment that minimizes the total distance between the measurements and their associated tracks [43], [44]. Like the
NNF, this filter is not very robust, requires prior knowledge of the number of targets, and cannot merge, initialize, confirm, or reject tracks.
1.2.3 Probabilistic Data Association Filter
The probabilistic data association filter (PDAF) is a soft data association algorithm
designed to track a single target in the presence of clutter [45], [46]. This method assumes
that when a sensor produces measurements, at most one measurement is target originated
and that the rest are false measurements uniformly distributed in the sensor’s measurement
space. The number of false measurements at every time step is modeled by a Poisson probability mass function. The PDAF uses a gating technique to improve performance by
requiring measurements to be within some distance of the track in order to be associated
to the track. The associated measurements are assigned a weight according to their relative
probability of being the target-originated measurement and are used to update the track.
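The weighting step can be sketched for a scalar measurement model; this follows the standard parametric PDAF form with assumed values for the detection probability, gate probability, and clutter density, not the specific filter derived later in this dissertation:

```python
import math

def pdaf_weights(innovations, S, P_D, P_G, lam):
    """Association probabilities for validated measurements (scalar case).
    beta0 is the probability that none of them is target originated."""
    # Gaussian likelihood of each innovation nu with variance S,
    # scaled by the detection probability and clutter density.
    L = [P_D * math.exp(-0.5 * nu**2 / S) / math.sqrt(2 * math.pi * S) / lam
         for nu in innovations]
    denom = 1 - P_D * P_G + sum(L)
    beta0 = (1 - P_D * P_G) / denom
    return beta0, [Li / denom for Li in L]

beta0, betas = pdaf_weights([0.1, 2.5], S=1.0, P_D=0.9, P_G=0.99, lam=0.5)
assert abs(beta0 + sum(betas) - 1.0) < 1e-12  # weights form a probability distribution
assert betas[0] > betas[1]  # the smaller innovation receives the larger weight
```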
The PDAF also assumes that the track is already initialized, and that the track
properly represents the target and not clutter. The PDAF was augmented in [47] to be
able to initialize, confirm and reject tracks in order to relax the assumption that the track
represents the target and not clutter. This variation of the PDAF is called the integrated
probabilistic data association filter (IPDAF).
It is common to see the PDAF used to track multiple targets even though it was
originally designed to track a single target. This is done by storing a bank of PDAFs for
each track. All of the measurements that fall inside the validation region are used to update
the track whose validation region they fell into. One issue arises when a measurement falls inside the validation regions of two or more tracks. The PDAF assigns the measurement to each track without taking into account the possibility that the measurement belongs to another track. This drawback was the motivation for the joint probabilistic data association
filter (JPDAF).
1.2.4 Joint Probabilistic Data Association Filter
The JPDAF is an extension of the probabilistic data association filter (PDAF) to
multiple target tracking. It is a single-scan tracker that uses a soft data association tech-
nique to weigh new measurements inside a track’s validation region based upon the new
measurement’s joint data association probability [46]. The weighted measurements are used
to update the tracks. Even though it is a single-scan tracker, calculating the joint prob-
ability of all possible data associations per scan makes it more computationally complex
and demanding. In addition, this method requires an additional scheme to initialize, merge,
confirm, and reject tracks.
Like the PDAF, another shortcoming of the JPDAF is that it assumes that the target
a track represents exists. This shortcoming was addressed in [38] which introduced the joint
integrated probabilistic data association filter (JIPDAF). It is the JPDAF with the additional
piece of calculating the probability that a track represents a target. In [37] the JIPDAF was
modified and adapted for Lie groups.
1.2.5 Multiple Hypothesis Tracking
The multiple hypothesis tracking (MHT) algorithm is an optimal tracking algorithm
in the sense that it considers all possible measurement associations and all possible track
initializations. Each possibility is referred to as a hypothesis. The hypothesis with the
greatest probability is considered the correct one, as discussed in [48]. The sheer number of possibilities makes the algorithm very computationally expensive and infeasible in real time
unless appropriate simplifying techniques are used to limit the number of hypotheses. Some
of the techniques used to simplify the algorithm are pruning hypotheses with low probability,
merging similar hypotheses, using a sliding window to reduce the number of measurements
to consider, and others described in [49]. When these techniques are used, the algorithm is
suboptimal. Even with these simplifications, the algorithm can be computationally expensive and complex to implement.
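Hypothesis pruning, one of the simplifying techniques mentioned above, can be sketched as follows (the hypothesis labels and probabilities are illustrative):

```python
def prune_hypotheses(hyps, keep=3, min_prob=1e-3):
    """Keep the most probable hypotheses and renormalize -- one of the
    simplifications that make MHT tractable (and suboptimal)."""
    kept = sorted(hyps.items(), key=lambda kv: kv[1], reverse=True)[:keep]
    kept = [(h, p) for h, p in kept if p >= min_prob]
    total = sum(p for _, p in kept)
    return {h: p / total for h, p in kept}

hyps = {"H1": 0.55, "H2": 0.30, "H3": 0.10, "H4": 0.04, "H5": 0.01}
pruned = prune_hypotheses(hyps, keep=3)
assert set(pruned) == {"H1", "H2", "H3"}        # low-probability hypotheses dropped
assert abs(sum(pruned.values()) - 1.0) < 1e-12  # remaining mass renormalized
```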
There are two versions of the MHT algorithm that attempt to mitigate the com-
putational complexity in different ways: measurement-oriented (MO-MHT) [50] and track-
oriented (TO-MHT) [51]. In the measurement-oriented approach, a set of hypotheses is stored that exhaustively enumerates all possible measurement associations. A measurement
is either associated with an existing track, associated with a new track, or classified as a false measurement, ensuring that no measurement is shared between tracks. The hypothesis that has the
most probable set of tracks is deemed correct. As stated previously, this is very compu-
tationally expensive and not feasible unless appropriate simplifying techniques are used to
limit the number of generated hypotheses.
The TO-MHT uses a track tree structure to store all possible track hypotheses with
each leaf denoting a possible state estimate of the target. As new measurements are received,
they are gated and incorporated into the track tree thus extending the tree. Tracks are
defined as compatible if they do not share common observations. A global hypothesis is the
subset of all possible tracks that are compatible. The global hypothesis that maximizes the a posteriori probability is considered the correct set of tracks. In addition, leaves with low probability
are pruned at each time step. This method is capable of real time tracking for moderately
simple tracking scenarios. It is, however, a complex and difficult algorithm to implement.
1.2.6 Probabilistic Multi-Hypothesis Tracker
The probabilistic multi-hypothesis tracker (PMHT) is a batch algorithm that uses
all measurements from a time window to update each track [52]. Unlike the traditional
MHT, the PMHT assigns each measurement to every track with an associated probability
that the measurement originated from a certain track. This probability is estimated using
an empirical Bayesian algorithm. Because of this approach, this algorithm does not require
an enumeration of all possible measurement to track assignments nor pruning because all
measurements are associated to every track. As a result, the computational expense of this
algorithm is much less than the traditional MHT.
In its original form, PMHT was not designed to initialize, merge, confirm, and reject tracks.
These issues were addressed and resolved in [53]. However, the algorithm requires using
the expectation-maximization (EM) algorithm [54] to solve the MAP problem, which is an iterative optimization procedure that can be slow to converge. For a well written and general
overview of the PMHT algorithm we refer the reader to [55].
1.2.7 Probabilistic Hypothesis Density Algorithm
The probabilistic hypothesis density (PHD) algorithm was first presented in [56]. It
is a multiple target tracking algorithm that recursively estimates the number of targets in a
region and their states. It leverages the concept of random finite sets to formulate a multi-
hypothesis Bayes filter without requiring hard data associations. Therefore, it does not
require an enumeration of all possible measurement to track associations. This drastically
reduces the computational complexity of the algorithm.
The PHD algorithm is unique since it does not track targets individually; rather, it poses the problem as a mixture of densities, typically Poisson or Gaussian, that indicates the number of targets in a certain region [57]. The PHD algorithm can initialize tracks from
measurements or other tracks and remove tracks by adding or removing a distribution either
to or from the mixture.
By definition of the PHD algorithm, the density distribution can be integrated over
a region to calculate the expected number of targets in that region. This is useful for
monitoring traffic congestion on roads or in the air. Since the density distribution is a mixture
of densities, the states of the targets are read from the peaks of the density distribution, i.e., every mode of the distribution represents the state of a target.
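For a one-dimensional Gaussian-mixture intensity, the integration property can be sketched as follows; the component weights and parameters are illustrative:

```python
import math

def normal_cdf(x, mu, sigma):
    return 0.5 * (1 + math.erf((x - mu) / (sigma * math.sqrt(2))))

def expected_targets(components, region=None):
    """Integrate a 1-D Gaussian-mixture PHD intensity over a region.
    components: list of (weight, mean, std). Over the whole space the
    integral is simply the sum of the component weights."""
    if region is None:
        return sum(w for w, _, _ in components)
    a, b = region
    return sum(w * (normal_cdf(b, m, s) - normal_cdf(a, m, s))
               for w, m, s in components)

# Two likely targets plus a low-weight, diffuse clutter component
phd = [(1.0, 0.0, 0.5), (1.0, 10.0, 0.5), (0.2, 5.0, 2.0)]
assert abs(expected_targets(phd) - 2.2) < 1e-12
# Nearly one expected target in the region around x = 0
assert abs(expected_targets(phd, region=(-2, 2)) - 1.0) < 0.1
```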
This algorithm has shown good results. In [58], it was demonstrated that the PHD
filter has fewer false or dropped tracks than the traditional MHT algorithm. It was also
stated that the Gaussian-mixture cardinalized PHD filter can achieve similar tracking
performance as the joint probabilistic data association filter (JPDAF) but at a much smaller
computational expense.
Two disadvantages of the PHD filter are a lack of track identification and poor performance when the signal-to-noise ratio (SNR) is low. What is meant by lack of track
identification is that the algorithm only specifies where targets are and how many there are in each
region, but it cannot correlate a track at one time step with a track at another time step.
While this may not be important in some applications, it makes following a specific target
very difficult. As mentioned in [57], [59], there are variations of the PHD that perform track identification at a high computational expense.
1.2.8 KALMANSAC
KALMANSAC is a robust target tracking algorithm that uses RANSAC [60] to filter
out false measurements [61]. At each time step, given a set of measurements containing
inliers and outliers, KALMANSAC uses RANSAC to find the data association resulting
in the MAP solution. It does this by repeating the following procedure. KALMANSAC
randomly samples the set of new measurements and estimates a new state for every track
based on the sampled measurements. It then seeks to find all measurements that support the
new state estimates and refines the estimate. This process is repeated up to a predetermined number of times. The sample of measurements that results in the best MAP solution is used
in the final state estimate. This target tracking method is robust to outliers; however, it is
only designed to handle one target. In addition, the method is sensitive to the initial guess
of the state estimate, i.e. the track initialization. If the track is initialized far from the truth,
then KALMANSAC may never converge to the true state.
1.2.9 R-RANSAC
Recursive Random Sample Consensus (R-RANSAC) is a recent MTT algorithm that
has been developed in [22], [59], [62]–[73]. It has a modular paradigm that is capable of data
association and initializing, propagating, updating, merging, confirming and rejecting tracks.
Its modular paradigm allows it to use different filtering algorithms such as the nearest neigh-
bor filter, global nearest neighbor filter, probabilistic data association filter, and others. The
novelty of R-RANSAC is in the way it performs track initialization. Many target tracking
algorithms initialize tracks from one or two measurements which results in many false tracks.
R-RANSAC has a more robust track initialization scheme. Using non-track associated mea-
surements, R-RANSAC uses RANSAC [60] to fit a subset of measurements to the target
model in order to create a possible current state estimate of the target. R-RANSAC then
searches for additional measurements that support the possible state estimate. This process
is repeated a specified number of times, creating many different possible current state
estimates, and the state estimate with the largest number of supporting measurements is
used to initialize a track. This track initialization method has proven robust in the presence
of dense clutter, and drastically reduces the number of false tracks created.
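The RANSAC-based initialization can be sketched for a one-dimensional constant-velocity target; the measurements, tolerance, and iteration count below are illustrative, not the parameters used by R-RANSAC:

```python
import random

def ransac_init(meas, iters=50, tol=0.5, seed=0):
    """Fit a 1-D constant-velocity model x(t) = x0 + v*t to timestamped
    measurements (t, z) containing clutter, RANSAC style: hypothesize a
    state from a minimal two-measurement sample, then count supporters."""
    rng = random.Random(seed)
    best, best_inliers = None, []
    for _ in range(iters):
        (t1, z1), (t2, z2) = rng.sample(meas, 2)
        if t1 == t2:
            continue  # two measurements at the same time cannot fix a velocity
        v = (z2 - z1) / (t2 - t1)
        x0 = z1 - v * t1
        inliers = [(t, z) for t, z in meas if abs(x0 + v * t - z) < tol]
        if len(inliers) > len(best_inliers):
            best, best_inliers = (x0, v), inliers
    return best, best_inliers

# True motion x(t) = 1 + 2t with small noise, plus two clutter points
meas = [(0, 1.0), (1, 3.1), (2, 4.9), (3, 7.0), (1, 9.0), (2, -4.0)]
(x0, v), inliers = ransac_init(meas)
assert len(inliers) == 4          # the four target-originated measurements
assert abs(v - 2.0) < 0.25        # recovered velocity near the truth
```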
R-RANSAC offers a good balance between tracking performance and complexity. It
also incorporates all of the four components of multiple target tracking: system model, data
management, track initialization, and track management.
1.3 Summary of Contributions
The main contribution of this dissertation is the development of a novel MTT algo-
rithm that tracks targets whose target model can be defined on any unimodular, geodesically
complete Lie group. We name the algorithm geometric multiple target tracking (G-MTT).
G-MTT is influenced by several existing algorithms: the measurement management algo-
rithm used in density-based R-RANSAC [72]; the track initialization scheme used in R-
RANSAC [63]; data association and track propagation, updating, confirming and rejecting
algorithms used in the integrated probabilistic data association filter (IPDAF) [38]; the par-
allel centralized measurement fusion algorithm from [74]; and the track-to-track association and fusion algorithms from [74].
We aim to describe the development of G-MTT in detail, so that this dissertation
can serve as a guide to adapt other MTT algorithms discussed in Section 1.2 to work with
target models defined on Lie groups. As part of the development and testing of G-MTT the
following are the major contributions of this dissertation:
• Design a generic, discrete, constant-velocity target model that evolves on Lie groups
and provide specific examples.
• Develop a measurement management strategy using clusters that are defined on Lie
groups. This method is influenced by density-based R-RANSAC [72].
• Extend the IPDAF presented in [38] to Lie groups. We call this algorithm Lie group
IPDAF (LG-IPDAF).
• Develop a parallel centralized measurement fusion algorithm that works with the LG-
IPDAF algorithm.
• Derive a robust track initialization scheme for targets defined on Lie groups. This
method is based on the track initialization scheme in [63].
• Derive a track-to-track association and fusion algorithm for tracks defined on Lie
groups. This method is based on the track-to-track association and fusion algorithms
presented in [74].
• Present a method to deal with measurement latency where the system model is defined
using Lie groups.
• Develop an end-to-end target tracking framework using a monocular camera fixed on
a multirotor.
• Provide several examples of target tracking using G-MTT.
1.4 Dissertation Outline
The outline of the remainder of this dissertation is summarized as follows. Chapter 2
presents an overview of geometric multiple target tracking. Chapter 3 reviews fundamen-
tal theory pertaining to Lie groups that is required to understand the derivation of
G-MTT. Chapter 4 derives the generic, discrete, time-invariant, constant-velocity, system
model defined on Lie groups. Chapter 5 describes how measurements not associated to
tracks are managed to improve the track initialization process. Chapter 6 derives the Lie
group integrated probabilistic data association filter that is used by G-MTT to perform data
association and to propagate and update tracks. Chapter 7 derives the parallel centralized
measurement fusion that is used with the LG-IPDAF in the case of multiple sensors. Chap-
ter 8 presents the track initialization scheme on generic Lie groups. Chapter 9 presents
the track-to-track association and fusion algorithms used to merge tracks that represent
the same target. Chapter 10 shows how to deal with measurement latency using the LG-
IPDAF. Chapter 11 presents an end-to-end MTT framework using a camera mounted to a
small unmanned air system (sUAS) to track ground targets. Chapter 12 shows how to use
the end-to-end MTT framework to track targets on a moving image plane where the target
model is defined on SE (2). Chapter 13 presents an example application of tracking multiple
sUAS on SE (3) using a fixed monocular camera. Chapter 14 presents our conclusions and
future research.
CHAPTER 2. OVERVIEW OF GEOMETRIC MTT
Geometric multiple target tracking is designed to track multiple dynamic targets. The
targets are modeled using a geometric, constant-velocity, white-noise-driven, system model
as defined in Chapter 4. The targets can be observed by one or more sensors. Each sensor
observes a subset of the measurement space called a sensor surveillance region. The union
of every sensor surveillance region is called the surveillance region. G-MTT tracks targets
without prior knowledge of the number of targets within the surveillance region at any given
time, implying that targets can move in and out of the surveillance region. This is because
G-MTT is capable of initializing, confirming, and rejecting tracks.
To track the targets, G-MTT creates mathematical representations of the targets
called tracks. Formally, a track is a tuple T = (x, P, CS, ϵ, L), where x is the state estimate of a target, P is the corresponding error covariance, CS is the set of measurements associated to the track called the consensus set, ϵ denotes the probability that the track represents a target and is called the track likelihood, and L is the track label, a unique numerical label
used to identify tracks confirmed to represent a target.
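As an illustration, the track tuple might be represented in code as follows; the field types are hypothetical, and in the dissertation the state estimate x lives on a Lie group rather than in a flat tuple:

```python
from dataclasses import dataclass, field
from typing import Optional

@dataclass
class Track:
    """The track tuple T = (x, P, CS, eps, L) described above."""
    x: tuple                      # state estimate of the target
    P: tuple                      # error covariance
    consensus_set: list = field(default_factory=list)  # CS: associated measurements
    likelihood: float = 0.0       # eps: probability the track represents a target
    label: Optional[int] = None   # L: assigned only once the track is confirmed

t = Track(x=(0.0, 0.0), P=((1.0, 0.0), (0.0, 1.0)))
t.consensus_set.append((0.1, -0.05))  # a newly associated measurement
assert t.label is None                # unconfirmed tracks carry no label yet
```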
G-MTT initializes tracks from the measurements. Since sensors produce both target-originated measurements, called true measurements, and non-target-originated measurements, called false measurements or clutter, tracks generated by G-MTT can be created from true measurements and/or false measurements. Therefore, an initialized track can either represent a target or clutter. To identify the tracks that represent a target, G-MTT calculates the track likelihood. The track likelihood is discussed in detail in Chapter 6. A track with a high track likelihood is confirmed to represent a target and is called a confirmed track.
G-MTT performs target tracking in the tracking frame. The tracks’ state estimates
and error covariances are expressed in the tracking frame with respect to the tracking frame.
The sensors produce measurements in their respective sensor frames. This requires that
all of the measurements from the different sensors are either transformed into the tracking
frame before being given to G-MTT, or that a transformation that transforms a track’s
state estimate and error covariance from the tracking frame to the sensor frame of sensor si
(denoted T_si) is provided to G-MTT to perform data association and update the tracks.
We refer to the first option as sensor mode one and the second option as sensor mode two.
Sensor mode one is used when the measurements can be transformed into the tracking
frame. In the case of sensor mode one, if the tracking frame moves, then a transformation
(denoted T_T) must be provided that can transform the previous measurements and tracks
stored by G-MTT from the previous tracking frame to the current tracking frame. An
example of sensor mode one is tracking targets on the ground using a fixed camera mounted
to a moving multirotor, where tracking is performed on the image plane. As the camera
moves, the tracking frame moves, and so a transformation between the previous and current
tracking frame needs to be provided. This specific example is explored more in Chapter 12.
Sensor mode two is used when the track’s state estimate and error covariance can be
transformed into the tracking frame. Sensor mode two is typically used instead of sensor
mode one when either the measurements cannot be transformed into the tracking frame,
or when transforming the measurement covariance into the tracking frame via Jacobians
results in information being lost. An example of sensor mode two is tracking multirotors
using a network of radar sensors. Some radars measure the change in range in addition to
the target’s relative range and angle to the radar. The change in range may not be transformable into the tracking frame, and transforming the measurement covariance into the tracking frame would require Jacobians, resulting in a first-order approximation of the measurement covariance (i.e., a loss of information). Instead of transforming the measurements
from their respective sensor frames, tracks are transformed from the tracking frame to each
sensor frame when performing data association and track updating. This example is explored
more in Chapter 7.
The architecture of G-MTT is depicted in Fig. 2.1. The data flow is as follows. The
sensors observe the targets in their respective sensor surveillance regions. The measurements
are then either transformed to the tracking frame or a transformation from the tracking frame
to each sensor frame is provided. The measurements and transformations are then given to
G-MTT. Once G-MTT receives the input, it performs three phases: data management, track
initialization, and track management. We discuss each phase later in Sections 2.2, 2.3 and
2.4. After performing all three phases, G-MTT returns a list of confirmed tracks.
[Figure 2.1 depicts the G-MTT architecture: each sensor observes its sensor surveillance region and produces measurements in its sensor frame; the measurements, with any needed transformations, are passed in the tracking frame to Geometric-MTT, which outputs confirmed tracks. The pipeline consists of three phases:

Data Management — 1. Remove expired measurements from clusters; 2. Propagate tracks; 3. Transform tracks and measurements; 4. Associate new measurements to tracks or clusters; 5. Update tracks.

Track Initialization — 6. Generate state hypotheses from clusters; 7. Initialize new tracks using state hypotheses; 8. Manage clusters.

Track Management — 9. Merge similar tracks; 10. Confirm and reject tracks; 11. Assign ID to confirmed tracks; 12. Prune tracks.]

Figure 2.1: The architecture of the G-MTT algorithm.
2.1 Sensor
To distinguish between different sensors we use the indexing set S = {1, 2, . . . , NS}, where NS denotes the number of sensors and si ∈ S denotes the ith sensor. When a sensor
produces a measurement it is called a sensor scan. G-MTT assumes that sensor si detects
a target with probability P_D^si ∈ [0, 1], where P_D^si = 0 indicates that the target is not in the sensor
surveillance region of sensor si. It is assumed that a sensor produces at most one target
originated measurement for every target per sensor scan, and all other measurements are
false, independent identically distributed (iid) with uniform spatial distribution, and that
the number of false measurements per sensor scan from sensor si is modeled by the Poisson
distribution
µF (ϕ) = exp (−λsiVsi) (λsiVsi)^ϕ / ϕ!, (2.1)
where λsi is the spatial density of false measurements for sensor si, Vsi is a portion of the
sensor si’s surveillance region to consider, and ϕ is the number of false measurements.
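A quick numerical check of this Poisson model (with illustrative values for λsi and Vsi) confirms that it is a valid distribution whose mean is the expected number of false measurements λsiVsi:

```python
import math

def mu_F(phi, lam, V):
    """Poisson probability of phi false measurements in a scan,
    equation (2.1), with spatial density lam over volume V."""
    return math.exp(-lam * V) * (lam * V) ** phi / math.factorial(phi)

lam, V = 0.5, 4.0        # expected lam * V = 2 false measurements per scan
pmf = [mu_F(k, lam, V) for k in range(50)]
assert abs(sum(pmf) - 1.0) < 1e-9                                   # probabilities sum to one
assert abs(sum(k * p for k, p in enumerate(pmf)) - lam * V) < 1e-9  # mean equals lam * V
```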
The sensors can produce measurements synchronously or asynchronously at non-fixed
time intervals. The measurements are not required to be given to G-MTT in chronological
order since G-MTT is able to deal with measurement latency as discussed in Chapter 10.
When a sensor scan occurs, the measurements are given to G-MTT with any necessary
transformations.
2.2 Data Management: Items 1-5 in Fig. 2.1
The first phase of G-MTT is data management. Data management begins once new
measurements and transformations are received. G-MTT stores measurements and tracks in
memory. To reduce memory storage requirements G-MTT removes measurements that were
received at a time outside of the sliding time window denoted TW. For example, if the time window is TW = 10 seconds, then all measurements with a time stamp older than 10 seconds
are removed. We call these measurements expired measurements.
Once expired measurements are removed, G-MTT propagates tracks to the current
time using the LG-IPDAF derived in Chapter 6. In the case of sensor mode one and provided
that the tracking frame has moved, the tracks and previous measurements are transformed
to the current tracking frame. This ensures that data association is done in the current
tracking frame.
After the tracks have been propagated to the current time and transformed to the
current tracking frame, G-MTT associates the new measurements to tracks or clusters. G-
MTT first tries to associate the new measurements to existing tracks using a validation
region. A validation region is a volume in measurement space centered on a track’s estimated
measurement and serves as a gate during data association as discussed in Chapter 6. The
remaining non-track-associated measurements are then tested to see if they belong to an
existing cluster. A cluster is a group of neighboring measurements and is thoroughly defined
in Chapter 5. To associate measurements from different measurement spaces to a cluster,
we require that measurements from different measurement spaces can be compared using
a metric. Measurement spaces that can be compared using a metric are called compatible
measurement spaces. As an example, one sensor might define a measurement using polar
coordinates and another sensor might define a measurement in two dimensional Cartesian
coordinates. Since there exists a map between these two spaces, we can define a metric
between them and the two measurement spaces are compatible. The measurements that do
not belong to a track or an existing cluster are used to create new clusters.
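The validation-region test can be sketched for a two-dimensional measurement space with a diagonal innovation covariance; the gate threshold shown corresponds to a roughly 99% gate for two degrees of freedom, and all values are illustrative:

```python
def in_validation_region(z, z_hat, S, gate=9.21):
    """Mahalanobis gate with diagonal innovation covariance S: associate z
    to a track when the normalized innovation squared falls below the gate
    threshold (9.21 is about the 99% chi-square quantile for 2 DOF)."""
    d2 = sum((zi - hi) ** 2 / si for zi, hi, si in zip(z, z_hat, S))
    return d2 < gate

z_hat = (1.0, 2.0)       # track's estimated measurement
S = (0.25, 0.25)         # diagonal innovation covariance
assert in_validation_region((1.2, 2.1), z_hat, S)       # nearby: associated to the track
assert not in_validation_region((4.0, 2.0), z_hat, S)   # far: passed on to the clusters
```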
After data association, G-MTT performs the update step. During the update step
G-MTT uses a track’s associated measurements to update the track’s state estimate, error
covariance, and track likelihood according to the LG-IPDAF algorithm discussed in Chap-
ter 6. The new associated measurements are added to the track’s consensus set. The data
management phase is done after the update step.
2.3 Track Initialization: Items 6-8 in Fig. 2.1
G-MTT performs the track initialization phase after the data management phase.
During the track initialization phase G-MTT tries to initialize a new track from each cluster
that has at least τCM measurements with unique time stamps using the robust regression algorithm random sample consensus (RANSAC) [60]. RANSAC seeks to find a state estimate that fits the largest subset of measurements from a cluster according to the system
model defined in Chapter 4. G-MTT then uses the subset of measurements to refine the state
estimate, initialize the error covariance and calculate the initial track likelihood by filtering
in the measurements using the LG-IPDAF. From this process a new track is created. The
measurements used to initialize a new track are taken from the cluster and added to the
track’s consensus set. The track initialization process is discussed in detail in Chapter 8.
2.4 Track Management: Items 9-12 in Fig. 2.1
G-MTT performs the track management phase after the track initialization phase.
During this phase G-MTT searches for tracks that represent the same target using a track-
to-track association algorithm discussed in Chapter 9. If tracks are deemed to represent the
same target, they are fused together according to the track-to-track fusion algorithm derived
in Chapter 9. The track-to-track association and fusion algorithms assume that the tracks
are independent.
Next G-MTT looks at the track likelihood of each track. If the track likelihood is
below the threshold τTR, the track is rejected and removed from the algorithm. If the track
likelihood is above the threshold τTC , the track is confirmed to represent a target. When a
track is first confirmed to represent a target it is assigned a unique numerical label to identify
it from other confirmed tracks.
Due to the possibility that a track moves outside the surveillance region of all of the
sensors, G-MTT keeps a record of the last instance in time a measurement is associated to
the track. If no new measurements are associated to the track after a period of time, the
track is assumed far beyond any surveillance region and is removed.
To manage the computational and memory requirements, G-MTT will only store
up to M tracks. In the case that there are more than M tracks after the previous track
management steps, G-MTT will remove the tracks with the lowest track likelihood until
there are only M tracks. Therefore, G-MTT assumes that there are at most M targets in
the surveillance region. This assumption is not a limitation of the algorithm, since M can be arbitrarily large; rather, it is a limitation of the processing power. At the end of this phase,
G-MTT provides a list of confirmed tracks to the user.
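The confirm, reject, and prune logic of this phase can be sketched as follows (track identifiers, likelihoods, and thresholds are illustrative):

```python
def manage_tracks(tracks, tau_reject, tau_confirm, max_tracks):
    """Reject, confirm, and prune tracks by track likelihood, mirroring
    items 10 and 12 in Fig. 2.1. tracks: {track_id: likelihood}."""
    # Reject tracks whose likelihood fell below the rejection threshold.
    alive = {tid: e for tid, e in tracks.items() if e >= tau_reject}
    # Confirm tracks whose likelihood exceeds the confirmation threshold.
    confirmed = {tid for tid, e in alive.items() if e >= tau_confirm}
    # Keep only the max_tracks most likely tracks.
    keep = sorted(alive, key=alive.get, reverse=True)[:max_tracks]
    return {tid: alive[tid] for tid in keep}, confirmed & set(keep)

tracks = {"A": 0.95, "B": 0.40, "C": 0.05, "D": 0.85}
alive, confirmed = manage_tracks(tracks, tau_reject=0.1, tau_confirm=0.8, max_tracks=3)
assert "C" not in alive                 # rejected: below tau_reject
assert confirmed == {"A", "D"}          # confirmed: above tau_confirm
```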
2.5 Summary of Assumptions
Geometric multiple target tracking makes the following assumptions:
2.1. There are one or more targets that can be observed by the sensors and modeled by a
constant-velocity, white-noise-driven target model.
2.2. The number of targets in the surveillance region is unknown and can change.
2.3. A track is a possible representation of a target and may not actually represent a target
when initialized.
2.4. Initially there are no initialized tracks.
2.5. Targets are observable according to the system model.
2.6. Tracks are independent.
2.7. There are one or more sensors, and they may produce measurements synchronously or
asynchronously and out of chronological order.
2.8. The time interval between sensor scans is not fixed.
2.9. Sensor si detects a target with probability P_D^si ∈ [0, 1], with P_D^si = 0 signifying that
the target is not in the sensor surveillance region of sensor si.
2.10. Each sensor produces at most one target originated measurement for each target per
sensor scan.
2.11. The false measurements are modeled as independent and identically distributed (iid) with
uniform spatial distribution, and the number of false measurements every time sensor
si produces measurements is modeled using a Poisson distribution defined in equa-
tion (2.1) with spatial density λsi .
2.12. A target originated measurement from sensor si is associated to the track representing
the target with probability P_G^si, provided that the track exists.
2.13. All of the measurement spaces are compatible.
2.14. The sensors’ measurement noises and the system model’s process noise are independent,
and thus uncorrelated.
CHAPTER 3. LIE GROUP REVIEW
Lie group theory is an extensive topic that we cannot thoroughly cover in this chapter.
The objective of this chapter is to provide a review of pertinent Lie group theory and to
establish additional notation. The review covers the definition of a group in Section 3.1,
applicable topological definitions in Section 3.2, applicable concepts of Riemannian manifolds
in Section 3.3, and basic theory regarding Lie groups in Section 3.4. We motivate each part
of the review by discussing the significance of each part in regard to Lie groups. This will facilitate understanding of our approach to deriving the G-MTT algorithm in the subsequent
chapters.
We invite the reader to skim through the chapter and digest only the parts they are less familiar with. If a more in-depth discussion is needed, we recommend the excellent
tutorials for robotics applications given in [14], [75]. We also recommend [10], [13], [31],
[76]–[78] for a more rigorous treatment of Lie group theory.
3.1 Group
In this section we present the definition of a group and a group action to explicitly
state what operations are valid when working with groups, and therefore Lie groups.
Definition 3.1.1. A group is a set G equipped with a binary operation defined on G denoted
• with the following group axioms:
3.1. Closure: For all g1, g2 ∈ G, g1 • g2 ∈ G.
3.2. Associative: For all g1, g2, g3 ∈ G, (g1 • g2) • g3 = g1 • (g2 • g3).
3.3. Identity Element: There exists an element I ∈ G such that for every g1 ∈ G, I • g1 = g1 • I = g1.
3.4. Inverse Element: For every g1 ∈ G there exists an element g2 ∈ G such that g1 • g2 = g2 • g1 = I. We denote this relation as g1−1 = g2, where [·]−1 is the inverse map.
Definition 3.1.2. A commutative group is a group with the additional commutative
property: for every element g1, g2 ∈ G, g1 • g2 = g2 • g1.
An example of a commutative group is the set of all rotations in 2-dimensional space called the special orthogonal group, denoted SO (2). This group is the set

SO (2) ≜ {R ∈ R2×2 | RR⊤ = I2×2 and det (R) = 1}

equipped with matrix multiplication, where I2×2 is the 2 × 2 identity matrix.
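As a quick numerical sanity check, the group axioms and the commutativity of SO (2) can be verified directly. This is a sketch using NumPy; the helper name `rot2` is our own, not notation from the text.

```python
import numpy as np

def rot2(theta):
    """SO(2) element: the 2x2 rotation matrix for angle theta."""
    c, s = np.cos(theta), np.sin(theta)
    return np.array([[c, -s], [s, c]])

R1, R2 = rot2(0.7), rot2(-1.3)

# Closure: the product is again orthogonal with determinant +1.
R3 = R1 @ R2
assert np.allclose(R3 @ R3.T, np.eye(2))
assert np.isclose(np.linalg.det(R3), 1.0)

# Identity element and inverse element (R^{-1} = R^T for rotations).
assert np.allclose(rot2(0.0), np.eye(2))
assert np.allclose(R1 @ R1.T, np.eye(2))

# Commutativity: planar rotations compose in either order.
assert np.allclose(R1 @ R2, R2 @ R1)
```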
3.1.1 Group Action
Definition 3.1.3. Let S denote a set, s ∈ S be an element of the set S and (G, •) denote a
group. A group action of G on the set S is a map Φ : G× S → S with the properties
Φ (I, s) = s, and Φ (g, s) ∈ S, ∀g ∈ G, ∀s ∈ S.
There are three types of group actions: left group action, right group action, and conjugate
group action. The left group action of G on S is a group action with the additional
property
Φl (g2,Φl (g1, s)) = Φl (g2 • g1, s) .
The right group action of G on S is a group action with the additional property
Φr (g2,Φr (g1, s)) = Φr (g1 • g2, s) .
The conjugate group action of G on S is the composite of the left and right group action
defined as
Φc (g, s) = Φr (g−1, Φl (g, s)).
To demonstrate the group actions we will use the group SO (2) and the set of skew symmetric matrices

so (2) ≜ {Ω ∈ R2×2 | Ω + Ω⊤ = 02×2},
where 02×2 is the 2 × 2 zero matrix. Let R ∈ SO (2) and Ω ∈ so (2). The left group action
can be defined as Φl (R,Ω) = RΩ, the right group action can be defined as Φr (R,Ω) = ΩR,
and the conjugate group action can be defined as Φc (R,Ω) = RΩR−1.
By the definition of the group action and the properties of a group, any group can
define a group action on itself. In the rest of this dissertation we will drop the • notation
and the explicit group action map and simply use juxtaposition as we did in our example
regarding the group action of SO (2) on so (2).
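The three group actions of SO (2) on so (2) described above can be checked numerically. This is a sketch; the helper names `rot2` and `skew2` are our own.

```python
import numpy as np

def rot2(theta):
    c, s = np.cos(theta), np.sin(theta)
    return np.array([[c, -s], [s, c]])

def skew2(w):
    """so(2) element: 2x2 skew-symmetric matrix built from the scalar w."""
    return np.array([[0.0, -w], [w, 0.0]])

R, Omega = rot2(0.9), skew2(0.4)

Phi_l = R @ Omega          # left action  Phi_l(R, Omega)
Phi_r = Omega @ R          # right action Phi_r(R, Omega)
Phi_c = R @ Omega @ R.T    # conjugate action (R^{-1} = R^T)

# The conjugate action maps so(2) back into so(2) ...
assert np.allclose(Phi_c + Phi_c.T, np.zeros((2, 2)))
# ... and because SO(2) is commutative it leaves Omega unchanged.
assert np.allclose(Phi_c, Omega)
```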
3.2 Topology
This section defines a topological space and properties of topological spaces that are
relevant to Lie groups. The definitions can be difficult to understand so we give simple
examples and specify how the properties are relevant.
Definition 3.2.1. A topological space is a set S with a collection O of subsets called
open sets such that
3.1. ∅ ∈ O and S ∈ O, where ∅ is the empty set.
3.2. if U1, U2 ∈ O, then U1 ∩ U2 ∈ O.
3.3. the union of any collection of open sets is open.
A basic example is the real line where the collection of open sets is the unions of open
intervals.
Definition 3.2.2. A topological space is called first countable if for each s ∈ S and U1 ∈ O that contains s, there exists another open set U2 ∈ O such that s ∈ U2 and U2 ⊊ U1, where we use ⊊ to denote a proper subset. A proper subset is a subset that is not the original set; this means that U2 ≠ U1.
An example of a first countable topological space is the real line. Regardless of the real number a ∈ R, there is always a smaller open interval that contains a. The first countable property means that the topological space can have a sequence of open sets U1, U2, . . . , Un such that Un ⊊ Un−1. However, if the sequence converges, it is not guaranteed to converge to a unique element, since it may converge to an open set containing multiple elements.
Definition 3.2.3. A topological space is called Hausdorff if for every two distinct points s1, s2 ∈ S there exist two disjoint sets U1, U2 ∈ O such that s1 ∈ U1 and s2 ∈ U2.

A basic example of a Hausdorff space is the real line. For any two distinct points there exist two disjoint open intervals that contain the points. This means that if there is an open set U containing two points s1, s2 ∈ U , there exist two disjoint open subsets U1, U2 ⊊ U that each contain one of the points (e.g. s1 ∈ U1 and s2 ∈ U2). Topological
spaces that are both first countable and Hausdorff have sequences that converge to a unique
element in the set. In other words, limits on Hausdorff spaces that converge, converge to a
unique element. This is a necessary but not sufficient condition for defining derivatives and
integrals on a space.
Definition 3.2.4. A subset β ⊂ O is called a basis for the topology if each open set U ∈ O is a union of elements in β. The topological space is called second countable if it has a countable basis.
A second countable space is also first countable, but the converse is not necessarily true.
Definition 3.2.5. Let T1 = (S1,O1) and T2 = (S2,O2) be two topological spaces and φ :
T1 → T2 be a bijective mapping from T1 to T2. If the map φ and its inverse are continuous,
then it is called a homeomorphism and the two topological spaces, T1 and T2, are said to
be homeomorphic.
A homeomorphism preserves the topological properties between homeomorphic spaces.
3.3 Riemannian Manifold
Definition 3.3.1. Let T = (S,O) be a topological space. T is called a topological manifold
if it has the following properties:
3.1. T is Hausdorff
3.2. T is second countable, and therefore first countable
3.3. T is locally Euclidean of dimension n. This means that each point of T is an element
of an open set U ∈ O that is homeomorphic to an open subset of the Euclidean space
Rn.
The first two properties mean that converging limits on T converge to a unique point.
The third property means that for each t ∈ T there exists an open subset U ⊆ T that contains
t, an open subset V ⊆ Rn, and a homeomorphism φ : U → V . Since a homeomorphism
preserves topological properties, the topology T must have the same topological properties
as Euclidean space. Since Euclidean space is Hausdorff and second countable, the topological
manifold must also have these properties. This is one of the reasons why these properties
are necessary. The homeomorphism allows us to analyze the local topological properties
of U by looking at V . This is useful because working in Euclidean space may be easier
than working in the original manifold T . In addition, Euclidean space may have additional
structure that the manifold T does not. For example, the manifold T may not have a vector
space structure, and so we cannot scale or add elements of T together, but those operations
are defined in Euclidean space.
Definition 3.3.2. A chart on the topological manifold T is a tuple (φ, U) where φ : U ⊂ T → V ⊂ Rn is a homeomorphism and U ∈ O and V are open subsets. The subset V is called the local representation of U , and for u ∈ U , the element φ (u) ∈ V is called the local coordinates of u.
Fig. 3.1 shows an example of two charts where T is the topological manifold, (φ1, U1)
and (φ2, U2) are charts defined on T where φ1 : U1 → V1, φ2 : U2 → V2, and V1, V2 ⊆ Rn.
Figure 3.1: A depiction of two compatible charts (φ1, U1) and (φ2, U2) where T is a topological manifold, U1 and U2 are not disjoint sets, and Rn denotes Euclidean space.

For an example of a chart consider the unit circle defined by the topological manifold S1 = {(x, y) ∈ R2 | x2 + y2 = 1}. We define the chart (φ, U) where φ ((x, y)) = arctan2 (y, x), U = {(x, y) ∈ S1 | x ≠ −1}, and φ (U) = V = (−π, π). This chart is depicted in Fig. 1.8, which is used to illustrate why defining a system model in local coordinates fails to preserve global properties. We emphasize that the subset V is homeomorphic to U , and therefore only preserves the local topological properties of U , and is not guaranteed to preserve global geometric properties.
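The chart above can be sketched numerically; φ and its inverse are written as small NumPy functions whose names are our own.

```python
import numpy as np

# Chart phi on the unit circle: local coordinates in V = (-pi, pi),
# excluding the point (-1, 0) so that phi is a homeomorphism on U.
def phi(p):
    x, y = p
    return np.arctan2(y, x)

def phi_inv(a):
    return np.array([np.cos(a), np.sin(a)])

p = np.array([0.0, 1.0])            # a point on S^1
a = phi(p)                          # its local coordinate
assert np.isclose(a, np.pi / 2)
assert np.allclose(phi_inv(a), p)   # phi is invertible on U
```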
Definition 3.3.3. Two charts (φ1, U1) and (φ2, U2) are said to be smoothly compatible if either U1 and U2 are disjoint or φ1 ∘ φ2−1 : φ2 (U1 ∩ U2) → φ1 (U1 ∩ U2) is a diffeomorphism, as shown in Fig. 3.1. A maximal smooth atlas A is a collection of all smoothly compatible charts whose domains cover T .
A diffeomorphism not only preserves the topological properties between the diffeo-
morphic spaces, but also the properties of their tangent spaces. Thus, a maximal smooth
atlas allows us to map open sets of a topological manifold to open sets in Euclidean space
and perform operations defined in Euclidean space that involve the tangent space such as
integration and differentiation. For example, to integrate a curve on the manifold you would
begin integration at the local coordinates at the start of the curve in one chart, and then
move to other charts that cover the other parts of the curve. Similarly we can find the
tangent vectors along a curve by taking the derivative of a curve in a chart.
Definition 3.3.4. A smooth manifold M = (T,A) is a topological manifold with a maximal
smooth atlas.
An example of a smooth manifold is Euclidean space where the atlas contains a single
chart where the diffeomorphism is the identity map and the corresponding open set is the
Euclidean space itself. We will provide other examples of smooth manifolds throughout the
dissertation.
Definition 3.3.5. Let M denote an n-dimensional smooth manifold. At each point m ∈ M there is an n-dimensional tangent space (denoted TmM). The collection of tangent spaces at all points in M is called the tangent bundle, denoted TM = {(m, TmM) | m ∈ M}.
A depiction of different tangent spaces is shown in Fig. 3.2 where M denotes the
n-dimensional smooth manifold, m1,m2,m3 ∈ M and Tm1M , Tm2M and Tm3M are their
corresponding tangent spaces. The tangent spaces are disjoint from each other, implying that a vector v1 ∈ Tm1M cannot be directly compared to a vector v2 ∈ Tm2M , and therefore we cannot simply add the two vectors together. In order to compare vectors from different tangent spaces, the vectors must be mapped to a single tangent space. We often do not think about the tangent spaces being disjoint since we typically work in Euclidean space, where the mapping between tangent spaces is the identity map. The important thing for us is to ensure that vectors are in the same tangent space when we add them together. This is an important concept when deriving the LG-IPDAF algorithm in Section 6.
Definition 3.3.6. A Riemannian metric of a smooth manifold is an inner product defined
on the tangent space at each point on the manifold that varies differentiably between tangent
spaces. A smooth manifoldM equipped with a Riemannian metric d is called a Riemannian
manifold and denoted R = (M, d).
An inner product measures lengths and angles; thus, a Riemannian metric measures
the length of tangent vectors and angles between tangent vectors. This is beneficial for many reasons, one of which is measuring the length of curves. For example, let M be a smooth
manifold, γ1 : [0, 1] → M and γ2 : [0, 1] → M be two curves defined on the manifold such
that γ1 (0) = γ2 (0) = m1 and γ1 (1) = γ2 (1) = m2 as shown in Fig. 3.3. Now suppose we
are interested in finding which curve has the shortest path from m1 to m2.
Figure 3.2: A depiction of the smooth manifold M and the tangent spaces at the points m1, m2, and m3.
To calculate the length of the curves, we can map the curves to local coordinates
using its atlas, and calculate the derivative of the curves to compute their tangent vectors
(denoted by blue arrows in Fig. 3.3). Using Proposition 2.10 of [25], which states that every smooth manifold admits a Riemannian metric, we can equip the manifold with a Riemannian metric and calculate the length of each curve by integrating over the lengths of the tangent vectors that compose each curve.
Now suppose that we are interested in finding the “straightest curve”. In a similar
way as finding the length of each curve, we can measure how the direction of the tangent
vectors are changing along the curve since the Riemannian metric can measure angles be-
tween tangent vectors. The curve whose tangent vectors change the least in direction is the
“straightest curve.” A smooth manifold can be equipped with different Riemannian met-
rics. For this dissertation we use the induced Riemannian metric obtained by embedding
the manifold in Euclidean space. For the purposes of this dissertation, we do not need to
explicitly define that metric.
Definition 3.3.7. Let I ⊂ R, R denote a Riemannian manifold, TR denote its tangent
bundle and γ : I × TR → R be a parameterized curve such that γ (0, (p, v)) = p where p ∈ R
and v ∈ TpR. The parameterized curve is called a geodesic if it is straight according to the
Riemannian metric.
Figure 3.3: A depiction of the two curves γ1 : [0, 1] → M and γ2 : [0, 1] → M that begin at the point m1 ∈ M and end at the point m2 ∈ M, and their respective tangent vectors at different points along the curve.
For a more formal definition of a geodesic see Definition 2.1 from [25].
Definition 3.3.8. A Riemannian manifold R is geodesically complete if for all p ∈ R and for all v ∈ TpR there exists a geodesic γ : I × TR → R starting from p that is defined for all values of t ∈ I.

The property of a Riemannian manifold being geodesically complete has many useful implications according to the Hopf-Rinow Theorem. It states that if R is geodesically complete, then for any two points p1, p2 ∈ R there exists a geodesic between them. In other words, there is at least one (and possibly infinitely many) straight path between any two points. It also implies that the manifold is connected.
Definition 3.3.9. Let R be a Riemannian manifold and TR the corresponding tangent bundle. The exponential map, denoted ExpR : TR → R, is the geodesic curve γ : I × TR → R restricted to the domain {1} × TR; e.g. let (p, v) ∈ TR, then ExpR (p, v) = γ (1, (p, v)). The exponential map can be thought of as traveling from the point p in the direction of v for a unit length of time.
We denote the restriction of the exponential map at a point p ∈ R as ExpRp : TpR → R.
According to the Hopf-Rinow Theorem, if the Riemannian manifold is geodesically complete,
the exponential map ExpRp is a surjection from TpR to R, and there exists an open subset V ⊂ TpR and an open subset U ⊂ R containing p, with ExpRp (V ) = U , such that the exponential map restricted to V (denoted Expp |V ) is a diffeomorphism between V and
U [25]. This property has several important implications for us.
3.1. The map Expp |V has a smooth inverse, called the logarithm map, denoted Logp |U .

3.2. The tuple (Logp, U) is a smooth chart.
3.3. For any point q ∈ U , the geodesic from p to q is not only a straight curve, but the
shortest curve between the two points. (Proposition 3.6 from [25])
These properties imply that for a geodesically complete manifold we can find the
shortest path between two points and move from one point towards the other by finding
the vector that points from the first point to the second and moving in the direction of the
vector. An example of where this is applicable is state estimation. Suppose that the state
estimate is at a point p1 and a measurement is at a point p2. Using the logarithm map
we can create a geodesic connecting the two points and move from p1 towards p2 along the
straightest path. How this is precisely done will be shown in Section 6.
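A minimal sketch of this idea on the circle, realized as rotation matrices; the gain of 0.5 is an arbitrary illustration, standing in for the update actually derived in the LG-IPDAF chapter.

```python
import numpy as np

# S^1 realized as 2x2 rotation matrices; Exp/Log are the maps for this group.
Exp = lambda a: np.array([[np.cos(a), -np.sin(a)], [np.sin(a), np.cos(a)]])
Log = lambda R: np.arctan2(R[1, 0], R[0, 0])

R_est, R_meas = Exp(0.2), Exp(1.0)   # estimate and measurement on the circle

# Tangent vector at the estimate pointing toward the measurement.
v = Log(R_est.T @ R_meas)
assert np.isclose(v, 0.8)

# Move a fraction (an illustrative gain of 0.5) of the way along the geodesic.
R_new = R_est @ Exp(0.5 * v)
assert np.isclose(Log(R_new), 0.6)
```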
3.4 Lie Group
Definition 3.4.1. A Lie group is a Riemannian manifold G that is also a group in the
algebraic sense with the property that its binary operator and inverse map are both smooth.
We restrict our discussion of Lie groups to geodesically-complete, unimodular Lie groups. A unimodular Lie group is a Lie group whose group actions have a determinant of one; i.e., they preserve volume. An example of a Lie group is the Special Euclidean group
of 2-dimensions defined as the set

SE (2) = {[[R, p], [01×2, 1]] ∈ R3×3 | R ∈ SO (2) and p ∈ R2},
with the binary operation being matrix multiplication, the inverse operation being the matrix
inverse, and the identity element being I3×3. We use In×n to denote the n×n identity matrix
and 0m×n to denote the m× n zeros matrix.
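The group structure of SE (2) can be exercised numerically; the constructor name `se2` is our own.

```python
import numpy as np

def se2(theta, x, y):
    """SE(2) element as a 3x3 homogeneous transformation matrix."""
    c, s = np.cos(theta), np.sin(theta)
    return np.array([[c, -s, x],
                     [s,  c, y],
                     [0,  0, 1]])

g1, g2 = se2(0.5, 1.0, 2.0), se2(-0.2, 3.0, -1.0)

# Closure: the product keeps the [R p; 0 1] block structure.
g3 = g1 @ g2
R = g3[:2, :2]
assert np.allclose(R @ R.T, np.eye(2))
assert np.isclose(np.linalg.det(R), 1.0)
assert np.allclose(g3[2], [0.0, 0.0, 1.0])

# Inverse and identity elements.
assert np.allclose(g1 @ np.linalg.inv(g1), np.eye(3))

# Unlike SO(2), SE(2) is not commutative.
assert not np.allclose(g1 @ g2, g2 @ g1)
```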
In this dissertation we do not specify the atlas used by the Lie group since we only
work in a single chart whose diffeomorphism is the exponential map, and the other charts
are unnecessary. We use the exponential map defined by the induced Riemannian metric
obtained by embedding the Lie group in an equal or higher-order Euclidean space. We do
not explicitly specify this metric since we do not use it directly, but we will specify the
exponential maps when we provide examples.
The tangent space of a Lie group G at the identity element is called the Lie algebra
of G and is denoted g ≜ TIG.
Definition 3.4.2. A Lie algebra g on the field of real numbers R is a real vector space
equipped with a binary map called the Lie bracket [·, ·] : g×g → g that satisfies the following
properties for all V1, V2, V3 ∈ g:
3.1. Bilinearity: For a1, a2 ∈ R,
[a1V1 + a2V2, V3] = a1 [V1, V3] + a2 [V2, V3]
[V3, a1V1 + a2V2] = a1 [V3, V1] + a2 [V3, V2] .
3.2. Antisymmetry: [V1, V2] = − [V2, V1]
3.3. Jacobi Identity: [V1, [V2, V3]] + [V2, [V3, V1]] + [V3, [V1, V2]] = 0
As an example, the Lie algebra of SE (2) is defined as the set

se (2) = {[[ [ω]1×, ρ ], [01×2, 0]] ∈ R3×3 | ω ∈ R and ρ ∈ R2},

where [·]1× is the skew symmetric operator defined in equation (C.3), and the bracket is defined as [v1, v2] ≜ v1v2 − v2v1 where v1, v2 ∈ se (2).
The Lie algebra can take on various representations; however, by taking advantage of its algebraic structure, elements of the Lie algebra can be expressed as the linear combination of orthonormal basis elements ei where ei ∈ se (2). For example, let v ∈ se (2), then v = Σ_{i=1}^{3} ai ei with ai denoting the coefficient associated with ei and where

e1 = [[0, −1, 0], [1, 0, 0], [0, 0, 0]],  e2 = [[0, 0, 1], [0, 0, 0], [0, 0, 0]],  e3 = [[0, 0, 0], [0, 0, 1], [0, 0, 0]].
The coefficients form an algebraic space isomorphic to the Lie algebra that we refer
to as the Cartesian algebraic space denoted RG where the subscript changes based on its
corresponding Lie group. Elements in the Cartesian algebraic space can be represented using
matrix notation as v = [a1, a2, . . .]⊤. Note that we distinguish between elements of the Lie
algebra using the bold font v ∈ g and their corresponding element in the Cartesian algebraic
space using the normal font v ∈ RG.
To map between the Lie algebra and its corresponding Cartesian algebraic space, we use the isomorphisms ·∧ : RG → g and ·∨ : g → RG, called the wedge and vee maps respectively. For an example, let v = [[ [ω]1×, ρ ], [01×2, 0]] ∈ se (2); the vee map for se (2) is defined as

v∨ = [ρ; ω] = v ∈ RSE(2),   (3.1)

and the wedge map is its inverse. Since the vee and wedge maps are linear isomorphisms, any function and property defined on the Lie algebra can be extended to its Cartesian algebraic space. This permits us to work in the Cartesian algebraic space instead of the Lie algebra.
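A sketch of the vee and wedge maps for se (2) following equation (3.1), assuming the Cartesian ordering (ρ, ω) stated there; the function names are our own.

```python
import numpy as np

def wedge(v):
    """Wedge map for se(2): Cartesian vector (rho_x, rho_y, omega) -> 3x3 matrix."""
    rx, ry, w = v
    return np.array([[0.0,  -w,  rx],
                     [w,   0.0,  ry],
                     [0.0, 0.0, 0.0]])

def vee(V):
    """Vee map for se(2): inverse of wedge, per equation (3.1)."""
    return np.array([V[0, 2], V[1, 2], V[1, 0]])

v = np.array([1.0, -2.0, 0.3])
assert np.allclose(vee(wedge(v)), v)          # vee inverts wedge

# Linearity of the isomorphism.
u = np.array([0.5, 0.5, -0.1])
assert np.allclose(wedge(2.0 * v + u), 2.0 * wedge(v) + wedge(u))
```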
Fig. 3.4 depicts the relationship between a Lie group G, its Lie algebra g, and its
corresponding Cartesian algebraic space RG. Note that the Lie algebra and the Cartesian
algebraic space are defined as the tangent space of the Lie group at the identity element.
For other examples of common Lie groups in robotics see Appendix C.
Figure 3.4: A depiction of the relationship between a Lie group G, its Lie algebra g, and its corresponding Cartesian algebraic space RG.

3.4.1 Adjoints

A Lie group can perform group actions on itself and on its Lie algebra. We specify the group actions by juxtaposition of an element of the group g ∈ G with an element of the Lie algebra v ∈ g such that gv is the left group action of g on v and vg is the right group action.
The adjoint of g ∈ G is the conjugate group action of G on v ∈ g defined as

AdGg (v) ≜ gvg−1.

Let v ∈ RG with the relation v = v∨. The matrix adjoint of g ∈ G is a matrix representation of G, denoted AdGg , such that AdGg v = (AdGg (v))∨. As an example, let g ∈ SE (2), v ∈ se (2), and v ∈ RSE(2). The adjoint of SE (2) on se (2) is

AdSE(2)g (v) = gvg−1 = [[R, p], [01×2, 1]] [[ [ω]1×, ρ ], [01×2, 0]] [[R, p], [01×2, 1]]−1.

The matrix adjoint of SE (2) on RSE(2) is derived in [79] and is

AdSE(2)g = [[R, −[p]1×], [01×2, 1]].

It can be shown that AdSE(2)g v = (AdSE(2)g (v))∨.
Let v1, v2 ∈ g and v1, v2 ∈ RG with the relations v1∨ = v1 and v2∨ = v2. Another adjoint is the adjoint of v1 ∈ g on g, denoted adGv1 : g → g, and is defined as

adGv1 (v2) ≜ [v1, v2],

where [·, ·] is the Lie bracket defined in Definition 3.4.2. The matrix adjoint of v1 ∈ RG on RG is a representation of RG, denoted adGv1, such that adGv1 v2 = (adGv1 (v2))∨. For example, let v1, v2 ∈ se (2) and v1, v2 ∈ RSE(2) with the relations v1∨ = v1 and v2∨ = v2. The adjoint of se (2) on se (2) is defined as

adSE(2)v1 (v2) = [v1, v2] = v1v2 − v2v1,

and the matrix adjoint of v1 ∈ RSE(2) on RSE(2) is derived in [13] and is

adSE(2)v1 = [[ [ω1]1×, −[1]1× ρ1 ], [01×2, 0]].

It can be shown that adSE(2)v1 v2 = (adSE(2)v1 (v2))∨. An interesting aside is that the matrix adjoint of g is the Lie algebra of the matrix adjoint of G.
3.4.2 Exponential Map at Identity
When working with Lie groups we restrict the exponential and logarithm maps to the identity element of the Lie group G and generically define them as

ExpGI : RG → G
LogGI : G → RG.

Note that we define the exponential map at identity with the domain being the Cartesian algebraic space instead of the Lie algebra. When working with the logarithm map, it is implied that its domain is restricted to the subset U ⊂ G on which the exponential map is bijective. This means that there might be a subset V = G \ U for which the logarithm map is not defined. For geodesically-complete Lie groups, the subset V is so small compared to U , and sufficiently far from the group identity element, that this technicality is almost always ignored. In this dissertation we ignore this technicality, but mention it to make the reader aware of it. For the rest of this dissertation we will refer to the exponential map on Lie groups restricted to the Lie group's identity element as the exponential map.
For matrix Lie groups, the exponential and logarithm maps are the matrix exponential and logarithm maps composed with the wedge and vee functions. For example, the Lie group SE (2) is a matrix Lie group and its exponential and logarithm maps are defined as

ExpSE(2)I : RSE(2) → SE (2) ≜ Expm ∘ ∧
LogSE(2)I : SE (2) → RSE(2) ≜ ∨ ∘ Logm,
where Expm and Logm are the matrix exponential and logarithm. By definition, the matrix
exponential and logarithm are infinite Taylor series; however, there exist closed form solutions for many matrix Lie groups. For example, let g ∈ SE (2) and v ∈ RSE(2); the closed form solutions of the exponential and logarithm maps are

ExpSE(2)I (v) = ExpSE(2)I ([ρ; ω]) = [[ExpSO(2)I (ω), V (ω) ρ], [01×2, 1]]   (3.2)

LogSE(2)I (g) = LogSE(2)I ([[R, p], [01×2, 1]]) = [V (LogSO(2)I (R))−1 p; LogSO(2)I (R)],   (3.3)

where ExpSO(2)I and LogSO(2)I are defined in equations (C.6) and (C.7) and the function V is defined in equation (C.23). Unfortunately, not every Lie group is isomorphic to a matrix Lie group; however, most of the interesting Lie groups are [77]. Thus, the matrix exponential can serve as the exponential map for the majority of the interesting Lie groups.
Lie groups are able to shift the exponential map from the identity element using the
left and right group actions. To show how this is done, let g1, g2, g3 ∈ G and v ∈ RG with
the relation
g2 = g1ExpGI (v) .
This is a geodesic from g1 to g2 in the direction of the tangent vector v formed by using the
exponential map at identity and the left group action. In a way we are shifting the geodesic
from beginning at the identity element to beginning at g1. This concept is shown in Fig. 3.5,
and is referred to as left trivialization. Similarly we can form another geodesic from g1
to g3 in the direction of the tangent vector v using the exponential map at identity and the
right group action such that
g3 = ExpGI (v) g1.
This is referred to as right trivialization. The element g2 = g3 only if v = 0, g1 = I, or
the Lie group is commutative.
Figure 3.5: A depiction of a geodesic from g1 ∈ G to g2 ∈ G in the direction of v ∈ RG using theexponential map.
We can relate the left and right trivializations using the matrix adjoint on G. Let
v1, v2 ∈ RG and g1, g2 ∈ G with the relation g2 = g1ExpGI (v1) = ExpGI (v2) g1. The vectors
v1 and v2 are related by the matrix adjoint as
v2 = AdGg1v1 (3.4)
as discussed in [14]. In this dissertation we will primarily use the left trivialization to shift
geodesics from the group identity to other elements of the Lie group.
3.4.3 Jacobians of the Exponential Map
When working with Lie groups we need the partial derivatives of the exponential and logarithm maps. These differentials are commonly called the right and left Jacobians, and correspond to the right and left trivializations. The right and left Jacobians and their inverses are defined to map elements of RG to the general linear group (the set of invertible matrices) that acts on RG. Let v ∈ RG; they are defined as

JGr (v) = Σ_{n=0}^{∞} (−adGv)^n / (n + 1)!,   JGl (v) = Σ_{n=0}^{∞} (adGv)^n / (n + 1)!,

JG−1r (v) = Σ_{n=0}^{∞} (Bn / n!) (−adGv)^n,   JG−1l (v) = Σ_{n=0}^{∞} (Bn / n!) (adGv)^n,

where Bn denote the Bernoulli numbers. The derivation of the left and right Jacobians
stems from the Baker-Campbell-Hausdorff formula and can be studied in [13], [77]. The
right Jacobian has the properties that for any v ∈ RG and any small ṽ ∈ RG,

ExpGI (v + ṽ) ≈ ExpGI (v) ExpGI (JGr (v) ṽ)   (3.7a)

ExpGI (v) ExpGI (ṽ) ≈ ExpGI (v + JG−1r (v) ṽ).   (3.7b)

Similarly for the left Jacobian,

ExpGI (v + ṽ) ≈ ExpGI (JGl (v) ṽ) ExpGI (v)

ExpGI (ṽ) ExpGI (v) ≈ ExpGI (v + JG−1l (v) ṽ).
For SE (2) and many other matrix Lie groups, the Jacobians have closed form solutions. As an example, let v ∈ RSE(2); the right Jacobian for SE (2) is

JSE(2)r (v) = JSE(2)r ([ρ; ω]) = [[Wr (ω), Dr (ω) ρ], [01×2, 1]]   (3.8)

where

Wr (ω) = I if ω = 0, and Wr (ω) = ((cos ω − 1)/ω) [1]1× + ((sin ω)/ω) I otherwise,

Dr (ω) = (1/2) [1]1× if ω = 0, and Dr (ω) = ((1 − cos ω)/ω^2) [1]1× + ((ω − sin ω)/ω^2) I otherwise.
3.4.4 Direct Product Group
In this dissertation we occasionally combine a Lie group with another Lie group or
Cartesian algebraic space to form a new Lie group using the direct product. We can use a
Cartesian algebraic space to form a new Lie group since it is a Riemannian manifold and
has a group structure with the binary operator being addition; thus, the Cartesian algebraic
space is a Lie group with additional structure.
To show how we form a new Lie group, we will use the target's state as an example. We assume that the target has nearly constant velocity. Its pose is modeled as an element of the Lie group g ∈ G, and its velocity is modeled as an element of the Cartesian algebraic space v ∈ RG; thus, the target's state is the direct product of the Lie group G and the Cartesian algebraic space RG, denoted Gx ≜ G × RG. The operator of this Lie group is inherited from its subgroups G and RG. Since RG is an algebraic space, it has a commutative group structure with the group operator being addition. Let x1 = (g1, v1) ∈ Gx and x2 = (g2, v2) ∈ Gx denote two different states; the group operator and inverse of Gx are

x1 • x2 = (g1 • g2, v1 + v2),

x1−1 = (g1−1, −v1),

where the symbol • denotes the binary operator of G, the addition symbol denotes the binary operator of RG, which is standard addition, and v1−1 = −v1 since v1 is an element of an algebraic space.
The corresponding Cartesian algebraic space of Gx = G × RG is Rx ≜ RG × RG. Let u = (ug, uv) ∈ Rx and x = (g, v) ∈ Gx; the exponential and logarithm maps are

ExpGxI (u) = (ExpGI (ug), uv)   (3.9a)

LogGxI (x) = (LogGI (g), v),   (3.9b)

since the exponential map in Euclidean space is the identity map. The right Jacobian of Gx is

JGxr (u) = [[JGr (ug), 0n×n], [0n×n, I]].   (3.10)
For more information about the direct product group see [80].
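A sketch of the direct product group Gx = SE (2) × RSE(2), with the component-wise operator and inverse defined above; the helper names are our own.

```python
import numpy as np

def se2(theta, x, y):
    c, s = np.cos(theta), np.sin(theta)
    return np.array([[c, -s, x], [s, c, y], [0, 0, 1]])

# A state on G_x = SE(2) x R^3: a pose (3x3 matrix) and a twist (3-vector).
def compose(x1, x2):
    (g1, v1), (g2, v2) = x1, x2
    return (g1 @ g2, v1 + v2)        # operator inherited component-wise

def inverse(x):
    g, v = x
    return (np.linalg.inv(g), -v)

x1 = (se2(0.3, 1.0, 0.0), np.array([0.5, 0.0, 0.1]))
x2 = (se2(-0.1, 0.0, 2.0), np.array([-0.2, 0.3, 0.0]))

g3, v3 = compose(x1, x2)
assert np.allclose(v3, [0.3, 0.3, 0.1])   # twists simply add

# x * x^{-1} is the identity on both factors.
gI, vI = compose(x1, inverse(x1))
assert np.allclose(gI, np.eye(3))
assert np.allclose(vI, np.zeros(3))
```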
3.4.5 First Order Taylor Series and Partial Derivatives
Let g, ḡ ∈ G, and let g̃ ∈ RG be a small perturbation from 0 ∈ RG with the relation g = ḡ ExpGI (g̃). Also let f : G → G be an arbitrary function. The first order Taylor series of f evaluated at ḡ is

f (g) ≈ f (ḡ) ExpGI ( (∂f/∂g)|g=ḡ g̃ ),

where (∂f/∂g)|g=ḡ is the partial derivative of f with respect to g evaluated at g = ḡ. Using the definition and notation shown in [14], the partial derivative of f with respect to g is defined as

∂f/∂g = lim_{τ→0} LogGI (f (g)−1 f (g ExpGI (τ))) / τ   (3.11)

where τ ∈ RG.
In equation (3.11), we abuse notation by denoting the vector whose elements are the numerator divided by each element of τ as the numerator divided by the vector τ. Note that the limit in equation (3.11) is taken in the Cartesian algebraic space instead of the Lie group. This is because derivatives are defined only on vector spaces. Since not every Lie group has a vector space structure, we make use of the exponential map to perform the derivative in the Cartesian algebraic space, which is a vector space.
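Definition (3.11) can be checked numerically on SO (2) with the illustrative (hypothetical) function f (g) = g • g, whose derivative should be the scalar 2.

```python
import numpy as np

# Numerical check of definition (3.11) on SO(2) for the squaring function.
Exp = lambda a: np.array([[np.cos(a), -np.sin(a)], [np.sin(a), np.cos(a)]])
Log = lambda R: np.arctan2(R[1, 0], R[0, 0])

f = lambda g: g @ g
g = Exp(0.4)

tau = 1e-6      # small step in the Cartesian algebraic space
deriv = Log(np.linalg.inv(f(g)) @ f(g @ Exp(tau))) / tau
assert np.isclose(deriv, 2.0)
```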
3.4.6 Expressing Uncertainty on Lie Groups
We use Gaussian distributions to model the uncertainty in the sensor, state estimate,
and system model. As an example, let x ∈ Rx be a zero-mean Gaussian random variable with covariance P , denoted x ∼ N (µ = 0, P ), with the probability density function defined as

p (x) = η exp( −(1/2) x⊤P−1x )   (3.12)

with η denoting the normalizing coefficient. Gaussian distributions are defined on vector
spaces. Since not every Lie group has a vector space structure (e.g. SE (2) is not a vector
space since scalar multiplication is not defined on the set) Gaussian distributions cannot
be defined directly on every Lie group. However, they can be defined on the Cartesian
algebraic space at the identity element of the Lie group and extended to the Lie group
using the exponential map. Thus, the probability of an element of the Lie group is the
probability of the corresponding element of the Cartesian algebraic space. For example, let x1 = ExpGxI (x̃1) where x1 ∈ Gx and x̃1 ∈ Rx. The probability of x1 is the probability of x̃1, which is p (x = x̃1) as defined in (3.12). This relation is depicted in Fig. 3.6, where the random variable x is defined on the tangent space with a Gaussian distribution, depicted as a green curve centered at the origin.
For the uncertainty to be indirectly defined over the entire Lie group, the Lie group is required to be geodesically complete. This allows the Gaussian distribution to extend to every element of the Lie group via the exponential map.
Depending on the Lie group, the exponential map may not be injective, which means
that possibly an infinite number of elements of the Cartesian algebraic space will map to the
Figure 3.6: A depiction of the Gaussian distribution.
same element of the Lie group. In this case, we require the uncertainty to have a concentrated
Gaussian density (CGD) [81]. The CGD is a zero mean Gaussian distribution that is tightly
focused around the origin of the Cartesian algebraic space, where by tightly focused we mean
that the majority of the probability mass is in a subset U ⊆ Rx centered around the origin
such that the exponential mapping from U to Gx is injective, and that the probability of an
element not being in U is negligible. This property allows us to ignore the probability of an
element being outside of U .
The CGD can be centered at an element other than the group identity element using the group action, provided that the Lie group is unimodular. The unimodular property
ensures that as the Gaussian distribution is moved on the manifold, its probability mass
density does not change. To show how uncertainty is moved around, let the target’s state
be represented as
$$x = \hat{x}\,\mathrm{Exp}^{G_x}_I(\tilde{x}), \tag{3.13}$$

where $\hat{x} \in G_x$ is the target's state estimate and $\tilde{x}$ is the error state whose probability density
function is defined in equation (3.12). The exponential function maps the random variable
$\tilde{x}$ to the Lie group, and the state estimate moves the uncertainty to the target's state while
preserving the distribution of the probability density functions. These properties allow the
probability of the state $x$ to be related to the corresponding probability of the error state $\tilde{x}$.
Thus, the uncertainty distribution of $x$ is defined by the distribution of $\tilde{x}$ as

$$p(x) = \eta \exp\left(-\frac{1}{2}\,\mathrm{Log}^{G_x}_I\!\left(\hat{x}^{-1}x\right)^{\top} P^{-1}\, \mathrm{Log}^{G_x}_I\!\left(\hat{x}^{-1}x\right)\right) \tag{3.14a}$$
$$= \eta \exp\left(-\frac{1}{2}\,\tilde{x}^{\top} P^{-1} \tilde{x}\right) = p(\tilde{x}). \tag{3.14b}$$
With a slight abuse of notation, we denote the probability density function (PDF) of the
state $x$ as $x \sim \mathcal{N}(\hat{x}, P)$, where $\hat{x}$ is the state estimate and $P$ is the error covariance of the
error state $\tilde{x}$.
An advantage to representing the uncertainty in the Cartesian algebraic space is
having a minimal representation of the uncertainty. For example, an element of the matrix
group SE(2) has three dimensions but is represented by a 3 × 3 matrix with nine elements.
Representing the uncertainty directly on the set of 3 × 3 matrices would require the covariance
to be 9 × 9, whereas the corresponding Cartesian algebraic space has only three components
and the corresponding covariance matrix is 3 × 3. For more information about representing
uncertainty on Lie groups see [27], [36], [82], [83].
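To illustrate, here is a minimal numerical sketch (our own helper names, using the standard closed-form SE(2) exponential and logarithm) showing that a three-dimensional vector, and hence a 3 × 3 covariance, fully parameterizes a 3 × 3 SE(2) matrix:

```python
# Sketch: an SE(2) element is a 3x3 matrix (nine numbers), but its
# Cartesian algebraic space is only 3-dimensional, so a 3x3 covariance
# suffices. Helper names and conventions are ours, not the dissertation's.
import numpy as np

def exp_se2(tau):
    """Exponential map R^3 -> SE(2); tau = (u1, u2, theta)."""
    u, th = tau[:2], tau[2]
    c, s = np.cos(th), np.sin(th)
    if abs(th) < 1e-9:                       # small-angle limit of V(theta)
        V = np.eye(2)
    else:
        V = np.array([[s, -(1 - c)], [1 - c, s]]) / th
    g = np.eye(3)
    g[:2, :2] = [[c, -s], [s, c]]
    g[:2, 2] = V @ u
    return g

def log_se2(g):
    """Logarithm map SE(2) -> R^3, the inverse of exp_se2 near the identity."""
    th = np.arctan2(g[1, 0], g[0, 0])
    c, s = np.cos(th), np.sin(th)
    if abs(th) < 1e-9:
        Vinv = np.eye(2)
    else:
        Vinv = np.linalg.inv(np.array([[s, -(1 - c)], [1 - c, s]]) / th)
    return np.array([*(Vinv @ g[:2, 2]), th])

tau = np.array([0.3, -0.1, 0.5])
assert np.allclose(log_se2(exp_se2(tau)), tau)  # round trip recovers the 3-vector
P = np.diag([0.1, 0.1, 0.01])                   # a 3x3 covariance, not 9x9
```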
CHAPTER 4. SYSTEM MODEL
We assume that the target can be modeled using a constant-velocity, Gaussian white-
noise driven, time-invariant system model. Under this assumption, the target’s state is the
direct product of the pose (e.g. position and orientation) and the twist (e.g. translational
and angular velocities), where the pose is an element of a Lie group g ∈ G and the twist is an
element of the Cartesian algebraic space v ∈ RG. We denote the target’s state at time tk ∈ R
as $x_k = (g_k, v_k) \in G_x \triangleq G \times R_G$. Let $t_{i:k} \in \mathbb{R}$ denote the time interval from time $t_i$ to time
$t_k$. The system process noise over the time interval $t_{i:k}$ is modeled as a Wiener process as
defined in [24] and is denoted $q_{i:k} = (q^g_{i:k}, q^v_{i:k}) \sim \mathcal{N}(0, Q(t_{i:k})) \in R_x$, where $q^g_{i:k} \in R_G$ is the
process noise that corresponds to the pose, $q^v_{i:k} \in R_G$ is the process noise that corresponds
to the twist, and $R_x = R_G \times R_G$ is the Cartesian algebraic space of $G_x$.
We assume that there are one or more sensors that observe the target at fixed or
non-fixed time intervals, and that some or all of the sensors may provide measurements at
the same time. To distinguish between the sensors we use the indexing set $S = \{1, 2, 3, \ldots\}$,
where $s_i \in S$ denotes the $i$th sensor. The measurements from sensor $s_i$ are elements of the sensor's
measurement space denoted $G_{s_i}$ with corresponding Cartesian algebraic space $R_{s_i}$. We
denote a measurement from sensor $s_i$ at time $t_k$ as $z^{s_i}_k \in G_{s_i}$, and we denote the measurement
noise from sensor $s_i$ at time $t_k$ as $r^{s_i}_k \sim \mathcal{N}(0, R_{s_i}) \in R_{s_i}$.
Under the specified assumptions, the generic, discrete system model is

$$x_k = f(x_i, q_{i:k}, t_{i:k}) \tag{4.1a}$$
$$z^{s_i}_k = h^{s_i}(x_k, r^{s_i}_k), \tag{4.1b}$$

where $f : G_x \times R_x \times \mathbb{R} \to G_x$ is the state transition function defined as

$$f(x_i, q_{i:k}, t_{i:k}) \triangleq x_i\,\mathrm{Exp}^{G_x}_I\!\left(\left(t_{i:k} v_i + q^g_{i:k},\; q^v_{i:k}\right)\right) \tag{4.2a}$$
$$= \left(g_i\,\mathrm{Exp}^{G}_I\!\left(t_{i:k} v_i + q^g_{i:k}\right),\; v_i + q^v_{i:k}\right), \tag{4.2b}$$
and hsi : Gx × Rsi → Gsi is the observation function. We do not explicitly define the
observation function since it is application dependent; however, we will provide examples
of observation functions when we present applications of G-MTT later on. This system
model is similar to the one defined in [84], with the difference being how the process noise
is modeled.
By the definition of the state transition function, the state is propagated forward in
time when tk > ti, backwards in time when tk < ti, and remains unchanged when tk = ti.
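As an illustrative sketch (our own, choosing SO(2) as the pose group $G$ with a scalar angular rate as the twist), the state transition map in (4.2b) can be written directly:

```python
# Sketch of the state-transition map (4.2b) for the pose group SO(2):
# pose g is a planar rotation, twist v is an angular rate, and f
# propagates the pair as (g * Exp(dt*v + q_g), v + q_v).
# Names and the choice of group are illustrative assumptions.
import numpy as np

def rot(th):
    """Exp map for SO(2): angle -> 2x2 rotation matrix."""
    c, s = np.cos(th), np.sin(th)
    return np.array([[c, -s], [s, c]])

def f(g, v, dt, q_g=0.0, q_v=0.0):
    """Constant-velocity propagation on SO(2) x R, mirroring eq. (4.2b)."""
    return g @ rot(dt * v + q_g), v + q_v

g0, v0 = rot(0.2), 0.5
g1, v1 = f(g0, v0, dt=2.0)        # forward in time (dt > 0)
g_back, _ = f(g1, v1, dt=-2.0)    # dt < 0 propagates backward in time
assert np.allclose(g_back, g0)    # returns to the initial pose
```

The negative `dt` case mirrors the statement above: the same map propagates the state backward in time.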
4.1 System Affinization
The uncertainty in the system model is represented by Gaussian random variables
defined in the Cartesian algebraic spaces of the state and measurement. Depending on the
system model for a particular Lie group, the propagation of the Gaussian random variables
are either affine or nonaffine. In the affine case, the random variables retain their Gaus-
sian structure, and in the nonaffine case the random variables lose their Gaussian structure
because Gaussian random variables are only preserved under affine transformations.
In the nonaffine case and under the assumption that the signal-to-noise ratio (SNR)
is high, the Gaussian structure of the random variables can be preserved with little loss
of information by approximating their propagation as affine. This does not mean that the
system model is affine, but rather the propagation of the random variables or uncertainty is
affine. For lack of a better term, we refer to this “affinized” form as the affine system model.
The assumption that the signal-to-noise ratio is high comes from the extended Kalman filter.
It means that the zero-mean noise is tightly concentrated about the mean; thus, the majority
of the information about the noise is within a small perturbation of the mean and can be
well approximated by an affine system.
Let $x_i = \hat{x}_i\,\mathrm{Exp}^{G_x}_I(\tilde{x}_i)$, where $\hat{x}_i$ is the target's state estimate, and $\tilde{x}_i \in R_x$ is the error
state and is small. The system model is made “affine” by computing its first order Taylor
series at the points

$$\zeta^f_{i:k} = \left(x_i = \hat{x}_i,\; q_{i:k} = 0,\; t_{i:k}\right) \tag{4.3}$$
$$\zeta^{h^{s_i}}_k = \left(x_k = \hat{x}_k,\; r^{s_i}_k = 0\right), \tag{4.4}$$
using the method discussed in Subsection 3.4.5. The affine system model requires calculating
the Jacobians of the state transition function and observation functions with respect to the
state, process noise, and measurement noise.
The Jacobians of the state transition function can be explicitly defined for all Lie
groups and are derived in this section. We generically define the Jacobians of the observation
function in this section, and we explicitly define them when we get to specific examples.
Lemma 4.1.1. Given the discrete, time-invariant model in equations (4.1) and (4.2), the
Jacobians of the state transition function and observation functions evaluated at the points
$\zeta^f_{i:k}$ and $\zeta^{h^{s_i}}_k$ are

$$F_{i:k} = \left.\frac{\partial f}{\partial x}\right|_{\zeta^f_{i:k}} = \begin{bmatrix} \mathrm{Ad}^G_{\mathrm{Exp}^G_I(t_{i:k}\hat{v}_i)^{-1}} & J^G_r(t_{i:k}\hat{v}_i)\, t_{i:k} \\ 0_{n\times n} & I_{n\times n} \end{bmatrix} \tag{4.5a}$$
$$G_{i:k} = \left.\frac{\partial f}{\partial q}\right|_{\zeta^f_{i:k}} = \begin{bmatrix} J^G_r(t_{i:k}\hat{v}_i) & 0_{n\times n} \\ 0_{n\times n} & I_{n\times n} \end{bmatrix} \tag{4.5b}$$
$$H^{s_i}_k = \left.\frac{\partial h^{s_i}}{\partial x}\right|_{\zeta^{h^{s_i}}_k} \tag{4.5c}$$
$$V^{s_i}_k = \left.\frac{\partial h^{s_i}}{\partial r^{s_i}}\right|_{\zeta^{h^{s_i}}_k}, \tag{4.5d}$$
where n is the dimension of the target’s pose.
Consequently, if the process noise and measurement noise are concentrated about their
zero mean, and $\hat{x}_i \in G_x$ is a state estimate that is close to $x_i$, and if the error state between
$x_i$ and $\hat{x}_i$ is defined as $\tilde{x}_i \triangleq \mathrm{Log}^{G_x}_I\!\left(\hat{x}_i^{-1} x_i\right) \in R_x$, then the evolution of the system can be
described by the “affinized system”

$$x_k \approx f(\hat{x}_i, 0, t_{i:k})\,\mathrm{Exp}^{G_x}_I\!\left(F_{i:k}\tilde{x}_i + G_{i:k} q_{i:k}\right) \tag{4.6a}$$
$$z^{s_i}_k \approx h^{s_i}(\hat{x}_k, 0)\,\mathrm{Exp}^{G_{s_i}}_I\!\left(H^{s_i}_k \tilde{x}_k + V^{s_i}_k r^{s_i}_k\right), \tag{4.6b}$$

and the propagated error state is $\tilde{x}_k \approx F_{i:k}\tilde{x}_i + G_{i:k} q_{i:k}$. Note that the propagation of the
random variables is affine, thereby preserving their Gaussian structure.
Proof. We begin with deriving $F_{i:k}$. The Jacobian $F_{i:k}$ is evaluated with the process noise
set to zero, so to simplify this derivation we will initially set the process noise to zero as it
will have no impact on the derivation. Let $\tau = (\tau^g, \tau^v) \in R_x$, where $\tau^g, \tau^v \in R_G$, denote the
perturbation of the state when computing the derivative. The derivation of $F_{i:k}$ is written
as if we are perturbing the state by the vector $\tau$ where all of its elements are simultaneously
non-zero; however, we are only perturbing the state by a single element of $\tau$ at a time. We
do this to condense notation.
Using the definition of the derivative in equation (3.11),

$$\frac{\partial f}{\partial x_i} = \lim_{\tau\to 0} \frac{\mathrm{Log}^{G_x}_I\!\left(f(x_i, 0, t_{i:k})^{-1}\, f\!\left(x_i\,\mathrm{Exp}^{G_x}_I(\tau), 0, t_{i:k}\right)\right)}{\tau}.$$

Substituting the definition of the state transition function in equation (4.2) yields

$$\frac{\partial f}{\partial x_i} = \lim_{\tau\to 0} \frac{\mathrm{Log}^{G_x}_I\!\left(\left(g_i\,\mathrm{Exp}^G_I(t_{i:k}v_i),\, v_i\right)^{-1}\left(g_i\,\mathrm{Exp}^G_I(\tau^g)\,\mathrm{Exp}^G_I(t_{i:k}v_i + t_{i:k}\tau^v),\, v_i + \tau^v\right)\right)}{\tau}$$
$$= \lim_{\tau\to 0} \frac{\mathrm{Log}^{G_x}_I\!\left(\left(\mathrm{Exp}^G_I(t_{i:k}v_i)^{-1}\, g_i^{-1} g_i\,\mathrm{Exp}^G_I(\tau^g)\,\mathrm{Exp}^G_I(t_{i:k}v_i + t_{i:k}\tau^v),\, v_i + \tau^v - v_i\right)\right)}{\tau}$$
$$= \lim_{\tau\to 0} \frac{\mathrm{Log}^{G_x}_I\!\left(\left(\mathrm{Exp}^G_I(t_{i:k}v_i)^{-1}\,\mathrm{Exp}^G_I(\tau^g)\,\mathrm{Exp}^G_I(t_{i:k}v_i + t_{i:k}\tau^v),\, \tau^v\right)\right)}{\tau}.$$

Using the property of the adjoint in equation (3.4) and the property of the right
Jacobian in equation (3.7) gives

$$\frac{\partial f}{\partial x_i} = \lim_{\tau\to 0} \frac{\mathrm{Log}^{G_x}_I\!\left(\left(\mathrm{Exp}^G_I(t_{i:k}v_i)^{-1}\,\mathrm{Exp}^G_I(\tau^g)\,\mathrm{Exp}^G_I(t_{i:k}v_i)\,\mathrm{Exp}^G_I\!\left(J^G_r(t_{i:k}v_i)\, t_{i:k}\tau^v\right),\, \tau^v\right)\right)}{\tau}$$
$$= \lim_{\tau\to 0} \frac{\mathrm{Log}^{G_x}_I\!\left(\left(\mathrm{Exp}^G_I\!\left(\mathrm{Ad}^G_{\mathrm{Exp}^G_I(t_{i:k}v_i)^{-1}}\tau^g\right)\mathrm{Exp}^G_I\!\left(J^G_r(t_{i:k}v_i)\, t_{i:k}\tau^v\right),\, \tau^v\right)\right)}{\tau}$$
$$= \lim_{\tau\to 0} \frac{\left(\mathrm{Log}^{G}_I\!\left(\mathrm{Exp}^G_I\!\left(\mathrm{Ad}^G_{\mathrm{Exp}^G_I(t_{i:k}v_i)^{-1}}\tau^g\right)\mathrm{Exp}^G_I\!\left(J^G_r(t_{i:k}v_i)\, t_{i:k}\tau^v\right)\right),\, \tau^v\right)}{\tau}.$$
The portion of the derivative corresponding to $\tau^g$ is computed as

$$\frac{\partial f}{\partial g_i} = \lim_{\tau^g\to 0} \frac{\left(\mathrm{Log}^G_I\!\left(\mathrm{Exp}^G_I\!\left(\mathrm{Ad}^G_{\mathrm{Exp}^G_I(t_{i:k}v_i)^{-1}}\tau^g\right)\right),\, 0\right)}{\tau^g} \tag{4.7a}$$
$$= \lim_{\tau^g\to 0} \frac{\left(\mathrm{Ad}^G_{\mathrm{Exp}^G_I(t_{i:k}v_i)^{-1}}\tau^g,\, 0\right)}{\tau^g} = \begin{bmatrix} \mathrm{Ad}^G_{\mathrm{Exp}^G_I(t_{i:k}v_i)^{-1}} \\ 0_{n\times n} \end{bmatrix}, \tag{4.7b}$$
and the portion of the derivative corresponding to $\tau^v$ is computed as

$$\frac{\partial f}{\partial v_i} = \lim_{\tau^v\to 0} \frac{\left(\mathrm{Log}^G_I\!\left(\mathrm{Exp}^G_I\!\left(J^G_r(t_{i:k}v_i)\, t_{i:k}\tau^v\right)\right),\, \tau^v\right)}{\tau^v} \tag{4.8a}$$
$$= \lim_{\tau^v\to 0} \frac{\left(J^G_r(t_{i:k}v_i)\, t_{i:k}\tau^v,\, \tau^v\right)}{\tau^v} = \begin{bmatrix} J^G_r(t_{i:k}v_i)\, t_{i:k} \\ I_{n\times n} \end{bmatrix}. \tag{4.8b}$$
Combining equations (4.7b) and (4.8b) yields (4.5a).
Next we derive the Jacobian $G_{i:k}$. This Jacobian is evaluated with the process noise
set to zero, so to simplify the derivation we will initially set the process noise to zero as it
will have no impact on the derivation. Let $\xi = (\xi^g, \xi^v) \in R_x$ denote the perturbation of the
process noise when computing the derivative, where $\xi^g, \xi^v \in R_G$. From equation (4.1) we
have

$$\frac{\partial f}{\partial q} = \lim_{\xi\to 0} \frac{\mathrm{Log}^{G_x}_I\!\left(f(x_i, 0, t_{i:k})^{-1}\, f(x_i, \xi, t_{i:k})\right)}{\xi}.$$

Substituting the definition of the state transition function in equation (4.2) yields

$$\frac{\partial f}{\partial q} = \lim_{\xi\to 0} \frac{\mathrm{Log}^{G_x}_I\!\left(\left(g_i\,\mathrm{Exp}^G_I(t_{i:k}v_i),\, v_i\right)^{-1}\left(g_i\,\mathrm{Exp}^G_I(t_{i:k}v_i + \xi^g),\, v_i + \xi^v\right)\right)}{\xi}$$
$$= \lim_{\xi\to 0} \frac{\mathrm{Log}^{G_x}_I\!\left(\left(\mathrm{Exp}^G_I(t_{i:k}v_i)^{-1}\, g_i^{-1} g_i\,\mathrm{Exp}^G_I(t_{i:k}v_i + \xi^g),\, v_i + \xi^v - v_i\right)\right)}{\xi}$$
$$= \lim_{\xi\to 0} \frac{\mathrm{Log}^{G_x}_I\!\left(\left(\mathrm{Exp}^G_I(t_{i:k}v_i)^{-1}\,\mathrm{Exp}^G_I(t_{i:k}v_i + \xi^g),\, \xi^v\right)\right)}{\xi}.$$

Using the property of the right Jacobian in equation (3.7) gives

$$\frac{\partial f}{\partial q} = \lim_{\xi\to 0} \frac{\mathrm{Log}^{G_x}_I\!\left(\left(\mathrm{Exp}^G_I(t_{i:k}v_i)^{-1}\,\mathrm{Exp}^G_I(t_{i:k}v_i)\,\mathrm{Exp}^G_I\!\left(J^G_r(t_{i:k}v_i)\,\xi^g\right),\, \xi^v\right)\right)}{\xi}$$
$$= \lim_{\xi\to 0} \frac{\mathrm{Log}^{G_x}_I\!\left(\left(\mathrm{Exp}^G_I\!\left(J^G_r(t_{i:k}v_i)\,\xi^g\right),\, \xi^v\right)\right)}{\xi}$$
$$= \lim_{\xi\to 0} \frac{\left(\mathrm{Log}^G_I\!\left(\mathrm{Exp}^G_I\!\left(J^G_r(t_{i:k}v_i)\,\xi^g\right)\right),\, \xi^v\right)}{\xi}$$
$$= \lim_{\xi\to 0} \frac{\left(J^G_r(t_{i:k}v_i)\,\xi^g,\, \xi^v\right)}{\xi}$$
$$= \begin{bmatrix} J^G_r(t_{i:k}v_i) & 0_{n\times n} \\ 0_{n\times n} & I_{n\times n} \end{bmatrix},$$

which is equation (4.5b).
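As a sanity check of Lemma 4.1.1, consider the abelian special case $G = \mathbb{R}^n$, where the adjoint and the right Jacobian both reduce to the identity and (4.5) collapses to the familiar constant-velocity Kalman-filter matrices. The sketch below (our own names and numbers) propagates an error covariance with those Jacobians:

```python
# Sketch of covariance propagation with the Jacobians of Lemma 4.1.1,
# specialized to an abelian group (G = R^n) where Ad = Jr = identity.
# All matrix values below are made-up illustrations.
import numpy as np

n, dt = 2, 0.5
I = np.eye(n)
F = np.block([[I, dt * I], [np.zeros((n, n)), I]])   # eq. (4.5a) with Ad = Jr = I
G = np.eye(2 * n)                                    # eq. (4.5b) with Jr = I

P = np.diag([1.0, 1.0, 0.5, 0.5])    # prior error covariance
Q = 0.01 * np.eye(2 * n)             # process-noise covariance
P_next = F @ P @ F.T + G @ Q @ G.T   # affine propagation keeps Gaussian structure
assert np.allclose(P_next, P_next.T)  # still a valid (symmetric) covariance
```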
4.2 Transforming Measurements and States
As specified in Chapter 2, geometric multiple target tracking has two sensor modes.
In sensor mode one, it is assumed that the measurements are expressed in and with respect
to the current tracking frame when given to G-MTT, and that the tracking frame can
move. If the tracking frame moves, then information on how to transform the measurements,
measurement covariances, state estimates, and error covariances from the previous tracking
frame to the current tracking frame must be provided to G-MTT. In this section we will
show how to transform the measurement covariances, state estimates and error covariances
provided that the state and measurement transformations are known.
Lemma 4.2.1. In this lemma we drop the dependency on time and assume that the state,
measurement, and error covariance are at the same time. Let $T^x : G_x \to G_x$ denote the
transformation of the state from the previous tracking frame to the current tracking frame, and let
$T^{s_i} : G_{s_i} \to G_{s_i}$ denote the transformation of a measurement produced by sensor $s_i$ from the
previous tracking frame to the current tracking frame. Let $G_x \ni x_a \sim \mathcal{N}(\hat{x}_a, P_a)$ denote the
state expressed in and with respect to the previous tracking frame with state estimate $\hat{x}_a \in G_x$
and error state $\tilde{x}_a \sim \mathcal{N}(0, P_a) \in R_x$. Also let $z^{s_i}_a \in G_{s_i}$ denote a measurement from sensor $s_i$
expressed in and with respect to the previous tracking frame with corresponding measurement
noise $R_{s_i} \ni r^{s_i}_a \sim \mathcal{N}(0, R^{s_i}_a)$. Suppose that the transformations $T^x$ and $T^{s_i}$
are known and continuously differentiable. Then the transformations of the state estimate,
error covariance, measurement, and measurement covariance from the previous tracking frame
to the current tracking frame are
$$\hat{x}_b = T^x(\hat{x}_a)$$
$$P_b = \left.\frac{\partial T^x}{\partial x}\right|_{\hat{x}_a} P_a \left.\frac{\partial T^x}{\partial x}\right|_{\hat{x}_a}^{\top}$$
$$z^{s_i}_b = T^{s_i}(z^{s_i}_a)$$
$$R^{s_i}_b = \left(\left.\frac{\partial T^{s_i}}{\partial z}\right|_{z^{s_i}_a} \left.\frac{\partial h^{s_i}}{\partial r}\right|_{(\hat{x}_a, 0)}\right) R^{s_i}_a \left(\left.\frac{\partial T^{s_i}}{\partial z}\right|_{z^{s_i}_a} \left.\frac{\partial h^{s_i}}{\partial r}\right|_{(\hat{x}_a, 0)}\right)^{\top},$$
where $\hat{x}_b$, $P_b$, $z^{s_i}_b$ and $R^{s_i}_b$ are the transformed state estimate, error covariance, measurement
and measurement covariance expressed in and with respect to the new tracking frame. $\left.\frac{\partial T^x}{\partial x}\right|_{\hat{x}_a}$
denotes the partial derivative of $T^x$ with respect to the state evaluated at $\hat{x}_a$, $\left.\frac{\partial T^{s_i}}{\partial z}\right|_{z^{s_i}_a}$
denotes the partial derivative of $T^{s_i}$ with respect to the measurement evaluated at $z^{s_i}_a$,
and $\left.\frac{\partial h^{s_i}}{\partial r}\right|_{(\hat{x}_a, 0)}$ denotes the partial derivative of the observation function with respect to the
measurement noise evaluated at the state estimate $\hat{x}_a$ and zero measurement noise as
defined in equation (4.5).
Proof. The transformations $T^x$ and $T^{s_i}$ can be affine or non-affine. In
either case we affinize the transformation to preserve the Gaussian structure of the measurement
noise and of the uncertainty in the error state. We begin with the state. The affinization
of the state transform is done using the method in Subsection 3.4.5 by computing the first
order Taylor series. The first order Taylor series of the state transform is
$$x_b = T^x(x_a) \approx T^x(\hat{x}_a)\,\mathrm{Exp}^{G_x}_I\!\left(\left.\frac{\partial T^x}{\partial x}\right|_{\hat{x}_a} \tilde{x}_a\right),$$

where $\left.\frac{\partial T^x}{\partial x}\right|_{\hat{x}_a}$ is calculated using equation (3.11). The transformed state estimate is $\hat{x}_b = T^x(\hat{x}_a)$, and the transformed error state is $\tilde{x}_b = \left.\frac{\partial T^x}{\partial x}\right|_{\hat{x}_a}\tilde{x}_a$. To calculate the transformed
error covariance, we calculate the covariance of $\tilde{x}_b$, which is

$$\mathrm{cov}(\tilde{x}_b) = E\!\left[\tilde{x}_b \tilde{x}_b^{\top}\right] = E\!\left[\left(\left.\frac{\partial T^x}{\partial x}\right|_{\hat{x}_a}\tilde{x}_a\right)\left(\left.\frac{\partial T^x}{\partial x}\right|_{\hat{x}_a}\tilde{x}_a\right)^{\top}\right]$$
$$= \left.\frac{\partial T^x}{\partial x}\right|_{\hat{x}_a} E\!\left[\tilde{x}_a\tilde{x}_a^{\top}\right] \left.\frac{\partial T^x}{\partial x}\right|_{\hat{x}_a}^{\top} = \left.\frac{\partial T^x}{\partial x}\right|_{\hat{x}_a} P_a \left.\frac{\partial T^x}{\partial x}\right|_{\hat{x}_a}^{\top}.$$
To transform the measurement covariance we construct a new observation function
using the transform $T^{s_i}$ and the observation function defined in equation (4.1b) to get

$$z^{s_i}_b = T^{s_i}\!\left(h(x_a, r^{s_i}_a)\right).$$

Using the process in Section 4.1 and the method in Subsection 3.4.5, we affinize the new
observation function to get

$$z^{s_i}_b \approx T^{s_i}\!\left(h(\hat{x}_a, 0)\right)\mathrm{Exp}^{G_{s_i}}_I\!\left(\left.\frac{\partial T^{s_i}}{\partial z}\right|_{h^{s_i}(\hat{x}_a, 0)} \left.\frac{\partial h}{\partial r}\right|_{(\hat{x}_a, 0)} r^{s_i}_a\right).$$

Since the values of $x_a$ and $h(\hat{x}_a, 0)$ are unknown, we approximate them by
setting $x_a$ to $\hat{x}_a$ and $h(\hat{x}_a, 0)$ to $z^{s_i}_a$ to get

$$z^{s_i}_b \approx T^{s_i}(z^{s_i}_a)\,\mathrm{Exp}^{G_{s_i}}_I\!\left(\left.\frac{\partial T^{s_i}}{\partial z}\right|_{z^{s_i}_a} \left.\frac{\partial h^{s_i}}{\partial r}\right|_{(\hat{x}_a, 0)} r^{s_i}_a\right).$$

Thus, the transformed measurement noise covariance is

$$\mathrm{cov}(r^{s_i}_b) = E\!\left[r^{s_i}_b \left(r^{s_i}_b\right)^{\top}\right] = E\!\left[\left(\left.\frac{\partial T^{s_i}}{\partial z}\right|_{z^{s_i}_a} \left.\frac{\partial h^{s_i}}{\partial r}\right|_{(\hat{x}_a, 0)} r^{s_i}_a\right)\left(\left.\frac{\partial T^{s_i}}{\partial z}\right|_{z^{s_i}_a} \left.\frac{\partial h^{s_i}}{\partial r}\right|_{(\hat{x}_a, 0)} r^{s_i}_a\right)^{\top}\right]$$
$$= \left(\left.\frac{\partial T^{s_i}}{\partial z}\right|_{z^{s_i}_a} \left.\frac{\partial h^{s_i}}{\partial r}\right|_{(\hat{x}_a, 0)}\right) R^{s_i}_a \left(\left.\frac{\partial T^{s_i}}{\partial z}\right|_{z^{s_i}_a} \left.\frac{\partial h^{s_i}}{\partial r}\right|_{(\hat{x}_a, 0)}\right)^{\top}.$$
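As a small worked example of Lemma 4.2.1 (illustrative numbers, with a linear $T^x$ given by a planar rotation so that the Jacobian is the rotation matrix itself):

```python
# Sketch of Lemma 4.2.1 for a tracking frame that rotates in the plane:
# with T^x a rotation of R^2, the transformed covariance is J P J^T.
# The rotation angle and covariance values are made-up examples.
import numpy as np

th = 0.7
J = np.array([[np.cos(th), -np.sin(th)],
              [np.sin(th),  np.cos(th)]])   # Jacobian of the (linear) transform
P_a = np.diag([4.0, 1.0])
P_b = J @ P_a @ J.T                          # transformed error covariance

# A rotation re-orients the uncertainty ellipse without changing its size:
assert np.allclose(np.linalg.eigvalsh(P_b), [1.0, 4.0])
assert np.allclose(np.trace(P_b), np.trace(P_a))
```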
4.3 Extending the Observation Function with State Transformations
In sensor mode two, the tracks' state estimates and error covariances are transformed
from the tracking frame to each sensor frame to perform data association and to update
the tracks. This requires adding the transformation into the observation function. In this
section we drop the notation of time and assume that the measurements and states are at
the same time. Let $T^{s_i}_t : G_x \to G_x$ denote the state transform from the tracking frame to
the sensor frame of sensor $s_i$, let $z^{s_i}_{s_i}$ denote a measurement from sensor $s_i$ expressed in and with
respect to its corresponding sensor frame, let $G_x \ni x_t \sim \mathcal{N}(\hat{x}_t, P_t)$ denote the state expressed in
and with respect to the tracking frame with corresponding error state $R_x \ni \tilde{x}_t \sim \mathcal{N}(0, P_t)$,
and let $h^{s_i} : G_x \times R_{s_i} \to G_{s_i}$ be the observation function defined in equation (4.1b) that maps
the target's state expressed in and with respect to the sensor frame of $s_i$ to a measurement
expressed in and with respect to the sensor frame of $s_i$. The extended observation function
is
$$z^{s_i}_{s_i} = h^{s_i}\!\left(T^{s_i}_t(x_t),\, r^{s_i}\right). \tag{4.10}$$
Using the chain rule and the method in Subsection 3.4.5, the affinized extended
observation function is

$$z^{s_i}_{s_i} \approx h^{s_i}\!\left(T^{s_i}_t(\hat{x}_t),\, 0\right)\mathrm{Exp}^{G_{s_i}}_I\!\left(H^{s_i}_t \tilde{x}_t + V^{s_i} r^{s_i}\right), \tag{4.11}$$

where

$$H^{s_i}_t = \left.\frac{\partial h^{s_i}}{\partial x}\right|_{\left(T^{s_i}_t(\hat{x}_t),\, 0\right)} \left.\frac{\partial T^{s_i}_t}{\partial x}\right|_{\hat{x}_t} \tag{4.12}$$
$$V^{s_i} = \left.\frac{\partial h^{s_i}}{\partial r}\right|_{(\hat{x}_t,\, 0)}, \tag{4.13}$$

and the Jacobians $\frac{\partial h^{s_i}}{\partial x}$ and $\frac{\partial h^{s_i}}{\partial r}$ are defined in equation (4.5).
The rest of this dissertation is presented using the non-extended observation function,
and is easily adapted to the extended observation function by substituting in the definition
of the extended observation function and its Jacobians for the non-extended observation
function and its Jacobians.
CHAPTER 5. MEASUREMENT MANAGEMENT
In this chapter we will discuss how G-MTT organizes and manages measurements
from sensors. Measurements are stored in either a cluster if they have not been associated
to a track or a track’s consensus set if they have been associated to a track.
To understand the construction of clusters, we must analyze the behavior of constant-
velocity targets being tracked. For a moment, assume that a constant-velocity target in two
dimensional space is being observed by a sensor at a fixed rate, and that the sensor observes
the target’s position. As the target moves and time elapses, the target forms a continuous
trajectory, and the target-originated measurements should be close to the trajectory clustered
together and approximately equidistantly spread along the trajectory. This idea is illustrated
for three targets in Fig. 5.1. The targets’ trajectories are illustrated as the three arrowed
lines. The black dots represent non-target originated measurements. The red dots represent
measurements originated from the first target. The green dots represent measurements
originated from the second target. The orange dots represent measurements originated from
the third target. The fading of the measurements' color denotes time, where darker colors
indicate more recent measurements.
As illustrated in Fig. 5.1, measurements that are target originated are clustered to-
gether. Measurements from the same time and target are very close together. Measurements
from the same target at different times have a fairly fixed spacing between them,
related to the target's velocity and the time interval between measurements. We leverage
the target-originated measurements’ relation in space and time by grouping them together
into a cluster.
A cluster is a grouping of neighboring measurements in space and time. An illustra-
tion of possible clusters pertaining to the scenario depicted in Fig. 5.1 is shown in Fig. 5.2.
This figure shows three different clusters. The first cluster C1 contains the measurements
Figure 5.1: A depiction of the trajectories of three targets being observed in two dimensional space. The targets' trajectories are illustrated as the three pointed lines. The black dots represent non-target originated measurements. The red dots represent measurements originated from the first target. The green dots represent measurements originated from the second target. The orange dots represent measurements originated from the third target. The fading of the measurements' color denotes time, where darker colors indicate more recent measurements.
from the first target (red dots) and a few false measurements (black dots). The second cluster
C2 contains the measurements from the second target (green dots) and a few false measure-
ments. The third cluster C3 contains the measurements from the third target (orange dots)
and a false measurement. We want to design the clusters so that ideally a cluster contains
all of the target-originated measurements from a single target and no false measurements.
We do this by defining a cluster using three metrics: the cluster distance metric, the
cluster time metric, and the cluster velocity metric.
Let $z^{s_i}_{k,l} \in G_{s_i}$ denote the $l$th measurement from sensor $s_i$ at time $t_k$, let $z^{s_j}_{k,c} \in G_{s_j}$
denote the $c$th measurement from sensor $s_j$ at time $t_k$, and let $\phi^s_{s_i} : G_{s_i} \to G_s$ and $\phi^s_{s_j} : G_{s_j} \to G_s$
denote injective maps from the measurement spaces $G_{s_i}$ and $G_{s_j}$ to a common
measurement space $G_s$. The cluster distance metric between measurements from sensors $s_i$
and $s_j$ is denoted $d^{s_i,s_j}_{CD} : G_{s_i} \times G_{s_j} \to \mathbb{R}$ and is defined as

$$d^{s_i,s_j}_{CD}\!\left(z^{s_i}_{k,l},\, z^{s_j}_{k,c}\right) = \left\| \mathrm{Log}^{G_s}_I\!\left(\phi^s_{s_i}\!\left(z^{s_i}_{k,l}\right)^{-1} \phi^s_{s_j}\!\left(z^{s_j}_{k,c}\right)\right)\right\|, \tag{5.1}$$

which maps each measurement from its respective measurement space to the common
measurement space. After the measurements are mapped to the common measurement space, the tangent
Figure 5.2: An illustration of possible clusters pertaining to the scenario depicted in Fig. 5.1.
vector associated with the geodesic that begins at $\phi^s_{s_i}\!\left(z^{s_i}_{k,l}\right)$ and ends at $\phi^s_{s_j}\!\left(z^{s_j}_{k,c}\right)$ is computed
using the $\mathrm{Log}^{G_s}_I$ map. The norm of this tangent vector is considered the distance between
the two points, which is consistent with measuring length on a Riemannian manifold.
The map $\phi^s_{s_i}$ can be the identity map, a projection map, etc. As an example, suppose
that sensor $s_i$ measures the position and velocity of the target and that sensor $s_j$ measures
only the target's position. The map $\phi^{s_j}_{s_i}$ would project the measurement $z^{s_i}_{k,l}$ onto $G_{s_j}$ by
extracting the position. Now that both measurements are in a common measurement space,
we can calculate the distance between them. If the metric $d^{s_i,s_j}_{CD}\!\left(z^{s_i}_{k,l},\, z^{s_j}_{k,c}\right)$ exists, we say that
the measurement spaces of $s_i$ and $s_j$ are compatible. G-MTT assumes that all the measurement
spaces are compatible to ensure that target-originated measurements from different
sensors can be compared.
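The projection example can be sketched for vector-valued measurement spaces, where the group operation is addition and the Log map is the identity; the helper names and dimensions below are illustrative assumptions:

```python
# Sketch of the cluster distance metric (5.1) for two vector-valued
# sensors: s_i measures position and velocity, s_j measures position
# only. The projection map phi drops the velocity components so both
# measurements live in a common space. Names are illustrative.
import numpy as np

def phi_pos(z):
    """Project a (position, velocity) measurement onto position."""
    return np.asarray(z, dtype=float)[:2]

def cluster_distance(z_i, z_j):
    """d_CD: norm of the difference in the common measurement space.
    For a vector group the Log map is ordinary subtraction."""
    return np.linalg.norm(phi_pos(z_i) - phi_pos(z_j))

z_i = [1.0, 2.0, 0.5, -0.5]   # position + velocity from sensor s_i
z_j = [1.0, 5.0]              # position only from sensor s_j
assert cluster_distance(z_i, z_j) == 3.0
```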
Let $z^{s_i}_{a,l}$ denote the $l$th measurement from sensor $s_i$ at time $t_a$, and let $z^{s_j}_{b,c}$ denote the
$c$th measurement from sensor $s_j$ at time $t_b$. We define the cluster time metric as

$$d_{CT}\!\left(z^{s_i}_{a,l},\, z^{s_j}_{b,c}\right) = \left\| t_{a:b} \right\| = \left\| t_a - t_b \right\|, \tag{5.2}$$

which is the norm of the time interval between the measurements. We define the cluster velocity
metric as

$$d^{s_i,s_j}_{CV}\!\left(z^{s_i}_{a,l},\, z^{s_j}_{b,c}\right) = \frac{d^{s_i,s_j}_{CD}\!\left(z^{s_i}_{a,l},\, z^{s_j}_{b,c}\right)}{d_{CT}\!\left(z^{s_i}_{a,l},\, z^{s_j}_{b,c}\right)}, \tag{5.3}$$
where we calculate the distance between the two measurements and divide by the time
interval between them to estimate the magnitude of the target’s velocity. The intuition
behind this metric is that a constant velocity target should produce measurements consistent
with the target’s velocity.
A non-track-associated measurement $z^{s_i}_{k,l}$ is associated with cluster $C_m$ if there exists
a measurement $z^{s_j}_{k,c} \in C_m$ from the same time as $z^{s_i}_{k,l}$ such that $d^{s_i,s_j}_{CD}\!\left(z^{s_i}_{k,l},\, z^{s_j}_{k,c}\right) \leq \tau_{CD}$, or
if there exists a measurement $z^{s_j}_{b,c} \in C_m$ from a different time such that $d_{CT}\!\left(z^{s_i}_{k,l},\, z^{s_j}_{b,c}\right) \leq \tau_{CT}$ and
$d^{s_i,s_j}_{CV}\!\left(z^{s_i}_{k,l},\, z^{s_j}_{b,c}\right) \leq \tau_{CV}$. The constraint on the time interval is necessary to ensure that
measurements far apart in distance and time are not associated with each other; otherwise,
all the measurements would end up in one big cluster.
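The association rule above can be sketched as a simple predicate; the Euclidean distance and the specific threshold values are illustrative assumptions, not the dissertation's:

```python
# Sketch of the cluster-association rule: a measurement joins cluster
# C_m if it is close in distance to a same-time member, or close in
# time and consistent in implied speed with a member at another time.
# Thresholds and the Euclidean distance are illustrative assumptions.
import numpy as np

TAU_CD, TAU_CT, TAU_CV = 1.0, 2.0, 5.0   # distance, time, and velocity gates

def associates(z, t, cluster):
    """cluster: list of (measurement, timestamp) pairs."""
    for z_c, t_c in cluster:
        d = np.linalg.norm(np.asarray(z, float) - np.asarray(z_c, float))
        if t == t_c and d <= TAU_CD:
            return True                   # same-time distance gate
        dt = abs(t - t_c)
        if 0 < dt <= TAU_CT and d / dt <= TAU_CV:
            return True                   # time gate plus velocity gate
    return False

C = [([0.0, 0.0], 0.0)]                   # a cluster with one measurement
assert associates([0.5, 0.0], 0.0, C)     # close at the same time
assert associates([4.0, 0.0], 1.0, C)     # implied speed within the gate
assert not associates([40.0, 0.0], 1.0, C)  # implied speed far too large
```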
To keep the memory storage requirements feasible, at the beginning of the data
management phase discussed in Section 2.2, G-MTT removes measurements from clusters
and consensus sets that have a time stamp before the sliding time window $T_W$. In addition,
during the track initialization phase, measurements used to initialize a new track are removed
from the cluster. After this phase G-MTT checks all of the clusters to see if there exists
a measurement in each cluster with a time stamp within the threshold $\tau_{CT}$ of the current
time. If such a measurement is not found within the cluster, the entire cluster is removed.
It is removed because no new measurement will be associated to the cluster and it does not
contain measurements that G-MTT can use to initialize new tracks. If it did, G-MTT would
have already initialized a track from the cluster.
CHAPTER 6. LIE GROUP INTEGRATED PROBABILISTIC DATA ASSOCIATION FILTER
When tracking a single target whose initial position is unknown using a sensor, the
sensor produces non-target-originated measurements (i.e. false measurements or clutter). In
the presence of dense clutter, it is a challenge to locate and track the target since it is difficult
to distinguish between target-originated measurements and false measurements. The typical
approach to tracking the target is to use new measurements to either improve the estimate
of an existing track (a track is a representation of the target which consists of at least the
state estimate) or initialize new tracks. If the clutter density is high, numerous tracks that
do not represent the target are initialized from clutter.
An example scenario illustrating the challenge of tracking in clutter is depicted in
Fig. 6.1 where the black dots represent measurements, the green car represents the true tar-
get, and the blue cars represent the tracks currently in memory. The left image represents a
time step when measurements are received for the first time. The right image represents a
subsequent time step when previous measurements are used to initialize new tracks and addi-
tional measurements are received. A challenge to solve is identifying which track represents
the target.
Identifying which track best represents the target requires an additional statistical
measure called the track likelihood. Tracks with a low track likelihood are rejected and pruned
from memory, and tracks with a high track likelihood are confirmed and are candidates to
represent the target. The confirmed track with the highest track likelihood is used as the
best estimate of the target.
Different approaches to calculating the track likelihood depend on the data association
algorithm. Data association is the process of assigning new measurements to existing tracks
or a cluster so that the track-associated measurement can be used to improve the estimates
of the tracks. There are two types of data association: hard data association and soft data
Figure 6.1: A depiction of the challenge of identifying which track represents the target. The black dots represent measurements, the green car represents the target, and the blue cars represent tracks. The left image represents the first time step when measurements are received for the first time. The right image represents a subsequent time step when previous measurements are used to initialize new tracks and additional measurements are received.
association. Hard data association assigns at most one new measurement to each track, and
soft data association can assign multiple new measurements to each track.
Tracking algorithms that use hard data association, such as the Nearest Neighbor
filter (NNF) [42], the Global Nearest Neighbor filter (GNNF) [43], [44] and track splitting
algorithms, commonly use the likelihood function or the negative log likelihood function
(NLLF) to determine if a track should be rejected [74]. The likelihood function measures the
joint probability density of all of the measurements associated with a track. If the track’s
likelihood function falls below some threshold, the track is rejected. This approach does
not indicate which track best represents the target, only which tracks should be removed;
however, it is common to assume that the track with the highest likelihood best represents
the target.
Another approach to quantify the track likelihood when using hard data association
is based on the sequential probability ratio test (SPRT) [85]. The SPRT uses a sequence
of data to either confirm a null hypothesis or reject the alternate hypothesis by analyzing
the probability ratio of the two hypotheses. In terms of tracking, the SPRT calculates the
joint probability density of the measurements associated to a track for the hypothesis that
all the measurements are target originated and for the hypothesis that all the measurements
are false. It then takes the ratio of the two probability densities and rejects the track if the
ratio is below a threshold, confirms the track if the ratio is above a threshold, or continues to
gather more information as new measurements are received until the track can be confirmed
or rejected [86]. The SPRT is used in a variety of tracking algorithms including the Multiple
Hypothesis Tracker (MHT) [48].
The two aforementioned methods of quantifying the track likelihood do not work
for soft data association algorithms, and therefore do not work with the probabilistic
data association filter (PDAF) [45], [46]. For this reason, the PDAF was extended in [47]
to calculate the track likelihood using a novel approach called the Integrated Probabilistic
Data Association Filter (IPDAF).
A nice feature of the IPDAF is that it can be used with many different types of system
models including system models defined on Lie groups. To our knowledge, the IPDAF has not
been adapted to Lie groups; however, the joint integrated probabilistic data association filter
(JIPDAF) was adapted to the Lie group SE (2) in [37]. The JIPDAF is the adaptation of the
IPDAF to tracking multiple targets. When tracking only a single target, the JIPDAF reduces
to the IPDAF. However, the reduction would require understanding the more complicated
JIPDAF instead of the simpler IPDAF. Since the main focus of [37] was not the adaptation
of the JIPDAF to general Lie groups, the algorithm was not derived and explained in detail.
The purpose of this chapter is to present the IPDAF adapted to geodesically complete,
unimodular Lie groups. We refer to the resulting algorithm as the Lie group integrated
probabilistic data association filter (LG-IPDAF).
The rest of this chapter is outlined as follows. In Section 6.1 we provide a brief
overview of the LG-IPDAF algorithm. In Sections 6.2 and 6.4 we present the prediction
and update steps of the LG-IPDAF. In Section 6.3 we present the track data association
algorithm. In Section 6.5 we present experimental results, and finally conclude this chapter
in Section 6.6.
6.1 Overview
The LG-IPDAF is used by G-MTT in the data management phase to associate new
measurements to tracks, propagate tracks, and update tracks with associated measurements
as discussed in Section 2.2. The LG-IPDAF is designed to track a single target using a
single sensor in the presence of clutter. However, it can be easily adapted to work with
multiple targets and multiple sensors. We will discuss this adaptation at different points in
the chapter, and proceed under the assumption that we are tracking a single target using a
single sensor unless otherwise mentioned.
At this time we introduce and review notation. Let $z_{k,j}$ denote the $j$th track-associated
measurement at time $t_k$, and let $m_k$ denote the number of track-associated measurements at
time $t_k$. The set of measurements associated to the track at time $t_k$ is denoted $Z_k = \{z_{k,j}\}_{j=1}^{m_k}$.
The cumulative set of track-associated measurements from the initial time to time
$t_k$ is denoted $Z_{0:k} = \{Z_\ell\}_{\ell=0}^{k}$. Note that we have dropped the superscript identifying the
corresponding sensor since we assume a single sensor in this chapter. We denote the Lie
group and corresponding Cartesian algebraic space of the measurements as $G_s$ and $R_s$.
Let $\epsilon$ denote a Bernoulli random variable indicating whether the track represents
the target. We refer to the probability that the track represents a target
conditioned on $Z_{0:k}$ as the track likelihood and denote it as $p(\epsilon_k \mid Z_{0:k})$. Conversely, we
denote the probability that the track does not represent a target conditioned on $Z_{0:k}$ as
$p(\epsilon_k = F \mid Z_{0:k}) = 1 - p(\epsilon_k \mid Z_{0:k})$.
The LG-IPDAF propagates and updates the tracks as time progresses and new mea-
surements are received. This means that the state estimates, error covariances, and track
likelihoods change as they are propagated and updated. Let $t_{k^-}$ denote the time previous to
$t_k$. The state estimate and error covariance at time $t_{k^-}$, conditioned on the track representing a
target and the measurements $Z_{0:k^-}$, are denoted $\hat{x}_{k^-|k^-}$ and $P_{k^-|k^-}$, and the track likelihood
at time $t_{k^-}$ conditioned on the measurements $Z_{0:k^-}$ is denoted $\epsilon_{k^-|k^-}$. The first subscript
indicates the time, and the second subscript indicates the measurements the random variable
is conditioned on.
We denote and define the probability of the track's state conditioned on the previous
track-associated measurements and on the track representing a target as

$$p(x_k \mid \epsilon_k, Z_{0:k^-}) = \eta \exp\left(-\frac{1}{2}\,\tilde{x}_{k|k^-}^{\top} P^{-1}_{k|k^-} \tilde{x}_{k|k^-}\right), \tag{6.1}$$

where $\tilde{x}_{k|k^-} = \mathrm{Log}^{G_x}_I\!\left(\hat{x}^{-1}_{k|k^-} x_k\right)$ is the error state, $\hat{x}_{k|k^-}$ is the state estimate, and $P_{k|k^-}$ is the
error covariance at time $t_k$ conditioned on the measurements $Z_{0:k^-}$.
When new measurements are received, the LG-IPDAF algorithm performs three steps:
the prediction step, the data association step, and the update step. We describe these steps
for a single initialized track. During the prediction step, the track's state estimate, error
covariance and track likelihood are propagated forward in time using the process described
in Section 6.2. A track's propagated state estimate, error covariance and track likelihood are
denoted $\hat{x}_{k|k^-}$, $P_{k|k^-}$, and $\epsilon_{k|k^-}$ respectively. During the data association step, new measurements
are either associated to a track or given to a cluster; the track-associated measurements
are denoted $\{z_{k,j}\}_{j=1}^{m_k}$. The data association step is discussed
in detail in Section 6.3. During the update step, the track's state estimate $\hat{x}_{k|k^-}$ is copied
for every associated measurement and updated using a distinct associated measurement to
produce the split state estimates $\{\hat{x}_{k|k,j}\}_{j=0}^{m_k}$, where $\hat{x}_{k|k,j}$ is the state estimate after being
updated with measurement $z_{k,j}$, and $\hat{x}_{k|k,j=0} = \hat{x}_{k|k^-}$ corresponds to the null hypothesis that none of the
measurements originated from the target. The split state estimates $\{\hat{x}_{k|k,j}\}_{j=0}^{m_k}$ are then fused
together according to their probability of being the correct state estimate. This way, the
LG-IPDAF is never totally correct but also never completely wrong, as can happen with hard data
association algorithms. The update step is described in detail in Section 6.4. A depiction of
a single iteration of the LG-IPDAF is shown in Fig. 6.2.
The LG-IPDAF makes the following assumptions:
6.1. There exists a single target that can be observed by a single sensor and modeled by a
constant-velocity, white-noise driven target model defined in equation (4.1).
6.2. A sensor scan occurs whenever the sensor observes the measurement space. At every sensor scan there are $m_k$ validated measurements denoted $\{z_{k,j}\}_{j=1}^{m_k} = Z_k$.
6.3. At every scan there is at most one target-originated measurement and all others are false (i.e., non-target-originated measurements).
6.4. The sensor detects the target with probability PD ∈ [0, 1].
6.5. The target-originated measurement falls within the track's validation region with probability PG provided that the track represents the target. The probability PG is discussed in Section 6.3.
Figure 6.2: A depiction of a single iteration of the LG-IPDAF. The large green car represents the target, the smaller blue cars represent the tracks, the black dots represent measurements, the gray ellipse represents the validation region, and the small red cars represent the split state estimates. The top left image depicts the prediction and data association steps, during which new measurements are received, the track is propagated in time, the validation region is constructed, and four measurements are associated to the track. The top right image shows the first part of the update step, where the state estimate is split into five states: one for each associated measurement and one for the null hypothesis that none of the measurements are target originated. The bottom left shows the rest of the update step, where the split state estimates are fused together to form a single state estimate. The bottom right image shows the last step, which initializes new tracks from non-track-associated measurements.
6.6. The false measurements are independently and identically distributed (iid) with uniform spatial density λ.
6.7. The expected number of false measurements per sensor scan is modeled using the density function µF . In this chapter, µF denotes a Poisson distribution defined as
\[
\mu_F(\phi) = \exp\left(-\lambda V_k\right) \frac{\left(\lambda V_k\right)^{\phi}}{\phi!}, \tag{6.2}
\]
where $V_k$ is the volume of the validation region defined in Section 6.3, $\lambda V_k$ is the expected number of false measurements, and $\phi$ is the number of false measurements.
6.8. The past information about a track is summarized as $\left(\hat{x}_{k^-|k^-}, P_{k^-|k^-}, \epsilon_{k^-|k^-}\right)$, where $\hat{x}_{k^-|k^-}$, $P_{k^-|k^-}$, and $\epsilon_{k^-|k^-}$ denote the track's state estimate, error covariance, and track likelihood at the previous time conditioned on the previous track-associated measurements.
6.2 Prediction Step
The prediction step of the LG-IPDAF is similar to the prediction step of the indirect Kalman filter with the addition of propagating the track likelihood [79], [84].
To derive the prediction step, we need to construct the Gaussian approximation of the probability of the track's current state conditioned on the track's previous state, the track representing the target, and the previous measurements, denoted p (xk | xk− , ϵk− , Z0:k−).
Lemma 6.2.1. We suppose directly Assumptions 6.1. and 6.8. and that the target's state evolves according to the state transition function defined in equation (4.2); then the Gaussian approximation of p (xk | xk− , ϵk− , Z0:k−) is
\[
p(x_k \mid x_{k^-}, \epsilon_{k^-}, Z_{0:k^-}) \approx \eta \exp\left(-\frac{1}{2}\left(\tilde{x}_{k|k^-} - F_{k^-:k}\tilde{x}_{k^-|k^-}\right)^\top Q_{k^-:k}^{-1}\left(\tilde{x}_{k|k^-} - F_{k^-:k}\tilde{x}_{k^-|k^-}\right)\right), \tag{6.3}
\]
where
\begin{align}
\tilde{x}_{k|k^-} &= \text{Log}^{G_x}_I\left(\hat{x}_{k|k^-}^{-1} x_k\right), \tag{6.4}\\
\hat{x}_{k|k^-} &= f\left(\hat{x}_{k^-|k^-}, 0, t_{k^-:k}\right), \tag{6.5}\\
Q_{k^-:k} &= G_{k^-:k}\, Q\left(t_{k^-:k}\right) G_{k^-:k}^\top, \tag{6.6}
\end{align}
$f$ is the state transition function defined in equation (4.2), and the Jacobians $F_{k^-:k}$ and $G_{k^-:k}$ are defined in equation (4.5) and evaluated at the point $\zeta^f_{k^-:k} = \left(\hat{x}_{k^-|k^-}, 0, t_{k^-:k}\right)$.
Proof. The probability p (xk | xk− , ϵk− , Z0:k−) is approximated as Gaussian by using the affinized system defined in (4.6) and computing the first and second moments of the propagated error state. From equation (4.6) the affinized state transition function is
\[
x_{k|k^-} \approx \underbrace{f\left(\hat{x}_{k^-|k^-}, 0, t_{k^-:k}\right)}_{\hat{x}_{k|k^-}} \text{Exp}^{G_x}_I\Big(\underbrace{F_{k^-:k}\tilde{x}_{k^-|k^-} + G_{k^-:k}\, q_{k^-:k}}_{\tilde{x}_{k|k^-}}\Big), \tag{6.7}
\]
where $x_{k|k^-}$, $\hat{x}_{k|k^-} = f\left(\hat{x}_{k^-|k^-}, 0, t_{k^-:k}\right)$, and $\tilde{x}_{k|k^-} = F_{k^-:k}\tilde{x}_{k^-|k^-} + G_{k^-:k}\, q_{k^-:k}$ are respectively the predicted state, predicted state estimate, and predicted error state, all conditioned on the previous measurements and the track's previous state.

Since the probability p (xk | xk− , ϵk− , Z0:k−) is conditioned on a specific value of the previous state, the previous error state $\tilde{x}_{k^-|k^-}$ is not a random variable; thus, the first and second moments of the propagated error state are
\begin{align*}
\mathbb{E}\left[\tilde{x}_{k|k^-}\right] &= \mathbb{E}\left[F_{k^-:k}\tilde{x}_{k^-|k^-} + G_{k^-:k}\, q_{k^-:k}\right] = F_{k^-:k}\tilde{x}_{k^-|k^-},\\
\text{cov}\left[\tilde{x}_{k|k^-}\right] &= G_{k^-:k}\, Q\left(t_{k^-:k}\right) G_{k^-:k}^\top = Q_{k^-:k},
\end{align*}
and the approximate Gaussian PDF is given in equation (6.3).
Lemma 6.2.2. We suppose the following: 1) the probability of the track's previous state conditioned on the track representing the target and previous measurements is known, as stated by Assumption 6.8.. We denote and define this probability as
\[
p(x_{k^-} \mid \epsilon_{k^-}, Z_{0:k^-}) = \eta \exp\left(-\frac{1}{2}\,\tilde{x}_{k^-|k^-}^\top P_{k^-|k^-}^{-1}\,\tilde{x}_{k^-|k^-}\right), \tag{6.8}
\]
where $\tilde{x}_{k^-|k^-} = \text{Log}^{G_x}_I\left(\hat{x}_{k^-|k^-}^{-1} x_{k^-}\right)$. 2) The target's state evolves according to the state transition function defined in equation (4.2), as stated by Assumption 6.1.. 3) The probability p (xk | xk− , ϵk− , Z0:k−) is known and defined in Lemma 6.2.1. Then the propagation of the track's state estimate and error covariance is
\begin{align}
\hat{x}_{k|k^-} &= f\left(\hat{x}_{k^-|k^-}, 0, t_{k^-:k}\right), \tag{6.9a}\\
P_{k|k^-} &= F_{k^-:k} P_{k^-|k^-} F_{k^-:k}^\top + G_{k^-:k}\, Q\left(t_{k^-:k}\right) G_{k^-:k}^\top, \tag{6.9b}
\end{align}
and the probability of the track's current state conditioned on the track representing the target and the previous measurements is
\[
p(x_k \mid \epsilon_{k^-}, Z_{0:k^-}) = \eta \exp\left(-\frac{1}{2}\,\tilde{x}_{k|k^-}^\top P_{k|k^-}^{-1}\,\tilde{x}_{k|k^-}\right), \tag{6.10}
\]
where $\tilde{x}_{k|k^-} = \text{Log}^{G_x}_I\left(\hat{x}_{k|k^-}^{-1} x_k\right)$; $\tilde{x}_{k^-|k^-}$, $\hat{x}_{k^-|k^-}$, and $P_{k^-|k^-}$ denote the error state, state estimate, and error covariance at time $t_{k^-}$ conditioned on the measurements $Z_{0:k^-}$; and the Jacobians $F_{k^-:k}$ and $G_{k^-:k}$ are defined in (4.5) and evaluated at the point $\zeta^f_{k^-:k} = \left(\hat{x}_{k^-|k^-}, 0, t_{k^-:k}\right)$.
The proof of Lemma 6.2.2 is given in Appendix A.1.

The track likelihood is modeled using a Markov process as described in [47], and its propagation is
\[
p(\epsilon_k \mid Z_{0:k^-}) = \sigma\left(t_{k^-:k}\right) p(\epsilon_{k^-} \mid Z_{0:k^-}),
\]
where the probability $\sigma\left(t_{k^-:k}\right) \in [0, 1]$ is chosen to represent how the track likelihood can change in time due to the target being occluded, leaving the sensor surveillance region, etc.
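To make the prediction step concrete, the sketch below propagates a track's error covariance (equation (6.9b)) and track likelihood in the Cartesian algebraic space. The function name and the toy constant-velocity matrices are hypothetical illustrations, not the dissertation's implementation; on the group, the state estimate itself would be propagated as $\hat{x}_{k|k^-} = f\left(\hat{x}_{k^-|k^-}, 0, t_{k^-:k}\right)$.

```python
import numpy as np

def predict_track(P, eps, F, G, Q, sigma):
    """Propagate the error covariance (6.9b) and the track likelihood one step.

    P     -- error covariance P_{k-|k-}
    eps   -- track likelihood p(eps | Z_{0:k-})
    F, G  -- Jacobians of the state transition function (assumed given)
    Q     -- process noise covariance over the interval
    sigma -- Markov transition probability in [0, 1]
    """
    P_pred = F @ P @ F.T + G @ Q @ G.T   # equation (6.9b)
    eps_pred = sigma * eps               # Markov track-likelihood propagation
    return P_pred, eps_pred

# Toy 2-state constant-velocity example with a unit time step.
F = np.array([[1.0, 1.0], [0.0, 1.0]])
G = np.eye(2)
Q = 0.1 * np.eye(2)
P, eps = predict_track(np.eye(2), 0.5, F, G, Q, sigma=0.95)
```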
6.3 Data Association
In this section we derive the data association algorithm that associates new measurements to tracks. Let ψ denote a Bernoulli random variable indicating that the measurement originated from the target. The estimated measurement of a track with state estimate $\hat{x}_{k|k^-}$ is
\[
\hat{z}_k \triangleq h\left(\hat{x}_{k|k^-}, 0\right), \tag{6.11}
\]
where h is the generic observation function defined in equation (4.1).
The validation region is a volume in measurement space centered around the track’s
estimated measurement. The volume is selected such that a target-originated measurement has probability PG of falling within the track's validation region, provided that the track represents the target. A measurement that falls within the validation region of a track is called a
validated measurement and is associated to the track; otherwise, the measurement is given
to a cluster as discussed in Chapter 5. Computation of the validation region is complicated
by the fact that the measurements and the target’s state are elements of Lie groups and do
not have a vector space structure that the traditional validation region is defined on.
Lemma 6.3.1. Let zk be an arbitrary measurement and let ψ be a Bernoulli random variable that indicates that the measurement is target-originated. We suppose directly that a target-originated measurement is related to the target's state according to the observation function defined in equation (4.1b); then the Gaussian approximation of the probability of the measurement zk, provided that it is target-originated and conditioned on the track's state, the track representing the target, and the previous measurements, is denoted and defined as
\[
p(z_k \mid \psi, x_k, \epsilon_{k^-}, Z_{0:k^-}) \approx \eta \exp\left(-\frac{1}{2}\left(\tilde{z}_k - H_k \tilde{x}_{k|k^-}\right)^\top R_k^{-1}\left(\tilde{z}_k - H_k \tilde{x}_{k|k^-}\right)\right), \tag{6.12}
\]
where
\begin{align}
\tilde{z}_k &= \text{Log}^{G_s}_I\left(\hat{z}_k^{-1} z_k\right), \tag{6.13}\\
\hat{z}_k &= h\left(\hat{x}_{k|k^-}, 0\right), \tag{6.14}\\
R_k &= V_k R V_k^\top, \tag{6.15}
\end{align}
$x_k = \hat{x}_{k|k^-}\text{Exp}^{G_x}_I\left(\tilde{x}_{k|k^-}\right)$, $h$ is the observation function defined in equation (4.1b), $R$ is the measurement noise covariance, and the Jacobians $H_k$ and $V_k$ are defined in equation (4.5) and evaluated at the point $\zeta^h_k = \left(\hat{x}_{k|k^-}, 0\right)$.
The proof of Lemma 6.3.1 is similar to the provided proof of Lemma 6.2.1 and will
not be given.
Lemma 6.3.2. We suppose directly that the probabilities p (zk | ψ, xk, ϵk− , Z0:k−) defined in equation (6.12) and p (xk | ϵk− , Z0:k−) defined in equation (6.10) are known, and that a target-originated measurement zk is related to the target's state by the observation function defined in equation (4.1); then the Gaussian approximation of the probability of the measurement zk conditioned on the previous measurements and it being target-originated is
\[
p(z_k \mid \psi, \epsilon_{k^-}, Z_{0:k^-}) \approx \eta \exp\left(-\frac{1}{2}\,\nu_k^\top S_k^{-1}\,\nu_k\right), \tag{6.16}
\]
where
\begin{align}
\hat{z}_k &= h\left(\hat{x}_{k|k^-}, 0\right), \tag{6.17a}\\
\nu_k &= \text{Log}^{G_s}_I\left(\hat{z}_k^{-1} z_k\right), \tag{6.17b}\\
S_k &= V_k R V_k^\top + H_k P_{k|k^-} H_k^\top, \tag{6.17c}
\end{align}
$\hat{z}_k$ is the estimated measurement, $\hat{x}_{k|k^-}$ is the state estimate conditioned on the previous measurements, $R$ is the measurement noise covariance, $P_{k|k^-}$ is the track's error covariance conditioned on the previous measurements, $S_k$ is the innovation covariance, and the Jacobians $H_k$ and $V_k$ are defined in (4.5) and evaluated at the point $\zeta^h_k = \left(\hat{x}_{k|k^-}, 0\right)$.
The proof of Lemma 6.3.2 is similar to the proof of Lemma 6.2.2 found in Appendix A.1 and will not be provided. There are equations similar to equation (6.17b) throughout this dissertation. This equation finds the tangent vector corresponding to the geodesic that starts at $\hat{z}_k$ and ends at $z_k$, where $\hat{z}_k$ is the estimated measurement. We find the tangent vector in order to calculate the probability of the measurement $z_k$ since Gaussian distributions are defined on the tangent space, and p (zk | ψ, ϵk− , Z0:k−) = p (νk | ψ, ϵk− , Z0:k−) as discussed in Subsection 3.4.6.
Using equations (6.16) and (6.17), we define the metric $d_V : G_s \times G_s \to \mathbb{R}$ as
\[
d_V(z_k, \hat{z}_k) = \nu_k^\top S_k^{-1} \nu_k, \tag{6.18}
\]
where the innovation covariance is used to normalize the metric; thus, the metric $d_V$ is the sum of $m$ squared Gaussian random variables, where $m$ is the dimension of the measurement space, and the values of the metric are distributed according to a chi-square distribution with $m$ degrees of freedom.

The validation region is defined as the set
\[
\text{val}\left(\hat{z}_k, \tau_G\right) = \left\{ z \in G_s \mid d_V(z, \hat{z}_k) \leq \tau_G \right\},
\]
where the parameter $\tau_G$ is called the gate threshold.
Following standard procedure with the Kalman filter, the volume of the validation region is defined as
\[
V_k = c_m \left| \tau_G S_k \right|^{1/2}, \tag{6.19}
\]
where $c_m$ is the volume of the unit hypersphere of dimension $m$, calculated as
\[
c_m = \frac{\pi^{m/2}}{\Gamma\left(m/2 + 1\right)},
\]
with Γ denoting the gamma function [74]. It is worth noting that the volume of the validation region depends on the error covariance through the innovation covariance $S_k$. Therefore, the validation region contains information on the quality of the state estimate. This concept will be used in Section 6.4.
The gate probability is
\[
P_G = \int_{V_k} p(z_k \mid \psi, x_k, \epsilon_k, Z_{0:k^-})\, dz_k, \tag{6.20}
\]
and is the value of the chi-square cumulative distribution function (CDF) with parameter $\tau_G$ [74].
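The gating computations above can be sketched as follows, assuming the innovation ν and innovation covariance S have already been computed; the function names are our own, not the dissertation's implementation.

```python
import math
import numpy as np

def unit_ball_volume(m):
    """Volume c_m of the unit hypersphere of dimension m."""
    return math.pi ** (m / 2) / math.gamma(m / 2 + 1)

def in_validation_region(nu, S, tau_G):
    """Gate test d_V = nu^T S^{-1} nu <= tau_G  (equation (6.18))."""
    return float(nu @ np.linalg.solve(S, nu)) <= tau_G

def validation_volume(S, tau_G):
    """V_k = c_m |tau_G S|^{1/2}  (equation (6.19))."""
    m = S.shape[0]
    return unit_ball_volume(m) * math.sqrt(np.linalg.det(tau_G * S))

S = np.eye(2)
tau_G = 9.21  # chi-square threshold giving PG ~ 0.99 for m = 2 degrees of freedom
gated_in = in_validation_region(np.array([1.0, 2.0]), S, tau_G)  # small innovation
```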
Let θk denote a Bernoulli random variable indicating that the measurement is target originated and inside the validation region. Using (6.16) and (6.20), the probability of a measurement, conditioned on the measurement being inside the validation region and target-originated, on the track representing the target, and on the previous measurements, is
\[
p(z_k \mid \theta_k, \epsilon_k, Z_{0:k^-}) = P_G^{-1}\, p(z_k \mid \psi, \epsilon_k, Z_{0:k^-}). \tag{6.21}
\]
The data association step of the LG-IPDAF can be extended to multiple targets and tracks by assigning a measurement to every track whose validation region the measurement falls in, without taking joint associations into account. This is done by copying the measurement for every track it is associated with, giving each such track its own copy of the measurement, and treating the track-associated measurements of one track independently of the track-associated measurements of another track.
6.4 Update Step
The LG-IPDAF assumes that at most one measurement originates from the target every sensor scan; thus, given the set of new validated measurements, $Z_k = \{z_{k,j}\}_{j=1}^{m_k}$, either one of the measurements is target originated or none of them are. This leads to different possibilities. As an example, all of the measurements could be false, the measurement $z_{k,1}$ could be the target-originated measurement and all others false, $z_{k,2}$ could be the target-originated measurement and all others false, etc. These different possibilities are referred to as association events, denoted $\theta_{k,j}$, where the subscript $j > 0$ means that the $j$th validated measurement is target originated and all others are false, and where $j = 0$ means that all of the validated measurements are false. Hence, there are a total of $m_k + 1$ association events.
Lemma 6.4.1. We suppose directly Assumptions 6.1.-6.8. and that $Z_k = \{z_{k,j}\}_{j=1}^{m_k}$ is the set of validated measurements at time $t_k$; then the probability of an association event $\theta_{k,j}$ conditioned on the measurements $Z_{0:k}$ and the track representing a target, denoted $\beta_{k,j} \triangleq p(\theta_{k,j} \mid \epsilon_k, Z_{0:k})$, is
\[
\beta_{k,j} =
\begin{cases}
\dfrac{\mathcal{L}_{k,j}}{1 - P_D P_G + \sum_{i=1}^{m_k} \mathcal{L}_{k,i}} & j = 1, \ldots, m_k \\[2ex]
\dfrac{1 - P_D P_G}{1 - P_D P_G + \sum_{i=1}^{m_k} \mathcal{L}_{k,i}} & j = 0,
\end{cases} \tag{6.22}
\]
where
\[
\mathcal{L}_{k,j} = \frac{P_D}{\lambda}\, p(z_{k,j} \mid \psi, \epsilon_k, Z_{0:k^-}), \tag{6.23}
\]
and $p(z_{k,j} \mid \psi, \epsilon_k, Z_{0:k^-})$ is defined in equations (6.16) and (6.17).
The proof of Lemma 6.4.1 is in Appendix A.2.
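The association probabilities of equation (6.22) can be computed as in the sketch below, assuming the innovations and innovation covariance are already available; the Gaussian density is evaluated on the tangent space as in equation (6.16), and the function and variable names are hypothetical.

```python
import math
import numpy as np

def association_probabilities(nus, S, PD, PG, lam):
    """Compute beta_{k,j} of equation (6.22).

    nus -- innovation vectors for the m_k validated measurements
    S   -- innovation covariance S_k
    lam -- spatial density of false measurements
    Returns [beta_0, beta_1, ..., beta_mk] (beta_0 is the null hypothesis).
    """
    m = S.shape[0]
    norm = 1.0 / math.sqrt((2 * math.pi) ** m * np.linalg.det(S))
    # L_{k,j} = (P_D / lambda) * N(nu_j; 0, S)   (equation (6.23))
    L = [PD / lam * norm * math.exp(-0.5 * float(nu @ np.linalg.solve(S, nu)))
         for nu in nus]
    denom = 1.0 - PD * PG + sum(L)
    return [(1.0 - PD * PG) / denom] + [Lj / denom for Lj in L]

betas = association_probabilities(
    [np.array([0.5, -0.2]), np.array([1.5, 1.0])],
    np.eye(2), PD=0.9, PG=0.9, lam=0.01)
```

By construction the probabilities sum to one, and the measurement with the smaller normalized innovation receives the larger weight.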
Using the association events, we can update the track's state estimate and error covariance conditioned on each association event in order to generate the split tracks. This update step is similar to the Kalman filter's update step. The Kalman filter algorithm updates the state estimate directly using vector space arithmetic; however, since not every Lie group has a vector space structure, we have to approach the update step differently.

Recall that $p(x_k \mid \epsilon_k, Z_{0:k^-}) = p(\tilde{x}_k \mid \epsilon_k, Z_{0:k^-})$ with the relation $x_k = \hat{x}_{k|k^-}\text{Exp}^{G_x}_I\left(\tilde{x}_{k|k^-}\right)$. The error state $\tilde{x}_{k|k^-} \sim \mathcal{N}\left(\mu_{k|k^-}, P_{k|k^-}\right)$ has a vector space structure. This allows us to update the error state using a method similar to the Kalman filter. We
denote the updated error state and its corresponding mean and error covariance conditioned on $\theta_{k,j}$ as $\tilde{x}^-_{k|k,j} \sim \mathcal{N}\left(\mu^-_{k|k,j}, P^{c-}_{k|k,j}\right)$. Using the updated error state, the probability of the track's state conditioned on the association event $\theta_{k,j}$, it representing the target, and all the track-associated measurements is $p(x_k \mid \theta_{k,j}, \epsilon_k, Z_{0:k}) = p\left(\tilde{x}^-_k \mid \theta_{k,j}, \epsilon_k, Z_{0:k}\right)$ with the relation $x_k = \hat{x}_{k|k^-}\text{Exp}^{G_x}_I\left(\tilde{x}^-_{k|k}\right)$.
The updated error state may have a non-zero mean. To reset the error state's mean $\mu^-_{k|k,j}$ to zero, we add it to the state estimate $\hat{x}_{k|k^-}$ via the exponential map, and the error covariance is modified accordingly. We denote the updated and reset error state and its corresponding mean and error covariance conditioned on $\theta_{k,j}$ as $\tilde{x}_{k|k,j} \sim \mathcal{N}\left(\mu_{k|k,j} = 0, P^c_{k|k,j}\right)$. Using the reset error state, we have $p(x_k \mid \theta_{k,j}, \epsilon_k, Z_{0:k}) = p(\tilde{x}_k \mid \theta_{k,j}, \epsilon_k, Z_{0:k})$ with the relation $x_k = \hat{x}_{k|k}\text{Exp}^{G_x}_I\left(\tilde{x}_{k|k}\right)$. In summary, we update the mean of the error state and then reset it to zero by adding it onto the state estimate. This process is the update step of the indirect Kalman filter and is presented in the following lemma.
Lemma 6.4.2. We suppose directly the following: 1) Assumptions 6.1., 6.3., and 6.8.; 2) the probability p (xk | ϵk, Z0:k−) is known and defined in equation (6.10); 3) $Z_k = \{z_{k,j}\}_{j=1}^{m_k}$ is the set of validated measurements at time $t_k$; and 4) the probability p (zk,j | θk,j , xk, ϵk, Z0:k−) is known and defined in equation (A.9). Then the update of the track's state estimate $\hat{x}_{k|k^-}$ and error covariance $P_{k|k^-}$ conditioned on the association event $\theta_{k,j}$, the measurements $Z_{0:k}$, and the track representing the target, provided that $j > 0$, is
\begin{align}
\hat{x}_{k|k,j} &= \hat{x}_{k|k^-}\text{Exp}^{G_x}_I\left(\mu^-_{k|k,j}\right), \tag{6.24}\\
P^c_{k|k,j} &= J^{G_x}_r\left(\mu^-_{k|k,j}\right) P^{c-}_{k|k}\, J^{G_x}_r\left(\mu^-_{k|k,j}\right)^\top, \tag{6.25}
\end{align}
where
\begin{align}
\mu^-_{k|k,j} &= K_k \nu_{k,j}, \tag{6.26a}\\
\nu_{k,j} &= \text{Log}^{G_s}_I\left(h\left(\hat{x}_{k|k^-}, 0\right)^{-1} z_{k,j}\right), \tag{6.26b}\\
K_k &= P_{k|k^-} H_k^\top S_k^{-1}, \tag{6.26c}\\
S_k &= H_k P_{k|k^-} H_k^\top + V_k R V_k^\top, \tag{6.26d}\\
P^{c-}_{k|k} &= \left(I - K_k H_k\right) P_{k|k^-}, \tag{6.26e}
\end{align}
and the Jacobians $H_k$ and $V_k$ are defined in (4.5) and evaluated at the point $\zeta^h_k = \left(\hat{x}_{k|k^-}, 0\right)$. Otherwise, $j = 0$ and
\begin{align}
\hat{x}_{k|k,j=0} &= \hat{x}_{k|k^-}, \tag{6.27a}\\
P_{k|k,j=0} &= P_{k|k^-}, \tag{6.27b}
\end{align}
since $\theta_{k,j=0}$ indicates that all measurements are false; thus, no additional information about the target was gathered. Equation (6.26a) is the updated mean of the error state before being reset to zero, equation (6.26b) is the innovation term, equation (6.26c) is the Kalman gain, equation (6.26d) is the innovation covariance, and equation (6.26e) is the updated error covariance before the updated mean is reset. Equation (6.24) adds the mean of the error state to the state estimate, essentially resetting the error state's mean to zero, and equation (6.25) is the covariance update as a consequence of resetting the error state's mean to zero.

It follows directly that the Gaussian approximation of the probability of the split track $x_{k,j}$ conditioned on the track representing the target, the association event $\theta_{k,j}$, and the measurements $Z_{0:k}$ is
\[
p(x_{k,j} \mid \theta_{k,j}, \epsilon_k, Z_{0:k}) = \eta \exp\left(-\frac{1}{2}\,\tilde{x}_{k|k,j}^\top \left(P^c_{k|k,j}\right)^{-1} \tilde{x}_{k|k,j}\right), \tag{6.28}
\]
where $\tilde{x}_{k|k,j} = \text{Log}^{G_x}_I\left(\hat{x}_{k|k,j}^{-1} x_{k,j}\right)$.
The proof of Lemma 6.4.2 is in Appendix A.3.
Using the probabilities of the split tracks defined in equation (6.28) and the probabilities of each association event defined in equation (6.22), the probability of the track's state conditioned on it representing the target and all track-associated measurements is calculated using the theorem of total probability:
\[
p(x_k \mid \epsilon_k, Z_{0:k}) = \sum_{j=0}^{m_k} p(x_{k,j} \mid \theta_{k,j}, \epsilon_k, Z_{0:k})\, \beta_{k,j}. \tag{6.29}
\]
In essence, the probability of the track's state is the weighted average of the split tracks' state probabilities, where the weight is the probability of the corresponding association event. The process of splitting tracks could be repeated every time new measurements are received, which would lead to an exponential growth of split tracks. This process would quickly become computationally and memory expensive. To keep the problem manageable, the split tracks are fused together using the smoothing property of conditional expectations discussed in [87].
Normally if the track’s state estimate was expressed in Euclidean space, the validated
measurements would be used to calculate the updated state estimates xk|k,j, and then the
state estimates would be be fused together according to the equation
xk|k =
mk∑
j=0
xk|k,jβk,j. (6.30)
However, this approach does not work with arbitrary Lie groups since not every Lie group
has a vector space structure. Instead we have to solve equation (6.29) indirectly by pos-
ing the problem in the Cartesian algebraic space. This is done by using the relation in
equation (3.14a) to note that
p (xk,j | ϵk, Z0:k) = p(x−k,j | ϵk, Z0:k
), (6.31)
where x−k,j is the error state conditioned on θk,j after update but before its mean is reset to
zero as discussed in Lemma 6.4.2. We use this version of the error state since we can fuse
the means µ−k|k,j, defined in equation (6.26a), together in a similar way as in equation (6.30).
Using the relation in equation (6.31), an equivalent expression to equation (6.29) is
\[
p\left(\tilde{x}^-_{k|k} \mid \epsilon_k\right) = \sum_{j=0}^{m_k} p\left(\tilde{x}^-_{k|k,j} \mid \theta_{k,j}, \epsilon_k\right) \beta_{k,j}. \tag{6.32}
\]
In essence, the validated measurements are used to update the mean of the error state for each split track $\mu^-_{k|k,j}$ according to Lemma 6.4.2. Since the means are elements of the Cartesian algebraic space (in the same tangent space), they can be added together using a
Figure 6.3: The error state means for each split track before reset, $\mu^-_{k|k,j}$, are fused together using the association event probabilities $\beta_{k,j}$ in the tangent space to form the error state mean $\mu^-_{k|k}$. The mean is then used to update the track's state estimate from $\hat{x}_{k|k^-}$ to $\hat{x}_{k|k}$.
weighted average to form a single mean $\mu^-_{k|k}$. The mean of the error state is then reset to zero by adding $\mu^-_{k|k}$ onto the state estimate $\hat{x}_{k|k^-}$ using the exponential map. This process is depicted in Fig. 6.3.
Using the smoothing property of conditional expectations, the expected value of the error state $\tilde{x}^-_{k|k}$ is
\begin{align}
\mathbb{E}\left[\tilde{x}^-_{k|k}\right] &= \mathbb{E}\left[\mathbb{E}\left[\tilde{x}^-_{k|k,j} \mid \theta_{k,j}, \epsilon_k\right]\right] \tag{6.33a}\\
&= \sum_{j=0}^{m_k} \mathbb{E}\left[\tilde{x}^-_{k|k,j} \mid \theta_{k,j}, \epsilon_k\right] \beta_{k,j} \tag{6.33b}\\
&= \sum_{j=0}^{m_k} \mu^-_{k|k,j}\, \beta_{k,j} \tag{6.33c}\\
&= \mu^-_{k|k}, \tag{6.33d}
\end{align}
where $\beta_{k,j}$ is defined in Lemma 6.4.1 and $\mu^-_{k|k,j}$ is defined in Lemma 6.4.2.
Using the updated mean of the error state, the covariance is
\[
\text{cov}\left[\tilde{x}^-_{k|k}\right] = \mathbb{E}\left[\left(\tilde{x}^-_{k|k} - \mu^-_{k|k}\right)\left(\tilde{x}^-_{k|k} - \mu^-_{k|k}\right)^\top\right].
\]
From this point the derivation follows from [74] and results in
\[
P^-_{k|k} = \beta_{k,0} P_{k|k^-} + \left(1 - \beta_{k,0}\right) P^{c-}_{k|k} + \tilde{P}_{k|k},
\]
where
\begin{align}
P^{c-}_{k|k} &= \left(I - K_k H_k\right) P_{k|k^-}, \tag{6.34a}\\
K_k &= P_{k|k^-} H_k^\top S_k^{-1}, \tag{6.34b}\\
S_k &= H_k P_{k|k^-} H_k^\top + V_k R V_k^\top, \tag{6.34c}\\
\tilde{P}_{k|k} &= K_k \left(\sum_{j=1}^{m_k} \beta_{k,j}\, \nu_{k,j} \nu_{k,j}^\top - \nu_k \nu_k^\top\right) K_k^\top, \tag{6.34d}\\
\nu_{k,j} &= \text{Log}^{G_s}_I\left(h\left(\hat{x}_{k|k^-}, 0\right)^{-1} z_{k,j}\right), \tag{6.34e}\\
\nu_k &= \sum_{j=1}^{m_k} \beta_{k,j}\, \nu_{k,j}, \tag{6.34f}\\
\mu^-_{k|k} &= K_k \nu_k, \tag{6.34g}
\end{align}
$P_{k|k^-}$ is the error covariance before the update step, $P^{c-}_{k|k}$ is the error covariance of the error state $\tilde{x}^-_{k|k,j}$ derived in Lemma 6.4.2, $\tilde{P}_{k|k}$ is the covariance that captures the "spread of the means," and the Jacobians $H_k$ and $V_k$ are defined in (4.5) and evaluated at the point $\zeta^h_k = \left(\hat{x}_{k|k^-}, 0\right)$.
To reset the error state’s mean to zero, the mean µ−k|k is added onto the state estimate
xk|k− by forming a geodesic from xk|k− to xk|k in the direction of µ−k|k as depicted in Fig 6.3.
To derive this process, let x−k|k = µ−k|k+ak|k where ak|k ∼ N
(
0, P c−k|k
)
contains the uncertainty
in the error state, then under the assumption that ak|k is small and using the property of
87
the right Jacobian defined in equation (3.7) we add µ−k|k onto xk|k− as follows:
xk|k = xk|k−ExpGxI
(
µ−k|k + ak|k
)
(6.35a)
≈ xk|k−ExpGxI
(
µ−k|k
)
︸ ︷︷ ︸
xk|k
ExpGxI
JGxr
(
µ−k|k
)
ak|k︸ ︷︷ ︸
xk|k
, (6.35b)
where xk|k = xk|k−ExpGxI
(
µ−k|k
)
is the updated state estimate, and xk|k = JGxr
(
µ−k|k
)
ak|k is
the updated and reset error state.
The error covariance of the error state $\tilde{x}_{k|k}$ is
\begin{align*}
\text{cov}\left(\tilde{x}_{k|k}\right) &= \text{cov}\left(J^{G_x}_r\left(\mu^-_{k|k}\right) a_{k|k}\right)\\
&= J^{G_x}_r\left(\mu^-_{k|k}\right) \text{cov}\left(a_{k|k}\right) J^{G_x}_r\left(\mu^-_{k|k}\right)^\top\\
&= J^{G_x}_r\left(\mu^-_{k|k}\right) P^-_{k|k}\, J^{G_x}_r\left(\mu^-_{k|k}\right)^\top\\
&= P_{k|k};
\end{align*}
therefore, $\tilde{x}_{k|k} \sim \mathcal{N}\left(\mu_{k|k} = 0, P_{k|k}\right)$.
Lemma 6.4.3. We suppose directly Assumptions 6.1.-6.8. and that $Z_k$ is the set of validated measurements; then the update for the track likelihood is
\[
p(\epsilon_k \mid Z_{0:k}) = \frac{1 - \alpha_k}{1 - \alpha_k\, p(\epsilon_k \mid Z_{0:k^-})}\, p(\epsilon_k \mid Z_{0:k^-}), \tag{6.36}
\]
where
\[
\alpha_k =
\begin{cases}
P_D P_G & \text{if } m_k = 0 \\
P_D P_G - \sum_{j=1}^{m_k} \mathcal{L}_{k,j} & \text{else,}
\end{cases} \tag{6.37}
\]
and $\mathcal{L}_{k,j}$ is defined in equation (6.23).
The proof of Lemma 6.4.3 is in Appendix A.4.
Using the track likelihood, a track is either rejected after the update step if the track likelihood is below the threshold τRT , confirmed to represent a target if the track likelihood is above the threshold τCT , or neither rejected nor confirmed until more information is gathered with new measurements. In the case of tracking a single target, the track with the highest track likelihood is assumed to represent the target. In the case of tracking multiple targets, every confirmed track is assumed to represent a target.
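The track likelihood update of equations (6.36) and (6.37) and the confirm/reject logic just described can be sketched as follows; the helper names are our own, and the threshold defaults mirror the experiment in Section 6.5.

```python
def update_track_likelihood(eps, likelihoods, PD, PG):
    """Track likelihood update of equations (6.36)-(6.37).

    eps         -- predicted track likelihood p(eps_k | Z_{0:k-})
    likelihoods -- the values L_{k,j} for the m_k validated measurements
    """
    if len(likelihoods) == 0:
        alpha = PD * PG                      # no validated measurements
    else:
        alpha = PD * PG - sum(likelihoods)
    return (1.0 - alpha) / (1.0 - alpha * eps) * eps

def classify(eps, tau_CT=0.7, tau_RT=0.1):
    """Confirm, reject, or defer a track based on its likelihood (hypothetical helper)."""
    if eps >= tau_CT:
        return "confirmed"
    if eps <= tau_RT:
        return "rejected"
    return "undecided"
```

With no validated measurements the likelihood shrinks toward rejection, while a strong measurement likelihood pushes it toward confirmation.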
6.4.1 Summary
Theorem 6.4.1. Given Assumptions 6.1.-6.8., let $Z_k = \{z_{k,j}\}_{j=1}^{m_k}$ denote the set of $m_k$ validated measurements at time $t_k$, and let $\hat{x}_{k|k^-}$, $P_{k|k^-}$, and $p(\epsilon_k \mid Z_{0:k^-})$ denote the track's state estimate, error covariance, and track likelihood at time $t_k$ conditioned on the previous measurements; then the track's updated state estimate, error covariance, and track likelihood are
\begin{align}
\hat{x}_{k|k} &= \hat{x}_{k|k^-}\text{Exp}^{G_x}_I\left(\mu^-_{k|k}\right), \tag{6.38}\\
P_{k|k} &= J^{G_x}_r\left(\mu^-_{k|k}\right) P^-_{k|k}\, J^{G_x}_r\left(\mu^-_{k|k}\right)^\top, \tag{6.39}\\
p(\epsilon_k \mid Z_{0:k}) &= \frac{1 - \alpha_k}{1 - \alpha_k\, p(\epsilon_k \mid Z_{0:k^-})}\, p(\epsilon_k \mid Z_{0:k^-}), \tag{6.40}
\end{align}
where
\begin{align}
\mu^-_{k|k} &= K_k \nu_k, \tag{6.41a}\\
\nu_k &= \sum_{j=1}^{m_k} \beta_{k,j}\, \nu_{k,j}, \tag{6.41b}\\
\nu_{k,j} &= \text{Log}^{G_s}_I\left(h\left(\hat{x}_{k|k^-}, 0\right)^{-1} z_{k,j}\right), \tag{6.41c}\\
K_k &= P_{k|k^-} H_k^\top S_k^{-1}, \tag{6.41d}\\
S_k &= H_k P_{k|k^-} H_k^\top + V_k R V_k^\top, \tag{6.41e}\\
P^{c-}_{k|k} &= \left(I - K_k H_k\right) P_{k|k^-}, \tag{6.41f}\\
\tilde{P}_{k|k} &= K_k \left(\sum_{j=1}^{m_k} \beta_{k,j}\, \nu_{k,j} \nu_{k,j}^\top - \nu_k \nu_k^\top\right) K_k^\top, \tag{6.41g}\\
P^-_{k|k} &= \beta_{k,0} P_{k|k^-} + \left(1 - \beta_{k,0}\right) P^{c-}_{k|k} + \tilde{P}_{k|k}, \tag{6.41h}
\end{align}
$\beta_{k,j}$ is defined in equation (6.22), $\alpha_k$ is defined in equation (6.37), $\nu_{k,j}$ denotes the innovation term, $S_k$ the innovation covariance, $K_k$ the Kalman gain, $P^{c-}_{k|k}$ the error covariance after update but before reset, $\tilde{P}_{k|k}$ the covariance associated with the spread of the innovation terms, $R$ the measurement noise covariance, $h$ the observation function, and $H_k$ and $V_k$ the Jacobians defined in (4.5) and evaluated at the point $\zeta^h_k = \left(\hat{x}_{k|k^-}, 0\right)$.
The proof of the theorem follows directly from Lemmas 4.1.1, 6.3.2, 6.4.1, 6.4.2, and 6.4.3.
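The tangent-space computations of equations (6.41a)-(6.41h) can be collected into one routine. This is a sketch under the assumption that the innovations $\nu_{k,j}$ and association probabilities $\beta_{k,j}$ are precomputed; the on-group reset $\hat{x}_{k|k} = \hat{x}_{k|k^-}\text{Exp}^{G_x}_I\left(\mu^-_{k|k}\right)$ and the right-Jacobian correction of equation (6.39) are left to the caller, and the function name is our own.

```python
import numpy as np

def lgipdaf_update(P, H, V, R, nus, betas):
    """Fused error-state update of equations (6.41a)-(6.41h).

    P     -- prior error covariance P_{k|k-}
    H, V  -- observation Jacobians
    R     -- measurement noise covariance
    nus   -- innovation vectors nu_{k,j}, j = 1..m_k
    betas -- association probabilities [beta_0, ..., beta_mk] from (6.22)
    Returns (mu, P_minus): fused mean (6.41a) and covariance (6.41h).
    """
    n = P.shape[0]
    if not nus:                                       # no validated measurements
        return np.zeros(n), P
    S = H @ P @ H.T + V @ R @ V.T                     # (6.41e)
    K = P @ H.T @ np.linalg.inv(S)                    # (6.41d)
    nu = sum(b * v for b, v in zip(betas[1:], nus))   # (6.41b)
    mu = K @ nu                                       # (6.41a)
    Pc = (np.eye(n) - K @ H) @ P                      # (6.41f)
    spread = sum(b * np.outer(v, v) for b, v in zip(betas[1:], nus)) - np.outer(nu, nu)
    P_tilde = K @ spread @ K.T                        # (6.41g)
    P_minus = betas[0] * P + (1 - betas[0]) * Pc + P_tilde  # (6.41h)
    return mu, P_minus

# Sanity check: one measurement with certain association reduces to a Kalman update.
nu = np.array([1.0, -1.0])
mu, P_minus = lgipdaf_update(np.eye(2), np.eye(2), np.eye(2), np.eye(2), [nu], [0.0, 1.0])
```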
6.5 Experiment and Results
We demonstrate the LG-IPDAF in simulation by tracking a car restricted to a flat ground plane. This restriction allows us to model the car using the Lie group SE(2). For a description of the Lie group SE(2), see Appendix C.4. In the simulation we assume that the target's pose is observable by the sensor with measurement noise covariance $R = \text{diag}\left(10^{-1}, 10^{-1}, 10^{-2}\right)$ and a surveillance region of 140 by 140 meters. The observation function for this experiment is
\begin{align}
z_k &= h(x_k, r_k) \tag{6.42}\\
&= g_k \text{Exp}^{SE(2)}_I\left(r_k\right), \tag{6.43}
\end{align}
where $x_k = (g_k, v_k) \in G_x = SE(2) \times \mathbb{R}^{SE(2)}$ is the target's state, $r_k \sim \mathcal{N}(0, R) \in \mathbb{R}^{SE(2)}$ is the measurement noise, and $z_k \in G_{SE(2)}$ is the measurement. The complete system model is defined by equations (4.1), (4.2), and (6.42).
The Jacobians of the observation function evaluated at the point $\zeta^h_k = (x_k, 0)$ are
\begin{align}
H_k &= \left.\frac{\partial h}{\partial x}\right|_{\zeta^h_k} = I, \tag{6.44}\\
V_k &= \left.\frac{\partial h}{\partial r}\right|_{\zeta^h_k} = I. \tag{6.45}
\end{align}
These Jacobians are easily derived using the method in Section 4.1. The Jacobians of the state transition function are derived in Section 4.1.
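For reference, a minimal implementation of the SE(2) exponential map used by the observation function (6.42) is sketched below; this is our own rendering of the standard closed-form expression, not code from the experiment.

```python
import numpy as np

def se2_exp(tau):
    """Exponential map of SE(2): tau = (rho_x, rho_y, theta) -> 3x3 homogeneous matrix."""
    rho, theta = np.asarray(tau[:2], dtype=float), float(tau[2])
    if abs(theta) < 1e-9:
        V = np.eye(2)                        # first-order limit near theta = 0
    else:
        s, c = np.sin(theta), np.cos(theta)
        V = np.array([[s, -(1 - c)], [1 - c, s]]) / theta
    g = np.eye(3)
    g[:2, :2] = [[np.cos(theta), -np.sin(theta)], [np.sin(theta), np.cos(theta)]]
    g[:2, 2] = V @ rho
    return g

def observe(g, r):
    """Observation function z = g Exp(r) of equation (6.42), with noise vector r."""
    return g @ se2_exp(r)
```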
In the experiment, the car is given three trajectories: a circular trajectory with a 10 meter radius, a meandering trajectory consisting of curvy and straight lines, and a straight trajectory. In the first two trajectories the car is given a translational velocity of 10 meters per second, which is about 22 miles per hour. In the third trajectory the car is given a translational velocity of 7 meters per second.
We set the following LG-IPDAF parameters: the spatial density of false measurements to λ = 0.01, implying that there are on average 196 false measurements every sensor scan; the car's probability of detection to PD = 0.9; the gate probability to PG = 0.9; the track likelihood confirmation threshold to τCT = 0.7; and the track likelihood rejection threshold to τRT = 0.1. In this experiment we do not use the track initialization scheme presented in Chapter 8 since this scheme was not developed at the time of the experiment. Instead, we initialize new tracks from two unassociated neighboring measurements from different times with an initial error covariance of P = 5I and track likelihood of ϵ = 0.2. The process noise of the initialized tracks is Q = diag (1, 1, 0.1, 1, 1, 0.1) dt, where dt is the time step of the simulation. The process noise is large enough to account for not knowing the car's acceleration, but small enough to prevent many false measurements from being associated with the track.
In our experiment we compare tracking on SE(2) using a constant-velocity model to tracking using a linear, time-invariant, constant-velocity (LTI-CV) model. For each model and trajectory, we conduct a Monte Carlo simulation consisting of 100 iterations of 30 seconds each and compute three statistical measures: track probability of detection (TPD), average Euclidean error (AEE) in position, and average confirmation time (ACT). Track probability of detection is the probability that the target is being tracked [88]. The average Euclidean error is the confirmed track's average error in position. The average confirmation time is the average amount of time until the first track is confirmed.
Depictions of the simulation for the circular, meandering, and linear trajectories are shown in Figs. 6.4, 6.5, and 6.6. The target's trajectory is shown as a black line, and the target's final pose is represented as the large black arrowhead. The confirmed tracks' trajectory is shown as a piece-wise green line, and the confirmed tracks' final pose is represented as a large green arrowhead. The target's measurements from the initial to the final time are represented as magenta dots. The red arrowheads represent different unconfirmed and non-rejected tracks at the final time step of the simulation. The blue asterisks represent the false measurements received at the last time step of the simulation. Note that during non-linear motions, the LTI-CV model struggles to track the target. This is shown by the absence of a green trajectory (confirmed track's trajectory) during parts of the target's non-linear motion, or by the green trajectory drifting from the black trajectory (target's trajectory).
[Two panels: (a) LTI-CV model; (b) SE(2)-CV model. Axes: X Pos (m) vs. Y Pos (m).]

Figure 6.4: Plots of the zoomed-in circular trajectories for the LTI-CV and SE(2)-CV models.
[Two panels: (a) LTI-CV model; (b) SE(2)-CV model. Axes: X Pos (m) vs. Y Pos (m).]

Figure 6.5: Plots of the meandering trajectories for the LTI-CV and SE(2)-CV models.
[Two panels: (a) LTI-CV model; (b) SE(2)-CV model. Axes: X Pos (m) vs. Y Pos (m).]

Figure 6.6: Plots of the linear trajectories for the LTI-CV and SE(2)-CV models.
Table 6.1: Statistical measures from the experiment

              Circular            Meandering          Linear
           TPD   AEE   ACT     TPD   AEE   ACT     TPD   AEE   ACT
SE(2)-CV   0.96  0.29  0.90    0.96  0.29  0.87    0.96  0.29  0.74
LTI-CV     0.54  1.32  0.49    0.88  0.45  0.44    0.97  0.34  0.42
The statistical measures from the experiment are in Table 6.1. As shown in the table, the constant-velocity target model on SE(2) tracked the target significantly better for the circular and meandering trajectories, and about the same as the LTI model for the linear trajectory. This is because the LTI model struggles to track non-linear motion. In Subfigure 6.5a you can see that during the straight segments the LTI model tracks the target well, but loses the target as it turns. Table 6.1 shows that the target model on SE(2) had less error than the LTI model; however, on average the LTI model had a faster track confirmation time.
6.6 Conclusion
In this chapter we have shown in detail how to adapt the IPDAF to connected, geodesically complete Lie groups. In our experiment, we showed that a constant-velocity target model on SE(2) is significantly better at tracking non-linear motion in dense clutter than the constant-velocity LTI model. This is because the LTI model expresses only linear motion and cannot predict non-linear motion such as curvy and circular trajectories. Since the SE(2) model expresses both non-linear and linear motion, it is able to predict both types of motion and track them well. We have also shown that the LG-IPDAF is capable of quickly rejecting and confirming tracks with high fidelity.
CHAPTER 7. CENTRALIZED MEASUREMENT FUSION
The Lie group integrated probabilistic data association filter (LG-IPDAF) is designed
to track targets using a single sensor. When tracking with multiple sensors, some or all of the
sensors may produce measurements at the same time. When this occurs, we must consider
how to properly integrate the track-associated measurements from the different sensors to
improve the state estimate, error covariance, and track likelihood of each track. One common
approach that we use is centralized measurement fusion (CMF).
In centralized measurement fusion, the sensors send their data to a single processing
node that runs the MTT algorithm (in this case G-MTT). The MTT algorithm uses the
measurements to update the tracks and then sends confirmed tracks to other consumer
systems. This architecture is depicted in Fig. 7.1. This approach is optimal in the sense that all of the sensor data is used to improve the tracks, and it is commonly used in small sensor networks, where the communication bandwidth is high enough to send every sensor's data to the central processor.
Two common approaches to CMF are parallel centralized measurement fusion and
sequential centralized measurement fusion. In parallel fusion, when multiple sensors produce
measurements at the same time, all of the measurements from these sensors are taken into
account simultaneously to update the tracks in a single update. In sequential fusion, when
multiple sensors produce measurements at the same time the measurements are incorporated
sequentially using the standard single-sensor tracking algorithm (in this case the LG-IPDAF).
This is done by propagating the track forward in time to the time stamp of the measurements,
and then updating the tracks using the measurement from the first sensor only, followed by
the second, third, and so on until all of the track-associated measurements have been used
to update the tracks. These two approaches are depicted in Fig. 7.2.
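To make the distinction concrete, the following sketch contrasts the two orderings for a plain linear-Gaussian Kalman filter with two sensors and no data association, which is a simplification of the setting in this chapter; the state, measurement models, and numbers are hypothetical. For the plain Kalman filter the two orderings yield the same posterior, the equivalence attributed to [93] later in this section.

```python
import numpy as np

def kf_update(x, P, z, H, R):
    """Standard Kalman measurement update."""
    S = H @ P @ H.T + R
    K = P @ H.T @ np.linalg.inv(S)
    x = x + K @ (z - H @ x)
    P = (np.eye(len(x)) - K @ H) @ P
    return x, P

x0 = np.array([[0.0], [1.0]])        # prior mean (position, velocity)
P0 = np.diag([4.0, 1.0])             # prior covariance

H1 = np.array([[1.0, 0.0]]); R1 = np.array([[0.5]])   # sensor 1
H2 = np.array([[1.0, 0.0]]); R2 = np.array([[0.2]])   # sensor 2
z1 = np.array([[1.1]]); z2 = np.array([[0.9]])

# Parallel fusion: one update with the stacked (augmented) measurement.
H = np.vstack([H1, H2])
R = np.block([[R1, np.zeros((1, 1))], [np.zeros((1, 1)), R2]])
z = np.vstack([z1, z2])
x_par, P_par = kf_update(x0, P0, z, H, R)

# Sequential fusion: apply the single-sensor update once per sensor.
x_seq, P_seq = kf_update(x0, P0, z1, H1, R1)
x_seq, P_seq = kf_update(x_seq, P_seq, z2, H2, R2)

assert np.allclose(x_par, x_seq) and np.allclose(P_par, P_seq)
```

With clutter and probabilistic data association this equivalence no longer holds, which is why the two approaches are compared empirically in Section 7.4.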
Figure 7.1: A depiction of centralized fusion. The circles represent the sensors, the triangles represent the consumer systems, the square represents the processing node, the arrows show the direction data travels, $Z^{s_\ell}$ denotes the data from sensor $s_\ell$, and $T^{N_1}$ denotes the tracks from the processing node.

[Figure: block diagrams of parallel centralized fusion (multisensor data association and filtering) and sequential centralized fusion (repeated single-sensor data association and filtering).]

Figure 7.2: A depiction of the parallel and sequential fusion approaches. The current and previous times are denoted $t_k$ and $t_{k^-}$. The data from sensor $s_\ell$ is denoted $Z^{s_\ell}_k$, the current $j$th track conditioned on previous measurements is $T^j_{k|k^-}$, and the current track conditioned on the previous and new measurements up to sensor $s_\ell$ is $T^j_{k|k,s_\ell}$.
The LG-IPDAF is based on the probabilistic data association filter (PDAF). The PDAF is extended to perform parallel centralized fusion in [89]. Algorithms similar to the PDAF, namely the joint probabilistic data association filter (JPDAF) and the interacting multiple model JPDAF with linear motion models, are extended to perform parallel fusion in [90]–[92]. In addition, the parallel fusion algorithm is compared against the sequential fusion algorithm, and the authors conclude that the sequential fusion approach tracks better than the parallel fusion approach in dense clutter. This is an interesting result since [93] states that the sequential and parallel approaches are equivalent for the Kalman filter, and the Kalman filter is similar to the JPDAF.
The integrated probabilistic data association filter (IPDAF) was extended to perform
parallel centralized fusion and distributed fusion in [94] for two sensors. See [95], [96] for an
overview of distributed sensor fusion. The IPDAF is extended to Lie groups in Chapter 6,
and in this chapter we extend the LG-IPDAF to work with parallel fusion. We refer to
this extension as multi-sensor LG-IPDAF (MS-LG-IPDAF). We also compare the parallel
fusion approach to the sequential fusion approach in simulation by tracking multirotors on
SE (2) using a network of radar sensors. Our comparison differs from [90]–[92], since we do
not assume that the tracks are already initialized, instead we rely on the tracking algorithm
presented in Chapter 8 to initialize the tracks.
The rest of this chapter is organized as follows. In Section 7.1 we provide an overview
of the MS-LG-IPDAF algorithm. In Section 7.2 we present the augmented system model used
to perform parallel centralized measurement fusion. In Section 7.3 we derive the update step
for the MS-LG-IPDAF algorithm. In Section 7.4 we present the experiment and experimental
results. Lastly, in Section 7.5 we conclude.
7.1 Overview of Parallel Centralized Measurement Fusion with the LG-IPDAF
The MS-LG-IPDAF is the extension of the LG-IPDAF to work with a network of
sensors using parallel centralized measurement fusion and relies heavily on the assumptions
and theory presented in Chapter 6. We present the MS-LG-IPDAF under the assumption
of a single target and relax this constraint later to accommodate multiple targets.
Let $\mathcal{S} = \{1, 2, \ldots, N_S\}$ denote the indexing set of sensors, and $s_\ell \in \mathcal{S}$ denote the $\ell$th sensor. A sensor scan occurs when one or more sensors observe their respective measurement spaces and produce measurements. We denote the indexing set of sensors that produced measurements during the sensor scan at time $t_k$ as $\mathcal{S}_k = \{a_1, a_2, \ldots, a_N \mid a_i \in \mathbb{N} \text{ and } a_i \le N_S\} \subseteq \mathcal{S}$, where $N \le N_S$ is the number of sensors that produced measurements during the sensor scan. We denote sensor $a_\ell$ from $\mathcal{S}_k$ using bold font as $\mathbf{s}_\ell$. Note that $\mathbf{s}_\ell$ may not be the same sensor as $s_\ell$.
Let $m^{s_\ell}_k$ denote the number of track-associated measurements from sensor $s_\ell$ at time $t_k$, and $\mathcal{M}^{s_\ell}_k = \{1, 2, \ldots, m^{s_\ell}_k\}$ denote the indexing set of track-associated measurements. The set of track-associated measurements from sensor $s_\ell$ at time $t_k$ is $Z^{s_\ell}_k = \{z^{s_\ell}_{k,j}\}_{j=1}^{m^{s_\ell}_k}$, where $z^{s_\ell}_{k,j}$ is the $j$th associated measurement from sensor $s_\ell$. The direct product of the associated measurements from every sensor that produced measurements at time $t_k$ is
$$Z^{\mathcal{S}}_k = Z^{\mathbf{s}_1}_k \times Z^{\mathbf{s}_2}_k \times \cdots \times Z^{\mathbf{s}_N}_k, \qquad (7.1)$$
and the set of all associated measurements from the initial time to the current time $t_k$ is $Z^{\mathcal{S}}_{0:k} = \{Z^{\mathcal{S}}_i\}_{i=0}^{k}$.
The MS-LG-IPDAF assumes that a sensor produces at most one target-originated measurement when it produces measurements. Therefore, for sensor $s_\ell \in \mathcal{S}_k$, given the set of new track-associated measurements $Z^{s_\ell}_k$, either one of the measurements is target originated or none of them are. These possibilities are referred to as association events, denoted $\theta^{s_\ell}_{k,\alpha_\ell}$, with the subscript $\alpha_\ell > 0$ meaning that the track-associated measurement $z^{s_\ell}_{k,\alpha_\ell}$ is the target-originated measurement and all other measurements from sensor $s_\ell$ are false, and with $\alpha_\ell = 0$ meaning that all of the track-associated measurements from sensor $s_\ell$ are false. This indicates that for sensor $s_\ell$ there are a total of $m^{s_\ell}_k + 1$ association events.

For example, let $Z^{s_\ell}_k = \{z^{s_\ell}_{k,1}, z^{s_\ell}_{k,2}\}$; then the three possible association events are: $z^{s_\ell}_{k,1}$ is the target-originated measurement and $z^{s_\ell}_{k,2}$ is a false measurement, $z^{s_\ell}_{k,2}$ is the target-originated measurement and $z^{s_\ell}_{k,1}$ is a false measurement, or both measurements are false.
Using the association events from the sensors in $\mathcal{S}_k$, we construct all mutually exclusive joint association events. Let
$$\mathcal{A} = \left\{(\alpha_1, \alpha_2, \ldots, \alpha_N) \in \mathbb{Z}^N \mid 0 \le \alpha_\ell \le m^{\mathbf{s}_\ell}_k\right\} \qquad (7.2)$$
denote the indexing set of all mutually exclusive joint association events; then for $\alpha \in \mathcal{A}$, the corresponding joint association event is
$$\theta_{k,\alpha} = \theta^{\mathbf{s}_1}_{k,\alpha_1} \cap \theta^{\mathbf{s}_2}_{k,\alpha_2} \cap \cdots \cap \theta^{\mathbf{s}_N}_{k,\alpha_N}. \qquad (7.3)$$
We use the same subscripts for the sensor and measurement to facilitate notation. We let the joint association event $\theta_{k,\alpha=0}$ indicate the event that all of the measurements from all of the sensors are false measurements.
Let $\beta_{k,\alpha}$ denote the probability of the joint association event $\theta_{k,\alpha}$, and $\beta^{s_\ell}_{k,\alpha_\ell}$ denote the probability of an individual association event $\theta^{s_\ell}_{k,\alpha_\ell}$. As discussed in [94], the association events $\theta^{s_\ell}_{k,\alpha_\ell}$ are not independent. This means that the probability of the joint association event $\beta_{k,\alpha}$ is not the product of the probabilities of the individual association events $\beta^{s_\ell}_{k,\alpha_\ell}$. However, it is assumed in [97] and [90] that the association events $\theta^{s_\ell}_{k,\alpha_\ell}$ are independent, and these authors show that this assumption works well. We use the same assumption in our algorithm.
The MS-LG-IPDAF makes the following assumptions:

7.1. There exists a single target that can be observed by multiple sensors and modeled by the constant-velocity, white-noise-driven target model defined in equation (4.1).

7.2. A sensor scan occurs whenever one or more sensors observe their respective sensor surveillance regions. Each sensor $s_\ell \in \mathcal{S}_k$ that produces measurements at the sensor scan associated with time $t_k$ produces $m^{s_\ell}_k$ validated measurements denoted $\{z^{s_\ell}_{k,j}\}_{j=1}^{m^{s_\ell}_k} = Z^{s_\ell}_k$.

7.3. At every sensor scan, each sensor produces at most one target-originated measurement, and all others are false (i.e., non-target-originated) measurements.

7.4. Sensor $s_\ell$ detects the target with probability $P^{s_\ell}_D \in [0, 1]$.

7.5. The target-originated measurement from sensor $s_\ell$ is associated with a track that represents the target with probability $P^{s_\ell}_G$. The probability $P^{s_\ell}_G$ is discussed in Section 6.3.

7.6. The false measurements from sensor $s_\ell$ are independent and identically distributed (iid) with uniform spatial density $\lambda^{s_\ell}$.

7.7. The expected number of false measurements each time sensor $s_\ell$ produces measurements is modeled using the density function $\mu^{s_\ell}_F$. In this chapter, $\mu^{s_\ell}_F$ denotes a Poisson distribution defined as
$$\mu^{s_\ell}_F(\phi) = \exp\left(-\lambda^{s_\ell} \mathcal{V}^{s_\ell}_k\right) \frac{\left(\lambda^{s_\ell} \mathcal{V}^{s_\ell}_k\right)^\phi}{\phi!}, \qquad (7.4)$$
where $\mathcal{V}^{s_\ell}_k$ is the volume of the validation region defined in Section 6.3, $\lambda^{s_\ell} \mathcal{V}^{s_\ell}_k$ is the expected number of false measurements, and $\phi$ is the number of false measurements.

7.8. The past information about a track is summarized as $(\hat{x}_{k^-|k^-}, P_{k^-|k^-}, \epsilon_{k^-|k^-})$, where $\hat{x}_{k^-|k^-}$, $P_{k^-|k^-}$, and $\epsilon_{k^-|k^-}$ denote the track's state estimate, error covariance, and track likelihood at the previous time, conditioned on the previous track-associated measurements.

7.9. The sensors are independent.

7.10. The association events $\theta^{s_\ell}_{k,\alpha_\ell}$ are independent.

7.11. The confirmed track with the highest track likelihood represents the target.
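The Poisson clutter model of Assumption 7.7 can be sketched directly from (7.4); the spatial density and validation-region volume below are hypothetical:

```python
import math

# Sketch of (7.4): the number of false measurements from sensor s_l inside the
# validation region is Poisson with rate lambda * V (spatial density times
# validation-region volume).
def clutter_pmf(lam: float, volume: float, phi: int) -> float:
    """Probability of phi false measurements, mu_F(phi) in (7.4)."""
    rate = lam * volume
    return math.exp(-rate) * rate**phi / math.factorial(phi)

# e.g. lambda = 1e-3 and V = 2000 give 2 expected false measurements per scan:
p0 = clutter_pmf(1e-3, 2000.0, 0)  # probability of no clutter this scan
assert abs(p0 - math.exp(-2.0)) < 1e-12
assert abs(sum(clutter_pmf(1e-3, 2000.0, k) for k in range(50)) - 1.0) < 1e-9
```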
When new measurements are received, MS-LG-IPDAF performs three steps: the track
prediction step, the data association step and the update step. In the track prediction step,
the tracks are propagated forward in time using the LG-IPDAF prediction step presented
in Section 6.2. In the data association step, the measurements are associated to tracks as
discussed in the LG-IPDAF data association algorithm in Section 6.3 with the modification
that a validation region is constructed for each sensor. In the update step, the MS-LG-IPDAF
uses an augmented system model presented in Section 7.2 to simultaneously incorporate all
of the measurements to update the tracks using the method presented in Section 7.3.
7.2 Augmented System Model
The augmented system model uses the same state transition function defined in equation (4.2) with an augmented observation function. Using the system model defined in equation (4.1), we create an augmented observation function for each joint association event defined in equation (7.3). Let $\mathcal{S}_\alpha = \{s_a, s_b, \ldots\} \subseteq \mathcal{S}_k$ denote the subset of sensors that produce target-originated measurements according to the joint association event $\theta_{k,\alpha}$. The corresponding augmented observation function $h^{\mathcal{S}_\alpha}$ is defined as
$$z_{k,\alpha} = h^{\mathcal{S}_\alpha}\left(x_k, r^{\mathcal{S}_\alpha}_k\right) = \left(h^{s_a}(x_k, r^{s_a}_k), h^{s_b}(x_k, r^{s_b}_k), \ldots\right), \qquad (7.5)$$
where $h^{s_\ell}$ is the observation function for sensor $s_\ell$, $r^{s_\ell}_k \sim \mathcal{N}(0, R^{s_\ell})$ is the measurement noise for sensor $s_\ell$, $x_k$ is the target's state, $z_{k,\alpha} = (z^{s_a}_{k,\alpha_a}, z^{s_b}_{k,\alpha_b}, \ldots) \in G_{\mathcal{S}_\alpha} = G_{s_a} \times G_{s_b} \times \cdots$ is the augmented measurement where $z^{s_\ell}_{k,\alpha_\ell}$ is the target-originated measurement from sensor $s_\ell$ according to the joint association event $\theta_{k,\alpha}$, and $r^{\mathcal{S}_\alpha}_k = (r^{s_a}_k, r^{s_b}_k, \ldots) \in \mathbb{R}^{\mathcal{S}_\alpha} = \mathbb{R}^{s_a} \times \mathbb{R}^{s_b} \times \cdots$ is the augmented measurement noise with covariance
$$R^{\mathcal{S}_\alpha} = \operatorname{diag}\left(R^{s_a}, R^{s_b}, \ldots\right), \qquad (7.6)$$
where we used Assumption 7.9 that the sensors are independent to form the block-diagonal measurement noise covariance.
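The block-diagonal structure of (7.6) is simple to construct; the following sketch stacks per-sensor noise covariances for two hypothetical radar sensors:

```python
import numpy as np

# Sketch of equation (7.6): because the sensors are independent (Assumption
# 7.9), the augmented measurement-noise covariance is block diagonal in the
# per-sensor covariances R^{s_a}, R^{s_b}, ...
def augmented_noise_cov(R_list):
    """Stack per-sensor noise covariances into diag(R_sa, R_sb, ...)."""
    n = sum(R.shape[0] for R in R_list)
    R_aug = np.zeros((n, n))
    i = 0
    for R in R_list:
        m = R.shape[0]
        R_aug[i:i + m, i:i + m] = R
        i += m
    return R_aug

# Two hypothetical radar sensors, each measuring (azimuth, range):
Ra = np.diag([1e-2, 1e-3])
Rb = np.diag([2e-2, 5e-3])
R_aug = augmented_noise_cov([Ra, Rb])

assert R_aug.shape == (4, 4)
assert np.allclose(R_aug[:2, :2], Ra) and np.allclose(R_aug[2:, 2:], Rb)
assert np.count_nonzero(R_aug[:2, 2:]) == 0  # no cross-sensor correlation
```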
The Lie group of the augmented measurement is denoted $G_{\mathcal{S}_\alpha}$ and is the direct product of the sub-Lie groups $G_{s_\ell}$; thus, all of its properties are inherited from the sub-Lie groups as discussed in Section 3.4.4. Using Section 3.4.4, the exponential and logarithm maps of $G_{\mathcal{S}_\alpha}$ are
$$\operatorname{Exp}^{G_{\mathcal{S}_\alpha}}_I\left(r^{\mathcal{S}_\alpha}_k\right) = \left(\operatorname{Exp}^{G_{s_a}}_I(r^{s_a}_k), \operatorname{Exp}^{G_{s_b}}_I(r^{s_b}_k), \ldots\right) \qquad (7.7)$$
$$\operatorname{Log}^{G_{\mathcal{S}_\alpha}}_I\left(z_{k,\alpha}\right) = \left(\operatorname{Log}^{G_{s_a}}_I(z^{s_a}_{k,\alpha_a}), \operatorname{Log}^{G_{s_b}}_I(z^{s_b}_{k,\alpha_b}), \ldots\right). \qquad (7.8)$$
Under the assumption that the sensors are independent, the Jacobians of the augmented observation function are
$$H^{\mathcal{S}_\alpha}_k = \left.\frac{\partial h^{\mathcal{S}_\alpha}}{\partial x}\right|_{\zeta_{h^{\mathcal{S}_\alpha}_k}} = \begin{bmatrix} H^{s_a \top}_k, H^{s_b \top}_k, \ldots \end{bmatrix}^\top \qquad (7.9a)$$
$$V^{\mathcal{S}_\alpha}_k = \left.\frac{\partial h^{\mathcal{S}_\alpha}}{\partial r^{\mathcal{S}_\alpha}}\right|_{\zeta_{h^{\mathcal{S}_\alpha}_k}} = \operatorname{diag}\left(V^{s_a}_k, V^{s_b}_k, \ldots\right), \qquad (7.9b)$$
where $\zeta_{h^{\mathcal{S}_\alpha}_k} = (x_k, 0) \in G_x \times \mathbb{R}^{\mathcal{S}_\alpha}$, $H^{s_\ell}_k$ is the Jacobian of the observation function $h^{s_\ell}$ with respect to the target's state evaluated at the point $\zeta_{h^{s_\ell}_k} = (x_k, 0)$, and $V^{s_\ell}_k$ is the Jacobian of the observation function $h^{s_\ell}$ with respect to the measurement noise $r^{s_\ell}_k$ evaluated at the point $\zeta_{h^{s_\ell}_k} = (x_k, 0)$.
The augmented observation function can be thought of as combining all of the sensors into a single sensor for each joint association event. With that idea in mind, the derivation of the update step for the MS-LG-IPDAF becomes very similar to the derivation of the update step for the LG-IPDAF.
7.3 MS-LG-IPDAF Update Step
The prediction step provides us the probability of the track's state conditioned on the track representing the target and the previous measurements $Z^{\mathcal{S}}_{0:k^-}$, defined as
$$p\left(x_k \mid \epsilon_k, Z^{\mathcal{S}}_{0:k^-}\right) = \eta \exp\left(-\frac{1}{2}\, \tilde{x}^\top_{k|k^-} P^{-1}_{k|k^-} \tilde{x}_{k|k^-}\right), \qquad (7.10)$$
where $\tilde{x}_{k|k^-} = \operatorname{Log}^{G_x}_I\left(\hat{x}^{-1}_{k|k^-} x_k\right)$ is the error state, $\hat{x}_{k|k^-}$ is the state estimate, and $P_{k|k^-}$ is the error covariance at time $t_k$ conditioned on the previous measurements.
The probability of the augmented measurement $z_{k,\alpha}$ conditioned on $\theta_{k,\alpha}$, the track's state, the track representing the target, and the previous measurements $Z^{\mathcal{S}}_{0:k^-}$ is denoted $p\left(z_{k,\alpha} \mid \theta_{k,\alpha}, x_k, \epsilon_k, Z^{\mathcal{S}}_{0:k^-}\right)$ and is approximated as Gaussian using the following lemma.
Lemma 7.3.1. Let $z_{k,\alpha} = \left(z^{\mathbf{s}_1}_{k,\alpha_1}, z^{\mathbf{s}_2}_{k,\alpha_2}, \ldots\right)$ denote the augmented measurement corresponding to the joint association event $\theta_{k,\alpha} = \theta^{\mathbf{s}_1}_{k,\alpha_1} \cap \theta^{\mathbf{s}_2}_{k,\alpha_2} \cap \cdots \cap \theta^{\mathbf{s}_N}_{k,\alpha_N}$, where $\alpha \ne 0$. We suppose directly that the sensors are independent, that the association events $\theta^{s_\ell}_{k,\alpha_\ell}$ in $\theta_{k,\alpha}$ are independent, and that a target-originated measurement is related to the target's state by the observation function, as stated by Assumptions 7.1, 7.9, and 7.10; then the Gaussian approximation of $p\left(z_{k,\alpha} \mid \theta_{k,\alpha}, x_k, \epsilon_k, Z^{\mathcal{S}}_{0:k^-}\right)$ is
$$p\left(z_{k,\alpha} \mid \theta_{k,\alpha}, x_k, \epsilon_k, Z^{\mathcal{S}}_{0:k^-}\right) = \left[\prod_{s_\ell \in \mathcal{S}_\alpha} \left(P^{s_\ell}_G\right)^{-1}\right] p\left(z_{k,\alpha} \mid \psi, x_k, \epsilon_k, Z^{\mathcal{S}}_{0:k^-}\right)$$
$$= \left[\prod_{s_\ell \in \mathcal{S}_\alpha} \left(P^{s_\ell}_G\right)^{-1}\right] \eta \exp\left(-\frac{1}{2}\left(\tilde{z}_{k,\alpha} - H^{\mathcal{S}_\alpha}_k \tilde{x}_{k|k^-}\right)^\top \left(R^{\mathcal{S}_\alpha}_k\right)^{-1} \left(\tilde{z}_{k,\alpha} - H^{\mathcal{S}_\alpha}_k \tilde{x}_{k|k^-}\right)\right), \qquad (7.11)$$
where
$$\tilde{z}_{k,\alpha} = \operatorname{Log}^{G_{\mathcal{S}_\alpha}}_I\left(\hat{z}^{-1}_{k,\alpha} z_{k,\alpha}\right) \qquad (7.12a)$$
$$\hat{z}_{k,\alpha} = h^{\mathcal{S}_\alpha}\left(\hat{x}_{k|k^-}, 0\right) \qquad (7.12b)$$
$$R^{\mathcal{S}_\alpha}_k = V^{\mathcal{S}_\alpha}_k R^{\mathcal{S}_\alpha} \left(V^{\mathcal{S}_\alpha}_k\right)^\top, \qquad (7.12c)$$
and $\hat{x}_{k|k^-}$ and $\tilde{x}_{k|k^-}$ are the track's state estimate and error state at time $t_k$ conditioned on the track representing the target and the measurements $Z^{\mathcal{S}}_{0:k^-}$. The augmented observation function $h^{\mathcal{S}_\alpha}$ and the augmented measurement noise covariance $R^{\mathcal{S}_\alpha}$ are defined in equations (7.5) and (7.6). The Jacobians $H^{\mathcal{S}_\alpha}_k$ and $V^{\mathcal{S}_\alpha}_k$ are defined in equation (7.9) and evaluated at the point $\zeta_{h^{\mathcal{S}_\alpha}_k} = \left(\hat{x}_{k|k^-}, 0\right)$.
Proof. The joint association event $\theta_{k,\alpha}$ in $p\left(z_{k,\alpha} \mid \theta_{k,\alpha}, x_k, \epsilon_k, Z^{\mathcal{S}}_{0:k^-}\right)$ indicates that all of the measurements in the augmented measurement $z_{k,\alpha}$ are target originated and inside the validation region. According to the definition of the validation region, a target-originated measurement from sensor $s_\ell$ falls within the validation region with probability $P^{s_\ell}_G$. Using the assumption that the sensors are independent and the relation in equation (6.21),
$$p\left(z_{k,\alpha} \mid \theta_{k,\alpha}, x_k, \epsilon_k, Z^{\mathcal{S}}_{0:k^-}\right) = \left[\prod_{s_\ell \in \mathcal{S}_\alpha} \left(P^{s_\ell}_G\right)^{-1}\right] p\left(z_{k,\alpha} \mid \psi, x_k, \epsilon_k, Z^{\mathcal{S}}_{0:k^-}\right), \qquad (7.13)$$
where $\psi$ is a Bernoulli variable indicating that all of the measurements in the augmented measurement $z_{k,\alpha}$ are target originated. The rest of the proof follows directly from Lemma 6.3.1, with the exception that we are using the augmented system model.
The probability of the track’s current state conditioned on the joint association event
θk,α, all validated measurements, and the track representing the target is denoted
p(xk,α | θk,α, ϵk, ZS
0:k
),
103
and is defined in the following lemma.
Lemma 7.3.2. Let $z_{k,\alpha} = \left(z^{\mathbf{s}_1}_{k,\alpha_1}, z^{\mathbf{s}_2}_{k,\alpha_2}, \ldots\right)$ denote the augmented measurement corresponding to the joint association event $\theta_{k,\alpha}$. We suppose directly the following: 1) Assumptions 7.1–7.10; 2) the probability $p\left(z_{k,\alpha} \mid \theta_{k,\alpha}, x_k, \epsilon_k, Z^{\mathcal{S}}_{0:k^-}\right)$ is known and defined in equation (7.11); 3) the probability $p\left(x_k \mid \epsilon_k, Z^{\mathcal{S}}_{0:k^-}\right)$ with state estimate $\hat{x}_{k|k^-}$ and error covariance $P_{k|k^-}$ is known and defined in equation (7.10). Then the maximum a posteriori estimates of the updated track's state and error covariance conditioned on the track being target originated, all the measurements, and the joint association event $\theta_{k,\alpha}$, provided that $\alpha \ne 0$, are
$$\hat{x}_{k|k,\alpha} = \hat{x}_{k|k^-} \operatorname{Exp}^{G_x}_I\left(\mu^-_{k|k,\alpha}\right) \qquad (7.14)$$
$$P_{k|k,\alpha} = J^{G_x}_r\left(\mu^-_{k|k,\alpha}\right) P^-_{k|k,\alpha} J^{G_x}_r\left(\mu^-_{k|k,\alpha}\right)^\top \qquad (7.15)$$
where
$$\mu^-_{k|k,\alpha} = K_{k,\alpha} \nu_{k,\alpha} \qquad (7.16a)$$
$$\nu_{k,\alpha} = \operatorname{Log}^{G_{\mathcal{S}_\alpha}}_I\left(\hat{z}^{-1}_{k,\alpha} z_{k,\alpha}\right) \qquad (7.16b)$$
$$\hat{z}_{k,\alpha} = h^{\mathcal{S}_\alpha}\left(\hat{x}_{k|k^-}, 0\right) \qquad (7.16c)$$
$$K_{k,\alpha} = P^-_{k|k,\alpha} \left(H^{\mathcal{S}_\alpha}_k\right)^\top \left(R^{\mathcal{S}_\alpha}_k\right)^{-1} \qquad (7.16d)$$
$$R^{\mathcal{S}_\alpha}_k = V^{\mathcal{S}_\alpha}_k R^{\mathcal{S}_\alpha} \left(V^{\mathcal{S}_\alpha}_k\right)^\top \qquad (7.16e)$$
$$P^-_{k|k,\alpha} = \left(\left(H^{\mathcal{S}_\alpha}_k\right)^\top \left(R^{\mathcal{S}_\alpha}_k\right)^{-1} H^{\mathcal{S}_\alpha}_k + P^{-1}_{k|k^-}\right)^{-1}, \qquad (7.16f)$$
$\mu^-_{k|k,\alpha}$ is the updated error state mean before being added to the state estimate $\hat{x}_{k|k^-}$ and reset to zero in equation (7.14), $\nu_{k,\alpha}$ is the innovation term, $K_{k,\alpha}$ is the Kalman gain, $P^-_{k|k,\alpha}$ is the updated error covariance before the error state mean is reset to zero, $h^{\mathcal{S}_\alpha}$ is the augmented observation function defined in equation (7.5), $R^{\mathcal{S}_\alpha}$ is the augmented measurement noise covariance defined in equation (7.6), and $H^{\mathcal{S}_\alpha}_k$, $V^{\mathcal{S}_\alpha}_k$ are the Jacobians of the augmented observation function defined in equation (7.9) and evaluated at the point $\zeta_{h^{\mathcal{S}_\alpha}_k} = \left(\hat{x}_{k|k^-}, 0\right)$.

In the case that $\theta_{k,\alpha=0}$, all of the track-associated measurements are false and
$$\hat{x}_{k|k,\alpha=0} = \hat{x}_{k|k^-} \qquad (7.17)$$
$$P_{k|k,\alpha=0} = P_{k|k^-}, \qquad (7.18)$$
since no new information was provided to improve the estimates.

Consequently, the probability of the track's state conditioned on the joint association event $\theta_{k,\alpha}$, the track representing the target, and all track-associated measurements is
$$p\left(x_{k,\alpha} \mid \theta_{k,\alpha}, \epsilon_k, Z^{\mathcal{S}}_{0:k}\right) = \eta \exp\left(-\frac{1}{2}\, \tilde{x}^\top_{k|k,\alpha} P^{-1}_{k|k,\alpha} \tilde{x}_{k|k,\alpha}\right),$$
where $\tilde{x}_{k|k,\alpha} = \operatorname{Log}^{G_x}_I\left(\hat{x}^{-1}_{k|k,\alpha} x_k\right)$.

The proof of Lemma 7.3.2 is in Appendix B.1.
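The per-event update in (7.16) is an information-form Kalman update. The sketch below uses Euclidean stand-ins for the Lie-group Log/Exp maps (on $G_x = \mathbb{R}^n$ they are the identity), and the dimensions and values are illustrative only:

```python
import numpy as np

# Sketch of the per-event update (7.16a), (7.16d), (7.16f) for one joint
# association event, with Euclidean stand-ins for Log/Exp.
def per_event_update(P_prior, H, R_k, nu):
    """Return (mu, P) of the error state for one joint association event."""
    # (7.16f): information-form covariance update
    P_post = np.linalg.inv(H.T @ np.linalg.inv(R_k) @ H + np.linalg.inv(P_prior))
    # (7.16d): Kalman gain
    K = P_post @ H.T @ np.linalg.inv(R_k)
    # (7.16a): updated error-state mean from the innovation nu
    return K @ nu, P_post

P_prior = np.diag([4.0, 1.0])
H = np.array([[1.0, 0.0]])
R_k = np.array([[0.5]])
nu = np.array([1.0])  # innovation, Log(z_hat^{-1} z) in (7.16b)

mu, P_post = per_event_update(P_prior, H, R_k, nu)

# Sanity check: identical to the covariance form K = P H^T (H P H^T + R)^{-1}.
S = H @ P_prior @ H.T + R_k
K_cov = P_prior @ H.T @ np.linalg.inv(S)
assert np.allclose(mu, K_cov @ nu)
assert np.allclose(P_post, (np.eye(2) - K_cov @ H) @ P_prior)
```

The information form in (7.16f) is convenient here because the same prior $P_{k|k^-}$ is reused for every joint association event.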
The probability of a joint association event conditioned on the track representing the target and all validated measurements is denoted
$$\beta_{k,\alpha} \triangleq p\left(\theta_{k,\alpha} \mid \epsilon_k, Z^{\mathcal{S}}_{0:k}\right).$$
As previously mentioned, we assume that the probability of the joint association event $\beta_{k,\alpha}$ is the product of its composing association events $\beta^{s_\ell}_{k,\alpha_\ell}$; therefore,
$$\beta_{k,\alpha} \approx \beta^{s_a}_{k,\alpha_a} \beta^{s_b}_{k,\alpha_b} \cdots, \qquad (7.19)$$
where
$$\beta^{s_\ell}_{k,\alpha_\ell} = \begin{cases} \dfrac{\mathcal{L}^{s_\ell}_{k,\alpha_\ell}}{1 - P^{s_\ell}_D P^{s_\ell}_G + \sum_{j=1}^{m^{s_\ell}_k} \mathcal{L}^{s_\ell}_{k,j}} & \alpha_\ell \in \mathcal{M}^{s_\ell}_k \\[2ex] \dfrac{1 - P^{s_\ell}_G P^{s_\ell}_D}{1 - P^{s_\ell}_D P^{s_\ell}_G + \sum_{j=1}^{m^{s_\ell}_k} \mathcal{L}^{s_\ell}_{k,j}} & \alpha_\ell = 0, \end{cases} \qquad (7.20)$$
$$\mathcal{L}^{s_\ell}_{k,j} = \frac{P^{s_\ell}_D}{\lambda^{s_\ell}} p\left(z^{s_\ell}_{k,j} \mid \psi, \epsilon_k, Z^{\mathcal{S}}_{0:k^-}\right), \qquad (7.21)$$
and $p\left(z^{s_\ell}_{k,j} \mid \psi, \epsilon_k, Z^{\mathcal{S}}_{0:k^-}\right)$ is defined in equations (6.16) and (6.17).
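The per-sensor probabilities in (7.20) can be sketched as follows; the likelihood ratios $\mathcal{L}^{s_\ell}_{k,j}$ are illustrative placeholders for the values computed from (7.21):

```python
# Sketch of equation (7.20): per-sensor association-event probabilities
# beta^{s_l}_{k,alpha_l} computed from the likelihood ratios L^{s_l}_{k,j}.
def association_probs(L, PD, PG):
    """Return [beta_0, beta_1, ..., beta_m] for one sensor."""
    denom = 1.0 - PD * PG + sum(L)
    beta0 = (1.0 - PD * PG) / denom      # alpha_l = 0: all measurements false
    betas = [Lj / denom for Lj in L]     # alpha_l = j: z_j target-originated
    return [beta0] + betas

L = [2.0, 0.5]  # hypothetical likelihood ratios for two measurements
betas = association_probs(L, PD=0.95, PG=0.9)

assert abs(sum(betas) - 1.0) < 1e-12     # the m+1 events are exhaustive
assert betas[1] > betas[2]               # larger L_j means larger probability
```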
According to the law of total probability, the probability of the track's current state conditioned on it representing the target and all validated measurements is
$$p\left(x_k \mid \epsilon_k, Z^{\mathcal{S}}_{0:k}\right) = \sum_{\alpha \in \mathcal{A}} p\left(x_{k,\alpha} \mid \theta_{k,\alpha}, \epsilon_k, Z^{\mathcal{S}}_{0:k}\right) \beta_{k,\alpha}. \qquad (7.22)$$
The expected value and covariance of $p\left(x_k \mid \epsilon_k, Z^{\mathcal{S}}_{0:k}\right)$ are $\hat{x}_{k|k}$ and $P_{k|k}$; these are the track's updated state estimate and error covariance. Equation (7.22) is approximated using the smoothing property of conditional expectations to calculate the mean and the corresponding covariance, as discussed in [87].
If the state estimate were in Euclidean space, the smoothing property of conditional expectations would use the validated measurements to calculate the updated state estimates $\{\hat{x}_{k|k,\alpha}\}_{\alpha \in \mathcal{A}}$ according to Lemma 7.3.2, and then fuse these state estimates together according to the equation
$$\hat{x}_{k|k} = \sum_{\alpha \in \mathcal{A}} \hat{x}_{k|k,\alpha} \beta_{k,\alpha}.$$
However, this approach does not work with arbitrary Lie groups, since not every Lie group has a vector space structure. Instead, we solve equation (7.22) indirectly by posing the problem in the Cartesian algebraic space. This is done by noting that $p\left(x_{k,\alpha} \mid \theta_{k,\alpha}, \epsilon_k, Z^{\mathcal{S}}_{0:k}\right) = p\left(x^-_{k,\alpha} \mid \theta_{k,\alpha}, \epsilon_k, Z^{\mathcal{S}}_{0:k}\right)$, where $x^-_{k|k,\alpha} \sim \mathcal{N}\left(\mu^-_{k|k,\alpha}, P^-_{k|k,\alpha}\right)$ is the error state after being updated by the measurement $z_{k,\alpha}$ and before its mean $\mu^-_{k|k,\alpha}$ is added to the state estimate $\hat{x}_{k|k^-}$, with the relation $x_{k,\alpha} = \hat{x}_{k|k^-} \operatorname{Exp}^{G_x}_I\left(x^-_{k|k,\alpha}\right)$. The mean $\mu^-_{k|k,\alpha}$ and covariance $P^-_{k|k,\alpha}$ are defined in Lemma 7.3.2. Using this relationship, an equivalent expression to equation (7.22) is
$$p\left(x^-_k \mid \epsilon_k, Z^{\mathcal{S}}_{0:k}\right) = \sum_{\alpha \in \mathcal{A}} p\left(x^-_{k,\alpha} \mid \theta_{k,\alpha}, \epsilon_k, Z^{\mathcal{S}}_{0:k}\right) \beta_{k,\alpha}, \qquad (7.23)$$
where $x^-_{k|k} \sim \mathcal{N}\left(\mu^-_{k|k}, P^-_{k|k}\right)$ with the relation $x_k = \hat{x}_{k|k^-} \operatorname{Exp}^{G_x}_I\left(x^-_{k|k}\right)$. Since the means of the error states $\{\mu^-_{k|k,\alpha}\}_{\alpha \in \mathcal{A}}$ are in the Cartesian algebraic space, they can be scaled, summed together, and then added onto the state estimate $\hat{x}_{k|k^-}$.
Lemma 7.3.3. We suppose directly that the means of the error states $\{\mu^-_{k|k,\alpha}\}_{\alpha \in \mathcal{A}}$ and their covariances $\{P^-_{k|k,\alpha}\}_{\alpha \in \mathcal{A}}$ are known and defined in Lemma 7.3.2, and that the probabilities of the joint association events $\{\beta_{k,\alpha}\}_{\alpha \in \mathcal{A}}$ are known and defined in equation (7.20). Then, according to the smoothing property of conditional expectations, the mean and the error covariance of the error state $x^-_{k|k}$ are
$$\mu^-_{k|k} = \sum_{\alpha \in \mathcal{A}} \mu^-_{k|k,\alpha} \beta_{k,\alpha} \qquad (7.24)$$
$$P^-_{k|k} = \sum_{\alpha \in \mathcal{A}} \beta_{k,\alpha} P^-_{k|k,\alpha} + \sum_{\alpha \in \mathcal{A}} \beta_{k,\alpha} \mu^-_{k|k,\alpha} \left(\mu^-_{k|k,\alpha}\right)^\top - \mu^-_{k|k} \left(\mu^-_{k|k}\right)^\top. \qquad (7.25)$$

The proof of Lemma 7.3.3 is in Appendix B.2.
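Equations (7.24) and (7.25) are Gaussian mixture moment matching in the Cartesian algebraic space, which the following sketch illustrates with hypothetical one-dimensional values:

```python
import numpy as np

# Sketch of Lemma 7.3.3: moment-match the mixture of per-event error states
# {(mu_alpha, P_alpha)} weighted by beta_alpha into a single mean/covariance.
def match_moments(betas, mus, Ps):
    mu = sum(b * m for b, m in zip(betas, mus))                      # (7.24)
    P = sum(b * (P + np.outer(m, m)) for b, m, P in zip(betas, mus, Ps))
    return mu, P - np.outer(mu, mu)                                  # (7.25)

betas = [0.5, 0.5]
mus = [np.array([1.0]), np.array([-1.0])]
Ps = [np.array([[0.04]]), np.array([[0.04]])]

mu, P = match_moments(betas, mus, Ps)
assert np.allclose(mu, [0.0])
assert np.allclose(P, [[1.04]])  # spread-of-means term inflates the covariance
```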
The mean of the error state $\mu^-_{k|k}$ is reset to zero by adding it onto the state estimate $\hat{x}_{k|k^-}$ and updating the error covariance accordingly. Using the process derived in the proof of Lemma 7.3.2, the reset mean and corresponding error covariance are
$$\hat{x}_{k|k} = \hat{x}_{k|k^-} \operatorname{Exp}^{G_x}_I\left(\mu^-_{k|k}\right) \qquad (7.26)$$
$$P_{k|k} = J^{G_x}_r\left(\mu^-_{k|k}\right) P^-_{k|k} J^{G_x}_r\left(\mu^-_{k|k}\right)^\top. \qquad (7.27)$$
Therefore, the updated state estimate and error covariance of $p\left(x_k \mid \epsilon_k, Z^{\mathcal{S}}_{0:k}\right)$ are $\hat{x}_{k|k}$ and $P_{k|k}$.
Lemma 7.3.4. We suppose directly Assumptions 7.1–7.10. Given the set of validated measurements $Z^{\mathcal{S}}_k$ and the current track likelihood conditioned on the previous measurements $p\left(\epsilon_k \mid Z^{\mathcal{S}}_{0:k^-}\right)$, the update for the track likelihood is
$$p\left(\epsilon_k \mid Z^{\mathcal{S}}_{0:k}\right) = \frac{p\left(\epsilon_k \mid Z^{\mathcal{S}}_{0:k^-}\right) \upsilon_k}{1 + p\left(\epsilon_k \mid Z^{\mathcal{S}}_{0:k^-}\right)\left(\upsilon_k - 1\right)}, \qquad (7.28)$$
where
$$\upsilon_k = \prod_{s_\ell \in \mathcal{S}_k} \left(\sum_{j=1}^{m^{s_\ell}_k} \mathcal{L}^{s_\ell}_{k,j} + \left(1 - P^{s_\ell}_D P^{s_\ell}_G\right)\right), \qquad (7.29)$$
and $\mathcal{L}^{s_\ell}_{k,j}$ is defined in equation (7.21).
The proof of Lemma 7.3.4 is in Appendix B.3. If the initial error covariance $P_{k|k^-}$ is used to calculate the term $\mathcal{L}^{s_\ell}_{k,j}$ for every sensor when performing sequential measurement fusion, then the track likelihood updates for parallel and sequential fusion are equivalent.

Using the track likelihood, a track is rejected after the update step if its track likelihood is below the threshold $\tau_{TR}$, confirmed as a plausible representation of the target if its track likelihood is above the threshold $\tau_{TC}$, or neither rejected nor confirmed until more information is gathered from new measurements. The confirmed track with the highest track likelihood is assumed to represent the target.
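The likelihood update (7.28)–(7.29) and the confirm/reject logic can be sketched as follows; the likelihood ratios are hypothetical, and the thresholds mirror the $\tau_{TC}$ and $\tau_{TR}$ values used later in Section 7.4:

```python
# Sketch of the track-likelihood update (7.28)-(7.29) with confirm/reject logic.
def likelihood_ratio(L_by_sensor, PD, PG):
    """upsilon_k in (7.29); L_by_sensor[l] lists L^{s_l}_{k,j} for sensor l."""
    upsilon = 1.0
    for L in L_by_sensor:
        upsilon *= sum(L) + (1.0 - PD * PG)
    return upsilon

def update_track_likelihood(eps_prev, upsilon):
    """Equation (7.28)."""
    return eps_prev * upsilon / (1.0 + eps_prev * (upsilon - 1.0))

def track_status(eps, tau_TC=0.7, tau_TR=0.1):
    if eps >= tau_TC:
        return "confirmed"
    if eps <= tau_TR:
        return "rejected"
    return "tentative"

ups = likelihood_ratio([[2.0], [3.0]], PD=0.95, PG=0.9)  # two sensors
eps = update_track_likelihood(0.5, ups)
assert eps > 0.5                    # evidence above clutter raises likelihood
assert track_status(eps) == "confirmed"
```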
7.3.1 Summary
In this section we summarize all of the lemmas from this chapter into one theorem to
present the update step for the MS-LG-IPDAF.
Theorem 7.3.1. Let $Z^{\mathcal{S}}_k$ denote the set of track-associated measurements at time $t_k$ from the sensors in $\mathcal{S}_k$. We assume directly Assumptions 7.1–7.10, and that the track's state estimate $\hat{x}_{k|k^-}$, error covariance $P_{k|k^-}$, and track likelihood $p\left(\epsilon_k \mid Z^{\mathcal{S}}_{0:k^-}\right)$ at time $t_k$ conditioned on the previous measurements are known. Then the track's updated state estimate, error covariance, and track likelihood are
$$\hat{x}_{k|k} = \hat{x}_{k|k^-} \operatorname{Exp}^{G_x}_I\left(\mu^-_{k|k}\right) \qquad (7.30)$$
$$P_{k|k} = J^{G_x}_r\left(\mu^-_{k|k}\right) P^-_{k|k} J^{G_x}_r\left(\mu^-_{k|k}\right)^\top \qquad (7.31)$$
$$p\left(\epsilon_k \mid Z^{\mathcal{S}}_{0:k}\right) = \frac{p\left(\epsilon_k \mid Z^{\mathcal{S}}_{0:k^-}\right) \upsilon_k}{1 + p\left(\epsilon_k \mid Z^{\mathcal{S}}_{0:k^-}\right)\left(\upsilon_k - 1\right)}, \qquad (7.32)$$
where
$$\mu^-_{k|k} = \sum_{\alpha \in \mathcal{A}} \mu^-_{k|k,\alpha} \beta_{k,\alpha} \qquad (7.33a)$$
$$P^-_{k|k} = \sum_{\alpha \in \mathcal{A}} \beta_{k,\alpha} P^-_{k|k,\alpha} + \sum_{\alpha \in \mathcal{A}} \beta_{k,\alpha} \mu^-_{k|k,\alpha} \left(\mu^-_{k|k,\alpha}\right)^\top - \mu^-_{k|k} \left(\mu^-_{k|k}\right)^\top \qquad (7.33b)$$
$$\mu^-_{k|k,\alpha} = K_{k,\alpha} \nu_{k,\alpha} \qquad (7.33c)$$
$$\nu_{k,\alpha} = \operatorname{Log}^{G_{\mathcal{S}_\alpha}}_I\left(\hat{z}^{-1}_{k,\alpha} z_{k,\alpha}\right) \qquad (7.33d)$$
$$\hat{z}_{k,\alpha} = h^{\mathcal{S}_\alpha}\left(\hat{x}_{k|k^-}, 0\right) \qquad (7.33e)$$
$$K_{k,\alpha} = P^-_{k|k,\alpha} \left(H^{\mathcal{S}_\alpha}_k\right)^\top \left(R^{\mathcal{S}_\alpha}_k\right)^{-1} \qquad (7.33f)$$
$$R^{\mathcal{S}_\alpha}_k = V^{\mathcal{S}_\alpha}_k R^{\mathcal{S}_\alpha} \left(V^{\mathcal{S}_\alpha}_k\right)^\top \qquad (7.33g)$$
$$P^-_{k|k,\alpha} = \left(\left(H^{\mathcal{S}_\alpha}_k\right)^\top \left(R^{\mathcal{S}_\alpha}_k\right)^{-1} H^{\mathcal{S}_\alpha}_k + P^{-1}_{k|k^-}\right)^{-1} \qquad (7.33h)$$
$$\upsilon_k = \prod_{s_\ell \in \mathcal{S}_k} \left(\sum_{j=1}^{m^{s_\ell}_k} \mathcal{L}^{s_\ell}_{k,j} + \left(1 - P^{s_\ell}_D P^{s_\ell}_G\right)\right). \qquad (7.33i)$$
The term $\mathcal{L}^{s_\ell}_{k,j}$ is defined in equation (7.21), $\beta_{k,\alpha}$ is the probability of the joint association event $\theta_{k,\alpha}$ defined in equations (7.19) and (7.20), $\mu^-_{k|k,\alpha}$ is the updated error state mean before being reset to zero and conditioned on $\theta_{k,\alpha}$, $\nu_{k,\alpha}$ is the innovation term, $K_{k,\alpha}$ is the Kalman gain, $P^-_{k|k,\alpha}$ is the updated error covariance before the error state mean is reset to zero, $h^{\mathcal{S}_\alpha}$ is the augmented observation function defined in equation (7.5), $R^{\mathcal{S}_\alpha}$ is the augmented measurement noise covariance defined in equation (7.6), and $H^{\mathcal{S}_\alpha}_k$, $V^{\mathcal{S}_\alpha}_k$ are the Jacobians of the augmented observation function defined in equation (7.9) and evaluated at the point $\zeta_{h^{\mathcal{S}_\alpha}_k} = \left(\hat{x}_{k|k^-}, 0\right)$.
Even though the MS-LG-IPDAF is designed to track a single target, we can relax this constraint by modifying the original assumptions as follows: there are multiple targets; a sensor detects each target with the same probability, provided that the target is in the sensor's surveillance region; when a sensor produces measurements, it produces at most one target-originated measurement per target; and the tracks and the data association process are independent. The last revised assumption means that a measurement is associated with every track whose validation region it falls into, and each track is updated without considering any other track. This allows us to use the normal MS-LG-IPDAF update step for each track. Lastly, we assume that every confirmed track represents a target.
7.4 Experiment
The simulated experiment consists of tracking multirotors constrained to a plane using three radar systems that detect a target's relative azimuth and depth. The radar systems are aligned in an array on the ground, 20 meters apart from each other, pointing straight up. Each radar system has a range of 100 meters and a field of view (FOV) of 90 degrees. The variances of the relative depth and angle are $1\times10^{-3}\,\text{m}^2$ and $1\times10^{-2}\,\text{rad}^2$, respectively. The spatial density of each radar system is set to $1\times10^{-3}$ unless otherwise specified. Each radar detects the target with probability $P_D = 0.95$, and the probability that a target-originated measurement is associated with the track is set to $P_G = 0.9$. The three radar systems simultaneously produce measurements at a rate of 13 Hz.
7.4.1 System Model
Since the target’s are constrained to a plane and have an orientation and position,
we represent their natural configuration manifold using the special Euclidean group SE (2).
The Lie group SE (2) is defined in Appendix C.4. The target’s state is x = (g, v) ∈ Gx=
SE (2)×RSE(2) where g ∈ SE (2) and v ∈ RSE(2).
We denote the target’s pose as
g =
R p
01×2 1
, (7.34)
where R ∈ SO (2) is the rotation from the target’s body frame to the tracking frame and
p = [px, py]⊤ is the target’s position expressed in the tracking frame and with respect to the
tracking frame. The Lie group SO (2) is defined in Appendix C.2.
We denote the target’s twist as
v =
ρ
ω
(7.35)
where ρ is the target’s translational velocity with respect to the tracking frame and expressed
in the body frame, and ω is the target’s angular velocity with respect to the tracking frame
and expressed in the body frame.
The state transition function defined in equation (4.2) and the Jacobians of the state
transition functions defined in equation (4.5) are used for the UAS with the definition and
properties of the Lie group SE (2) in Appendix C.4.
Since a radar measures the target's relative azimuth and depth, we represent its measurement manifold as the direct product of $SO(2)$ and $\mathbb{R}$. The special orthogonal group $SO(2)$ represents the natural configuration manifold of the azimuth since it expresses all rotations and angles in two dimensions, and the Euclidean space $\mathbb{R}$ represents the natural configuration manifold of the range. For the properties of $SO(2)$ see Appendix C.2, and for the properties of $\mathbb{R}$ see Appendix C.1. The measurement from sensor $s_\ell$ is denoted $z^{s_\ell} \in G_{s_\ell} = SO(2) \times \mathbb{R}$.
The proposed observation function for sensor $s_\ell$ is
$$h^{s_\ell}\left(x_k, r^{s_\ell}_k\right) = \left(d(x_k), \varphi(x_k)\right) \operatorname{Exp}^{G_{s_\ell}}_I\left(r^{s_\ell}_k\right), \qquad (7.36)$$
where
$$d(x_k) = \sqrt{p^\top_k p_k} \qquad (7.37a)$$
$$\varphi(x_k) = \operatorname{Exp}^{SO(2)}_I\left(\arctan\left(\frac{p_{y_k}}{p_{x_k}}\right)\right), \qquad (7.37b)$$
the function $d$ calculates the relative depth, the function $\varphi$ calculates the relative angle, the target's position is $p_k = [p_{x_k}, p_{y_k}]^\top$, and $r^{s_\ell}_k \sim \mathcal{N}(0, R^{s_\ell}) \in \mathbb{R}^{s_\ell} = \mathbb{R}^{SO(2)} \times \mathbb{R}^{\mathbb{R}}$ is the measurement noise.
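The noiseless part of (7.36)–(7.37) reduces to a range and an angle from the planar position, which the following sketch computes. The $SO(2)$ component is represented here by its angle rather than a rotation matrix, and `atan2` is used in place of the quadrant-limited `arctan`; both are implementation choices, not notation from the text:

```python
import math

# Sketch of the noiseless radar observation (7.36)-(7.37): relative depth d(x)
# and azimuth phi(x) from the target's planar position p = (px, py).
def observe(px: float, py: float):
    d = math.sqrt(px * px + py * py)   # (7.37a) relative depth
    phi = math.atan2(py, px)           # (7.37b) relative azimuth
    return d, phi

d, phi = observe(3.0, 4.0)
assert abs(d - 5.0) < 1e-12
assert abs(phi - math.atan2(4.0, 3.0)) < 1e-12
```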
Lemma 7.4.1. Given the observation function in equation (7.36), the Jacobians of the observation function with respect to the state and measurement noise, evaluated at the point $\zeta_{h^{s_\ell}_k} = (x_k, 0)$, are
$$H^{s_\ell}_k = \left.\frac{\partial h}{\partial x}\right|_{\zeta_{h^{s_\ell}_k}} = \begin{bmatrix} p^\top_k R_k / d(x_k) & 0_{1\times 4} \\ p^\top_k R_k [-1]_{1\times} / d(x_k)^2 & 0_{1\times 4} \end{bmatrix} \qquad (7.38)$$
$$V^{s_\ell}_k = \left.\frac{\partial h}{\partial r}\right|_{\zeta_{h^{s_\ell}_k}} = I_{2\times 2}, \qquad (7.39)$$
where $p_k$ and $R_k$ are the target's estimated position and orientation, $[\cdot]_{1\times}$ is the skew-symmetric operator defined in equation (C.3), and the function $d : G_x \to \mathbb{R}$ is defined in equation (7.37).
Proof. We begin by solving for $H^{s_\ell}_k$. Since the measurement noise is evaluated at zero, we set it to zero to simplify the derivation. We also drop the dependency on time since it does not affect the derivation. Let $\tau = [\tau_{p_x}, \tau_{p_y}, \tau_R, \tau_{v_x}, \tau_{v_y}, \tau_\omega]^\top \in \mathbb{R}^{G_x}$ denote the vector by which we perturb the state when taking the derivative. Using $\tau$, we condense notation by writing the derivative as if we are taking the derivative with respect to all of the components of $\tau$ simultaneously, but we are actually computing the derivative one component at a time.

We denote the partial derivative of a generic function $\Phi(x)$ w.r.t. $x$ as $\Phi'$, and define the functions
$$a_1(x) = \frac{p_y}{p_x}, \qquad a_2(x) = p^\top p,$$
where $p = [p_x, p_y]^\top$ is the target's position. We define the observation function with measurement noise set to $r = 0$ as the composite function
$$h(x, 0) \triangleq \left(\operatorname{sqrt}(a_2(x)),\ \operatorname{Exp}^{SO(2)}_I(\arctan(a_1(x)))\right),$$
where we dropped the superscript indicating the sensor since all the sensors have the same observation function.

Using the chain rule we get
$$h'(x, 0) = \left(\operatorname{sqrt}'(a_2(x))\, a_2'(x),\ \operatorname{Exp}^{SO(2)\,\prime}_I(\arctan(a_1(x)))\, \arctan'(a_1(x))\, a_1'(x)\right) = \begin{bmatrix} \frac{1}{2\sqrt{a_2(x)}} a_2'(x) \\[1ex] I_{1\times 1} \frac{1}{1 + a_1(x)^2} a_1'(x) \end{bmatrix}, \qquad (7.40)$$
where we used the definition of the right Jacobian for $SO(2)$ in equation (C.8) to note that $\operatorname{Exp}^{SO(2)\,\prime}_I = I_{1\times 1}$, and where we have changed to matrix notation.
Before calculating $a_1'$ and $a_2'$ using the definition of the derivative, we explicitly write the perturbed pose $g^p = g \operatorname{Exp}^{SE(2)}_I(\tau_g)$ as
$$g^p = \begin{bmatrix} R \operatorname{Exp}^{SO(2)}_I(\tau_R) & R V(\tau_\omega) \tau_p + p \\ 0 & 1 \end{bmatrix},$$
where $\tau_g = [\tau_{p_x}, \tau_{p_y}, \tau_R]^\top$, $\tau_p = [\tau_{p_x}, \tau_{p_y}]^\top$, $g$ is the target's pose, and the function $V$ is defined in equation (C.23).

Noting that $\tau_p$ and $\tau_R$ cannot both be nonzero when taking the derivative, we simplify the perturbed pose to
$$g^p = g \operatorname{Exp}^{SE(2)}_I(\tau_g) = \begin{bmatrix} R \operatorname{Exp}^{SO(2)}_I(\tau_R) & R \tau_p + p \\ 0 & 1 \end{bmatrix}.$$
We denote the components of the rotation matrix $R$ as
$$R = \begin{bmatrix} R_{11} & R_{12} \\ R_{21} & R_{22} \end{bmatrix}.$$
Using the definition of the derivative,
$$a_1' = \lim_{\tau \to 0} \frac{\frac{p_y + R_{21}\tau_{p_x} + R_{22}\tau_{p_y}}{p_x + R_{11}\tau_{p_x} + R_{12}\tau_{p_y}} - \frac{p_y}{p_x}}{\tau} = \lim_{\tau \to 0} \frac{\frac{\left(p_y + R_{21}\tau_{p_x} + R_{22}\tau_{p_y}\right)\left(p_x - R_{11}\tau_{p_x} - R_{12}\tau_{p_y}\right)}{\left(p_x + R_{11}\tau_{p_x} + R_{12}\tau_{p_y}\right)\left(p_x - R_{11}\tau_{p_x} - R_{12}\tau_{p_y}\right)} - \frac{p_y}{p_x}}{\tau}$$
$$= \lim_{\tau \to 0} \frac{\frac{p_y p_x + p^\top R [-1]_{1\times} \tau_p - \text{H.O.T.}}{(p_x)^2 - \text{H.O.T.}} - \frac{p_y}{p_x}}{\tau} = \begin{bmatrix} \dfrac{p^\top R [-1]_{1\times}}{p_x^2} & 0_{1\times 4} \end{bmatrix}, \qquad (7.41)$$
where H.O.T. indicates the higher-order terms that go to zero, and
$$a_2' = \lim_{\tau \to 0} \frac{\left(p + R\tau_p\right)^\top \left(p + R\tau_p\right) - p^\top p}{\tau} = \lim_{\tau \to 0} \frac{2 p^\top R \tau_p}{\tau} \qquad (7.42)$$
$$= \begin{bmatrix} 2 p^\top R & 0_{1\times 4} \end{bmatrix}. \qquad (7.43)$$
Substituting equations (7.41) and (7.43) into equation (7.40) yields (7.38). The proof for $V^{s_\ell}_k$ is trivial and is not provided.
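The position block of (7.38) can be checked numerically against a finite-difference derivative of the observation function under the right perturbation $p(\tau) = p + R\tau_p$. In this sketch the skew convention used for $[-1]_{1\times}$ is one consistent choice (it may differ in sign from the convention in Appendix C), and the pose values are arbitrary:

```python
import numpy as np

# Numerical check of the position block of H in (7.38), using the right
# perturbation on SE(2): p(tau) = p + R @ tau_p.
def h(p):
    return np.array([np.linalg.norm(p), np.arctan2(p[1], p[0])])

theta = 0.7
R = np.array([[np.cos(theta), -np.sin(theta)],
              [np.sin(theta),  np.cos(theta)]])
p = np.array([3.0, 4.0])
d = np.linalg.norm(p)

M = np.array([[0.0, 1.0], [-1.0, 0.0]])  # one choice for the [-1]_x operator
H_analytic = np.vstack([p @ R / d, p @ R @ M / d**2])

# Central finite differences in the body-frame translation perturbation tau_p.
eps = 1e-6
H_numeric = np.zeros((2, 2))
for i in range(2):
    tau = np.zeros(2)
    tau[i] = eps
    H_numeric[:, i] = (h(p + R @ tau) - h(p - R @ tau)) / (2 * eps)

assert np.allclose(H_analytic, H_numeric, atol=1e-6)
```

Checks like this catch sign and convention errors in hand-derived Jacobians before they are buried inside the filter.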
7.4.2 Simulated Experiments
We test the performance of the parallel and sequential LG-IPDAF algorithms by integrating them with G-MTT. The parallel LG-IPDAF algorithm is the MS-LG-IPDAF derived in this chapter.

In the experiment there are three multirotors. The first target moves left to right at an altitude of 55 meters at 1.5 m/s, the second target moves right to left at a relative altitude of 20 meters at 1.5 m/s, and the third target moves in a circle with a radius of 10 meters above the center radar. All of the targets begin and stay in the field of view (FOV) of at least one radar during the experiment, except for the second target, which leaves the FOV of the radars towards the end of the simulation. A track is confirmed if the track likelihood is above $\tau_{TC} = 0.7$ and is rejected if the track likelihood is below $\tau_{TR} = 0.1$. When a track is in the FOV of a single radar, the sequential and parallel fusion approaches for the corresponding track are equivalent.
The simulation lasts for 60 seconds. At the end of each simulation, the average
Euclidean error (AEE) in position and the target probability of detection (TPD) is calculated
according to the method in [88]. To calculate these metrics, after running the update step
we associate a confirmed track to a target if the track is within 5 meters of the target. If
the track is associated to a target, then the Euclidean error in position is calculated and the
target is assumed to be detected.
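The per-step bookkeeping just described can be sketched as follows. The function name and the nearest-track association rule are illustrative assumptions; the 5-meter radius follows the text, and the exact procedure follows [88].

```python
import math

ASSOC_RADIUS = 5.0  # association radius in meters, as in the text

def step_metrics(track_positions, target_positions):
    """One update step: returns (position errors of associated tracks,
    per-target detection flags)."""
    errors, detected = [], []
    for tgt in target_positions:
        # nearest confirmed track to this target, if any
        dists = [math.dist(tgt, trk) for trk in track_positions]
        d = min(dists) if dists else float("inf")
        if d <= ASSOC_RADIUS:      # track within 5 m -> target detected
            errors.append(d)
            detected.append(True)
        else:
            detected.append(False)
    return errors, detected

# Accumulated over a simulation, the AEE is the mean of all errors and the
# TPD of a target is the fraction of steps in which it was detected.
errors, detected = step_metrics([(3.0, 4.0)], [(0.0, 0.0), (20.0, 0.0)])
assert errors == [5.0] and detected == [True, False]
```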
A depiction of the simulation is shown in Fig. 7.3, and a video of the simulation can be seen at https://www.youtube.com/watch?v=3pfLQghovD8&ab_channel=MAGICCLab. Plots of the targets' and tracks' trajectories and the norm of the tracks' error covariance are shown in Fig. 7.4 to validate that the algorithms are working properly. Note that the norm of the error covariance of a track changes according to the number of radar sensors that detect it at a given time stamp.

Figure 7.3: An instant in time of the simulation. The black lines represent the FOV of each radar system. The red, yellow, and magenta dots represent the point measurements from each respective radar system. The green squares represent the targets, and the blue and cyan markers represent the tracks from the parallel and sequential algorithms. In this image, the markers for the targets and the parallel and sequential algorithms are on top of each other.
To understand the purpose of our two experiments, we will quickly discuss a key difference between the sequential and parallel fusion approaches. For this discussion we assume that the measurement noise for all of the sensors is the same and that the current state estimate for each algorithm is the same. Under these assumptions and according to the equations in Theorem 7.3.1, in the parallel fusion approach each sensor influences the state estimate equally since the Kalman gain is the same for each sensor; however, in the sequential fusion approach the first sensor influences the state estimate more than the others since the Kalman gain is largest for the first sensor and decreases as the data from the other sensors
are processed sequentially. If the system is linear, the Kalman gain used in the parallel fusion approach is the same as the Kalman gain used by the last sensor in the sequential approach. If the system is non-linear, these two Kalman gains are approximately the same. This means that, overall, the measurements in the sequential approach influence the state estimate more than the measurements in the parallel approach, and that the first sensor processed in the sequential approach has more influence than the other sensors. This can improve the performance of the sequential algorithm if the first sensor is very accurate and has little measurement noise, or hurt the performance if the first sensor is very inaccurate and has a lot of measurement noise.

Figure 7.4: A comparison between the parallel and sequential fusion approaches for a single simulation iteration. In the left images, the true positions are drawn in black and the tracks' estimated positions are drawn in different colors. In the right images, the norm of the error covariance for each track is plotted against time. Panel (a) shows parallel fusion and panel (b) shows sequential fusion.

Figure 7.5: The average Euclidean error and the probability of detection as a function of the number of false measurements per sensor scan from each sensor.
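The gain behavior discussed above can be checked numerically for a scalar linear system with two sensors that share the same measurement noise; the values below are illustrative. Sequentially, the first gain is larger than the second, while the parallel (stacked) update weights both sensors equally with a gain equal to the last sequential gain.

```python
# Scalar Kalman update: prior variance P, shared measurement variance R.
P, R = 1.0, 1.0

# Sequential fusion: two scalar updates.
K1 = P / (P + R)            # gain for the first sensor
P1 = (1 - K1) * P
K2 = P1 / (P1 + R)          # gain for the second sensor
assert K1 > K2              # the first sensor has more influence

# Parallel fusion: stack both measurements, H = [1, 1]^T, R_stack = I.
# K = P H^T S^-1 with the 2x2 innovation covariance S = H P H^T + R_stack.
S = [[P + R, P], [P, P + R]]
det = S[0][0] * S[1][1] - S[0][1] * S[1][0]
S_inv = [[S[1][1] / det, -S[0][1] / det], [-S[1][0] / det, S[0][0] / det]]
K_par = [P * (S_inv[0][j] + S_inv[1][j]) for j in range(2)]
assert abs(K_par[0] - K_par[1]) < 1e-12   # equal influence per sensor
assert abs(K_par[0] - K2) < 1e-12         # equals the last sequential gain
```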
In the first experiment, we perform ten simulations with the spatial density of false measurements set to increasing values: λ = 1e−3, 2e−3, 5e−3, 7e−3, 1e−2. These values correspond to each sensor producing 9, 19, 49, 69, and 99 false measurements per sensor scan. We then average the average Euclidean error in position and the track probability of detection over the ten simulations for each value of λ. The results of this experiment are shown in Fig. 7.5 and show that sequential and parallel fusion have similar performance. This is a different result than the ones in [90]–[92]. We believe the difference arises because we do not assume that the initial state of the targets is known. This requires the algorithm to initialize the tracks using noisy measurements and results in error in the initial state estimate. This initial error may nullify any advantage the sequential approach has over the parallel approach for this experiment.
Figure 7.6: The average Euclidean error and the probability of detection as a function of the relative angle variance in the measurement.
The second simulated experiment is similar to the first except that the spatial density of false measurements is fixed at λ = 1e−3 and the variance of the angle measurement is set to the values var(φ) = 1e−2, 5e−2, 8e−2, 1e−1, 1.5e−1, 2e−1. The idea behind this experiment is that if the measurements from the first sensor have a lot of error, then the tracks from the sequential approach would be more likely to drift from the target. The results of this experiment are shown in Fig. 7.6 and show that sequential fusion slightly outperforms parallel fusion as the measurement noise increases.
Both experiments show that the performance of the parallel and sequential algorithms is similar, with the sequential approach slightly outperforming the parallel approach. Given these results, we propose that the sequential algorithm is preferable since it is more computationally efficient, as discussed in [91], [92].
7.5 Conclusion
In this chapter we derived a parallel centralized measurement fusion algorithm for the LG-IPDAF called the multi-sensor Lie group integrated probabilistic data association filter (MS-LG-IPDAF). We compared the MS-LG-IPDAF algorithm to the sequential approach in simulation. The experimental results showed that both approaches have about the same tracking performance, with the sequential approach slightly outperforming the parallel approach. Thus, we propose that the sequential approach is favorable since it is more computationally efficient.
CHAPTER 8. TRACK INITIALIZATION
This chapter derives the track initialization phase of G-MTT discussed in Section 2.3. The objective of the track initialization algorithm is to initialize new tracks from clusters. A target that is not initialized (not being tracked by G-MTT) produces measurements extracted by the sensor. Provided that the target's measurements are not in the validation region of any initialized track, they are associated to a cluster as described in Chapter 5. After enough sensor scans, there will be sufficient measurements in a cluster to observe the target and initialize a new track.
When tracking in the presence of dense clutter, target-originated measurements are often mixed with non-target-originated measurements in a cluster. To estimate the current state of the target whose measurements are in the cluster, G-MTT has to filter out the non-target-originated measurements. This is done using the random sample consensus (RANSAC) algorithm. RANSAC is a robust regression algorithm that filters out gross outliers when estimating the parameters of a model [60].
In G-MTT, RANSAC takes a cluster that has at least $\tau_{CM}$ measurements from distinct times such that the target is observable according to the system model in equation (4.1). Using the cluster, RANSAC seeks to find a current state estimate (referred to as a state hypothesis) that fits the largest subset of measurements from the cluster subject to the system model. We refer to this subset of measurements as inliers, denoted $I$. The state hypothesis and inliers are given to the MS-LG-IPDAF algorithm (derived in Chapter 7) to filter in the inliers and produce a current state estimate, error covariance, and track likelihood, which compose a new track. In place of the MS-LG-IPDAF, one can also use sequential, centralized measurement fusion with the LG-IPDAF as discussed in Chapter 7.
The outline of the track initialization process is depicted in Fig. 8.1 where three
clusters are given to the RANSAC algorithm. RANSAC calculates a state hypothesis and
inlier set for each cluster. The current state hypotheses and inliers are given to the MS-LG-
IPDAF algorithm to initialize new tracks.
Figure 8.1: An overview of the track initialization process.
8.1 Preliminaries
Let $Z^{C_m}$ denote the set of measurements in cluster $C_m$, and let $z^{s_j}_{i,l} \in Z^{C_m}$ denote the $l$th measurement from sensor $s_j$ received at time $t_i$ that is in the cluster $C_m$. Let $\psi$ denote the Bernoulli random variable indicating that a measurement is target originated, and let $x_k$ denote a target's state at the current time $t_k$. We denote the probability of the measurement $z^{s_j}_{i,l}$ conditioned on the target's state at time $t_k$ and the measurement being target originated as $p(z^{s_j}_{i,l} \mid x_k, \psi)$.
Lemma 8.1.1. Given that a target-originated measurement is related to the target's state according to the system model defined in equations (4.1) and (4.2), and that the measurement noise and process noise are independent, the probability of the measurement $z^{s_j}_{i,l} \in Z^{C_m}$ conditioned on the target's current state and the measurement originating from the target is denoted $p(z^{s_j}_{i,l} \mid x_k, \psi)$ and approximated as the Gaussian
\[
p\left(z^{s_j}_{i,l} \mid x_k, \psi\right) \approx \eta \exp\left( -\frac{1}{2}\, \mathrm{Log}^{G_{s_j}}_I\left(\left(\bar{z}^{s_j}_i\right)^{-1} z^{s_j}_{i,l}\right)^\top \left(R^{s_j}_{k:i}\right)^{-1} \mathrm{Log}^{G_{s_j}}_I\left(\left(\bar{z}^{s_j}_i\right)^{-1} z^{s_j}_{i,l}\right) \right), \tag{8.1}
\]
where
\[
\bar{z}^{s_j}_i = h^{s_j}\left(f\left(x_k, 0, t_{k:i}\right), 0\right),
\]
\[
R^{s_j}_{k:i} = H^{s_j}_i G_{k:i} Q\left(t_{k:i}\right) G_{k:i}^\top \left(H^{s_j}_i\right)^\top + V^{s_j}_i R^{s_j} \left(V^{s_j}_i\right)^\top,
\]
$f$ is the state transition function defined in equation (4.2), $h^{s_j}$ is the observation function for sensor $s_j$ defined in equation (4.1b), $Q(t_{k:i})$ is the process noise covariance over the time interval $t_{k:i}$, and $H^{s_j}_i$, $V^{s_j}_i$, and $G_{k:i}$ are the system model Jacobians defined in equation (4.5) and evaluated at the points $\zeta^f_{k:i} = (x_k, 0, t_{k:i})$ and $\zeta^{h^{s_j}}_i = (f(x_k, 0, t_{k:i}), 0)$.
Proof. According to the system model defined in equation (4.1), the measurement model that maps the target's current state $x_k$ to a measurement from sensor $s_j$ received at time $t_i$ is
\[
z^{s_j}_i = h^{s_j}\left(f\left(x_k, q_{k:i}, t_{k:i}\right), r^{s_j}_i\right), \tag{8.3}
\]
where $q_{k:i} \sim \mathcal{N}(0, Q(t_{k:i}))$ is the process noise. Using the chain rule and the system Jacobians defined in equation (4.5), the affinization of the measurement model defined in equation (8.3) is
\[
z^{s_j}_i \approx h^{s_j}\left(f\left(x_k, 0, t_{k:i}\right), 0\right) \mathrm{Exp}^{G_{s_j}}_I\left(H^{s_j}_i G_{k:i} q_{k:i} + V^{s_j}_i r^{s_j}_i\right) \tag{8.4a}
\]
\[
= \bar{z}^{s_j}_i\, \mathrm{Exp}^{G_{s_j}}_I\left(\tilde{z}^{s_j}_{k:i}\right), \tag{8.4b}
\]
where $\tilde{z}^{s_j}_{k:i} = H^{s_j}_i G_{k:i} q_{k:i} + V^{s_j}_i r^{s_j}_i$, $\bar{z}^{s_j}_i = h^{s_j}(f(x_k, 0, t_{k:i}), 0)$, and the system model Jacobians are defined in equation (4.5) and evaluated at the points $\zeta^f_{k:i} = (x_k, 0, t_{k:i})$ and $\zeta^{h^{s_j}}_i = (f(x_k, 0, t_{k:i}), 0)$.
The probability distribution of $z^{s_j}_i$ is defined in its tangent space by the random variable $\tilde{z}^{s_j}_{k:i}$ using the relation $\tilde{z}^{s_j}_{k:i} = \mathrm{Log}^{G_{s_j}}_I\left(\left(\bar{z}^{s_j}_i\right)^{-1} z^{s_j}_i\right)$. Thus, to find the probability distribution of $z^{s_j}_i$, we calculate the expected value and covariance of $\tilde{z}^{s_j}_{k:i}$, which are
\[
\mathrm{E}\left[\tilde{z}^{s_j}_{k:i}\right] = \mathrm{E}\left[H^{s_j}_i G_{k:i} q_{k:i} + V^{s_j}_i r^{s_j}_i\right] = 0 \tag{8.5}
\]
\[
\mathrm{cov}\left(\tilde{z}^{s_j}_{k:i}\right) = \mathrm{cov}\left(H^{s_j}_i G_{k:i} q_{k:i} + V^{s_j}_i r^{s_j}_i\right) \tag{8.6}
\]
\[
= H^{s_j}_i G_{k:i}\, \mathrm{cov}\left(q_{k:i}\right) \left(H^{s_j}_i G_{k:i}\right)^\top + V^{s_j}_i\, \mathrm{cov}\left(r^{s_j}_i\right) \left(V^{s_j}_i\right)^\top \tag{8.7}
\]
\[
= H^{s_j}_i G_{k:i} Q\left(t_{k:i}\right) \left(H^{s_j}_i G_{k:i}\right)^\top + V^{s_j}_i R^{s_j} \left(V^{s_j}_i\right)^\top \tag{8.8}
\]
\[
= R^{s_j}_{k:i}. \tag{8.9}
\]
Therefore, the Gaussian approximation of the probability density function of the measurement is $z^{s_j}_i \sim \mathcal{N}(\bar{z}^{s_j}_i, R^{s_j}_{k:i})$, and the probability of the measurement $z^{s_j}_{i,l}$ is as defined in equation (8.1).
8.2 RANSAC
In this section we derive the RANSAC algorithm used in G-MTT for a single cluster. Let $C_j$ denote the $j$th cluster that has at least $\tau_{CM}$ measurements from distinct times. RANSAC randomly selects a subset of measurements from $C_j$, denoted $Z^{C_j}_a \subset C_j$, containing $\tau_{RSC}$ measurements such that the system can be observed using the measurements in $Z^{C_j}_a$ and such that at least one of the measurements in $Z^{C_j}_a$ is from the current time. A depiction of the cluster $C_j$ and the subset of measurements is shown in Fig. 8.2. The green dots represent the measurements in cluster $C_j$ and the blue dots represent the measurements in the random subset $Z^{C_j}_a \subset C_j$. The measurements' colors fade with time: the darker the color, the more recent the measurement.
Figure 8.2: A depiction of (a) the cluster $C_j$ and (b) a random subset of measurements $Z^{C_j}_a \subset C_j$, where the measurements in the random subset are represented by blue dots.
RANSAC uses the subset $Z^{C_j}_a$ to estimate the current state that maximizes the probability of the measurements in $Z^{C_j}_a$ conditioned on the measurements being target originated and the current state, denoted $p(Z^{C_j}_a \mid x_k, \psi)$. The output of this maximization is the current state hypothesis associated with $Z^{C_j}_a$, denoted $x^h_{k,a}$. RANSAC then finds measurements in cluster $C_j$ that support $x^h_{k,a}$ according to the system model. These measurements form the inlier set $I_a$. Using the inlier set, the state hypothesis is given a score. If the score is greater than the stopping criterion $\tau_{RSS}$, a new track is initialized using $x^h_{k,a}$ and $I_a$; otherwise, RANSAC counts the number of iterations performed. If RANSAC has not performed $\tau_{RMI}$ iterations, it randomly selects a new subset of measurements and repeats the previous steps to form other state hypotheses and corresponding inlier sets; otherwise, it selects the state hypothesis with the best score. We denote the state hypothesis and its corresponding inlier set with the best score as $(x^h_{k,b}, I_b)$. If the best state hypothesis has a score of at least $\tau_{RmS}$, it is used to initialize a new track; otherwise, RANSAC does not initialize a new track from cluster $C_j$. The RANSAC algorithm is outlined in Fig. 8.3.
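The loop just described can be sketched generically as follows. The helper functions `estimate_state`, `find_inliers`, and `score` stand in for the steps derived in Sections 8.2.1 through 8.2.3, and the threshold names mirror $\tau_{RSC}$, $\tau_{RSS}$, $\tau_{RMI}$, and $\tau_{RmS}$; all concrete values below are illustrative assumptions.

```python
import random

def ransac(cluster, estimate_state, find_inliers, score,
           n_sample=4, stop_score=10, max_iters=50, min_score=5, rng=random):
    """Generic RANSAC loop: returns (hypothesis, inliers) or None."""
    best = None  # (score, hypothesis, inliers)
    for _ in range(max_iters):                     # tau_RMI iterations
        subset = rng.sample(cluster, min(n_sample, len(cluster)))
        hyp = estimate_state(subset)               # state hypothesis
        inliers = find_inliers(cluster, hyp)       # supporting measurements
        s = score(inliers)
        if best is None or s > best[0]:
            best = (s, hyp, inliers)
        if s >= stop_score:                        # stopping criterion tau_RSS
            break
    if best is not None and best[0] >= min_score:  # minimum score tau_RmS
        return best[1], best[2]                    # initialize a track from these
    return None

# Toy 1-D usage: points near 5 plus two gross outliers.
cluster = [4.9, 5.0, 5.1, 4.95, 5.05, 5.02, 4.98, 5.07, 4.93, 5.0, 100.0, -50.0]
result = ransac(cluster,
                estimate_state=lambda s: sum(s) / len(s),
                find_inliers=lambda c, h: [x for x in c if abs(x - h) <= 1.0],
                score=len, rng=random.Random(0))
```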
8.2.1 Generating a State Hypothesis
The current state hypothesis $x^h_{k,a}$ is estimated from the subset of measurements $Z^{C_j}_a \subset C_j$ by finding the current state that maximizes the probability $p(Z^{C_j}_a \mid x_k, \psi)$ subject to the system model defined in equation (4.1).

Figure 8.3: An overview of the RANSAC algorithm.

We write this optimization problem as
\[
\underset{x_k}{\operatorname{argmax}}\ p\left(Z^{C_j}_a \mid x_k, \psi\right) \tag{8.10a}
\]
\[
\text{s.t. equations (4.1) and (4.2)}. \tag{8.10b}
\]
The state hypothesis $x^h_{k,a}$ is set to the current state that maximizes equation (8.10).
Lemma 8.2.1. Let $Z^{C_j}_a \subset C_j$ denote a random subset of measurements from cluster $C_j$ with the condition that the system defined by the system model in equation (4.1) is observable from the measurements in $Z^{C_j}_a$. Assuming that the measurements in the cluster are independent and given the system model, the Gaussian approximation of equation (8.10), written as a negative log-likelihood minimization, is
\[
\underset{x_k}{\operatorname{argmin}} \sum_{z^{s_j}_{i,l} \in Z^{C_j}_a} \mathrm{Log}^{G_{s_j}}_I\left(\left(\bar{z}^{s_j}_i\right)^{-1} z^{s_j}_{i,l}\right)^\top \left(R^{s_j}_{k:i}\right)^{-1} \mathrm{Log}^{G_{s_j}}_I\left(\left(\bar{z}^{s_j}_i\right)^{-1} z^{s_j}_{i,l}\right), \tag{8.11}
\]
where
\[
\bar{z}^{s_j}_i = h^{s_j}\left(f\left(x_k, 0, t_{k:i}\right), 0\right),
\]
\[
R^{s_j}_{k:i} = H^{s_j}_i G_{k:i} Q\left(t_{k:i}\right) G_{k:i}^\top \left(H^{s_j}_i\right)^\top + V^{s_j}_i R^{s_j} \left(V^{s_j}_i\right)^\top,
\]
$f$ is the state transition function defined in equation (4.2), $h^{s_j}$ is the observation function for sensor $s_j$ defined in equation (4.1b), $Q(t_{k:i})$ is the process noise covariance over the time interval $t_{k:i}$, and $H^{s_j}_i$, $V^{s_j}_i$, and $G_{k:i}$ are the system model Jacobians defined in equation (4.5) and evaluated at the points $\zeta^f_{k:i} = (x_k, 0, t_{k:i})$ and $\zeta^{h^{s_j}}_i = (f(x_k, 0, t_{k:i}), 0)$.
Proof. Using Lemma 8.1.1 and the assumption that the measurements are independent, the probability $p(Z^{C_j}_a \mid x_k, \psi)$ can be written as
\[
p\left(Z^{C_j}_a \mid x_k, \psi\right) = \prod_{z^{s_j}_{i,l} \in Z^{C_j}_a} p\left(z^{s_j}_{i,l} \mid x_k, \psi\right)
\approx \prod_{z^{s_j}_{i,l} \in Z^{C_j}_a} \eta \exp\left( -\frac{1}{2}\, \mathrm{Log}^{G_{s_j}}_I\left(\left(\bar{z}^{s_j}_i\right)^{-1} z^{s_j}_{i,l}\right)^\top \left(R^{s_j}_{k:i}\right)^{-1} \mathrm{Log}^{G_{s_j}}_I\left(\left(\bar{z}^{s_j}_i\right)^{-1} z^{s_j}_{i,l}\right) \right),
\]
where the terms are defined in Lemma 8.1.1. Taking the negative log of the above equation yields
\[
-\log p\left(Z^{C_j}_a \mid x_k, \psi\right)
\approx -\log \prod_{z^{s_j}_{i,l} \in Z^{C_j}_a} \eta \exp\left( -\frac{1}{2}\, \mathrm{Log}^{G_{s_j}}_I\left(\left(\bar{z}^{s_j}_i\right)^{-1} z^{s_j}_{i,l}\right)^\top \left(R^{s_j}_{k:i}\right)^{-1} \mathrm{Log}^{G_{s_j}}_I\left(\left(\bar{z}^{s_j}_i\right)^{-1} z^{s_j}_{i,l}\right) \right)
\]
\[
= -\log\left(\eta\right) - \sum_{z^{s_j}_{i,l} \in Z^{C_j}_a} \log \exp\left( -\frac{1}{2}\, \mathrm{Log}^{G_{s_j}}_I\left(\left(\bar{z}^{s_j}_i\right)^{-1} z^{s_j}_{i,l}\right)^\top \left(R^{s_j}_{k:i}\right)^{-1} \mathrm{Log}^{G_{s_j}}_I\left(\left(\bar{z}^{s_j}_i\right)^{-1} z^{s_j}_{i,l}\right) \right)
\]
\[
= -\log\left(\eta\right) + \frac{1}{2} \sum_{z^{s_j}_{i,l} \in Z^{C_j}_a} \mathrm{Log}^{G_{s_j}}_I\left(\left(\bar{z}^{s_j}_i\right)^{-1} z^{s_j}_{i,l}\right)^\top \left(R^{s_j}_{k:i}\right)^{-1} \mathrm{Log}^{G_{s_j}}_I\left(\left(\bar{z}^{s_j}_i\right)^{-1} z^{s_j}_{i,l}\right),
\]
where we have lumped all of the normalizing coefficients $\eta$ into a single $\eta$. The term $\eta$ and the fraction $\frac{1}{2}$ have no impact on the minimizer and can be removed, resulting in equation (8.11).
In our implementation we use Ceres [98] to solve the least-squares optimization problem defined in equation (8.11) for non-linear systems, and we use the original method proposed in [63] for linear systems.
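To build intuition for the least-squares form (8.11), consider the degenerate scalar case in which the state is observed directly and there are no dynamics: the minimizer of the summed Mahalanobis terms is the information-weighted mean of the measurements. The sketch below is a toy check with illustrative values, not the G-MTT solver.

```python
# Toy check: argmin_x sum_i (z_i - x)^2 / R_i is the information-weighted mean.
zs = [1.0, 2.0, 4.0]   # measurements
Rs = [1.0, 0.5, 2.0]   # their variances

def nll(x):
    return sum((z - x) ** 2 / R for z, R in zip(zs, Rs))

# Closed-form minimizer: information-weighted mean.
x_star = sum(z / R for z, R in zip(zs, Rs)) / sum(1.0 / R for R in Rs)

# A crude grid search confirms the closed form.
grid = [i * 0.001 for i in range(0, 5000)]
x_grid = min(grid, key=nll)
assert abs(x_grid - x_star) < 1e-2
```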
8.2.2 Finding the Inlier Measurements to a State Hypothesis
After calculating the current state hypothesis $x^h_{k,a}$ from the subset $Z^{C_j}_a$, RANSAC finds other measurements in cluster $C_j$ that support (i.e., are inliers to) the state hypothesis $x^h_{k,a}$. A measurement $z^{s_i}_{b,m} \in C_j$ is an inlier to the state hypothesis $x^h_{k,a}$ if the probability of the measurement $z^{s_i}_{b,m}$ conditioned on the state hypothesis and on the measurement being target originated, denoted $p(z^{s_i}_{b,m} \mid x^h_{k,a}, \psi)$, is above a specified threshold. From Lemma 8.1.1, the Gaussian approximation of $p(z^{s_i}_{b,m} \mid x^h_{k,a}, \psi)$ is
\[
p\left(z^{s_i}_{b,m} \mid x^h_{k,a}, \psi\right) \approx \eta \exp\left( -\frac{1}{2}\, \mathrm{Log}^{G_{s_i}}_I\left(\left(\bar{z}^{s_i}_b\right)^{-1} z^{s_i}_{b,m}\right)^\top \left(R^{s_i}_{k:b}\right)^{-1} \mathrm{Log}^{G_{s_i}}_I\left(\left(\bar{z}^{s_i}_b\right)^{-1} z^{s_i}_{b,m}\right) \right), \tag{8.13}
\]
where
\[
\bar{z}^{s_i}_b = h^{s_i}\left(f\left(x^h_{k,a}, 0, t_{k:b}\right), 0\right),
\]
\[
R^{s_i}_{k:b} = H^{s_i}_b G_{k:b} Q\left(t_{k:b}\right) G_{k:b}^\top \left(H^{s_i}_b\right)^\top + V^{s_i}_b R^{s_i} \left(V^{s_i}_b\right)^\top,
\]
and the system model Jacobians are defined in equation (4.5) and evaluated at the points $\zeta^f_{k:b} = (x^h_{k,a}, 0, t_{k:b})$ and $\zeta^{h^{s_i}}_b = (f(x^h_{k,a}, 0, t_{k:b}), 0)$.
Using equation (8.13), we define the inlier metric $d^{s_i}_{RI} : G_x \times G_{s_i} \to \mathbb{R}$ as
\[
d^{s_i}_{RI}\left(x^h_{k,a}, z^{s_i}_{b,m}\right) = \mathrm{Log}^{G_{s_i}}_I\left(\left(\bar{z}^{s_i}_b\right)^{-1} z^{s_i}_{b,m}\right)^\top \left(R^{s_i}_{k:b}\right)^{-1} \mathrm{Log}^{G_{s_i}}_I\left(\left(\bar{z}^{s_i}_b\right)^{-1} z^{s_i}_{b,m}\right), \tag{8.15}
\]
which has a chi-squared distribution. A measurement $z^{s_i}_{b,m} \in C_j$ is an inlier to the state hypothesis $x^h_{k,a}$ if $d^{s_i}_{RI}(x^h_{k,a}, z^{s_i}_{b,m}) \leq \tau_{RI}$. The value of $\tau_{RI}$ can be chosen similarly to how $\tau^{s_i}_V$ was chosen for the validation region in Section 6.3. A depiction of the inlier metric and the inlier measurements corresponding to the state hypothesis $x^h_{k,a}$ is shown in Fig. 8.4. The blue dots represent the measurements in the subset $Z^{C_j}_a$ used to calculate the state hypothesis $x^h_{k,a}$. The red dotted lines represent the boundary of the inlier threshold $\tau_{RI}$, and the red and blue dots represent the inliers to the state hypothesis $x^h_{k,a}$.
Figure 8.4: A depiction of the inliers to the current state hypothesis $x^h_{k,a}$, where the red and blue dots are the inliers and the red lines are the boundaries of the inlier threshold according to the inlier metric defined in equation (8.15).
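The inlier test in equation (8.15) can be sketched for a Euclidean 2-D measurement space, where the Log map reduces to subtraction. The gate value 5.991 is the standard 95% threshold for a 2-degree-of-freedom chi-squared distribution; the function name is an illustrative assumption.

```python
def is_inlier(z_pred, z_meas, R_inv, tau_RI=5.991):
    """Gate a 2-D measurement against a predicted measurement:
    d = e^T R^-1 e <= tau_RI, with e the measurement residual."""
    e = [z_meas[0] - z_pred[0], z_meas[1] - z_pred[1]]
    d = (e[0] * (R_inv[0][0] * e[0] + R_inv[0][1] * e[1])
         + e[1] * (R_inv[1][0] * e[0] + R_inv[1][1] * e[1]))
    return d <= tau_RI

R_inv = [[1.0, 0.0], [0.0, 1.0]]      # unit measurement covariance
assert is_inlier((0, 0), (1.0, 1.0), R_inv)        # d = 2.0
assert not is_inlier((0, 0), (3.0, 0.0), R_inv)    # d = 9.0
```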
8.2.3 Computing the Score of a State Hypothesis
After computing the state hypothesis $x^h_{k,a}$ and finding its inliers $I_a$, RANSAC gives the state hypothesis a score to represent how strong it is. The score is the number of measurements in $I_a$ from distinct times for each sensor. For example, let $I_a = \{z^{s_i}_{a,l}, z^{s_j}_{a,m}, z^{s_j}_{a,n}, z^{s_i}_{b,c}, z^{s_j}_{k,l}, z^{s_j}_{k,m}\}$; then the score is 4, since $z^{s_j}_{a,m}$ and $z^{s_j}_{a,n}$ count as one because they are from the same time and sensor, and for the same reason $z^{s_j}_{k,l}$ and $z^{s_j}_{k,m}$ count as one. The reason for this scoring is that G-MTT assumes that each sensor produces at most one target-originated measurement for every target per sensor scan. This assumption implies that $z^{s_j}_{a,m}$ and $z^{s_j}_{a,n}$ cannot both be from the target, and hence should not be double counted.
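The scoring rule just described amounts to counting distinct (time, sensor) pairs among the inliers. A minimal sketch, with an illustrative tuple representation of measurements:

```python
def score_inliers(inliers):
    """Score = number of distinct (time, sensor) pairs among the inliers,
    since at most one measurement per sensor scan can be target originated.
    inliers: iterable of (time, sensor, value) tuples."""
    return len({(t, s) for t, s, _ in inliers})

# The example from the text: two pairs of same-time, same-sensor inliers.
inliers = [("a", "si", 1.0), ("a", "sj", 2.0), ("a", "sj", 2.1),
           ("b", "si", 3.0), ("k", "sj", 4.0), ("k", "sj", 4.2)]
assert score_inliers(inliers) == 4
```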
8.2.4 Initializing a New Track
We denote the best state hypothesis and inlier set calculated by RANSAC from a cluster as $(x^h_{k,b}, I_b)$. If the best state hypothesis has a score of at least $\tau_{RmS}$, G-MTT uses it to initialize a new track. This is done by taking the state hypothesis $x^h_{k,b}$ and propagating it back in time to the time of the oldest measurement in the inlier set $I_b$ using the state transition function defined in equation (4.2).

G-MTT then filters in the inlier measurements using the MS-LG-IPDAF algorithm to compute a current state estimate, error covariance, and track likelihood for a new track. The initial error covariance and track likelihood used in the filtering process are determined by the user; however, we recommend setting the initial track likelihood to 0.5 to indicate that the track is just as likely to represent a target as it is to not represent a target. The inlier measurements are added to the new track's consensus set and removed from the cluster.
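The back-propagate-then-filter flow above can be sketched for a scalar constant-velocity model with plain Kalman updates. This is a toy stand-in, not the MS-LG-IPDAF: data association and the track-likelihood update are omitted, process noise is omitted for brevity, and all values and function names are illustrative; the 0.5 initial likelihood follows the recommendation above.

```python
def propagate(x, P, dt):
    """Constant-velocity transition and covariance propagation
    (process noise omitted for brevity)."""
    x = [x[0] + dt * x[1], x[1]]
    P = [[P[0][0] + dt * (P[0][1] + P[1][0]) + dt * dt * P[1][1],
          P[0][1] + dt * P[1][1]],
         [P[1][0] + dt * P[1][1], P[1][1]]]
    return x, P

def init_track(x_hyp, t_now, inliers, R=0.25):
    """x_hyp = [position, velocity] at time t_now;
    inliers: list of (time, position measurement) pairs, any order."""
    inliers = sorted(inliers)
    t = inliers[0][0]
    x, P = propagate(x_hyp, [[10.0, 0.0], [0.0, 10.0]], t - t_now)  # back-propagate
    likelihood = 0.5                       # recommended initial track likelihood
    for ti, z in inliers:                  # filter inliers in chronologically
        x, P = propagate(x, P, ti - t)
        t = ti
        S = P[0][0] + R                    # innovation variance, H = [1, 0]
        K = [P[0][0] / S, P[1][0] / S]     # Kalman gain
        r = z - x[0]                       # residual
        x = [x[0] + K[0] * r, x[1] + K[1] * r]
        P = [[(1 - K[0]) * P[0][0], (1 - K[0]) * P[0][1]],
             [P[1][0] - K[1] * P[0][0], P[1][1] - K[1] * P[0][1]]]
    return x, P, likelihood

# Perfect measurements of x(t) = 2 + t leave the hypothesis unchanged.
x, P, lik = init_track([6.0, 1.0], 4.0, [(t, 2.0 + t) for t in range(5)])
assert abs(x[0] - 6.0) < 1e-9 and abs(x[1] - 1.0) < 1e-9 and lik == 0.5
```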
CHAPTER 9. TRACK-TO-TRACK ASSOCIATION AND FUSION
On occasion when tracking multiple targets, some tracks will coalesce as illustrated in Fig. 9.1. In these instances, the coalescing tracks need to be tested to see if they represent the same target and, if they do, fused together. This process is referred to as track-to-track association and fusion. There is an extensive literature on track-to-track association and fusion for linear and many non-linear systems [74], [99]–[101]; however, the literature regarding track-to-track association and fusion for Lie groups is sparse. In this chapter we present a novel track-to-track association and fusion algorithm for Lie groups under the assumption that tracks are independent.

Figure 9.1: A depiction of two coalescing tracks that represent the same target. The target is represented by the walking person. The measurements are represented as black dots. The tracks are denoted as dashed arrows and their covariances are represented as green ellipses.
9.1 Track-to-Track Association
A common approach to testing whether two tracks should be fused together is to measure how far apart the tracks' state estimates are from each other. In Euclidean space, this is simply taking the norm of a weighted difference between the states; however, for most Lie groups we cannot simply subtract one state from another. Instead, we form a geodesic, with its corresponding tangent vector, from one state estimate to the other. The length of this tangent vector is a measure of how far apart the two state estimates are. If the distance is small relative to the certainty of each state estimate, then there is a high probability that the two tracks represent the same target and should be fused together. This is the intuition behind the following lemma.
Lemma 9.1.1. Suppose that the tracks are independent and that the dimension of their state is $n$. Let $\tau_A$ be a parameter related to a certain probability of an $n$-dimensional chi-squared cumulative distribution function. Also, let $T^\ell_{k|k} = (\hat{x}^\ell_{k|k}, P^\ell_{k|k}, \ldots)$ for $\ell \in \{i, j\}$ denote the $i$th and $j$th tracks at time $k$ conditioned on all the measurements; then the two tracks represent the same target with probability determined by $\tau_A$ if
\[
d_A\left(T^i_{k|k}, T^j_{k|k}\right) = \left(e^{ij}_{k|k}\right)^\top \left(P^{ij}_{k|k}\right)^{-1} e^{ij}_{k|k} \leq \tau_A, \tag{9.1}
\]
where
\[
e^{ij}_{k|k} = \mathrm{Log}^{G_x}_I\left(\left(\hat{x}^i_{k|k}\right)^{-1} \hat{x}^j_{k|k}\right) \tag{9.2}
\]
\[
P^{ij}_{k|k} = J^{{G_x}^{-1}}_l\left(e^{ij}_{k|k}\right) P^i_{k|k} \left(J^{{G_x}^{-1}}_l\left(e^{ij}_{k|k}\right)\right)^\top + J^{{G_x}^{-1}}_r\left(e^{ij}_{k|k}\right) P^j_{k|k} \left(J^{{G_x}^{-1}}_r\left(e^{ij}_{k|k}\right)\right)^\top, \tag{9.3}
\]
and the inverses of the left and right Jacobians are defined in equations (3.6). The track-to-track association metric defined in equation (9.1) has a chi-squared distribution of dimension $n$, where $n$ is the dimension of the state; thus, the value of $\tau_A$ can be chosen from a chi-squared lookup chart to correspond to the probability that the two tracks are similar. As an example, if the state dimension is $n = 4$, then $\tau_A = 0.711$ means that if two tracks are deemed similar according to equation (9.1), there is a 95 percent chance that they represent the same target.
Proof. The metric is designed to measure how similar the two tracks are. If the tracks
represent the same target, then their states should be about the same. This implies that if
the probability of the distance between the tracks’ states being zero is high, then the tracks
have a high probability of representing the same target.
The state of the $\ell$th track at time $t_k$ conditioned on the measurements up to time $t_k$ is denoted $x^\ell_k = \hat{x}^\ell_{k|k} \mathrm{Exp}^{G_x}_I(\tilde{x}^\ell_{k|k})$, where $\hat{x}^\ell_{k|k}$ is the state estimate and $\tilde{x}^\ell_{k|k} \sim \mathcal{N}(0, P^\ell_{k|k})$ is the error state. Let $e^{ij}_{k|k} \triangleq \mathrm{Log}^{G_x}_I((\hat{x}^i_{k|k})^{-1} \hat{x}^j_{k|k})$. The error between the two tracks is
\[
e^{ij}_k = \mathrm{Log}^{G_x}_I\left(\left(x^i_k\right)^{-1} x^j_k\right)
= \mathrm{Log}^{G_x}_I\left(\mathrm{Exp}^{G_x}_I\left(-\tilde{x}^i_{k|k}\right) \left(\hat{x}^i_{k|k}\right)^{-1} \hat{x}^j_{k|k}\, \mathrm{Exp}^{G_x}_I\left(\tilde{x}^j_{k|k}\right)\right)
\]
\[
= \mathrm{Log}^{G_x}_I\left(\mathrm{Exp}^{G_x}_I\left(-\tilde{x}^i_{k|k}\right) \mathrm{Exp}^{G_x}_I\left(e^{ij}_{k|k}\right) \mathrm{Exp}^{G_x}_I\left(\tilde{x}^j_{k|k}\right)\right)
\]
\[
\approx e^{ij}_{k|k} + J^{{G_x}^{-1}}_r\left(e^{ij}_{k|k}\right) \tilde{x}^j_{k|k} - J^{{G_x}^{-1}}_l\left(e^{ij}_{k|k}\right) \tilde{x}^i_{k|k}, \tag{9.4}
\]
where we used the property of the left and right Jacobians defined in equation (3.7). The error is a tangent vector at the identity element of the state's Lie group, and the Jacobians map the error states to the same tangent space so that they can be added together.

Under the assumption that the error states are independent, the mean and covariance of the error are
\[
\mathrm{E}\left[e^{ij}_k\right] = e^{ij}_{k|k},
\]
\[
\mathrm{cov}\left(e^{ij}_k\right) = J^{{G_x}^{-1}}_l\left(e^{ij}_{k|k}\right) P^i_{k|k} \left(J^{{G_x}^{-1}}_l\left(e^{ij}_{k|k}\right)\right)^\top + J^{{G_x}^{-1}}_r\left(e^{ij}_{k|k}\right) P^j_{k|k} \left(J^{{G_x}^{-1}}_r\left(e^{ij}_{k|k}\right)\right)^\top = P^{ij}_{k|k}, \tag{9.5}
\]
consistent with equation (9.3).
The probability that the two tracks represent the same target is the probability that the error is zero:
\[
p\left(e^{ij}_k = 0\right) \approx \eta \exp\left( -\frac{1}{2} \left(e^{ij}_k - e^{ij}_{k|k}\right)^\top \left(P^{ij}_{k|k}\right)^{-1} \left(e^{ij}_k - e^{ij}_{k|k}\right) \right) \Bigg|_{e^{ij}_k = 0}
= \eta \exp\left( -\frac{1}{2} \left(e^{ij}_{k|k}\right)^\top \left(P^{ij}_{k|k}\right)^{-1} e^{ij}_{k|k} \right). \tag{9.6}
\]
Using the probability that the error is zero, we define the metric
\[
d_A\left(T^i_{k|k}, T^j_{k|k}\right) = \left(e^{ij}_{k|k}\right)^\top \left(P^{ij}_{k|k}\right)^{-1} e^{ij}_{k|k}, \tag{9.7}
\]
which has a chi-squared distribution of dimension $n$, where $n$ is the dimension of the state. The smaller the value of the metric, the more probable it is that the two tracks represent the same target.
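The association test can be illustrated on the simplest Lie group, $SO(2)$, where the Log map reduces to angle wrapping and the left and right Jacobians are identity (the group is abelian), so the sketch below is exact rather than approximate. The 1-degree-of-freedom 95% chi-squared gate value 3.841 is standard; the function names are illustrative.

```python
import math

def wrap(a):
    """Log of a relative rotation on SO(2): wrap an angle to (-pi, pi]."""
    return (a + math.pi) % (2 * math.pi) - math.pi

def same_target(theta_i, P_i, theta_j, P_j, tau_A=3.841):
    e = wrap(theta_j - theta_i)    # e_ij, eq. (9.2)
    P_ij = P_i + P_j               # eq. (9.3) with identity Jacobians
    return e * e / P_ij <= tau_A   # eq. (9.1)

assert same_target(0.1, 0.05, 0.2, 0.05)        # close tracks: d = 0.1
assert not same_target(0.0, 0.01, 1.0, 0.01)    # distant tracks: d = 50
```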
9.2 Track-to-Track Fusion
Let $T = \{a, b, \ldots, n\}$ denote the indexing set of the tracks to be fused. The objective of track-to-track fusion is to use the tracks' state estimates and error covariances to find a new state estimate and error covariance that maximizes the joint probability of the tracks. This is done by forming geodesics from one track's state estimate $\hat{x}^a_{k|k}$ to each of the other tracks' state estimates $\hat{x}^\ell_{k|k}$, where $\ell \in T \setminus \{a\}$. By forming all of the geodesics to start at the point $\hat{x}^a_{k|k}$, we ensure that all of the corresponding tangent vectors that form the geodesics are in the same tangent space. This allows us to weight each tangent vector according to the relative probability of each state estimate and add them together. We denote the weighted tangent vector from $\hat{x}^a_{k|k}$ to $\hat{x}^\ell_{k|k}$ as $\mu^{\ell a}_{k|k}$, and the sum of the weighted tangent vectors as $\mu^-_{k|k}$. This fused tangent vector moves $\hat{x}^a_{k|k}$ to the optimal state, as discussed in Lemma 9.2.1 and depicted in Fig. 9.2.
Figure 9.2: A depiction of the track fusion process. The state estimate of the $\ell$th track to be fused is denoted $\hat{x}^\ell_{k|k}$ and represented as a green dot on the manifold. The weighted tangent vector from the state estimate $\hat{x}^a_{k|k}$ to the $\ell$th track is denoted $\mu^{\ell a}_{k|k}$ and represented as red arrows in the tangent space. The fused tangent vector is denoted $\mu^-_{k|k}$, and the resulting state estimate is denoted $\hat{x}_{k|k}$.
Lemma 9.2.1. Suppose that the tracks are independent and that $T = \{a, b, \ldots, n\}$ denotes the indexing set of tracks that represent the same target and are to be fused together; then the value of the state $x_k \sim \mathcal{N}(\hat{x}_{k|k}, P_{k|k})$ that maximizes the joint probability
\[
p\left(\bigcap_{\ell \in T} \hat{x}^\ell_{k|k} \mid x_k\right) = \eta \exp\left( -\frac{1}{2} X^\top \Sigma^{-1} X \right), \tag{9.8}
\]
where
\[
X = \begin{bmatrix} \mathrm{Log}^{G_x}_I\left(\left(\hat{x}^a_{k|k}\right)^{-1} x_k\right) \\ \mathrm{Log}^{G_x}_I\left(\left(\hat{x}^b_{k|k}\right)^{-1} x_k\right) \\ \vdots \\ \mathrm{Log}^{G_x}_I\left(\left(\hat{x}^n_{k|k}\right)^{-1} x_k\right) \end{bmatrix}
\]
and $\Sigma = \mathrm{cov}(X) = \mathrm{diag}(P^a_{k|k}, P^b_{k|k}, \ldots, P^n_{k|k})$, is given by
\[
\hat{x}_{k|k} = \hat{x}^a_{k|k}\, \mathrm{Exp}^{G_x}_I\left(\mu^-_{k|k}\right)
\]
\[
P_{k|k} = J^{G_x}_r\left(\mu^-_{k|k}\right) P^-_{k|k}\, J^{G_x}_r\left(\mu^-_{k|k}\right)^\top,
\]
where
\[
P^-_{k|k} = \left( \sum_{\ell \in T} J^{{G_x}^{-1}}_r\left(e^{\ell a}_{k|k}\right)^\top \left(P^\ell_{k|k}\right)^{-1} J^{{G_x}^{-1}}_r\left(e^{\ell a}_{k|k}\right) \right)^{-1} \tag{9.9}
\]
\[
e^{\ell a}_{k|k} = \mathrm{Log}^{G_x}_I\left(\left(\hat{x}^\ell_{k|k}\right)^{-1} \hat{x}^a_{k|k}\right) \tag{9.10}
\]
\[
\mu^-_{k|k} = \sum_{\ell \in T} \mu^{\ell a}_{k|k} \tag{9.11}
\]
\[
\mu^{\ell a}_{k|k} = -P^-_{k|k}\, J^{{G_x}^{-1}}_r\left(e^{\ell a}_{k|k}\right)^\top \left(P^\ell_{k|k}\right)^{-1} e^{\ell a}_{k|k} \tag{9.12}
\]
and $J^{{G_x}^{-1}}_r$ is the inverse of the right Jacobian defined in equation (3.6).
Proof. Let $\hat{x}^\ell_{k|k}$ and $P^\ell_{k|k}$ denote the state estimate and error covariance of the $\ell$th track, $\ell \in T$, at time $k$ conditioned on the measurements up to time $k$. Also, let $x_k$ denote the target's state with the relation $x_k = \hat{x}^\ell_{k|k} \mathrm{Exp}^{G_x}_I(\tilde{x}^\ell_{k|k})$ for $\ell \in T$. To simplify notation, we drop the subscripts indicating the dependency on time.

We are interested in finding the value of $x$ that maximizes the joint probability in equation (9.8) given the state estimates and error covariances of the tracks in $T$. We do this by minimizing its negative log likelihood under the assumption that the error states are independent (i.e., $\mathrm{E}[(\tilde{x}^i)^\top \tilde{x}^j] = 0$ for $i, j \in T$ and $i \neq j$). This simplifies the optimization problem to
\[
\underset{x}{\operatorname{argmin}} \sum_{\ell \in T} \mathrm{Log}^{G_x}_I\left(\left(\hat{x}^\ell\right)^{-1} x\right)^\top \left(P^\ell\right)^{-1} \mathrm{Log}^{G_x}_I\left(\left(\hat{x}^\ell\right)^{-1} x\right). \tag{9.13}
\]
To pose the optimization problem in a single tangent space, we set $x = \hat{x}^a \mathrm{Exp}^{G_x}_I(\tilde{x}^a)$ and find the value of $\tilde{x}^a$ that minimizes the objective. This is done by approximating $\mathrm{Log}^{G_x}_I((\hat{x}^\ell)^{-1} x)$ as
\[
\mathrm{Log}^{G_x}_I\left(\left(\hat{x}^\ell\right)^{-1} x\right) = \mathrm{Log}^{G_x}_I\left(\left(\hat{x}^\ell\right)^{-1} \hat{x}^a\, \mathrm{Exp}^{G_x}_I\left(\tilde{x}^a\right)\right)
= \mathrm{Log}^{G_x}_I\left(\mathrm{Exp}^{G_x}_I\left(e^{\ell a}\right) \mathrm{Exp}^{G_x}_I\left(\tilde{x}^a\right)\right)
\]
\[
\approx \mathrm{Log}^{G_x}_I\left(\mathrm{Exp}^{G_x}_I\left(e^{\ell a} + J^{{G_x}^{-1}}_r\left(e^{\ell a}\right) \tilde{x}^a\right)\right)
= e^{\ell a} + J^{{G_x}^{-1}}_r\left(e^{\ell a}\right) \tilde{x}^a, \tag{9.14}
\]
where $e^{\ell a} \triangleq \mathrm{Log}^{G_x}_I((\hat{x}^\ell)^{-1} \hat{x}^a)$, and we used the property of the right Jacobian given in equation (3.7) under the reasonable assumption that $\tilde{x}^a$ is small. Substituting equation (9.14) into (9.13) yields
\[
\underset{\tilde{x}^a}{\operatorname{argmin}} \sum_{\ell \in T} \left( e^{\ell a} + J^{{G_x}^{-1}}_r\left(e^{\ell a}\right) \tilde{x}^a \right)^\top \left(P^\ell\right)^{-1} \left( e^{\ell a} + J^{{G_x}^{-1}}_r\left(e^{\ell a}\right) \tilde{x}^a \right). \tag{9.15}
\]
Since (9.15) is quadratic in $\tilde{x}^a$, the value of $\tilde{x}^a$ that minimizes (9.15) is found by taking its first partial derivative with respect to $\tilde{x}^a$, setting the derivative to 0, and solving for $\tilde{x}^a$. We set this value to the mean of $\tilde{x}^a$, denoted $\mu^-$. The corresponding error covariance, denoted $P^-$, is obtained by inverting the second partial derivative (Hessian) of the corresponding negative log likelihood with respect to $\tilde{x}^a$. Therefore, the minimizing value of $\tilde{x}^a$ and the corresponding error covariance are
\[
P^- = \left( \sum_{\ell \in T} J^{{G_x}^{-1}}_r\left(e^{\ell a}\right)^\top \left(P^\ell\right)^{-1} J^{{G_x}^{-1}}_r\left(e^{\ell a}\right) \right)^{-1} \tag{9.16}
\]
\[
\mu^- = -P^- \sum_{\ell \in T} J^{{G_x}^{-1}}_r\left(e^{\ell a}\right)^\top \left(P^\ell\right)^{-1} e^{\ell a}. \tag{9.17}
\]
At this point, $\tilde{x}^a = \mu^- + \tilde{a}^-$, where $\tilde{a}^- \sim \mathcal{N}(0, P^-)$. To reset the error state to have zero mean, we fold $\mu^-$ into the state estimate and adjust the error covariance as follows:
\[
x = \hat{x}^a\, \mathrm{Exp}^{G_x}_I\left(\mu^- + \tilde{a}^-\right) \tag{9.18a}
\]
\[
= \underbrace{\hat{x}^a\, \mathrm{Exp}^{G_x}_I\left(\mu^-\right)}_{\hat{x}}\; \underbrace{\mathrm{Exp}^{G_x}_I\left(J^{G_x}_r\left(\mu^-\right) \tilde{a}^-\right)}_{\mathrm{Exp}^{G_x}_I(\tilde{x})}. \tag{9.18b}
\]
Therefore, the fused state estimate and error state are
\[
\hat{x} = \hat{x}^a\, \mathrm{Exp}^{G_x}_I\left(\mu^-\right), \qquad \tilde{x} \sim \mathcal{N}\left(0, P\right),
\]
where
\[
P = J^{G_x}_r\left(\mu^-\right) P^-\, J^{G_x}_r\left(\mu^-\right)^\top. \tag{9.19}
\]
As stated, Lemma 9.2.1 assumes that the error states of the tracks being fused are independent, which in general is not true. We make this assumption because the calculation of the cross covariance is otherwise difficult, if not impossible. A method that attempts to approximate the cross covariance is discussed in [101]. This method can be adapted to Lie groups using a process similar to Lemma 9.2.1.

When two or more tracks are fused together, G-MTT merges their consensus sets, and gives the fused track the highest track likelihood and the oldest label from the tracks used to create the fused track.
CHAPTER 10. MEASUREMENT LATENCY
Let $\hat{x}_{k|0:k/i}$ denote a track's state estimate at time $t_k$ conditioned on the measurements from the initial time $t_0$ to the current time $t_k$, except the measurements at time $t_i$ from sensor $s_\ell$. When working with multiple asynchronous sensors, it is possible to receive measurements from multiple sensors at time $t_k$, denoted $Z^S_k$, and perform data association, track propagation and updating to produce the state estimate $\hat{x}_{k|0:k/i}$, only to later receive measurements, at time $t_k$ or a later time, from sensor $s_\ell$ that were measured at time $t_i$ due to measurement latency. We denote the delayed measurements as $Z^{s_\ell}_i$. This scenario is illustrated in Fig. 10.1.
In this chapter we present a way to deal with measurement latency when working with
Lie groups. Our proposed solution is compatible with sequential centralized measurement
fusion, or a hybrid of sequential and parallel centralized measurement fusion where normal
propagation and updating is done using parallel centralized measurement fusion, and dealing
with measurement latency is done using sequential centralized measurement fusion. This
chapter relies heavily on the concepts of Chapter 6 to update the track with the delayed
measurements.
To relate this chapter to previous chapters, let $T = \left(\hat{x}_{k|k}, P_{k|k}, \epsilon_{k|k}, \ldots\right)$ denote the track at the current time conditioned on all of the received measurements, and let $Z^S_{0:k/i}$ denote the set of track-associated measurements from the initial time to the current time excluding the measurements $Z^{s_\ell}_i$. When we receive the delayed measurements $Z^{s_\ell}_i$, we equate the following
\begin{align}
\hat{x}_{k|0:k/i} &= \hat{x}_{k|k} \tag{10.1a}\\
P_{k|0:k/i} &= P_{k|k} \tag{10.1b}\\
p\left(\epsilon_k \mid Z^S_{0:k/i}\right) &= p\left(\epsilon_k \mid Z^S_{0:k}\right), \tag{10.1c}
\end{align}
to reflect that the information in $Z^{s_\ell}_i$ has not yet been incorporated into the state estimate and track likelihood due to latency.
Figure 10.1: A depiction of measurement latency. The black arrow represents the time line, with $t_0$ denoting the initial time and $t_k$ the current time. The state estimate at time $t_k$ conditioned on all of the measurements except the ones received at time $t_i$ from sensor $s_\ell$ is denoted $\hat{x}_{k|0:k/i}$. The set $Z^S_{t_\ell}$ denotes the set of measurements received at time $t_\ell$, and the set $Z^{s_\ell}_{t_i}$ denotes the set of measurements from sensor $s_\ell$ produced at time $t_i$ but received at time $t_k$ due to measurement latency.
10.1 Preliminaries
To incorporate the measurements $Z^{s_\ell}_i$ into the track, we need to relate the track's current state to a measurement from sensor $s_\ell$ taken at time $t_i$. This is done by creating a new observation function using the system model defined in equation (4.1). The new observation function $h^{s_\ell}_{k:i} : G_x \times \mathbb{R}^{G_x} \times \mathbb{R} \times \mathbb{R}^{G_{s_\ell}} \to G_{s_\ell}$ is defined as
\begin{align}
z^{s_\ell}_i &= h^{s_\ell}_{k:i}\left(x_k, q_{k:i}, t_{k:i}, r^{s_\ell}_i\right) \nonumber\\
&= h^{s_\ell}\left(f\left(x_k, q_{k:i}, t_{k:i}\right), r^{s_\ell}_i\right), \tag{10.2}
\end{align}
where $h^{s_\ell}$ is the observation function for sensor $s_\ell$ defined in equation (4.1b), $f$ is the state transition function defined in equation (4.2), and $t_{k:i}$, $q_{k:i} \sim \mathcal{N}\left(0, Q\left(t_{k:i}\right)\right)$ and $r^{s_\ell}_i \sim \mathcal{N}\left(0, R^{s_\ell}\right)$ are the time interval from $t_k$ to $t_i$, the process noise during the time interval $t_{k:i}$, and the measurement noise from sensor $s_\ell$ at time $t_i$. These terms are defined in Chapter 4.
The target's state $x_k$ can be represented as $x_k = \hat{x}_k\operatorname{Exp}_{I}^{G_x}\left(\tilde{x}_k\right)$, where $\tilde{x}_k$ is a small perturbation. Using the definition of the Taylor series in Subsection 3.4.5 and the affinization process discussed in Section 4.1, the affine observation function of equation (10.2) is
\begin{equation}
z^{s_\ell}_i \approx h^{s_\ell}\left(f\left(\hat{x}_k, 0, t_{k:i}\right), 0\right)\operatorname{Exp}_{I}^{G_{s_\ell}}\left(H^{s_\ell}_{k:i}\tilde{x}_k + H^{s_\ell}_{k:i}q_{k:i} + V^{s_\ell}_i r^{s_\ell}_i\right), \tag{10.3}
\end{equation}
where $H^{s_\ell}_{k:i} = H^{s_\ell}_i F_{k:i}$, and $H^{s_\ell}_i$, $F_{k:i}$ and $V^{s_\ell}_i$ are system model Jacobians defined in Section 4.1 and evaluated at the points $\zeta^f_{k:i} = \left(\hat{x}_k, 0, t_{k:i}\right)$ and $\zeta^{h_{s_\ell}} = \left(f\left(\hat{x}_k, 0, t_{k:i}\right), 0\right)$.
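As a concrete (hypothetical) instance of the composed Jacobian $H^{s_\ell}_{k:i} = H^{s_\ell}_i F_{k:i}$, consider a 1-D constant-velocity model with a position-only sensor; the negative time interval back to $t_i$ folds directly into the observation row:

```python
import numpy as np

# Hypothetical 1-D constant-velocity model: state = (position, velocity).
def F_cv(dt):
    """State-transition Jacobian F_{k:i} for a time interval dt."""
    return np.array([[1.0, dt],
                     [0.0, 1.0]])

H_i = np.array([[1.0, 0.0]])   # position-only observation Jacobian H_i

dt_ki = -0.5                   # t_{k:i}: from t_k back to the earlier t_i
H_ki = H_i @ F_cv(dt_ki)       # composed Jacobian H_{k:i} = H_i F_{k:i}
# H_ki == [[1.0, -0.5]]: the delayed position measurement also carries
# information about the current velocity.
```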
Let $\psi$ denote a Bernoulli random variable indicating that a measurement is target-originated. Using the affine observation function in equation (10.3), we approximate two probabilities: the probability of the measurement $z^{s_\ell}_i$ conditioned on the track's state $x_k$, the measurement originating from the target, the track representing the target $\epsilon_k$, and the measurements $Z^S_{0:k/i}$, denoted $p\left(z^{s_\ell}_i \mid \psi, \epsilon_k, x_k, Z^S_{0:k/i}\right)$ and derived in Lemma 10.1.1; and the probability of the measurement $z^{s_\ell}_i$ conditioned on it originating from the target, the track representing the target $\epsilon_k$, and the measurements $Z^S_{0:k/i}$, denoted $p\left(z^{s_\ell}_i \mid \psi, \epsilon_k, Z^S_{0:k/i}\right)$ and derived in Lemma 10.1.2.
Lemma 10.1.1. Let $Z^{s_\ell}_i$ denote the set of delayed measurements and let $Z^S_{0:k/i}$ denote the set of measurements associated to the track from the initial time to the current time excluding $Z^{s_\ell}_i$. Given the affine observation function defined in equation (10.3) and Assumption 2.14., the Gaussian approximation of $p\left(z^{s_\ell}_i \mid \psi, \epsilon_k, x_k, Z^S_{0:k/i}\right)$ is
\begin{equation}
p\left(z^{s_\ell}_i \mid \psi, \epsilon_k, x_k, Z^S_{0:k/i}\right) \approx \eta\exp\left(-\frac{1}{2}\left(\nu^{s_\ell}_{k:i} - H^{s_\ell}_{k:i}\tilde{x}_k\right)^\top R_{k:i}^{-1}\left(\nu^{s_\ell}_{k:i} - H^{s_\ell}_{k:i}\tilde{x}_k\right)\right), \tag{10.4}
\end{equation}
where
\begin{align}
\nu^{s_\ell}_{k:i} &= \operatorname{Log}_{I}^{G_{s_\ell}}\left(\left(\hat{z}^{s_\ell}_{k:i}\right)^{-1} z^{s_\ell}_i\right) \tag{10.5a}\\
\hat{z}^{s_\ell}_{k:i} &= h^{s_\ell}\left(f\left(\hat{x}_k, 0, t_{k:i}\right), 0\right) \tag{10.5b}\\
R_{k:i} &= H^{s_\ell}_{k:i} Q\left(t_{k:i}\right)\left(H^{s_\ell}_{k:i}\right)^\top + V^{s_\ell}_i R^{s_\ell}\left(V^{s_\ell}_i\right)^\top \tag{10.5c}\\
H^{s_\ell}_{k:i} &= H^{s_\ell}_i F_{k:i}, \tag{10.5d}
\end{align}
$R^{s_\ell}$ is the measurement noise covariance, $Q\left(t_{k:i}\right)$ is the process noise covariance during the time interval $t_{k:i}$, and the Jacobians $H^{s_\ell}_i$, $F_{k:i}$ and $V^{s_\ell}_i$ are defined in equation (4.5) and evaluated at the points $\zeta^f_{k:i} = \left(\hat{x}_k, 0, t_{k:i}\right)$ and $\zeta^{h_{s_\ell}} = \left(f\left(\hat{x}_k, 0, t_{k:i}\right), 0\right)$. The covariances are defined in Chapter 4.
Proof. From equation (10.3) let
\begin{align*}
\nu^{s_\ell}_{k:i} &= \operatorname{Log}_{I}^{G_{s_\ell}}\left(\left(\hat{z}^{s_\ell}_{k:i}\right)^{-1} z^{s_\ell}_i\right)\\
&\approx H^{s_\ell}_{k:i}\tilde{x}_k + H^{s_\ell}_{k:i}q_{k:i} + V^{s_\ell}_i r^{s_\ell}_i.
\end{align*}
The expected value and covariance of $\nu^{s_\ell}_{k:i}$, using the assumption that the measurement and process noise are independent and the fact that the measurement is conditioned on the state $x_k$, are
\begin{align*}
\mathbb{E}\left[\nu^{s_\ell}_{k:i}\right] &= H^{s_\ell}_{k:i}\mathbb{E}\left[\tilde{x}_k\right] + H^{s_\ell}_{k:i}\mathbb{E}\left[q_{k:i}\right] + V^{s_\ell}_i\mathbb{E}\left[r^{s_\ell}_i\right] = H^{s_\ell}_{k:i}\tilde{x}_k\\
\operatorname{cov}\left(\nu^{s_\ell}_{k:i}\right) &= H^{s_\ell}_{k:i}\operatorname{cov}\left(q_{k:i}\right)\left(H^{s_\ell}_{k:i}\right)^\top + V^{s_\ell}_i\operatorname{cov}\left(r^{s_\ell}_i\right)\left(V^{s_\ell}_i\right)^\top\\
&= H^{s_\ell}_{k:i} Q\left(t_{k:i}\right)\left(H^{s_\ell}_{k:i}\right)^\top + V^{s_\ell}_i R^{s_\ell}\left(V^{s_\ell}_i\right)^\top;
\end{align*}
therefore, the Gaussian approximation of $p\left(z^{s_\ell}_i \mid \psi, \epsilon_k, x_k, Z^S_{0:k/i}\right)$ is as defined in equation (10.4).
Lemma 10.1.2. Let $x_{k|0:k/i} = \hat{x}_{k|0:k/i}\operatorname{Exp}_{I}^{G_x}\left(\tilde{x}_{k|0:k/i}\right)$ and $P_{k|0:k/i}$ denote the track's current state and error covariance at time $t_k$ conditioned on the measurements $Z^S_{0:k/i}$. We assume that the probability of the track's state conditioned on the track representing the target and the measurements $Z^S_{0:k/i}$ is known. This probability is denoted and defined as
\begin{equation}
p\left(x_{k|0:k/i} \mid \epsilon_k, Z^S_{0:k/i}\right) = \eta\exp\left(-\frac{1}{2}\tilde{x}^\top_{k|0:k/i} P^{-1}_{k|0:k/i}\tilde{x}_{k|0:k/i}\right). \tag{10.6}
\end{equation}
We also suppose directly the assumptions 2.1., 2.3., 2.6., 2.7. and 2.14., and that the Gaussian approximation of $p\left(z^{s_\ell}_i \mid \psi, \epsilon_k, x_k, Z^S_{0:k/i}\right)$ is known and defined in Lemma 10.1.1. Then the probability of the measurement $z^{s_\ell}_i$ conditioned on it being target-originated, the track representing the target, and the measurements $Z^S_{0:k/i}$ is denoted and defined as
\begin{equation}
p\left(z^{s_\ell}_i \mid \psi, \epsilon_k, Z^S_{0:k/i}\right) \approx \eta\exp\left(-\frac{1}{2}\left(\nu^{s_\ell}_{k:i}\right)^\top\left(S^{s_\ell}_{k:i}\right)^{-1}\nu^{s_\ell}_{k:i}\right), \tag{10.7}
\end{equation}
where
\begin{align}
\nu^{s_\ell}_{k:i} &= \operatorname{Log}_{I}^{G_{s_\ell}}\left(\left(\hat{z}^{s_\ell}_{k:i}\right)^{-1} z^{s_\ell}_i\right) \tag{10.8a}\\
\hat{z}^{s_\ell}_{k:i} &= h^{s_\ell}\left(f\left(\hat{x}_k, 0, t_{k:i}\right), 0\right) \tag{10.8b}\\
S^{s_\ell}_{k:i} &= H^{s_\ell}_{k:i}\left(Q\left(t_{k:i}\right) + P_{k|0:k/i}\right)\left(H^{s_\ell}_{k:i}\right)^\top + V^{s_\ell}_i R^{s_\ell}\left(V^{s_\ell}_i\right)^\top \tag{10.8c}\\
H^{s_\ell}_{k:i} &= H^{s_\ell}_i F_{k:i}, \tag{10.8d}
\end{align}
$\nu^{s_\ell}_{k:i}$ is the innovation term, $\hat{z}^{s_\ell}_{k:i}$ is the estimated measurement, $S^{s_\ell}_{k:i}$ is the innovation covariance, $Q\left(t_{k:i}\right)$ is the process noise covariance, $R^{s_\ell}$ is the measurement noise covariance, $h^{s_\ell}$ is the observation function, $f$ is the state transition function defined in equation (4.2), and the Jacobians $H^{s_\ell}_i$, $F_{k:i}$ and $V^{s_\ell}_i$ are evaluated at the points $\zeta^f_{k:i} = \left(\hat{x}_k, 0, t_{k:i}\right)$ and $\zeta^{h_{s_\ell}} = \left(f\left(\hat{x}_k, 0, t_{k:i}\right), 0\right)$ and are defined in equation (4.5).
Proof. According to the theorem of total probability,
\begin{equation}
p\left(z^{s_\ell}_i \mid \psi, \epsilon_k, Z^S_{0:k/i}\right) = \int_{x_{k|0:k/i}} p\left(z^{s_\ell}_i \mid \psi, \epsilon_k, x_k, Z^S_{0:k/i}\right) p\left(x_{k|0:k/i} \mid \epsilon_k, Z^S_{0:k/i}\right)\,dx_{k|0:k/i}. \tag{10.9}
\end{equation}
Using Lemma 10.1.1 for the Gaussian approximation of $p\left(z^{s_\ell}_i \mid \psi, \epsilon_k, x_k, Z^S_{0:k/i}\right)$ and equation (10.6) for the probability $p\left(x_{k|0:k/i} \mid \epsilon_k, Z^S_{0:k/i}\right)$, the proof is similar to the proof of Lemma 6.2.2.
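In the linear specialization used in the earlier sketches, the innovation covariance of equation (10.8c) is the prior covariance inflated by the process noise accumulated over $t_{k:i}$, and (10.7) is a zero-mean Gaussian in the innovation. A minimal sketch (all model matrices below are hypothetical):

```python
import numpy as np

def delayed_innovation_cov(H_ki, V_i, P_prior, Q_ki, R_s):
    """Innovation covariance S_{k:i} of Eq. (10.8c):
    S = H_{k:i} (Q(t_{k:i}) + P_{k|0:k/i}) H_{k:i}^T + V_i R^s V_i^T."""
    return H_ki @ (Q_ki + P_prior) @ H_ki.T + V_i @ R_s @ V_i.T

def gaussian_likelihood(nu, S):
    """Eq. (10.7): a zero-mean Gaussian N(nu; 0, S) in the innovation nu."""
    d = nu.shape[0]
    norm = np.sqrt((2.0 * np.pi) ** d * np.linalg.det(S))
    return float(np.exp(-0.5 * nu @ np.linalg.solve(S, nu)) / norm)
```

Note that the process noise term enlarges $S$ as the latency $t_{k:i}$ grows, so delayed measurements are automatically down-weighted.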
10.2 Updating With Latent Measurements
The update step using the delayed measurements Zsℓi is very similar to the LG-
IPDAF’s update step in Chapter 6.
Lemma 10.2.1. Let $Z^{s_\ell}_i = \left\{z^{s_\ell}_{i,j}\right\}_{j=1}^{m^{s_\ell}_i}$ denote the set of delayed, track-associated measurements produced from sensor $s_\ell$ at time $t_i$ and received at time $t_k$, with cardinality $m^{s_\ell}_i$; let $Z^S_{0:k/i}$ denote the set of track-associated measurements from the initial time to the current time excluding the measurements $Z^{s_\ell}_i$; and let $\theta^{s_\ell}_{i,j}$ denote the association events corresponding to the measurements $Z^{s_\ell}_i$ as defined in Chapter 6. Also let $\hat{x}_{k|0:k/i}$ and $P_{k|0:k/i}$ denote the state estimate and the error covariance at time $t_k$ conditioned on the measurements $Z^S_{0:k/i}$, and let $p\left(\epsilon_k \mid Z^S_{0:k/i}\right)$ denote the track likelihood at time $t_k$ conditioned on $Z^S_{0:k/i}$.

Given the assumptions 2.3., 2.7., 2.9., 2.10., 2.11., 2.12. and 2.14., the updated track likelihood, state estimate and error covariance that maximize the a posteriori probability of the track's state, conditioned on it representing the target and all the measurements, are
\begin{align}
\hat{x}_{k|k} &= \hat{x}_{k|0:k/i}\operatorname{Exp}_{I}^{G_x}\left(\mu^{s_\ell\,-}_{k:i|k}\right) \tag{10.10}\\
P_{k|k} &= J_r^{G_x}\left(\mu^{s_\ell\,-}_{k:i|k}\right) P^{s_\ell\,-}_{k:i|k}\, J_r^{G_x}\left(\mu^{s_\ell\,-}_{k:i|k}\right)^\top \tag{10.11}\\
p\left(\epsilon_k \mid Z^S_{0:k}\right) &= \frac{\left(1-\alpha^{s_\ell}_{k:i}\right) p\left(\epsilon_k \mid Z^S_{0:k/i}\right)}{1-\alpha^{s_\ell}_{k:i}\, p\left(\epsilon_k \mid Z^S_{0:k/i}\right)} \tag{10.12}
\end{align}
where
\begin{align}
\mu^{s_\ell\,-}_{k:i|k} &= K^{s_\ell}_{k:i}\nu^{s_\ell}_{k:i} \tag{10.13a}\\
\nu^{s_\ell}_{k:i} &= \sum_{j=1}^{m^{s_\ell}_i}\beta^{s_\ell}_{k:i,j}\,\nu^{s_\ell}_{k:i,j} \tag{10.13b}\\
\nu^{s_\ell}_{k:i,j} &= \operatorname{Log}_{I}^{G_{s_\ell}}\left(\left(\hat{z}^{s_\ell}_{k:i}\right)^{-1} z^{s_\ell}_{i,j}\right) \tag{10.13c}\\
\hat{z}^{s_\ell}_{k:i} &= h^{s_\ell}\left(f\left(\hat{x}_{k|0:k/i}, 0, t_{k:i}\right), 0\right) \tag{10.13d}\\
K^{s_\ell}_{k:i} &= P_{k|0:k/i}\left(H^{s_\ell}_{k:i}\right)^\top\left(S^{s_\ell}_{k:i}\right)^{-1} \tag{10.13e}\\
S^{s_\ell}_{k:i} &= H^{s_\ell}_{k:i}\left(Q\left(t_{k:i}\right) + P_{k|0:k/i}\right)\left(H^{s_\ell}_{k:i}\right)^\top + V^{s_\ell}_i R^{s_\ell}\left(V^{s_\ell}_i\right)^\top \tag{10.13f}\\
H^{s_\ell}_{k:i} &= H^{s_\ell}_i F_{k:i} \tag{10.13g}\\
P^{s_\ell,c\,-}_{k:i|k} &= \left(I - K^{s_\ell}_{k:i}H^{s_\ell}_{k:i}\right)P_{k|0:k/i} \tag{10.13h}\\
\tilde{P}^{s_\ell}_{k:i|k} &= K^{s_\ell}_{k:i}\left[\sum_{j=1}^{m^{s_\ell}_i}\beta^{s_\ell}_{k:i,j}\,\nu^{s_\ell}_{k:i,j}\left(\nu^{s_\ell}_{k:i,j}\right)^\top - \nu^{s_\ell}_{k:i}\left(\nu^{s_\ell}_{k:i}\right)^\top\right]\left(K^{s_\ell}_{k:i}\right)^\top \tag{10.13i}\\
P^{s_\ell\,-}_{k:i|k} &= \beta^{s_\ell}_{k:i,0}P_{k|0:k/i} + \left(1-\beta^{s_\ell}_{k:i,0}\right)P^{s_\ell,c\,-}_{k:i|k} + \tilde{P}^{s_\ell}_{k:i|k} \tag{10.13j}\\
\beta^{s_\ell}_{k:i,j} &= \begin{cases}\dfrac{\mathcal{L}^{s_\ell}_{k:i,j}}{1-P^{s_\ell}_D P^{s_\ell}_G + \sum_{j=1}^{m^{s_\ell}_i}\mathcal{L}^{s_\ell}_{k:i,j}} & j = 1,\ldots,m^{s_\ell}_i\\[2ex]\dfrac{1-P^{s_\ell}_G P^{s_\ell}_D}{1-P^{s_\ell}_D P^{s_\ell}_G + \sum_{j=1}^{m^{s_\ell}_i}\mathcal{L}^{s_\ell}_{k:i,j}} & j = 0,\end{cases} \tag{10.13k}\\
\mathcal{L}^{s_\ell}_{k:i,j} &= \frac{P^{s_\ell}_D}{\lambda}\, p\left(z^{s_\ell}_{i,j} \mid \psi, \epsilon_k, Z^S_{0:k/i}\right) \tag{10.13l}\\
\alpha^{s_\ell}_{k:i} &= \begin{cases}P^{s_\ell}_D P^{s_\ell}_G & \text{if } m^{s_\ell}_i = 0\\ P^{s_\ell}_D P^{s_\ell}_G - \sum_{j=1}^{m^{s_\ell}_i}\mathcal{L}^{s_\ell}_{k:i,j} & \text{else},\end{cases} \tag{10.13m}
\end{align}
and the Jacobians $H^{s_\ell}_i$, $F_{k:i}$ and $V^{s_\ell}_i$ are evaluated at the points $\zeta^f_{k:i} = \left(\hat{x}_k, 0, t_{k:i}\right)$ and $\zeta^{h_{s_\ell}} = \left(f\left(\hat{x}_k, 0, t_{k:i}\right), 0\right)$ and are defined in equation (4.5).
Proof. By substituting the probability $p\left(z^{s_\ell}_i \mid \psi, \epsilon_k, x_k, Z^S_{0:k/i}\right)$ defined in equation (10.4) for $p\left(z_k \mid \psi, \epsilon_{k^-}, x_{k^-}, Z_{0:k^-}\right)$ defined in Lemma 6.3.2; the probability $p\left(x_k \mid \epsilon_k, Z^S_{0:k/i}\right)$ defined in equation (10.6) for $p\left(x_k \mid \epsilon_k, Z_{0:k^-}\right)$ defined in equation (6.10); the probability $p\left(z^{s_\ell}_i \mid \psi, \epsilon_k, Z^S_{0:k/i}\right)$ defined in equation (10.7) for $p\left(z^{s_\ell}_k \mid \psi, \epsilon_k, Z_{0:k^-}\right)$ defined in equation (6.16); the set $Z^S_{0:k/i}$ for $Z_{0:k^-}$; and the measurement $z^{s_\ell}_i$ for $z_k$, the proof follows directly from Theorem 6.4.1.
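The weights and moments in equations (10.13) follow the usual probabilistic data association pattern. A compact numpy sketch of the association weights (10.13k), the combined innovation (10.13b), and the covariance terms (10.13h)–(10.13j), again specialized to the vector-space case (the gain, likelihoods, $P_D$ and $P_G$ below are made-up inputs):

```python
import numpy as np

def pda_update(innovations, likelihoods, P_prior, K, H, PD, PG):
    """Sketch of Eqs. (10.13): association weights beta (10.13k), combined
    innovation (10.13b), and fused covariance (10.13h)-(10.13j)."""
    L = np.asarray(likelihoods)                 # L_{k:i,j} of Eq. (10.13l)
    denom = 1.0 - PD * PG + L.sum()
    beta = L / denom                            # beta_j, j = 1..m
    beta0 = (1.0 - PD * PG) / denom             # beta_0: no measurement correct

    nu = sum(b * v for b, v in zip(beta, innovations))          # (10.13b)
    Pc = (np.eye(P_prior.shape[0]) - K @ H) @ P_prior           # (10.13h)
    spread = sum(b * np.outer(v, v) for b, v in zip(beta, innovations))
    Pt = K @ (spread - np.outer(nu, nu)) @ K.T                  # (10.13i)
    P = beta0 * P_prior + (1.0 - beta0) * Pc + Pt               # (10.13j)
    mu = K @ nu                                                 # (10.13a)
    return mu, P, beta0, beta
```

Note that the weights $\beta_0, \beta_1, \ldots, \beta_m$ sum to one, so the fused covariance blends the no-detection and detection hypotheses plus a spread-of-innovations term.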
Jacobians capture the behavior of functions locally. The larger the time interval $t_{k:i}$, the larger the perturbations $\tilde{x}_k$ and $q_{k:i}$ will be, and the less accurate the affine observation function in equation (10.3) will be. This implies that the larger the time interval $t_{k:i}$, the less accurate the update step will be. This statement assumes that the observation function is nonlinear, since in the linear case the Jacobians capture the global behavior of the functions perfectly.
CHAPTER 11. END-TO-END MTT FRAMEWORK
In this chapter we present an end-to-end target tracking and following architecture for
unmanned aerial vehicles (UAVs) that track ground-based objects using a fixed monocular
camera, an inertial measurement unit (IMU), an on-board computer, a flight control unit,
and a sensor to measure altitude. An illustration of the target tracking scenario is shown in
Fig. 11.1.
Ground plane
Figure 11.1: A depiction of the target tracking scenario where the target is moving on the groundand is being tracked by a monocular camera mounted on a UAV. The black dashed lines representthe camera’s field of view and the orange arrow represents the line of sight vector from the camerato the target.
Any vision-based target tracking method will need to extract measurements from the
images. Recently, there has been extensive research in object detection and identification
using deep neural networks such as YOLO [102], R-CNN [103] and others [104]. These
methods achieve high accuracy but are computationally expensive. We discuss other methods
in Section 11.2.
To follow the detected and tracked targets, a control approach known as image-based visual servoing (IBVS) is commonly implemented [105]. IBVS in its most basic form is implemented using a proportional-integral-derivative (PID) controller [106], which is simple to implement and responsive but causes increased error and overshoot due to UAV rotations [107]. The IBVS-Sphere (IBVS-Sph) effectively combats this problem by mapping the image plane to a virtual spherical image plane around the camera; however, this and similar spherical mapping techniques become unstable as the target moves beneath the UAV [107]. Another solution is to map a virtual image plane parallel to the ground directly above the detected targets, which reduces error due to the UAV's pose and discrepancies caused by spherical mapping. A few examples of this approach, such as the desired robust integral of the sign of error (DCRISE), are [108], [109]. The controller described in this chapter takes advantage of this virtual image plane mapping and is discussed in more detail in Section 11.5.
This chapter is based on the work in [110], which uses the original Recursive-RANSAC
(R-RANSAC) algorithm presented in [63] to track targets. In Chapter 12 we show how to
use G-MTT instead of R-RANSAC in the end-to-end MTT framework to track targets on
SE (2).
The rest of this chapter is organized as follows. In Section 11.1 we present the tracking
and following architecture followed by the visual front end, R-RANSAC, and the controller in
Sections 11.2, 11.3 and 11.5. Finally we discuss our results and conclusions in Sections 11.6
and 11.7.
11.1 Architecture
In this chapter we assume that a monocular camera is rigidly mounted on a multirotor
equipped with an IMU, an on-board computer, an autopilot, and an altitude sensor. The camera sends
images into the visual front-end algorithm. The visual front-end algorithm is responsible for
extracting point measurements of targets from the images and computing the homography
and essential matrices, as shown in Fig. 11.2.
The visual front-end produces point measurements that are processed by the tracking
algorithm, labeled R-RANSAC in Fig. 11.2, which produces tracks of targets. Target tracking
is done in the image frame on the image plane, which requires transforming measurements and tracks expressed in the previous image plane to the current image plane as the UAV moves. This R-RANSAC configuration is sensor mode one, as discussed in Chapter 2.
Since we assume that the target is moving on fairly planar ground, the scene ob-
served by the camera is planar. This allows us to use the homography to transform the
measurements and tracks to the current image frame as shown in Fig. 11.2.
The multiple target tracking algorithm R-RANSAC sends the tracks to the track
selector which determines which track to follow. The selected track is then passed to the
controller which sends commands to the flight control unit (FCU) that enables target fol-
lowing, as shown in Fig. 11.2.
[Fig. 11.2 block diagram: camera images enter the visual front end (KLT feature tracker, homography and essential matrix estimation, moving/non-moving segmentation, and other measurement extractors), which outputs measurements; R-RANSAC transforms measurements and tracks to the current camera frame and initializes and manages tracks; a track selector passes the track to follow to the controller, which sends acceleration and yaw rate commands to the multirotor.]
Figure 11.2: Target tracking and following architecture.
11.2 Visual Front End
This section describes the visual front-end shown in Fig. 11.2. Images from the
camera along with the camera’s intrinsic parameters are given to the visual front-end, which
is responsible for producing measurements and calculating the homography and essential
matrix. The visual front end uses a variety of algorithms to extract measurements. Some of the algorithms we have used include image differencing, color segmentation, YOLO, and apparent feature motion.
The image difference method finds the difference between two images, looks for dis-
parities of certain shapes and sizes (blobs) which are caused by moving objects and takes the
center of the blobs as measurements. When the camera is moving, we use the homography
to transform one image into the frame of the other image to take the difference between the
images. The color segmentation method looks for blobs with specific color, size, and shape
profile to extract point measurements. This method assumes that the target of interest is a
unique color and is in general not useful except in simple controlled environments.
The method that we use to extract measurements is called apparent feature motion
and is described in the remainder of this section and is shown graphically in Fig. 11.2. In
particular, the KLT feature tracker extracts matching points between consecutive images.
Using the matching points, the homography and essential matrix are computed. The match-
ing points that are outliers to the homography and essential matrix are considered moving
and given to the tracking algorithm. To help describe the apparent motion algorithm we
will consider the scenario of tracking a car using a monocular camera fixed to a UAV as
illustrated in Fig. 11.3.
Figure 11.3: A depiction of tracking a car using a monocular camera fixed to a UAV.
11.2.1 KLT Feature Tracker
To compute the homography matrix, the essential matrix and to calculate apparent
feature motion, good features need to be tracked between consecutive frames. A common
and popular method is to use Good Features to Track [111] to select good features from the
previous image and find their corresponding features in the current image using the Kanade-
Lucas-Tomasi (KLT) feature tracker [112], [113]. The combination of the two algorithms
yields matching features. These algorithms can be implemented using the OpenCV functions
goodFeaturesToTrack() and calcOpticalFlowPyrLK() [114]. An illustration of how these
algorithms work to find matching features is shown in Fig. 11.4.
11.2.2 Estimate Homography
The homography describes the transformation between image frames and maps static
features from one image to static features in the other image provided that the image scene
is planar. The matching features obtained from the KLT Tracker are used to estimate the
homography. The relevant geometry of the Euclidean homography is shown in Fig. 11.5.
Suppose that $p_f$ is a feature point that lies on a plane defined by the (unit) normal vector $n$. Let $p^a_{f/a}$ and $p^b_{f/b}$ be the position of $p_f$ relative to frames $a$ and $b$, expressed in those frames respectively. Then, as shown in Fig. 11.5, we have
\begin{equation*}
p^b_{f/b} = R^b_a p^a_{f/a} + p^b_{a/b},
\end{equation*}
where $R^b_a$ denotes the rotation between frames and $p^b_{a/b}$ denotes the translation between frames.

Let $d_a$ be the distance from the origin of frame $a$ to the planar scene, and observe that
\begin{equation*}
d_a = n^\top p^a_{f/a} \implies \frac{n^\top p^a_{f/a}}{d_a} = 1.
\end{equation*}
Therefore, we get that
\begin{equation}
p^b_{f/b} = \left(R^b_a + \frac{p^b_{a/b}}{d_a}\, n^\top\right) p^a_{f/a}. \tag{11.1}
\end{equation}
(a) The algorithm good features to track finds features to track. These features are represented as red dots.
(b) A depiction of the car moving and the camera rotating. Note how the scene is rotated and translated as compared to the scene in Fig. 11.4a.
(c) The KLT feature tracker finds matching points between the previous image frame and the current image frame. The matching points in the current image frame are represented as green dots.
Figure 11.4: A depiction showing how the good features to track algorithm and the KLT feature tracker work to find matching points.
Let $p^a_{f/a} = \left(p_{xa}, p_{ya}, p_{za}\right)^\top$ and $p^b_{f/b} = \left(p_{xb}, p_{yb}, p_{zb}\right)^\top$, and let $\epsilon^a_{f/a} = \left(p_{xa}/p_{za},\, p_{ya}/p_{za},\, 1\right)^\top$ represent the normalized homogeneous coordinates of $p^a_{f/a}$ projected onto image plane $a$, and similarly for $\epsilon^b_{f/b}$. Then Equation (11.1) can be written as
\begin{equation}
\frac{p_{zb}}{p_{za}}\,\epsilon^b_{f/b} = \left(R^b_a + \frac{p^b_{a/b}}{d_a}\, n^\top\right)\epsilon^a_{f/a}. \tag{11.2}
\end{equation}
Figure 11.5: The geometry for the derivation of the homography matrix between two camera poses.
Defining the scalar $\gamma_f = p_{zb}/p_{za}$, we get
\begin{equation}
\gamma_f\,\epsilon^b_{f/b} = H^b_a\,\epsilon^a_{f/a}, \tag{11.3}
\end{equation}
where
\begin{equation}
H^b_a = \left(R^b_a + \frac{p^b_{a/b}}{d_a}\, n^\top\right) \tag{11.4}
\end{equation}
is called the Euclidean homography matrix between frames $a$ and $b$ [115]. Equation (11.3) demonstrates that the Euclidean homography matrix $H^b_a$ transforms the normalized homogeneous pixel location of $p_f$ in frame $a$ into a homogeneous pixel location of $p_f$ in frame $b$. The scaling factor $\gamma_f$, which is feature-point dependent, is required to put $\epsilon^b_{f/b}$ in normalized homogeneous coordinates, where the third element is unity.
The homography can be calculated using the OpenCV function findHomography(), which combines a four-point algorithm with a RANSAC [60], [116] process to find the homography matrix that best fits the data. The findHomography() function scales the elements of $H$ so that the $(3, 3)$ element is equal to one. Feature pairs that do not satisfy Equation (11.3) are labeled as features that are potentially moving in the environment.
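The Euclidean homography of equation (11.4) can be checked numerically: construct $H^b_a$ from a known relative pose and plane, project a planar point into both frames, and confirm equation (11.3). The pose and point below are made up for illustration; in practice $H$ is estimated from feature pairs with findHomography():

```python
import numpy as np

# Hypothetical setup: frame b is frame a translated 0.5 m along x,
# observing a plane with normal n at distance d_a (both in frame a).
R_ba = np.eye(3)
t_ba = np.array([0.5, 0.0, 0.0])      # translation p^b_{a/b}
n = np.array([0.0, 0.0, 1.0])
d_a = 5.0

H = R_ba + np.outer(t_ba, n) / d_a    # Euclidean homography, Eq. (11.4)

p_a = np.array([1.0, 2.0, 5.0])       # planar feature: n . p_a = d_a
p_b = R_ba @ p_a + t_ba               # rigid transform, Eq. (11.1)

eps_a = p_a / p_a[2]                  # normalized homogeneous coordinates
eps_b = p_b / p_b[2]
gamma = p_b[2] / p_a[2]               # scale factor gamma_f

# Eq. (11.3): gamma * eps_b == H @ eps_a for any point on the plane.
```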
Applying the homography to the scenario in Fig. 11.4c transforms the measurements from the previous image (represented as red dots) to the current image that contains the matching features (represented as green dots), as illustrated in Fig. 11.6. Note that the features corresponding to moving objects are not mapped onto their counterpart features by the homography, and that there is a distance between them. This illustrates that we can detect moving features by finding outliers to the homography.
Figure 11.6: A depiction of applying the homography to the scenario in Fig. 11.4c where thehomography transforms the measurements from the previous image (represented as red dots) tothe current image that contains the matching features (represented as green dots).
11.2.3 Estimate Essential Matrix
The homography works well to segment between moving and non-moving features
provided that the scene is planar; however, that is rarely the case due to trees, lamp posts,
and other objects that stick out of the ground. The objects that do not lie on the same
plane used to describe the homography will be outliers to the homography and appear as
moving features even if they are static. The essential matrix provides a strategy to filter out
the static features using the epipolar constraint.
Fig. 11.7 shows the essence of epipolar geometry. Let $p_f$ be the 3D position of a feature point in the world, and let $p^a_{f/a}$ be the position vector of the feature point relative to frame $\mathcal{F}_a$ expressed in frame $\mathcal{F}_a$, and similarly for $p^b_{f/b}$.
Figure 11.7: Epipolar geometry.
The relationship between $p^a_{f/a}$ and $p^b_{f/b}$ is given by
\begin{equation}
p^b_{f/b} = R^b_a p^a_{f/a} + p^b_{a/b}. \tag{11.5}
\end{equation}
Multiplying both sides of Equation (11.5) on the left by $\left[p^b_{a/b}\right]_{3\times}$ gives
\begin{equation*}
\left[p^b_{a/b}\right]_{3\times} p^b_{f/b} = \left[p^b_{a/b}\right]_{3\times} R^b_a p^a_{f/a},
\end{equation*}
where $[\cdot]_{3\times}$ is defined in equation (C.12). Since $\left[p^b_{a/b}\right]_{3\times} p^b_{f/b} = p^b_{a/b}\times p^b_{f/b}$ must be orthogonal to $p^b_{f/b}$, we have that
\begin{equation}
p^{b\,\top}_{f/b}\left[p^b_{a/b}\right]_{3\times} R^b_a\, p^a_{f/a} = 0. \tag{11.6}
\end{equation}
Dividing Equation (11.6) by the norm of $p^b_{a/b}$ and defining
\begin{equation*}
t^b_{a/b} = \frac{p^b_{a/b}}{\left\|p^b_{a/b}\right\|}
\end{equation*}
gives
\begin{equation}
p^{b\,\top}_{f/b}\left[t^b_{a/b}\right]_{3\times} R^b_a\, p^a_{f/a} = 0. \tag{11.7}
\end{equation}
The matrix
\begin{equation}
E^b_a = \left[t^b_{a/b}\right]_{3\times} R^b_a \tag{11.8}
\end{equation}
is called the essential matrix and is completely defined by the relative pose $\left(R^b_a, p^b_{a/b}\right)$. Dividing Equation (11.7) by the distances to the feature in each frame gives
\begin{equation}
\epsilon^{b\,\top}_{f/b}\, E^b_a\, \epsilon^a_{f/a} = 0, \tag{11.9}
\end{equation}
where $\epsilon^a_{f/a}$ and $\epsilon^b_{f/b}$ are the normalized homogeneous image coordinates of the feature in frame $a$ (respectively frame $b$). This equation is the epipolar constraint and serves as a constraint between static point correspondences. The epipoles $e_a$ and $e_b$ shown in Fig. 11.7 are the intersections of the line connecting $\mathcal{F}_a$ and $\mathcal{F}_b$ with each image plane. The epipolar lines $\ell_a$ and $\ell_b$ are the intersections of the plane $\left(\mathcal{F}_a, \mathcal{F}_b, p_f\right)$ with the image planes. The epipolar constraint with the pixel $\epsilon^a_{f/a}$ is satisfied by any pixel on the epipolar line $\ell_b$. In other words, if $p_f$ is a static feature, or its motion is along the epipolar line, then its point correspondence $\epsilon^a_{f/a}$ and $\epsilon^b_{f/b}$ will satisfy the epipolar constraint [117].
The essential matrix can be calculated using the OpenCV function findEssentialMat(), which uses the five-point Nistér algorithm [118] coupled with a RANSAC process.
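Similarly, the epipolar constraint (11.9) can be verified on synthetic geometry: build $E^b_a$ from equation (11.8) and evaluate the residual for a static feature (the pose and feature point below are illustrative):

```python
import numpy as np

def skew(v):
    """Cross-product matrix [v]_{3x}, as in Eq. (C.12)."""
    return np.array([[0.0, -v[2], v[1]],
                     [v[2], 0.0, -v[0]],
                     [-v[1], v[0], 0.0]])

# Hypothetical relative pose between camera frames a and b.
R_ba = np.eye(3)
p_ba = np.array([0.5, 0.0, 0.0])       # p^b_{a/b}
t_ba = p_ba / np.linalg.norm(p_ba)     # unit translation t^b_{a/b}

E = skew(t_ba) @ R_ba                  # essential matrix, Eq. (11.8)

# A static feature point seen from both frames.
p_a = np.array([1.0, 2.0, 5.0])
p_b = R_ba @ p_a + p_ba
eps_a = p_a / p_a[2]
eps_b = p_b / p_b[2]

residual = eps_b @ E @ eps_a           # epipolar constraint, Eq. (11.9)
# residual is zero (to numerical precision) for a static feature.
```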
11.2.4 Moving / Non-moving Segmentation
This section describes the “Moving / non-moving Segmentation” block shown in Fig. 11.2. The purpose of this block is to segment the tracked feature pairs into those that are stationary in the environment and those that are moving relative to the environment. As shown in Fig. 11.2, the inputs to the “Moving / non-moving Segmentation” block at time $k$ are the homography $H^k_{k-1}$, the essential matrix $E^k_{k-1}$, and the set of matching feature points $\mathcal{M}_k = \{(\epsilon^k_i, \epsilon^{k-1}_i)\}$ between image $I_{k-1}$ and image $I_k$.
When the camera is mounted on a moving UAV observing a scene where most of
the objects in the scene are not moving, the homography computed from planar matching
features will correspond to the motion of the UAV. As previously stated, moving objects or
static objects not coplanar with the features used to compute the homography will appear
to have motion when their corresponding features from the previous image are mapped to
the current image. Therefore, given the set of matching feature points $\mathcal{M}_k$, we can segment $\mathcal{M}_k$ into two disjoint sets, $\mathcal{M}^{\text{in}}_k$ for inliers and $\mathcal{M}^{\text{out}}_k$ for outliers, where, for some small $\eta_1 > 0$,

$\mathcal{M}^{\text{in}}_k = \left\{ (\epsilon^k_i, \epsilon^{k-1}_i) \in \mathcal{M}_k \;\middle|\; \left\| \gamma_i \epsilon^k_i - H^k_{k-1} \epsilon^{k-1}_i \right\| \le \eta_1 \right\}$

$\mathcal{M}^{\text{out}}_k = \left\{ (\epsilon^k_i, \epsilon^{k-1}_i) \in \mathcal{M}_k \;\middle|\; \left\| \gamma_i \epsilon^k_i - H^k_{k-1} \epsilon^{k-1}_i \right\| > \eta_1 \right\}$.
Therefore $\mathcal{M}^{\text{in}}_k$ contains all matching feature pairs that are explained by the homography $H^k_{k-1}$, and therefore correspond to the ego-motion of the UAV, and $\mathcal{M}^{\text{out}}_k$ contains all matching feature pairs that are not explained by the homography $H^k_{k-1}$, and therefore potentially correspond to moving objects in the environment. Fig. 11.8 illustrates the application of this homography segmentation scheme, where the feature outliers $\mathcal{M}^{\text{out}}_k$ have been retained.
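The inlier/outlier split can be sketched in a few lines. This is a minimal illustration (the function name and data layout are ours, not from the dissertation's implementation); matches are assumed to be stored as homogeneous 3-vectors with last entry 1, so scaling $H^k_{k-1}\epsilon^{k-1}_i$ to unit last entry plays the role of $\gamma_i$:

```python
import numpy as np

def segment_matches(matches, H, eta1):
    """Split matched pairs into homography inliers M_in and outliers M_out.

    matches: list of (eps_k, eps_km1) homogeneous pixel pairs (last entry 1).
    H:       3x3 homography H^k_{k-1} mapping image k-1 into image k.
    eta1:    threshold on the reprojection residual.
    """
    M_in, M_out = [], []
    for eps_k, eps_km1 in matches:
        mapped = H @ eps_km1
        mapped = mapped / mapped[2]       # rescale so the comparison matches Eq. above
        if np.linalg.norm(eps_k - mapped) <= eta1:
            M_in.append((eps_k, eps_km1))
        else:
            M_out.append((eps_k, eps_km1))
    return M_in, M_out

# A static match and a moving match under an identity homography (pure illustration).
H = np.eye(3)
static = (np.array([100.0, 50.0, 1.0]), np.array([100.0, 50.0, 1.0]))
moving = (np.array([120.0, 50.0, 1.0]), np.array([100.0, 50.0, 1.0]))
M_in, M_out = segment_matches([static, moving], H, eta1=2.0)
print(len(M_in), len(M_out))
```

The static pair lands in $\mathcal{M}^{\text{in}}_k$ and the displaced pair in $\mathcal{M}^{\text{out}}_k$.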
Figure 11.8: Motion detection using the homography matrix. Matching features are shown in red and blue. The set $\mathcal{M}^{\text{in}}_k$ is shown in blue, and the set $\mathcal{M}^{\text{out}}_k$ is shown in red.
The homography matrix provides good moving/non-moving segmentation if either
the motion of the UAV is purely rotational, or if the surrounding environment is planar.
A planar environment may be an adequate assumption for a high-flying fixed-wing vehicle moving over mostly flat terrain. However, it is not a good assumption for a multirotor UAV moving in complex 3D environments, where non-planar, stationary features will appear to be moving due to parallax. In that case, the potentially moving features $\mathcal{M}^{\text{out}}_k$ need to be further processed to discard features from the 3D scene that are not moving. Our approach uses the epipolar constraint given in Equation (11.9), which is satisfied by stationary 3D points.
Therefore, potentially moving 3D points are given by

$\mathcal{M}^{\text{moving}}_k = \left\{ (\epsilon^k_i, \epsilon^{k-1}_i) \in \mathcal{M}^{\text{out}}_k \;\middle|\; \left| \epsilon^{k\top}_i E^k_{k-1} \epsilon^{k-1}_i \right| > \eta_2 \right\}$

for some small $\eta_2 > 0$.
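This second filter is a one-line set comprehension over $\mathcal{M}^{\text{out}}_k$. The sketch below (our own illustrative values; the essential matrix shown corresponds to a camera translating along its $x$ axis with no rotation) keeps only pairs that violate the epipolar constraint, so a parallax-induced outlier is discarded while a truly moving feature is retained:

```python
import numpy as np

def moving_features(M_out, E, eta2):
    """Keep only outlier pairs that also violate the epipolar constraint (Eq. 11.9)."""
    return [(e_k, e_km1) for e_k, e_km1 in M_out
            if abs(e_k @ E @ e_km1) > eta2]

# E = [t]_{3x} R for R = I and unit translation t = [1, 0, 0].
E = np.array([[0.0, 0.0, 0.0],
              [0.0, 0.0, -1.0],
              [0.0, 1.0, 0.0]])

# Parallax: apparent motion along the epipolar line (horizontal, like the camera).
parallax = (np.array([0.4, 0.2, 1.0]), np.array([0.3, 0.2, 1.0]))
# A feature with vertical image motion, which no static 3D point can produce here.
truly_moving = (np.array([0.4, 0.5, 1.0]), np.array([0.4, 0.2, 1.0]))

M_moving = moving_features([parallax, truly_moving], E, eta2=1e-3)
print(len(M_moving))
```

Only the vertically moving pair survives, matching the discussion of parallax below.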
Fig. 11.9 illustrates the moving/non-moving segmentation scheme using video from a multirotor flying in close proximity to 3D terrain. The blue feature points correspond to features on 3D objects, which due to parallax are not discarded by the homography threshold and are therefore elements of $\mathcal{M}^{\text{out}}_k$. However, these points satisfy the epipolar constraint and therefore are not flagged as moving features. The red dots in Fig. 11.9 correspond to $\mathcal{M}^{\text{moving}}_k$ and are actually moving in the scene. One drawback to this approach is that features that are moving along the epipolar lines (i.e., moving in the same direction as the camera) will be filtered out. However, this can be mitigated by controlling the camera so that its motion is not aligned with the target's motion.
Figure 11.9: Motion detection using the essential matrix. Matching pairs in $\mathcal{M}^{\text{out}}_k$ are shown in blue and red, where the red features are in $\mathcal{M}^{\text{moving}}_k$.
11.3 R-RANSAC Multiple Target Tracker
As stated at the beginning of this chapter, this chapter is based on the paper [110] that uses the original R-RANSAC algorithm, which we briefly presented in Subsection 1.2.9. Since many of the components of R-RANSAC have already been discussed, we keep this section short by only discussing key aspects of the three main phases of R-RANSAC: data management, track initialization, and track management. During our discussion, we will emphasize some of the differences between R-RANSAC and G-MTT.
Our implementation of R-RANSAC tracks objects in the current camera frame on the current image plane using a linear time-invariant (LTI), constant-velocity model. The track's state is composed of the target's position and velocity projected onto the image plane, i.e., its pixel location and pixel velocity. Note that this is not a geometric representation of the target since there is no notion of the target's heading.
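For concreteness, the LTI constant-velocity model on the image plane can be sketched as follows. This is a minimal illustration with a state ordering and frame period of our choosing, not the dissertation's exact implementation:

```python
import numpy as np

def cv_model(dt):
    """Discrete-time constant-velocity transition matrix for x = [px, py, vx, vy]."""
    A = np.eye(4)
    A[0, 2] = dt   # px advances by vx * dt
    A[1, 3] = dt   # py advances by vy * dt
    return A

# A track state: pixel location (320, 240) with pixel velocity (4, -2) per second.
x = np.array([320.0, 240.0, 4.0, -2.0])
x_next = cv_model(dt=0.1) @ x
print(x_next)   # position advances by velocity * dt; velocity is unchanged
```

The measurement model simply extracts the pixel location, since only position is observed by the camera.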
11.3.1 Data Management
R-RANSAC uses the probabilistic data association filter (PDAF) to associate measurements to tracks, propagate tracks, and update tracks using the associated measurements. The track likelihood is the number of track-associated measurements during the sliding time window $T_W$ divided by the duration of the time window.
Since tracking is done on the image plane in the camera frame, when new measurements are received, the previous measurements and tracks need to be transformed to the current camera frame. The rest of this subsection describes the “Transform measurements and tracks to current camera frame” block shown in Fig. 11.2.
In the previous section we have shown how uncalibrated pixels are transformed between frames by the homography matrix as

$\gamma_f\, \bar{\epsilon}^b_{f/b} = H^b_a\, \bar{\epsilon}^a_{f/a}$.

We decompose the homography matrix into block elements as

$H^b_a \triangleq \begin{bmatrix} H_1 & h_2 \\ h_3^\top & h_4 \end{bmatrix}$,

and the homogeneous image coordinates as $\bar{\epsilon} \triangleq \begin{pmatrix} \epsilon^\top & 1 \end{pmatrix}^\top$, where $H_1 \in \mathbb{R}^{2\times 2}$, $h_2 \in \mathbb{R}^{2\times 1}$, $h_3^\top \in \mathbb{R}^{1\times 2}$, and $h_4 \in \mathbb{R}$.
Given the relationship

$\gamma_f \begin{bmatrix} \epsilon^b_{f/b} \\ 1 \end{bmatrix} = \begin{bmatrix} H_1 & h_2 \\ h_3^\top & h_4 \end{bmatrix} \begin{bmatrix} \epsilon^a_{f/a} \\ 1 \end{bmatrix} \iff \begin{bmatrix} \gamma_f\, \epsilon^b_{f/b} \\ \gamma_f \end{bmatrix} = \begin{bmatrix} H_1 \epsilon^a_{f/a} + h_2 \\ h_3^\top \epsilon^a_{f/a} + h_4 \end{bmatrix}$,

which implies that

$\epsilon^b_{f/b} = \dfrac{H_1 \epsilon^a_{f/a} + h_2}{h_3^\top \epsilon^a_{f/a} + h_4} \quad \text{and} \quad \gamma_f = h_3^\top \epsilon^a_{f/a} + h_4$.
Defining the function

$g(\epsilon, H) \triangleq \dfrac{H_1 \epsilon + h_2}{h_3^\top \epsilon + h_4}$, (11.10)

we have that 2D pixels are transformed between frames as $\epsilon^b_{f/b} = g(\epsilon^a_{f/a}, H^b_a)$. Therefore, the 2D pixel velocity is transformed as

$\dot{\epsilon}^b_{f/b} = \left. \dfrac{\partial g}{\partial \epsilon} \right|_{\epsilon = \epsilon^a_{f/a}} \dot{\epsilon}^a_{f/a} = G(\epsilon^a_{f/a}, H^b_a)\, \dot{\epsilon}^a_{f/a}$, (11.11)

where

$G(\epsilon, H) = \dfrac{(h_3^\top \epsilon + h_4)\, H_1 - (H_1 \epsilon + h_2)\, h_3^\top}{(h_3^\top \epsilon + h_4)^2}$. (11.12)
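Equations (11.10) and (11.12) translate directly to code, and the Jacobian formula can be checked against finite differences. This sketch uses a randomly perturbed homography of our own choosing:

```python
import numpy as np

def g(eps, H):
    """Pixel transform of Eq. (11.10): eps is a 2-vector, H a 3x3 homography."""
    H1, h2, h3, h4 = H[:2, :2], H[:2, 2], H[2, :2], H[2, 2]
    return (H1 @ eps + h2) / (h3 @ eps + h4)

def G(eps, H):
    """Jacobian of g with respect to eps, Eq. (11.12)."""
    H1, h2, h3, h4 = H[:2, :2], H[:2, 2], H[2, :2], H[2, 2]
    denom = h3 @ eps + h4
    return (denom * H1 - np.outer(H1 @ eps + h2, h3)) / denom**2

# Verify Eq. (11.12) against a central-difference Jacobian.
rng = np.random.default_rng(0)
H = np.eye(3) + 0.1 * rng.standard_normal((3, 3))
eps = np.array([0.3, -0.2])
h = 1e-6
J_fd = np.column_stack([(g(eps + h * e, H) - g(eps - h * e, H)) / (2 * h)
                        for e in np.eye(2)])
print(np.allclose(G(eps, H), J_fd, atol=1e-6))
```

The analytic Jacobian matches the numerical one, confirming the quotient-rule derivation above.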
The next theorem shows how position and velocity covariances are transformed between images.

Theorem 11.3.1. Suppose that $H^b_a$ is the homography matrix between frames $a$ and $b$, and that $\epsilon^a_{f/a}$ and $\dot{\epsilon}^a_{f/a}$ are random vectors representing the pixel location and velocity of feature $f$ in frame $a$ with means $\mu^a_{f/a}$ and $\dot{\mu}^a_{f/a}$, respectively, and covariances $\Sigma^a_p$ and $\Sigma^a_v$, respectively. Suppose that $\epsilon^a_{f/a}$ is transformed according to $\epsilon^b_{f/b} = g(\epsilon^a_{f/a}, H^b_a)$, where $g$ is defined in Equation (11.10). Then the mean and covariance of $\epsilon^b_{f/b}$ and $\dot{\epsilon}^b_{f/b}$ are given by

$\mu^b = g(\mu^a, H^b_a)$
$\dot{\mu}^b = G(\mu^a, H^b_a)\, \dot{\mu}^a$
$\Sigma^b_p = G(\mu^a, H^b_a)\, \Sigma^a_p\, G^\top(\mu^a, H^b_a)$
$\Sigma^b_v = G(\mu^a, H^b_a)\, \Sigma^a_v\, G^\top(\mu^a, H^b_a)$,

where $G$ is defined in Equation (11.12).
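The theorem's four equations amount to a single first-order (linearized) transform of the track. A minimal sketch (function name ours; a sanity check with the identity homography, which should leave the track unchanged):

```python
import numpy as np

def transform_track(mu, mu_dot, Sigma_p, Sigma_v, H):
    """Transform a track's means and covariances through H per Theorem 11.3.1."""
    H1, h2, h3, h4 = H[:2, :2], H[:2, 2], H[2, :2], H[2, 2]
    denom = h3 @ mu + h4
    G = (denom * H1 - np.outer(H1 @ mu + h2, h3)) / denom**2   # Eq. (11.12)
    mu_b = (H1 @ mu + h2) / denom                              # g(mu, H), Eq. (11.10)
    return mu_b, G @ mu_dot, G @ Sigma_p @ G.T, G @ Sigma_v @ G.T

# With an identity homography the track is unchanged.
mu, mu_dot = np.array([10.0, 5.0]), np.array([1.0, -0.5])
Sp = Sv = np.diag([2.0, 3.0])
mu_b, mu_dot_b, Sp_b, Sv_b = transform_track(mu, mu_dot, Sp, Sv, np.eye(3))
print(np.allclose(mu_b, mu), np.allclose(Sp_b, Sp))
```

Because the covariances transform by congruence with $G$, they remain symmetric positive semidefinite after the transform.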
11.3.2 Track Initialization
Given that the measurements and tracks are expressed in and with respect to the same coordinate frame, we use the new measurements that do not belong to any existing track to initialize new tracks. All of the non-target-associated measurements are stored in a single batch of data. This differs from G-MTT, which organizes non-target-associated measurements into clusters.
To help illustrate the track initialization scheme, suppose that there is a single target in the camera's field of view and a batch of measurements as depicted in Fig. 11.10a. Some of the camera's measurements correspond to the target while others are spurious false measurements. Since there are both target-originated and false measurements, we need a way to associate measurements to their respective targets or to noise. We do this using the standard RANSAC algorithm.
RANSAC selects a minimum subset of measurements such that the target's state is observable according to the system model. For our specific application, we assume an LTI constant-velocity model, and that the target's pixel location is measured by the camera using the apparent-motion algorithm discussed in Section 11.2. Therefore, the minimum subset needs at least two measurements but can include more. We add the additional constraint that at least one of the measurements in the minimum subset is from the latest time step. An illustration of the minimum subset is shown in Fig. 11.10b, where the measurements in the minimum subset are circled in red.
Using the minimum subset of measurements and the least squares regression algo-
rithm discussed in [63], we calculate a hypothetical state estimate of the target’s state and
reconstruct its trajectory hypothesis using the system model. The trajectory hypothesis is
(a) A batch of measurements with a single target.

(b) A minimum subset of measurements needed to create a track hypothesis.

(c) Black dots indicate measurements, and the current batch of measurements is denoted with $z_*$. A particular minimum subset is denoted with red circles, including the current measurement $z_k$. A track hypothesis generated from a minimum subset of measurements is depicted with the red curve.

Figure 11.10: A depiction of the first few steps of RANSAC.
used to identify other measurement inliers (i.e., measurements that are close to the trajectory hypothesis). A measurement is an inlier if it is within a user-specified distance of the trajectory hypothesis. The trajectory hypothesis is then scored using the number of inliers. An example of a trajectory hypothesis is depicted in Fig. 11.10c by the red line. Note that the inlier criterion is a fixed distance that does not depend on the innovation covariance, and that the score is the number of inliers and not the number of inliers from distinct times. Both definitions differ from the ones used by G-MTT as stated in Chapter 8.
The above process is repeated up to a predetermined number of times, and the state hypothesis with the greatest number of inliers is identified as shown in Fig. 11.11. That hypothesis is then filtered using the PDAF to produce a new current track.
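The sampling, fitting, and scoring loop can be summarized in a deliberately simplified sketch. The example below is ours: it uses a 1D constant-velocity model and a plain least-squares fit rather than the image-plane model and the regression algorithm of [63], but it follows the same steps, with the minimum subset always including a measurement from the latest time step and the hypothesis scored by its raw inlier count:

```python
import numpy as np

def ransac_track_init(meas, k, n_iters=200, inlier_dist=1.5, seed=0):
    """Sketch of RANSAC track initialization for a 1D constant-velocity model.

    meas is a list of (time, position) measurements; k is the current time step.
    Each iteration fits x = (p_k, v) by least squares to a minimum subset that
    includes a measurement from time k, then scores the hypothesis by the
    number of measurements within inlier_dist of the reconstructed trajectory.
    """
    rng = np.random.default_rng(seed)
    current = [m for m in meas if m[0] == k]
    past = [m for m in meas if m[0] < k]
    best_score, best_x = -1, None
    for _ in range(n_iters):
        subset = [current[rng.integers(len(current))],
                  past[rng.integers(len(past))]]
        # Least-squares fit of z = p_k + v * (t - k) to the minimum subset.
        A = np.array([[1.0, t - k] for t, _ in subset])
        b = np.array([z for _, z in subset])
        x, *_ = np.linalg.lstsq(A, b, rcond=None)
        score = sum(abs(z - (x[0] + x[1] * (t - k))) <= inlier_dist
                    for t, z in meas)
        if score > best_score:
            best_score, best_x = score, x
    return best_x, best_score

# A target moving at 2 px/frame plus stationary clutter far from its path.
k = 5
target = [(t, 2.0 * t + 0.1) for t in range(k + 1)]
clutter = [(t, 40.0) for t in range(k)]
x_hat, score = ransac_track_init(target + clutter, k)
print(score)   # the best hypothesis explains all target measurements
```

The winning hypothesis recovers the target's velocity, and its inlier set excludes the clutter, mirroring the behavior illustrated in Fig. 11.10.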
zk−4
<latexit sha1_base64="K/ZLv+sDWyFvo4Ud1Jy2UxGeQSE=">AAADJnicbZLNbtNAEMe3LtA2fDQtRy4WERIHiOyqEuWAWokLxyKRplJsRev12F5lP8zuOiVsfeDOhSu8AE/DDSHEhSfgGVgnFXIdRrL01/z+Mzuz3qRkVJsg+Lnhbd64eWtre6d3+87de7v9vf0zLStFYEQkk+o8wRoYFTAy1DA4LxVgnjAYJ7OXDR/PQWkqxRuzKCHmOBc0owQblxq/n9rZ08N62h8Ew2AZ/roIr8Tg+M8HiYpfH0+ne95OlEpScRCGMKz1JAxKE1usDCUM6l5UaSgxmeEcJk4KzEHHdjlv7T9ymdTPpHKfMP4y266wmGu94IlzcmwK3WVN8n9sUpnsKLZUlJUBQVYHZRXzjfSb5f2UKiCGLZzARFE3q08KrDAx7op6149huXSOglPy5J92ewm4IJJzLFIbCal4PQljGzHITHQ5CCNF88IpZ1TQthp4Zy5o6ua2B1R0+pRJVtuoWSfJbFl3aAoZvHVcGzeaAuaaKYpF7q7Zvuiai3arokt5m/IunbfpfK22WuJEsrT5N5I1KWfquccTdp/Kujg7GIaHw+evg8FJH61iGz1AD9FjFKJn6AS9QqdohAiaoU/oM/riffW+ed+9Hyurt3FVcx9dC+/3X6z5ERc=</latexit>
zk−1
<latexit sha1_base64="uEreEZYNrmQ67MFn66BxHQ7aUSE=">AAADJnicbZLNbtNAEMe35qsNXykcuVhESBwgsisk4ICoxIVjkUhTKbai9Xocr7IfZnedEhYfuHPhCi/A03BDCHHhCXgGxkmFUoeRLP01v//Mzqw3qwS3Lop+7gQXLl66fGV3r3f12vUbN/v7t46trg2DEdNCm5OMWhBcwchxJ+CkMkBlJmCczV+0fLwAY7lWr92yglTSmeIFZ9Rhavxu6ucP42baH0TDaBXhtojPxOD5nw+alL8+Hk33g70k16yWoBwT1NpJHFUu9dQ4zgQ0vaS2UFE2pzOYoFRUgk39at4mvIeZPCy0wU+5cJXdrPBUWruUGToldaXtsjb5PzapXfEk9VxVtQPF1gcVtQidDtvlw5wbYE4sUVBmOM4aspIayhxeUe/8MWKm0VFKzh7807iXglOmpaQq94nSRjaTOPWJgMIl7wdxYvisRIVGA5tWB2/dKc9xbn/AVadPlRWNT9p1ssJXTYfmUMAb5NbhaAYENjOcqhles3/WNZebrcoulZtUduliky62ausVzrTI23+jRZtCUw8fT9x9Ktvi+GAYPxo+fRUNDvtkHbvkDrlL7pOYPCaH5CU5IiPCyJx8Ip/Jl+Br8C34HvxYW4Ods5rb5FwEv/8CpMIRFA==</latexit>
zk−1
<latexit sha1_base64="uEreEZYNrmQ67MFn66BxHQ7aUSE=">AAADJnicbZLNbtNAEMe35qsNXykcuVhESBwgsisk4ICoxIVjkUhTKbai9Xocr7IfZnedEhYfuHPhCi/A03BDCHHhCXgGxkmFUoeRLP01v//Mzqw3qwS3Lop+7gQXLl66fGV3r3f12vUbN/v7t46trg2DEdNCm5OMWhBcwchxJ+CkMkBlJmCczV+0fLwAY7lWr92yglTSmeIFZ9Rhavxu6ucP42baH0TDaBXhtojPxOD5nw+alL8+Hk33g70k16yWoBwT1NpJHFUu9dQ4zgQ0vaS2UFE2pzOYoFRUgk39at4mvIeZPCy0wU+5cJXdrPBUWruUGToldaXtsjb5PzapXfEk9VxVtQPF1gcVtQidDtvlw5wbYE4sUVBmOM4aspIayhxeUe/8MWKm0VFKzh7807iXglOmpaQq94nSRjaTOPWJgMIl7wdxYvisRIVGA5tWB2/dKc9xbn/AVadPlRWNT9p1ssJXTYfmUMAb5NbhaAYENjOcqhles3/WNZebrcoulZtUduliky62ausVzrTI23+jRZtCUw8fT9x9Ktvi+GAYPxo+fRUNDvtkHbvkDrlL7pOYPCaH5CU5IiPCyJx8Ip/Jl+Br8C34HvxYW4Ods5rb5FwEv/8CpMIRFA==</latexit>
zk
<latexit sha1_base64="2nYvYZG4u19xVirHWxfkGjWosfA=">AAADJHicbZLNitRAEMd7o6u749esHr0EB8GDDMmyoB7EBS8ePKzg7C5MhqHTqUya6Y/YXZl1bPMMgid9AZ/Gmyh4EcEXsTOzSDZjQeBP/f5VXdXptBTcYhT93AouXd6+cnVnt3ft+o2bt/p7t4+trgyDEdNCm9OUWhBcwQg5CjgtDVCZCjhJ588bfrIAY7lWr3FZwkTSmeI5ZxR9avRu6ub1tD+IhtEqwk0Rn4vBsz8/fr/c/tg/mu4Fu0mmWSVBIRPU2nEclThx1CBnAupeUlkoKZvTGYy9VFSCnbjVtHV432eyMNfGfwrDVbZd4ai0dilT75QUC9tlTfJ/bFxh/njiuCorBMXWB+WVCFGHzephxg0wFEsvKDPczxqyghrK0F9Q7+IxYqa9o5CcPfyn/V4KzpiWkqrMJUobWY/jiUsE5Ji8H8SJ4bPCK2800LYivMUznvm53T5XnT5lmtcuadZJc1fWHZpBDm88t+hHMyB8M8Opmvlrdk+75qLdquhS2aaySxdtutiorVY41SJr/o0WTcqbev7xxN2nsimO94fxwfDJq2hw2Cfr2CF3yT3ygMTkETkkL8gRGRFGOPlAPpHPwZfga/At+L62BlvnNXfIhQh+/QUCKxCB</latexit>
zk−2
<latexit sha1_base64="k7i0QAzZ1TWYhyAHuPxYYdreOtA=">AAADJnicbZLNbtNAEMe3LtA2fKXlyMUiQuIAkR0hUQ6olbhwLBJpKsVWtF6P7VX2w+yuU8LWB+5cuMIL8DTcEEJceAKegXVSIddhJEt/ze8/szPrTUpGtQmCn1ve9rXrN3Z293o3b92+c7e/f3CqZaUIjIlkUp0lWAOjAsaGGgZnpQLMEwaTZP6y4ZMFKE2leGOWJcQc54JmlGDjUpP3Mzt/Mqpn/UEwDFbhb4rwUgyO/nyQqPj18WS27+1FqSQVB2EIw1pPw6A0scXKUMKg7kWVhhKTOc5h6qTAHHRsV/PW/kOXSf1MKvcJ46+y7QqLudZLnjgnx6bQXdYk/8emlckOY0tFWRkQZH1QVjHfSL9Z3k+pAmLY0glMFHWz+qTAChPjrqh39RiWS+coOCWP/2m3l4BzIjnHIrWRkIrX0zC2EYPMRBeDMFI0L5xyRgVtq4F35pymbm47oqLTp0yy2kbNOklmy7pDU8jgrePauNEUMNdMUSxyd832RddctFsVXcrblHfpok0XG7XVCieSpc2/kaxJOVPPPZ6w+1Q2xeloGD4dPn8dDI77aB276D56gB6hED1Dx+gVOkFjRNAcfUKf0Rfvq/fN++79WFu9rcuae+hKeL//Aqd/ERU=</latexit>
zk−3
<latexit sha1_base64="wLnqx9IeykfLvMWUjn31HeGeXss=">AAADJnicbZLNbtNAEMe3LtA2fKVw5GIRIXGAyG6RKAdEJS4cW4k0lWIrWq/H9ir7YXbXKWHrA3cuXOEFeBpuqEJceAKegXVSIddhJEt/ze8/szPrTUpGtQmCnxve5rXrN7a2d3o3b92+c7e/e+9Ey0oRGBHJpDpNsAZGBYwMNQxOSwWYJwzGyex1w8dzUJpK8dYsSog5zgXNKMHGpcYfpnb2dL+e9gfBMFiGvy7CSzF49eejRMWvT0fTXW8nSiWpOAhDGNZ6EgaliS1WhhIGdS+qNJSYzHAOEycF5qBju5y39h+5TOpnUrlPGH+ZbVdYzLVe8MQ5OTaF7rIm+T82qUx2EFsqysqAIKuDsor5RvrN8n5KFRDDFk5goqib1ScFVpgYd0W9q8ewXDpHwSl58k+7vQScEck5FqmNhFS8noSxjRhkJjofhJGieeGUMypoWw28N2c0dXPbPSo6fcokq23UrJNktqw7NIUM3jmujRtNAXPNFMUid9dsX3bNRbtV0aW8TXmXztt0vlZbLXEiWdr8G8malDP13OMJu09lXZzsDcNnwxfHweCwj1axjR6gh+gxCtFzdIjeoCM0QgTN0Gf0BX31vnnfvR/excrqbVzW3EdXwvv9F6o8ERY=</latexit>
zk−3
<latexit sha1_base64="wLnqx9IeykfLvMWUjn31HeGeXss=">AAADJnicbZLNbtNAEMe3LtA2fKVw5GIRIXGAyG6RKAdEJS4cW4k0lWIrWq/H9ir7YXbXKWHrA3cuXOEFeBpuqEJceAKegXVSIddhJEt/ze8/szPrTUpGtQmCnxve5rXrN7a2d3o3b92+c7e/e+9Ey0oRGBHJpDpNsAZGBYwMNQxOSwWYJwzGyex1w8dzUJpK8dYsSog5zgXNKMHGpcYfpnb2dL+e9gfBMFiGvy7CSzF49eejRMWvT0fTXW8nSiWpOAhDGNZ6EgaliS1WhhIGdS+qNJSYzHAOEycF5qBju5y39h+5TOpnUrlPGH+ZbVdYzLVe8MQ5OTaF7rIm+T82qUx2EFsqysqAIKuDsor5RvrN8n5KFRDDFk5goqib1ScFVpgYd0W9q8ewXDpHwSl58k+7vQScEck5FqmNhFS8noSxjRhkJjofhJGieeGUMypoWw28N2c0dXPbPSo6fcokq23UrJNktqw7NIUM3jmujRtNAXPNFMUid9dsX3bNRbtV0aW8TXmXztt0vlZbLXEiWdr8G8malDP13OMJu09lXZzsDcNnwxfHweCwj1axjR6gh+gxCtFzdIjeoCM0QgTN0Gf0BX31vnnfvR/excrqbVzW3EdXwvv9F6o8ERY=</latexit>
zk−3
<latexit sha1_base64="wLnqx9IeykfLvMWUjn31HeGeXss=">AAADJnicbZLNbtNAEMe3LtA2fKVw5GIRIXGAyG6RKAdEJS4cW4k0lWIrWq/H9ir7YXbXKWHrA3cuXOEFeBpuqEJceAKegXVSIddhJEt/ze8/szPrTUpGtQmCnxve5rXrN7a2d3o3b92+c7e/e+9Ey0oRGBHJpDpNsAZGBYwMNQxOSwWYJwzGyex1w8dzUJpK8dYsSog5zgXNKMHGpcYfpnb2dL+e9gfBMFiGvy7CSzF49eejRMWvT0fTXW8nSiWpOAhDGNZ6EgaliS1WhhIGdS+qNJSYzHAOEycF5qBju5y39h+5TOpnUrlPGH+ZbVdYzLVe8MQ5OTaF7rIm+T82qUx2EFsqysqAIKuDsor5RvrN8n5KFRDDFk5goqib1ScFVpgYd0W9q8ewXDpHwSl58k+7vQScEck5FqmNhFS8noSxjRhkJjofhJGieeGUMypoWw28N2c0dXPbPSo6fcokq23UrJNktqw7NIUM3jmujRtNAXPNFMUid9dsX3bNRbtV0aW8TXmXztt0vlZbLXEiWdr8G8malDP13OMJu09lXZzsDcNnwxfHweCwj1axjR6gh+gxCtFzdIjeoCM0QgTN0Gf0BX31vnnfvR/excrqbVzW3EdXwvv9F6o8ERY=</latexit>
zk−4
<latexit sha1_base64="K/ZLv+sDWyFvo4Ud1Jy2UxGeQSE=">AAADJnicbZLNbtNAEMe3LtA2fDQtRy4WERIHiOyqEuWAWokLxyKRplJsRev12F5lP8zuOiVsfeDOhSu8AE/DDSHEhSfgGVgnFXIdRrL01/z+Mzuz3qRkVJsg+Lnhbd64eWtre6d3+87de7v9vf0zLStFYEQkk+o8wRoYFTAy1DA4LxVgnjAYJ7OXDR/PQWkqxRuzKCHmOBc0owQblxq/n9rZ08N62h8Ew2AZ/roIr8Tg+M8HiYpfH0+ne95OlEpScRCGMKz1JAxKE1usDCUM6l5UaSgxmeEcJk4KzEHHdjlv7T9ymdTPpHKfMP4y266wmGu94IlzcmwK3WVN8n9sUpnsKLZUlJUBQVYHZRXzjfSb5f2UKiCGLZzARFE3q08KrDAx7op6149huXSOglPy5J92ewm4IJJzLFIbCal4PQljGzHITHQ5CCNF88IpZ1TQthp4Zy5o6ua2B1R0+pRJVtuoWSfJbFl3aAoZvHVcGzeaAuaaKYpF7q7Zvuiai3arokt5m/IunbfpfK22WuJEsrT5N5I1KWfquccTdp/Kujg7GIaHw+evg8FJH61iGz1AD9FjFKJn6AS9QqdohAiaoU/oM/riffW+ed+9Hyurt3FVcx9dC+/3X6z5ERc=</latexit>
zk−4
<latexit sha1_base64="K/ZLv+sDWyFvo4Ud1Jy2UxGeQSE=">AAADJnicbZLNbtNAEMe3LtA2fDQtRy4WERIHiOyqEuWAWokLxyKRplJsRev12F5lP8zuOiVsfeDOhSu8AE/DDSHEhSfgGVgnFXIdRrL01/z+Mzuz3qRkVJsg+Lnhbd64eWtre6d3+87de7v9vf0zLStFYEQkk+o8wRoYFTAy1DA4LxVgnjAYJ7OXDR/PQWkqxRuzKCHmOBc0owQblxq/n9rZ08N62h8Ew2AZ/roIr8Tg+M8HiYpfH0+ne95OlEpScRCGMKz1JAxKE1usDCUM6l5UaSgxmeEcJk4KzEHHdjlv7T9ymdTPpHKfMP4y266wmGu94IlzcmwK3WVN8n9sUpnsKLZUlJUBQVYHZRXzjfSb5f2UKiCGLZzARFE3q08KrDAx7op6149huXSOglPy5J92ewm4IJJzLFIbCal4PQljGzHITHQ5CCNF88IpZ1TQthp4Zy5o6ua2B1R0+pRJVtuoWSfJbFl3aAoZvHVcGzeaAuaaKYpF7q7Zvuiai3arokt5m/IunbfpfK22WuJEsrT5N5I1KWfquccTdp/Kujg7GIaHw+evg8FJH61iGz1AD9FjFKJn6AS9QqdohAiaoU/oM/riffW+ed+9Hyurt3FVcx9dC+/3X6z5ERc=</latexit>
current measurement
zk−1
<latexit sha1_base64="uEreEZYNrmQ67MFn66BxHQ7aUSE=">AAADJnicbZLNbtNAEMe35qsNXykcuVhESBwgsisk4ICoxIVjkUhTKbai9Xocr7IfZnedEhYfuHPhCi/A03BDCHHhCXgGxkmFUoeRLP01v//Mzqw3qwS3Lop+7gQXLl66fGV3r3f12vUbN/v7t46trg2DEdNCm5OMWhBcwchxJ+CkMkBlJmCczV+0fLwAY7lWr92yglTSmeIFZ9Rhavxu6ucP42baH0TDaBXhtojPxOD5nw+alL8+Hk33g70k16yWoBwT1NpJHFUu9dQ4zgQ0vaS2UFE2pzOYoFRUgk39at4mvIeZPCy0wU+5cJXdrPBUWruUGToldaXtsjb5PzapXfEk9VxVtQPF1gcVtQidDtvlw5wbYE4sUVBmOM4aspIayhxeUe/8MWKm0VFKzh7807iXglOmpaQq94nSRjaTOPWJgMIl7wdxYvisRIVGA5tWB2/dKc9xbn/AVadPlRWNT9p1ssJXTYfmUMAb5NbhaAYENjOcqhles3/WNZebrcoulZtUduliky62ausVzrTI23+jRZtCUw8fT9x9Ktvi+GAYPxo+fRUNDvtkHbvkDrlL7pOYPCaH5CU5IiPCyJx8Ip/Jl+Br8C34HvxYW4Ods5rb5FwEv/8CpMIRFA==</latexit>
zk−1
<latexit sha1_base64="uEreEZYNrmQ67MFn66BxHQ7aUSE=">AAADJnicbZLNbtNAEMe35qsNXykcuVhESBwgsisk4ICoxIVjkUhTKbai9Xocr7IfZnedEhYfuHPhCi/A03BDCHHhCXgGxkmFUoeRLP01v//Mzqw3qwS3Lop+7gQXLl66fGV3r3f12vUbN/v7t46trg2DEdNCm5OMWhBcwchxJ+CkMkBlJmCczV+0fLwAY7lWr92yglTSmeIFZ9Rhavxu6ucP42baH0TDaBXhtojPxOD5nw+alL8+Hk33g70k16yWoBwT1NpJHFUu9dQ4zgQ0vaS2UFE2pzOYoFRUgk39at4mvIeZPCy0wU+5cJXdrPBUWruUGToldaXtsjb5PzapXfEk9VxVtQPF1gcVtQidDtvlw5wbYE4sUVBmOM4aspIayhxeUe/8MWKm0VFKzh7807iXglOmpaQq94nSRjaTOPWJgMIl7wdxYvisRIVGA5tWB2/dKc9xbn/AVadPlRWNT9p1ssJXTYfmUMAb5NbhaAYENjOcqhles3/WNZebrcoulZtUduliky62ausVzrTI23+jRZtCUw8fT9x9Ktvi+GAYPxo+fRUNDvtkHbvkDrlL7pOYPCaH5CU5IiPCyJx8Ip/Jl+Br8C34HvxYW4Ods5rb5FwEv/8CpMIRFA==</latexit>
zk
<latexit sha1_base64="2nYvYZG4u19xVirHWxfkGjWosfA=">AAADJHicbZLNitRAEMd7o6u749esHr0EB8GDDMmyoB7EBS8ePKzg7C5MhqHTqUya6Y/YXZl1bPMMgid9AZ/Gmyh4EcEXsTOzSDZjQeBP/f5VXdXptBTcYhT93AouXd6+cnVnt3ft+o2bt/p7t4+trgyDEdNCm9OUWhBcwQg5CjgtDVCZCjhJ588bfrIAY7lWr3FZwkTSmeI5ZxR9avRu6ub1tD+IhtEqwk0Rn4vBsz8/fr/c/tg/mu4Fu0mmWSVBIRPU2nEclThx1CBnAupeUlkoKZvTGYy9VFSCnbjVtHV432eyMNfGfwrDVbZd4ai0dilT75QUC9tlTfJ/bFxh/njiuCorBMXWB+WVCFGHzephxg0wFEsvKDPczxqyghrK0F9Q7+IxYqa9o5CcPfyn/V4KzpiWkqrMJUobWY/jiUsE5Ji8H8SJ4bPCK2800LYivMUznvm53T5XnT5lmtcuadZJc1fWHZpBDm88t+hHMyB8M8Opmvlrdk+75qLdquhS2aaySxdtutiorVY41SJr/o0WTcqbev7xxN2nsimO94fxwfDJq2hw2Cfr2CF3yT3ygMTkETkkL8gRGRFGOPlAPpHPwZfga/At+L62BlvnNXfIhQh+/QUCKxCB</latexit>
zk−2
<latexit sha1_base64="k7i0QAzZ1TWYhyAHuPxYYdreOtA=">AAADJnicbZLNbtNAEMe3LtA2fKXlyMUiQuIAkR0hUQ6olbhwLBJpKsVWtF6P7VX2w+yuU8LWB+5cuMIL8DTcEEJceAKegXVSIddhJEt/ze8/szPrTUpGtQmCn1ve9rXrN3Z293o3b92+c7e/f3CqZaUIjIlkUp0lWAOjAsaGGgZnpQLMEwaTZP6y4ZMFKE2leGOWJcQc54JmlGDjUpP3Mzt/Mqpn/UEwDFbhb4rwUgyO/nyQqPj18WS27+1FqSQVB2EIw1pPw6A0scXKUMKg7kWVhhKTOc5h6qTAHHRsV/PW/kOXSf1MKvcJ46+y7QqLudZLnjgnx6bQXdYk/8emlckOY0tFWRkQZH1QVjHfSL9Z3k+pAmLY0glMFHWz+qTAChPjrqh39RiWS+coOCWP/2m3l4BzIjnHIrWRkIrX0zC2EYPMRBeDMFI0L5xyRgVtq4F35pymbm47oqLTp0yy2kbNOklmy7pDU8jgrePauNEUMNdMUSxyd832RddctFsVXcrblHfpok0XG7XVCieSpc2/kaxJOVPPPZ6w+1Q2xeloGD4dPn8dDI77aB276D56gB6hED1Dx+gVOkFjRNAcfUKf0Rfvq/fN++79WFu9rcuae+hKeL//Aqd/ERU=</latexit>
zk−3
<latexit sha1_base64="wLnqx9IeykfLvMWUjn31HeGeXss=">AAADJnicbZLNbtNAEMe3LtA2fKVw5GIRIXGAyG6RKAdEJS4cW4k0lWIrWq/H9ir7YXbXKWHrA3cuXOEFeBpuqEJceAKegXVSIddhJEt/ze8/szPrTUpGtQmCnxve5rXrN7a2d3o3b92+c7e/e+9Ey0oRGBHJpDpNsAZGBYwMNQxOSwWYJwzGyex1w8dzUJpK8dYsSog5zgXNKMHGpcYfpnb2dL+e9gfBMFiGvy7CSzF49eejRMWvT0fTXW8nSiWpOAhDGNZ6EgaliS1WhhIGdS+qNJSYzHAOEycF5qBju5y39h+5TOpnUrlPGH+ZbVdYzLVe8MQ5OTaF7rIm+T82qUx2EFsqysqAIKuDsor5RvrN8n5KFRDDFk5goqib1ScFVpgYd0W9q8ewXDpHwSl58k+7vQScEck5FqmNhFS8noSxjRhkJjofhJGieeGUMypoWw28N2c0dXPbPSo6fcokq23UrJNktqw7NIUM3jmujRtNAXPNFMUid9dsX3bNRbtV0aW8TXmXztt0vlZbLXEiWdr8G8malDP13OMJu09lXZzsDcNnwxfHweCwj1axjR6gh+gxCtFzdIjeoCM0QgTN0Gf0BX31vnnfvR/excrqbVzW3EdXwvv9F6o8ERY=</latexit>
zk−3
<latexit sha1_base64="wLnqx9IeykfLvMWUjn31HeGeXss=">AAADJnicbZLNbtNAEMe3LtA2fKVw5GIRIXGAyG6RKAdEJS4cW4k0lWIrWq/H9ir7YXbXKWHrA3cuXOEFeBpuqEJceAKegXVSIddhJEt/ze8/szPrTUpGtQmCnxve5rXrN7a2d3o3b92+c7e/e+9Ey0oRGBHJpDpNsAZGBYwMNQxOSwWYJwzGyex1w8dzUJpK8dYsSog5zgXNKMHGpcYfpnb2dL+e9gfBMFiGvy7CSzF49eejRMWvT0fTXW8nSiWpOAhDGNZ6EgaliS1WhhIGdS+qNJSYzHAOEycF5qBju5y39h+5TOpnUrlPGH+ZbVdYzLVe8MQ5OTaF7rIm+T82qUx2EFsqysqAIKuDsor5RvrN8n5KFRDDFk5goqib1ScFVpgYd0W9q8ewXDpHwSl58k+7vQScEck5FqmNhFS8noSxjRhkJjofhJGieeGUMypoWw28N2c0dXPbPSo6fcokq23UrJNktqw7NIUM3jmujRtNAXPNFMUid9dsX3bNRbtV0aW8TXmXztt0vlZbLXEiWdr8G8malDP13OMJu09lXZzsDcNnwxfHweCwj1axjR6gh+gxCtFzdIjeoCM0QgTN0Gf0BX31vnnfvR/excrqbVzW3EdXwvv9F6o8ERY=</latexit>
zk−3
<latexit sha1_base64="wLnqx9IeykfLvMWUjn31HeGeXss=">AAADJnicbZLNbtNAEMe3LtA2fKVw5GIRIXGAyG6RKAdEJS4cW4k0lWIrWq/H9ir7YXbXKWHrA3cuXOEFeBpuqEJceAKegXVSIddhJEt/ze8/szPrTUpGtQmCnxve5rXrN7a2d3o3b92+c7e/e+9Ey0oRGBHJpDpNsAZGBYwMNQxOSwWYJwzGyex1w8dzUJpK8dYsSog5zgXNKMHGpcYfpnb2dL+e9gfBMFiGvy7CSzF49eejRMWvT0fTXW8nSiWpOAhDGNZ6EgaliS1WhhIGdS+qNJSYzHAOEycF5qBju5y39h+5TOpnUrlPGH+ZbVdYzLVe8MQ5OTaF7rIm+T82qUx2EFsqysqAIKuDsor5RvrN8n5KFRDDFk5goqib1ScFVpgYd0W9q8ewXDpHwSl58k+7vQScEck5FqmNhFS8noSxjRhkJjofhJGieeGUMypoWw28N2c0dXPbPSo6fcokq23UrJNktqw7NIUM3jmujRtNAXPNFMUid9dsX3bNRbtV0aW8TXmXztt0vlZbLXEiWdr8G8malDP13OMJu09lXZzsDcNnwxfHweCwj1axjR6gh+gxCtFzdIjeoCM0QgTN0Gf0BX31vnnfvR/excrqbVzW3EdXwvv9F6o8ERY=</latexit>
zk−4
<latexit sha1_base64="K/ZLv+sDWyFvo4Ud1Jy2UxGeQSE=">AAADJnicbZLNbtNAEMe3LtA2fDQtRy4WERIHiOyqEuWAWokLxyKRplJsRev12F5lP8zuOiVsfeDOhSu8AE/DDSHEhSfgGVgnFXIdRrL01/z+Mzuz3qRkVJsg+Lnhbd64eWtre6d3+87de7v9vf0zLStFYEQkk+o8wRoYFTAy1DA4LxVgnjAYJ7OXDR/PQWkqxRuzKCHmOBc0owQblxq/n9rZ08N62h8Ew2AZ/roIr8Tg+M8HiYpfH0+ne95OlEpScRCGMKz1JAxKE1usDCUM6l5UaSgxmeEcJk4KzEHHdjlv7T9ymdTPpHKfMP4y266wmGu94IlzcmwK3WVN8n9sUpnsKLZUlJUBQVYHZRXzjfSb5f2UKiCGLZzARFE3q08KrDAx7op6149huXSOglPy5J92ewm4IJJzLFIbCal4PQljGzHITHQ5CCNF88IpZ1TQthp4Zy5o6ua2B1R0+pRJVtuoWSfJbFl3aAoZvHVcGzeaAuaaKYpF7q7Zvuiai3arokt5m/IunbfpfK22WuJEsrT5N5I1KWfquccTdp/Kujg7GIaHw+evg8FJH61iGz1AD9FjFKJn6AS9QqdohAiaoU/oM/riffW+ed+9Hyurt3FVcx9dC+/3X6z5ERc=</latexit>
zk−4
<latexit sha1_base64="K/ZLv+sDWyFvo4Ud1Jy2UxGeQSE=">AAADJnicbZLNbtNAEMe3LtA2fDQtRy4WERIHiOyqEuWAWokLxyKRplJsRev12F5lP8zuOiVsfeDOhSu8AE/DDSHEhSfgGVgnFXIdRrL01/z+Mzuz3qRkVJsg+Lnhbd64eWtre6d3+87de7v9vf0zLStFYEQkk+o8wRoYFTAy1DA4LxVgnjAYJ7OXDR/PQWkqxRuzKCHmOBc0owQblxq/n9rZ08N62h8Ew2AZ/roIr8Tg+M8HiYpfH0+ne95OlEpScRCGMKz1JAxKE1usDCUM6l5UaSgxmeEcJk4KzEHHdjlv7T9ymdTPpHKfMP4y266wmGu94IlzcmwK3WVN8n9sUpnsKLZUlJUBQVYHZRXzjfSb5f2UKiCGLZzARFE3q08KrDAx7op6149huXSOglPy5J92ewm4IJJzLFIbCal4PQljGzHITHQ5CCNF88IpZ1TQthp4Zy5o6ua2B1R0+pRJVtuoWSfJbFl3aAoZvHVcGzeaAuaaKYpF7q7Zvuiai3arokt5m/IunbfpfK22WuJEsrT5N5I1KWfquccTdp/Kujg7GIaHw+evg8FJH61iGz1AD9FjFKJn6AS9QqdohAiaoU/oM/riffW+ed+9Hyurt3FVcx9dC+/3X6z5ERc=</latexit>
Figure 11.11: A depiction of generating multiple state hypotheses and their corresponding trajectory hypotheses. Yellow trajectories represent previous state hypotheses, the red trajectory represents the current track hypothesis being generated, and the green trajectory represents the state hypothesis with the best score.
Now suppose that another target enters the FOV and new measurements are given to
RANSAC. Since RANSAC runs at every time step, it will begin to form new state hypotheses
for the new target and eventually initialize a new track.
11.3.3 Track Management
R-RANSAC maintains a bank of M tracks. Tracks are pruned to keep the number
of tracks at or below M . When there are more than M tracks, tracks with the lowest
track likelihood are pruned until there are only M tracks. As tracks are propagated and
updated, they may leave the field-of-view of the camera, they may coalesce, or they may
stop receiving measurements. To handle these situations, we remove tracks that have not
received a measurement for a predetermined period of time, and we merge similar tracks.
By merging, we mean that the track with the lowest rating is removed and the other kept.
Confirmed tracks, i.e., tracks whose track likelihood is sufficiently high, are given a unique numerical track ID. The confirmed tracks are passed to the track selection algorithm at every time step.
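The pruning and merging rules above can be sketched as follows. This is a hypothetical illustration, not the R-RANSAC implementation: the dictionary fields (`state`, `likelihood`, `missed_steps`) and the distance-based similarity test are assumptions.

```python
import numpy as np

# Hypothetical sketch of the track-management rules: drop stale tracks,
# merge similar tracks (keeping the higher-rated one), and prune the
# lowest-likelihood tracks down to a bank of at most M.

def manage_tracks(tracks, M, max_missed, merge_dist):
    # Remove tracks that have not received a measurement recently.
    alive = [t for t in tracks if t["missed_steps"] <= max_missed]

    # Merge similar tracks: visit tracks from best to worst likelihood and
    # keep a track only if no better track is within merge_dist of it.
    kept = []
    for t in sorted(alive, key=lambda t: -t["likelihood"]):
        if all(np.linalg.norm(t["state"] - k["state"]) > merge_dist for k in kept):
            kept.append(t)

    # Prune the lowest-likelihood tracks until at most M remain.
    return kept[:M]
```

Here `likelihood` plays the role of the track likelihood used for pruning, and `merge_dist` stands in for whatever track-similarity test the tracker uses.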
11.4 Track Selection
R-RANSAC passes confirmed tracks to the track selector which chooses a track to
follow. In this section, we list several possible options for target selection.
Target closest to the image center. One option is to follow the track that is closest to
the image center. If visual-MTT returns a set of normalized image coordinates ϵi for the
tracks, then select the track that minimizes ∥ϵi∥.

Target recognition. A common automatic method for track selection is target recognition using visual information. This method compares the tracks to a visual profile of the
target of interest. If a track matches the visual profile, then it is followed. A downside of this method is that it requires the visual profile to be built in advance. For visual target recognition
algorithms see [119]–[121].
User input. A manual method for track selection is to query a user about which track should be followed. After the user has been queried, a profile of the target can be built from gathered data to recognize the track in the future. One example of this is [122], which
uses a DNN to build the visual profile online.
The selected track is communicated to the target following controller.
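The closest-to-center rule can be sketched in a few lines; the list-of-arrays container for the tracks' normalized image coordinates is illustrative.

```python
import numpy as np

# A minimal sketch of the "closest to the image center" rule: given each
# confirmed track's normalized image coordinates eps_i, follow the track
# that minimizes ||eps_i||.

def select_closest_track(eps_list):
    """Return the index of the track closest to the optical axis."""
    return int(np.argmin([np.linalg.norm(eps) for eps in eps_list]))

eps_list = [np.array([0.4, -0.2]), np.array([0.05, 0.1]), np.array([-0.3, 0.3])]
assert select_closest_track(eps_list) == 1   # second track is nearest the center
```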
11.5 Target Following Controller
This section presents one possible target following controller as shown in Fig. 11.2.
The controller consists of three parts: (1) a PID strategy that uses a height-above-ground
sensor to maintain a constant pre-specified height above the ground, (2) a position controller
that follows the selected track, and (3) a heading controller that aligns the UAV’s heading
with the target’s velocity vector. In this section we describe the position and heading con-
trollers in detail.
The provided track from R-RANSAC contains the state estimate of the target in
normalized image coordinates. Image coordinates are not invariant to the roll and pitch of
the UAV; therefore, we design the controller in the normalized virtual image plane.
Let $p_{t/c}^c$ denote the position of the target relative to the camera expressed in the camera frame. The track produced by R-RANSAC is in normalized image coordinates and is given by
$$
\epsilon_{t/c}^c = \frac{K_c^{-1} p_{t/c}^c}{e_3^\top K_c^{-1} p_{t/c}^c},
$$
where $K_c$ is the matrix of camera intrinsic parameters [123]. The target's pixel velocity is given by $\dot{\epsilon}_{t/c}^c$. Note that the third element of $\epsilon_{t/c}^c$ is one, and the third element of $\dot{\epsilon}_{t/c}^c$ is zero.
The coordinate axes in the camera frame are defined so that the z-axis points along
the optical axis, the x-axis points to the right when looking at the image from the optical
center in the direction of the optical axis, and the y-axis points down in the image to form
a right handed coordinate system. Alternatively, the virtual camera frame is defined so that
the z-axis points down toward the ground, i.e., is equal to e3, and the x and y axes are
projections of the camera x and y axis onto the plane orthogonal to e3. A notional depiction
of the camera and virtual camera frame is shown in Fig. 11.12.
Figure 11.12: A notional depiction of the camera frame and the virtual camera frame. The optical axis of the virtual camera frame is the projection of the optical axis of the camera frame onto the down vector $e_3$.
The virtual camera frame is obtained from the camera frame through a rotation that aligns the optical axis with the down vector $e_3$. The rotation, denoted $R_c^v$, is a function of the roll and pitch angles of the multirotor as well as the geometry of how the camera is mounted to the vehicle.
Therefore, the normalized virtual image coordinates of the track in the virtual camera frame are given by
$$
\epsilon_{t/c}^v = \frac{R_c^v \epsilon_{t/c}^c}{e_3^\top R_c^v \epsilon_{t/c}^c}. \tag{11.13}
$$
Similarly, the pixel velocity in normalized virtual image coordinates is given by
$$
\dot{\epsilon}_{t/c}^v = \frac{1}{\left(e_3^\top R_c^v \epsilon_{t/c}^c\right)^2}\left(\left(e_3^\top R_c^v \epsilon_{t/c}^c\right) I - R_c^v \epsilon_{t/c}^c e_3^\top\right)\left(R_c^v \left[\omega_{c/v}^c\right]_\times \epsilon_{t/c}^c + R_c^v \dot{\epsilon}_{t/c}^c\right). \tag{11.14}
$$
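Equations (11.13) and (11.14) can be implemented directly. The sketch below uses made-up values for the rotation, angular velocity, and pixel coordinates, and verifies (11.14) against a finite-difference derivative of (11.13).

```python
import numpy as np

def skew(w):
    """Cross-product matrix [w]_x such that skew(w) @ v == np.cross(w, v)."""
    return np.array([[0., -w[2], w[1]],
                     [w[2], 0., -w[0]],
                     [-w[1], w[0], 0.]])

def eps_virtual(R_vc, eps_c):
    """Equation (11.13): normalized virtual image coordinates."""
    u = R_vc @ eps_c
    return u / u[2]

def eps_virtual_dot(R_vc, eps_c, eps_c_dot, w_cv):
    """Equation (11.14): pixel velocity in the normalized virtual image plane."""
    e3 = np.array([0., 0., 1.])
    u = R_vc @ eps_c
    s = e3 @ u
    P = s * np.eye(3) - np.outer(u, e3)
    return P @ (R_vc @ skew(w_cv) @ eps_c + R_vc @ eps_c_dot) / s**2

# Finite-difference check: propagate R_vc and eps_c forward a small step.
a = 0.3
R_vc = np.array([[1., 0., 0.],
                 [0., np.cos(a), -np.sin(a)],
                 [0., np.sin(a), np.cos(a)]])
w_cv = np.array([0.1, -0.2, 0.05])            # angular velocity omega^c_{c/v}
eps_c = np.array([0.2, -0.1, 1.0])            # third element is one
eps_c_dot = np.array([0.05, 0.02, 0.0])       # third element is zero
dt = 1e-6
R_next = R_vc @ (np.eye(3) + skew(w_cv) * dt)  # Rdot = R [w]_x, to first order
fd = (eps_virtual(R_next, eps_c + eps_c_dot * dt) - eps_virtual(R_vc, eps_c)) / dt
assert np.allclose(fd, eps_virtual_dot(R_vc, eps_c, eps_c_dot, w_cv), atol=1e-4)
```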
Equations (11.13) and (11.14) are computed from vision data using the R-RANSAC tracker described in the previous section. We also note that $\epsilon_{t/c}^v$ is the normalized line-of-sight vector expressed in the virtual camera frame, i.e.,
$$
\epsilon_{t/c}^v = \frac{p_{t/c}^v}{e_3^\top p_{t/c}^v} = \lambda p_{t/c}^v,
$$
where $\lambda = 1/(e_3^\top p_{t/c}^v)$ is the inverse of the constant height above ground. In addition we have that
$$
\dot{\epsilon}_{t/c}^v = \lambda \dot{p}_{t/c}^v = \lambda\left(\dot{p}_{t/i}^v - \dot{p}_{c/i}^v\right),
$$
$$
\ddot{\epsilon}_{t/c}^v = \lambda \ddot{p}_{t/c}^v = \lambda\left(\ddot{p}_{t/i}^v - \ddot{p}_{c/i}^v\right),
$$
where $\dot{p}_{t/i}^v$ and $\dot{p}_{c/i}^v$ are the inertial velocities of the target and camera, and $\ddot{p}_{t/i}^v$ and $\ddot{p}_{c/i}^v$ are
the inertial accelerations of the target and camera, all expressed in the virtual camera frame.
If we assume that the inertial acceleration of the target is zero, and that the center of the camera frame is the center of the multirotor body frame, then
$$
\ddot{\epsilon}_{t/c}^v = -\lambda a^v,
$$
where $a^v = \ddot{p}_{b/i}^v = \ddot{p}_{c/i}^v$ is the commanded acceleration of the multirotor, and $\ddot{p}_{b/i}^v$ is the acceleration of the UAV with respect to the inertial frame and expressed in the virtual camera frame.
Theorem 11.5.1. Assume that the inertial acceleration of the target is zero, and that the height-above-ground is constant and known. Let $\epsilon_{d/c}^v$ be the desired constant normalized line-of-sight vector to the target, and let
$$
a^v = \frac{1}{\lambda}\left((k_1 + k_2)\dot{\epsilon}_{t/c}^v + k_1 k_2\left(\epsilon_{t/c}^v - \epsilon_{d/c}^v\right)\right), \tag{11.15}
$$
where $k_1 > 0$ and $k_2 > 0$ are control gains. Then
$$
\epsilon_{t/c}^v \to \epsilon_{d/c}^v.
$$
Proof. Define the error as $e \triangleq \epsilon_{t/c}^v - \epsilon_{d/c}^v$; then $\dot{e} = \dot{\epsilon}_{t/c}^v$ and
$$
\ddot{e} = \ddot{\epsilon}_{t/c}^v = \lambda\left(\ddot{p}_{t/i}^v - \ddot{p}_{c/i}^v\right) = -\lambda \ddot{p}_{c/i}^v = -\lambda a^v.
$$
Let $S = \dot{e} + k_1 e$, and define $V = \frac{1}{2}S^\top S$. Then
$$
\dot{V} = S^\top \dot{S} = S^\top\left(\ddot{e} + k_1 \dot{e}\right) = S^\top\left(-\lambda a^v + k_1 \dot{e}\right) = -k_2 S^\top S,
$$
where we have used the fact that Equation (11.15) is equivalent to $a^v = \frac{1}{\lambda}\left(k_1 \dot{e} + k_2 S\right)$. Therefore $S \to 0$ exponentially; since $\dot{e} = -k_1 e + S$, it follows that $e \to 0$ and $\dot{e} \to 0$.
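As a numerical sanity check on Theorem 11.5.1, the control law (11.15) can be simulated on the error dynamics $\ddot{e} = -\lambda a^v$ from the proof. Since $\dot{\epsilon}_{t/c}^v = \dot{e}$ for a constant $\epsilon_{d/c}^v$, the law can be written directly in terms of the error; the gains, $\lambda$, and initial error below are arbitrary illustrative values.

```python
import numpy as np

def ibvs_accel(e, e_dot, lam, k1, k2):
    """Commanded acceleration from equation (11.15), written in terms of
    the error e = eps - eps_d: a^v = ((k1 + k2) e_dot + k1 k2 e) / lam."""
    return ((k1 + k2) * e_dot + k1 * k2 * e) / lam

# Simulate e_ddot = -lam * a^v with Euler integration for 10 seconds.
lam, k1, k2, dt = 0.1, 1.0, 2.0, 1e-3
e = np.array([0.5, -0.3])
e_dot = np.zeros(2)
for _ in range(10_000):
    e_ddot = -lam * ibvs_accel(e, e_dot, lam, k1, k2)
    e = e + e_dot * dt
    e_dot = e_dot + e_ddot * dt

# Both the error and its derivative converge toward zero, as the theorem claims.
assert np.linalg.norm(e) < 1e-3 and np.linalg.norm(e_dot) < 1e-3
```

Note that $\lambda$ cancels in the closed loop: the error dynamics reduce to $\ddot{e} = -(k_1 + k_2)\dot{e} - k_1 k_2 e$, with eigenvalues $-k_1$ and $-k_2$.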
The desired attitude is selected to align with the target's velocity vector $\dot{p}_{t/i}^v$ as follows:
$$
R_d^i = \begin{pmatrix} r_1 & r_2 & r_3 \end{pmatrix} \tag{11.16}
$$
$$
r_1 = \frac{\left(I - e_3 e_3^\top\right)\dot{p}_{t/i}^v}{\left\|\left(I - e_3 e_3^\top\right)\dot{p}_{t/i}^v\right\|} \tag{11.17}
$$
$$
r_2 = e_3 \times r_1 \tag{11.18}
$$
$$
r_3 = e_3. \tag{11.19}
$$
Therefore, the x-axis of the desired frame points in the direction of the desired velocity
vector, and the attitude is otherwise aligned with the body-level frame. The attitude control
scheme is derived using the technique given in [124].
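Equations (11.16)-(11.19) can be sketched as follows; the velocity vector is an arbitrary example, and the assertions check that the construction yields a proper right-handed rotation matrix.

```python
import numpy as np

# Sketch of the desired-attitude construction: r1 along the horizontal
# projection of the target's velocity, r3 = e3 (down), and r2 completing
# the right-handed triad.

def desired_attitude(v_target):
    e3 = np.array([0., 0., 1.])
    horiz = (np.eye(3) - np.outer(e3, e3)) @ v_target   # (I - e3 e3^T) v
    r1 = horiz / np.linalg.norm(horiz)
    r3 = e3
    r2 = np.cross(r3, r1)            # order chosen so that det R = +1
    return np.column_stack((r1, r2, r3))

R = desired_attitude(np.array([2.0, 1.0, -0.3]))
assert np.allclose(R.T @ R, np.eye(3))       # orthonormal columns
assert np.isclose(np.linalg.det(R), 1.0)     # proper (right-handed) rotation
```

The construction fails only when the horizontal velocity is zero, in which case the heading is undefined and the previous desired attitude can be held.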
11.5.1 Following Multiple Targets
We briefly mention two approaches to following multiple targets. If the targets are
clustered together, then following can be achieved by aligning their average position with
the camera’s optical center using a technique similar to the one presented in this paper. A
more realistic and common approach is a decentralized multiple target tracking scheme that
uses a fleet of UAVs to cooperatively track targets in their respective surveillance region and
share their information via a communication network [125].
11.6 Results
We implemented the target tracking and following pipeline in simulation using PX4
software-in-the-loop with Gazebo and ROS [126]. We used the IRIS multirotor model with
a camera pitched down by 45 degrees, provided by PX4. We used default simulated
noise values. We had a single target move in a square upon command. For simplicity, we
had the UAV find the target using visual MTT before telling the target to move. Once the
target began moving, the UAV followed it fairly well in the normalized virtual image plane.
Figure 11.13 shows the error plots. Notice that the yaw error increases sharply at several points; these occur when the target turns 90 degrees. The turns also impact the
error in the North-East plane. The results show the effectiveness of the complete pipeline
and its robustness to target modeling errors. A video of the simulation is at
https://youtu.be/C6JWr1dGsBQ
Figure 11.13: The X and Y errors are in the normalized virtual image plane in units of meters and the yaw error is in units of radians.
11.7 Conclusion
We have presented an architecture for multiple target tracking and following on the
image plane using a fixed camera mounted to a moving UAV. The architecture used a visual
frontend algorithm to extract measurements, the homography, and the essential matrix from the
images. We presented the novel algorithm called apparent motion to find moving features
in the images.
The measurements and homography were given to R-RANSAC to track the targets
on the image plane. We derived the process that uses the homography to transform previous
measurements and tracks from the previous image frame to the current image frame.
We also presented a novel target following controller that uses a nonlinear image-based visual servoing (IBVS) scheme to follow a target. Future work on this project includes improving the controller to track multiple targets simultaneously and incorporating target recognition for when tracks leave the camera's field-of-view.
CHAPTER 12. TRACKING ON SE(2) USING A MONOCULAR CAMERA
This chapter shows how to track ground targets that evolve on the special Euclidean group SE (2) using G-MTT in place of R-RANSAC in the end-to-end multiple-
target-tracking framework presented in Chapter 11. An illustration of the multiple target
tracking (MTT) scenario this chapter discusses is in Fig. 12.1, where multiple ground targets
are being tracked by an unmanned air vehicle (UAV) via a monocular camera. The targets’
trajectories are represented as dashed lines, and the targets’ headings are represented as red
arrows relative to the x-axis represented as a black arrow.
Figure 12.1: A depiction of three targets being tracked by a UAV via a monocular camera. Thetargets’ trajectories are represented as dashed lines, and the targets’ headings are represented asred arrows.
Instead of tracking on the normalized image plane as done in Chapter 11, in this
chapter we track on the normalized virtual image plane (NVIP) as discussed in Section 12.1.
We track on the NVIP to be able to transform measurements and tracks from the previous
tracking frame (previous NVIP frame) to the current tracking frame (NVIP frame) using the
homography as discussed in detail in Section 12.5.
The rest of this chapter is outlined as follows. In Section 12.1 we review the NVIP
and discuss how to adjust the visual front end in Chapter 11 to produce measurements
and the homography in the NVIP. In Section 12.2 we present the system model for our
specific application. In Sections 12.3 and 12.4 we prove that the system is locally-weakly-
observable and show how to seed the negative log likelihood optimization problem during
track initialization as discussed in Chapter 8. In Section 12.5 we derive the transformation
for the measurements and tracks from the previous NVIP to the current NVIP using the
homography. In Section 12.6 we present and discuss experimental results and then conclude
in Section 12.7.
12.1 Normalized Virtual Image Plane
The virtual image plane is the rotated and projected image plane such that the
rotated z-axis is perpendicular to the ground, the rotated xy-plane is parallel to the ground,
the x-, y-, and z-axes form a right-handed frame, and the virtual image plane is parallel to the
ground at a distance of 1 pixel along the z-axis from the camera. This relation is illustrated
in Fig. 12.2.
Virtual Image Plane
Image Plane
Ground Plane
Figure 12.2: An illustration of the relationship between the image plane and the virtual image plane.
To describe the virtual image plane more formally, let $p_{t/c}^c = \left[x_{t/c}^c,\; y_{t/c}^c,\; 1\right]^\top \in \mathbb{R}^3$ denote the pixel location of the target in the image plane expressed in and with respect to the camera frame. Let $R_c^v$ denote the rotation from the camera frame to the virtual camera frame. This rotation is a composite of the rotation about the camera's z-axis and then the rotation about the new camera's x-axis so that the virtual image plane is parallel to the ground. It can be thought of as un-rolling and un-pitching the camera frame. The pixel location of the target expressed in the virtual camera frame and with respect to the camera frame is
$$
p_{t/c}^v = \frac{R_c^v p_{t/c}^c}{e_3^\top R_c^v p_{t/c}^c},
$$
where $e_3 \in \mathbb{R}^3$ is the unit vector whose third component is 1. The origin of the virtual camera frame and the camera frame are the same; hence, $p_{t/c}^v = p_{t/v}^v$.
To transform a pixel from the image plane to the normalized virtual image plane (NVIP), we normalize the pixel using the camera's intrinsic parameters $K$, rotate it using $R_c^v$, and then project it so that the NVIP is at a distance of one along the z-axis from the camera. Mathematically, the pixel location of the target in the NVIP is
$$
p_{t/c}^n = \frac{R_c^v K^{-1} p_{t/c}^c}{e_3^\top R_c^v K^{-1} p_{t/c}^c}. \tag{12.1}
$$
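Equation (12.1) is straightforward to implement. The sketch below uses made-up intrinsics and a 45-degree "un-pitch" rotation; it is illustrative, not the dissertation's implementation.

```python
import numpy as np

# Sketch of equation (12.1): normalize a pixel with the intrinsics K,
# rotate it into the virtual camera frame, and project to unit depth.

def pixel_to_nvip(p_c, K, R_vc):
    x = R_vc @ np.linalg.solve(K, p_c)   # R_vc K^{-1} p^c
    return x / x[2]                      # project so that e3^T p^n = 1

# Illustrative intrinsics: focal length 400 px, principal point (320, 240).
K = np.array([[400., 0., 320.],
              [0., 400., 240.],
              [0., 0., 1.]])
# "Un-pitch" the camera by 45 degrees about the x-axis.
a = np.deg2rad(45.0)
R_vc = np.array([[1., 0., 0.],
                 [0., np.cos(a), -np.sin(a)],
                 [0., np.sin(a), np.cos(a)]])
p_n = pixel_to_nvip(np.array([360., 200., 1.]), K, R_vc)
assert np.isclose(p_n[2], 1.0)           # third component is one by construction
```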
To produce measurements and compute the homography in the NVIP, the visual
frontend in Chapter 11 requires the rotation Rvc to be provided. After matching features
are detected according to Subsection 11.2.1, the visual frontend transforms the matched
features from the image plane to the NVIP using equation (12.1). The homography and
essential matrix are then calculated from the transformed matched features following the same
process as discussed in Subsections 11.2.2 and 11.2.3. The transformed measurements are
segmented into moving and non-moving using the homography and essential matrix in the
NVIP according to Subsection 11.2.4. The homography and “moving” features in the NVIP
are given to G-MTT.
The rest of this chapter assumes that the measurements and homography are in the
NVIP.
12.2 System Model
We assume that the target’s motion can be modeled using a discrete, constant-velocity
model driven by white noise as stated in Assumption 2.1. We assume that the target's
motion is constrained to the ground and that the ground can be approximated as flat.
This constraint implies that the target’s motion evolves on the special Euclidean group of
two dimensions SE (2) since this Lie group expresses all rotations and translations in two
dimensional space. Therefore, the target's state is $x = (g, v) \in G_x \triangleq SE(2) \times \mathbb{R}^{SE(2)}$, where the pose $g$ is an element of the Lie group $SE(2)$ defined by the set
$$
SE(2) = \left\{ \begin{bmatrix} R & p \\ 0_{1\times 2} & 1 \end{bmatrix} \in \mathbb{R}^{3\times 3} \;\middle|\; R \in SO(2) \text{ and } p \in \mathbb{R}^2 \right\}
$$
equipped with matrix multiplication. In our application $R$ denotes the rotation from the body frame to the virtual camera frame and $p$ denotes the target's position expressed in the virtual camera frame and with respect to the virtual camera frame.
The target’s velocity is an element of the corresponding Cartesian algebraic space
defined as the set
RSE(2) ≜
ρ
ω
∈ R3
∣∣∣∣∣∣
ω ∈ R and ρ ∈ R2
,
where ρ denotes the target’s velocity expressed in the body frame and with respect to the
virtual camera frame and ω denotes the target’s angular velocity expressed in the body frame
and with respect to the virtual camera frame. For more details on the Lie group SE (2) see
Appendix C.4.
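The pose component of the state can be represented concretely as a homogeneous matrix. The sketch below (illustrative, not G-MTT code) builds an SE(2) element from a heading angle and position and applies the group operation by matrix multiplication.

```python
import numpy as np

# Minimal sketch of the SE(2) pose representation: a 3x3 matrix [R p; 0 1]
# with the group operation given by matrix multiplication.

def se2(theta, p):
    """Element of SE(2) built from a heading angle and a 2D position."""
    c, s = np.cos(theta), np.sin(theta)
    g = np.eye(3)
    g[:2, :2] = [[c, -s], [s, c]]
    g[:2, 2] = p
    return g

g1 = se2(np.pi / 4, [1.0, 2.0])
g2 = se2(-np.pi / 4, [0.5, 0.0])
g = g1 @ g2                                              # group operation
assert np.allclose(g[:2, :2].T @ g[:2, :2], np.eye(2))   # block stays in SO(2)
assert np.allclose(g[2], [0.0, 0.0, 1.0])                # bottom row preserved
```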
We perform target tracking on the NVIP, which is isomorphic to the ground plane. This implies that the target's pose and twist projected to the NVIP are a scaled version of the target's state that preserves relative shape and volume. Thus, we represent the target's state on the NVIP as $x \in G_x \triangleq SE(2) \times \mathbb{R}^{SE(2)}$.
We use the system model defined in equation (4.1) where the state transition function
is defined in equation (4.2). For the observation function, the visual frontend produces point
measurements (i.e. pixel locations) in the NVIP. Therefore, the observation function is
zk = h (xk, rk) (12.2)
= pk + rk, (12.3)
where pk is the target’s pixel location in the NVIP, rk ∼ N (0, R) is the measurement noise,
and zk ∈ Gs= R2 is the measurement in the NVIP. The measurement is an element of
two dimensional Euclidean space which is a Lie group. For details on this Lie group see
Appendix C.1.
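To make the model concrete, the following sketch builds an SE(2) pose from a twist via the closed-form exponential map and evaluates the point observation $z = p + r$. The helper names (`exp_se2`, `observe`) and the twist ordering $(\rho, \omega)$ are illustrative assumptions, not code from this dissertation.

```python
import numpy as np

def exp_se2(tau):
    """Closed-form exponential map of SE(2).
    tau = [rho_x, rho_y, omega] -> 3x3 homogeneous pose matrix."""
    rho, w = tau[:2], tau[2]
    if abs(w) < 1e-9:
        V = np.eye(2)  # small-angle limit
    else:
        V = np.array([[np.sin(w), -(1.0 - np.cos(w))],
                      [1.0 - np.cos(w), np.sin(w)]]) / w
    g = np.eye(3)
    g[0, 0], g[0, 1] = np.cos(w), -np.sin(w)
    g[1, 0], g[1, 1] = np.sin(w), np.cos(w)
    g[:2, 2] = V @ rho
    return g

def observe(g, r):
    """Observation function (12.3): the target's pixel location plus noise."""
    return g[:2, 2] + r

g = exp_se2(np.array([1.0, 0.0, 0.5]))   # move forward while turning
z = observe(g, np.zeros(2))              # noise-free measurement
```

The returned pose satisfies the SE(2) structure of the set above: an orthonormal rotation block with determinant one and a homogeneous bottom row.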
12.2.1 System Affinization
The Jacobians of the state transition function are defined in equation (4.5). The
Jacobians of the observation function are defined in the following Lemma.
Lemma 12.2.1. We suppose directly that the observation function is as defined in equation (12.2) and that $r_k \sim \mathcal{N}(0, R)$ is a concentrated Gaussian distribution. Let $x_k = \hat{x}_k \mathrm{Exp}^{G_x}_I(\tilde{x}_k)$, where $\tilde{x}_k \in \mathbb{R}_{G_x}$ is a small perturbation; then the partial derivatives of the observation function with respect to the state and measurement noise, evaluated at the point $\zeta^h_k = (\hat{x}_k, 0)$, are

$$H_k \triangleq \left.\frac{\partial h}{\partial x}\right|_{\zeta^h_k} = \begin{bmatrix} R_k V(\omega_k) & 0_{2\times 4} \end{bmatrix} \tag{12.4}$$

$$V_k \triangleq \left.\frac{\partial h}{\partial r}\right|_{\zeta^h_k} = I_{2\times 2}, \tag{12.5}$$

where $V(\omega_k)$ is defined in equation (C.23). Consequently, the affine observation function is

$$z_k \approx h(\hat{x}_k, 0)\,\mathrm{Exp}^{\mathbb{R}^2}_I\!\left(H_k \tilde{x}_k + r_k\right) = h(\hat{x}_k, 0) + H_k \tilde{x}_k + r_k, \tag{12.6}$$

where we used the fact that $\mathrm{Exp}^{\mathbb{R}^2}_I$ is the identity map.
Proof. The proof for $V_k$ is trivial and will not be provided. The Jacobian $H_k$ is evaluated with the measurement noise set to zero, so we simplify the derivation by setting the measurement noise to zero from the start since it has no impact on the result. Let $\tau = \left[(\tau^p)^\top, \tau^R, (\tau^\rho)^\top, \tau^\omega\right]^\top \in \mathbb{R}_{G_x}$ denote a perturbation of the state when computing the derivative, where $\tau^p, \tau^\rho \in \mathbb{R}^2$ and $\tau^R, \tau^\omega \in \mathbb{R}$. The derivation of $H_k$ is written as if we are perturbing the state by the vector $\tau$ with all of its elements simultaneously non-zero; however, we are only perturbing the state by a single element of $\tau$ at a time.
Using the definition of the derivative in equation (3.11),

$$\frac{\partial h}{\partial x_k} = \lim_{\tau \to 0} \frac{\mathrm{Log}^{\mathbb{R}^2}_I\!\left( h(\hat{x}_k, 0)^{-1}\, h\!\left(\hat{x}_k \mathrm{Exp}^{G_x}_I(\tau), 0\right)\right)}{\tau}.$$

Using the definition of the observation function in equation (12.2), the definition of the map $\mathrm{Exp}^{\mathbb{R}^2}_I$ in equation (C.1), and the definition of $\mathrm{Exp}^{G_x}_I$ in equation (C.21), the derivative becomes

$$\frac{\partial h}{\partial x_k} = \lim_{\tau \to 0} \frac{h\!\left(\hat{x}_k \mathrm{Exp}^{G_x}_I(\tau), 0\right) - h(\hat{x}_k, 0)}{\tau} = \lim_{\tau \to 0} \frac{p_k + R_k V(\omega_k)\tau^p - p_k}{\tau} = \lim_{\tau \to 0} \frac{R_k V(\omega_k)\tau^p}{\tau} = \lim_{\tau \to 0} \begin{bmatrix} R_k V(\omega_k)\tau^p & 0_{2\times 4} \end{bmatrix} \frac{1}{\tau},$$

where $V(\omega_k)$ is defined in equation (C.23). Therefore, the partial derivative of the observation function in matrix notation is as defined in equation (12.4).
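The limit above can be checked numerically. The sketch below implements the perturbed observation $p_k + R_k V(\omega_k)\tau^p$ from the proof, using the standard SE(2) translation Jacobian as a stand-in for the $V(\omega)$ of equation (C.23) (an assumption, since that appendix is not reproduced here), and confirms by finite differences that the Jacobian is $[R_k V(\omega_k),\ 0_{2\times 4}]$.

```python
import numpy as np

def V(w):
    # Standard SE(2) translation Jacobian (assumed form of equation (C.23)).
    if abs(w) < 1e-8:
        return np.eye(2)
    return np.array([[np.sin(w), -(1.0 - np.cos(w))],
                     [1.0 - np.cos(w), np.sin(w)]]) / w

def h_perturbed(p, R, w, tau):
    # Perturbed observation from the proof: only tau_p = tau[:2] enters.
    return p + R @ V(w) @ tau[:2]

def finite_diff_H(p, R, w, eps=1e-7):
    # Column-by-column finite differences, matching the element-wise limit.
    H = np.zeros((2, 6))
    h0 = h_perturbed(p, R, w, np.zeros(6))
    for i in range(6):
        e = np.zeros(6)
        e[i] = eps
        H[:, i] = (h_perturbed(p, R, w, e) - h0) / eps
    return H

th, w = 0.7, 0.3
R = np.array([[np.cos(th), -np.sin(th)], [np.sin(th), np.cos(th)]])
p = np.array([0.2, -0.1])
H_num = finite_diff_H(p, R, w)
H_analytic = np.hstack([R @ V(w), np.zeros((2, 4))])
```

The last four columns vanish because the observation does not depend on the twist perturbation, matching the $0_{2\times 4}$ block of (12.4).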
12.3 Proving Local Observability
Let Σ denote the system model defined in equations (4.1), (4.2) and (12.2). To track
the target, the system model Σ must be observable. To properly discuss observability of the
system we first present several definitions taken from [127]–[129].
Definition 12.3.1. A pair of initial points $x^1_0$ and $x^2_0$ are said to be indistinguishable if $(\Sigma, x^1_0)$ and $(\Sigma, x^2_0)$ realize the same input-output map for every admissible input $u_i$ for all time. The system $\Sigma$ is said to be observable if any two indistinguishable initial states $x^i_0$ and $x^j_0$ are equal, i.e. $x^i_0 = x^j_0$.

In terms of target tracking, the input $u_i$ is modeled as Gaussian white noise and is incorporated into the process noise $q_{i:k}$, and the initial points $x^1_0$ and $x^2_0$ are initial states.
Notice that observability is a global concept: it might require traveling far from the initial states before being able to distinguish them, which makes it very difficult to prove that a system is observable from every initial condition. Many systems are not observable. In these cases, we use weaker notions of observability.
Definition 12.3.2. Let $G_x$ denote the configuration manifold of the state $x$. The system $\Sigma$ is weakly observable if for all $x^i_0 \in G_x$ there exists a neighborhood $U^i \subset G_x$ of $x^i_0$ such that if $x^j_0 \in U^i$ and $x^j_0$ is indistinguishable from $x^i_0$, then $x^i_0 = x^j_0$.

Essentially, if the system is weakly observable, then for every initial state $x^i_0$ there is a neighborhood $U^i$ such that $x^i_0$ is distinguishable from every other point in $U^i$. However, this definition does not indicate how far $x^i_0$ must travel to be distinguishable from the other points in $U^i$. Also, this does not mean that $x^i_0$ is distinguishable from points outside of $U^i$.
Definition 12.3.3. A pair of initial points $x^1_0$ and $x^2_0$ are said to be $U$-indistinguishable if $(\Sigma, x^1_0)$ and $(\Sigma, x^2_0)$ realize the same input-output map for every admissible input $u_i$ that maintains the trajectory of the system in $U \subset G_x$. The system $\Sigma$ is locally weakly observable if for all $x^i_0 \in G_x$ there exists a neighborhood $U \subset G_x$ of $x^i_0$ such that if $x^j_0 \in U$ and $x^j_0$ is $U$-indistinguishable from $x^i_0$, then $x^i_0 = x^j_0$.
This last definition means that if Σ is locally weakly observable, then neighboring
points can be distinguished from other neighboring points while staying inside the neighbor-
hood U .
To determine if a system is locally weakly observable we use the observability rank
condition which requires the observability matrix [127]–[129]. We construct the observability
matrix in a way to help build intuition.
Let

$$z_i = h_{0:i}(x_0, q_{0:i}, t_{0:i}, r_i) = h\!\left(f(x_0, q_{0:i}, t_{0:i}),\ r_i\right) \tag{12.7}$$

denote a new observation function that maps the initial state $x_0$ to the measurement at time $t_i$, where $h$ is the observation function defined in equation (12.2), and $f$ is the state transition function defined in equation (4.2).
Let $Z = (z_0, z_1, z_2, \ldots, z_k) \in \mathbb{R}^{2k}$, with $\mathbb{R}^{2k} = \mathbb{R}^2 \times \mathbb{R}^2 \times \mathbb{R}^2 \times \cdots \times \mathbb{R}^2$, be a tuple of measurements taken at distinct times, and let $\hat{Z} = (\hat{z}_0, \hat{z}_1, \hat{z}_2, \ldots, \hat{z}_k) \in \mathbb{R}^{2k}$ denote the corresponding estimated measurements, where $\hat{z}_i = h_{0:i}(\hat{x}_0, 0, t_{0:i}, 0)$ and $x_0 = \hat{x}_0 \mathrm{Exp}^{G_x}_I(\tilde{x}_0)$. Also let

$$\mathcal{O} = \begin{bmatrix} \left.\frac{\partial h_{0:0}}{\partial x}\right|_{\zeta^h_{0:0}} \\[4pt] \left.\frac{\partial h_{0:i}}{\partial x}\right|_{\zeta^h_{0:i}} \\[4pt] \left.\frac{\partial h_{0:j}}{\partial x}\right|_{\zeta^h_{0:j}} \\ \vdots \\ \left.\frac{\partial h_{0:k}}{\partial x}\right|_{\zeta^h_{0:k}} \end{bmatrix}, \tag{12.8}$$

denote the matrix of stacked Jacobians of the observation function $h_{0:i}$ evaluated at the point $\zeta^h_{0:i} = (\hat{x}_0, 0, t_{0:i}, 0)$. The matrix $\mathcal{O}$ is called the observability matrix.
We construct the error term $e = \mathrm{Log}^{\mathbb{R}^{2k}}_I\!\left(\hat{Z}^{-1} Z\right)$, which can be approximated using the affine observation function to get

$$e \approx \mathcal{O}\tilde{x}_0 + \text{noise terms}.$$

Let $n$ denote the dimension of the state. If $\mathcal{O}$ is full rank (a rank of $n$), then $\mathcal{O}$ is an injective map and $\tilde{x}_0$ can be estimated using a least squares method. Therefore, the state can be estimated as $x_0 = \hat{x}_0 \mathrm{Exp}^{G_x}_I(\tilde{x}_0)$. This does not mean that $x_0$ is distinguishable from all other initial states; rather, it is only locally distinguishable. This is what it means to be locally weakly observable, and this concept is derived from the inverse function theorem.
Using the chain rule to get $\frac{\partial h_{0:i}}{\partial x} = H_i F_{0:i}$, the observability matrix simplifies to

$$\mathcal{O} = \begin{bmatrix} H_0 \\ H_i F_{0:i} \\ H_j F_{0:j} \\ \vdots \\ H_k F_{0:k} \end{bmatrix}, \tag{12.9}$$

where $H_i$ is the Jacobian of the observation function with respect to the state defined in equation (12.4) and evaluated at the point $\zeta^h_i = (f(\hat{x}_0, 0, t_{0:i}), 0)$, and $F_{0:i}$ is the Jacobian of the state transition function with respect to the state defined in equation (4.5) and evaluated at the point $\zeta^f_{0:i} = (\hat{x}_0, 0, t_{0:i})$.
Let the target’s velocity in the body frame be denoted ρ = [ρx, ρy]⊤, where ρx is the
velocity in the forward direction and ρy is the velocity perpendicular to the forward direction.
By analyzing the observability matrix, the system is not locally weakly observable. However,
by constraining the target’s velocity to be non-zero and aligned with the heading, the system
is observable. This means that for the system to be locally weakly observable, we require
that ρx = 0 and ρy = 0. This is a reasonable assumption when tracking moving people or
cars on the ground since they typically move in the direction of their heading.
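A quick numerical sanity check of this claim, using a simple vector parameterization of the planar state $(p, \theta, \rho_x, \omega)$ with the heading-aligned constraint $\rho_y = 0$ rather than the dissertation's Lie-group machinery: stack finite-difference Jacobians of the measurement sequence with respect to the initial state and inspect the rank. This is an illustrative sketch, not the observability matrix of equation (12.9).

```python
import numpy as np

def Rmat(th):
    return np.array([[np.cos(th), -np.sin(th)],
                     [np.sin(th),  np.cos(th)]])

def measurements(x0, steps=8, dt=0.1):
    # x0 = [px, py, theta, rho_x, omega]; rho_y = 0 (heading-aligned velocity).
    p, th = x0[:2].copy(), x0[2]
    rho, w = np.array([x0[3], 0.0]), x0[4]
    zs = [p.copy()]
    for _ in range(steps):
        p = p + Rmat(th) @ rho * dt   # constant body-frame velocity
        th = th + w * dt
        zs.append(p.copy())
    return np.concatenate(zs)

def observability_matrix(x0, eps=1e-6):
    # Finite-difference Jacobian of the stacked measurement map.
    z0 = measurements(x0)
    O = np.zeros((z0.size, x0.size))
    for i in range(x0.size):
        xp = x0.copy()
        xp[i] += eps
        O[:, i] = (measurements(xp) - z0) / eps
    return O

moving = np.array([0.0, 0.0, 0.3, 1.0, 0.4])   # rho_x != 0: full rank
stopped = np.array([0.0, 0.0, 0.3, 0.0, 0.4])  # rho_x == 0: rank drops
```

With a non-zero forward velocity the stacked Jacobian has full column rank (5), while a stopped target leaves the heading and angular rate unobservable.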
12.4 Seeding Track Initialization
G-MTT uses a subset of measurements from cluster $C_m$, denoted $Z_{C_m}$, to generate a state hypothesis $x^h_k$ by finding the state hypothesis that maximizes the likelihood of the largest subset of measurements in $Z_{C_m}$ subject to the system model. The state hypothesis is found by minimizing the negative log likelihood of the measurements conditioned on the state hypothesis, subject to the system model, as discussed in Section 8.2.
To speed up the optimization algorithm, we can seed it with a good guess of what
the state hypothesis should be. In this section we discuss our seeding strategy.
Let $Z_{C_m} = \{z_k, z_i, z_j, \ldots\}$ denote a subset of measurements from cluster $C_m$, where $z_k$ is a measurement from the current time and the other measurements are from different and distinct times. Since we observe the position of the target, we can approximate the position of the target at the times $t_k$, $t_i$, and $t_j$ as $p_k = z_k$, $p_i = z_i$, and $p_j = z_j$. Using these estimated positions, we can numerically approximate the velocities

$$\dot{p}_k = \frac{p_k - p_i}{t_{i:k}}, \qquad \dot{p}_i = \frac{p_i - p_j}{t_{j:i}},$$

where $\dot{p}_\ell = R_\ell \rho_\ell$ is the velocity of the target with respect to and expressed in the normalized virtual camera frame.
Let $\dot{p}_k = [\dot{p}_{x_k}, \dot{p}_{y_k}]^\top$. Using the constraint that the heading of the target is aligned with the translational velocity,

$$R_k \approx \frac{1}{\|\dot{p}_k\|} \begin{bmatrix} \dot{p}_{x_k} & -\dot{p}_{y_k} \\ \dot{p}_{y_k} & \dot{p}_{x_k} \end{bmatrix},$$

and

$$\rho_k = \begin{bmatrix} \|\dot{p}_k\| \\ 0 \end{bmatrix}.$$
We can follow a similar procedure to approximate $R_i$.

According to the state transition function in equation (4.2),

$$R_k \approx R_i\, \mathrm{Exp}^{SO(2)}_I(t_{i:k}\,\omega_i).$$

Solving for $\omega_i$ yields

$$\omega_i \approx \mathrm{Log}^{SO(2)}_I\!\left(R_i^{-1} R_k\right) / t_{i:k}.$$

Under the assumption that the velocity is constant, $\omega_k = \omega_i$. Therefore, we seed the optimization algorithm with $(p_k, R_k, \rho_k, \omega_k)$.
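The seeding steps above can be collected into a short routine. This is an illustrative sketch; the function name and argument ordering are ours, not the dissertation's implementation.

```python
import numpy as np

def seed_state(zk, zi, zj, tk, ti, tj):
    """Seed (p_k, R_k, rho_k, omega_k) from three point measurements taken
    at distinct times t_j < t_i < t_k, following Section 12.4."""
    pk, pi, pj = map(np.asarray, (zk, zi, zj))
    pdot_k = (pk - pi) / (tk - ti)        # numerical velocity at t_k
    pdot_i = (pi - pj) / (ti - tj)        # numerical velocity at t_i

    def heading_rotation(pdot):
        # Heading aligned with the translational velocity.
        n = np.linalg.norm(pdot)
        return np.array([[pdot[0], -pdot[1]],
                         [pdot[1],  pdot[0]]]) / n

    Rk = heading_rotation(pdot_k)
    Ri = heading_rotation(pdot_i)
    rho_k = np.array([np.linalg.norm(pdot_k), 0.0])
    # omega from R_k ≈ R_i Exp(t_{i:k} omega): Log on SO(2) is atan2.
    dR = Ri.T @ Rk
    omega_k = np.arctan2(dR[1, 0], dR[0, 0]) / (tk - ti)
    return pk, Rk, rho_k, omega_k
```

For a target moving straight along the x-axis at 1 m/s the seed is $R_k = I$, $\rho_k = [1, 0]^\top$, and $\omega_k = 0$, as expected.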
12.5 Transforming Measurements and Tracks
We perform tracking on the NVIP which moves with the UAV. This requires G-MTT
to receive transformation data to transform the tracks and previous measurements from
the previous tracking frame (the previous normalized virtual camera frame) to the current
tracking frame (the current normalized virtual camera frame).
We use the homography to transform the measurements and tracks. The measurements are transformed according to Theorem 11.3.1, and the tracks are transformed according to the following lemma.
Lemma 12.5.1. Let $\hat{x}^{n^-} = \left(\hat{g}^{n^-}, \hat{v}^{n^-}\right) \in G_x$ and $P^{n^-}$ denote the track's state estimate and error covariance in the previous tracking frame, where we have dropped the subscripts denoting time, and

$$\hat{g}^{n^-} = \begin{bmatrix} \hat{R}^{n^-} & \hat{p}^{n^-} \\ 0_{1\times 2} & 1 \end{bmatrix} \tag{12.10a}$$

$$\hat{v}^{n^-} = \begin{bmatrix} \hat{\rho}^{n^-} \\ \hat{\omega}^{n^-} \end{bmatrix} \tag{12.10b}$$

$$\hat{R}^{n^-} = \frac{1}{\left\|\dot{\hat{p}}^{n^-}\right\|} \begin{bmatrix} e_1\!\left(\dot{\hat{p}}^{n^-}\right) & e_2\!\left(-\dot{\hat{p}}^{n^-}\right) \\ e_2\!\left(\dot{\hat{p}}^{n^-}\right) & e_1\!\left(\dot{\hat{p}}^{n^-}\right) \end{bmatrix}, \tag{12.10c}$$

where $\hat{p}$ is the target's estimated pixel location in the NVIP, $\hat{R}$ is the rotation from the body frame to the normalized virtual camera frame, $\hat{\rho}$ is the translational velocity in the body frame, $\hat{\omega}$ is the angular velocity, and $e_i$ is a function that extracts the $i$th component of a tuple. Also, let

$$H = \begin{bmatrix} H_1 & h_2 \\ h_3^\top & h_4 \end{bmatrix},$$

denote the homography broken down into block elements as done in Subsection 11.3.1.

We suppose directly that the target's state can be represented as $x = (g, v) \in SE(2) \times \mathbb{R}_{SE(2)}$, the target's heading is aligned with its velocity (i.e., the velocity in the body frame perpendicular to the forward direction is zero), the ground the target moves on is approximately planar, tracking is done in the NVIP, and the homography calculated in the NVIP is provided; then the transformed state estimate and error covariance are

$$\hat{R}^n = H_1 \hat{R}^{n^-}, \qquad \hat{p}^n = g_h\!\left(H, \hat{p}^{n^-}\right), \qquad \hat{\rho}^n = \frac{\hat{\rho}^{n^-}}{h_4}, \qquad \hat{\omega}^n = \hat{\omega}^{n^-}, \qquad P^n = C P^{n^-} C^\top,$$

where

$$g_h\!\left(H, \hat{p}^{n^-}\right) = \frac{H_1 \hat{p}^{n^-} + h_2}{h_4}, \qquad C = \mathrm{diag}\!\left(\frac{1}{h_4}, \frac{1}{h_4}, 1, \frac{1}{h_4}, \frac{1}{h_4}, 1\right).$$
Proof. We denote the track's state in the previous NVIP as $x^{n^-} = \hat{x}^{n^-} \mathrm{Exp}^{G_x}_I\!\left(\tilde{x}^{n^-}\right)$. According to equation (11.4), the homography is

$$H^n_{n^-} = \left( R^n_{n^-} + \frac{p^n_{n^-/n}\, n^\top}{d} \right), \tag{12.11}$$

where $d$ is the distance from the NVIP to the ground plane, $R^n_{n^-}$ is the rotation between the previous and current NVIP, $p^n_{n^-/n}$ is the translation between the previous and current NVIP, and $n$ is the unit vector perpendicular to the NVIP. Since the NVIP is always parallel to the ground, $n = [0, 0, 1]^\top$, and the rotation between the previous and current NVIP is about the vector $n$. This implies that

$$R^n_{n^-} = \begin{bmatrix} R(\psi) & 0_{2\times 1} \\ 0_{1\times 2} & 1 \end{bmatrix}, \tag{12.12}$$
where $R(\psi) \in SO(2)$ is a two-dimensional rotation. These properties simplify the homography to

$$H = \begin{bmatrix} R(\psi) & -\pi_{e_3}\!\left(\dfrac{p^n}{d}\right) \\ 0 & 1 - e_3\!\left(\dfrac{p^n}{d}\right) \end{bmatrix},$$

where we have dropped the frame notation, and $\pi_{e_3}$ extracts the first two components of its argument. Thus

$$H_1 = R(\psi), \qquad h_2 = -\pi_{e_3}\!\left(\frac{p^n}{d}\right), \qquad h_3^\top = 0, \qquad h_4 = 1 - e_3\!\left(\frac{p^n}{d}\right).$$
As given in equation (11.3), the homography satisfies the constraint

$$\gamma_f \begin{bmatrix} p^n \\ 1 \end{bmatrix} = H \begin{bmatrix} p^{n^-} \\ 1 \end{bmatrix}.$$

Expanding it out gives us

$$\begin{bmatrix} \gamma_f\, p^n \\ \gamma_f \end{bmatrix} = \begin{bmatrix} H_1 p^{n^-} + h_2 \\ h_4 \end{bmatrix},$$

which indicates that $\gamma_f = h_4$. Substituting in $h_4$ for $\gamma_f$ and solving for $p^n$ yields

$$p^n = \frac{1}{h_4}\left( H_1 p^{n^-} + h_2 \right) = g_h\!\left(H, p^{n^-}\right). \tag{12.13}$$
Differentiating the above equation with respect to time yields

$$\dot{p}^n = \frac{\partial g_h\!\left(H, p^{n^-}\right)}{\partial p^{n^-}}\, \dot{p}^{n^-} = \frac{H_1}{h_4}\, \dot{p}^{n^-} = G_h\, \dot{p}^{n^-}, \tag{12.14}$$

where $\dot{p}^n = R^n \rho^n$.
Using the definition of the rotation matrix $R^n$ in equation (12.10) and substituting in equation (12.14) for $\dot{p}^n$ yields

$$R^n = \frac{1}{\|\dot{p}^n\|} \begin{bmatrix} e_1(\dot{p}^n) & e_2(-\dot{p}^n) \\ e_2(\dot{p}^n) & e_1(\dot{p}^n) \end{bmatrix} = \frac{1}{\left\|G_h \dot{p}^{n^-}\right\|} \begin{bmatrix} e_1\!\left(G_h \dot{p}^{n^-}\right) & -e_2\!\left(G_h \dot{p}^{n^-}\right) \\ e_2\!\left(G_h \dot{p}^{n^-}\right) & e_1\!\left(G_h \dot{p}^{n^-}\right) \end{bmatrix} = H_1 R^{n^-}. \tag{12.15}$$
Noting that $\dot{p}^n = R^n \rho^n$ and using equations (12.14) and (12.10), the velocity in the body frame is

$$\rho^n = (R^n)^{-1} \dot{p}^n = (R^n)^{-1} G_h \dot{p}^{n^-} = (R^n)^{-1} G_h R^{n^-} \rho^{n^-} = \left(R^{n^-}\right)^{-1} H_1^{-1} \frac{H_1}{h_4} R^{n^-} \rho^{n^-} = \frac{\rho^{n^-}}{h_4}. \tag{12.16}$$

Since a rotation and translation do not affect the angular velocity in $SE(2)$,

$$\omega^n = \omega^{n^-}. \tag{12.17}$$
Let $T : G_x \to G_x$ denote the transformation of the state from the previous NVIP to the current NVIP. Using the method in Subsection 3.4.5, the first-order Taylor series of the transformation is

$$x^n \approx T\!\left(\hat{x}^{n^-}\right) \mathrm{Exp}^{G_x}_I\!\left(C \tilde{x}^{n^-}\right), \tag{12.18}$$

where $C \triangleq \left.\frac{\partial T}{\partial x}\right|_{\hat{x}^{n^-}}$ is the partial derivative of the transformation $T$ with respect to the state, $\hat{x}^n = T\!\left(\hat{x}^{n^-}\right)$ is the transformed state estimate, and $\tilde{x}^n = C\tilde{x}^{n^-}$ is the transformed error state.

The covariance of the transformed error state is

$$\mathrm{cov}\!\left(\tilde{x}^n\right) \approx \mathrm{cov}\!\left(C\tilde{x}^{n^-}\right) \tag{12.19}$$
$$= C\,\mathrm{cov}\!\left(\tilde{x}^{n^-}\right) C^\top \tag{12.20}$$
$$= C P^{n^-} C^\top. \tag{12.21}$$
To calculate $C$ we can decompose it into the blocks

$$C = \begin{bmatrix} C_{11} & C_{12} \\ C_{21} & C_{22} \end{bmatrix},$$

where $C_{ij} \in \mathbb{R}^{3\times 3}$. Let $T_1$ denote the portion of $T$ that transforms the state's pose, and $T_2$ denote the portion of $T$ that transforms the state's twist.
Using equations (12.13) and (12.15), letting $\tau^g = \left(\tau^p, \tau^R\right) \in \mathbb{R}_{SE(2)}$ denote the perturbation of the pose when taking the derivative, and using the method in Subsection 3.4.5 to compute the derivative, the first block is

$$C_{11} = \frac{\partial T_1(g)}{\partial g} = \lim_{\tau^g \to 0} \frac{\mathrm{Log}^{SE(2)}_I\!\left( T_1(g)^{-1}\, T_1\!\left(g\, \mathrm{Exp}^{SE(2)}_I(\tau^g)\right)\right)}{\tau^g}$$

$$= \lim_{\tau^g \to 0} \frac{1}{\tau^g}\,\mathrm{Log}^{SE(2)}_I\!\left( \begin{bmatrix} H_1 R & \tfrac{1}{h_4}(H_1 p + h_2) \\ 0_{1\times 2} & 1 \end{bmatrix}^{-1} \begin{bmatrix} H_1 R\, \mathrm{Exp}^{SO(2)}_I(\tau^R) & \tfrac{1}{h_4}\left(H_1(p + R\tau^p) + h_2\right) \\ 0_{1\times 2} & 1 \end{bmatrix} \right)$$

$$= \lim_{\tau^g \to 0} \frac{1}{\tau^g}\,\mathrm{Log}^{SE(2)}_I\!\left( \begin{bmatrix} R^\top H_1^{-1} H_1 R\, \mathrm{Exp}^{SO(2)}_I(\tau^R) & R^\top H_1^{-1}\,\tfrac{1}{h_4} H_1 R\,\tau^p \\ 0_{1\times 2} & 1 \end{bmatrix} \right)$$

$$= \lim_{\tau^g \to 0} \frac{1}{\tau^g}\,\mathrm{Log}^{SE(2)}_I\!\left( \begin{bmatrix} \mathrm{Exp}^{SO(2)}_I(\tau^R) & \tfrac{1}{h_4}\tau^p \\ 0_{1\times 2} & 1 \end{bmatrix} \right) = \lim_{\tau^g \to 0} \frac{\mathrm{diag}\!\left(\tfrac{1}{h_4}, \tfrac{1}{h_4}, 1\right)\tau^g}{\tau^g};$$

thus, $C_{11} = \mathrm{diag}\!\left(\frac{1}{h_4}, \frac{1}{h_4}, 1\right)$.
Using equations (12.16) and (12.17), letting $\tau^v = (\tau^\rho, \tau^\omega) \in \mathbb{R}_{SE(2)}$ denote a perturbation of the twist and using the method in Subsection 3.4.5 to compute the derivative, the block $C_{22}$ is

$$C_{22} = \frac{\partial T_2}{\partial v} \tag{12.22}$$

$$= \lim_{\tau^v \to 0} \frac{\begin{bmatrix} \tfrac{1}{h_4}(\rho + \tau^\rho) \\ \omega + \tau^\omega \end{bmatrix} - \begin{bmatrix} \tfrac{\rho}{h_4} \\ \omega \end{bmatrix}}{\tau^v} \tag{12.23}$$

$$= \lim_{\tau^v \to 0} \frac{\begin{bmatrix} \tfrac{1}{h_4}\tau^\rho \\ \tau^\omega \end{bmatrix}}{\tau^v} \tag{12.24}$$

$$= \mathrm{diag}\!\left(\frac{1}{h_4}, \frac{1}{h_4}, 1\right). \tag{12.25}$$

It is trivial to show that $C_{21} = C_{12} = 0_{3\times 3}$. Putting all of the blocks together yields

$$C = \mathrm{diag}\!\left(\frac{1}{h_4}, \frac{1}{h_4}, 1, \frac{1}{h_4}, \frac{1}{h_4}, 1\right).$$
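Lemma 12.5.1 translates directly into code. The sketch below applies the block decomposition of the homography to a track's estimate and covariance; the function name and interface are illustrative assumptions, not the dissertation's implementation.

```python
import numpy as np

def transform_track(H, p, R, rho, omega, P):
    """Transform an SE(2) track estimate between NVIP frames per Lemma 12.5.1.
    H: 3x3 homography with blocks H1 = H[:2,:2], h2 = H[:2,2], h4 = H[2,2]."""
    H1, h2, h4 = H[:2, :2], H[:2, 2], H[2, 2]
    p_new = (H1 @ p + h2) / h4          # p^n = g_h(H, p^{n-})
    R_new = H1 @ R                      # R^n = H1 R^{n-}
    rho_new = rho / h4                  # body-frame velocity is rescaled
    omega_new = omega                   # angular velocity is invariant
    C = np.diag([1/h4, 1/h4, 1.0, 1/h4, 1/h4, 1.0])
    P_new = C @ P @ C.T                 # P^n = C P^{n-} C^T
    return p_new, R_new, rho_new, omega_new, P_new

# Example: rotation by psi plus a shift, with h4 scaling.
psi, h4 = 0.2, 1.1
H = np.eye(3)
H[:2, :2] = np.array([[np.cos(psi), -np.sin(psi)],
                      [np.sin(psi),  np.cos(psi)]])
H[:2, 2] = [0.05, -0.02]
H[2, 2] = h4
```

Because $H_1$ is a rotation, the transformed $\hat{R}^n$ remains a valid rotation, and an identity homography leaves the track unchanged.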
12.6 Experiment
In this section we compare tracking using the constant-velocity target model on SE (2)
(SE (2)-CV) described in this chapter with a linear, time-invariant, constant-velocity (LTI-
CV) model via simulation and hardware results.
The simulation consists of twenty targets moving on the ground with random trajec-
tories being observed by a camera 30 meters above ground. The simulated camera’s FOV is
70 degrees with image dimensions 1080× 1920 which results in the dimensions of the NVIP
being 0.8 × 1.4 m. The simulated camera also produces 100 false measurements per sensor
Figure 12.3: The figure shows the number of valid tracks (NVT) and the number of missed targets (NMT) at each time step of the simulation for the LTI-CV and SE(2)-CV models. The x-axis is time in seconds.
scan. The standard deviation of the measurement noise is $\sqrt{1\mathrm{e}{-5}}$ and the standard deviation of the process noise is set to 0.01. The simulation is performed 100 times with a time step of 0.1 seconds for a duration of 10 seconds.
To compare the two target models we calculate the performance measures: number of valid targets (NVT), number of missed targets (NMT), average Euclidean error (AEE), and the track probability of detection (TPD) as described in [88]. Fig. 12.3 shows the number of valid targets and the number of missed targets at each time step of the simulation. Initially, neither of the methods is tracking the targets. This is because G-MTT requires a batch of measurements before it begins to initialize tracks. With that being said, it is easily noticeable that the SE(2)-CV model finds the targets better than the LTI-CV model. The track probability of detection for the SE(2)-CV model is 0.874 and for the LTI-CV model is 0.56, which indicates that the SE(2)-CV model has a much higher probability of tracking the targets. Lastly, the average Euclidean error in position for the SE(2)-CV model is 8e−4 and for the LTI-CV model is 7e−4. The error is so small because the area of the NVIP is small.
For hardware results we recorded a video from a multirotor UAV hovering above a field where individuals were moving with different types of trajectories. The SE(2)-CV model never loses a target; however, the LTI-CV model does. The LTI-CV model was able to perform fairly well since the targets were moving fairly slowly. The hardware results are shown at https://www.youtube.com/watch?v=kTQOi0bAuek&list=PLYgEzAHuTvsAvUDxpraTkIfxcIAQ5wSIu&index=4&ab_channel=MAGICCLab for tracking using SE(2)-CV models, and at https://www.youtube.com/watch?v=C6o7f1KXkIE&list=PLYgEzAHuTvsAvUDxpraTkIfxcIAQ5wSIu&index=5&ab_channel=MAGICCLab for tracking using LTI-CV models.
12.7 Conclusion
The purpose of this chapter was to modify the end-to-end MTT framework presented in Chapter 11 to track targets whose motion evolves on SE(2) using the G-MTT algorithm. The necessary contributions for this modification consisted of defining the system model in Section 12.2, proving that the system is locally weakly observable in Section 12.3, designing a seeding strategy for track initialization in Section 12.4, and deriving the transformation between the previous NVIP and the current NVIP in Section 12.5.

Through simulation and hardware results, we have shown that tracking on SE(2) significantly increases tracking performance as compared to tracking with an LTI-CV model.
CHAPTER 13. TRACKING ON SE(3) USING A MONOCULAR CAMERA
Tracking highly maneuverable targets is a challenge. Consider tracking multirotors whose non-linear trajectories are composed of turning, rotational, and linear motions. Tracking these targets can be difficult, especially during agile maneuvers. One possible solution to compensate for the target's high maneuverability is to use higher-order, linear, time-invariant (LTI) models such as constant-acceleration or constant-jerk LTI models.
The authors of [23] show that an LTI constant-jerk model outperforms an LTI constant-acceleration model in tracking a highly maneuverable target in three-dimensional space. This idea is tested and verified in [130], which also provides a short survey of different LTI constant-jerk models. A similar experiment is conducted in [22], which implements LTI constant-acceleration and constant-jerk models with the MTT algorithm Recursive RANSAC (R-RANSAC), originally developed in [63]. It is shown in [22] that the LTI constant-jerk model outperforms the LTI constant-velocity model in tracking vehicles, especially when the vehicles perform a U-turn.
While it has been shown that tracking on SE (2) using a constant-velocity model
outperforms the LTI, constant-velocity model in [9], [32], [131], the constant velocity model
on SE (2) has not been compared to higher-order LTI models such as constant-acceleration
and jerk models.
In this chapter we show how to track multiple quadrotors using a constant-velocity
target model on SE (3) (SE (3)-CV) using the G-MTT algorithm. We also compare the
performance of the SE (3)-CV target model to three other target models using the G-MTT
algorithm: an LTI, constant-velocity (LTI-CV) target model, an LTI, constant-acceleration
(LTI-CA) target model, and an LTI, constant-jerk (LTI-CJ) target model as defined in [8].
In Section 13.1 we derive the system model for the different target models. In Sec-
tion 13.2 we present the experiment, and in Section 13.3 we conclude the chapter.
13.1 System Model
In this section we first define the system model for the SE (3)-CV target model, and
then define the system model for the LTI-CV, LTI-CA, and LTI-CJ target models.
13.1.1 System Model SE(3)-CV
The quadrotors move in three dimensional space; therefore, we can represent their
natural configuration manifold using the Lie group SE (3), which is the set of all rotations
and translations in three dimensional space. For a description of SE (3) see Appendix C.5.
We model the target’s state as xk = (gk, vk) ∈ Gx= SE (2) ×RSE(2), where gk ∈ G is the
target’s pose at time tk and vk ∈ RSE(2) is the target’s twist at time tk.
The targets are observed by a realsense D435i camera. The images are given to a
deep neural network (DNN) described in the paper [132] but extended to 3D boxes as in the
paper [133]. The DNN extracts the poses of the targets relative to the camera frame, and
feeds them to the G-MTT algorithm. We model the sensor’s observation function as
zk = h (xk, rk) (13.1)
= gkExpGSE(3)
I (rk) , (13.2)
where gk is the target’s pose, rk ∼ N (0, R) ∈ RGSE(3)is the measurement noise, and
zk ∈ SE (3) is the measurement.
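As a sketch of this measurement model, the closed-form SE(3) exponential below right-multiplies a pose by Exp of a noise twist. The twist ordering $(\rho, \omega)$ and the helper names are our assumptions, not the dissertation's code.

```python
import numpy as np

def hat(w):
    """so(3) hat operator."""
    return np.array([[0.0, -w[2], w[1]],
                     [w[2], 0.0, -w[0]],
                     [-w[1], w[0], 0.0]])

def exp_se3(tau):
    """Closed-form SE(3) exponential; tau = [rho (3), omega (3)]."""
    rho, w = tau[:3], tau[3:]
    th = np.linalg.norm(w)
    W = hat(w)
    if th < 1e-9:
        R, Vm = np.eye(3) + W, np.eye(3) + 0.5 * W
    else:
        R = np.eye(3) + np.sin(th)/th * W + (1 - np.cos(th))/th**2 * (W @ W)
        Vm = (np.eye(3) + (1 - np.cos(th))/th**2 * W
              + (th - np.sin(th))/th**3 * (W @ W))
    g = np.eye(4)
    g[:3, :3], g[:3, 3] = R, Vm @ rho
    return g

def observe_pose(g, r):
    """Observation function (13.2): z = g Exp(r)."""
    return g @ exp_se3(r)

g = exp_se3(np.array([1.0, 0.0, 0.0, 0.0, 0.0, 0.5]))  # a sample pose
z = observe_pose(g, 0.01 * np.ones(6))                 # perturbed measurement
```

With zero noise the measurement is the pose itself, and the noisy measurement remains a valid SE(3) element.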
The target’s system model is defined by the equations (4.1), (4.2), and (13.1). The
state transition function in Chapter (4.2) is defined generically for any Lie groups, and is
modified to SE (3) using the appropriate terms found in Appendix C.5.
The Jacobians of the observation function evaluated at the point $\zeta^h_k = (\hat{x}_k, 0)$ are

$$H_k \triangleq \left.\frac{\partial h}{\partial x}\right|_{\zeta^h_k} = I \tag{13.3}$$
$$V_k \triangleq \left.\frac{\partial h}{\partial r}\right|_{\zeta^h_k} = I. \tag{13.4}$$

These Jacobians are easily derived using the method in Section 4.1. The Jacobians of the state transition function are derived in Section 4.1.
13.1.2 System Model LTI
Tracking the quadrotors using an LTI target model is a non-geometric approach, but
since LTI target models are designed in Euclidean space and Euclidean space is a Lie group
as shown in Appendix C.1, we can use the G-MTT algorithm.
The G-MTT algorithm as we have presented it is designed to be used with constant-
velocity target models. To use it with higher-order, LTI models we must redefine the system
model, and the system model’s Jacobians. We do this generically for any n-dimensional
Euclidean space.
Let $p_k \in \mathbb{R}^n$ denote the target's position in $n$-dimensional space at time $t_k$, and let $x = [p_k, \dot{p}_k, \ddot{p}_k, \ldots]^\top \in \mathbb{R}^{o\cdot n}$ denote the target's state of order $o$, where $\dot{p}_k$ denotes the velocity, $\ddot{p}_k$ denotes the acceleration, etc. For example, the target's state of order $o = 3$ consists of position, velocity, and acceleration. The system process noise over the time interval $t_{i:k}$ is modeled as a Wiener process as defined in [24] and is denoted $q_{i:k} \sim \mathcal{N}(0, Q(t_{i:k})) \in \mathbb{R}^{o\cdot n}$.
Let

$$A = \begin{bmatrix} 0_{n\times n} & I_{n\times n} & 0_{n\times n} & \cdots & 0_{n\times n} \\ 0_{n\times n} & 0_{n\times n} & I_{n\times n} & \cdots & 0_{n\times n} \\ \vdots & \vdots & \vdots & \ddots & \vdots \\ 0_{n\times n} & 0_{n\times n} & 0_{n\times n} & \cdots & I_{n\times n} \\ 0_{n\times n} & 0_{n\times n} & 0_{n\times n} & \cdots & 0_{n\times n} \end{bmatrix} \tag{13.5}$$

be an $(o \cdot n) \times (o \cdot n)$ matrix with block identity matrices on the off diagonal. The state transition function is defined as

$$x_k = f(x_i, q_{i:k}, t_{i:k}) = \mathrm{expm}(A t_{i:k})\, x_i + q_{i:k}, \tag{13.6}$$
where expm is the matrix exponential. The Jacobians of the state transition function are

$$F_{i:k} \triangleq \left.\frac{\partial f}{\partial x}\right|_{\zeta^f_{i:k}} = \mathrm{expm}(A t_{i:k}) \tag{13.7}$$
$$G_{i:k} \triangleq \left.\frac{\partial f}{\partial q}\right|_{\zeta^f_{i:k}} = I_{o\cdot n \times o\cdot n}. \tag{13.8}$$
We assume that the target’s position is measured by the sensor; thus, the measure-
ment is defined as zk ∈ Rn and the measurement noise is denoted as rk ∼ N (0, R) ∈ Rn.
Let
C =[
In×n 0n×(o−1)∗n
]
, (13.9)
then the observation function is
zk = h (xk, rk)
= Cxk + rk, (13.10)
and its Jacobians are

$$H_k \triangleq \left.\frac{\partial h}{\partial x}\right|_{\zeta^h_k} = C \tag{13.11}$$
$$V_k \triangleq \left.\frac{\partial h}{\partial r}\right|_{\zeta^h_k} = I_{n\times n}. \tag{13.12}$$
For LTI target models of any order, the exponential map and its Jacobians are still
the identity map.
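Because $A$ in (13.5) is nilpotent ($A^o = 0$), $\mathrm{expm}(A t)$ reduces to a finite Taylor polynomial whose $(i, j)$ block is $t^{j-i}/(j-i)!\, I$. The sketch below builds it directly, together with the observation matrix $C$ of (13.9); the function names are illustrative.

```python
import math
import numpy as np

def transition_matrix(o, n, dt):
    """expm(A*dt) for the order-o LTI model in R^n: block (i, j) equals
    dt^(j-i)/(j-i)! * I for j >= i, and zero otherwise."""
    F = np.zeros((o * n, o * n))
    for i in range(o):
        for j in range(i, o):
            F[i*n:(i+1)*n, j*n:(j+1)*n] = \
                (dt**(j - i) / math.factorial(j - i)) * np.eye(n)
    return F

def observation_matrix(o, n):
    """C = [I_{n x n}, 0_{n x (o-1)n}] of equation (13.9)."""
    return np.hstack([np.eye(n), np.zeros((n, (o - 1) * n))])

# Constant-acceleration model (o = 3) in 1D: propagate [p, v, a].
F = transition_matrix(3, 1, 2.0)
x = F @ np.array([0.0, 1.0, 1.0])   # p(t) = p + v t + a t^2 / 2
z = observation_matrix(3, 1) @ x    # the sensor sees position only
```

For o = 2 this recovers the familiar constant-velocity transition $[[I, tI], [0, I]]$, matching the truncated exponential series of the nilpotent $A$.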
13.2 Experiment
The experiment consists of tracking three quadrotors in a motion capture room for 40 seconds using a RealSense D435i camera running at 10 frames per second with image dimensions 640 × 480. The measurements from the DNN are given to G-MTT to produce confirmed tracks. The confirmed tracks' poses are compared against true pose data measured by the motion capture system.
The targets are tracked using four different implementations of G-MTT, where each implementation uses one of the four target models: SE(3)-CV, LTI-CV, LTI-CA, and LTI-CJ. Each implementation uses the same parameters. The process noise covariance is set to $Q(t_{i:k}) = 0.05\,t_{i:k} I$, the initial error covariance to $P_0 = 0.25 I$, the measurement noise covariance to $R = 0.05 I$, the spatial density of false measurements to $\lambda = 1\mathrm{e}{-5}$ since the sensor produces few false measurements, the probability of detection to $P_D = 0.8$, the gate probability to $P_G = 0.7$, the minimum subset of measurements used to create a state hypothesis to $\tau_{RSC} = 4$ with a minimum score requirement of $\tau_{RmS} = 7$ to be initialized into a track, and the probability of a measurement being an inlier to a state hypothesis to $P_I = 0.7$.
In our experiment, a track is considered a true track if it is within a Euclidean positional distance of 0.5 meters of a target. In the case that two or more tracks are within that distance of a target, the closer track is assumed to be the true track and the others false.
To compare the different implementations we use the following performance measures: average number of missed targets (AMT), average number of false tracks (AFT), average Euclidean error (AEE) in position, and the track probability of detection (TPD) as described in [88]. The track probability of detection is the probability that if a target exists, it is being tracked. The statistical data for the performance measures are recorded whenever truth states from the motion capture room are available for all three quadrotors, providing a total of 3415 data samples. A visualization of the experiment is at https://www.youtube.com/watch?v=NRwcyZJHvmg&list=PLYgEzAHuTvsAvUDxpraTkIfxcIAQ5wSIu&index=11&ab_channel=MAGICCLab. The visualization shows the video from the camera with the measurements and the tracks from the different implementations of G-MTT drawn on the video. An image from the video is shown in Figure 13.1. The performance measures of the experiment are shown in Table 13.1.
As shown in Table 13.1, the SE (3)-CV target model had the lowest average Euclidean
error, the highest track probability of detection and the lowest average number of false tracks
and missed targets. Thus, this target model performed better than the others. The LTI-CV
Figure 13.1: A visualization of the experiment. The image on the left shows the bounding boxes produced by the DNN. The images on the right show the tracks produced by the different implementations of G-MTT. The tracks for the SE(3)-CV model are shown using bounding boxes to indicate the orientation and position, and the tracks for the LTI models are shown as points since only positional data is provided by the LTI models.
Table 13.1: The performance measures of the experiment

              AEE       TPD       AFT       AMT
SE(3)-CV   0.11658   0.93079   0.36164   0.19649
LTI-CV     0.12207   0.91346   0.441     0.24568
LTI-CA     0.12092   0.89933   0.65827   0.2858
LTI-CJ     0.13533   0.64332   0.76105   1.0126
target model had the second-best performance, with slightly worse performance measures than the SE(3)-CV target model. The LTI-CA target model had the third-best performance, with slightly worse performance than the LTI-CV target model. The LTI-CJ target model performed poorly and had the worst performance. This was a surprising result since the results in [22], [23], [130] showed that the constant-jerk model outperformed lower-order LTI target models.
We investigated what caused the constant-jerk model to perform poorly and discovered the cause to be in the track initialization scheme. G-MTT uses a least squares regression algorithm to generate the state hypothesis as discussed in Chapter 8. When the
Table 13.2: Linear Least Squares Regression Results

           AEE Position   AEE Velocity
LTI-CV        0.2381         0.5849
LTI-CA        0.2914         2.0731
LTI-CJ        0.3065         4.7125
process noise and measurement noise are large, the state hypothesis for higher-order models
can have significant errors in the velocity, acceleration and jerk. To demonstrate this, we
ran a Monte Carlo simulation consisting of 1e4 iterations. In each iteration, we performed
the linear regression algorithm in Chapter 8 using five position measurements to compute
the state estimate for the LTI constant-velocity, acceleration, and jerk target models. The
position and velocity components of the state for each target model were randomly initialized
and the acceleration and jerk components were set to zero for the constant acceleration and
jerk models. The process noise was set to zero, the measurement noise covariance was set
to R = 0.05I and the time step was set to 0.166 seconds to simulate a frame rate of 10fps.
We then calculated the average Euclidean error in position and velocity for each target model. The results are shown in Table 13.2.
Table 13.2 shows that the initial state estimate of the target's position and velocity gets worse as the order of the target model increases. Note that the error in position is similar across models, but the error in velocity greatly increases as the order of the target model increases. This explains why the higher-order tracks were able to initialize but quickly drifted away from the target's true position.
The track initialization issue is mitigated in [23] by using three measurements to
estimate the position, velocity and acceleration of the target and initializing the jerk to zero.
Similarly in [22], the linear regression algorithm presented in [63] was used to estimate the
target’s position and velocity while the acceleration and jerk were initialized to zero. These
methods are valid under the assumption that the target has constant velocity or acceleration.
We did not implement these track initialization schemes since we wanted to stay true to the
original algorithm presented in [63].
13.3 Conclusion
In this chapter we have shown how to use G-MTT to track targets whose motion evolves on SE(3) using a monocular camera. To track the targets, we implemented four versions of G-MTT where each version used one of the following target models: SE(3)-CV, LTI-CV, LTI-CA, and LTI-CJ. We implemented these different versions to see how the SE(3)-CV model compares against the higher-order LTI models when tracking targets. Our experiment showed that the SE(3)-CV target model outperformed the other target models in tracking three multirotors.
CHAPTER 14. CONCLUSION AND FUTURE RESEARCH
The main contribution of this dissertation is the presentation of the novel MTT algo-
rithm G-MTT that implements the different MTT components to work with target models
defined on Lie groups. This dissertation also serves as a guide on how other MTT algorithms
can be modified to work with geometric target models. As part of the presentation we pro-
vide various experimental results that strengthen the argument that a geometric approach
to target modeling improves tracking performance.
In Chapter 4 the main contribution is the definition of a discrete, constant-velocity
system model designed on generic, geodesically-complete, unimodular Lie groups. We also
derived the system model Jacobians. This generic system model allows G-MTT to be applied
to any geodesically-complete, unimodular Lie group. In Chapter 5 the main contribution is
the definition of a cluster where the measurements are defined on Lie groups. We also show
how to compare measurements from different sensors on the condition that the sensors are
compatible.
In Chapter 6 the main contribution is the derivation of the LG-IPDAF algorithm, which is the extension of the IPDAF algorithm presented in [38] to Lie groups. The LG-IPDAF algorithm is capable of associating measurements to tracks using a validation region defined on Lie groups, and of propagating and updating a track's state estimate, error covariance, and track likelihood. In this chapter we also demonstrate the LG-IPDAF in simulation by tracking a car on SE(2) and compare its performance to the IPDAF, which tracked the same car using an LTI-CV target model. The experimental results showed that tracking on SE(2), as compared to tracking with the LTI-CV model, significantly improves tracking performance.
In Chapter 7 the main contribution is the derivation of parallel centralized measurement fusion with the LG-IPDAF, called the MS-LG-IPDAF, to allow tracking using multiple sensors. In the same chapter, we compared the MS-LG-IPDAF algorithm to sequential centralized measurement fusion with the LG-IPDAF and concluded that the tracking performance of the two algorithms is about the same, and that the sequential algorithm is preferred since it is more computationally efficient.
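The equivalence behind that conclusion can be illustrated with a scalar sketch (hypothetical values, not the MS-LG-IPDAF itself): applying each sensor's Kalman update sequentially yields the same posterior as fusing all measurements at once in information form, while avoiding the larger batch inversion.

```python
def kf_update(x, P, z, R):
    """Scalar Kalman measurement update with measurement z and noise variance R."""
    K = P / (P + R)
    return x + K * (z - x), (1.0 - K) * P

# Sequential fusion: apply each sensor's measurement one after another.
x, P = 0.0, 4.0
for z, R in [(1.2, 1.0), (0.8, 2.0)]:
    x, P = kf_update(x, P, z, R)

# Batch (parallel) fusion in information form gives the same posterior,
# which is why the cheaper sequential form is preferred.
info = 1.0 / 4.0 + 1.0 / 1.0 + 1.0 / 2.0
x_batch = (0.0 / 4.0 + 1.2 / 1.0 + 0.8 / 2.0) / info
P_batch = 1.0 / info
```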
In Chapter 8 the main contribution is the derivation of a track initialization scheme
designed on Lie groups that is robust to outliers. The main contribution of Chapter 9 is
the derivation of a track-to-track association and fusion algorithm for tracks whose states
are modeled on Lie groups under the assumption that the tracks are independent. The
main contribution of Chapter 10 is the derivation of an algorithm to deal with measurement
latency for system models designed on Lie groups.
In Chapter 11 the main contribution is the novel end-to-end target tracking and following algorithm designed to track multiple ground targets and to follow a single ground target using a monocular camera fixed to a UAV. As part of the algorithm, we present a method called apparent feature motion that is capable of detecting moving features across images and producing point measurements of those moving features. These moving features correspond to moving targets that we wish to track. We show how to transform measurements and tracks from the previous image plane to the current image plane using the homography as the UAV moves. We design a novel image-based, visual-servoing, target-following controller that is nonlinear and prove that it is globally stable. Lastly, we demonstrate the end-to-end target tracking and following algorithm by tracking and following a moving target in simulation.
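Transferring a point measurement between image planes amounts to applying a 3x3 homography followed by homogeneous normalization. A minimal sketch with a hypothetical homography:

```python
def apply_homography(H, pt):
    """Map a point measurement from the previous image plane to the
    current one using a 3x3 homography (row-major nested lists)."""
    u, v = pt
    x = H[0][0] * u + H[0][1] * v + H[0][2]
    y = H[1][0] * u + H[1][1] * v + H[1][2]
    w = H[2][0] * u + H[2][1] * v + H[2][2]
    return (x / w, y / w)  # homogeneous normalization

# Hypothetical homography for a camera that translated parallel to the
# image plane: previous-frame pixels shift by (5, -3) in the new frame.
H = [[1.0, 0.0, 5.0],
     [0.0, 1.0, -3.0],
     [0.0, 0.0, 1.0]]
mapped = apply_homography(H, (10.0, 10.0))  # → (15.0, 7.0)
```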
In Chapter 12 the main contribution is the adaptation of the end-to-end target tracking and following algorithm presented in Chapter 11 to track targets using a target model defined on SE(2). We prove that the system model on SE(2) is locally weakly observable. We show how to seed the track initialization scheme to speed up the optimization algorithm used to find a state hypothesis. We derive a transformation using the homography to transform tracks on SE(2) from the previous tracking frame to the current tracking frame. Lastly, we compare the tracking performance of the target model on SE(2) to the LTI-CV target model in simulation and hardware. The experiment shows that the target model on SE(2) significantly outperforms the LTI-CV target model.
In Chapter 13 the main contribution is showing how G-MTT can be used to track targets on SE(3) that are observed by a monocular camera, using a deep neural network to detect the targets and provide the targets' relative poses in the camera frame to G-MTT. We compare the tracking performance of the constant-velocity target model on SE(3) to the tracking performance of the LTI constant-velocity, constant-acceleration, and constant-jerk target models and find that the target model on SE(3) outperforms all other target models.
14.1 Future Research
There are many things regarding G-MTT that deserve further research. In this section
we only discuss two of them.
14.1.1 Higher Order System Model
G-MTT assumes that the targets have constant velocity. A constant-velocity target model only captures the kinematics of the target, not the dynamics. The dynamics express how forces act on the target and capture motion constraints such as nonholonomic constraints. For example, the kinematics of a car indicate that the car can move in any direction; the dynamics of a car include the nonholonomic rolling constraint and indicate that a car can only move in the direction of its heading. The corresponding process noise would capture this behavior by having large process noise along the direction of the car's heading and small process noise along any direction perpendicular to its heading.
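Such a heading-aligned covariance can be sketched by rotating a diagonal covariance into the world frame. This is an illustrative construction with hypothetical standard deviations, not the dissertation's model:

```python
import math

def heading_aligned_noise(theta, sigma_along, sigma_perp):
    """Return the 2x2 covariance R(theta) diag(sa^2, sp^2) R(theta)^T:
    large variance along the heading theta, small variance across it."""
    c, s = math.cos(theta), math.sin(theta)
    sa2, sp2 = sigma_along ** 2, sigma_perp ** 2
    return [[c * c * sa2 + s * s * sp2, c * s * (sa2 - sp2)],
            [c * s * (sa2 - sp2), s * s * sa2 + c * c * sp2]]

# With heading 0 the car may surge forward or backward (variance 4.0)
# but barely slip sideways (variance 0.01).
Q = heading_aligned_noise(0.0, 2.0, 0.1)
```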
To include these constraints, the target's state would be augmented to include acceleration, and a new system model would have to be designed. Depending on the Lie group, the new system model may have to be continuous. A continuous system model would require a numerical integration scheme to propagate the tracks on the manifold, like the ones presented in [134]–[136]. Other parts of the algorithm would also need to be modified to accommodate this higher-order system model.
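The need for a manifold-aware integrator can be illustrated on SO(2): a Lie-Euler step stays on the group, while a naive Euler step drifts off it. This is a minimal sketch, not one of the cited Runge-Kutta-Munthe-Kaas methods:

```python
import math

def mat_mul(A, B):
    return [[sum(A[i][k] * B[k][j] for k in range(2)) for j in range(2)]
            for i in range(2)]

def exp_so2(phi):
    """Matrix exponential of the so(2) element hat(phi): a rotation by phi."""
    c, s = math.cos(phi), math.sin(phi)
    return [[c, -s], [s, c]]

def lie_euler(R, omega, h):
    """One Lie-Euler step R_{k+1} = R_k Exp(h * omega): stays on SO(2)."""
    return mat_mul(R, exp_so2(h * omega))

def naive_euler(R, omega, h):
    """Naive Euler step R + h * R * hat(omega): leaves the manifold."""
    Om = [[0.0, -omega], [omega, 0.0]]
    dR = mat_mul(R, Om)
    return [[R[i][j] + h * dR[i][j] for j in range(2)] for i in range(2)]

def det(R):
    return R[0][0] * R[1][1] - R[0][1] * R[1][0]

R_lie = [[1.0, 0.0], [0.0, 1.0]]
R_naive = [[1.0, 0.0], [0.0, 1.0]]
for _ in range(100):
    R_lie = lie_euler(R_lie, 1.0, 0.1)
    R_naive = naive_euler(R_naive, 1.0, 0.1)
# det(R_lie) stays at 1, while det(R_naive) grows to (1 + h^2 w^2)^100 ≈ 2.7
```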
14.1.2 Measurement Management
G-MTT assumes that all of the sensors are compatible in the sense that we can form some metric to compare measurements from different sensors. This comparison is what allows G-MTT to assign measurements to clusters. However, it is not always the case that sensors are compatible. For example, a common sensing modality is to use cameras and radars to observe the target, where a camera observes the target's line-of-sight vector from the camera to the target (providing a bearing) and a radar observes the relative depth. There is no direct way to compare the measurements from the two sensors, and so we cannot use the measurement management strategy presented in Chapter 5 to assign the measurements to clusters. A possible approach to solve this issue is to use a graph-based framework like the one presented in [137], which tries to find large sets of measurements consistent with the system model and uses these measurements to form a cluster. This way we do not compare the measurements directly but through the relation given by the system model.
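A much simpler stand-in for that idea links measurements whose pairwise relation is consistent under some model and groups them by connected components. The cited framework instead searches for the densest mutually-consistent subset, so the following is only an illustrative sketch with a toy consistency test:

```python
def cluster_by_consistency(measurements, consistent):
    """Group measurements into connected components of the graph whose
    edges join pairwise-consistent measurements.

    `consistent(a, b)` stands in for a system-model relation, e.g.
    whether a camera bearing and a radar depth could come from one
    target; here it is a hypothetical placeholder.
    """
    n = len(measurements)
    # adjacency list built from the pairwise consistency test
    adj = {i: [j for j in range(n) if j != i and
               consistent(measurements[i], measurements[j])]
           for i in range(n)}
    seen, clusters = set(), []
    for i in range(n):
        if i in seen:
            continue
        stack, comp = [i], []
        while stack:  # depth-first search over consistent neighbors
            u = stack.pop()
            if u in seen:
                continue
            seen.add(u)
            comp.append(u)
            stack.extend(adj[u])
        clusters.append(sorted(comp))
    return clusters

# Toy example: scalar "measurements" are consistent when within one unit.
clusters = cluster_by_consistency([0.0, 0.4, 5.0, 5.3],
                                  lambda a, b: abs(a - b) < 1.0)
# → [[0, 1], [2, 3]]
```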
REFERENCES
[1] A. Rathore, A. Sharma, N. Sharma, and V. Guttal, "Multi-Object Tracking in Heterogeneous Environments (MOTHe) for Animal Space-use Studies," 2020.
[2] T. Hong, W. Zhang, W. Chen, X. Chen, and X. Fu, "Multi-target tracking of birds in complex low-altitude airspace based on GM PHD filter," The Journal of Engineering, vol. 2019, no. 21, pp. 7672–7676, 2019.
[3] X. Lu and D. Li, "Research on Target Detection and Tracking System of Rescue Robot," 2017 Chinese Automation Congress (CAC), pp. 6623–6627, 2017.
[4] K. Yun, L. Nguyen, T. Nguyen, D. Kim, S. Eldin, A. Huyen, T. Lu, and E. Chow, "Small target detection for search and rescue operations using distributed deep learning and synthetic data generation," SPIE 10995, Pattern Recognition and Tracking XXX, vol. 10995, no. May 2019, p. 6, 2019. [Online]. Available: https://doi.org/10.1117/12.2520250
[5] H. Morgan, "Small-Target Detection and Observation with Vision-Enabled Fixed-Wing Unmanned Aircraft Systems," Ph.D. dissertation, Brigham Young University, 2021. [Online]. Available: https://scholarsarchive.byu.edu/etd/8998
[6] I. Hwang, H. Balakrishnan, K. Roy, and C. Tomlin, "Multiple-target tracking and identity management with application to aircraft tracking," Journal of Guidance, Control, and Dynamics, vol. 30, no. 3, pp. 641–653, 2007.
[7] W. S. Chen, J. Liu, and J. Li, "Classification of UAV and bird target in low-altitude airspace with surveillance radar data," The Aeronautical Journal, vol. 123, no. 1260, pp. 191–211, Feb. 2019. [Online]. Available: https://www.cambridge.org/core/product/identifier/S0001924018001586/type/journal article
[8] X. R. Li and V. P. Jilkov, "Survey of Maneuvering Target Tracking. Part I: Dynamic Models," IEEE Transactions on Aerospace and Electronic Systems, vol. 39, no. 4, pp. 1333–1364, 2003. [Online]. Available: http://ieeexplore.ieee.org/lpdocs/epic03/wrapper.htm?arnumber=1261132
[9] R. Schubert, C. Adam, M. Obst, N. Mattern, V. Leonhardt, and G. Wanielik, "Empirical Evaluation of Vehicular Models for Ego Motion Estimation," IEEE Intelligent Vehicles Symposium, Proceedings, pp. 534–539, 2011.
[10] F. Bullo, A. D. Lewis, and B. Goodwine, Geometric Control of Mechanical Systems. New York: Springer Science+Business Media, LLC, 2005, vol. 50, no. 12.
[11] J. G. Mangelson, M. Ghaffari, R. Vasudevan, and R. M. Eustice, "Characterizing the Uncertainty of Jointly Distributed Poses in the Lie Algebra," arXiv, pp. 1–19, 2019.
[12] T. D. Barfoot and P. T. Furgale, "Associating uncertainty with three-dimensional poses for use in estimation problems," IEEE Transactions on Robotics, vol. 30, no. 3, pp. 679–693, 2014.
[13] T. Barfoot, State Estimation for Robotics. Cambridge University Press, 2019.
[14] J. Sola, J. Deray, and D. Atchuthan, "A Micro Lie theory for State Estimation in Robotics," pp. 1–17, 2018. [Online]. Available: http://arxiv.org/abs/1812.01537
[15] F. C. Park, J. E. Bobrow, and S. R. Ploen, "A Lie Group Formulation of Robot Dynamics," The International Journal of Robotics Research, vol. 14, no. 6, pp. 609–618, 1995.
[16] T. Lee, M. Leok, and N. H. McClamroch, "Geometric Tracking Control of a Quadrotor UAV on SE(3)," 49th IEEE Conference on Decision and Control (CDC), 2010.
[17] X. R. Li and V. P. Jilkov, "Survey of Maneuvering Target Tracking. Part II: Motion Models of Ballistic and Space Targets," IEEE Transactions on Aerospace and Electronic Systems, vol. 46, no. 1, pp. 96–119, 2010.
[18] ——, "Survey of maneuvering target tracking: Part III. Measurement models," Signal and Data Processing of Small Targets 2001, vol. 4473, no. November 2001, p. 423, 2001.
[19] ——, "A Survey of Maneuvering Target Tracking. Part IV: Decision-Based Methods," Signal and Data Processing of Small Targets 2002, vol. 4728, no. April, pp. 511–534, 2002.
[20] ——, "Survey of maneuvering target tracking. Part V: Multiple-model methods," IEEE Transactions on Aerospace and Electronic Systems, vol. 41, no. 4, pp. 1255–1321, 2005.
[21] X.-R. Li and V. P. Jilkov, "A survey of maneuvering target tracking. Part VI: approximation techniques for nonlinear filtering," Signal and Data Processing of Small Targets 2004, vol. 5428, no. August 2004, pp. 537–550, 2004.
[22] K. Ingersoll, P. C. Niedfeldt, and R. W. Beard, "Multiple Target Tracking and Stationary Object Detection in Video with Recursive-RANSAC and Tracker-Sensor Feedback," 2015 International Conference on Unmanned Aircraft Systems, ICUAS 2015, pp. 1320–1329, 2015.
[23] K. Mehrotra and P. R. Mahapatra, "A Jerk Model for Tracking Highly Maneuvering Targets," IEEE Transactions on Aerospace and Electronic Systems, vol. 33, no. 4, pp. 1094–1105, 1997.
[24] D. J. Higham, "An Algorithmic Introduction to Numerical Simulation of Stochastic Differential Equations," SIAM Review, vol. 43, no. 3, pp. 525–546, 2001.
[25] M. P. do Carmo, Riemannian Geometry. Birkhauser Boston, 1992.
[26] M. D. Shuster, "A Survey of Attitude Representations," The Journal of the Astronautical Sciences, vol. 41, no. 4, pp. 439–517, 1993.
[27] A. W. Long, C. K. Wolfe, M. J. Mashner, and G. S. Chirikjian, "The banana distribution is Gaussian: A localization study with exponential coordinates," Robotics: Science and Systems, vol. 8, pp. 265–272, 2013.
[28] A. F. Genovese, "The interacting multiple model algorithm for accurate state estimation of maneuvering targets," Johns Hopkins APL Technical Digest (Applied Physics Laboratory), vol. 22, no. 4, pp. 614–623, 2001.
[29] M. R. Yap, "ScholarWorks @ UNO University of New Orleans Theses and Biomass Integrated Gasification Combined Cycles (BIGCC)," Ph.D. dissertation, University of New Orleans, 2004.
[30] R. R. Pitre, V. P. Jilkov, and X. R. Li, "A comparative study of multiple-model algorithms for maneuvering target tracking," Signal Processing, Sensor Fusion, and Target Recognition XIV, vol. 5809, p. 549, 2005.
[31] J. M. Lee, An Introduction to Smooth Manifolds. Springer Science+Business Media, LLC, 2013.
[32] G. De Moura Magalhaes, E. Dranka, Y. Caceres, J. B. Do Val, and R. S. Mendes, "EKF on Lie Groups For Radar Tracking Using Polar and Doppler Measurements," 2018 IEEE Radar Conference, RadarConf 2018, pp. 1573–1578, 2018.
[33] J. Kwon and F. C. Park, "Visual tracking via particle filtering on the affine group," International Journal of Robotics Research, vol. 29, no. 2-3, pp. 198–217, 2010.
[34] J. Kwon, M. Choi, F. C. Park, and C. Chun, "Particle filtering on the Euclidean group: Framework and applications," Robotica, vol. 25, no. 6, pp. 725–737, 2007.
[35] C. Choi and H. I. Christensen, "Robust 3D visual tracking using particle filtering on the SE(3) group," Proceedings - IEEE International Conference on Robotics and Automation, no. 3, pp. 4384–4390, 2011.
[36] G. Bourmaud, R. Megret, A. Giremus, and Y. Berthoumieu, "Discrete Extended Kalman Filter on Lie groups," European Signal Processing Conference, pp. 1–5, 2013.
[37] J. Cesic, I. Markovic, I. Cvisic, and I. Petrovic, "Radar and stereo vision fusion for multitarget tracking on the special Euclidean group," Robotics and Autonomous Systems, vol. 83, pp. 338–348, 2016. [Online]. Available: http://dx.doi.org/10.1016/j.robot.2016.05.001
[38] D. Musicki and R. Evans, "Joint Integrated Probabilistic Data Association - JIPDA," IEEE Transactions on Aerospace and Electronic Systems, vol. 40, no. 3, pp. 1093–1099, 2004.
[39] R. Kalman and R. Bucy, "New Results in Linear Filtering and Prediction Theory," Journal of Basic Engineering, pp. 95–108, 1961.
[40] S. Thrun, W. Burgard, and D. Fox, Probabilistic Robotics. Cambridge: Massachusetts Institute of Technology, 2006.
[41] G. Pulford, "Taxonomy of Multiple Target Tracking Methods," IEE Proceedings - Radar, Sonar and Navigation, vol. 152, no. 4, pp. 291–304, 2005. [Online]. Available: http://arxiv.org/abs/1409.7618
[42] Y. Bar-Shalom and T. E. Fortmann, Tracking and Data Association. Academic Press, Inc., 1988.
[43] N. Bhatia and Vandana, "Survey of Nearest Neighbor Techniques," International Journal of Computer Science and Information Security, vol. 8, no. 2, pp. 302–305, 2010. [Online]. Available: http://arxiv.org/abs/1007.0085
[44] P. Konstantinova, A. Udvarev, and T. Semerdjiev, "A study of a Target Tracking Method Using Global Nearest Neighbor Algorithm," International Conference on Computer Systems and Technologies, 2003.
[45] Y. Bar-Shalom and E. Tse, "Tracking in a cluttered environment with probabilistic data association," Automatica, vol. 11, no. 5, pp. 451–460, 1975.
[46] Y. Bar-Shalom, F. Daum, and J. Huang, "The Probabilistic Data Association Filter: Estimation in the Presence of Measurement Origin Uncertainty," IEEE Control Systems, vol. 29, no. 6, pp. 82–100, 2009.
[47] D. Musicki, R. Evans, and S. Stankovic, "Integrated Probabilistic Data Association," IEEE Transactions on Automatic Control, vol. 39, no. 6, pp. 1237–1241, 1994.
[48] S. S. Blackman, "Multiple hypothesis tracking for multiple target tracking," IEEE Aerospace and Electronic Systems Magazine, vol. 19, no. 1 II, pp. 5–18, 2004.
[49] T. Sathyan, T. J. Chin, S. Arulampalam, and D. Suter, "A multiple hypothesis tracker for multitarget tracking with multiple simultaneous measurements," IEEE Journal on Selected Topics in Signal Processing, vol. 7, no. 3, pp. 448–460, 2013.
[50] D. B. Reid, "An Algorithm for Tracking Multiple Targets," IEEE Transactions on Automatic Control, vol. 24, no. 6, 1979.
[51] T. Kurien, "Issues in the Design of Practical Multitarget Tracking Algorithms," Multitarget-Multisensor Tracking: Advanced Applications, pp. 43–83, 1990.
[52] R. L. Streit and T. E. Luginbuhl, "Probabilistic Multi-Hypothesis Tracking," Naval Undersea Warfare Center Division Newport, Tech. Rep., 1995.
[53] T. Luginbuhl, Y. Sun, and P. Willett, "A Track Management System for the PMHT Algorithm," 2010.
[54] A. P. Dempster, N. M. Laird, and D. B. Rubin, "Maximum Likelihood from Incomplete Data Via the EM Algorithm," Journal of the Royal Statistical Society: Series B (Methodological), vol. 39, no. 1, pp. 1–22, 1977.
[55] D. F. Crousey, M. Guerrieroy, P. Willetty, R. Streitz, and D. Dunham, "A look at the PMHT," 2009 12th International Conference on Information Fusion, FUSION 2009, vol. 4, no. 2, pp. 332–339, 2009.
[56] R. P. S. Mahler, "A Theoretical Foundation for the Stein-Winter "Probability Hypothesis Density (PHD)" Multitarget Tracking Approach," Proc. 2000 MSS Nat'l Symp. on Sensor and Data Fusion, vol. 1, 2000. [Online]. Available: http://www.dtic.mil/cgi-bin/GetTRDoc?AD=ADA400161
[57] Z. Prihoda, A. Jamgochian, B. Moore, and B. Lange, "Probability Hypothesis Density Filter Implementation and Application," 2019.
[58] R. Maher, "A survey of PHD filter and CPHD filter implementations," Signal Processing, Sensor Fusion, and Target Recognition XVI, vol. 6567, no. May 2007, p. 65670O, 2007.
[59] P. C. Niedfeldt, "Recursive-RANSAC: A Novel Algorithm for Tracking Multiple Targets in Clutter," All Theses and Dissertations, no. July, p. Paper 4195, 2014. [Online]. Available: http://scholarsarchive.byu.edu/etd/4195
[60] M. A. Fischler and R. C. Bolles, "Random Sample Consensus: A Paradigm for Model Fitting with Applications to Image Analysis and Automated Cartography," Graphics and Image Processing, vol. 24, no. 6, pp. 381–395, 1981.
[61] A. Vedaldi, H. Tin, P. Favaro, and S. Soatto, "KALMANSAC: Robust filtering by consensus," Proceedings of the IEEE International Conference on Computer Vision, vol. I, pp. 633–640, 2005.
[62] P. C. Niedfeldt and R. W. Beard, Recursive RANSAC: Multiple Signal Estimation with Outliers. IFAC, 2013, vol. 9, no. PART 1. [Online]. Available: http://dx.doi.org/10.3182/20130904-3-FR-2041.00213
[63] ——, "Multiple Target Tracking using Recursive RANSAC," Proceedings of the American Control Conference, pp. 3393–3398, 2014.
[64] ——, "Robust Estimation with Faulty Measurements using Recursive-RANSAC," Proceedings of the IEEE Conference on Decision and Control, vol. 2015-Febru, no. February, pp. 4160–4165, 2014.
[65] ——, "Convergence and Complexity Analysis of Recursive-RANSAC: A New Multiple Target Tracking Algorithm," IEEE Transactions on Automatic Control, vol. 61, no. 2, pp. 456–461, 2016.
[66] K. Ingersoll, "Vision Based Multiple Target Tracking Using Recursive RANSAC," Ph.D. dissertation, Brigham Young University, 2015.
[67] P. C. Defranco, R. W. Beard, K. F. Warnick, and T. W. Mclain, "Detecting and Tracking Moving Objects from a Small Unmanned Air Vehicle," Ph.D. dissertation, Brigham Young University, 2015.
[68] P. C. Lusk and R. W. Beard, "Visual Multiple Target Tracking from a Descending Aerial Platform," Proceedings of the American Control Conference, vol. 2018-June, pp. 5088–5093, 2018.
[69] P. C. Niedfeldt, K. Ingersoll, and R. W. Beard, "Comparison and Analysis of Recursive-RANSAC for Multiple Target Tracking," IEEE Transactions on Aerospace and Electronic Systems, vol. 53, no. 1, pp. 461–476, 2017.
[70] J. K. Wikle, "Integration of a Complete Detect and Avoid System for Small Unmanned Aircraft Systems," Ph.D. dissertation, Brigham Young University, 2017.
[71] J. Millard, "Multiple Target Tracking in Realistic Environments Using Recursive-RANSAC in a Data Fusion Framework," Ph.D. dissertation, Brigham Young University, 2017. [Online]. Available: http://hdl.lib.byu.edu/1877/etd9640
[72] F. Yang, W. Tang, and H. Lan, "A Density-Based Recursive RANSAC Algorithm for Unmanned Aerial Vehicle Multi-Target Tracking in Dense Clutter," IEEE International Conference on Control and Automation, ICCA, pp. 23–27, 2017.
[73] F. Yang, W. Tang, and Y. Liang, "A Novel Track Initialization Algorithm Based on Random Sample Consensus in Dense Clutter," International Journal of Advanced Robotic Systems, vol. 15, no. 6, pp. 1–11, 2018.
[74] Y. Bar-Shalom, P. Willett, and X. Tian, Tracking And Data Fusion: A Handbook of Algorithms. YBS Publishing, 2011.
[75] C. Hertzberg, R. Wagner, U. Frese, and L. Schroder, "Integrating generic sensor fusion algorithms with sound state representations through encapsulation of manifolds," Information Fusion, 2013.
[76] J. Stillwell, Naive Lie Theory. Springer Science+Business Media, LLC, 2008.
[77] B. C. Hall, Lie Groups, Lie Algebras, and Representations: An Elementary Introduction. Springer-Verlag New York, Inc, 2003.
[78] R. Abraham, J. Mardsen, and T. Ratiu, Manifolds, Tensor Analysis, and Applications, 1st ed. New York: Springer-Verlag, 1998.
[79] J. Sola, "Quaternion kinematics for the error-state Kalman filter," CoRR, vol. abs/1711.0, 2017. [Online]. Available: http://arxiv.org/abs/1711.02508
[80] K. Engø, "Partitioned Runge-Kutta Methods in Lie-group Setting," Reports in Informatics, vol. 43, no. 202, pp. 21–39, 2000. [Online]. Available: http://www.ii.uib.no/publikasjoner/texrap/ps/2000-202.ps
[81] Y. Wang and G. S. Chirikjian, "Error Propagation on the Euclidean Group with Applications to Manipulator Kinematics," IEEE Transactions on Robotics, vol. 22, no. 4, pp. 591–602, 2006.
[82] Y. Kim and A. Kim, "On the Uncertainty Propagation: Why Uncertainty on Lie Groups Preserves Monotonicity?" IEEE International Conference on Intelligent Robots and Systems, vol. 2017-Septe, pp. 3425–3432, 2017.
[83] G. Bourmaud, R. Megret, M. Arnaudon, and A. Giremus, "Continuous-Discrete Extended Kalman Filter on Matrix Lie Groups Using Concentrated Gaussian Distributions," Journal of Mathematical Imaging and Vision, vol. 51, no. 1, pp. 209–228, 2014.
[84] A. M. Sjøberg and O. Egeland, "An EKF for Lie Groups with Application to Crane Load Dynamics," Modeling, Identification and Control, vol. 40, no. 2, pp. 109–124, 2019.
[85] A. Wald, Sequential Analysis. New York: John Wiley & Sons Inc, 1947.
[86] X. R. Li, N. Li, and V. P. Jilkov, "SPRT-Based track confirmation and rejection," Proceedings of the 5th International Conference on Information Fusion, FUSION 2002, vol. 2, pp. 951–958, 2002.
[87] Y. Bar-Shalom, X.-R. Li, and T. Kirubarajan, Estimation with Applications to Tracking and Navigation. John Wiley & Sons, Inc, 2001, vol. 9.
[88] A. A. Gorji, R. Tharmarasa, and T. Kirubarajan, "Performance Measures for Multiple Target Tracking Problems," Fusion 2011 - 14th International Conference on Information Fusion, pp. 1560–1567, 2011.
[89] G. W. Pulford and R. J. Evans, "Probabilistic Data Association for Systems with Multiple Simultaneous Measurements," Automatica, vol. 32, no. 9, pp. 1311–1316, 1996.
[90] W. Saidani, Y. Morsly, and M. S. Djouadi, "Sequential Versus Parallel Architecture for Multiple Sensors Multiple Target Tracking," 2008 Conference on Human System Interaction, HSI 2008, pp. 903–908, 2008.
[91] L. Y. Pao and C. W. Frei, "Comparison of Parallel and Sequential Implementations of a Multisensor Multitarget Tracking Algorithm," Proceedings of the American Control Conference, vol. 3, pp. 1683–1687, 1995.
[92] C. W. Frei and L. Y. Pao, "Alternatives to Monte-Carlo Simulation Evaluations of Two Multisensor Fusion Algorithms," Automatica, vol. 34, no. 1, pp. 103–110, 1998.
[93] D. Willner, C.-B. Chang, and K.-P. Dunn, "Kalman Filter Configurations for Multiple Radar Systems," DTIC Document, no. April 1976, 1976. [Online]. Available: https://apps.dtic.mil/sti/citations/ADA026367
[94] E. H. Lee, D. Musicki, and T. L. Song, "Multi-sensor distributed fusion based on integrated probabilistic data association," FUSION 2014 - 17th International Conference on Information Fusion, 2014.
[95] W. Li, Z. Wang, G. Wei, L. Ma, J. Hu, and D. Ding, "A Survey on Multisensor Fusion and Consensus Filtering for Sensor Networks," Discrete Dynamics in Nature and Society, vol. 2015, 2015.
[96] M. E. Liggins, C. Y. Chong, I. Kadar, M. G. Alford, V. Vannicola, and S. Thomopoulos, "Distributed fusion architectures and algorithms for target tracking," Proceedings of the IEEE, vol. 85, no. 1, pp. 95–106, 1997.
[97] L. Y. Pao, "Centralized Multisensor Fusion Algorithms for Tracking Applications," Control Engineering Practice, vol. 2, no. 5, pp. 875–887, 1994.
[98] S. Agarwal, K. Mierle, and Others, "Ceres Solver." [Online]. Available: http://ceres-solver.org
[99] K. H. Kim, "Development of Track to Track Fusion Algorithms," Proceedings of the American Control Conference, vol. 1, pp. 1037–1041, 1994.
[100] Z. Deng, P. Zhang, W. Qi, J. Liu, and Y. Gao, "Sequential covariance intersection fusion Kalman filter," Information Sciences, vol. 189, pp. 293–309, 2012. [Online]. Available: http://dx.doi.org/10.1016/j.ins.2011.11.038
[101] X. Tian, Y. Bar-Shalom, and G. Chen, "A No-Loss Covariance Intersection Algorithm for Track-to-Track Fusion," Signal and Data Processing of Small Targets 2010, vol. 7698, no. April 2010, p. 76980S, 2010.
[102] J. Redmon, S. Divvala, R. Girshick, and A. Farhadi, "You only look once: Unified, real-time object detection," Proceedings of the IEEE Computer Society Conference on Computer Vision and Pattern Recognition, vol. 2016-Decem, pp. 779–788, 2016.
[103] R. Girshick, J. Donahue, T. Darrell, and J. Malik, "Rich feature hierarchies for accurate object detection and semantic segmentation," Proceedings of the IEEE Computer Society Conference on Computer Vision and Pattern Recognition, pp. 580–587, 2014.
[104] Z. Q. Zhao, P. Zheng, S. T. Xu, and X. Wu, "Object Detection with Deep Learning: A Review," IEEE Transactions on Neural Networks and Learning Systems, vol. 30, no. 11, pp. 3212–3232, 2019.
[105] S. Hutchinson, G. D. Hager, and P. I. Corke, "A tutorial on visual servo control," IEEE Transactions on Robotics and Automation, vol. 12, no. 5, pp. 651–670, 1996.
[106] D. Pebrianti, O. Y. Peh, R. Samad, M. Mustafa, N. R. Abdullah, and L. Bayuaji, "Intelligent control for visual servoing system," Indonesian Journal of Electrical Engineering and Computer Science, vol. 6, no. 1, pp. 72–79, 2017.
[107] P. I. Corke, "Spherical image-based visual servo and structure estimation," Proceedings - IEEE International Conference on Robotics and Automation, pp. 5550–5555, 2010.
[108] N. Liu and X. Shao, "Desired compensation RISE-based IBVS control of quadrotors for tracking a moving target," Nonlinear Dynamics, vol. 95, no. 4, pp. 2605–2624, 2019. [Online]. Available: https://doi.org/10.1007/s11071-018-4700-5
[109] H. Xie and A. Lynch, "Dynamic image-based visual servoing for unmanned aerial vehicles with bounded inputs," Canadian Conference on Electrical and Computer Engineering, vol. 2016-October, pp. 1–5, 2016.
[110] M. Petersen, C. Samuelson, and R. W. Beard, "Target Tracking and Following from a Multirotor UAV," Current Robotics Reports, 2021.
[111] J. Shi and C. Tomasi, "Good features to track," in Proceedings of IEEE Conference on Computer Vision and Pattern Recognition CVPR-94. IEEE, 1994, pp. 593–600.
[112] B. D. Lucas and T. Kanade, "An iterative image registration technique with an application to stereo vision," in Proceedings of the Imaging Understanding Workshop, 1981, pp. 121–130.
[113] C. Tomasi and T. Kanade, "Detection and Tracking of Point Features," Carnegie Mellon University Technical Report CMU-CS-91-132, 1991.
[114] G. Bradski, "The OpenCV Library," Dr. Dobb's Journal of Software Tools, 2000.
[115] M. K. Kaiser, N. R. Gans, and W. E. Dixon, "Vision-Based Estimation for Guidance, Navigation, and Control of an Aerial Vehicle," IEEE Transactions on Aerospace and Electronic Systems, vol. 46, no. 3, pp. 1064–1077, Jul. 2010.
[116] S. Choi, T. Kim, and W. Yu, "Performance Evaluation of RANSAC Family," British Machine Vision Conference, BMVC 2009 - Proceedings, no. January, 2009.
[117] Y. Ma, S. Soatto, J. Kosecka, and S. S. Shankar, An Invitation to 3-D Vision: From Images to Geometric Models. Springer, 2010.
[118] D. Nister, "An Efficient Solution to the Five-Point Relative Pose Problem," IEEE Transactions on Pattern Analysis and Machine Intelligence, vol. 26, no. 6, pp. 756–770, 2004.
[119] Z. Zou, Z. Shi, Y. Guo, and J. Ye, "Object Detection in 20 Years: A Survey," pp. 1–39, 2019. [Online]. Available: http://arxiv.org/abs/1905.05055
[120] L. Jiao, F. Zhang, F. Liu, S. Yang, L. Li, Z. Feng, and R. Qu, "A survey of deep learning-based object detection," IEEE Access, vol. 7, pp. 128 837–128 868, 2019.
[121] L. Liu, W. Ouyang, X. Wang, P. Fieguth, J. Chen, X. Liu, and M. Pietikainen, "Deep Learning for Generic Object Detection: A Survey," International Journal of Computer Vision, vol. 128, no. 2, pp. 261–318, 2020. [Online]. Available: https://doi.org/10.1007/s11263-019-01247-4
[122] E. Teng, R. Huang, and B. Iannucci, "ClickBAIT-v2: Training an Object Detector in Real-Time," 2018. [Online]. Available: http://arxiv.org/abs/1803.10358
[123] R. Hartley and A. Zisserman, Multiple View Geometry in Computer Vision. Cambridge University Press, 2003.
[124] T. Lee, M. Leok, and N. H. McClamroch, "Geometric tracking control of a quadrotor UAV on SE(3)," Proceedings of the IEEE Conference on Decision and Control, pp. 5420–5425, December 2010.
[125] N. Farmani, L. Sun, and D. Pack, "Tracking multiple mobile targets using cooperative Unmanned Aerial Vehicles," 2015 International Conference on Unmanned Aircraft Systems, ICUAS 2015, pp. 395–400, 2015.
[126] L. Meier, D. Honegger, and M. Pollefeys, "PX4: A node-based multithreaded open source robotics framework for deeply embedded platforms," in 2015 IEEE International Conference on Robotics and Automation (ICRA), 2015, pp. 6235–6240.
[127] R. Hermann and A. J. Krener, "Nonlinear Controllability and Observability," IEEE Transactions on Automatic Control, vol. 22, no. 5, pp. 728–740, 1977.
[128] N. D. Powel and K. A. Morgansen, "Empirical Observability Gramian for Stochastic Observability of Nonlinear Systems," arXiv, pp. 6342–6348, 2020.
[129] A. J. Krener and I. Kayo, "Measure of Unobservability," in Joint 48th IEEE Conference on Decision and Control and 28th Chinese Control Conference, 2009.
[130] Q. Meng, B. Hou, D. Li, Z. He, and J. Wang, "Performance Analysis and Comparison for High Maneuver Target Track Based on Different Jerk Models," Journal of Control Science and Engineering, vol. 2018, 2018.
[131] J. Cesic, I. Markovic, and I. Petrovic, "Moving object tracking employing rigid body motion on matrix Lie groups," 19th International Conference on Information Fusion (FUSION), Special Session on Directional Estimation, pp. 2109–2115, 2017.
[132] W. Liu, D. Anguelov, D. Erhan, C. Szegedy, S. Reed, C. Y. Fu, and A. C. Berg, "SSD: Single Shot Multibox Detector," Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics), vol. 9905 LNCS, pp. 21–37, 2016.
[133] G. Wang, F. Manhardt, F. Tombari, and X. Ji, "GDR-Net: Geometry-Guided Direct Regression Network for Monocular 6D Object Pose Estimation," 2021. [Online]. Available: http://arxiv.org/abs/2102.12145
[134] H. Munthe-Kaas, "Lie-Butcher theory for Runge-Kutta methods," BIT Numerical Mathematics, vol. 35, no. 4, pp. 572–587, 1995.
[135] ——, "Runge-Kutta methods on Lie groups," BIT Numerical Mathematics, vol. 38, no. 1, pp. 92–111, 1998.
[136] ——, "High order Runge-Kutta methods on manifolds," Applied Numerical Mathematics, vol. 29, no. 1, pp. 115–127, 1999.
[137] P. C. Lusk, K. Fathian, and J. P. How, "Clipper: A graph-theoretic framework for robust data association," 2021.
APPENDIX A. PROOFS FOR THE LG-IPDAF
A.1 Proof of Lemma 6.2.2: Prediction Step
Proof. According to the theorem of total probability, the probability of the track's current state conditioned on the previous measurements and on the track representing the target is
\[
p\left(x_k \mid \epsilon_{k^-}, Z_{0:k^-}\right) = \int_{\tilde{x}_{k^-|k^-}} p\left(x_k \mid x_{k^-}, \epsilon_{k^-}, Z_{0:k^-}\right) p\left(x_{k^-} \mid \epsilon_{k^-}, Z_{0:k^-}\right) d\tilde{x}_{k^-|k^-},
\tag{A.1}
\]
where we integrate over the error state instead of the state by recalling that $\tilde{x}_{k^-|k^-} = \operatorname{Log}_I^{G_x}\left(\hat{x}_{k^-|k^-}^{-1} x_{k^-}\right)$, and using the relation in equation (3.14a).
Substituting the definitions of $p\left(x_k \mid x_{k^-}, \epsilon_{k^-}, Z_{0:k^-}\right)$ and $p\left(x_{k^-} \mid \epsilon_{k^-}, Z_{0:k^-}\right)$ stated in equations (6.8) and (6.3) into equation (A.1) and combining the exponents yields
\[
p\left(x_k \mid \epsilon_{k^-}, Z_{0:k^-}\right) \approx \eta \int_{\tilde{x}_{k^-|k^-}} \exp\left(-L\right) d\tilde{x}_{k^-|k^-}, \tag{A.2}
\]
where
\[
L = \frac{1}{2}\left(\tilde{x}_{k|k^-} - F_{k^-:k}\tilde{x}_{k^-|k^-}\right)^\top Q_{k^-:k}^{-1}\left(\tilde{x}_{k|k^-} - F_{k^-:k}\tilde{x}_{k^-|k^-}\right) + \frac{1}{2}\tilde{x}_{k^-|k^-}^\top P_{k^-|k^-}^{-1}\tilde{x}_{k^-|k^-},
\]
and $Q_{k^-:k}$ is defined in equation (6.6).
Taking the first and second derivatives of $L$ with respect to $\tilde{x}_{k^-|k^-}$ yields
\begin{align}
\frac{\partial L}{\partial \tilde{x}_{k^-|k^-}} &= -\left(\tilde{x}_{k|k^-} - F_{k^-:k}\tilde{x}_{k^-|k^-}\right)^\top Q_{k^-:k}^{-1} F_{k^-:k} + \tilde{x}_{k^-|k^-}^\top P_{k^-|k^-}^{-1} \tag{A.3a} \\
\frac{\partial^2 L}{\partial \tilde{x}_{k^-|k^-}^2} &= \Psi^{-1} \triangleq F_{k^-:k}^\top Q_{k^-:k}^{-1} F_{k^-:k} + P_{k^-|k^-}^{-1}. \tag{A.3b}
\end{align}
Setting the first derivative to zero and solving for $\tilde{x}_{k^-|k^-}$ yields
\[
\tilde{x}_{k^-|k^-} = \Psi F_{k^-:k}^\top Q_{k^-:k}^{-1} \tilde{x}_{k|k^-}.
\]
Using the first and second partial derivatives of $L$ in equation (A.3) we construct the term
\[
L_1\left(\tilde{x}_{k|k^-}, \tilde{x}_{k^-|k^-}\right) = \frac{1}{2}\left(\tilde{x}_{k^-|k^-} - \Psi F_{k^-:k}^\top Q_{k^-:k}^{-1}\tilde{x}_{k|k^-}\right)^\top \Psi^{-1} \left(\tilde{x}_{k^-|k^-} - \Psi F_{k^-:k}^\top Q_{k^-:k}^{-1}\tilde{x}_{k|k^-}\right),
\]
which is quadratic in $\tilde{x}_{k^-|k^-}$. Subtracting $L_1\left(\tilde{x}_{k|k^-}, \tilde{x}_{k^-|k^-}\right)$ from $L$ gives the term
\[
L_2\left(\tilde{x}_{k|k^-}\right) = \frac{1}{2}\,\tilde{x}_{k|k^-}^\top\left(Q_{k^-:k}^{-1} - Q_{k^-:k}^{-1} F_{k^-:k} \Psi F_{k^-:k}^\top Q_{k^-:k}^{-1}\right)\tilde{x}_{k|k^-}, \tag{A.4}
\]
which is quadratic in $\tilde{x}_{k|k^-}$ and not dependent on $\tilde{x}_{k^-|k^-}$. Applying the matrix inversion lemma presented in [40] to equation (A.4) and letting $P_{k|k^-} = F_{k^-:k} P_{k^-|k^-} F_{k^-:k}^\top + Q_{k^-:k}$ gives
\[
L_2\left(\tilde{x}_{k|k^-}\right) = \frac{1}{2}\,\tilde{x}_{k|k^-}^\top P_{k|k^-}^{-1} \tilde{x}_{k|k^-}.
\]
Substituting $L_1\left(\tilde{x}_{k|k^-}, \tilde{x}_{k^-|k^-}\right) + L_2\left(\tilde{x}_{k|k^-}\right)$ for $L$ into equation (A.2) and pulling $L_2\left(\tilde{x}_{k|k^-}\right)$ out of the integral, since it is not dependent on $\tilde{x}_{k^-|k^-}$, yields
\begin{align}
p\left(x_k \mid \epsilon_{k^-}, Z_{0:k^-}\right) &\approx \eta \exp\left(-L_2\left(\tilde{x}_{k|k^-}\right)\right) \int_{\tilde{x}_{k^-|k^-}} \exp\left(-L_1\left(\tilde{x}_{k|k^-}, \tilde{x}_{k^-|k^-}\right)\right) d\tilde{x}_{k^-|k^-} \\
&= \eta \exp\left(-L_2\left(\tilde{x}_{k|k^-}\right)\right),
\end{align}
where the term in the integral integrates to a constant that is absorbed by the normalizing coefficient $\eta$ as discussed in [40].
Recalling from Lemma 6.2.1 that $\hat{x}_{k|k^-} = f\left(\hat{x}_{k^-|k^-}, 0, t_{k^-:k}\right)$ and noting that $\tilde{x}_{k|k^-} = \operatorname{Log}^{G_x}_I\left(\hat{x}_{k|k^-}^{-1} x_k\right)$, we get that $p\left(x_k \mid \epsilon_{k^-}, Z_{0:k^-}\right) \approx \mathcal{N}\left(\hat{x}_{k|k^-}, P_{k|k^-}\right)$.
A.2 Proof of Lemma 6.4.1: Association Events
Proof. The probability of an association event conditioned on the measurements $Z_{0:k}$ and on the track representing the target is inferred from the locations of the validated measurements with respect to the track's estimated measurement $\hat{z}_k = h\left(\hat{x}_{k|k^-}, 0\right)$ and from the number of validated measurements. The basic idea is that the closer a validated measurement is to the estimated measurement, relative to how close the other validated measurements are, the more likely that validated measurement is the target-originated measurement, and therefore the more likely its respective association event is.
The probability of an association event conditioned on the measurements and the track representing the target is
\[
p\left(\theta_{k,j} \mid \epsilon_k, Z_{0:k}\right) = p\left(\theta_{k,j} \mid m_k, Z_k, \epsilon_k, Z_{0:k^-}\right), \tag{A.5}
\]
where we have separated $Z_k$ and $Z_{0:k^-}$ from $Z_{0:k}$ and explicitly written the inference on the number of validated measurements $m_k$. Using Bayes' rule, the probability in equation (A.5) is
\[
p\left(\theta_{k,j} \mid \epsilon_k, Z_{0:k}\right) = \frac{p\left(Z_k \mid \theta_{k,j}, m_k, \epsilon_k, Z_{0:k^-}\right) p\left(\theta_{k,j} \mid m_k, \epsilon_k, Z_{0:k^-}\right)}{p\left(Z_k \mid m_k, \epsilon_k, Z_{0:k^-}\right)}, \tag{A.6}
\]
where
\[
p\left(Z_k \mid m_k, \epsilon_k, Z_{0:k^-}\right) = \sum_{j=0}^{m_k} p\left(Z_k \mid \theta_{k,j}, m_k, \epsilon_k, Z_{0:k^-}\right) p\left(\theta_{k,j} \mid m_k, \epsilon_k, Z_{0:k^-}\right). \tag{A.7}
\]
Since the validated measurements are independent, the joint density of the validated measurements is
\[
p\left(Z_k \mid \theta_{k,j}, m_k, \epsilon_k, Z_{0:k^-}\right) = \prod_{\ell=1}^{m_k} p\left(z_{k,\ell} \mid \theta_{k,j}, \epsilon_k, Z_{0:k^-}\right). \tag{A.8}
\]
Since the false measurements are assumed uniformly distributed in the validation region with volume $\mathcal{V}_k$, the probability of a false measurement in the validation region is $\mathcal{V}_k^{-1}$. Also, since the target-originated measurement is validated with probability $P_G$, the density of a validated target-originated measurement is $P_G^{-1} p\left(z_{k,\ell} \mid \psi, \epsilon_k, Z_{0:k^-}\right)$, which is defined in equations (6.16), (6.20), and (6.21). Therefore, the probability of each measurement conditioned on the respective association event is
\[
p\left(z_{k,\ell} \mid \theta_{k,j}, \epsilon_k, Z_{0:k^-}\right) = \begin{cases} \mathcal{V}_k^{-1} & \text{if } j \neq \ell \\ P_G^{-1} p\left(z_{k,\ell} \mid \psi, \epsilon_k, Z_{0:k^-}\right) & \text{if } j = \ell. \end{cases} \tag{A.9}
\]
Let $m_k^- \triangleq m_k - 1$ and $\mathcal{M} \triangleq \{1, \dots, m_k\}$. Substituting equation (A.9) into equation (A.8) yields the joint probability of the measurements
\[
p\left(Z_k \mid \theta_{k,j}, m_k, \epsilon_k, Z_{0:k^-}\right) = \begin{cases} \mathcal{V}_k^{-m_k^-} P_G^{-1} p\left(z_{k,j} \mid \psi, \epsilon_k, Z_{0:k^-}\right) & \text{if } j \in \mathcal{M} \\ \mathcal{V}_k^{-m_k} & \text{if } j = 0. \end{cases} \tag{A.10}
\]
We now proceed to calculate $p\left(\theta_{k,j} \mid m_k, \epsilon_k, Z_{0:k^-}\right)$. Let $\phi$ denote the number of false measurements. Under the assumption that there is at most one target-originated measurement, there are two possibilities for the number of false measurements: $\phi = m_k$, denoted $\phi_{m_k}$, or $\phi = m_k^-$, denoted $\phi_{m_k^-}$. Therefore, the a priori probability of an association event conditioned on the number of measurements and the previous measurements is
\begin{align}
p\left(\theta_{k,j} \mid m_k, \epsilon_k, Z_{0:k^-}\right) &= p\left(\theta_{k,j} \mid \phi_{m_k^-}, m_k, \epsilon_k, Z_{0:k^-}\right) p\left(\phi_{m_k^-} \mid m_k, \epsilon_k, Z_{0:k^-}\right) \nonumber \\
&\quad + p\left(\theta_{k,j} \mid \phi_{m_k}, m_k, \epsilon_k, Z_{0:k^-}\right) p\left(\phi_{m_k} \mid m_k, \epsilon_k, Z_{0:k^-}\right) \nonumber \\
&= \begin{cases} \frac{1}{m_k}\, p\left(\phi_{m_k^-} \mid m_k, \epsilon_k, Z_{0:k^-}\right) & j = 1, \dots, m_k \\ p\left(\phi_{m_k} \mid m_k, \epsilon_k, Z_{0:k^-}\right) & j = 0, \end{cases} \tag{A.11}
\end{align}
where $p\left(\theta_{k,j} \mid \phi_{m_k^-}, m_k, \epsilon_k, Z_{0:k^-}\right) = \frac{1}{m_k}$ when $j > 0$ since each association event $\theta_{k,j>0}$ is just as likely to be true under the specified conditions, and $p\left(\theta_{k,j} \mid \phi_{m_k^-}, m_k, \epsilon_k, Z_{0:k^-}\right) = 0$ when $j = 0$, since in that case one validated measurement originated from the target and thus not all of the measurements are false.
Using Bayes’ formula, the conditional probabilities of the number of false measure-
ments are
p(
ϕm−k| mk, ϵk, Z0:k−
)
=p(
mk | ϕm−k, ϵk, Z0:k−
)
p(
ϕm−k| ϵk, Z0:k−
)
p (mk | Z0:k−)
=PGPDµF
(m−k
)
p (mk | ϵk, Z0:k−)(A.12)
and
p (ϕmk| mk, ϵk, Z0:k−) =
p (mk | ϕmk, ϵk, Z0:k−) p (ϕmk
| Z0:k−)
p (mk | ϵk, Z0:k−)
=(1− PGPD)µF (mk)
p (mk | ϵk, Z0:k−). (A.13)
where µF is the probability density function of the number of false measurements, and
p(
mk | ϕm−k, ϵk, Z0:k−
)
= PGPD since p(
mk | ϕm−k, ϵk, Z0:k−
)
is the probability that the tar-
get is detected and the target originated measurement is inside the validation region.
According to the theorem of total probability
p (mk | ϵk, Z0:k−) = p(
mk | ϕm−k, Z0:k−
)
p(
ϕm−k| ϵk, Z0:k−
)
+ p (mk | ϕmk, ϵk, Z0:k−) p (ϕmk
| ϵk, Z0:k−)
= PGPDµF(m−k
)+ (1− PGPD)µF (mk) . (A.14)
Substituting equations (A.12), (A.13), and (A.14) into equation (A.11) yields
\[
p\left(\theta_{k,j} \mid m_k, \epsilon_k, Z_{0:k^-}\right) = \begin{cases} \dfrac{1}{m_k} \dfrac{P_D P_G\, \mu_F\left(m_k^-\right)}{P_G P_D\, \mu_F\left(m_k^-\right) + \left(1 - P_G P_D\right) \mu_F\left(m_k\right)} & j \in \mathcal{M} \\[2ex] \dfrac{\left(1 - P_G P_D\right) \mu_F\left(m_k\right)}{P_G P_D\, \mu_F\left(m_k^-\right) + \left(1 - P_G P_D\right) \mu_F\left(m_k\right)} & j = 0, \end{cases} \tag{A.15}
\]
where $\mathcal{M} = \{1, 2, \dots, m_k\}$ is the indexing set of track-associated measurements.
Substituting equations (A.10) and (A.15) into equation (A.7) yields
\[
p\left(Z_k \mid m_k, \epsilon_k, Z_{0:k^-}\right) = \frac{\mathcal{V}_k^{-m_k^-}\, \dfrac{P_D\, \mu_F\left(m_k^-\right)}{m_k} \displaystyle\sum_{\ell=1}^{m_k} p\left(z_{k,\ell} \mid \psi, \epsilon_k, Z_{0:k^-}\right) + \mathcal{V}_k^{-m_k} \left(1 - P_D P_G\right) \mu_F\left(m_k\right)}{P_D P_G\, \mu_F\left(m_k^-\right) + \left(1 - P_D P_G\right) \mu_F\left(m_k\right)}. \tag{A.16}
\]
Substituting equations (A.10), (A.15), and (A.16) into equation (A.6) yields
\[
p\left(\theta_{k,j} \mid \epsilon_k, Z_{0:k}\right) = \begin{cases} \dfrac{P_D\, p\left(z_{k,j} \mid \psi, \epsilon_k, Z_{0:k^-}\right)}{P_D \displaystyle\sum_{\ell=1}^{m_k} p\left(z_{k,\ell} \mid \psi, \epsilon_k, Z_{0:k^-}\right) + m_k \mathcal{V}_k^{-1} \left(1 - P_D P_G\right) \dfrac{\mu_F\left(m_k\right)}{\mu_F\left(m_k^-\right)}} & j \in \mathcal{M} \\[3ex] \dfrac{m_k \mathcal{V}_k^{-1} \left(1 - P_D P_G\right) \dfrac{\mu_F\left(m_k\right)}{\mu_F\left(m_k^-\right)}}{P_D \displaystyle\sum_{\ell=1}^{m_k} p\left(z_{k,\ell} \mid \psi, \epsilon_k, Z_{0:k^-}\right) + m_k \mathcal{V}_k^{-1} \left(1 - P_D P_G\right) \dfrac{\mu_F\left(m_k\right)}{\mu_F\left(m_k^-\right)}} & j = 0. \end{cases} \tag{A.17}
\]
Setting the probability density function of the number of false measurements $\mu_F$ in equation (A.17) to the Poisson density function defined in equation (6.2) yields equation (6.22) in Lemma 6.4.1.
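As a numeric sanity check of equation (A.17), the following sketch computes the association-event probabilities assuming the Poisson clutter density of equation (6.2) with mean $\lambda \mathcal{V}_k$, so that $\mu_F(m_k)/\mu_F(m_k^-) = \lambda \mathcal{V}_k / m_k$. The function name and arguments are illustrative, not part of the LG-IPDAF implementation:

```python
import numpy as np

def association_probs(likelihoods, V, lam, PD, PG):
    """Association-event probabilities of eq. (A.17) with Poisson clutter.

    likelihoods -- p(z_{k,j} | psi, ...) for the m_k validated measurements
    V, lam      -- validation-region volume and clutter density; mu_F is
                   Poisson with mean lam*V so mu_F(m_k)/mu_F(m_k-1) = lam*V/m_k
    """
    L = np.asarray(likelihoods, dtype=float)
    m = L.size
    num_j = PD * L                                       # numerators, j in M
    num_0 = m * (1.0 / V) * (1 - PD * PG) * (lam * V / m)  # j = 0 numerator
    den = PD * L.sum() + num_0                           # common denominator
    return np.concatenate(([num_0 / den], num_j / den))  # beta_0, beta_1..m

beta = association_probs([0.8, 0.3, 0.05], V=2.0, lam=0.5, PD=0.9, PG=0.99)
assert np.isclose(beta.sum(), 1.0) and np.all(beta >= 0)
```

Note that with the Poisson ratio substituted, the $j = 0$ numerator reduces to $\lambda (1 - P_D P_G)$, matching the classic PDAF weights.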
A.3 Proof of Lemma 6.4.2: Split Track Update
Proof. Using Bayes' rule,
\[
p\left(x_{k,j} \mid \theta_{k,j}, \epsilon_k, Z_{0:k}\right) = \frac{p\left(Z_k \mid \theta_{k,j}, x_k, \epsilon_k, Z_{0:k^-}\right) p\left(x_k \mid \epsilon_k, Z_{0:k^-}\right)}{p\left(Z_k \mid \theta_{k,j}, \epsilon_k, Z_{0:k^-}\right)}. \tag{A.18}
\]
We solve for the probability $p\left(x_{k,j} \mid \theta_{k,j}, \epsilon_k, Z_{0:k}\right)$ by using maximum a posteriori (MAP) optimization to find the value of $\tilde{x}_{k|k^-}$ and its corresponding error covariance $P_{k|k^-}$ that maximizes the right-hand side of equation (A.18).

Since the MAP objective does not depend on $p\left(Z_k \mid \theta_{k,j}, \epsilon_k, Z_{0:k^-}\right)$, that term can be absorbed into the normalizing coefficient, simplifying the problem to
\[
\max_{\tilde{x}_{k|k^-},\, P_{k|k^-}} \eta\, p\left(Z_k \mid \theta_{k,j}, x_k, \epsilon_k, Z_{0:k^-}\right) p\left(x_k \mid \epsilon_k, Z_{0:k^-}\right).
\]
In the case $\theta_{k,j=0}$, none of the validated measurements are target originated; thus, $p\left(Z_k \mid \theta_{k,j=0}, x_k, \epsilon_k, Z_{0:k^-}\right)$ simplifies to $p\left(Z_k \mid \theta_{k,j=0}, Z_{0:k^-}\right)$ and no longer has any dependency on the track's state. This reduces the MAP optimization problem to
\[
\max_{\tilde{x}_{k|k^-},\, P_{k|k^-}} \eta\, p\left(x_k \mid \epsilon_k, Z_{0:k^-}\right);
\]
hence, the solution is $p\left(x_{k,j=0} \mid \theta_{k,j=0}, \epsilon_k, Z_{0:k}\right) = p\left(x_k \mid \epsilon_k, Z_{0:k^-}\right)$.

In the case $\theta_{k,j>0}$, only the measurement $z_{k,j}$ has any dependency and influence on the track's state, since all other measurements are false. Using this fact with the relation in equation (A.9), the optimization problem simplifies to
\[
\max_{\tilde{x}_{k|k^-},\, P_{k|k^-}} \eta\, P_G^{-1}\, p\left(z_{k,j} \mid \psi, x_k, \epsilon_k, Z_{0:k^-}\right) p\left(x_k \mid \epsilon_k, Z_{0:k^-}\right).
\]
Multiplying the probabilities $p\left(z_{k,j} \mid \psi, x_k, \epsilon_k, Z_{0:k^-}\right)$ and $p\left(x_k \mid \epsilon_k, Z_{0:k^-}\right)$ together, combining their exponents, and dropping the term $P_G^{-1}$ since it has no impact on the optimization, simplifies the MAP optimization problem to
\[
\max_{\tilde{x}_{k|k^-},\, P_{k|k^-}} \eta \exp\left(-L\right),
\]
where
\begin{align}
L &= \frac{1}{2}\left(\nu_{k,j} - H_k \tilde{x}_{k|k^-}\right)^\top R_k^{-1} \left(\nu_{k,j} - H_k \tilde{x}_{k|k^-}\right) + \frac{1}{2} \tilde{x}_{k|k^-}^\top P_{k|k^-}^{-1} \tilde{x}_{k|k^-} \\
\nu_{k,j} &= \operatorname{Log}^{G_s}_I\left(\hat{z}_k^{-1} z_{k,j}\right) \\
\hat{z}_k &= h\left(\hat{x}_{k|k^-}, 0\right) \\
R_k &= V_k R V_k^\top,
\end{align}
$\tilde{x}_{k|k^-} = \operatorname{Log}^{G_x}_I\left(\hat{x}_{k|k^-}^{-1} x_k\right)$, $h$ is the observation function defined in equation (4.1), and $H_k$ and $V_k$ are the Jacobians of the observation function defined in equation (4.5) and evaluated at $\zeta_{h_k} = \left(\hat{x}_{k|k^-}, 0\right)$.
The MAP problem is solved by finding the value of $\tilde{x}_{k|k^-}$ that minimizes $L$. Since $L$ is quadratic in $\tilde{x}_{k|k^-}$, the minimizing value is found by taking the first derivative of $L$ with respect to $\tilde{x}_{k|k^-}$, setting it to zero, and solving for $\tilde{x}_{k|k^-}$. This value becomes the new error state mean $\mu^-_{k|k,j}$. The corresponding covariance is found by taking the second derivative of $L$ with respect to $\tilde{x}_{k|k^-}$ and setting this value to the new covariance $P^{c-}_{k|k}$.

Taking the first and second partial derivatives of $L$ with respect to $\tilde{x}_{k|k^-}$ yields
\begin{align}
\frac{\partial L}{\partial \tilde{x}_{k|k^-}} &= -\left(\nu_{k,j} - H_k \tilde{x}_{k|k^-}\right)^\top R_k^{-1} H_k + \tilde{x}_{k|k^-}^\top P_{k|k^-}^{-1} \\
\frac{\partial^2 L}{\partial \tilde{x}_{k|k^-}^2} &= H_k^\top R_k^{-1} H_k + P_{k|k^-}^{-1} = \left(P^{c-}_{k|k}\right)^{-1}.
\end{align}
Setting the first derivative to zero, solving for $\tilde{x}_{k|k^-}$, and setting this value to $\mu^-_{k|k,j}$ gives
\[
\mu^-_{k|k,j} = \left(H_k^\top R_k^{-1} H_k + P_{k|k^-}^{-1}\right)^{-1} H_k^\top R_k^{-1} \nu_{k,j}.
\]
With algebraic manipulation, the updated error covariance and error state mean are
\begin{align}
P^{c-}_{k|k} &= \left(I - K_k H_k\right) P_{k|k^-} \\
\mu^-_{k|k,j} &= K_k \nu_{k,j},
\end{align}
where the Kalman gain $K_k$ and innovation term $\nu_{k,j}$ are
\begin{align}
K_k &= P_{k|k^-} H_k^\top S_k^{-1} \\
\nu_{k,j} &= \operatorname{Log}^{G_s}_I\left(\hat{z}_k^{-1} z_{k,j}\right) \\
\hat{z}_k &= h\left(\hat{x}_{k|k^-}, 0\right) \\
S_k &= V_k R V_k^\top + H_k P_{k|k^-} H_k^\top.
\end{align}
Since the error state’s mean is no longer zero, the error state no longer has a concen-
trated Gaussian distribution. In order to reset the mean of the error state to zero, we add
µ−k|k,j onto the state estimate xk|k− and adjust the covariance of the error state. In partic-
ular, let the error state after update but before being reset be x−k|k,j = µ−k|k,j + ak|k where
ak|k ∼ N(
0, P c−
k|k
)
, then under the assumption that ak|k is small and using the property of
the right Jacobian defined in equation (3.7) we add µ−k|k,j onto xk|k− as follows:
xk|k = xk|k−ExpGxI
(
µ−k|k,j + ak|k
)
(A.19a)
= xk|k−ExpGxI
(
µ−k|k,j
)
︸ ︷︷ ︸
xk|k,j
ExpGxI
JGxr
(
µ−k|k,j
)
ak|k︸ ︷︷ ︸
xk|k,j
, (A.19b)
where
xk|k,j = xk|k−ExpGxI
(
µ−k|k,j
)
, (A.20)
is the updated state estimate, and xk|k,j = JGxr
(
µ−k|k,j
)
ak|k is the updated error state after
reset. Equation (A.20) can be thought of as moving from xk|k− to xk|k,j along the geodesic
defined by the tangent vector µ−k|k,j as depicted in Fig. A.1.
The error covariance of the error state xk|k,j is
cov(xk|k,j
)= cov
(
JGxr
(
µ−k|k,j
)
ak|k
)
= JGxr
(
µ−k|k,j
)
cov(ak|k)JGxr
(
µ−k|k,j
)⊤
= JGxr
(
µ−k|k,j
)
P c−
k|kJGxr
(
µ−k|k,j
)⊤
= P ck|k;
therefore, xk|k,j ∼ N(
µk|k,j = 0, P ck|k,j
)
.
Figure A.1: A depiction of the state estimate update conditioned on $\theta_{k,j}$ by using $\mu^-_{k|k,j}$ to form a geodesic from $\hat{x}_{k|k^-}$ to $\hat{x}_{k|k,j}$.
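The algebraic manipulation that converts the information-form minimizer into the Kalman-gain form can be checked numerically. The following sketch is illustrative only, with random matrices standing in for $H_k$, $P_{k|k^-}$, and $V_k R V_k^\top$:

```python
import numpy as np

rng = np.random.default_rng(1)
nx, nz = 5, 3
H = rng.standard_normal((nz, nx))
A = rng.standard_normal((nx, nx)); P = A @ A.T + nx * np.eye(nx)  # P_{k|k-}
B = rng.standard_normal((nz, nz)); R = B @ B.T + nz * np.eye(nz)  # V_k R V_k^T
nu = rng.standard_normal(nz)                                      # innovation

# Information form: the minimizer of L and its covariance.
Pi, Ri = np.linalg.inv(P), np.linalg.inv(R)
Pc = np.linalg.inv(H.T @ Ri @ H + Pi)        # P^{c-}_{k|k}
mu_info = Pc @ H.T @ Ri @ nu

# Kalman form after the algebraic manipulation.
S = R + H @ P @ H.T
K = P @ H.T @ np.linalg.inv(S)
mu_kal = K @ nu
P_kal = (np.eye(nx) - K @ H) @ P

assert np.allclose(mu_info, mu_kal)
assert np.allclose(Pc, P_kal)
```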
A.4 Proof of Lemma 6.4.3: Track Likelihood
Proof. The track likelihood conditioned on the measurements $Z_{0:k}$ is inferred from the locations of the validated measurements with respect to the track's estimated measurement $\hat{z}_k = h\left(\hat{x}_{k|k^-}, 0\right)$ and from the number of validated measurements. The basic idea is twofold. First, the more likely it is that at least one of the validated measurements is target originated, compared to the likelihood that none of them is, the more likely the track represents a target. Second, if the number of validated measurements exceeds the expected number of false measurements inside the validation region, then it is more likely that one of the validated measurements originated from the target and that the track represents the target.
We write the track likelihood conditioned on the measurements as
\[
p\left(\epsilon_k \mid Z_{0:k}\right) = p\left(\epsilon_k \mid m_k, Z_k, Z_{0:k^-}\right), \tag{A.21}
\]
where we have separated $Z_k$ and $Z_{0:k^-}$ from $Z_{0:k}$ and explicitly written the inference on the number of validated measurements $m_k$. Using Bayes' rule, equation (A.21) can be written as
\[
p\left(\epsilon_k \mid Z_{0:k}\right) = \frac{p\left(Z_k \mid \epsilon_k, m_k, Z_{0:k^-}\right) p\left(\epsilon_k \mid m_k, Z_{0:k^-}\right)}{p\left(Z_k \mid m_k, Z_{0:k^-}\right)}, \tag{A.22}
\]
where
\begin{align}
p\left(Z_k \mid m_k, Z_{0:k^-}\right) &= p\left(Z_k \mid \epsilon_k, m_k, Z_{0:k^-}\right) p\left(\epsilon_k \mid m_k, Z_{0:k^-}\right) \nonumber \\
&\quad + p\left(Z_k \mid \epsilon_k = F, m_k, Z_{0:k^-}\right) p\left(\epsilon_k = F \mid m_k, Z_{0:k^-}\right), \tag{A.23}
\end{align}
and $\epsilon_k = F$ denotes that the track does not represent the target.
The probability $p\left(Z_k \mid m_k, \epsilon_k, Z_{0:k^-}\right)$ is derived in Appendix A.2 and defined in equation (A.16). The probability $p\left(Z_k \mid \epsilon_k = F, m_k, Z_{0:k^-}\right)$ is
\[
p\left(Z_k \mid \epsilon_k = F, m_k, Z_{0:k^-}\right) = \prod_{j=1}^{m_k} p\left(z_{k,j} \mid \epsilon_k = F, m_k, Z_{0:k^-}\right) = \prod_{j=1}^{m_k} \mathcal{V}_k^{-1} = \mathcal{V}_k^{-m_k}, \tag{A.24}
\]
because under the condition that the track does not represent the target, all of the validated measurements must be false, and the validated false measurements are assumed to be independent and uniformly distributed in the validation region with volume $\mathcal{V}_k$.

Using Bayes' rule we get
\[
p\left(\epsilon_k \mid m_k, Z_{0:k^-}\right) = \frac{p\left(m_k \mid \epsilon_k, Z_{0:k^-}\right) p\left(\epsilon_k \mid Z_{0:k^-}\right)}{p\left(m_k \mid Z_{0:k^-}\right)} \tag{A.25}
\]
and
\[
p\left(\epsilon_k = F \mid m_k, Z_{0:k^-}\right) = \frac{p\left(m_k \mid \epsilon_k = F, Z_{0:k^-}\right) p\left(\epsilon_k = F \mid Z_{0:k^-}\right)}{p\left(m_k \mid Z_{0:k^-}\right)}. \tag{A.26}
\]
The probability $p\left(m_k \mid \epsilon_k, Z_{0:k^-}\right)$ is derived in Appendix A.2 and defined in equation (A.14). The probability $p\left(m_k \mid \epsilon_k = F, Z_{0:k^-}\right)$ is
\[
p\left(m_k \mid \epsilon_k = F, Z_{0:k^-}\right) = \mu_F\left(m_k\right), \tag{A.27}
\]
since all of the measurements are false under the condition that the track does not represent a target.

Using the theorem of total probability with equations (A.14) and (A.27), the probability of the number of measurements $m_k$ conditioned on the previous measurements is
\begin{align}
p\left(m_k \mid Z_{0:k^-}\right) &= p\left(m_k \mid \epsilon_k, Z_{0:k^-}\right) P_T + p\left(m_k \mid \epsilon_k = F, Z_{0:k^-}\right) p\left(\epsilon_{k|k^-} = F\right) \tag{A.28a} \\
&= P_G P_D\, \mu_F\left(m_k^-\right) P_T + \left(1 - P_D P_G P_T\right) \mu_F\left(m_k\right), \tag{A.28b}
\end{align}
where $P_T \triangleq p\left(\epsilon_k \mid Z_{0:k^-}\right)$.

Substituting equations (A.14) and (A.28) into equation (A.25) yields
\[
p\left(\epsilon_k \mid m_k, Z_{0:k^-}\right) = \frac{\left(P_G P_D\, \mu_F\left(m_k^-\right) + \left(1 - P_G P_D\right) \mu_F\left(m_k\right)\right) P_T}{P_G P_D\, \mu_F\left(m_k^-\right) P_T + \left(1 - P_D P_G P_T\right) \mu_F\left(m_k\right)}. \tag{A.29}
\]
Using the fact that
\[
p\left(\epsilon_k = F \mid m_k, Z_{0:k^-}\right) = 1 - p\left(\epsilon_k \mid m_k, Z_{0:k^-}\right),
\]
and substituting equations (A.29), (A.24), and (A.16) into equation (A.23), yields the probability of the validated measurements conditioned on the previous measurements
\[
p\left(Z_k \mid m_k, Z_{0:k^-}\right) = \frac{\mathcal{V}_k^{-m_k^-}\, \dfrac{P_D\, \mu_F\left(m_k^-\right)}{m_k} \displaystyle\sum_{\ell=1}^{m_k} p\left(z_{k,\ell} \mid \psi, \epsilon_k, Z_{0:k^-}\right) P_T + \mathcal{V}_k^{-m_k} \left(1 - P_D P_G P_T\right) \mu_F\left(m_k\right)}{P_G P_D\, \mu_F\left(m_k^-\right) P_T + \left(1 - P_D P_G P_T\right) \mu_F\left(m_k\right)}. \tag{A.30}
\]
Substituting equations (A.16), (A.29), and (A.30) into equation (A.22) and setting the density of the number of false measurements to the Poisson distribution defined in equation (6.2) yields the track likelihood given in equations (6.36) and (6.37).
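A minimal numeric sketch of equation (A.29), assuming a Poisson $\mu_F$ with mean $\lambda \mathcal{V}_k$; the helper names are illustrative only:

```python
import math

def mu_F(m, lam, V):
    """Poisson pmf for the number of false measurements (mean lam*V)."""
    return math.exp(-lam * V) * (lam * V) ** m / math.factorial(m)

def track_prior_given_mk(m_k, PT, PD, PG, lam, V):
    """p(eps_k | m_k, Z_{0:k-}) from eq. (A.29)."""
    a = PG * PD * mu_F(m_k - 1, lam, V)
    b = (1 - PG * PD) * mu_F(m_k, lam, V)
    den = a * PT + (1 - PD * PG * PT) * mu_F(m_k, lam, V)
    return (a + b) * PT / den

p = track_prior_given_mk(3, PT=0.6, PD=0.9, PG=0.99, lam=0.5, V=2.0)
assert 0.0 <= p <= 1.0
# With PD = 0 the measurements carry no information and the prior is returned.
assert abs(track_prior_given_mk(3, 0.6, 0.0, 0.99, 0.5, 2.0) - 0.6) < 1e-12
```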
APPENDIX B. PROOFS FOR THE MS-LG-IPDAF
B.1 Proof of Lemma 7.3.2
Proof. Using Bayes' rule,
\[
p\left(x_{k,\alpha} \mid \theta_{k,\alpha}, \epsilon_k, Z^S_{0:k}\right) = \frac{p\left(Z^S_k \mid \theta_{k,\alpha}, x_k, \epsilon_k, Z^S_{0:k^-}\right) p\left(x_k \mid \epsilon_k, Z^S_{0:k^-}\right)}{p\left(Z^S_k \mid \theta_{k,\alpha}, \epsilon_k, Z^S_{0:k^-}\right)}. \tag{B.1}
\]
We solve for the probability $p\left(x_{k,\alpha} \mid \theta_{k,\alpha}, \epsilon_k, Z^S_{0:k}\right)$ by using maximum a posteriori (MAP) optimization to find the value of $\tilde{x}_{k|k^-}$ and its corresponding error covariance $P_{k|k^-}$ that maximizes the right-hand side of equation (B.1).

Since the MAP objective does not depend on $p\left(Z^S_k \mid \theta_{k,\alpha}, \epsilon_k, Z^S_{0:k^-}\right)$, that term can be absorbed into the normalizing coefficient, simplifying the problem to
\[
\max_{\tilde{x}_{k|k^-},\, P_{k|k^-}} \eta\, p\left(Z^S_k \mid \theta_{k,\alpha}, x_k, \epsilon_k, Z^S_{0:k^-}\right) p\left(x_k \mid \epsilon_k, Z^S_{0:k^-}\right).
\]
In the case $\theta_{k,\alpha=0}$, none of the track-associated measurements are target originated; thus, $p\left(Z^S_k \mid \theta_{k,\alpha}, x_k, \epsilon_k, Z^S_{0:k^-}\right)$ simplifies to $p\left(Z^S_k \mid \theta_{k,\alpha}, Z^S_{0:k^-}\right)$ and no longer has any dependency on the track's state. This reduces the MAP optimization problem to
\[
\max_{\tilde{x}_{k|k^-},\, P_{k|k^-}} \eta\, p\left(x_k \mid \epsilon_k, Z^S_{0:k^-}\right);
\]
hence, the solution is $p\left(x_{k,\alpha=0} \mid \theta_{k,\alpha=0}, \epsilon_k, Z^S_{0:k}\right) = p\left(x_k \mid \epsilon_k, Z^S_{0:k^-}\right)$.

In the case $\theta_{k,\alpha \neq 0}$, only the measurement $z_{k,\alpha}$ has any dependency and influence on the track's state, since all other measurements are false according to the association event. Using this fact with the relation in equation (7.11), the optimization problem simplifies to
\[
\max_{\tilde{x}_{k|k^-},\, P_{k|k^-}} \left[\prod_{s_\ell \in S_\alpha} \left(P^{s_\ell}_G\right)^{-1}\right] \eta\, p\left(z_{k,\alpha} \mid \psi, x_k, \epsilon_k, Z^S_{0:k^-}\right) p\left(x_k \mid \epsilon_k, Z^S_{0:k^-}\right).
\]
Multiplying the probabilities $p\left(z_{k,\alpha} \mid \psi, x_k, \epsilon_k, Z^S_{0:k^-}\right)$, defined in equation (7.11), and $p\left(x_k \mid \epsilon_k, Z^S_{0:k^-}\right)$, defined in equation (7.10), together, combining their exponents, and dropping the terms $P^{s_\ell}_G$ since they have no impact on the optimization, simplifies the MAP optimization problem to
\[
\max_{\tilde{x}_{k|k^-},\, P_{k|k^-}} \eta \exp\left(-L\right),
\]
where
\begin{align}
L &= \frac{1}{2}\left(\nu_{k,\alpha} - H^{S_\alpha}_k \tilde{x}_{k|k^-}\right)^\top \left(R^{S_\alpha}_k\right)^{-1} \left(\nu_{k,\alpha} - H^{S_\alpha}_k \tilde{x}_{k|k^-}\right) + \frac{1}{2} \tilde{x}_{k|k^-}^\top P_{k|k^-}^{-1} \tilde{x}_{k|k^-} \\
\nu_{k,\alpha} &= \operatorname{Log}^{G_{S_\alpha}}_I\left(\hat{z}_{k,\alpha}^{-1} z_{k,\alpha}\right) \\
\hat{z}_{k,\alpha} &= h^{S_\alpha}\left(\hat{x}_{k|k^-}, 0\right) \\
R^{S_\alpha}_k &= V^{S_\alpha}_k R^{S_\alpha} \left(V^{S_\alpha}_k\right)^\top,
\end{align}
$\tilde{x}_{k|k^-} = \operatorname{Log}^{G_x}_I\left(\hat{x}_{k|k^-}^{-1} x_k\right)$ is the track's error state, $h^{S_\alpha}$ is the augmented observation function defined in equation (7.5), $R^{S_\alpha}$ is the augmented measurement noise covariance defined in equation (7.6), and $H^{S_\alpha}_k$ and $V^{S_\alpha}_k$ are the Jacobians of the augmented observation function defined in equation (7.9) and evaluated at $\zeta_{h^{S_\alpha}_k} = \left(\hat{x}_{k|k^-}, 0\right)$.
The MAP problem is solved by finding the value of $\tilde{x}_{k|k^-}$ that minimizes $L$. Since $L$ is quadratic in $\tilde{x}_{k|k^-}$, the minimizing value is found by taking the first derivative of $L$ with respect to $\tilde{x}_{k|k^-}$, setting it to zero, and solving for $\tilde{x}_{k|k^-}$. This value becomes the new error state mean $\mu^-_{k|k,\alpha}$. The corresponding covariance is found by taking the second derivative of $L$ with respect to $\tilde{x}_{k|k^-}$ and setting this value to the new covariance $P^-_{k|k,\alpha}$.

Taking the first and second partial derivatives of $L$ with respect to $\tilde{x}_{k|k^-}$ yields
\begin{align}
\frac{\partial L}{\partial \tilde{x}_{k|k^-}} &= -\left(\nu_{k,\alpha} - H^{S_\alpha}_k \tilde{x}_{k|k^-}\right)^\top \left(R^{S_\alpha}_k\right)^{-1} H^{S_\alpha}_k + \tilde{x}_{k|k^-}^\top P_{k|k^-}^{-1} \\
\frac{\partial^2 L}{\partial \tilde{x}_{k|k^-}^2} &= \left(H^{S_\alpha}_k\right)^\top \left(R^{S_\alpha}_k\right)^{-1} H^{S_\alpha}_k + P_{k|k^-}^{-1} = \left(P^-_{k|k,\alpha}\right)^{-1}.
\end{align}
Setting the first derivative to zero, solving for $\tilde{x}_{k|k^-}$, and setting this value to $\mu^-_{k|k,\alpha}$ gives
\[
\mu^-_{k|k,\alpha} = K_{k,\alpha} \nu_{k,\alpha},
\]
where $K_{k,\alpha} = P^-_{k|k,\alpha} \left(H^{S_\alpha}_k\right)^\top \left(R^{S_\alpha}_k\right)^{-1}$.
Since the error state’s mean is no longer zero, the error state no longer has a con-
centrated Gaussian distribution. In order to reset the mean of the error state to zero, we
add µ−k|k,α onto the state estimate xk|k− and adjust the covariance of the error state. In
particular, let the error state after update but before being reset be x−k|k,α = µ−k|k,α + ak|k,α
where ak|k,α ∼ N(
0, P−k|k,α
)
, then under the assumption that ak|k,α is small and using the
property of the right Jacobian defined in equation (3.7) we add µ−k|k,α onto xk|k− as follows:
xk = xk|k−ExpGxI
(
µ−k|k,α + ak|k,α
)
(B.2a)
= xk|k−ExpGxI
(
µ−k|k,α
)
︸ ︷︷ ︸
xk|k,α
ExpGxI
JGxr
(
µ−k|k,α
)
ak|k,α︸ ︷︷ ︸
xk|k,α
, (B.2b)
where
xk|k,α = xk|k−ExpGxI
(
µ−k|k,α
)
, (B.3)
is the updated state estimate, and xk|k,α = JGxr
(
µ−k|k,α
)
ak|k,α is the updated error state after
reset.
The error covariance of the error state xk|k,α is
cov(xk|k,α
)= cov
(
JGxr
(
µ−k|k,α
)
ak|k,α
)
= JGxr
(
µ−k|k,α
)
cov(ak|k,α
)JGxr
(
µ−k|k,α
)⊤
= JGxr
(
µ−k|k,α
)
P−k|k,αJ
Gxr
(
µ−k|k,α
)⊤
= Pk|k,α;
therefore, xk|k,α ∼ N(µk|k,j = 0, Pk|k,α
)and xk ∼ N
(xk|k, Pk|k,α
).
B.2 Proof of Lemma 7.3.3
Proof. Using the smoothing property of conditional expectations, the mean of $p\left(x^-_k \mid \epsilon_k, Z^S_{0:k}\right)$ is
\begin{align}
\mu^-_{k|k} &= E\left[E\left[\tilde{x}^-_{k|k,\alpha} \mid \theta_{k,\alpha}, \epsilon_k\right]\right] \tag{B.4a} \\
&= \sum_{\alpha \in \mathcal{A}} E\left[\tilde{x}^-_{k|k,\alpha} \mid \theta_{k,\alpha}, \epsilon_k\right] \beta_{k,\alpha} \tag{B.4b} \\
&= \sum_{\alpha \in \mathcal{A}} \mu^-_{k|k,\alpha} \beta_{k,\alpha}. \tag{B.4c}
\end{align}
By the definition of the covariance, the covariance of $\tilde{x}^-_{k|k}$ is
\[
P^-_{k|k} = \int \left(\tilde{x}^-_{k|k} - \mu^-_{k|k}\right)\left(\tilde{x}^-_{k|k} - \mu^-_{k|k}\right)^\top p\left(x^-_k \mid \epsilon_k, Z^S_{0:k}\right) d\tilde{x}^-_{k|k}.
\]
Using equation (7.23), the covariance can be written as
\begin{align}
P^-_{k|k} &= \sum_{\alpha \in \mathcal{A}} \beta_{k,\alpha} \left[\int \left(\tilde{x}^-_{k|k} - \mu^-_{k|k}\right)\left(\tilde{x}^-_{k|k} - \mu^-_{k|k}\right)^\top p\left(x^-_k \mid \theta_{k,\alpha}, \epsilon_k, Z^S_{0:k}\right) d\tilde{x}^-_{k|k}\right] \\
&= \sum_{\alpha \in \mathcal{A}} \beta_{k,\alpha} \left[\int \left(\left(\tilde{x}^-_{k|k} - \mu^-_{k|k,\alpha}\right)\left(\tilde{x}^-_{k|k} - \mu^-_{k|k,\alpha}\right)^\top + \left(\mu^-_{k|k,\alpha} - \mu^-_{k|k}\right)\left(\mu^-_{k|k,\alpha} - \mu^-_{k|k}\right)^\top\right) p\left(x^-_k \mid \theta_{k,\alpha}, \epsilon_k, Z^S_{0:k}\right) d\tilde{x}^-_{k|k}\right] \\
&= \sum_{\alpha \in \mathcal{A}} \beta_{k,\alpha} P^-_{k|k,\alpha} + \sum_{\alpha \in \mathcal{A}} \beta_{k,\alpha} \left(\mu^-_{k|k,\alpha} - \mu^-_{k|k}\right)\left(\mu^-_{k|k,\alpha} - \mu^-_{k|k}\right)^\top.
\end{align}
Noting that $\sum_{\alpha \in \mathcal{A}} \beta_{k,\alpha} = 1$, using the definitions of $\mu^-_{k|k}$ and $\mu^-_{k|k,\alpha}$ from equations (B.4) and (7.16), and with algebraic manipulation, we get equation (7.25).
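The moment-matching step of equations (B.4) and (7.25) can be sketched as follows; the function name is illustrative only:

```python
import numpy as np

def mix_moments(betas, mus, Ps):
    """Moment-matched mean and covariance of a Gaussian mixture.

    betas -- association probabilities beta_{k,alpha} (sum to one)
    mus   -- conditional means mu^-_{k|k,alpha}
    Ps    -- conditional covariances P^-_{k|k,alpha}
    """
    betas = np.asarray(betas, dtype=float)
    mus = np.asarray(mus, dtype=float)       # shape (A, n)
    mu = betas @ mus                         # sum_a beta_a * mu_a
    P = np.zeros_like(Ps[0])
    for b, m, Pa in zip(betas, mus, Ps):
        d = m - mu
        P += b * (Pa + np.outer(d, d))       # spread-of-the-means term
    return mu, P

betas = [0.5, 0.3, 0.2]
mus = [np.zeros(2), np.array([1.0, 0.0]), np.array([0.0, 2.0])]
Ps = [np.eye(2)] * 3
mu, P = mix_moments(betas, mus, Ps)
assert np.allclose(mu, [0.3, 0.4])
assert np.allclose(P, P.T) and np.all(np.linalg.eigvalsh(P) > 0)
```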
B.3 Proof of Lemma 7.3.4
Proof. The probability of the current track likelihood conditioned on all the validated measurements is
\[
p\left(\epsilon_k \mid Z^S_{0:k}\right) = p\left(\epsilon_k \mid m^S_k, Z^S_{0:k^-}, Z^S_k\right), \tag{B.6}
\]
where we have separated $Z^S_k$ and $Z^S_{0:k^-}$ from $Z^S_{0:k}$ and explicitly written the inference on the number of validated measurements from each sensor, denoted $m^S_k = \left\{m^{s_\ell}_k\right\}_{s_\ell \in S_k}$. Using Bayes' rule,
\[
p\left(\epsilon_k \mid Z^S_{0:k}\right) = \frac{p\left(Z^S_k \mid \epsilon_k, m^S_k, Z^S_{0:k^-}\right) p\left(\epsilon_k \mid m^S_k, Z^S_{0:k^-}\right)}{p\left(Z^S_k \mid m^S_k, Z^S_{0:k^-}\right)}, \tag{B.7}
\]
where
\begin{align}
p\left(Z^S_k \mid m^S_k, Z^S_{0:k^-}\right) &= p\left(Z^S_k \mid \epsilon_k, m^S_k, Z^S_{0:k^-}\right) p\left(\epsilon_k \mid m^S_k, Z^S_{0:k^-}\right) \nonumber \\
&\quad + p\left(Z^S_k \mid \epsilon_k = F, m^S_k, Z^S_{0:k^-}\right) p\left(\epsilon_k = F \mid m^S_k, Z^S_{0:k^-}\right), \tag{B.8}
\end{align}
and $\epsilon_k = F$ denotes that the track does not represent the target.
The probability $p\left(Z^S_k \mid \epsilon_k, m^S_k, Z^S_{0:k^-}\right)$ is equal to
\[
p\left(Z^S_k \mid \epsilon_k, m^S_k, Z^S_{0:k^-}\right) = \prod_{s_\ell \in S_k} p\left(Z^{s_\ell}_k \mid \epsilon_k, m^{s_\ell}_k, Z^S_{0:k^-}\right), \tag{B.9}
\]
since the number of measurements from one sensor has no impact on the measurements from a different sensor due to the sensors being independent. Let $m^{s_\ell -}_k = m^{s_\ell}_k - 1$. The probability $p\left(Z^{s_\ell}_k \mid \epsilon_k, m^{s_\ell}_k, Z^S_{0:k^-}\right)$ is derived in Lemma 6.4.3 and is
\[
p\left(Z^{s_\ell}_k \mid \epsilon_k, m^{s_\ell}_k, Z^S_{0:k^-}\right) = \frac{\left(\mathcal{V}^{s_\ell}_k\right)^{-m^{s_\ell -}_k} \dfrac{P^{s_\ell}_D\, \mu^{s_\ell}_F\left(m^{s_\ell -}_k\right)}{m^{s_\ell}_k} \displaystyle\sum_{j=1}^{m^{s_\ell}_k} p\left(z^{s_\ell}_{k,j} \mid \psi, \epsilon_k, Z^S_{0:k^-}\right) + \left(\mathcal{V}^{s_\ell}_k\right)^{-m^{s_\ell}_k} \left(1 - P^{s_\ell}_D P^{s_\ell}_G\right) \mu^{s_\ell}_F\left(m^{s_\ell}_k\right)}{P^{s_\ell}_D P^{s_\ell}_G\, \mu^{s_\ell}_F\left(m^{s_\ell -}_k\right) + \left(1 - P^{s_\ell}_D P^{s_\ell}_G\right) \mu^{s_\ell}_F\left(m^{s_\ell}_k\right)}, \tag{B.10}
\]
where $\mathcal{V}^{s_\ell}_k$ is the volume of the validation region for sensor $s_\ell$, and $\mu^{s_\ell}_F$ denotes the Poisson density function of the number of false measurements for sensor $s_\ell$ defined in equation (7.4).
The probability $p\left(Z^S_k \mid \epsilon_k = F, m^S_k, Z^S_{0:k^-}\right)$ is the probability that all of the validated measurements are false, since the track with which the measurements were associated does not represent the target, as indicated by the condition $\epsilon_k = F$. False measurements are assumed independent and uniformly distributed, so the probability of a false measurement from sensor $s_\ell$ inside the validation region of volume $\mathcal{V}^{s_\ell}_k$ is $\left(\mathcal{V}^{s_\ell}_k\right)^{-1}$. Therefore,
\[
p\left(Z^S_k \mid \epsilon_k = F, m^S_k, Z^S_{0:k^-}\right) = \prod_{s_\ell \in S_k} \left(\mathcal{V}^{s_\ell}_k\right)^{-m^{s_\ell}_k}. \tag{B.11}
\]
Next we proceed to calculate the probabilities $p\left(\epsilon_k \mid m^S_k, Z^S_{0:k^-}\right)$ and $p\left(\epsilon_k = F \mid m^S_k, Z^S_{0:k^-}\right)$. Using Bayes' rule,
\begin{align}
p\left(\epsilon_k \mid m^S_k, Z^S_{0:k^-}\right) &= \frac{p\left(m^S_k \mid \epsilon_k, Z^S_{0:k^-}\right) p\left(\epsilon_k \mid Z^S_{0:k^-}\right)}{p\left(m^S_k \mid Z^S_{0:k^-}\right)} \tag{B.12a} \\
p\left(\epsilon_k = F \mid m^S_k, Z^S_{0:k^-}\right) &= \frac{p\left(m^S_k \mid \epsilon_k = F, Z^S_{0:k^-}\right) p\left(\epsilon_k = F \mid Z^S_{0:k^-}\right)}{p\left(m^S_k \mid Z^S_{0:k^-}\right)}. \tag{B.12b}
\end{align}
Since the sensors are independent, the number of measurements from each sensor is independent of the other sensors. Therefore, the above equations simplify to
\begin{align}
p\left(\epsilon_k \mid m^S_k, Z^S_{0:k^-}\right) &= \frac{p\left(\epsilon_k \mid Z^S_{0:k^-}\right) \prod_{s_\ell \in S_k} p\left(m^{s_\ell}_k \mid \epsilon_k, Z^S_{0:k^-}\right)}{\prod_{s_\ell \in S_k} p\left(m^{s_\ell}_k \mid Z^S_{0:k^-}\right)} \tag{B.13a} \\
p\left(\epsilon_k = F \mid m^S_k, Z^S_{0:k^-}\right) &= \frac{\left(1 - p\left(\epsilon_k \mid Z^S_{0:k^-}\right)\right) \prod_{s_\ell \in S_k} p\left(m^{s_\ell}_k \mid \epsilon_k = F, Z^S_{0:k^-}\right)}{\prod_{s_\ell \in S_k} p\left(m^{s_\ell}_k \mid Z^S_{0:k^-}\right)}, \tag{B.13b}
\end{align}
where we used the fact that $p\left(\epsilon_k = F \mid Z^S_{0:k^-}\right) = 1 - p\left(\epsilon_k \mid Z^S_{0:k^-}\right)$.
The probability $p\left(m^{s_\ell}_k \mid \epsilon_k, Z^S_{0:k^-}\right)$ is derived in Lemma 6.4.3 and is
\[
p\left(m^{s_\ell}_k \mid \epsilon_k, Z^S_{0:k^-}\right) = P^{s_\ell}_G P^{s_\ell}_D\, \mu^{s_\ell}_F\left(m^{s_\ell -}_k\right) + \left(1 - P^{s_\ell}_G P^{s_\ell}_D\right) \mu^{s_\ell}_F\left(m^{s_\ell}_k\right). \tag{B.14}
\]
The probability $p\left(m^{s_\ell}_k \mid \epsilon_k = F, Z^S_{0:k^-}\right)$ is the probability of the number of measurements provided that the track does not represent the target; thus, it is the probability that all of the measurements are false:
\[
p\left(m^{s_\ell}_k \mid \epsilon_k = F, Z^S_{0:k^-}\right) = \mu^{s_\ell}_F\left(m^{s_\ell}_k\right). \tag{B.15}
\]
We do not solve for $p\left(m^{s_\ell}_k \mid Z^S_{0:k^-}\right)$ since it is not needed.
Substituting equations (B.8), (B.10), (B.11), and (B.13) into equation (B.7) yields
\[
p\left(\epsilon_k \mid Z^S_{0:k}\right) = \frac{p\left(\epsilon_k \mid Z^S_{0:k^-}\right) \prod_{s_\ell \in S_k} p\left(Z^{s_\ell}_k \mid \epsilon_k, m^{s_\ell}_k, Z^S_{0:k^-}\right) p\left(m^{s_\ell}_k \mid \epsilon_k, Z^S_{0:k^-}\right)}{c}, \tag{B.16}
\]
where
\begin{align}
c &= p\left(\epsilon_k \mid Z^S_{0:k^-}\right) \prod_{s_\ell \in S_k} p\left(Z^{s_\ell}_k \mid \epsilon_k, m^{s_\ell}_k, Z^S_{0:k^-}\right) p\left(m^{s_\ell}_k \mid \epsilon_k, Z^S_{0:k^-}\right) \nonumber \\
&\quad + \left(1 - p\left(\epsilon_k \mid Z^S_{0:k^-}\right)\right) \prod_{s_\ell \in S_k} p\left(Z^{s_\ell}_k \mid \epsilon_k = F, m^{s_\ell}_k, Z^S_{0:k^-}\right) p\left(m^{s_\ell}_k \mid \epsilon_k = F, Z^S_{0:k^-}\right). \tag{B.17}
\end{align}
Substituting in equations (7.4), (B.10), (B.11), (B.14), and (B.15) yields equation (7.28).
APPENDIX C. COMMON LIE GROUPS
This appendix presents the Lie groups used in this dissertation.
C.1 Rn
The Euclidean space of dimension $n$, $\mathbb{R}^n$, represents points in $n$-dimensional space and also represents translations in $n$-dimensional space. Its Lie algebra represents all translational velocities in $n$-dimensional space. The Lie group $\mathbb{R}^n$ is the set
\[
\mathbb{R}^n \triangleq \left\{[a_1, a_2, \dots, a_n]^\top \mid a_i \in \mathbb{R}\right\},
\]
equipped with matrix addition. Let $a, b, c \in \mathbb{R}^n$; the group operation and axioms are defined as follows:
\begin{align}
a + b &= [a_1 + b_1, a_2 + b_2, \dots, a_n + b_n]^\top && \text{group operation} \\
a + (b + c) &= (a + b) + c && \text{associativity} \\
a^{-1} &= [-a_1, -a_2, \dots, -a_n]^\top && \text{inverse} \\
I &= [0, 0, \dots, 0]^\top && \text{identity element}
\end{align}
The corresponding Lie algebra is $\mathbb{R}^n$, and the corresponding Cartesian algebraic space is $\mathbb{R}_{\mathbb{R}^n} \triangleq \mathbb{R}^n$. The wedge, $\cdot^\wedge : \mathbb{R}_{\mathbb{R}^n} \to \mathbb{R}^n$, and vee, $\cdot^\vee : \mathbb{R}^n \to \mathbb{R}_{\mathbb{R}^n}$, maps are the identity functions.
C.1.1 Adjoint
The matrix adjoints of $g \in \mathbb{R}^n$ and $v \in \mathbb{R}_{\mathbb{R}^n}$ are
\begin{align}
\operatorname{Ad}^{\mathbb{R}^n}_g &= I_{n \times n} \\
\operatorname{ad}^{\mathbb{R}^n}_v &= 0_{n \times n},
\end{align}
where $0_{n \times n}$ is the $n \times n$ zero matrix.
C.1.2 Exponential Map and Jacobians
Let $g \in \mathbb{R}^n$ and $v \in \mathbb{R}_{\mathbb{R}^n}$. The exponential and logarithm maps are defined as
\begin{align}
\operatorname{Exp}^{\mathbb{R}^n}_I(v) &= v \tag{C.1} \\
\operatorname{Log}^{\mathbb{R}^n}_I(g) &= g. \tag{C.2}
\end{align}
The left and right Jacobians and their inverses are defined as
\[
J^{\mathbb{R}^n}_r(v) = J^{\mathbb{R}^n}_l(v) = J^{\mathbb{R}^n-1}_r(v) = J^{\mathbb{R}^n-1}_l(v) = I_{n \times n},
\]
where $I_{n \times n}$ is the $n \times n$ identity matrix.
C.2 SO(2)
The special orthogonal group of two dimensions, $SO(2)$, represents all rotations and orientations in two-dimensional space. Its Lie algebra represents all angular velocities in two-dimensional space. The matrix Lie group $SO(2)$ is the set
\[
SO(2) \triangleq \left\{R \in \mathbb{R}^{2 \times 2} \mid R R^\top = I_{2 \times 2} \text{ and } \det(R) = 1\right\},
\]
equipped with matrix multiplication. The inverse operation is the matrix inverse, and the identity element is $I_{2 \times 2}$.

The corresponding Lie algebra is the set
\[
\mathfrak{so}(2) \triangleq \left\{[\omega]_{1\times} \mid \omega \in \mathbb{R}\right\},
\]
where
\[
[\omega]_{1\times} = \begin{bmatrix} 0 & -\omega \\ \omega & 0 \end{bmatrix} \tag{C.3}
\]
is the skew symmetric operator. The corresponding Cartesian algebraic space is the set $\mathbb{R}_{SO(2)} \triangleq \left\{\omega \in \mathbb{R}\right\}$.

The wedge, $\cdot^\wedge : \mathbb{R}_{SO(2)} \to \mathfrak{so}(2)$, and vee, $\cdot^\vee : \mathfrak{so}(2) \to \mathbb{R}_{SO(2)}$, maps are defined as
\[
\omega^\wedge = [\omega]_{1\times}, \qquad \left([\omega]_{1\times}\right)^\vee = \omega.
\]
C.2.1 Adjoint
The matrix adjoints of $R \in SO(2)$ and $\omega \in \mathbb{R}_{SO(2)}$ are
\begin{align}
\operatorname{Ad}^{SO(2)}_R &= I_{1 \times 1} \tag{C.4} \\
\operatorname{ad}^{SO(2)}_\omega &= 0_{1 \times 1}, \tag{C.5}
\end{align}
where $I_{n \times n}$ is the $n \times n$ identity matrix, and $0_{n \times n}$ is the $n \times n$ zero matrix.
C.2.2 Exponential Map and Jacobians
Let $R = \begin{bmatrix} R_{11} & R_{12} \\ R_{21} & R_{22} \end{bmatrix} \in SO(2)$ and $\omega \in \mathbb{R}_{SO(2)}$. The exponential and logarithm maps are defined as
\begin{align}
\operatorname{Exp}^{SO(2)}_I(\omega) &= \begin{bmatrix} \cos(\omega) & -\sin(\omega) \\ \sin(\omega) & \cos(\omega) \end{bmatrix} \tag{C.6} \\
\operatorname{Log}^{SO(2)}_I(R) &= \arctan 2\left(R_{21}, R_{11}\right). \tag{C.7}
\end{align}
The left and right Jacobians and their inverses are defined as
\begin{align}
J^{SO(2)}_r(\omega) &= I_{1 \times 1} \tag{C.8} \\
J^{SO(2)}_l(\omega) &= I_{1 \times 1} \tag{C.9} \\
J^{SO(2)-1}_r(\omega) &= I_{1 \times 1} \tag{C.10} \\
J^{SO(2)-1}_l(\omega) &= I_{1 \times 1}. \tag{C.11}
\end{align}
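A minimal numeric sketch of the $SO(2)$ maps in equations (C.6) and (C.7):

```python
import numpy as np

def Exp_SO2(w):
    """Exponential map of eq. (C.6)."""
    c, s = np.cos(w), np.sin(w)
    return np.array([[c, -s], [s, c]])

def Log_SO2(R):
    """Logarithm map of eq. (C.7)."""
    return np.arctan2(R[1, 0], R[0, 0])

w = 0.7
R = Exp_SO2(w)
# R is a proper rotation and Log inverts Exp.
assert np.allclose(R @ R.T, np.eye(2)) and np.isclose(np.linalg.det(R), 1.0)
assert np.isclose(Log_SO2(R), w)
```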
C.3 SO(3)
The special orthogonal group of three dimensions, $SO(3)$, is a matrix Lie group used to describe rotations and orientations in three-dimensional space, and its Lie algebra is used to describe angular velocities in three-dimensional space. The Lie group $SO(3)$ is the set
\[
SO(3) \triangleq \left\{R \in \mathbb{R}^{3 \times 3} \mid R R^\top = I_{3 \times 3} \text{ and } \det(R) = 1\right\},
\]
equipped with matrix multiplication. The inverse operation is the matrix inverse, and the identity element is $I_{3 \times 3}$.

The Lie algebra of $SO(3)$ is
\[
\mathfrak{so}(3) \triangleq \left\{[\omega]_{3\times} \in \mathbb{R}^{3 \times 3} \mid \omega \in \mathbb{R}^3\right\},
\]
where $\omega = [\omega_x, \omega_y, \omega_z]^\top$ and
\[
[\omega]_{3\times} = \begin{bmatrix} 0 & -\omega_z & \omega_y \\ \omega_z & 0 & -\omega_x \\ -\omega_y & \omega_x & 0 \end{bmatrix} \tag{C.12}
\]
is the three-dimensional skew symmetric operator. The corresponding Cartesian algebraic space is $\mathbb{R}_{SO(3)} \triangleq \left\{\omega \in \mathbb{R}^3\right\}$.

The wedge, $\cdot^\wedge : \mathbb{R}_{SO(3)} \to \mathfrak{so}(3)$, and vee, $\cdot^\vee : \mathfrak{so}(3) \to \mathbb{R}_{SO(3)}$, maps are defined as
\[
\omega^\wedge = [\omega]_{3\times}, \qquad \left([\omega]_{3\times}\right)^\vee = \omega.
\]
C.3.1 Adjoint
The matrix adjoints of $R \in SO(3)$ and $\omega \in \mathbb{R}_{SO(3)}$ are
\begin{align}
\operatorname{Ad}^{SO(3)}_R &= R \tag{C.13} \\
\operatorname{ad}^{SO(3)}_\omega &= [\omega]_{3\times}. \tag{C.14}
\end{align}
C.3.2 Exponential Map and Jacobians
Let $R \in SO(3)$ and $\omega \in \mathbb{R}_{SO(3)}$. The element $\omega$ can be written as $\omega = \theta u$, where $\theta = \|\omega\|$ and $u = \frac{\omega}{\|\omega\|}$. The exponential and logarithm maps are defined as
\begin{align}
\operatorname{Exp}^{SO(3)}_I(\omega) &= \begin{cases} I + \sin(\theta)[u]_{3\times} + \left(1 - \cos(\theta)\right)\left([u]_{3\times}\right)^2 & \text{if } \theta \neq 0 \\ I_{3 \times 3} & \text{else} \end{cases} \tag{C.15} \\
\operatorname{Log}^{SO(3)}_I(R) &= \begin{cases} 0_{3 \times 1} & \text{if } R = I_{3 \times 3} \\ \dfrac{\varphi(R)\left(R - R^\top\right)^\vee}{2\sin\left(\varphi(R)\right)} & \text{else}, \end{cases} \tag{C.16}
\end{align}
where
\[
\varphi(R) = \arccos\left(\frac{\operatorname{Tr}(R) - 1}{2}\right),
\]
and $\operatorname{Tr}(R)$ is the trace of $R$.

The left and right Jacobians and their inverses are defined as
\begin{align}
J^{SO(3)}_r(\omega) &= I + \frac{\cos(\theta) - 1}{\theta}[u]_{3\times} + \frac{\theta - \sin(\theta)}{\theta}\left([u]_{3\times}\right)^2 \tag{C.17} \\
J^{SO(3)}_l(\omega) &= I + \frac{1 - \cos(\theta)}{\theta}[u]_{3\times} + \frac{\theta - \sin(\theta)}{\theta}\left([u]_{3\times}\right)^2 \tag{C.18} \\
J^{SO(3)-1}_r(\omega) &= I + \frac{1}{2}[\omega]_{3\times} - \frac{\theta\cot\left(\frac{\theta}{2}\right) - 2}{2\theta^2}\left([\omega]_{3\times}\right)^2 \tag{C.19} \\
J^{SO(3)-1}_l(\omega) &= I - \frac{1}{2}[\omega]_{3\times} - \frac{\theta\cot\left(\frac{\theta}{2}\right) - 2}{2\theta^2}\left([\omega]_{3\times}\right)^2. \tag{C.20}
\end{align}
C.4 SE(2)
The special Euclidean group of two dimensions, $SE(2)$, represents all rotations and translations in two-dimensional space. It can also represent all orientations and positions in two-dimensional space. Its Lie algebra represents all angular and translational velocities in two-dimensional space. The Lie group $SE(2)$ is the set
\[
SE(2) \triangleq \left\{\begin{bmatrix} R & p \\ 0_{1 \times 2} & 1 \end{bmatrix} \in \mathbb{R}^{3 \times 3} \;\middle|\; R \in SO(2) \text{ and } p \in \mathbb{R}^2\right\},
\]
equipped with matrix multiplication. The inverse operation is the matrix inverse, and the identity element is $I_{3 \times 3}$.

The Lie algebra is
\[
\mathfrak{se}(2) \triangleq \left\{\begin{bmatrix} [\omega]_{1\times} & \rho \\ 0 & 0 \end{bmatrix} \in \mathbb{R}^{3 \times 3} \;\middle|\; \omega \in \mathbb{R}_{SO(2)} \text{ and } \rho \in \mathbb{R}^2\right\},
\]
where $[\cdot]_{1\times}$ is the skew symmetric operator defined in equation (C.3). The corresponding Cartesian algebraic space is
\[
\mathbb{R}_{SE(2)} \triangleq \left\{\begin{bmatrix} \rho \\ \omega \end{bmatrix} \in \mathbb{R}^3 \;\middle|\; \omega \in \mathbb{R} \text{ and } \rho \in \mathbb{R}^2\right\}.
\]
The wedge, $\cdot^\wedge : \mathbb{R}_{SE(2)} \to \mathfrak{se}(2)$, and vee, $\cdot^\vee : \mathfrak{se}(2) \to \mathbb{R}_{SE(2)}$, maps are defined as
\[
\begin{bmatrix} [\omega]_{1\times} & \rho \\ 0 & 0 \end{bmatrix}^\vee = \begin{bmatrix} \rho \\ \omega \end{bmatrix}, \qquad \begin{bmatrix} \rho \\ \omega \end{bmatrix}^\wedge = \begin{bmatrix} [\omega]_{1\times} & \rho \\ 0 & 0 \end{bmatrix}.
\]
C.4.1 Adjoint
The matrix adjoints of $g \in SE(2)$ and $v \in \mathbb{R}_{SE(2)}$ are
\begin{align}
\operatorname{Ad}^{SE(2)}_g &= \begin{bmatrix} R & -[1]_{1\times}\, p \\ 0_{1 \times 2} & 1 \end{bmatrix} \\
\operatorname{ad}^{SE(2)}_v &= \begin{bmatrix} [\omega]_{1\times} & -[1]_{1\times}\, \rho \\ 0_{1 \times 2} & 0 \end{bmatrix}.
\end{align}
C.4.2 Exponential Map and Jacobians
Let $g \in SE(2)$ and $v \in \mathbb{R}_{SE(2)}$. The exponential and logarithm maps are defined as
\begin{align}
\operatorname{Exp}^{SE(2)}_I(v) &= \begin{bmatrix} \operatorname{Exp}^{SO(2)}_I(\omega) & V(\omega)\rho \\ 0_{1 \times 2} & 1 \end{bmatrix} \tag{C.21} \\
\operatorname{Log}^{SE(2)}_I(g) &= \begin{bmatrix} V\left(\operatorname{Log}^{SO(2)}_I(R)\right)^{-1} p \\ \operatorname{Log}^{SO(2)}_I(R) \end{bmatrix}, \tag{C.22}
\end{align}
where $\operatorname{Exp}^{SO(2)}_I$ and $\operatorname{Log}^{SO(2)}_I$ are defined in equations (C.6) and (C.7) and
\[
V(\omega) = \frac{\sin(\omega)}{\omega} I_{2 \times 2} + \frac{1 - \cos(\omega)}{\omega}[1]_{1\times}. \tag{C.23}
\]
The left and right Jacobians and their inverses are defined as
\begin{align}
J^{SE(2)}_r(v) &= \begin{bmatrix} W_r(\omega) & D_r(\omega)\rho \\ 0 & 1 \end{bmatrix} \tag{C.24} \\
J^{SE(2)}_l(v) &= \begin{bmatrix} W_l(\omega) & D_l(\omega)\rho \\ 0 & 1 \end{bmatrix} \tag{C.25} \\
J^{SE(2)-1}_r(v) &= \begin{bmatrix} W^{-1}_r(\omega) & -W^{-1}_r(\omega) D_r(\omega)\rho \\ 0 & 1 \end{bmatrix} \tag{C.26} \\
J^{SE(2)-1}_l(v) &= \begin{bmatrix} W^{-1}_l(\omega) & -W^{-1}_l(\omega) D_l(\omega)\rho \\ 0 & 1 \end{bmatrix}, \tag{C.27}
\end{align}
where
\begin{align}
W_r(\omega) &= \frac{\cos(\omega) - 1}{\omega}[1]_\times + \frac{\sin(\omega)}{\omega} I \\
D_r(\omega) &= \frac{1 - \cos(\omega)}{\omega^2}[1]_\times + \frac{\omega - \sin(\omega)}{\omega^2} I \\
W_l(\omega) &= \frac{1 - \cos(\omega)}{\omega}[1]_\times + \frac{\sin(\omega)}{\omega} I \\
D_l(\omega) &= \frac{\cos(\omega) - 1}{\omega^2}[1]_\times + \frac{\omega - \sin(\omega)}{\omega^2} I.
\end{align}
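A minimal numeric sketch of the $SE(2)$ maps of equations (C.21) through (C.23), valid for $\omega \neq 0$:

```python
import numpy as np

def V(w):
    """V(omega) of eq. (C.23), omega != 0."""
    J1 = np.array([[0.0, -1.0], [1.0, 0.0]])   # [1]_1x
    return (np.sin(w) / w) * np.eye(2) + ((1 - np.cos(w)) / w) * J1

def Exp_SE2(v):
    """Exponential map of eq. (C.21) for v = [rho; omega]."""
    rho, w = v[:2], v[2]
    g = np.eye(3)
    g[:2, :2] = np.array([[np.cos(w), -np.sin(w)], [np.sin(w), np.cos(w)]])
    g[:2, 2] = V(w) @ rho
    return g

def Log_SE2(g):
    """Logarithm map of eq. (C.22)."""
    w = np.arctan2(g[1, 0], g[0, 0])
    rho = np.linalg.solve(V(w), g[:2, 2])
    return np.concatenate((rho, [w]))

v = np.array([1.0, -0.4, 0.6])
assert np.allclose(Log_SE2(Exp_SE2(v)), v)
```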
C.5 SE(3)
The special Euclidean group of three dimensions, $SE(3)$, represents all rotations and translations in three-dimensional space. It can also represent all orientations and positions in three-dimensional space. Its Lie algebra represents all angular and translational velocities in three-dimensional space. The Lie group $SE(3)$ is the set
\[
SE(3) \triangleq \left\{\begin{bmatrix} R & p \\ 0_{1 \times 3} & 1 \end{bmatrix} \in \mathbb{R}^{4 \times 4} \;\middle|\; R \in SO(3) \text{ and } p \in \mathbb{R}^3\right\},
\]
equipped with matrix multiplication. The inverse operation is the matrix inverse, and the identity element is $I_{4 \times 4}$.

The Lie algebra of $SE(3)$ is
\[
\mathfrak{se}(3) \triangleq \left\{\begin{bmatrix} [\omega]_{3\times} & \rho \\ 0 & 0 \end{bmatrix} \in \mathbb{R}^{4 \times 4} \;\middle|\; \omega \in \mathbb{R}^3 \text{ and } \rho \in \mathbb{R}^3\right\},
\]
where $[\cdot]_{3\times}$ is the skew symmetric operator defined in equation (C.12). The corresponding Cartesian algebraic space is
\[
\mathbb{R}_{SE(3)} \triangleq \left\{\begin{bmatrix} \rho \\ \omega \end{bmatrix} \in \mathbb{R}^6 \;\middle|\; \omega \in \mathbb{R}^3 \text{ and } \rho \in \mathbb{R}^3\right\}.
\]
The wedge, $\cdot^\wedge : \mathbb{R}_{SE(3)} \to \mathfrak{se}(3)$, and vee, $\cdot^\vee : \mathfrak{se}(3) \to \mathbb{R}_{SE(3)}$, maps are defined as
\[
\begin{bmatrix} [\omega]_{3\times} & \rho \\ 0 & 0 \end{bmatrix}^\vee = \begin{bmatrix} \rho \\ \omega \end{bmatrix}, \qquad \begin{bmatrix} \rho \\ \omega \end{bmatrix}^\wedge = \begin{bmatrix} [\omega]_{3\times} & \rho \\ 0 & 0 \end{bmatrix}.
\]
C.5.1 Adjoint
The matrix adjoints of $g \in SE(3)$ and $v \in \mathbb{R}_{SE(3)}$ are
\begin{align}
\operatorname{Ad}^{SE(3)}_g &= \begin{bmatrix} R & [p]_{3\times} R \\ 0_{3 \times 3} & R \end{bmatrix} \\
\operatorname{ad}^{SE(3)}_v &= \begin{bmatrix} [\omega]_{3\times} & [\rho]_{3\times} \\ 0_{3 \times 3} & [\omega]_{3\times} \end{bmatrix}.
\end{align}
C.5.2 Exponential Map and Jacobians
Let g ∈ SE (3) and v ∈ RSE(3). The exponential and logarithm maps are defined as
ExpSE(3)I (v) =
Exp
SO(3)I (ω) J
SO(3)l (ω) ρ
01×3 1
(C.28)
LogSE(3)I (g) =
JSO(3)−1
l
(
LogSO(3)I (R)
)
p
LogSO(3)I (R)
, (C.29)
where ExpSO(3)I and Log
SO(3)I are defined in equations (C.15) and (C.16) and J
SO(3)l and
JSO(3)−1
l are defined in equations (C.18) and (C.20).
The left and right Jacobians and their inverses are defined as
JSE(3)r (v) =
JSO(3)r (ω) Br (v)
0 JSO(3)r (ω)
(C.30)
JSE(3)l (v) =
JSO(3)l (ω) Bl (v)
0 JSO(3)l (ω)
(C.31)
JSE(3)−1
r (v) =
JSO(3)−1
r (ω) −JSO(3)−1
r (ω)Br (v) JSO(3)−1
r (ω)
0 JSO(3)−1
r (ω)
(C.32)
JSE(3)−1
l (v) =
JSO(3)−1
l (ω) −JSO(3)−1
l (ω)Bl (v) JSO(3)−1
l (ω)
0 JSO(3)−1
l (ω)
. (C.33)
where
$$\theta = \|\omega\|$$
$$a_\theta = \frac{\cos(\theta)-1}{\theta^2}$$
$$b_\theta = \frac{\theta-\sin(\theta)}{\theta^3}$$
$$c_\theta = -\frac{\sin(\theta)}{\theta^3} + \frac{2\left(1-\cos(\theta)\right)}{\theta^4}$$
$$d_\theta = -\frac{2}{\theta^4} + \frac{3\sin(\theta)}{\theta^5} - \frac{\cos(\theta)}{\theta^4}$$
$$q_r(\omega) = \left(\omega^\top \rho\right)\left(d_\theta\left([\omega]_{3\times}\right)^2 + c_\theta[\omega]_{3\times}\right)$$
$$q_l(\omega) = \left(\omega^\top \rho\right)\left(d_\theta\left([\omega]_{3\times}\right)^2 - c_\theta[\omega]_{3\times}\right)$$
$$B_r(v) = a_\theta[\rho]_{3\times} + b_\theta\left([\omega]_{3\times}[\rho]_{3\times} + [\rho]_{3\times}[\omega]_{3\times}\right) + q_r(\omega)$$
$$B_l(v) = -a_\theta[\rho]_{3\times} + b_\theta\left([\omega]_{3\times}[\rho]_{3\times} + [\rho]_{3\times}[\omega]_{3\times}\right) + q_l(\omega),$$
$J^{SO(3)}_r$, $J^{SO(3)}_l$, $J^{SO(3)^{-1}}_r$, and $J^{SO(3)^{-1}}_l$ are defined in equations (C.17), (C.18), (C.19), and (C.20), and $[\cdot]_{3\times}$ is the skew-symmetric matrix operator defined in equation (C.12).
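Because the Jacobians in (C.30)–(C.33) are block upper-triangular, their inverses follow from the algebraic identity $\begin{bmatrix} A & B \\ 0 & A \end{bmatrix}^{-1} = \begin{bmatrix} A^{-1} & -A^{-1} B A^{-1} \\ 0 & A^{-1} \end{bmatrix}$. The sketch below builds $J^{SE(3)}_r$ from the coefficients above (assuming the standard closed forms for $J^{SO(3)}_r$ and its inverse) and verifies (C.32) numerically; the names are ours.

```python
import numpy as np

def skew(w):
    return np.array([[0.0, -w[2], w[1]],
                     [w[2], 0.0, -w[0]],
                     [-w[1], w[0], 0.0]])

def Jr_SO3(w):
    """Standard closed form of the right Jacobian of SO(3)."""
    th = np.linalg.norm(w)
    W = skew(w)
    return (np.eye(3) - (1 - np.cos(th)) / th**2 * W
            + (th - np.sin(th)) / th**3 * W @ W)

def Jr_SO3_inv(w):
    """Standard closed form of its inverse."""
    th = np.linalg.norm(w)
    W = skew(w)
    return (np.eye(3) + 0.5 * W
            + (1 / th**2 - (1 + np.cos(th)) / (2 * th * np.sin(th))) * W @ W)

def B_r(v):
    """The B_r(v) block, using the scalar coefficients a, b, c, d above."""
    rho, w = v[:3], v[3:]
    th = np.linalg.norm(w)
    W, P = skew(w), skew(rho)
    a = (np.cos(th) - 1) / th**2
    b = (th - np.sin(th)) / th**3
    c = -np.sin(th) / th**3 + 2 * (1 - np.cos(th)) / th**4
    d = -2 / th**4 + 3 * np.sin(th) / th**5 - np.cos(th) / th**4
    q_r = (w @ rho) * (d * W @ W + c * W)
    return a * P + b * (W @ P + P @ W) + q_r

v = np.array([0.2, -0.4, 0.1, 0.3, -0.1, 0.5])
A = Jr_SO3(v[3:])
A_inv = Jr_SO3_inv(v[3:])
B = B_r(v)
Z = np.zeros((3, 3))

J = np.block([[A, B], [Z, A]])                  # equation (C.30)
J_inv = np.block([[A_inv, -A_inv @ B @ A_inv],  # equation (C.32)
                  [Z, A_inv]])
assert np.allclose(J @ J_inv, np.eye(6))
```

Note that the block-inverse check holds for any off-diagonal block $B_r(v)$, so it validates the structure of (C.32) rather than the coefficient formulas themselves.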
APPENDIX D. GLOSSARY
cluster A grouping of neighboring measurements in time and space.
confirmed track A track that has a high probability of representing a target. Typically, a
confirmed track has a track likelihood above some user defined threshold.
consensus set The set of measurements associated to the track.
false measurement A measurement that does not correspond to a target of interest. A false measurement is often referred to as clutter.
measurement space The disjoint union of all possible observations given the suite of sensors.
rejected track A track that has a sufficiently low probability of representing a target that
is removed and deleted. Typically, a rejected track has a track likelihood below some
user defined threshold.
sensor frame A frame that coincides with a sensor surveillance region in which the sensor’s
measurements are described.
sensor mode one The sensor configuration where the measurements are transformed into
the tracking frame before being given to G-MTT.
sensor mode two The sensor configuration where each sensor provides a transformation
in addition to the measurements that transforms a track’s state estimate from the
tracking frame to the sensor frame.
sensor model A mathematical description of the sensor that maps the target’s state and
measurement noise to a measurement.
sensor scan A sensor scan occurs when a sensor observes its local surveillance region and encodes its observation in meaningful data from which measurements are extracted.
sensor surveillance region The subset of measurement space that is observable by a single
sensor.
surveillance region The union of every sensor surveillance region.
system model The combination of the target and sensor models.
target model A mathematical description of the target’s state and dynamics.
time window An interval of time extending into the past from the current time.
track Generically, a track can be thought of as a representation of a target. Specifically in
our work, we define a track to be a tuple consisting of a state estimate, error covariance,
consensus set, track likelihood, and a unique label.
track initialization The process of using measurements to construct new tracks.
track likelihood A statistical measure that indicates the probability that a track represents
a target.
tracking frame The frame in which the tracks are expressed.
true measurement A measurement that corresponds to a target of interest.
validation region A volume in measurement space, centered on a track's estimated measurement, that serves as a gate during data association.
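The validation region defined above is commonly realized as an ellipsoidal (Mahalanobis) gate on the innovation. The sketch below is illustrative only: the function name, the chi-square threshold, and the example values are our own, not taken from the dissertation.

```python
import numpy as np

def in_validation_region(z, z_hat, S, gamma=9.21):
    """Return True if measurement z falls inside the track's validation
    region: an ellipsoid centered on the estimated measurement z_hat with
    innovation covariance S. gamma is a chi-square gate threshold
    (9.21 is roughly the 99% point for a 2-D measurement)."""
    nu = z - z_hat                      # innovation
    d2 = nu @ np.linalg.solve(S, nu)    # squared Mahalanobis distance
    return d2 <= gamma

z_hat = np.array([10.0, 5.0])           # track's estimated measurement
S = np.diag([2.0, 1.0])                 # innovation covariance

assert in_validation_region(np.array([10.5, 5.3]), z_hat, S)        # gated in
assert not in_validation_region(np.array([20.0, 9.0]), z_hat, S)    # gated out
```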