

Linköping studies in science and technology. Dissertations. No. 1292

Differential-algebraic equations and matrix-valued singular perturbation

Henrik Tidefelt

Department of Electrical Engineering
Linköping University, SE–581 83 Linköping, Sweden

Linköping 2009


Cover illustration: Stereo pair showing the entries of a sampled uncertain lti dae of nominal index 2 in its canonical form, displayed as A stacked over E. The uniform distributions correspond to the intervals in table 7.3, signs were ignored, and values were transformed by the map x ↦ (1/2)(|x| + |x|^(1/7)) to enhance resolution near 0 and 1. Sampled values are encoded both as area of markers and as height in the image. Left eye's view to the left, right eye's view to the right.

Linköping studies in science and technology. Dissertations. No. 1292

Differential-algebraic equations and matrix-valued singular perturbation

Henrik Tidefelt

[email protected]
Division of Automatic Control

Department of Electrical Engineering
Linköping University
SE–581 83 Linköping

Sweden

ISBN 978-91-7393-479-4 ISSN 0345-7524

Copyright © 2009 Henrik Tidefelt

Printed by LiU-Tryck, Linköping, Sweden 2009


To Nina



Abstract

With the arrival of modern component-based modeling tools for dynamic systems, the differential-algebraic equation form is increasing in popularity as it is general enough to handle the resulting models. However, if uncertainty is allowed in the equations, no matter how small, this thesis stresses that such equations generally become ill-posed. Rather than deeming the general differential-algebraic structure useless up front for this reason, the suggested approach to the problem is to ask what assumptions can be made in order to obtain well-posedness. Here, well-posedness is used in the sense that the uncertainty in the solutions should tend to zero as the uncertainty in the equations tends to zero.

The main theme of the thesis is to analyze how the uncertainty in the solution to a differential-algebraic equation depends on the uncertainty in the equation. In particular, uncertainty in the leading matrix of linear differential-algebraic equations leads to a new kind of singular perturbation, which is referred to as matrix-valued singular perturbation. Though a natural extension of existing types of singular perturbation problems, this topic has not been studied in the past. As it turns out that assumptions about the equations have to be made in order to obtain well-posedness, it is stressed that the assumptions should be selected carefully in order to be realistic to use in applications. Hence, it is suggested that any assumptions (not counting properties which can be checked by inspection of the uncertain equations) should be formulated in terms of coordinate-free system properties. In the thesis, the location of system poles has been the chosen target for assumptions.

Three chapters are devoted to the study of uncertain differential-algebraic equations and the associated matrix-valued singular perturbation problems. Only linear equations without forcing function are considered. For both time-invariant and time-varying equations of nominal differentiation index 1, the solutions are shown to converge as the uncertainties tend to zero. For time-invariant equations of nominal index 2, convergence has not been shown to occur except for an academic example. However, the thesis contains other results for this type of equations, including the derivation of a canonical form for the uncertain equations.

While uncertainty in differential-algebraic equations has been studied in depth, two related topics have been treated more in passing.

One chapter considers the development of point-mass filters for state estimation on manifolds. The highlight is a novel framework for general algorithm development with manifold-valued variables. The connection to differential-algebraic equations is that these equations impose an underlying manifold structure on their solutions.

One chapter presents a new index closely related to the strangeness index of a differential-algebraic equation. Basic properties of the strangeness index are shown to be valid also for the new index. The definition of the new index is conceptually simpler than that of the strangeness index, hence making it potentially better suited for both practical applications and theoretical developments.



Popular Science Summary

The thesis is primarily about computing how uncertainty in so-called differential-algebraic equations affects the uncertainty in the solutions of the equations. By studying problems that allow uncertainties with less structure compared to previous research, the problem quickly leads on to the study of a new class of singular perturbation problems, here called matrix-valued singular perturbation problems.

Besides being a tool for understanding uncertainty in the solutions of equations with uncertainty, the analysis in the thesis aims to create tools that can be used also for problems without uncertainty. As a first example of such problems, consider equations formulated in symbolic software for differential-algebraic equations, where one cannot always trust the software to be able to prove that a certain expression will be zero along the solution trajectory of the equation. It can then be advantageous to be able to regard the expression as an uncertain value near zero. As a second example, consider differential-algebraic equations with time dependence, where the leading matrix, depending continuously on time, loses rank at a certain point in time. It can then be advantageous to be able to approximate the leading matrix by another matrix which has the lower rank over a whole interval of time shortly before and shortly after the point where the actual rank is lower.

In addition to the results concerning uncertainty in differential-algebraic equations, the thesis contains a chapter with results on estimating, from uncertain measurements, an unknown variable belonging to a manifold (a sphere is used as an example). The proposed method is based on partitioning the manifold into small pieces and computing the probability that the variable lies in each piece. The method itself is not new; rather, the focus is on proposing a framework for algorithms for this type of problem. Problems with manifold structure regularly appear in connection with differential-algebraic equations.

Another chapter of the thesis concerns a new so-called index concept for differential-algebraic equations. The new index is closely related to another well-established index, but is defined in a simpler way. The new index can be of value both in its own right and as a way to shed light on the well-established one.



Acknowledgments

My thanks to Professor Lennart Ljung, head of the Division of Automatic Control, for generously allowing me to conduct research in his group. Lennart has been my co-supervisor, and my work would only be half-finished by now if it were not for his efforts to make me complete it. My thanks also to Professor Torkel Glad for being my supervisor; I'll get back to his name soon. Ulla Salaneck, secretary at the group, is a Swiss army knife capable of solving any practical issue you can think of. Everybody knows this, but what is probably less known is that she is also very good at poking Lennart when it's about time to prod slow students into finishing their theses.

Johan Sjöberg has been an important source of inspiration for the work on differential-algebraic equations, in particular the strangeness index. Marcus Gerdin was also there with experienced advice when it all started.

Gustaf Hendeby has served the group with technical expertise in many areas related to computer software, including being the LaTeX guru for many years. He developed the rtthesis class used to typeset this thesis, and taught me enough about LaTeX that I was able to tweak the class to my own taste. Gustaf also helped out with proofreading. Martin Enqvist and Daniel Petersson contributed with outstandingly thorough proofreading. Thanks go to Thomas Schön for letting me work with him in the popular field of state estimation. As you, my dear reader, might already have guessed, Torkel has been involved in proofreading most of the chapters. Those of you who know him can easily imagine the value of this contribution to a thesis in the region of automatic control where there is only little connection to reality. Umut Orguner in the office on the opposite side of the corridor knows too much, so everyone asks him all the questions, and he never refuses to answer.

Christian Lyzell brings his great attitude to work, is a hobby hard core Guitar Hero guru, and has also provided valuable feedback on mpscatter, the Matlab toolbox used to create most plots in this thesis. I also like our discussions on numerical maths, even though it isn't really related to my research. Talking about good discussions, Daniel Petersson deserves a second mention for his interest in and many solutions to a long list of mathematical problems I've had during the last years here.

There are many things I like to do in my spare time, and I'm very happy to have had so many nice persons in the group to share my interests with. I'm thinking of surfing waves and wind, nightly work on Shapes, gathering people for eating and play, hiking and kayaking, disc golf, climbing, coffee and lunch breaks, and more. It's been great, and I hope that completing this thesis is not the end of it!

I am indebted to the Swedish Research Council for financial support of this work.

Nina, you make me smile and laugh. In addition, your contribution to this thesis as the excellent chef behind it all has been worth a lot, so many thanks to you too. I'm sad that all the writing has prevented me so much lately from sharing my time with you, but I'm yours now!

Linköping, November 2009
Henrik Tidefelt



Contents

Notation  xv

I Background

1 Introduction  1
  1.1 Differential-algebraic equations in automatic control  1
  1.2 Introduction to matrix-valued singular perturbation  2
    1.2.1 Linear time-invariant examples  2
    1.2.2 Application to quasilinear shuffling  5
    1.2.3 A missing piece  6
    1.2.4 How to approach the nominal equations  6
    1.2.5 Final remarks  7
  1.3 Problem formulation  8
  1.4 Contributions  9
  1.5 Thesis outline  9
  1.6 Notation  10
    1.6.1 Mathematical notation  10
    1.6.2 dae and ode terminology  12

2 Theoretical background  15
  2.1 Models in automatic control  15
    2.1.1 Examples  16
    2.1.2 Use in estimation  17
    2.1.3 Use in control  17
    2.1.4 Model classes  18
    2.1.5 Model reduction  19
    2.1.6 Scaling  21
  2.2 Differential-algebraic equations  22
    2.2.1 Motivation  23
    2.2.2 Common forms  25
    2.2.3 Indices and their deduction  28
    2.2.4 Transformation to quasilinear form  35
    2.2.5 Structure algorithm  38
    2.2.6 lti dae, matrix pencils, and matrix pairs  40
    2.2.7 Initial conditions  46
    2.2.8 Numerical integration  47
    2.2.9 Existing software  51
  2.3 Initial condition response bounds  51
    2.3.1 lti ode  52
    2.3.2 ltv ode  57
    2.3.3 Uncertain lti ode  58
  2.4 Regular perturbation theory  58
    2.4.1 lti ode  58
    2.4.2 ltv ode  60
    2.4.3 Nonlinear ode  66
  2.5 Singular perturbation theory  67
    2.5.1 lti ode  67
    2.5.2 Generalizations of scalar singular perturbation  68
    2.5.3 Multiparameter singular perturbation  69
    2.5.4 Perturbation of dae  72
  2.6 Contraction mappings  73
  2.7 Interval analysis  77
  2.8 Gaussian elimination  80
  2.9 Miscellaneous results  82

3 Shuffling quasilinear dae  85
  3.1 Index reduction by shuffling  86
    3.1.1 The structure algorithm  86
    3.1.2 Quasilinear shuffling  86
    3.1.3 Time-invariant input affine systems  87
    3.1.4 Quasilinear structure algorithm  90
  3.2 Proposed algorithm  91
    3.2.1 Algorithm  91
    3.2.2 Zero tests  94
    3.2.3 Longevity  96
    3.2.4 Seminumerical twist  99
    3.2.5 Monitoring  99
    3.2.6 Sufficient conditions for correctness  102
  3.3 Consistent initialization  104
    3.3.1 Motivating example  104
    3.3.2 A bootstrap approach  105
    3.3.3 Comment  106
  3.4 Conclusions  107

II Results

4 Point-mass filtering on manifolds  109
  4.1 Introduction  110
  4.2 Background and related work  112
  4.3 Dynamic systems on manifolds  113
  4.4 Point-mass filter  114
    4.4.1 Point-mass distributions on a manifold  114
    4.4.2 Measurement update  116
    4.4.3 Time update in general  117
    4.4.4 Dynamics that simplify time update  118
  4.5 Point estimates  119
    4.5.1 Intrinsic point estimates  119
    4.5.2 Extrinsic point estimates  120
  4.6 Algorithm and implementation  120
    4.6.1 Base tessellations (of spheres)  120
    4.6.2 Software design  121
    4.6.3 Supporting software  122
  4.7 Example  122
  4.8 Conclusions and future work  124
  4.A Populating the spheres  126

5 A new index close to strangeness  129
  5.1 Two definitions  129
    5.1.1 Derivative array equations and the strangeness index  130
    5.1.2 Analysis based on the strangeness index  132
    5.1.3 The simplified strangeness index  134
  5.2 Relations  138
  5.3 Uniqueness and existence of solutions  141
  5.4 Implementation  145
    5.4.1 Computational complexity  146
    5.4.2 Notes from experiments  146
  5.5 Conclusions and future work  147

6 lti ode of nominal index 1  149
  6.1 Introduction  150
  6.2 Schematic overview of nominal index 1 analysis  152
  6.3 Decoupling transforms and initial conditions  154
  6.4 A matrix result  159
  6.5 An lti ode result  164
  6.6 The fast and uncertain subsystem  167
  6.7 The coupled system  168
  6.8 Extension to non-zero pointwise index  170
  6.9 Examples  174
  6.10 Conclusions  178
  6.A Details of proof of lemma 6.8  180

7 lti ode of nominal index 2  185
  7.1 Canonical form  185
    7.1.1 Derivation based on Weierstrass decomposition  187
    7.1.2 Derivation without use of Weierstrass decomposition  190
    7.1.3 Example  193
  7.2 Initial conditions  195
  7.3 Growth of eigenvalues  198
  7.4 Case study: a small system  202
    7.4.1 Eigenvalues  203
    7.4.2 Transition matrix  205
    7.4.3 Simultaneous consideration of initial conditions and transition matrix bounds  207
  7.5 Conclusions  208
  7.A Decoupling transforms  210
    7.A.1 Eliminating slow variables from uncertain dynamics  211
    7.A.2 Eliminating uncertain variables from slow dynamics  216
    7.A.3 Remarks on duality  219
  7.B Example data  221

8 ltv ode of nominal index 1  227
  8.1 Slowly varying systems  228
  8.2 Time-varying systems with timescale separation  232
    8.2.1 Overview  233
    8.2.2 Eliminating slow variables from uncertain dynamics  234
    8.2.3 Eliminating uncertain variables from slow dynamics  237
  8.3 Comparison with scalar perturbation  239
  8.4 The decoupled system  239
  8.5 Conclusions  242
  8.A Dynamics of related systems  243

9 Concluding remarks  247

A Sampling perturbations  249
  A.1 Time-invariant perturbations  249
  A.2 Time-varying perturbations  250

Bibliography  255

Index  267


Notation

These tables are provided as quickly accessible complements to the lengthier explanations of notation in section 1.6.

Some sets, manifolds, and groups

Notation   Meaning
N          Set of natural numbers.
R          Set of real numbers.
C          Set of complex numbers.
Rn         Set of n-tuples of real numbers, or n-dimensional Euclidean space.
Sn         The n-sphere, that is, the sphere of dimension n.
SO(n)      Special orthogonal group of dimension n. SO(3) is the group of rigid body rotations.
M          Standard manifold in chapter 4.
Lν         Solution set of FνS( x, x′, . . . , x′(ν+1), t ) != 0, in chapter 5.

Matrix properties

Notation   Meaning
λ(X)       The set of eigenvalues of the matrix X.
α(X)       max { Re λ : λ ∈ λ(X) }
λmin(X)    min { |λ| : λ ∈ λ(X) }
λmax(X)    max { |λ| : λ ∈ λ(X) }
max(X)     max over i, j of |Xij|
‖X‖2       Induced 2-norm of X: sup over u ≠ 0 of |X u| / |u|
‖X‖I       sup over t ∈ I of ‖X(t)‖2, where I is a given interval of time.
X ⪰ Y      X − Y is positive semidefinite.
n          Matrix dimension, see section 1.6.1.




Basic functions and operators

Notation   Meaning
I          Identity matrix or the identity map.
δ          Dirac delta "function", in chapter 4.
ex         Exponential function evaluated at x.
et v p     Exponential map based in p, evaluated at t v.
|x|        Modulus (absolute value) of x if x is scalar, 2-norm of x if x is a vector.
⌊x⌋        Floor of x, that is, the largest integer not greater than x.
⌈x⌉        Ceiling of x, that is, the smallest integer not less than x.
d(x, y)    Distance between x and y in induced Riemannian metric.
X \ Y      Set difference, the set of elements in X that are not in Y.
∂X         Boundary of the set X.
•          Argument of function in bullet notation. Example: f( x, •, z ) = y ↦ f( x, y, z ).

Differentiation and shift operators

Notation   Meaning
x′         Derivative of x, with x being a function of a single real argument. Avoid confusion with x.
x′(i)      Derivative of x of order i. Avoid confusion with x(i).
x′(i+)     Sequence or concatenation of x′(i), x′(i+1), . . . , x′(ν+1), for some ν determined by context. Analogous definition for x(i+).
x′{i}      Sequence or concatenation of x, x′, . . . , x′(i). Analogous definition for x{i}.
q          Shift operator for sequences: (q x)( n ) = x( n + 1 ).
∇x         Gradient of x, see section 1.6.1.
∇i x       Gradient of x with respect to argument number i, see section 1.6.1.
∇i+ x      Concatenated gradients of x with respect to all arguments starting from number i, see section 1.6.1.

Probability theory and filtering

Notation   Meaning
P(H)       Probability of the event H.
fx         Probability density function for stochastic variable x.
fx|y       fx|Y=y, with x and Y stochastic variables, and y a point.
N(m, C)    Gaussian distribution with mean m and covariance C.
VarX(x)    Variance of X at the point x, see (4.7).
y0..t      Measurements up to time t, in chapter 4.
xs|t       State estimate at time s given y0..t, in chapter 4.



Ordo notation

Notation    Meaning
y = O(x)    ∃ δ > 0, k < ∞ : |x| < δ ⇒ |y| < k |x|
y = O(x⁰)   Exception! ∃ δ > 0, k < ∞ : |x| < δ ⇒ |y| < k
y = o(x)    lim as x → 0 of |y| / |x| = 0

Intervals

Notation   Meaning
[ a, b ]   { x ∈ R : a ≤ x ≤ b }
( a, b )   { x ∈ R : a < x < b }
( a, b ]   { x ∈ R : a < x ≤ b }
[ a, b )   { x ∈ R : a ≤ x < b }

Logical operators

Notation   Meaning
P ∧ Q      Logical conjunction of P and Q, "P and Q".
P ∨ Q      Logical disjunction of P and Q, "P or Q".

Abbreviations

Abbreviation   Meaning
bdf            Backwards difference formula
dae            Differential-algebraic equation(s)
irk            Implicit Runge-Kutta
lti            Linear time-invariant
ltv            Linear time-varying
ode            Ordinary differential equation(s)


Part I

Background


1 Introduction

This chapter gives an introduction to the thesis by explaining very briefly the field in which it has been carried out, presenting the contributions in view of a problem formulation, and giving some reading directions and explanations of notation.

1.1 Differential-algebraic equations in automatic control

This thesis has been carried out at the Division of Automatic Control, Linköping University, Sweden, within the research area of nonlinear and hybrid systems. Differential-algebraic equations are one of a small number of research topics in this area. We shall not dwell on whether these equations are particularly nonlinear or related to hybrid systems; much of the research so far in this group has been on linear time-invariant differential-algebraic equations, although there has lately been some research also on differential-algebraic equations that are not linear. From here on, the abbreviation dae will be used for differential-algebraic equation(s).

In the field of automatic control, various kinds of mathematical descriptions are used to build models of the objects to be controlled. Sometimes, the equations are used primarily to compute information about the object (estimation), sometimes the equations are used primarily to compute control inputs to the object (control), and often both tasks are performed in combination. From the automatic control point of view the dae are thus of interest due to their ability to model objects. Not only are they able to model many objects, but in several situations they provide a very convenient way of modeling these objects, as is further discussed in section 2.2. In practice, the dae generally contain parameters that need to be estimated using measurements on the object; this process is called identification.




In this thesis the concern is neither primarily with estimation, control, nor identification of objects modeled by dae. Rather, we focus on the more fundamental questions regarding how the equations relate to their solution in so-called initial value problems*. It is believed that this will be beneficial for future development of the other three tasks.

1.2 Introduction to matrix-valued singular perturbation

Section 2.5 will give some background on scalar and multi-parameter singular perturbation problems, and in chapters 6, 7 and 8 methods from scalar singular perturbation theory (Kokotović et al., 1986) will play a key role in the theoretical development. In view of the problems encountered when analyzing dae under uncertainty, we have coined the term matrix-valued singular perturbation to denote the generalization of the singular perturbation problems to the case when the uncertainties form a whole matrix of small values.

For nonlinear time-invariant systems, the basic concern is systems in the form†

    x′(t) + gx( x(t), z(t) )  != 0
    E z′(t) + gz( x(t), z(t) )  != 0        (1.1)

where E is an unknown small matrix; max(E) ≤ m. For time-varying systems, E is also allowed to be time-varying, and even more general nonlinear systems are obtained by allowing E to also depend on x(t). The problem is to analyze the solutions to the equations as m → 0. However, the nonlinear form (1.1) is much more general than the forms addressed in the thesis. Below, we give examples, clarifications, and further motivation for the study of matrix-valued singular perturbation.
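To make the limit m → 0 concrete, here is a small numerical sketch, not taken from the thesis: the system matrices and the function `simulate` below are invented for illustration. It sets up a linear instance of (1.1) with a scalar slow variable x satisfying x′ = −x and a fast variable z whose leading matrix E = mM is a full matrix of small entries. Assuming, as a deliberately added hypothesis, that −E⁻¹ is Hurwitz so that the fast dynamics are stable, the distance between z(1) and its quasi-steady-state value x(1)·(1, 1) shrinks as m decreases:

```python
import numpy as np
from scipy.linalg import expm

def simulate(m, t=1.0):
    """Distance |z(t) - x(t)*[1,1]| for  x' = -x,  E z' = x*[1,1] - z.

    E = m*M is a full matrix of small entries. M is a hypothetical
    choice for which -inv(E) is Hurwitz, i.e. the fast dynamics are
    stable; this stability is an assumption, not a general property.
    """
    M = np.array([[1.0, 0.3],
                  [0.2, 1.0]])
    Einv = np.linalg.inv(m * M)
    # Combined LTI system for the state (x, z1, z2), solved exactly via expm.
    A = np.zeros((3, 3))
    A[0, 0] = -1.0                  # slow subsystem: x' = -x
    A[1:, 0] = Einv @ np.ones(2)    # coupling of x into z'
    A[1:, 1:] = -Einv               # fast, uncertain subsystem
    x0 = np.array([1.0, 0.0, 0.0])  # z starts off the slow manifold
    sol = expm(A * t) @ x0
    x, z = sol[0], sol[1:]
    return np.max(np.abs(z - x * np.ones(2)))

for m in [1e-1, 1e-2, 1e-3, 1e-4]:
    print(f"m = {m:.0e}   distance to quasi-steady state: {simulate(m):.2e}")
```

Without the stability assumption on −E⁻¹ the limit can fail, which is precisely the kind of ill-posedness the thesis is concerned with.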

1.2.1 Linear time-invariant examples

The linear time-invariant (often abbreviated lti) examples below are typical in that the equations are not given in matrix-valued singular perturbation form. Instead, the form appears after some well-conditioned operations on the equations and changes of variables, which will allow the solution to the original problem to be reconstructed from the solution to the matrix-valued singular perturbation problem.

1.1 Example
Starting from an index 0 dae in two variables,

\[
\begin{bmatrix} 1. & 7. \\ 1. & 3. \end{bmatrix} x'(t)
+ \begin{bmatrix} 3. & 2. \\ 2. & 1. \end{bmatrix} x(t) \;\mathrel{!{=}}\; 0
\]

an index 1 dae in three variables is formed by making a copy of the second variable; x1 = x1, x2 = x2, x3 = x2. In the leading matrix (in front of x′(t)), the second variable

* The problem of computing the future trajectory of the variables given external inputs and sufficient information about the variables at the initial time.
† Note that the subscripts on g are just meaningful ornaments, and do not denote partial derivatives.


is replaced to 70% by the new variable. In the trailing matrix (in front of x(t)) we add a row with the coefficients of the copy relation x2(t) − x3(t) != 0.

\[
\underbrace{\begin{bmatrix} 1. & 2.1 & 4.9 \\ 1. & 0.9 & 2.1 \\ 0 & 0 & 0 \end{bmatrix}}_{E} x'(t)
+ \underbrace{\begin{bmatrix} 3. & 2. & 0 \\ 2. & 1. & 0 \\ 0 & 1 & -1 \end{bmatrix}}_{A} x(t) \;\mathrel{!{=}}\; 0 \tag{1.2}
\]

To analyze this dae, it is noted that the leading matrix E is row reduced (its non-zero rows are linearly independent), and the row where the leading matrix has only zeros can be differentiated without introducing higher order derivatives. This leads to

\[
\begin{bmatrix} 1. & 2.1 & 4.9 \\ 1. & 0.9 & 2.1 \\ 0 & 1 & -1 \end{bmatrix} x'(t)
+ \begin{bmatrix} 3. & 2. & 0 \\ 2. & 1. & 0 \\ 0 & 0 & 0 \end{bmatrix} x(t) \;\mathrel{!{=}}\; 0
\]

where x′(t) can be solved for, so that an ode is obtained. In the terminology of dae, index reduction has successfully revealed an underlying (implicit) ode.

Now, instead of performing index reduction on (1.2) directly, consider first applying the well-conditioned change of equations given by the matrix

\[
T \coloneqq 4. \cdot \begin{bmatrix} 2. & -9. & 0. \\ 8. & -5. & 3. \\ 1. & -5. & 7. \end{bmatrix}^{-1}
\]

It is natural to expect that this should not make a big difference to the difficulty in solving the dae via an underlying ode, but when the computation is performed on a computer, the picture is not quite as clear. The new dae has the matrices T E and T A. By computing a QR factorization (using standard computer software) of the leading matrix, a structurally upper triangular leading matrix was obtained together with an orthogonal matrix Q associated with this form. The corresponding trailing matrix is obtained as Q^T A. This leads to

\[
\begin{bmatrix} -0.62 & -0.95 & -2.2 \\ 0 & 0.62 & 1.4 \\ 0 & 0 & 3.4\cdot 10^{-16} \end{bmatrix} x'(t)
+ \begin{bmatrix} -1.6 & -0.53 & -0.41 \\ 0.51 & 0.56 & -0.048 \\ -7.2\cdot 10^{-17} & 0.46 & -0.46 \end{bmatrix} x(t) \;\mathrel{!{=}}\; 0
\]

where a well-conditioned change of variables can bring the equations into the linear time-invariant form of (1.1) with E ∈ R^{1×1}. (One can just as easily construct examples where E is of dimension larger than 1 × 1.)

Although looking like an implicit ode, this view is unacceptable for two reasons. First, the system of equations is extremely stiff. (Even worse, the fast mode happens to be unstable this time, not at all like the original system.) Second, considering numerical precision in hardware, it would not make sense to compute a solution that depends so critically on a coefficient that is not distinctly non-zero.
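The computation in this example is easy to reproduce; the sketch below is a non-authoritative illustration using NumPy (the thesis' own implementation is in Mathematica), forming T E and T A and exposing the tiny pivot in the QR factorization of the leading matrix.

```python
import numpy as np

# Index 1 dae from (1.2): E x'(t) + A x(t) != 0
E = np.array([[1.0, 2.1, 4.9],
              [1.0, 0.9, 2.1],
              [0.0, 0.0, 0.0]])
A = np.array([[3.0, 2.0, 0.0],
              [2.0, 1.0, 0.0],
              [0.0, 1.0, -1.0]])

# Well-conditioned change of equations from the example
T = 4.0 * np.linalg.inv(np.array([[2.0, -9.0, 0.0],
                                  [8.0, -5.0, 3.0],
                                  [1.0, -5.0, 7.0]]))

# QR factorization of the new leading matrix: T E = Q R, so the
# transformed dae has leading matrix R and trailing matrix Q^T (T A).
Q, R = np.linalg.qr(T @ E)
trailing = Q.T @ (T @ A)

print(R[2, 2])   # on the order of machine precision, not exactly zero
```

Since T E has exact rank 2, the last diagonal entry of R is a rounding artifact whose sign and exact magnitude depend on the QR implementation, which is precisely the point of the example.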


The ad hoc solution to the problem in the example is to replace the tiny coefficient in the leading matrix by zero, and then proceed as usual, but suppose ad hoc is not good enough. How can one then determine if 3.4 · 10⁻¹⁶ is sufficiently tiny, or just looks tiny due to equation and variable scalings? What is the theoretical excuse for the replacement of small numbers by zeros? What assumptions have to be made?

The next example suggests that the ill-posedness may be possible to deal with. The assumptions made here are chosen theoretically insufficient on purpose — the point is that making even the simplest assumptions seems to solve the problem. The example also contains some very preliminary observations regarding how to scale the equations in order to make it possible to make decisions based on the absolute size of the perturbations.

1.2 Example

Having equations in the form

E x′(t) + A x(t) != 0

modelling a two-timescale system (see section 2.5) where the slow dynamics is known to be stable, we now decide that unstable fast dynamics is unreasonable for the system at hand. In terms of assumptions, we assume that the fast dynamics of the system is stable. We then generate random perturbations in the equation coefficients that we need to replace by zero, discarding any instances of the equations that disagree with our assumption, and use standard software to solve the remaining instances. Two trailing matrices are used, given by selecting δ from { 1, 10⁻² } in the pattern

\[
A = \begin{bmatrix} A_{11} & A_{12} \\ A_{21} & A_{22} \end{bmatrix}
\coloneqq \begin{bmatrix} 0.29 & 0.17 & 0.046 \\ 0.34\,\delta & 0.66\,\delta & 0.66 \\ 0.87\,\delta & 0.83\,\delta & 0.14 \end{bmatrix}
\]

and then scaling the last two rows so they get the same norm as the first row. In the leading matrix,

\[
E = \begin{bmatrix} E_{11} & E_{12} \\ 0 & E_{22} \end{bmatrix}
\coloneqq \begin{bmatrix} 1. & 1. & 1. \\ 0 & ?_{11} & ?_{12} \\ 0 & ?_{21} & ?_{22} \end{bmatrix}
\]

it is the block E22 that will be instantiated with small random perturbations. As in the previous example, the form of E is just a well-conditioned change of variables away from the lti form of (1.1). In order to illustrate what happens when the perturbations become smaller, the perturbations are generated such that max(E22) = m, for a few values of m. To achieve this, an intermediate matrix T of the same dimensions as E22 is generated by sampling each entry from a uniform distribution over [−1, 1], and then E22 ≔ (m / max(T)) T.

The example is chosen such that m = 0 yields a stable slow system. Thus the perturbations of interest are those that make all modes of the stiff system stable. The initial conditions are chosen with x1(0) = 1 and consistent with m = 0.
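The sampling of the perturbation block described above can be sketched in a few lines; the helper name below is invented for illustration, and the seed is arbitrary.

```python
import numpy as np

rng = np.random.default_rng(0)

def sample_E22(m, shape=(2, 2)):
    """Sample the perturbation block E22 with largest entry magnitude m."""
    T = rng.uniform(-1.0, 1.0, size=shape)   # intermediate matrix on [-1, 1]
    return (m / np.abs(T).max()) * T          # rescale so that max|E22| = m

E22 = sample_E22(1e-3)
```

Rescaling by the maximum magnitude (rather than, say, a matrix norm) matches the example's convention that max(E22) = m exactly for every sample.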



Figure 1.1: Solutions for x1 obtained by generating 50 random perturbations of given magnitudes. Details are given in the text. Left: A defined by δ = 1. Right: A defined by δ = 10⁻². Top: m = 1. · 10⁻¹. Middle: m = 1. · 10⁻³. Bottom: m = 1. · 10⁻⁵.

Simulation results are shown in figure 1.1. By choosing a threshold for m based on visual appearance, the threshold can be related to δ. Finding that 1. · 10⁻³ and 1. · 10⁻⁵ could be reasonable threshold choices for δ being 1 and 10⁻², respectively, it is tempting to conclude that it would be wise to base the scaling of the last two rows on A22 alone.

1.2.2 Application to quasilinear shuffling

In theory, index reduction of equations in the quasilinear form

E( x(t), t ) x′(t) + A( x(t), t ) != 0 (1.3)

is simple. Similar to how the linear time-invariant equations were analyzed in example 1.1, the equations are manipulated using invertible row operations so that the leading matrix becomes separated into one block which is completely zeroed, and one block with independent rows. The discovered non-differential equations are then differentiated, and the procedure is repeated until the leading matrix gets full rank. As examples of the “in theory” ramifications of this description, consider the following list:

• It may be difficult to perform the row reduction in a numerically well-conditioned way.

• The produced equations may involve very big expressions.

• Testing whether an expression is zero is highly non-trivial.
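For constant-coefficient equations, the procedure just described can be sketched compactly. The following is a minimal illustration with invented helper names (NumPy rather than the thesis' Mathematica implementation), using an SVD-based row reduction with an explicit tolerance; the tolerance is exactly the kind of zero-testing decision discussed below.

```python
import numpy as np

def shuffle_step(E, A, tol=1e-10):
    """One step of the shuffle algorithm for the lti dae E x'(t) + A x(t) != 0."""
    U, s, _ = np.linalg.svd(E)
    r = int(np.sum(s > tol))              # numerical rank of the leading matrix
    UE, UA = U.T @ E, U.T @ A
    # Rows r.. of U E are (numerically) zero: non-differential equations.
    # Differentiating them moves their A-coefficients into the leading matrix.
    E_next = np.vstack([UE[:r], UA[r:]])
    A_next = np.vstack([UA[:r], np.zeros_like(UA[r:])])
    return E_next, A_next

def shuffle(E, A, tol=1e-10, max_steps=10):
    """Repeat until the leading matrix has full rank (an implicit ode)."""
    for _ in range(max_steps):
        if np.linalg.matrix_rank(E, tol) == E.shape[0]:
            return E, A
        E, A = shuffle_step(E, A, tol)
    raise RuntimeError("no implicit ode found (index too high?)")

# The index 1 dae (1.2) needs a single differentiation:
E0 = np.array([[1.0, 2.1, 4.9], [1.0, 0.9, 2.1], [0.0, 0.0, 0.0]])
A0 = np.array([[3.0, 2.0, 0.0], [2.0, 1.0, 0.0], [0.0, 1.0, -1.0]])
E1, A1 = shuffle(E0, A0)
```

After one step the leading matrix is non-singular, so an implicit ode has been revealed, in agreement with the hand computation in example 1.1.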


The forthcoming discussion applies to the last of these ramifications. Typical examples in the literature have leading matrices whose rank is determined solely by a zero-pattern. For instance, if some variable does not appear differentiated in any equation, the corresponding column of the leading matrix will be structurally zero. It is then easy to see that this column will remain zero after arbitrarily complex row operations, so if the operations are chosen to create structural zeros in the other columns at some row, it will follow that the whole row is structurally zero. Thus a non-differential equation is revealed, and when differentiating this equation, the presence of variables in the equation determines the structural zero-pattern of the newly created row in the leading matrix, and so the index reduction may be continued.

Now, recall how the zero-pattern was lost by a seemingly harmless transform of the equations in example 1.1. Another situation when linear dependence between rows in the leading matrix is not visible in a zero-pattern is when a user happens to write down equations that are dependent up to available accuracy. It must be emphasized here that available accuracy is often not a mere question of floating point number representation in numerical hardware (as in our example), but a consequence of uncertainties in estimated model parameters.

In chapter 3, it is proposed that a numerical approach is taken to zero-testing whenever tracking of structural zeros does not hold the answer, where an expression is taken for being (re-writable to) zero if it evaluates to zero at some trial point. Clearly, a tolerance will have to be used in this test, and showing that a meaningful threshold even exists is one of the main topics in the thesis. When there are many entries in the leading matrix which need numerical evaluation at the time, well-conditioned operations on the equations (row operations and a change of variables) lead to the form (1.1) where E contains all the small expressions, and generally depends on both x(t) and t.

1.2.3 A missing piece in singular perturbation of ODE

Our final attempt to convince the reader that our topic is interesting is to remark that matrix-valued singular perturbations are not only a delicate problem in the world of dae. These problems are also interesting in their own right when the leading matrix of a dae (or E in (1.1)) is known to be non-singular so that the dae is really just an implicit ode. Then the matrix-valued singular perturbation problem is a natural generalization of existing singular perturbation problems for ode (see section 2.5).

In the language of the thesis, we say that these equations are dae of pointwise index 0, and most of the singular perturbation results in the thesis will actually be restricted to this type of equations.

1.2.4 How to approach the nominal equations

The perturbation results in the thesis are often formulated using O( max(E) ) (where E is the matrix-valued perturbation, and not the complete leading matrix) or formulated to apply as m → 0, where m is a bound on all uncertainties in the equations. It is motivated to ask what the practical implications of such bounds really are, and there are several answers.

In some situations, the uncertainties in the equations will be given without any possibility to be reduced. In that case, our results make it possible to test the size of the uncertainties to see if they are small enough for the perturbation analysis to apply, and if they are, there will be a bound on the uncertainty in the solutions (or whatever other property the result at hand concerns). On the other hand, there is always the alternative to artificially increase uncertainty in the model. Increasing the uncertainty of a near-zero interval with large relative uncertainty, so that it includes zero, may sometimes be interpreted as model reduction, where very stiff equations are approximated by non-stiff equations.

In other situations, there may be a possibility to reduce the size of the uncertainties, typically at the cost of spending more resources of some kind. Then, our results may be interpreted as: provided enough resources are spent, property so-and-so can be obtained. Examples of how spending more resources may reduce uncertainty are given below.

• If the equations contain parameters that are the result of a system identification procedure, uncertainty can often be reduced by using more estimation data or by investing in more accurate sensors.

• If the uncertainty in the equations is due to uncertainties in floating point arithmetic, a multi-precision library for floating point arithmetic (such as GMP (2009)) may be used to reduce uncertainty.

• In a time-varying system (and hopefully non-linear systems in the future), where the matrix-valued singular perturbation problem arises when the leading matrix loses rank at some point, the size of the matrix-valued perturbation can be reduced by integrating the increasingly stiff lower-index equations closer to the point where the rank drops. Due to the increasing stiffness, it will be computationally costly to integrate the lower-index equations with adequate precision.

1.2.5 Final remarks

There is also another application of results on matrix-valued singular perturbation, more closely related to the field of automatic control. This concerns the use of unstructured dae as models of dynamic systems; not until well-posedness of solutions to such equations has been established does it make sense to consider problems such as system identification or control for such models.

It should also be mentioned that we are not aware of any strong connection between physical models from any field, and matrix-valued singular perturbation. Electrical systems, for instance, have scalar quantities, and the singular perturbation problems one encounters are generally of multiparameter type. A natural place to search for matrix-valued singular perturbations would be inertia-matrices of rod-like objects. However, these matrices are normal, and a norm-preserving linear (although typically uncertain) change of variables can be used to make the inertia-matrix diagonal.* Hence, only if one is unwilling to make use of the uncertain change of variables and ignores the normality constraint which is satisfied by all inertia matrices, will the matrix-valued singular perturbation be necessary to deal with in its full generality. Furthermore, even if a single rod would be handled as a matrix-valued singular perturbation, the dimension of the uncertain subsystem is just 1, so scalar singular perturbation techniques would apply. The rotation of point-like objects, on the other hand, does not have a non-trivial nominal solution, making also these objects unsuitable for demonstrations.

While we are not aware of physical modeling in any field which, if modeled carefully, would require the use of matrix-valued singular perturbation theory, we are aware that many models are not developed carefully enough to avoid matrix-valued singular perturbation problems. Matrix-valued singular perturbation theory needs to be developed for the rescue of these models, as well as for all algorithms and software which systematically produce such problems.

1.3 Problem formulation

The long term goal of the work in this thesis is a better understanding of uncertainty in differential-algebraic equations used as models in automatic control and related fields. While we were originally concerned with dae in the quasilinear form (1.3), the questions arising regarding uncertainty in this form turned out to be unanswered also for the much more restricted lti dae.

In order to understand the solutions of a dae one generally applies some kind of index reduction scheme involving differentiation of the equations with respect to time. One of the more recent approaches to analysis of general nonlinear dae centers around the strangeness index, and one of the problems considered in the thesis is that a better understanding of this analysis is needed in order to even see the structure of the associated perturbation problems arising from the uncertainty in the equations.

The main problem addressed in the thesis is related to the less sophisticated index reduction schemes associated with the differentiation index and the shuffle algorithm. Here the perturbation problems turn out to be readily transformable into the matrix-valued singular perturbation form, and we ask how these problems can be approached, what qualitative properties they possess, and how the relation between uncertainty in the equations and the uncertainty in the solution may be quantified.

Another problem considered in the thesis, related to differential-algebraic equations used as models in automatic control, is how to develop geometrically sound algorithms with manifold-valued variables.

* More generally, this suggests that matrix-valued singular perturbation can be avoided in rigid body mechanics as long as the kinetic energy is a quadratic form in the time derivative of the generalized coordinates.

Page 29: Differential-algebraic equations and matrix-valued singular …276784/... · 2009-11-24 · ing. Martin Enqvist and Daniel Petersson contributed with outstandingly thorough proofreading

1.4 Contributions 9

1.4 Contributions

The main contributions in this thesis are, in approximate order of appearance:

• Introduction of the so-called matrix-valued singular perturbation problem as a natural generalization of existing singular perturbation problem classes, with applications to uncertainty and approximation in differential-algebraic equations.

• An application related to modeling with differential-algebraic equations: point-mass filtering on manifolds.

• The proposed simplified strangeness index along with basic properties and its relation to the closely related strangeness index.

• Extension of previous perturbation results for linear time-invariant differential-algebraic equations of nominal index 1, introducing assumptions about eigenvalues as the main tool to obtain convergence.

• A canonical form for uncertain matrix pairs of nominal index 2.

• Generalizations of some of the linear time-invariant perturbation results from nominal index 1 to nominal index 2.

• Perturbation results for linear time-varying differential-algebraic equations of nominal index 1.

1.5 Thesis outline

The thesis is divided into two parts: theoretical background (first) and new results (second).

Some notation is explained in the next section, completing the first chapter in the first part. Most readers will probably find it worthwhile to skim through that section before proceeding to later chapters. The theoretical background of the thesis is, with very few exceptions, given in chapter 2. When exceptions are made in the second part of the thesis, this will be made clear so that there is no risk of confusion with new results. Chapter 3 contains material from the author's licentiate thesis Tidefelt (2007), and is included in the present thesis mainly to show the connection between nonlinear systems and the matrix-valued singular perturbation results for linear systems in the second part of the thesis. Readers interested in index reduction of quasilinear dae may find some of the ideas in the chapter interesting, but the chapter is put in the first part of the thesis since the seminumerical schemes it proposes are incomplete until the related singular perturbation problems are better understood. Other readers may safely skip chapter 3.

Turning to the second part, the first two chapters are only loosely related to the title of the thesis. Chapter 4 presents a state estimation technique with potential application to systems described by differential-algebraic equations. Then, chapter 5 proposes a new index concept which is closely related to the strangeness index, but unlike the

Page 30: Differential-algebraic equations and matrix-valued singular …276784/... · 2009-11-24 · ing. Martin Enqvist and Daniel Petersson contributed with outstandingly thorough proofreading

10 1 Introduction

index reduction scheme of chapter 3, the structure of the perturbation problems associated with the strangeness-like indices is not yet analyzed. Hence, it is not clear whether the results on matrix-valued singular perturbation in the following three chapters will find applications in solution techniques related to the strangeness index.

Chapter 6 extends the early results on matrix-valued singular perturbation that appeared in Tidefelt (2007). These results apply to lti dae of nominal index 1, and lti dae of nominal index 2 are considered in chapter 7. In chapter 8 some of the nominal index 1 results are extended to time-varying equations.

Chapter 9 contains conclusions and directions for future research.

1.6 Notation

The present section introduces basic terminology and notation used throughout the thesis. Not all the terminology is defined here though. Abbreviations and some symbols are defined in the tables on page xv, and the reader will find references to all definitions (including those given here) in the subject index at the end of the thesis, after the bibliography.

1.6.1 Mathematical notation

The terms and factors of sums and products over index sets have unlimited extent to the right. For example,

\[
\Bigl( \prod_i |\lambda_i| \Bigr) + 1 \;\neq\; \prod_i |\lambda_i| + 1 = \prod_i \bigl( |\lambda_i| + 1 \bigr).
\]

If α is a scalar, Σ is a set of scalars, and ∼ is a relation between scalars, then α ∼ Σ (or Σ ∼ α) means ∀ σ ∈ Σ : α ∼ σ (or ∀ σ ∈ Σ : σ ∼ α). For instance, Re λ(X) < 0 means that all eigenvalues of X have negative real parts. In the example, we also used that functions automatically map over sets (in Mathematica, the function is said to thread over its argument) if there is no ambiguity.

The symbol != is used to indicate an equality that shall be thought of as an equation. Compare this to the plain =, which is used to indicate that expressions are equal in the sense that one can be rewritten as the other, possibly using context-dependent assumptions. For example, assuming x ≥ 0, we may write √(x²) = x.

The symbol ≔ is used to introduce names for values or expressions. The meaning of expressions can be defined using the symbol ≜. Note that the difference between f ≔ ( x ↦ x² ) and f( x ) ≜ x² is mainly conceptual; in many contexts both would work equally well.

If x is a function of one variable (typically thought of as time), the derivative of x with respect to its only argument is written x′. The composed symbol ẋ shall be used to denote a function which is independent of x, but intended to coincide with x′. For example, in numeric integration of x′′ = u, where u is a forcing function, we write


the ordinary differential equation as

    ẋ′ = u
    x′ = ẋ

Higher order derivatives are denoted x′′, x′(3), . . . , or ẍ, ẋ(3), . . . . When the highest order of dots, say ẋ(ν+1), is determined by context, ẋ(i+) is a short hand for the sequence or concatenation of ẋ(i), . . . , ẋ(ν+1). Conversely, the sequence or concatenation of x, x′, . . . , x′(i) is denoted x′{i}, and we define ẋ{i} analogously. Making the distinction between x′ and ẋ this way — and not the other way around — is partly for consistency with the syntax of the Mathematica language, in which our algorithms are implemented.
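As a concrete illustration of this first-order rewriting (a hypothetical sketch with invented names, not code from the thesis), x′′ = u can be integrated with a simple forward Euler scheme applied to the system above:

```python
def integrate(u, x0, xdot0, t1, n=100_000):
    """Forward Euler for the system  xdot' = u,  x' = xdot  on [0, t1]."""
    h = t1 / n
    x, xdot = x0, xdot0
    for k in range(n):
        # xdot plays the role of the composed symbol: kept independent
        # of x, but intended to coincide with x'.
        x, xdot = x + h * xdot, xdot + h * u(k * h)
    return x

# With constant forcing u = 2 and zero initial data, x(t) = t^2.
x1 = integrate(lambda t: 2.0, 0.0, 0.0, 1.0)
```

For this forcing the exact value is x(1) = 1, and the first-order scheme reproduces it to within O(1/n).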

Gradients (Jacobians) are written using the operator ∇. For example, ∇f is the gradient (Jacobian) of f, assuming f takes one vector-valued argument. If a function takes several arguments, a subscript on the operator is used to denote with respect to which argument the gradient is computed. For example, if f is a function of 3 arguments, then

∇₂f = ( x, y, z ) ↦ ∇( w ↦ f( x, w, z ) )( y )
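As a numeric illustration (an invented example, not from the thesis), the definition of ∇₂ can be probed with central finite differences:

```python
def f(x, y, z):
    # Example function of three scalar arguments.
    return x * y**2 + z

def grad2_f(x, y, z, h=1e-6):
    """Central-difference approximation of (∇₂f)( x, y, z )."""
    return (f(x, y + h, z) - f(x, y - h, z)) / (2.0 * h)

g = grad2_f(2.0, 3.0, 1.0)   # exact value is 2*x*y = 12
```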

The notation ∇i+ is used to denote concatenated gradients with respect to all arguments starting from number i. For example, with f as before

∇₂₊f = ( x, y, z ) ↦ [ ∇( w ↦ f( x, w, z ) )( y )   ∇( w ↦ f( x, y, w ) )( z ) ]

Bullet notation is used for compact notation of functions of one unnamed argument. The expression which becomes the “body” of the function is the smallest complete expression containing the bullet. For example, let the first argument of f be real, then

f( •, y, z )′( x ) = ∇₁f( x, y, z )

For a time series ( xₙ )ₙ, the forward shift operator q is defined as q xₙ ≜ xₙ₊₁.

Matrices are constructed within square brackets. Vectors are constructed by vertical alignment within parentheses. A single row of a matrix, thought of as an object with only one index, is constructed by horizontal alignment within parentheses. If a column of a matrix is thought of as having only one index, it is constructed using the same notation as a vector. There is no distinction in notation between contravariant and covariant vectors. (Square brackets are, however, also used sometimes in the same way as parentheses are used to indicate grouping in equations.) Square brackets and parentheses are also used with two real scalars separated by a comma to denote intervals of real numbers, see the notation table on page xv. Tuples (lists) are also denoted using parentheses and elements separated by commas, but it will be clear from the context when ( a, b ) is an open interval and when it is a two-tuple.

If the variable n has no other meaning in the current context, but there is a square matrix that can be associated with the current context, then n denotes the dimension of this matrix.


1.6.2 DAE and ODE terminology

In accordance with most literature on this subject, equations not involving differentiated variables will often be denoted algebraic equations, although non-differential equations — a better notation from a mathematical point of view — will also be used interchangeably.*

The quasilinear form of dae has already been introduced, repeated here,

E( x(t), t ) x′(t) + A( x(t), t ) != 0 (1.3)

The matrix-valued function E which determines the coefficients for the differentiated variables, as well as the expression E( x(t), t ), will be referred to as the leading matrix. This terminology is also used for the important subtypes of quasilinear dae being the linear dae, see below. The function A as well as the expression A( x(t), t ) will be referred to as the algebraic term.† This terminology will only be used when the algebraic term is not affine in x(t), for otherwise the terminology of linear dae is more precise. This brings us to the linear dae.

An autonomous lti dae has the form

E x′(t) + A x(t) != 0 (1.4)

where E and A are constant matrices. By autonomous, we mean that there is no way external inputs can enter this equation, so the system evolves in a way completely defined by its initial conditions. Adding a forcing function (often representing external inputs) while maintaining the lti property‡ leads to the general lti dae form

E x′(t) + A x(t) + B u(t) != 0 (1.5)

where u is a vector-valued function representing external inputs to the model, and B is a constant matrix.§

In the linear dae (1.5) and (1.4), the matrix A of coefficients for the non-differentiated variables is denoted the trailing matrix.¶ It may be a function of time, if the linear dae is time-varying.

To complement the terminology that has been introduced for dae, we shall introduce some corresponding terminology for ode. In the nonlinear ode (sometimes written

* Seeking a notation which is both short and not misleading, the author would prefer static equations, but this notation is avoided to make the text more accessible.
† By this definition, the algebraic term with reversed sign is sometimes referred to as the right hand side of the quasilinear dae.
‡ The solution will be linear in initial conditions (regardless of the initial time) if the forcing function is zero, and linear in the forcing function (regardless of the initial time, if the forcing function is suitably time-shifted) if the initial conditions are zero.
§ In the terminology of quasilinear dae, the expression A x(t) + B u(t) would constitute the algebraic term here. However, it is affine in x(t) so we prefer to use the more specific terminology of linear dae.
¶ This terminology may seem in analogy with the term leading matrix. However, the reason why the leading matrix has received its name is unknown to the author, and trailing matrix was invented for the thesis to avoid ambiguity with the state feedback matrix, so there is no common source of analogy. Rather, the term trailing matrix appears natural in view of the leading matrix being the matrix which is listed first in a matrix pair, see section 2.2.6.


with “=” instead of “!=” to stress that the differentiated variables are trivial to solve for)

x′(t) != f ( x(t), t ) (1.6)

the function f as well as the expression f( x(t), t ) are called the right hand side of the ode. If f is affine in its first argument, that is,

f( x, t ) = M( t ) x + b( t )

the matrix M as well as the expression M( t ) are called the state feedback matrix.*

When an ode or a dae has a term which only depends on time, such as b(t) here, this term will be denoted the forcing term of the equation. Often, the forcing term is in the form b(t) = β( u(t) ), where the function β is considered fixed, while u is considered an unknown external input to the equation. The function u is then denoted the forcing function or input to the equation. If b(t) is linear in u(t), that is, b(t) = B(t) u(t) where B(t) is a matrix, then this matrix as well as B are called the input matrix of the equation. In case of ode, this leads to the ltv ode form

x′(t) != M(t) x(t) + B(t) u(t) (1.7)

and the lti ode form

x′(t) != M x(t) + B u(t) (1.8)

When f does not depend on its second argument the ode is said to be time-invariant.† The autonomous counterparts of (1.7) and (1.8) are hence obtained by setting u ≔ 0.‡

1.3 Example

As an example of the notation, note that if E in (1.5) is non-singular, then there is a corresponding ode in x with state feedback matrix −E⁻¹A. Since the term input matrix is being used both for dae and ode, care must be taken when using the term in a context where a system is being represented both as a dae and an ode; the input matrix of the ode here would be −E⁻¹B, while the input matrix of the dae (1.5) is B.
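The claim in the example is easy to check numerically; the matrices below are invented for illustration (any non-singular E works):

```python
import numpy as np

# Hypothetical lti dae data for E x'(t) + A x(t) + B u(t) != 0, E non-singular
E = np.array([[2.0, 1.0], [0.0, 1.0]])
A = np.array([[3.0, 2.0], [2.0, 1.0]])
B = np.array([[1.0], [0.5]])

M  = -np.linalg.solve(E, A)   # state feedback matrix of the corresponding ode
Bo = -np.linalg.solve(E, B)   # input matrix of the ode (the dae's is just B)

# Along x' = M x + Bo u, the dae residual vanishes up to rounding:
x = np.array([0.3, -1.2])
u = np.array([0.7])
residual = E @ (M @ x + Bo @ u) + A @ x + B @ u
```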

For dae, the autonomous ltv form is

E( t ) x′(t) + A( t ) x(t) != 0 (1.9)

and the general ltv dae form with forcing function is

E( t ) x′(t) + A( t ) x(t) + B( t ) u(t) != 0 (1.10)

* This notation is borrowed from Kailath (1980). We hereby avoid the perhaps more commonly used notation system matrix, because of the other — yet related — meanings this term also bears.
† While this terminology is widely used in the automatic control community, mathematicians tend to denote the ode autonomous rather than time-invariant. Our use of autonomous indicates the absence of a forcing term in the equations, and is only used with equation forms where there is a natural counterpart with forcing function.
‡ Note that our definition of autonomous linear time-varying ode is not autonomous in the sense often used by mathematicians. For linear time-invariant ode, however, the two uses of autonomous are compatible.


While the solution x to the ode is referred to as the state vector or just the state of the ode, the elements of the solution x to the dae are referred to as the variables of the dae.

A dae is denoted square if the number of equations and variables match. When a set of equations characterizing the solution manifold has been derived, these are sometimes completed with differential equations so that a square dae of strangeness index 0 is obtained. This dae will then be referred to as the reduced equation.

By an initial value problem we refer to the problem of computing trajectories of the variables of a dae (or ode), over an interval [ t0, t1 ], given sufficient information about the variables and their derivatives at time t0.


2 Theoretical background

The intended audience of this thesis is not expected to have prior experience with both automatic control and differential-algebraic equations. For those without background in automatic control, we start the chapter in section 2.1 by providing general motivation for why we study equations, and dae in particular. For those with background in automatic control, but with only very limited experience with dae, we try to fill that gap in section 2.2.

The remaining sections of the chapter present other theoretical background material that will be used in later chapters. To keep it clear what the contributions of the thesis are, there are just a few exceptions (most notably in chapter 5, as is explained in the introduction to that chapter) to the rule that existing results used in the second part of the thesis are presented here.

2.1 Models in automatic control

Automatic control tasks are often solved by engineers without explicit mathematical models of the controlled or estimated object. For instance, a simple low pass filter may be used to get rid of measurement noise on the signal from a sensor, and this can work well even without saying “Assume that the correct measurement is distorted by zero mean additive high frequency noise.” Saying that phrase out loud would express the use of a simple model of the sensor (whether it could be called mathematical is a matter of taste). As another example, many processes in industry are controlled by a so-called pid controller, which has a small number of parameters that can be tuned to obtain good performance. Often, these parameters are set manually by a person with experience of how these parameters relate to production performance, and this can be done without awareness of mathematical models. Most advances in control and


estimation theory do, however, build on the assumption that a more or less accurate mathematical model of the object is available, and how such models may be used, simplified, and tuned for good numerical properties is the subject of this section.

2.1.1 Examples

The model of the sensor above was only expressed in words. Our first example of a mathematical model will be to say the same thing with equations. Since equations are typically more precise than words, we will lose some of the generality, a price we are often willing to pay to get to the equations which we need to be able to apply our favorite methods for estimation and/or control. Denote, at time t, the measurement by y(t), the true value by x(t), and let e be a white noise� source with variance σ². Let v(t) be an internal variable of our model:

y(t) != x(t) + v(t) (2.1a)

v(t) + v′(t) != e′(t) (2.1b)

A drawback of using a precise model like this is that our methods may depend too heavily on this being the correct model; we need to be aware of how sensitive our methods are to errors in the mathematical model. Imagine, for instance, that we build a device that can remove disturbances at 50 Hz caused by the electric power supply. If this device is too good at this, it will be useless if we move to a country where the alternating current frequency is 60 Hz, and will even destroy information of good quality at 50 Hz. The model (2.1) is often written more conveniently in the Laplace transform domain, which is possible since the differential equations are linear:

Y(s) != X(s) + V(s)    (2.2a)

V(s) != s/( 1 + s ) E(s)    (2.2b)

Here, the factor s/( 1 + s ) is often referred to as a filter; the white noise is turned into high frequency noise by sending it through the filter.

As a second example of a mathematical model we consider a laboratory process often used in basic courses in automatic control. The process consists of a cylindrical water tank, with a drain at the bottom. Water can be pumped from a reservoir to the tank, and the drain leads water back to the reservoir. There is also a gauge that senses the level of water in the tank. The task for the student is to control the level of water in the tank, and what makes the task interesting is that the flow of water through the drain varies with the level of water; the larger the level of water, the higher the flow. Limited performance can be achieved using, for instance, a manually tuned pid controller, but to get good performance at different desired levels of water, a model-based controller is the natural choice. Let x denote the level of water, and u the flow we demand from the pump. A common approximation is that the flow through the drain is proportional to the square root of the level of water. Denote the

� White noise and how it is used in the example models is a non-trivial subject, but to read this chapter it should suffice to know that white noise is a concept which is often used as a building block of more sophisticated models of noise.


corresponding constant cd, and let the constant relating the flow of water to the time derivative of x (that is, this constant is the inverse of the bottom area of the tank) be denoted ca. Then we get the following mathematical model with two parameters to be determined from some kind of experiment:

x′(t) = ca ( u(t) − cd √x(t) )    (2.3)

The constant ca could be determined by plugging the drain, adding a known volume of water to the tank, and measuring the resulting level. The other constant can also be determined from simple experiments.
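As an illustration (not part of the original text), the tank model (2.3) can be simulated by forward-Euler integration. The constants and the inflow below are hypothetical values chosen only for the demonstration:

```python
import math

def simulate_tank(x0, u, c_a, c_d, dt, steps):
    """Forward-Euler integration of the tank model (2.3):
    x'(t) = c_a * (u(t) - c_d * sqrt(x(t)))."""
    x = x0
    levels = [x]
    for k in range(steps):
        x = x + dt * c_a * (u(k * dt) - c_d * math.sqrt(max(x, 0.0)))
        levels.append(x)
    return levels

# With a constant inflow u, the level should approach the equilibrium
# where u = c_d * sqrt(x), that is x = (u / c_d)**2 = 4 here.
levels = simulate_tank(x0=0.1, u=lambda t: 2.0,
                       c_a=0.5, c_d=1.0, dt=0.01, steps=20000)
```

Note that the approach to the equilibrium is slow near the set point, since the linearized dynamics there have a small eigenvalue; this is visible in how many steps are needed before the level settles.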

2.1.2 Use in estimation

The first model example above was introduced with a very easy estimation problem in mind. Let us instead consider the task of computing an accurate estimate of the level of water, given a sensor that is both noisy and slow. We will not go into details here, but just mention the basic idea of how the model can be used.

Since the flow we demand from the pump, u, is something we choose, it is a known quantity in (2.3). Hence, if we were given a correct value of x(0) and the model were correct, we could compute all future values of x simply by integration of (2.3). However, our model will never be correct, so the estimate will only be good during a short period of time, before the estimate has drifted away from the true value. The errors in our model are not only due to the limited precision in the experiments used to determine the constants, but more importantly because the square root relation is a rather coarse approximation. In addition, it is unrealistic to assume that we get exactly the flow we want from the pump. This is where the sensor comes into play; even though it is slow and noisy, it is sufficient to take care of the drift. The best of both worlds can then be obtained by combining the simulation of (2.3) with use of the sensor in a clever way. A very popular method for this is the so-called extended Kalman filter (for instance, Jazwinski (1970, theorem 8.1)).
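The idea of combining open-loop integration of (2.3) with the sensor can be sketched with a fixed-gain predict/correct loop. This is only a simplified stand-in for the extended Kalman filter mentioned above (there, the gain would be computed from the model and noise covariances); all numeric values and the function name are hypothetical:

```python
import math

def estimate_level(x_hat0, gain, u_seq, y_seq, c_a, c_d, dt):
    """Combine open-loop integration of the tank model (2.3) with level
    measurements y via a fixed-gain predict/correct scheme."""
    x_hat = x_hat0
    estimates = []
    for u, y in zip(u_seq, y_seq):
        # Predict: integrate the model one Euler step.
        x_hat = x_hat + dt * c_a * (u - c_d * math.sqrt(max(x_hat, 0.0)))
        # Correct: pull the estimate toward the measurement.
        x_hat = x_hat + gain * (y - x_hat)
        estimates.append(x_hat)
    return estimates

# Noise-free sanity check: with constant inflow and constant measurements,
# the estimate should settle at the measured level (4.0 here).
estimates = estimate_level(x_hat0=1.0, gain=0.05,
                           u_seq=[2.0] * 800, y_seq=[4.0] * 800,
                           c_a=0.5, c_d=1.0, dt=0.01)
```

The correction step is what removes the drift discussed above: even a slow, noisy sensor keeps the model-based prediction anchored to reality.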

2.1.3 Use in control

Let us consider the laboratory process (2.3) again. The task was to control the level of water, and this time we assume that the errors in the measurements are negligible. There is a maximal flow, umax, that can be obtained from the pump, and it is impossible to pump water backwards from the tank to the reservoir, so we shall demand a flow subject to the constraints 0 ≤ u(t) ≤ umax. We denote the desired level of water the set point, symbolized by xref. The theoretically valid control law,

u(t) =  0,     if x(t) ≥ xref(t)
        umax,  otherwise        (2.4)

will be optimal in theory (when changes in xref cannot be foreseen) in the sense that deviations from the set point are eliminated as quickly as possible. However, this type of control law will quickly wear out the pump since it will be switching rapidly between off and full speed once the level gets to about the right level. Although still unrealistically naïve, at least the following control law somewhat reduces wear of the


pump, at the price of allowing slow and bounded drift away from the set point. It has three modes, called the drain mode, the fill mode, and the open-loop mode:

Drain mode:      u(t) = 0
                 Switch to open-loop mode if x(t) < xref(t)

Fill mode:       u(t) = umax
                 Switch to open-loop mode if x(t) > xref(t)

Open-loop mode:  u(t) = cd √xref(t)
                 Switch to drain mode if x(t) > ( 1 + δ ) xref(t)
                 Switch to fill mode if x(t) < ( 1 − δ ) xref(t)
                                                            (2.5)

where the parameter δ is a small parameter chosen by considering the trade-off between performance and wear of the pump. In the open-loop mode, the flow demanded from the pump is chosen to match the flow through the drain to the best of our knowledge. Note that if δ is sufficiently large, errors in the model will make the level of water settle at the wrong level; to each fixed flow there is a corresponding level where the water will settle, and errors in the model will make cd √xref(t) correspond to something slightly different from xref(t). More sophisticated controllers can remedy this.
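The three-mode law (2.5) translates directly into a small state machine. The sketch below uses hypothetical parameter values; the mode-switching conditions are taken verbatim from (2.5):

```python
import math

def make_controller(u_max, c_d, delta):
    """Three-mode controller (2.5): drain, fill, and open-loop,
    with a hysteresis band of relative width delta around x_ref."""
    mode = "open"

    def control(x, x_ref):
        nonlocal mode
        if mode == "drain" and x < x_ref:
            mode = "open"
        elif mode == "fill" and x > x_ref:
            mode = "open"
        elif mode == "open":
            if x > (1 + delta) * x_ref:
                mode = "drain"
            elif x < (1 - delta) * x_ref:
                mode = "fill"
        if mode == "drain":
            return 0.0
        if mode == "fill":
            return u_max
        return c_d * math.sqrt(x_ref)  # open loop: match the drain flow

    return control

# Hypothetical numbers: u_max = 10, c_d = 1, 5 % hysteresis band.
ctrl = make_controller(u_max=10.0, c_d=1.0, delta=0.05)
```

Because of the hysteresis band, small deviations inside ( 1 ± δ ) xref do not cause any switching at all, which is exactly how the wear on the pump is reduced.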

2.1.4 Model classes

When developing theory, be it system identification, estimation or control, one has to specify the structure of the models to work with. We shall use the term model class to denote a set of models which can be easily characterized. A model class is thus a rather vague term such as, for instance, “a linear system with white noise on the measurements”. Depending on the number of states in the linear system, and how the linear system is parameterized, various model structures are obtained. When developing theory, a parameter such as the number of states is typically represented by a symbol in the calculations — this way, several model structures can be treated in parallel, and it is often possible to draw conclusions regarding how such a parameter affects some performance measure. In the language of system identification, one would thus say that theory is developed for a parameterized family of model structures. Since such a family is a model class, we will often have such a family in mind when speaking of model classes. The concepts of models, model sets, and model structures are rigorously defined in the standard reference Ljung (1999, section 4.5) on system identification, but we shall allow these concepts to be used in a broader sense here.

In system identification, the choice of model class affects the ability to approximate the true process as well as how efficiently or accurately the parameters of the model may be determined. In estimation and control, applicability of the results is related to how likely it is that a user will choose to work with the treated model structure, in light of the power of the results; a user may be willing to identify a model from a given class if that will enable the user to use a more powerful method. The choice of model class will also allow varying amounts of elaboration of the theory; a model class with much structural information will generally allow a more precise analysis,


at the cost of increased complexity, both in terms of theory and implementation of the results.

Before we turn to some examples of model classes, it should be mentioned that models often describe a system in discrete time. However, this thesis is predominantly concerned with continuous time models, so the examples will all be of this kind.

Continuing on our first example of a model class, in the sense of a parameterized family of model structures, it could be described as all systems in the linear state space form

x′(t) = A x(t) + B u(t)
y(t) = C x(t) + D u(t) + v(t)    (2.6)

where u is the vector of system inputs, y the vector of measured outputs, v is a vector with white noise, and x is a finite-dimensional vector of states. For a given number of states, n, a model is obtained by instantiating the matrices A, B, C, and D with numerical values.

It turns out that the class (2.6) is over-parameterized in the sense that it contains many equivalent models. If the system has just one input and one output, it is well known that it can be described by 2n + 1 parameters, and it is possible to restrict the structure of the matrices such that they only contain this number of unknown parameters without restricting the possible input-output relations.

Our second and final example of a model class is obtained by allowing more freedom in the dynamics than in (2.6), while removing the part of the model that relates the system output to its states. In a model of this type, all states are considered outputs:

x′(t) = A( x(t) ) + B u(t) (2.7)

Here, we might pose various types of constraints on the function A. For instance, assuming Lipschitz continuity is very natural since it ensures that the model uniquely defines the trajectory of x as a function of u and initial conditions. Another interesting choice for A is the polynomials, and if the degree is at most 2 one obtains a small but natural extension of the linear case. Another important way of extending the model class (2.6) is to look into how the system inputs u are allowed to enter the dynamics.

2.1.5 Model reduction

Sophisticated methods in estimation and control may result in very computationally expensive implementations when applied to large models. By large models, we generally refer to models with many states. For this reason, methods and theory for approximating large models by smaller ones have emerged. This approximation process is referred to as model reduction. Our interest in model reduction owes to its relation to index reduction (explained in section 2.2), a relation which may not be widely recognized, but one which this thesis tries to bring attention to. This section provides a small background on some available methods.


In view of the dae for which index reduction is considered in detail in later chapters, we shall only look at model reduction of lti systems here, and we assume that the large model is given in state space form as in (2.6).

If the states of the model have physical meaning it might be desirable to produce a smaller model where the set of states is a subset of the original set of states. It then becomes a question of which states to remove, and how to choose the system matrices A, B, C, and D for the smaller system. Let the states and matrices be partitioned such that x2 are the states to be removed (this requires the states to be reordered if the states to be removed are not the last components of x), and denote the blocks of the partitioned matrices according to

( x′1(t); x′2(t) ) = [ A11, A12; A21, A22 ] ( x1(t); x2(t) ) + [ B1; B2 ] u(t)
y(t) = [ C1, C2 ] ( x1(t); x2(t) ) + D u(t) + v(t)    (2.8)

If x2 is selected to consist of states that are expected to be unimportant due to the small values those states take under typical operating conditions, one conceivable approximation is to set x2 = 0 in the model. This results in the truncated model

x′1(t) = A11 x1(t) + B1 u(t)
y(t) = C1 x1(t) + D u(t) + v(t)    (2.9)

Although — at first glance — this might seem like a reasonable strategy for model reduction, it is generally hard to tell how the reduced model relates to the original model. Also, selecting which states to remove based on the size of the values they typically take is in fact a meaningless criterion, since any state can be made small by scaling, see section 2.1.6.

Another approximation is obtained by formally replacing x′2(t) by 0 in (2.8). The underlying assumption is that the dynamics of the states x2 is very fast compared to x1. A necessary condition for this to make sense is that A22 be Hurwitz, which also makes it possible to solve for x2 in the obtained equation A21 x1(t) + A22 x2(t) + B2 u(t) != 0. Inserting the solution in (2.8) results in the residualized model

x′1(t) = ( A11 − A12 A22⁻¹ A21 ) x1(t) + ( B1 − A12 A22⁻¹ B2 ) u(t)
y(t) = ( C1 − C2 A22⁻¹ A21 ) x1(t) + ( D − C2 A22⁻¹ B2 ) u(t) + v(t)    (2.10)

It can be shown that this model gives the same output as (2.8) for constant inputs u.
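Truncation (2.9) and residualization (2.10) are easy to compare numerically. The sketch below uses a hypothetical partitioned system (all matrix values are invented for the demonstration) and checks the steady-state gains for constant inputs:

```python
import numpy as np

# A hypothetical stable system, partitioned as in (2.8):
# two "slow" states x1 and one "fast" state x2.
A11 = np.array([[-1.0, 0.0], [0.0, -2.0]])
A12 = np.array([[0.5], [0.3]])
A21 = np.array([[0.2, 0.1]])
A22 = np.array([[-5.0]])
B1 = np.array([[1.0], [0.0]])
B2 = np.array([[1.0]])
C1 = np.array([[1.0, 1.0]])
C2 = np.array([[1.0]])
D = np.array([[0.0]])

A = np.block([[A11, A12], [A21, A22]])
B = np.vstack([B1, B2])
C = np.hstack([C1, C2])

# Truncation (2.9): simply drop x2.
At, Bt, Ct, Dt = A11, B1, C1, D

# Residualization (2.10): set x2'(t) = 0 and eliminate x2.
A22inv = np.linalg.inv(A22)
Ar = A11 - A12 @ A22inv @ A21
Br = B1 - A12 @ A22inv @ B2
Cr = C1 - C2 @ A22inv @ A21
Dr = D - C2 @ A22inv @ B2

def dc_gain(A, B, C, D):
    """Steady-state gain for a constant input: y = (D - C A^{-1} B) u."""
    return D - C @ np.linalg.inv(A) @ B
```

The residualized model reproduces the original response to constant inputs exactly (a Schur-complement identity), whereas the truncated model in general does not.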

If the states of the original model do not have interpretations that we are keen to preserve, the above two methods for model reduction can produce an infinite number of approximations if combined with a change of variables applied to the states; applying the change of variables x = T ξ to (2.6) results in

ξ′(t) = T⁻¹ A T ξ(t) + T⁻¹ B u(t)
y(t) = C T ξ(t) + D u(t) + v(t)    (2.11)


and the approximations will be better or worse depending on the choice of T. Conversely, by certain choices of T, it will be possible to say more regarding how close the approximations are to the original model. If T is chosen to bring the matrix A into Jordan form, truncation is referred to as modal truncation, and residualization is then equivalent to singular perturbation approximation (see section 2.5). (Skogestad and Postlethwaite, 1996)

The most well-developed change of variables T is the one which brings the system into balanced form. When performing truncation or residualization on a system in this form, the difference between the approximation and the original system can be expressed in terms of the system's Hankel singular values. We shall not go into details about what these values are, but the largest of them defines the Hankel norm of a system. Neither shall we give interpretations of this norm, but it turns out that it is actually possible to compute the reduced model of a given order which minimizes the Hankel norm of the difference between the original system and the approximation.

By now we have seen that there are many ways to compute smaller approximations of a system, ranging from rather arbitrary choices to those which are clearly defined as minimizers of a coordinate-independent objective function.

Some model reduction techniques have been extended to lti dae (Stykel, 2004). However, although the main question in this thesis is closely related to model reduction, these techniques cannot readily be applied in our framework since we are interested in defending a given model reduction (this view should become clear in later chapters) rather than finding one with good properties.

2.1.6 Scaling

In section 2.1.5, we mentioned that model reduction of a system in state space form, (2.6), was a rather arbitrary process unless thinking in terms of some suitable coordinate system for the state space. The first example of this was selecting which states to truncate based on the size of the values that the state attains under typical operating conditions, and here we do the simple maths behind that statement. Partition the states such that x2 is a single state which is to be scaled by the factor a. This results in

( x′1(t); x′2(t) ) = [ A11, (1/a) A12; a A21, A22 ] ( x1(t); x2(t) ) + [ B1; a B2 ] u(t)
y(t) = [ C1, (1/a) C2 ] ( x1(t); x2(t) ) + D u(t) + v(t)    (2.12)

(we do not write out that the initial conditions also have to be scaled accordingly). Note that the scalar A22 on the diagonal does not change (if it did, that would change the trace of A, but the trace is known to be invariant under similarity transforms).
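The point made here can be verified numerically: the scaling (2.12) is a similarity transform, so the input-output behavior is unchanged while individual coefficients can be made arbitrarily small or large. All numeric values below are hypothetical:

```python
import numpy as np

# A hypothetical system (2.6) with the last state x2 scalar.
A = np.array([[-1.0, 0.5], [0.3, -2.0]])
B = np.array([[1.0], [0.5]])
C = np.array([[1.0, 2.0]])
D = np.array([[0.0]])

def scale_last_state(A, B, C, a):
    """Apply the similarity transform behind (2.12): replace x2 by a*x2."""
    T = np.diag([1.0, 1.0 / a])  # x = T xi
    Tinv = np.linalg.inv(T)
    return Tinv @ A @ T, Tinv @ B, C @ T

def tf(A, B, C, D, s):
    """Transfer function G(s) = C (sI - A)^{-1} B + D at a point s."""
    return C @ np.linalg.inv(s * np.eye(A.shape[0]) - A) @ B + D

As, Bs, Cs = scale_last_state(A, B, C, a=1000.0)
```

After the scaling, the coupling entry A12 has shrunk by a factor 1000, yet the transfer function is identical, which is why the size of a state's values is a meaningless criterion for truncation.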

In the index reduction procedure studied in later chapters, the situation is reversed: it is not a question of which states are small, but of which coefficients are small. The situation is even worse for lti dae than for the state space systems considered so far, since in a dae there is also the possibility to scale the equations independently of


the states. Again, it becomes obvious that this cannot be answered in a meaningful way unless the coordinate systems for the state space and the equation residuals are chosen suitably. Just like in model reduction, the user may be keen to preserve the interpretation of the model states, and may hence be reluctant to use methods that apply variable transforms to the states. However, unlike model reduction of ordinary differential equations, the dae may still be transformed by changing coordinates of the equation residuals. In fact, changing the coordinate system of the equation residuals is the very core of the index reduction algorithm.

Pure scaling of the equation residuals is also an important part of the numerical method for integration of dae that will be introduced in section 2.2.8. There, scaling is important not because it facilitates analysis, but because it simply improves the numeric quality of the solution. To see how this works, we use the well-known (see, for instance, Golub and Van Loan (1996)) bound on the relative error in the solution to a linear system of equations A x != b, which basically says that the relative errors in A and b are propagated to x by a factor bounded by the (infinity norm) condition number of A. Now consider the linear system of equations in the variable qx (that is, x is given)

[ (1/ε) E1 + A1; A2 ] qx != [ (1/ε) E1; 0 ] x    (2.13)

where ε is a small but exactly known parameter. If we assume that the relative errors in E and A are of similar magnitudes, smallness of ε gives both that the matrix on the left hand side is ill-conditioned, and that the relative error of this matrix is approximately the same as the relative error in E1 alone. Scaling the upper row of the equations by ε will hence make the matrix on the left hand side better conditioned, while not making the relative error significantly larger. On the right hand side, scaling of the upper block by ε is the same as scaling all of the right hand side by ε, and hence the relative error does not change. Hence, scaling by ε will give a smaller bound on the relative error in the solution. Although the scaling by ε was performed for the sake of numerics, it should be mentioned that, generally, the form (2.13) is only obtained after choosing a suitable coordinate system for the dae residuals.
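A small numerical illustration of this conditioning argument, with hypothetical row blocks E1, A1, A2 of (2.13):

```python
import numpy as np

eps = 1e-8
E1 = np.array([[1.0, 2.0]])    # hypothetical blocks of (2.13)
A1 = np.array([[0.3, 0.7]])
A2 = np.array([[1.0, -1.0]])

# Unscaled coefficient matrix of (2.13), and the version where the
# upper row has been multiplied by eps.
M = np.vstack([E1 / eps + A1, A2])
M_scaled = np.vstack([E1 + eps * A1, A2])

# Infinity-norm condition numbers, as in the error bound cited above.
cond_before = np.linalg.cond(M, np.inf)
cond_after = np.linalg.cond(M_scaled, np.inf)
```

With these numbers the unscaled matrix has a condition number on the order of 1/ε, while the scaled matrix is well conditioned, so the bound on the relative error of the solution shrinks by roughly the same factor.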

Another important situation we would like to mention — when scaling matters — is when gradient-based methods are used in numerical optimization. (Numerical optimization in one form or another is the basic tool for system identification.) Generally, the issue is how the space of optimization variables is explored, not so much the numerical errors in the evaluation of the objective function and its derivatives. It turns out that the success of the optimization algorithm depends directly on how the optimization variables (that is, the model parameters to be identified) are scaled. One of the important advantages of optimization schemes that also make use of the Hessian of the objective function is that they are unaffected by linear changes of variables.

2.2 Differential-algebraic equations

Differential-algebraic equations (generally written just dae) are a rather general kind of equations which is suitable for describing systems which evolve over time. The


advantage they offer over the more often used ordinary differential equations is that they are generally easier to formulate. The price paid is that they are more difficult to deal with.

The first topic of the background we give in this section is to try to clarify why dae can be a convenient way of modeling systems in automatic control. After looking at some common forms of dae, we then turn to the basic elements of analysis and solution of dae. Finally, we mention some existing software tools. For recent results on how to carry out applied tasks such as system identification and estimation for dae models, see Gerdin (2006), or for optimal control, see Sjöberg (2008).

2.2.1 Motivation

Nonlinear differential-algebraic equations are the natural outcome of component-based modeling of complex dynamic systems. Often, there is some known structure to the equations; for instance, it was mentioned in chapter 1 that we would like to understand a method that applies to equations in quasilinear form,

E( x(t), t ) x′(t) + A( x(t), t ) != 0 (2.14)

In the next section, we approach this form by looking at increasingly general types of equations.

Within many fields, equations emerge in the form (2.14) without being recognized as such. The reason is that when x′(t) is sufficiently easy to solve for, the equation is converted to the state space form, which can formally be written as

x′(t) != −E( x(t), t )⁻¹ A( x(t), t )

Sometimes, the leading matrix may be well conditioned, but nevertheless non-trivial to invert. It may then be preferable to leave the equations in the form (2.14). In this case, the form (2.14) is referred to as an implicit ode or an index 0 dae. One reason for not converting to state space form is that one may lose sparsity patterns.� Hence, the state space form may require much more storage than the implicit ode, and may also be a much more expensive way of obtaining x′(t). Besides, even when the inverse of a sparse symbolic matrix is also sparse, the expressions in the inverse matrix are generally of much higher complexity.�

� Here is an example that shows that the inverse of a sparse matrix may be full:

[ 1, 0, 1; 1, 1, 0; 0, 1, 1 ]⁻¹ = (1/2) [ 1, 1, −1; −1, 1, 1; 1, −1, 1 ]

� If the example above is extended to a 5 by 5 matrix with unique symbolic constants at the non-zero positions, the memory required to store the original matrix in Mathematica (Wolfram Research, Inc., 2008) is 528 bytes. If the inverse is represented with the inverse of the determinant factored out, the memory requirement is 1648 bytes, and without the factorization the memory requirement is 6528 bytes.


Although an interesting case by itself, the implicit ode form is not the focus of this thesis. What remains is the case when the leading matrix is singular. Such equations appear naturally in many fields, and we will finish this section by looking briefly at some examples.

As was mentioned above, quasilinear equations are the natural outcome of component-based modeling, and these will generally have a singular leading matrix. This type of modeling refers to the bottom-up process, where one begins by making small models of simple components. The small models are then combined to form bigger models, and so on. Each component, be it small or large, has variables that are thought of as inputs and outputs, and when models are combined to make models at a higher level, this is done by connecting outputs with inputs. Each connection renders a trivial equation where two variables are “set” equal. These equations contain no differentiated variables, and will hence have a corresponding zero row in the leading matrix. The leading matrix must then be singular, but the problem has a prominent structure which is easily exploited.

Our next example is models of electric networks. Here, many components (or sub-networks) may be connected in one node, where all electric potentials are equal and Kirchhoff's Current Law provides the glue for currents. While the equations for the potentials are trivial equalities between pairs of variables, the equations for the currents will generate linear equations involving several variables. Still, the corresponding part of the leading matrix is a zero row, and the coefficients of the currents are ±1, when present. This structure is also easy to exploit.

The previous example is often recognized as one of the canonical applications of the so-called bond graph theory. Other domains where (one-dimensional) bond graphs are used are mechanical translation, mechanical rotation, hydraulics (pneumatics), some thermal systems, and some systems in chemistry. While the one-dimensional bond graphs are the most widely known, there is an extension which, among other applications, overcomes the limitation in mechanical systems to objects which either translate along a given line or rotate about a given axis. This generalization is known as multi-bond graphs or vector bond graphs, see Breedveld (1982), Ingrim and Y. (1991), and references therein. In the bond graph framework, the causality of a model needs to be determined in order to generate model equations in ode form. However, the most frequently used technique for assigning causality to the bond graph, named the Sequential Causality Assignment Procedure (Rosenberg and Karnopp, 1983, section 4.3), suffers from a potential problem with combinatorial blow-up. One way of avoiding this problem is to generate a dae instead.

Although some chemical processes can be modeled using bond graphs, this framework is rarely mentioned in recent literature on dae modeling in the chemistry domain. Rather, equation-based formulations prevail, and according to Unger et al. (1995), most models have the quasilinear form. The amount of dae research within the field of chemistry is remarkable, which is likely due to the extensive applicability of dae in a profitable business where high fidelity models are a key to better control strategies.


2.2.2 Common forms

Having presented the general idea of finding suitable model classes to work with in section 2.1.4, this section contains some common cases from the dae world. As we are moving our focus away from the automatic control applications that motivate our research, towards questions of a more generic mathematical kind, our notation changes; instead of using model class, we will now speak of the form of an equation.

We begin with some repetition of notation defined in section 1.6.

Beginning with the overly simple, an autonomous lti dae has the form

E x′(t) + A x(t) != 0 (2.15)

where E and A are constant matrices. A large part of this thesis is devoted to the study of this form. Adding forcing functions (often representing external inputs) while maintaining the lti property leads to the general lti dae form

E x′(t) + A x(t) + B u(t) != 0 (2.16)

where u is a vector-valued function representing external inputs to the model, and B is a constant matrix. The function u may be subject to various assumptions.

2.1 Example

In automatic control, system inputs are often computed as functions of the system state or an estimate thereof — this is called feedback — but such inputs are not external. To see how such feedback loops may be conveniently modeled using dae models, let

EG x′(t) + AG x(t) + [ BG1  BG2 ] [ u1(t) ]
                                  [ u2(t) ] != 0          (2.17)

be a model of the system without the feedback control. Here, the inputs to the system have been partitioned into one part, u1, which will later be given by feedback, and one part, u2, which contains the truly external inputs to the feedback loop. Let

EH x̂′(t) + AH x̂(t) + [ BH1  BH2 ] [ u1(t) ]
                                   [ u2(t) ] != 0          (2.18)

be the equations of the observer, generating the estimate x̂ of the true state x. Finally, let a simple feedback be given by

u1(t) != L x̂(t)          (2.19)

Now, it is more of a matter of taste whether to consider the three equations (2.17), (2.18), and (2.19) to be in the form (2.16) or not; if not, it just remains to note that if u1 is made an internal variable of the model, the equations can be written

[ EG  0   0 ] [ x′(t)  ]   [ AG  0   BG1 ] [ x(t)  ]   [ BG2 ]
[ 0   EH  0 ] [ x̂′(t)  ] + [ 0   AH  BH1 ] [ x̂(t)  ] + [ BH2 ] u2(t) != 0          (2.20)
[ 0   0   0 ] [ u′1(t) ]   [ 0   −L  I   ] [ u1(t) ]   [ 0   ]


Of course, eliminating u1 from these equations would be trivial; substituting (2.19) gives

[ EG  0  ] [ x′(t) ]   [ AG  BG1 L      ] [ x(t) ]   [ BG2 ]
[ 0   EH ] [ x̂′(t) ] + [ 0   AH + BH1 L ] [ x̂(t) ] + [ BH2 ] u2(t) != 0

but the purpose of this example is to show how the model can be written in a form that is both a little easier to formulate and better at displaying the logical structure of the model.
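The bookkeeping of this example can be checked mechanically. The following sketch (our own; numpy assumed, dimensions and matrix values hypothetical) assembles the A-matrix of the augmented form with variables ordered (x, x̂, u1), assuming the feedback u1(t) = L x̂(t) from (2.19), and verifies that substituting the feedback reproduces the u1-eliminated coefficient matrix:

```python
import numpy as np

rng = np.random.default_rng(0)
n, m1 = 2, 1  # hypothetical dimensions: subsystem states, u1

# Hypothetical component matrices; the names follow the text.
AG, BG1 = rng.standard_normal((n, n)), rng.standard_normal((n, m1))
AH, BH1 = rng.standard_normal((n, n)), rng.standard_normal((n, m1))
L = rng.standard_normal((m1, n))
Z = np.zeros

# A-matrix of the augmented form, variables ordered (x, xhat, u1);
# the last block row encodes the feedback constraint -L xhat + u1 = 0.
A_aug = np.block([[AG,         Z((n, n)),  BG1],
                  [Z((n, n)),  AH,         BH1],
                  [Z((m1, n)), -L,         np.eye(m1)]])

# A-matrix after eliminating u1 by substituting u1 = L xhat.
A_red = np.block([[AG,        BG1 @ L],
                  [Z((n, n)), AH + BH1 @ L]])

# Mechanical check: composing the first two block rows with the
# embedding (x, xhat) -> (x, xhat, L xhat) gives the reduced matrix.
T = np.block([[np.eye(n),  Z((n, n))],
              [Z((n, n)),  np.eye(n)],
              [Z((m1, n)), L]])
assert np.allclose(A_aug[:2 * n, :] @ T, A_red)
```

The same pattern extends to the E- and B-matrices, whose rows are unaffected by the substitution.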

One way to generalize the form (2.16) is to remove the restriction to time-invariant equations. This leads to the linear time-varying form of dae:

E( t ) x′(t) + A( t ) x(t) + B( t ) u(t) != 0 (2.21)

While this form explicitly displays what part of the system's time variability is due to “external inputs”, one can, without loss of generality, assume that the equations are in the form

E( t ) x′(t) + A( t ) x(t) != 0 (2.22)

This is seen by (rather awkwardly) writing (2.21) as

[ E( t )  0 ] [ x′(t) ]   [ A( t )  B( t ) u(t) ] [ x(t) ]
[ 0       I ] [ α′(t) ] + [ 0       0           ] [ α(t) ] != 0

α(t0) != 1

where the variable α has been included as an awkward way of denoting the constant 1. Still, the form (2.21) is interesting as it stands since it can express logical structure in a model, and if algorithms exploit that structure, one may obtain more efficient implementations or results that are easier to interpret. In addition, it should be noted that the model structures are not fully specified without stating what constraints the various parts of the equations must satisfy. If one can handle a larger class of functions representing external inputs in the form (2.21) than the class of functions at the algebraic term in (2.22), there are actually systems in the form (2.21) which cannot be represented in the form (2.22). The same kind of considerations should be made when considering the form

E( t ) x′(t) + A( t ) x(t) + f (t) != 0 (2.23)

as a substitute for (2.21).

A natural generalization of (2.23) is to allow dependence on all variables where (2.23) only allows dependence on t. At the risk of losing structure in problems with external inputs etc., the resulting equations are then in the quasilinear form, repeated here,

E( x(t), t ) x′(t) + A( x(t), t ) != 0 (2.14)

The most general form of dae is

f ( x′(t), x(t), t ) != 0 (2.24)


but it takes some analysis to realize why writing this equation as

f( x̄(t), x(t), t ) != 0
x̄(t) − x′(t) != 0          (2.25)

does not show that (2.14) is the most general form we need to consider.

So far, we have considered increasingly general forms of dae without considering how the equations can be analyzed. For instance, modeling often leads to equations which are clearly separated into differential and non-differential equations, and this structure is often possible to exploit. Since discussion of the following forms requires the reader to be familiar with the contents of section 2.2.3, the forms will only be mentioned briefly to give some intuition about what forms with this type of structural property may look like. What follows is a small and rather arbitrary selection of the forms discussed in Brenan et al. (1996).

The semi-explicit form looks like

x′1(t) != f1( x1(t), x2(t), t )
0 != f2( x1(t), x2(t), t )          (2.26)

and one often speaks of semi-explicit index-1 dae (the concept of an index will be discussed further in section 2.2.3), which means that the function f2 is such that x2 can be solved for:

∇2f2 is square and non-singular (2.27)
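To make (2.26) and (2.27) concrete, here is a minimal numerical sketch (the particular functions are our own invention): since ∇2f2 is non-singular everywhere, x2 can be recovered from the algebraic equation at every instant, and the dae can be integrated much like an ode.

```python
import numpy as np

# A hypothetical semi-explicit dae in the form (2.26):
#   x1'(t) = f1(x1, x2, t) = -x1 + x2
#   0      = f2(x1, x2, t) = x1 + x2**3 + x2 - 1
f1 = lambda x1, x2, t: -x1 + x2
f2 = lambda x1, x2, t: x1 + x2**3 + x2 - 1

# The index-1 condition (2.27): here grad_2 f2 = 3 x2**2 + 1 is a
# scalar that never vanishes, so x2 can always be solved for from f2.
d2f2 = lambda x1, x2, t: 3 * x2**2 + 1

def solve_x2(x1, t, x2=0.0):
    # Newton iteration on the algebraic equation f2 = 0.
    for _ in range(50):
        x2 -= f2(x1, x2, t) / d2f2(x1, x2, t)
    return x2

# Explicit Euler on the differential part, re-solving the algebraic
# part at every step, as one would for a semi-explicit index-1 dae.
x1, t, h = 0.0, 0.0, 1e-3
for _ in range(1000):
    x2 = solve_x2(x1, t)
    x1, t = x1 + h * f1(x1, x2, t), t + h
assert abs(f2(x1, solve_x2(x1, t), t)) < 1e-10  # stays on the constraint
```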

Another often-used form is the Hessenberg form of size r,

x′1(t) != f1( x1(t), x2(t), . . . , xr(t), t )
x′2(t) != f2( x1(t), x2(t), . . . , xr−1(t), t )
       ⋮
x′i(t) != fi( xi−1(t), xi(t), . . . , xr−1(t), t )
       ⋮
0 != fr( xr−1(t), t )          (2.28)

where it is required that

( ∂fr( xr−1, t ) / ∂xr−1 ) ( ∂fr−1( xr−2, . . . , xr−1, t ) / ∂xr−2 ) · · · ( ∂f2( x1, x2, . . . , xr−1, t ) / ∂x1 ) ( ∂f1( x1, x2, . . . , xr, t ) / ∂xr )          (2.29)

is non-singular.
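A minimal numerical illustration (a linear example of our own): for size r = 2 the product (2.29) reduces to (∂f2/∂x1)(∂f1/∂x2), so checking the Hessenberg property amounts to a rank test on a product of two Jacobians.

```python
import numpy as np

# Hypothetical size-2 Hessenberg dae:
#   x1'(t) = f1(x1, x2, t) = F x1 + G x2
#   0      = f2(x1, t)     = H x1 - g(t)
# Condition (2.29): (df2/dx1)(df1/dx2) = H G must be non-singular.
F = np.array([[0.0, 1.0], [-1.0, 0.0]])
G = np.array([[0.0], [1.0]])
H = np.array([[0.0, 1.0]])

HG = H @ G  # the 1x1 product of the two Jacobians in (2.29)
assert np.linalg.matrix_rank(HG) == 1  # non-singular: Hessenberg of size 2
```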


2.2.3 Indices and their deduction

In the previous sections, we have spoken of the index of a dae and of index reduction, and we have used these notions as if they were well defined. This is not the case; there are many definitions of indices. In this section, we will mention some of these definitions, and define what shall be meant by just index (without qualification) in the remainder of the thesis. We shall do this at somewhat greater length than is needed for the following chapters, since this is a good way of introducing readers with little or no experience of dae to typical dae issues.

At least three categories of indices can be identified:

• For equations that relate forcing functions to the equation variables, there are indices that are equal for any two equivalent equations. In other words, these indices are not a property of the equations per se, but of the abstract system defined by the equations.

• For equations written in particular forms, one can introduce perturbations or forcing functions at predefined slots in the equations, and then define indices that tell how the introduced elements are propagated to the solution. Since equivalence of equations generally does not account for the slots, these indices are generally not the same for two equations considered equivalent. In other words, these indices are a property of the equations per se, but are still defined abstractly without reference to how they are computed.

• Analysis (for instance, revealing the underlying ordinary differential equation on a manifold) and solution of dae have given rise to many methods, and one can typically identify some natural number for each method as a measure of how involved the equations are. This defines indices based on methods. Basically these are a property of the equations, but can generally not be defined abstractly without reference to how to compute them.

The above categorization is not clear-cut in every case. For instance, an index which was originally formulated in terms of a method may later be given an equivalent but more abstract definition.

Sometimes, when modeling follows certain patterns, the resulting equations may be of known index (of course, one has to specify which index is referred to). It may then be possible to design special-purpose algorithms for automatic control tasks such as simulation, system identification or state estimation.

In this thesis, we regard the solution of initial value problems as a key to understanding other aspects of dae in automatic control. We are not so much interested in the mathematical questions of exactly when solutions exist or how the solutions may be described abstractly, but often think in terms of numerical implementation. For equations of unknown, higher index, all existing approaches to numerical solution of initial value problems that we know of perform index reduction so that one obtains equations of low index (typically 0 or 1), which can then be fed to one of the many available solvers for such equations. The index reduction algorithm used in the following chapters on singular perturbation (described in chapter 3) relates to


the differentiation index, which we will define first in terms of this algorithm. We will then show an equivalent but more abstract definition. See Campbell and Gear (1995) for a survey (although by now incomplete) of various index definitions and for examples of how different indices may be related.

The algorithm that we use to reveal the differentiation index is a so-called elimination-differentiation approach. Such approaches have been in use for a long time, and as is often the case in the area of dynamic systems, the essence of the idea is best introduced by looking at linear time-invariant (lti) systems, while the extension to nonlinearities brings many subtleties to the surface. The linear case was considered in Luenberger (1978), and the algorithm is commonly known as the shuffle algorithm.

For notational convenience in algorithm 2.1 (on page 30), we recall the following definition from section 1.6.1:

u′{i} ≜ ( u, u′, . . . , u′(i) )

In the algorithm, there is a clear candidate for an index: the final value of i. We make this our definition of the differentiation index.

2.2 Definition (Differentiation index). The differentiation index of a square lti dae is given by the final value of i in algorithm 2.1.

While the compact representation of lti systems makes the translation of theory to computer programs rather straightforward, the implementation of nonlinear theory is not at all as straightforward. This seems, at least in part, to be explained by the fact that there are no widespread computer tools for working with the mathematical concepts from differential geometry. A theoretical counterpart (called the structure algorithm, see section 2.2.5) of the shuffle algorithm, applying to general nonlinear dae, was used in Rouchon et al. (1992). However, its implementation is nontrivial since it requires a computable representation of the function whose existence is granted by the implicit function theorem. For quasilinear dae, on the other hand, an implicit function can be computed explicitly, and our current interest in these methods owes to this fact. For references to implementation-oriented index reduction of quasilinear dae along these lines, see for example Visconti (1999) or Steinbrecher (2006). Instead of extending the above definition of the differentiation index of square lti dae to the quasilinear form, we shall make a more general definition, which we will prove is a generalization of the former.

The following definition of the differentiation index of a general nonlinear dae can be found in Campbell and Gear (1995). It should be mentioned, though, that the authors of Campbell and Gear (1995) are not in favor of using this index to characterize a model, and define replacements. On the other hand, in the context of particular algorithms, the differentiation index may nevertheless be a relevant characterization.


Algorithm 2.1 The shuffle algorithm.

Input: A square lti dae,

E x′(t) + A x(t) + B u(t) != 0

Output: An equivalent non-square dae consisting of a square lti dae with non-singular leading matrix (and redefined forcing function) and a set C = ⋃i Ci of linear equality constraints involving x and u′{i} for some i.

Algorithm:
E0 ≔ E
A0 ≔ A
B0 ≔ B
i ≔ 0
while Ei is singular
    Manipulate the equations by row operations so that Ei becomes partitioned as [ Ēi ; E̲i ] (a semicolon denoting stacking of block rows), where Ēi has full rank and E̲i = 0. This can be done by, for instance, Gaussian elimination or QR factorization. Perform the same row operations on the other matrices, and partition the results similarly.
    Ci ≔ ( A̲i x + B̲i u′{i} != 0 )
    Ei+1 ≔ [ Ēi ; A̲i ]
    Ai+1 ≔ [ Āi ; 0 ]
    Bi+1 ≔ [ B̄i  0 ; 0  B̲i ]
    i ≔ i + 1
    if i > dim x
        abort with “ill-posed”
    end
end

Remark: The new matrices computed in each iteration simply correspond to differentiating the equations from which the differentiated variables have been removed by the row operations. (This should clarify the notation used in the construction of the Bi+1: differentiation turns the entries of u′{i} into entries of u′{i+1}, which shifts B̲i one block to the right.) Since the row operations generate equivalent equations, and the equations that get differentiated are also kept unaltered in C, it is seen that the output equations are equivalent to the input equations.

See the notes in algorithm 2.2 regarding geometric differentiation, and note that as-sumptions about constant Jacobians are trivially satisfied in the lti case.
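Algorithm 2.1 can also be sketched in a few lines of Python (our own sketch; numpy assumed). The constraint set C and the forcing-function bookkeeping are omitted, and the row operations are taken from an svd, so only the final value of i — the differentiation index of definition 2.2 — is produced:

```python
import numpy as np

def shuffle_index(E, A, tol=1e-9):
    """Final value of i in algorithm 2.1 for the square lti dae
    E x'(t) + A x(t) != 0 (forcing-function bookkeeping omitted)."""
    E, A = np.asarray(E, float), np.asarray(A, float)
    n = E.shape[0]
    for i in range(n + 1):
        s = np.linalg.svd(E, compute_uv=False)
        rank = int(np.sum(s > tol))
        if rank == n:
            return i                      # Ei non-singular: stop
        # Row operations: an orthogonal U.T brings Ei to [Ebar; 0],
        # with Ebar of full row rank; apply the same to Ai.
        U, _, _ = np.linalg.svd(E)
        Er, Ar = U.T @ E, U.T @ A
        # "Shuffle": differentiate the algebraic rows, i.e. move their
        # A-part up into the leading matrix and zero it in Ai+1.
        E = np.vstack([Er[:rank], Ar[rank:]])
        A = np.vstack([Ar[:rank], np.zeros((n - rank, n))])
    raise ValueError("ill-posed (index would exceed dim x)")

# x1' + x2 = 0, x1 = 0: a classical index-2 pair.
E2 = np.array([[1.0, 0.0], [0.0, 0.0]])
A2 = np.array([[0.0, 1.0], [1.0, 0.0]])
assert shuffle_index(np.eye(2), A2) == 0             # an implicit ode
assert shuffle_index(np.diag([1.0, 0.0]), np.eye(2)) == 1
assert shuffle_index(E2, A2) == 2
```

The absolute rank tolerance is a simplification; as discussed later in this section, deciding ranks numerically is precisely where uncertain dae become delicate.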


Consider the general nonlinear dae

f ( x′(t), x(t), t ) != 0 (2.30)

By using the notation

x′{i}(t) = ( x(t), x′(t), . . . , x′(i)(t) )          (2.31)

the general form can be written f0( x′{1}(t), t ) != 0. Note that differentiation with respect to t yields an equation which can be written f1( x′{2}(t), t ) != 0. Introducing the derivative array

respect to t yields an equation which can be written f1( x{2}(t), t ) != 0. Introducingthe derivative array

Fi( x′{i+1}(t), t ) =

f0( x′{1}(t), t )

...fi( x′{i+1}(t), t )

(2.32)

the implied equation

Fi( x′{i+1}(t), t ) != 0          (2.33)

is called the derivative array equations accordingly.

2.3 Definition (Differentiation index). Suppose (2.30) is solvable. If x′(t) is uniquely determined given x(t) and t by the non-differential equation (2.33), for all x(t) and t such that a solution exists, and νD is the smallest i for which this is possible, then νD is denoted the differentiation index of (2.30).
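For lti dae, this definition can be checked mechanically: stack the derivative array as a linear system in the unknowns (x′, . . . , x(i+1)) and test whether the x′-block is uniquely determined, which happens exactly when dropping those columns lowers the rank by dim x. A numerical sketch (our own; numpy assumed, and u is ignored since it only shifts the right-hand side):

```python
import numpy as np

def diff_index_via_array(E, A, max_i=10, tol=1e-9):
    """Smallest i such that the derivative array equations determine
    x' uniquely given x (definition 2.3), for E x' + A x + B u != 0."""
    E, A = np.asarray(E, float), np.asarray(A, float)
    n = E.shape[0]
    for i in range(max_i + 1):
        m = i + 1
        # Block row j: A x^(j) + E x^(j+1) = data, j = 0..i;
        # unknowns are x', ..., x^(i+1) while x is given data.
        M = np.zeros((m * n, m * n))
        for j in range(m):
            M[j*n:(j+1)*n, j*n:(j+1)*n] = E
            if j + 1 < m:
                M[(j+1)*n:(j+2)*n, j*n:(j+1)*n] = A
        full = np.linalg.matrix_rank(M, tol)
        rest = 0 if m == 1 else np.linalg.matrix_rank(M[:, n:], tol)
        if full - rest == n:  # the x'-columns are uniquely determined
            return i
    return None

E2 = np.array([[1.0, 0.0], [0.0, 0.0]])
A2 = np.array([[0.0, 1.0], [1.0, 0.0]])
assert diff_index_via_array(np.eye(2), A2) == 0
assert diff_index_via_array(np.diag([1.0, 0.0]), np.eye(2)) == 1
assert diff_index_via_array(E2, A2) == 2  # agrees with the shuffle algorithm
```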

Next, we show that the two definitions of the differentiation index are compatible.

2.4 Theorem. Definition 2.3 generalizes definition 2.2.

Proof: Consider the derivative array equations Fi( x′{i+1}, t ) != 0 for the square lti dae of definition 2.2:

[ A0  E0             ] [ x      ]   [ B u(t)     ]
[     A0  E0         ] [ x′     ]   [ B u′(t)    ]
[         ⋱   ⋱      ] [ ⋮      ] + [ ⋮          ] != 0          (2.34)
[            A0  E0  ] [ x(i+1) ]   [ B u′(i)(t) ]

Suppose definition 2.2 defines the index as i. Then Ei in algorithm 2.1 is non-singular by definition. The first row elimination of the shuffle algorithm on (2.34) yields (with bars and underlines as in algorithm 2.1)

[ Ā0  Ē0             ] [ x      ]   [ B̄0 u(t)     ]
[ A̲0                 ] [ x′     ]   [ B̲0 u(t)     ]
[     Ā0  Ē0         ] [ ⋮      ]   [ B̄0 u′(t)    ]
[     A̲0             ] [ x(i+1) ] + [ B̲0 u′(t)    ] != 0
[         ⋱   ⋱      ]              [ ⋮           ]
[            Ā0  Ē0  ]              [ B̄0 u′(i)(t) ]
[            A̲0      ]              [ B̲0 u′(i)(t) ]

Page 52: Differential-algebraic equations and matrix-valued singular …276784/... · 2009-11-24 · ing. Martin Enqvist and Daniel Petersson contributed with outstandingly thorough proofreading

32 2 Theoretical background

Reordering the rows as

[ Ā0  Ē0                 ] [ x      ]   [ B̄0 u(t)       ]
[     A̲0                 ] [ x′     ]   [ B̲0 u′(t)      ]
[     Ā0  Ē0             ] [ ⋮      ]   [ B̄0 u′(t)      ]
[          ⋱   ⋱         ] [ x(i+1) ] + [ ⋮             ] != 0          (2.35)
[             Ā0  Ē0     ]              [ B̄0 u′(i−1)(t) ]
[                 A̲0     ]              [ B̲0 u′(i)(t)   ]
[                 Ā0  Ē0 ]              [ B̄0 u′(i)(t)   ]
[ A̲0                     ]              [ B̲0 u(t)       ]

and ignoring the last two rows, this can be written

[ A1  E1             ] [ x    ]
[     A1  E1         ] [ x′   ]
[         ⋱   ⋱      ] [ ⋮    ] + · · · != 0
[            A1  E1  ] [ x(i) ]

using the notation in algorithm 2.1. The forcing function u has been suppressed for brevity. After repeating this procedure i times, one obtains

[ Ai  Ei ] [ x  ]
           [ x′ ] + · · · != 0

which shows that definition 2.2 gives an upper bound on the index defined by defi-nition 2.3.

Conversely, it suffices to show that the last two rows of (2.35) do not contribute to the determination of x′. The last row only restricts the feasible values for x, which is considered given in the equation. The second-last row contains no information that can be propagated to x′, since it can be satisfied for any x(i) by a suitable choice of x(i+1) (which appears in no other equation). Since this shows that no information about x′ was discarded, we have also found that if the index as defined by definition 2.2 is greater than i, then Ei is singular, and hence the index as defined by definition 2.3 must also be greater than i. That is, definition 2.2 gives a lower bound on the index defined by definition 2.3.

Many other variants of differentiation index definitions can be found in Campbell and Gear (1995), which also provides the relevant references. However, they avoid discussing geometric definitions of differentiation indices. While not important for lti dae, where the representation by numeric matrices successfully captures the geometry of the equations, geometric definitions turn out to be important for nonlinear dae. This is emphasized in Thomas (1996), which summarizes results by other authors (Rabier and Rheinboldt, 1994; Reich, 1991; Szatkowski, 1990, 1992). It is noted that the geometrically defined differentiation index is bounded by the dimension of the equations, and cannot be computed reliably using numerical methods; the indices which can be computed numerically are not geometric and may not be bounded even for well-posed equations. The presentation in Thomas (1996) is further developed in Reid et al. (2001) to apply also to partial differential-algebraic equations.


Having discussed the differentiation index with its strong connection to algorithms, we now turn to an index concept of another kind, namely the perturbation index. The following definition is taken from Campbell and Gear (1995), which refers to Hairer et al. (1989).

2.5 Definition. The dae f( x′(t), x(t), t ) != 0 has perturbation index νP along a solution x on the interval I = [ 0, T ] if νP is the smallest integer such that whenever x̂ satisfies

f( x̂′(t), x̂(t), t ) != δ(t)

for sufficiently smooth δ, there is an estimate†

‖x̂(t) − x(t)‖ ≤ C ( ‖x̂(0) − x(0)‖ + ‖δ‖^t_{νP−1} )

Clearly, one can define a whole range of perturbation indices by considering various “slots” in the equations, and each form of the equations may have its own natural slots. There are two aspects of these indices we would like to emphasize. First, they are defined completely without reference to a method for computing them, and in this sense they seem closer to capturing intrinsic features of the system described by the equations than indices that are defined by how they are computed. Second, and on the other hand, the following example shows that these indices may be strongly related to which set of equations is used to describe a system.
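Two minimal point cases may help parse the definition (our own illustrations, with x̂ denoting the perturbed solution): an explicit ode has perturbation index 0, while a purely algebraic equation has perturbation index 1.

```latex
\text{For } x'(t) \overset{!}{=} 0:\quad
\|\hat{x}(t)-x(t)\|
  \le \|\hat{x}(0)-x(0)\| + \int_0^t \|\delta(\tau)\|\,\mathrm{d}\tau
  = \|\hat{x}(0)-x(0)\| + \|\delta\|^t_{-1}
\quad\Longrightarrow\quad \nu_P = 0.
\\[1ex]
\text{For } x(t) \overset{!}{=} 0:\quad
\|\hat{x}(t)-x(t)\| = \|\delta(t)\|
  \le \sup_{\tau\in[0,t]}\|\delta(\tau)\| = \|\delta\|^t_{0}
\quad\Longrightarrow\quad \nu_P = 1.
```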

2.6 Example
Consider computing the perturbation index of the dae

f ( x′(t), x(t), t ) != 0

We must then examine how the solution depends on the forcing perturbation func-tion δ in

f ( x′(t), x(t), t ) != δ(t)

Now, let the matrix K( x(t), t ) define a smooth, non-singular transform of the equa-tions, leading to

K( x(t), t ) f ( x′(t), x(t), t ) != 0

with perturbation index defined by examination of

K( x(t), t ) f ( x′(t), x(t), t ) != δ(t)

† Here, the norm with ornaments is defined by

‖δ‖^t_m = Σ_{i=0}^{m} sup_{τ ∈ [ 0, t ]} ‖δ′(i)(τ)‖ ,   m ≥ 0

‖δ‖^t_{−1} = ∫_0^t ‖δ(τ)‖ dτ


Trying to relate this to the original perturbation index, we could try rewriting the equations as

f( x′(t), x(t), t ) != K( x(t), t )^−1 δ(t)

but this introduces x(t) on the right hand side, and is no good. Further, since the perturbation index does not give bounds on the derivative of the estimate error, there are no readily available bounds on the derivatives of the factor K( x(t), t )^−1 that depend only on t.

In the special case when the perturbation index is 0, however, a bound on K allows us to translate a bound in terms of

‖K( x(t), t )^−1 δ(t)‖^t_{−1}

to a bound in terms of ‖δ(t)‖^t_{−1}. This shows that, at least, this way of rewriting the equations does not change the perturbation index.

It would be interesting to relate the differentiation index to the perturbation index, but we have already seen an example of how different index definitions can be related, and shall not dwell more on this. Instead, there is one more index we would like to mention, since it is instrumental to a well-developed theory and will be the starting point for chapter 5. This is the strangeness index, developed for time-varying linear dae in Kunkel and Mehrmann (1994), see also Kunkel and Mehrmann (2006).

Perhaps as the price for its ability to reveal a more intelligent characterization of a system than, for instance, the differentiation index, the strangeness index is somewhat expensive to compute. This becomes particularly evident in the associated method for solving initial value problems, where the index computations are performed at each step of the solution. This is addressed in the relatively recent Kunkel and Mehrmann (2004), see also Kunkel and Mehrmann (2006, remark 6.7 and remark 6.9). However, one caveat remains: the implications of determining ranks numerically are not understood, see for instance Kunkel and Mehrmann (2006, remark 6.7) or Kunkel et al. (2001, remark 8). The kind of results that are missing here are highly related to the matrix-valued perturbation problems considered in this thesis, although our analysis is related to the differentiation index rather than the strangeness index.

A quite different method which reduces the index is Pantelides' algorithm (Pantelides, 1988) and the dummy derivatives extension thereof (Mattsson and Söderlind, 1993). This technique is in extensive use in component-based modeling and simulation software for the Modelica language, such as Dymola (Mattsson et al., 2000; Brück et al., 2002) and OpenModelica (Fritzson et al., 2006a,b). A major difference between the previously discussed index reduction algorithms and Pantelides' algorithm is that the former use mathematical analysis to derive the new form, while the latter uses only the structure of the equations (the equation–variable graph). Since the equation–variable graph does not require the equations to be in any particular form, the technique is applicable to general nonlinear dae. While the graph-based technique can be expected to be misled by a change of variables and other manipulations of the equations (see section 1.2.1), it is well suited for the equations as they arise in the software systems mentioned above.


Hereafter, when speaking of just the index (without qualification), we refer to the differentiation index, often thinking of it as the number of steps required to shuffle the equations to an implicit ode.

In the presence of uncertainty, there are two more index concepts which we need to define. Thinking of the uncertain dae as some element (point) in a set of dae, we may use the term point dae to denote an exact dae. When a dae is uncertain, its index also becomes uncertain in the general case, which is emphasized by the next definition.

2.7 Definition (Pointwise index). Let just “index” refer to one of the notions of index for a point dae. The pointwise index of the uncertain dae

f( x(t), x′(t), t ) != 0 for some fixed f ∈ F

is defined as the set

{ index of ( f( x(t), x′(t), t ) != 0 ) : f ∈ F }

If the set contains exactly one element by construction or by assumption, we will reuse notation and let the pointwise index refer to this element instead of the set containing it.

We will often make assumptions regarding the pointwise index of a dae, and it is important to see that this is a way of removing unwanted point dae from the uncertain dae.
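As a trivial but complete illustration (our own example, not from the thesis), consider the uncertain family diag( 1, ε ) x′(t) + x(t) != 0 with ε ∈ [ 0, 0.1 ]: its pointwise index is the set {0, 1}, since ε ≠ 0 gives an implicit ode while ε = 0 leaves the purely algebraic equation x2 = 0.

```python
import numpy as np

# Pointwise index (definition 2.7) of the hypothetical uncertain lti dae
#   diag(1, eps) x'(t) + x(t) != 0,   eps in [0, 0.1].
# eps != 0: E is invertible, x' = -E^{-1} x, an implicit ode -> index 0.
# eps == 0: the second equation degenerates to x2 = 0 -> index 1.
def index_of(eps):
    E = np.diag([1.0, eps])
    return 0 if np.linalg.matrix_rank(E) == 2 else 1

samples = [0.0] + list(np.linspace(1e-3, 0.1, 5))
pointwise_index = {index_of(e) for e in samples}
assert pointwise_index == {0, 1}  # the index set of the uncertain dae
```

Assuming the pointwise index equal to 0 would amount to excluding the point dae with ε = 0 from the family.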

The second index concept appears when the uncertain dae is being approximated by one with less (or no) uncertainty, typically of higher index than the dae being approximated.

2.8 Definition (Nominal index). When an uncertain dae is being approximated by another dae where some of the uncertainty is removed, the nominal index refers to the index of the latter dae.

The nominal index is thus something defined by how the uncertain dae is being approximated, and there will generally be a trade-off between stiffness in an approximation of low nominal index, and large uncertainty bounds on the solution to an approximation of higher nominal index.

2.2.4 Transformation to quasilinear form

In this section, the transformation of a general nonlinear dae to quasilinear form is considered. This may seem like a topic for section 2.2.2, but since we need to refer to the index concept, it has been deferred until after section 2.2.3.

For ease of notation, we shall only deal with equations without explicit dependence on the time variable in this section. This way, it makes sense to write a time-invariant nonlinear dae as

f ( x, x′ , x′′ , . . . ) != 0 (2.36)


The variable in this equation is the function x, and the zero on the right hand side must be interpreted as the mapping from all of the time domain to the constant real vector 0. We choose to interpret the equality relation of the equation pointwise, although other measure-zero interpretations could be made (we are not seeking new semantics, only a shorter notation compared to (2.24)). Including higher order derivatives in the form (2.36) may seem like just a minor convenience compared to using only first order derivatives in (2.24), but some authors remark that this is not always the case (see, for instance, Mehrmann and Shi (2006)), and this is a topic for the discussion below.

The time-invariant quasilinear form looks like

E( x ) x′ + A( x ) != 0 (2.37)

Assuming that (2.36) has index νD but is not in the form (2.37), can we say something about the index of the corresponding (2.37)?

Not being in the form (2.37) can be for two reasons:

• There are higher-order derivatives.

• The residuals are not linear in the derivatives.

To remedy the first, one simply introduces new variables for the derivatives of all orders from one up to, but not including, the highest order. Of course, one also adds the equations relating the introduced variables to the derivatives they represent; each new variable gets one associated equation. This procedure does not raise the index, since the derivatives which have to be solved for really have not changed. If the highest order derivatives could be solved for in terms of lower-order derivatives after νD differentiations of (2.36), they will be possible to solve for in terms of the augmented set of variables after νD differentiations of (2.37) (of course, there is no need to differentiate the introduced trivial equations). The introduced variables' derivatives that must also be solved for are trivial (that is why the definitions of index do not have to mention solution of the lower-order derivatives).
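This first remedy can be written out for a single second-order equation (our own minimal illustration): introduce x1 for the derivative of order one; the highest derivative then appears as x′1.

```latex
f\bigl(x''(t),\, x'(t),\, x(t)\bigr) \overset{!}{=} 0
\quad\longrightarrow\quad
\left\{
\begin{aligned}
  f\bigl(x_1'(t),\, x_1(t),\, x(t)\bigr) &\overset{!}{=} 0\\
  x_1(t) - x'(t) &\overset{!}{=} 0
\end{aligned}
\right.
```

Solving for the highest derivative x″ in the original equation is the same as solving for x′1 in the reduced system, which is why the index is unaffected.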

Turning to the less trivial reason, nonlinearity in derivatives, the fix is still easy; introduce new variables for the derivatives that appear nonlinearly and add the linear (trivial) equations that relate the new variables to derivatives of the old variables; change

f ( x, x′ ) != 0 (2.38)

to

x′ != x̄
f( x, x̄ ) != 0

Note the important difference to the previous case: this time we introduce new variables for some highest-order derivatives. This may have implications for the index. If the index was previously defined as the number of differentiations required to be


able to solve for x′, we must now be able to solve for x̄′ = x″. Clearly, this can be obtained by one more differentiation once x̄ has been solved for, as in the following example.

2.9 Example

Consider the index-0 dae

e^{x′2} != e^{x1}
x′1 != −x2

Taking this into the form (2.37) brings us to

x′2 != x̄
e^{x̄} != e^{x1}
x′1 != −x2

where x̄′ cannot be solved for immediately since it does not even appear. However, after differentiating the purely algebraic equation once, all derivatives can be solved for;

x′2 != x̄
e^{x̄} x̄′ != e^{x1} x′1
x′1 != −x2
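The index raise does not change the solution, which can be checked numerically (a sketch of our own, using explicit Euler; the algebraic equation e^{x̄} = e^{x1} is solved for x̄ in closed form): stepping the quasilinear transcription reproduces the original ode's trajectory exactly.

```python
# Example: the original index-0 dae
#   exp(x2') = exp(x1),  x1' = -x2     (i.e. x2' = x1, x1' = -x2)
# and its quasilinear transcription with xbar := x2', where xbar is
# fixed by the algebraic equation exp(xbar) = exp(x1), i.e. xbar = x1.
h, steps = 1e-3, 1000
x1, x2 = 1.0, 0.0  # original form
y1, y2 = 1.0, 0.0  # quasilinear form
for _ in range(steps):
    # Original form: derivatives available directly.
    x1, x2 = x1 + h * (-x2), x2 + h * x1
    # Quasilinear form: obtain xbar from the algebraic equation first.
    xbar = y1  # solves exp(xbar) = exp(y1)
    y1, y2 = y1 + h * (-y2), y2 + h * xbar
assert abs(x1 - y1) < 1e-12 and abs(x2 - y2) < 1e-12
```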

However, the index is not raised in general; it is only in the case that the nonlinearly appearing derivatives could not be solved for in fewer than νD steps that the index is raised. The following example shows a typical case where the index is not raised.

2.10 Example

By modifying the previous example we get a system that is originally index-1,

e^{x′2} != e^{x1}
x′1 != −x2
x3 != 1

Taking this into the form (2.37) brings us to

x′2 != x̄
e^{x̄} != e^{x1}
x′1 != −x2
x3 != 1


which is still index-1 since all derivatives can be solved for after one differentiation of the algebraic equations:

x′2 != x̄
e^{x̄} x̄′ != e^{x1} x′1
x′1 != −x2
x′3 != 0

Although the transformation discussed here may raise the index, it may still be a useful tool in case the equations and forcing functions are sufficiently differentiable. The transformation has been implemented as a part of a tool for finding the quasilinear structure in equations represented in general form. However, even though automatic transformation to quasilinear form is possible, it should be noted that formulating equations in the quasilinear form is a critical part of the modeling process, and should be done carefully. This is emphasized in the works on dae with properly stated leading terms by März and coworkers (Higueras and März, 2004; März and Riaza, 2006, 2007, 2008).

2.2.5 Structure algorithm

The application of the structure algorithm to dae described in this section is due to Rouchon et al. (1992), which relies on results in Li and Feng (1987).

The structure algorithm was developed for the purpose of computing inverse systems; that is, to find the input signal that produces a desired output. It assumes that the system's state evolution is given by an ode and that the output is a function of the state and current input. Since the desired output is a known function, it can be included in the output function; that is, it can be assumed without loss of generality that the desired output is zero. The algorithm thus provides a means to determine u in the setup

x′(t) != h( x(t), u(t), t )
0 != f( x(t), u(t), t )

The algorithm produces a new function η such that u can be determined from 0 != η( x, u, t ). By taking h( x, u, t ) = u, this reduces to a means for determining the derivatives of the variables x in the dae

0 != f ( x(t), x′(t), t )

In algorithm 2.2 we give the algorithm applied to the dae setup. It is assumed that dim f = dim x, that is, that the system is square.


Algorithm 2.2 The structure algorithm.

Input: A square dae,

f( x(t), x′(t), t ) != 0

Output: An equivalent non-square dae consisting of a square dae from which x′ can be solved for, and a set of constraints C = ∧i ( Φi( x(t), t, 0 ) != 0 ). Let α be the smallest integer such that ∇2 fα( x, ẋ, t ) has full rank, or ∞ if no such number exists.

Invariant: The sequence of fk shall be such that the solution is always determined by fk( x, ẋ, t ) = 0, which is fulfilled for f0 by definition. Conversely, this will make fk( x(t), x′(t), t ) = 0 along solutions.

Algorithm:
f0 = f
i := 0
while ∇2 fi( x, ẋ, t ) is singular

    Since the rank of ∇2 fi( x, ẋ, t ) is not full, it makes sense to split fi into two parts; f̂i being a selection of components of fi such that ∇2 f̂i( x, ẋ, t ) has full and maximal rank (that is, the same rank as ∇2 fi( x, ẋ, t )), and f̃i being the remaining components.

    Locally (and as all results of this kind are local anyway, this will not be further emphasized), this has the interpretation that the dependency of f̃i on ẋ can be expressed in terms of f̂i( x, ẋ, t ) instead of ẋ; there exists a function Φi such that f̃i( x, ẋ, t ) = Φi( x, t, f̂i( x, ẋ, t ) ).

    Since f̃i( x(t), x′(t), t ) = 0 along solutions, we replace the equations given by f̃i by the residuals obtained by differentiating Φi( x(t), t, 0 ) with respect to t and substituting ẋ for x′;

    fi+1 = [ f̂i( x, ẋ, t ) ; ∇1Φi( x, t, 0 ) ẋ + ∇2Φi( x, t, 0 ) ]

    i := i + 1
    if i > dim x
        abort with "ill-posed"
    end
end

Remark: Assuming that all ranks of Jacobian matrices are constant, it is safe to abort after dim x iterations (Rouchon et al., 1992). Basically, this condition means that the equations are not used pointwise, but rather as geometrical (algebraic) objects. Hence, in the phrasing of Thomas (1996), differentiations are geometric, and α becomes analogous to the geometric differentiation index.

In Rouchon et al. (1992), additional assumptions on the selection of components to constitute f̂i are made, but we will not use those here.


2.2.6 LTI DAE, matrix pencils, and matrix pairs

The linear time-invariant dae

E x′(t) + A x(t) != 0 (2.39)

is closely connected to the concepts of matrix pencils and matrix pairs.

To the equation we may associate the matrix pencil s ↦ s E + A, and a large amount of dae analysis in the literature is formulated using matrix pencil theory. The sign convention we use (with addition instead of subtraction in the expression for the pencil) differs from much of the literature on lti dae (for instance, Stewart and Sun (1990)), but is natural in view of how the dae (2.39) is written, and is also the convention which generalizes to higher order matrix polynomials, compare Higham et al. (2006). We will not go deep into this theory in this background, since the theory is basically concerned with exactly known matrices, while the theme of this thesis is uncertain matrices. Just to show the close connection between (2.39) and the corresponding matrix pencil, note that the Laplace transform of the equation is

E ( s X(s) − x(0) ) + A X(s) != 0

or

( s E + A )X(s) != E x(0)

If the matrix pencil is invertible at some s, it will be invertible at almost every s; the pencil is then called regular, X(s) can be evaluated at almost every point, and it will be possible to find x(t) by inverse Laplace transform. In the other case, when the pencil is singular at every s, the pencil is called singular, and the next theorem explains why we will avoid singular pencils in this thesis.

2.11 Theorem. If the matrix pencil associated with the dae (2.39) is singular, the dae with initial conditions x(0) = 0 has an infinite number of solutions, including both bounded and exponentially increasing functions. Among the non-zero bounded solutions, there are solutions with arbitrarily slow exponential decay.

Proof: The following proof is similar to that of Kunkel and Mehrmann (2006, theorem 2.14).

Since λ E + A is singular for every λ, we can take n + 1 numbers {λi}_{i=1}^{n+1} and corresponding vectors { vi ≠ 0 }_{i=1}^{n+1} such that ( λi E + A ) vi = 0 for all i. To construct real solutions, we make the selection such that if Im λi ≠ 0, then the complex conjugate of λi appears as λj for some j, and the corresponding vi and vj are also related by complex conjugation. Since the number of elements in { vi }_{i=1}^{n+1} exceeds the dimension of the space, they are linearly dependent, and there is a linear combination which vanishes,

∑i αi vi = 0


where not all αi are zero. It follows that the function

x(t) ≜ ∑i αi vi e^{λi t}

is real-valued, satisfies x(0) = 0, is not identically zero, and solves (2.39).

Since the choice of {λi}_{i=1}^{n+1} was arbitrary up to the pairing of complex conjugates, and two disjoint such sets cannot produce the same solution, the number of solutions is infinite, and since the Re λi may be chosen all negative as well as all positive, there are both bounded and exponentially increasing solutions. Arbitrarily slow exponential decay is obtained by selecting all Re λi negative, but close to zero, and by setting one eigenvalue to zero, the solutions will have a non-zero constant asymptote.

From the linearity of the differential equation (2.39), it follows that if there exists a solution with non-zero initial conditions, it will not be unique, since all the solutions with zero initial conditions may be added to it to produce new solutions. In view of this, we call the dae singular (regular) if its matrix pencil is singular (regular).

2.12 Corollary. An lti dae with singular pencil does not have a finite index.

Proof: If it had, the definition of index would imply that the dae had a unique solution. This contradicts singularity of its pencil.

The next lemma helps us detect singular pencils.

2.13 Lemma. If the respective right null spaces of E and A intersect, or the respective left null spaces intersect, the pencil s ↦ s E + A is singular.

Proof: First, the case of intersecting right null spaces. Take a v ≠ 0 belonging to the intersection. Then v is a non-trivial solution to ( s E + A ) v != 0 for any s, and hence s E + A is not invertible for any s.

For intersecting left null spaces, take v ≠ 0 from the intersection. Then ( s E + A )ᵀ v != 0 has a nontrivial solution for every s. Hence det ( s E + A )ᵀ = det ( s E + A ) = 0 for every s.

The consequence of intersecting right null spaces is easy to characterize. It means that a change of variables will reveal one or more transformed variables that do not appear in the equations at all. Clearly, any differentiable functions passing through the origin will do as a solution for these variables, if the problem has a solution at all. However, the dae will generally have no solutions, since the remaining variables will be over-determined.

The case of intersecting left null spaces is illustrated as an example of theorem 2.11 at the end of the section.

For many purposes, the asymmetric roles of E and A in the matrix pencil may be disturbing, and it may make more sense to speak of the matrix pair ( E, A ) instead.


In accordance with lti dae, the first matrix in the pair (here, E) will be denoted the leading matrix of the pair, while the second matrix (here, A) will be denoted the trailing matrix of the pair. The notions of regular and singular carry over from matrix pencils to matrix pairs by the obvious association.

2.14 Definition (Eigenvalues of a matrix pair). The eigenvalues of the matrix pair ( E, A ) are defined as the equivalence classes of complex scalar pairs ( α, β ) ≠ ( 0, 0 ) such that there exists a vector x ≠ 0 for which

α E x + β A x != 0

Here, equivalence is defined as two pairs being equivalent if one equals the other times some complex scalar.*

By identifying the eigenvalue [( α, 0 )] with ∞, and any eigenvalue [( α, β )] where β ≠ 0 with the common ratio α/β, we may also consider the eigenvalues as belonging to C ∪ { ∞ }.

Two matrix pairs ( E1, A1 ) and ( E2, A2 ) are said to be equivalent if there exist non-singular matrices T and V such that E2 V = T E1 and A2 V = T A1. From the definition of eigenvalues, it is easy to see that two equivalent matrix pairs have the same eigenvalues.

The symmetric view on an eigenvalue as an equivalence class of pairs of scalars is the natural choice when the symmetric relation between the two matrices in a matrix pair is to be maintained. However, in our view of the matrix pair as a representation of an lti dae, the matrices in the pair do not have a truly symmetric relation — it is always the leading one which is multiplied by the scalar parameter in a matrix pencil. The following trivial theorem justifies the other view on matrix pair eigenvalues in this thesis.

2.15 Theorem. If E is non-singular in the matrix pair ( E, A ), then the eigenvalues of the pair are the same as the eigenvalues of the matrix −E⁻¹A.

Proof: The pair ( E, A ) is equivalent to ( I, E⁻¹A ). Clearly [( α, 0 )] is not an eigenvalue, so any eigenvalue is in the form [( α, 1 )], satisfying the equation

α x != −E⁻¹A x

This shows that [( α, 1 )], identified with α/1 = α, is also a matrix eigenvalue of −E⁻¹A. The argument also works in the converse direction.
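The identification in theorem 2.15 is easy to check numerically. The sketch below is an added illustration, not part of the thesis; the 2 × 2 pair is an arbitrary example. It computes the pair eigenvalues as the matrix eigenvalues of −E⁻¹A and verifies that each of them makes λ E + A singular, in agreement with definition 2.14 with β = 1.

```python
import numpy as np

# Hypothetical example pair (E, A) with E non-singular.
E = np.array([[2.0, 0.0],
              [0.0, 1.0]])
A = np.array([[1.0, 1.0],
              [0.0, 3.0]])

# Theorem 2.15: the pair eigenvalues are the eigenvalues of -E^{-1} A.
pair_eigs = np.linalg.eigvals(-np.linalg.solve(E, A))

# Definition 2.14 with beta = 1: each eigenvalue lambda = alpha/beta
# must make lambda*E + A singular.
for lam in pair_eigs:
    assert abs(np.linalg.det(lam * E + A)) < 1e-9

print(sorted(round(float(np.real(lam)), 6) for lam in pair_eigs))  # [-3.0, -0.5]
```

Here `np.linalg.solve(E, A)` stands in for forming E⁻¹A explicitly; for an ill-conditioned E one would instead use a generalized eigenvalue routine operating directly on the pair.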

The following theorem generalizes the Jordan canonical form for matrices to matrix pairs.

* The definition follows Stewart and Sun (1990), although the sign conventions differ due to different sign conventions for matrix pencils.


2.16 Theorem (Weierstrass canonical form). Let ( E, A ) be a regular matrix pair. Then it is equivalent to a pair in the form

( [ I 0 ]   [ J 0 ]
  [ 0 N ] , [ 0 I ] )        (2.40)

where J is in Jordan canonical form, and N is a nilpotent matrix in Jordan canonical form.

Proof: This result is easy to find in the literature, and we suggest Stewart and Sun (1990, chapter vi, theorem 1.13), since it has other connections to this thesis as well.

It is easy to see that the index of nilpotency of N in the canonical form coincides with the differentiation index of the corresponding dae, with the convention that the index of nilpotency of an empty matrix is 0.

2.17 Definition (Singular/regular uncertain matrix pair). An uncertain matrix pair is said to be singular if it admits at least one singular point matrix pair. Compare singular interval matrix. Otherwise, it is said to be regular.

The definitions of singular/regular are also applied to uncertain lti dae in the obvious manner.

The section ends with an example of a dae with singular matrix pair.

2.18 Example
Row and column reductions of the matrix pair are often useful tools to discover structure in linear dae. Row reduction corresponds to replacing equations by equivalent ones, while column reduction corresponds to invertible changes of coordinates. Suppose row reduction of the leading matrix of some pair resulted in

( [ 1 1 1 1 ]   [ 1 0 1 0 ]
  [ 0 1 1 1 ]   [ 0 1 1 2 ]
  [ 0 0 0 0 ] , [ 1 1 1 1 ]        (2.41)
  [ 0 0 0 0 ]   [ 1 1 1 1 ] )

where the lower part of the trailing matrix does not have full rank, since it has linearly dependent rows. By also performing row reduction on the trailing matrix, a common left null space is revealed,

( [ 1 1 1 1 ]   [ 1 0 1 0 ]
  [ 0 1 1 1 ]   [ 0 1 1 2 ]
  [ 0 0 0 0 ] , [ 1 1 1 1 ]
  [ 0 0 0 0 ]   [ 0 0 0 0 ] )

That is, the vector ( 0 0 0 1 ) proves that the matrix pencil is singular according to lemma 2.13, and next we will construct some of the solutions whose existence is given by theorem 2.11.
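The singularity can also be confirmed numerically. The following sketch (an illustration added here, not from the thesis) expresses the revealed common left null space as a left null vector w of the original pair (2.41); since the second row reduction subtracted row three from row four of the trailing matrix, w = ( 0 0 −1 1 ) works for the original pair. It then checks that det( s E + A ) vanishes for arbitrary s.

```python
import numpy as np

# The pair from (2.41).
E = np.array([[1., 1., 1., 1.],
              [0., 1., 1., 1.],
              [0., 0., 0., 0.],
              [0., 0., 0., 0.]])
A = np.array([[1., 0., 1., 0.],
              [0., 1., 1., 2.],
              [1., 1., 1., 1.],
              [1., 1., 1., 1.]])

# Common left null vector: rows 3 and 4 of E are zero, and rows 3 and 4
# of A are equal, so w annihilates both matrices from the left.
w = np.array([0., 0., -1., 1.])
assert np.allclose(w @ E, 0.0)
assert np.allclose(w @ A, 0.0)

# Hence w (s E + A) = 0 for every s, so the pencil is singular (lemma 2.13).
for s in np.random.default_rng(0).normal(size=5):
    assert abs(np.linalg.det(s * E + A)) < 1e-9
```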


Before we start solving for the right null space, column operations (that is, a change of variables) are applied to the leading matrix to reveal as many zeros as possible,

( [ 1 0 0 0 ]   [ 1 −1 1 0 ]
  [ 0 1 0 0 ]   [ 0  1 0 1 ]
  [ 0 0 0 0 ] , [ 1  0 0 0 ]
  [ 0 0 0 0 ]   [ 0  0 0 0 ] )

The pencil, at some point λ,

[ 1+λ  −1   1 0 ]
[ 0    1+λ  0 1 ]
[ 1    0    0 0 ]
[ 0    0    0 0 ]

can now be column reduced using a change of variables that will depend on λ, revealing its right null space:

[ 1+λ  −1   1 0 ] [ 1       0      0 0 ]   [ 0 0 1 0 ]
[ 0    1+λ  0 1 ] [ 0       1      0 0 ]   [ 0 0 0 1 ]
[ 1    0    0 0 ] [ −(1+λ)  1      1 0 ] = [ 1 0 0 0 ]
[ 0    0    0 0 ] [ 0      −(1+λ)  0 1 ]   [ 0 0 0 0 ]

Hence,

       [ 1       0      0 0 ] [ 0 ]   [ 0      ]
v(λ) ≜ [ 0       1      0 0 ] [ 1 ] = [ 1      ]
       [ −(1+λ)  1      1 0 ] [ 0 ]   [ 1      ]
       [ 0      −(1+λ)  0 1 ] [ 0 ]   [ −(1+λ) ]

shows how to find non-trivial elements in the right null space. Inspection shows that only one of the components of v(λ) actually depends on λ, so any set of three or more such vectors will be linearly dependent (form a matrix with the v(λ) as columns, and consider the row rank). Denoting three values for λ by {λi}_{i=1}^{3}, it can be seen that for every α,

α (λ2 − λ3 ) v(λ1 ) + α (λ3 − λ1 ) v(λ2 ) + α (λ1 − λ2 ) v(λ3 ) = 0

For the coefficients to be real, α must be purely imaginary if there is a pair of complex conjugates.

Hence,

                          [ 1 −1  0  0 ] (
x( t, α, λ1, λ2, λ3 ) ≜ α [ 0  1 −1 −1 ]   (λ2 − λ3) v(λ1) e^{λ1 t}
                          [ 0  0  1  0 ] + (λ3 − λ1) v(λ2) e^{λ2 t}
                          [ 0  0  0  1 ] + (λ1 − λ2) v(λ3) e^{λ3 t} )        (2.42)

is a family of nontrivial solutions (the matrix undoes the change of variables used to eliminate entries in the leading matrix). Figure 2.1 shows a random selection of bounded solutions.
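The claims about (2.42) can be verified numerically. The sketch below (an added illustration; the variable names and the choice of exponents are mine) checks for one choice of real exponents that the solution satisfies x(0) = 0, is not identically zero, and solves E x′ + A x = 0 for the pair (2.41).

```python
import numpy as np

# The pair from (2.41) and the variable-change matrix appearing in (2.42).
E = np.array([[1., 1., 1., 1.],
              [0., 1., 1., 1.],
              [0., 0., 0., 0.],
              [0., 0., 0., 0.]])
A = np.array([[1., 0., 1., 0.],
              [0., 1., 1., 2.],
              [1., 1., 1., 1.],
              [1., 1., 1., 1.]])
M = np.array([[1., -1.,  0.,  0.],
              [0.,  1., -1., -1.],
              [0.,  0.,  1.,  0.],
              [0.,  0.,  0.,  1.]])

def v(lam):  # right null space element from the example
    return np.array([0., 1., 1., -(1. + lam)])

lams = (-0.3, -0.5, -0.9)                     # three real exponents, alpha = 1
coef = (lams[1] - lams[2], lams[2] - lams[0], lams[0] - lams[1])

def x(t):     # the solution family (2.42)
    return M @ sum(c * v(l) * np.exp(l * t) for c, l in zip(coef, lams))

def xdot(t):  # its time derivative
    return M @ sum(c * l * v(l) * np.exp(l * t) for c, l in zip(coef, lams))

assert np.allclose(x(0.0), 0.0)               # zero initial condition...
assert np.linalg.norm(x(1.0)) > 1e-3          # ...but not identically zero
for t in np.linspace(0.0, 10.0, 11):
    assert np.allclose(E @ xdot(t) + A @ x(t), 0.0)   # solves (2.39)
```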


[Figure omitted: four panels of solution trajectories over t ∈ [0, 15].]

Figure 2.1: First coordinate of randomly generated solutions to the singular dae (2.41), given by (2.42). Random numbers have been sampled uniformly from the interval [ 0.1, 1 ]. Random real parts of exponents have been chosen with negative sign to produce bounded solutions. The number α in (2.42) is chosen with modulus 1, and such that real solutions are produced. Upper left: one random real exponent, and one pair of random complex conjugates. Upper right: one exponent at 0, and one pair of complex conjugates. Lower left: three random real exponents. Lower right: one exponent at 0, two random real exponents.


The example shows that if a singular dae is within the set of dae defined by an uncertain dae, the infinite set of solutions produced by the singular dae will contain solutions which could be the output of a regular system with any eigenvalues. Hence, the type of assumptions used in the thesis to restrict the set of solutions to the uncertain system, formulated in terms of system poles, will not be capable of ruling out the solutions of the singular dae. Indeed, the eigenvalues of a singular pencil are not even defined. On the other hand, as will be seen in section 7.1.2, the uncertain dae that admit singular ones will be possible to detect without additional assumptions, and this allows us to disregard this case as one which is not covered by our theory. Conversely, when our methods (including assumptions we have to make) show that the solutions to the dae are converging uniformly as the uncertainties tend to zero, this shows that the uncertain dae is regular, and hence that the singular uncertain dae that are excluded from our theory are exceptional in this sense.

2.2.7 Initial conditions

The reader might have noticed that the shuffle algorithm (on page 30) not only produces an index and an implicit ode, but also a set of constraints. These constrain the solution at any point in time, and the implicit ode is only to be used where the constraints are satisfied. The constraints are often referred to as the algebraic constraints, which emphasizes that they are non-differential equations. They can be explicit, as in the case of non-differential equations in the dae as it is posed, or implicit, as in the case of the output from the shuffle algorithm. Of course, the constraint equations are not unique, and it may well happen that some of the equations output from the shuffle algorithm were explicit in the original dae.

Making sure that numerical solutions to dae do not leave the manifold defined by the algebraic constraints is a problem in itself, and several methods to ensure this exist. However, in theory, no special methods are required, since the produced implicit ode is such that an exact solution starting on the manifold will remain on the manifold. This brings up another practical issue, namely that initial value problems are ill-posed if the initial conditions they specify are inconsistent with the algebraic constraints.

Knowing that a dae can contain implicit algebraic constraints, how can we know that all implicit constraints have been revealed at the end of the index reduction procedure? If the original dae is square, any algebraic constraints will be present in differentiated form in the index 0 square dae. This implies that the solution trajectory will be tangent to the manifold defined by the algebraic constraints, and hence it is sufficient that the initial conditions for an initial value problem are consistent with the algebraic constraints for the whole trajectory to remain consistent. In other words, there exist solutions to the dae starting at any point which is consistent with the algebraic constraints, and this shows that there can be no other implicit constraints.

We shall take a closer look at this problem in section 3.3. Until then, we just note that rather than rejecting initial value problems as ill-posed if the initial conditions they specify are inconsistent with algebraic constraints, one usually interprets the initial conditions as a guess, and then applies some scheme to find truly consistent


initial conditions that are close to the guess in some sense. The importance of this task is suggested by the fact that the influential Pantelides (1988) addressed exactly this, and it is no surprise (Chow, 1998), since knowing where a dae can be initialized entails having a characterization of the manifold to which all of the solution must belong. Another structural approach to system analysis is presented in Unger et al. (1995). Their approach is similar to the one we propose in chapter 3. However, just as Pantelides' algorithm, it considers only the equation-variable graph, although it is not presented as a graph theoretical approach. A later algorithm, which is presented as graph theoretical, is given in Leitold and Hangos (2001), although a comparison to Pantelides' algorithm seems missing.

In Leimkuhler et al. (1991), consistent initial conditions are computed using difference approximation of derivatives, assuming that the dae is quasilinear and of index 1. Later, Veiera and Biscaia Jr. (2000) give an overview of methods to compute consistent initial conditions. It is noted that several successful approaches have been developed for specific applications where the equations are in a well understood form, and among other approaches (including one of their own) they mention that the method in Leimkuhler et al. (1991) has been extended by combining it with Pantelides' algorithm to analyze the system structure rather than assuming the quasilinear index 1 form. Their own method, presented in some more detail in Veiera and Biscaia Jr. (2001), is used to find initial conditions for systems starting in steady state, but allows for a discontinuity in forcing functions at the initial time. Of all previously presented methods for analysis of dae, the one which most resembles that proposed in chapter 3 is found in Chowdhry et al. (2004). They propose a method similar to that in Unger et al. (1995), but take it one step further by making a distinction between linear and nonlinear dependencies in the dae. This allows lti dae to be treated exactly, which is an improvement over Unger et al. (1995), while performing at least as well in the presence of nonlinearities. In view of our method, the partitioning into structural zeros, constant coefficients, and nonlinearities seems somewhat arbitrary. However, they suggest that even more categories could be added to extend the class of systems for which the method is exact. The need for a rigorous analysis of how tolerances affect the algorithm is not mentioned.

2.2.8 Numerical integration

There are several techniques in use for the solution of dae. In this section, we mention some of them briefly, and explain one in a bit more detail. A classic, accessible introduction to this subject is Brenan et al. (1996), which contains many references to original papers and further theory.

The method we focus on in this section is applicable to equations with differentiation index 1, and this is the one we describe first. It belongs to a family referred to as backward difference formulas or bdf methods. The formula of the method tells how to treat x′(t) in

f ( x′(t), x(t), t ) != 0

when the problem is discretized. By discretizing a problem we refer to replacing


the infinite-dimensional problem of computing the value of x at each point of an interval with a finite-dimensional problem from which the solution to the original problem can be approximately reconstructed. The most common way of discretizing problems is to replace the continuous function x by a time series which approximates x at discrete points in time:

xi ≈ x(ti)

Reconstruction can then be performed by interpolation. A common approach is to interpolate linearly between the samples, but this will give a function which is not even differentiable at the sample points. To remedy this, interpolating splines can be used. This suggests another way to discretize problems, namely to represent the discretized solution directly in spline coefficients, which makes both reconstruction and treatment of x′ trivial. However, solving for such a discretization is a much more intricate problem than solving for a pointwise approximation.

Before presenting the bdf methods, let us just mention how the simple (forward) Euler step for ode fits into this framework. The problem is discretized by pointwise approximation, and the ode x′(t) != g( x(t), t ) is written as a dae by defining f( ẋ, x, t ) ≜ −ẋ + g( x, t ). Replacing x′(tn) by the approximation ( xn+1 − xn )/( tn+1 − tn ) then yields the familiar integration method:

0 != f( ( xn+1 − xn )/( tn+1 − tn ), xn, tn )  ⟺

0 != −( xn+1 − xn )/( tn+1 − tn ) + g( xn, tn )  ⟺

xn+1 != xn + ( tn+1 − tn ) g( xn, tn )
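As a minimal runnable sketch (added here, not from the thesis), the derivation above can be replayed for the arbitrary test ode x′ = −x, written as the residual f( ẋ, x, t ) = −ẋ + g( x, t ): the explicit Euler step makes the discretized residual vanish, and after integrating to t = 1 the iterate approximates e⁻¹.

```python
import math

def g(x, t):
    # test ode: x' = -x (an arbitrary example)
    return -x

def f(xdot, x, t):
    # the ode written as a dae residual, f(x'(t), x(t), t) = 0
    return -xdot + g(x, t)

h, x, t = 0.01, 1.0, 0.0
for _ in range(100):
    x_next = x + h * g(x, t)          # x_{n+1} = x_n + (t_{n+1} - t_n) g(x_n, t_n)
    # the Euler step is exactly the root of the discretized residual:
    assert abs(f((x_next - x) / h, x, t)) < 1e-12
    x, t = x_next, t + h

assert abs(x - math.exp(-1.0)) < 5e-3  # first-order accuracy at t = 1
```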

The k-step bdf method also discretizes the problem by pointwise approximation, but replaces x′(tn) by the derivative at tn of the polynomial which interpolates the points ( tn, xn ), ( tn−1, xn−1 ), …, ( tn−k, xn−k ) (Brenan et al., 1996, section 3.1). We shall take a closer look at the 1-step bdf method, which, given the solution up to ( tn−1, xn−1 ) and a time tn > tn−1, solves the equation

f( ( xn − xn−1 )/( tn − tn−1 ), xn, tn ) != 0

to obtain xn. Of course, selecting how far from tn−1 we may select tn without getting too large errors in the solution is a very important question, but it is outside the scope of this background to cover this. A related topic of great importance is to ensure that the discretized solution converges to the true solution as the step size tends to zero, and, when it does, to investigate the order of this convergence. Such analyses reveal how the choice of k affects the quality of the solution, and will generally also give results that depend on the index of the equations. The following example does not give any theoretical insights, but just shows the importance of the index when solving a dae by the 1-step bdf method.


2.19 Example

Consider applying the 1-step bdf method to the square index 1 lti dae

E x′(t) + A x(t) + B u(t) != 0

Discretization leads to

E ( xn − xn−1 )/hn + A xn + B u(tn) != 0

where hn = tn − tn−1. By writing this as

( E + hn A ) xn!= E xn−1 − hn B u(tn)

we see that the iteration matrix

E + hn A (2.43)

must be non-singular for the solution to be well defined. Recalling that the differentiation index is revealed by the shuffle algorithm, we know that there exists a non-singular matrix K such that (writing K E = [ Ē ; 0 ] and K A = [ Ā ; Ã ])

[ I  0      ]                   [ I  0      ] ( [ Ē ]      [ Ā ] )   [ Ē ]      [ Ā ]
[ 0  hn⁻¹ I ] K ( E + hn A )  = [ 0  hn⁻¹ I ] ( [ 0 ] + hn [ Ã ] ) = [ Ã ] + hn [ 0 ]

where the first term is non-singular. This proves the non-singularity of the iteration matrix (2.43) in general, since it is non-singular for hn = 0, and will hence only be singular for finitely many values of hn. Had the index been higher than 1, interpretation of the index via the shuffle algorithm reveals that the iteration matrix is singular for hn = 0, and hence ill-conditioned for small hn. (It can be shown that it is precisely the dae where the iteration matrix is singular for all hn that are not solvable at all (Brenan et al., 1996, theorem 2.3.1).) This shows that this method is limited to systems of index no more than 1.

Note that the row operations that revealed the non-singularity also have practical use, since if applied before solving the dae, the condition number of the iteration matrix is typically improved significantly, and this condition is directly related to how errors in the estimate xn−1 are propagated to errors in xn.
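To make the example concrete, here is a small sketch (an added illustration, with an arbitrarily chosen system, not from the thesis) of the 1-step bdf iteration on an index 1 lti dae with one differential and one algebraic equation. As discussed above, the algebraic constraint is satisfied at every step, because it is present among the equations the method sees.

```python
import numpy as np

# Assumed example system E x'(t) + A x(t) != 0 of index 1:
#   x1' + x1 = 0    (differential equation)
#   x2 - x1  = 0    (algebraic constraint)
E = np.array([[1., 0.],
              [0., 0.]])
A = np.array([[ 1., 0.],
              [-1., 1.]])

h = 0.1
# the iteration matrix (2.43) must be non-singular
assert abs(np.linalg.det(E + h * A)) > 1e-12

x = np.array([1., 1.])                 # consistent initial condition
for _ in range(10):
    # ( E + h A ) x_n = E x_{n-1}  (the B u term is absent here)
    x = np.linalg.solve(E + h * A, E @ x)
    assert abs(x[1] - x[0]) < 1e-12    # constraint holds at every step

# after t = 1, x1 approximates e^{-1} with the O(h) accuracy of backward Euler
assert abs(x[0] - np.exp(-1.0)) < 0.03
```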

The following example shows how to combine the shuffle algorithm with the 1-step bdf method to solve lti dae of arbitrary index.

2.20 Example

Consider solving an initial value problem for the square higher-index (solvable) lti dae

E x′(t) + A x(t) + B u(t) != 0

After some iterations of the shuffle algorithm (it can be shown that the index is bounded by the dimension of x for well-posed problems, see the remark in algorithm 2.1), we will obtain the square dae

[ Ē_{νD−1} ]          [ Ā_{νD−1} ]
[ 0        ] x′(t) +  [ Ã_{νD−1} ] x(t) + · · · != 0

where the dependence on u and its derivatives has been omitted for brevity. At this stage, the full set of algebraic constraints has been revealed, which we write

C_{νD} x(t) + · · · != 0

It is known that

[ Ē_{νD−1} ]
[ Ã_{νD−1} ]

is full-rank, where the lower block is contained in C_{νD}. This shows that it is possible to construct a square dae of index 1 which contains all the algebraic constraints, by selecting as many independent equations as possible from the algebraic constraints, and completing with differential equations from the upper block of the index 0 system.

Note that the resulting index 1 system has a special structure; there is a clear separation into differential and non-differential equations. This is valuable when the equations are integrated, since it allows row scaling of the equations so as to improve the condition of the iteration matrix — compare the previous example.

In the previous example, a higher index dae was transformed to a square index 1 dae which contained all the algebraic constraints. Why not just compute the implicit ode and apply an ode solver, or apply a bdf method to the index 1 equations just before the last iteration of the shuffle algorithm? The reason is that there is no magic in the ode solvers or the bdf method; they cannot guarantee that algebraic constraints which are not present in the equations they see remain satisfied, even though the initial conditions are consistent. Still, the algebraic constraints are not violated arbitrarily; for consistent initial conditions, the true solution will remain on the manifold defined by the algebraic constraints, and it is only due to numerical errors that the computed solution will drift away from this manifold. By including the algebraic constraints in the index 1 system, it is ensured that they will be satisfied at each sampling instant of the computed solution.

There is another approach to integration of dae which seems to be gradually replacing the bdf methods in many implementations. These are the implicit Runge–Kutta methods, and early work on their application to dae includes Petzold (1986) and Roche (1989). Although these methods are basically applicable to dae of higher index, poor convergence is prohibitive unless the index is low. (Compare the 1-step bdf method, which is not at all applicable unless the index is at most 1.) The class of irk methods is large, and this is where the popular Radau IIa belongs.

Having seen that higher index dae require some kind of index-reducing treatment, we finish this section by reminding that index reduction and index deduction are closely related, and that both the shuffle algorithm (revealing the differentiation index) and the algorithm that is used to compute the strangeness index may be used to produce equations of low index. In the latter context, one speaks of producing strangeness-free equations.

2.2.9 Existing software

To round off our introductory background on dae topics, some existing software for the numerical integration of dae will be mentioned. However, as numerical integration is merely one of the applications of the work in this thesis, the methods will only be mentioned very briefly, just to give an idea of what sort of tools there are.

The first report on dassl (Brenan et al., 1996) was written by Linda Petzold in September 1982. It is probably the best known dae solver, but has been superseded by an extension called daspk (Brown et al., 1994). Both dassl and daspk use a bdf method with dynamic selection of order (1-step through 5-step) and step size, but the latter is better at handling large and sparse systems, and is also better at finding consistent initial conditions.

The methods in daspk can also be found in the more recent ida (dating 2005) (Hindmarsh et al., 2004), which is part of the software package sundials (Hindmarsh et al., 2005). The name of this software package is an abbreviation of SUite of Nonlinear and DIfferential/Algebraic equation Solvers, and the emphasis is on the movement from Fortran source code to C. The ida solver is the dae solver used by the general-purpose scientific computing tool Mathematica.*

While the bdf methods in the software mentioned so far require that the user ensures that the index is sufficiently reduced, the implementations built around the strangeness index perform index reduction on the fly. Another interesting difference is that the solvers we find here also implement irk methods besides bdf. In 1995, the first version of gelda (Kunkel et al., 1995) (A GEneral Linear Differential Algebraic equation solver) appeared. It applies to linear time-varying dae, and there is an extension called genda (Kunkel and Mehrmann, 2006) which applies to general nonlinear systems. The default choice for integration of the strangeness-free equations is the Radau IIa irk method implemented in radau5 (Hairer and Wanner, 1991).

2.3 Initial condition response bounds

The initial condition response of a system is the solution to the dynamic equations of the system when all forcing functions have been set to zero, given an initial state. Since setting all forcing functions to zero yields an autonomous system, the study of initial condition responses is the study of autonomous systems. For linear systems, the output of a system with forcing functions is the sum of the initial condition response, and the response to the forcing functions from a zero initial state. Hence, initial condition responses are also important for the understanding of systems which are not autonomous.

† Version 7 being the current version, see http://reference.wolfram.com/mathematica/tutorial/NDSolveIDAMethod.html, or the corresponding part of the on-line help.


One of the key problems is to bound the largest possible gain from initial conditions to the state at any later time. For linear systems, the state at time t is given by the transition matrix (Rugh, 1996, chapter 3), sometimes known as the fundamental matrix,

x(t) = Φ(t, 0) x(0)

Hence, the gain to be bounded may be expressed as

sup_{t≥0} ‖Φ(t, 0)‖₂

and we will often use language in terms of the transition matrix and initial condition responses interchangeably. We are interested in systems which are asymptotically stable (below, we will use stronger stability conditions, such as definition 2.37), for which it is meaningful to seek bounds that hold at all future times.

2.3.1 LTI ODE

For linear time-invariant systems,

x′(t) = M x(t)

the transition matrix is given by Φ(t, 0) = e^{M t}. Bounding the matrix e^{M t} is a fundamental problem which has been studied by many, and this section contains a selection of results from the literature. Before we start, however, we recall one of the basic results by Lyapunov.

2.21 Theorem (An inverse Lyapunov theorem). If M is a Hurwitz matrix (that is, α(M) < 0), then there exists a symmetric positive definite matrix P satisfying the (time-invariant) Lyapunov equation

Mᵀ P + P M = −I (2.44)

(The matrix I may be replaced by any positive definite matrix.) The solution is given by

P = ∫₀^∞ e^{Mᵀ t} e^{M t} dt (2.45)

Proof: This is a well-known result; for instance, see Rugh (1996, theorem 7.11).

Generally speaking, a Lyapunov function is a function used to prove stability properties of a system. They are used for lti, ltv, as well as nonlinear systems. The idea is that the function shall be a continuously differentiable non-negative function of the state, being 0 only at the origin, and such that when it is composed with the state trajectory, it becomes a decreasing function of time. If a function is intended to be a Lyapunov function, but has not yet been proven to be one (for instance, because it has some parameters to be determined first), it is often referred to as a Lyapunov function candidate. See Khalil (2002, chapter 4) for an introduction in the general context of nonlinear systems. The primary purpose of theorem 2.21 is to use x ↦ xᵀ P x as a Lyapunov function for the system x′ = M x, and doing so it is easy to derive a constant bound on e^{M t} for Hurwitz M.

2.22 Theorem. If M is a Hurwitz matrix, then

‖e^{M t}‖₂ ≤ √( ‖P‖₂ ‖P⁻¹‖₂ ), for all t ≥ 0

where P is the symmetric positive definite matrix whose existence is ensured by theorem 2.21.

Proof: Consider solutions to the differential equation x′(t) = M x(t) with initial conditions x(0) = x₀, and let t ≥ 0. Then |x(t)| = |e^{M t} x₀|. Since x ↦ xᵀ P x is a Lyapunov function, it follows that x(t)ᵀ P x(t) ≤ x(0)ᵀ P x(0). Knowing that P is a symmetric positive definite matrix, we may conclude that x(t)ᵀ P x(t) ≥ σ_min(P) |x(t)|², and x(0)ᵀ P x(0) ≤ σ_max(P) |x(0)|². Noting that σ_min(P) = ‖P⁻¹‖₂⁻¹ and σ_max(P) = ‖P‖₂, one obtains

|e^{M t} x₀| = |x(t)| ≤ √( ‖P‖₂ ‖P⁻¹‖₂ ) |x₀|

Since x₀ was arbitrary, this implies the result.
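As a numerical sanity check (a sketch, not part of the thesis; the matrix M below is an arbitrary non-normal Hurwitz example), theorem 2.22 can be verified with SciPy, whose Lyapunov solver handles equations of the form (2.44):

```python
import numpy as np
from scipy.linalg import expm, solve_continuous_lyapunov

# An arbitrary non-normal Hurwitz example matrix (hypothetical data).
M = np.array([[-1.0, 2.0],
              [ 0.0, -3.0]])

# solve_continuous_lyapunov(A, Q) solves A X + X A^H = Q, so with A = M^T
# this solves the Lyapunov equation (2.44): M^T P + P M = -I.
P = solve_continuous_lyapunov(M.T, -np.eye(2))

# The constant bound of theorem 2.22: sqrt(||P||_2 ||P^{-1}||_2).
bound = np.sqrt(np.linalg.norm(P, 2) * np.linalg.norm(np.linalg.inv(P), 2))

# The bound must dominate ||e^{M t}||_2 for all t >= 0; check on a grid.
norms = [np.linalg.norm(expm(M * t), 2) for t in np.linspace(0.0, 10.0, 101)]
assert all(nrm <= bound + 1e-9 for nrm in norms)
```

Note how coarse this constant bound is: at t = 0 the true norm is 1, while the bound equals the square root of the condition number of P.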

In Gurfil and Jodorkovsky (2003), the Lyapunov method of theorem 2.22 was applied in combination with convex optimization techniques to find a matrix P with small condition number. The beauty of the method of using Lyapunov functions is that it is not restricted to linear systems.

Theorem 2.22 is very coarse. For instance, at t = 0 it is clear that ‖e^{M·0}‖₂ = 1, while the theorem completely fails to capture this.† Additionally, it is well known that M being Hurwitz implies that ‖e^{M t}‖₂ → 0 as t → ∞, and the theorem fails to capture this too. A common technique is to obtain decaying bounds by using shifts.

2.23 Lemma. For any scalar z,

‖e^{M t}‖₂ = e^{−Re z} ‖e^{M t + I z}‖₂ (2.46)

Proof: Since M t and I z commute, e^{M t + I z} = e^{M t} e^{I z} = e^z e^{M t}. Taking norms on both sides and solving for ‖e^{M t}‖₂ gives the result.

Applying lemma 2.23 to theorem 2.22 with z = a t, where a ∈ ( 0, −α(M) ), gives

‖e^{M t}‖₂ ≤ e^{−a t} √( ‖P_a‖₂ ‖P_a⁻¹‖₂ ) (2.47)

where P_a is the solution to (2.44) with the Hurwitz matrix M + a I instead of M. Note the form of this bound; it is the product of one finite expression which depends only on M, and one expression which is exponentially decaying for Hurwitz M.

Better bounds were derived in Van Loan (1977). For instance, the next theorem gives a bound which is able to capture the exponential decay.

† In Veselić (2003, equation (13)) there is a reference to a similar result, where the same bound as above is multiplied with the exponentially decaying factor e^{−t/(2 ‖P‖₂)}.


2.24 Theorem. If M has Schur decomposition M = Q (D + N) Qᴴ (Q will be unitary, D diagonal, and N strictly upper triangular), then

‖e^{M t}‖₂ ≤ e^{α(M) t} ∑_{k=0}^{n−1} ‖N t‖₂ᵏ / k! (2.48)

Proof: See the derivation of Van Loan (1977, equation (2.11)).
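A sketch of how (2.48) can be evaluated numerically, using SciPy's complex Schur decomposition; the example matrix is hypothetical, and theorem 2.24 guarantees the checked inequality:

```python
import math
import numpy as np
from scipy.linalg import expm, schur

M = np.array([[-1.0, 5.0],
              [ 0.0, -2.0]])  # hypothetical non-normal Hurwitz example

# Complex Schur form M = Q (D + N) Q^H with D diagonal, N strictly upper triangular.
T, Q = schur(M, output='complex')
N = T - np.diag(np.diag(T))
alpha = np.max(np.real(np.diag(T)))   # spectral abscissa of M
nrmN = np.linalg.norm(N, 2)
n = M.shape[0]

def van_loan_bound(t):
    # e^{alpha(M) t} * sum_{k=0}^{n-1} ||N t||_2^k / k!, equation (2.48)
    return np.exp(alpha * t) * sum((nrmN * t) ** k / math.factorial(k)
                                   for k in range(n))

# The bound is tight at t = 0 and dominates the true norm for all t >= 0.
assert abs(van_loan_bound(0.0) - 1.0) < 1e-12
assert all(np.linalg.norm(expm(M * t), 2) <= van_loan_bound(t) + 1e-9
           for t in np.linspace(0.0, 8.0, 81))
```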

The theorem captures the fact that the problem is trivial if M is normal, as this implies N = 0, but this will not be the case in this thesis; this is a matter of what we are willing to assume about M. More generally, if we were willing to make assumptions about ‖N‖₂, being the departure from normality of M, this would also immediately yield a bound by theorem 2.24. However, we are inclined to only make assumptions about system features, and this measure's being invariant under norm-preserving changes of variables does not convince us that it could rightfully be considered a system feature; it is the restriction to norm-preserving transformations which bothers us.

Making shifts in (2.48) makes no difference, but a bound comparable with (2.47) is still easy to derive. The following two results (except for the shifted bound) appeared in Tidefelt and Glad (2008) and are far simpler than they are tight.

2.25 Corollary. The matrix exponential is bounded according to

‖e^{M t}‖₂ ≤ e^{α(M) t} ∑_{i=0}^{n−1} ( 2 ‖M‖₂ )ⁱ tⁱ / i! (2.49)

Proof: Let Qᴴ M Q = D + N be a Schur decomposition of M, and use ‖N‖₂ = ‖Qᴴ M Q − D‖₂ ≤ ‖M‖₂ + ‖M‖₂ = 2 ‖M‖₂ in theorem 2.24.

2.26 Lemma. If the map M is Hurwitz, that is, α(M) < 0, then for t ≥ 0,

‖e^{M t}‖₂ ≤ e^{2 e⁻¹ n ‖M‖₂ / (−α(M))} (2.50)

Further, shifting with a ∈ ( 0, −α(M) ) results in

‖e^{M t}‖₂ ≤ e^{−a t} e^{2 e⁻¹ n ‖M‖₂ / (−(α(M) + a))} (2.51)

Proof: Let f(t) ≜ ‖e^{M t}‖₂. From corollary 2.25 we have that

f(t) ≤ ∑_{i=0}^{n−1} ( 2 ‖M‖₂ )ⁱ tⁱ / i! · e^{α(M) t} ≕ ∑ᵢ fᵢ(t)

Each fᵢ(t) can easily be bounded globally since they are smooth, tend to 0 from above as t → ∞, and the only stationary point is found via fᵢ′(t). From

fᵢ′(t) = e^{α(M) t} ( 2 ‖M‖₂ )ⁱ t^{i−1} / i! · ( t α(M) + i )

it follows that the stationary point is t = −i / α(M). Hence,

fᵢ(t) ≤ fᵢ( −i / α(M) ) = ( 2 ‖M‖₂ / (−α(M)) )ⁱ iⁱ / i! · e^{−i} ≤ ( 2 e⁻¹ n ‖M‖₂ / (−α(M)) )ⁱ / i!

and it follows that

f(t) ≤ ∑_{i=0}^{n−1} ( 2 e⁻¹ n ‖M‖₂ / (−α(M)) )ⁱ / i! ≤ ∑_{i=0}^{∞} ( 2 e⁻¹ n ‖M‖₂ / (−α(M)) )ⁱ / i! = e^{2 e⁻¹ n ‖M‖₂ / (−α(M))}

The shifted result follows immediately from (2.50).
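The constant bound (2.50) uses only n, ‖M‖₂, and α(M), which makes it very easy to evaluate; the sketch below (with a hypothetical example matrix) also illustrates how grossly conservative it typically is:

```python
import numpy as np
from scipy.linalg import expm

M = np.array([[-1.0, 4.0],
              [ 0.0, -2.0]])  # hypothetical Hurwitz example
n = M.shape[0]
alpha = np.max(np.real(np.linalg.eigvals(M)))  # spectral abscissa, here -1
nrmM = np.linalg.norm(M, 2)

# Constant bound (2.50): ||e^{M t}||_2 <= exp(2 e^{-1} n ||M||_2 / (-alpha(M)))
bound = np.exp(2.0 * np.exp(-1.0) * n * nrmM / (-alpha))

true_max = max(np.linalg.norm(expm(M * t), 2) for t in np.linspace(0.0, 8.0, 81))
assert true_max <= bound  # valid, but over-estimates by orders of magnitude
```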

However, after the development of this result, the theorem below was found in the literature. The bounds provided by the two theorems are both functions of the same ratio between a matrix's norm and the smallest distance from any eigenvalue to the imaginary axis, and hence they are equivalent for our qualitative convergence results. However, for practical purposes, when quality must be turned into quantity, the theorem below offers a tremendous advantage.

2.27 Theorem. For a Hurwitz matrix M ∈ ℝⁿˣⁿ and t ≥ 0, the matrix exponential is bounded as

‖e^{M t}‖₂ ≤ γ(n) ( ‖M‖₂ / (−α(M)) )^{n−1} e^{α(M) t / 2} (2.52)

where

γ(n) = 1 + ∑_{i=1}^{n−1} 4ⁱ ( i e⁻¹ )ⁱ / i!

Proof: This is Godunov (1997, proposition 3.3, p 20) extended with the expression for γ(n), which can easily be extracted from the proof.
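For comparison, the bound (2.52) is equally easy to evaluate from ‖M‖₂ and α(M) alone; a sketch with the same kind of hypothetical example matrix:

```python
import math
import numpy as np
from scipy.linalg import expm

M = np.array([[-1.0, 4.0],
              [ 0.0, -2.0]])  # hypothetical Hurwitz example
n = M.shape[0]
alpha = np.max(np.real(np.linalg.eigvals(M)))
nrmM = np.linalg.norm(M, 2)

# gamma(n) = 1 + sum_{i=1}^{n-1} 4^i (i e^{-1})^i / i!
gamma_n = 1.0 + sum(4**i * (i * math.exp(-1.0))**i / math.factorial(i)
                    for i in range(1, n))

def godunov_bound(t):
    # Equation (2.52): gamma(n) (||M||_2 / (-alpha(M)))^{n-1} e^{alpha(M) t / 2}
    return gamma_n * (nrmM / (-alpha))**(n - 1) * np.exp(alpha * t / 2.0)

assert all(np.linalg.norm(expm(M * t), 2) <= godunov_bound(t) + 1e-9
           for t in np.linspace(0.0, 8.0, 81))
```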

Comparing (2.47) with (2.51), we note that the former bound involves the non-trivial dependence on M through the solution to the Lyapunov equation (2.44), while the latter often grossly over-estimates the norm it bounds, but uses only very elementary properties of the matrix. However, the condition number of the solution to the Lyapunov equation may be bounded without actually solving the equation, by application of bounds listed in the survey Kwon et al. (1996, equation (70) and equation (87)) (in their notation, ‖P_a‖₂ = α₁, ‖P_a⁻¹‖₂ = αₙ⁻¹, and ‖M + a I‖₂ = γ₁). The only upper bound they list for α₁ makes use of twice the logarithmic norm (see Ström (1975) for properties of this norm and further references) of M + a I, being α(M + Mᵀ + 2 a I), and requires this to be negative. When this is the case the following bound is obtained,

√( ‖P_a‖₂ ‖P_a⁻¹‖₂ ) ≤ √( ‖M + a I‖₂ / (−α(M + Mᵀ + 2 a I)) )


but unfortunately the logarithmic norm may be positive even though α(M + a I) is negative. Hence, we are unable to derive from (2.47) a bound that is both exponentially decaying whenever α(M) is negative, and expressed without direct reference to the solution to the Lyapunov equation.

While theorem 2.24 both gives a tight estimate at t = 0 and exhibits the true rate of the exponential decay, the polynomial coefficient makes the estimate very conservative, even for small t. Since the tightness of bounds for the matrix exponential is directly related to how well our results in this thesis are connected to applications (by means of deriving useful quantitative bounds), we shall end our discussion of matrix exponential bounds with a recent result which appears in Veselić (2003). However, while the original presentation is concerned with exponentially stable semigroups (which may be infinite-dimensional), the results are stated here in terms of matrices to make the results more accessible to readers unfamiliar with the original framework.

The bounds are formulated using the following two scalar functions of a matrix:

δ(M) = 2 sup_{|x|=1} Re xᴴ M x    γ(M) = 2 inf_{|x|=1} Re xᴴ M x

For their forthcoming analysis, it is assumed that γ(M) < δ(M). The first of these definitions, δ(M), may be recognized as twice the logarithmic norm of M. Among the properties for the logarithmic norm in Ström (1975, lemma 1c), we note

α(M) ≤ ½ δ(M) ≤ ‖M‖₂ (2.53)

and the following alternative formulation shows its close connection to the norm of the matrix exponential

δ(M) = 2 lim_{h→0⁺} ( ‖e^{M h}‖₂ − 1 ) / h

Veselić (2003) also reminds that δ(M) = α(M + Mᵀ), and regarding the second of the definitions, it is shown that γ(M) ≤ −‖P‖₂⁻¹, where P is the solution to (2.44).

2.28 Theorem. For Hurwitz M,

‖e^{M t}‖₂ ≤ e^{δ(M) t / 2}   for t ≤ h₀(M)

‖e^{M t}‖₂ ≤ ( (1 + δ(M) ‖P‖₂) / (1 − δ(M)/γ(M)) )^{½ + ½ δ(M) ‖P‖₂} e^{−t/(2 ‖P‖₂)}   for h₀(M) ≤ t (2.54)

where

h₀(M) = (1/δ(M)) ln( (1 + δ(M) ‖P‖₂) / (1 − δ(M)/γ(M)) ) (2.55)

and P is the solution to (2.44).

Proof: See Veselić (2003, theorem 4).


2.3.2 LTV ODE

When we consider linear time-varying systems in chapter 8, we extend results from Kokotović et al. (1986, section 5.2), and we shall make use of some results from there.

2.29 Lemma. Let φ(t, s) denote the transition matrix of the time-scaled ltv system

m z′(t) = M(t) z(t)

Assume that there exist a time interval I and constants c1 > 0, c2, β, such that

∀ t ∈ I : α(M(t)) ≤ −c₁
∀ t ∈ I : ‖M(t)‖₂ ≤ c₂
∀ t ∈ I : ‖M′(t)‖₂ ≤ β

Then there exist positive constants m₀, a, K, such that for all m < m₀, and s, t in I,

t ≥ s ⇒ ‖φ(t, s) − e^{M(s) (t−s)/m}‖₂ ≤ m K e^{−a (t−s)/m} (2.56)

Proof: See Kokotović et al. (1986, lemma 5:2.2), with further references to similar results given in Kokotović et al. (1986, section 5:10).

While discussing linear time-varying systems, we take the opportunity to give a definition related to transformations of such systems, even though it is not particularly related to the bounding of initial condition responses. Consider the change of variables

T (t) z(t) = x(t) (2.57)

in the system

x′(t) = M(t) x(t) (2.58)

Via the intermediate dae,

T ′(t) z(t) + T (t) z′(t) = M(t) T (t) z(t)

the ode in z is found to be

z′(t) = ( T(t)⁻¹ M(t) T(t) − T(t)⁻¹ T′(t) ) z(t) (2.59)
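As a sketch of the change of variables (2.57)-(2.59) (all data below is hypothetical: a rotation T(t), which is a Lyapunov transformation, and a constant Hurwitz M), one can integrate both systems and confirm that T(t) z(t) reproduces x(t):

```python
import numpy as np
from scipy.integrate import solve_ivp

M = np.array([[-1.0, 1.0],
              [ 0.0, -2.0]])  # hypothetical constant system matrix

def T(t):       # a rotation: bounded with bounded inverse, hence a Lyapunov transformation
    c, s = np.cos(t), np.sin(t)
    return np.array([[c, s], [-s, c]])

def Tprime(t):  # elementwise derivative of T
    c, s = np.cos(t), np.sin(t)
    return np.array([[-s, c], [-c, -s]])

x0 = np.array([1.0, -1.0])

# Original system (2.58): x' = M x.
sol_x = solve_ivp(lambda t, x: M @ x, (0.0, 2.0), x0, rtol=1e-10, atol=1e-12)

# Transformed system (2.59): z' = (T^{-1} M T - T^{-1} T') z, with T(0) z(0) = x(0).
def zdot(t, z):
    Ti = np.linalg.inv(T(t))
    return (Ti @ M @ T(t) - Ti @ Tprime(t)) @ z

z0 = np.linalg.solve(T(0.0), x0)
sol_z = solve_ivp(zdot, (0.0, 2.0), z0, rtol=1e-10, atol=1e-12)

# T(t) z(t) should reproduce x(t); check at the final time.
assert np.allclose(T(2.0) @ sol_z.y[:, -1], sol_x.y[:, -1], atol=1e-6)
```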

2.30 Definition (Lyapunov transformation). The square time-varying matrix T is called a Lyapunov transformation if it is continuously differentiable, T(t) is invertible for every t, and there are time-invariant constants bounding ‖T(t)‖₂ and ‖T(t)⁻¹‖₂ for all t.

Knowing that a transformation matrix is a Lyapunov transformation allows us to work with the transformed system instead of the original one, knowing that the qualitative properties will be the same, and when we are done, we apply the reverse transformation to obtain results for the original system. For a theoretical application of this definition, see for instance Rugh (1996, theorem 6.15).


2.3.3 Uncertain LTI ODE

Bounds on the initial condition response of an uncertain system are closely related to perturbation theory, so there is a strong connection between the present section and section 2.4.1 below.

The bounds mentioned so far only apply to exactly known systems, while the applications in this thesis will concern uncertain systems. In Boyd et al. (1994), a bound for linear differential inclusions (see section 2.4.2) is given as a linear matrix inequality optimization problem. The technique is based on the idea of using Lyapunov functions as described above. However, the classes of uncertainty that can be handled cannot cater for the problems we encounter in later chapters. An alternative to convex optimization might be to use the plethora of bounds on the eigenvalues (or, equivalently, the singular values) of the solution to the Lyapunov equation. The survey Kwon et al. (1996) contains many such bounds, including the following theorem.

2.31 Theorem. Let M be Hurwitz. The solution P to the Lyapunov equation (2.44) satisfies

‖P‖₂ ≥ 1 / ( 2 ‖M‖₂ ) (2.60)

Proof: This is a special case of the main result in Shapiro (1974); the general case allows for the right hand side of (2.44) to be an arbitrary negative definite matrix instead of just −I. However, the current case is trivial since

1 = ‖−I‖₂ = ‖P M + Mᵀ P‖₂ ≤ 2 ‖P‖₂ ‖M‖₂

2.4 Regular perturbation theory

By a regular perturbation we refer to perturbations of the expression for the time derivative in an ode. In the literature, the perturbations often occur in just one small parameter, but we shall not restrict our notion to this case. Instead, we let perturbation theory refer to any theory which aims to describe how the solutions to equations depend on small parameters in the equations. The perturbation parameters may be used to model uncertainties, but the theory may also be useful for known quantities. In this and the following sections, we only consider perturbations of differential equations (compare lemma 2.46, which concerns the perturbation problem in matrix inversion). Like the perturbed equations themselves, the perturbation parameters may or may not be allowed to be time-varying.

2.4.1 LTI ODE

Since the solution to the initial value problem

x′(t) = M x(t) x(0) = x0 (2.61)

is given by

x(t) = eM t x0


understanding the perturbed problem

z′(t) = (M + F ) z(t) z(0) = x0 (2.62)

becomes a matter of understanding the sensitivity of the matrix exponential with respect to perturbations.

The sensitivity of the norm of the matrix exponential was the theme of Van Loan (1977), to which we have referred previously regarding results on bounds on the matrix exponential. It turns out that it is the bound on the matrix exponential (section 2.3.1) which is the key to the relative sensitivity, formalized by the following lemma.

2.32 Lemma. Assume there exists a monotonically increasing function γ on [ 0, ∞ ) and a constant β such that t ≥ 0 implies

‖e^{M t}‖₂ ≤ γ(t) e^{β t}

Then

‖e^{(M+F) t} − e^{M t}‖₂ / ‖e^{M t}‖₂ ≤ ‖F‖₂ t γ(t)² e^{[ β − α(M) + ‖F‖₂ γ(t) ] t} (2.63)

Proof: This is Van Loan (1977, lemma 1).

The lemma should be compared with the outer approximations of the reachable sets in example 2.36 below. For any choice of β and γ, the lemma implies that restriction to a finite time interval makes the perturbations in the solutions O( ‖F‖₂ ).

While the lemma bounds the perturbation of the matrix exponential relative to ‖e^{M t}‖₂, we are often interested in a different bound, namely the absolute difference between the perturbed and nominal solutions, |z(t) − x(t)|.

2.33 Lemma. Assume that the nominal system (2.61) is stable, and that there exist a polynomial γ and a constant β < 0 such that t ≥ 0 implies

‖e^{(M+F) t}‖₂ ≤ γ(t) e^{β t}

Then there is a finite constant k such that the solution to the perturbed system (2.62) satisfies

sup_{t≥0} |z(t) − x(t)| ≤ k ‖F‖₂

Proof: Introducing y = z − x, we find

y′(t) = (M + F ) y(t) + F x(t) y(0) = 0

with solution

y(t) = ∫₀ᵗ e^{(M+F)(t−τ)} F x(τ) dτ


Since the nominal system is stable, |x| will be a bounded function, say sup_{t≥0} |x(t)| ≤ x̄. Then we get the estimate

|y(t)| ≤ ‖F‖₂ x̄ ∫₀ᵗ ‖e^{(M+F) τ}‖₂ dτ ≤ ‖F‖₂ x̄ ∫₀ᵗ γ(τ) e^{β τ} dτ

Here, the integrand has a primitive function which is also a polynomial (with coefficients depending on β) times e^{β t}. Hence, the integral will be bounded independently of t, which completes the proof.
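The linear scaling of sup_t |z(t) − x(t)| with ‖F‖₂ asserted by the lemma can be observed numerically; in the sketch below (all matrices are hypothetical examples) scaling the perturbation by 1/10 shrinks the supremum by roughly the same factor:

```python
import numpy as np
from scipy.linalg import expm

M = np.array([[ 0.0,  1.0],
              [-1.0, -1.0]])  # hypothetical Hurwitz nominal dynamics
F = np.array([[0.0, 0.1],
              [0.1, 0.0]])    # hypothetical perturbation direction
x0 = np.array([0.0, 1.0])

def sup_diff(scale, ts=np.linspace(0.0, 30.0, 601)):
    # sup over the time grid of |z(t) - x(t)| for the perturbation scale * F
    return max(np.linalg.norm(expm((M + scale * F) * t) @ x0 - expm(M * t) @ x0)
               for t in ts)

# To first order the supremum is linear in the perturbation size.
ratio = sup_diff(1e-3) / sup_diff(1e-4)
assert 8.0 < ratio < 12.0
```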

Several possible choices of polynomials and exponents to use with lemma 2.33 are listed in Van Loan (1977), but we find theorem 2.27 particularly convenient since it only relies on two basic properties of the perturbed matrix. Clearly, the number k provided by the lemma could be improved if we also used that x will satisfy an exponentially decaying bound, which may be important to utilise in applications when quantitative perturbation bounds need to be as tight as possible.

2.34 Lemma. Consider the perturbed solution over the finite interval [ 0, t_f ]. Then there is a finite constant k such that the solution to the perturbed system (2.62) satisfies

sup_{t ∈ [ 0, t_f ]} |z(t) − x(t)| ≤ k ‖F‖₂

Proof: Compare the proof of lemma 2.33. Since the nominal solution x will be defined on a compact interval, it will be a bounded function. For the integral, we may use the coarse over-estimate

∫₀ᵗ ‖e^{(M+F) τ}‖₂ dτ ≤ ∫₀ᵗ e^{‖M+F‖₂ τ} dτ = ( 1 / ‖M+F‖₂ ) ( e^{‖M+F‖₂ t} − 1 )

2.4.2 LTV ODE

We now turn to perturbations of the system

x′(t) = M(t) x(t) x(t0) = x0 (2.64)

with transition matrix φ. Let the time interval I be defined as [ t₀, ∞ ), so that ‖M‖_I ≤ α is the same as saying that ‖M(t)‖₂ ≤ α for all t ≥ t₀. We will often consider t₀ = 0 without loss of generality. The following definition turns out to be useful.

2.35 Definition (Uniformly bounded-input, bounded-state stable). The ltv system

x′(t) = M(t) x(t) + B(t) u(t) x(t₀) = 0

with input u is called uniformly bounded-input, bounded-state stable if there exists a finite constant γ such that for any t₀ ≥ 0,

sup_{t≥t₀} |x(t)| ≤ γ sup_{t≥t₀} |u(t)|


See Rugh (1996, note 12.1) regarding some subtleties of definition 2.35.

Three types of results for the perturbation of (2.64) dominate the literature. The first type is confined to only consider the stability properties of the perturbed system, and the amount of results shows that this is both important and non-trivial. Some results which will be useful in later chapters are included below. The second type, well explained in Khalil (2002, chapter 10), addresses the effect that a scalar perturbation has on the solutions, and here the amount of literature is likely to be related to the many application areas where corresponding methods have been successful. Since we are mainly interested in non-scalar perturbations in this thesis, we shall not give an account of the scalar perturbation results, but turn attention to the solutions of the perturbed equation

z′(t) = [M(t) + F(t) ] z(t) z(0) = x0 (2.65)

where there is a bound ‖F‖I ≤ f0.

Introducing y = z − x yields the system

y′(t) = [M(t) + F(t) ] y(t) + F(t) x(t) y(0) = 0 (2.66)

which can be handled by showing that (2.66) is uniformly bounded-input, bounded-state stable from the input F(t) x(t). Since the input to the system decays with the size of F, uniform convergence of y to zero follows (provided that the input-state relation is not only bounded uniformly in the input, but also in the perturbation). We refer to Rugh (1996, chapter 12) for the definitions and basic results. Clearly, using the gain provided by the uniformly bounded-input, bounded-state stability property will result in very conservative perturbation bounds, since they will only depend on the peak value of |x|, even though x is a known function.

The third way to analyze perturbations is to approximate (2.65) conservatively using differential inclusions (Filippov, 1985),

z′(t) ∈ { [ M(t) + Δ ] z(t) : ‖Δ‖₂ ≤ f₀ } z(0) ∈ { z₀ } (2.67)

That is, we include all the solutions obtained by letting F(t) vary arbitrarily from one t to another. Hence, the differential inclusion approximation corresponds to ignoring all differentiability and continuity properties that we may have for F. The solution to the problem is represented by a set-valued solution function, at each t giving the reachable set at that time. If these sets can be computed conservatively (outer approximations), we have a means to deal with quite general perturbations of ltv systems. The concepts are illustrated by the following example, applied to a time-invariant system. In the book Boyd et al. (1994) on linear matrix inequalities, four out of ten chapters are devoted to the study of linear differential inclusions, and the text should be accessible to a broad audience.

2.36 Example
Consider the perturbed lti system

( z₁′(t), z₂′(t) )ᵀ = ( [ 0, 1; −1, −1 ] + F ) ( z₁(t), z₂(t) )ᵀ, ( z₁(0), z₂(0) )ᵀ = ( 0, 1 )ᵀ (2.68)


where max(F) ≤ ε. Let ε ≔ 0.1. The set-valued function f defined by

f( ( z₁, z₂ )ᵀ ) ≜ [ [−0.1, 0.1], [ 0.9, 1.1 ]; [−1.1, −0.9], [−1.1, −0.9] ] ( z₁, z₂ )ᵀ

maps any point z to a point with interval coordinates, that is, a (convex) rectangle in the ( z₁, z₂ ) plane. It is easy to see that the image of a convex set under f is also a convex set, and according to Kurzhanski and Vályi (1997, lemma 1.2.1) it follows that the reachable sets are also convex.

We shall approach the perturbation problem in three ways, all illustrated in figure 2.2:

• Making a grid of points in the 4-dimensional uncertainty space, and generating the corresponding solutions. This is supposed to be a reasonably good inner approximation of the perturbation problem (2.68) we aim to solve.

• Computing an outer approximation of the reachable sets of the differential inclusion, by making an interval approximation of the reachable set at each time instant. This results in an ode in 4 variables, being the lower and upper interval bounds on z₁, z₂.

• Computing an inner approximation of the reachable sets by discretizing time and computing a set of points in the interior of the reachable set at each time instant. Since the reachable sets will be convex, the points will actually represent their convex hull. To “integrate” from one time instant to the next, each vertex in the convex set is mapped to several new points by evaluating all possible combinations of minimum, mid, and maximum values in the uncertain intervals. When all vertices have been mapped, points that are not vertices of the new convex hull are first removed, and then a selection of the remaining vertices is made so that the number of vertices is never more than 10 at any time instant.

Additional outer approximations for smaller values of ε are shown in figure 2.3. It is seen that the outer approximation is useful during a short time interval, but that it explodes exponentially; this is typical for this type of interval analysis. Of course, other outer approximations could also be considered. For instance, in the context of linear matrix inequalities ellipsoids are the obvious choice (see Boyd et al. (1994)), and ellipsoids were also used for hybrid systems in Jönsson (2002). The inner approximation seems to approximate the solution to the original problem (2.68) well.
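The enumeration used for the gray trajectories in figure 2.2 is straightforward to reproduce; the sketch below (time instant chosen arbitrarily) computes the spread of z₁ over the 3⁴ constant perturbations and checks that it contains the nominal solution:

```python
import itertools
import numpy as np
from scipy.linalg import expm

A = np.array([[ 0.0,  1.0],
              [-1.0, -1.0]])
z0 = np.array([0.0, 1.0])
eps = 0.1
t = 2.0  # an arbitrary time instant

# All 3^4 = 81 constant perturbations with entries in {-eps, 0, eps}.
corners = [np.array(c).reshape(2, 2)
           for c in itertools.product([-eps, 0.0, eps], repeat=4)]

# For a constant F the perturbed solution is z(t) = e^{(A+F)t} z(0).
z1_vals = [(expm((A + F) * t) @ z0)[0] for F in corners]

lo, hi = min(z1_vals), max(z1_vals)
nominal_z1 = (expm(A * t) @ z0)[0]
assert lo - 1e-12 <= nominal_z1 <= hi + 1e-12
```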

Figure 2.2: Inner and outer approximation of the reachable sets of the differential inclusion corresponding to (2.68) with ε = 0.1. The converging trajectories (gray) were generated by deterministically replacing the uncertainty F by the 3⁴ matrices obtained by enumerating all matrices with entries in { −ε, 0, ε }. This provides a heuristic inner approximation of the original perturbation problem, to which the differential inclusion should be compared. The diverging trajectories are the bounds of the outer interval approximation of the reachable sets, found by integrating an ode. The vertical lines are the projections of the inner approximations of the reachable sets, obtained by replacing the uncertainty in the differential inclusion by fixed choices of F over short intervals of time. Here, the same 3⁴ matrices were used again, over time intervals of length 0.01 (to enhance readability in the plot, the projections are only shown at a sparse selection of time instants).

Figure 2.3: Outer approximations of the reachable sets of the differential inclusion corresponding to (2.68), for values of ε (10⁻², 10⁻³, 10⁻⁴, 10⁻⁵) smaller than in figure 2.2.

Formalizing the differential inclusion idea gives a constructive method to prove that the solutions to the perturbed problem converge uniformly as the perturbation tends to zero, see Filippov (1985, theorem 8.2). Analogously to the method using the bounded-input, bounded-state stability above, application of Filippov (1985, theorem 8.2) to perturbed linear systems requires us to conservatively use a bound on the peak norm of the nominal solution. Unlike when uniform bounded-input, bounded-state stability is applied to (2.66), the cited theorem for differential inclusions applies only to bounded intervals of time and does not guarantee a rate of convergence. Hence, in view of the fundamental theorems that establish continuous dependence of solutions as functions of parameters in the system (uniformly in time), the strength of the theorem lies in its applicability to equations with discontinuous right hand side. Since we will not encounter such equations in this thesis, the method of differential inclusion is here mainly considered a computational tool for perturbed systems.

The results below are slightly more precise formulations of results in Rugh (1996). They are based on time-varying Lyapunov functions, and in their original form they provide conditions for uniform exponential stability, explained by the following definition according to Rugh (1996, definition 6.5).

2.37 Definition (Uniformly exponentially stable). The system (2.64) is said to be uniformly exponentially stable if there exist finite positive constants γ, λ such that for any t₀, x₀, and t ≥ t₀ the solution x satisfies

|x(t)| ≤ γ e^{−λ (t−t₀)} |x₀|

For lti systems, uniform exponential stability is easily characterized in terms of eigenvalues.

2.38 Theorem. The lti system

x′(t) = M x(t)

is uniformly exponentially stable if and only if M is Hurwitz.

Proof: The time-invariance of the system allows us to use t₀ = 0 without loss of generality. If M is Hurwitz, then we may take λ ∈ ( 0, −α(M) ), and it remains to show that

|x(t)| e^{λ t} / |x₀|

can be bounded by some constant γ. Since

|x(t)| ≤ ‖e^{M t}‖₂ |x₀|

and theorem 2.24 shows that there exists a polynomial p such that

‖e^{M t}‖₂ ≤ p(t) e^{α(M) t}

it follows that

|x(t)| e^{λ t} / |x₀| ≤ p(t) e^{(λ + α(M)) t}

where λ + α(M) < 0. Since the exponential decay will dominate the polynomial growth as t → ∞, and the function to be bounded is continuous, the function is bounded.

Conversely, if M is not Hurwitz, then there exists at least one eigenvalue λ with Re λ ≥ 0. Taking x₀ as the corresponding eigenvector shows that |x(t)| does not tend to zero as t → ∞, showing that the system cannot be uniformly exponentially stable.


2.4 Regular perturbation theory 65

The additional precision needed in this thesis is captured by the next definition, which requires the constants of the exponential decay to be made visible. For systems without uncertainty, the difference from the usual uniform exponential stability above is minor, but for uncertain systems, the new definition means that the uniform exponential stability is uniform with respect to the uncertainty.

2.39 Definition (Uniformly [γ e^{−λ •}]-stable). The system (2.64) is said to be uniformly [γ e^{−λ •}]-stable if it is uniformly exponentially stable with the parameters γ, λ used in definition 2.37.

We now rephrase three theorems in Rugh (1996) using our new definition.

2.40 Theorem. The system (2.64) is uniformly [√(ρ/η) e^{−(ν/(2ρ)) •}]-stable if there exist a symmetric matrix-valued, continuously differentiable function P and constants η > 0, ρ ≥ η, and ν > 0 satisfying, for all t,

η I ⪯ P(t) ⪯ ρ I   (2.69a)
M(t)ᵀ P(t) + P(t) M(t) + P′(t) ⪯ −ν I   (2.69b)

Proof: The proof of Rugh (1996, theorem 7.4) applies.

2.41 Theorem. Suppose the system (2.64) is uniformly [γ e^{−λ •}]-stable and ‖M‖_I ≤ α. Then the matrix-valued function P defined by

P(t) = ∫_t^∞ φ(τ, t)ᵀ φ(τ, t) dτ   (2.70)

is symmetric for all t, continuously differentiable, and satisfies (2.69) with

η = 1/(2α),   ρ = γ²/(2λ),   ν = 1

Proof: The proof of Rugh (1996, theorem 7.8) applies.
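In the lti case φ(τ, t) = e^{M (τ − t)}, so the integral (2.70) reduces to P = ∫_0^∞ e^{Mᵀ s} e^{M s} ds, the solution of the Lyapunov equation Mᵀ P + P M = −I, which is (2.69b) with ν = 1 and P′ = 0. A small sketch with a hypothetical Hurwitz M (SciPy's Lyapunov solver assumed available):

```python
import numpy as np
from scipy.linalg import solve_continuous_lyapunov

# Hypothetical Hurwitz matrix standing in for a time-invariant M(t) = M.
M = np.array([[-1.0, 2.0],
              [0.5, -3.0]])
assert max(np.linalg.eigvals(M).real) < 0   # M is Hurwitz

# In the lti case (2.70) is the solution of M^T P + P M = -I.
P = solve_continuous_lyapunov(M.T, -np.eye(2))

assert np.allclose(P, P.T)                       # P is symmetric ...
assert np.all(np.linalg.eigvalsh(P) > 0)         # ... and positive definite
assert np.allclose(M.T @ P + P @ M, -np.eye(2))  # (2.69b) with nu = 1, P' = 0
print("P =", P)
```

The positive definiteness of P corresponds to the bounds (2.69a) for some η > 0 and ρ ≥ η.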

For the perturbation

z′(t) = [M(t) + F(t) ] z(t) (2.71)

of (2.64) we now have the following theorem.

2.42 Theorem. If the system (2.64) satisfies the assumptions of theorem 2.41, then there exists a constant β > 0 such that ‖F‖_I ≤ β implies that the perturbed system (2.71) is uniformly [γ̄ e^{−λ̄ •}]-stable with

γ̄ = γ √(α/λ),   λ̄ = λ/(2γ²)


Proof: Follows by the proof of Rugh (1996, theorem 8.6) with minor addition of detail. The main idea is to use the P which theorem 2.41 provides for the nominal system (2.64), and use it in theorem 2.40 applied to the perturbed system (2.71). Of the two conditions P must satisfy, (2.69a) is trivial since it does not involve the perturbation. For the other condition, (2.69b), the cited proof shows that ν = 1/2 does the job. The proof is completed by inserting the values for η, ρ, ν in theorem 2.40.

The strength of theorem 2.42 compared to Rugh (1996, theorem 8.6) is that the exponential convergence parameters of the perturbed system are expressed only in the exponential convergence parameters of the nominal system and the norm bound on M. This will be useful in chapter 8, where the "nominal" system is unknown up to the specifications required by theorem 2.42.

2.4.3 Nonlinear ODE

That a whole chapter in the classic Coddington and Levinson (1985, chapter 17) is devoted to the perturbations of a nonlinear system in two dimensions signals that perturbation of nonlinear systems is in general a very difficult problem. Nevertheless, Lyapunov-based stability results similar to those in the previous section exist; see, for instance, Khalil (2002, chapter 9). However, since perturbations of nonlinear systems will not be considered in later chapters, we will not present any of the Lyapunov-based results here. Instead, we will just quickly show how a standard perturbation form can be derived.

Consider adding a small perturbation g( x(t), t ) to the right hand side of the nominal system

x′(t) = f ( x(t), t ) (2.72)

yielding

z′(t) = f ( z(t), t ) + g( z(t), t ) (2.73)

Introducing y = z − x, and subtracting (2.72) from (2.73) results in

y′(t) = f ( x(t) + y(t), t ) − f ( x(t), t ) + g( x(t) + y(t), t )

Series expansion of the first term and regarding x(t) as a given function of t shows that

y′(t) = M(t) y(t) + h( y(t), t ) + g( x(t) + y(t), t )   (2.74)

where |h( y, t )| = o( |y| ) for each t. To help the analysis, it is typically assumed that h( y, t ) + g( x(t) + y, t ) = o( |y| ) uniformly in t. Additionally assuming that M is time-invariant helps even more, leading to the standard form

y′(t) = M y(t) + f( y(t), t )   (2.75)

where M is assumed Hurwitz and f( y, t ) = o( |y| ).

Results regarding the solutions to (2.75) can be found in standard text books on ode, such as Coddington and Levinson (1985, chapter 13), Cesari (1971, chapter 6), or Khalil (2002).


2.5 Singular perturbation theory

Recall the model reduction technique called residualization (section 2.1.5). In singular perturbation theory, a similar reduction can be seen as the limiting system as some dynamics become arbitrarily fast (Kokotović et al., 1986). However, some of the assumptions made in the singular perturbation framework are not always satisfied in the presence of matrix-valued singular perturbations, and this is a major concern in this thesis. The connection to model reduction and singular perturbation theory is interesting also for another reason, namely that the classical motivation in those areas is that the underlying system being modeled is singularly perturbed in itself, and one is interested in studying how this can be handled in modeling and model-based techniques. Although that framework is built around ordinary differential equations, the situation is just as likely when dae are used to model the same systems. It is a goal of this thesis to highlight the relation between matrix-valued singular perturbations that are due to stiffness in the system being modeled, and the treatment of matrix-valued singular perturbations that are artifacts of numerical errors and the like. In view of this, this section not only provides background for forthcoming chapters, but also contains theory with which later development is to be contrasted.

Singular perturbation theory has already been mentioned when speaking of singular perturbation approximation in section 2.1.5. However, singular perturbation theory is far more important for this thesis than just being an example of something reminiscent of index reduction in dae. First, it provides a theorem which is fundamental for the analysis in the second part of the thesis. Second, the way it is developed in Kokotović et al. (1986) contains the key ideas used in our development from chapter 6 on. In this section, we begin by stating a main theorem for lti systems. We then briefly indicate how the lti scalar singular perturbation problem has been generalized, as some of these generalizations provide important directions for future developments of our work. We then give a more detailed account of the work on the so-called multiparameter singular perturbations, since this generalization relative to scalar singular perturbation is reminiscent of the generalization to matrix-valued singular perturbation initiated in this thesis. A fairly recent overview of singular perturbation problems and techniques is presented in Naidu (2002).

2.5.1 LTI ODE

The following (scalar) singular perturbation theorem found in Kokotović et al. (1986, chapter 2, theorem 5.1) will be useful. Consider the singularly perturbed lti ordinary differential equation

( x′(t), ε z′(t) )ᵀ = [ M11 M12 ; M21 M22 ] ( x(t), z(t) )ᵀ,   ( x(t0), z(t0) )ᵀ = ( x0, z0 )ᵀ   (2.76)

where we are interested in small ε > 0. Define M0 := M11 − M12 M22⁻¹ M21, denote

x′s(t) = M0 xs(t),   xs(t0) = x0   (2.77)


the slow model (obtained by setting ε := 0 and eliminating z using the thereby obtained non-differential equations), and denote

z′f(τ) = M22 zf(τ),   zf(0) = z0 + M22⁻¹ M21 x0   (2.78)

the fast model (which is expressed in the timescale given by ε τ ∼ t − t0).

2.43 Theorem. If α( M22 ) < 0, there exists an ε* > 0 such that, for all ε ∈ ( 0, ε* ], the states of the original system (2.76), starting from any bounded initial conditions x0 and z0, |x0| < c1, |z0| < c2, where c1 and c2 are constants independent of ε, are approximated for all finite t ≥ t0 by

x(t) = xs(t) + O( ε )
z(t) = −M22⁻¹ M21 xs(t) + zf(τ) + O( ε )   (2.79)

where xs(t) and zf(τ) are the respective states of the slow model (2.77) and the fast model (2.78). If also α( M0 ) < 0 then (2.79) holds for all t ∈ [ t0, ∞ ).

Moreover, the boundary layer correction zf(τ) is significant only during an initial short interval [ t0, t1 ], t1 − t0 = O( ε log ε ), after which

z(t) = −M22⁻¹ M21 xs(t) + O( ε )

Among the applications of this theorem, numerical integration of the equations is probably the simplest example. The theorem says that for every acceptable tolerance δ > 0 in the solution, there exists a threshold for ε such that for smaller ε, the contribution to the global error from the timescale separation is at most, say, δ/2. If the timescale separation is feasible, one can apply solvers for non-stiff problems to the fast and slow models separately, and then combine the results according to (2.79). This approach is likely to be much more efficient than applying a solver for stiff systems to the original problem.
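The decomposition can be illustrated numerically. The sketch below uses hypothetical 1×1 blocks of my own choosing (so everything stays scalar, with M22 Hurwitz as theorem 2.43 requires), compares the exact solution of (2.76) with the composite approximation (2.79), and checks that the error shrinks with ε:

```python
import numpy as np
from scipy.linalg import expm

# Hypothetical block matrices; M22 is Hurwitz so theorem 2.43 applies.
M11 = np.array([[-1.0]]); M12 = np.array([[1.0]])
M21 = np.array([[1.0]]);  M22 = np.array([[-2.0]])
x0, z0 = np.array([1.0]), np.array([1.0])

M0 = M11 - M12 @ np.linalg.solve(M22, M21)   # slow model matrix (2.77)

def approx_error(eps, t):
    # Full system (2.76): d/dt (x, z) = [[M11, M12], [M21/eps, M22/eps]] (x, z)
    A = np.block([[M11, M12], [M21 / eps, M22 / eps]])
    xz = expm(A * t) @ np.concatenate([x0, z0])
    # Composite approximation (2.79): slow part plus boundary layer correction.
    xs = expm(M0 * t) @ x0
    zf0 = z0 + np.linalg.solve(M22, M21) @ x0      # fast initial state (2.78)
    zf = expm(M22 * (t / eps)) @ zf0               # fast timescale tau = t/eps
    z_approx = -np.linalg.solve(M22, M21) @ xs + zf
    return max(abs(xz[0] - xs[0]), abs(xz[1] - z_approx[0]))

# The error should shrink roughly linearly with eps, consistent with O(eps).
e1 = approx_error(0.05, 1.0)
e2 = approx_error(0.005, 1.0)
assert e2 < e1
print(e1, e2)
```

Note that the sketch uses the matrix exponential for all three systems; in a realistic application the point is precisely that the slow and fast models can each be integrated with a non-stiff method.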

However, note that the theorem only states that certain constants exist (each O-expression has two inherent constants), as opposed to giving explicit expressions for them. That is, even though the constants are possible to compute, doing so requires a bit of calculation, and hence the theorem highlights the qualitative nature of the result. Similarly, the perturbation results to be presented in later chapters of this thesis are also formulated qualitatively, even though the constructive nature of the proofs allows error estimates to be computed.

2.5.2 Generalizations of scalar singular perturbation

As an indication of how our results in this thesis may be extended in the future, we devote some space here to listing a few directions in which theorem 2.43 has been extended. The extensions in this section are still concerned with the case of just one small perturbation parameter, and are all found in Kokotović et al. (1986).

The first extension is that the O( ε ) expressions in (2.79) can be refined so that the first order dependency on ε is explicit. Neglecting the higher order terms in ε, this makes it possible to approximate the thresholds which are needed to keep track of


the global error when integrating the equations in separate timescales. However, it is still not clear when ε is sufficiently small for the O( ε² ) terms to be neglected.

The other extension we would like to mention is that of theorem 2.43 to time-varying linear systems. That such results exist may not be surprising, but it should be noted that time-varying systems have an additional source of timescale separation compared to time-invariant systems. This must be taken care of in the analysis, and is a potential difficulty if these ideas are used to analyze a general nonlinear system by linearizing the equations along a solution trajectory (because of the interactions between timescale separation in the solution itself and in the linearized equations that determine it). The decoupling transform for time-varying systems appears in Chang (1969, 1972). For the problem of finding a bound on the perturbation parameter such that the asymptotic stability of the coupled system is ensured, Abed (1985b) contains a relatively recent result.

In Khalil (1984), singularly perturbed systems are also derived from ode with singular trailing matrices. The so-called semisimple null structure assumption they make enables the formulation of a corresponding scalar singular perturbation problem.

Another related problem class is obtained if stochastic forcing functions are added to the singularly perturbed systems (see, for instance, Ladde and Sirisaengtaksin (1989)). The properties of the resulting stochastic solution processes may then be studied using decoupling techniques similar to those used for deterministic systems. Looking at statistical properties such as mean and variance will give quite different results compared to the L∞ function measure often used for deterministic systems, and this relates to yet another related deterministic problem class, obtained by replacing the L∞ function measure by, for instance, the L2 function measure. For many applications such norms may be more relevant than the maximum-norm-over-time measure of L∞. While the L∞ measure remains an important general-purpose measure that should be supported by general-purpose numerical integration software, it appears that both developers and users of numerical integration software would benefit from also allowing other error estimates for the computed solution. We note that both stochastic formulations and alternative function measures constitute relevant generalizations of the singular perturbation results derived in the thesis.

2.5.3 Multiparameter singular perturbation

The multiparameter singular perturbation problem is closely related to the subject of the present work in that it considers several small parameters at the same time. The multiparameter singular perturbation form arises when small parasitics are included in a physical model. For instance, a parasitic parameter may be the capacitance of a wire in an electric model where such capacitances are expected to be negligible. Since the parasitics have physical origin, they are known to be greater than zero, and this requirement is part of the multiparameter singular perturbation form.


In its linear formulation, the autonomous multiparameter singular perturbation form may be written

x′ = Axx x + Axz z
diag( ε1 I, …, εN I ) z′ = Azx x + Azz z   (2.80)

Here, all the εi > 0 are the small singular perturbation parameters, and the goal is to understand how the system behaves as max_i εi tends to zero. By introducing the parameter µ = max_i εi the system may be written in a form which is closer to the scalar singular perturbation form,

x′ = Axx x + Axz z
µ z′ = diag( (µ/ε1) I, …, (µ/εN) I ) ( Azx x + Azz z )   (2.81)

where the block diagonal factor is denoted D.

In the early work on multiparameter singular perturbation in Khalil and Kokotović (1979), all singular perturbation parameters are assumed to be of the same order of magnitude, corresponding to assuming a bound on the ratios µ/εi in (2.81) (another common choice of µ is the geometric mean of all εi, and then the ratios µ/εi need to be bounded both from above and below to imply the equal order of magnitude condition). The condition about equal orders of magnitude was originally formulated as

m ≤ εi/εj ≤ M for all i, j   (2.82)

and this remains the most popular way to state the assumption. The condition should be seen in contrast to the case when the singular perturbation parameters are assumed of different orders of magnitude in the sense that

lim_{εi → 0} εi+1/εi = 0 for all i = 1, 2, …, N − 1   (2.83)

Such problems can be analyzed by applying a sequence of scalar singular perturbation results, and are said to have multiple time scales (where multiple refers to the number of fast time scales).

The main assumption used in Khalil and Kokotović (1979) is the so-called D-stability, which means that a system remains stable (with some positive, fixed margin between poles and the imaginary axis) if the state feedback matrix is left-multiplied by any D of (2.81). They remark that the condition (2.82) is not realistic in many applications. In Khalil (1981), the results are extended to the nonlinear setting using Lyapunov methods, and further refinement was made in Khorasani and Pai (1985).
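D-stability quantifies over all positive diagonal scalings, so it cannot be established by sampling; a sampling test can only refute the property or gather evidence for it. The sketch below illustrates the definition on two hypothetical 2×2 matrices of my own choosing (not taken from the cited works):

```python
import numpy as np

rng = np.random.default_rng(0)

def looks_d_stable(A, margin=1e-6, samples=1000):
    # Sampling-based check of D-stability: D A must remain Hurwitz for every
    # positive diagonal D.  A False answer refutes the property; a True
    # answer is only evidence, not a proof.
    n = A.shape[0]
    for _ in range(samples):
        d = rng.uniform(0.01, 10.0, size=n)
        if max(np.linalg.eigvals(np.diag(d) @ A).real) > -margin * d.min():
            return False
    return True

# Hypothetical test matrices:
A_good = np.array([[-2.0, 0.5],
                   [0.5, -2.0]])   # negative diagonal, positive determinant
A_bad = np.array([[1.0, 10.0],
                  [-1.0, -2.0]])   # Hurwitz, but diag(d1, d2) A has positive
                                   # trace whenever d1 > 2 d2, hence unstable

print(looks_d_stable(A_good), looks_d_stable(A_bad))   # True False
```

For 2×2 matrices with negative diagonal and positive determinant, such as A_good, D-stability in fact holds for every positive diagonal D; A_bad shows that being Hurwitz alone is not enough.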


Later, the condition (2.82) was removed for lti systems in Abed (1985a), and the so-called strong D-stability condition was then introduced in Abed (1986) to simplify the analysis. However, when ltv systems are treated in Abed (1986), the condition (2.82) is still used.

In Khalil (1987), the condition (2.82) used in Khalil and Kokotović (1979) has been removed. Further, both slow and fast subsystems are allowed to be nonlinear. Instead of (2.82), assumptions are used to ensure the existence of Lyapunov functions with certain additional constraints. This technique was used for scalar singular perturbation in Saberi and Khalil (1984), where references to other authors' early work on singular perturbations based on Lyapunov functions can be found.

In Coumarbatch and Gajic (2000), the algebraic Riccati equation is analyzed for a multiparameter singularly perturbed system. The system under consideration has two small singular perturbation parameters of the same order of magnitude. To understand properties of the solutions of Riccati equations for singularly perturbed systems seems a promising tool for future developments in singular perturbation theory, and the replacement of the equal order of magnitude assumption by something more realistic from an application point of view would be a valuable development by itself.

Within the class of multiparameter singularly perturbed problems, the class of multi-ple time scale singularly perturbed problems allows the small singular perturbationparameters to belong to different groups depending on order of magnitude. Withina group of singular perturbation parameters, all parameters satisfy a condition ofthe kind (2.82), while there is an ordering among the groups such dividing a param-eter from one group by a parameter from a succeeding group yields a ratio whichtends to zero as the latter parameter tends to zero. This problem was studied fortwo fast time scales using partial decoupling in Ladde and Siljak (1989) and laterwill full decoupling in Ladde and Rajalakshmi (1985). The generalization to morethan two fast time scales was later presented with partial decoupling in Ladde andRajalakshmi (1988), and with full decoupling and a flowchart for the decouplingprocedure in Kathirkamanayagan and Ladde (1988). In Abed and Tits (1986), thestrong D-stability property is extended to the context of multiple time scales. Theyalso highlight an example showing that for asymptotic stability in the multiple timescale setting (2.83), asymptotic stability given (2.82) is a sufficient condition for andonly for N = 2 and dim z = 2.

Comparing the multiparameter singular perturbation theory with the matrix-valued singular perturbation results in the second part of the thesis, there are a few things to mention here. Assumptions are often used to restrict the singular perturbation parameters within the basic multiparameter singular perturbation form, but authors agree that some of these assumptions are not realistic in view of typical applications of the theory. Hence, there is a constant drive to do away with such assumptions, replacing them by conditions that can be verified by inspection of properties of the unperturbed system. For lti systems, the (strong) D-stability definition is such a condition. We remark that the requirement that all singular perturbation parameters be positive adds important structure to the problem, and this is essential for the D-stability concept.


When we consider matrix-valued singular perturbations, the lack of structure in the perturbation makes conditions such as D-stability much harder to come up with. The kind of properties which can be meaningfully verified are simple things such as norm bounds on matrices in the unperturbed system. Everything else will have to be assumed, and finding assumptions which are reasonable in view of applications will be key to a successful theory. Unlike multiparameter singular perturbation, however, matrix-valued singular perturbation cannot be handled without assumptions, and this is in line with the non-physical origin of the matrix-valued singular perturbation problems that we know of. That is, it is primarily for problems of physical origin that imposing assumptions can be inappropriate; for perturbation problems that are due to modeling and software artifacts, it is less surprising that assumptions may be necessary to mitigate the effects of those artifacts.

2.5.4 Perturbation of DAE

The issue with perturbations in dae has been considered previously in Mattheij and Wijckmans (1998). While they consider perturbations of the trailing matrix and not of the leading matrix, we share many of their observations regarding the possibility of ill-posed problems. It is due to this similarity, and the fact that the dae perturbations we study turn out to be of singular perturbation type, that the current section resides under section 2.5.

The above-mentioned work on perturbations in the trailing matrix is referred to in Kunkel and Mehrmann (2006, remark 6.7), where the authors remark that a perturbation analysis is still lacking in their influential framework for numerical solution of dae based on the strangeness index. Although this thesis deals with perturbations related more to index reduction by shuffling than to the methods of Kunkel and Mehrmann (2006), it is hoped that our work will inspire the development of similar perturbation analyses in other contexts as well.

When the study of matrix-valued singular perturbation was motivated in section 1.2.4, one of the applications was to handle a change of rank in the leading matrix of a time-varying system. For a recent alternative approach to singularities in time-varying systems, see März and Riaza (2007), where the assumed existence of smooth projector functions is used to mitigate isolated rank drops.

An interesting topic in the perturbation of dae is the study of how sensitive the eigenvalues are to perturbations. The eigenvalue problems generalize naturally from matrix pencils (or pairs) to matrix polynomials, with immediate application to higher order dae. Some recent results on the conditioning of the eigenvalue problem appear in Higham et al. (2006). Although eigenvalues are very central to our treatment of perturbations in dae, the assumptions we use allow the "fast and uncertain" eigenvalues to be very sensitive to perturbation. This behavior is very different from the setting where eigenvalue perturbation results can be used to bound the sensitivity, and unfortunately the difference has hindered us from seeing applications of the eigenvalue perturbation theory in our work.


2.6 Contraction mappings

Contraction mappings can provide elegant proofs of existence and uniqueness of the solution to an equation. In this section, we state the fundamental theorem, which can be found in standard text books on real analysis. The theorem is illustrated with one example and one lemma which will be useful in later chapters.

2.44 Theorem (Contraction principle). Let X be a complete metric space with metric d. Let T be a mapping from X into itself such that d( T(x2), T(x1) ) ≤ c d( x2, x1 ) for some c < 1. Then there exists a unique x ∈ X such that T(x) = x.

Proof: See Rudin (1976, theorem 9.23).

2.45 Example
Consider the equation

x² = 4 + ε,   x > 0   (2.84)

for small values of the parameter, |ε| ≤ m. Although we know that the solution is given by x = √(4 + ε), we shall estimate the quality of the first order approximation to the solution.

As the first order approximation is given by x0(ε) = 2 + ε/4, we set

x(ε) = x0(ε) + m² y(ε)

where |y(ε)| shall be bounded independently of ε (for sufficiently small m) using a contraction mapping argument.

Inserting this expression for x(ε) in (2.84) yields (dropping the argument of y)

4 + ε + ε²/16 + 2 ( 2 + ε/4 ) m² y + m⁴ y² = 4 + ε

which is rewritten with y alone on one side of the equation (but y may still appear on the other side of the equation as well)

y = − ( ε²/16 ) / ( 2 ( 2 + ε/4 ) m² + m⁴ y )   (2.85)

Now assume that |y| ≤ ρ, where ρ is to be selected soon, and define the mapping

T y := − ( ε²/16 ) / ( 2 ( 2 + ε/4 ) m² + m⁴ y )

so that a fixed point of T is a solution to (2.85).

In view of (using ε² ≤ m² in the numerator)

|T y| ≤ (1/16) · 1 / ( 4 + ε/2 + m² y )


[Figure: curves x = √(4 + ε), x0(ε) − |ε|²/16, and x0(ε) + |ε|²/16, plotted against 4 + ε over the range 4 − 2.9 to 4 + 2.9.]

Figure 2.4: Approximating the square root near 4 using a contraction mapping argument. Using ε to denote the deviation from 4, and introducing the first order approximation x0(ε) = 2 + ε/4, the square root is expressed in the variable y through the equation ( x0(ε) + y )² = 4 + ε. The computed upper and lower bounds on y are added to x0 and shown in the figure, and are known to be valid for all |ε| ≤ 2.9.

we take ρ = 1/16, so that m/2 < 2 and m² ρ = m²/16 < 1 ensure that |T y| ≤ ρ. Solving the requirements for m reveals that m shall be chosen less than the smallest of √16 and 4 (both equal to 4).

To prove that T is a contraction, note that it is continuously differentiable with positive and decreasing derivative (since T y tends to zero from below as y grows). Hence, the modulus of the derivative at the low end of the domain is a valid Lipschitz constant in all of the domain. In view of this, the domain shall be selected such that the derivative at −ρ is less than 1. This yields the condition

( m⁴ / ( 2 ( 2 + ε/4 ) m² − m⁴ ρ )² ) · ( ε²/16 ) < 1

or

( 1 / ( 4 + ε/2 − m² ρ )² ) m² < 16

which is implied by m ≤ 2.9 (the optimal bound is somewhere between 2.9 and 3.0).

Using theorem 2.44 it may be concluded that, for |ε| ≤ 2.9,

|x(ε) − x0(ε)| ≤ |ε|²/16

as illustrated in figure 2.4.
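The concluded bound is easy to double-check numerically. The sketch below verifies the contraction-based bound |x(ε) − x0(ε)| ≤ |ε|²/16 on |ε| ≤ 2.9, and also the sharper Taylor-type bound |ε|²/(16√2) on |ε| ≤ 2 (the grids are my own choice):

```python
import numpy as np

# Contraction-based bound from the example:
# |sqrt(4 + eps) - (2 + eps/4)| <= eps^2 / 16 for |eps| <= 2.9.
eps = np.linspace(-2.9, 2.9, 1001)
gap = np.abs(np.sqrt(4 + eps) - (2 + eps / 4))
assert np.all(gap <= eps**2 / 16 + 1e-12)

# Taylor-based bound: |.| <= eps^2 / (16 sqrt(2)) for |eps| <= 2.
eps2 = np.linspace(-2.0, 2.0, 1001)
gap2 = np.abs(np.sqrt(4 + eps2) - (2 + eps2 / 4))
assert np.all(gap2 <= eps2**2 / (16 * np.sqrt(2)) + 1e-12)
print("both bounds hold on the sampled grids")
```

Such a spot check does not replace the proof, but it is a cheap safeguard against algebra slips in the constants.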


The example deserves a few remarks. First, note that ρ could have been selected arbitrarily close to 1/64, at the price of obtaining very small bounds on m. Also note that other ways of isolating y would lead to other operators, and possibly to improved bounds. Another feature of the example is that the operator could be defined without reference to the square root function, which is the function we set out to approximate in the first place.

The method can be contrasted with series expansion techniques. Using the contraction mapping principle we first guess the approximation x0(ε), and then prove bounds on the rest term. In contrast, a Taylor expansion requires the existence of derivatives of the function being approximated. Here,

x″(ε) = − 1 / ( 4 ( 4 + ε )^{3/2} )

and bounding |x″(ε)| for |ε| ≤ 2 yields

|x(ε) − x0(ε)| ≤ (1/2) · ( 1/(8√2) ) |ε|² = ( 1/(16√2) ) |ε|²

Note that this bound is stronger than that obtained in the example.

Having tried the technique on the scalar example, we now turn to matrix equations, and the result is general enough to be put as a lemma.

2.46 Lemma. Let the non-singular matrix X have its inverse bounded as ‖X⁻¹‖₂ ≤ c. Then

‖F‖₂ ≤ ( ρ/(ρ + c) ) (1/c)   ⟹   ‖(X + F)⁻¹ − X⁻¹‖₂ ≤ ρ   (2.86)

Proof: Assume ‖F‖₂ ≤ ( ρ/(ρ + c) ) (1/c), define the operator

T G := −( X⁻¹ F X⁻¹ + G F X⁻¹ )

and consider the set G = { G : ‖G‖₂ ≤ ρ }. Then

‖T G‖₂ ≤ ‖F‖₂ ( c + ρ ) c ≤ ρ

so T maps G into itself. Since

‖T G₂ − T G₁‖₂ = ‖G₂ F X⁻¹ − G₁ F X⁻¹‖₂ ≤ ‖F‖₂ c ‖G₂ − G₁‖₂ ≤ ( ρ/(ρ + c) ) ‖G₂ − G₁‖₂ < ‖G₂ − G₁‖₂

T is a contraction on G.


By theorem 2.44 there is a unique solution G ∈ G. Hence, G satisfies

G = −( X⁻¹ F X⁻¹ + G F X⁻¹ )

and multiplying by X from the right reveals

X⁻¹ F + G X + G F = 0

Adding I to both sides allows us to write

( X⁻¹ + G ) ( X + F ) = I

where it is seen that X + F indeed is invertible, and

G = (X + F)⁻¹ − X⁻¹

shows that (X + F)⁻¹ − X⁻¹ ∈ G.
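The quantitative content of lemma 2.46 is easy to spot-check with random matrices. In the sketch below, the shift by 3I is just a hypothetical way of producing well-conditioned test matrices; the premise on ‖F‖₂ is enforced by scaling:

```python
import numpy as np

rng = np.random.default_rng(1)

# Spot check of lemma 2.46: if ||X^{-1}||_2 <= c and
# ||F||_2 <= rho/(rho + c) * 1/c, then ||(X + F)^{-1} - X^{-1}||_2 <= rho.
for _ in range(100):
    n = 4
    X = rng.normal(size=(n, n)) + 3 * np.eye(n)   # generically non-singular
    c = np.linalg.norm(np.linalg.inv(X), 2)
    rho = 0.5
    bound = rho / (rho + c) / c
    F = rng.normal(size=(n, n))
    F *= 0.99 * bound / np.linalg.norm(F, 2)      # scale F to meet the premise
    diff = np.linalg.norm(np.linalg.inv(X + F) - np.linalg.inv(X), 2)
    assert diff <= rho
print("lemma 2.46 verified on random samples")
```

In practice the observed difference is typically far below ρ, consistent with the lemma being a worst-case bound.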

2.47 Corollary. Let X be a non-singular matrix with ‖X⁻¹‖₂ ≤ c and let ρ > 0 be given. Then there exists a constant m > 0 such that

‖F‖₂ ≤ m   ⟹   ‖(X + F)⁻¹‖₂ ≤ c + ρ   (2.87)

Proof: Take m = ( ρ/(ρ + c) ) (1/c) and use

‖(X + F)⁻¹‖₂ = ‖(X + F)⁻¹ − X⁻¹ + X⁻¹‖₂ ≤ ‖(X + F)⁻¹ − X⁻¹‖₂ + ‖X⁻¹‖₂

together with lemma 2.46.

In chapter 8, when contraction mapping arguments are applied to time-varying systems, the fixed-point equations will be integral equations. We end this section with an example that will come in handy when reading those arguments.

2.48 Example

Consider equations in time-varying matrices over the time interval [ 0, tf ). Let φ be the transition matrix of the system

x′(t) = M(t) x(t)

so that

φ(t, t) = I
φ(•, τ)′(t) = M(t) φ(t, τ)
φ(τ, •)′(t) = −φ(τ, t) M(t)

Define the operators S and T according to

(S R)(t) := (1/a) ∫₀ᵗ φ(t, τ) P(τ) dτ

(T R)(t) := (1/a) ∫ₜ^{tf} P(τ) φ(τ, t) dτ


where a is a constant and P is a matrix-valued function of time.

The fixed-point equation

S R = R

then implies (swapping the sides, multiplying by a, and differentiating)

a R′(t) = P(t) + ∫₀ᵗ M(t) φ(t, τ) P(τ) dτ = P(t) + M(t) ∫₀ᵗ φ(t, τ) P(τ) dτ
        = P(t) + a M(t) (S R)(t) = P(t) + a M(t) R(t)   (2.88)

Similarly, the fixed-point equation

T R = R

implies

a R′(t) = −P(t) − ∫ₜ^{tf} P(τ) φ(τ, t) M(t) dτ = −P(t) − ( ∫ₜ^{tf} P(τ) φ(τ, t) dτ ) M(t)
        = −P(t) − a (T R)(t) M(t) = −P(t) − a R(t) M(t)   (2.89)

Hence, by identifying the forms of (2.88) or (2.89) with some equation in time-varying matrices, we will be able to formulate corresponding fixed-point integral equations.
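As a sanity check of the identification, the sketch below instantiates (2.88) with hypothetical scalar data m, a, and P(t) = sin t of my own choosing, for which φ(t, τ) = e^{m (t − τ)}, and verifies the integrated form of a R′(t) = P(t) + a M(t) R(t) with R(0) = 0:

```python
import numpy as np

def cumtrapz0(f, t):
    # Cumulative trapezoidal integral of samples f over grid t, starting at 0.
    return np.concatenate([[0.0], np.cumsum((f[1:] + f[:-1]) / 2 * np.diff(t))])

# Hypothetical scalar data: M(t) = m, constant a, P(t) = sin t.
m, a = -0.7, 2.0
ts = np.linspace(0.0, 5.0, 5001)
p = np.sin(ts)

# Fixed point of S: R(t) = (1/a) int_0^t e^{m (t - tau)} p(tau) dtau,
# computed as (1/a) e^{m t} int_0^t e^{-m tau} p(tau) dtau.
R = np.exp(m * ts) * cumtrapz0(np.exp(-m * ts) * p, ts) / a

# Integrated form of (2.88): a R(t) = int_0^t ( p(s) + a m R(s) ) ds.
rhs = cumtrapz0(p + a * m * R, ts)
assert np.max(np.abs(a * R - rhs)) < 1e-4
print("identity (2.88) holds to quadrature accuracy")
```

The same pattern, with the integral taken from t to tf and the factor M(t) appearing on the right, checks (2.89).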

2.7 Interval analysis

While perturbation analysis is the theoretical tool used to prove convergence results in this thesis, it relies on a too coarse uncertainty model to be successful in applications. In the much more fine-grained uncertainty model used in interval analysis, one uncertainty interval is used for each scalar quantity. A survey and quick introduction is given in Kearfott (1996). Another quick introduction, including many algorithms, is given in Jaulin et al. (2002), written jointly with two of the authors of the popular book Jaulin et al. (2001).

Though superior to "implemented perturbation analysis", interval analysis is often blamed for producing error bounds so pessimistic that they are useless. Unfortunately, pessimistic error bounds (which are not always useless) are the price one has to pay to be certain that the result of the uncertain computation really includes every possible outcome of the uncertain problem. Methods to improve performance include the use of preconditioning matrices, coordinate transformations, and uncertainty models which capture some of the correlation between uncertain quantities. For instance, even if the uncertainty in the quantity x is large, as long as x is


known to be non-zero, it holds that

\[ \frac{x}{x} = 1, \qquad x - x = 0, \qquad \frac{x + x}{2\,x - x} = 2 \]

but without support from symbolic math computations these relations may be difficult to maintain. This problem is addressed in Neumaier (1987), but in this thesis we shall only use the following trivial observation.

Notation. In the language of interval analysis, scalars, vectors, and matrices, represented by box constraints on each entry, are denoted intervals, interval vectors, and interval matrices, respectively. In contrast (when the difference needs to be emphasized) the exactly known corresponding objects are denoted real number, point vector, and point matrix. The notion of point objects is natural in view of the uncertain objects being technically defined as sets of point objects. Since this thesis is not in the field of interval analysis (we merely use the technique for illustration), we tend to use the terms uncertain and exact instead.

A function containing uncertainty is also thought of as a set of exact functions. This enables us to define the solution to the uncertain equation in the variable x ∈ R^n,

\[ f( x ) \overset{!}{=} 0 \]

as

\[ \left\{\, x \in \mathbf{R}^n : \left( \exists f \in \boldsymbol{f} : 0 \overset{!}{=} f( x ) \right) \right\} \]

Notation. For a general set we speak of inner and outer approximations, referring to its subsets and supersets. In the context of equations, we simply write inner/outer solution when referring to inner/outer approximations of the solution.

An uncertain matrix is called regular if it only admits non-singular point matrices. Otherwise, it is called singular. The next definition is non-standard but will be convenient in the thesis.

2.49 Definition (Pointwise non-singular uncertain matrix). An uncertain matrix is said to have the additional property of being pointwise non-singular if it admits only non-singular point matrices.

Clearly every regular uncertain matrix has the property of being pointwise non-singular, but for singular uncertain matrices this is the property which allows the inverse to be formed formally, even though the inverse is a matrix with at least one unbounded interval of uncertainty. For the inverse to be useful, additional assumptions will have to be added, for instance, a bound on the norm of the inverse.

It is important to note that an interval matrix with additional constraints, such as pointwise non-singularity or boundedness of the inverse, is generally not possible to represent as an interval matrix, and it becomes more appropriate to use the more general notion of uncertain matrix.


If X is known to be a regular matrix, so that X^{-1} X = I, the following column reduction is possible:

\[
\begin{bmatrix} X & Y \end{bmatrix}
\begin{bmatrix} X^{-1} & -X^{-1} Y \\ 0 & I \end{bmatrix}
=
\begin{bmatrix} I & 0 \end{bmatrix}
\]

This also makes use of the trivial −X X^{-1} Y + Y = −I Y + Y = 0. Thinking of the column reduction as a coordinate transformation, the uncertainty in the matrix [ X  Y ] was transferred to uncertainty in the coordinate transform. Of course, row reduction can be carried out analogously.

While approximate intervals for X^{-1} are easily obtained by a first order expansion of X^{-1} around some point matrix X_0 ∈ X, good bounds that are guaranteed to contain X^{-1} are more demanding. We shall not go into details, since we will rely on Mathematica to deliver accurate results, and interested readers are referred to Neumaier (1990, theorem 4.1.11) for a theoretical result.

The following theorem is characteristic for interval arithmetic. It is a simple form of constraint propagation, and proves itself.

2.50 Theorem. Consider the fixed-point equation

\[ x \overset{!}{=} f( x ) \]

where uncertainty in f implies uncertainty in the solution, and where the solution set x is known to be a subset of x_1. If conservative evaluation of f( x_1 ) results in x_2, that is,

\[ f( x_1 ) \subset x_2 \]

then

\[ x \subset x_1 \cap x_2 \]

The theorem immediately gives a technique for iterative refinement of outer solutions to a fixed-point equation. Hence, even very conservative outer solutions may be very valuable, as they can serve as a starting point for the iterative refinement.

2.51 Example
Consider the uncertain polynomial (the brackets denote intervals here, not matrices)

\[ f( x ) \triangleq [\, 0.9, 1.1\, ]\, x^2 + [\, 9.5, 10.5\, ]\, x + [\, -8.5, -7.5\, ] \]

which has a solution near 0.75. The plot of f in figure 2.5 allows the solution to be read off as the interval where 0 ∈ f( x ), and it is easy to find that the true solution set is approximately [ 0.6676, 0.8295 ].

Let x_0 = [ 0.0, 2.0 ] be a given outer solution, and rewrite the equation f( x ) != 0 as the fixed-point equation x != T( x ), where

\[ T( x ) \triangleq -\frac{[\, 0.9, 1.1\, ]\, x^2 + [\, -8.5, -7.5\, ]}{[\, 9.5, 10.5\, ]} \]


[Figure 2.5: The uncertain function f in example 2.51. Since no uncertain quantities appear more than once in the expression for f, it is straightforward to compute the set f( x_1 ) at any point x_1.]

The iterative refinement produces the following sequence of outer solutions.

Iterate   Outer solution
x_0       [ 0.00, 2.00 ]
x_1       [ 0.55, 0.89 ]
x_2       [ 0.63, 0.87 ]
x_3       [ 0.64, 0.86 ]

Further iteration gives little improvement; x_6 = [ 0.6375, 0.8562 ], sharing its four most significant digits with x_200. Hence, the iterative refinement produced significant improvement of the initial outer solution in just a few iterations, but was unable to converge to the true solution set.
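The refinement technique of theorem 2.50 is easy to sketch in code. Below is a minimal Python sketch (the helper functions are ad hoc, and this plain outward evaluation need not match the thesis's Mathematica computation iterate by iterate), applying x ← x ∩ T( x ) to example 2.51:

```python
# Sketch of iterative refinement of an outer solution (theorem 2.50),
# applied to the fixed-point form T of example 2.51. Interval arithmetic
# is implemented naively, for just the operations needed here.

def imul(a, b):
    """Interval multiplication [a]*[b]."""
    ps = [a[0]*b[0], a[0]*b[1], a[1]*b[0], a[1]*b[1]]
    return (min(ps), max(ps))

def isqr(a):
    """Interval square; tighter than imul(a, a) when 0 lies inside [a]."""
    lo = min(abs(a[0]), abs(a[1]))**2
    hi = max(abs(a[0]), abs(a[1]))**2
    return (0.0, hi) if a[0] <= 0.0 <= a[1] else (lo, hi)

def iadd(a, b):
    return (a[0] + b[0], a[1] + b[1])

def idiv(a, b):
    assert b[0] > 0 or b[1] < 0, "division requires 0 not in divisor"
    return imul(a, (1.0/b[1], 1.0/b[0]))

def T(x):
    # T(x) = -([0.9,1.1] x^2 + [-8.5,-7.5]) / [9.5,10.5]
    num = iadd(imul((0.9, 1.1), isqr(x)), (-8.5, -7.5))
    return idiv((-num[1], -num[0]), (9.5, 10.5))

def refine(x, steps):
    for _ in range(steps):
        tx = T(x)
        x = (max(x[0], tx[0]), min(x[1], tx[1]))  # x <- x n T(x)
    return x

x = refine((0.0, 2.0), 50)
# Converges to roughly [0.637, 0.856]; every iterate contains the true
# solution set [0.6676, 0.8295], but the limit is wider than it.
```

As in the table above, the refinement stalls at an interval strictly wider than the true solution set; this is the conservatism inherent in outer evaluation.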

2.8 Gaussian elimination

Although it is assumed that the reader is familiar with Gaussian elimination, this section discusses some aspects of particular interest for the algorithm proposed in chapter 3.

The shuffling algorithm in chapter 3 makes use of row reduction. The most well known row reduction method is perhaps Gaussian elimination, and although infamous for its numerical properties, it is sufficiently simple to be a realistic choice for implementations. In fact, the proposed algorithm makes this particular choice, and among the many variations of Gaussian elimination, a fraction-free scheme is used. This technique for taking a matrix to row echelon form† uses only addition and multiplication operations. In contrast, a fraction-producing scheme involves also division. The difference is explained by example. Consider performing row reduction on a matrix of integers of the same order of magnitude:

\[ \begin{bmatrix} 5 & 7 \\ 3 & -4 \end{bmatrix} \]

A fraction-free scheme will produce a new matrix of integers,

\[
\begin{bmatrix} 5 & 7 \\ 5 \cdot 3 - 3 \cdot 5 & 5 \cdot (-4) - 3 \cdot 7 \end{bmatrix}
=
\begin{bmatrix} 5 & 7 \\ 0 & -41 \end{bmatrix}
\]

while a fraction-producing scheme generally will produce a matrix of rational numbers,

\[
\begin{bmatrix} 5 & 7 \\ 3 - (3/5) \cdot 5 & (-4) - (3/5) \cdot 7 \end{bmatrix}
=
\begin{bmatrix} 5 & 7 \\ 0 & -(41/5) \end{bmatrix}
\]

The fraction-free scheme thus has the advantage that it is able to preserve the integer structure present in the original matrix. On the other hand, if the original matrix is a matrix of rational numbers, both schemes generally produce a new matrix of rational numbers, so there is no advantage in using the fraction-free scheme. Note that it is necessary not to allow the introduction of new non-integer entries in order to keep the distinction clear, since any matrix of rational numbers can otherwise be converted to a matrix of integers; introducing non-integer scalars would destroy the integer structure. The two schemes should also be compared by the numbers they produce. The number −41, in comparison with the original numbers, is a sign of the typical blowup of entries caused by the fraction-free scheme. The number −(41/5) = −8.2 does not indicate the same tendency.

When the matrix is interpreted as the coefficients of a linear system of equations to be solved in the floating point domain, the blowup of entries implies bad numeric condition, which in turn has negative implications for the quality of the computed solution. Unfortunately, this is not the only drawback of the fraction-free scheme, since the operations involved in the row reduction are ill-conditioned themselves. This means that there may be poor correspondence between the original equations and the row reduced equations, even before attempting to solve them.

Fraction-free Gaussian elimination can also be applied to a matrix of polynomials, and will then preserve the polynomial structure. Note also that the structure is not destroyed by allowing the introduction of new scalars. This can be used locally to drastically improve the numerical properties of the reduction scheme, by making them approximately the same as those of the fraction-producing scheme. That is, multiplication by scalars is used to locally make the pivot polynomial approximately equal to 1, and then fraction-free operations are used to eliminate below the pivot as usual.

Finally, recall that Gaussian elimination also takes different flavors in the pivoting dimension. However, this dimension is not explored when proposing the algorithm in chapter 3.

† A matrix is said to be in row echelon form if each non-zero row has more leading zeros than the previous row. Actually, in order to account for the outcome when full pivoting is used, one should really say that the matrix is in row echelon form after suitable reordering of variables. In the current setting of elimination, where it makes sense to speak of structural zeros, the reference to reordering of variables can be avoided by saying that the reduced matrix is such that each non-zero row has more structural zeros than the previous row.
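The two schemes are easy to contrast in code. A minimal sketch (the function names are illustrative, not from the thesis), eliminating below the pivot in the first column:

```python
from fractions import Fraction

def eliminate_fraction_free(M):
    """Zero the entries below the (0,0) pivot using only + and *:
    row_i <- pivot*row_i - M[i][0]*row_0. Integer input stays integer."""
    p = M[0][0]
    return [M[0]] + [[p * x - M[i][0] * y for x, y in zip(M[i], M[0])]
                     for i in range(1, len(M))]

def eliminate_fraction_producing(M):
    """Classical Gaussian elimination step: row_i <- row_i - (M[i][0]/p)*row_0.
    Generally produces rational numbers from integer input."""
    p = Fraction(M[0][0])
    return [M[0]] + [[x - (M[i][0] / p) * y for x, y in zip(M[i], M[0])]
                     for i in range(1, len(M))]

M = [[5, 7], [3, -4]]
print(eliminate_fraction_free(M))       # [[5, 7], [0, -41]]
print(eliminate_fraction_producing(M))  # [[5, 7], [Fraction(0, 1), Fraction(-41, 5)]]
```

The blowup to −41 versus −41/5 in the running example of this section shows up directly in the two outputs.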

2.9 Miscellaneous results

The chapter ends with a collection of results that are included here only to be referenced from subsequent chapters. Please refer to the cited references for discussion of these results.

The following theorem gives an upper bound on the roots of a polynomial, purely in terms of the moduli of the coefficients. Of course, there are many ways to obtain tighter estimates by using more knowledge about the coefficients, but we don't have that kind of information in this thesis, and the bounds we get from this theorem are tight enough for our purposes.

2.52 Theorem. The moduli of the roots of the polynomial

\[ f( z ) = \sum_{i=0}^{n} a_i z^i \]

are bounded by

\[
2 \max\left( \left\{ \left| \frac{a_{n-i}}{a_n} \right|^{1/i} \right\}_{i=1}^{n-1} \cup \left\{ \left| \frac{a_0}{2\, a_n} \right|^{1/n} \right\} \right)
\]

Proof: The result is included in the form of an exercise in the thorough Marden (1966, exercise 30.5).
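The bound is cheap to evaluate; a quick numerical sanity check (the polynomial is an arbitrary test case, not from the thesis):

```python
# Evaluate the root bound of theorem 2.52 for a polynomial given by its
# coefficients a[0..n], where a[i] multiplies z^i.

def root_bound(a):
    n = len(a) - 1
    candidates = [abs(a[n - i] / a[n]) ** (1.0 / i) for i in range(1, n)]
    candidates.append(abs(a[0] / (2 * a[n])) ** (1.0 / n))
    return 2 * max(candidates)

# z^3 - 6 z^2 + 11 z - 6 = (z-1)(z-2)(z-3): largest root modulus is 3.
print(root_bound([-6, 11, -6, 1]))  # 12.0, a valid (if loose) bound on 3
```

The looseness (12 against a true maximum modulus of 3) is in line with the remark above that the bound only needs to be tight enough for our purposes.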

Motivated by theorem 2.27, the bounding of matrices is very important in our work, but it turns out that we actually know more about the inverse of the matrix than about the matrix itself. Since bounding the norm of the inverse of a matrix is related to bounding the condition number, several useful results are presented with condition number bounds in mind. The survey Higham (1987) lists many such results. One of them which is useful to us and applies to upper triangular matrices is given as the next theorem.

2.53 Theorem. For the upper triangular matrix U, it holds that

\[
\| U \|_2 \le \frac{\sqrt{\,(a+1)^{2n} + 2\,n\,(a+2) - 1\,}}{(a+2)\, b}
\tag{2.90}
\]

where, with J = U^{-1} and λ_i denoting the diagonal entries (the eigenvalues) of U,

\[
a = \max_{i<j} \frac{|J_{ij}|}{|J_{ii}|} = \max_{i<j} |\lambda_i|\, |J_{ij}| \le \max\bigl( U^{-1} \bigr)\, \lambda_{\max}(U)
\]
\[
b = \min_i |J_{ii}| = \min_i \frac{1}{\lambda_i} = \frac{1}{\lambda_{\max}(U)}
\]


Proof: This is an immediate consequence of results in Lemeire (1975) and builds onthe theory of M-matrices.
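A small numerical check of (2.90) is straightforward (the test matrix is an arbitrary 2-by-2 example; the 2-norm is obtained from the eigenvalues of UᵀU in closed form):

```python
import math

# Check the bound (2.90) on ||U||_2 for a 2-by-2 upper triangular U with
# positive diagonal, computing a and b from J = U^{-1}.
U = [[2.0, 1.0], [0.0, 3.0]]
n = 2

# Inverse of a 2-by-2 upper triangular matrix.
J = [[1 / U[0][0], -U[0][1] / (U[0][0] * U[1][1])],
     [0.0, 1 / U[1][1]]]

a = abs(J[0][1]) / abs(J[0][0])   # max over i < j (only one such entry here)
b = min(abs(J[0][0]), abs(J[1][1]))
bound = math.sqrt((a + 1)**(2 * n) + 2 * n * (a + 2) - 1) / ((a + 2) * b)

# ||U||_2 = sqrt(largest eigenvalue of U^T U); closed form for 2-by-2.
G = [[U[0][0]**2, U[0][0] * U[0][1]],
     [U[0][0] * U[0][1], U[0][1]**2 + U[1][1]**2]]
tr = G[0][0] + G[1][1]
det = G[0][0] * G[1][1] - G[0][1] * G[1][0]
norm2 = math.sqrt((tr + math.sqrt(tr**2 - 4 * det)) / 2)

print(norm2 <= bound)  # True (norm2 is about 3.26, bound about 4.36)
```

The bound is not tight, but as discussed above it is the guaranteed validity, not sharpness, that matters here.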


3 Shuffling quasilinear DAE

Methods for index reduction of general nonlinear differential-algebraic equations are generally difficult to implement due to the recurring use of functions defined only via the implicit function theorem. The problem is avoided in chapter 5, but instead of implementing the implicit functions, additional variables are introduced, and an under-determined system of equations needs to be solved each time the derivative (or the next iterate of a discretized solution) is computed. As an alternative, additional structure may be added to the equations in order to make the implicit functions possible to implement, thereby avoiding additional variables and under-determined systems of equations. In particular, this is possible for the quasilinear and linear time-invariant (lti) structures, and it turns out that there exists an algorithm for the quasilinear form that is a generalization of the shuffle algorithm for the lti form in the sense that, when applied to the lti form, it reduces to the shuffle algorithm. For this reason, the more general algorithm is referred to as a quasilinear shuffle algorithm.

This chapter is devoted to quasilinear shuffling. It is included in the background part of the thesis since it was the application to quasilinear shuffling that was the original motivation behind the study of matrix-valued singular perturbations. This connection was mentioned in section 1.2.2 and will be discussed below when the seminumerical approach is presented in section 3.2.4. This chapter also contains a discussion on how the quasilinear shuffle algorithm can be used to find consistent initial conditions, and touches upon the issue of algorithm complexity.

The contents of the chapter present only minor improvements compared to the chapter with the same title in the author's licentiate thesis, Tidefelt (2007, chapter 3), except that the section on algorithm complexity has been removed due to its weak connection to the current thesis.



Notation. We use a star to mark that a symbol denotes a constant. For instance, the symbol E^* denotes a constant matrix, while a symbol like E would in general refer to a matrix-valued function. A few times, we will encounter the gradient of a matrix-valued function. This object will be a function with 3 indices, but rather than adopting tensor notation with the Einstein summation convention, we shall permute indices using generalized transposes denoted (•)^T and (•)^T̄. Since their operation will be clear from the context, they will not be defined formally in this thesis.

3.1 Index reduction by shuffling

In section 2.2.3, algorithm 2.1 provided a way of reducing the differentiation index of lti dae. The extension of that algorithm to the quasilinear form is immediate, but to put this extension in a broader context, we will take the view of it as a specialization instead. In this section, we mainly present the algorithm as it applies to equations which are known exactly, and are to be index reduced exactly.

3.1.1 The structure algorithm

In section 2.2.5 we presented the structure algorithm (algorithm 2.2) as a means for index reduction of general nonlinear dae,

f ( x′(t), x(t), t ) != 0 (3.1)

This method is generally not possible to implement, since the recurring use of the implicit function theorem often leaves the user with functions whose existence is given by the theorem, but whose implementation is very involved (to the author's knowledge, there are to date no available implementations serving this need). However, it is possible to implement for the quasilinear form, as was done, for instance, using Gauss-Bareiss elimination (Bareiss, 1968) in Visconti (1999), or as outlined in Steinbrecher (2006).

3.1.2 Quasilinear shuffling

Even though algorithms for quasilinear dae exist, the results they produce may be computationally demanding, partly because the problems they apply to are still very general. This should be compared with the linear time-invariant (lti) case,

E x′(t) + A x(t) + B u(t) != 0 (3.2)

to which the very simple and certainly tractable shuffle algorithm (see section 2.2.3) applies, at least as long as there is no uncertainty in the equation coefficients. Interestingly, the algorithm for quasilinear dae described in Visconti (1999) is using the same idea, and it generalizes the shuffle algorithm in the sense that, when applied to the lti form, it reduces to the shuffle algorithm. For this reason, the algorithm in Visconti (1999) is referred to as a quasilinear shuffle algorithm.†

† Note that it is not referred to as the quasilinear shuffle algorithm, since there are many options regarding how to do the generalization. There are also some variations on the theme of the lti shuffle algorithm, leading to slightly different generalizations.


In the next two sections, the alternative view of quasilinear shuffling as a specialization of the structure algorithm is taken. Before doing so, we show using a small example what index reduction of quasilinear dae can look like.

3.1 Example
This example illustrates how row reduction can be performed for a quasilinear dae. The aim is to present an idea rather than an algorithm, which will be a later topic. Consider the dae (dropping the dependency of x on t from the notation)

\[
\begin{bmatrix}
2 + \tan( x_1 ) & x_2 & 4\,t \\
2 \cos( x_1 ) & 0 & e^{x_3} \\
\sin( x_1 ) & x_2 \cos( x_1 ) & 4\,t \cos( x_1 ) - e^{x_3}
\end{bmatrix}
x' +
\begin{pmatrix}
5 \\
x_2 + x_3 \\
x_1\, e^{x_3} + t^3\, x_2
\end{pmatrix}
\overset{!}{=} 0
\]

The leading matrix is singular at any point, since the first row times cos( x_1 ) less the second row yields the third row. Adding the second row to the third, and then subtracting cos( x_1 ) times the first, is an invertible operation and thus yields the equivalent equations:

\[
\begin{bmatrix}
2 + \tan( x_1 ) & x_2 & 4\,t \\
2 \cos( x_1 ) & 0 & e^{x_3} \\
0 & 0 & 0
\end{bmatrix}
x' +
\begin{pmatrix}
5 \\
x_2 + x_3 \\
x_1\, e^{x_3} + t^3\, x_2 + x_2 + x_3 - 5 \cos( x_1 )
\end{pmatrix}
\overset{!}{=} 0
\]

This reveals the implicit constraint of this iteration,

\[
x_1(t)\, e^{x_3(t)} + t^3\, x_2(t) + x_2(t) + x_3(t) - 5 \cos( x_1(t) ) \overset{!}{=} 0
\]

Then, differentiation yields the new dae

\[
\begin{bmatrix}
2 + \tan( x_1 ) & x_2 & 4\,t \\
2 \cos( x_1 ) & 0 & e^{x_3} \\
e^{x_3} + 5 \sin( x_1 ) & t^3 + 1 & x_1\, e^{x_3} + 1
\end{bmatrix}
x' +
\begin{pmatrix}
5 \\
x_2 + x_3 \\
3\,t^2\, x_2
\end{pmatrix}
\overset{!}{=} 0
\]

Here, the leading matrix is generally non-singular, and the dae is essentially an ode bundled with the derived implicit constraint.

3.1.3 Time-invariant input affine systems

In this section, the structure algorithm is applied to equations

0 != f ( x(t), x′(t), t )

where f is in the form

\[ f( x, \dot{x}, t ) = E( x )\, \dot{x} + A( x ) + B( x )\, u(t) \tag{3.3} \]

with u being a given forcing function. This system is considered time-invariant since time only enters the equation via u.

After one iteration of the structure algorithm, we will see what requirements (3.3) must fulfill in order for the equations after one iteration to be in the same form as the original equations. This will show that (3.3) is not a natural form for dae treated by the structure algorithm. In the next section, a more successful attempt will be made, starting from a more general form than (3.3).

The system is rewritten in the form

\[ x'(t) \overset{!}{=} \dot{x}(t) \tag{3.4a} \]
\[ 0 \overset{!}{=} f( x(t), \dot{x}(t), t ) \tag{3.4b} \]

to match the setup in Rouchon et al. (1992) (recall that the notation ẋ is not defined to denote the derivative of x; it is a composed symbol denoting a newly introduced function which is required to equal the derivative of x by (3.4a)). As is usual in the analysis of dae, the analysis is only valid locally, giving just a local solution. As is also customary, all matrix ranks that appear are assumed to be constant in the neighborhood of the initial point defining the meaning of local solution.

We will now follow one iteration of the structure algorithm applied to this system (compare algorithm 2.2).

Let f_0 ≔ f, and introduce E_0, A_0 and B_0 accordingly. Let μ_k ≜ rank E_k (that is, the rank of the "ẋ-gradient" of f_k, which by assumption may be evaluated at any point in the neighborhood), and let f̄_k denote μ_k components of f_k such that Ē_k denotes μ_k linearly independent rows of E_k, with Ā_k and B̄_k the corresponding rows of A_k and B_k. Let f̃_k denote the remaining components of f_k, with Ẽ_k, Ã_k and B̃_k the corresponding rows. By the constant rank assumption it follows that, locally, the rows of Ē_k( x ) span the rows of Ẽ_k( x ), and hence there exists a function ϕ_k such that

\[ \tilde{E}_k( x ) = \phi_k( x )\, \bar{E}_k( x ) \]

Hence,

\[
\begin{aligned}
\tilde{f}_k( x, \dot{x}, t ) &= \tilde{E}_k( x )\, \dot{x} + \tilde{A}_k( x ) + \tilde{B}_k( x )\, u(t) \\
&= \phi_k( x )\, \bar{E}_k( x )\, \dot{x} + \tilde{A}_k( x ) + \tilde{B}_k( x )\, u(t) \\
&= \phi_k( x ) \bigl( \bar{E}_k( x )\, \dot{x} + \bar{A}_k( x ) + \bar{B}_k( x )\, u(t) \bigr)
   - \phi_k( x )\, \bar{A}_k( x ) - \phi_k( x )\, \bar{B}_k( x )\, u(t)
   + \tilde{A}_k( x ) + \tilde{B}_k( x )\, u(t) \\
&= \phi_k( x )\, \bar{f}_k( x, \dot{x}, t ) + \tilde{A}_k( x ) - \phi_k( x )\, \bar{A}_k( x )
   + \bigl( \tilde{B}_k( x ) - \phi_k( x )\, \bar{B}_k( x ) \bigr)\, u(t)
\end{aligned}
\]

Define

\[
\begin{aligned}
\check{A}_k( x ) &\triangleq \tilde{A}_k( x ) - \phi_k( x )\, \bar{A}_k( x ) \\
\check{B}_k( x ) &\triangleq \tilde{B}_k( x ) - \phi_k( x )\, \bar{B}_k( x ) \\
\Phi_k( x, t, y ) &\triangleq \phi_k( x )\, y + \check{A}_k( x ) + \check{B}_k( x )\, u(t)
\end{aligned}
\tag{3.5}
\]

and note that along solutions,

\[ \Phi_k( x, t, 0 ) = \Phi_k( x, t, \bar{f}_k( x, \dot{x}, t ) ) = \tilde{f}_k( x, \dot{x}, t ) = 0 \]

In particular, the expression is constant over time, so it can be differentiated with respect to time to obtain a substitute for the (locally) uninformative equations given


by f̃_k. Thus, let

\[
f_{k+1}( x, \dot{x}, t ) \triangleq
\begin{pmatrix}
\bar{f}_k( x, \dot{x}, t ) \\[2pt]
\frac{\partial\, t \mapsto \Phi_k( x(t), t, 0 )}{\partial t}(t)
\end{pmatrix}
\tag{3.6}
\]

Expanding the differentiation using the chain rule, it follows that

\[
\begin{aligned}
\frac{\partial\, t \mapsto \Phi_k( x(t), t, 0 )}{\partial t}(t)
&= \nabla_1 \Phi_k( x(t), t, 0 )\, x'(t) + \nabla_2 \Phi_k( x(t), t, 0 ) \\
&= \Bigl( \nabla \check{A}_k( x(t) ) + \bigl( \nabla^{T} \check{B}_k( x(t) )\, u(t) \bigr)^{\bar{T}} \Bigr)\, x'(t) + \check{B}_k( x(t) )\, u'(t)
\end{aligned}
\tag{3.7}
\]

However, since x' = ẋ along solutions, the following defines a valid replacement for f_k:

\[
f_{k+1}( x, \dot{x}, t ) =
\left\{
\begin{bmatrix} \bar{E}_k( x ) \\ \nabla \check{A}_k( x ) \end{bmatrix}
+
\left( \begin{bmatrix} 0 \\ \nabla^{T} \check{B}_k( x ) \end{bmatrix} u(t) \right)^{\bar{T}}
\right\} \dot{x}
+
\begin{pmatrix} \bar{A}_k( x ) \\ 0 \end{pmatrix}
+
\begin{bmatrix} \bar{B}_k( x ) & 0 \\ 0 & \check{B}_k( x ) \end{bmatrix}
\begin{pmatrix} u(t) \\ u'(t) \end{pmatrix}
\tag{3.8}
\]

We have now completed one iteration of the structure algorithm, and turn to finding conditions on (3.3) that make (3.8) fulfill the same conditions.

In (3.8), the product between u(t) and ẋ is unwanted, so the structure is restricted by requiring

\[ \nabla \check{B}_k( x(t) ) = 0 \tag{3.9} \]

that is, B̌_k is constant; B̌_k( x ) = B̌^*_k.

Unfortunately, the conflict has just been shifted to a new location by this requirement. The structure of f_{k+1} does not match the structure in (3.3) together with the requirement (3.9), since B̌_k( x ) includes the non-constant expression ϕ_k( x ). Hence it is also required that Ē_k is constant, so that ϕ_k( x ) may be chosen constant. This is written as Ē_k( x ) = Ē^*_k. Then, if structure is to be maintained,

\[ \begin{bmatrix} \bar{E}^*_k \\ \nabla \check{A}_k( x ) \end{bmatrix} \]

would have to be constant. Again, this condition is not met since ∇Ǎ_k( x ) is generally not constant. Finally, we are led to also requiring that ∇Ǎ_k( x ) be constant. In other words, that

\[ \check{A}_k( x ) = A^*_k\, x \]

so the structure of (3.3) is really

\[ f( x, \dot{x}, t ) = E^* \dot{x} + A^* x + B^* u(t) \]

which is a standard lti dae.


Note that another way to obtain conditions on (3.3) which become fulfilled also by (3.8) is to remove the forcing function u.

The key point of this section, however, is that we have seen that in order to be able to run the structure algorithm repeatedly on equations in the form (3.3), an implementation that is designed for one iteration on (3.3) is insufficient. In other words, if an implementation that can be iterated exists, it must apply to a more general form than (3.3).

3.1.4 Quasilinear structure algorithm

Seeking a replacement for (3.3) such that an implementation for one step of the structure algorithm can be iterated, a look at (3.8) suggests that the form should allow for dependency on time in the leading matrix. Further, since the forcing function u has entered the leading matrix, the feature of u entering the equations in a simple way has been lost. Hence it is no longer motivated to keep A_k( x ) and B_k( x ) u(t) separate, but we might as well turn to the quasilinear form in its full generality,

\[ f_k( x, \dot{x}, t ) = E_k( x, t )\, \dot{x} + A_k( x, t ) \]

The reader is referred to the previous section for the notation used below. This time, the constant rank assumption leads to the existence of a ϕ_k such that

\[ \tilde{E}_k( x, t ) = \phi_k( x, t )\, \bar{E}_k( x, t ) \]

Such a ϕ_k can be obtained from a row reduction of E, and corresponds to the row reduction performed in a quasilinear shuffle algorithm.

It follows that

\[
\begin{aligned}
\tilde{f}_k( x, \dot{x}, t ) &= \tilde{E}_k( x, t )\, \dot{x} + \tilde{A}_k( x, t ) \\
&= \phi_k( x, t )\, \bar{E}_k( x, t )\, \dot{x} + \tilde{A}_k( x, t ) \\
&= \phi_k( x, t ) \bigl( \bar{E}_k( x, t )\, \dot{x} + \bar{A}_k( x, t ) \bigr) - \phi_k( x, t )\, \bar{A}_k( x, t ) + \tilde{A}_k( x, t ) \\
&= \phi_k( x, t )\, \bar{f}_k( x, \dot{x}, t ) + \tilde{A}_k( x, t ) - \phi_k( x, t )\, \bar{A}_k( x, t )
\end{aligned}
\]

Define

\[
\begin{aligned}
\check{A}_k( x, t ) &\triangleq \tilde{A}_k( x, t ) - \phi_k( x, t )\, \bar{A}_k( x, t ) \\
\Phi_k( x, t, y ) &\triangleq \phi_k( x, t )\, y + \check{A}_k( x, t )
\end{aligned}
\tag{3.10}
\]

and note that along solutions,

\[ \Phi_k( x, t, 0 ) = \Phi_k( x, t, \bar{f}_k( x, \dot{x}, t ) ) = \tilde{f}_k( x, \dot{x}, t ) = 0 \]

Taking a quasilinear shuffle algorithm perspective on this, we see that Φ_k( x, t, 0 ) = Ǎ_k( x, t ) is computed by applying the same row operations to A as were used to find the function ϕ_k above.

The expression Φ_k( x, t, 0 ) is constant over time, so it can be differentiated with respect to time to obtain a substitute for the (locally) uninformative equations given


by f̃_k. Thus, let

\[
f_{k+1}( x, \dot{x}, t ) \triangleq
\begin{pmatrix}
\bar{f}_k( x, \dot{x}, t ) \\[2pt]
\frac{\partial\, t \mapsto \Phi_k( x(t), t, 0 )}{\partial t}(t)
\end{pmatrix}
\]

Expanding the differentiation using the chain rule, it follows that

\[
\frac{\partial\, t \mapsto \Phi_k( x(t), t, 0 )}{\partial t}(t)
= \nabla_1 \Phi_k( x(t), t, 0 )\, x'(t) + \nabla_2 \Phi_k( x(t), t, 0 )
= \nabla_1 \check{A}_k( x(t), t )\, x'(t) + \nabla_2 \check{A}_k( x(t), t )
\tag{3.11}
\]

Again, since x' = ẋ along solutions, f_k may be replaced by

\[
f_{k+1}( x, \dot{x}, t ) =
\begin{bmatrix} \bar{E}_k( x, t ) \\ \nabla_1 \check{A}_k( x, t ) \end{bmatrix} \dot{x}
+
\begin{pmatrix} \bar{A}_k( x, t ) \\ \nabla_2 \check{A}_k( x, t ) \end{pmatrix}
\tag{3.12}
\]

This completes one iteration of the structure algorithm, and it is clear that this can also be seen as the completion of one iteration of a quasilinear shuffle algorithm. As opposed to the outcome in the previous section, this time (3.12) is in the form we started with, so the procedure can be iterated.

3.2 Proposed algorithm

Having seen how the structure algorithm can be implemented as an index reduction method for (exact) quasilinear dae, and that this results in an immediate generalization of the shuffle algorithm for lti dae, we now turn to the task of detailing the algorithm to make it applicable in a practical setting. Issues to be dealt with include finding a suitable row reduction method and determining whether an expression is zero along solutions.

The problem of adapting algorithms for revealing hidden constraints in exact equations to a practical setting has previously been addressed in Reid et al. (2002). The geometrical awareness in their work is convincing, and the work was extended in Reid et al. (2005). For examples of other approaches to system analysis and/or index reduction which are reminiscent of ours, see for instance Unger et al. (1995) or Chowdhry et al. (2004).

3.2.1 Algorithm

The reason to do index reduction in the following particular way is that it is simple enough to make the analysis easy, and also that it does not rule out some of the candidate forms (Tidefelt, 2007, section 4.2) already in the row reduction step by producing a leading matrix outside the form. If maintaining invariant forms were not a goal in itself, the algorithm could easily be given better numeric properties (compare section 2.8), and/or better performance in terms of computation time (by reuse of expressions and similar techniques).


Algorithm 3.1 Quasilinear shuffling iteration for invariant forms.

Input: A square dae,

\[ E( x(t), t )\, x'(t) + A( x(t), t ) \overset{!}{=} 0 \]

It is assumed that the leading matrix is singular (when the leading matrix is non-singular, the index is 0 and index reduction is neither needed nor possible).

Output: An equivalent square dae of lower index, and additional algebraic constraints.

Iteration:
1. Select a set of independent rows in E( x(t), t ).
2. Perform fraction-free row reduction of the equations such that exactly the rows that were not selected in the previous step are zeroed. The produced algebraic terms, corresponding to the zero rows in the leading matrix, define algebraic equations restricting the solution manifold.
3. Differentiate the newly found algebraic equations with respect to time, and join the resulting equations with the ones selected in the first step to obtain the new square dae.

Remarks: The most important remark to make here is that the differentiation is not guaranteed to be geometric (recall the remark in algorithm 2.2). Hence, the termination criterion based on the number of iterations in algorithm 2.2 cannot be used safely in this context. If that termination criterion is met, our algorithm aborts with "non-geometric differentiation" instead of "ill-posed", but no conclusion regarding the existence of solutions to the dae can be drawn.

Although there are choices regarding how to perform the fraction-free row reduction, a conservative approach is taken by not assuming anything more fancy than fraction-free Gaussian elimination, with pivoting used only when so required and done in the most naïve way. This way, it is ensured that the tailoring of the reduction algorithms is really just a tailoring rather than something requiring elaborate extension.

As an alternative to the fraction-free row reduction, the same step may be seen as a matrix factorization (Steinbrecher, 2006). This view hides the reduction process in the factorization abstraction, and may therefore be better suited for high-level reasoning about the algorithm, while the current presentation may be more natural from an implementation point of view and easier to reason about on a lower level of abstraction.

It would be of no consequence for the analysis in the current chapter to require that the set of equations chosen in the first step always include the equations selected in the previous iteration, as is done in Rouchon et al. (1992).
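For the lti special case, the whole reduction loop is short. The following Python sketch is an illustrative toy, not the proposed algorithm: it uses the simpler fraction-producing elimination instead of the fraction-free scheme, and it assumes a homogeneous lti dae E x' + A x = 0, so that differentiating a revealed algebraic row A_i x = 0 just gives A_i x' = 0:

```python
from fractions import Fraction

def row_reduce(E, A):
    """Zero out linearly dependent rows of E, applying the same
    row operations to A. Returns (E, A, pivot_rows)."""
    E = [[Fraction(v) for v in row] for row in E]
    A = [[Fraction(v) for v in row] for row in A]
    piv = []
    for col in range(len(E[0])):
        p = next((i for i in range(len(E))
                  if i not in piv and E[i][col] != 0), None)
        if p is None:
            continue
        piv.append(p)
        for i in range(len(E)):
            if i not in piv and E[i][col] != 0:
                f = E[i][col] / E[p][col]
                E[i] = [a - f * b for a, b in zip(E[i], E[p])]
                A[i] = [a - f * b for a, b in zip(A[i], A[p])]
    return E, A, piv

def shuffle(E, A, max_iter=10):
    """Toy lti shuffle loop for E x' + A x = 0 (homogeneous)."""
    for _ in range(max_iter):
        E, A, piv = row_reduce(E, A)
        if len(piv) == len(E):
            return E, A  # leading matrix now nonsingular: index 0
        for i in range(len(E)):
            if i not in piv:  # zero row of E: algebraic equation A_i x = 0
                E[i], A[i] = A[i], [Fraction(0)] * len(A[i])  # differentiate
    raise ValueError("no termination within max_iter iterations")

# x1' + x2 = 0 together with the algebraic equation x2 = 0:
E, A = shuffle([[1, 0], [0, 0]], [[0, 1], [0, 1]])
# E is now the identity; the dae became the ode x1' = -x2, x2' = 0.
```

The quasilinear algorithm above replaces the exact rational arithmetic by fraction-free symbolic row reduction, and the trivial differentiation step by differentiation of the full algebraic expressions.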


We stress again that an index reduction algorithm is typically run repeatedly until a low index is obtained (compare, for instance, algorithm 2.2). Here, only one iteration is described, but this is sufficient since the algorithm output is in the same form as the algorithm input was assumed to be.

Recall the discussion on fraction-producing versus fraction-free row reduction schemes in section 2.8. The proposed algorithm uses a fraction-free scheme for two reasons. Most importantly in this chapter, it does so in order to hold more invariant forms (to be defined). Of subordinate importance is that it can be seen as a heuristic for producing simpler expressions. The body of the index reduction loop is given in algorithm 3.1.

3.2 Example

Here, one iteration is performed on the following quasilinear dae:
$$
\begin{pmatrix}
x_1(t)\, x_2(t) & \sin(t) & 0 \\
e^{x_3(t)} & x_1(t) & 0 \\
t & 1 & 0
\end{pmatrix}
\begin{pmatrix} x_1'(t) \\ x_2'(t) \\ x_3'(t) \end{pmatrix}
+
\begin{pmatrix} x_2(t)^3 \\ \cos(t) \\ 4 \end{pmatrix}
\overset{!}{=} 0
$$
The leading matrix is clearly singular, and has rank 2.

For the first step in the algorithm, there is freedom to pick any two rows as the independent ones. For instance, the rows { 1, 3 } are chosen. The remaining row can then be eliminated using the following series of fraction-free row operations. First
$$
\begin{pmatrix}
x_1(t)\, x_2(t) - t \sin(t) & 0 & 0 \\
e^{x_3(t)} - t\, x_1(t) & 0 & 0 \\
t & 1 & 0
\end{pmatrix}
\begin{pmatrix} x_1'(t) \\ x_2'(t) \\ x_3'(t) \end{pmatrix}
+
\begin{pmatrix} x_2(t)^3 - 4 \sin(t) \\ \cos(t) - 4\, x_1(t) \\ 4 \end{pmatrix}
\overset{!}{=} 0
$$
Then
$$
\begin{pmatrix}
x_1(t)\, x_2(t) - t \sin(t) & 0 & 0 \\
0 & 0 & 0 \\
t & 1 & 0
\end{pmatrix}
\begin{pmatrix} x_1'(t) \\ x_2'(t) \\ x_3'(t) \end{pmatrix}
+
\begin{pmatrix} x_2(t)^3 - 4 \sin(t) \\ e( x(t), t ) \\ 4 \end{pmatrix}
\overset{!}{=} 0
$$

where the algebraic equation discovered is given by
$$
e( x, t ) = \left( x_1 x_2 - t \sin(t) \right) \left( \cos(t) - 4 x_1 \right)
- \left( e^{x_3} - t x_1 \right) \left( x_2^3 - 4 \sin(t) \right)
$$

Differentiating the derived equation with respect to time yields a new equation with residual in the form
$$
\begin{pmatrix} a_1( x(t), t ) & a_2( x(t), t ) & a_3( x(t), t ) \end{pmatrix}
\begin{pmatrix} x_1'(t) \\ x_2'(t) \\ x_3'(t) \end{pmatrix}
+ b( x(t), t )
$$


where
$$
\begin{aligned}
a_1( x, t ) &= x_2 \left( \cos(t) - 4 x_1 \right) - 4 \left( x_1 x_2 - t \sin(t) \right) + t \left( x_2^3 - 4 \sin(t) \right) \\
a_2( x, t ) &= x_1 \left( \cos(t) - 4 x_1 \right) - 3 x_2^2 \left( e^{x_3} - t x_1 \right) \\
a_3( x, t ) &= -e^{x_3} \left( x_2^3 - 4 \sin(t) \right) \\
b( x, t ) &= -\left( \sin(t) + t \cos(t) \right) \left( \cos(t) - 4 x_1 \right) - \sin(t) \left( x_1 x_2 - t \sin(t) \right) \\
&\quad + x_1 \left( x_2^3 - 4 \sin(t) \right) + 4 \cos(t) \left( e^{x_3} - t x_1 \right)
\end{aligned}
$$

Joining the new equation with the ones selected previously yields the following output from the algorithm (dropping some notation for brevity):
$$
\begin{pmatrix}
x_1(t)\, x_2(t) & \sin(t) & 0 \\
t & 1 & 0 \\
a_1 & a_2 & a_3
\end{pmatrix}
\begin{pmatrix} x_1'(t) \\ x_2'(t) \\ x_3'(t) \end{pmatrix}
+
\begin{pmatrix} x_2(t)^3 \\ 4 \\ b \end{pmatrix}
\overset{!}{=} 0
$$

Unfortunately, the expression swell seen in this example is typical for the investigated algorithm. Compare with the neat outcome in example 3.1, where some intelligence was used to find a parsimonious row reduction.
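The coefficients of the differentiated equation can be checked numerically against e. The sketch below transcribes e, a1, a2, a3 and b from the example and compares each a_i with the corresponding partial derivative of e by central differences; the checking code and the test point are ours.

```python
import math

# Residual of the discovered algebraic equation e(x, t) = 0, and the
# coefficients a1, a2, a3, b of its total time derivative, transcribed
# from the example; x is the tuple (x1, x2, x3).
def e(x, t):
    x1, x2, x3 = x
    return (x1 * x2 - t * math.sin(t)) * (math.cos(t) - 4 * x1) \
         - (math.exp(x3) - t * x1) * (x2**3 - 4 * math.sin(t))

def a1(x, t):
    x1, x2, x3 = x
    return x2 * (math.cos(t) - 4 * x1) - 4 * (x1 * x2 - t * math.sin(t)) \
         + t * (x2**3 - 4 * math.sin(t))

def a2(x, t):
    x1, x2, x3 = x
    return x1 * (math.cos(t) - 4 * x1) - 3 * x2**2 * (math.exp(x3) - t * x1)

def a3(x, t):
    x1, x2, x3 = x
    return -math.exp(x3) * (x2**3 - 4 * math.sin(t))

def b(x, t):
    x1, x2, x3 = x
    return -(math.sin(t) + t * math.cos(t)) * (math.cos(t) - 4 * x1) \
         - math.sin(t) * (x1 * x2 - t * math.sin(t)) \
         + x1 * (x2**3 - 4 * math.sin(t)) \
         + 4 * math.cos(t) * (math.exp(x3) - t * x1)

def diff(f, x, t, i, h=1e-6):
    """Central difference of f with respect to x[i], or t when i == 3."""
    if i < 3:
        xp, xm = list(x), list(x)
        xp[i] += h
        xm[i] -= h
        return (f(tuple(xp), t) - f(tuple(xm), t)) / (2 * h)
    return (f(x, t + h) - f(x, t - h)) / (2 * h)

x0, t0 = (0.3, -0.7, 0.2), 1.1  # arbitrary test point
```

Since the new row is the total time derivative of e along solutions, a_i must equal the partial derivative of e with respect to x_i, and b the partial derivative with respect to t.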

3.2.2 Zero tests

The crucial step in algorithm 3.1 is the row reduction, but exactly how this can be done has not been discussed yet. One of the important topics for the row reduction to consider is how it should detect when it is finished. For many symbolic matrices whose rank is determined by the zero-pattern, the question is easy; the matrix is row reduced when the rows which are not independent by construction consist of only structural zeros. This was the case in example 3.2. However, the termination criterion is generally more complicated, since the matrix may contain expressions which are identically zero, although this is hard to detect using symbolic software.

It is proposed that structural zeros be tracked in the algorithm, making many of the zero tests trivially affirmative. An expression which is not a structural zero is tested against zero by evaluating it (and possibly its derivative with respect to time) at the point where the index reduction is being performed. If this test is passed, the expression is assumed rewritable to zero, but anticipating that this will occasionally be wrong, the expression is also kept in a list of expressions that are assumed to be zero for the index reduction to be valid. Numerical integrators and the like can then monitor this list of expressions, and take appropriate action when an expression no longer evaluates to zero.
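A minimal sketch of this zero-test policy follows; all names and the tolerance are hypothetical, and a real implementation would also consult derivatives as described above.

```python
STRUCTURAL_ZERO = None  # marker for entries that are zero by construction

def make_zero_test(point, tol=1e-8):
    """Seminumerical zero test: structural zeros pass trivially, while any
    other expression (a callable residual) that merely *evaluates* to zero
    at `point` is assumed rewritable to zero and is recorded, so that a
    numerical integrator can monitor the assumption later."""
    assumed_zero = []
    def is_zero(expr):
        if expr is STRUCTURAL_ZERO:
            return True
        if abs(expr(point)) <= tol:
            assumed_zero.append(expr)  # assumed, not proven, to be zero
            return True
        return False
    return is_zero, assumed_zero

# At a rest point of the pendulum example, the velocity evaluates to zero
# even though it is not identically zero; the test records the assumption.
rest = {"x": 0.87, "xdot": 0.0}
is_zero, assumed_zero = make_zero_test(rest)
```

The list `assumed_zero` is exactly the monitoring list referred to in the text: the index reduction remains valid only while its members keep evaluating to zero.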

Note that there are some classes of quasilinear dae where all expressions can be put in a canonical form in which expressions that are identically zero can be detected. For instance, this is the case when all expressions are polynomials.

Of course, some tolerance must be used when comparing the value of an expression against zero. Setting this tolerance is non-trivial, and at this stage we have no scientific guidelines to offer.* This need was the original motivation for the research on matrix-valued singular perturbations.

The following example exhibits the weakness of the numerical evaluation approach. It will be commented on in section 3.2.5.

3.3 Example

Let us consider numerical integration of the inevitable pendulum,** modeled by
$$
\begin{aligned}
x'' &\overset{!}{=} \lambda x \\
y'' &\overset{!}{=} \lambda y - g \\
1 &\overset{!}{=} x^2 + y^2
\end{aligned}
$$

where we take $g \coloneqq 10.0$. Index reduction will be performed at two points (the time part of these points is immaterial and will not be written out); one at rest, the other not at rest. Note that it is quite common to begin a simulation of a pendulum (as well as many other systems) at rest. The following values give approximately an initial angle of 0.5 rad below the positive x axis:

$$
\begin{aligned}
x_{0,\text{rest}} &: \{\, x(0) = 0.87,\; \dot x(0) = 0,\; y(0) = -0.50,\; \dot y(0) = 0,\; \lambda(0) = -5.0 \,\} \\
x_{0,\text{moving}} &: \{\, x(0) = 0.87,\; \dot x(0) = -0.055,\; y(0) = -0.50,\; \dot y(0) = -0.1,\; \lambda(0) = -4.8 \,\}
\end{aligned}
$$

Clearly, if $\dot x$ or $\dot y$ constitutes an entry of any of the intermediate leading matrices, the algorithm will be in trouble, since these values evaluate to zero at $x_{0,\text{rest}}$ although they are not typically zero along solutions. After two reduction steps, which are equal for both points, the equations look as follows (not showing the already deduced algebraic constraints):

$$
\begin{pmatrix}
 & 1 & & & \\
 & & & 1 & \\
1 & & & & \\
 & & 1 & & \\
2 \dot x & 2 x & 2 \dot y & 2 y &
\end{pmatrix}
\begin{pmatrix} x' \\ \dot x' \\ y' \\ \dot y' \\ \lambda' \end{pmatrix}
+
\begin{pmatrix} -\lambda x \\ 10 - \lambda y \\ -\dot x \\ -\dot y \\ 0 \end{pmatrix}
\overset{!}{=} 0
$$

Reducing these equations at $x_{0,\text{rest}}$, the algorithm produces the algebraic equation
$$
20\, y \overset{!}{=} 2 \left( x^2 + y^2 \right) \lambda
$$
but the correct equation, produced at $x_{0,\text{moving}}$, is
$$
20\, y \overset{!}{=} 2 \left( \left( x^2 + y^2 \right) \lambda + \dot x^2 + \dot y^2 \right)
$$

* The perturbation results in the second part of the thesis are limited to linear systems, but here a theory for non-linear systems is needed.
** The two-dimensional pendulum in Cartesian coordinates is used so often in the study of dae that it was given this nick name in Mattsson and Söderlind (1993). While their model contains parameters for the length and mass of the pendulum, our pendulum is of unit length and mass to simplify notation.


Our intuition about the mechanics of the problem gives immediately that $\dot x'$ and $\dot y'$ are non-zero at $x_{0,\text{rest}}$. Hence, computing the derivatives of all variables using a reduction to index 0 would reveal the mistake.

As a final note on the failure at $x_{0,\text{rest}}$, note that $\dot x$ and $\dot y$ would be on the list of expressions that had been assumed zero. Checking these conditions after integrating the equations for a small period of time would detect the problem, so delivery of an erroneous result is avoided.

3.2.3 Longevity

At the point ( x(t0), t0 ), the proposed algorithm performs tasks such as row operations, index reduction, selection of independent equations, etc. Each of these may be valid at the point where it was computed, but fail to be valid at future points along the solution trajectory. By the longevity of such an operation, or the product thereof, we refer to the duration until validity is lost.

A row reduction operation becomes invalid when its pivot entry becomes zero. A selection of equations to be part of the square index 1 system becomes invalid when the iteration matrix loses rank. An index reduction becomes invalid if an expression which was assumed to be zero becomes non-zero. The importance of longevity considerations is shown by an example.

3.4 Example

A circular motion is described by the following equations (think of ζ as “zero”):
$$
\begin{aligned}
\zeta &\overset{!}{=} x' x + y' y \\
1 &\overset{!}{=} (x')^2 + (y')^2 \\
1 &\overset{!}{=} x^2 + y^2
\end{aligned}
$$

This system is square but not in quasilinear form. The trivial conversion to quasilinear form described in section 2.2.4 yields a square dae of size 5, with new variables $\dot x$ and $\dot y$ introduced for the derivatives x′ and y′.

By the geometrical interpretation of the equations we know that the solution manifold is one-dimensional and equal to the two disjoint sets (distinguished by the sign choices below, of which one has been chosen to work with)
$$
\left\{\, ( x, \dot x, y, \dot y, \zeta ) \in \mathbf{R}^5 :
\begin{array}{ll}
x = \cos(\beta), & \dot x = -(+) \sin(\beta), \\
y = \sin(\beta), & \dot y = +(-) \cos(\beta), \\
\zeta = 0, &
\end{array}
\;\; \beta \in [\, 0, 2\pi \,] \,\right\}
$$


Let us consider the initial conditions given by β = 1.4 in the set characterization. The quasilinear equations are:
$$
\begin{aligned}
\dot x &\overset{!}{=} x' \\
\dot y &\overset{!}{=} y' \\
\zeta &\overset{!}{=} x \dot x + y \dot y \\
1 &\overset{!}{=} \dot x^2 + \dot y^2 \\
1 &\overset{!}{=} x^2 + y^2
\end{aligned}
$$

Note that there are three algebraic equations here.* The equations are already row reduced, and after performing one differentiation of the algebraic constraints and one row reduction, the dae looks like

$$
\begin{pmatrix}
1 & & & & \\
 & 1 & & & \\
\dot x & \dot y & -1 & x & y \\
 & & & 2 \dot x & 2 \dot y \\
 & & & &
\end{pmatrix}
\begin{pmatrix} x' \\ y' \\ \zeta' \\ \dot x' \\ \dot y' \end{pmatrix}
+
\begin{pmatrix} -\dot x \\ -\dot y \\ 0 \\ 0 \\ 2 x \dot x + 2 y \dot y \end{pmatrix}
\overset{!}{=} 0
$$

Differentiation of the derived algebraic constraints will yield a full-rank leading matrix, so the index reduction algorithm terminates here. There are now four differential equations,
$$
\begin{pmatrix}
1 & & & & \\
 & 1 & & & \\
\dot x & \dot y & -1 & x & y \\
 & & & 2 \dot x & 2 \dot y
\end{pmatrix}
\begin{pmatrix} x' \\ y' \\ \zeta' \\ \dot x' \\ \dot y' \end{pmatrix}
+
\begin{pmatrix} -\dot x \\ -\dot y \\ 0 \\ 0 \end{pmatrix}
\overset{!}{=} 0
$$

and four algebraic equations
$$
\begin{aligned}
\zeta &\overset{!}{=} x \dot x + y \dot y \\
1 &\overset{!}{=} \dot x^2 + \dot y^2 \\
1 &\overset{!}{=} x^2 + y^2 \\
0 &\overset{!}{=} 2 x \dot x + 2 y \dot y
\end{aligned}
$$

with Jacobian
$$
\begin{pmatrix}
\dot x & \dot y & -1 & x & y \\
 & & & 2 \dot x & 2 \dot y \\
2 x & 2 y & & & \\
2 \dot x & 2 \dot y & & 2 x & 2 y
\end{pmatrix}
$$

* Another quasilinear formulation would be obtained by replacing the third equation by $\zeta \overset{!}{=} x x' + y y'$, containing only two explicit algebraic equations. The corresponding leading matrix would not be row reduced, so row reduction would reveal an implicit algebraic equation, and the result would be the same in the end.


The algebraic equations are independent, so they shall be completed with one of the differential equations to form a square index 1 system. The last two differential equations are linearly dependent on the algebraic equations by construction, but either of the first two differential equations is a valid choice. Depending on the choice, the first row of the iteration matrix will be either of
$$
\begin{pmatrix} 1 & 0 & 0 & -h & 0 \end{pmatrix}
\quad \text{or} \quad
\begin{pmatrix} 0 & 1 & 0 & 0 & -h \end{pmatrix}
$$
After a short time, the four other rows of the iteration matrix (which are simply the Jacobian of the algebraic constraints) will approach
$$
\begin{pmatrix}
\sin(\tfrac{\pi}{2}) & -\cos(\tfrac{\pi}{2}) & -1 & \cos(\tfrac{\pi}{2}) & \sin(\tfrac{\pi}{2}) \\
 & & & 2 \sin(\tfrac{\pi}{2}) & -2 \cos(\tfrac{\pi}{2}) \\
2 \cos(\tfrac{\pi}{2}) & 2 \sin(\tfrac{\pi}{2}) & & & \\
2 \sin(\tfrac{\pi}{2}) & -2 \cos(\tfrac{\pi}{2}) & & 2 \cos(\tfrac{\pi}{2}) & 2 \sin(\tfrac{\pi}{2})
\end{pmatrix}
=
\begin{pmatrix}
1 & & -1 & & 1 \\
 & & & 2 & \\
 & 2 & & & \\
2 & & & & 2
\end{pmatrix}
$$

In particular, the third row will be nearly aligned with $\begin{pmatrix} 0 & 1 & 0 & 0 & -h \end{pmatrix}$, which means that it is better to select the differential equation $\dot x \overset{!}{=} x'$ than $\dot y \overset{!}{=} y'$. This holds not only on paper; numerical simulation using widespread index 1 solution software (Hindmarsh et al., 2004, through the Mathematica interface) demands that the former differential equation be chosen.

This example illustrates the fact that, if an implementation derives the reduced equations without really caring about the choices it makes, things such as the ordering of variables may influence the end result. Hence, the usefulness of the reduced equations would depend on implementation details in the algorithm, even though the result does not feature any numerically computed entities.
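The effect of the two choices can be illustrated numerically. The sketch below (our own; determinant magnitudes are used as a crude stand-in for condition monitoring) builds the iteration matrix from the constraint Jacobian of the example near β = π/2 and compares the two candidate first rows:

```python
import math

def det(M):
    """Determinant by cofactor expansion along the first row (fine for 5x5)."""
    if len(M) == 1:
        return M[0][0]
    return sum((-1) ** j * M[0][j] * det([row[:j] + row[j + 1:] for row in M[1:]])
               for j in range(len(M)))

def iteration_matrix(first_row, beta, h):
    """One candidate differential-equation row stacked on the Jacobian of
    the four algebraic constraints, at the point on the circle given by
    beta (sign choice xdot = sin(beta), ydot = -cos(beta))."""
    x, y = math.cos(beta), math.sin(beta)
    xd, yd = math.sin(beta), -math.cos(beta)
    jac = [[xd, yd, -1, x, y],
           [0, 0, 0, 2 * xd, 2 * yd],
           [2 * x, 2 * y, 0, 0, 0],
           [2 * xd, 2 * yd, 0, 2 * x, 2 * y]]
    return [first_row] + jac

h = 0.01
row_x = [1, 0, 0, -h, 0]   # selecting xdot != x'
row_y = [0, 1, 0, 0, -h]   # selecting ydot != y'
beta = math.pi / 2
good = abs(det(iteration_matrix(row_x, beta, h)))
bad = abs(det(iteration_matrix(row_y, beta, h)))
```

The choice built on the first differential equation keeps the determinant magnitude at about 8, while the other degenerates like 8h, reflecting the near-alignment with the third Jacobian row noted in the example.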

Even though repeated testing of the numerical conditioning while the equations are integrated is sufficient to detect numerical ill-conditioning, the point made here is that already at the point ( x0, t0 ) one wants to predict what will be the good ways of performing the row reduction and selecting equations to appear in the square index 1 form.

While it is difficult to foresee when the expressions which are assumed rewritable to zero cease to be zero (the optimistic longevity estimate is simply that they will remain zero forever), there is more to be done concerning the longevity of the row reduction operations. For each entry used as a pivot, it is possible to formulate scalar conditions that are to be satisfied as long as the pivot stays in use. For instance, it can be required that the pivot be no smaller in magnitude than half the magnitude of the largest value it is used to cancel.

Using the longevity predictions, each selection of a pivot can be made to maximize longevity. Clearly, this is a non-optimal greedy strategy (since only one pivot selection is considered at a time, compared to considering all possible sequences of pivot selections at once), but it can be implemented with little effort and at a reasonable runtime cost.

Page 119: Differential-algebraic equations and matrix-valued singular …276784/... · 2009-11-24 · ing. Martin Enqvist and Daniel Petersson contributed with outstandingly thorough proofreading

3.2 Proposed algorithm 99

3.2.4 Seminumerical twist

In section 3.2.2 it was suggested that numerical evaluation of expressions (combined with tracking of structural zeros) should be used to determine whether an expression can be rewritten to zero or not. That added a bit of numerics to an otherwise symbolic index reduction algorithm, but this combination of symbolic and numeric techniques is more of a necessity than a nice twist. We now suggest that numerical evaluation should also be the basic tool when predicting longevity. While the zero tests are accompanied by difficult questions about tolerances, they are otherwise rather clear how to perform; the numeric decisions discussed in this section are expected to allow more sophistication while not requiring intricate analysis of how tolerances shall be set.

Without loss of generality, it is assumed that the scalar tests compare an expression, e, with the constant 0. The simplest way to estimate (predict) the longevity of
$$
e( x(t), t ) < 0
$$
at the point ( x0, t0 ) is to first compute the derivatives x′ at t0 using a method that does not care about longevity, and use linear extrapolation to find the longevity. In detail, the longevity, denoted $L_e( x_0, t_0 )$, may be estimated as
$$
\dot e( x_0, t_0 ) \triangleq \nabla_1 e( x_0, t_0 )\, x'(t_0) + \nabla_2 e( x_0, t_0 )
$$
$$
L_e( x_0, t_0 ) \triangleq
\begin{cases}
-\dfrac{e( x_0, t_0 )}{\dot e( x_0, t_0 )} & \text{if } \dot e( x_0, t_0 ) > 0 \\[1ex]
\infty & \text{otherwise}
\end{cases}
$$

In case several alternatives have infinite longevity estimates by the calculation above, the selection criterion needs to be refined. The natural extension of the above procedure would be to compute higher order derivatives to be able to estimate the first zero-crossing, but that would typically involve more differentiation of the equations than is needed otherwise, and is therefore not a good option. Rather, some other heuristic should be used. One heuristic would be to disregard signs, but one could also ignore derivatives altogether when this situation occurs and fall back on the usual pivot selection based on magnitudes only.
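The estimate and the magnitude fallback can be condensed into a few lines. This is a sketch; the candidate format and the tie-breaking rule are our own reading of the text.

```python
import math

def longevity(e, e_dot):
    """First-order longevity estimate for the condition e(x(t), t) < 0:
    time until the linear extrapolation e + s * e_dot crosses zero."""
    return -e / e_dot if e_dot > 0 else math.inf

def pick_pivot(candidates):
    """Greedy pivot choice: maximize estimated longevity; break ties among
    infinite estimates by falling back on magnitude, i.e. the usual
    pivoting rule (signs and derivatives disregarded)."""
    return max(candidates, key=lambda c: (longevity(c[1], c[2]), abs(c[1])))

# Each candidate: (name, current value, estimated time derivative).
candidates = [("p1", -0.2, 0.4),    # crosses zero in about 0.5 time units
              ("p2", -0.1, 0.01),   # crosses zero in about 10 time units
              ("p3", -0.05, -1.0)]  # moving away from zero: infinite estimate
best = pick_pivot(candidates)
```

Note that the greedy rule prefers a small pivot that is drifting away from zero over a large one that is about to vanish, which is exactly the point of longevity-based selection.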

A very simple way to select equations for the square index 1 system is to greedily add one equation at a time, picking the one which has the largest angle to its projection onto the space spanned by the equations already selected. If the number of possible combinations is not overwhelmingly large, it may also be possible to check the condition number for each combination, possibly also taking into account the time derivative of the condition number.
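A sketch of the greedy angle criterion follows. Classical Gram-Schmidt is used for the projection, which is adequate for illustration, though a careful implementation would orthogonalize more stably. The data reuses the situation of example 3.4 near β = π/2, where the angle test singles out the better differential equation.

```python
import math

def dot(u, v):
    return sum(a * b for a, b in zip(u, v))

def orth_residual(v, basis):
    """Component of v orthogonal to a set of mutually orthogonal vectors."""
    r = list(v)
    for b in basis:
        c = dot(r, b) / dot(b, b)
        r = [ri - c * bi for ri, bi in zip(r, b)]
    return r

def orthogonalize(rows):
    """Classical Gram-Schmidt, dropping numerically dependent rows."""
    basis = []
    for v in rows:
        r = orth_residual(v, basis)
        if dot(r, r) > 1e-12:
            basis.append(r)
    return basis

def angle_to_projection(v, selected):
    """Angle between v and its projection onto span(selected); the greedy
    rule adds the candidate maximizing this angle."""
    r = orth_residual(v, orthogonalize(selected))
    return math.asin(min(1.0, math.sqrt(dot(r, r) / dot(v, v))))

# Constraint Jacobian of example 3.4 in the limit beta -> pi/2,
# and the two candidate differential-equation rows (h = step size).
h = 0.01
selected = [[1, 0, -1, 0, 1],
            [0, 0, 0, 2, 0],
            [0, 2, 0, 0, 0],
            [2, 0, 0, 0, 2]]
cand_x = [1, 0, 0, -h, 0]
cand_y = [0, 1, 0, 0, -h]
```

The candidate corresponding to the first differential equation makes an angle of about 45 degrees with the selected span, while the other is almost contained in it, so the greedy rule makes the choice that the example showed to be necessary.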

3.2.5 Monitoring

Since the seminumerical algorithm may make false judgements regarding which expressions are identically zero, expressions which are not structurally zero but have passed the zero test anyway need to be monitored. It may not be necessary to evaluate these expressions after each time step, but as was seen in example 3.3, it is wise to be alert during the first few iterations after the point of index reduction.

For the (extremely) basic bdf method applied to equations of index 1, the local integration accuracy is limited by the condition number of the iteration matrix for a time step of size h. In the quasilinear index 1 case and for small h, the matrix should have at least as good condition as
$$
\begin{bmatrix} E( x, t ) \\ \nabla_1 \bar A( x, t ) \end{bmatrix}
\tag{3.13}
$$

To see where this comes from,* consider solving for $x_n$ in
$$
\begin{bmatrix} E( x_n, t_n ) \\ 0 \end{bmatrix} \frac{x_n - x_{n-1}}{h_n}
+
\begin{pmatrix} A( x_n, t_n ) \\ \bar A( x_n, t_n ) \end{pmatrix}
\overset{!}{=} 0
$$
where $x_{n-1}$ is the iterate at time $t_n - h_n$ and $\bar A$ collects the algebraic equations of the square index 1 system. The equations being index 1 guarantees that this system has a locally unique solution for $h_n$ small enough. Any method of some sophistication will perform row (and possibly column) scaling at this stage to improve numerical conditioning (Brenan et al., 1996, section 5.4.3; Golub and Van Loan, 1996). It is assumed that any implementation will achieve at least as good condition as is obtained by scaling the first group of equations by $h_n$.

For small $h_n$ the equations may be approximated by their linearized counterpart, for which the numerical conditioning is simply given by the condition number of the coefficient matrix for $x_n$. See for example Golub and Van Loan (1996) for a discussion of error analysis for linear equations. This coefficient of the linearized equation is**
$$
\begin{bmatrix}
\left( \nabla_1^{\mathrm{T}} E( x_n, t_n ) \cdot ( x_n - x_{n-1} ) \right)^{\mathrm{T}} + E( x_n, t_n ) \\
0
\end{bmatrix}
+
\begin{bmatrix}
h_n \nabla_1 A( x_n, t_n ) \\
\nabla_1 \bar A( x_n, t_n )
\end{bmatrix}
$$
Using the approximation
$$
x_n - x_{n-1} \approx h_n\, x'(t_n) \approx -h_n
\begin{bmatrix} E( x_n, t_n ) \\ \nabla_1 \bar A( x_n, t_n ) \end{bmatrix}^{-1}
\begin{pmatrix} A( x_n, t_n ) \\ \nabla_2 \bar A( x_n, t_n ) \end{pmatrix}
$$

gives the coefficient
$$
\begin{bmatrix} E( x_n, t_n ) \\ \nabla_1 \bar A( x_n, t_n ) \end{bmatrix}
+ h_n
\begin{bmatrix}
- \left( \nabla_1^{\mathrm{T}} E( x_n, t_n ) \cdot
\begin{bmatrix} E( x_n, t_n ) \\ \nabla_1 \bar A( x_n, t_n ) \end{bmatrix}^{-1}
\begin{pmatrix} A( x_n, t_n ) \\ \nabla_2 \bar A( x_n, t_n ) \end{pmatrix}
\right)^{\mathrm{T}} + \nabla_1 A( x_n, t_n ) \\
0
\end{bmatrix}
$$

As $h_n$ approaches 0, the matrix tends to (3.13). This limit will be used to monitor numerical integration in examples, but rather than looking at the raw condition number κ(t) as a function of time t, a static transform, φ, will be applied to this value in order to facilitate prediction of when the iteration matrix becomes singular. If possible, φ should be chosen such that φ( κ(t) ) is approximately affine in t near a singularity.

* Note that the iteration matrix of example 2.19 was found for an lti dae, while we are currently considering the more general quasilinear form.
** The notation used is not widely accepted. Neither will it be explained here, since the meaning should be quite intuitive, and the terms involving inverse transposes will be discarded in just a moment.


[Figure: plot of −1/κ(t) against t, increasing towards 0 (κ = ∞) near t = 3.]
Figure 3.1: The condition of the iteration matrix for the better choice of square index 1 system in example 3.4. The strictly increasing transform of the condition number makes it roughly linear over time near the singularity. Helper lines are drawn to show the longevity predictions at the times 1.7 (pessimistic), 2.0 (optimistic), and 2.5 (rather accurate).

Since the ∞-norm and 2-norm condition numbers are essentially the same, the static transform will be heuristically developed for the 2-norm condition number. Approximating the singular values to first order as functions of time, it follows that, near a singularity, the condition number can be expected to grow as $t \mapsto \frac{1}{t_1 - t}$, where $t_1$ is the time of the singularity.

The following observation is useful to see how φ can be chosen to match the behavior of the condition number near a singularity. Suppose φ is unbounded above, that is, φ(κ) → ∞ as κ → ∞. Then every affine approximation of φ( κ(t) ) will be bad near the singularity, since the affine approximation cannot tend to infinity. Hence, one should consider strictly increasing functions that map infinity to a finite value, and one may normalize the finite value to be 0 without loss of generality. Similarly, the slope of the affine approximation may be normalized to 1. Given the assumed growth of the condition number near a singularity, this leads to the following equation for φ( κ ).
$$
\varphi\!\left( \frac{1}{t_1 - t} \right) \overset{!}{=} t - t_1
= - \frac{1}{\left( \frac{1}{t_1 - t} \right)}
\quad \Longleftrightarrow \quad
\varphi( \kappa ) \overset{!}{=} - \frac{1}{\kappa}
$$

Since κ is always at least 1, this will squeeze the half-open interval [ 1, ∞ ) onto [ −1, 0 ). As is seen in figure 3.1, the first order approximation is useful well in advance of the singularity, but not further away; for example, the prediction based on the linearization at time 2 would be rather optimistic.
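The resulting prediction rule amounts to fitting an affine function to φ(κ(t)) = −1/κ(t) through samples and solving for the zero crossing. A sketch (the sampling times and the model κ(t) = 1/(t1 − t) are ours, chosen so that the prediction is exact):

```python
def predict_singularity(t_a, kappa_a, t_b, kappa_b):
    """Affine extrapolation of phi(kappa) = -1/kappa through two samples:
    predict the time at which the transform reaches 0 (kappa = infinity)."""
    phi_a, phi_b = -1.0 / kappa_a, -1.0 / kappa_b
    slope = (phi_b - phi_a) / (t_b - t_a)
    return t_b - phi_b / slope

# When kappa grows exactly like 1/(t1 - t), the prediction recovers t1.
t1 = 3.0
kappa = lambda t: 1.0 / (t1 - t)
t_pred = predict_singularity(2.4, kappa(2.4), 2.5, kappa(2.5))
```

For real condition-number data the growth model only holds near the singularity, which is why the predictions in figure 3.1 improve as the singularity is approached.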


3.2.6 Sufficient conditions for correctness

It may not be obvious that the seminumerical row reduction algorithm above really does the desired job. After all, it may seem a bit simplistic to reduce a symbolic matrix based on its numeric values evaluated at a certain point. In this section, sufficient (while perhaps conservative) conditions for correctness will be presented. Some new nomenclature will be introduced, but only for the purpose of making the theorem below readable.

Consider the quasilinear dae
$$
E( x(t), t )\, x'(t) + A( x(t), t ) \overset{!}{=} 0
$$
Replacing a row
$$
e_i( x(t), t )\, x'(t) + a_i( x(t), t ) \overset{!}{=} 0
$$
by (dropping some “(t)” in the notation for compactness)
$$
\left[ \omega( x, t )\, e_i( x, t ) + \eta( x, t )\, e_j( x, t ) \right] x'
+ \left[ \omega( x, t )\, a_i( x, t ) + \eta( x, t )\, a_j( x, t ) \right]
\overset{!}{=} 0
$$
where ω and η are both continuous at ( x0, t0 ) and ω is non-zero at this point, is called a non-singular row operation on the dae.

Since the new dae is obtained by multiplication from the left by a non-singular matrix, the non-singular row operation on the quasilinear dae does not alter the rank of the leading matrix.

Let x be a solution to the dae on the interval I, and assume that the rank of E( x(t), t ) is constant as a function of t on I. A valid row reduction at ( x0, t0 ) of the original (quasilinear) dae
$$
E( x(t), t )\, x'(t) + A( x(t), t ) \overset{!}{=} 0
$$
is a sequence of non-singular row operations such that the resulting (quasilinear) dae
$$
E_{\text{rr}}( x(t), t )\, x'(t) + A_{\text{rr}}( x(t), t ) \overset{!}{=} 0
$$

has the following properties:

• A function x is locally a solution to the original dae if and only if it is a solution to the resulting dae.

• $E_{\text{rr}}( x, t )$ has only as many non-zero rows as E( x, t ) has rank.

3.5 Theorem. Consider the time interval I with inf I = t0, and the dae with initial condition $x(t_0) \overset{!}{=} x_0$. Assume

1. The dae with initial condition is consistent, and the solution is unique and differentiable on I.

2. The dae is sufficiently differentiable for the purpose of running the row reduction algorithm.


3. Entries of E( x0, t0 ) that are zero are zero in E( x(t), t ) for all t ∈ I. Further, this condition shall hold for intermediate results as well.

Then there exists a time t1 ∈ I with t1 > t0 such that the row reduction of the symbolic matrix E( x, t ), based on the numeric guide E( x0, t0 ), will compute a valid row reduction where the non-zero rows of the reduced leading matrix $E_{\text{rr}}( x(t), t )$ are linearly independent for all t ∈ [ t0, t1 ].

Proof: The first two assumptions ensure that each entry of E( x(t), t ) is continuous as a function of t at every iteration. Since the row reduction will produce no more intermediate matrices than there are entries in the matrix, the total number of entries in question is finite, and each of these that is non-zero at t0 will remain non-zero for a positive amount of time.

Further, the non-zero rows of $E_{\text{rr}}( x_0, t_0 )$ are independent by construction (as this is the reduced form of the guiding numeric matrix). Therefore they contain a non-singular sub-block. The determinant of this block is hence non-zero at time t0, and is a continuous function of time.

Hence, there exists a time t1 ∈ I with t1 > t0 such that all those expressions that arenon-zero at time t0 remain non-zero for all t ∈ [ t0, t1 ]. In particular, the determinantwill remain non-zero in this interval, thus ensuring linear independence of the non-zero reduced rows.

The last assumption ensures the constant rank condition required by the definition of valid row reduction; this is a consequence of each step in the row reduction preserving the original rank, and the rank revealed by the reduced form has already been shown to be constant.

Beginning with the part of the definition of valid row reduction concerning the number of zero rows, note first that the number of non-zero rows will match the rank at time t0, since the row reduction of the numeric guide will reveal precisely its rank as the number of non-zero rows. It then suffices to show that the zero-pattern of the symbolic matrix contains that of the numeric guide during each iteration of the row reduction algorithm. However, this follows quite directly from the assumptions, since the zero-patterns match at E( x(t0), t0 ), and the assumption about zeros staying zero ensures that no spurious non-zeros appear in the symbolic matrix evaluated at later points in time.

It remains to show that a function x is a solution of the reduced dae if and only if it is a solution of the original dae. However, this is trivial, since the result of the complete row reduction process may be written as a multiplication from the left by a sequence of non-singular matrices. Hence, the equations are equivalent.

Note that, in example 3.3, the conditions of theorem 3.5 were not satisfied, since the expressions $\dot x$ and $\dot y$ were zero at ( x0, t0 ) but do not stay zero. Since their deviation from zero is continuous, they will stay close to zero during the beginning of the solution interval. Hence, it might be expected that the computed solution is approximately correct near t0, and this is confirmed by experiments. However, to show that this observation is generally valid, and to quantify the degree of approximation, we need a kind of perturbation theory which remains to be developed. This problem was part of the original motivation for studying matrix-valued singular perturbations, the main topic of the thesis.

3.3 Consistent initialization

The importance of being able to find a point on the solution manifold of a dae which is in some sense close to a point suggested or guessed by a user was explained in section 2.2.7. In this section, this problem is addressed using the proposed seminumerical quasilinear shuffle algorithm. While existing approaches (see section 2.2.7) separate the structural analysis from the determination of initial conditions, we note that the structural analysis may depend on where the dae is to be initialized. The interconnection can readily be seen in the seminumerical approach to index reduction, and a simple bootstrap approach can be attempted to handle it.

3.3.1 Motivating example

Before turning to the relation between guessed initial conditions and the algebraic constraints derived by the seminumerical quasilinear shuffle algorithm, we give an illustration to keep in mind in the sections below.

3.6 Example

Let us return to the inevitable pendulum of example 3.3,
$$
\begin{aligned}
x'' &\overset{!}{=} \lambda x \\
y'' &\overset{!}{=} \lambda y - g \\
1 &\overset{!}{=} x^2 + y^2
\end{aligned}
$$

where g = 10.0, with guessed initial conditions given by
$$
x_{0,\text{guess}} : \{\, x(0) = \cos(-0.5),\; \dot x(0) = 0,\; y(0) = \sin(-0.5),\; \dot y(0) = -0.1,\; \lambda(0) = 0 \,\}
$$

Running the seminumerical quasilinear shuffle algorithm* at $x_{0,\text{guess}}$ produces the algebraic constraints
$$
C_{x_{0,\text{guess}}} =
\begin{cases}
1 \overset{!}{=} x^2 + y^2 \\
0 \overset{!}{=} 2 x \dot x + 2 y \dot y \\
0 \overset{!}{=} 2 x^2 \lambda - 2 y ( g - y \lambda ) + 2 \dot y^2
\end{cases}
$$

* The implementation used here does not make the effort to compute the derivatives needed to make better zero tests and longevity estimates.


The residuals of these equations at $x_{0,\text{guess}}$ are
$$
\begin{pmatrix} 0.0 \\ 0.0959 \\ 9.61 \end{pmatrix}
$$

so either the algorithm simply failed to produce the correct algebraic constraints although $x_{0,\text{guess}}$ was consistent, or $x_{0,\text{guess}}$ is simply not consistent. Assuming the latter, we try to find another point by modifying the initial conditions for the three variables $\dot x$, y, and λ to satisfy the equally many equations in $C_{x_{0,\text{guess}}}$. This yields
$$
x_{0,\text{second}} : \{\, x(0) = \cos(-0.5),\; \dot x(0) = -0.055,\; y(0) = -0.48,\; \dot y(0) = -0.1,\; \lambda(0) = -4.804 \,\}
$$

(This point does satisfy $C_{x_{0,\text{guess}}}$; solving the equations could be difficult, but in this case it was not.) At this point the algorithm produces another set of algebraic constraints:

$$
C_{x_{0,\text{second}}} =
\begin{cases}
1 \overset{!}{=} x^2 + y^2 \\
0 \overset{!}{=} 2 x \dot x + 2 y \dot y \\
0 \overset{!}{=} 2 x^2 \lambda - 2 y ( g - y \lambda ) + 2 \dot x^2 + 2 \dot y^2
\end{cases}
$$

with residuals at $x_{0,\text{second}}$:
$$
\begin{pmatrix} 0.0 \\ 0.0 \\ 0.0060 \end{pmatrix}
$$

By modifying the same components of the initial conditions again, we obtain
$$
x_{0,\text{final}} : \{\, x(0) = \cos(-0.5),\; \dot x(0) = -0.055,\; y(0) = -0.48,\; \dot y(0) = -0.1,\; \lambda(0) = -4.807 \,\}
$$

This point satisfies $C_{x_{0,\text{second}}}$, and generates the same algebraic constraints as $x_{0,\text{second}}$. Further, the algorithm encountered no non-trivial expressions which had to be assumed rewritable to zero, so the index reduction was performed without seminumerical decisions. Hence, the index reduction is locally valid, and the reduced equations provide a way to construct a solution starting at $x_{0,\text{final}}$. In other words, $x_{0,\text{final}}$ is consistent.

3.3.2 A bootstrap approach

A seminumerical shuffle algorithm maps any guessed initial conditions to a set of algebraic constraints. Under certain circumstances, including that the initial conditions are truly consistent, the set of algebraic constraints will give a local description of the solution manifold. Hence, truly consistent initial conditions will be consistent with the derived algebraic constraints, and our primary objective is to find points with this property. Of course, if a correct characterization of the solution manifold is available, finding consistent initial conditions is easy given a reasonable guess; there are several ways to search for a point which minimizes some norm of the residuals of the algebraic constraints. If the minimization fails to find a point where all residuals are zero, the guess was simply not good enough, and an implementation may require a better guess from the user.

If the guessed initial conditions are not consistent with the derived algebraic constraints, the guess cannot be truly consistent either, and we are interested in finding a nearby point which is truly consistent. In the hope that the derived algebraic constraints could be a correct characterization of the solution manifold, even though they were derived at an inconsistent point, the proposed action is to find a nearby point which satisfies the derived constraints.

What shall be considered nearby is often very application-specific. Variables may be of different types, defying a natural metric. Instead, if the solution manifold is characterized by m independent equations, a user may prefer to keep all but m variables constant, and adjust the remaining ones to make the residuals zero. This avoids in a natural way the need to produce an artificial metric.

No matter how nearby is defined, we may assume that the definition implies a mapping from the guessed point to a point which satisfies the derived algebraic constraints (or a failure to find such a point, see the remark above). Noting that a guessed point of initial conditions is mapped to a set of algebraic constraints, which then maps the guessed point to a new point, we propose that this procedure be iterated until convergence or cycling is either detected or suspected.
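For the pendulum of example 3.6, the whole bootstrap fits in a few lines. The sketch below mimics the constraint derivation by a single seminumerical decision (whether the velocity component evaluates to zero), and the projection keeps x and the y velocity fixed while solving for the remaining three variables in closed form; both simplifications are ours, but the iterates reproduce the values reported for the second and final points of the example.

```python
import math

g = 10.0  # gravitational constant of example 3.6

def derive_constraints(p):
    """Mimic the seminumerical constraint derivation: when xdot evaluates
    to zero at the expansion point p, it is assumed rewritable to zero and
    drops out of the third constraint.  (This switch is our stand-in for
    running the full shuffle algorithm.)"""
    keep_xdot = abs(p["xdot"]) > 1e-8
    def c3(q):
        return (2 * q["x"]**2 * q["lam"]
                - 2 * q["y"] * (g - q["y"] * q["lam"])
                + (2 * q["xdot"]**2 if keep_xdot else 0.0)
                + 2 * q["ydot"]**2)
    constraints = [lambda q: q["x"]**2 + q["y"]**2 - 1,
                   lambda q: 2 * q["x"] * q["xdot"] + 2 * q["y"] * q["ydot"],
                   c3]
    return constraints, keep_xdot

def project(p, keep_xdot):
    """Satisfy the three constraints by adjusting y, xdot and lam while
    keeping x and ydot fixed (closed form for this particular system)."""
    q = dict(p)
    q["y"] = -math.sqrt(1 - q["x"]**2)  # stay on the lower half circle
    q["xdot"] = -q["y"] * q["ydot"] / q["x"]
    q["lam"] = g * q["y"] - q["ydot"]**2 - (q["xdot"]**2 if keep_xdot else 0.0)
    return q

point = {"x": math.cos(-0.5), "xdot": 0.0,
         "y": math.sin(-0.5), "ydot": -0.1, "lam": 0.0}  # the guessed point
for _ in range(10):  # bootstrap: derive constraints, project, repeat
    constraints, keep_xdot = derive_constraints(point)
    point = project(point, keep_xdot)
    if all(abs(c(point)) < 1e-9 for c in constraints) \
            and derive_constraints(point)[1] == keep_xdot:
        break  # constraints reproduced and satisfied: done
```

The first projection lands near the second point (λ about −4.804), the re-derived constraints then include the velocity term, and the second projection gives λ about −4.807, after which the constraint set is reproduced and the iteration stops.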

3.3.3 Comment

The algebraic constraints produced by the proposed seminumerical quasilinear shuffle algorithm are a function of the original equations and a number of decisions (pivot selections and termination criteria) that depend on the point at which index reduction is performed. Since the number of index reduction steps before the algorithm gives up is bounded given the number of equations, and the number of pivot selections and termination criterion evaluations is bounded given the number of index reduction steps, the total number of decisions that depend on the point of index reduction is bounded (although the algorithm has to give up for some problems). Hence, any given original equations can only give rise to finitely many sets of algebraic constraints. The space of initial conditions (almost all of which are inconsistent) can thus be partitioned into finitely many regions according to which algebraic constraints the algorithm produces.

Defining the index of a dae without assuming that certain ranks are constant in the neighborhood of solutions can be difficult, and motivates the use of so-called uniform and maximum indices, see Campbell and Gear (1995). For the bootstrap approach above, constant ranks near solutions imply that the algorithm will produce the correct algebraic constraints if only the guessed initial conditions are close enough to the solution manifold. To see how the approach suffers if constant ranks near solutions are not granted, it suffices to note that even finding a point which generates constraints at level 0 which it satisfies can be hard. In other words, consistent initialization can then be hard even for problems of index 1.

3.4 Conclusions

The shuffle algorithm for lti dae can be generalized so that it applies to quasilinear dae. The current chapter highlights that numerical evaluation may be both necessary and convenient, and proposes several aspects of index reduction and numerical solution where seminumerical methods are useful.

The numerical evaluation introduces uncertainty in the equations, and the related perturbation problems do not have the structure of any of the well-known perturbation problems in the literature. New perturbation problems also arise near points where there is a change in matrix ranks.

Since the quasilinear form is a nonlinear form, the perturbation theory needed to complete the proposed algorithms will also be nonlinear. Although the matrix-valued perturbation problems studied in the second part of the thesis are all linear, it has always been a long-term goal to derive results for nonlinear systems so that the quasilinear shuffle algorithm can be theoretically justified. Having emphasized the connection between nonlinear problems and the linear theory developed in the second part of the thesis, it must be mentioned that the results for linear systems in the second part of the thesis have an immediate application in the shuffle algorithm for linear systems.


Part II

Results


4 Point-mass filtering on manifolds

Life in differential-algebraic equations is full of constraints. What if the index reduction based on matrix-valued singular perturbations revealed a constraint showing that we might as well consider ourselves living on a sphere?! If not knowing where we are on this sphere is a problem we often have to deal with, we will surely write a lot of algorithms to find out.

Writing algorithms in terms of coordinate tuples was so convenient when we thought the world was flat, but now what? Trying to use the same interpretation of the coordinate tuple at any point in the world causes singularity in our algorithms, and we get lost again. We need new tools to shield us from the traps of the curvature in space.

The current chapter is an extended version of

Henrik Tidefelt and Thomas B. Schön. Robust point-mass filters on man-ifolds. In Proceedings of the 15th IFAC Symposium on System Identifica-tion, pages 540–545, Saint-Malo, France, July 2009.

and presents a framework for algorithm development for objects on manifolds. The only connection with the rest of the thesis is that index reduction of nonlinear differential-algebraic equations may reveal a manifold structure in the problem. There are two main approaches for how to deal with this. The first approach is to keep the equations in their differential-algebraic form, and try to invent (or search in the literature) dae versions of all the theory and algorithms we need for the task at hand (for example, finding out where we are on the sphere). The second approach is to use the ordinary differential equations that result from index reduction together with the manifold structure implied by the discovered non-differential constraints. We then need tools for working with ordinary differential equations on a manifold, and the present chapter is a contribution to this field.


4.1 Introduction

State estimation on manifolds is commonly performed by embedding the manifold in a linear space of higher dimension, combining estimation techniques for linear spaces with some projection scheme (Brun et al., 2007; Törnqvist et al., 2009; Crassidis et al., 2007; Lo and Eshleman, 1979). Obvious drawbacks of such schemes are that computations are carried out in the wrong space, and that the arbitrary choice of embedding has an undesirable effect on the projection operation. Another common approach is to let the filter run in a linear space of local coordinates on the manifold. Drawbacks include the local nature of coordinates, the nonlinearities introduced by the curved nature of the manifold, and the dependency on the choice of coordinates. Despite the drawbacks of these two approaches, it should be admitted that they work well for many “natural” choices of embeddings and local coordinates, as long as the uncertainty about the state is concentrated to a small — and hence approximately flat — part of the manifold. Still, the strong dependency on embeddings and local coordinates suggests that the estimation algorithms are not defined within the appropriate framework. The Monte-Carlo technique called the particle filter lends itself naturally to a coordinate-free formulation (as in Kwon et al. (2007)). However, the stochastic nature of the technique makes it unreliable†, and addressing this problem motivates the word robust in the title of this work. With a growing geometric awareness among state estimation practitioners, geometrically sound algorithms tailored for particular applications are emerging. A very common application is that of orientations of rigid bodies (for instance, Lee and Shin (2002)), and this is also a guiding application in our work.

Our interest in this work is to examine how robust state estimation on compact manifolds of low dimension can be performed while honoring the geometric nature of the problem. The robustness should be with respect to uncertainties which are not concentrated to a small part of the manifold, and is obtained by using a non-parametric representation of stochastic variables on the manifold. By honoring the geometric nature we mean that we intend to minimize references to embeddings and local coordinates in our algorithms. We say minimize since, under a layer of abstraction, we too will employ embeddings to implement the manifold structure, and local coordinates are the natural way for users to interact with the filter. Still, the proposed framework for state estimation can be characterized by the abstraction barrier that separates the details of the embedding from the filter algorithm. For example, in the context of estimation of orientations, rather than speaking of filters for unit quaternions or rotation matrices, this layer of abstraction enables us to simply speak of filters for SO(3) — both unit quaternions and rotation matrices may be used to implement the low-level details of the manifold structure, but this is invisible to the higher-level estimation algorithm.

Pursuing non-parametric filtering in curved space comes at some computational costs compared to the linear space setting. Most notably, equidistant meshes do not exist, but on the other hand our restriction to compact manifolds means that the whole manifold can be “covered” by a mesh with finitely many nodes. One of the practical benefits of the proposed non-parametric filter is the ability to dynamically adapt the mesh to enhance the degree of detail in regions of interest, for instance, where the probability density is high.

† The technique is unreliable in the sense that the produced estimate depends on samples from random variables inside the algorithm, and the estimate can at most be expected to be correct on average. If the random samples come out very unfortunate, the produced estimate may come out very far from the correct result.

The proposed point-mass-based solution for filtering in curved space has three main components:

• Compute — and possibly update — a tessellation of the manifold. Each region of the tessellation is required to be associated with a point that will represent the location of the region in calculations, and the volume of each region must be known.

• Implement measurement and time updates. This requires a system model which, unlike when filtering in Euclidean space, cannot have additive noise on the state.

• Provide the user with a point estimate. There is always the option to compute a cheap extrinsic estimate (typically the extrinsic mean), but honoring geometric reasoning in this work, we also look into intrinsic estimates.

Each of these components will be considered in the following sections, including special treatment for the case of spheres where the general situation lacks detail. A more detailed, algorithmic, description of the proposed solution is given in section 4.6.
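The three components above can be arranged into a filter loop roughly as follows. This is only a structural sketch under simplifying assumptions (a fixed tessellation and a cheap maximum-density point estimate); `measurement_likelihood` and `transition_density` are hypothetical placeholders for the model-specific densities treated in the sections that follow.

```python
# Hypothetical skeleton tying the three components together.  All model
# functions are placeholders for the application-specific pieces.

def point_mass_filter(f_prior, nodes, volumes, measurements,
                      measurement_likelihood, transition_density):
    """Run measurement and time updates over a fixed tessellation.

    f_prior[i]   density value on region i (the f^i representation)
    nodes[i]     representative point x_i of region i
    volumes[i]   volume mu_i of region i
    """
    estimates = []
    f = list(f_prior)
    for y in measurements:
        # Measurement update: pointwise Bayes rule, normalized with volumes.
        w = [f[i] * measurement_likelihood(y, nodes[i]) for i in range(len(nodes))]
        s = sum(w[i] * volumes[i] for i in range(len(nodes)))
        f = [wi / s for wi in w]
        # Cheap point estimate: the node index with highest posterior density.
        estimates.append(max(range(len(nodes)), key=lambda i: f[i]))
        # Time update: discrete Chapman-Kolmogorov over the regions.
        f = [sum(transition_density(nodes[j], nodes[i]) * f[j] * volumes[j]
                 for j in range(len(nodes)))
             for i in range(len(nodes))]
    return f, estimates
```

The two update steps correspond to the equations derived in sections 4.4.2 and 4.4.3.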

Terminology. By manifold, we refer to a differentiable, Riemannian manifold. Loosely speaking, a (contravariant) vector is a velocity on the manifold, belonging to the tangent space (which is a vector space) at some point on the manifold, and is basically valid only at that point. A curve on the manifold which locally connects points along the shortest path between the points is called a geodesic, and the exponential map maps vectors to points on the manifold in such a way that, for a vector $v$ at $p$, the curve $t \mapsto \exp_p(t\,v)$ has velocity $v$ at $t = 0$, and is a geodesic. When needed, we shall assume that the manifold is geodesically complete, meaning that the exponential map shall be defined for all vectors. We recommend Frankel (2004) for an introduction to these concepts from differential geometry. A tessellation of the manifold is a set $\{R_i\}_i$ of subsets of the manifold, such that “there is no overlap and no gap” between regions; the union of all regions shall be the whole manifold, and the intersection of any two regions shall have measure zero. We shall additionally require that each region $R_i$ be simply connected.
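For a concrete instance of these concepts, the unit 2-sphere admits a closed-form exponential map, since its geodesics are great circles. A minimal sketch using the standard formula and the embedding of the sphere in R³ (the example is ours, not from the thesis):

```python
import math

def sphere_exp(p, v):
    """Exponential map on the unit 2-sphere embedded in R^3.

    p is a point on the sphere, v a tangent vector at p (so p . v = 0).
    Follows the geodesic (great circle) with initial velocity v for unit time:
        exp_p(v) = cos(|v|) p + sin(|v|) v / |v|
    """
    n = math.sqrt(sum(vi * vi for vi in v))
    if n == 0.0:
        return tuple(p)
    return tuple(math.cos(n) * pk + math.sin(n) * vk / n
                 for pk, vk in zip(p, v))

# A quarter great circle: start at the north pole, move with speed pi/2
# in the x-direction, and land on the equator at (1, 0, 0).
q = sphere_exp((0.0, 0.0, 1.0), (math.pi / 2, 0.0, 0.0))
```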

Notation. The manifold on which the estimated state evolves is denoted M. We make no distinction in notation between ordinary and stochastic variables; x may refer both to a stochastic variable over the manifold and a particular point on the manifold. The probability of a statement, such as $x \in R$, is written $P(x \in R)$. The probability density function for a stochastic variable $x$ is written $f_x$. When conditioning on a variable taking on a particular value, we usually drop the stochastic variable from the notation; for instance, $f_{x|y}$ is a shorthand for $f_{x|Y=y}$, where the distinction between the stochastic variable, $Y$, and the value it takes, $y$, had to be made clear. The distance in the induced Riemannian metric, between the points $x$ and $y$, is written $d(x, y)$. The symbol $\delta$ is used to denote the Dirac delta “function”. A Gaussian distribution over a vector space, with mean $m$ and covariance $C$, is denoted $\mathcal{N}(m, C)$, and if the variable $x$ is distributed according to this distribution, we write $x \sim \mathcal{N}(m, C)$. (The covariance is a symmetric, positive semidefinite, linear mapping of pairs of vectors to scalars, and it should be emphasized that a covariance is basically only compatible with vectors at a certain point on the manifold.) In relation to plans for future work, we should also mention that group structure on the manifold is not used in this work, although such manifolds, Lie groups, are often a suitable setting for estimation of dynamic systems.

4.2 Background and related work

For models with continuous-time dynamics, the evolution of the probability distribution of the state is given by the Fokker-Planck equation, and a great amount of research has been aimed at solving this partial differential equation under varying assumptions and approximation schemes. Daum (2005) gives a good overview that should be accessible to a broad audience. In the present discrete-time setting, the corresponding relation is the Chapman-Kolmogorov equation. It tells how the distribution of the state at the next time step (given all available measurements up till the present) depends on the distribution of the state at the current time step (given all available measurements up till the present) and the process noise in the model. Let $y_{0..t}$ be the measurements up to time $t$, and $x_{s|t}$ be the state estimate at time $s$ given $y_{0..t}$. Conditioned on the measurements $y_{0..t}$, and using that $x_{t+1}$ is conditionally independent of $y_{0..t}$ given $x_t$, the Chapman-Kolmogorov equation states the familiar

$$ f_{x_{t+1|t}}( x_{t+1} ) = \int f_{x_{t+1}|x_t}( x_{t+1} )\, f_{x_{t|t}}( x_t )\, \mathrm{d}x_t \tag{4.1} $$

In combination with Bayes’ rule for taking the information in new measurements into account,

$$ f_{x_{t|t}}( x_t ) = \frac{f_{x_{t|t-1}}( x_t )\, f_{y_t|x_t}( y_t )}{f_{y_{t|t-1}}( y_t )} $$

this describes exactly the equations that the discrete-time filtering problem is all about.

To mention just a few references for the particular application of filtering on SO(3), a filter for random walks on the tangent bundle (with the only system noise being additive noise in the Lie algebra corresponding to velocities) was developed in Chiuso and Soatto (2000), a quaternion representation was used with projection and a Kalman filter adapted to the curved space in Choukroun et al. (2006), and Lee et al. (2008) proposes a method to propagate uncertainty under continuous-time dynamics in a noise-free setting. The particle filter approach in Kwon et al. (2007) has already been mentioned.


A solid account of the most commonly used methods for filtering on SO(3) is provided by Crassidis et al. (2007). In Lo and Eshleman (1979) the authors present an interesting representation of probability density functions on SO(3), making use of exponential Fourier densities.

4.3 Dynamic systems on manifolds

The filter is designed to track the discrete-time stochastic process x, evolving on some manifold of low dimension. That the dimension is low is instrumental to enabling the use of filter techniques that, in higher dimensions, break down performance-wise due to the curse of dimensionality (Bergman, 1999, section 5.1). We use discrete-time models† in the form

$$ qx \sim W_{g( x, u )} $$
$$ y \sim V_x $$

where $W_{g( x, u )}$ is the random distribution of process noise taking values on the manifold, $u$ is a known external input, and the measurement $y$ is distributed according to the random distribution $V_x$. Not being aware of a standard name for a distribution over the manifold, parameterized by a point on the same manifold, we shall use distribution field for $W_\bullet$ (here, the bullet indicates that there is a free parameter — for a fixed value of this parameter, we have an ordinary random distribution).

For example, the measurement equation could be given by

$$ V_x = \mathcal{N}\bigl( h( x ),\, C_y( x ) \bigr) $$

That is, we have additive Gaussian white noise on the nominal measurements $h( x )$, and we allow the noise covariance to depend on the state.

A less general example of the dynamic equation could be to combine Gaussian distributions with the exponential map

$$ qx \sim \exp \mathcal{N}\bigl( 0,\, C_g( x, u ) \bigr) $$

Here, $\mathcal{N}\bigl( 0, C_g( x, u ) \bigr)$ is our way of denoting a zero-mean Gaussian distribution of vectors at $g( x, u )$. However, (without the structure of a Lie group) the simplicity of this expression is misleading, since the Gaussian distributions at different points on the manifold are defined in different tangent spaces. Hence, a common matrix will not be sufficient to describe the covariance in all points.

To really obtain simple equations for the dynamic equation, we may employ distributions that only depend on the distance

$$ f_{qX}( qx ) = f_d\bigl( d(\, qx,\, g( x, u )\,) \bigr) $$

† We avoid the term state space model here since this notion is so strongly associated with models in terms of a state vector which is just a coordinate tuple; our models shall be stated in a coordinate-free manner.


4.4 Point-mass filter

The main idea of the point-mass filter is to model the probability distribution of the state x being estimated as a sum of weighted Dirac delta functions. The Dirac deltas are located at fixed positions in a uniform grid, and the idea dates back to the seminal work by Bucy and Senne (1971). When the filter is run, a sequence of such random variables will be produced, and there is a need to distinguish between the variables before and after measurement and time updates; recall the notation introduced in section 4.2.

Readers familiar with the particle filter will notice many similarities to the proposed filter, but should also pay attention to the differences. To mention a few, the proposed filter is deterministic (and in this sense robust), does not require resampling, associates each probability term (compare particle) with a region in the domain of the estimated variable, and calculates with the volumes of these regions. One notable drawback compared to the particle filter is that when the estimated probability is concentrated to a small part of the domain, the particle filter will automatically adapt to provide estimates with smaller uncertainty, while the proposed filter would require a non-trivial extension to do so.

In this section, we first discuss the representation of stochastic variables, and then turn to deriving equations for the time and measurement updates, expressed using the proposed representation.

4.4.1 Point-mass distributions on a manifold

In this section, we consider how any random variable on the manifold may be represented, and omit time subscripts to keep the notation clear. That the idea is termed point-mass is due to the sometimes used assumption that the probability is distributed discretely at certain points. Written using the Dirac delta, the probability density function for x is then given by

$$ f_X( x ) = \sum_i p_i\, \delta( x - x_i ) $$

where the sum is over some finite number of points with probability $p_i$ located at $x_i$. While this makes several operations on the distribution feasible which would be extremely computationally demanding using other models, this is clearly very different from what we would expect the density function to look like.

To be able to make other interpretations of the pairs $( p_i, x_i )$, each such pair needs to be associated with a region $R_i$ of the probability space, and we require that the set of regions, $\{R_i\}_i$, be a tessellation. Let $\mu_i = \mu( R_i )$, where $\mu( \bullet )$ measures volume.

That our definition of tessellation did not require that the overlaps between regions be empty forces us to use only the interior of the regions for many purposes, that is, $R_i \setminus \partial R_i$ instead of $R_i$. For the sake of brevity, however, we shall abuse notation and often write simply $R_i$ when it is actually the interior that is referred to — the reader should be able to see where this applies.


Given a tessellation $\{R_i\}_i$ (of cardinality $N$), a more relaxed interpretation of the probabilities $p_i$ is obviously

$$ P( X \in R_i ) = p_i \tag{4.2} $$

and a more realistic model of the distribution is that it is piecewise constant;

$$ f_X( x ) = \sum_{i \,:\, x \in R_i} \frac{p_i}{\mu_i} $$

Note that the sum may expand to more than one term, but only on a set of measure zero.

Given the tessellation, including the $\mu_i$, it is clear that the numbers $p_i$ may be replaced by $f^i \triangleq \frac{p_i}{\mu_i}$. Since this is a more natural representation of piecewise constant functions in general, we choose to use this also for the probability density function estimate. For completeness, we state the above equations again, now using $f^i$ instead of $p_i$:

$$ P( X \in R_i ) = f^i \mu_i \tag{4.3} $$

$$ f_X( x ) = \begin{cases} \sum_i f^i \mu_i\, \delta( x - x_i ), & \text{(Point-mass)} \\ \sum_{i \,:\, x \in R_i} f^i, & \text{(Piecewise constant)} \end{cases} \tag{4.4} $$
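A small numeric illustration of this bookkeeping may be helpful. The numbers below are made up; the point is only the relation between the $p_i$, the $\mu_i$, and the density values $f^i$ in (4.3):

```python
# Sketch of the f^i representation on a toy tessellation: three regions
# with volumes mu_i and probabilities p_i; f^i = p_i / mu_i, so that
# P(X in R_i) = f^i mu_i and the piecewise constant density integrates to 1.

volumes = [0.5, 1.0, 1.5]          # mu_i
probs = [0.2, 0.3, 0.5]            # p_i, summing to 1

f = [p / mu for p, mu in zip(probs, volumes)]       # density values f^i

# Recover the region probabilities and check the total probability mass.
recovered = [fi * mu for fi, mu in zip(f, volumes)]
total = sum(recovered)
```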

The point-mass filter is a meshless method in that it does not make use of a connection graph describing neighbor relations between the nodes $x_i$. (A connection graph is implicit in the tessellation, but it is not used.) While meshless methods in many finite element method applications would use interpolation (of, for instance, Sibson or Laplace type, see Sukumar (2003) for an overview of these) instead of the piecewise constant (4.4), our choice makes it easy to ensure that the density is non-negative and integrates to 1. Furthermore, both computation of the interpolation itself, and use of the interpolated density, would drastically increase the computational cost of the algorithm.

It turns out that computing good tessellations is a major task of the implementation of point-mass filters on manifolds, just like mesh generation is a major task when using finite element methods. It may also be a time-consuming task, but a basic implementation may do this once and for all, offline. Since the number of regions greatly influences the runtime cost of the filter, a tessellation computed offline will have to be rather coarse. For models where large uncertainty is inherent in the filtering problem, this may be sufficient, but if noise levels are low and accurate estimation is theoretically achievable, the tessellation should be adapted to have smaller regions in areas where the probability density is high.†

If each region $R_i$ is given as the set of points being closer to $x_i$ than to all other $x_j$, $j \neq i$, the tessellation is called a Voronoi diagram of the manifold (in case of the 2-sphere, see for instance Augenbaum and Peskin (1985); Na et al. (2002)). Since this will make the point-mass interpretation more reasonable, it seems to be a desirable property of the tessellation, although a formal investigation of this strategy remains a relevant topic for future research.

† This statement is based on intuition; it is a topic for future research to provide a theoretical foundation for how to best adapt the tessellation.

To make transitions between tessellations easy, we require that adaptation is performed by either splitting a region into smaller regions, or by recombining the parts of a split region. Following this scheme, two kinds of tessellation operations are needed; first one to compute a base tessellation of the whole manifold, and then one to split a region into smaller parts. When the base tessellation is computed, the curved shape of the manifold on a global scale will be necessary to consider. The base tessellation should be fine enough to make flat approximations of each region feasible. Such approximation should be useful to the algorithm that then splits regions into smaller parts. How to compute good base tessellations will generally require some understanding of the particular manifold at hand, and will therefore require specialized algorithms, while the splitting of approximately flat regions should be possible in a general setting.

Finally, a scheme for when to split and when to recombine will be required. This scheme shall ensure that the regions are small where appropriate, while keeping the total number of regions below some given bound.
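As an illustration of the splitting operation for an approximately flat region, here is one common refinement scheme (an assumed example, not taken from the thesis) for a geodesic triangle on the unit sphere: take the edge midpoints in the embedding space and project them back onto the sphere.

```python
import math

def normalize(p):
    """Project a nonzero point of R^3 onto the unit sphere."""
    n = math.sqrt(sum(x * x for x in p))
    return tuple(x / n for x in p)

def split_spherical_triangle(a, b, c):
    """Split a geodesic triangle on the unit sphere into four children by
    projecting the edge midpoints back onto the sphere (one way to refine
    a region of a spherical tessellation)."""
    ab = normalize(tuple((x + y) / 2 for x, y in zip(a, b)))
    bc = normalize(tuple((x + y) / 2 for x, y in zip(b, c)))
    ca = normalize(tuple((x + y) / 2 for x, y in zip(c, a)))
    return [(a, ab, ca), (ab, b, bc), (ca, bc, c), (ab, bc, ca)]

# Refine one octant of the sphere into four smaller triangles.
children = split_spherical_triangle((1.0, 0.0, 0.0), (0.0, 1.0, 0.0), (0.0, 0.0, 1.0))
```

Recombination is then trivial: the parent triangle is recovered from the corner vertices of its four children.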

4.4.2 Measurement update

Just as for particle filters, the measurement update is a straightforward application of Bayes’ rule. To incorporate a new measurement of the random variable $Y \sim V_x$ modeling the output, we have†

$$ P( X \in R_i \mid y ) = \frac{f_{Y|X \in R_i}( y )\, P( X \in R_i )}{f_Y( y )} \approx \frac{f_{Y|X = x_i}( y )\, P( X \in R_i )}{f_Y( y )} $$

where the measurement prior $f_Y( y )$ need not be known since it is a common factor to all probabilities on the mesh, and will just act as a normalizing constant. Converting to our favorite representation $f^i$, adding time indices, conditioning on $Y_{0..t-1}$, and

† To see this, let $B_y(r)$ denote a ball of radius $r$ centered at $y$. The relation follows directly from

$$ P( X \in R_i \mid y )\, f_Y( y ) = \lim_{r \to 0} P( X \in R_i \mid Y \in B_y(r) )\, \frac{P( Y \in B_y(r) )}{\mu( B_y(r) )} = \lim_{r \to 0} \frac{P( X \in R_i \wedge Y \in B_y(r) )}{\mu( B_y(r) )} = \lim_{r \to 0} P( Y \in B_y(r) \mid X \in R_i )\, \frac{P( X \in R_i )}{\mu( B_y(r) )} = f_{Y|X \in R_i}( y )\, P( X \in R_i ) $$


using conditional independence of $Y$ and $Y_{0..t-1}$ given $X$, this reads

$$ f^i_{t|t} = \frac{P( X \in R_i \mid y_{0..t} )}{\mu_i} \approx \frac{f_{Y|X = x_i}( y )\, f^i_{t|t-1}}{f_{Y_{t|t-1}}( y )} $$

By defining

$$ \mathrm{BayesRule}( f, g ) \triangleq \frac{f\, g}{\int f\, g} $$

and noting that the result will always be a proper probability distribution (and hence integrate to 1, just as the result of the BayesRule operator) we can write:

$$ f_{X_{t|t}} = \mathrm{BayesRule}\bigl( f_{X_{t|t-1}},\, f_{Y|X = \bullet}( y ) \bigr) $$

Note how the volumes of regions enter the computation of the BayesRule operator:

$$ \mathrm{BayesRule}( f, g )( x_i ) \approx \frac{f( x_i )\, g( x_i )}{\sum_j f( x_j )\, g( x_j )\, \mu_j} \tag{4.5} $$
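Equation (4.5) transcribes almost directly into code. A sketch, assuming the $f^i$ representation over a fixed tessellation:

```python
def bayes_rule(f, g, volumes):
    """Discrete BayesRule operator of equation (4.5).

    f[i] is the prior density f^i_{t|t-1} at node x_i, g[i] the measurement
    likelihood f_{Y|X=x_i}(y), and volumes[i] the region volume mu_i.
    Returns the posterior density values f^i_{t|t}.
    """
    num = [fi * gi for fi, gi in zip(f, g)]
    den = sum(n * mu for n, mu in zip(num, volumes))   # normalizing constant
    return [n / den for n in num]

# Uniform prior on two equal regions; the likelihood favors node 0 by 3:1.
post = bayes_rule([0.5, 0.5], [3.0, 1.0], [1.0, 1.0])
```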

4.4.3 Time update in general

The time update can be described by the relation

$$ P( qX \in R_i ) = \int_M \int_{R_i} f_{W_{g( x, u )}}( qx )\, f_X( x )\, \mathrm{d}qx\, \mathrm{d}x $$

In the filtering application, the stochastic entities in this relation will be conditioned on $y_{0..t}$, but since the conditioning is the same on both sides, it may be dropped for the sake of a more compact notation in this section. By the mean value theorem, we find

$$ P( qX \in R_i ) = \int_M \mu_i\, f_{W_{g( x, u )}}( \xi )\, f_X( x )\, \mathrm{d}x $$

for some $\xi \in R_i$, and dividing both sides by $\mu_i$ and fitting the region in a shrinking ball centered at $x_i$, we obtain

$$ \frac{P( qX \in R_i )}{\mu_i} \to f_{qX}( x_i ) $$

and

$$ \int_M f_{W_{g( x, u )}}( \xi )\, f_X( x )\, \mathrm{d}x \to \int_M f_{W_{g( x, u )}}( x_i )\, f_X( x )\, \mathrm{d}x $$

Hence, we obtain the Chapman-Kolmogorov equation (4.1) in the limit,

$$ f_{qX}( x_i ) = \int_M f_X( x )\, f_{W_{g( x, u )}}( x_i )\, \mathrm{d}x $$


and this we make the definition of the convolution:

$$ f_{qX} = f_X * f_{W_{g( \bullet, u )}} $$

The convolution of a distribution field and a probability density function is a new probability density function. We shall think of the time update as implementing this relation.

By approximating the probability density functions as constant over small regions (assuming all the regions $R_i$ are small), we get the time update approximation

$$ \begin{aligned} P( qX \in R_i ) &= \int_M \int_{R_i} f_{W_{g( x, u )}}( qx )\, f_X( x )\, \mathrm{d}qx\, \mathrm{d}x \\ &\approx \int_M \mu_i\, f_{W_{g( x, u )}}( x_i )\, f_X( x )\, \mathrm{d}x \\ &= \mu_i \sum_j \int_{R_j} f_{W_{g( x, u )}}( x_i )\, f_X( x )\, \mathrm{d}x \\ &\approx \mu_i \sum_j f_{W_{g( x_j, u )}}( x_i ) \int_{R_j} f_X( x )\, \mathrm{d}x \\ &= \mu_i \sum_j f_{W_{g( x_j, u )}}( x_i )\, P( X \in R_j ) \end{aligned} $$

This is readily converted to an implementation of the convolution (here, the conditioning is written out for future reference):

$$ f^i_{t+1|t} = \frac{P( qX \in R_i \mid y_{0..t} )}{\mu_i} \approx \sum_j f_{W_{g( x_j, u )}}( x_i )\, \frac{P( X \in R_j \mid y_{0..t} )}{\mu_j}\, \mu_j = \sum_j f_{W_{g( x_j, u )}}( x_i )\, f^j_{t|t}\, \mu_j \tag{4.6} $$
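Equation (4.6) can be implemented as a double loop over the regions. A sketch, where `w_density` and `g` are hypothetical placeholders for the model-specific process-noise density and nominal dynamics:

```python
def time_update(f, nodes, volumes, w_density, g, u=None):
    """Discrete convolution (4.6):
    f^i_{t+1|t} = sum_j f_{W_{g(x_j,u)}}(x_i) f^j_{t|t} mu_j.

    w_density(center, x) evaluates the process-noise density f_{W_center}(x);
    g(x, u) is the nominal dynamics.  Both are application placeholders.
    """
    return [sum(w_density(g(xj, u), xi) * fj * mu
                for xj, fj, mu in zip(nodes, f, volumes))
            for xi in nodes]

# Toy example on three "nodes": identity dynamics and a noise kernel that
# keeps 0.8 of the mass in place and leaks 0.1 to each of the other nodes.
kernel = lambda c, x: 0.8 if x == c else 0.1
g_id = lambda x, u: x
f_next = time_update([1.0, 0.0, 0.0], [0, 1, 2], [1.0, 1.0, 1.0], kernel, g_id)
```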

4.4.4 Dynamics that simplify time update

Since the number of regions may be large, and computing the time update convolution involves $N^2$ lookups of the probability density $f_{W_{g( x_j, u )}}( x_i )$, we should consider means to keep the cost of each such lookup low.

First, if the system is autonomous (that is, $g( x_j, u )$ does not depend on $u$), all transitions may be computed offline and stored in a stochastic matrix†. The $\mu_j$ could also be included in this matrix, reducing the convolution computation to a matrix multiplication.

† This matrix is often called the transition matrix, but this notion has a different meaning in the thesis.


As was noted above, one class of distributions for the noise on the state which makes the expression simple is that where the density depends only on the distance from the nominal point;

$$ f^i_{t+1|t} \approx \sum_j f_d\bigl( d(\, g( x_j, u ),\, x_i\,) \bigr)\, f^j_{t|t}\, \mu_j $$

This will be the structure in our example.

4.5 Point estimates

The distinction between intrinsic and extrinsic was introduced in Srivastava and Klassen (2002), where a mean value of a distribution on a manifold was estimated by first estimating the mean of the distribution of the manifold embedded in Euclidean space, and then projecting the mean back to the manifold. This, they termed the extrinsic estimator. In contrast, an intrinsic estimator was defined without reference to an embedding in Euclidean space. While this may seem a hard contrast at first, Brun et al. (2007) show that both kinds of estimates may be meaningful from a maximum likelihood point of view, for some manifolds with “natural embedding”.

4.5.1 Intrinsic point estimates

A common intrinsic generalization of the usual mean in Euclidean space is defined as a point where the variance attains a global minimum, where the variance “only” requires a distance to be defined:

$$ \mathrm{Var}_X( \bar{x} ) \triangleq \int d( \bar{x}, x )^2\, f_X( x )\, \mathrm{d}x \tag{4.7} $$

Unfortunately, such a mean may not be unique, but if the support of the distribution is compact, there will be at least one.
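Under the point-mass representation, the integral in (4.7) becomes a weighted sum over the nodes, and evaluating it at every node already gives a crude grid estimate of the intrinsic mean. A sketch on the circle with arc-length distance (the circle example is ours, not from the thesis):

```python
import math

def variance_at(x, nodes, f, volumes, dist):
    """Point-mass approximation of (4.7): Var_X(x) = sum_i d(x, x_i)^2 f^i mu_i."""
    return sum(dist(x, xi) ** 2 * fi * mu
               for xi, fi, mu in zip(nodes, f, volumes))

def grid_intrinsic_mean(nodes, f, volumes, dist):
    """Crude intrinsic mean estimate: the node with the least variance."""
    return min(nodes, key=lambda x: variance_at(x, nodes, f, volumes, dist))

# Example on the circle (four equal regions) with arc-length distance.
arc = lambda a, b: min(abs(a - b), 2 * math.pi - abs(a - b))
nodes = [0.0, math.pi / 2, math.pi, 3 * math.pi / 2]
vols = [math.pi / 2] * 4
probs = [0.5, 0.2, 0.1, 0.2]
f = [p / v for p, v in zip(probs, vols)]     # density values f^i
m = grid_intrinsic_mean(nodes, f, vols, arc)
```

Such a grid minimizer is also a natural initialization for the local search discussed below.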

Other intrinsic point estimates may also be defined, but these alternatives will not be discussed further. The reason is that the motivation for the current discussion is just to illustrate that it is possible to define algorithms aimed at computing intrinsic point estimates based on the proposed probability density representation.

Since distributions with a globally unique minimum may be arbitrarily close to distributions with several distinct global minima, it is our understanding that schemes based on local search, devised to find one good local minimizer, are reasonable approximations of the definition. Hence, there are two tasks to consider; implementation of the local search, and a scheme that uses the local search in order to find a good local minimizer.

Given an implementation of the local search, we propose that it be run just once, initiated at the region representative $x_i$ with the least variance. Since the region representatives are assumed to be reasonably spread over the whole manifold, there is good hope that at least one of them is in the region of attraction of the global minimum. However, even if this is the case, it may not include the $x_i$ with least variance, which directly leads to more robust schemes where the local search is initiated at several (possibly all) $x_i$. A completely different approach to initialization of the local search is to use an extrinsic estimate of the mean, if available. Since the extrinsic mean may be extremely cheap to compute compared to even evaluating the variance at one point, and may at the same time be a good approximator of the intrinsic mean, it is very beneficial to use, while the major drawback is that it requires us to go outside the geometric framework.

To implement a local search, one must be able to compute search directions and to perform line searches. For this, we rely on the exponential map, which allows these tasks to be carried out in the tangent space of the current search iterate. The search direction used is steepest descent computed using finite difference approximation, although more sophisticated methods exist in the literature (Pennec, 2006).
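As an illustration of how such a local search can be carried out on the 2-sphere, the following sketch implements the exponential and logarithm maps and a simple gradient iteration for the variance-minimizing mean of a weighted point set. The function names and the fixed-point step rule are our own illustrative choices, not the thesis implementation (which uses finite differences).

```python
import numpy as np

def exp_map(p, v):
    """Exponential map on the unit 2-sphere: follow the geodesic from p
    in the tangent direction v for arc length ||v||."""
    nv = np.linalg.norm(v)
    if nv < 1e-12:
        return p
    return np.cos(nv) * p + np.sin(nv) * v / nv

def log_map(p, q):
    """Inverse of exp_map: the tangent vector at p pointing towards q,
    with length equal to the geodesic distance d(p, q)."""
    w = q - np.dot(p, q) * p              # component of q orthogonal to p
    nw = np.linalg.norm(w)
    if nw < 1e-12:
        return np.zeros_like(p)
    return np.arccos(np.clip(np.dot(p, q), -1.0, 1.0)) * w / nw

def intrinsic_mean(points, weights, x0, steps=100):
    """Minimize sum_i w_i d(x, x_i)^2 by repeatedly stepping along the
    negative gradient, which at x is proportional to sum_i w_i log_x(x_i)."""
    w = np.asarray(weights, dtype=float)
    w = w / w.sum()
    x = np.asarray(x0, dtype=float)
    x = x / np.linalg.norm(x)
    for _ in range(steps):
        g = sum(wi * log_map(x, np.asarray(p, float)) for wi, p in zip(w, points))
        x = exp_map(x, g)                 # fixed-point (Karcher mean) step
    return x
```

For instance, with equal weights on the three points (1, 0, 0), (0, 1, 0), (0, 0, 1), the iteration converges to the symmetric point (1, 1, 1)/√3.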

4.5.2 Extrinsic point estimates

The extrinsic mean estimator proposed in Srivastava and Klassen (2002) is defined by replacing the distance d( x̄, x ) in (4.7) by the distance obtained by embedding the manifold in Euclidean space and measuring in this space instead. It is argued that if the support of the distribution is small, this should give results similar to the intrinsic estimate. However, considering how arbitrary the choice of embedding is, it is clear that the procedure as a whole is rather arbitrary as well. (Nevertheless, a good embedding seems likely to produce useful results, see for instance the examples in Srivastava and Klassen (2002).)

Recall that the algorithm for computing the extrinsic mean is very efficient: first compute the mean in the embedding space, and then project back to the manifold. The projection step is defined to yield the point on the manifold which is closest to the mean in the embedding space, and clearly assumes that this point will be unique.
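For the unit sphere, these two steps take only a few lines; this sketch (our own illustration, with hypothetical names) makes the uniqueness assumption of the projection step explicit:

```python
import numpy as np

def extrinsic_mean(points, weights=None):
    """Mean in the embedding space, projected back to the unit sphere.
    The closest point on the sphere to the Euclidean mean m is m / ||m||,
    which is unique exactly when m is not the origin."""
    m = np.average(np.asarray(points, dtype=float), axis=0, weights=weights)
    n = np.linalg.norm(m)
    if n < 1e-12:
        raise ValueError("Euclidean mean at the origin: projection is not unique")
    return m / n
```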

To give an example of how sensitive the extrinsic mean is to the selection of embedding, and why we find it worthwhile to spend effort on intrinsic estimates, consider embedding S2 in R3. However, instead of identifying S2 with the unit sphere in R3, we magnify the sphere in some direction.

4.6 Algorithm and implementation

The final component to discuss before putting the theory of the previous sections together in an algorithm is how tessellations are computed. In this section, we do this, present the algorithm in compact form in algorithm 4.1 on page 123, and include some notes on the software design and implementation.

4.6.1 Base tessellations (of spheres)

To be more specific about how a base tessellation may be computed, we have considered how this can be done for spheres, but the technique we employ does not only work for spheres.


The first step is to generate the set of points xi . Here, the user is given the ability to affect the number of points generated, but precise control is sacrificed for the sake of more evenly spread points. The basic idea is to use knowledge of a sphere's total volume to compute a desired volume for each region. Then we use spherical coordinates in nested loops, with the number of steps in each loop being a function of the current coordinates of the loops outside. The details for the 2-sphere and 3-sphere are provided in section 4.A.

The remaining steps are general and do not only apply to spheres. First, equations for the half-space containing the manifold and bordered by the tangent space at each point xi are computed. This comes down to finding a basis for the space orthogonal to the tangent space at xi; for spheres, this is trivial. The intersection of these half-spaces is a polytope with a one-to-one correspondence between facets and generating points. (We rely on existing software here; please refer to section 4.6.3 at this point.) Projecting the facets towards the origin will generate a tessellation, and for spheres this will be a Voronoi tessellation if the "natural" embedding is used. Each region is given by the set of projected vertices of the corresponding polytope facet.
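The thesis pipeline performs the polytope computations with cddlib. Purely as an independent cross-check for the 2-sphere (not part of the thesis implementation), scipy's `SphericalVoronoi` is assumed to produce the same kind of Voronoi tessellation directly from the generating points:

```python
import numpy as np
from scipy.spatial import SphericalVoronoi

def tessellate_sphere(points):
    """Voronoi tessellation of the unit 2-sphere generated by `points`.
    Returns, for each generating point, the index list of its region's
    vertices, together with the exact spherical area of each region."""
    sv = SphericalVoronoi(np.asarray(points, dtype=float))
    sv.sort_vertices_of_regions()      # order each region's vertices cyclically
    return sv.regions, sv.calculate_areas()
```

The returned areas sum to the total sphere area 4π, which is the normalization property exploited below.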

As part of the tessellation task, the volume of each region must also be computed. For the 2-sphere this can be done exactly, thanks to the simple formula giving the area of the region bounded by the geodesics between three points on the sphere (Berger, 1978, p 198). In the general case, we approximate the volumes on the manifold by the volumes of the polytope facets. (Note that a facet can be reconstructed from the projected vertices by projecting back to the (embedded) tangent space at the generating point.) For spheres the ideal total volume is known, and any mismatch between the sum of the volumes of the regions and the ideal total volume is compensated for by scaling all volumes by a normalizing constant.
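For reference, the exact 2-sphere computation can be based on the spherical excess; the sketch below uses l'Huilier's theorem, an equivalent of the area formula referred to above, chosen here for numerical convenience (the helper is our own, not taken from the thesis):

```python
import math

def spherical_triangle_area(p, q, r):
    """Area on the unit sphere of the geodesic triangle with vertices
    p, q, r. By l'Huilier's theorem the area equals the spherical excess
    E, with tan(E/4) expressed in the side arc lengths a, b, c."""
    def arc(u, v):
        dot = sum(ui * vi for ui, vi in zip(u, v))
        return math.acos(max(-1.0, min(1.0, dot)))
    a, b, c = arc(q, r), arc(r, p), arc(p, q)
    s = (a + b + c) / 2.0                       # spherical semi-perimeter
    t = math.tan(s / 2) * math.tan((s - a) / 2) \
        * math.tan((s - b) / 2) * math.tan((s - c) / 2)
    return 4.0 * math.atan(math.sqrt(max(t, 0.0)))
```

For example, the octant triangle with vertices (1, 0, 0), (0, 1, 0), (0, 0, 1) covers one eighth of the sphere, with area 4π/8 = π/2.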

4.6.2 Software design

Our implementation is written in C++ for fast execution. Still, there is a strong emphasis on careful representation of the concepts of geometry in the source code. Perhaps most notably, a manifold is implemented as a C++ type, and allows elements to be handled in a coordinate-free manner. By providing a framework for writing coordinate-free algorithms, we try to guide algorithm development in a direction that makes sense from a geometric point of view. Quite obviously, there is an overhead associated with the use of our framework, but it is our understanding that if the developed algorithms are to be put in production units, they shall be rewritten directly in terms of the underlying embedding; our framework is aimed at research and development, and it is an attempt to increase awareness of geometry in the filtering community.

Other concepts of geometric relevance that are represented in the software design are:

• Scalar functions, that is, mappings from a manifold to the set of real numbers.

• Coordinate maps, that is, invertible mappings from a part of the manifold to tuples of real numbers.


• Tangent spaces, that is, the linear spaces of directional derivatives at a certain point of the manifold. As with the manifold elements, elements of the tangent spaces are handled in a coordinate-free manner. The basic means for construction of tangents is to form the partial derivative with respect to a coordinate function.

• Euclidean spaces are implemented as special cases of manifolds.

4.6.3 Supporting software

A very important part of the tessellation procedure, for spheres and other manifolds with a convex interior seen in the embedding space, is the conversions between polytope representations. That is, given a set of bounding hyperplanes, we want a vertex representation of all the faces, and given a set of vertices, we want the corresponding set of hyperplanes. In our work, these tasks were carried out using cddlib (Fukuda, 2008), distributed under the GNU General Public License.

Although several algorithms for computing the volume of polytopes of arbitrary dimension exist (Büeler et al., 2000), no freely available implementation compatible with C++ was found. We would like to encourage the development, the sharing, and the advertisement of such software. The authors' implementation for this task is a very simple triangulation-based recursion scheme.

4.7 Example

To illustrate the proposed filtering technique, a manifold of dimension 2 was chosen so that the probability distributions are amenable to illustration. We consider the bearing-tracking problem in 3 dimensions, that is, the state evolves on the 2-sphere. This may be a robust alternative to tracking the position of an object when range information cannot be determined reliably. It is also a good example to mention when discussing models without dynamics (velocities are not part of the state), since the lack of (Lie) group structure makes the extension to dynamic models non-trivial. As an example of a bearing sensor in 3 dimensions, we may consider a camera and an object recognition algorithm, which returns image coordinates in each image frame, which are then converted to the three components of a unit vector in the corresponding direction. The example is about the higher-level considerations of the filtering problem, and not the low-level details of implementing the manifold at hand.

The deterministic part of the dynamic equation, g, does not depend on any external input, and just maps any state to itself. The noise in the equation is given by a von Mises–Fisher distribution field (see the overview Schaeben (1992)) with concentration parameter κ = 12 everywhere.

The three scalar relations in the measurement equation are clearly dependent, as the manifold has only two dimensions. Also, the fact that the noise in the estimate from the object recognition has only two dimensions implies that the noise on the three components in the measurement equation will be correlated. Besides the dependencies and correlations, noise levels should be state-dependent, as the uncertainty for a


Algorithm 4.1 Summary of point-mass filter on a manifold.

Input:

• A model of a system with state belonging to a manifold.

• An a priori probability distribution for the state at the initial time.

• A sequence of measurement data.

Output: A sequence of probability density estimates for the filtered state, possibly along with, or replaced by, point estimates.

Notation: The numbers f^i_{t|t−1} are the (approximate) values of the probability density function at the point xi , at time t, given the measurements from time 0 to time t − 1. The numbers f^i_{t|t} are the (approximate) values of the probability density function at time t, given also the measurements available at time t.

Initialization:

Compute a tessellation with regions Ri of the manifold. Assign a representative point xi to each region, and measure the volumes µi . In the case of spheres, see section 4.6.1.

Let f^i_{0|−1} be the a priori distribution. That is, each f^i_{0|−1} is assigned a non-negative value, and all values jointly satisfy ∑_i f^i_{0|−1} µ_i = 1.

Process measurements:

for t = 0, 1, 2, . . .
    Compute a point prediction from f_{t|t−1}, for instance, by minimizing (4.7).
    Use the measurements y_t to compute f_{t|t} using Bayes' rule, see (4.5) for details.
    Compute a point estimate from f_{t|t}, for instance, by minimizing (4.7).
    Make a time update to compute f_{t+1|t} using (4.6).
    Possibly update the tessellation. (Details are subject for future work.)

end

given direction component is at minimum (though not zero) when the tracked object is in line with the component, and at maximum when at a right angle. Despite our awareness of this structure, we make the model as simple as possible by assuming independent and identically distributed Gaussian noise on the three components, hence parameterized by the single scalar σ = 0.4.

Given an initial state, a simulation of the model equations (compare with simulating a moving object in 3 dimensions, with measurement noise entering in a simulated object recognition algorithm) is run, resulting in a sequence of measurements. The manifold is tessellated into N = 200 approximately equally sized regions, and the filter is initialized with a uniform probability density. The probability density estimate is then updated as measurements are made available to the filter. The result is illustrated in figure 4.1.
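The measurement update driving this example is the pointwise Bayes' rule of algorithm 4.1; a minimal sketch (function and argument names are our own), operating on the density values at the region representatives:

```python
import numpy as np

def measurement_update(f_pred, volumes, likelihoods):
    """Point-mass Bayes update: multiply the predicted density values by
    the measurement likelihoods at the representatives, then renormalize
    so that sum_i f_i * mu_i = 1."""
    f = np.asarray(f_pred, dtype=float) * np.asarray(likelihoods, dtype=float)
    return f / np.sum(f * np.asarray(volumes, dtype=float))
```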


4.8 Conclusions and future work

We have shown that point-mass filters can be used to construct robust filters on compact manifolds. By separating the implementation of the low-level manifold structure from the higher-level filter algorithm, we are able to formulate and implement much of the algorithm without reference to a particular embedding. The technique has been demonstrated by considering a simple application on the 2-sphere.

Future work includes application to SO(3), that is, the manifold of orientations, adaptation of the tessellation, and utilizing Lie group structure when available. In order to cope with the substantial increase of dimension that would result from augmenting the state of our models to also include physical quantities such as angular momentum, the filter should be tailored to tangent or cotangent bundles.


Figure 4.1: Estimated probability density function. Left: predictions before a measurement becomes available. Right: estimates after measurement update. Rows correspond to successive time steps. Patches are colored proportionally to the density in each region, and random samples are marked with dots. The color of the patches is scaled so that white corresponds to zero density, while black corresponds to the maximum density of the distribution (hence, the scale differs from one figure to another). It is seen how the uncertainty increases when time is incremented and decreases when a measurement becomes available, and that the uncertainty decreases over time as the information from several measurements is fused.


Appendix

4.A Populating the spheres

This appendix contains the two algorithms we use to populate the spheres S2 (algorithm 4.2) and S3 (algorithm 4.3) with points such that the density of points is approximately constant over the whole space. The method contains a minor random element, but this is not crucial for the quality of the result, and could easily be replaced by deterministic choice.

The idea for populating spheres generalizes to higher dimensions. The number of steps to take in each loop is found by computing the length of the curve obtained by sweeping the corresponding coordinate over its range while the other coordinates are held fixed, and the curve length is divided by the side length of a hypercube of desired volume. The curve length is found as the width of the coordinate's span, times the product of the cosines of the other coordinates, and the hypercube volume times the desired number of points in the population should equal the total volume of the sphere(*) (the formula for the volume can be found under the entry for sphere in Hazewinkel (1992)). Denoting the side of the hypercube δ0, the dimension of the sphere N , and the desired number of points n, this corresponds to setting

\[ \delta_0 \coloneqq \sqrt[N]{ \frac{ 2 \, \pi^{\frac{N+1}{2}} }{ n \, \Gamma\!\left( \frac{N+1}{2} \right) } } \]

where Γ is the standard gamma function.

(*) The volume of S2 is often denoted the area of the sphere.
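A small numerical check of this formula (the helper name is our own), using the closed form 2π^((N+1)/2)/Γ((N+1)/2) for the volume of the unit N-sphere:

```python
import math

def patch_side(n, dim):
    """Side delta0 of a hypercube such that n such cubes together have
    the same total volume as the unit sphere S^dim."""
    total = 2.0 * math.pi ** ((dim + 1) / 2.0) / math.gamma((dim + 1) / 2.0)
    return (total / n) ** (1.0 / dim)
```

For dim = 2 this reduces to √(4π/n), and for dim = 3 to (2π²/n)^(1/3), the values used in algorithms 4.2 and 4.3.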


Algorithm 4.2 Populating the 2-sphere.

Input: The desired number n of points in the population.

Output: A set P of points on S2, approximately of cardinality n.

Notation: Let φ denote the usual polar coordinate map, mapping a point on S2 to the tuple ( θ, ϕ ), where θ ∈ [ −π/2, π/2 ], ϕ ∈ [ 0, 2π ]. That is, embedding S2 in R3, the inverse map is identified with

\[ \varphi^{-1}( \theta, \varphi ) = \begin{pmatrix} \cos( \theta ) \cos( \varphi ) \\ \cos( \theta ) \sin( \varphi ) \\ \sin( \theta ) \end{pmatrix} \]

Algorithm body:
    P ← { }
    δ0 ≔ √( 4π/n )   (Compute the desired volume belonging to each point, and compute an approximation of the angle which produces a square on the sphere with this volume.)
    i_θ^max ≔ ⌈ π/δ0 ⌉   (Compute the number of steps to take in the θ coordinate.)
    ∆θ ≔ −π/i_θ^max   (Compute the corresponding step size in the θ coordinate.)
    θ0 ≔ π/2
    for i_θ = 0, 1, . . . , ( i_θ^max − 1 )
        θ ≔ θ0 + i_θ ∆θ
        i_ϕ^max ≔ max{ 1, ⌊ 2π cos( θ )/δ0 ⌋ }   (Compute the circumference of the sphere at the current θ coordinate, and find the number of ϕ steps by dividing by the desired step length.)
        ∆ϕ ≔ 2π/i_ϕ^max
        ϕ0 ≔ x, where x is a random sample from [ 0, 2π ].
        for i_ϕ = 0, 1, . . . , ( i_ϕ^max − 1 )
            ϕ ≔ ϕ0 + i_ϕ ∆ϕ
            P ← P ∪ { φ⁻¹( θ, ϕ ) }
        end
    end

Remark: A deterministic replacement for the random initialization of ϕ0 in each i_θ iteration would be to add

    ϕ ← ϕ − ½ ∆ϕ

just before the update of θ, and then use the final ϕ at the end of one i_θ iteration as the initial ϕ in the next i_θ iteration.
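A direct transcription of algorithm 4.2 into Python (our own sketch; the latitude range θ ∈ [−π/2, π/2], as reconstructed above, is assumed):

```python
import math
import random

def populate_sphere_2(n, rng=None):
    """Approximately uniform population of the unit 2-sphere with about n
    points, following the nested-loop scheme of algorithm 4.2."""
    rng = rng if rng is not None else random.Random(0)
    points = []
    delta0 = math.sqrt(4 * math.pi / n)        # side of a square patch of area 4*pi/n
    i_max_theta = math.ceil(math.pi / delta0)  # number of latitude steps
    d_theta = -math.pi / i_max_theta
    theta0 = math.pi / 2
    for i_theta in range(i_max_theta):
        theta = theta0 + i_theta * d_theta
        # circumference at this latitude, divided by the desired step length
        i_max_phi = max(1, math.floor(2 * math.pi * math.cos(theta) / delta0))
        d_phi = 2 * math.pi / i_max_phi
        phi0 = rng.uniform(0.0, 2 * math.pi)   # random offset around the circle
        for i_phi in range(i_max_phi):
            phi = phi0 + i_phi * d_phi
            points.append((math.cos(theta) * math.cos(phi),
                           math.cos(theta) * math.sin(phi),
                           math.sin(theta)))
    return points
```

Requesting n = 200 points yields a population whose cardinality is close to, but generally not exactly, 200.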


Algorithm 4.3 Populating the 3-sphere.

Input: The desired number n of points in the population.

Output: A set P of points on S3, approximately of cardinality n.

Notation: Let φ denote the usual polar coordinate map, mapping a point on S3 to the tuple ( θ, ϕ, γ ), where θ ∈ [ −π/2, π/2 ], ϕ ∈ [ −π/2, π/2 ], γ ∈ [ 0, 2π ]. That is, embedding S3 in R4, the inverse map is identified with

\[ \varphi^{-1}( \theta, \varphi, \gamma ) = \begin{pmatrix} \cos( \theta ) \cos( \varphi ) \cos( \gamma ) \\ \cos( \theta ) \cos( \varphi ) \sin( \gamma ) \\ \cos( \theta ) \sin( \varphi ) \\ \sin( \theta ) \end{pmatrix} \]

Algorithm body: Compare the body of algorithm 4.2. This algorithm has the same structure, and we shall only indicate how the important quantities are computed, namely the number of steps to take in the different loops.

    . . .
    δ0 ≔ ∛( 2π²/n )
    i_θ^max ≔ ⌈ π/δ0 ⌉
    ∆θ ≔ −π/i_θ^max
    . . .
    for i_θ . . .
        θ ≔ . . .
        i_ϕ^max ≔ max{ 1, ⌊ π cos( θ )/δ0 ⌋ }
        ∆ϕ ≔ π/i_ϕ^max
        . . .
        for i_ϕ . . .
            ϕ ≔ . . .
            i_γ^max ≔ max{ 1, ⌊ 2π cos( θ ) cos( ϕ )/δ0 ⌋ }
            ∆γ ≔ 2π/i_γ^max
            . . .
            for i_γ . . .
                γ ≔ . . .
                P ← P ∪ { φ⁻¹( θ, ϕ, γ ) }
            end
        end
    end

Remark: The random choices in this algorithm can be made deterministic in thesame way as in algorithm 4.2.


5 A new index close to strangeness

Kunkel and Mehrmann have developed a theory for analysis and numerical solution of differential-algebraic equations. The theory centers around the strangeness index, which differs from the differentiation index in that it does not consider the derivatives of the solution to be independent of the solution itself at each time instant. Instead, it takes the tangent space of the manifold of solutions into account, thereby reducing the number of dimensions in which the derivative has to be determined. The book Kunkel and Mehrmann (2006) covers the theory well and will be the predominant reference used in the current chapter.

The numerical solution procedure applies to general nonlinear differential-algebraic equations of higher indices, and is currently the only one we know of that can handle such problems, although it does not provide a sensitivity analysis. Our interest in this matter is mostly due to this capability.

Since the theme of the chapter is to relate a new index to the closely related strangeness index, parts of the background theory have been included in the present chapter instead of chapter 2, in order to put the two definitions side by side. Care has been taken to make it clear what the contributions of the chapter are.

5.1 Two definitions

In this section, two index definitions will be presented along with some basic properties of each. The one to be presented first is the strangeness index, found in Kunkel and Mehrmann (2006). The second, which is proposed as an alternative, is called the simplified strangeness index. Both are based on the derivative array equations.


5.1.1 Derivative array equations and the strangeness index

As always when working with dae, it is crucial to be aware that the solutions are restricted to a manifold. In practice, one is interested in obtaining equations describing that manifold, and the way this is done in the present chapter is by using the derivative array introduced in Campbell (1993), see section 2.2.3.

Consider the dae

\[ f( x(t), x'(t), t ) \overset{!}{=} 0 \tag{5.1} \]

Assuming sufficient differentiability of f and of x, the idea is that the original dae is completed with derivatives of the equations with respect to time. This will introduce higher order derivatives of the solution, but the key idea is that, given values of x(t), it suffices to be able to determine x′(t) in order to compute a numerical solution to the equations. That is, higher order derivatives such as x′′(t) may appear in the equations, but are not necessary to determine.

Conversely, the choice of x(t) will affect the possibility to determine x′(t), and the set of points x(t) where the derivative array equations can be solved for x′(t) is the solution manifold. Hence, the derivative array equations can be used as a characterization of the solution manifold.

If the completion procedure is continued until the derivative array equations are one-full with respect to x′(t), the procedure has revealed the differentiation index of the dae, see definition 2.3. The meaning of one-full is defined in terms of the equations considered pointwise in time, so that a variable and its derivative become independent variables. We emphasize the independence by using the variable ẋ(t) instead of x′(t), where the dot is just an ornament, while the prime is an operator. The equations are then said to be one-full if they determine ẋ(t) uniquely within some open ball, given x(t) and t. An equivalent characterization can be made in terms of the Jacobian of the derivative array with respect to its differentiated variables; then the equations are one-full if and only if row operations can bring the Jacobian into block diagonal form, with a non-singular block in the block column corresponding to derivatives with respect to ẋ(t) (clearly, this shows that it is possible to solve for ẋ(t) without knowing the variables corresponding to higher order derivatives of x at time t).
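To make the derivative array and the role of the ornamented variables concrete, consider the small (hypothetical, not from the thesis) index-2 example f = ( ẋ1 − x2, x1 − g(t) ). Treating x, ẋ and ẍ as independent symbols, one completion step and the Jacobian with respect to the differentiated variables can be computed with SymPy:

```python
import sympy as sp

t = sp.symbols('t')
g = sp.Function('g')(t)
# x, and the "ornamented" independent derivative variables d = xdot, dd = xddot
x1, x2, d1, d2, dd1, dd2 = sp.symbols('x1 x2 d1 d2 dd1 dd2')

# original dae f = 0, with x' replaced by the independent symbol d
F0 = sp.Matrix([d1 - x2, x1 - g])
# one completion step: d/dt of the equations, with x' -> d and x'' -> dd
F1 = sp.Matrix([dd1 - d2, d1 - sp.diff(g, t)])
F = sp.Matrix.vstack(F0, F1)

# Jacobian with respect to all differentiated variables
M = F.jacobian([d1, d2, dd1, dd2])

# left null space of M: combinations z with z^T M = 0 pick out the
# non-differential constraints z^T F
Z2 = M.T.nullspace()
constraints = [sp.simplify((z.T * F)[0]) for z in Z2]
```

Here rank M = 2, so two independent combinations are found; they recover (up to sign) the constraints x1 − g(t) = 0 and g′(t) − x2 = 0, showing that for this example the whole solution is pinned down algebraically.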

However, instead of requiring that the completed equations be one-full, it turns out that there are good reasons for using the weaker requirement that the equations display the strangeness index instead. The definition of the strangeness index is the topic of the current chapter, and will soon be considered in detail. It turns out that equations displaying the strangeness index determine x′(t) uniquely if one takes into account the connection between x(t) and x′(t) being imposed by the non-differential constraints which locally describe the solution manifold. Strangeness-free equations (strangeness index 0) are suitable for numerical integration (Kunkel and Mehrmann, 1996).


In the sequel, it will be convenient to speak of properties which hold on non-empty open balls inside the set

\[ \mathcal{L}_{\nu_{\mathrm{S}}} \triangleq \left\{ \left( t, x, \dot{x}, \ldots, x^{(\nu_{\mathrm{S}}+1)} \right) : F_{\nu_{\mathrm{S}}}( t, x, \dot{x}, \ldots, x^{(\nu_{\mathrm{S}}+1)} ) \overset{!}{=} 0 \right\} \tag{5.2} \]

5.1 Definition (Strangeness index). The strangeness index νS at ( t0, x0 ) is defined as the smallest number (or ∞ if no such number exists) such that the derivative array equations(*)

\[ F_{\nu_{\mathrm{S}}}( t, x(t), x'(t), x'^{(2)}(t), \ldots, x'^{(\nu_{\mathrm{S}}+1)}(t) ) \overset{!}{=} 0 \tag{5.3} \]

satisfy the following properties on

\[ \underbrace{ \mathcal{L}_{\nu_{\mathrm{S}}} \cap \left\{ \left( t, x, \dot{x}, \ldots, x^{(\nu_{\mathrm{S}}+1)} \right) : t \in B_{t_0}(\delta) \wedge x \in B_{x_0}(\delta) \right\} }_{\mathfrak{b}_\delta} \]

for some δ > 0.

• P1a–[5.1] There shall exist a constant number na such that the rank of

\[ M_{\nu_{\mathrm{S}}}( t, x, \dot{x}, \ldots, x^{(\nu_{\mathrm{S}}+1)} ) \triangleq \begin{bmatrix} \frac{\partial F_{\nu_{\mathrm{S}}}( t, x, \dot{x}, \ldots, x^{(\nu_{\mathrm{S}}+1)} )}{\partial \dot{x}} & \cdots & \frac{\partial F_{\nu_{\mathrm{S}}}( t, x, \dot{x}, \ldots, x^{(\nu_{\mathrm{S}}+1)} )}{\partial x^{(\nu_{\mathrm{S}}+1)}} \end{bmatrix} \]

is pointwise equal to ( νS + 1 ) nx − na, and there shall exist a smooth matrix-valued function Z2 with na pointwise linearly independent columns such that Z2ᵀ MνS = 0.

• P1b–[5.1] Let nd = nx − na, and let AνS = Z2ᵀ NνS, where

\[ N_{\nu_{\mathrm{S}}}( t, x, \dot{x}, \ldots, x^{(\nu_{\mathrm{S}}+1)} ) \triangleq \frac{\partial F_{\nu_{\mathrm{S}}}( t, x, \dot{x}, \ldots, x^{(\nu_{\mathrm{S}}+1)} )}{\partial x} \]

Then the rank of AνS shall equal na, and there shall exist a smooth matrix-valued function X with nd pointwise linearly independent columns such that AνS X = 0.

• P1c–[5.1] The rank of ∇2f X shall be full, and there shall exist a smooth matrix-valued function Z1 with nd pointwise linearly independent columns such that Z1ᵀ ∇2f X is non-singular.

In section 5.1.2 we present the well-known result that the derivative x′(t) is uniquely defined by FνS != 0, and that we can construct a square strangeness-free dae with the same solution for x′(t). The square and strangeness-free equations are referred to as the reduced equation. Then, it is not surprising that the reduced equations can also be used for numerical integration, see Kunkel and Mehrmann (1996, 1998).

In section 5.1.3 we propose the new index, seen directly from the viewpoint of discretized equations. In sections 5.2 and 5.3 it is shown that the two views are closely related.

(*) The notation is defined such that x′(1) = x′.


When working with the strangeness index for nonlinear dae, the next theorem is important to keep in mind.

5.2 Theorem (Kunkel and Mehrmann (2006, theorem 4.13)). Let the strangeness index of (5.1) be νS. If the conditions of definition 5.1 also hold with ( νS + 1, na, nd ), and there is a point ( t0, x0, ẋ0, . . . , x0^(νS+2) ) ∈ LνS+1, then the reduced square and strangeness-free dae obtained from FνS != 0 has a unique solution passing through the given point, and this solution also solves (5.1).

Proof: This is Kunkel and Mehrmann (2006, theorem 4.13), and we shall just give a very brief overview of their proof. A closely related result for the simplified strangeness index is proved in section 5.3.

Since the reduced equation is implied by the original equation, the original equation cannot have more solutions than the reduced equation. It follows that it suffices to show that the reduced equation has a unique solution, and that this solution satisfies the original equation. In particular, it needs to be shown that the derivatives of the algebraic variables are consistent with the original equation.

The thing to notice about theorem 5.2 is that only knowing that (5.1) has strangeness index νS at the point ( t0, x0 ) is not enough to ensure that there is a solution passing through this point which also solves (5.1). This fact is well illustrated by Kunkel and Mehrmann (2006, exercise 4.11).

However, if the reduced equation (or the full FνS != 0) has a unique solution, an approximate alternative to using theorem 5.2 is simply to test the obtained solution (at a finite number of points along the trajectory) against the original equation. In view of this, we mainly consider definition 5.1 as a means for determining when the derivative array equations can be used to show uniqueness of a solution, if one exists at all.

In the next section, we elaborate on what might be obvious, namely that definition 5.1 corresponds to a procedure to determine a solution uniquely, if it exists. Then, in section 5.1.3 we make an alternative characterization of νS.

5.1.2 Analysis based on the strangeness index

The first step in the analysis, relying on P1a–[5.1], is to determine the local nature of the non-differential constraints that can be deduced from the derivative array equations. By definition, these constraints do not involve any differentiated variables, so the local nature of these constraints is obtained as linear combinations of the derivative array equations such that the gradient with respect to derivatives vanishes. P1a–[5.1] states that there are na such linear combinations, and that the linear combinations, the columns of Z2, can be selected smoothly and linearly independent. Since the Gram-Schmidt procedure can be carried out smoothly, it follows that the columns of Z2 can be selected of unit length and orthogonal to each other.


Since the linear combinations Z2ᵀ FνS are smooth functions on a non-empty open set, with zero derivative with respect to the differentiated variables everywhere, these linear combinations do not depend on the differentiated variables at all. Hence, they are pure non-differential constraints that give a local characterization of the solution manifold. To see the local nature of these constraints, their gradient with respect to x is computed, and P1b–[5.1] states that the so obtained normal directions to the solution manifold are linearly independent. Since they are linearly independent, the dimension of the solution manifold is nd = nx − na. P1b–[5.1] then states that it is possible to construct a local coordinate map x(t) = φ⁻¹( xd(t), t ) with coordinates xd, determined by the partial differential equation

\[ \nabla_1 \varphi^{-1}( x_{\mathrm{d}}(t), t ) \overset{!}{=} X( x(t), t ) \tag{5.4} \]

where the columns of X are smooth functions and pointwise linearly independent. Again, they can be selected of unit length and orthogonal to each other. That is, the columns of X can be selected as an orthonormal basis for the right null space of the matrix A. The local coordinates xd are denoted the dynamic variables. (If the requirement that X have orthonormal columns is dropped, the dynamic variables can be selected as a subset of the original variables x, but this may lead to numerical ill-conditioning.)

The last property, P1c–[5.1], is finally there to ensure that the time derivatives of the local coordinates on the solution manifold are determined by the original equation (5.1). Replacing (5.1) by an equation with residual expressed only through the dynamic variables,

\[ f_{\mathrm{d}}( x_{\mathrm{d}}, \dot{x}_{\mathrm{d}}, t ) \triangleq f\left( \varphi^{-1}( x_{\mathrm{d}}, t ),\; \nabla_1 \varphi^{-1}( x_{\mathrm{d}}, t )\, \dot{x}_{\mathrm{d}} + \nabla_2 \varphi^{-1}( x_{\mathrm{d}}, t ),\; t \right) \tag{5.5} \]

property P1c–[5.1] states that the Jacobian with respect to ẋd,

\[ \nabla_2 f_{\mathrm{d}}( x_{\mathrm{d}}, \dot{x}_{\mathrm{d}}, t ) = \nabla_2 f\left( \varphi^{-1}( x_{\mathrm{d}}, t ),\; \nabla_1 \varphi^{-1}( x_{\mathrm{d}}, t )\, \dot{x}_{\mathrm{d}} + \nabla_2 \varphi^{-1}( x_{\mathrm{d}}, t ),\; t \right) X \tag{5.6} \]

is full-rank. Since there are only nd derivatives to be determined, and there are nx equations, there are na more equations than unknowns. The property P1c–[5.1] also states that nd linear combinations, given by the columns of Z1, of the equations in (5.1) can be chosen smoothly and linearly independent (and hence orthonormal), so that these linear combinations are sufficient to determine the time derivatives of the dynamic variables.

The reduced equations can now be constructed by joining the nd residuals Z1ᵀ f with the na residuals Z2ᵀ FνS. The resulting system has nx equations and ( νS + 1 ) nx unknowns (x and t being known variables), but νS nx of these cannot and need not be solved for. Hence, the system may be considered square, and it is easy to see that it is strangeness-free (with trivial choices of Z1 and Z2, and the same X as when the strangeness index of (5.1) was determined).

Unfortunately, despite the theoretical appeal of the reduced equations, and while it is sufficient to approximate Z1 in numeric implementations, the practical implementation of Z2 needs to make the non-differential equations truly independent of the differentiated variables. This presents severe difficulties unless it can be shown that Z2 only depends on t, for then it can be computed pointwise in time. It follows that the reduced equations will generally not be suitable for numerical integration.

We shall now show how the analysis above can be used for numerical integration without using the reduced equation. Although Kunkel and Mehrmann (2006, section 6.2) address this, we give another (although similar) argument for the case here.

Consider numerical integration via a first order bdf method. We then need to show that this formula uniquely determines the next iterate, qx, given x. The equations to which the bdf method is applied are the full derivative array equations, where each derivative is considered an unknown, and without discretizing any of the derivatives. To these equations the nd selected linear combinations of f are added, discretizing all derivatives.

We know that the derivative array equations constrain qx to lie on a manifold which can be parameterized locally using xd as coordinates. It remains to show that these coordinates are uniquely determined by the dynamic equations. However, thinking of the dynamic equations in terms of the dynamic variables, it is readily seen that being able to solve for the derivatives (which is possible by definition 5.1) is equivalent to being able to solve for qxd for sufficiently small step lengths.
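As a concrete illustration of the basic discretization mechanism, the following sketch applies one first order bdf step (backward Euler) to a small lti dae. The particular system, step length, and initial value are assumed here for illustration; this is not the full derivative-array scheme described above, only the solve-for-the-next-iterate step.

```python
import numpy as np

# One bdf-1 (backward Euler) step for the lti dae E x'(t) + A x(t) = 0:
# E (qx - x)/h + A qx = 0, i.e. (E + h A) qx = E x.
# Example system (assumed): x1' + x1 = 0 together with the constraint x1 - x2 = 0.
E = np.array([[1.0, 0.0],
              [0.0, 0.0]])
A = np.array([[1.0, 0.0],
              [1.0, -1.0]])
h = 0.1
x = np.array([1.0, 1.0])              # consistent initial value: x2 = x1
qx = np.linalg.solve(E + h * A, E @ x)
print(qx)                             # both entries equal 1/(1+h)
```

Even though E is singular, the matrix E + h A is non-singular for small h > 0, so the next iterate qx is uniquely determined, and the non-differential constraint x1 = x2 is satisfied by qx exactly.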

5.1.3 The simplified strangeness index

The final remarks on numerical integration in the previous section relate closely to the analysis in this section. Here we begin not with reasoning about the ability to solve for derivatives, but go directly to the topic of finding equations that uniquely determine the next iterate in a bdf method. Later, it will be shown in lemma 5.13 that the resulting index definition can also be interpreted as a condition to make x′(t) uniquely determined by x(t) and t.

By discretizing the derivatives (using a bdf method) in the original equation (5.1) (and scaling the equations by the step length), we get that the gradient of these equations with respect to x tends to ∇2f( x, ẋ, t ) as the step length tends to zero. Hence, joining these equations with the full derivative array equations (where no derivatives are discretized) yields a set of equations which (locally) shall determine x uniquely. This leads to the following definition.

5.3 Definition (Simplified strangeness index). The simplified strangeness index νq at ( t0, x0 ) is defined as the smallest number (or ∞ if no such number exists) such that the derivative array equations

Fνq( t, x(t), x′(t), x^(2)(t), . . . , x^(νq+1)(t) ) != 0   (5.7)

satisfy the following property on

Lνq ∩ { ( t, x, ẋ, . . . , x^(νq+1) ) : t ∈ Bt0(δ) ∧ x ∈ Bx0(δ) } ≜ bδ

for some δ > 0.


• P2–[5.3] Let

Hνq ≜ [ ∂f( x, ẋ, t )/∂ẋ   0         . . .   0
        ∂Fνq/∂x            ∂Fνq/∂ẋ   . . .   ∂Fνq/∂x^(νq+1) ] = [ ∇2f   0
                                                                  Nνq   Mνq ]

where the arguments of Fνq are ( t, x, ẋ, . . . , x^(νq+1) ), and Nνq and Mνq are defined as in definition 5.1. Then it shall hold that

rank [ I     0
       ∇2f   0
       Nνq   Mνq ] != rank [ ∇2f   0
                             Nνq   Mνq ]

That is, the basis vectors corresponding to x shall be in the span of the rows of Hνq, which may be recognized as the property of Hνq being one-full.

The property P2–[5.3] can be interpreted as saying that there is no freedom in the x components of the solution to

( h f( x, (1/h)( x − q−1x ), t )
  Fνq( t, x, ẋ, . . . , x^(νq+1) ) ) != 0

since adding additional equations for the x variables alone does not decrease the solution space of the linearized equations. For theoretic considerations, however, the continuous-time interpretation provided by lemma 5.13 below is more relevant.
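The rank condition in P2–[5.3] is straightforward to test numerically. The sketch below is an assumed illustration (the helper name is ours, not the thesis's): it checks one-fullness with respect to the first nx columns, i.e. that the unit vectors corresponding to x lie in the row span of the matrix.

```python
import numpy as np

def is_one_full(H, nx, tol=1e-10):
    # H is one-full w.r.t. its first nx columns if stacking [I 0] on top
    # of H does not increase the rank.
    def rank(B):
        return int(np.sum(np.linalg.svd(B, compute_uv=False) > tol))
    I_block = np.hstack([np.eye(nx), np.zeros((nx, H.shape[1] - nx))])
    return rank(np.vstack([I_block, H])) == rank(H)

# A row constraining only a derivative column does not pin down x ...
H = np.array([[0.0, 1.0, 0.0]])
print(is_one_full(H, 1))                 # False
# ... but together with a row involving x itself, x is determined.
H = np.vstack([H, [[1.0, 1.0, 0.0]]])
print(is_one_full(H, 1))                 # True
```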

Of course, we must show what the simplified strangeness index is for the inevitable pendulum.

5.4 Example

Let us once more consider the pendulum from example 3.3. To match the notation of the present chapter we define

f( ( ξ, u, y, v, λ ), ( ξ̇, u̇, ẏ, v̇, λ̇ ), t ) ≜ ( λ ξ − u̇
                                                   λ y − g − v̇
                                                   ξ² + y² − 1
                                                   ξ̇ − u
                                                   ẏ − v )

We consider initial conditions where the pendulum is in motion and neither ξ nor y is zero.

To check P2–[5.3] we look at the projection of a basis for the right null space of Hi onto the space spanned by the basis vectors corresponding to x, for i = 1, 2, . . . , νq. (The projection is implemented by just keeping the five first entries of the vectors.) The basis for the null space is computed using Mathematica, and for i = 0, 1, 2 the projected basis vectors are, in order,

for i = 0:   ( 0, 0, 0, 0, 0 )ᵀ,   ( 0, 0, 0, 0, −1/y )ᵀ

for i = 1:   ( 0, 0, 0, 0, 0 )ᵀ,   ( 0, 0, 0, 0, ξ / ( y ξ̇ − ξ ẏ ) )ᵀ,   ( 0, 0, 0, 0, −y / ( y ξ̇ − ξ ẏ ) )ᵀ

for i = 2:   ( 0, 0, 0, 0, 0 )ᵀ,   ( 0, 0, 0, 0, 0 )ᵀ,   ( 0, 0, 0, 0, 0 )ᵀ

Assuming that the symbolic null space computations are valid in some neighborhood of the initial conditions inside Li, it is seen that the λ component is undetermined for i = 0 and i = 1, and as all components are determined for i = 2 we get νq = 2.

To verify that the symbolic computations of the null space are actually valid, it must be checked that the denominators are non-zero. The expressions which were removed by the projections are also rational with the denominator y ξ̇ − ξ ẏ. Since the length of the vector ( ξ, y ) is 1 on Li, a geometric interpretation shows that the denominator expression is the scalar product of the vector ( ξ̇, ẏ ) and a unit vector which is tangent to the unit circle at the point ( ξ, y ). Our intuition about the problem gives that the initial conditions for ( ξ̇, ẏ ) = ( u, v ) may actually be chosen parallel with the tangent, and the restriction to a neighborhood of the initial conditions inside Li gives that any ( u, v ) will at least be close to parallel with the tangent. Hence, the denominator expression is zero precisely when the velocity variables are zero. Since we chose to analyze the equations for initial conditions where the pendulum is in motion, the velocity will remain non-zero in a neighborhood of the initial conditions, proving the validity of the null space computations.

Since our prior understanding of the problem makes it easy to compute points inside Li for any i, the simplified strangeness index can also be computed numerically if we either assume or make sure that certain critical ranks are not sensitive to small perturbations of the variables. The method is not pursued in the example, in order to keep focus in the current chapter on methods and theory for exact dae.

The following lemma is an example of how easy the simplified strangeness index isto work with.

5.5 Lemma. If P2–[5.3] is satisfied for νq on (the obvious projection of) Lνq+i ∩ bδ for some i ≥ 1, then P2–[5.3] is also satisfied for νq + i on the same set.

Proof: If the basis vectors corresponding to x are in the span of the rows of Hνq, they (extended to the appropriate dimension) will also be in the span of the rows of Hνq+i since the upper part of this matrix equals [ Hνq  0 ].


While lemma 5.13 below quite intuitively will show that a finite simplified strangeness index implies uniqueness of solutions to the dae, we postpone until section 5.3 to consider how P2–[5.3] may be used to also test existence of solutions. For now, we concentrate on how to compute the solution if it exists; recall that there is always a possibility to test any solution numerically (at a finite set of points along the trajectory) against the original equation, which should yield a good indication of true existence.

Once νq has been determined, the next task is to select which equations to use (that is, pick a suitable Z1), and which variables to discretize (possibly via a change of variables, using an approximation of X). It is important not to make the discretized equations over-determined by including too many independent columns in Z1, as this may compromise the non-differential constraints of the dae. Since the discretized derivatives are approximations, as few variables as possible should be discretized.

The procedure prescribed by definition 5.3 may become demanding if one tries to use it directly to find a subset of components of f which is sufficient to determine x, since a null space of a large map has to be computed for each candidate subset of components. The following constructive method remedies this.

First, a basis for the right null space of [ Nνq  Mνq ] (that is, the gradient of the derivative array equations with respect to all unknown variables) is computed. The basis vectors are then chopped to get the tangent space of the solution manifold seen in x-space. Note that the chopped vectors will span the tangent space, but always contain too many elements to be a basis (there are ( νq + 1 ) nf equations and ( νq + 2 ) nx variables, and the equations will generally be dependent). To simplify the argument, a basis X for the tangent space is constructed, and the number of elements in this basis is the number of dynamic variables. Hence, it is possible to locally parameterize x in xd, and more equations are needed in order to determine xd given q−1x.

As in section 5.1.2, we are led to rewriting the equations given by f in terms of xd, and require that the gradient of these equations (where derivatives have been discretized) with respect to xd be non-singular. By the chain rule, this means that the product ∇2f X shall have full column rank. Selecting a subset of components of f directly translates to selecting a subset of rows in this matrix product, and selecting a non-singular such subset is relatively cheap compared to computing the large null spaces in P2–[5.3].
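The row selection step can be sketched as follows. The greedy criterion used here (grow the selected block while keeping its smallest singular value as large as possible) is our assumption for illustration; the thesis does not spell out a particular rule.

```python
import numpy as np

def greedy_rows(B, nd):
    # Greedily pick nd rows of B (playing the role of ∇2f X) so that the
    # selected block stays well conditioned: at each step, add the row that
    # maximizes the smallest singular value of the selected submatrix.
    chosen = []
    for _ in range(nd):
        best_i, best_s = None, -1.0
        for i in range(B.shape[0]):
            if i in chosen:
                continue
            s_min = np.linalg.svd(B[chosen + [i], :], compute_uv=False)[-1]
            if s_min > best_s:
                best_i, best_s = i, s_min
        chosen.append(best_i)
    return sorted(chosen)

rng = np.random.default_rng(0)
X = np.linalg.qr(rng.standard_normal((5, 2)))[0]   # orthonormal columns, stands in for X
B = rng.standard_normal((5, 5)) @ X                # stands in for ∇2f X, shape (nf, nd)
rows = greedy_rows(B, 2)
print(np.linalg.matrix_rank(B[rows, :]))           # 2: the selected square block is non-singular
```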

It can be seen that computing the null space basis X is not necessary, but at least the dimension nd of the null space has to be known, because instead of requiring that the product ∇2f X be non-singular, we shall require that its rank agrees with nd.

Note that if a consistent point is given (with as many derivatives as we may need), νq can be determined using definition 5.3, and then the constructive method can be used to determine a sufficient subset of components of f (or, in general, determine the matrix Z1).

For lti dae, definition 5.3 is easily related to the differentiation index νD (definition 2.3), as the next theorem shows.


5.6 Theorem. For the lti dae

E x′(t) + A x(t) + B u(t) != 0

it holds that νq = max { 0, νD − 1 }.

Proof: The residual function of the dae is given by

f( x, ẋ, t ) = E ẋ + A x + B u(t)

If νD = 0, ∇2f( x, ẋ, t ) = E is non-singular, and it follows that νq = 0. It remains to consider νD > 0.

Recall the special structure of the derivative array equations for lti dae, seen in (2.34),

FνD( t, x, ẋ, . . . , x^(νD+1) ) = NνD x + MνD ( ẋ ; . . . ; x^(νD+1) ) + ( B u(t) ; B u′(t) ; . . . ; B u^(νD)(t) )

where

NνD = [ A ; 0 ; . . . ; 0 ]

and MνD is block lower bidiagonal with E on the diagonal and A on the first subdiagonal,

MνD = [ E
        A  E
           ⋱  ⋱
              A  E ]

By definition of νD, ẋ is uniquely determined by FνD != 0, which is a condition only in terms of MνD. Partitioning this matrix as

MνD = [ E       0
        NνD−1   MνD−1 ] = [ ∇2f     0
                            NνD−1   MνD−1 ]

shows that νq = νD − 1.
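Theorem 5.6 is easy to check numerically on a small example. The sketch below (assumed code, not from the thesis) builds Hν = [ ∇2f 0 ; Nν Mν ] for an lti dae with differentiation index νD = 2 and finds the smallest ν for which the one-fullness test of P2–[5.3] passes.

```python
import numpy as np

def rank(B, tol=1e-10):
    return int(np.sum(np.linalg.svd(B, compute_uv=False) > tol))

def one_full(H, nx):
    # The unit vectors of the x block lie in the row span of H iff stacking
    # [I 0] on top of H does not increase the rank.
    I_block = np.hstack([np.eye(nx), np.zeros((nx, H.shape[1] - nx))])
    return rank(np.vstack([I_block, H])) == rank(H)

def H_nu(E, A, nu):
    # H_nu = [ ∇2f 0 ; N_nu M_nu ] with ∇2f = E, N_nu = [A; 0; ...; 0] and
    # M_nu block lower bidiagonal (E on the diagonal, A on the subdiagonal).
    nx = E.shape[0]
    N = np.vstack([A] + [np.zeros((nx, nx))] * nu)
    M = np.kron(np.eye(nu + 1), E) + np.kron(np.eye(nu + 1, k=-1), A)
    top = np.hstack([E, np.zeros((nx, (nu + 1) * nx))])
    return np.vstack([top, np.hstack([N, M])])

# E nilpotent of index 2 and A = I gives differentiation index nu_D = 2.
E = np.array([[0.0, 1.0],
              [0.0, 0.0]])
A = np.eye(2)
nu_q = next(nu for nu in range(5) if one_full(H_nu(E, A, nu), 2))
print(nu_q)   # 1 = nu_D - 1, in agreement with theorem 5.6
```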

The strangeness index νS is known to have the same relation to the differentiation index also for ltv dae (Kunkel and Mehrmann, 2006, section 3.3) and nonlinear dae in Hessenberg form (Kunkel and Mehrmann, 2006, theorem 4.23), but some of the proofs are lengthy and instead of making more comparisons of νq versus νD, we now turn to the direct relation between νq and νS.

5.2 Relations

In this section, the two indices νS and νq will be shown to be closely related. This is done by means of a matrix decomposition developed for this purpose. We first show the matrix decomposition, and then interpret the two definitions in terms of this decomposition.


5.7 Lemma. The matrix

[ N  M ]

where N ∈ R^(k×l), M ∈ R^(k×k), rank M = k − a, a ≥ 1, can be decomposed as

[ N  M ] = [ Q1,1  Q1,2 ] [ 0  0  Σ  0
                            A  0  0  0 ] [ Q3,1ᵀ          0
                                           Q3,2ᵀ          0
                                           Σ⁻¹ Q1,1ᵀ N    Q2,1ᵀ
                                           0              Q2,2ᵀ ]

In this decomposition, the left matrix is unitary, as are the diagonal blocks of the right matrix. The matrix Σ is a diagonal matrix of the non-zero singular values of M. The matrix A is square.

Proof: Introducing the singular value decomposition

M = Q1 [ Σ  0
         0  0 ] Q2ᵀ = [ Q1,1  Q1,2 ] [ Σ  0
                                       0  0 ] [ Q2,1ᵀ
                                                Q2,2ᵀ ]

we get

[ N  M ] = Q1 [ Q1,1ᵀ N   Σ  0
                Q1,2ᵀ N   0  0 ] [ I  0
                                   0  Q2,1ᵀ
                                   0  Q2,2ᵀ ]

By the QR decomposition

Q1,2ᵀ N = [ A  0 ] [ Q3,1ᵀ
                     Q3,2ᵀ ]

where A is square and may contain dependent columns (in particular, some of the columns may be zero), we then get

[ N  M ] = Q1 [ Q1,1ᵀ N Q3,1   Q1,1ᵀ N Q3,2   Σ  0
                A              0              0  0 ] [ Q3,1ᵀ   0
                                                       Q3,2ᵀ   0
                                                       0       Q2,1ᵀ
                                                       0       Q2,2ᵀ ]

Finally, the relation

[ Q1,1ᵀ N Q3,1   Q1,1ᵀ N Q3,2   Σ  0 ] [ Q3,1ᵀ
                                         Q3,2ᵀ
                                         0
                                         0 ] = Q1,1ᵀ N = [ 0  0  Σ  0 ] [ Q3,1ᵀ
                                                                          Q3,2ᵀ
                                                                          Σ⁻¹ Q1,1ᵀ N
                                                                          0 ]


enables us to write

[ N  M ] = [ Q1,1  Q1,2 ] [ 0  0  Σ  0
                            A  0  0  0 ] [ Q3,1ᵀ          0
                                           Q3,2ᵀ          0
                                           Σ⁻¹ Q1,1ᵀ N    Q2,1ᵀ
                                           0              Q2,2ᵀ ]

as desired.
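Lemma 5.7 is easy to validate numerically. The sketch below (assumed dimensions and random data, chosen only for illustration) reproduces the construction in the proof step by step and checks that the three factors multiply back to [ N M ]:

```python
import numpy as np

rng = np.random.default_rng(0)
k, l, a = 6, 4, 2
r = k - a                                  # rank of M
N = rng.standard_normal((k, l))
M = rng.standard_normal((k, r)) @ rng.standard_normal((r, k))   # rank k - a

# SVD of M: M = Q1 [Sigma 0; 0 0] Q2^T.
Q1, s, Q2T = np.linalg.svd(M)
Sigma = np.diag(s[:r])
Q11, Q12 = Q1[:, :r], Q1[:, r:]
Q21T, Q22T = Q2T[:r, :], Q2T[r:, :]

# QR decomposition of (Q12^T N)^T gives Q12^T N = [A 0] [Q31^T; Q32^T].
Q3, R = np.linalg.qr((Q12.T @ N).T, mode="complete")
A_blk = R[:a, :].T                         # square, possibly rank deficient
Q31T, Q32T = Q3[:, :a].T, Q3[:, a:].T

mid = np.block([
    [np.zeros((r, a)), np.zeros((r, l - a)), Sigma, np.zeros((r, a))],
    [A_blk, np.zeros((a, l - a)), np.zeros((a, r)), np.zeros((a, a))],
])
right = np.block([
    [Q31T, np.zeros((a, k))],
    [Q32T, np.zeros((l - a, k))],
    [np.linalg.inv(Sigma) @ Q11.T @ N, Q21T],
    [np.zeros((a, l)), Q22T],
])
recon = np.hstack([Q11, Q12]) @ mid @ right
print(np.allclose(recon, np.hstack([N, M])))   # True
```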

5.8 Theorem. Definition 5.1 and definition 5.3 satisfy the relation νS ≥ νq.

Proof: Suppose that the strangeness index is νS and finite, as the infinite case is trivial. Let the matrices N and M in lemma 5.7 correspond to NνS and MνS as in definition 5.1.

First, let us consider νS in view of this decomposition. The left null space of M is spanned by Q1,2, and making these linear combinations of N results in

Q1,2ᵀ N = [ A  0  0  0 ] [ Q3,1ᵀ
                           Q3,2ᵀ
                           Σ⁻¹ Q1,1ᵀ N
                           0 ] = [ A  0 ] [ Q3,1ᵀ
                                            Q3,2ᵀ ]

where A has full rank due to P1b–[5.1]. This matrix determines the tangent space of the non-differential constraints as being its null space, spanned by the independent columns of Q3,2. Hence, we can parameterize x as x = Q3,2 xd.

Turning to νq, we follow the constructive interpretation of P2–[5.3] in section 5.1.3. The right null space of [ N  M ] is spanned by the transposes of the second and fourth block rows of the right factor in the decomposition;

[ N  M ] ( x ; y ) != 0  ⟺  ∃ z1, z2 : ( x ; y ) = [ Q3,2  0
                                                     0     Q2,2 ] ( z1 ; z2 )   (5.8)

Extracting the part of this equation which only involves x, we find that it can be parameterized in z1 alone, and since the columns of Q3,2 are independent, we can use z1 as dynamic variables; x = Q3,2 xd.

Since the strangeness index is νS, ∇2f Q3,2 has full column rank according to P1c–[5.1]. Hence,

[ ∇2f  0
  N    M ] ( x ; y ) != 0  ⟺  ∃ z2 : ( x ; y ) = ( 0 ; Q2,2 z2 )   (5.9)

which is exactly the condition captured by P2–[5.3]. Since νq is the smallest index such that this condition is satisfied, it is no greater than νS.


5.9 Theorem. Given the property

• P3–[5.9] The matrix [ Nνq  Mνq ] has full row rank on the set Lνq ∩ bδ in definition 5.3. That is,

rank [ Nνq  Mνq ] = ( νq + 1 ) nx   (5.10)

it holds that definition 5.1 and definition 5.3 satisfy the relation νS = νq.

Proof: Due to theorem 5.8 it suffices to show νS ≤ νq. To this end, suppose the equations have finite simplified strangeness index νq, as the infinite case is trivial. Let the matrices N and M in lemma 5.7 correspond to Nνq and Mνq as in definition 5.3. The rank condition (5.10) implies that A in lemma 5.7 is non-singular.

Consider (5.8) and (5.9). Since adding the equation ∇2f x != 0 is sufficient to conclude x = 0 given x = Q3,2 z1, it is seen that ∇2f Q3,2 z1 != 0 must imply z1 = 0. This is only true if ∇2f Q3,2 has full column rank, which shows that P1c–[5.1] holds. Since νS is the smallest index such that this condition is satisfied, it is no greater than νq.

We now have an alternative to the procedure of definition 5.1 for computing the strangeness index νS. First, one computes νq according to definition 5.3. If P3–[5.9] is satisfied for νq and some selection of bδ in definition 5.3, νS = νq according to theorem 5.9. If P3–[5.9] is not satisfied for any choice of bδ, νS = ∞. In the remaining case, when P3–[5.9] holds on some set where P2–[5.3] does not hold, νS > νq may still be finite. According to lemma 5.5 it can be found as the smallest number such that P3–[5.9] holds on the intersection of LνS and a set bδ ⊂ R^(nx ( νS + 2 ) + 1), while P2–[5.3] holds on the obvious projection of this set.

5.3 Uniqueness and existence of solutions

The present section gives a result corresponding to what Kunkel and Mehrmann (2006, theorem 4.13) states for the strangeness index. As the difference between the two index definitions is basically a matter of whether P3–[5.9] is required or not, the main ideas in Kunkel and Mehrmann (2006) apply here as well.

5.10 Lemma. If the simplified strangeness index νq is finite, there exist matrix functions Z1, Z2, X, similar to those in definition 5.1. They are all smooth with pointwise linearly independent columns, satisfying

Z2ᵀ Mνq = 0 and the columns of Z2 span the left null space of Mνq   (5.11a)

Z2ᵀ Nνq X = 0 and the columns of X span the right null space of Z2ᵀ Nνq   (5.11b)

Z1ᵀ ∇2f X is non-singular   (5.11c)

Proof: Using the decomposition of lemma 5.7, we may take Z2 = Q1,2 and X = Q3,2. As in the proof of theorem 5.9, (5.8) and (5.9) then imply that ∇2f X has full column rank, and the existence of Z1 follows.


Multiplying the relations in (5.11) by smooth pointwise non-singular matrix functions shows that the matrix functions Z1, Z2, X are not unique, but they can be replaced by any smooth matrices with columns spanning the same linear spaces. For numerical purposes, the smooth Gram-Schmidt orthonormalization procedure may be used to obtain matrices with good numerical properties, while the theoretical argument of the present section benefits from another choice, to be derived next.

Select the non-singular constant matrix P = [ Pd  Pa ] such that Z2ᵀ Nνq Pa is non-singular in a neighborhood of the initial conditions, and make a change of the un-dotted variables in Lνq according to

x = [ Pd  Pa ] ( xd ; xa )   (5.12)

The following notation will turn out to be convenient later (note that Naνq is non-singular)

Ndνq ≜ Z2ᵀ Nνq Pd,   Naνq ≜ Z2ᵀ Nνq Pa   (5.13)

The next result corresponds to Kunkel and Mehrmann (2006, corollary 4.10) for the strangeness index.

5.11 Lemma. There exists a smooth function R such that

xa = R( xd, t )   (5.14)

inside Lνq, in a neighborhood of the initial conditions.

Proof: In Lνq it holds that Fνq = 0 and Z2ᵀ Mνq = 0, and it follows that

∂( Z2ᵀ Fνq )/∂x^(1+) = Z2ᵀ ∂Fνq/∂x^(1+) + ( ∂Z2ᵀ/∂x^(1+) ) Fνq = 0

Hence, the construction of Z2 is such that Z2ᵀ Fνq only depends on t and x, and the change of variables (5.12) was selected so that the part of the Jacobian corresponding to xa is non-singular. It follows that xa can be expressed locally as a function of xd and t.

We now introduce the function φ−1 to describe the local parameterization of x using the coordinates xd and t,

x = φ−1( xd, t ) ≜ P ( xd ; R( xd, t ) )   (5.15)

and the next lemma shows an important coupling between φ−1 and lemma 5.10.

5.12 Lemma. The matrix X in lemma 5.10 can be chosen in the form

X̄ ≜ P [ I ; ∇1R( xd, t ) ] = ∇1φ−1( xd, t )   (5.16)


Proof: Clearly, the columns are linearly independent and smooth. By verifying that the matrix is in the right null space of Z2ᵀ Nνq we will show that its columns span the same linear space as X. It will then follow that X and X̄ are related by a relation in the form

X̄ = X W

for some smooth non-singular matrix function W. Using the form X W then shows that (5.11c) is also satisfied. Hence, it remains to show that X̄ is in the right null space of Z2ᵀ Nνq.

Using (5.14) and allowing also the dotted variables x^(1+) to depend on xd in (suppressing arguments)

∂( Z2ᵀ Fνq )/∂xd = 0

it follows that

( ∂Z2ᵀ/∂xd ) Fνq + Z2ᵀ ∂Fνq/∂xd + ( ( ∂Z2ᵀ/∂xa ) Fνq + Z2ᵀ ∂Fνq/∂xa ) ∂xa/∂xd + ( ( ∂Z2ᵀ/∂x^(1+) ) Fνq + Z2ᵀ ∂Fνq/∂x^(1+) ) ∂x^(1+)/∂xd != 0

Here, Fνq = 0 and Z2ᵀ ∂Fνq/∂x^(1+) = Z2ᵀ Mνq = 0 implies that

Z2ᵀ ∂Fνq/∂xd + Z2ᵀ ( ∂Fνq/∂xa ) ∇1R = Z2ᵀ ∇2Fνq [ Pd  Pa ] [ I ; ∇1R ] = Z2ᵀ Nνq X̄ != 0

Back in section 5.1.3 it was indicated that we would be able to show that a finite simplified strangeness index implies local uniqueness of solutions. With lemma 5.7 at our disposal this statement can now be shown rather easily.

5.13 Lemma. If the simplified strangeness index is finite and x is a solution to the dae for some initial conditions in Lνq ∩ bδ, then the solution x is locally unique.

Proof: Using the parameterization of x given by (5.15), it suffices to show that the coordinates xd are uniquely defined. By the smoothness assumptions and the analytic implicit function theorem, Hörmander (1966), showing that x′d(t) is uniquely determined given xd(t) and t will be sufficient, since then the corresponding ode will have a right hand side which is continuously differentiable, and hence locally Lipschitz on any compact set. One may then complete the argument by applying a basic local uniqueness theorem for ode, such as Coddington and Levinson (1985, theorem 2.2).

Reusing (5.5) for the current context, x′d(t) is seen to be uniquely determined if ∇2fd( xd, ẋd, t ) is non-singular (in some neighborhood Lνq ∩ bδ of the initial conditions). Identifying (5.6) in (5.11c), lemma 5.12 completes the proof.

With X̄ according to (5.16) it follows that

Z2ᵀ Nνq X̄ = Ndνq + Naνq ∇1R != 0   (5.17)


using the notation (5.13). Before stating the main theorem of the section we derive one more equation. Using (5.14) and allowing also the dotted variables x^(1+) to depend on t in (suppressing arguments)

Z2ᵀ ∂Fνq/∂t != 0

it follows that

Z2ᵀ ( ∇1Fνq + ∇2Fνq ∇2φ−1 + Mνq ∂x^(1+)/∂t ) = Z2ᵀ ( ∇1Fνq + ∇2Fνq ∇2φ−1 ) != 0   (5.18)

5.14 Theorem. Consider a sufficiently smooth dae (5.1), repeated here,

f( x(t), x′(t), t ) != 0   (5.1)

with finite simplified strangeness index νq and where the un-dotted variables in Lνq form a manifold of dimension nd. If the set where P2–[5.3] holds is the projection of a similar set b⁺δ ∩ Lνq+1, and P2–[5.3] also holds on b⁺δ ∩ Lνq+1 with the same dimension nd, then there is a unique solution to (5.1) for any initial conditions in b⁺δ ∩ Lνq+1.

Proof: Considering how Fνq+1 is obtained from Fνq, it is seen that the equality Fνq+1 = 0 can be written

∇1Fνq + ∇2Fνq ẋ + ∇3+Fνq x^(2+) = 0

Multiplying by Z2ᵀ from the left and identifying the expressions for Nνq and Mνq, one obtains

Z2ᵀ ( ∇1Fνq + ∇2Fνq ẋ ) = 0

Using (5.18) and the change of variables (compare (5.12))

ẋ = [ Pd  Pa ] ( ẋd ; ẋa )   (5.19)

leads to (using the notation introduced in (5.13))

[ Ndνq  Naνq ] ( −P⁻¹ ∇2φ−1 + ( ẋd ; ẋa ) ) = 0

Using (5.15) and (5.17) yields

−Naνq ∇2R( xd, t ) − Naνq ∇1R( xd, t ) ẋd + Naνq ẋa = 0

and since Naνq is non-singular, it must hold that

ẋ = P ( ẋd ; ẋa ) = P ( I ; ∇1R( xd, t ) ) ẋd + P ( 0 ; ∇2R( xd, t ) ) = ∇1φ−1( xd, t ) ẋd + ∇2φ−1( xd, t )

Since f( x, ẋ, t ) != 0 holds by definition on Lνq, it follows that

f( φ−1( xd, t ), ∇1φ−1( xd, t ) ẋd + ∇2φ−1( xd, t ), t ) = 0


where ẋd is uniquely determined given xd and t by (5.11c) with ∇1φ−1 = X̄ in place of X.

Hence, the dae

f( φ−1( xd(t), t ), ∇1φ−1( xd(t), t ) x′d(t) + ∇2φ−1( xd(t), t ), t ) = 0

has a (locally unique) solution and the trajectory generated by

x(t) = φ−1( xd(t), t )

is a solution to the original dae (5.1).

5.4 Implementation

The definition of the simplified strangeness index does not prescribe that a basis for the tangent space of x should be computed in the same way as the definition of the strangeness index does. We have seen, however, that this basis is an important intermediate object for the selection of equations to be discretized during numerical integration. Two ways to compute this basis will be presented in this section, and their computational complexity will be compared.

We shall assume that occurring matrices are full, so that there is no particular structure that can be utilized in the problem. This assumption may be highly questionable in many applications, but that just opens up for a refined analysis in the future, taking sparsity into account. We assume QR decomposition is used both for computing a well-conditioned basis for a null space, and to compute a well-conditioned basis for a range space. If sparsity is not utilized, the QR decomposition is preferably computed using Householder reflections, while Givens rotations would be used to take advantage of sparsity.

A conceptually simple way to determine the basis is to compute a basis for the right null space of [ Nνq  Mνq ], and project these vectors on the x-space; these are the directions in which x will be free to move under the algebraic constraints. The set of projected vectors will always contain at least nx elements, which is typically too many for the set to be a basis, and the vectors may also be poorly conditioned. Hence, to obtain a well conditioned basis one additional computation has to be performed. This method will be referred to as the projection method below.

An alternative way to determine the basis is to follow the definition of the strangeness index, except that one does not require the matrix Aνq to have independent rows. This method requires computation of the left null space of Mνq and the right null space of Aνq. Lemma 5.7 was originally developed to show that the two ways of computing the basis are equivalent. This method will be referred to as the strangeness index method below.


5.4.1 Computational complexity

Both methods will perform two QR decompositions, one large and one small. The small one would not be required for the projection method unless a (well-conditioned) basis was sought in the end, but it is not here the big difference in computational burden is to be sought. Note that computing a complete QR decomposition involves more than twice the number of multiplications compared to only computing the upper triangular factor. Hence, for the strangeness index method, it will be more efficient to apply the same row operations to Nνq as are applied when row-reducing Mνq, than to first compute a matrix spanning the left null space and then apply it to Nνq. Similarly, for the projection method, where only the projection of a null space onto x-space is needed, it suffices to compute just the first columns of the unitary matrix. This can be implemented by row reducing the left part of the matrix

[ Nνqᵀ   I
  Mνqᵀ   0 ]

and then reading the lower right block of the resulting matrix. This will, however, always involve more computation than to do the row reduction of the left block of the matrix

[ Mνq   Nνq ]

since both reductions involve the same number of columns, but the former has nx more rows to take care of and will not terminate early (hence requiring ( νq + 1 ) nx − 1 Householder reflections), while the latter will terminate when the na last rows of the left block are found to be zero (thus requiring ( νq + 1 ) nx − na − 1 Householder reflections).

The comparison shows that the strangeness index method has an advantage. Still, wethink that the conceptual simplicity of the projection method adds valuable insight.

5.4.2 Notes from experiments

Experimental results are not included in the section in the usual sense, the reason being that the two methods are equivalent. Therefore, we only include some brief remarks based on experience from tests with our experimental implementations of the two methods. In all examples, the simplified strangeness index has been equal to the strangeness index.

Since the construction of Z1 is not canonical, it will generally depend on X via ∇2f X, or more generally, the columns used to span the tangent space of the solution manifold. We have seen that the two methods yield the same linear space spanned by the columns of X, but the construction of X differs. Hence, in our experimental setup where Z1 has been chosen using a greedy method to pick out a subset of the rows of ∇2f X with good condition number, the selection of rows is not always the same for the two methods. However, comparing the resulting difference in the numerical solution is meaningless since the differences will be due to the greedy algorithm and not due to conceptual differences between the two methods.


We remark that consistent initialization (that is, finding a root of the residual Fνq+1, compare Kunkel and Mehrmann (2006, theorem 4.13)) has been a major concern for the numeric experiments. However, the Mathematica function FindMinimum has been a very useful tool, while finding a good enough initial guess for Mathematica's local search method FindRoot has turned out to be notoriously hard.

5.5 Conclusions and future work

In our view, a simpler way of computing the strangeness index has been proposed. It gives a lower bound on the strangeness index, and when the auxiliary property P3–[5.9] holds, it gives an equivalent test. While the original definition follows a three step procedure, the proposed definition has just one step (except that P3–[5.9] needs to be verified separately). The new index definition is also appealing due to its immediate interpretation from a numerical integration perspective.

Just as the strangeness index, the simplified strangeness index emphasizes the parameterization of all variables via a smaller number of “differential” variables, but the corresponding dimensions are not required to be visible by looking only at Mνq.

Analogues of central results for the original strangeness index have been derived for the simplified strangeness index. In particular, it has been shown that a finite simplified strangeness index implies that if a solution exists, it will be unique, and existence of a solution can be established by checking the property that defines the index for two successive values of the index parameter.

For the simplified strangeness index, the computational complexities of two methods for computing a basis for the x tangent space have been compared. The outcome was favorable for a method closely related to the definition of the original strangeness index. The other method offers superior conceptual simplicity, and adds insight to the more efficient definition, and hence also to the original strangeness index. These observations are thought to be useful when the strangeness index concept is being taught.

An important aspect of the analysis of the strangeness index provided in Kunkel and Mehrmann (2006, chapter 4) is that the strangeness index is shown to be invariant under some transformations of the equations which are known to yield equivalent formulations of the same problem. It is an important topic for future research to find out whether the simplified strangeness index is also invariant under these transformations. Another interesting topic for future work is to seek examples where νq ≠ νS, in order to get a better understanding of this exceptional case.

Finally, in view of the emphasis that this thesis puts on the singular perturbation problems arising from uncertainties in dae, it would be very interesting to derive the singular perturbation problems related to the results of the present chapter.


6  LTI ODE of nominal index 1

This is the first chapter of three with results for uncertain dae and the related matrix-valued singular perturbation problems.

The current chapter considers the same problem as was considered in Tidefelt (2007, chapter 6), and deals with the two major deficiencies discussed there. At the same time, the next chapter will show that some of the central ideas here are limited to the nominal index 1 case, so this is the chapter where the strongest results appear. The chapter contains the results of the two papers

Henrik Tidefelt and Torkel Glad. Index reduction of index 1 dae under uncertainty. In Proceedings of the 17th IFAC World Congress, pages 5053–5058, Seoul, Korea, July 2008.

Henrik Tidefelt and Torkel Glad. On the well-posedness of numerical dae. In Proceedings of the European Control Conference 2009, pages 826–831, Budapest, Hungary, August 2009.

as well as some variations of results in Tidefelt (2007, chapter 6) that were omitted from Tidefelt and Glad (2008) due to space constraints. At the end, the chapter contains a new example which indicates better applicability of the theoretical results, adds important insights to the problem, and connects the current chapter with the next.

The singular perturbation theory, as presented in Kokotović et al. (1986), provides very relevant background for this chapter. In particular, theorem 6.21 herein should be compared with Kokotović et al. (1986, chapter 2, theorem 5.1).

The chapter is organized as follows (compare figure 6.1). Section 6.1 introduces the problem and derives the related matrix-valued singular perturbation problem. To begin with, it is assumed that the nominal index 1 dae is pointwise index 0. Then the new section 6.2 gives a schematic overview of the analysis and captures the essence of the current chapter as well as chapter 8. Section 6.3 considers the decoupling of nominal and fast uncertain dynamics. In section 6.4, we derive a matrix result which will be the main tool for the formulation of assumptions in terms of eigenvalues. It is applied in section 6.5 when we formulate results for ode which will then be used when we study the fast and uncertain subsystem in section 6.6. In section 6.7 we draw conclusions regarding the original coupled system using results from previous sections. Then, section 6.8 considers what happens if the pointwise index 0 assumption is dropped. Two examples of the theory are given in section 6.9, before the chapter is concluded in section 6.10.

6.1 Introduction

We are interested in the utility of modeling dynamic systems as unstructured uncertain differential-algebraic equations. Consider the linear time-invariant dae

E x′(t) + A x(t) != 0 (6.1)

If E is a regular matrix, this equation is readily turned into an ordinary differential equation, but our interest is with the other case. By saying that this dae is unstructured, we mean that we cannot assume any of the matrix entries to be known exactly. Instead, we assume that there is an uncertainty model with independent uncertainty descriptions for the matrix entries, and then example 1.1 showed that additional assumptions are needed to turn the equation into an uncertain ode. To see what kind of assumptions we might find reasonable to add, we recall that the equations represent a dynamic system, and so we might be willing to make assumptions regarding system features; that is, properties of a dynamic system which are not dependent on the particular equations used to describe the system. Invertibility of E is not a system feature. The poles are a system feature, and the poles of an ode are given by the finite eigenvalues† of the matrix pair (E, A).

One way to analyze the equations is to apply a row reduction procedure to the equations, trying to bring E into upper triangular form (variables may be reordered as needed). Such a procedure (we think of Gaussian elimination, or QR decomposition using Givens rotations or Householder reflections) can only proceed as long as the lower right block (which remains to be reduced to upper triangular form) contains an entry which can be distinguished from zero. If the uncertain matrix is not regular, the procedures will at some point fail to find an entry which can be distinguished from zero (or else the procedure would prove the regularity of the matrix), and will be unable to continue beyond that point:

\[
\begin{bmatrix} E_{11} & E_{12} \\ 0 & E_{22} \end{bmatrix}
\begin{pmatrix} x_1'(t) \\ x_2'(t) \end{pmatrix}
+
\begin{bmatrix} A_{11} & A_{12} \\ A_{21} & A_{22} \end{bmatrix}
\begin{pmatrix} x_1(t) \\ x_2(t) \end{pmatrix}
\overset{!}{=} 0
\tag{6.2}
\]

† Since all variables in the dae are considered outputs, the system is trivially observable; that is, the eigenvalues cannot correspond to unobservable modes without corresponding system poles.
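The stopped row reduction described above can be sketched in code. The following is our own illustration, not code from the thesis: it uses Gaussian elimination with complete pivoting (rather than the orthogonal reductions also mentioned), and a hypothetical tolerance `tol` stands in for "distinguishable from zero". In the dae setting, the same row operations would also be applied to A, and the column permutation reorders the variables.

```python
import numpy as np

def partial_row_reduce(E, tol):
    """Gaussian elimination with complete pivoting, stopped as soon as the
    remaining lower-right block contains no entry distinguishable from zero.
    Returns the partially triangularized matrix, the column (variable)
    permutation, and the size k of the regular E11 block."""
    E = np.array(E, dtype=float)
    n = E.shape[0]
    perm = np.arange(n)
    k = 0
    while k < n:
        block = np.abs(E[k:, k:])
        i, j = np.unravel_index(np.argmax(block), block.shape)
        if block[i, j] <= tol:
            break  # E22 has no entry distinguishable from zero
        E[[k, k + i], :] = E[[k + i, k], :]           # row swap
        E[:, [k, k + j]] = E[:, [k + j, k]]           # column swap (reorder variables)
        perm[[k, k + j]] = perm[[k + j, k]]
        factors = E[k + 1:, k] / E[k, k]
        E[k + 1:, k:] -= np.outer(factors, E[k, k:])  # eliminate below the pivot
        k += 1
    return E, perm, k

# A matrix that is numerically rank 1: the reduction stops with a 1x1 E11 block.
E = np.array([[2.0, 1.0],
              [4.0, 2.0 + 1e-12]])
print(partial_row_reduce(E, tol=1e-9)[2])  # 1
```

The returned k gives the dimension of the regular E11 block in (6.2); everything below the tolerance ends up in the E22 block whose treatment is the topic of this chapter.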


where E11 is regular, and E22 has no entries which can be distinguished from zero. The dae is considered ill-posed if the family of solutions does not converge as the uncertainty tends to zero, for any initial conditions in some set of interest. Hence, showing well-posedness in this sense is a first step towards replacing the ad hoc procedure of neglecting E22 with an analysis that accounts for the error in the solution introduced by substituting zero for E22.

The remaining part of this introduction contains just enough analysis to reach the fundamental matrix-valued singular perturbation problem.

It is assumed that the equations are of nominal index† 1, where nominal is taken to refer to max(E22) = 0. This means that the matrix

\[
\begin{bmatrix} E_{11} & E_{12} \\ A_{21} & A_{22} \end{bmatrix}
\tag{6.3}
\]

is regular. When E22 = 0, the second group of equations becomes a static relation between the variables. We also assume that the initial conditions x(0) satisfy this relation.

Next, a change of variables leads to

\[
\begin{bmatrix} I & 0 \\ 0 & E \end{bmatrix}
\begin{pmatrix} x'(t) \\ z'(t) \end{pmatrix}
+
\begin{bmatrix} A_{11} & A_{12} \\ A_{21} & A_{22} \end{bmatrix}
\begin{pmatrix} x(t) \\ z(t) \end{pmatrix}
\overset{!}{=} 0
\tag{6.4}
\]

From the assumed regularity of (6.3), it follows that the new matrix

\[
\begin{bmatrix} I & 0 \\ A_{21} & A_{22} \end{bmatrix}
\]

and hence A22, is regular.

Maintaining the style of Tidefelt and Glad (2008), the applicability of the results in this chapter is increased by generalizing (6.4) slightly, allowing the trailing matrix to depend on E. Doing so leads to the following lti matrix-valued singular perturbation form (in chapter 7 a somewhat different form is used, given by (7.8))

x′(t) + A11(E) x(t) + A12(E) z(t) != 0 (6.5x)

E z′(t) + A21(E) x(t) + A22(E) z(t) != 0 (6.5z)

where the following properties will be used throughout the chapter.

P1–[6.1] Property. The functions Aij shall be analytic functions, without uncertainty at 0, and with a known, finite Lipschitz constant.

P2–[6.2] Property. The (nominal) matrix A22(0) shall be non-singular.

That is, the uncertainty in the form (6.5) is in the matrix E and how the trailing matrices Aij depend on E under the respective Lipschitz conditions.

† That is, the differentiation index; see definition 2.2.


Note that the Lipschitz condition for A22 together with corollary 2.47 provides that the inverse of A22(E) can be bounded by a constant, if E is required to be sufficiently small.† To ease notation, we shall often not write out the dependency of the Aij on E.

For reference to the published work behind this chapter, the notion of unstructured perturbation has to be explained. It refers to the lack of structure in the uncertainty E compared to the singular perturbation problems studied in the past (see section 2.5), where E has either been in the form ε I for a small scalar ε > 0, or a diagonal matrix with small but positive entries on the diagonal. In this thesis, the lack of structure is marked by the use of matrix-valued uncertainty instead.

6.2 Schematic overview of nominal index 1 analysis

The analysis and convergence proofs (where present) in the present and subsequent chapters have much in common. To emphasize this, and to enhance the reading of these chapters, this section contains a schematic overview of the common structure. The schematic overview is given in figure 6.1, annotated below.

• (a) The solution to the uncertain part of the decoupled system will generally not be known beyond boundedness. Hence, it is not necessary that the change of variables converge to a known transform as max(E) → 0, but what matters is that the relation between the solution to the slow dynamics and the original variables converges to a known entity, and that the influence of the solution to the uncertain dynamics on the original variables is bounded independently of max(E).

In chapter 8, where the pointwise index of the equations is assumed zero, showing the existence of a decoupling transform with the required properties is the main concern.

• (b) Showing that the eigenvalues of the uncertain dynamics must grow as max(E) → 0 is the main concern in chapter 7, as the rest of that chapter has much in common with the present chapter.

• (c) Since the solution η will have a non-vanishing influence on some of the original variables, it is necessary for convergence in the original variables that η converges uniformly to zero as max(E) → 0. In particular, it has to be shown that there is uniform convergence at the initial time.

• (d) Making assumptions about eigenvalues is a key ingredient in the convergence proofs. In the end, these assumptions will restrict the uncertainty sets of the uncertain entities in the equations, but we prefer making the restrictions indirectly via the eigenvalues since these can be related to system properties.

Besides stating the assumptions, it needs to be verified that the restricted uncertainty sets are non-empty, or else any further reasoning will be meaningless.

† Had it not been for maintaining the style of Tidefelt and Glad (2008), the uncertainty model of chapter 7 would have been used instead. There, all deviations from some nominal matrices are assumed bounded by a common number m, and instead of considering max(E) → 0, one considers m → 0.


[Figure 6.1 is a flow diagram. Its nodes read: "Uncertain dae", "Decoupling transform", "Slow ode in ξ", "Uncertain dae in η", "Eigenvalue assumptions", "Show |λ| → ∞", "Show η(0) → 0", "λ ∈ fast region", "Bound ‖Φ(t, τ)‖2", "η → 0 uniformly", "Combine solutions", and "Convergence of solutions"; the arrows carry the annotation labels a–g referred to in the text.]

Figure 6.1: Schematic overview of convergence proofs. The crucial steps have been marked with a thick border. The labels link to annotations in the text.


• (e) Using that the eigenvalues of the uncertain dynamics must tend to infinity as max(E) → 0, it can be concluded that for sufficiently small max(E) the eigenvalues of the uncertain dynamics must belong to a subset of the assumed region, strictly included in the left half of the complex plane.

• (f) The last crucial step is to show that the location of the eigenvalues of the uncertain dynamics implies that the initial condition response of the uncertain dynamics can be bounded independently of E, for sufficiently small max(E). (Then, uniform convergence of initial conditions to zero implies that the whole solution converges uniformly to zero.)

In the lti cases, we use results from the theory of M-matrices. In the ltv case we also need Lyapunov-based methods.

• (g) This is the final conclusion of the analysis. While we are mainly concerned with the qualitative property of convergence in this thesis, a review of the proofs will indicate how the convergence may be quantified. However, some steps in the analysis seem to give rise to gross over-estimation, making the quantitative results unpleasant to use in real applications (the alternative being to ignore the issue with matrix-valued singular perturbation altogether, or to try to avoid the issue by re-deriving the equations with more attention to structure; recall section 1.2.5).

6.3 Decoupling transforms and initial conditions

Following the scheme outlined in the previous section, we now start by deriving the decoupling transform. Similarly to how this is done in most of the literature on other singular perturbation problems, the transform is divided into two steps. Since the initial conditions for the decoupled system are a direct consequence of the decoupling transform (and the initial conditions for the coupled system), results on initial conditions are also included in the present section. With this short introduction, we now begin with a lemma for the first decoupling step.

6.3 Lemma. There exists an analytic matrix-valued function L such that, for sufficiently small max(E), the change of variables

\[
\begin{pmatrix} x \\ z \end{pmatrix}
=
\begin{bmatrix} I & 0 \\ L(E) & I \end{bmatrix}
\begin{pmatrix} x \\ \eta \end{pmatrix}
\tag{6.6}
\]

transforms (6.5) into the system

\[
\begin{bmatrix} I & \\ & E \end{bmatrix}
\begin{pmatrix} x'(t) \\ \eta'(t) \end{pmatrix}
+
\begin{bmatrix} A_{11} + A_{12}\,L(E) & A_{12} \\ 0 & A_{22} - E\,L(E)\,A_{12} \end{bmatrix}
\begin{pmatrix} x(t) \\ \eta(t) \end{pmatrix}
\overset{!}{=} 0
\tag{6.7}
\]

The matrix L(E) satisfies

\[
L(0) = -A_{22}(0)^{-1} A_{21}(0)
\tag{6.8}
\]

and a Lipschitz condition.


Proof: Applying the change of variables shows that x is eliminated from the η′ equation provided L(E) satisfies

\[
0 \overset{!}{=} A_{21}(E) + A_{22}(E)\,L(E) - E\,L(E)\bigl(A_{11}(E) + A_{12}(E)\,L(E)\bigr)
\tag{6.9}
\]

For E = 0 there is the solution

\[
L(0) = -A_{22}(0)^{-1} A_{21}(0)
\]

The derivative of the right hand side of (6.9) with respect to each column of L at E = 0 is A22(0), which is non-singular. It follows from the analytic implicit function theorem (Hörmander, 1966) that the equation can be solved to give an analytic L in some neighborhood of 0. On a closed ball of positive radius within that neighborhood, L will be bounded due to its continuity. Since L′ will also be analytic (see, for instance, Krantz and Parks (2002, proposition 1.1.14)), it follows that L′ will also be bounded on the same closed ball, implying the Lipschitz condition.†

Let the initial conditions for (6.5) be

\[
x(0) = x_0, \qquad z(0) = z_0
\]

6.4 Lemma. If the initial conditions x0 and z0 are chosen to make the dae consistent for E = 0, that is,

\[
0 \overset{!}{=} A_{21}(0)\,x_0 + A_{22}(0)\,z_0
\tag{6.10}
\]

then the initial condition η(0) = η0(E) for the lower part of (6.7) satisfies

\[
\eta_0(E) = \bigl[\,-A_{22}(0)^{-1}A_{21}(0) - L(E)\,\bigr]\,x_0
\tag{6.11}
\]

In particular, η0 is analytic with

\[
\eta_0(E) = O(\,\|E\|_2\,)
\tag{6.12}
\]

† In chapter 7, the decoupling transforms will be established using fixed-point methods rather than analytic calculus. Then the neighborhoods (whose existence is the only thing we care about in this chapter) will be balls with radii that are obtained constructively. This will bring theory much closer to application, and the presentation in this chapter uses analytic calculus to maintain the flavor of the published works that the chapter builds upon. We shall briefly indicate the type of results that fixed-point methods provide by looking at how a bound on L over a closed ball may be constructed.

Let a1(E), …, am(E) denote the columns of A21(E), and let l1(E), …, lm(E) be the columns of L(E). Impose a bound on E so that there exist constants k11 and k12 such that ‖A11(E)‖2 ≤ k11 and ‖A12(E)‖2 ≤ k12. Let ρ > ‖A22(0)^{−1}A21(0)‖2 denote the bound on ‖L(E)‖2 to be established, and write (6.9) as

\[
\begin{pmatrix} a_1(E) \\ \vdots \\ a_m(E) \end{pmatrix}
\overset{!}{=}
\left(
\begin{bmatrix} A_{22}(E) & & \\ & \ddots & \\ & & A_{22}(E) \end{bmatrix}
+ F(E)
\right)
\begin{pmatrix} l_1(E) \\ \vdots \\ l_m(E) \end{pmatrix}
\]

where ‖F(E)‖2 ≤ ‖E‖2 ρ (k11 + k12 ρ). According to corollary 2.47, the solution L(E) will satisfy the bound ‖L(E)‖2 ≤ ρ if ‖F(E)‖2 is made sufficiently small. In other words, for each ρ > ‖A22(0)^{−1}A21(0)‖2 we can construct an open ball for E, within which the bound ‖L(E)‖2 ≤ ρ holds. The derivative can be bounded similarly.


Proof: From the definition of the change of variables, z = L(E) x + η, it follows that

\[
\eta_0(E) = z_0 - L(E)\,x_0
\]

Substituting z0 from (6.10) gives (6.11), while (6.8) and the Lipschitz condition for L give (6.12).

Introduce the notation

\[
A_\eta(E) \triangleq A_{22}(E) - E\,L(E)\,A_{12}(E)
\tag{6.13}
\]

where Aη(0) is the non-singular matrix A22(0), and a Lipschitz condition for Aη is obtained by taking E sufficiently small.

To emphasize the difference between uniform and "directional" convergence with respect to the uncertainty, the following theorem gives a result to be contrasted with lemma 6.20.

6.5 Theorem. Assume E = m E∗, where E∗ is a non-singular matrix with max(E∗) = 1, such that −E∗^{−1} Aη(0) is Hurwitz. Then

\[
\sup_{t \ge 0}\,\bigl|\eta(t)\bigr| = O(m)
\tag{6.14}
\]

Proof: With the change of variables t = m τ we get

\[
\frac{\partial \eta}{\partial \tau} = -E_*^{-1} A_\eta(m E_*)\,\eta, \qquad \eta(0) = \eta_0
\]

with solution

\[
\eta(\tau) = e^{-E_*^{-1} A_\eta(m E_*)\,\tau}\,\eta_0
\]

Since −E∗^{−1} Aη(0) is a Hurwitz point matrix, the time-scaled system with m = 0 can be shown to be uniformly [γ e^{λ•}]-stable for some α(−E∗^{−1} Aη(0)) < λ < 0 according to theorem 2.38. Since the matrix in the exponent (as a function of m) satisfies a Lipschitz condition, and ‖−E∗^{−1} Aη(0)‖2 is bounded since it is a point matrix, theorem 2.41 shows that there exist positive constants C1 and m0 (ignoring the exponential decay rate) such that

\[
m \le m_0 \implies \bigl\| e^{-E_*^{-1} A_\eta(m E_*)\,\tau} \bigr\|_2 \le C_1
\]

Since η0 = O(‖E‖2) = O(m), the result follows.
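Theorem 6.5 can be illustrated numerically. The sketch below is our own illustration with arbitrarily chosen numbers, not code from the thesis: Aη is frozen at the identity, E∗ is a fixed matrix with max(E∗) = 1 and −E∗⁻¹ Hurwitz, and η0 is taken proportional to m in accordance with lemma 6.4. Halving m then halves the peak of |η|, in agreement with (6.14).

```python
import numpy as np
from scipy.linalg import expm

# Fixed direction with max(Estar) = 1 and -Estar^{-1} Hurwitz; A_eta frozen at I.
Estar = np.array([[0.5, 0.25],
                  [0.0, 1.0]])

def sup_eta(m, T=10.0, n_steps=2001):
    """sup_t |eta(t)| for E eta' + eta = 0 with E = m*Estar and eta(0) = O(m)."""
    M = -np.linalg.inv(m * Estar)        # eta' = M eta
    eta0 = m * np.array([1.0, -1.0])     # initial condition proportional to m
    ts = np.linspace(0.0, T, n_steps)
    return max(np.linalg.norm(expm(M * t) @ eta0) for t in ts)

print(sup_eta(0.1) / sup_eta(0.05))  # ≈ 2, i.e. sup |eta| scales linearly with m
```

Note that this only demonstrates convergence along one fixed direction E∗; as the next example shows, no such conclusion holds uniformly over all small E.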

Had E∗ been singular in theorem 6.5, other estimates would be possible, but note that the "directional convergence" is not the type of convergence we seek. Consequently, theorem 6.5 has no applications in the thesis. The following example gives a better picture of the problem we have to address.


6.6 Example

In this example, the bounding of η over time is considered in case η has two components. For simplicity, we shall assume that η is given by

\[
\eta'(t) = E^{-1}\,\eta(t)
\]

where

\[
E = \varepsilon
\begin{pmatrix} -\delta & 1 - \delta \\ 0 & -\delta \end{pmatrix}
\]

where ε ∈ (0, m] and δ > 0 is a small uncertainty parameter so that max(E) ≤ m. Since

\[
E^{-1} = \varepsilon^{-1}
\begin{pmatrix} -1/\delta & 1/\delta - 1/\delta^2 \\ 0 & -1/\delta \end{pmatrix}
\]

it is seen that both eigenvalues are perfectly stable and far into the left half plane, while the off-diagonal entry is at the same time arbitrarily big. It is easy to verify using software that the maximum norm of the matrix exponential grows without bound as δ tends to zero, for any bound m. Hence, knowing that the initial conditions η(0) must tend to zero with max(E) is not sufficient to obtain a uniform bound on η which tends to zero with max(E).
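The claim that the transient grows without bound can be made concrete. The following is our own sketch (the parameter values and the time grid are arbitrary choices, not from the thesis); it evaluates the peak of ‖e^{E⁻¹t}‖₂ for shrinking δ, sampling around the expected peak time.

```python
import numpy as np
from scipy.linalg import expm

def peak_transient(eps, delta):
    """Approximate sup_t ||exp(E^{-1} t)||_2 for the E of this example,
    sampled on a grid around the expected peak time t ~ eps*delta."""
    E = eps * np.array([[-delta, 1.0 - delta],
                        [0.0,   -delta]])
    Einv = np.linalg.inv(E)
    ts = np.linspace(0.0, 20.0 * eps * delta, 400)[1:]
    return max(np.linalg.norm(expm(Einv * t), 2) for t in ts)

for delta in (0.5, 0.1, 0.02):
    print(delta, peak_transient(eps=0.1, delta=delta))
```

The printed peaks grow roughly like 1/δ, in line with the closed form e^{E⁻¹t} = e^{−t/(εδ)} [[1, bt], [0, 1]] for this triangular matrix: the off-diagonal growth b ~ 1/(εδ²) beats the exponential decay for a transient of size O(1/δ).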

In Tidefelt and Glad (2008), the unboundedness of sup_{t≥0} |η(t)| was remedied by assuming that the condition number of E be bounded, and it is easy to see that this would imply a lower bound on δ in this example, thereby solving the problem. However, assuming a bound on the condition number of E is very artificial and will not be done in the thesis.

So far, the presentation in the present chapter has been rather close to Tidefelt and Glad (2008). However, we will now omit Tidefelt and Glad (2008, lemma 5) and take a different route thanks to the improved results in Tidefelt and Glad (2009). It is the topic of section 6.6 to find conditions that can be used to provide a uniform bound on the solution η which tends to zero with E, when η satisfies

E η′(t) + Aη(E) η(t) != 0

The vanishing function η may then be regarded as an external input to the ode for x in (6.7), and regular perturbation techniques may be used to show that the solutions x converge as E tends to zero. However, it is also illustrative to apply a second decoupling transform which isolates the slow dynamics; the transform is guaranteed to exist by the next lemma.

Continuing on the result of lemma 6.3, the following lemma shows that the influence of η on x is small.


6.7 Lemma. There exists a matrix-valued function H such that, for sufficiently small max(E), the change of variables

\[
\begin{pmatrix} x \\ \eta \end{pmatrix}
=
\begin{bmatrix} I & H(E)\,E \\ 0 & I \end{bmatrix}
\begin{pmatrix} \xi \\ \eta \end{pmatrix}
\tag{6.15}
\]

transforms (6.7) into the system

\[
\begin{bmatrix} I & \\ & E \end{bmatrix}
\begin{pmatrix} \xi'(t) \\ \eta'(t) \end{pmatrix}
+
\begin{bmatrix} A_{11}(E) + A_{12}(E)\,L(E) & 0 \\ 0 & A_\eta(E) \end{bmatrix}
\begin{pmatrix} \xi(t) \\ \eta(t) \end{pmatrix}
\overset{!}{=} 0
\tag{6.16}
\]

where ‖H(E)‖2 is bounded by a constant independently of E.

Proof: Applying the change of variables and then performing row operations on the equations to eliminate η′ from the first group of equations leads to the condition defining H(E) (dropping other dependencies on E from the notation):

\[
0 \overset{!}{=} [\,A_{11} + A_{12}\,L\,]\,H(E)\,E + A_{12} - H(E)\,[\,A_{22} - E\,L\,A_{12}\,]
\tag{6.17}
\]

It follows that

\[
H(0) = A_{12}(0)\,A_{22}(0)^{-1}
\]

which is clearly bounded independently of E. The equation is linear in H(E), invertible at E = 0, and the coefficients depend smoothly on E, so as for L it follows that H is analytic in some neighborhood of 0. Hence, restricting H to a sufficiently small closed ball (via the selection of a sufficiently small bound on max(E)) will make ‖H(E)‖2 bounded independently of E.†

The results Tidefelt (2007, lemma 6.4, corollary 6.1) consider the derivatives of η0(E) with respect to E, and belong in the present section. However, since these have no counterparts in later chapters, we prefer to present them using the more constructive fixed-point methods of later chapters, rather than using the original framework of real analytic functions.

6.8 Lemma. Irrespective of the rank of E, the solution L to (6.9) has the form

\[
L_0(E) = -A_{22}(E)^{-1} A_{21}(E)
\]
\[
L(E) = L_0(E) + A_{22}(E)^{-1} E \Bigl( L_0(E)\bigl[\,A_{11}(E) + A_{12}(E)\,L_0(E)\,\bigr] + m\,R_L(E) \Bigr)
\tag{6.18}
\]

where m is an upper bound on E, and R_L(E) can be bounded independently of E, for m sufficiently small.

Proof: The following proof will never make use of the rank or pointwise non-singularity of E, which makes it valid for any rank.

Inserting (6.18) in (6.9), repeated here,

\[
0 \overset{!}{=} A_{21}(E) + A_{22}(E)\,L(E) - E\,L(E)\bigl( A_{11}(E) + A_{12}(E)\,L(E) \bigr)
\]

yields

\[
\begin{aligned}
0 \overset{!}{=} {} & A_{21}(E) - A_{22}(E)\,A_{22}(E)^{-1} A_{21}(E) \\
& + E \Bigl( L_0(E)\bigl[\,A_{11}(E) + A_{12}(E)\,L_0(E)\,\bigr] + m\,R_L(E) \Bigr) \\
& - E\,L(E)\bigl( A_{11}(E) + A_{12}(E)\,L(E) \bigr)
\end{aligned}
\]

and then

\[
\begin{aligned}
0 \overset{!}{=} {} & m\,E\,R_L(E) \\
& + E\,L_0(E)\bigl[\,A_{11}(E) + A_{12}(E)\,L_0(E)\,\bigr] \\
& - E\,L_0(E)\bigl( A_{11}(E) + A_{12}(E)\,L_0(E) \bigr) \\
& - E\,L_0(E)\,A_{12}(E)\bigl( L(E) - L_0(E) \bigr) \\
& - E\bigl( L(E) - L_0(E) \bigr)\bigl( A_{11}(E) + A_{12}(E)\,L(E) \bigr)
\end{aligned}
\]

Cancelling a factor of E on the left means that the above equation is implied by the one below.

\[
\begin{aligned}
0 \overset{!}{=} {} & m\,R_L(E) \\
& - L_0(E)\,A_{12}(E)\bigl( L(E) - L_0(E) \bigr) \\
& - \bigl( L(E) - L_0(E) \bigr)\bigl( A_{11}(E) + A_{12}(E)\,L(E) \bigr)
\end{aligned}
\]

The rest of the proof is a contraction mapping argument showing that there is a ρL > 0 such that ‖RL(E)‖2 ≤ ρL if max(E) is required to be sufficiently small. The argument is given in section 6.A.

† As in lemma 6.3, more constructive formulations are easily obtained using corollary 2.47.

In addition to some details of the proof of lemma 6.8, section 6.A contains an example of the lemma.

6.9 Corollary. If the trailing matrices in lemma 6.4 are independent of E, then the following relation gives a more precise account of the relation (6.12).

\[
\eta_0(E) = -A_{22}^{-1} E \Bigl( L_0\bigl[\,A_{11} + A_{12}\,L_0\,\bigr] + O(m) \Bigr) x_0
\tag{6.19}
\]

Proof: Use lemma 6.8. That the trailing matrices are independent of E implies that the difference A22(E)^{−1}A21(E) − A22(0)^{−1}A21(0) vanishes.
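The expansion (6.18) lends itself to a numerical sanity check. The sketch below is our own illustration, not from the thesis: the matrices are randomly chosen and held constant in E (so that L0 is constant, matching the setting of corollary 6.9), equation (6.9) is solved by a simple fixed-point iteration, and the residual after the first-order term is checked to decay quadratically in m.

```python
import numpy as np

rng = np.random.default_rng(1)
A11, A12, A21 = (rng.uniform(-1.0, 1.0, (2, 2)) for _ in range(3))
A22 = rng.uniform(-1.0, 1.0, (2, 2)) + 3.0 * np.eye(2)  # safely non-singular

def solve_L(E, iters=100):
    """Solve 0 = A21 + A22 L - E L (A11 + A12 L) by fixed-point iteration."""
    L = -np.linalg.solve(A22, A21)
    for _ in range(iters):
        L = np.linalg.solve(A22, -A21 + E @ L @ (A11 + A12 @ L))
    return L

L0 = -np.linalg.solve(A22, A21)      # constant, since the A_ij do not vary with E
Estar = rng.uniform(-1.0, 1.0, (2, 2))

def err(m):
    """Distance from L(E) to the first-order part of (6.18); should be O(m^2)."""
    E = m * Estar
    first_order = L0 + np.linalg.solve(A22, E @ L0 @ (A11 + A12 @ L0))
    return np.linalg.norm(solve_L(E) - first_order)

print(err(1e-3) / err(5e-4))  # quadratic decay: the ratio is close to 4
```

The ratio near 4 when m is halved reflects that the term m R_L(E) in (6.18) enters multiplied by E = m E∗, so the residual after the explicit first-order term is O(m²).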

6.4 A matrix result

Before we take on the problem of analyzing the dynamics of η, we need to switch context for a while and see how eigenvalue conditions can help to bound the inverse of a small matrix. The result we need is quickly derived, and we then turn to examples in an attempt to illustrate its qualities.


6.10 Lemma. For an invertible upper triangular matrix U, it holds that

\[
\|U\|_2 \le \lambda_{\max}(U)\,\frac{\sqrt{(a+1)^{2n} + 2n(a+2) - 1}}{a+2}
\tag{6.20}
\]

where a = max(U^{−1}) λmax(U).

Proof: The difference to theorem 2.53 is only minor. Here one uses that the bound (2.90) of theorem 2.53 is increasing with a, so the bound is overestimated by inserting the expression for b and the upper bound for a.

Extending the result for upper triangular matrices to the general case is easy and relies on the Schur decomposition of a matrix. Unfortunately, the nice property of the Schur decomposition that all factors but one are unitary is not quite as beneficial as if our results had been assuming bounds on the induced 2-norm of a matrix instead of entry-wise maximum.

6.11 Theorem. For an invertible matrix X ∈ R^{n×n}, it holds that

\[
\|X\|_2 \le \lambda_{\max}(X)\,\frac{\sqrt{(na+1)^{2n} + 2n(na+2) - 1}}{na+2}
\tag{6.21}
\]

where a = max(X^{−1}) λmax(X).

Proof: Let Q U Q^H = X be a Schur decomposition of X. Then ‖X‖2 = ‖U‖2 and λmax(U) = λmax(X), so a bound on max(U^{−1}) will yield a bound on ‖X‖2 by lemma 6.10. From

\[
\max(U^{-1}) \le \bigl\|U^{-1}\bigr\|_2 = \bigl\|Q\,U^{-1}Q^{H}\bigr\|_2 = \bigl\|X^{-1}\bigr\|_2 \le n\,\max(X^{-1})
\tag{6.22}
\]

the result follows by substituting n max(X^{−1}) for max(U^{−1}) in (6.20).

We now turn to the examples. An exact treatment of the optimization problem bounded by (6.21),

\[
\begin{aligned}
\underset{X \in \mathbb{R}^{n\times n}}{\text{maximize}}\quad & \|X\|_2 \\
\text{subject to}\quad & \max(X^{-1}) \le m \\
& \lambda_{\max}(X) \le \lambda
\end{aligned}
\]

appears difficult, even for as low dimension as n = 2. Instead, in example 6.12 we turn to numeric (nonlinear, global) optimization in order to obtain examples which we provide as indications of how tight the bound may actually be. The section then ends with example 6.13, where we plot the true norm against the bound for a large number of randomly generated matrices.


6.12 Example

Since we are only interested in finding good examples here, we choose to consider the problem

\[
\begin{aligned}
\underset{X \in \mathbb{R}^{n\times n}}{\text{minimize}}\quad & \max(X^{-1}) \\
\text{subject to}\quad & \lambda_{\max}(X) \le \lambda \\
& \|X\|_2 \overset{!}{=} \|X_0\|_2
\end{aligned}
\]

where ‖X0‖2 is a feasible objective function value in the original problem for bounds m as low as indicated by solutions to this problem. The optimization problem is further simplified by a restrictive parameterization of X^{−1} as Q U^{−1} Q^T. Here, U^{−1} is chosen as

\[
U^{-1} =
\begin{bmatrix}
\lambda^{-1} & \eta & 0 & \cdots & 0 \\
0 & \lambda^{-1} & \eta & & \vdots \\
\vdots & & \ddots & \ddots & 0 \\
 & & & \lambda^{-1} & \eta \\
0 & \cdots & & 0 & \lambda^{-1}
\end{bmatrix}
\tag{6.23}
\]

with λ equal to the negative of the eigenvalue bound, and η chosen as some small constant (this will define ‖X0‖2). The orthogonal† Q is parameterized as a composition of Givens rotations and reflections. Different combinations of reflections are enumerated, and for each enumerated combination, simulated annealing is applied to find rotations that yield a small max(Q U^{−1} Q^T).

Application of this scheme for some choices of n, λ, and η gives the results shown in table 6.1. The table shows that the bound is no or only a few orders of magnitude from the ideal bound in cases of practical importance.

The following matrix manifests one of the rows of table 6.1. The reader is encouraged to verify this, but is warned that the precision in the printed matrix entries is insufficient to reproduce the λmax(X) column with more than one accurate digit, causing the bound (6.21) to have zero accurate digits.

\[
X^{-1} =
\begin{bmatrix}
0.1518712275 & -0.1522399412 & -2.388603613\cdot 10^{-3} & 2.33223507\cdot 10^{-3} \\
0.1524043923 & -0.143406358 & -2.337599229\cdot 10^{-3} & 2.282425336\cdot 10^{-3} \\
-0.1524043968 & -0.1487240742 & -0.1475359243 & 0.1479232046 \\
0.1465683972 & 0.1522222484 & -0.152030243 & 0.1524043881
\end{bmatrix}
\]

† Orthogonal instead of unitary ensures that X^{−1} is real.


dim X   λmax(X)     η          max(X⁻¹)     ‖X‖₂         (6.21)
  2     0.3         0.3        3.33         0.314        0.735
  2     0.3         30.        18.          2.73         3.27
  2     30.         0.3        0.18         2.73·10²     3.27·10²
  2     30.         30.        15.          2.7·10⁴      2.71·10⁴
  4     30.1        0.3        0.157        2.21·10⁴     2.23·10⁵
  4     30.         3·10⁻²     3.33·10⁻²    76.1         3.13·10³
  4     3.03·10²    0.3        0.152        2.19·10⁸     1.93·10⁹
  4     3.01·10²    3·10⁻²     1.61·10⁻²    2.21·10⁵     2.4·10⁶

Table 6.1: Some examples of the bound (6.21) of theorem 6.11 compared to the true norm. The parameters are explained in the text. All λmax entries should be 3 times 10 to an integer power; exceptions are due to numeric ill-conditioning. Note that the bound is very tight where the inequality (6.22) is tight.

6.13 Example

Another way to illustrate the bound (6.21) is to generate a large number of random instantiations of X⁻¹ and plot the true norm ‖X‖₂ against the bound. Since a scaling of X⁻¹ results in the same scaling of the bound, the scaling should be fixed to make the two-dimensional nature of the plot meaningful (otherwise, a histogram over the ratios would be a better illustration). Here, the freedom to scale is used to fix the value of max(X⁻¹), making the x-values in the plot a monotone function of λmax(X). Figures 6.2 and 6.3 show the result of this procedure for two different values of n.

Looking at the figures (figure 6.3 in particular) it is tempting to conclude that there must be a possibility to tighten the bound by some factor which is independent of λmax(X).


Figure 6.2: The true norm in (6.21) is plotted against the corresponding bound (log–log axes) for n = 2 and a fixed value of max(X⁻¹), for 5000 random instantiations of X⁻¹. In order to obtain the fixed value (here 1) for max(X⁻¹), each matrix X⁻¹ was generated via an intermediate matrix Y according to X⁻¹ = max(Y)⁻¹ Y, where the entries of Y were sampled from independent uniform distributions over the interval [−1, 1].
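The sampling described in the caption can be sketched as follows — a minimal sketch assuming numpy; the bound (6.21) of theorem 6.11 is not reproduced here, so only the values for the norm axis are computed:

```python
import numpy as np

# Sample X^{-1} as in the caption of figure 6.2: entries of Y uniform on
# [-1, 1], then scaled so that max(X^{-1}) = 1 (largest entry magnitude).
rng = np.random.default_rng(0)
n = 2
norms = []
for _ in range(5000):
    Y = rng.uniform(-1.0, 1.0, size=(n, n))
    Xinv = Y / np.abs(Y).max()
    norms.append(np.linalg.norm(np.linalg.inv(Xinv), 2))
# 'norms' gives the values on the true-norm axis; each value would be
# plotted (log-log) against the corresponding bound (6.21).
```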

Figure 6.3: Analogous to figure 6.2, but with n = 3.


Figure 6.4: The region (white color) in the complex plane where the eigenvalues are assumed to reside, illustrating the condition (A1–6.26). The slow dynamics are assumed to have eigenvalues of magnitude smaller than R0, while the fast and uncertain dynamics are assumed to have eigenvalues of magnitude smaller than a m⁻¹ and damping no less than cos(φ0). The dashed line emphasizes the upper bound on the real part of all eigenvalues of magnitude larger than R0.

6.5 An LTI ODE result

To simplify matters and to obtain results that are not limited to dae, we shall begin by assuming that the invertible −Aη is the identity matrix, leading to the fundamental question of bounding the initial condition response of the system

E η′(t) != η(t)    (6.24)

given stability, a bound on E, and some additional assumptions which remain to be stated.

Although a bound on the induced matrix norm ‖E‖₂ would be convenient for the analysis, we observe that it is much more useful from an application point of view to assume a bound on the maximum size of any entry in E, and this is why theorem 6.11 was formulated accordingly. However, since max(E) ≤ ‖E‖₂, any result which assumes max(E) ≤ m immediately follows if ‖E‖₂ ≤ m. Hence, the results below can readily be rewritten using an ordinary induced norm instead of max(•), but doing so would introduce additional slack in the derived inequalities.

In the rest of this section, (6.24) is written in ode form,

η′(t) != M η(t)    (6.25)

and we let m > 0 be the known bound on max(M⁻¹), that is, max(E) ≤ m. We are ultimately interested in giving conditions under which the solutions converge as the upper bound m on max(E) tends to zero.

We shall assume that the poles of the dynamic system being modeled satisfy the following condition, illustrated in figure 6.4,


A1–[6.14] Assumption. Let λ denote any eigenvalue of M. Assume there exist constants R0 > 0, φ0 ∈ [ 0, π/2 ), and a > 1 such that

|λ| m < a    and    |λ| > R0 =⇒ |arg(−λ)| ≤ φ0    (A1–6.26)

where m is an upper bound for max(M⁻¹), and a presents a trade-off between generality of the assumption and the quantitative properties of the forthcoming convergence results.

Note that if A1–[6.14] is acceptable for the value of m at hand, the assumption will only become weaker (the feasible region for λ will grow) as we imagine smaller values of m. It would be a subtle mistake to propose assumptions which are not satisfied for any M, which motivates the following simple lemma.

6.15 Lemma. The condition a > 1 for (A1–6.26) is sufficient (but not necessary) for the assumption to be possible to fulfill with some M, for arbitrarily small m. It is necessary that a ≥ n⁻¹.

Proof: Sufficiency† follows by noting that M = −m⁻¹ I obviously satisfies max(M⁻¹) ≤ m with |arg(−λ)| = 0 and |λ| m = 1 for every eigenvalue λ.

That the condition a > 1 is not necessary — at least not for n = 2 and φ0 ≥ π/4 — is demonstrated by the following example,

M = ( −m   m  )⁻¹
    ( −m  −m )

with |λ| m = 1/√2 and |arg(−λ)| = π/4 for both eigenvalues.

Necessity of a ≥ n⁻¹ is a consequence of any eigenvalue being greater than

‖M⁻¹‖₂⁻¹ ≥ ( n max(M⁻¹) )⁻¹ ≥ ( n m )⁻¹

so for any eigenvalue λ it holds that |λ| m ≥ n⁻¹, and hence a < n⁻¹ would be a contradiction.
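The 2 × 2 example in the proof is easy to check numerically (a sketch assuming numpy):

```python
import numpy as np

# Counterexample from the proof of lemma 6.15: M = [[-m, m], [-m, -m]]^{-1}
# has |lambda| m = 1/sqrt(2) and |arg(-lambda)| = pi/4 for both eigenvalues.
m = 1e-3
M = np.linalg.inv(np.array([[-m, m], [-m, -m]]))
eigs = np.linalg.eigvals(M)
for lam in eigs:
    assert np.isclose(abs(lam) * m, 1 / np.sqrt(2))
    assert np.isclose(abs(np.angle(-lam)), np.pi / 4)
```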

The section now ends with a bound on the initial condition response of (6.25).

† It should be noted here that the uncertainty model max(M⁻¹) ≤ m is often just a coarse over-estimation of a more fine-grained model with individual uncertainty bounds on each matrix entry. Hence, some of the best known uncertainty intervals for individual matrix entries may be much smaller than m, and the matrix used to prove sufficiency here may actually not fall within these bounds. Note then, that the coarser uncertainty model has a family of solution trajectories which includes those of the more fine-grained uncertainty model. Hence, at some point, the fine-grained uncertainty model may be abandoned, and the coarser uncertainty model which is more tractable be used instead — it is in this situation the present lemma is to be applied.


6.16 Theorem (Main theorem for ode). Consider the ordinary differential equation

x′(t) = M x(t)    (6.27)

where max(M⁻¹) ≤ m. Assuming A1–[6.14], there exist constants m0 > 0 and γ < ∞ such that

m < m0 =⇒ sup_{t≥0} ‖x(t)‖₂ ≤ γ ‖x(0)‖₂

Proof: In view of theorem 2.27, it is seen that the initial condition response of (6.27) gets bounded if

‖M‖₂ / ( −α(M) ) = m ‖M‖₂ / ( −m α(M) )

can be bounded.

To see that the denominator is bounded from below by a constant independent of m, if m is sufficiently small, we use A1–[6.14] and lemma 6.15. Then, any eigenvalue is greater than ( n max(M⁻¹) )⁻¹ ≥ ( n m )⁻¹, showing that m < ( n R0 )⁻¹ implies that all eigenvalues are greater than R0, and hence −m α(M) > m ( n m )⁻¹ cos(φ0) = n⁻¹ cos(φ0).

That the numerator m ‖M‖₂ can be bounded from above follows from theorem 6.11, as it shows that

m ‖M‖₂ ≤ a √( ( n a + 1 ) 2ⁿ + 2 n ( n a + 2 ) − 1 ) / ( n a + 2 )    (6.28)

Combining the bound for the denominator with the bound for the numerator, one obtains

‖M‖₂ / ( −α(M) ) ≤ ( n a / ( n a + 2 ) ) √( ( n a + 1 ) 2ⁿ + 2 n ( n a + 2 ) − 1 ) / cos(φ0)    (6.29)

6.17 Remark. Note the trade-off present in the selection of a. If a is selected very large, (A1–6.26) is easier to justify, while the bound on the gain of the initial condition response becomes larger. At the other end, as a → n⁻¹, (A1–6.26) becomes increasingly restrictive (recall that lemma 6.15 does not even ensure consistency for a < 1), while our bound for the gain of the initial condition response tends to (here γ is referring to the notation of theorem 2.27)

‖e^{M t}‖₂ ≤ γ(n) ( √( 2 · 2ⁿ + 6 n − 1 ) / ( 3 cos(φ0) ) )^{n−1} = γ(n) ( √( 2 · 2ⁿ + 6 n − 1 ) / 3 )^{n−1} ( 1 / cos(φ0) )^{n−1}

The part of this expression that only depends on n is 1 for n = 1, 4.3 for n = 2, and grows rapidly. For n = 5, it is 9.7·10⁵, so even for very well damped systems for which 1/cos(φ0)^{n−1} ≈ 1, the bound will be huge. This highlights the qualitative nature of this work; the quantitative relation implied by the proof of theorem 6.16 will give so poor error bounds for the solution that they are rarely meaningful to apply. This problem can be handled


both by removing slack in the derivation of bounds under the current assumptions and by taking advantage of stronger assumptions.

6.6 The fast and uncertain subsystem

As we now return to dae after the two preceding sections on matrices and ode, we make the assumption that the uncertain dae is pointwise index† 0. This assumption will be removed in section 6.8.

Let us now give A1–[6.14] a new interpretation when considering the differential-algebraic equation

E x′(t) + A x(t) != 0    (6.30)

where A is known and non-singular (the unknown but regular case will be considered soon), while E is unknown but assumed pointwise non-singular (by definition of pointwise index). For this system, the eigenvalues λ in A1–[6.14] naturally refer to the eigenvalues of ( E, A ), and m is an upper bound for max(E). In this setup, we require that a > max(A).

6.18 Lemma. The condition a > max(A) for (A1–6.26) in the context of (6.30) is sufficient for the assumption to be possible to fulfill with some E, for arbitrarily small m.

Proof: Just take E = ( m / max(A) ) A. This satisfies max(E) ≤ m with all eigenvalues of ( E, A ) equal to −max(A) m⁻¹.
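The construction in the proof can be verified numerically (a sketch assuming numpy; the random A is illustrative):

```python
import numpy as np

# With E = (m / max(A)) A, every eigenvalue of the pair (E, A) -- that is,
# every root of det(lambda E + A) = 0, or eigenvalue of -E^{-1} A -- equals
# -max(A) m^{-1}, where max(A) denotes the largest entry magnitude.
rng = np.random.default_rng(0)
A = rng.uniform(-1.0, 1.0, size=(4, 4))   # assumed non-singular
m = 1e-4
maxA = np.abs(A).max()
E = (m / maxA) * A
eigs = np.linalg.eigvals(-np.linalg.inv(E) @ A)
assert np.allclose(eigs, -maxA / m)
```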

We now obtain a corollary to theorem 6.16 by making minor changes to its proof.

6.19 Corollary. Consider (6.30). Assuming (A1–6.26) with a > max(A), there exist constants m0 > 0 and γ < ∞ such that

m < m0 =⇒ sup_{t≥0} ‖x(t)‖₂ ≤ γ ‖x(0)‖₂

Proof: Compare the proof of theorem 6.16. Writing the equation as an ode,

x′(t) = −E⁻¹A x(t)

the ratio which needs to be bounded is seen to be

‖−E⁻¹A‖₂ / ( −α(−E⁻¹A) ) = m ‖−E⁻¹A‖₂ / ( −m α(−E⁻¹A) )

For the denominator, any eigenvalue is greater than

‖(−E⁻¹A)⁻¹‖₂⁻¹ ≥ ‖A⁻¹‖₂⁻¹ ( m n )⁻¹

so for sufficiently small m, any eigenvalue will be greater than R0, and hence

−m α(−E⁻¹A) ≥ m ‖A⁻¹‖₂⁻¹ ( m n )⁻¹ cos(φ0) = ‖A⁻¹‖₂⁻¹ n⁻¹ cos(φ0)

In order to apply theorem 6.11 for the numerator, we note that

|λ| max( (−E⁻¹A)⁻¹ ) ≤ |λ| ‖A⁻¹E‖₂ ≤ |λ| ‖A⁻¹‖₂ m n ≤ a ‖A⁻¹‖₂ n =: a∗

yielding

m ‖−E⁻¹A‖₂ ≤ a √( ( n a∗ + 1 ) 2ⁿ + 2 n ( n a∗ + 2 ) − 1 ) / ( n a∗ + 2 )

Combining denominator and numerator bounds, one obtains

‖−E⁻¹A‖₂ / ( −α(−E⁻¹A) ) ≤ ( a∗ / ( n a∗ + 2 ) ) √( ( n a∗ + 1 ) 2ⁿ + 2 n ( n a∗ + 2 ) − 1 ) / cos(φ0)

† Recall definition on page 35.

6.20 Lemma. Compare the definition of Aη in (6.13). The conclusion of corollary 6.19 still holds if (6.30) is replaced by

E x′(t) + A( E ) x(t) != 0    (6.31)

where the analytic A satisfies a Lipschitz condition in a neighborhood of zero, and A( 0 ) is without uncertainty and non-singular.

Proof: By corollary 2.47 it follows that we can choose m0 so small that ‖A( E )⁻¹‖₂ can be bounded. The conclusion is now reached by following the steps of the proof of corollary 6.19.

6.7 The coupled system

In Kokotović et al. (1986), results come in two flavors; one where approximations are valid on any finite time interval, and one where stability of the slow dynamics in the system makes the approximations valid without restriction to finite time intervals. Compare lemma 2.34 and lemma 2.33, for the respective cases. Here, only finite time intervals are considered, but the other case is treated just as easily.

Recall from section 6.1 how transformations of bounded condition number were used to bring the original system (6.1) into the matrix-valued singular perturbation form (6.5), repeated here,

x′(t) + A11(E) x(t) + A12(E) z(t) != 0 (6.5x)

E z′(t) + A21(E) x(t) + A22(E) z(t) != 0 (6.5z)


Let x̄ be the solution to

x̄′ = ( A11(0) + A12(0) L( 0 ) ) x̄,   x̄(0) = x0    (6.32)

where L( 0 ) = −A22(0)⁻¹ A21(0) according to (6.8), and let the solution to (6.5) at time t be denoted x( t, E ), z( t, E ). For E = 0 (the nominal system), we have

x( t, 0 ) = x̄(t)    (6.33x)
z( t, 0 ) = L( 0 ) x̄(t)    (6.33z)

Summarizing the result of previous sections leads to the following theorem.

6.21 Theorem (Main theorem for pointwise index 0). Consider the form (6.5) where E is pointwise non-singular, but otherwise unknown. The matrix expressions Aij(E) have to satisfy a Lipschitz condition with respect to E, and A22(0) is non-singular (that is, the nominal dae is index 1). Assume that the initial conditions are consistent with E = 0, and that A1–[6.14] holds with a > max(A). Let I = [ 0, tf ] be a finite interval of time. Then

sup_{t∈I} | x( t, E ) − x( t, 0 ) | = O(max(E))    (6.34x)
sup_{t∈I} | z( t, E ) − z( t, 0 ) | = O(max(E))    (6.34z)

Proof: Define L( E ) and H( E ) as in section 6.3, and consider the solution in terms of ξ and η in (6.16). According to lemma 6.4, | η( 0, E ) | = O(‖E‖₂) = O(max(E)), and then lemma 6.20 shows that sup_{t≥0} | η( t, E ) | = O(max(E)).

Note that x( t, 0 ) coincides with ξ( t, 0 ), so the left hand side of (6.34x) can be bounded as

sup_{t∈I} | x( t, E ) − x( t, 0 ) | = sup_{t∈I} | ξ( t, E ) + H( E ) E η( t, E ) − ξ( t, 0 ) |
  ≤ sup_{t∈I} | ξ( t, E ) − ξ( t, 0 ) | + O(max(E)²)

To see that the first of these terms is O(max(E)), note first that lemmas 6.4 and 6.7 give that the initial conditions for ξ are only O(max(E)²) away from x0. Hence, the restriction to a finite time interval gives that the contribution from initial conditions is negligible. The difference between the state feedback matrices of ξ( •, E ) and ξ( •, 0 ) in (6.16) is seen to be O(max(E)) by using the Lipschitz conditions in A11(E) + A12(E) L( E ). Hence, the contribution from perturbation of the state feedback matrix for ξ is O(max(E)) according to lemma 2.34.

Concerning z,

sup_{t∈I} | z( t, E ) − z( t, 0 ) | = sup_{t∈I} | z( t, E ) + A22(0)⁻¹A21(0) x( t, 0 ) |
  ≤ sup_{t∈I} | z( t, E ) + A22(0)⁻¹A21(0) x( t, E ) |
    + sup_{t∈I} | A22(0)⁻¹A21(0) ( x( t, 0 ) − x( t, E ) ) |


The proof is completed by noting that

sup_{t∈I} | A22(0)⁻¹A21(0) ( x( t, 0 ) − x( t, E ) ) | ≤ ‖A22(0)⁻¹A21(0)‖₂ O(max(E)) = O(max(E))

and

sup_{t∈I} | z( t, E ) + A22(0)⁻¹A21(0) x( t, E ) | ≤ sup_{t∈I} | z( t, E ) − L( E ) x( t, E ) | + O(max(E)) sup_{t∈I} | x( t, E ) |
  = sup_{t∈I} | η( t, E ) | + O(max(E)) sup_{t∈I} | x( t, E ) |
  = O(max(E))

since | x( t, E ) | can be bounded over any finite time interval.

An immediate consequence of theorem 6.21 is the establishment of an O(max(E22)) bound for the error introduced by neglecting E22 in (6.2). From a practical point of view, though, this observation appears to be only of minor interest since determining A22(E) (or at least A22(0)) seems necessary for any quantitative analysis of the fast and uncertain dynamics.

6.8 Extension to non-zero pointwise index

With the exceptions of some results (including lemmas 6.3, 6.4, 6.7, and 6.8), the results so far require, via lemma 6.20, that E (or E22 in (6.2)) be pointwise non-singular. However, we are able to obtain some results also when some singular values of E are exactly zero. To that end, the results of the previous section will be extended to this situation by revisiting the relevant proofs.

Since there are only finitely many choices of rank for E (that is, how many non-zero singular values there are), showing convergence for an arbitrary value of the rank immediately implies convergence independently of the rank.

6.22 Lemma. (Compare lemma 6.20.) In addition to the assumptions of lemma 6.4, assume the perturbed dae has pointwise index no more than 1, and that its poles (that is, the finite eigenvalues of the associated matrix pair) satisfy A1–[6.14]. Then,

sup_{t≥0} | E η( t, E ) | = O(max(E)²)

Proof: The case of pointwise index 0, when E is full-rank, was treated in lemma 6.20, so it remains to consider the case of pointwise index 1. When the rank of E is zero, E = 0 and it is immediately seen from (6.7) that η must be identically zero and the conclusion follows trivially. Hence, assume that the rank is neither full nor zero and


let

E = [ U1(E)  U2(E) ] [ Σ(E)  0 ] [ V1(E)ᵀ ]
                     [ 0     0 ] [ V2(E)ᵀ ]

be an SVD of E where Σ(E) is pointwise non-singular and of known dimensions.

Applying the unknown change of variables η = V(E) ( η1 ; η2 ) and the row operations represented by U(E)ᵀ, (6.7) turns into (dropping E from the notation)

[ I  0  0 ] [ ξ′(t)  ]   [ A11 + A12 L   A12 V1   A12 V2 ] [ ξ(t)  ]
[ 0  Σ  0 ] [ η1′(t) ] + [ 0             K22      K23    ] [ η1(t) ] != 0
[ 0  0  0 ] [ η2′(t) ]   [ 0             K32      K33    ] [ η2(t) ]

where, for instance and in particular,

K33 ≜ U2ᵀ A22 V2 − U2ᵀ E L A12 V2 = U2ᵀ A22 V2

(the second equality uses that U2ᵀ E = 0).

Since the dae is known to be pointwise index 1, differentiation of the last group of equations shows that K33(E) is non-singular, and hence the change of variables

[ η1(t) ]   [ I                  0 ] [ η̄1(t) ]
[ η2(t) ] = [ −K33(E)⁻¹K32(E)    I ] [ η̄2(t) ]    (6.35)

leads to the dae in ( ξ, η̄1, η̄2 ) with matrix pair

[ I  0  0 ]   [ A11 + A12 L   A12 V1 − A12 V2 K33⁻¹K32   A12 V2 ]
[ 0  Σ  0 ] , [ 0             K22 − K23 K33⁻¹K32         K23    ]
[ 0  0  0 ]   [ 0             0                          K33    ]

It is seen that η̄2 = 0 and that η̄1 is given by an ode with state feedback matrix

M_{η̄1}(E) ≜ −Σ(E)⁻¹ ( K22(E) − K23(E) K33(E)⁻¹ K32(E) )

Just like in lemma 6.20 it needs to be shown that the eigenvalues of this matrix grow as max(E)⁻¹, but here we need to recall that E is not only present in Σ(E), but also in the unknown unitary matrices U(E) and V(E).

Let m be a bound on max(E). Then ‖Σ(E)‖2 = ‖E‖2 ≤ mn.
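The entrywise bound ‖E‖₂ ≤ m n used here follows from ‖E‖₂ ≤ ‖E‖F ≤ n max(E); a quick numerical spot-check (numpy assumed):

```python
import numpy as np

# Spot-check: for an n x n matrix E, ||E||_2 <= n * max(E), where max(E)
# denotes the largest entry magnitude.
rng = np.random.default_rng(1)
for n in (2, 3, 5, 8):
    for _ in range(100):
        E = rng.uniform(-1.0, 1.0, size=(n, n))
        assert np.linalg.norm(E, 2) <= n * np.abs(E).max() + 1e-12
```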

From the partial block matrix inversion formula

[ K22(E)  K23(E) ]⁻¹   [ ( K22(E) − K23(E)K33(E)⁻¹K32(E) )⁻¹   ? ]
[ K32(E)  K33(E) ]   = [ ?                                     ? ]

it follows that

‖ [ K22(E)  K23(E) ; K32(E)  K33(E) ]⁻¹ ‖₂ ≥ ‖ ( K22(E) − K23(E)K33(E)⁻¹K32(E) )⁻¹ ‖₂


and hence

‖ ( K22(E) − K23(E)K33(E)⁻¹K32(E) )⁻¹ ‖₂
  ≤ ‖ ( U(E)ᵀ ( A22(E) − E L( E ) A12(E) ) V(E) )⁻¹ ‖₂
  = ‖ V(E)ᵀ ( A22(E) − E L( E ) A12(E) )⁻¹ U(E) ‖₂
  = ‖ ( A22(E) − E L( E ) A12(E) )⁻¹ ‖₂    (6.36)

This means that the eigenvalues are bounded from below by

‖ M_{η̄1}(E)⁻¹ ‖₂⁻¹ = ‖ ( K22(E) − K23(E)K33(E)⁻¹K32(E) )⁻¹ Σ(E) ‖₂⁻¹
  ≥ m⁻¹ n⁻¹ ‖ ( A22(E) − E L( E ) A12(E) )⁻¹ ‖₂⁻¹

and just like in lemma 6.20 the expression gives that the eigenvalues of M_{η̄1}(E) grow as m⁻¹. Before reaching the same conclusion as in lemma 6.20, it remains to show that the constant a∗ in the proof of corollary 6.19† can be chosen finite, but this also follows from (6.36). Hence, E can be chosen sufficiently small to make sup_{t≥0} | η̄1(t) | bounded by some factor times | η̄1(0) |. Further,

| η̄1(0) | = | ( η̄1(0) ; η̄2(0) ) | = | ( η1(0) ; 0 ) | ≤ | ( η1(0) ; η2(0) ) | = | η0( E ) | = O(max(E))    (6.37)

Using this, the conclusion finally follows by taking such a small bound m on max(E),

| E η( t, E ) | = | E V [ I  0 ; −K33⁻¹K32  I ] ( η̄1(t) ; 0 ) |
  ≤ ‖ U [ Σ  0 ; 0  0 ] Vᵀ V [ I  0 ; −K33⁻¹K32  I ] ‖₂ O(max(E))
  = ‖ [ Σ  0 ; 0  0 ] ‖₂ O(max(E)) = O(max(E)²)

6.23 Corollary. Lemma 6.22 can be strengthened when z has only two components. Then, just like in lemma 6.20, the conclusion is

sup_{t≥0} | η( t, E ) | = O(max(E))

Proof: The only rank of E that needs to be considered is 1, and then η̄1 will be a scalar and commute with the corresponding transition matrix φ_{η̄1}. From (6.35) and the last two equalities of (6.37) it follows that | K33(E)⁻¹ K32(E) η̄1(0) | = O(max(E)),

† The matrix A in the proof of corollary 6.19 is the trailing matrix of (6.30), here corresponding to the trailing matrix of the dae for η̄1.


and hence

sup_{t≥0} | ( η1(t) ; η2(t) ) | = sup_{t≥0} | ( η̄1(t) ; −K33(E)⁻¹ K32(E) φ_{η̄1}( t, 0 ) η̄1(0) ) |
  ≤ sup_{t≥0} | η̄1(t) | + | K33(E)⁻¹ K32(E) η̄1(0) | sup_{t≥0} ‖ φ_{η̄1}( t, 0 ) ‖₂
  = O(max(E))

It just remains to use that V(E) is orthogonal, so that | η(t) | = | ( η1(t) ; η2(t) ) |. An alternative proof is included as a footnote.‡

Theorem 6.21 can be extended as follows.

6.24 Theorem. Consider the setup (6.5), but rather than assuming that E be pointwise non-singular, it is assumed that E is a matrix with max(E) ≤ m, and that the dae has pointwise index no more than 1. Except regarding E, the same assumptions that were made in theorem 6.21 are made here. Then

sup_{t∈I} | x( t, E ) − x( t, 0 ) | = O(max(E))    (6.38x)
sup_{t∈I} | E [ z( t, E ) − z( t, 0 ) ] | = O(max(E)²)    (6.38z)

where the rather useless second equation is included for comparison with theorem 6.21.

Proof: Define L(E) and H(E) as above, and consider the solution expressed in the variables ξ and η. Lemma 6.22 shows how E η is bounded uniformly over time. Note that x( t, 0 ) coincides with ξ( t, 0 ), so the left hand side of (6.38x) can be bounded as

sup_{t∈I} | x( t, E ) − x( t, 0 ) | = sup_{t∈I} | ξ( t, E ) + H( E ) E η( t, E ) − ξ( t, 0 ) |
  ≤ sup_{t∈I} | ξ( t, E ) − ξ( t, 0 ) | + O(max(E)²)

The conclusion concerning x then follows by an identical argument to that found in the proof of theorem 6.21.

For the weak conclusion regarding E z, the relation z = L( E ) x + η together with (6.38x) and lemma 6.22 immediately yields the bound.

‡ The alternative proof uses that K33(E)⁻¹ K32(E) is a scalar, and hence of perfect condition number. It follows that

sup_{t≥0} | η2(t) | ≤ ‖ K33(E)⁻¹ K32(E) ‖₂ sup_{t≥0} | η̄1(t) |
  ≤ ‖ K33(E)⁻¹ K32(E) ‖₂ sup_{t≥0} ‖ φ_{η̄1}( t, 0 ) ‖₂ | η̄1(0) |
  ≤ sup_{t≥0} ‖ φ_{η̄1}( t, 0 ) ‖₂ ‖ K33(E)⁻¹ K32(E) ‖₂ ‖ ( K33(E)⁻¹ K32(E) )⁻¹ ‖₂ | η2(0) |

where the product of the last two norms equals 1.


The following result is reminiscent of the example concerning multiple time scale singular perturbations given in Abed and Tits (1986), see section 2.5.3. That is, this is not the first time in the history of singular perturbations that a result has been shown only in the case when the fast dynamics has dimension two. Unlike Abed and Tits (1986), however, we have not been able to show that the result holds only in this case.

6.25 Corollary. Theorem 6.24 can be strengthened in case z has only two components. Then (6.38z) is replaced by

sup_{t∈I} | z( t, E ) − z( t, 0 ) | = O(max(E))

Proof: Follows by repeating the argument of theorem 6.21 using corollary 6.23.

6.26 Remark. Regarding the failure to show convergence in z unless it has at most two components: Having excluded the possibility of bounding η by looking at the matrix exponential alone, it remains to explore the fact that we are actually not interested in knowing the maximum gain from initial conditions to later states of the trajectory of η. That is, since the initial conditions are a function of E, it might be sufficient to maximize over a subset of initial conditions. Here, it is expected that lemma 6.8 will come to use. Compare section 7.4.

6.9 Examples

The examples given here are primarily meant to illustrate the convergence property being established in this chapter. We shall consider an uncertain dae and plot trajectories of randomly selected instances of the dae which are in agreement with the assumptions proposed in this chapter. In order to make a close connection to theory, the spread of these trajectories should be related to the bounds which can be constructed from the proofs herein. However, as has been indicated above, these bounds will be overly pessimistic, and as we work them out it will be clear that they give bounds which do not fit into our plots. Again, this stresses the qualitative nature of our results.

The first example is constructed from the index 1 dae in the variable x, with matrix pair (written A over E)

( 1.3   0.17  4.6·10⁻²
  0.34  0.66  0.66
  0.87  0.83  0.14 )

( 1  1  1
  0  0  0
  0  0  0 )

with the only finite eigenvalue −2.32. By operating on the equation with random orthogonal matrices from both sides, an equally well behaved system of equations should be obtained. The rows and columns are ordered so that the best pivot entry is at the (1, 1) position — otherwise numerics would come out worse than necessary. Finally an interval uncertainty of ±1.0·10⁻² is added to each matrix entry. This results


in the pair

( 0.21±1·10⁻²        1.3±1·10⁻²         3.2·10⁻²±1·10⁻²
  0.69±1·10⁻²        0.38±1·10⁻²        0.65±1·10⁻²
  0.85±1·10⁻²        0.78±1·10⁻²        0.14±1·10⁻² )

( 1.1±1·10⁻²         0.95±1·10⁻²        0.99±1·10⁻²
  6.5·10⁻²±1·10⁻²    5.9·10⁻²±1·10⁻²    6.2·10⁻²±1·10⁻²
  −5.2·10⁻²±1·10⁻²   −4.7·10⁻²±1·10⁻²   −4.9·10⁻²±1·10⁻² )
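As a sanity check, the finite eigenvalue of the nominal pair can be recovered numerically (a sketch assuming scipy; since the printed entries are rounded, only approximate agreement with −2.32 can be expected):

```python
import numpy as np
from scipy.linalg import eigvals

# Nominal pair (A over E) as printed above, entries as displayed (rounded).
A = np.array([[1.3, 0.17, 4.6e-2],
              [0.34, 0.66, 0.66],
              [0.87, 0.83, 0.14]])
E = np.array([[1.0, 1.0, 1.0],
              [0.0, 0.0, 0.0],
              [0.0, 0.0, 0.0]])
# Eigenvalues of the pair solve det(lambda E + A) = 0, i.e. the
# generalized eigenvalue problem A v = lambda (-E) v.
lams = eigvals(A, -E)
finite = lams[np.isfinite(lams)]
assert finite.size == 1                    # E has rank 1
assert abs(finite[0].real - (-2.32)) < 0.1
```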

6.27 Example

To illustrate theorem 6.21, we first take the equations into the form (6.5), propagating interval uncertainties,

( 0.2±1.1·10⁻²    1.1±3.5·10⁻²        −0.16±2.4·10⁻²
  0.68±1.3·10⁻²   −0.31±4.8·10⁻²      8.7·10⁻³±3.6·10⁻²
  0.86±1.3·10⁻²   6.5·10⁻²±5.1·10⁻²   −0.68±3.9·10⁻² )

( 1  0                     0
  0  −1.9·10⁻⁴±2·10⁻²      −2·10⁻⁴±2.1·10⁻²
  0  1.9·10⁻⁴±2·10⁻²       1.9·10⁻⁴±2·10⁻² )    (6.39)

This gives us the bound m for max(E), and noting that what is A in (A1–6.26) will be close to A22(E) in (6.5), we get approximately the lower bound for a. Picking values for a, φ0, and with R0 = 5 (this is big enough to encompass the slow eigenvalue, as will be seen soon), the region where eigenvalues are allowed has been determined. Ignoring the O(max(E)) terms in the transform to the form (6.7), we still obtain an approximation of the interval uncertainties in (6.7), and are thus able to obtain an approximate interval for the eigenvalue of the slow dynamics. It turns out to be −2.5±1.2, so at least the uncertainties do not destroy the stability of the slow dynamics.

Next, the interval uncertainties in (6.5) are replaced by independent rectangular random variables, and the random equation is then sampled. The samples in the leading matrix are scaled such that max(E) = m (expecting the constraint max(E) = m to be active in the worst case). Samples which do not fulfill the eigenvalue condition are rejected, while valid samples are used to compute and plot the trajectory of x1 to give an idea of the uncertainty in the solution, see figure 6.5.

For comparison, we outline how the bound based on the proofs herein can be approximated. Since the uncertain leading matrix of the fast dynamics will be of the same order of magnitude as the interval uncertainty, and since this will also be the order of magnitude of the initial conditions for the fast and uncertain dynamics, the crucial question is how the initial condition response gain of the fast and uncertain dynamics can be bounded. The tightest bound is obtained with a∗ = max(A) ‖A⁻¹‖₂ n, where A is the trailing matrix of the fast and uncertain dynamics. Conservatively over-estimating each of max(A) and ‖A⁻¹‖₂ independently of the other (where the latter norm is approximated using local optimization), it is found that a∗ < 17.5. This corresponds to a gain of 4.2·10⁴. The inverse of this gain is an upper bound for the order of magnitude that can be tolerated in the original uncertainty, if the computed uncertainty estimates shall be at all usable. Being two orders of magnitude smaller than the size of the uncertainties used to generate figure 6.5, this appears overly conservative.


Figure 6.5: Random trajectories from systems in the form (6.5), satisfying the eigenvalue condition. The uncertainty intervals around each region are due to the uncertainty in the change of variables leading to (6.5).

We shall now consider the same uncertain dae once more. This time, we will bring the equations into a form where theorem 6.16 can be applied instead of corollary 6.19.

In reaching (6.39), uncertain row and column operations had to be performed on the equations. Since the elimination operations are given by simple rational expressions in the uncertain entries of the matrices, it is straightforward to compute the uncertainty in the transforms. At this point, however, it is time to apply the decoupling transforms, being given as the solution to second order matrix equations, and accurately computing the resulting interval uncertainties is non-trivial. We note the following — two approximate and one conservative — options

• Compute the nominal solution, and then find an approximate interval solution by solving the first order approximations at the nominal solution.

• Compute the nominal solution, and use it as a starting point in a local optimization method which optimizes the lower and upper interval bounds for each entry in the solution. Though a theoretically sound approach, this method has the disadvantages that it is both time consuming and relies on local optimization in possibly non-convex problems.

• Derive L and H using the same technique as is done later in section 7.A. Then outer solutions for the uncertainties follow from the constructive style of the fixed-point argument, and iterative refinement may then be applied to decrease the uncertainties while maintaining the property of being outer approximations. This technique will be applied in example 7.3 on page 194.

Since the only conservative option relies on a technique which was not used in the present chapter, we opt for one of the approximations in the present section. For the particular problem of finding the interval solution to the matrix inverse problem,


Figure 6.6: Bounds on the uncertainty in the solution; the three panels show x1, x2, and x3 against t. That the bounds do not converge as the solution components tend to zero is a consequence of the emphasis being on uniform convergence. Converging bounds are easily obtained by using the time dependency of the bound in theorem 2.27.

examples show that the two approximation approaches often produce very similar results, while the result as computed by Mathematica is distinctively more conservative. Even though we are just computing approximate solutions, we shall use the result computed by Mathematica since it presumably uses techniques which have been more carefully developed than the two approximations listed above. In case a first order approximation leads to a problem which can be solved via matrix inversion, we will do so.

Now we have the tools to proceed with the decoupling transforms.

6.28 Example

In this example, we will be able to derive bounds on the uncertainty in the solution to the coupled system, which — if not tight — are at least possible to visualize in the same plots as the nominal solution. To this end we will use smaller interval uncertainties than in the previous example; instead of ±1.0·10⁻² we add just ±1.0·10⁻⁶ to the unperturbed matrix entries. Instead of (6.39), we now obtain

( 0.2±1.1·10⁻⁶    1.1±3.5·10⁻⁶       −0.16±2.4·10⁻⁶
  0.68±1.3·10⁻⁶   −0.31±4.8·10⁻⁶     9.1·10⁻³±3.6·10⁻⁶
  0.86±1.3·10⁻⁶   6.5·10⁻²±5·10⁻⁶    −0.68±3.9·10⁻⁶ )

( 1  0                      0
  0  −1.9·10⁻¹²±2·10⁻⁶      −2·10⁻¹²±2.1·10⁻⁶
  0  1.9·10⁻¹²±2·10⁻⁶       1.9·10⁻¹²±2·10⁻⁶ )    (6.40)

Recall the equations for the decoupling transforms, (6.9) and (6.17). In the notation of these equations, knowing that L( E ) = −M22−1 M21 + O( max(E) ) and H( E ) = M12 M22−1 + O( max(E) ), the first order approximations are seen to be

0 != A21 + A22 L + E M22−1 M21 ( A11 − A12 M22−1 M21 )

0 != ( A11 + A12 L( 0 ) ) M12 M22−1 E + A12 − H ( A22 − E L( 0 ) A12 )


These equations are solved using matrix inversion, and after application of the two decoupling transforms, the pair becomes

( 2.3 ± 1.8·10−4   0                    0                   )   ( 1   0                      0                    )
( 0                −0.31 ± 1.3·10−5     9.1·10−3 ± 4.8·10−6 )   ( 0   −1.9·10−12 ± 2·10−6    −2·10−12 ± 2.1·10−6  )    (6.41)
( 0                6.5·10−2 ± 1.3·10−5  −0.68 ± 5.1·10−6    )   ( 0   1.9·10−12 ± 2·10−6     1.9·10−12 ± 2·10−6   )

As a last step, a matrix inverse is applied to the rows of the fast and uncertain dynamics to bring it into the form of theorem 6.16;

( 2.3 ± 1.8·10−4   0   0 )   ( 1   0                       0                       )
( 0                1   0 )   ( 0   6.1·10−12 ± 6.6·10−6    6.2·10−12 ± 6.7·10−6    )    (6.42)
( 0                0   1 )   ( 0   −2.2·10−12 ± 3.6·10−6   −2.3·10−12 ± 3.7·10−6   )

In the previous example, the initial conditions were never stated explicitly. Given the unperturbed system at hand, the set of initial conditions which are valid for arbitrarily small perturbations forms a one-dimensional linear space. Fixing the first component to 1 implies x0 = ( 1.  −0.923  −0.619 ). Transforming to the variables of (6.42), the initial conditions are given by

( ξ0 )   ( −0.415 ± 7.87·10−5    )
( η0 ) = ( 1.95·10−9 ± 3.4·10−4  ) ,    |η0| ≤ 4.24·10−4
         ( 1.59·10−9 ± 2.52·10−4 )

With n = 2, a = 1.1, and φ0 = 1.4, (6.29) used in theorem 2.27 gives the bound 83.7 on the gain from η0 to η(t). This allows each of the components of η(t) to be bounded by a small constant. Concerning ξ(t), it is a scalar system, so upper and lower bounds are easy to compute given the intervals of ξ0 and the corresponding eigenvalue (seen in (6.42)). Inverting the variable transforms one by one†, we are finally able to compute interval uncertainties in the original variables. The bounds are shown in figure 6.6.
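The scalar bound computation can be sketched concretely: for fixed t, ξ0 exp(λ t) is monotone in each of ξ0 and λ, so the envelope over an interval box of data is attained at the four corners. The numbers below are placeholders loosely inspired by (6.42), not the thesis values.

```python
import numpy as np

def scalar_envelope(lam_iv, x0_iv, ts):
    """Pointwise bounds on x(t) = x0*exp(lam*t) over interval data.

    For fixed t the map is monotone in x0 and in lam, so the extremes
    are attained at the four corners of the (lam, x0) box.
    """
    corners = [x0 * np.exp(lam * ts)
               for x0 in x0_iv for lam in lam_iv]
    return np.min(corners, axis=0), np.max(corners, axis=0)

ts = np.linspace(0.5, 1.5, 5)
lo, hi = scalar_envelope((0.42, 0.44), (-0.4151, -0.4149), ts)
print(np.all(lo <= hi))
```

For the vector-valued η(t) no such monotonicity is available, which is why the example instead uses the gain bound from theorem 2.27.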

6.10 Conclusions

The chapter has derived a matrix-valued singular perturbation problem related to the analysis of uncertain lti dae of nominal index 1. The perturbation problem has been solved using assumptions in terms of system features, namely its poles. Depending on whether we also made the assumption that the dae be pointwise index 0 or not, the convergence results come out somewhat differently, except when the fast and uncertain dynamics has at most two states.

The decoupling transformations related to the matrix-valued singular perturbation

† It is also possible to compute the aggregated variable transform by multiplying the individual transforms, and then compute just one inverse, but it turns out that this causes loss of precision compared to computing the inverses of the individual transforms.


problem have been analyzed using asymptotic arguments. The reason for not deploying the more constructive methods used in the following chapters has been to preserve the style of the original published work that the chapter builds upon. (Except for that, though, the style of presentation has been changed substantially for better compatibility with the following chapters.)

The problem of bounding the norm of a matrix, given a bound on the moduli of its eigenvalues and an entry-wise max bound on its inverse, has been motivated as a useful tool in analysis of uncertain dae. A bound has been derived, and its quality has been addressed in examples.

An example was used to illustrate the usefulness of analyzing the equations in a form where the uncertainties of the fast and uncertain sub-system were brought entirely to the leading matrix. This idea will appear again in the next chapter, when we set out to understand perturbed equations of nominal index 2.


Appendix

6.A Details of proof of lemma 6.8

This section proves that there exists a ρL > 0 such that the equation

0 != m RL(E) − L0( E ) A12(E) ( L( E ) − L0( E ) ) − ( L( E ) − L0( E ) ) ( A11(E) + A12(E) L( E ) )

appearing in lemma 6.8 has a solution satisfying ‖RL(E)‖2 ≤ ρL if max(E) is required to be sufficiently small.

The equation is written as the fixed-point form

RL(E) != TL( RL(E), E )

by defining

TL( RL(E), E ) ≜ (1/m) L0( E ) A12(E) ( L( E ) − L0( E ) ) + (1/m) ( L( E ) − L0( E ) ) ( A11(E) + A12(E) L( E ) )

where the dependence on RL(E) is through L( E ).

From here on, the dependency of RL(E) and TL( RL(E), E ) on E is dropped from the notation, so instead of TL( RL(E), E ) we just write TL RL. Consider RL ∈ 𝓛 = { RL : ‖RL‖2 ≤ ρL }.

By the bounded derivative of all matrices in the problem that depend on E, it follows that by requiring max(E) to be sufficiently small, it will be possible to find c3, c0, c11, c+11, c12 which fulfill

‖A22(E)−1‖2 ≤ c3
‖L0( E )‖2 ≤ c0
‖A11(E)‖2 ≤ c11
‖A11(E) + A12(E) L0( E )‖2 ≤ c+11
‖A12(E)‖2 ≤ c12

Applied to (6.18) these bounds yield

‖L( E ) − L0( E )‖2 ≤ m c3 ( c0 c+11 + m ρL )

To ensure that TL maps 𝓛 into itself, first note that

‖TL RL‖2 ≤ (1/m) ‖L( E ) − L0( E )‖2 c1

where upper bounds on m and ρL are used to ensure that c1 can be chosen to fulfill (dropping dependencies on E from the notation)

‖L0 A12‖2 + ‖A11 + A12 L0‖2 + m c3 ( c0 c+11 + m ρL ) ‖A12‖2 ≤ c1

for the lowest of all bounds imposed on max(E).

Setting

ρL ≔ ( 1 + αL ) c3 c0 c+11 c1    (6.43)

for some αL > 0, will then yield the condition

m ≤ αL c0 c+11 / ρL = ( αL / ( 1 + αL ) ) · 1 / ( c3 c1 )    (6.44)

for TL to map 𝓛 into itself.

For the contraction part of the argument, let L1 and L2 denote the expressions for L( E ) corresponding to RL,1 and RL,2, respectively. Then

( L2 − L0 ) ( A11 + A12 L2 ) − ( L1 − L0 ) ( A11 + A12 L1 )
    = ( L1 − L0 ) A12 ( L2 − L1 ) + ( L2 − L1 ) ( A11 + A12 L2 )

As (6.18) shows that L( E ) is affine in m RL with the matrix coefficient A22(E)−1 E acting from the left, one obtains

TL RL,2 − TL RL,1 = (1/m) L0 A12 A22−1 E ( m RL,2 − m RL,1 )
                  + (1/m) ( L1 − L0 ) A12 A22−1 E ( m RL,2 − m RL,1 )
                  + (1/m) A22−1 E ( m RL,2 − m RL,1 ) ( A11 + A12 L2 )
                  = L1 A12 A22−1 E ( RL,2 − RL,1 ) + A22−1 E ( RL,2 − RL,1 ) ( A11 + A12 L2 )


Using upper bounds on m and ρL to ensure that cL may be chosen to fulfill

‖L‖2 ≤ c0 + m c3 ( c0 c+11 + m ρL ) ≤ cL

one obtains (using nz to denote the dimension of E)

‖TL RL,2 − TL RL,1‖2 ≤ m nz c3 ( c11 + 2 c12 cL ) ‖RL,2 − RL,1‖2

Hence, the condition

m < 1 / ( nz c3 ( c11 + 2 c12 cL ) )    (6.45)

implies that TL is a contraction on 𝓛, and the contraction principle (theorem 2.44) gives that there is a unique solution RL ∈ 𝓛.
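The contraction principle invoked here is constructive: iterating a contraction from any starting point converges to the unique fixed point at a geometric rate. A generic sketch of that mechanism (with a stand-in scalar contraction, not the actual TL):

```python
import math

def fixed_point(T, x0, tol=1e-12, max_iter=200):
    """Banach iteration: converges whenever T is a contraction on a
    closed set containing the iterates."""
    x = x0
    for _ in range(max_iter):
        x_next = T(x)
        if abs(x_next - x) <= tol:
            return x_next
        x = x_next
    raise RuntimeError("no convergence")

# Stand-in contraction with Lipschitz constant 0.5: T(x) = 0.5*cos(x)
x_star = fixed_point(lambda x: 0.5 * math.cos(x), 0.0)
print(abs(x_star - 0.5 * math.cos(x_star)) < 1e-10)
```

The same iteration, evaluated in interval arithmetic, is what produces the refined enclosures for RL used in example 6.29 below.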

This concludes the argument, and the section ends with a small example.

6.29 Example

To illustrate lemma 6.8, let us consider a small example with constant matrices in (6.5) given by

A11 = [ 1.   0.5 ]      A12 = [ 0.1  0.5  2. ]
      [ 0.1  1.  ]            [ 0.5  0.1  1. ]

A21 = [ −1.   1.  ]     A22 = [ 0.5   1.   2.  ]
      [ 3.    0.3 ]           [ 0     1.   0.5 ]
      [ 0.5  −0.5 ]           [ −0.5  0.5  0   ]

Sampling the matrix E randomly a large number of times indicates that ρL might be as low as 3.5 · 103 for m < 1.0 · 10−3.†
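A Monte Carlo sketch of this sampling is given below, using the example matrices above. Note the hedge: the code assumes that the decoupling condition takes the form 0 = A21 + A22 L − E L (A11 + A12 L), consistent with the first-order approximation quoted earlier but not literally (6.18); it therefore only reports the deviation of L( E ) from the nominal L0, not ρL itself.

```python
import numpy as np

rng = np.random.default_rng(0)

A11 = np.array([[1.0, 0.5], [0.1, 1.0]])
A12 = np.array([[0.1, 0.5, 2.0], [0.5, 0.1, 1.0]])
A21 = np.array([[-1.0, 1.0], [3.0, 0.3], [0.5, -0.5]])
A22 = np.array([[0.5, 1.0, 2.0], [0.0, 1.0, 0.5], [-0.5, 0.5, 0.0]])

def L_of_E(E, n_iter=50):
    # Fixed-point iteration for the (assumed) decoupling condition
    # 0 = A21 + A22 L - E L (A11 + A12 L); L0 solves the E = 0 case.
    L = -np.linalg.solve(A22, A21)
    for _ in range(n_iter):
        L = -np.linalg.solve(A22, A21 - E @ L @ (A11 + A12 @ L))
    return L

m = 1e-3
L0 = -np.linalg.solve(A22, A21)
dev = max(np.linalg.norm(L_of_E(rng.uniform(-m, m, (3, 3))) - L0, 2)
          for _ in range(200))
print(dev < 5.0)   # the deviation from the nominal decoupling is O(m)
```

Extracting an estimate of ‖RL‖2 from such samples additionally requires solving (6.18) for RL given each sampled L, as described in the footnote.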

Supposing (guided by the Monte Carlo analysis) that the bounds to be derived will show that ρL ≤ 10 · 103, and requiring m to be less than 0.01, one obtains the following numeric values of the constants used in the proof of lemma 6.8.

c3 = 3.561 c0 = 7.598 c11 = 1.32 c+11 = 6.31

c12 = 2.311 c1 = 28.06 cL = 12.87

Taking αL = 0.5 yields

ρL = 7.185 · 103

and the following two bounds on m

m ≤ 3.336 · 10−3 and m < 1.54 · 10−3

These values are in accordance with the supposed bounds.

For a particular value of m, we may now use interval arithmetic fixed-point iteration to improve the bound on RL. In this example, m = 1.0 · 10−3 and five fixed-point

† The entries of E are sampled from independent uniform distributions over [ −m, m ]. For each E, we first compute L, and then solve for RL in (6.18). In case E is not full rank, it is still possible to solve for RL by using pseudo-inverse techniques.


iterations result in

RL ∈ [ −6.476·103, 6.476·103 ]   [ −1.817·102, 1.817·102 ]
     [ −5.322·103, 5.322·103 ]   [ −1.517·102, 1.517·102 ]
     [ −4.716·103, 4.716·103 ]   [ −1.331·102, 1.331·102 ]

The improvement of the entry-wise bounds is larger for smaller values of m.
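The shrinking of an enclosure under interval fixed-point iteration can be illustrated on a scalar toy contraction; plain (lo, hi) intervals without outward rounding, so this is only a sketch of the idea, not the verified arithmetic used for the table above.

```python
# Toy interval arithmetic illustrating how a fixed-point enclosure
# shrinks when each iterate is intersected with the previous one.
def mul(a, b):   # intervals represented as (lo, hi) tuples
    ps = [a[0]*b[0], a[0]*b[1], a[1]*b[0], a[1]*b[1]]
    return (min(ps), max(ps))

def T(x):        # interval extension of t -> 0.25*t + c, c in [0.1, 0.2]
    lo, hi = mul((0.25, 0.25), x)
    return (lo + 0.1, hi + 0.2)

X = (-10.0, 10.0)            # initial crude enclosure
for _ in range(5):
    lo, hi = T(X)
    X = (max(X[0], lo), min(X[1], hi))   # intersect with previous
print(X[1] - X[0] < 1.0)
```

Because the contraction factor here is 0.25, each sweep roughly quarters the enclosure width until the width of the fixed-point set itself is reached — the same mechanism that makes the refinement more effective for smaller m.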


7 LTI ODE of nominal index 2

In chapter 6 the convergence of uncertain dae of true and nominal indices at most 1 was considered. In this chapter, the nominal index 2 case will be considered. As in the nominal index 1 case, the analysis depends on the true index, and to simplify matters true indices higher than 0 will not be considered.

For many purposes, it turns out to be useful to distinguish between dae based on their index: lower index dae or higher index dae. Only index 0 and index 1 are considered low (recall that these have strangeness index 0), and these are generally considered easy to deal with in comparison with the higher indices. Hence, the current chapter opens the door to the analysis of equations which are expected to be difficult to deal with in comparison to the equations in previous chapters.

The chapter is organized as follows. In section 7.1 a canonical form for perturbed matrix pairs of nominal index 2 is proposed. Section 7.3 contains an analysis of the growth of the uncertain eigenvalues as the uncertainties tend to zero, and the initial conditions of the fast and uncertain subsystem are the topic of section 7.2. Then, in section 7.4 we take a closer look at a very small index 2 system, and we will see both that it is possible to prove convergence of solutions in this case, and that the index 2 case really is a lot harder to deal with than the lower index systems. Section 7.5 summarizes our conclusions from the chapter.

7.1 Canonical form

In chapter 6, we were able to analyze the equations without bringing the equations into a form where we could really define the size of the uncertainties; scaling rows and columns in the equation could change the absolute size of the uncertainties arbitrarily. However, (6.16) is only a small step away from

[ I     ]  ( ξ′(t) )       [ Mξξ + O( max(E) )      ]  ( ξ(t) )
[     E ]  ( η′(t) )  !=   [                     I  ]  ( η(t) )

where E is still O(m), although different compared to (6.16). It would be possible to add some more structure by, for instance, bringing Mξξ + O( max(E) ) into the form Jξξ + O( max(E) ) where Jξξ would be the Jordan form of the nominal Mξξ, but since there are many choices, and the choice has no implications for our understanding of the pair (or dae) aspects of the matrix pair, we prefer to defer the choice of structure for Mξξ + O( max(E) ). As this form can be reached from the original, coupled, equation using row and column operations which are nominally non-singular, and with uncertainties of size O(m), this may be considered a canonical form for perturbed nominal index 1 lti dae — recall how the use of this form was instrumental for the improvement in example 6.28 over example 6.27. In this section, a corresponding canonical form is derived for lti dae of nominal index 2. The theorem is formulated in terms of matrix pairs.

7.1 Theorem (Perturbed index 2 canonical form). Consider the parameterized set of uncertain matrix pairs

( E(m), A(m) ) ,  m ≥ 0

satisfying

E(m) − E0 = O(m)
A(m) − A0 = O(m)

for some point matrix pair ( E0, A0 ) of index 2.

Then there exist a number m0 > 0 and uncertain regular matrices T(m) and V(m) with

cond T(m) = O( m^0 )
cond V(m) = O( m^0 )

such that for all m ≤ m0

                      ( [ I                           ]   [ J + εA11(m)                   ] )
( E(m), A(m) ) ⊂ T(m) ( [    I                        ]   [                            I  ] ) V(m)−1
                      ( [       εE33(m)   εE34(m)     ] , [               I               ] )
                      ( [       εE43(m)   εE44(m)     ]   [       I           εA44(m)     ] )

where J is a point matrix and

εEij = O(m),  i, j ∈ { 3, 4 }
εAii = O(m),  i ∈ { 1, 4 }

Proof: See section 7.1.1.


The canonical form will first be derived by assuming that the Weierstrass form of the nominal matrix pair is known. Except for the initial step where the Weierstrass decomposition is used (see comment below), the form is derived constructively by prescribing a sequence of transformations which each bring some additional structure to the matrix pair representing the equations. Each transformation is nominally invertible, with O(m) uncertainty, m as usual being the entry-wise bound on the matrix entries in the original matrix pair. The sequence ends at a stage where the perturbed nominal index 2 nature is obvious and we are unable to add more structure using nominally invertible transformations.

What makes the Weierstrass decoupling step non-constructive is that the nominal matrix to be decomposed is typically not obvious in applications. Rather, the nominal matrix is something which is selected as a means to obtain as small perturbation bounds as possible, and it is not until the pair has been transformed into a form which reveals more of the structure in the pair that the selection should take place. Motivated by the practical shortcomings of the derivation based on the Weierstrass form, the same form will be derived again using a sequence of steps which is possible to use in applications.

Before we start, we also remark that the proposed form — similar to the Weierstrass form — may be best suited for theoretical arguments regarding perturbed matrix pairs. Once their theory is better understood, other forms based on approximate orthogonal matrix decompositions may both be more applicable (allowing for larger uncertainties) and able to deliver higher accuracy. We shall not explore such forms in this chapter, but the idea was present back in theorem 6.24.

7.1.1 Derivation based on Weierstrass decomposition

The Weierstrass canonical form (recall theorem 2.16) allows us to identify any nominal index 2 dae with a pair in the form

( T [ I     ] V−1 + E0 ,   T [ J     ] V−1 + F0 )
  ( [     N ]                [     I ]          )

where T and V are non-singular matrices, and E0 and F0 (here, non-negative superscripts are used as ornaments, while the superscript −1 denotes matrix inverse) are the uncertain perturbations satisfying max( E0 ) ≤ m and max( F0 ) ≤ m.

While the nominal index is 2, the pointwise index is generally 0, since the perturbation E0 generally makes the leading matrix non-singular. By application of T−1 from the left and V from the right, the pair transforms into

( [ I + E1_11   E1_12     ]   [ J + F1_11   F1_12     ] )
( [ E1_21       N + E1_22 ] , [ F1_21       I + F1_22 ] )

where E1 = T−1 E0 V = O(m) and F1 = T−1 F0 V = O(m) since T and V were assumed non-singular point matrices.

Since I + E1_11 = I + O(m), taking m sufficiently small will make it invertible according to corollary 2.47, and applying the inverse as a small but uncertain row operation


produces

( [ I         E2_12     ]   [ J + F2_11   F2_12     ] )
( [ E1_21     N + E1_22 ] , [ F1_21       I + F1_22 ] )

where, for instance,

F2_11 = ( I + E1_11 )−1 ( J + F1_11 ) − J = ( I + E1_11 )−1 ( F1_11 − E1_11 J ) = O(m)

It will be necessary to apply corollary 2.47 in nearly every transformation we make, so from here on we take its use for granted at any point where needed. Since the number of applications will be finite, the smallest of all imposed bounds on m will still be positive.

We are now only one near-identity row operation and one near-identity column operation from

( [ I             ]   [ J + F2_11   F3_12     ] )
( [    N + E3_22  ] , [ F3_21       I + F3_22 ] )

The O(m) property of E3 and F3 is maintained.

Since the Jordan blocks of N are of size 1 or 2, with at least one of size 2 (or the nominal index would be less than 2), there are numbers n1 and n2 such that n1 + 2 n2 equals the size of N, with n2 being the total number of off-diagonal 1 entries. Let us consider permutations of rows and columns in the pair ( N, I ) for a while. By permuting rows and columns in the same way, it is possible to bring the n1 Jordan blocks of size 1 to the lower right part of N:

( [ N1_2     ]   [ I     ] )
( [       0  ] , [     I ] )

By permuting columns in N1_2 so that columns 2, 4, … appear before columns 1, 3, …, and permuting rows so that rows 1, 3, … appear before rows 2, 4, …, one obtains the form

( [ I  0     ]   [ 0  I     ] )
( [ 0  0     ] , [ I  0     ] )
( [        0 ]   [        I ] )

and we finally swap the second and third block rows and columns to obtain

( [ I  0  0 ]   [ 0  0  I ] )
( [ 0  0  0 ] , [ 0  I  0 ] )
( [ 0  0  0 ]   [ I  0  0 ] )
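The claimed effect of the asymmetric permutation can be checked numerically for a small N; a sketch with n2 = 2 nilpotent blocks of size 2 and n1 = 1 block of size 1 (the index bookkeeping is our own construction, not taken from the thesis):

```python
import numpy as np

n1, n2 = 1, 2                 # one 1x1 and two 2x2 nilpotent blocks
size = n1 + 2 * n2
N = np.zeros((size, size))
for k in range(n2):           # N = diag([0 1; 0 0], [0 1; 0 0], 0)
    N[2 * k, 2 * k + 1] = 1.0

rows = ([2 * k for k in range(n2)] + [2 * k + 1 for k in range(n2)]
        + [2 * n2 + j for j in range(n1)])        # odd rows first
cols = ([2 * k + 1 for k in range(n2)] + [2 * k for k in range(n2)]
        + [2 * n2 + j for j in range(n1)])        # even columns first
P = np.eye(size)[rows]        # row permutation matrix
Q = np.eye(size)[:, cols]     # column permutation matrix

top_left = (P @ N @ Q)[:n2, :n2]   # should be the identity block
print(np.array_equal(top_left, np.eye(n2)))
```

Since the row and column permutations differ, the same permutations turn the trailing identity matrix into the block anti-diagonal pattern shown above.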

Using these permutations in the perturbed pair results in

[ I                               ]   [ J + F2_11   F4_12       F4_13       F4_14     ]
[    I + E4_22   E4_23   E4_24    ]   [ F4_21       F4_22       F4_23       I + F4_24 ]
[    E4_32       E4_33   E4_34    ] , [ F4_31       F4_32       I + F4_33   F4_34     ]
[    E4_42       E4_43   E4_44    ]   [ F4_41       I + F4_42   F4_43       F4_44     ]


Similar to the first transforms, we now use that I + E4_22 can be reduced to I using a matrix inverse for sufficiently small m, and then be used to eliminate below and to the right in the leading matrix.

[ I                      ]   [ J + F2_11   F5_12       F5_13       F5_14     ]
[    I                   ]   [ F5_21       F5_22       F5_23       I + F5_24 ]
[       E5_33   E5_34    ] , [ F5_31       F5_32       I + F5_33   F5_34     ]
[       E5_43   E5_44    ]   [ F5_41       I + F5_42   F5_43       F5_44     ]

Yet three more rounds of two inversions and elimination result in

[ I                      ]   [ J + F2_11   F5_12       F5_13   F6_14     ]
[    I                   ]   [ F5_21       F5_22       F5_23   I + F6_24 ]
[       E6_33   E6_34    ] , [ F6_31       F6_32       I                 ]
[       E6_43   E6_44    ]   [ F6_41       I + F6_42           F6_44     ]

[ I + E7_11   E7_12              ]   [ J + F7_11   F7_12   F7_13             ]
[ E7_21       I                  ]   [ F7_21       F5_22   F5_23   I         ]
[                E6_33   E7_34   ] , [ F7_31       F7_32   I                 ]
[                E7_43   E7_44   ]   [              I              F7_44     ]

[ I                      ]   [ J + F8_11   F8_12   F8_13             ]
[    I                   ]   [ F8_21       F8_22   F8_23   I         ]
[       E6_33   E7_34    ] , [ F7_31       F8_32   I                 ]
[       E7_43   E7_44    ]   [              I              F7_44     ]

This is the form in which the decouplings of section 7.A apply. Due to F8_21, F8_31 being O(m), also the decoupling transforms will only be O(m) away from identity transforms. After the transforms, the pair has the form

[ I                      ]   [ J + F9_11                                    ]
[    I                   ]   [              F9_22       F9_23   I + F9_24   ]
[       E6_33   E7_34    ] , [              F9_32   I + F9_33   F9_34       ]    (7.1)
[       E7_43   E7_44    ]   [          I + F9_42       F9_43   F9_44       ]

The first block to eliminate is F9_22; it takes two rounds.

[ I                              ]   [ J + F9_11                                     ]
[    I      E10_23     E10_24    ]   [                       F10_23    I + F10_24    ]
[           E6_33      E7_34     ] , [              F9_32    I + F9_33    F9_34      ]
[           E7_43      E7_44     ]   [          I + F9_42    F9_43        F9_44      ]

[ I                      ]   [ J + F9_11                                       ]
[    I                   ]   [                        F10_23     I + F10_24    ]
[       E6_33   E7_34    ] , [              F9_32     I + F11_33    F11_34     ]
[       E7_43   E7_44    ]   [          I + F9_42     F11_43        F11_44     ]


Finally, four more O(m) blocks are removed.

[ I                        ]   [ J + F9_11                                 ]
[    I                     ]   [                                  I        ]
[       E12_33   E12_34    ] , [              I + F12_33   F12_34          ]
[       E12_43   E12_44    ]   [          I   F12_43        F12_44         ]

[ I                        ]   [ J + F9_11                   ]
[    I                     ]   [                       I     ]
[       E13_33   E13_34    ] , [              I              ]    (7.2)
[       E13_43   E13_44    ]   [          I          F13_44  ]

Equation (7.2) is the proposed canonical form. At the cost of a substantial increase in the uncertainties, one could additionally obtain E13_33 and F13_44 in real Schur form. The reason they cannot be put in Jordan canonical form is that the condition number of the similarity transform must be possible to bound in order to maintain the O(m) size of the uncertainties.

7.1.2 Derivation without use of Weierstrass decomposition

When we now derive the same canonical form again, we will be able to reuse the latter part of the derivation in the previous section. The interesting part of the derivation is how to get started without knowing the nominal pair.

The notation in this section is independent of that introduced in the previous section.

Corollary 2.47 was used repeatedly in the previous section to invert perturbed identity matrices; we start by making a similar remark regarding perturbed full rank matrices. Consider the perturbed matrix X + F with no fewer rows than columns, where the nominal matrix X has full column rank, and F is a perturbation of size O(m). Then a QR decomposition brings the perturbed matrix into the form

[ R + Q1T F ]
[ Q2T F     ]

where R is an invertible point matrix, and QT F is still an O(m) perturbation. It follows by corollary 2.47 that taking m sufficiently small will allow the upper block to be inverted using a column operation of bounded condition number, leading to

[ I                      ]
[ Q2T F ( R + Q1T F )−1  ]

where the lower block is still of size O(m). Hence, a row operation of bounded condition number brings the matrix into the final form

[ I ]
[ 0 ]


The procedure of bounding m to be sufficiently small for this reduction to be possible will be applied several times in the following derivation, and we take its use for granted at any point where needed. The transposed case is analogous, and since the total number of applications is finite, the smallest of all imposed bounds on m will still be positive, and the respective products of all row and column operations will still have bounded condition numbers.
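This QR-based reduction is easy to check numerically; a sketch with random stand-in data (the matrices and the tolerance are ours, chosen only for illustration):

```python
import numpy as np

rng = np.random.default_rng(1)
m = 1e-6
X = rng.standard_normal((5, 3))       # nominal, full column rank
F = rng.uniform(-m, m, (5, 3))        # O(m) perturbation

Q, R = np.linalg.qr(X, mode="complete")
Q1, Q2 = Q[:, :3], Q[:, 3:]
top = R[:3, :] + Q1.T @ F             # R + Q1' F, invertible for small m
low = Q2.T @ F                        # still O(m)

reduced_low = low @ np.linalg.inv(top)   # column operation -> [I; O(m)]
print(np.linalg.norm(reduced_low) < 1e-3)
```

The final row operation simply subtracts `reduced_low` times the top block, which is why its condition number stays bounded as m tends to zero.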

Since the nominal matrices are unknown to us this time, we simply write the original matrix pair as

( E0, A0 )    (7.3)

where both E0 and A0 are uncertain matrices.

As always, we start by applying row operations and column permutations until we reach the form

( [ E1_11   E1_12 ]   [ A1_11   A1_12 ] )
( [         E1_22 ] , [ A1_21   A1_22 ] )

where E1_11 is non-singular while E1_22 = O(m). Typically, E1_22 will be so small that it has no entries which can be distinguished from 0 — one proceeds with row operations as long as there exist non-zero entries to pivot on. When transforms are applied to given pair data, however, there is no m which tends to zero, and E1_22 may even contain non-zero intervals as long as they are sufficiently small — rather than pivoting on a very small entry with large relative uncertainty, it may be wiser to artificially increase the uncertainty so that zero is within the interval of uncertainty in order to avoid very large uncertainties in other parts of the equations. For further discussion of how to think of O(m), see section 1.2.4.

Next, column operations are applied to yield

( [ I         ]   [ A2_11   A2_12 ] )
( [    E1_22  ] , [ A2_21   A2_22 ] )

and if A2_22 were non-singular, we know from chapter 6 that the pair can be decoupled and that there is a natural choice of nominal equations of index 1. Since we are considering dae of nominal index 2 in this section, it follows that A2_22 is singular (that is, the uncertainties allow for an instantiation of A2_22 which is singular in the ordinary sense of point matrices).

Since A2_22 is singular, it is possible to apply row and column operations which decompose A2_22 in the same way as E0 was decomposed.

( [ I                 ]   [ A2_11   A3_12   A3_13 ] )
( [    E3_22   E3_23  ] , [ A3_21   I              ] )    (7.4)
( [    E3_32   E3_33  ]   [ A3_31            F3_33 ] )

where F3_33 = O(m) (and the same remark regarding given data applies again).


If A3_31 did not have full row rank, it would be possible to row reduce to reveal a row with only small and uncertain coefficients, and the corresponding row in the leading matrix would also contain only small and uncertain entries. Hence, the unit basis vector which corresponds to this row would be in the left null space of both the leading and the trailing matrix, showing that the matrix pair is singular. This would contradict the index 2 assumption according to corollary 2.12, and it follows that A3_31 must at least have full row rank in the nominal case. It follows that there exists a positive bound on m which will make also the uncertain A3_31 have full row rank. In particular, this means that A3_31 must have at least as many columns as rows. By symmetry, it follows that A3_13 has the transposed size. If A3_13 did not have full column rank, column operations would reveal a vector in the right null space of both matrices, again showing that the pair is singular. However, we shall soon see that A3_13 having full column rank is implied by the property that the nominal index of the dae is 2.

7.2 Remark. That the nominal A3_31 has full row rank is not particularly related to index 2, but will be necessary for any finite index.

We now know the existence of column operations on A3_31 which are applied together with row operations which maintain the leading matrix (the matrices E4_ij and F4_44 introduced here are identical to matrices in the previous step, but the new notation is introduced to avoid confusing subscripts in disagreement with the block structure), yielding

( [ I                 ]   [ A4_11   A4_12   A4_13   A4_14 ] )
( [    I              ]   [ A4_21   A4_22   A4_23   A4_24 ] )
( [    E4_33   E4_34  ] , [ A4_31   A4_32   I              ] )    (7.5)
( [    E4_43   E4_44  ]   [          I               F4_44 ] )

Until now, we have not made use of the property that the nominal index of the pair be 2, but at this point the pair has enough structure to directly relate it to its index via the shuffle algorithm. Let us temporarily consider the following nominal equation.

( [ I            ]   [ A4_11   A4_12   A4_13   A4_14 ] )
( [    I         ]   [ A4_21   A4_22   A4_23   A4_24 ] )
( [       0   0  ] , [ A4_31   A4_32   I        0     ] )
( [       0   0  ]   [ 0        I        0       0     ] )

Shuffling the last two rows leads to

( [ I                       ]   [ A4_11   A4_12   A4_13   A4_14 ] )
( [    I                    ]   [ A4_21   A4_22   A4_23   A4_24 ] )
( [ A4_31   A4_32   I   0   ] , [ 0        0        0        0    ] )
( [ 0        I        0   0 ]   [ 0        0        0        0    ] )


which is row reduced to

( [ I                       ]   [ A4_11    A4_12    A4_13    A4_14  ] )
( [    I                    ]   [ A4_21    A4_22    A4_23    A4_24  ] )
( [ A4_31   A4_32   I   0   ] , [ 0         0         0         0     ] )
( [ 0        0        0   0 ]   [ −A4_21   −A4_22   −A4_23   −A4_24 ] )

Shuffling the last row shows that the nominal A4_24 will have to be non-singular for the index 2 property to hold, and we now return to the non-nominal equations. For sufficiently small m, A4_24 will be regular, enabling the following three steps to be carried out.

( [ I                 ]   [ A4_11   A4_12   A4_13   A5_14 ] )
( [    I              ]   [ A4_21   A4_22   A4_23   I      ] )
( [    E4_33   E5_34  ] , [ A4_31   A4_32   I              ] )
( [    E4_43   E5_44  ]   [          I               F5_44 ] )

( [ I   E6_12         ]   [ A6_11   A6_12   A6_13          ] )
( [     I             ]   [ A4_21   A4_22   A4_23   I      ] )
( [    E4_33   E5_34  ] , [ A4_31   A4_32   I              ] )
( [    E4_43   E5_44  ]   [          I               F5_44 ] )

( [ I                 ]   [ A6_11   A7_12   A6_13          ] )
( [    I              ]   [ A4_21   A7_22   A4_23   I      ] )
( [    E4_33   E5_34  ] , [ A4_31   A7_32   I              ] )    (7.6)
( [    E4_43   E5_44  ]   [          I               F5_44 ] )

Here, the form where the decoupling transforms of section 7.A apply has been reached, and the remaining steps towards the canonical form are exactly the same as in the previous section.
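The shuffle-algorithm test used above can be sketched for point matrices. The implementation below is our own illustration (SVD-based row reduction rather than the elimination used in the text): rows where E is rank deficient correspond to algebraic equations, and "shuffling" replaces them by the corresponding rows of A, i.e. differentiates them; the number of shuffles needed to make E non-singular is the index.

```python
import numpy as np

def shuffle_index(E, A, tol=1e-9, max_index=10):
    """Differentiation index of a regular pencil via the shuffle
    algorithm: transform rows so the singular rows of E vanish, then
    replace them by the corresponding rows of A."""
    E, A = E.astype(float), A.astype(float)
    n = E.shape[0]
    for index in range(max_index + 1):
        U, s, _ = np.linalg.svd(E)
        r = int(np.sum(s > tol))
        if r == n:
            return index
        E, A = U.T @ E, U.T @ A      # rows r..n-1 of E are now ~0
        E[r:], A[r:] = A[r:], 0.0    # shuffle the algebraic rows
    raise ValueError("index exceeds max_index")

# A single nilpotent Weierstrass block of size 2 has index 2:
E = np.array([[0.0, 1.0], [0.0, 0.0]])
A = np.eye(2)
print(shuffle_index(E, A))   # -> 2
```

For the uncertain pairs of this chapter the same test has to be phrased nominally, exactly as done with the temporary nominal equation above.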

Note that, in the previous section, both the regularity and the nominal index of the matrix pair were ensured by using the Weierstrass canonical form as a starting point. In the current section, these properties were added as requirements along the way in order to be able to proceed with the reduction towards the canonical form. The regularity property will always be necessary in order to rule out systems where the uncertainties have destroyed the nominal solution. The restriction to systems of nominal index at most 2, on the other hand, would make sense to relax.

7.1.3 Example

As an illustration of the two approaches to the canonical form, we shall use the Weierstrass form to construct an example of nominal index 2, and then use the constructive approach to rediscover the nominal structure. Among other things, the example


will show that the O(m) of the theoretical development can be turned into concrete quantities when our techniques are applied to data.

Due to space constraints, we are unable to present matrix entries to sufficient precision to make it possible to repeat the computations. The data is given in section 7.B, but readers interested in the full precision will receive the complete computation as a Mathematica notebook upon request.

7.3 Example

In order to avoid trivial dimensions, let us take as example a matrix pair in Weier-strass form (recall theorem 2.16) with

Eigenvalue    Size of Jordan block
0             1
−0.5          2
∞             1
∞             1
∞             2
∞             2

That is, the slow dynamics has 3 states, the index of the pair is 2, there are 2 index 1 variables, and 2 index 2 subsystems with 2 variables each. The Weierstrass form is multiplied by random matrices from the left (condition number 2.7) and right (condition number 1.9), so that a pair with known eigenvalue structure, but no visible structure, is obtained. Finally, an interval uncertainty of ±1.0 · 10−9 is added to each entry. The added uncertainty may be strikingly small, but this is necessitated by the conservative estimates we will make to obtain the initial outer solution in the first decoupling step. The resulting matrix pair is shown in table 7.2 on page 222.

Carrying out the transformation steps given in section 7.1.2, the matrix pair in table 7.3 is obtained, along with two chains of uncertain transformation matrices (one operating from the left, and one from the right). Multiplying together the factors in each chain, the condition numbers can be bounded as 40.0 (left) and 69.5 (right).

Regarding the two decoupling steps, implemented based on the derivation in section 7.A.1 (applied to the transposed matrices in the second step), it is the first one which turns out critical here, requiring the uncertainties to be very small to ensure that the transformation is valid. As the O(m) expressions in the derivation are replaced by interval quantities in computations, there is no use of the parameter m, and it may — without loss of generality — be set to 1. The items below provide some insight into the computations.

• Bounding constants: cL0 ≔ 1.00 · 10−3, cL ≔ 13.0 · 100, cE ≔ 9.03 · 10−7, αL ≔ 8.36 · 10−2, ρL ≔ 1.41 · 10−2.

• Constraint on m: 1 ≤ 1.55.

• After a few rounds of iterative refinement (using (7.35) solved with respect to RL by inverting L), the uncertainty ‖RL‖2 comes down to 3.59 · 10−3.


The rather large potential for improvement of the initial outer solution for RL indicates that there is also potential for relaxing the constraint on m.

To verify that the decomposition is valid, the transformation chains are applied to the pair in the canonical form, which shall result in a matrix pair containing the original matrix pair. Consider the leading matrix, given by an expression like

$$T_1 T_2 \cdots T_n \; E \; V_m \cdots V_2 V_1$$

For point matrices, the order in which the multiplications are carried out would not matter, but as an illustration of the nature of interval arithmetic, we compare two different ways of carrying them out.

• Collapse the transformation matrices first, with multiplication associating to the left, that is

$$\bigl[ \bigl( ( T_1 T_2 ) \cdots T_n \bigr) E \bigr] \bigl( ( V_m \cdots V_2 ) V_1 \bigr)$$

The result is shown in table 7.4, and completely includes the original matrix pair.

• Apply the transformation matrices one by one, that is

$$\bigl[ \bigl[ \bigl[ \bigl( T_1 ( T_2 \cdots ( T_n E ) ) \bigr) V_m \bigr] \cdots V_2 \bigr] V_1 \bigr]$$

The result is shown in table 7.5, and completely includes the original matrix pair as well as the pair in table 7.4.

The difference is remarkable; the median interval width in table 7.5 is approximately 7 times that in table 7.4, and the ratio between the widest intervals is near 30. It is a topic for future research to investigate whether the collapsed transformation matrices can be given even higher accuracy by iterative refinement methods such as forward-backward propagation (Jaulin et al., 2001, section 4.2.4).
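The effect of the association order can be reproduced with a minimal interval-arithmetic sketch. This is a toy stand-in for the actual computations; the sample matrices and radii below are made up for illustration, and no outward rounding is performed:

```python
# Minimal interval arithmetic, illustrating that the association order of
# interval matrix products affects the width of the resulting enclosure.
class Interval:
    def __init__(self, lo, hi):
        self.lo, self.hi = min(lo, hi), max(lo, hi)
    def __add__(self, other):
        return Interval(self.lo + other.lo, self.hi + other.hi)
    def __mul__(self, other):
        ps = [self.lo * other.lo, self.lo * other.hi,
              self.hi * other.lo, self.hi * other.hi]
        return Interval(min(ps), max(ps))
    def width(self):
        return self.hi - self.lo
    def contains(self, x):
        return self.lo <= x <= self.hi

def matmul(A, B):
    # Interval matrix product; each entry is a sum of interval products.
    n = len(A)
    return [[sum((A[i][k] * B[k][j] for k in range(n)), Interval(0.0, 0.0))
             for j in range(n)] for i in range(n)]

def widen(M, r):
    # Inflate every entry of a point matrix to an interval of radius r.
    return [[Interval(x - r, x + r) for x in row] for row in M]

T1 = widen([[1.0, 0.5], [-0.3, 1.2]], 1e-3)
T2 = widen([[0.8, -0.4], [0.6, 1.1]], 1e-3)
E  = widen([[0.2, 1.0], [1.3, -0.7]], 1e-3)

left  = matmul(matmul(T1, T2), E)   # collapse the transformations first
right = matmul(T1, matmul(T2, E))   # apply them one by one

w_left  = max(left[i][j].width()  for i in range(2) for j in range(2))
w_right = max(right[i][j].width() for i in range(2) for j in range(2))
print(w_left, w_right)
```

Both orders yield valid enclosures of the point product, but the sub-distributivity of interval arithmetic makes their widths differ in general, which is the effect observed between tables 7.4 and 7.5.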

7.2 Initial conditions

We now turn to the question whether nominally consistent initial conditions of the original coupled system imply that the initial conditions of the fast and uncertain subsystem tend to zero with m.

For the purposes of this section, the transformations leading to the canonical form are divided into three groups. The variables of the original, coupled, form (7.3) are denoted x,

$$E_{xx}\, x' + A_{xx}\, x \overset{!}{=} 0 \tag{7.7}$$


The first group of transformations brings us to the form (7.6), where variables are denoted

$$\begin{pmatrix} x \\ v \end{pmatrix} = \begin{pmatrix} x \\ v_1 \\ v_2 \\ v_3 \end{pmatrix}$$

The corresponding dae manifests the lti matrix-valued singular perturbation form of the present chapter (compare (6.5)), written

$$\begin{bmatrix} I & \\ & E_{vv} \end{bmatrix} \begin{pmatrix} x' \\ v' \end{pmatrix} + \begin{bmatrix} A_{xx} & A_{xv} \\ A_{vx} & A_{vv} \end{bmatrix} \begin{pmatrix} x \\ v \end{pmatrix} \overset{!}{=} 0 \tag{7.8}$$

It is easy to check that the variable transforms have $O(m^0)$ norm, so

$$\left| \begin{pmatrix} x \\ v \end{pmatrix} \right| = O(\, |x| \,)$$

The second group of transforms are the decoupling transforms, leading to the form (7.1), with variables

$$\begin{pmatrix} \xi \\ \eta \end{pmatrix} = \begin{pmatrix} \xi \\ \eta_1 \\ \eta_2 \\ \eta_3 \end{pmatrix}$$

belonging to

$$\begin{bmatrix} I & \\ & E_{\eta\eta} \end{bmatrix} \begin{pmatrix} \xi' \\ \eta' \end{pmatrix} + \begin{bmatrix} A_{\xi\xi} & \\ & A_{\eta\eta} \end{bmatrix} \begin{pmatrix} \xi \\ \eta \end{pmatrix} \overset{!}{=} 0$$

Here, $v = L(m)\, x + \eta$ relates η to the variable before the transforms, where $L(m)$ is the matrix analyzed in section 7.A.1.

Finally, the last group of transforms, which only operates on the last three block rows and columns, leads to the form (7.2), with variables

$$\begin{pmatrix} \bar\xi \\ \bar\eta \end{pmatrix} = \begin{pmatrix} \bar\xi \\ \bar\eta_1 \\ \bar\eta_2 \\ \bar\eta_3 \end{pmatrix}$$

and

$$\begin{bmatrix} I & \\ & \bar E_{\eta\eta} \end{bmatrix} \begin{pmatrix} \bar\xi' \\ \bar\eta' \end{pmatrix} + \begin{bmatrix} \bar A_{\xi\xi} & \\ & \bar A_{\eta\eta} \end{bmatrix} \begin{pmatrix} \bar\xi \\ \bar\eta \end{pmatrix} \overset{!}{=} 0 \tag{7.9}$$

Again, it is easy to check that the variable transforms have $O(m^0)$ norm, so

$$| \bar\eta | = O(\, |\eta| \,)$$

and hence we shall focus on the initial conditions for η in the rest of this section.

In section 7.A.1 it was shown that $E_{vv}\, L = O(m)$, and it follows that

$$A_{\eta\eta} = A_{vv} - E_{vv}\, L\, A_{xv} = A_{vv} + O(m)$$


We now end this section with a variation of lemma 6.4, establishing that the initial conditions of the uncertain system tend to zero with m. The initial conditions for η, given by $\eta(0) = v_0 - L\, x_0$, depend on the choice of $x_0$ and $v_0$, and become uncertain due to the uncertainty in L. Note that it is generally not possible to establish convergence of the solutions if $x_0$ and $v_0$ are set to fixed values without consideration of the algebraic relations imposed by the dae. Rather, an integrator for dae needs freedom to select suitable initial conditions from some region or in the neighborhood of some “guess” that a user may provide.

7.4 Lemma. Consider the fast and uncertain subsystem in (7.9), obtained from (7.7) using the transformations of section 7.1. Allow for O(m) uncertainty in equation coefficients as well as initial conditions. The initial conditions satisfy $| \bar\eta_0 | = O(m)$ if and only if

$$\left| A_{vx}\, x_0 + A_{vv}\, v_0 \right| \overset{!}{=} O(m) \tag{7.10}$$

Proof: The statement may be proved for η instead of $\bar\eta$ without loss of generality. For the sufficiency, consider the expression for $\eta_0$,

$$\eta_0 = v_0 - L\, x_0 = A_{vv}^{-1} \left( A_{vx}\, x_0 + A_{vv}\, v_0 \right) - m R_L\, x_0 \tag{7.11}$$

Here, $\| A_{vv}^{-1} \|_2$ can be bounded by taking m sufficiently small, so the implication follows.

For the necessity, rearranging (7.11), we find that

$$A_{vx}\, x_0 + A_{vv}\, v_0 = A_{vv} \left( \eta_0 + m R_L\, x_0 \right)$$

from which the O(m) size of (7.10) follows.

7.5 Corollary. The degrees of freedom in the assignment of initial conditions under (7.10) can be expressed already at the stage of (7.4). Denoting the variables belonging to this form $\begin{pmatrix} x & v_1 & v_2 \end{pmatrix}$, and writing $A_{v_2 x}$ in place of $A^3_{31}$, the condition may be expressed as

$$\left| A_{v_2 x}\, x_0 \right| \overset{!}{=} O(m)$$

leaving no degrees of freedom for $v_1$ and $v_2$. Equations to determine $v_1$ and $v_2$ in terms of $x_0$ are available in (7.5).

Proof: The statements can be verified by first checking that the involved block rows are not changed by row operations between the stage of use and (7.6). Then, using $F^3_{33}$ in (7.4), the condition on $x_0$ follows. That there are no degrees of freedom for the remaining initial states, and that they can be determined from (7.5), follows immediately from the non-singularity of

$$\begin{bmatrix} A^4_{23} & A^4_{24} \\ & I \end{bmatrix}$$


7.3 Growth of eigenvalues

This section amounts to some tedious bookkeeping of the characteristic polynomial of the pair belonging to the η dynamics (compare (7.2)). Since we are only concerned with the pair in the final form in this section, we drop the superscripts on the symbols,

$$\begin{bmatrix} I & & \\ & E_{33} & E_{34} \\ & E_{43} & E_{44} \end{bmatrix} \eta'(t) + \begin{bmatrix} & & I \\ & I & \\ I & & F_{44} \end{bmatrix} \eta(t) \overset{!}{=} 0 \tag{7.12}$$

The proofs are not difficult to understand, but may take some time to read.

We begin by stating some properties of the characteristic polynomial of (7.12) with the nominal trailing matrix;

$$\begin{bmatrix} I & & \\ & E_{33} & E_{34} \\ & E_{43} & E_{44} \end{bmatrix} \eta'(t) + \underbrace{\begin{bmatrix} & & I \\ & I & \\ I & & 0 \end{bmatrix}}_{\eqqcolon A_{\eta\eta}(0)} \eta(t) \overset{!}{=} 0 \tag{7.13}$$

For future reference, the corresponding determinant is written separately in the variable λ

$$\det \begin{bmatrix} \lambda I & & I \\ & \lambda E_{33} + I & \lambda E_{34} \\ I & \lambda E_{43} & \lambda E_{44} \end{bmatrix} \tag{7.14}$$

We now make three observations. First, the modulus of a product of k entries from E can be written as $\varepsilon^k$ for some $\varepsilon \le \max(E)$. Second, the term in the characteristic polynomial which is free of factors from λ and E is the determinant of the trailing matrix. Third, with every factor from E follows a factor λ. Hence, the characteristic polynomial expanded as a sum of monomials can be written

$$\underbrace{\det \begin{bmatrix} & & I \\ & I & \\ I & & 0 \end{bmatrix}}_{\eqqcolon D} + \sum_i \sigma_i\, \lambda^{m_i} \varepsilon_i^{n_i} \tag{7.15}$$

where $|D| = 1$, $|\sigma_i| = 1$, $|\varepsilon_i|$ is some number smaller than $\max(E)$, and $m_i \ge n_i \ge 1$. (The number of terms in the sum over i will depend on the matrix block dimensions.)

The ratios $\frac{m_i}{n_i}$ turn out to be important, and in particular we will need a good upper bound.

7.6 Lemma. The ratios $\frac{m_i}{n_i}$ are bounded from above by 2, and this bound is generally attained.

Proof: Please refer to (7.14) during the following argument.


Since we are trying to maximize the power of λ relative to the power of $\varepsilon_i$, we are interested in the terms in the determinant which contain one or more factors from the upper left block. From the structure of the matrix, with identity matrices in the lower left and upper right blocks, any selection of n factors from the diagonal upper left block will remove the corresponding rows and columns from the lower and right block rows and columns of the remaining determinant factor. The lowest order term in this remaining determinant will be a product containing all the positions along the I in the middle block, and hence all remaining factors must come from the lower right block. Adding up, the lowest order terms containing n factors from the upper left block will be of the form

$$\lambda^n \prod_i (E_{33})_{ii} + \sum_{I \subset \mathbb{N}_n} \lambda^n \det\bigl( (E_{44})_{II} \bigr)$$

where the last sum is over all minors with symmetric choice of rows and columns. Hence, the generality of the construction depends on at least one of these sums being non-zero.

To see that 2 is an upper bound, note that the ratio is made big by including factors with a λ that do not come with an entry from E. Such factors only exist in the upper left block, but it is clear from the argument above that for every such pure factor λ, one factor from the corresponding row of $\begin{bmatrix} \lambda E_{43} & \lambda E_{44} \end{bmatrix}$ must also be in the product. The remaining factors in the term will be from the middle block row, where there are no factors λ that do not come with an entry from E, and hence the power of ε will always be at least twice the power of λ.

Next, the lemma is used to show that the eigenvalues of the uncertain subsystem must grow as $\max(E) \to 0$.

7.7 Theorem. There exists a constant $k > 0$ such that $|\lambda| \ge k \max(E)^{-1/2}$.

Proof: Let $n_\eta$ be the dimension of η (and hence also of $\bar\eta$), and hence the degree of the characteristic polynomial (7.15). Lemma 7.6 allows us to write $n_i = m_i/2 + r_i$ for some $r_i \ge 0$. Rearranging the characteristic polynomial as a sum of monomials in λ, we obtain

$$D + \sum_{d=1}^{n_\eta} \lambda^d \underbrace{\sum_{i\,:\,m_i = d} \sigma_i\, \varepsilon_i^{m_i/2 + r_i}}_{\eqqcolon a_d}$$

Using $\varepsilon_i \le \max(E)$ and assuming $\max(E) \le 1$ (so that $| \varepsilon_i^{r_i} | \le 1$), the coefficients may be bounded as

$$| a_d | \le \max(E)^{d/2} \underbrace{\sum_{i\,:\,m_i = d} 1}_{\eqqcolon \alpha_d}$$


where $\alpha_d \in \mathbb{N}$ only depends on the matrix block dimensions. Dividing the polynomial by $\lambda^{n_\eta}$, and writing it as

$$D\, (\lambda^{-1})^{n_\eta} + \sum_{d=0}^{n_\eta - 1} a_{n_\eta - d}\, (\lambda^{-1})^d$$

an upper bound on $| \lambda^{-1} |$ may be obtained using theorem 2.52. It gives that $| \lambda^{-1} |$ is bounded by 2 times the maximum of the expressions

$$\left| \frac{a_d}{D} \right|^{1/d} \le \max(E)^{1/2} \left| \frac{\alpha_d}{D} \right|^{1/d}, \quad d = 1, \ldots, n_\eta - 1$$

$$\left| \frac{a_{n_\eta}}{2 D} \right|^{1/n_\eta} \le \max(E)^{1/2} \left| \frac{\alpha_{n_\eta}}{2 D} \right|^{1/n_\eta}$$

Inverting the bound, the proof is completed by taking

$$k = \frac{1}{2} \min\left( \left\{ \left| \frac{D}{\alpha_d} \right|^{1/d} \right\}_{d=1}^{n_\eta - 1} \cup \left\{ \left| \frac{2 D}{\alpha_{n_\eta}} \right|^{1/n_\eta} \right\} \right)$$

This was for the case of the nominal trailing matrix. Note that typical perturbation theory (for instance, Stewart and Sun (1990, chapter vi)) may be hard to apply here, since we only have knowledge of the eigenvalue magnitudes so far, and only care about the magnitudes. Typical perturbation analyses will study the perturbations of the eigenvalues themselves, but these perturbations may actually be large in the present situation without conflicting with our needs. So, instead of trying to use existing eigenvalue perturbation theory, we consider the characteristic equation again, this time with perturbed coefficients.

The perturbed determinant $\det[\, \lambda E - (A + m F) \,]$, where $\max(F) = O( \max(E)^0 )$, can be rewritten

$$\begin{aligned} \det[\, \lambda E - (A + m F) \,] &= \det\left[ (A + m F) A^{-1} A (A + m F)^{-1} [\, \lambda E - (A + m F) \,] \right] \\ &= \det\left[ I + m F A^{-1} \right] \det\left[ \lambda\, A (A + m F)^{-1} E - A \right] \\ &= \det\left[ I + m F A^{-1} \right] \det\left[ \lambda \left[ I - m F (A + m F)^{-1} \right] E - A \right] \end{aligned}$$

Here,

$$\left\| (A + m F)^{-1} \right\|_2 = \left\| \begin{bmatrix} -F_{44} & & I \\ & I & \\ I & & \end{bmatrix} \right\|_2 = O(m^0)$$

Hence, bounding $\left| \det\left[ I + m F A^{-1} \right] \right|$ from below makes it possible to regard

$$\left[ I - m F (A + m F)^{-1} \right] E$$

as a new unstructured uncertainty which is still O(m). If the special structure of the leading matrix had not been used in theorem 7.7, it would have been possible to apply it again. However, the use of the blocked structure of the leading matrix does not permit this, and even though the proof of the next theorem is far from as elegant as the idea that we just ruled out, it does the job.

7.8 Lemma. Let the uncertainties in (7.12) be bounded by m by setting

$$m \coloneqq \max\{\, \max(E),\ \max(F) \,\}$$

The characteristic polynomial can then be written

$$\underbrace{\det \begin{bmatrix} & & I \\ & I & \\ I & & F_{44} \end{bmatrix}}_{\eqqcolon D'} + \sum_i \sigma'_i\, \lambda^{m_i} \varepsilon_i^{n_i}$$

where $|D'| = 1 + O(m)$, $| \sigma'_i | = 1$, $| \varepsilon_i | < m$, and $m_i \ge n_i \ge 1$.

Just as for the case of nominal trailing matrix ($F_{44} = 0$), it holds that $m_i \le 2\, n_i$.

Proof: Compare lemma 7.6. The characteristic polynomial is now given by the determinant

$$\det \begin{bmatrix} \lambda I & & I \\ & \lambda E_{33} + I & \lambda E_{34} \\ I & \lambda E_{43} & \lambda E_{44} + F_{44} \end{bmatrix}$$

Trying to construct monomials in the determinant with as high degree in λ relative to the degree in the entries of E and F leads to the same reasoning as in lemma 7.6. That is, with any pure factor λ included in the monomial, there must be one factor from the block $\begin{bmatrix} \lambda E_{43} & \lambda E_{44} + F_{44} \end{bmatrix}$. The only difference from the previous case is that the included factor may be of the form $\lambda e + f$ instead of just $\lambda e$ this time. Hence, there will be more monomials than before, but the added ones will never be among those which maximize the degree in λ relative to the degree in the entries of E and F. Hence, the old result of lemma 7.6 obtains.

7.9 Example
For the matrix block dimensions corresponding to a nominal system with 3 index 1 subsystems and 4 index 2 subsystems, the dimension of η is 1 · 3 + 2 · 4 = 11, and depending on whether F is included or not, the characteristic polynomial is characterized by the numbers in table 7.1. It is seen that the table is in agreement with lemma 7.8.

7.10 Corollary. Let the uncertainties in (7.12) satisfy

$$\max(E) = O(m), \qquad \max(F) = O(m)$$

Then there are constants $k > 0$ and $m_0 > 0$ such that for $m < m_0$, $|\lambda| \ge k\, m^{-1/2}$ for every eigenvalue λ belonging to (7.12).


F = 0:

 d   2 min{ ni : mi = d }    αd
 1            2                 3
 2            2                10
 3            4                30
 4            4                84
 5            6               204
 6            6               456
 7            8              1008
 8            8              1464
 9           10              3240
10           12              2160
11           14              5040

F ≠ 0:

 d   2 min{ ni : mi = d }    αd
 1            2                 7
 2            2                34
 3            4               138
 4            4               492
 5            6              1524
 6            6              3984
 7            8              8592
 8            8             14424
 9           10             17640
10           12             13680
11           14              5040

Table 7.1: Data for example 7.9. Compare the proof of theorem 7.7.

Proof: The O(m) size of the uncertainties implies the existence of a constant $m_0 > 0$ and numbers $l_E$ and $l_F$ such that $m \le m_0$ implies

$$\max(E) \le l_E\, m, \qquad \max(F) \le l_F\, m$$

By defining $m' \coloneqq \max\{ l_E, l_F \}\, m$, and repeating the proof of theorem 7.7 with $m'$ in place of $\max(E)$, it is seen that there exists $k' > 0$ such that $|\lambda| \ge k' (m')^{-1/2} = k' \max\{ l_E, l_F \}^{-1/2} m^{-1/2}$. Hence, setting $k \coloneqq k' \max\{ l_E, l_F \}^{-1/2}$ completes the proof.

7.4 Case study: a small system

At this point, we have established some results for index 2 lti dae that are reminiscent of the results obtained for index 1 lti dae in chapter 6. Unfortunately, this route comes to an end here. We shall investigate the reasons for this in the present section, but as a first sign of the difficulties which arise for index 2 systems, note that the state feedback matrix of (7.12) written as an ode will inevitably grow as $\max(E)^{-1}$, while the eigenvalues of this system have only been shown to grow at least as $\max(E)^{-1/2}$. Hence, it appears hard to establish a bound like (6.29) which allows the transition matrix of the η subsystem to be bounded by a constant. Indeed, we shall soon see that no such bound exists.

Consider the smallest possible perturbed nominal index 2 fast and uncertain subsystem,

$$\begin{bmatrix} 1 & \\ & e \end{bmatrix} \eta' + \begin{bmatrix} & 1 \\ 1 & f \end{bmatrix} \eta \overset{!}{=} 0 \tag{7.16}$$

where both $|e|$ and $|f|$ are O(m).


7.4.1 Eigenvalues

The characteristic polynomial in λ is

$$e\, \lambda^2 + f\, \lambda - 1 = e \left[ \left( \lambda + \frac{f}{2 e} \right)^2 - \left( \frac{f}{2 e} \right)^2 - e^{-1} \right] \tag{7.17}$$

For both eigenvalues to be in the left half plane, it is required that e and f have equal signs. For complex conjugate eigenvalues $-e^{-1} = \lambda_1 \lambda_2 > 0$, implying $e < 0$. For real poles, stability requires

$$-\frac{f}{2 e} + \sqrt{ \left( \frac{f}{2 e} \right)^2 + e^{-1} } < 0$$

which also simplifies to $e < 0$. Hence, both e and f must always be negative for the eigenvalues to be in the left half plane.

For complex conjugate poles,

$$|\lambda| = (-e)^{-1/2}, \qquad \operatorname{Re} \lambda = -\frac{f}{2 e}$$

Since the modulus does not depend on f, f will only affect the argument of complex conjugate eigenvalues, and when

$$-\frac{f}{2 e} \le -(-e)^{-1/2}$$

the eigenvalues will become real. This condition simplifies to

$$-f \ge 2\, (-e)^{1/2} \tag{7.18}$$
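The boundary (7.18) between complex and real eigenvalues can be checked against the sign of the discriminant of (7.17); the parameter values below are arbitrary, chosen only for illustration:

```python
import math

# The eigenvalues of e*l^2 + f*l - 1 (with e, f < 0) are real exactly when
# the discriminant f^2 + 4*e is non-negative, i.e. when -f >= 2*sqrt(-e).
for e in [-1e-2, -1e-4, -1e-6]:
    fb = -2.0 * math.sqrt(-e)             # boundary value of f from (7.18)
    assert abs(fb * fb + 4 * e) < 1e-12   # discriminant vanishes at the boundary
    assert (0.9 * fb) ** 2 + 4 * e < 0    # |f| slightly smaller: complex pair
    assert (1.1 * fb) ** 2 + 4 * e > 0    # |f| slightly larger: real pair
```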

In case the eigenvalues are complex, the $m^{-1/2}$ growth is obvious, but in the case of real roots we resort to theorem 2.52. Applied to

$$-1\, (\lambda^{-1})^2 + f\, (\lambda^{-1}) + e \overset{!}{=} 0$$

the theorem immediately provides

$$|\lambda| \ge \min\left\{ \frac{1}{2\, |f|},\ \frac{1}{|e|^{1/2}} \right\}$$

which proves the $m^{-1/2}$ growth regardless of the poles being complex conjugates or not.
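This lower bound is easy to probe numerically; the sketch below samples random negative e and f (a sanity check, not a proof):

```python
import cmath
import random

random.seed(0)
# Probe |lambda| >= min( 1/(2|f|), 1/|e|^(1/2) ) for the roots of
# e*l^2 + f*l - 1 over random negative parameters.
for _ in range(1000):
    e = -10.0 ** random.uniform(-8.0, -1.0)
    f = -10.0 ** random.uniform(-8.0, -1.0)
    disc = cmath.sqrt(f * f + 4 * e)
    roots = [(-f + disc) / (2 * e), (-f - disc) / (2 * e)]
    bound = min(1.0 / (2.0 * abs(f)), 1.0 / abs(e) ** 0.5)
    assert all(abs(lam) >= (1.0 - 1e-9) * bound for lam in roots)
```

In the complex case the moduli equal $|e|^{-1/2}$, and in the real case the smaller root modulus is at least $1/|f|$, so the bound holds with a factor-2 margin there.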

As usual, we make the assumption that there is a $\varphi_0 < \pi/2$ such that $\arg(-\lambda) \le \varphi_0$. Since the argument is given by the ratio between imaginary and real parts, this effectively puts an upper bound on $|f|$ in relation to $|e|$ via

$$\cos(\varphi_0) \le \frac{-\operatorname{Re} \lambda}{|\lambda|} = \frac{f}{2 e}\, (-e)^{1/2}$$

That is,

$$|f| = -f \ge 2\, (-e)^{1/2} \cos(\varphi_0)$$

In case they are real, the characteristic polynomial is written

$$e \left( \lambda + (-e)^{-1/2}\, r \right) \left( \lambda + (-e)^{-1/2}\, r^{-1} \right)$$

where $r \ge 1$ must satisfy $(-e)^{-1/2} ( r + r^{-1} ) \overset{!}{=} \frac{f}{e}$, that is, r is the bigger solution to

$$r + r^{-1} \overset{!}{=} \frac{-f}{(-e)^{1/2}} \ge 2 \tag{7.19}$$

where the left hand side is increasing for $r \ge 1$, but r can also be found directly from the expression for the smaller eigenvalue,

$$r = -\frac{1}{2}\, (-e)^{-1/2} f + \frac{1}{2} \sqrt{ \frac{f^2 + 4 e}{-e} } \tag{7.20}$$
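A quick numerical consistency check of (7.20) against (7.19); the sample parameter values are arbitrary, chosen to give real eigenvalues:

```python
import math

# r from (7.20) should satisfy r >= 1 and r + 1/r = -f / sqrt(-e), i.e. (7.19).
for e, f in [(-1e-4, -0.1), (-1e-6, -0.05), (-0.01, -0.5)]:
    assert f * f + 4 * e >= 0          # real eigenvalues
    r = -0.5 * (-e) ** -0.5 * f + 0.5 * math.sqrt((f * f + 4 * e) / -e)
    assert r >= 1.0
    assert abs(r + 1.0 / r - (-f) * (-e) ** -0.5) < 1e-9
```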

In chapter 6, A1–[6.14] also imposed the bound

$$|\lambda|\, m < a \tag{7.21}$$

on the eigenvalue moduli. We do not make this assumption yet in the current case study, but we just note what it would imply. For complex eigenvalues, it would imply

$$\frac{1}{(-e)^{1/2}} = \max |\lambda| \le a\, m^{-1} \tag{7.22}$$

(which is equivalent to a lower bound on $|e|$ proportional to $m^2$). For real eigenvalues, it would imply

$$\left| -\frac{1}{(-e)^{1/2}}\, r \right| = \max |\lambda| \le a\, m^{-1} \tag{7.23}$$

and hence,

$$r \le a\, m^{-1} (-e)^{1/2} = O(m^{-1/2}) \tag{7.24a}$$

$$\min |\lambda| = (-e)^{-1/2} r^{-1} \ge a^{-1}\, m\, (-e)^{-1} = O(m^0) \tag{7.24b}$$

As an alternative to the bound on moduli in A1–[6.14], we shall also consider bounding of r here. An upper bound on r can be expressed in terms of the fast eigenvalues,

$$\frac{\max\{\, |\lambda_i| : |\lambda_i| \ge R_0 \,\}}{\min\{\, |\lambda_i| : |\lambda_i| \ge R_0 \,\}} \le r_0^2 \tag{7.25}$$

where the known growth of eigenvalues ensures that all the eigenvalues of the η subsystem will satisfy $|\lambda| \ge R_0$ for sufficiently small m. Of course, (7.25) would imply $r \le r_0$ here. While $\varphi \le \varphi_0$ imposes a lower bound on f relative to e, the bound on r imposes an upper bound on $|f|$ relative to $|e|$ via (7.19) (the eigenvalues will always be real near this limit),

$$\frac{-f}{(-e)^{1/2}} \le r_0 + r_0^{-1} \tag{7.26}$$

While the bound $r \le r_0$ is a much stronger assumption than (7.21) for real eigenvalues (compare (7.24a)), the bound adds no information given that the eigenvalues form a single complex conjugate pair (compare (7.22)).

A lower bound on r would be quite artificial, and will not be considered an option.

7.4.2 Transition matrix

The transition matrix is computed using the Laplace transform. In the Laplace variable s, the transition matrix is given by

$$\frac{1}{-e^{-1} + e^{-1} f\, s + s^2} \begin{bmatrix} e^{-1} f + s & -1 \\ -e^{-1} & s \end{bmatrix} \tag{7.27}$$

The gain of the transition matrix will be no smaller than the entries in the second row. The form of the corresponding time functions expressed in the real domain will depend on whether the eigenvalues are complex or not.

Let us first consider the case of complex conjugate eigenvalues. Using $\varphi \in [\, 0, \pi/2 \,)$ to denote the common value of $| \arg( -\lambda ) |$, we introduce

$$\alpha \coloneqq -(-e)^{-1/2} \cos(\varphi), \qquad \beta \coloneqq (-e)^{-1/2} \sin(\varphi)$$

The characteristic polynomial can now be expressed using

$$\alpha^2 + \beta^2 = -e^{-1}, \qquad -2\, \alpha = e^{-1} f$$

Then the last row of the transition matrix is

$$\begin{bmatrix} \dfrac{\mathrm{e}^{\alpha t} \sin(\beta t)}{(-e)\, \beta} & \dfrac{\mathrm{e}^{\alpha t} \left[\, \beta \cos(\beta t) + \alpha \sin(\beta t) \,\right]}{\beta} \end{bmatrix}$$

Optimizing out t, the largest gains are found to be (these values are attained, not just upper bounds)

$$\begin{bmatrix} \dfrac{\mathrm{e}^{-\varphi \cot(\varphi)}}{(-e)^{1/2}} & 2\, \mathrm{e}^{-\varphi \cot(\varphi)} \cos(\varphi) \end{bmatrix}$$

The expression $\mathrm{e}^{-\varphi \cot(\varphi)}$ is a monotonically increasing function of φ, tending to $\mathrm{e}^{-1}$ as $\varphi \to 0$, and equals 1 at $\varphi = \pi/2$. The second entry is a monotonically decreasing function which tends to $2\, \mathrm{e}^{-1}$ as $\varphi \to 0$, and equals 0 at $\varphi = \pi/2$. Since

$$\frac{\mathrm{e}^{-1}}{(-e)^{1/2}} \ge \mathrm{e}^{-1} m^{-1/2}$$


the maximum entry implies that the transition matrix is bounded from below,

$$\sup_{t \ge 0} \| \Phi(t) \|_2 \ge \mathrm{e}^{-1} m^{-1/2} \tag{7.28}$$
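The attained gain from the first state can be confirmed numerically for a sample complex-conjugate configuration. The values of e and φ below are arbitrary; the grid maximum of the time response is compared against the closed-form gain $\mathrm{e}^{-\varphi \cot(\varphi)} / (-e)^{1/2}$:

```python
import math

e = -1e-4
phi = math.pi / 4
s = (-e) ** -0.5                      # = |lambda| for the complex pair
alpha = -s * math.cos(phi)
beta = s * math.sin(phi)

# Gain from the first state: g(t) = exp(alpha*t) * sin(beta*t) / ((-e)*beta),
# maximized over a fine time grid and compared with the closed-form maximum.
claimed = math.exp(-phi / math.tan(phi)) * s
grid_max = max(
    math.exp(alpha * t) * math.sin(beta * t) / ((-e) * beta)
    for t in (i * 1e-5 for i in range(1, 20001))
)
assert abs(grid_max - claimed) / claimed < 1e-2
```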

Using (7.22), we would obtain the bound

$$\frac{\mathrm{e}^{-1}}{(-e)^{1/2}} \le \mathrm{e}^{-1} a\, m^{-1}$$

but the bound grows too fast as $m \to 0$ to be successful in combination with the O(m) convergence of initial conditions.

Let us see if it helps to constrain the eigenvalues to be real. Again, optimizing out t from the result yields expressions where the dependency on $(-e)^{-1/2}$ can be factored out, but this time the remaining factor depends on $r \ge 1$ instead of φ, $r^2$ being the ratio between the larger and smaller eigenvalues,

$$\begin{bmatrix} \dfrac{g_1(r)}{(-e)^{1/2}} & g_2(r) \end{bmatrix}$$

Here, both $g_1$ and $g_2$ are monotonically decreasing functions tending to zero, with $| g_1(1) | = \mathrm{e}^{-1}$ and $| g_2(1) | = \mathrm{e}^{-2}$. As we are unwilling to assume a lower bound on r, it is seen that restriction to real eigenvalues would not lower the lower bound (7.28) on the transition matrix.

Without an upper bound on eigenvalue magnitudes or r, (7.20) allows us to evaluate the limit

$$\lim_{e \to 0^-} \frac{g_1(r)}{(-e)^{1/2}} = \frac{1}{f}$$

which shows that the transition matrix will grow as

$$\sup_{t \ge 0} \| \Phi(t) \|_2 \ge m^{-1}$$

However, letting $e \to 0^-$ alone would violate (7.23), since the eigenvalues would tend to infinity while the fixed f prevents m from approaching zero. On the other hand, if the constraint (7.21) is added, (7.23) yields

$$\frac{g_1(r)}{(-e)^{1/2}} \le \frac{g_1(r)}{r}\, a\, m^{-1} \le \mathrm{e}^{-1} a\, m^{-1}$$

which is the same bound as for complex eigenvalues. An upper bound on r would merely produce the same kind of lower bound as (7.28), but with $g_1(r_0)$ replacing $\mathrm{e}^{-1}$.

Since the transition matrix is bounded from below when the tight upper bounds in this section are expressed in terms of m, and the upper bounds grow as $m^{-1}$, it is not possible to conclude that $\sup_{t \ge 0} | \bar\eta(t) | \to 0$ as $m \to 0$, even though $| \bar\eta(0) | \to 0$. The difficulty in separating the bounding of initial conditions from the bounding of the initial condition response gain is a major obstacle for the analysis of nominal index 2 dae.


7.4.3 Simultaneous consideration of initial conditions and transition matrix bounds

We now end this case study by proving that $\sup_{t \ge 0} | \bar\eta(t) | \to 0$ as $m \to 0$ in a very simple case, despite how difficult this appears to us in the general case due to the inability to consider bounds on initial conditions and initial condition response gain in terms of m. The key to the problem is to consider how initial conditions and the tight transition matrix upper bounds depend directly on e and f.

Noting that it is the gain from the initial condition of the first state which cannot be bounded in terms of m, and that this gain grows as $(-e)^{-1/2}$, it will suffice to show that the initial condition tends to 0 as $|e|$. It is the utter lack of generality that makes us present this result as an example, even though the whole case study is a kind of example in itself.

7.11 Example
Consider the coupled nominal index 2 system described by the pair

$$\left( \begin{bmatrix} 1 & & \\ & 1 & \\ & & e \end{bmatrix},\ \begin{bmatrix} \frac{1}{a} & & \\ a & & 1 \\ & 1 & f \end{bmatrix} \right) \tag{7.29}$$

The system is decoupled using the matrix $L = L^0 + m R_L$, with exact solutions given by

$$L^0 = \begin{bmatrix} a f \\ -a \end{bmatrix}, \qquad L = \frac{a}{e - f - 1} \begin{bmatrix} e - f \\ 1 \end{bmatrix}$$

and thanks to the structure of the original pair, the decoupling will lead directly to the canonical form, with the same parameters as in the rest of the current case study. That is, $\bar\eta = \eta$ in this case.

If the initial conditions $x_0$ and $v_0$ of (7.29) are chosen nominally consistent, (7.11) gives that the initial conditions $\eta_0$ are given by

$$\eta_0 = -m R_L\, x_0$$

Computing $m R_L$ as $L - L^0$ yields

$$\eta_0 = -\frac{a}{e - f - 1} \begin{bmatrix} e\, (1 - f) + f^2 \\ e - f \end{bmatrix} x_0$$

Hence, for the first state to be $O(|e|)$, it is required that $| f^2 | = O(|e|)$.
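The closed form above can be verified against $L - L^0$ numerically; the parameter values are arbitrary samples:

```python
# Check that L - L0 equals the closed-form coefficient of -eta_0 / x_0,
# i.e. a/(e - f - 1) * [ e*(1 - f) + f^2 , e - f ], for sample values.
a, e, f = 2.0, -1e-4, -0.03
L = [a * (e - f) / (e - f - 1), a / (e - f - 1)]
L0 = [a * f, -a]
mRL = [L[0] - L0[0], L[1] - L0[1]]
closed = [a * (e * (1 - f) + f * f) / (e - f - 1),
          a * (e - f) / (e - f - 1)]
assert all(abs(mRL[i] - closed[i]) < 1e-12 for i in range(2))
```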

If the eigenvalues are not real, (7.18) directly gives

$$\left| f^2 \right| \le 4\, |e|$$


In the case of real eigenvalues, (7.19) gives

$$f^2 = \left( r + r^{-1} \right)^2 (-e)$$

Now, the bound (7.25) on r is the most natural assumption to add in order to obtain $| f^2 | = O(|e|)$. However, by taking $m = |f|$, the other bound (7.21) is still a valid alternative, since

$$r \le a\, m^{-1} (-e)^{1/2} = a\, \frac{(-e)^{1/2}}{|f|}$$

where $\frac{(-e)^{1/2}}{|f|} \le 1/2$ for real eigenvalues. (Clearly, one would have to take $a > 2$, but it is expected that there will be a lower bound on how small a can be chosen; compare the sufficient condition for index 1 systems given in lemma 6.15.)

We argue that it is more elegant to use the assumption on r rather than on $m\, |\lambda|$, since the former does not involve the parameter m. Also note that it was due to the special structure of the system under consideration that we were able to derive a bound on r from the bound on $m\, |\lambda|$; the example does not show that this can be done for general systems of nominal index 2.

From the simple example, we learnt that for systems of nominal index 2, it may be necessary to bound the ratio between the moduli of the fast and uncertain eigenvalues (in one way or another) in order to obtain converging solutions of the fast and uncertain subsystem. We find it more elegant to assume this directly, rather than via the bound on $m\, |\lambda|$ currently used for systems of nominal index 1. To assume a bound on r may have many other applications as well; for instance, it might be a possible and more elegant replacement for the $m\, |\lambda|$ bound also for systems of nominal index 1.

7.5 Conclusions

This chapter was devoted to the analysis of autonomous index 0 lti dae of nominal index 2. As the case study in section 7.4 shows in theory, and as numerical experiments (not included in the thesis) indicate in other cases, there can be, and generally seems to be, uniform convergence of solutions, just as we were able to prove generally for nominal index 1 in chapter 6. Though we have not been able to prove this generally (the case study of a particular form of small system being the exception), the chapter has contributed several findings which we think will help in future research on these systems.

First, the existence of a canonical form for perturbed lti dae of nominal index 2 has been proposed. It is closely related to the Weierstrass form for exact matrix pairs, and can be seen as a statement of where non-trivial perturbations of this form need to be considered. The derivation includes existence proofs for the decoupling transforms which isolate the fast and uncertain dynamics from the slow and regularly perturbed dynamics.


Second, it was shown that the eigenvalues of the fast and uncertain subsystem must grow as $m \to 0$. This makes it possible to formulate assumptions about the eigenvalues of the fast and uncertain subsystem.

Third, the case study of a small system has contributed three findings. On the one hand, the study shows that at least there can be uniform convergence of solutions, which should inspire research on how to prove this also in the general case. On the other hand, the study revealed some drastic differences between the nominal index 1 and nominal index 2 cases. In particular, the basic idea of limiting the initial condition response gain of the fast and uncertain subsystem by a constant independent of the size of the perturbations turned out to be useless in this case. Finally, the example in the case study showed that while the assumed bound on $m\, |\lambda|$ used in chapter 6 was still sufficient to obtain convergence, bounding the ratio of eigenvalue moduli for the fast and uncertain subsystem can be an elegant replacement.


Appendix

7.A Decoupling transforms

Inspection of the equations that L and H must satisfy to yield a decoupling transform reveals that the results from chapter 6 are not readily applicable. Most notably, the leading matrix of the lower part of the dae does not vanish with $\max(E)$.

In this section, we shall use notation which is unrelated to other sections in this chapter, considering a matrix pair which is in the form

$$\left( \begin{bmatrix} I & & & \\ & I & & \\ & & E_{33} & E_{34} \\ & & E_{43} & E_{44} \end{bmatrix},\ \begin{bmatrix} A_{11} & A_{12} & A_{13} & \\ A_{21} & A_{22} & A_{23} & I_{24} \\ A_{31} & A_{32} & I_{33} & \\ & I_{42} & & F_{44} \end{bmatrix} \right) \eqqcolon P \tag{7.30}$$

where

• $\max( E_{ij} )$ and $\max( F_{44} )$ are both O(m).

• $I_{24}$, $I_{33}$, and $I_{42}$ each have a corresponding non-singular nominal matrix $I^0_{ij}$ such that $\max( I_{ij} - I^0_{ij} ) = O(m)$.

• $A_{ij}$ are bounded independently of m.

In preceding sections of this chapter, it has been shown how this form can be reached using non-singular transforms from more general starting points. Combining these transforms with the transforms of the present section would result in decoupling transforms applicable to more general forms than (7.30). However, we prefer the notion of decoupling transforms applying to (7.30), to avoid unnecessary clutter in this section, which is already lengthy.

For an application of these decoupling transforms to a concrete example, see example 7.3.


The idea to use a fixed-point theorem to prove the existence of decoupling transforms for lti systems appears in Kokotović (1975), where tighter estimates are provided compared to the more general ltv results in Chang (1972).

Notation. For brevity, we shall often omit the “for m sufficiently small” in this section. For instance, when we say that there exists a constant which gives a bound on something, this typically means that there is an $m_0 > 0$ such that the bound is valid for all $m < m_0$.

7.A.1 Eliminating slow variables from uncertain dynamics

In the first decoupling step we seek a matrix L partitioned as

$$L = \begin{bmatrix} L_1 \\ L_2 \\ L_3 \end{bmatrix}$$

such that the blocks of

$$\begin{bmatrix} I & \\ -\begin{bmatrix} I & & \\ & E_{33} & E_{34} \\ & E_{43} & E_{44} \end{bmatrix} L & I \end{bmatrix} P \begin{bmatrix} I & \\ L & I \end{bmatrix} \eqqcolon P_L$$

(which is a pair with the same leading matrix as P) below the “11-position” in the trailing matrix are zero. Writing

$$L = L^0 + m R_L$$

with $L^0$ denoting the nominal solution corresponding to $m = 0$, we shall prove uniqueness of $L^0$ and that $\| R_L \|_2 = O(m^0)$.

Recall (6.9), the equation that L must satisfy. In the current context given by (7.30), a corresponding residual function is defined by

$$\delta_L(L) \coloneqq \begin{bmatrix} A_{21} \\ A_{31} \\ 0 \end{bmatrix} + \begin{bmatrix} A_{22} & A_{23} & I_{24} \\ A_{32} & I_{33} & \\ I_{42} & & F_{44} \end{bmatrix} L - \begin{bmatrix} I & & \\ & E_{33} & E_{34} \\ & E_{43} & E_{44} \end{bmatrix} L \left( A_{11} + \begin{bmatrix} A_{12} & A_{13} & 0 \end{bmatrix} L \right) \tag{7.31}$$

so that the equation is written

$$\delta_L(L) \overset{!}{=} 0 \tag{7.32}$$

Let us first consider the nominal solution to this equation, in which case the last of the three groups of equations reads

$$0 \overset{!}{=} I^0_{42}\, L^0_1$$

and the known non-singularity of $I^0_{42}$ gives that $L^0_1 = 0$. Hence, the third term in (7.32) vanishes in the nominal case, and $L^0$ is obtained by either of the following


expressions
$$
L^0 =
\begin{bmatrix}
 0 \\
 -\begin{bmatrix} A^0_{23} & I^0_{24} \\ I^0_{33} & \end{bmatrix}^{-1}
  \begin{bmatrix} A^0_{21} \\ A^0_{31} \end{bmatrix}
\end{bmatrix}
= -
\begin{bmatrix} A^0_{22} & A^0_{23} & I^0_{24} \\ A^0_{32} & I^0_{33} & \\ I^0_{42} & & \end{bmatrix}^{-1}
\begin{bmatrix} A^0_{21} \\ A^0_{31} \\ 0 \end{bmatrix}
\tag{7.33}
$$
Note that the second form is exactly the same as in the index 1 case.
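As a quick sanity check of (7.33), the two expressions for $L^0$ can be compared numerically. The sketch below uses scalar blocks with hypothetical values (the names `a21`, `i24`, and so on mirror the nominal blocks $A^0_{21}$, $I^0_{24}$ and are not data from the thesis):

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical nominal blocks (scalar blocks for brevity); the I-blocks
# must be invertible for the formulas to apply.
a21, a31 = rng.standard_normal(2)
a22, a23, a32 = rng.standard_normal(3)
i24, i33, i42 = 1.3, -0.7, 2.1  # invertible "identity-like" blocks

# First form of (7.33): L0 = [0; -[[a23, i24],[i33, 0]]^{-1} [a21; a31]]
blk = np.array([[a23, i24], [i33, 0.0]])
tail = -np.linalg.solve(blk, np.array([a21, a31]))
L0_first = np.concatenate(([0.0], tail))

# Second form: L0 = -M0^{-1} [a21; a31; 0], with the full 3x3 nominal block
M0 = np.array([[a22, a23, i24],
               [a32, i33, 0.0],
               [i42, 0.0, 0.0]])
L0_second = -np.linalg.solve(M0, np.array([a21, a31, 0.0]))

assert np.allclose(L0_first, L0_second)
```

The bottom row of the full matrix forces the first block of $L^0$ to vanish, which is why the two expressions agree.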

Since $L^0$ only solves the nominal equation, the residual $\delta_L(L^0)$ is generally non-zero. However, using that
$$
\begin{bmatrix} I & & \\ & E_{33} & E_{34} \\ & E_{43} & E_{44} \end{bmatrix} L^0
=
\begin{bmatrix} 0 & & \\ & E_{33} & E_{34} \\ & E_{43} & E_{44} \end{bmatrix} L^0
$$
it is seen that $\delta_L(L^0) = O(m)$, so by taking $m$ sufficiently small, we know the existence of
$$
\exists\, c^0_L < \infty : \quad \lVert \delta_L(L^0) \rVert_2 \le c^0_L\, m
$$

Leaving the nominal case, we seek an $O(m^0)$ bound (that is, a bound which is independent of $m$ for $m$ sufficiently small) on $\lVert R_L \rVert_2$. Inserting the decomposed $L$ in (7.32) and cancelling a factor of $m$ in the equation, it reads
$$
\begin{aligned}
0 \overset{!}{=} {}&
\begin{bmatrix} A_{22} & A_{23} & I_{24} \\ A_{32} & I_{33} & \\ I_{42} & & F_{44} \end{bmatrix} R_L
-
\begin{bmatrix} I & 0 & 0 \\ 0 & 0 & 0 \\ 0 & 0 & 0 \end{bmatrix} R_L
\left( A_{11} + \begin{bmatrix} A_{12} & A_{13} & 0 \end{bmatrix} L^0 \right)
- m^{-1}\, \delta_L(L^0) \\
&-
\begin{bmatrix} 0 & & \\ & E_{33} & E_{34} \\ & E_{43} & E_{44} \end{bmatrix} L^0
\begin{bmatrix} A_{12} & A_{13} & 0 \end{bmatrix} R_L
-
\begin{bmatrix} 0 & & \\ & E_{33} & E_{34} \\ & E_{43} & E_{44} \end{bmatrix} R_L
\left( A_{11} + \begin{bmatrix} A_{12} & A_{13} & 0 \end{bmatrix} L^0 \right) \\
&- m
\begin{bmatrix} I & & \\ & E_{33} & E_{34} \\ & E_{43} & E_{44} \end{bmatrix} R_L
\begin{bmatrix} A_{12} & A_{13} & 0 \end{bmatrix} R_L
\end{aligned}
\tag{7.34}
$$

To simplify notation, we introduce the linear function $\mathcal{L}$ given by
$$
\mathcal{L}(R_L) \coloneqq
\begin{bmatrix} A_{22} & A_{23} & I_{24} \\ A_{32} & I_{33} & \\ I_{42} & & F_{44} \end{bmatrix} R_L
-
\begin{bmatrix} I & 0 & 0 \\ 0 & 0 & 0 \\ 0 & 0 & 0 \end{bmatrix} R_L
\left( A_{11} + \begin{bmatrix} A_{12} & A_{13} & 0 \end{bmatrix} L^0 \right)
$$


and name the remaining terms according to
$$
g_1(R_L) \coloneqq
\begin{bmatrix} 0 & & \\ & E_{33} & E_{34} \\ & E_{43} & E_{44} \end{bmatrix} L^0
\begin{bmatrix} A_{12} & A_{13} & 0 \end{bmatrix} R_L
+
\begin{bmatrix} 0 & & \\ & E_{33} & E_{34} \\ & E_{43} & E_{44} \end{bmatrix} R_L
\left( A_{11} + \begin{bmatrix} A_{12} & A_{13} & 0 \end{bmatrix} L^0 \right)
$$
$$
g_2(R_L) \coloneqq
\begin{bmatrix} I & & \\ & E_{33} & E_{34} \\ & E_{43} & E_{44} \end{bmatrix} R_L
\begin{bmatrix} A_{12} & A_{13} & 0 \end{bmatrix} R_L
$$
This allows us to write (7.34) as
$$
\mathcal{L}(R_L) \overset{!}{=} m^{-1}\, \delta_L(L^0) + g_1(R_L) + m\, g_2(R_L)
\tag{7.35}
$$

As it was possible to solve for $L^0$ in the nominal equation, it is seen that the matrix of $\mathcal{L}$ is only $O(m)$ away from an invertible matrix, so taking $m$ sufficiently small allows the induced 2-norm of the operator's inverse to be bounded.

P1–[7.12] Property. The constant $c_L < \infty$ shall be chosen so that
$$
\lVert \mathcal{L}^{-1} \rVert_2 \le c_L
\tag{P1–7.36}
$$

For instance, the bound may be computed using
$$
\lVert \mathcal{L}^{-1} \rVert_2
\overset{\mathrm{def}}{=} \sup_{R \ne 0} \frac{\lVert \mathcal{L}^{-1} R \rVert_2}{\lVert R \rVert_2}
\le \sup_{R \ne 0} \frac{\lVert \mathcal{L}^{-1} R \rVert_F}{\frac{1}{\sqrt{n}}\, \lVert R \rVert_F}
= \sqrt{n}\, \lVert (\operatorname{vec} \mathcal{L})^{-1} \rVert_2
$$
where $n = \min\{ n_\eta, n_\xi \}$ ($n_\eta$ by $n_\xi$ being the dimensions of $L$), and $\operatorname{vec} \mathcal{L}$ refers to a vectorized version of $\mathcal{L}$.
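The vectorization bound can be made concrete for a Sylvester-type operator of the form $\mathcal{L}(R) = M R - N R K$, whose matrix under column-major vectorization is $I \otimes M - K^T \otimes N$. The matrices below are hypothetical stand-ins, chosen well-conditioned so that the operator is invertible; the sketch verifies that the computed constant really dominates the observed gain of $\mathcal{L}^{-1}$:

```python
import numpy as np

rng = np.random.default_rng(1)
nf, ns = 3, 2          # hypothetical dimensions of R (fast x slow)

M = 3 * np.eye(nf) + 0.1 * rng.standard_normal((nf, nf))  # well-conditioned
N = np.diag([1.0, 0.0, 0.0])                              # nominal leading block
K = 0.3 * rng.standard_normal((ns, ns))

# Matrix of the vectorized operator L(R) = M R - N R K:
# vec(L(R)) = (I (x) M - K^T (x) N) vec(R), with column-major vec.
vecL = np.kron(np.eye(ns), M) - np.kron(K.T, N)

n = min(nf, ns)
cL_bound = np.sqrt(n) * np.linalg.norm(np.linalg.inv(vecL), 2)

# The bound dominates the gain ||L^{-1}(R)||_2 / ||R||_2 for any R.
for _ in range(100):
    R = rng.standard_normal((nf, ns))
    X = np.linalg.solve(vecL, R.flatten(order="F")).reshape((nf, ns), order="F")
    assert np.linalg.norm(X, 2) <= cL_bound * np.linalg.norm(R, 2) + 1e-12
```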

The following constants are also readily available
$$
\exists\, c_1 < \infty : \quad
\left\lVert A_{11} + \begin{bmatrix} A_{12} & A_{13} & 0 \end{bmatrix} L^0 \right\rVert_2 \le c_1
$$
$$
\exists\, c_2 < \infty : \quad
\left\lVert \begin{bmatrix} A_{12} & A_{13} & 0 \end{bmatrix} \right\rVert_2 \le c_2
$$
Further, the $O(m)$ property of the $E_{ij}$ implies the existence of
$$
\exists\, c_E < \infty : \quad
\left\lVert \begin{bmatrix} 0 & & \\ & m^{-1} E_{33} & m^{-1} E_{34} \\ & m^{-1} E_{43} & m^{-1} E_{44} \end{bmatrix} \right\rVert_2 \le c_E
$$


Now consider $R_L \in \mathsf{L} = \{ R_L : \lVert R_L \rVert_2 \le \rho_L \}$, where $\rho_L$ is to be selected later. Then the following bounds are obtained for $m$ small enough
$$
\lVert m^{-1}\, \delta_L(L^0) \rVert_2 \le c^0_L
$$
$$
\lVert g_1(R_L) \rVert_2 \le m\, c_E \left( \lVert L^0 \rVert_2\, c_2 + c_1 \right) \rho_L
$$
$$
\lVert g_2(R_L) \rVert_2 \le c_2\, \rho_L^2
$$
Using the matrix equality
$$
X_2\, Q\, X_2 - X_1\, Q\, X_1 = ( X_2 - X_1 )\, Q\, X_2 + X_1\, Q\, ( X_2 - X_1 )
\tag{7.37}
$$
(used already in Chang (1969)), one also obtains
$$
\lVert g_1(R_{L,2}) - g_1(R_{L,1}) \rVert_2 \le m\, c_E \left( \lVert L^0 \rVert_2\, c_2 + c_1 \right) \lVert R_{L,2} - R_{L,1} \rVert_2
$$
$$
\lVert g_2(R_{L,2}) - g_2(R_{L,1}) \rVert_2 \le 2\, c_2\, \rho_L\, \lVert R_{L,2} - R_{L,1} \rVert_2
$$
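The identity (7.37) is elementary; a quick numeric check with arbitrary conformable matrices reads:

```python
import numpy as np

rng = np.random.default_rng(2)
X1, X2 = rng.standard_normal((2, 4, 4))
Q = rng.standard_normal((4, 4))

# (7.37): X2 Q X2 - X1 Q X1 = (X2 - X1) Q X2 + X1 Q (X2 - X1)
lhs = X2 @ Q @ X2 - X1 @ Q @ X1
rhs = (X2 - X1) @ Q @ X2 + X1 @ Q @ (X2 - X1)
assert np.allclose(lhs, rhs)
```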

In search of a contraction mapping to prove the existence of a bounded solution $R_L \in \mathsf{L}$, the operator $T_L$ is defined by
$$
T_L R_L \coloneqq \mathcal{L}^{-1}\left( m^{-1}\, \delta_L(L^0) + g_1(R_L) + m\, g_2(R_L) \right)
\tag{7.38}
$$
Setting
$$
\rho_L = ( 1 + \alpha_L )\, c_L\, c^0_L
\tag{7.39}
$$
for some $\alpha_L > 0$, and considering
$$
\lVert T_L R_L \rVert_2 \le c_L\, c^0_L \left( 1 + m\, [\, 1 + \alpha_L \,] \left[ c_E \left( \lVert L^0 \rVert_2\, c_2 + c_1 \right) + c_2\, \rho_L \right] \right)
$$
it is seen that $T_L$ maps $\mathsf{L}$ to itself if

$$
m\, [\, 1 + \alpha_L \,] \left[ c_E \left( \lVert L^0 \rVert_2\, c_2 + c_1 \right) + c_2\, \rho_L \right] \le \alpha_L
$$
or, equivalently,
$$
m \le \frac{1}{c_E \left( \lVert L^0 \rVert_2\, c_2 + c_1 \right) + c_2\, \rho_L} \cdot \frac{\alpha_L}{1 + \alpha_L}
\tag{7.40}
$$

From
$$
\lVert T_L R_{L,2} - T_L R_{L,1} \rVert_2 \le m\, c_L \left( c_E \left( \lVert L^0 \rVert_2\, c_2 + c_1 \right) + 2\, c_2\, \rho_L \right) \lVert R_{L,2} - R_{L,1} \rVert_2
$$
it is seen that $T_L$ is a contraction if
$$
m \le \frac{1}{c_E \left( \lVert L^0 \rVert_2\, c_2 + c_1 \right) + 2\, c_2\, \rho_L} \cdot \frac{1}{c_L}
\tag{7.41}
$$

and the conjunction of the two conditions (7.40) and (7.41) is equivalent to
$$
m \le \frac{1}{c_E \left( \lVert L^0 \rVert_2\, c_2 + c_1 \right) + 2\, c_2\, ( 1 + \alpha_L )\, c_L\, c^0_L}
\cdot \min\left\{ \frac{1}{c_L},\ \frac{\alpha_L}{1 + \alpha_L} \right\}
\tag{7.42}
$$


If the parameter $\alpha_L$ is tuned for maximizing the bound on $m$, the optimal choice in (7.42) can be given in closed form. In case $c_2 = 0$ the best choice is that which makes the two bounds equal (larger values will only worsen the bound on $\rho_L$ without improving the bound on $m$), so we consider the more interesting case when $c_2 \ne 0$. One first computes the optimum of (7.40) alone,
$$
\alpha^1_L \coloneqq \sqrt{ 1 + \frac{c_E \left( \lVert L^0 \rVert_2\, c_2 + c_1 \right)}{2\, c_2\, c_L\, c^0_L} }
$$
The objective function in (7.40) is increasing in the range $[\, 0, \alpha^1_L \,]$, but the combined objective in (7.42) is only increasing up to the point where
$$
\frac{\alpha_L}{1 + \alpha_L} \overset{!}{=} \frac{1}{c_L}
$$
This can only happen if $c_L > 1$, with solution $\alpha_L = \frac{1}{c_L - 1}$. Hence, the bound (7.42) is maximized by
$$
\alpha_L =
\begin{cases}
\alpha^1_L, & \text{if } c_2 \ne 0 \text{ and } \dfrac{\alpha^1_L}{1 + \alpha^1_L} \le \dfrac{1}{c_L} \\[1ex]
\dfrac{1}{c_L - 1}, & \text{otherwise}
\end{cases}
\tag{7.43}
$$
The choice (7.43) should be used with care, since if $c_L < 1$ and $c_2$ approaches zero, the rule will assign arbitrarily large values to $\alpha_L$, ignoring the consequences for the bound on $\rho_L$.
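The tuning can be sketched numerically: the function below evaluates the bound (7.42) with hypothetical constants and checks, for a case with $c_2 \ne 0$ and $c_L > 1$, that the closed-form choice of $\alpha_L$ maximizes it over a fine grid. All names and values are illustrative, not data from the thesis.

```python
import numpy as np

def m_bound(alpha, cE, c1, c2, cL, cL0, nrmL0):
    """Right-hand side of (7.42) as a function of the tuning parameter."""
    denom = cE * (nrmL0 * c2 + c1) + 2 * c2 * (1 + alpha) * cL * cL0
    return min(1 / cL, alpha / (1 + alpha)) / denom

# Hypothetical constants with c2 != 0 and cL > 1.
cE, c1, c2, cL, cL0, nrmL0 = 1.0, 1.0, 0.5, 1.5, 1.0, 1.0

D0 = cE * (nrmL0 * c2 + c1)     # alpha-independent part of the denominator
G = 2 * c2 * cL * cL0           # slope of the alpha-dependent part
alpha1 = np.sqrt(1 + D0 / G)    # unconstrained optimizer of (7.40)

# (7.43): use alpha1 while alpha1/(1+alpha1) <= 1/cL, otherwise 1/(cL-1).
alpha_star = alpha1 if alpha1 / (1 + alpha1) <= 1 / cL else 1 / (cL - 1)

grid = np.linspace(1e-3, 10, 20000)
best_on_grid = max(m_bound(a, cE, c1, c2, cL, cL0, nrmL0) for a in grid)
assert m_bound(alpha_star, cE, c1, c2, cL, cL0, nrmL0) >= best_on_grid - 1e-9
```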

For future reference, we note that with $c_2 = 0$, maximizing the bound in (7.42) with respect to $\alpha_L$ yields the bound
$$
m \le \frac{1}{c_E\, c_1\, c_L}
\tag{7.44}
$$
This concludes the proof of existence of the approximation
$$
L = L^0 + m\, R_L
$$
valid for sufficiently small $m$. Given a choice of the tuning parameter $\alpha_L > 0$, the bound on $m$ is given in (7.42), while the bound on $\lVert R_L \rVert_2$ in (7.39) is no less than $c_L\, c^0_L$. If the bound on $\lVert R_L \rVert_2$ is not critical, (7.43) may be used to set the tuning parameter.

Now that the approximation has been proved, we may additionally conclude that
$$
\left\lVert \begin{bmatrix} I & & \\ & E_{33} & E_{34} \\ & E_{43} & E_{44} \end{bmatrix} L \right\rVert_2
=
\left\lVert \begin{bmatrix} 0 & & \\ & E_{33} & E_{34} \\ & E_{43} & E_{44} \end{bmatrix} L^0
+ m \begin{bmatrix} I & & \\ & E_{33} & E_{34} \\ & E_{43} & E_{44} \end{bmatrix} R_L \right\rVert_2
\le \left( \lVert L^0 \rVert_2\, c_E + \rho_L \right) m
\tag{7.45}
$$
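The existence argument can also be mirrored numerically. The sketch below solves $\delta_L(L) = 0$ on hypothetical scalar-block data by a plain fixed-point iteration (a simpler rearrangement of the equation than $T_L$, but contractive for this particular data) and confirms that the solution stays within $O(m)$ of the nominal $L^0$ of (7.33):

```python
import numpy as np

# Hypothetical scalar-block data in the structure of (7.30); the I-blocks
# are invertible, while the E-blocks and F44 are O(m).
m = 1e-3
A11 = np.array([[0.5]])
A1f = np.array([[0.3, -0.2, 0.0]])            # [A12 A13 0]
Af1 = np.array([[0.2], [-0.1], [0.0]])        # [A21; A31; 0]
Mfast = np.array([[0.1, 0.2, 1.5],
                  [-0.3, -1.2, 0.0],
                  [1.1, 0.0, 0.7 * m]])       # trailing fast block, F44 = O(m)
Ebar = np.zeros((3, 3))
Ebar[0, 0] = 1.0
Ebar[1:, 1:] = m * np.array([[0.4, -0.3], [0.2, 0.5]])  # leading fast block

# Nominal solution L0 from the second expression in (7.33).
M0 = Mfast.copy()
M0[2, 2] = 0.0
L0 = -np.linalg.solve(M0, Af1)

# Fixed-point iteration on L = Mfast^{-1}(-Af1 + Ebar L (A11 + A1f L)),
# which is a contraction for this weakly coupled data.
L = L0.copy()
for _ in range(200):
    L = np.linalg.solve(Mfast, -Af1 + Ebar @ L @ (A11 + A1f @ L))

resid = np.linalg.norm(Af1 + Mfast @ L - Ebar @ L @ (A11 + A1f @ L))
assert resid < 1e-9                       # L solves the decoupling equation
assert np.linalg.norm(L - L0) < 50 * m    # and L = L0 + O(m), as proved
```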


7.A.2 Eliminating uncertain variables from slow dynamics

In the second decoupling step we seek a matrix $H$ partitioned as
$$
H = \begin{bmatrix} H_1 & H_2 & H_3 \end{bmatrix}
$$
such that the blocks of
$$
\begin{bmatrix} I & -H \\ & I \end{bmatrix}
\, P_L \,
\begin{bmatrix}
 I & H \begin{bmatrix} I & & \\ & E_{33} & E_{34} \\ & E_{43} & E_{44} \end{bmatrix} \\
 & I
\end{bmatrix}
$$
(which again is a pair with the same leading matrix as $P$) below and to the right of the “11-position” in the trailing matrix are zero. While the objective is primarily to show just that $H = O(m^0)$, we will still write
$$
H = H^0 + m\, R_H
$$
with $H^0$ denoting the nominal solution corresponding to $m = 0$, in order to be able to provide better estimates of the size of $H$. Hence, we shall prove uniqueness of $H^0$ and that $\lVert R_H \rVert_2 = O(m^0)$.

This section is to a large extent analogous to the previous section. This should come as no surprise since the decoupling can be implemented with the same kind of transform used in the previous section, only applied to the transposed matrix pair this time. While this proves the existence, the pair $P_L^T$ has some structural differences to the pair $P$, and we aim to exploit this to get insight into the problem and hopefully obtain tighter bounds. We shall return to the duality between the two decoupling steps in section 7.A.3 when we have the bounding expressions of the two steps at hand, and the reader who is not interested in the minor details of obtaining good bounds should skip to section 7.A.3 at this point.

The condition that $H$ must satisfy has a corresponding residual function
$$
\delta_H(H) \coloneqq
\begin{bmatrix} A_{12} & A_{13} & 0 \end{bmatrix}
+ \left( A_{11} + \begin{bmatrix} A_{12} & A_{13} & 0 \end{bmatrix} L \right) H
\begin{bmatrix} I & & \\ & E_{33} & E_{34} \\ & E_{43} & E_{44} \end{bmatrix}
- H \left(
\begin{bmatrix} A_{22} & A_{23} & I_{24} \\ A_{32} & I_{33} & \\ I_{42} & & F_{44} \end{bmatrix}
-
\begin{bmatrix} I & & \\ & E_{33} & E_{34} \\ & E_{43} & E_{44} \end{bmatrix} L
\begin{bmatrix} A_{12} & A_{13} & 0 \end{bmatrix}
\right)
\tag{7.46}
$$
so that the equation is written
$$
\delta_H(H) \overset{!}{=} 0
\tag{7.47}
$$

Using knowledge about $L^0$, the equation for $H^0$ simplifies to
$$
H^0 \begin{bmatrix} A^0_{22} & A^0_{23} & I^0_{24} \\ A^0_{32} & I^0_{33} & \\ I^0_{42} & & \end{bmatrix}
\overset{!}{=}
\begin{bmatrix} A^0_{12} & A^0_{13} & 0 \end{bmatrix}
+ \left( A^0_{11} + \begin{bmatrix} A^0_{12} & A^0_{13} & 0 \end{bmatrix} L^0 \right) H^0
\begin{bmatrix} I & 0 & 0 \\ 0 & 0 & 0 \\ 0 & 0 & 0 \end{bmatrix}
$$


Reading off the last block column of the equation reveals
$$
H^0_1\, I^0_{24} \overset{!}{=} 0
\tag{7.48}
$$
where $I^0_{24}$ is readily invertible. From $H^0_1 = 0$ it follows that
$$
H^0 \begin{bmatrix} I & & \\ & E_{33} & E_{34} \\ & E_{43} & E_{44} \end{bmatrix}
= H^0 \begin{bmatrix} 0 & & \\ & E_{33} & E_{34} \\ & E_{43} & E_{44} \end{bmatrix}
\tag{7.49}
$$
and hence that $H^0$ is given by either of the following two expressions
$$
H^0 =
\begin{bmatrix}
 0 &
 \begin{bmatrix} A^0_{12} & A^0_{13} \end{bmatrix}
 \begin{bmatrix} A^0_{32} & I^0_{33} \\ I^0_{42} & \end{bmatrix}^{-1}
\end{bmatrix}
=
\begin{bmatrix} A^0_{12} & A^0_{13} & 0 \end{bmatrix}
\begin{bmatrix} A^0_{22} & A^0_{23} & I^0_{24} \\ A^0_{32} & I^0_{33} & \\ I^0_{42} & & \end{bmatrix}^{-1}
\tag{7.50}
$$
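As for $L^0$, the two expressions in (7.50) for $H^0$ can be checked against each other numerically on hypothetical scalar blocks (names and values are illustrative only):

```python
import numpy as np

rng = np.random.default_rng(4)
a12, a13 = rng.standard_normal(2)
a22, a23, a32 = rng.standard_normal(3)
i24, i33, i42 = 1.3, -0.7, 2.1   # invertible "identity-like" blocks

# First form of (7.50): H0 = [0, [a12 a13] [[a32, i33],[i42, 0]]^{-1}]
blk = np.array([[a32, i33], [i42, 0.0]])
head = np.array([a12, a13]) @ np.linalg.inv(blk)
H0_first = np.concatenate(([0.0], head))

# Second form: H0 = [a12 a13 0] M0^{-1}
M0 = np.array([[a22, a23, i24],
               [a32, i33, 0.0],
               [i42, 0.0, 0.0]])
H0_second = np.array([a12, a13, 0.0]) @ np.linalg.inv(M0)

assert np.allclose(H0_first, H0_second)
```

Here the last column of the full matrix forces $H^0_1 = 0$, mirroring (7.48).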

Since $H^0$ only solves the nominal equation, $\delta_H(H^0)$ is generally non-zero. However, using (7.49) and (7.45) it is seen that $\delta_H(H^0) = O(m)$, so by taking $m$ sufficiently small, we know the existence of
$$
\exists\, c^0_H < \infty : \quad \lVert \delta_H(H^0) \rVert_2 \le c^0_H\, m
$$

Note that it was possible to solve for $H^0$ without introducing additional assumptions about distinct eigenvalues, as is typically needed when the equation is in the form $H^0 A + B H^0 \overset{!}{=} C$. In particular, this means that the linear operator $\mathcal{H}$ defined by
$$
\mathcal{H}(H) \coloneqq
H \begin{bmatrix} A_{22} & A_{23} & I_{24} \\ A_{32} & I_{33} & \\ I_{42} & & F_{44} \end{bmatrix}
- \left( A_{11} + \begin{bmatrix} A_{12} & A_{13} & 0 \end{bmatrix} L \right) H
\begin{bmatrix} I & 0 & 0 \\ 0 & 0 & 0 \\ 0 & 0 & 0 \end{bmatrix}
$$
has a matrix which is only $O(m)$ away from an invertible one, and hence taking $m$ sufficiently small will enable us to bound the inverse of the operator,

P2–[7.13] Property. The constant $c_H < \infty$ shall be chosen so that
$$
\lVert \mathcal{H}^{-1} \rVert_2 \le c_H
\tag{P2–7.51}
$$

Now that the existence and uniqueness of a nominal solution has been established, we turn to $R_H$. Inserting $H = H^0 + m\, R_H$ in (7.47) and cancelling a factor of $m$ in the equation, one obtains
$$
\mathcal{H}(R_H) \overset{!}{=} m^{-1}\, \delta_H(H^0)
+ R_H \begin{bmatrix} I & & \\ & E_{33} & E_{34} \\ & E_{43} & E_{44} \end{bmatrix} L \begin{bmatrix} A_{12} & A_{13} & 0 \end{bmatrix}
+ \left( A_{11} + \begin{bmatrix} A_{12} & A_{13} & 0 \end{bmatrix} L \right) R_H
\begin{bmatrix} 0 & & \\ & E_{33} & E_{34} \\ & E_{43} & E_{44} \end{bmatrix}
\tag{7.52}
$$


which is written
$$
\mathcal{H}(R_H) \overset{!}{=} m^{-1}\, \delta_H(H^0) + m\, h(R_H)
\tag{7.53}
$$
by means of the definition
$$
h(R_H) \coloneqq
R_H\, m^{-1} \begin{bmatrix} I & & \\ & E_{33} & E_{34} \\ & E_{43} & E_{44} \end{bmatrix} L \begin{bmatrix} A_{12} & A_{13} & 0 \end{bmatrix}
+ \left( A_{11} + \begin{bmatrix} A_{12} & A_{13} & 0 \end{bmatrix} L \right) R_H\, m^{-1}
\begin{bmatrix} 0 & & \\ & E_{33} & E_{34} \\ & E_{43} & E_{44} \end{bmatrix}
$$

Restricting the analysis to $m$ so small that (7.45) is valid and using that
$$
\left\lVert A_{11} + \begin{bmatrix} A_{12} & A_{13} & 0 \end{bmatrix} L \right\rVert_2 \le c_1 + m\, c_2\, \rho_L
$$
the gain of the linear function $h$ can be bounded as
$$
\frac{\lVert h(R_H) \rVert_2}{\lVert R_H \rVert_2}
\le \left( \lVert L^0 \rVert_2\, c_E + \rho_L \right) c_2 + [\, c_1 + m\, c_2\, \rho_L \,]\, c_E
\eqqcolon c_h
\tag{7.54}
$$

For the operator $T_H$ defined by
$$
T_H R_H \coloneqq \mathcal{H}^{-1}\left( m^{-1}\, \delta_H(H^0) + m\, h(R_H) \right)
\tag{7.55}
$$
and restricted to $R_H \in \mathsf{H} = \{ R_H : \lVert R_H \rVert_2 \le \rho_H \}$, where $\rho_H$ is to be selected later, we then obtain
$$
\lVert T_H R_H \rVert_2 \le c_H \left( c^0_H + m\, c_h\, \rho_H \right)
$$
$$
\lVert T_H R_{H,2} - T_H R_{H,1} \rVert_2
= m\, \left\lVert \mathcal{H}^{-1}\left( h( R_{H,2} - R_{H,1} ) \right) \right\rVert_2
\le m\, c_H\, c_h\, \lVert R_{H,2} - R_{H,1} \rVert_2
$$

It just remains to set $\rho_H = ( 1 + \alpha_H )\, c_H\, c^0_H$ with $\alpha_H > 0$, so that
$$
m \le \frac{1}{c_H\, c_h} \cdot \frac{\alpha_H}{1 + \alpha_H}
\tag{7.56}
$$
ensures that $T_H$ maps $\mathsf{H}$ into itself, and since this implies that
$$
m\, c_H\, c_h < 1
$$
the contraction property imposes no additional requirements on $m$.

This concludes the proof of existence of the approximation
$$
H = H^0 + m\, R_H
$$
valid for sufficiently small $m$. The provided bound on $\lVert R_H \rVert_2$ is no less than $c_H\, c^0_H$, and for each choice of the bound, a corresponding bound on $m$ follows from (7.56).


We now end the section with a last analogy to the previous section.
$$
\left\lVert H \begin{bmatrix} I & & \\ & E_{33} & E_{34} \\ & E_{43} & E_{44} \end{bmatrix} \right\rVert_2
=
\left\lVert H^0 \begin{bmatrix} 0 & & \\ & E_{33} & E_{34} \\ & E_{43} & E_{44} \end{bmatrix}
+ m\, R_H \begin{bmatrix} I & & \\ & E_{33} & E_{34} \\ & E_{43} & E_{44} \end{bmatrix} \right\rVert_2
\le \left( \lVert H^0 \rVert_2\, c_E + \rho_H \right) m
\tag{7.57}
$$

7.A.3 Remarks on duality

We indicated in the beginning of section 7.A.2 that the existence of the second decoupling step could simply be obtained by applying the decoupling developed in section 7.A.1 to the transposed pair $P_L^T$. In this section we shall make some comparisons that will show whether or not the development in section 7.A.2 was any good.

The theory provides two bounds, one on $m$ which should be large for wider applicability, and one on $\rho_H$ which should be small for increased precision in the results. In view of theorem 2.50, however, obtaining a tight bound on $\rho_H$ may be of little importance in applications as iterative refinement will generally produce $R_H$ with $\lVert R_H \rVert_2 < \rho_H$ anyway. Hence, the crucial bound for the comparison at hand is that of $m$.

We need some notation to indicate what the expressions in section 7.A.1 would be if applied to the decoupling step in section 7.A.2. We will let
$$
\{ \mathit{expr} \}_{7.A.1}
$$
denote the expression or quantity we would use in place of $\mathit{expr}$ in section 7.A.1.

The pair $P_L^T$ is given by
$$
\underbrace{\left(
\begin{bmatrix} I & \\ & \begin{bmatrix} I & & \\ & E_{33} & E_{34} \\ & E_{43} & E_{44} \end{bmatrix}^T \end{bmatrix},\;
\begin{bmatrix}
 \left( A_{11} + \begin{bmatrix} A_{12} & A_{13} & 0 \end{bmatrix} L \right)^T & \\
 \begin{bmatrix} A_{12} & A_{13} & 0 \end{bmatrix}^T & M^T
\end{bmatrix}
\right)}_{P_L^T}
$$
where
$$
M = \begin{bmatrix} A_{22} & A_{23} & I_{24} \\ A_{32} & I_{33} & \\ I_{42} & & F_{44} \end{bmatrix}
- \begin{bmatrix} I & & \\ & E_{33} & E_{34} \\ & E_{43} & E_{44} \end{bmatrix} L
\begin{bmatrix} A_{12} & A_{13} & 0 \end{bmatrix}
$$
That is, to use the notation of section 7.A.1 we have to make the replacements
$$
\left\{ \begin{bmatrix} I & & \\ & E_{33} & E_{34} \\ & E_{43} & E_{44} \end{bmatrix} \right\}_{7.A.1}
\sim
\begin{bmatrix} I & & \\ & E_{33} & E_{34} \\ & E_{43} & E_{44} \end{bmatrix}^T
$$
$$
\{ A_{11} \}_{7.A.1} \sim \left( A_{11} + \begin{bmatrix} A_{12} & A_{13} & 0 \end{bmatrix} L \right)^T
$$
$$
\left\{ \begin{bmatrix} A_{12} & A_{13} & 0 \end{bmatrix} \right\}_{7.A.1} \sim 0
$$


$$
\left\{ \begin{bmatrix} A_{21} \\ A_{31} \\ 0 \end{bmatrix} \right\}_{7.A.1}
\sim \begin{bmatrix} A_{12} & A_{13} & 0 \end{bmatrix}^T
$$
$$
\left\{ \begin{bmatrix} A_{22} & A_{23} & I_{24} \\ A_{32} & I_{33} & \\ I_{42} & & F_{44} \end{bmatrix} \right\}_{7.A.1}
\sim M^T
$$
Using the replacement rules, we find that
$$
\left\{ A_{11} + \begin{bmatrix} A_{12} & A_{13} & 0 \end{bmatrix} L^0 \right\}_{7.A.1}
\sim \left( A_{11} + \begin{bmatrix} A_{12} & A_{13} & 0 \end{bmatrix} L \right)^T
$$
and hence the only difference between the two operators $\mathcal{L}$ and $\mathcal{H}^T$ is that
$$
\begin{bmatrix} A_{22} & A_{23} & I_{24} \\ A_{32} & I_{33} & \\ I_{42} & & F_{44} \end{bmatrix}
$$
has been replaced by $M$, which we know is an $O(m)$ difference.

Noting that $\lVert \mathcal{H}^{-1} \rVert_2 = \lVert ( \mathcal{H}^T )^{-1} \rVert_2$, we see that the larger uncertainty in $M$ generally implies that
$$
\{ c_L \}_{7.A.1} \ge c_H, \qquad \{ c_L \}_{7.A.1} - c_H = O(m)
$$
Next, from $\delta_H( -H )^T = \{ \delta_L \}_{7.A.1}( H )$ it follows that
$$
c^0_H = \left\{ c^0_L \right\}_{7.A.1}
$$

Hence, for a given value of the trade-off parameter $\alpha$, applying section 7.A.2 would yield the better bound on $\rho_H$, but the difference is small.

To see if there are any interesting differences in the more crucial bound on $m$, we assume that $\{ \alpha_L \}_{7.A.1}$ is selected optimally in this respect. In section 7.A.2, the bound (7.56) approaches
$$
\frac{1}{c_H\, c_h}
$$
from below as $\alpha_H$ grows. Since $\{ c_2 \}_{7.A.1} \sim 0$, the condition (7.44) shall be used in section 7.A.1,
$$
\left\{ m \le \frac{1}{c_E\, c_1\, c_L} \right\}_{7.A.1}
$$
where
$$
\{ c_1 \}_{7.A.1} \sim c_1 + m\, c_2\, \rho_L
$$
This means that $\{ c_E\, c_1 \}_{7.A.1}$ constitutes just one of the two positive terms in $c_h$ (see (7.54)), and hence the bound (7.56) is more restricting than (7.44), even for large values of $\alpha_H$. At the same time, (7.44) is valid as soon as $\left\{ \alpha_L \ge \frac{1}{c_L - 1} \right\}_{7.A.1}$.


To conclude this section, the most promising approach to the second decoupling step is that in section 7.A.1. In a particular problem with uncertain data, however, many of the triangle inequalities used to establish bounds in this section (take (7.54) as an example) may be unnecessarily conservative; more direct methods for computing upper bounds on the gain are likely to produce tighter bounds. Hence, we are unable to tell a priori which approach would be superior for some given data. In the implementation behind some of the examples in this thesis, we have only implemented section 7.A.1.

7.B Example data

This section contains tables with matrix pair data referenced from section 7.1.3. To fit the matrices on the page, the pair
$$
\left( \begin{bmatrix} E_1 & E_2 \end{bmatrix},\ \begin{bmatrix} A_1 & A_2 \end{bmatrix} \right)
$$
where the partitioning is related to space constraints (and not to the real structure in the pair) will be typeset as the stacked blocks $[\, E_1 \;\; A_1 \,]$ over $[\, E_2 \;\; A_2 \,]$ (with the continuation indicated by $\cdots$) and rotated.


Table 7.2: The initial matrix pair. (Numeric entries, each of the form value ± 1·10⁻⁹, are omitted here: the table did not survive extraction.)


Table 7.3: The decomposed pair in the proposed canonical form. (Numeric entries are omitted here: the table did not survive extraction.)


Table 7.4: Reconstruction of the original pair by applying the reverse transformations to the pair in its canonical form. Transformations are collapsed to just one matrix on each side of the pair, before the pair itself is transformed. (Numeric entries are omitted here: the table did not survive extraction.)


Table 7.5: Reconstruction of the original pair by applying the reverse transformations to the pair in its canonical form. Transformations are applied one by one. (Numeric entries are omitted here: the table did not survive extraction.)


8 LTV ODE of nominal index 1

In the previous chapter, we explored some of the difficulties in generalizing the results for lti systems of nominal index 1 to nominal index 2. In this chapter we take on another generalization of the lti nominal index 1 results, namely that to time-varying systems. In view of the failure to produce a general convergence result for the lti systems of nominal index 2, treating ltv systems of nominal index 1 is the best we can hope for. Unlike chapter 6, the technicalities of dealing with non-zero pointwise indices are avoided in the current chapter (compare with section 6.8).

The idea to use a fixed-point theorem to prove the existence of decoupling transforms for ltv systems appears in Chang (1969, 1972). When it appears again in Kokotović et al. (1986, section 5.2), it has been modified slightly, and we shall remark on the difference in due time.

The chapter is organized as follows. Section 8.1 prepares the analysis of systems with timescale separation by considering systems where only the fast time scale is present. For ltv dae of nominal index 1, the first steps of analysis (corresponding to section 6.1 for lti systems) lead to the linear time-varying matrix-valued singular perturbation form
$$
x'(t) + A_{11}(t)\, x(t) + A_{12}(t)\, z(t) \overset{!}{=} 0
$$
$$
E(t)\, z'(t) + A_{21}(t)\, x(t) + A_{22}(t)\, z(t) \overset{!}{=} 0
$$

The decoupling of these equations into slow and uncertain subsystems is the topic of section 8.2, and section 8.3 contains some remarks on the difference compared to the scalar perturbation case. In section 8.4 the results of previous sections are summarized in a theorem for ltv dae of nominal index 1. Section 8.5 concludes the chapter.



8.1 Slowly varying systems

Here, the results in Kokotović et al. (1986, section 5.2) are generalized to matrix-valued singular perturbations. The form of equations to be analyzed is

E(t) z′(t) + A(t) z(t) != 0 (8.1)

where E(t) is an unknown square matrix, which is at least assumed non-singular and with a known bound on the entries, max(E(t)) ≤ m. (For comparison, the uncertainty has the form E(t) = ε I in Kokotović et al. (1986).) Our interest is restricted to systems whose time-invariant approximations at each time instant are stable, as formalized by the following assumption about the eigenvalues λ of the pair ( E(t), A(t) ) for a fixed t:

A1–[8.1] Assumption. Assume there exist constants R0 > 0, φ0 ∈ [ 0, π/2 ), and a > sup_t max(A(t)) such that

|λ| m < a    and    |λ| > R0 ⟹ |arg(−λ)| ≤ φ0    (A1–8.2)

where a represents a trade-off between the generality of the assumption and the quantitative properties of the forthcoming convergence results.

We refer to section 6.5, A1–[6.14], and lemma 6.18 for illustration and discussion of this assumption. The method used in experiments to produce time-varying perturbations in agreement with A1–[8.1] is described in appendix A.

Two more constants are introduced to specify properties of A.

P1–[8.2] Property. The constant c2 < ∞ shall be chosen so that

‖A‖I ≤ c2    (P1–8.3)

P2–[8.3] Property. The constant c3 < ∞ shall be chosen so that

‖A^{-1}‖I ≤ c3    (P2–8.4)

The bound on ‖A‖I is used also in Kokotović et al. (1986), while the bound on ‖A^{-1}‖I is a consequence of the need to deal with the matrix-valued uncertainty E instead of a scalar.

8.4 Remark. To depend on a bound on ‖A^{-1}‖I is actually very natural for the present setup. Since a bound on ‖A‖I is needed, it is realized that the smaller this bound is, the stronger the conclusions regarding convergence will be. Further, any convergence result should be such that smaller values of the bound m on E(t) also lead to stronger conclusions. Hence, if there were no need to bound ‖A^{-1}‖I, scaling both E and A by some positive factor less than 1 would yield stronger results! This is clearly contradictory, and we may consider bounding ‖A^{-1}‖I as a way of fixing the scaling of the problem so that an absolute interpretation of m becomes meaningful.

8.5 Remark. That P2–[8.3] should relate to the scaling of the problem is also well in agreement with example 6.28, where the scaling of the problem was fixed by inverting the trailing matrix. Then, a bounded ‖A(t)^{-1}‖2 ensures that max( A(t)^{-1} E(t) ) will still be O(m). In view of the possibility to invert the trailing matrix, and in view of the success of this approach in example 6.28, it would also make sense to use E(t) z′(t) + z(t) != 0 as a starting point instead of (8.1). On the other hand, the inversion of the trailing matrix will generally introduce additional uncertainty in the problem, and it is therefore of value not to assume that this has been done beforehand.

The assumption A1–[8.1] is recognized as (A1–6.26) in chapter 6, to which we refer for illustration and discussion of this condition. The following results are readily extracted from corollary 6.19 in the same chapter.

8.6 Lemma. Under A1–[8.1], there is a constant k1 such that

‖m E(t)^{-1} A(t)‖2 ≤ k1    (8.5)

Proof: This is readily extracted from the proof of corollary 6.19.

Since

λmin( −E(t)^{-1} A(t) ) ≥ ‖A(t)^{-1} E(t)‖2^{-1} ≥ ‖A(t)^{-1}‖2^{-1} ‖E(t)‖2^{-1} ≥ c3^{-1} n^{-1} m^{-1}

(where n is the dimension of (8.1)) we will only consider

m ≤ R0^{-1} c3^{-1} n^{-1}    (8.6)

in the rest of the chapter, so that the bound in A1–[8.1] on the argument of the uncertain eigenvalues applies.
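As a numerical illustration of lemma 8.6: when E(t) scales like m (here E(t) = m E0 for a fixed well-conditioned E0), the product m E(t)^{-1} A(t) is independent of m, so its spectral norm stays bounded as m → 0. A minimal pure-Python sketch with hypothetical 2×2 data (the lemma itself only assumes the entry bound and A1–[8.1], not this special structure):

```python
import math

def matmul(A, B):
    n = len(A)
    return [[sum(A[i][k] * B[k][j] for k in range(n)) for j in range(n)] for i in range(n)]

def inv2(M):
    # inverse of a 2x2 matrix
    d = M[0][0] * M[1][1] - M[0][1] * M[1][0]
    return [[M[1][1] / d, -M[0][1] / d], [-M[1][0] / d, M[0][0] / d]]

def spectral_norm2(M):
    # largest singular value of a 2x2 matrix, via the eigenvalues of M^T M
    MtM = matmul([[M[0][0], M[1][0]], [M[0][1], M[1][1]]], M)
    tr = MtM[0][0] + MtM[1][1]
    det = MtM[0][0] * MtM[1][1] - MtM[0][1] * MtM[1][0]
    return math.sqrt(tr / 2 + math.sqrt(max(tr * tr / 4 - det, 0.0)))

A = [[2.0, 0.5], [0.3, 1.5]]     # hypothetical trailing matrix
E0 = [[1.0, 0.2], [0.1, 0.8]]    # fixed shape of the uncertainty, entries <= 1

norms = []
for m in (1e-1, 1e-3, 1e-6):
    E = [[m * e for e in row] for row in E0]            # max entry of E is m
    m_Einv = [[m * x for x in row] for row in inv2(E)]  # m E^{-1}
    norms.append(spectral_norm2(matmul(m_Einv, A)))

print(norms)  # the same bound k1 works for every m
```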

8.7 Lemma. Assume A1–[8.1] and take m according to (8.6). Then there exist constants K∗ and a∗ > 0 such that for all θ ≥ 0,

‖ e^{ −E(t)^{-1} A(t) θ } ‖2 ≤ K∗ e^{ −a∗ θ }    (8.7)

Proof: As in corollary 6.19, we use that A1–[8.1] together with (8.6) implies that there exists a constant a∗ > 0 such that

α( −E(t)^{-1} A(t) ) < −(1/2) m^{-1} c3^{-1} n^{-1} cos( φ0 ) ≤ −a∗ < 0

Hence (using lemma 8.6) the ratio

‖ −E(t)^{-1} A(t) ‖2 / ( −α( −E(t)^{-1} A(t) ) )

is also bounded by a constant independent of t, and the bound (8.7) is now a consequence of theorem 2.27.

8.8 Corollary. Assuming A1–[8.1], using P2–[8.3], and taking m according to lemma 8.7, there is a bound on ‖m E^{-1}‖I.

Proof:

‖m E^{-1}‖I = ‖m E^{-1} A A^{-1}‖I ≤ c3 k1

The most important thing in lemma 8.7 is that the exponential decay rate (with respect to θ) is independent of t. From here, it would be possible to derive results along a path parallel to that taken in Kokotović et al. (1986), but instead of going along a parallel track we shall build upon those results.

Since it has been assumed that E(t) is invertible, the system (8.1) can be written in ode form. Scaling the equation by m, the ode reads

m z′(t) = −m E(t)^{-1} A(t) z(t)    (8.8)

which is reminiscent of the standard singular perturbation setup, although it contains the matrix-valued uncertainty E(t) on the right-hand side. However, not much has to be known about the right-hand side in order to apply the results in Kokotović et al. (1986), and the assumptions 2.1 and 2.2 made there have already been treated in the current context as part of the proof of lemma 8.7. We now make an additional assumption regarding the time-variability, corresponding to assumption 2.3 in Kokotović et al. (1986).

A2–[8.9] Assumption. Assume

‖ (d/dt)[ m E(t)^{-1} A(t) ] ‖I ≤ β1    (A2–8.9)

for some constant β1.

This assumption involves the time variability of E(t), and may seem hard to justify in applications. However, in seminumerical approaches to index reduction in dae, the uncertainty E(t) may be a symbolic expression which one is unable (or unwilling to try) to reduce to zero, and then it may be possible to compute a true bound on the time variability of E(t). By writing

m ( E^{-1} A )′ = m E^{-1} A A^{-1} A′ − m E^{-1} A A^{-1} m^{-1} E′ m E^{-1} A

it is seen that (using lemma 8.6)

‖ (d/dt)[ m E(t)^{-1} A(t) ] ‖2 ≤ k1 c3 ( ‖A′(t)‖2 + k1 ‖m^{-1} E′(t)‖2 )

It follows that the following two conditions may be a useful alternative to A2–[8.9].

P3–[8.10] Property. The constant β2 < ∞ shall be chosen so that

‖A′‖I ≤ β2    (P3–8.10)

A3–[8.11] Assumption. Assume

‖E′‖I ≤ m β3    (A3–8.11)

for some constant β3.


The second of these should be interpreted as a requirement that the bound on ‖E′‖I scales with the bound on ‖E‖I, which should be reasonable in many situations.

8.12 Lemma. Given a consistent choice of eigenvalue conditions, selecting β3 ≥ β2/c2 in A3–[8.11] is sufficient to ensure the existence of a perturbation E.

Proof: It suffices to note that the instantiation E(t) = ( m / sup_t max(A(t)) ) A(t) satisfies (for all t) both max(E(t)) ≤ m and max(E′(t)) = (m/c2) max(A′(t)) ≤ m β2/c2.

The assumptions made so far allow us, according to lemma 2.29, to approximate the solution of the time-varying system (8.1) by a time-invariant system, as m → 0. The lemma would have given a rather detailed account of the convergence if −m E(s)^{-1} A(s) had been known — in the usual singular perturbation setup m E(s)^{-1} = I and A is assumed to be a known slowly varying matrix — but here the presence of the matrix-valued uncertainty E(s) implies that convergence to zero is the only kind of convergence we can hope for. Since t > s implies

‖ e^{ −m E(s)^{-1} A(s) ( t − s ) / m } ‖2 → 0,  as m → 0

according to theorem 2.27, pointwise convergence of φ( t, s ) for t ≥ s is established (here, we used φ( s, s ) = I). However, we shall also include another proof of this fact without taking the detour via lemma 2.29.

Let P(t) be the solution to the time-invariant Lyapunov equation (2.44) with M substituted by −m E(t)^{-1} A(t) (so that z′ = (1/m) M z). Then

V( t, z ) = z^T P(t) z

is a time-dependent Lyapunov function candidate. Since

(d/dt) V( t, z(t) ) = (1/m) z(t)^T ( P(t) M(t) + M(t)^T P(t) ) z(t) + z(t)^T P′(t) z(t)
                    ≤ −(1/m) ( 1 − m ‖P′(t)‖2 ) |z(t)|^2

this can be made negative by taking m sufficiently small if there is a bound on ‖P′‖I. In addition to such a bound, a bound on ‖P‖I will allow V( 0, z(0) ) to be bounded (in relation to z(0), of course, and for this we only need a bound on ‖P(0)‖2), and a bound on ‖P^{-1}‖I will show that |z(t)| is bounded by a decreasing function. The upper bound on ‖P(0)‖2 is readily obtained by use of (8.7) in the formal solution (2.45) to the Lyapunov equation. The corresponding lower bound is established by theorem 2.31. To bound P′(t), the time-dependent Lyapunov equation may be differentiated with respect to t, which yields a new Lyapunov equation in P′(t), whose formal solution turns out to be bounded by 2 β1 times the bound for ‖P‖I squared. These results are applied in the next lemma.


8.13 Lemma. Under P1–[8.2], P2–[8.3], A2–[8.9], the time-varying system (8.1) is uniformly [ γw e^{ −( λw(m)/m ) • } ]-stable�, with

γw = K∗ √( c2 / a∗ )

λw(m) = c2 ( 1 − m / ( 2 a∗^2 / ( β1 K∗^4 ) ) )    (8.12)

Proof: Inserting the bounds on ‖P‖I (upper and lower) and ‖P′‖I (upper), the convergence can be stated via the coupled system in |z(t)| and V(t) = V( t, z(t) ):

|z(t)| ≤ √( 2 c2 V(t) )

V(0) ≤ ( K∗^2 / ( 2 a∗ ) ) |z(0)|^2

V′(t) ≤ −(1/m) ( 1 − m / ( 2 a∗^2 / ( β1 K∗^4 ) ) ) |z(t)|^2

Solving this system with equalities everywhere will give upper bounds as functions of t. With V̄(t) being the upper bound for V(t), one obtains

V(t) ≤ V̄(t) = V(0) e^{ −( 2 c2 / m ) ( 1 − m / ( 2 a∗^2 / ( β1 K∗^4 ) ) ) t }

and it just remains to take a square root.

8.14 Corollary. Under the assumptions of lemma 8.13, bounding m by a constant less than 2 a∗^2 / ( β1 K∗^4 ) makes the system (8.1) uniformly [ γw e^{ −( λw/m ) • } ]-stable where λw > 0 is independent of m.

Proof: Follows immediately from lemma 8.13.
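To see corollary 8.14 at work, one can integrate E(t) z′(t) + A(t) z(t) != 0 numerically. The sketch below uses forward Euler, the simplest admissible uncertainty E(t) = m I, and a hypothetical stable, slowly varying A(t); over a fixed horizon, the terminal |z| collapses rapidly as m shrinks, in line with the e^{−(λw/m)t} decay:

```python
import math

def decay(m, T=0.5):
    # forward-Euler simulation of z' = -E(t)^{-1} A(t) z with E(t) = m*I
    dt = m / 50.0                    # step resolving the fast time scale
    z, t = [1.0, 1.0], 0.0
    while t < T:
        A = [[2.0 + 0.3 * math.sin(t), 0.4],
             [0.1, 1.5 + 0.2 * math.cos(t)]]
        rhs = [-(A[0][0] * z[0] + A[0][1] * z[1]) / m,
               -(A[1][0] * z[0] + A[1][1] * z[1]) / m]
        z = [z[0] + dt * rhs[0], z[1] + dt * rhs[1]]
        t += dt
    return math.hypot(z[0], z[1])

decays = [decay(m) for m in (1e-1, 1e-2)]
print(decays)  # terminal |z|: already small for m = 0.1, vastly smaller for m = 0.01
```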

8.2 Time-varying systems with timescale separation

In the last section, the main tool was to use bounds for −m E(t)^{-1} A(t), which led to immediate application of previous results valid for scalar perturbations. In this section we shall study the decoupling transform in the presence of a matrix-valued uncertainty. For scalar perturbations in time-varying systems, a common technique is to study series expansions in the (scalar) perturbation variable (Naidu, 2002). However, the technique is demanding to generalize, since expanding a multivariable function can easily result in an overwhelming amount of bookkeeping.

� The choice of notation for the parameters is motivated by the context where the lemma is applied in section 8.2.


For the system

x′(t) + A11(t) x(t) + A12(t) z(t) != 0    (8.13x)

E(t) z′(t) + A21(t) x(t) + A22(t) z(t) != 0    (8.13z)

our goal is to study the decoupling transform, that is, the change(s) of variables that isolates the fast and uncertain dynamics from the slow dynamics which we wish to approximate. In a fashion similar to Kokotović et al. (1986), the existence of these transforms will be established constructively so that their approximation properties for small m are made visible, where m now is the upper bound on E in (8.13) rather than (8.1).

8.2.1 Overview

The first decoupling step serves to eliminate x from (8.13z). Making the change of variables (compare (6.15) in the time-invariant case)

z(t) = L(t) x(t) + η(t)    (8.14)

and eliminating x′(t) from (8.13z) by row operations, one obtains

x′ + ( A11 + A12 L ) x + A12 η != 0    (8.15x)

E η′ + N1 x + ( A22 − E L A12 ) η != 0    (8.15η)

where N1 is an expression that is to be eliminated by the choice of L. Equating N1 with 0 gives

E L′ != −A21 − A22 L + E L ( A11 + A12 L )    (8.16)

Assuming that this equation is solved by the choice of L (we will have to ensure that the solution can be approximated well for sufficiently small m, even though E is unknown), we can proceed to the second decoupling step; elimination of η from (8.15x) by the change of variables

x(t) = ξ(t) + m H(t) η(t)    (8.17)

Making the change of variables and eliminating η′(t) from (8.15x) by row operations, one obtains

ξ′ + ( A11 + A12 L ) ξ + N2 η != 0    (8.18ξ)

E η′ + ( A22 − E L A12 ) η != 0    (8.18η)

where N2 is to be eliminated by the choice of H. Equating N2 with 0 gives

m H′ != m H ( E^{-1} A22 − L A12 ) − A12 − m ( A11 + A12 L ) H    (8.19)

Until this point, the steps taken to derive the decoupling transform have been almost identical to those in Kokotović et al. (1986, section 5.3), but as they make a series expansion of L and H in the scalar perturbation parameter (with the coefficients being functions of time), the close similarity must end here.
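The mechanism of the first decoupling step is easy to exercise in the time-invariant special case, where L′ = 0 turns (8.16) into an algebraic equation. A sketch with hypothetical scalar blocks, solving for L by fixed-point iteration in the spirit of the contraction argument of section 8.2.2, and verifying that the coupling term N1 vanishes:

```python
# Scalar-block check of (8.16) with L' = 0:
#   0 = -A21 - A22*L + E*L*(A11 + A12*L)
# is solved by iterating L <- A22^{-1} ( -A21 + E*L*(A11 + A12*L) ).
A11, A12, A21, A22 = 0.5, 1.0, 2.0, 3.0
E = 0.05                     # a small sample value for the perturbation

L = -A21 / A22               # start from the nominal solution L0
for _ in range(50):
    L = (-A21 + E * L * (A11 + A12 * L)) / A22

N1 = A21 + A22 * L - E * L * (A11 + A12 * L)
print(L, abs(N1))            # L stays close to L0 = -2/3, and N1 is ~0
```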


To make use of the results in the previous section we need to replace P2–[8.3]. It would be an unnecessary restriction to require that the slow dynamics of the coupled system must not have eigenvalues at the origin, and the property we need is another one, stated next.

P4–[8.15] Property. The constant c3 < ∞ shall be chosen so that

‖A22^{-1}‖I ≤ c3    (P4–8.20)

8.2.2 Eliminating slow variables from uncertain dynamics

Since we are interested in convergence as m → 0, rather than forming an expansion of L (and soon H) in E, we write L in the form�

L(t) = m^N RL(t) + Σ_{j=0}^{N−1} m^j Lj(t)    (8.21)

where, in order for the expansion to be meaningful, we must ensure that RL is bounded independently of m, and that each Lj can be approximated sufficiently well independently of m. In this work, the main focus is to prove convergence and it suffices to consider N = 1 — larger values of N are of interest when a more accurate solution to the original equations is sought. In general, it will only be possible to show that the expansion is valid for sufficiently small m, but it is of interest to find an upper bound on m where validity is known. To find equations for RL and Lj, (8.21) is used in (8.16), occurrences of E are rewritten as m ( m^{-1} E ), and equal powers in m (considering m^{-1} E as one unit) are identified. For N = 1, (8.16) is written

m ( m^{-1} E ) [ m RL′ + L0′ ] != −A21 − A22 [ m RL + L0 ] + m ( m^{-1} E ) [ m RL + L0 ] ( A11 + A12 [ m RL + L0 ] )

Gathering the m^0 terms and collecting what remains, the following two equations are obtained

0 != A21 + A22 L0    (8.22a)

m RL′ + L0′ != −m E^{-1} A22 RL + [ m RL + L0 ] ( A11 + A12 [ m RL + L0 ] )
            = −m [ E^{-1} A22 − L0 A12 ] RL + f1 + m f2( RL )    (8.22b)

where f1 = L0 ( A11 + A12 L0 ) and f2( RL ) = RL ( A11 + A12 ( m RL + L0 ) ).

By this identification, L0(t) = −A22(t)^{-1} A21(t) is completely independent of E, and assuming A21 and A22 are bounded with bounded derivatives, P2–[8.3] implies that both L0(t) and L0′(t) can be bounded independently of t. It follows that there exists a c4 such that

‖ −L0′ + f1 ‖I ≤ c4    (8.23)

� Unlike the previous chapters on ltv systems, a different notation is used here for the functions in the expansion. It allows the usual notation for the differentiated function to be used conveniently, and unlike the nominal index 2 case, there is no partitioning of L with blocks to be referred to using subscripts.

It remains to establish boundedness of RL, and in the spirit of Kokotović et al. (1986) and section 7.A.1 we do so by a contraction mapping argument. We shall define an operator S to act on RL ∈ L = { RL : ‖RL‖I < ρL } (ρL will be chosen later) such that

• S RL = RL implies that RL solves (8.22b).

• S maps L into itself if ρL is chosen small enough.

• ‖ S RL,1 − S RL,2 ‖I < c ‖ RL,1 − RL,2 ‖I for some c < 1.

When these conditions hold, it follows that the fixed-point equation S RL = RL has a unique solution in L, hence establishing boundedness of the solution to (8.22b).

We now introduce the approximation of the η subsystem (8.18η) obtained by replacing the decoupling function L by all terms in the expansion (8.21) except for the so far unknown rest term m RL,

E w′ + ( A22 − E L0 A12 ) w != 0    (8.24)

It is for this system that the assumption A1–[8.1] will be made in the current context. We must remark that this is not quite satisfying, since (8.24) has not yet been related to the features of the system modeled by the equations. We shall discuss this choice briefly soon hereafter, and then discuss the issue in more detail in section 8.A.

We now check P1–[8.2], P2–[8.3], and P3–[8.10] for the trailing matrix in (8.24) in place of the A of section 8.1 as follows.

• P2–[8.3] may be checked by first bounding ‖A22^{-1}‖I and ‖L0 A12‖I, and then using corollary 2.47. Less restrictive bounds on m can be obtained by considering the bound as a function of time.

• P1–[8.2] is checked directly by inspection of A22, L0 A12 = −A22^{-1} A21 A12, and any of the bounds imposed on m (for instance the one used to check P2–[8.3]).

• P3–[8.10] is checked using A3–[8.11] and inspection of A22′ and the time derivative of L0 A12.

Applying corollary 8.14 to (8.24) instead of (8.1) now gives that, for sufficiently small m, (8.24) is uniformly [ γw e^{ −( λw/m ) • } ]-stable for some constants γw and λw > 0.

Let φ denote the transition matrix of (8.24), so that taking m sufficiently small gives

φ( t, t ) = I

φ( •, τ )′(t) = −( E(t)^{-1} A22(t) − L0(t) A12(t) ) φ( t, τ )

φ( τ, • )′(t) = φ( τ, t ) ( E(t)^{-1} A22(t) − L0(t) A12(t) )

‖ φ( t, τ ) ‖2 ≤ γw e^{ −( λw/m ) ( t − τ ) }


Regarding the choice of system associated with φ, Kokotović et al. (1986) differs from the original Chang (1969), and our choice is a compromise. We prefer not to follow Chang (1969), since that would require the properties of (8.18η), where L appears, to be checked in the process of determining L. On the other hand, we prefer not to follow Kokotović et al. (1986), since that would make the discrepancy larger than necessary between the equations for which A1–[8.1] is assumed and the equations which can be related to the system being modeled. A feature of this work shared by Kokotović et al. (1986) but not by Chang (1969) is the splitting of L into a nominal part and higher order terms, providing better insight into the approximation properties and the decoupled problem.

By differentiation with respect to t, the following choice of S is verified to be compatible with (8.22b) (compare example 2.48, (2.88))

( S RL )(t) ≜ (1/m) ∫_0^t φ( t, τ ) [ −L0′(τ) + f1(τ) + m f2( RL )(τ) ] dτ    (8.25)

In addition to (8.23), RL ∈ L implies that

‖ f2( RL ) ‖I ≤ cξξ ρL + m cxz ρL^2    (8.26)

for some cξξ, cxz that satisfy

‖ A11 + A12 L0 ‖I < cξξ        ‖A12‖I < cxz    (8.27)

To ensure that S maps L into itself, we note that for RL ∈ L,

‖S RL‖I ≤ (1/m) ∫_0^t γw e^{ −( λw/m )( t − τ ) } [ c4 + m cξξ ρL + m^2 cxz ρL^2 ] dτ
        ≤ γw c4 / λw + ( γw / λw ) ( m cξξ ρL + m^2 cxz ρL^2 )

The second of these terms will be made small by imposing a bound on m, so by taking αL > 0 and setting

ρL = ( 1 + αL ) γw c4 / λw    (8.28)

we obtain ‖S RL‖I < ρL whenever m cξξ ρL + m^2 cxz ρL^2 < αL c4, or

m < ( λw / ( ( 1 + αL ) γw c4 ) ) ( −cξξ / ( 2 cxz ) + √( αL c4 / cxz + ( cξξ / ( 2 cxz ) )^2 ) )
  = αL λw / ( cξξ γw ) + O( αL^2 )    (8.29)

To establish the contraction on L, note that

S RL,2(t) − S RL,1(t) = ∫_0^t φ( t, τ ) [ f2( RL,2 )(τ) − f2( RL,1 )(τ) ] dτ


Hence, a sufficient condition for S to be a contraction on L is available in (using (7.37) to express the difference between the terms quadratic in RL)

‖ S RL,2 − S RL,1 ‖I ≤ ( m γw / λw ) ( cξξ + 2 cxz ρL ) ‖ RL,2 − RL,1 ‖I
                    = m ( cξξ γw / λw + 2 cxz c4 ( 1 + αL ) ) ‖ RL,2 − RL,1 ‖I

As αL → 0, the condition for contraction tends to a constant positive bound on m (we use 0.99 < 1 just to ensure strict contraction),

m < 0.99 / ( cξξ γw / λw + 2 cxz c4 ( 1 + αL ) )
  = 0.99 / ( cξξ γw / λw + 2 cxz c4 ) + O( αL )    (8.30)

Hence, for small αL, (8.30) will be implied by (8.29), but while the bound on m in (8.29) is initially increasing with αL, (8.30) is decreasing for all values of αL, so one generally has to consider both bounds.

This concludes the contraction mapping argument for RL in the expansion L = L0 + m RL. That is, by taking m less than all of the finitely many positive bounds imposed on it, we obtain ‖RL‖I ≤ ρL. The corresponding decoupling transformation will be applied in section 8.4.

8.2.3 Eliminating uncertain variables from slow dynamics

We now turn to the second change of variables given by (8.17), where H satisfies (8.19), which can be partitioned either as

m H′ != m H ( E^{-1} A22 − L A12 ) − A12 − m [ A11 + A12 L ] H

or

m H′ != m H ( E^{-1} A22 − L0 A12 ) − A12 − m ( m H RL A12 + [ A11 + A12 L ] H )

where the last parenthesis is denoted g2( H ), depending on whether one prefers to reuse the transition matrix φ from the previous section, or if one rather makes assumptions regarding the (8.18η) system which has been isolated as a subsystem of the system being modeled. Using the first partitioning, the expression for g2 becomes easier to work with, and to check P3–[8.10] one can use A3–[8.11] and corollary 8.8 in (8.16). On the other hand, assumptions have already been made corresponding to the latter partitioning, and this is the choice made here.

The following approximate expression for the solution can be derived by making an expansion of H in powers of m,

H = A12 A22^{-1} m^{-1} E + O( m )


Since this expression is dominated by a term which can only be bounded, but not further approximated, there is little use for an expansion of H in powers of m. Instead, we aim directly for a bound on ‖H‖I, and introduce an operator for this purpose. Let the following operator T be defined for H ∈ H = { H : ‖H‖I < ρH } (ρH will be chosen soon). (Compare example 2.48, (2.89))

T H ≜ (1/m) ∫_t^{tf} [ A12(τ) + m g2( H )(τ) ] φ( τ, t ) dτ    (8.31)

For m sufficiently small to make the previous approximation of L valid, it holds that

‖ g2( H ) ‖I / ‖H‖I ≤ m ρL ‖A12‖I + ‖ A11 + A12 L0 ‖I + m ρL ‖A12‖I ≤ cξξ + 2 m ρL cxz    (8.32)

and we obtain

‖ T H ‖I ≤ ( ‖A12‖I + m [ cξξ + 2 m ρL cxz ] ρH ) (1/m) ∫_t^{tf} ‖ φ( τ, t ) ‖2 dτ
         ≤ ( cxz + m cξξ ρH + 2 m^2 ρL cxz ρH ) γw / λw

We now pick αH > 0 and set�

ρH = ( 1 + αH ) γw cxz / λw

which means that T maps H into itself whenever m cξξ ρH + 2 m^2 ρL cxz ρH ≤ αH cxz. That is, we require

m ≤ (1/ρH) ( √( αH ρH / ( 2 ρL ) + ( ( cξξ / ( 2 cxz ) ) ( ρH / ( 2 ρL ) ) )^2 ) − ( cξξ / ( 2 cxz ) ) ( ρH / ( 2 ρL ) ) )
  = αH cxz / ( cξξ ρH ) + O( αH^2 )    (8.33)

Since g2( H ) is linear in H, g2( H2 ) − g2( H1 ) = g2( H2 − H1 ), and hence

‖ g2( H2 ) − g2( H1 ) ‖I ≤ ( cξξ + 2 m ρL cxz ) ‖ H2 − H1 ‖I

In

‖ T H2 − T H1 ‖I ≤ ( m γw / λw ) ( cξξ + 2 m ρL cxz ) ‖ H2 − H1 ‖I

it is seen that for m < 1, the requirement for contraction here is weaker than that for S.

� This bound should be compared with the approximate expression for H, suggesting that the bound could actually be as small as cxz c3 nz.


For completeness, ρL is expressed using αL, yielding the following expression for the bound on m (again 0.99 < 1 is an arbitrary choice just to ensure strict contraction)

m ≤ ( 0.99 / ( 1 + αL ) ) ( λw / ( γw cξξ ) ) ( √( ( cξξ^2 / ( 4 c4 cxz ) )^2 + ( cξξ^2 / ( 2 c4 cxz ) ) ( 1 + αL ) ) − cξξ^2 / ( 4 c4 cxz ) )
  = 0.99 ( λw / ( γw cξξ ) ) ( √( ( cξξ^2 / ( 4 c4 cxz ) )^2 + cξξ^2 / ( 2 c4 cxz ) ) − cξξ^2 / ( 4 c4 cxz ) ) + O( αL )    (8.34)

where the parenthesized factor on the second line is at most 1.

This concludes the contraction mapping argument for H. That is, by taking m less than all of the finitely many positive bounds imposed on it, we obtain ‖H‖I ≤ ρH. The corresponding decoupling transformation will be applied in section 8.4.

8.3 Comparison with scalar perturbation

Since the preceding section is similar to the treatment of a scalar perturbation ε found in Kokotović et al. (1986), we would like to highlight some of the differences.

• The matrix-valued uncertainty E(t) does not commute with other matrices in the way ε would.

• In the change of variables (8.17), the bound m on the perturbation is used rather than the perturbation E(t) itself.

• In the contraction operators, the transition matrix φ is associated with an approximation of what will turn out to be the η subsystem, rather than with the ( E, A22 ) system. See section 8.A for a discussion.

• In the equation for H, there is now a term m H E^{-1} A22 where there used to be just H A22.

• The "nominal" (or "reduced") solution for H, that is H0, is no longer the known entity A12 A22^{-1}, but instead A12 A22^{-1} m^{-1} E. However, since all that can be said about the solution η is that it will be vanishing with m, not knowing H0 is no limitation.

8.4 The decoupled system

Starting from the ltv dae

E(t) x′(t) + A(t) x(t) != 0        x(0) = x0    (8.35)

time-varying row and column reductions are applied in the time-varying analog of the transforms for lti dae in section 6.1. The resulting system is (8.13), for which section 8.2 shows the existence and approximation properties of the two decoupling


matrices L and H. The two changes of variables given by L and H can be written compactly as

( ξ ; η ) = [ I, −mH ; 0, I ] ( x ; η ) = [ I, −mH ; 0, I ] [ I, 0 ; −L, I ] ( x ; z ) = [ I + mHL, −mH ; −L, I ] ( x ; z )    (8.36)

where the rightmost matrix is denoted T^{-1}, with inverse given by

T = [ I, mH ; L, mLH + I ]    (8.37)

From the two factors making up T^{-1} it is readily seen that det T^{-1} = 1, and with L and H bounded, any bound on m gives that T^{-1} is bounded, and hence that T^{-1} defines a Lyapunov transformation (recall definition 2.30).
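With scalar blocks, the two transform matrices are easy to confirm: the product T T^{-1} is the identity and det T^{-1} = 1 for any sample values of L, H, and m:

```python
L, H, m = -0.7, 0.4, 0.01        # arbitrary sample values for the scalar blocks

Tinv = [[1 + m * H * L, -m * H], [-L, 1.0]]       # (8.36)
T = [[1.0, m * H], [L, m * L * H + 1.0]]          # (8.37)

prod = [[sum(T[i][k] * Tinv[k][j] for k in range(2)) for j in range(2)]
        for i in range(2)]
det_Tinv = Tinv[0][0] * Tinv[1][1] - Tinv[0][1] * Tinv[1][0]
print(prod, det_Tinv)            # identity matrix and determinant 1
```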

Applying the transformation yields the system (8.18), which we repeat here

ξ′ = −( A11 + A12 L ) ξ = −( A11 − A12 A22^{-1} A21 + m A12 RL ) ξ    (8.38ξ)

E η′ + ( A22 − E L A12 ) η != 0    (8.38η)

This system has two isolated parts, where (8.38ξ) is a regularly perturbed problem which may be addressed with any available method (see section 2.4). To be able to establish a rate of convergence in this section, we demand one more condition on the system (which will have to be verified in applications). Let x̄ denote the nominal solution for ξ (and x, as it turns out), that is, the solution to

x̄′ = −( A11 − A12 A22^{-1} A21 ) x̄    (8.39)

P5–[8.16] Property. The system (8.39) is uniformly exponentially stable.

By theorem 2.42, P5–[8.16] lets us conclude that (8.38ξ) is uniformly [ γξ e^{ −λξ • } ]-stable for some constants γξ, λξ > 0, independently of RL. Then, Rugh (1996, theorem 12.2) provides that (8.38ξ) is uniformly bounded-input, bounded-state stable, which means that the reformulated system (2.66) can be used to conclude that sup_t | ξ(t) − x̄(t) | = O( m ).

The system (8.38η) is the kind of system considered in section 8.1, and since it is a true subsystem of the real system, our assumptions apply and lemma 8.13 provides parameters of uniform exponential stability for this system.

Since T^{-1} in (8.36) is bounded, it follows that bounded initial conditions for x(0) and z(0) imply bounded initial conditions for η(0). Multiplying by the exponential convergence parameter γw, one obtains a bound on |η(t)|, valid for all t ≥ 0. Looking at (8.17), it is seen that the boundedness of H then implies that x converges to ξ uniformly as m → 0.


In order to also obtain convergence for z to z̄ = −A22^{-1} A21 x̄, it is necessary to show that η is not only bounded, but converges uniformly to 0 as m → 0 (compare with (8.14) and recall that L = −A22^{-1} A21 + O( m )). This can only follow if the initial condition for η(0) converges to 0 with m (and then the uniform convergence follows). The convergence was the subject of lemma 6.4 as well as lemma 7.4. By identifying (8.13) with (7.8), lemma 7.4 applies in the index 1 case as well, and the time-variability here does not matter for initial conditions. Hence, there is no need to derive the convergence again, and we just remind that the choice z0 = −A22(0)^{-1} A21(0) x0 is the only fixed choice for z0 that can be used for arbitrarily small m.

The section is concluded with a theorem summarizing the convergence result for ltv dae of nominal index 1.

8.17 Theorem. Consider the nominal index 1 ltv dae (8.35) repeated here,

E(t) x′(t) + A(t) x(t) != 0        x(0) = x0    (8.35)

and the corresponding partitioned equations (8.13) repeated here,

x′(t) + A11(t) x(t) + A12(t) z(t) != 0        x(0) = x0

E(t) z′(t) + A21(t) x(t) + A22(t) z(t) != 0        z(0) = z0

over the time interval I = [ 0, tf ). Let the nominal equation refer to the same equation, but with E(t) replaced by 0.

Let x0, z0 satisfy the nominal equation, and let x̄ denote the solution to (8.39) (that is, the nominal differential equation for x). Let max(E(t)) ≤ m for all t, and make the pointwise in time assumption A1–[8.1] regarding the eigenvalues of the approximation (8.24) of the fast and uncertain subsystem, as well as either of the assumptions A2–[8.9] or A3–[8.11] regarding the time variability of E(t). Assume that the properties P1–[8.2] and P3–[8.10] to P5–[8.16] are also satisfied.

Then there exist constants k and m0 > 0 such that m ≤ m0 implies

sup_{t ∈ I} | x(t) − x̄(t) | ≤ k m

sup_{t ∈ I} | z(t) + A22(t)^{-1} A21(t) x̄(t) | ≤ k m

and the solution to (8.35) converges at the same rate.

Proof: This is a summary of results obtained in the present chapter.
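The O(m) rate of the theorem can be observed on a scalar instance of (8.13). The sketch below uses hypothetical coefficients, a sample uncertainty E(t) = m ( 0.8 + 0.2 sin t ) with |E(t)| ≤ m, the consistent initial condition z(0) = −A22(0)^{-1} A21(0) x(0), and forward Euler; the supremum error against the nominal solution x̄(t) = e^{−0.8 t} shrinks with m:

```python
import math

a11, a12, a21, a22 = 1.0, 0.5, 0.8, 2.0
a0 = a11 - a12 * a21 / a22            # nominal slow dynamics: x_bar' = -a0*x_bar

def sup_error(m, T=2.0):
    dt = m / 50.0                     # step resolving the fast time scale
    x, z, t, err = 1.0, -a21 / a22, 0.0, 0.0
    while t < T:
        e = m * (0.8 + 0.2 * math.sin(t))   # uncertain E(t), |E(t)| <= m
        x, z = (x - dt * (a11 * x + a12 * z),
                z - dt * (a21 * x + a22 * z) / e)
        t += dt
        err = max(err, abs(x - math.exp(-a0 * t)))
    return err

errs = [sup_error(m) for m in (1e-2, 2.5e-3)]
print(errs)   # sup |x - x_bar| shrinks as m shrinks
```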

Since the convergence of the full system is dependent on the convergence in the subsystem (8.38ξ), the requirement of P5–[8.16] can be replaced by other conditions which enable convergence in (8.38ξ) to be established, and if the established convergence is not O(m) uniformly in time, this convergence rate will replace the rates in theorem 8.17. On the other hand, in a particular problem where the uncertainties are given, the rate of convergence is not of importance, since it is only the fixed bound on sup_t | ξ(t) − x̄(t) | that matters.


8.5 Conclusions

The solutions of the two-timescale system (8.13) with a matrix-valued singular perturbation have been shown to converge as the bound on the perturbation tends to zero. Aside from properties that can be verified in applications, the eigenvalue assumption used for lti systems in chapter 6 is assumed to hold pointwise in time here, and is formulated with respect to an approximation of the fast and uncertain subsystem rather than the true system. Further, compared to chapter 6, the time-variability of the system has led to the use of an additional assumption bounding the time derivative of the perturbation.

Regarding directions for future research, there are some results in chapter 7 which might be possible to generalize to ltv systems, and it remains to relax the assumption that E(t) be pointwise non-singular, so that the theory covers not only nominal index 1, but also true index 1 systems.* However, while general convergence results for nominal index 2 lti systems still remain to be derived, extending the results in this chapter to higher nominal indices should wait. In the meantime, there is also plenty of work to be done on the numeric implementation.

* In doing so, the SVD decomposition of Steinbrecher (2006, theorem 2.4.1) is expected to be a key tool.


Appendix

8.A Dynamics of related systems

This section contains some remarks concerning the choice of system to which the transition matrix φ in section 8.2 belongs. Recall that we would ideally formulate the eigenvalue assumptions for the matrix pair of (8.38η), but we were led to use the matrix pair of (8.24) instead, since (8.38η) was not available at the time when the assumptions were needed. In retrospect, we would like to indicate how assumptions about (8.38η) can justify the use of (8.24).

In the original reference on this method, Chang (1969), the transition matrix φ is associated with the fast (and uncertain) subsystem (8.38η), while Kokotović et al. (1986) associate it with the system

E w0′ + A22 w0 != 0   (8.40)

(but with E = ε I, of course). Our choice (8.24), repeated here,

E w′ + ( A22 − E L0 A12 ) w != 0   (8.24)

is a third option, and it was explained in section 8.2.2 why this was preferred.

With the (8.38η) subsystem, repeated here,

E η′ + ( A22 − E L A12 ) η != 0   (8.38η)

isolated by the choice of L given in section 8.2.2, we now consider associating φ with this system instead, just like in Chang (1969), so that the eigenvalue assumptions really concern instantaneous poles of the slowly varying fast subsystem.

First of all, once a crude estimate of L is available, the (8.38η) subsystem becomes isolated, and it would be possible to start over from the beginning of section 8.2.2 with φ associated with (8.38η) instead of (8.24). Doing so would not justify the assumptions about (8.24), but there may be other ways of obtaining the initial crude estimate.




We require that, for m small enough, the decoupling matrix satisfies a constant bound,

‖L‖I ≤ l   (8.41)

(Note that we do not assume convergence as m → 0.) Then there is also a number ρ ≤ l + ‖L0‖I such that

‖m RL‖I ≤ ρ   (8.42)

(It is seen that (8.42) also implies (8.41), so either one may be used as a starting point.)

As on page 235, we must check P1–[8.2], P2–[8.3], and P3–[8.10] for the trailing matrix in (8.38η) in place of the A of section 8.1. P1–[8.2] and P2–[8.3] are readily checked using (8.41). To check P3–[8.10], the product E L is treated as one unit in (8.38η), and in

(d/dt)( E L )(t) = E′(t) L(t) + E(t) L′(t)

it is seen that the time derivative is bounded by inserting (8.16) for E L′ and using A3–[8.11] to bound E′ L.

Making assumption A1–[8.1] for (8.38η), corollary 8.14 provides that the system (8.38η) is uniformly [γη e^(−(λη/m) •)]-stable for some constants γη, λη > 0. The following lemma then shows that we can obtain the same qualitative exponential convergence for the approximation (8.24).

8.18 Lemma. The system (8.24) is uniformly [γw e^(−(λw/m) •)]-stable for some constants γw, λw > 0.

Proof: Rewrite (8.24) as a perturbed system,

w′ != −( E⁻¹A22 − L0 A12 ) w = −[ ( E⁻¹A22 − L A12 ) + m RL A12 ] w

and time-scale by means of w̃(t) ≜ w(m t). This yields

w̃′ = −[ m ( E⁻¹A22 − L A12 ) + m (m RL) A12 ] w̃

Time-scaling the corresponding "nominal" system

u′ = −m ( E⁻¹A22 − L A12 ) u

by means of u(t) ≜ ũ(m t) yields

ũ′ = −( E⁻¹A22 − L A12 ) ũ

which is recognized as the η subsystem (8.38η), with known uniform exponential convergence parameters.

Due to P4–[8.15], (8.41), and corollary 2.47 it can be seen that

( E⁻¹A22 − L A12 )⁻¹ = ( A22 − E L A12 )⁻¹ E

can be bounded by some constant times m, for m sufficiently small. This, together with A1–[8.1], implies that m ‖E⁻¹A22 − L A12‖I ≤ αu for some constant αu, according to theorem 6.11 (compare (6.28)).

Since the η subsystem is uniformly [γη e^(−(λη/m) •)]-stable, the u system is uniformly [γη e^(−λη •)]-stable. Since the state feedback matrix of the same system has a norm bound of αu and we have ‖m RL‖I ≤ ρ, theorem 2.42 provides a bound on m which makes the time-scaled w system uniformly [γw e^(−λw •)]-stable for some positive constants γw, λw. It then follows that the system (8.24) is uniformly [γw e^(−(λw/m) •)]-stable.

The lemma shows that, making assumptions about the eigenvalues of (8.38η) and using the bound (8.41) (which may either be derived or postulated, and does not require L to converge as m → 0), the necessary convergence property of the transition matrix φ used in section 8.2.2 follows.

We end the section with a corollary which provides a possible substitute for lemma 8.6, applicable in the context of the coupled system instead of the slowly varying system in section 8.1.

8.19 Corollary. Taking m sufficiently small will provide a bound on ‖m E⁻¹A22‖I.

Proof: Using the αu from the proof of lemma 8.18, we obtain

‖m E⁻¹A22‖I = m ‖E⁻¹A22 − L A12 + L A12‖I ≤ αu + m l ‖A12‖I


9 Concluding remarks

Looking back on the previous chapters in the thesis, we let the self-contained chapter 4 on filtering and chapter 5 on the new index speak for themselves. Here, we wrap up our findings concerning matrix-valued singular perturbation problems related to uncertain dae, because this is where the emphasis has been in the thesis.

The matrix-valued singular perturbation problems were introduced in section 1.2, and chapter 3 detailed an application of future nonlinear results. The results in the thesis have been limited to autonomous linear dae. Using assumptions regarding the system poles, convergence of solutions has been established in the nominal index 1 case, for both lti and ltv dae. For lti dae of nominal index 2 we have not (except for a very small example) been able to establish convergence of solutions, but several results that are expected to be useful in the future have been derived. These results include a Weierstrass-like canonical form for uncertain matrix pairs of nominal index 2. Most results assume that the pointwise index of the uncertain dae is 0, but for lti dae of nominal index 1 the results were partly extended to pointwise index 1.

Some directions for future research have been mentioned in earlier chapters, but the following short list contains some which we think are particularly interesting.

• The canonical form for lti dae of nominal index 2 should be extended to higher indices, and a reliable numeric implementation should be developed.

• The results for ltv systems of nominal index 1 should be extended from pointwise index 0 to pointwise index 1, as was done in the lti case.

• Other function measures or stochastic formulations of the problems may both be relevant in applications and result in better error estimates. To consider alternative function measures appears to be a good option also for systems with inputs.



A Sampling perturbations

Being unable to compute tight bounds in the analysis of matrix-valued singularly perturbed systems is very related to the inability to construct worst-case perturbations. To illustrate our results, we are left with the option to sample randomly from the set of perturbations that agree with our assumptions, and observe how the corresponding solution set changes as a function of the parameters in our assumptions. In this chapter, we detail how the random samples were generated, so that our examples can be reproduced and readers can try their own examples.

Since our aim in the examples is to illustrate convergence of the solutions as the bound on the size of the perturbation tends to zero, it is desirable that the perturbations are such that there is not much slack in this constraint.

A.1 Time-invariant perturbations

Our sampling strategy for time-invariant perturbations is trivial in the index 0 case; details follow. Given m > 0 and the parameters a, R0, and φ0 in

max(E) ≤ m

∀λ : |λ| m < a and |λ| > R0 and |arg(−λ)| < φ0

we sample each entry of the matrix E from a uniform distribution over the interval [−m, m], and then the whole matrix is scaled to satisfy max(E) != m. We then compute the eigenvalues (possibly taking also the slow dynamics into account), and reject any samples that do not satisfy the eigenvalue constraints.

Clearly, one must not select a too small, or an infinite loop of rejections will occur.
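The rejection sampler just described can be sketched as follows. Here max(E) is interpreted as the largest entry magnitude, the eigenvalues are those of the pair (E, A22) (the roots of det(λE + A22) = 0, as in figure A.2), and the function name is hypothetical.

```python
import numpy as np

def sample_perturbation(A22, m, a, R0, phi0, rng=None, max_tries=100_000):
    """Rejection-sample a time-invariant perturbation E with max(E) = m whose
    pair (E, A22) satisfies the eigenvalue constraints (sketch; notation is a
    reading of the thesis, not taken verbatim from it)."""
    rng = np.random.default_rng() if rng is None else rng
    n = A22.shape[0]
    for _ in range(max_tries):
        E = rng.uniform(-m, m, size=(n, n))
        E *= m / np.abs(E).max()                 # activate the bound max(E) = m
        lam = np.linalg.eigvals(-np.linalg.solve(E, A22))
        if (np.all(np.abs(lam) * m < a)
                and np.all(np.abs(lam) > R0)
                and np.all(np.abs(np.angle(-lam)) < phi0)):
            return E
    raise RuntimeError("too many rejections; the parameter a may be too small")
```

The final `raise` is the infinite-loop guard mentioned above: with a chosen too small, no sample can pass the test |λ| m < a.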




Figure A.1: The trajectories of the entries of the perturbation E produced in example A.1.

A.2 Time-varying perturbations

In the time-varying case, it is not only desirable to have little slack in the constraint max(E) ≤ m, but also that there is small slack in the time-variability constraint. This makes sampling time-varying perturbations considerably harder than in the time-invariant case.

The word sample has two meanings in the current section,* and to remove some of the confusion we shall refer to the time-varying samples of E as realizations.

To obtain a computable test, the continuous-time eigenvalue constraint is relaxed by only requiring it to hold at a limited number of sampling instants. If a realization would produce unexpected results in examples, it is important to remember this relaxation and check the conditions more carefully before drawing any conclusions.

Our algorithm, presented in algorithm A.1 (page 251) and algorithm A.2, constructs trajectories for E (that is, realizations) by a sequence of steps. It works with time samples of the realization, and uses entry-wise linear interpolation between sampling instants. The algorithm is initialized with the trivial trajectory given by E(t) = ( m / max(M22(t)) ) M22(t) (which is assumed to fulfill the time-variability constraint), and maintains feasibility of the trajectory during the process of repeated perturbation of individual samples along the trajectory.

To simplify notation, we give the algorithm for the case when there is no slow dynamics, but the extension to also include the slow dynamics is straightforward. Following the algorithms, the chapter ends with an example.

* Note that sample is used with two meanings here. In the time-invariant setting, it was natural to think of E as a sample from a random variable. Extending this use of sample to the time-varying setting, we use it to refer to E as a realization of some stochastic process, and to produce such realizations is what the chapter is all about. The other meaning of sample is to evaluate functions of time at certain sampling instants in time.



Algorithm A.1 Sampling perturbation trajectories for ltv systems.

Input:

• An interval I of time for which the realization is to be computed.

• The trailing matrix (as a matrix-valued function of time) M22.

• The bound m and a bound on the time-variability. The following two types of time-variability constraints will be considered below:

∀t : ‖ (d/dt)( m E(t)⁻¹ A(t) ) ‖2 ≤ β1   (A.1a)

∀t : max(E′(t)) ≤ m β3   (A.1b)

Output: A continuous, piecewise linear trajectory E, which satisfies max(E) ≤ m and the time-variability constraint (at all times for (A.1b), or at a finite set of sampling instants for (A.1a)), and satisfies the sampling-relaxation of the eigenvalue constraint.

Initialization:

Select an initial number of sampling instants (not counting the initial time), and distribute the sampling instants evenly over I.

Initialize the trajectory as a linear interpolation of the function t ↦ ( m / max(M22(t)) ) M22(t), sampled at the sampling instants.

Main loop:

repeat 2 or 3 times
    Increase the number of sampling instants by a positive integer factor.
    Distribute the sampling instants evenly over I (denote the sampling interval t∆), and sample the current trajectory accordingly. This results in a sequence of matrices { ( ti, Ei ) }i.
    Perturb the sequence by performing a fixed number of minor iterations, see algorithm A.2.
    Reconstruct the continuous trajectory by linear interpolation.
end

Remarks: By keeping the initial number of sampling instants low, large but slow variations are obtained at a moderate computational cost. However, the time-variability is constrained by the sampling interval, which is why the number of sampling instants is increased in each iteration. It should be validated that the final number of sampling instants is sufficiently large to allow the time-variability constraint to be activated.

By multiplying the number of sampling instants by an integer in each major iteration, and distributing the sampling instants evenly over I, it is ensured that the up-sampled trajectory is identical to the trajectory at the end of the previous major iteration. This is important for maintaining feasibility of the trajectory.
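The up-sampling step can be sketched as follows (the function name and array layout are our own). Because the refined instants are again evenly spread and their count is an integer multiple of the old number of intervals, every old sampling instant is kept, so entry-wise linear interpolation reproduces the previous trajectory exactly.

```python
import numpy as np

def upsample(t, E_samples, factor):
    """Refine a piecewise-linear matrix trajectory by an integer factor;
    the old sampling instants are preserved (sketch, names hypothetical)."""
    E = np.asarray(E_samples)                  # shape (n, k, k)
    n = len(t)
    t_new = np.linspace(t[0], t[-1], factor * (n - 1) + 1)
    flat = E.reshape(n, -1)
    cols = [np.interp(t_new, t, col) for col in flat.T]   # entry-wise interpolation
    return t_new, np.stack(cols, axis=1).reshape(len(t_new), *E.shape[1:])
```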



Algorithm A.2 Details of algorithm A.1 — minor iterations.

Input/initialization: This algorithm is just a step in algorithm A.1.

Minor iteration: The number of minor iterations during one major iteration is typically in the same order of magnitude as the number of sampling instants, but may also be bounded by computational-time considerations. In each minor iteration, the following steps are taken to perturb the sequence of matrix samples.

Select which matrix sample to perturb at random (denoting the corresponding index i) and note the neighboring matrices (at the end points, there is just one neighbor).

if using (A.1b)
    Find the intervals of radius m β3 t∆ centered at each entry in the neighboring matrices, intersect the intervals entry-wise, and intersect also with [−m, m].
    Using M22(ti), draw a random sample from the derived intervals in the same manner as for time-invariant systems. That is, draw random samples uniformly from each interval until the eigenvalue conditions are satisfied.
    optional: Scale the obtained matrix so that it has at least one entry at the boundary of the corresponding interval.
else (That is, in case of (A.1a).)
    Using M22(ti), draw a random sample matrix in exactly the same manner as for time-invariant systems. Denote the resulting matrix E∗.
    Starting with 1, try successively smaller values of the scalar parameter a in Ei + a ( E∗ − Ei ) until the linear interpolation between the neighbors and the new matrix satisfies the time-variability constraint at a set of intermediate points. (Note that maintaining feasibility of the trajectory implies that a = 0 is a feasible, though pointless, choice.)
end
For each neighbor, indexed by j, select a small number of time instants evenly distributed between ti and tj (excluding those of ti and tj at which the eigenvalue conditions have already been checked), and use linear interpolation between the neighbor and the new matrix to check the eigenvalue condition at the intermediate time instants.

if any eigenvalue condition fails
    Discard the new matrix (and do not update the sequence of matrix samples).
else
    Replace Ei by the new matrix.
end
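For the (A.1b) branch, the entry-wise interval construction can be sketched as follows (the function name and the handling of end points are our own). By (A.1b) each entry may change by at most m β3 per unit time, which gives the radius m β3 t∆ used below.

```python
import numpy as np

def feasible_intervals(m, beta3, t_delta, neighbors):
    """Entry-wise intervals [lo, hi] for a new matrix sample: stay within
    radius m*beta3*t_delta of every neighboring sample (one or two of them),
    and within [-m, m] (sketch, names hypothetical)."""
    r = m * beta3 * t_delta
    lo = np.full_like(neighbors[0], -m)
    hi = np.full_like(neighbors[0], m)
    for N in neighbors:
        lo = np.maximum(lo, N - r)
        hi = np.minimum(hi, N + r)
    return lo, hi
```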



A.1 Example
As an illustration of the sampling algorithm for time-varying perturbations, we consider the trailing matrix given by

A(t) ≜ [  0.94    0.94 − 0.05 log(t + 1)
         −0.46    0.23                   ]

over the time interval I = [ 0, 3.5 ].

The constraints on the time-varying perturbation are given by the following parameters:

m = 0.01   β1 = 10.0   a = 10.0   φ0 = 1.4   R0 = 5.0

The initial trajectory was first sampled with a sampling interval of 0.3, and 30 minor iterations were performed. Then the trajectory was sampled with a sampling interval of 0.03, and 300 minor iterations were performed. The four entries of the final E are shown in figure A.1. The eigenvalue assumption is verified in figure A.2, and the time-variability assumption is verified in figure A.3. The figures show that while the constraints given by m and β1 are satisfied with little or even negative* slack, the other constraints have large slacks. Since the eigenvalues grow as m → 0, the slack in the lower bound given by R0 can always be made large by selecting m small, and hence the large slack in this constraint does not have to be considered a deficiency of the sampling algorithm.

Based on corresponding eigenvalue plots for 30 realizations of E, it was seen that the constraint given by φ0 also can obtain small slack at some point, even though the current algorithm has no component to increase the chance that this will happen. Regarding the upper bound on the eigenvalues given by m⁻¹a, the large slack seen in figure A.2 appears typical for the proposed algorithm; in all 30 realizations, the eigenvalue moduli were less than 400 at all times. In view of example 6.13, and considering that the time-variability constraint is locally independent of the pointwise-in-time constraints, this is expected to be a deficiency of the algorithm, and not due to the nature of the problem.

It would be a possible future extension of the perturbation sampling algorithm to add components which try to minimize the slack in the constraints given by φ0 and a.

* The algorithm may violate constraints since it only checks validity at a finite number of points. In the current example, careful inspection of the plots shows that the violation occurs for a part of the trajectory which was generated in the first iteration of algorithm A.1. The current implementation checks the constraints at two intermediate points between the sampling instants, meaning that the constraints are checked at points 0.1 apart in the first iteration. In the next iteration, where the time intervals are ten times shorter, the violation gets detected, and attempts to modify this part of the trajectory are very likely to be rejected.



Figure A.2: Verification of the eigenvalue assumptions in example A.1. The eigenvalues of ( E(t), A(t) ) have been sampled with a time interval of 0.01. The dashed rays show the angle constraint given by φ0. The arc near the origin is the lower bound on the eigenvalues given by R0, while the upper bound m⁻¹a = 1000 is outside the figure. All constraints are seen to be satisfied.

Figure A.3: Verification of the time-variability assumption in example A.1. The vertical axis shows ‖ (d/dt)( m E(t)⁻¹ A(t) ) ‖2. Time instants with sampling interval 0.01 are marked with dots. The horizontal marks are used to label the points where the assumption is checked during the first iteration of algorithm A.1. The dashed line shows the assumed bound β1, which is only violated between the points checked during the first iteration.
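The check behind figure A.3 can be sketched with finite differences (the function name is our own): evaluate m E(t)⁻¹ A(t) at the sampling instants and compare the spectral norm of consecutive difference quotients with the bound β1 of (A.1a).

```python
import numpy as np

def max_variability(ts, Es, As, m):
    """Largest spectral norm of the difference quotient of t -> m E(t)^(-1) A(t)
    between consecutive sampling instants (finite-difference sketch)."""
    Ms = [m * np.linalg.solve(E, A) for E, A in zip(Es, As)]
    return max(
        np.linalg.norm((M1 - M0) / (t1 - t0), 2)
        for (t0, M0), (t1, M1) in zip(zip(ts, Ms), zip(ts[1:], Ms[1:]))
    )
```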


Bibliography

Eyad H. Abed. Multiparameter singular perturbation problems: Iterative expansions and asymptotic stability. Systems & Control Letters, 5(4):279–282, February 1985a. Cited on page 71.

Eyad H. Abed. A new parameter estimate in singular perturbations. Systems & Control Letters, 6(3):153–222, August 1985b. Cited on page 69.

Eyad H. Abed. Decomposition and stability of multiparameter singular perturbation problems. IEEE Transactions on Automatic Control, AC-31(10):925–934, October 1986. Cited on page 71.

Eyad H. Abed and André L. Tits. On the stability of multiple time-scale systems. International Journal of Control, 44(1):211–218, 1986. Cited on pages 71 and 174.

Jeffrey M. Augenbaum and Charles S. Peskin. On the construction of the Voronoi diagram on a sphere. Journal of Computational Physics, 59(2):177–192, June 1985. Cited on page 115.

Erwin H. Bareiss. Sylvester's identity and multistep integer-preserving Gaussian elimination. Mathematics of Computation, 22(103):565–578, July 1968. Cited on page 86.

William H. Beyer, editor. CRC Handbook of mathematical sciences. CRC Press, Inc., 5th edition, 1978. Cited on page 121.

Niclas Bergman. Recursive Bayesian estimation — Navigation and tracking applications. PhD thesis, Linköping University, May 1999. Cited on page 113.

Stephen Boyd, Laurent El Ghaoui, Eric Feron, and Venkataramanan Balakrishnan. Linear matrix inequalities in system and control theory. SIAM Studies in Applied Mathematics, 1994. Cited on pages 58, 61, and 62.

P. C. Breedveld. Proposition for an unambiguous vector bond graph notation. Journal of Dynamic Systems, Measurement, and Control, 104(3):267–270, September 1982. Cited on page 24.



Kathryn Eleda Brenan, Stephen L. Campbell, and Linda Ruth Petzold. Numerical solution of initial-value problems in differential-algebraic equations. SIAM, 1996. Classics edition. Cited on pages 27, 47, 48, 49, 51, and 100.

Peter N. Brown, Alan C. Hindmarsh, and Linda Ruth Petzold. Using Krylov methods in the solution of large-scale differential-algebraic systems. SIAM Journal on Scientific Computing, 15(6):1467–1488, 1994. Cited on page 51.

Anders Brun, Carl-Fredrik Westin, Magnus Herberthson, and Hans Knutsson. Intrinsic and extrinsic means on the circle — a maximum likelihood interpretation. In IEEE Conference on Acoustics, Speech, and Signal Processing, volume 3, pages 1053–1056, Honolulu, HI, USA, April 2007. Cited on pages 110 and 119.

Dag Brück, Hilding Elmqvist, Hans Olsson, and Sven Erik Mattsson. Dymola for multi-engineering modeling and simulation. 2nd International Modelica Conference, Proceedings, pages 55-1–55-8, March 2002. Cited on page 34.

R. S. Bucy and K. D. Senne. Digital synthesis of nonlinear filters. Automatica, 7:287–298, 1971. Cited on page 114.

Benno Büeler, Andreas Enge, and Komei Fukuda. Polytopes: Combinatorics and computation, pages 131–154. Number 29 in DMV Seminar. Birkhäuser, 2000. Chapter title: Exact volume computation for polytopes: A practical study. Cited on page 122.

Stephen L. Campbell. Least squares completions for nonlinear differential algebraic equations. Numerische Mathematik, 65(1):77–94, December 1993. Cited on page 130.

Stephen L. Campbell and C. William Gear. The index of general nonlinear daes. Numerische Mathematik, 72:173–196, 1995. Cited on pages 29, 32, 33, and 106.

Lamberto Cesari. Asymptotic behavior and stability problems in ordinary differential equations. Springer-Verlag, third edition, 1971. Cited on page 66.

Kok Wah Chang. Remarks on a certain hypothesis in singular perturbations. Proceedings of the American Mathematical Society, 23(1):41–45, October 1969. Cited on pages 69, 214, 227, 236, and 243.

Kok Wah Chang. Singular perturbations of a general boundary value problem. SIAM Journal on Mathematical Analysis, 3(3):520–526, August 1972. Cited on pages 69, 211, and 227.

Alessandro Chiuso and Stefano Soatto. Monte Carlo filtering on Lie groups. In Proceedings of the 39th IEEE Conference on Decision and Control, pages 304–309, Sydney, Australia, December 2000. Cited on page 112.

Daniel Choukroun, Itzhack Y. Bar-Itzhack, and Yaakov Oshman. Novel quaternion Kalman filter. IEEE Transactions on Aerospace and Electronic Systems, 42(1):174–190, January 2006. Cited on page 112.


Timothy Y. Chow. The surprise examination or unexpected hanging paradox. American Mathematical Monthly, 105(1):41–51, January 1998. Cited on page 47.

Shantanu Chowdhry, Helmut Krendl, and Andreas A. Linninger. Symbolic numeric index analysis algorithm for differential algebraic equations. Industrial & Engineering Chemistry Research, 43(14):3886–3894, 2004. Cited on pages 47 and 91.

Earl A. Coddington and Norman Levinson. Theory of ordinary differential equations. Robert E. Krieger Publishing Company, Inc., third edition, 1985. Cited on pages 66 and 143.

Cyril Coumarbatch and Zoran Gajic. Exact decomposition of the algebraic Riccati equation of deterministic multimodeling optimal control problems. IEEE Transactions on Automatic Control, 45(4):790–794, April 2000. Cited on page 71.

John L. Crassidis, F. Landis Markley, and Yang Cheng. Survey of nonlinear attitude estimation methods. Journal of Guidance, Control, and Dynamics, 30(1):12–28, January 2007. Cited on pages 110 and 113.

Fred Daum. Nonlinear filters: Beyond the Kalman filter. IEEE Aerospace and Electronic Systems Magazine, 20(8:2):57–69, 2005. Cited on page 112.

Aleksei Fedorovich Filippov. Differential equations with discontinuous righthand sides. Mathematics and its applications. Kluwer Academic Publishers, 1985. Cited on pages 61 and 62.

Theodore Frankel, editor. The geometry of physics — an introduction. Cambridge University Press, 2nd edition, 2004. Cited on page 111.

Peter Fritzson, Peter Aronsson, Adrian Pop, David Akhvlediani, Bernhard Bachmann, David Broman, Anders Fernström, Daniel Hedberg, Elmin Jagudin, Håkan Lundvall, Kaj Nyström, Andreas Remar, and Anders Sandholm. OpenModelica system documentation — preliminary draft, 2006-12-14, for OpenModelica 1.4.3 beta. Technical report, Programming Environment Laboratory — PELAB, Department of Computer and Information Science, Linköping University, Sweden, 2006a. Cited on page 34.

Peter Fritzson, Peter Aronsson, Adrian Pop, David Akhvlediani, Bernhard Bachmann, David Broman, Anders Fernström, Daniel Hedberg, Elmin Jagudin, Håkan Lundvall, Kaj Nyström, Andreas Remar, and Anders Sandholm. OpenModelica users guide — preliminary draft, 2006-09-28, for OpenModelica 1.4.2. Technical report, Programming Environment Laboratory — PELAB, Department of Computer and Information Science, Linköping University, Sweden, 2006b. Cited on page 34.

Komei Fukuda. cddlib reference manual, version 0.94. Institute for Operations Research and Institute of Theoretical Computer Science, ETH Zentrum, CH-8092 Zurich, Switzerland, 2008. URL http://www.ifor.math.ethz.ch/~fukuda/cdd_home/cdd.html. Cited on page 122.

Markus Gerdin. Identification and estimation for models described by differential-algebraic equations. PhD thesis, Linköping University, 2006. Cited on page 23.


Developers of GMP. The GNU multiple precision arithmetic library, version 4.3.1. Free Software Foundation, 2009. URL http://gmplib.org/. Cited on page 7.

Sergei Konstantinovich Godunov. Ordinary differential equations with constant coefficient, volume 169 of Translations of mathematical monographs. American Mathematical Society, 1997. Cited on page 55.

Gene H. Golub and Charles F. Van Loan. Matrix computations. The Johns Hopkins University Press, third edition, 1996. Cited on pages 22 and 100.

P. Gurfil and M. Jodorkovsky. Unified initial condition response analysis of Lur'e systems and linear time-invariant systems. International Journal of Systems Science, 34(1):49–62, 2003. Cited on page 53.

Ernst Hairer and Gerhard Wanner. Solving ordinary differential equations II — Stiff and differential-algebraic problems, volume 14. Springer-Verlag, 1991. Cited on page 51.

Ernst Hairer, Christian Lubich, and Michel Roche. The numerical solution of differential-algebraic systems by Runge-Kutta methods. Lecture Notes in Mathematics, 1409, 1989. Cited on page 33.

Michiel Hazewinkel, editor. Encyclopedia of mathematics, volume 8. Kluwer Academic Publishers, 1992. URL http://eom.springer.de/. Cited on page 126.

Nicholas J. Higham. A survey of condition number estimation for triangular matrices. SIAM Review, 29(4):575–596, December 1987. Cited on page 82.

Nicholas J. Higham, D. Steven Mackey, and Françoise Tisseur. The conditioning of linearizations of matrix polynomials. SIAM Journal on Matrix Analysis and Applications, 28(4):1005–1028, 2006. Cited on pages 40 and 72.

Inmaculada Higueras and Roswitha März. Differential algebraic equations with properly stated leading terms. Computers & Mathematics with Applications, 28(1–2):215–235, 2004. Cited on page 38.

Alan C. Hindmarsh, Radu Serban, and Aaron Collier. User documentation for IDA v2.4.0. Technical report, Center for Applied Scientific Computing, Lawrence Livermore National Laboratory, 2004. Cited on pages 51 and 98.

Alan C. Hindmarsh, Peter N. Brown, Keith E. Grant, Steven L. Lee, Radu Serban, Dan E. Shumaker, and Carol S. Woodward. SUNDIALS: Suite of nonlinear and differential/algebraic equation solvers. ACM Transactions on Mathematical Software, 31(3):363–396, 2005. Cited on page 51.

Lars Hörmander. An introduction to complex analysis in several variables. The University Series in Higher Mathematics. D. Van Nostrand, Princeton, New Jersey, 1966. Cited on pages 143 and 155.

M. E. Ingrim and G. Y. Masada. The extended bond graph notation. Journal of Dynamic Systems, Measurement, and Control, 113(1):113–117, March 1991. Cited on page 24.


Luc Jaulin, Michel Kieffer, Olivier Didrit, and Éric Walter. Applied interval analysis. Springer-Verlag, London, 2001. Cited on pages 77 and 195.

Luc Jaulin, Isabelle Braems, and Eric Walter. Interval methods for nonlinear identification and robust control. In Proceedings of the 41st IEEE Conference on Decision and Control, pages 4676–4681, Las Vegas, NV, USA, December 2002. Cited on page 77.

Andrew H. Jazwinski. Stochastic processes and filtering theory, volume 64 of Mathematics in science and engineering. Academic Press, New York and London, 1970. Cited on page 17.

Ulf T. Jönsson. On reachability analysis of uncertain hybrid systems. In Proceedings of the 41st IEEE Conference on Decision and Control, pages 2397–2402, Las Vegas, NV, USA, December 2002. Cited on page 62.

Thomas Kailath. Linear Systems. Prentice-Hall, Inc., 1980. Cited on page 13.

M. Kathirkamanayagan and G. S. Ladde. Diagonalization and stability of large-scale singularly perturbed linear systems. Journal of Mathematical Analysis and Applications, 135(1):38–60, October 1988. Cited on page 71.

R. Baker Kearfott. Interval computations: Introduction, uses, and resources. Euromath Bulletin, 1(2):95–112, 1996. Cited on page 77.

Hassan K. Khalil. Asymptotic stability of nonlinear multiparameter singularly perturbed systems. Automatica, 17(6):797–804, November 1981. Cited on page 70.

Hassan K. Khalil. Time scale decomposition of linear implicit singularly perturbed systems. IEEE Transactions on Automatic Control, AC-29(11):1054–1056, November 1984. Cited on page 69.

Hassan K. Khalil. Stability of nonlinear multiparameter singularly perturbed systems. IEEE Transactions on Automatic Control, AC-32(3):260–263, March 1987. Cited on page 71.

Hassan K. Khalil. Nonlinear systems. Prentice Hall, Inc., third edition, 2002. Cited on pages 52, 61, and 66.

Hassan K. Khalil and Peter V. Kokotović. D-stability and multi-parameter singular perturbation. SIAM Journal on Control and Optimization, 17(1):56–65, 1979. Cited on pages 70 and 71.

K. Khorasani and M. A. Pai. Asymptotic stability improvements of multiparameter nonlinear singularly perturbed systems. IEEE Transactions on Automatic Control, AC-30(8):802–804, 1985. Cited on page 70.

Petar V. Kokotović. A Riccati equation for block-diagonalization of ill-conditioned systems. IEEE Transactions on Automatic Control, 20(6):812–814, December 1975. Cited on page 211.


Petar V. Kokotović, Hassan K. Khalil, and John O’Reilly. Singular perturbation methods in control: Analysis and applications. Academic Press Inc., 1986. Cited on pages 2, 57, 67, 68, 149, 168, 227, 228, 230, 233, 235, 236, 239, and 243.

Steven G. Krantz and Harold R. Parks. A primer of real analytic functions. Birkhäuser, Boston, second edition, 2002. Cited on page 155.

Peter Kunkel and Volker Mehrmann. Canonical forms for linear differential-algebraic equations with variable coefficients. Journal of Computational and Applied Mathematics, 56(3):225–251, 1994. Cited on page 34.

Peter Kunkel and Volker Mehrmann. A new class of discretization methods for the solution of linear differential-algebraic equations with variable coefficients. SIAM Journal on Numerical Analysis, 33(5):1941–1961, October 1996. Cited on pages 130 and 131.

Peter Kunkel and Volker Mehrmann. Regular solutions of nonlinear differential-algebraic equations. Numerische Mathematik, 79(4):581–600, June 1998. Cited on page 131.

Peter Kunkel and Volker Mehrmann. Index reduction for differential-algebraic equations by minimal extension. ZAMM — Journal of Applied Mathematics and Mechanics, 84(9):579–597, 2004. Cited on page 34.

Peter Kunkel and Volker Mehrmann. Differential-algebraic equations — Analysis and numerical solution. European Mathematical Society, 2006. Cited on pages 34, 40, 51, 72, 129, 132, 134, 138, 141, 142, and 147.

Peter Kunkel, Volker Mehrmann, Werner Rath, and Jörg Weickert. GELDA: A software package for the solution of general linear differential algebraic equations, 1995. Cited on page 51.

Peter Kunkel, Volker Mehrmann, and Werner Rath. Analysis and numerical solution of control problems in descriptor form. Mathematics of Control, Signals, and Systems, 14(1):29–61, 2001. Cited on page 34.

Alexander B. Kurzhanski and István Vályi. Ellipsoidal calculus for estimation and control. Birkhäuser, 1997. Cited on page 62.

Junghyun Kwon, Minseok Choi, F. C. Park, and Changmook Chun. Particle filtering on the Euclidean group: Framework and applications. Robotica, 25:725–737, 2007. Cited on pages 110 and 112.

Wook Hyun Kwon, Young Soo Moon, and Sang Chul Ahn. Bounds in algebraic Riccati and Lyapunov equations: A survey and some new results. International Journal of Control, 64(3):377–389, June 1996. Cited on pages 55 and 58.

G. S. Ladde and S. G. Rajalakshmi. Diagonalization and stability of multi-time-scale singularly perturbed linear systems. Applied Mathematics and Computation, 16(2):115–140, February 1985. Cited on page 71.


G. S. Ladde and S. G. Rajalakshmi. Singular perturbations of linear systems with multiparameters and multiple time scales. Journal of Mathematical Analysis and Applications, 129(2):457–481, February 1988. Cited on page 71.

G. S. Ladde and O. Sirisaengtaksin. Large-scale stochastic singularly perturbed systems. Mathematics and Computers in Simulation, 31(1–2):31–40, February 1989. Cited on page 69.

G. S. Ladde and D. D. Siljak. Multiparameter singular perturbations of linear systems with multiple time scales. Automatica, 19(4):385–394, July 1983. Cited on page 71.

Jehee Lee and Sung Yong Shin. General construction of time-domain filters for orientation data. IEEE Transactions on Visualization and Computer Graphics, 8(2):119–128, April 2002. Cited on page 110.

Taeyoung Lee, Melvin Leok, and Harris McClamroch. Global symplectic uncertainty propagation on SO(3). In Proceedings of the 47th IEEE Conference on Decision and Control, pages 61–66, Cancun, Mexico, December 2008. Cited on page 112.

Ben Leimkuhler, Linda Ruth Petzold, and C. William Gear. Approximation methods for the consistent initialization of differential-algebraic equations. SIAM Journal on Numerical Analysis, 28(1):204–226, February 1991. Cited on page 47.

Adrien Leitold and Katalin M. Hangos. Structural solvability analysis of dynamic process models. Computers and Chemical Engineering, 25(11–12):1633–1646, 2001. Cited on page 47.

Frans Lemeire. Bounds for condition numbers of triangular and trapezoid matrices. BIT Numerical Mathematics, 15(1):58–64, March 1975. Cited on page 83.

C.-W. Li and Y.-K. Feng. Functional reproducibility of general multivariable analytic nonlinear systems. International Journal of Control, 45(1):255–268, 1987. Cited on page 38.

Lennart Ljung. System identification: Theory for the user. Prentice-Hall, Inc., 1999. Cited on page 18.

James Ting-Ho Lo and Linda R. Eshleman. Exponential Fourier densities on SO(3) and optimal estimation and detection for rotational processes. SIAM Journal on Applied Mathematics, 36(1):73–82, February 1979. Cited on pages 110 and 113.

David G. Luenberger. Time-invariant descriptor systems. Automatica, 14(5):473–480, 1978. Cited on page 29.

Morris Marden. Geometry of polynomials. American Mathematical Society, second edition, 1966. Cited on page 82.

R. M. M. Mattheij and P. M. E. J. Wijckmans. Sensitivity of solutions of linear dae to perturbations of the system matrices. Numerical Algorithms, 19(1–4):159–171, 1998. Cited on page 72.


Sven Erik Mattsson and Gustaf Söderlind. Index reduction in differential-algebraic equations using dummy derivatives. SIAM Journal on Scientific Computing, 14(3):677–692, May 1993. Cited on pages 34 and 95.

Sven Erik Mattsson, Hans Olsson, and Hilding Elmqvist. Dynamic selection of states in Dymola. Modelica Workshop, pages 61–67, October 2000. Cited on page 34.

Volker Mehrmann and Chunchao Shi. Transformation of high order linear differential-algebraic systems to first order. Numerical Algorithms, 42(3–4):281–307, July 2006. Cited on page 36.

Roswitha März and Ricardo Riaza. Linear differential-algebraic equations with properly stated leading term: Regular points. Journal of Mathematical Analysis and Applications, 323(2):1279–1299, December 2006. Cited on page 38.

Roswitha März and Ricardo Riaza. Linear differential-algebraic equations with properly stated leading term: A-critical points. Mathematical and Computer Modelling of Dynamical Systems, 13(3):291–314, 2007. Cited on pages 38 and 72.

Roswitha März and Ricardo Riaza. Linear differential algebraic equations with properly stated leading terms: B-critical points. Dynamical Systems: An International Journal, 23(4):505–522, 2008. Cited on page 38.

Hyeon-Suk Na, Chung-Nim Lee, and Otfried Cheong. Voronoi diagrams on the sphere. Computational Geometry, 23:183–194, 2002. Cited on page 115.

D. Subbaram Naidu. Singular perturbations and time scales in control theory and applications: An overview. Dynamics of Continuous, Discrete and Impulsive Systems, 9(2):233–278, 2002. Cited on pages 67 and 232.

Arnold Neumaier. Overestimation in linear interval equations. SIAM Journal on Numerical Analysis, 24:207–214, 1987. Cited on page 78.

Arnold Neumaier. Interval methods for systems of equations. Cambridge University Press, 1990. Cited on page 79.

Constantinos Pantelides. The consistent initialization of differential-algebraic systems. SIAM Journal on Scientific and Statistical Computing, 9(2):213–231, March 1988. Cited on pages 34 and 47.

Xavier Pennec. Intrinsic statistics on Riemannian manifolds: Basic tools for geometric measurements. Journal of Mathematical Imaging and Vision, 25(1):127–154, July 2006. Cited on page 120.

Linda Ruth Petzold. Order results for Runge-Kutta methods applied to differential/algebraic systems. SIAM Journal on Numerical Analysis, 23(4):837–852, 1986. Cited on page 50.

P. J. Rabier and W. C. Rheinboldt. A geometric treatment of implicit differential-algebraic equations. Journal of Differential Equations, 109(1):110–146, April 1994. Cited on page 32.


S. Reich. On an existence and uniqueness theory for non-linear differential-algebraic equations. Circuits, Systems, and Signal Processing, 10(3):344–359, 1991. Cited on page 32.

Gregory J. Reid, Ping Lin, and Allan D. Wittkopf. Differential elimination-completion algorithms for dae and pdae. Studies in Applied Mathematics, 106:1–45, 2001. Cited on page 32.

Gregory J. Reid, Chris Smith, and Jan Verschelde. Geometric completion of differential systems using numeric-symbolic continuation. ACM SIGSAM Bulletin, 36(2):1–17, June 2002. Cited on page 91.

Gregory J. Reid, Jan Verschelde, Allan Wittkopf, and Wenyuan Wu. Symbolic-numeric completion of differential systems by homotopy continuation. Proceedings of the 2005 International Symposium on Symbolic and Algebraic Computation, pages 269–276, 2005. Cited on page 91.

Michel Roche. Implicit Runge-Kutta methods for differential algebraic equations. SIAM Journal on Numerical Analysis, 26(4):963–975, 1989. Cited on page 50.

Ronald C. Rosenberg and Dean C. Karnopp. Introduction to physical system dynamics. McGraw-Hill Book Company, 1983. Cited on page 24.

P. Rouchon, M. Fliess, and J. Lévine. Kronecker’s canonical forms for nonlinear implicit differential systems. In Proceedings of the 2nd IFAC Workshop on Systems Structure and Control, pages 248–251, Prague, Czech Republic, September 1992. Cited on pages 29, 38, 39, 88, and 92.

Walter Rudin. Principles of mathematical analysis. McGraw-Hill, third edition, 1976. Cited on page 73.

Wilson J. Rugh. Linear system theory. Prentice-Hall, Inc., second edition, 1996. Cited on pages 52, 57, 61, 64, 65, 66, and 240.

A. Saberi and Hassan K. Khalil. Quadratic-type Lyapunov functions for singularly perturbed systems. IEEE Transactions on Automatic Control, AC-29(6):542–550, June 1984. Cited on page 71.

Helmut Schaeben. “Normal” orientation distributions. Texture, Stress, and Microstructure, 19(4):197–202, 1992. The journal changed name in 2008 from Textures and Microstructures. Cited on page 122.

Eliezer Y. Shapiro. On the Lyapunov matrix equation. IEEE Transactions on Automatic Control, 19(5):594–596, October 1974. Cited on page 58.

Johan Sjöberg. Optimal control and model reduction of nonlinear dae models. PhD thesis, Linköping University, 2008. Cited on page 23.

Sigurd Skogestad and Ian Postlethwaite. Multivariable feedback control. John Wiley & Sons, 1996. Cited on page 21.


Anuj Srivastava and Eric Klassen. Monte Carlo extrinsic estimators of manifold-valued parameters. IEEE Transactions on Signal Processing, 50(2):299–308, February 2002. Cited on pages 119 and 120.

Andreas Steinbrecher. Numerical solution of quasi-linear differential-algebraic equations and industrial simulation of multibody systems. PhD thesis, Technische Universität Berlin, 2006. Cited on pages 29, 86, 92, and 242.

G. W. Stewart and Ji-guang Sun. Matrix perturbation theory. Computer Science and Scientific Computing. Academic Press, 1990. Cited on pages 40, 42, 43, and 200.

Torsten Ström. On logarithmic norms. SIAM Journal on Numerical Analysis, 12(5):741–753, 1975. Cited on pages 55 and 56.

Tatjana Stykel. Gramian based model reduction for descriptor systems. Mathematics of Control, Signals, and Systems, 16(4):297–319, 2004. Cited on page 21.

N. Sukumar. Voronoi cell finite difference method for the diffusion operator on arbitrary unstructured grids. International Journal for Numerical Methods in Engineering, 57(1):1–34, May 2003. Cited on page 115.

Andrzej Szatkowski. Generalized dynamical systems: Differentiable dynamic complexes and differential dynamic systems. International Journal of Systems Science, 21(8):1631–1657, August 1990. Cited on page 32.

Andrzej Szatkowski. Geometric characterization of singular differential algebraic equations. International Journal of Systems Science, 23(2):167–186, February 1992. Cited on page 32.

G. Thomas. Symbolic computation of the index of quasilinear differential-algebraic equations. Proceedings of the 1996 International Symposium on Symbolic and Algebraic Computation, pages 196–203, 1996. Cited on pages 32 and 39.

Henrik Tidefelt. Structural algorithms and perturbations in differential-algebraic equations. Technical Report Licentiate thesis No 1318, Division of Automatic Control, Linköping University, 2007. Cited on pages 9, 10, 85, 91, 149, and 158.

Henrik Tidefelt and Torkel Glad. Index reduction of index 1 dae under uncertainty. In Proceedings of the 17th IFAC World Congress, pages 5053–5058, Seoul, Korea, July 2008. Cited on pages 54, 149, 151, 152, and 157.

Henrik Tidefelt and Torkel Glad. On the well-posedness of numerical dae. In Proceedings of the European Control Conference 2009, pages 826–831, Budapest, Hungary, August 2009. Cited on page 157.

Henrik Tidefelt and Thomas B. Schön. Robust point-mass filters on manifolds. In Proceedings of the 15th IFAC Symposium on System Identification, pages 540–545, Saint-Malo, France, July 2009. Not cited.

David Törnqvist, Thomas B. Schön, Rickard Karlsson, and Fredrik Gustafsson. Particle filter SLAM with high dimensional vehicle model. Journal of Intelligent and Robotic Systems, 55(4–5):249–266, August 2009. Cited on page 110.


J. Unger, A. Kröner, and W. Marquardt. Structural analysis of differential-algebraic equation systems — theory and applications. Computers and Chemical Engineering, 19(8):867–882, 1995. Cited on pages 24, 47, and 91.

Charles F. Van Loan. The sensitivity of the matrix exponential. SIAM Journal on Numerical Analysis, 14(6):971–981, December 1977. Cited on pages 53, 54, 59, and 60.

R. C. Vieira and E. C. Biscaia Jr. An overview of initialization approaches for differential-algebraic equations. Latin American Applied Research, 30(4):303–313, 2000. Cited on page 47.

R. C. Vieira and E. C. Biscaia Jr. Direct methods for consistent initialization of dae systems. Computers and Chemical Engineering, 25(9–10):1299–1311, September 2001. Cited on page 47.

Krešimir Veselić. Bounds for exponentially stable semigroups. Linear Algebra and its Applications, 358:309–333, 2003. Cited on pages 53 and 56.

Josselin Visconti. Numerical solution of differential algebraic equations, global error estimation and symbolic index reduction. PhD thesis, Institut d’Informatique et Mathématiques Appliquées de Grenoble, November 1999. Cited on pages 29 and 86.

Wolfram Research, Inc. Mathematica. Wolfram Research, Inc., Champaign, Illinois, 2008. Version 7.0.0. Cited on page 23.


Index

abstraction barrier, 110
algebraic constraints, 46
algebraic equation, 12
algebraic term, 12
autonomous, 12, 13, 118

backward difference formulas, see bdf, method
balanced form, 21
Bayes’ rule, 112
    operator, 117
bdf
    abbreviation, xvii
    method, 47, 134
bond graph, 24
boundary layer, 68

Chapman-Kolmogorov equation, 112, 117
component-based model, 24
constraint propagation, 79
contraction
    mapping, 73, 182, 214, 218, 235, 238
    principle, 73
contravariant vector, 11, 111
convolution, 118
coordinate map, 121, 133
covariant vector, 11

D-stability, 70
dae, 22
    abbreviation, xvii
    quasilinear, 23
daspk, 51
dassl, 51
decoupling transform
    lti index 1, 177
    lti index 2, 210
    ltv index 1, 232
departure from normality, 54
derivative array, 31, 130
    equations, 31, 131, 134, 138
differential inclusion, 61
differential-algebraic equation, see dae
differentiation index, 29, 31, 32, 35, 43, 130, 137
drift, 17, 50
dummy derivatives, 34

eigenvalues of matrix pair, 42
elimination-differentiation, 29
embedding, 110, 124, 127, 128
    “natural”, 119, 121
Euclidean space, 122
    notation, xv
exponential map, 111
    notation, xvi
extrinsic mean, 120

Fokker-Planck equation, 112
forcing function, 12, 13
form
    balanced, 21
    implicit ode, 23
    quasilinear, 23
    state space, 19
fraction-free, 80
fraction-producing, 81
fundamental matrix, 52
    synonym, see transition matrix

Gaussian distribution, 112
    notation, xvi
gelda, 51
genda, 51
geodesic, 111

Hankel norm, 21

ida, 51
ill-posed
    in quasilinear shuffle algorithm, 92
    in shuffle algorithm, 30
    in structure algorithm, 39
    initial value problem, 46
    uncertain dae, 151
implicit ode, 23
implicit Runge-Kutta methods, see irk
index, 28
    (unqualified), 35
    differentiation, see differentiation index
    nominal, see nominal index
    perturbation, see perturbation index
    pointwise, see pointwise, index
    simplified strangeness, see simplified strangeness index
    strangeness, see strangeness index
index reduction, 3, 28, 34, 51, 85, 109
    seminumerical, 99, 105
inevitable pendulum, 95
initial value problem, 14
    consistent initial conditions, 47
    ill-posed, see ill-posed initial value problem
inner approximation, 78
input (to differential equation), 13
input matrix, 13
interval
    matrix, 78
    real, 78
    vector, 78
intrinsic mean, 119
irk, 50, 51
    abbreviation, xvii
iteration matrix, 49

leading matrix
    of (quasi)linear dae, 12
    of matrix pair, 42
Lie group, 112, 113
local coordinates, 110, 133
longevity, 96
lti
    abbreviation, xvii
    autonomous dae, 12, 25
    autonomous ode, 13
    dae, 12, 25
    ode, 13
ltv
    abbreviation, xvii
    autonomous dae, 13, 26
    autonomous ode, 13
    dae, 13, 26
    ode, 13
Lyapunov equation, 52
Lyapunov function, 52
    candidate, 52, 231
Lyapunov transformation, 57, 240

manifold, 111
Mathematica, 11, 23n, 51, 79, 135, 177
matrix pair, 41, 186
matrix pencil, 40
matrix-valued singular perturbation, 2, 71, 151, 196, 227
matrix-valued uncertainty, 152
measurement update, 111, 116
meshless, 115
model, 15
    component-based, 24
    residualized, 20
    truncated, 20
model class, 18
model reduction, 7, 19
model structure, 18
multiparameter singular perturbation, 69
multiple time scale singular perturbation, 71

nominal index, 35
non-differential equation, 12
normal, 7, 54
    departure from, see departure from normality

ode
    abbreviation, xvii
    autonomous, 13n
    implicit, 23
    time-invariant, 13
one-full, 130, 135
outer approximation, 78

pair, matrix, 41
Pantelides’ algorithm, 34
particle filter, 110
pencil, matrix, 40
perturbation
    regular, see regular perturbation
    singular, see singular perturbation
perturbation index, 33
point
    matrix, 78
    vector, 78
point estimate, 111, 119
point-mass distribution, 114
point-mass filter, 111, 114
pointwise
    index, 35
    non-singular, 78
properly stated leading term, 38

quasilinear form, 5, 12, 23
quasilinear shuffle algorithm, 5, 29, 86, 92

radau5, 51
reduced equation, 14, 98, 131
regular
    lti dae, 41
    matrix pair, 42
    matrix pencil, 40
    uncertain matrix, 42, 78
    uncertain matrix pair, 43
regular perturbation, 58
residualization, 20
residualized model, 20
right hand side
    of ode, 13
    of quasilinear dae, 12

scalar singular perturbation, 67
shuffle algorithm, 29, 30
    quasilinear, see quasilinear shuffle algorithm
simplified strangeness index, 134
singular
    lti dae, 41
    matrix pair, 42
    matrix pencil, 40
    uncertain matrix, 78
    uncertain matrix pair, 43
singular perturbation, 67
    matrix-valued, see matrix-valued singular perturbation
    multiparameter, see multiparameter singular perturbation
    multiple time scale, see multiple time scale singular perturbation
    scalar, see scalar singular perturbation
singular perturbation approximation, 21
square (dae), 14
state (vector), 14
state feedback matrix, 13
state space model, 113n
strangeness index, 34, 72, 131
structural zero, 6, 81n, 94
structure algorithm, 29, 38, 39

tangent space, 111, 122
tessellation, 111, 114, 120
time update, 111, 117, 118
trailing matrix
    of linear dae, 12
    of matrix pair, 42
transition matrix, 52, 118n, 205, 235
truncated model, 20
truncation, 20

uniformly bounded-input, bounded-output stable, 60
uniformly exponentially stable, 64
unstructured perturbation, 152

variable, 14
Voronoi diagram, 115, 121


PhD Dissertations
Division of Automatic Control

Linköping University

M. Millnert: Identification and control of systems subject to abrupt changes. Thesis No. 82, 1982. ISBN 91-7372-542-0.
A. J. M. van Overbeek: On-line structure selection for the identification of multivariable systems. Thesis No. 86, 1982. ISBN 91-7372-586-2.
B. Bengtsson: On some control problems for queues. Thesis No. 87, 1982. ISBN 91-7372-593-5.
S. Ljung: Fast algorithms for integral equations and least squares identification problems. Thesis No. 93, 1983. ISBN 91-7372-641-9.
H. Jonson: A Newton method for solving non-linear optimal control problems with general constraints. Thesis No. 104, 1983. ISBN 91-7372-718-0.
E. Trulsson: Adaptive control based on explicit criterion minimization. Thesis No. 106, 1983. ISBN 91-7372-728-8.
K. Nordström: Uncertainty, robustness and sensitivity reduction in the design of single input control systems. Thesis No. 162, 1987. ISBN 91-7870-170-8.
B. Wahlberg: On the identification and approximation of linear systems. Thesis No. 163, 1987. ISBN 91-7870-175-9.
S. Gunnarsson: Frequency domain aspects of modeling and control in adaptive systems. Thesis No. 194, 1988. ISBN 91-7870-380-8.
A. Isaksson: On system identification in one and two dimensions with signal processing applications. Thesis No. 196, 1988. ISBN 91-7870-383-2.
M. Viberg: Subspace fitting concepts in sensor array processing. Thesis No. 217, 1989. ISBN 91-7870-529-0.
K. Forsman: Constructive commutative algebra in nonlinear control theory. Thesis No. 261, 1991. ISBN 91-7870-827-3.
F. Gustafsson: Estimation of discrete parameters in linear systems. Thesis No. 271, 1992. ISBN 91-7870-876-1.
P. Nagy: Tools for knowledge-based signal processing with applications to system identification. Thesis No. 280, 1992. ISBN 91-7870-962-8.
T. Svensson: Mathematical tools and software for analysis and design of nonlinear control systems. Thesis No. 285, 1992. ISBN 91-7870-989-X.
S. Andersson: On dimension reduction in sensor array signal processing. Thesis No. 290, 1992. ISBN 91-7871-015-4.
H. Hjalmarsson: Aspects on incomplete modeling in system identification. Thesis No. 298, 1993. ISBN 91-7871-070-7.
I. Klein: Automatic synthesis of sequential control schemes. Thesis No. 305, 1993. ISBN 91-7871-090-1.
J.-E. Strömberg: A mode switching modelling philosophy. Thesis No. 353, 1994. ISBN 91-7871-430-3.
K. Wang Chen: Transformation and symbolic calculations in filtering and control. Thesis No. 361, 1994. ISBN 91-7871-467-2.
T. McKelvey: Identification of state-space models from time and frequency data. Thesis No. 380, 1995. ISBN 91-7871-531-8.
J. Sjöberg: Non-linear system identification with neural networks. Thesis No. 381, 1995. ISBN 91-7871-534-2.
R. Germundsson: Symbolic systems – theory, computation and applications. Thesis No. 389, 1995. ISBN 91-7871-578-4.
P. Pucar: Modeling and segmentation using multiple models. Thesis No. 405, 1995. ISBN 91-7871-627-6.


H. Fortell: Algebraic approaches to normal forms and zero dynamics. Thesis No. 407, 1995. ISBN 91-7871-629-2.
A. Helmersson: Methods for robust gain scheduling. Thesis No. 406, 1995. ISBN 91-7871-628-4.
P. Lindskog: Methods, algorithms and tools for system identification based on prior knowledge. Thesis No. 436, 1996. ISBN 91-7871-424-8.
J. Gunnarsson: Symbolic methods and tools for discrete event dynamic systems. Thesis No. 477, 1997. ISBN 91-7871-917-8.
M. Jirstrand: Constructive methods for inequality constraints in control. Thesis No. 527, 1998. ISBN 91-7219-187-2.
U. Forssell: Closed-loop identification: Methods, theory, and applications. Thesis No. 566, 1999. ISBN 91-7219-432-4.
A. Stenman: Model on demand: Algorithms, analysis and applications. Thesis No. 571, 1999. ISBN 91-7219-450-2.
N. Bergman: Recursive Bayesian estimation: Navigation and tracking applications. Thesis No. 579, 1999. ISBN 91-7219-473-1.
K. Edström: Switched bond graphs: Simulation and analysis. Thesis No. 586, 1999. ISBN 91-7219-493-6.
M. Larsson: Behavioral and structural model based approaches to discrete diagnosis. Thesis No. 608, 1999. ISBN 91-7219-615-5.
F. Gunnarsson: Power control in cellular radio systems: Analysis, design and estimation. Thesis No. 623, 2000. ISBN 91-7219-689-0.
V. Einarsson: Model checking methods for mode switching systems. Thesis No. 652, 2000. ISBN 91-7219-836-2.
M. Norrlöf: Iterative learning control: Analysis, design, and experiments. Thesis No. 653, 2000. ISBN 91-7219-837-0.
F. Tjärnström: Variance expressions and model reduction in system identification. Thesis No. 730, 2002. ISBN 91-7373-253-2.
J. Löfberg: Minimax approaches to robust model predictive control. Thesis No. 812, 2003. ISBN 91-7373-622-8.
J. Roll: Local and piecewise affine approaches to system identification. Thesis No. 802, 2003. ISBN 91-7373-608-2.
J. Elbornsson: Analysis, estimation and compensation of mismatch effects in A/D converters. Thesis No. 811, 2003. ISBN 91-7373-621-X.
O. Härkegård: Backstepping and control allocation with applications to flight control. Thesis No. 820, 2003. ISBN 91-7373-647-3.
R. Wallin: Optimization algorithms for system analysis and identification. Thesis No. 919, 2004. ISBN 91-85297-19-4.
D. Lindgren: Projection methods for classification and identification. Thesis No. 915, 2005. ISBN 91-85297-06-2.
R. Karlsson: Particle Filtering for Positioning and Tracking Applications. Thesis No. 924, 2005. ISBN 91-85297-34-8.
J. Jansson: Collision Avoidance Theory with Applications to Automotive Collision Mitigation. Thesis No. 950, 2005. ISBN 91-85299-45-6.
E. Geijer Lundin: Uplink Load in CDMA Cellular Radio Systems. Thesis No. 977, 2005. ISBN 91-85457-49-3.
M. Enqvist: Linear Models of Nonlinear Systems. Thesis No. 985, 2005. ISBN 91-85457-64-7.
T. B. Schön: Estimation of Nonlinear Dynamic Systems — Theory and Applications. Thesis No. 998, 2006. ISBN 91-85497-03-7.
I. Lind: Regressor and Structure Selection — Uses of ANOVA in System Identification. Thesis No. 1012, 2006. ISBN 91-85523-98-4.


J. Gillberg: Frequency Domain Identification of Continuous-Time Systems: Reconstruction and Robustness. Thesis No. 1031, 2006. ISBN 91-85523-34-8.
M. Gerdin: Identification and Estimation for Models Described by Differential-Algebraic Equations. Thesis No. 1046, 2006. ISBN 91-85643-87-4.
C. Grönwall: Ground Object Recognition using Laser Radar Data – Geometric Fitting, Performance Analysis, and Applications. Thesis No. 1055, 2006. ISBN 91-85643-53-X.
A. Eidehall: Tracking and threat assessment for automotive collision avoidance. Thesis No. 1066, 2007. ISBN 91-85643-10-6.
F. Eng: Non-Uniform Sampling in Statistical Signal Processing. Thesis No. 1082, 2007. ISBN 978-91-85715-49-7.
E. Wernholt: Multivariable Frequency-Domain Identification of Industrial Robots. Thesis No. 1138, 2007. ISBN 978-91-85895-72-4.
D. Axehill: Integer Quadratic Programming for Control and Communication. Thesis No. 1158, 2008. ISBN 978-91-85523-03-0.
G. Hendeby: Performance and Implementation Aspects of Nonlinear Filtering. Thesis No. 1161, 2008. ISBN 978-91-7393-979-9.
J. Sjöberg: Optimal Control and Model Reduction of Nonlinear DAE Models. Thesis No. 1166, 2008. ISBN 978-91-7393-964-5.
D. Törnqvist: Estimation and Detection with Applications to Navigation. Thesis No. 1216, 2008. ISBN 978-91-7393-785-6.
P-J. Nordlund: Efficient Estimation and Detection Methods for Airborne Applications. Thesis No. 1231, 2008. ISBN 978-91-7393-720-7.