
  • Springer Geophysics

For further volumes: http://www.springer.com/series/10173

  • Erik W. Grafarend · Joseph L. Awange

Applications of Linear and Nonlinear Models

Fixed Effects, Random Effects, and Total Least Squares


  • Prof. Dr.-Ing. Erik W. Grafarend, University of Stuttgart, Geodetic Institute, Geschwister-Scholl-Strasse 24, D-70174 Stuttgart, Germany; [email protected]

Prof. Dr.-Ing. Joseph L. Awange, Professor of Environment and Earth Sciences, Maseno University, Maseno, Kenya

Curtin University, Perth, Australia

Karlsruhe Institute of Technology, Karlsruhe, Germany

Kyoto University, Kyoto, Japan

ISBN 978-3-642-22240-5, ISBN 978-3-642-22241-2 (eBook), DOI 10.1007/978-3-642-22241-2. Springer Heidelberg New York Dordrecht London

    Library of Congress Control Number: 2011945776

© Springer-Verlag Berlin Heidelberg 2012

This work is subject to copyright. All rights are reserved by the Publisher, whether the whole or part of the material is concerned, specifically the rights of translation, reprinting, reuse of illustrations, recitation, broadcasting, reproduction on microfilms or in any other physical way, and transmission or information storage and retrieval, electronic adaptation, computer software, or by similar or dissimilar methodology now known or hereafter developed. Exempted from this legal reservation are brief excerpts in connection with reviews or scholarly analysis or material supplied specifically for the purpose of being entered and executed on a computer system, for exclusive use by the purchaser of the work. Duplication of this publication or parts thereof is permitted only under the provisions of the Copyright Law of the Publisher's location, in its current version, and permission for use must always be obtained from Springer. Permissions for use may be obtained through RightsLink at the Copyright Clearance Center. Violations are liable to prosecution under the respective Copyright Law.

The use of general descriptive names, registered names, trademarks, service marks, etc. in this publication does not imply, even in the absence of a specific statement, that such names are exempt from the relevant protective laws and regulations and therefore free for general use.

While the advice and information in this book are believed to be true and accurate at the date of publication, neither the authors nor the editors nor the publisher can accept any legal responsibility for any errors or omissions that may be made. The publisher makes no warranty, express or implied, with respect to the material contained herein.

    Printed on acid-free paper

    Springer is part of Springer Science+Business Media (www.springer.com)

  • Foreword

This book is a source of knowledge and inspiration not only for geodesists and mathematicians, but also for engineers in general, as well as natural scientists and economists. Inference on effects which result in observations via linear and nonlinear functions is a general task in science. The authors provide a comprehensive in-depth treatise on the analysis and solution of such problems. They start with the treatment of underdetermined or/and overdetermined linear systems by methods of minimum norm, weighted least squares and regularization, respectively. Considering the observations as realizations of random variables leads to the question of how to balance bias and variance in linear stochastic regression estimation. In addition, the causal effects are allowed to be random (general linear Gauss–Markov model of estimation and prediction), even the model function; conditions are allowed, too (Gauss–Helmert model). Nonlinear functions are comprehended via Taylor expansion or by their special structures, e.g., within the context of directional observations.



The self-contained monograph is comprehensive in its inclusion of the more fundamental elements of the topic up to the state of the art, complemented by an extensive list of references. All chapters have an introduction, often with an indication of fast track reading and with an explanation addressed to non-mathematicians, which is of interest also to me as a mathematician. The book contains a great variety of case studies and numerical examples ranging from classical geodesy to dynamic systems theory. The background and much more is provided by appendices dealing with linear algebra, probability, statistics and stochastic processes as well as numerical methods. The presentation is lucid and elegant, with historical remarks and humorous interpretations, such that I took great pleasure in reading the individual chapters.

I wish all readers of this brilliant encyclopaedic book this pleasure and much benefit.

Prof. Dr. Harro Walk
Institute of Stochastics and Applications, University of Stuttgart, Germany

  • Preface

"Probability does not exist" — such a statement stands at the beginning of the book by B. de Finetti dedicated to the foundation of probability, nowadays called "probability theory". We believe that such a statement is overdone. The target of the reference is made very clear, namely the analysis of random effects or of a random experiment. The result of an experiment like throwing dice or a LOTTO outcome is not predictable. Such indeterminism is characteristic of the analysis of an experiment. In general, we analyze here nonlinear relations between random effects, maybe stochastic or not. Obviously, the center of interest is on linear models, which are able to approximate nonlinear relations. There are typical nonlinear models which cannot be linearized. Our experimental world is divided into linear or linearized and nonlinear models. Sometimes, nonlinear models can be transformed into polynomial equations, also called algebraic, which allow a rigorous solution set. A prominent example is the resection problem of overdetermined type. If we are able to express the characteristic rotation matrix between reference frames by quaternions, an algebraic equation of known high order arises. But, in contrast to linear models, there exists a whole world of solutions which has to be discussed. P. Lohse presented in his Ph.D. thesis wonderful examples to which we refer.

A.N. Kolmogorov founded the axiomatic (mathematical) theory of probability. Typical for the axiomatic treatment, there are special rules relating to measure theory. Here, we only refer to excellent introductions to measure theory, for instance H. Bauer (1992), J. Elstrodt (2005) or W. Rudin (1987). Of course, there are very good textbooks on the theory of probability, for instance H. Bauer (2002), K.L. Chung (2001), O. Kallenberg (2002), M. Loève (1977), and A.N. Shiryaev (1996).

It is important to inform our readers about the many subjects which are not treated here: robust estimation, Bayes statistics (only examples), stochastic integration, Itô stochastic processes, stochastic signal processing, martingales, Markov processes, limit theorems for associated random fields, modeling, measuring, and managing risks, genetic algorithms, chaos, turbulence, fractals, neural networks, wavelets, Gibbs fields, Monte Carlo, and many, indeed very important, subjects. This statement refers to an overcritical reviewer who complained about the subjects which we did not treat. Unfortunately, we had to strongly restrict our subjects. Our contribution



is only an atom in the $10^8$ atoms.

    What will be treated?

It is the task of the observer or the designer to decide what is the best fit between a linear or nonlinear model and the "real world" of the observer. Both types of analysis have to make a priori statements about many facts.

• "Are the observations random or not, a stochastic effect or a fixed effect?"
• "What are the parameters in the model space, or would you prefer a 'model free' concept?"
• "What is the nature of the parameters, random effects or fixed effects?"
• "Could you find the conditions between the observations or between the parameters? What is the nature of the conditions, linear or nonlinear, random or not?"
• "What is the bias between the model and its estimation or prediction?"

There are various criteria which characterize the observation space or the parameter space:

linear or nonlinear estimation theory, linear and nonlinear distributions, $\ell_r$-optimality (LESS: $\ell_2$), restrictions, mixed models of fixed and random effects, zero-, first-, and second-order design, nonlinear statistics on curved manifolds, minimal bias estimation, translational invariance, total least squares, generalized least squares, minimal distance estimation, optimal design, and optimal prediction.

Of key interest is to find equivalence lemmas, for instance: Assume direct observations. Then a least squares fit (LESS) is equivalent to a best linear uniformly unbiased estimation (BLUUE).
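As an illustration of this particular equivalence lemma, here is a minimal numerical sketch (our addition, not code from the book), assuming n direct observations of a single parameter and a unit weight matrix: the least squares fit and the BLUUE then coincide in the arithmetic mean.

```python
import numpy as np

# Direct observations of a single parameter mu: y_i = mu + e_i,
# so the design matrix is a column of ones.
y = np.array([2.1, 1.9, 2.3, 2.0])
A = np.ones((y.size, 1))

# LESS: minimize ||y - A mu||^2 via the normal equations A'A mu = A'y.
mu_less = np.linalg.solve(A.T @ A, A.T @ y)[0]

# BLUUE for the dispersion D{y} = sigma^2 * I reduces to the same
# estimator mu = (A'A)^{-1} A'y, here the arithmetic mean.
mu_bluue = (A.T @ y).item() / (A.T @ A).item()

assert np.isclose(mu_less, mu_bluue) and np.isclose(mu_less, y.mean())
print(mu_less)
```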

Chapter 1 deals with the first problem of algebraic regression, the consistent system of linear observational equations or a system of underdetermined linear equations. From the front page example onwards, we treat MINOS and the "horizontal rank partitioning". In detail, we give the eigenvalue decomposition of the weighted minimum norm solution. Examples are FOURIER series and FOURIER–LEGENDRE series, including the NYQUIST frequency for spherical data. Special nonlinear models with datum defects, in terms of Taylor polynomials and the generalized Newton iteration, are introduced.

Chapter 2 is based on the special first problem, the bias problem, expressed by the Equivalence Theorem of the adjusted minimum norm solution ($G_x$-MINOS) and the linear uniformly minimum bias estimator (S-LUMBE).

The second problem of algebraic regression, namely solving an inconsistent system of linear observational equations as an overdetermined system of linear equations, is extensively treated in Chap. 3. We start with a front page example for the Least Squares Solution (LESS). Indeed, we discuss alternative solutions depending on the metric of the observation space, like Second Order Design, the Taylor–Karman structure, an optimal choice of the weight matrix, and Fuzzy sets.


By an eigenvalue decomposition of $G_y$-LESS, we introduce "canonical LESS". Our case studies range from partial redundancies, latent conditions, the theory of high leverage points versus break points, to direct and inverse Grassmann coordinates and PLÜCKER coordinates. We conclude with historical notes on C.F. Gauss and A.M. Legendre and generalizations.

In another extensive review, we concentrate on the second problem of probabilistic regression, namely the special Gauss–Markov model without datum defect, in Chap. 4. We set up the best linear uniformly unbiased estimator (BLUUE) for the moments of the first order and the best invariant quadratic uniformly unbiased estimator for the central moments of the second order (BIQUUE). We depart again from a detailed example of BLUUE and BIQUUE. We set up $\Sigma_y$-BLUUE and note the Equivalence Theorem of $G_y$-LESS and $\Sigma_y$-BLUUE. For BIQUUE, we study the block partitioning of the dispersion matrix, the invariant quadratic estimation of types IQE, IQUUE, HIQUUE versus HIQE according to F.R. Helmert, variance-component estimation, the simultaneous estimation of the first moments and the second central moments, the so-called E–D correspondence, inhomogeneous multilinear estimation, and BAYES design of moments estimation.

The third problem of algebraic regression, treated in Chap. 5, is defined as the problem to solve inconsistent systems of linear observation equations with datum defect, that is, an overdetermined and underdetermined system of linear equations. In the introduction, we begin with the front page example and its minimum norm-least squares solution, obtained by the technique of additive rank partitioning as well as multiplicative rank partitioning. In more detail, we present MINOLESS in the second section, including weighted versions of MINOS as well as LESS. Especially, we interpret $(G_x, G_y)$-MINOS in terms of the generalized inverse. By eigenvalue decomposition, we find the solution of $(G_x, G_y)$-MINOLESS. A special section is devoted to total least squares in terms of the $\alpha$-weighted hybrid approximate solution ($\alpha$-HAPS) within Tykhonov–Phillips regularization.
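To make the regularization idea concrete, the following small sketch (our addition, assuming unit metrics $G_x = I$, $G_y = I$) computes $x_\alpha = (A'A + \alpha I)^{-1}A'y$ for a rank-deficient, inconsistent toy system and shows that, as $\alpha \to 0^+$, it tends to the minimum norm-least squares solution (MINOLESS), i.e. the pseudoinverse solution:

```python
import numpy as np

# Rank-deficient (rk A = 2 < m = 3) and inconsistent: the first two
# equations share a left-hand side but contradict on the right-hand side.
A = np.array([[1.0, 1.0, 1.0],
              [1.0, 1.0, 1.0],
              [1.0, 2.0, 4.0]])
y = np.array([2.0, 3.0, 3.0])

def haps(alpha):
    """alpha-HAPS with unit metrics: x_alpha = (A'A + alpha*I)^{-1} A'y."""
    return np.linalg.solve(A.T @ A + alpha * np.eye(A.shape[1]), A.T @ y)

for alpha in (1.0, 1e-3, 1e-9):
    print(f"alpha = {alpha:g}: x_alpha = {haps(alpha)}")

# Limit alpha -> 0+: MINOLESS, the pseudoinverse solution A^+ y.
print("MINOLESS:", np.linalg.pinv(A) @ y)
```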

The third problem of probabilistic regression, a special Gauss–Markov model with datum problem, is treated in Chap. 6. We set up best linear minimum bias estimators of type BLUMBE and best linear estimators of type hom BLE, hom S-BLE and hom $\alpha$-BLE, depending on the weight assumption for the bias. For example, we introduce continuous networks of first and second derivatives. Finally, we discuss the limit process from discrete networks into the continuum.

The overdetermined system of nonlinear equations on curved manifolds of Chap. 7 is presented for an inconsistent system of directional observation equations, namely by minimal geodesic distance (MINGEODISC). A special example is the directional observation from the circular normal distribution of von MISES–FISHER type, with a note on the angular metric.

Chapter 8 relates to the fourth problem of probabilistic regression as a setup of type BLIP and VIP for the central moments of first order. We begin with the definition of a random effect model according to our magic triangle. Three examples relate to

(i) Nonlinear error propagation with random effect components,
(ii) Nonlinear vector-valued error propagation with random effects in Geoinformatics, and
(iii) Nonlinear vector-valued error propagation for distance measurements, including HESSIANS.

The fifth problem of algebraic regression in Chap. 9 is defined as solving conditional equations of homogeneous and inhomogeneous type.

Alternatively, the fifth problem of probabilistic regression in Chap. 10 is treated as the inhomogeneous general linear GAUSS–MARKOV model including fixed and random effects. With an explicit representation of the errors in the general GAUSS–MARKOV model with mixed effects, we present an example, "collocation". Our comments relate to the KOLMOGOROV–WIENER prediction and more up-to-date literature.

The sixth problem of probabilistic regression in Chap. 11 is defined as a special random effect model called "errors-in-variables". Here, we assume, with respect to the second order statistics, the first order design matrix to be random. Another name for this is Total Least Squares, as an algebraic analogue. At first, we relate Total Least Squares to the nonlinear system "errors-in-variables". Secondly, we introduce the models SIMEX and SYMEX from the approach of Carroll–Cook–Stefanski–Polzehl–Zwanzig, depending on the variance-covariance matrix of the random effects.

As a special problem of nonlinearity, we treat the popular 3d datum transformation and the PROCRUSTES Algorithm in Chap. 12.

As the sixth problem of generalized algebraic regression, we finally introduce the GAUSS–HELMERT problem as a system of conditional equations with unknowns. One part is the variance-covariance estimation of $D\{y\}$, the second is the variance-covariance model of conditional equations, and finally the complete conditional equations with unknowns in the third model $Ax + By = C$. As a result, Chap. 13 presents the block structure of the dispersion matrix.

As special problems of algebraic regression as well as stochastic estimation, we treat in Chap. 14 the multivariate GAUSS–MARKOV model, especially the n-way classification model, and dynamical systems.

We conclude in Chap. 15 with the algebraic solution of systems of equations of linear, bilinear, quadratic and higher type, referring to combinatorial subsets, the GROEBNER basis method, and multipolynomial resultants.

A very important part of our contribution is included in the appendices. Appendix A is an introduction to tensor algebra, linear algebra, matrix algebra as well as multilinear algebra. Topics are multilinear functions and their decomposition, matrix algebra and the Hodge star operator, including self duality. But the major parts in linear algebra are the diagrams "Ass", "Uni", and "Comm", ringed spaces, division algebras, Lie algebra, Witt algebra, composition algebras, generalized inverses, special matrices like the Helmert, Hankel and Vandermonde matrices, scalar measures of matrices, complex algebra, quaternion algebra, octonion algebra, and Clifford algebra.

A short introduction is given in Appendix B to sampling distributions and their use in confidence intervals and confidence regions. Sampling distributions for the GAUSS–LAPLACE normal distribution and its generalization to the multidimensional GAUSS–LAPLACE normal distribution, as well as the distribution of the


    sample mean and variance-covariance matrix including the correlation coefficientsare reviewed.

An introduction to statistical notions, random events and stochastic processes in Appendix C ranges from the moment representation of a probability distribution, the GAUSS–LAPLACE and other normal distributions, to error propagation and scalar-, vector- and tensor-valued stochastic processes of one- and multiparameter systems. Simple examples of one-parameter processes include (i) cosine oscillations with random amplitudes and phases, (ii) the superposition of two uncorrelated random functions, (iii) pulse modulation, (iv) random sequences with constant and moving averages of type ARMA(r,s), (v) WIENER processes, (vi) the special analysis of parameter-stationary and non-stationary stochastic processes with discrete and continuous spectrum, white and coloured noise with band limitation, and (vii) multiparameter systems, homogeneous and isotropic multipoint systems. Examples are (i) two-dimensional EUCLIDEAN networks, (ii) criterion matrices, (iii) space gravity spectroscopy based on TAYLOR–KARMAN structures, (iv) nonlinear prediction and (v) nonlocal time series analysis.

Appendix D is devoted to GROEBNER basis algebra, the BUCHBERGER Algorithm, and C.F. GAUSS combinatorial formulation.

    Acknowledgements

E.W. Grafarend wants to thank the colleagues B.G. Blaha (Bedford, Mass., USA), J. Cai (Stuttgart, Germany), A. Dermanis (Thessaloniki, Greece), T. Krarup (Copenhagen, Denmark), K.R. Koch (Bonn, Germany), G. Kampmann (Uebach-Palenberg, Germany), L. Kubacek (Olomouc, Slovakia), P. Lohse (Paderborn, Germany), B. Palancz (Budapest, Hungary), F. Pukelsheim (Augsburg, Germany), R. Rummel (Munich, Germany), F. Sanso (Como, Italy), L.E. Sjoberg (Stockholm, Sweden), P. Teunissen (Delft, Netherlands), H. Walk (Stuttgart, Germany), P. Xu (Kyoto, Japan), P. Zaletnyik (Budapest, Hungary) and S. Zwanzig (Uppsala, Sweden).

J.L. Awange wishes to express his sincere thanks to B. Heck of the Department of Physical Geodesy (Karlsruhe Institute of Technology, Germany) for hosting him during his Alexander von Humboldt Fellowship in the period 2008–2011, when part of the book was written. He is further grateful to B. Veenendaal (Head of Department, Spatial Sciences, Curtin University, Australia) and F.N. Onyango (Vice Chancellor, Maseno University, Kenya) for their support and motivation. Last, but not least, he acknowledges the support of a Curtin Research Fellowship, while his stay at the Karlsruhe Institute of Technology (KIT) was supported by an Alexander von Humboldt Ludwig Leichhardt Memorial Fellowship. He is very grateful for the financial support.

For the transfer of copyrights of our contributions, we thank the Springer Publishing House (Heidelberg–Berlin–New York) as well as Spectrum Verlag (Heidelberg) and the European Geosciences Union. A proper reference is added at the


place we quote from. During the making of this edition of the book, more than 3000 references have been cited.

Finally, special thanks to I. Anjasmara and B. Arora of Curtin University, who typeset parts of the book. This is an Institute for Geoscience Research (TIGeR) publication No. 260.

Stuttgart and Karlsruhe (Germany), Erik W. Grafarend
Perth (Australia), Joseph L. Awange

  • Contents

1 The First Problem of Algebraic Regression: $\{Ax = y \mid A \in \mathbb{R}^{n \times m},\ y \in \mathcal{R}(A),\ \operatorname{rk} A = n = \dim \mathbb{Y}\}$, MINOS
  1-1 Introduction
    1-11 The Front Page Example
    1-12 The Front Page Example: Matrix Algebra
    1-13 The Front Page Example: MINOS, Horizontal Rank Partitioning
    1-14 The Range $\mathcal{R}(f)$ and the Kernel $\mathcal{N}(f)$
    1-15 The Interpretation of MINOS
  1-2 Minimum Norm Solution (MINOS)
    1-21 A Discussion of the Metric of the Parameter Space X
    1-22 An Alternative Choice of the Metric of the Parameter Space X
    1-23 $G_x$-MINOS and Its Generalized Inverse
    1-24 Eigenvalue Decomposition of $G_x$-MINOS: Canonical MINOS
  1-3 Case Study
    1-31 Fourier Series
    1-32 Fourier–Legendre Series
    1-33 Nyquist Frequency for Spherical Data
  1-4 Special Nonlinear Models
    1-41 Taylor Polynomials, Generalized Newton Iteration
    1-42 Linearized Models with Datum Defect
  1-5 Notes

2 The First Problem of Probabilistic Regression: The Bias Problem. Special Gauss–Markov Model with Datum Defects, LUMBE
  2-1 Linear Uniformly Minimum Bias Estimator (LUMBE)
  2-2 The Equivalence Theorem of $G_x$-MINOS and S-LUMBE
  2-3 Example

3 The Second Problem of Algebraic Regression: Inconsistent System of Linear Observational Equations
  3-1 Introduction
    3-11 The Front Page Example
    3-12 The Front Page Example in Matrix Algebra
    3-13 Least Squares Solution of the Front Page Example by Means of Vertical Rank Partitioning
    3-14 The Range $\mathcal{R}(f)$ and the Kernel $\mathcal{N}(f)$, Interpretation of "LESS" by Three Partitionings
  3-2 The Least Squares Solution: "LESS"
    3-21 A Discussion of the Metric of the Parameter Space X
    3-22 Alternative Choices of the Metric of the Observation Space Y
    3-23 $G_x$-LESS and Its Generalized Inverse
    3-24 Eigenvalue Decomposition of $G_y$-LESS: Canonical LESS
  3-3 Case Study
    3-31 Canonical Analysis of the Hat Matrix, Partial Redundancies, High Leverage Points
    3-32 Multilinear Algebra, "Join" and "Meet", the Hodge Star Operator
    3-33 From A to B: Latent Restrictions, Grassmann Coordinates, Plücker Coordinates
    3-34 From B to A: Latent Parametric Equations, Dual Grassmann Coordinates, Dual Plücker Coordinates
    3-35 Break Points
  3-4 Special Linear and Nonlinear Models: A Family of Means for Direct Observations
  3-5 A Historical Note on C.F. Gauss and A.M. Legendre

4 The Second Problem of Probabilistic Regression: Special Gauss–Markov Model Without Datum Defect
  4-1 Introduction
    4-11 The Front Page Example
    4-12 Estimators of Type BLUUE and BIQUUE of the Front Page Example
    4-13 BLUUE and BIQUUE of the Front Page Example, Sample Median, Median Absolute Deviation
    4-14 Alternative Estimation Maximum Likelihood (MALE)
  4-2 Setup of the Best Linear Uniformly Unbiased Estimator
    4-21 The Best Linear Uniformly Unbiased Estimation $\hat{\xi}$ of $\xi$: $\Sigma_y$-BLUUE
    4-22 The Equivalence Theorem of $G_y$-LESS and $\Sigma_y$-BLUUE
  4-3 Setup of the Best Invariant Quadratic Uniformly Unbiased Estimator
    4-31 Block Partitioning of the Dispersion Matrix and Linear Space Generated by Variance-Covariance Components
    4-32 Invariant Quadratic Estimation of Variance-Covariance Components of Type IQE
    4-33 Invariant Quadratic Uniformly Unbiased Estimations of Variance-Covariance Components of Type IQUUE
    4-34 Invariant Quadratic Uniformly Unbiased Estimations of One Variance Component (IQUUE) from $\Sigma_y$-BLUUE: HIQUUE
    4-35 Invariant Quadratic Uniformly Unbiased Estimators of Variance Covariance Components of Helmert Type: HIQUUE Versus HIQE
    4-36 Best Quadratic Uniformly Unbiased Estimations of One Variance Component: BIQUUE
    4-37 Simultaneous Determination of First Moment and the Second Central Moment, Inhomogeneous Multilinear Estimation, the E–D Correspondence, Bayes Design with Moment Estimations

5 The Third Problem of Algebraic Regression: $\{Ax + i = y \mid A \in \mathbb{R}^{n \times m},\ y \notin \mathcal{R}(A),\ \operatorname{rk} A < \min\{m, n\}\}$
  5-1 Introduction
    5-11 The Front Page Example
    5-12 The Front Page Example in Matrix Algebra
    5-13 Minimum Norm-Least Squares Solution of the Front Page Example by Means of Additive Rank Partitioning
    5-14 Minimum Norm-Least Squares Solution of the Front Page Example by Means of Multiplicative Rank Partitioning
    5-15 The Range $\mathcal{R}(f)$ and the Kernel $\mathcal{N}(f)$, Interpretation of "MINOLESS" by Three Partitionings
  5-2 MINOLESS and Related Solutions Like Weighted Minimum Norm-Weighted Least Squares Solutions
    5-21 The Minimum Norm-Least Squares Solution: "MINOLESS"
    5-22 $(G_x, G_y)$-MINOS and Its Generalized Inverse
    5-23 Eigenvalue Decomposition of $(G_x, G_y)$-MINOLESS
    5-24 Notes
  5-3 The Hybrid Approximation Solution: $\alpha$-HAPS and Tykhonov–Phillips Regularization

6 The Third Problem of Probabilistic Regression: Special Gauss–Markov Model with Datum Defect
  6-1 Setup of the Best Linear Minimum Bias Estimator of Type BLUMBE
    6-11 Definitions, Lemmas and Theorems
    6-12 The First Example: BLUMBE Versus BLE, BIQUUE Versus BIQE, Triangular Leveling Network
  6-2 Setup of the Best Linear Estimators of Type hom BLE, hom S-BLE and hom $\alpha$-BLE for Fixed Effects
  6-3 Continuous Networks
    6-31 Continuous Networks of Second Derivatives Type
    6-32 Discrete Versus Continuous Geodetic Networks

7 Overdetermined System of Nonlinear Equations on Curved Manifolds: Inconsistent System of Directional Observational Equations
  7-1 Introduction
  7-2 Minimal Geodesic Distance: MINGEODISC
  7-3 Special Models: From the Circular Normal Distribution to the Oblique Normal Distribution
    7-31 A Historical Note of the von Mises Distribution
    7-32 Oblique Map Projection
    7-33 A Note on the Angular Metric
  7-4 Case Study

8 The Fourth Problem of Probabilistic Regression: Special Gauss–Markov Model with Random Effects
  8-1 The Random Effect Model
  8-2 Examples

9 The Fifth Problem of Algebraic Regression: The System of Conditional Equations, Homogeneous and Inhomogeneous Equations: $\{By = Bi$ versus $-c + By = Bi\}$
  9-1 $G_y$-LESS of a System of Inconsistent Homogeneous Conditional Equations
  9-2 Solving a System of Inconsistent Inhomogeneous Conditional Equations
  9-3 Examples

10 The Fifth Problem of Probabilistic Regression: General Gauss–Markov Model with Mixed Effects
  10-1 Inhomogeneous General Linear Gauss–Markov Model (Fixed Effects and Random Effects)
  10-2 Explicit Representations of Errors in the General Gauss–Markov Model with Mixed Effects
  10-3 An Example for Collocation
  10-4 Comments

11 The Sixth Problem of Probabilistic Regression: The Random Effect Model – "errors-in-variables"
  11-1 The Model of Error-in-Variables or Total Least Squares
  11-2 Algebraic Total Least Squares for the Nonlinear System of the Model "Error-in-Variables"
  11-3 Example: The Straight Line Fit
  11-4 The Models SIMEX and SYMEX
  11-5 References

12 The Nonlinear Problem of the 3d Datum Transformation and the Procrustes Algorithm
  12-1 The 3d Datum Transformation and the Procrustes Algorithm
  12-2 The Variance-Covariance Matrix of the Error Matrix E
    12-21 Case Studies: The 3d Datum Transformation and the Procrustes Algorithm
  12-3 References

13 The Sixth Problem of Generalized Algebraic Regression: The System of Conditional Equations with Unknowns (Gauss–Helmert Model)
  13-1 Variance-Covariance-Component Estimation in the Linear Model $Ax + \varepsilon = y$, $y \notin \mathcal{R}(A)$
  13-2 Variance-Covariance-Component Estimation in the Linear Model $B\varepsilon = By - c$, $By \notin \mathcal{R}(A) + c$
  13-3 Variance-Covariance-Component Estimation in the Linear Model $Ax + \varepsilon + B\varepsilon = By - c$, $By \notin \mathcal{R}(A) + c$
  13-4 The Block Structure of the Dispersion Matrix $D\{y\}$

14 Special Problems of Algebraic Regression and Stochastic Estimation
  14-1 The Multivariate Gauss–Markov Model: A Special Problem of Probabilistic Regression
  14-2 n-Way Classification Models
    14-21 A First Example: 1-Way Classification
    14-22 A Second Example: 2-Way Classification Without Interaction
    14-23 A Third Example: 2-Way Classification with Interaction
    14-24 Higher Classifications with Interaction
  14-3 Dynamical Systems

15 Algebraic Solutions of Systems of Equations: Linear and Nonlinear Systems of Equations
  15-1 Introductory Remarks
  15-2 Background to Algebraic Solutions
  15-3 Algebraic Methods for Solving Nonlinear Systems of Equations
    15-31 Solution of Nonlinear Gauss–Markov Model
    15-32 Adjustment of the Combinatorial Subsets
  15-4 Examples
  15-5 Notes

A Tensor Algebra, Linear Algebra, Matrix Algebra, Multilinear Algebra
  A-1 Multilinear Functions and the Tensor Space $T^p_q$
  A-2 Decomposition of Multilinear Functions into Symmetric, Antisymmetric and Residual Multilinear Functions: $T^p_q = S^p_q \oplus A^p_q \oplus R^p_q$
  A-3 Matrix Algebra, Array Algebra, Matrix Norm and Inner Product
  A-4 The Hodge Star Operator, Self Duality
  A-5 Linear Algebra
    A-51 Definition of a Linear Algebra
    A-52 The Diagrams "Ass", "Uni" and "Comm"
    A-53 Ringed Spaces: The Subalgebra "Ring with Identity"
    A-54 Definition of a Division Algebra and Non-Associative Algebra
    A-55 Lie Algebra, Witt Algebra
    A-56 Definition of a Composition Algebra
  A-6 Matrix Algebra Revisited, Generalized Inverses
    A-61 Special Matrices: Helmert Matrix, Hankel Matrix, Vandermonde Matrix
    A-62 Scalar Measures of Matrices
    A-63 Three Basic Types of Generalized Inverses
  A-7 Complex Algebra, Quaternion Algebra, Octonion Algebra, Clifford Algebra, Hurwitz Theorem
    A-71 Complex Algebra as a Division Algebra as well as a Composition Algebra, Clifford Algebra $Cl(0,1)$
    A-72 Quaternion Algebra as a Division Algebra as well as a Composition Algebra, Clifford Algebra $Cl(0,2)$
    A-73 Octonion Algebra as a Non-Associative Algebra as well as a Composition Algebra, Clifford Algebra with Respect to $\mathbb{H} \times \mathbb{H}$
    A-74 Clifford Algebra

B Sampling Distributions and Their Use: Confidence Intervals and Confidence Regions
  B-1 A First Vehicle: Transformation of Random Variables
  B-2 A Second Vehicle: Transformation of Random Variables
  B-3 A First Confidence Interval of Gauss–Laplace Normally Distributed Observations: $\mu$, $\sigma^2$ Known, the Three Sigma Rule
    B-31 The Forward Computation of a First Confidence Interval of Gauss–Laplace Normally Distributed Observations: $\mu$, $\sigma^2$ Known
    B-32 The Backward Computation of a First Confidence Interval of Gauss–Laplace Normally Distributed Observations: $\mu$, $\sigma^2$ Known
  B-4 Sampling from the Gauss–Laplace Normal Distribution: A Second Confidence Interval for the Mean, Variance Known
    B-41 Sampling Distributions of the Sample Mean $\hat{\mu}$, $\sigma^2$ Known, and of the Sample Variance $\hat{\sigma}^2$
    B-42 The Confidence Interval for the Sample Mean, Variance Known
  B-5 Sampling from the Gauss–Laplace Normal Distribution: A Third Confidence Interval for the Mean, Variance Unknown
    B-51 Student's Sampling Distribution of the Random Variable $(\hat{\mu} - \mu)/\hat{\sigma}$
    B-52 The Confidence Interval for the Mean, Variance Unknown
    B-53 The Uncertainty Principle
  B-6 Sampling from the Gauss–Laplace Normal Distribution: A Fourth Confidence Interval for the Variance
    B-61 The Confidence Interval for the Variance
    B-62 The Uncertainty Principle
  B-7 Sampling from the Multidimensional Gauss–Laplace Normal Distribution: The Confidence Region for the Fixed Parameters in the Linear Gauss–Markov Model
  B-8 Multidimensional Variance Analysis, Sampling from the Multivariate Gauss–Laplace Normal Distribution
    B-81 Distribution of Sample Mean and Variance-Covariance
    B-82 Distribution Related to Correlation Coefficients

C Statistical Notions, Random Events and Stochastic Processes
  C-1 Moments of a Probability Distribution, the Gauss–Laplace Normal Distribution and the Quasi-Normal Distribution
  C-2 Error Propagation
  C-3 Useful Identities
  C-4 Scalar-Valued Stochastic Processes of One Parameter
  C-5 Characteristic of One Parameter Stochastic Processes
  C-6 Simple Examples of One Parameter Stochastic Processes
  C-7 Wiener Processes
    C-71 Definition of the Wiener Processes
    C-72 Special Wiener Processes: Ornstein–Uhlenbeck, Wiener Processes with Drift, Integral Wiener Processes
  C-8 Special Analysis of One Parameter Stationary Stochastic Processes
    C-81 Foundations: Ergodic and Stationary Processes
    C-82 Processes with Discrete Spectrum
    C-83 Processes with Continuous Spectrum
    C-84 Spectral Decomposition of the Mean and Variance-Covariance Function
  C-9 Scalar-, Vector-, and Tensor Valued Stochastic Processes of Multi-Parameter Systems
    C-91 Characteristic Functional
    C-92 The Moment Representation of Stochastic Processes for Scalar Valued and Vector Valued Quantities
    C-93 Tensor-Valued Statistical Homogeneous and Isotropic Field of Multi-Point Systems

D Basics of Groebner Basis Algebra
  D-1 Definitions
  D-2 Buchberger Algorithm
    D-21 Mathematica Computation of Groebner Basis
    D-22 Maple Computation of Groebner Basis
  D-3 Gauss Combinatorial Formulation

References

Index

  • Chapter 1: The First Problem of Algebraic Regression

Consistent system of linear observational equations – underdetermined system of linear equations: $\{Ax = y \mid A \in \mathbb{R}^{n \times m},\ y \in \mathcal{R}(A),\ \operatorname{rk} A = n,\ n = \dim \mathbb{Y}\}$.

The optimisation problem which appears in treating underdetermined linear system equations and weakly nonlinear system equations is a standard topic in many textbooks on optimisation. Here we define a weakly nonlinear system of equations as a problem which allows Taylor expansion. We start in the first section with a front example, the transformation of a three-dimensional parameter space into a two-dimensional observation space. The Minimum Norm Solution, in short MINOS, leads to the solutions (1.21) and (1.22). We determine as an example the range and the kernel of the mapping, namely by "horizontal rank partitioning". In particular, we refer to three types of analysis of underdetermined linear systems: (i) algebraic, (ii) geometric and (iii) set-theoretical. The second section is based on a review of the Minimum Norm Solution of an underdetermined linear system, for instance using the reflexive generalized inverse, called RAO's Pandora Box (C.R. Rao and S.K. Mitra 1971: pp. 44-47, "three basic g-inverses", Section 3.1: minimum norm solution). We apply the method of Lagrange multipliers, or the Implicit Function Theorem, or the theory of extrema with side conditions. The numerical example is the eigenspace-eigenvector decomposition, namely the left and right eigenspace elements. While the theory of solving rank deficient linear equations has been well known since the 1970s, it is not so for many important applications.

Besides economic model theory, we present applications in (i) Fourier series and (ii) Fourier-Legendre series, enriched by a NYQUIST frequency analysis of spherical data. For instance, for data on a grid of 5 minutes by 5 minutes of arc, resulting in n = 9,331,200 observations, the NYQUIST frequency N(f) = 2,160 acts as a limit when we develop the series by degree and order. Example (iii) introduces Taylor polynomials and the generalized Newton iteration, namely applied within two-dimensional network analysis, in particular linearized models with datum defects. Such a problem arises when we have measured distances and want to derive absolute coordinates of network stations. Finally, as example (iv), we take reference to the General Datum Problem in $\mathbb{R}^n$, including coordinate differences, distances, angles, angle ratios, cross-ratios and area elements, presented in an extended reference list.
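The counts in this example can be checked directly. A small sketch of the arithmetic (our addition, not book code), using the rule that the maximum resolvable spherical harmonic degree of a grid of spacing $\Delta$ is $N = 180°/\Delta$:

```python
# Arithmetic check for the 5' x 5' spherical grid.
spacing_arcmin = 5
cells_per_degree = 60 // spacing_arcmin      # 12 cells of 5' per degree

n_lat = 180 * cells_per_degree               # 2,160 latitude bands
n_lon = 360 * cells_per_degree               # 4,320 longitude bands
print(n_lat * n_lon)                         # 9,331,200 observations

# Nyquist limit of the expansion by degree and order: N = 180 deg / spacing.
print(180 * 60 // spacing_arcmin)            # 2,160
```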



    Fast track reading: read only Lemma 1.3. Compare with the guideline of Chap. 1.

The minimum norm solution of a system of consistent linear equations $Ax = y$ subject to $A \in \mathbb{R}^{n \times m}$, $\operatorname{rk} A = n$, $n < m$, is presented by Definition 1.1, Lemma 1.2, and Lemma 1.3. Lemma 1.4 characterizes the solution of the quadratic optimization problem in terms of the (1, 2, 4)-generalized inverse, in particular the right inverse. The system of consistent nonlinear equations $Y = F(X)$ is solved by means of two examples. Both examples are based on distance measurements in a planar network,


namely a planar triangle. In the first example, $Y = F(X)$ is linearized at the point x, which is given by prior information, and solved by means of Newton iteration. The minimum norm solution is applied to the consistent system of linear equations $\Delta y = A\,\Delta x$ and interpreted by means of first and second moments of the nodal points. In contrast, the second example aims at solving the consistent system of nonlinear equations $Y = F(X)$ in closed form.
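A compact sketch of this scheme (our addition, with a hypothetical one-equation toy problem rather than the book's triangular network): at every step, the linearized consistent system $J(x)\,\Delta x = y - F(x)$ is solved by its minimum norm solution, here via the pseudoinverse.

```python
import numpy as np

def newton_minos(F, J, x0, y, steps=30):
    """Generalized Newton iteration for a consistent underdetermined system
    Y = F(X): each update dx is the minimum norm solution of the
    linearized system J(x) dx = y - F(x)."""
    x = np.asarray(x0, dtype=float)
    for _ in range(steps):
        x = x + np.linalg.pinv(J(x)) @ (y - F(x))
    return x

# Hypothetical toy problem: one squared-distance observation, two unknowns
# (rk J = 1 < 2 = dim X, i.e. a datum defect of one).
F = lambda x: np.array([x[0] ** 2 + x[1] ** 2])
J = lambda x: np.array([[2.0 * x[0], 2.0 * x[1]]])

x = newton_minos(F, J, x0=[1.0, 0.5], y=np.array([4.0]))
print(x, F(x))   # F(x) converges to 4: the point is driven to distance 2
```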

Since distance measurements as Euclidean distance functions are left equivariant under the action of the translation group as well as the rotation group – they are invariant under translation and rotation of the Cartesian coordinate system – at first a TR-basis (translation-rotation basis) is established: the origin and the axes of the coordinate system are fixed. With respect to the TR-basis (a set of "free parameters" has been fixed), the "bounded parameters" are analytically fixed. Since no prior information is built in, we prove that two solutions of the consistent system of nonlinear equations $Y = F(X)$ exist. The mapping $y \mapsto X$ is $1 \mapsto 2$. In the chosen TR-basis, the solution vector X is not of minimum norm. Accordingly, we apply a datum transformation $X \mapsto X'$ of type "group of motion" (decomposed into the translation group and the rotation group). The parameters of the "group of motion" (2 for translation, 1 for rotation) are determined under the condition of minimum norm of the unknown vector x, namely by means of a special Procrustes algorithm. As soon as the optimal datum parameters are determined, we are able to compute the unknown vector x, which is of minimum norm. Finally, the notes are an attempt to explain the origin of the injectivity rank deficiency, namely the dimension of the null space $\mathcal{N}(A)$, $m - \operatorname{rk} A$, of the consistent system of linear equations subject to $A \in \mathbb{R}^{n \times m}$ and $\operatorname{rk} A = n$, $n < m$, as well as of the consistent system of nonlinear equations subject to a Jacobi matrix $J \in \mathbb{R}^{n \times m}$, $\operatorname{rk} J = n$, $n < m = \dim \mathbb{X}$. The fundamental relation to the datum transformation, also called transformation groups (conformal group, dilatation group (scale), translation group, rotation group, and projective group), as well as to the "soft" Implicit Function Theorem, is outlined.

By means of certain algebraic objective functions, which geometrically are called minimum distance functions, we solve the first inverse problem of linear and nonlinear equations, in particular of algebraic type, which relate observations to parameters. The system of linear or nonlinear equations we are solving here is classified as underdetermined. The observations, also called measurements, are elements of a certain observation space Y of integer dimension $\dim \mathbb{Y} = n$, which may be metrical, especially Euclidean, pseudo-Euclidean, in general a differentiable manifold. In contrast, the parameter space X of integer dimension $\dim \mathbb{X} = m$ is metrical as well, especially Euclidean, pseudo-Euclidean, in general a differentiable manifold, but its metric is unknown. A typical feature of algebraic regression is the fact that the unknown metric of the parameter space X is induced by the functional relation between observations and parameters.

We shall outline three aspects of any discrete inverse problem: (i) set-theoretic (fibering), (ii) algebraic (rank partitioning, IPM, the Implicit Function Theorem) and (iii) geometrical (slicing). Here, we treat the first problem of algebraic regression. A consistent system of linear observational equations, which is also called an underdetermined system of linear equations ("more unknowns than equations"), is



solved by means of an optimization problem. The following introduction presents a front page example of two inhomogeneous linear equations with three unknowns. In terms of five boxes and five figures, we review the minimum norm solution of such a consistent system of linear equations, which is based upon the trinity shown in Fig. 1.1.

[Fig. 1.1: The trinity]

    1-1 Introduction

With the introductory paragraph, we explain the fundamental concepts and basic notions of this section. For you, the analyst, who has the difficult task to deal with measurements, observational data, modeling, and modeling equations, we present numerical examples and graphical illustrations of all abstract notions. The elementary introduction is written not for a mathematician, but for you, the analyst, with limited remote control of the notions given hereafter. May we gain your interest.

Assume an n-dimensional observation space Y, here a linear space parameterized by n observations (finite, discrete) as coordinates $y = [y_1, \ldots, y_n]' \in \mathbb{R}^n$, in which an r-dimensional model manifold is embedded (immersed). The model manifold is described as the range of a linear operator f from an m-dimensional parameter space X into the observation space Y. The mapping f is established by the mathematical equations which relate all observables to the unknown parameters. Here, the parameter space X, the domain of the linear operator f, will be restricted also to a linear space which is parameterized by coordinates $x = [x_1, \ldots, x_m]' \in \mathbb{R}^m$.

In this manner, the linear operator f can be understood as a coordinate mapping of type $A: x \mapsto y = Ax$. The linear mapping $f: \mathbb{X} \rightarrow \mathbb{Y}$ is geometrically characterized by its range $\mathcal{R}(f)$, namely $\mathcal{R}(A)$, defined by $\mathcal{R}(f) := \{y \in \mathbb{Y} \mid y = f(x)\ \forall x \in \mathbb{X}\}$, which in general is a linear subspace of Y, and its null space defined by $\mathcal{N}(f) := \{x \in \mathbb{X} \mid f(x) = 0\}$. Here, we restrict the range $\mathcal{R}(f)$, namely $\mathcal{R}(A)$, to coincide with the $n = r$-dimensional observation space Y such that $y \in \mathcal{R}(f)$, namely $y \in \mathcal{R}(A)$. Note that Example 1.1 will therefore demonstrate the range space $\mathcal{R}(f)$, namely $\mathcal{R}(A)$, which coincides here with the observation space Y (f is surjective or "onto") as well as the null space $\mathcal{N}(f)$, namely $\mathcal{N}(A)$, which


is not empty (f is not injective or "one-to-one"). Furthermore, note that Box 1.1 will introduce the special linear model of interest. By means of Box 1.2, it will be interpreted as a polynomial of degree two based upon two observations and three unknowns, namely as an underdetermined system of consistent linear equations. Box 1.3 reviews the formal procedure for solving such a system of linear equations by means of "horizontal" rank partitioning and the postulate of the minimum norm solution of the unknown vector. In order to identify the range space $\mathcal{R}(A)$, the null space $\mathcal{N}(A)$, and its orthogonal complement $\mathcal{N}^\perp(A)$, Box 1.4, by means of algebraic partitioning ("horizontal" rank partitioning), outlines the general solution of the system of homogeneous linear equations $Ax = 0$. As a background, Box 1.5 presents the diagnostic algorithm for solving an underdetermined system of linear equations. In contrast, Box 1.6 is a geometric interpretation of a special solution of a consistent system of inhomogeneous linear equations of type MINOS (minimum norm solution). The g-inverse $A_m^-$ of type MINOS is characterized by three conditions collected in Box 1.7. Moreover, note that Fig. 1.2 illustrates the range space $\mathcal{R}(A)$, while Figs. 1.3 and 1.4 demonstrate the null space $\mathcal{N}(A)$ as well as its orthogonal complement $\mathcal{N}^\perp(A)$. Figure 1.5 illustrates the orthogonal projection of an element of the null space $\mathcal{N}(A)$ onto the range space $\mathcal{R}(A^-)$. In terms of fibering the set of points of the parameter space as well as of the observation space, Fig. 1.6 introduces the related Venn diagrams.

    1-11 The Front Page Example

Example 1.1. (Polynomial of degree two, consistent system of linear equations, i.e. $\{Ax = y,\ x \in \mathbb{X} = \mathbb{R}^m,\ \dim\mathbb{X} = m;\ y \in \mathbb{Y} = \mathbb{R}^n,\ \dim\mathbb{Y} = n,\ r = \mathrm{rk}\,A = \dim\mathbb{Y}\}$).

As a front page example, consider the following consistent system of linear equations:

$x_1 + x_2 + x_3 = 2$
$x_1 + 2x_2 + 4x_3 = 3$   (1.1)

Obviously, in general, it deals with the linear space $\mathbb{X} = \mathbb{R}^m \ni x$, $\dim\mathbb{X} = m$, here $m = 3$, called the parameter space, and the linear space $\mathbb{Y} = \mathbb{R}^n \ni y$, $\dim\mathbb{Y} = n$, here $n = 2$, called the observation space.

    1-12 The Front Page Example: Matrix Algebra

By means of Box 1.1 and according to A. Cayley's doctrine, let us specify (1.1) in terms of matrix algebra.


Box 1.1. (A special linear model: polynomial of degree two, two observations, and three unknowns).

$y = \begin{bmatrix} y_1 \\ y_2 \end{bmatrix} = \begin{bmatrix} a_{11} & a_{12} & a_{13} \\ a_{21} & a_{22} & a_{23} \end{bmatrix}\begin{bmatrix} x_1 \\ x_2 \\ x_3 \end{bmatrix}$

$y = Ax:\quad \begin{bmatrix} 2 \\ 3 \end{bmatrix} = \begin{bmatrix} 1 & 1 & 1 \\ 1 & 2 & 4 \end{bmatrix}\begin{bmatrix} x_1 \\ x_2 \\ x_3 \end{bmatrix}$   (1.2)

$x' = [x_1, x_2, x_3],\quad y' = [y_1, y_2] = [2, 3],\quad x \in \mathbb{R}^{3\times 1},\quad y \in \mathbb{Z}_+^{2\times 1} \subset \mathbb{R}^{2\times 1}$   (1.3)

$A := \begin{bmatrix} 1 & 1 & 1 \\ 1 & 2 & 4 \end{bmatrix} \in \mathbb{Z}_+^{2\times 3} \subset \mathbb{R}^{2\times 3},\quad r = \mathrm{rk}\,A = \dim\mathbb{Y} = n = 2$   (1.4)

The matrix $A \in \mathbb{R}^{n\times m}$ is an element of $\mathbb{R}^{n\times m}$, the set of $n \times m$ arrays of real numbers. $\dim\mathbb{X} = m$ defines the number of unknowns (here $m = 3$) and $\dim\mathbb{Y} = n$ defines the number of observations (here $n = 2$). A mapping $f$ is called linear if $f(x_1 + x_2) = f(x_1) + f(x_2)$ and $f(\lambda x) = \lambda f(x)$ hold. Beside the range $\mathcal{R}(f)$, the range space $\mathcal{R}(A)$, the linear mapping is characterized by the kernel $\mathcal{N}(f) := \{x \in \mathbb{R}^m \mid f(x) = 0\}$, the null space $\mathcal{N}(A) := \{x \in \mathbb{R}^m \mid Ax = 0\}$, to be specified later on. Why is the front page consistent system of linear equations considered here called underdetermined? Just observe that we are left with only two linear equations for the three unknowns $[x_1, x_2, x_3]$. Indeed, the system of inhomogeneous linear equations is underdetermined. Without any additional postulate, we shall be unable to invert those equations for $[x_1, x_2, x_3]$. In particular, we shall outline how to find such an additional postulate.
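As a numerical aside (an illustrative Python/numpy sketch, not part of the original argument), the rank situation just described can be verified directly: $\mathrm{rk}\,A = 2 = n < m = 3$, so the system is consistent but underdetermined.

    import numpy as np

    A = np.array([[1., 1., 1.],
                  [1., 2., 4.]])
    y = np.array([2., 3.])

    n, m = A.shape                          # n = 2 observations, m = 3 unknowns
    print(np.linalg.matrix_rank(A))         # 2 = n: A is surjective, the system is consistent
    print(np.linalg.matrix_rank(np.column_stack([A, y])))  # rk[A, y] = 2 = rk A, so y lies in R(A)
    print(m - np.linalg.matrix_rank(A))     # right complementary index d = 1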

Beforehand, we have to introduce some special notions that are known from operator theory. Within matrix algebra, the index of the linear operator $A$ is the rank $r = \mathrm{rk}\,A$, here $r = 2$, which coincides with the dimension of the observation space, here $n = \dim\mathbb{Y} = 2$. A system of linear equations is called consistent if $\mathrm{rk}\,A = \dim\mathbb{Y}$. Alternatively, we say that the mapping $f: x \mapsto y = f(x) \in \mathcal{R}(f)$ or $A: x \mapsto Ax = y \in \mathcal{R}(A)$ takes an element $x \in \mathbb{X}$ into the range $\mathcal{R}(f)$ or $\mathcal{R}(A)$, also called the column space of the matrix $A$: compare with (1.5). Here, the column space is spanned by the first column $c_1$ and the second column $c_2$ of the matrix $A$, the $2 \times 3$ array: compare with (1.6). The right complementary index of the linear operator $A \in \mathbb{R}^{n\times m}$ accounts for the injectivity defect, which is given by $d = m - \mathrm{rk}\,A$ (here $d = m - \mathrm{rk}\,A = 1$). Injectivity relates to the kernel $\mathcal{N}(f)$ or the null space we


shall constructively introduce later on.

$f: x \mapsto y = f(x),\ f(x) \in \mathcal{R}(f);\quad A: x \mapsto Ax = y,\ y \in \mathcal{R}(A),$   (1.5)

$\mathcal{R}(A) = \mathrm{span}\left\{\begin{bmatrix} 1 \\ 1 \end{bmatrix}, \begin{bmatrix} 1 \\ 2 \end{bmatrix}\right\}.$   (1.6)

How can such a linear model of interest, namely a system of consistent linear equations, be generated? Let us assume that we have observed a dynamical system $y(t)$ which is represented by a polynomial of degree two with respect to time $t \in \mathbb{R}$, namely $y(t) = x_1 + x_2 t + x_3 t^2$. Due to $\ddot{y}(t) = 2x_3$, it is a dynamical system with constant acceleration or constant second derivative with respect to time $t$. The unknown polynomial coefficients are collected in the column array $x = [x_1, x_2, x_3]'$, $x \in \mathbb{X} = \mathbb{R}^3$, $\dim\mathbb{X} = 3$. They constitute the coordinates of the three-dimensional parameter space $\mathbb{X}$. If the dynamical system $y(t)$ is observed at two instants, say $y(t_1) = y_1 = 2$ and $y(t_2) = y_2 = 3$ at $t_1 = 1$ and $t_2 = 2$, respectively, and if we collect the observations in the column array $y = [y_1, y_2]' = [2, 3]'$, $y \in \mathbb{Y} = \mathbb{R}^2$, $\dim\mathbb{Y} = 2$, they constitute the coordinates of the two-dimensional observation space $\mathbb{Y}$. Thus, we are left with the special linear model interpreted in Box 1.2.

Box 1.2. (A special linear model: polynomial of degree two, two observations, and three unknowns).

$y = \begin{bmatrix} y_1 \\ y_2 \end{bmatrix} = \begin{bmatrix} 1 & t_1 & t_1^2 \\ 1 & t_2 & t_2^2 \end{bmatrix}\begin{bmatrix} x_1 \\ x_2 \\ x_3 \end{bmatrix}$   (1.7)

$\Leftrightarrow$

$t_1 = 1,\ y_1 = 2;\ t_2 = 2,\ y_2 = 3:\quad \begin{bmatrix} 2 \\ 3 \end{bmatrix} = \begin{bmatrix} 1 & 1 & 1 \\ 1 & 2 & 4 \end{bmatrix}\begin{bmatrix} x_1 \\ x_2 \\ x_3 \end{bmatrix}$   (1.8)

$y = Ax,\quad r = \mathrm{rk}\,A = \dim\mathbb{Y} = n = 2$   (1.9)

In the next section, let us begin with a more detailed analysis of the linear mapping $f: Ax = y$, namely of the linear operator $A \in \mathbb{R}^{n\times m}$, $r = \mathrm{rk}\,A = \dim\mathbb{Y} = n$. We shall pay special attention to the three fundamental partitionings, namely (i) algebraic partitioning, called rank partitioning of the matrix $A$, (ii) geometric partitioning, called slicing of the linear space $\mathbb{X}$, and (iii) set-theoretical partitioning, called fibering of the domain $\mathcal{D}(f)$.

1-13 The Front Page Example: MINOS, Horizontal Rank Partitioning

Let us go back to the front page consistent system of linear equations, namely the problem of determining three unknown polynomial coefficients from two sampling points, which we classified as underdetermined. Nevertheless, we are able to compute a unique solution of the underdetermined system of inhomogeneous linear equations $Ax = y$, $y \in \mathcal{R}(A)$ or $\mathrm{rk}\,A = \dim\mathbb{Y}$, here $A \in \mathbb{R}^{2\times 3}$, $x \in \mathbb{R}^{3\times 1}$, $y \in \mathbb{R}^{2\times 1}$, if we determine the coordinates of the unknown vector $x$ of minimum norm (minimal Euclidean length, $\ell^2$-norm), here $\|x\|_I^2 = x'x = x_1^2 + x_2^2 + x_3^2 = \min$. Box 1.3 outlines the solution of the related optimization problem.

Box 1.3. (MINOS (minimum norm solution) of the front page consistent system of inhomogeneous linear equations, horizontal rank partitioning).

The solution of the optimization problem $\{\|x\|_I^2 = \min_x \mid Ax = y,\ \mathrm{rk}\,A = \dim\mathbb{Y}\}$ is directly based upon the horizontal rank partitioning of the linear mapping $f: x \mapsto y = Ax$, $\mathrm{rk}\,A = \dim\mathbb{Y}$, which we have already introduced. As soon as $x_1 = -A_1^{-1}A_2x_2 + A_1^{-1}y$ is implemented in the norm $\|x\|_I^2$, we are prepared to compute the first derivatives of the unconstrained Lagrangean (1.10), which constitute the necessary conditions. (The theory of vector derivatives is presented in Appendix B.) Following Appendix A, which is devoted to matrix algebra, namely applying $(I + AB)^{-1}A = A(I + BA)^{-1}$ and $(BA)^{-1} = A^{-1}B^{-1}$ for appropriate dimensions of the involved matrices such that the identities (1.12) hold, we finally derive (1.13).

$\mathcal{L}(x_2) = \|x\|_I^2 = x_1^2 + x_2^2 + x_3^2 = (y - A_2x_2)'(A_1A_1')^{-1}(y - A_2x_2) + x_2'x_2$
$\qquad = y'(A_1A_1')^{-1}y - 2x_2'A_2'(A_1A_1')^{-1}y + x_2'A_2'(A_1A_1')^{-1}A_2x_2 + x_2'x_2 = \min_{x_2}$

$\dfrac{\partial \mathcal{L}}{\partial x_2}(x_{2m}) = 0$   (1.10)

$\Leftrightarrow\ -A_2'(A_1A_1')^{-1}y + [A_2'(A_1A_1')^{-1}A_2 + I]\,x_{2m} = 0$
$\Leftrightarrow\ x_{2m} = [A_2'(A_1A_1')^{-1}A_2 + I]^{-1}A_2'(A_1A_1')^{-1}y,$   (1.11)

$x_{2m} = [A_2'(A_1A_1')^{-1}A_2 + I]^{-1}A_2'(A_1A_1')^{-1}y = A_2'[(A_1A_1')^{-1}A_2A_2' + I]^{-1}(A_1A_1')^{-1}y,$   (1.12)

$x_{2m} = A_2'(A_1A_1' + A_2A_2')^{-1}y.$   (1.13)

The second derivatives, given by (1.14), generate, due to the positive-definiteness of the matrix $A_2'(A_1A_1')^{-1}A_2 + I$, the sufficiency condition for obtaining the minimum of the unconstrained Lagrangean. Carrying out the backward transform $x_{2m} \mapsto x_{1m} = -A_1^{-1}A_2x_{2m} + A_1^{-1}y$, we obtain (1.15).

$\dfrac{\partial^2 \mathcal{L}}{\partial x_2 \partial x_2'}(x_{2m}) = 2\,[A_2'(A_1A_1')^{-1}A_2 + I] > 0,$   (1.14)

$x_{1m} = -A_1^{-1}A_2A_2'(A_1A_1' + A_2A_2')^{-1}y + A_1^{-1}y.$   (1.15)

Let us additionally right multiply the identity $A_1A_1' = -A_2A_2' + (A_1A_1' + A_2A_2')$ by $(A_1A_1' + A_2A_2')^{-1}$, such that (1.16) holds, and left multiply by $A_1^{-1}$, such that (1.17) holds. Obviously, we have generated the linear forms (1.18)-(1.20).

$A_1A_1'(A_1A_1' + A_2A_2')^{-1} = -A_2A_2'(A_1A_1' + A_2A_2')^{-1} + I,$   (1.16)

$A_1'(A_1A_1' + A_2A_2')^{-1} = -A_1^{-1}A_2A_2'(A_1A_1' + A_2A_2')^{-1} + A_1^{-1},$   (1.17)

$x_{1m} = A_1'(A_1A_1' + A_2A_2')^{-1}y,\quad x_{2m} = A_2'(A_1A_1' + A_2A_2')^{-1}y,$   (1.18)

$\begin{bmatrix} x_{1m} \\ x_{2m} \end{bmatrix} = \begin{bmatrix} A_1' \\ A_2' \end{bmatrix}(A_1A_1' + A_2A_2')^{-1}y,$   (1.19)

$x_m = A'(AA')^{-1}y.$   (1.20)

The following relations supply us with the numerical computation with respect to the introductory example.

$A_1A_1' + A_2A_2' = AA' = \begin{bmatrix} 3 & 7 \\ 7 & 21 \end{bmatrix},\quad (A_1A_1' + A_2A_2')^{-1} = \frac{1}{14}\begin{bmatrix} 21 & -7 \\ -7 & 3 \end{bmatrix},$

$A_1'(A_1A_1' + A_2A_2')^{-1} = \frac{1}{14}\begin{bmatrix} 14 & -4 \\ 7 & -1 \end{bmatrix},\quad A_2'(A_1A_1' + A_2A_2')^{-1} = \frac{1}{14}\,[-7,\ 5],$   (1.21)

$x_{1m} = \begin{bmatrix} 8/7 \\ 11/14 \end{bmatrix},\quad x_{2m} = \frac{1}{14},\quad \|x_m\|_I = \frac{3\sqrt{42}}{14},\quad \dfrac{\partial^2 \mathcal{L}}{\partial x_2 \partial x_2'}(x_{2m}) = 28 > 0,$   (1.22)

$y(t) = \frac{8}{7} + \frac{11}{14}\,t + \frac{1}{14}\,t^2.$   (1.23)
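The hand computation (1.21)-(1.23) can be reproduced numerically. The following Python sketch (numpy; an illustrative check, not part of the original text) evaluates $x_m = A'(AA')^{-1}y$ of (1.20) and compares it with the minimum norm solution returned by a standard least squares routine.

    import numpy as np

    A = np.array([[1., 1., 1.], [1., 2., 4.]])
    y = np.array([2., 3.])

    # MINOS via the right inverse A'(AA')^{-1}, cf. (1.20)
    x_m = A.T @ np.linalg.solve(A @ A.T, y)
    print(x_m)                                    # [8/7, 11/14, 1/14]
    print(np.linalg.norm(x_m), 3*np.sqrt(42)/14)  # both 1.3887..., cf. (1.22)

    # numpy's lstsq returns the same minimum norm solution for an underdetermined system
    print(np.allclose(x_m, np.linalg.lstsq(A, y, rcond=None)[0]))  # True

    # the fitted polynomial (1.23) reproduces both observations
    for t, y_obs in [(1., 2.), (2., 3.)]:
        print(x_m[0] + x_m[1]*t + x_m[2]*t**2, y_obs)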

In the next section, let us go into a detailed analysis of $\mathcal{R}(f)$, $\mathcal{N}(f)$, and $\mathcal{N}^\perp(f)$ with respect to the front page example. Actually, how can we identify the range space $\mathcal{R}(A)$, the null space $\mathcal{N}(A)$, and its orthogonal complement $\mathcal{N}^\perp(A)$?

1-14 The Range $\mathcal{R}(f)$ and the Kernel $\mathcal{N}(f)$

The range space $\mathcal{R}(A)$ is conveniently described by the first column $c_1 = [1, 1]'$ and the second column $c_2 = [1, 2]'$ of the matrix $A$, namely as a 2-leg with respect to the orthonormal base vectors $e_1$ and $e_2$ attached to the origin $O$. Symbolically, we write the 2-leg in the form (1.24) and $\mathcal{R}(A)$ in the form (1.25). As a linear space, $\mathcal{R}(A) \ni y$ is illustrated by Fig. 1.2.

$\{e_1 + e_2,\ e_1 + 2e_2 \mid O\}$ or $\{e_{c_1}, e_{c_2} \mid O\};$   (1.24)

$\mathcal{R}(A) = \mathrm{span}\,[e_1 + e_2,\ e_1 + 2e_2 \mid O].$   (1.25)

By means of Box 1.4, we identify $\mathcal{N}(f)$, namely the null space $\mathcal{N}(A)$, and give its illustration by Fig. 1.3. Such a result paves the way to the diagnostic algorithm for solving an underdetermined system of linear equations by means of rank partitioning, presented in Box 1.5.

Box 1.4. (General solution of a system of homogeneous linear equations $Ax = 0$, horizontal rank partitioning).

Fig. 1.2 Range $\mathcal{R}(f)$, range space $\mathcal{R}(A)$, $y \in \mathcal{R}(A)$


The matrix $A$ is called horizontally rank partitioned if (1.26) holds. (In the introductory example, $A \in \mathbb{R}^{2\times 3}$, $A_1 \in \mathbb{R}^{2\times 2}$, $A_2 \in \mathbb{R}^{2\times 1}$, and $d(A) = 1$ applies.) A consistent system of linear equations $Ax = y$, $\mathrm{rk}\,A = \dim\mathbb{Y}$, is horizontally rank partitioned if (1.27) applies for a partitioned unknown vector (1.28).

$A \in \mathbb{R}^{n\times m} \wedge A = [A_1, A_2],\quad \begin{cases} r = \mathrm{rk}\,A = \mathrm{rk}\,A_1 = n, \\ A_1 \in \mathbb{R}^{n\times r},\ A_2 \in \mathbb{R}^{n\times d}, \\ d = d(A) = m - \mathrm{rk}\,A, \end{cases}$   (1.26)

$Ax = y,\ \mathrm{rk}\,A = \dim\mathbb{Y} \quad\Leftrightarrow\quad A_1x_1 + A_2x_2 = y,$   (1.27)

$\left\{ x \in \mathbb{R}^m \wedge x = \begin{bmatrix} x_1 \\ x_2 \end{bmatrix} \ \Big|\ x_1 \in \mathbb{R}^{r\times 1},\ x_2 \in \mathbb{R}^{d\times 1} \right\}.$   (1.28)

The horizontal rank partitioning of the matrix $A$ as well as the horizontally rank partitioned consistent system of linear equations $Ax = y$, $\mathrm{rk}\,A = \dim\mathbb{Y}$, of the introductory example read

$A = \begin{bmatrix} 1 & 1 & 1 \\ 1 & 2 & 4 \end{bmatrix},\quad A_1 = \begin{bmatrix} 1 & 1 \\ 1 & 2 \end{bmatrix},\quad A_2 = \begin{bmatrix} 1 \\ 4 \end{bmatrix},$   (1.29)

$Ax = y,\ \mathrm{rk}\,A = \dim\mathbb{Y} \quad\Leftrightarrow\quad A_1x_1 + A_2x_2 = y,$
$x_1 = [x_1, x_2]' \in \mathbb{R}^{r\times 1},\quad x_2 = [x_3] \in \mathbb{R},$
$\begin{bmatrix} 1 & 1 \\ 1 & 2 \end{bmatrix}\begin{bmatrix} x_1 \\ x_2 \end{bmatrix} + \begin{bmatrix} 1 \\ 4 \end{bmatrix}x_3 = y.$   (1.30)

By means of the horizontal rank partitioning of the system of homogeneous linear equations, an identification of the null space (1.31) is provided by (1.32). In particular, in the introductory example, we are led to (1.33) and (1.34).

$\mathcal{N}(A) = \{x \in \mathbb{R}^m \mid Ax = A_1x_1 + A_2x_2 = 0\},$   (1.31)

$A_1x_1 + A_2x_2 = 0 \quad\Leftrightarrow\quad x_1 = -A_1^{-1}A_2x_2,$   (1.32)

$\begin{bmatrix} x_1 \\ x_2 \end{bmatrix} = -\begin{bmatrix} 2 & -1 \\ -1 & 1 \end{bmatrix}\begin{bmatrix} 1 \\ 4 \end{bmatrix}x_3,$   (1.33)

$x_1 = 2x_3 = 2\tau,\quad x_2 = -3x_3 = -3\tau,\quad x_3 = \tau.$   (1.34)

Here, the two equations $Ax = 0$ for $x \in \mathbb{X} = \mathbb{R}^3$ constitute the linear space $\mathcal{N}(A)$, $\dim\mathcal{N}(A) = 1$, a one-dimensional subspace of $\mathbb{X} = \mathbb{R}^3$. For instance, if we introduce the parameter $x_3 = \tau$, the other coordinates of the parameter space amount to $x_1 = 2x_3 = 2\tau$, $x_2 = -3x_3 = -3\tau$. In geometric language, the linear space $\mathcal{N}(A)$ is a parameterized straight line $\mathbb{L}^1_0$ through the origin $O$, illustrated by Fig. 1.3. The parameter space $\mathbb{X} = \mathbb{R}^m$ (here $m = 3$) is sliced by the subspace, the linear space $\mathcal{N}(A)$, also called a linear manifold, with $\dim\mathcal{N}(A) = d(A) = d$ (here a straight line $\mathbb{L}^1_0$ through the origin $O$).

Fig. 1.3 The parameter space $\mathbb{X} = \mathbb{R}^3$ ($x_3$ is not displayed) sliced by the null space, the linear manifold $\mathcal{N}(A) = \mathbb{L}^1_0$
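As a numerical cross-check of (1.32)-(1.34) (a Python/numpy sketch, illustrative only), the null space direction obtained from the rank partitioning coincides, up to scale, with the one delivered by a singular value decomposition:

    import numpy as np

    A  = np.array([[1., 1., 1.], [1., 2., 4.]])
    A1 = A[:, :2]                        # the regular r x r block of the rank partitioning
    A2 = A[:, 2:]                        # the remaining n x d block

    # x1 = -A1^{-1} A2 x2 with x2 = tau = 1 yields the null space direction, cf. (1.32)
    direction = np.concatenate([-np.linalg.solve(A1, A2).ravel(), [1.]])
    print(direction)                     # [ 2. -3.  1.]
    print(A @ direction)                 # [0. 0.]: indeed an element of N(A)

    # SVD cross-check: the right singular vector of the zero singular value spans N(A)
    _, _, Vt = np.linalg.svd(A)
    print(Vt[-1] / Vt[-1, -1])           # rescaled: [ 2. -3.  1.]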

In the following section, let us consider the interpretation of MINOS by the three fundamental partitionings: algebraic (rank partitioning), geometric (slicing), and set-theoretical (fibering).

    1-15 The Interpretation of MINOS

The diagnostic algorithm for solving an underdetermined system of linear equations $Ax = y$, $\mathrm{rk}\,A = \dim\mathbb{Y} = n$, $n < m = \dim\mathbb{X}$, $y \in \mathcal{R}(A)$, by means of rank partitioning is presented in Box 1.5. While we have characterized the general solution of the system of homogeneous linear equations $Ax = 0$, we are left with the problem of solving the consistent system of inhomogeneous linear equations. Again, we take advantage of the rank partitioning of the matrix $A$, summarized in Box 1.6.

Box 1.5. (A diagnostic algorithm for solving an underdetermined system of linear equations by means of rank partitioning).

A diagnostic algorithm for solving an underdetermined system of linear equations $Ax = y$, $\mathrm{rk}\,A = \dim\mathbb{Y}$, $y \in \mathcal{R}(A)$, by means of rank partitioning is immediately introduced by the following chain of statements.

Determine the rank of the matrix $A$: $\mathrm{rk}\,A = \dim\mathbb{Y} = n$.

$\Downarrow$

Compute the horizontal rank partitioning: $A = [A_1, A_2]$, $A_1 \in \mathbb{R}^{r\times r} = \mathbb{R}^{n\times n}$, $A_2 \in \mathbb{R}^{n\times(m-r)} = \mathbb{R}^{n\times(m-n)}$; $m - r = m - n$ is called the right complementary index. $A$ as a linear operator is not injective, but surjective.

$\Downarrow$

Compute the null space $\mathcal{N}(A)$: $\mathcal{N}(A) = \{x \in \mathbb{R}^m \mid Ax = 0\} = \{x \in \mathbb{R}^m \mid x_1 + A_1^{-1}A_2x_2 = 0\}$.

$\Downarrow$

Compute the unknown parameter vector of type MINOS: $x_m = A'(AA')^{-1}y$.

Box 1.6. (A special solution of a consistent system of inhomogeneous linear equations $Ax = y$, horizontal rank partitioning).

$Ax = y,\ \begin{cases} \mathrm{rk}\,A = \dim\mathbb{Y} \\ y \in \mathcal{R}(A) \end{cases} \quad\Leftrightarrow\quad A_1x_1 + A_2x_2 = y.$   (1.35)

Since the matrix $A_1$ is of full rank, it can be regularly inverted (Cayley inverse). In particular, we solve for $x_1$:

$x_1 = -A_1^{-1}A_2x_2 + A_1^{-1}y$   (1.36)

or

$\begin{bmatrix} x_1 \\ x_2 \end{bmatrix} = -\begin{bmatrix} 2 & -1 \\ -1 & 1 \end{bmatrix}\begin{bmatrix} 1 \\ 4 \end{bmatrix}x_3 + \begin{bmatrix} 2 & -1 \\ -1 & 1 \end{bmatrix}\begin{bmatrix} y_1 \\ y_2 \end{bmatrix},$   (1.37)

$x_1 = 2x_3 + 2y_1 - y_2,\quad x_2 = -3x_3 - y_1 + y_2.$   (1.38)

For instance, if we introduce the parameter $x_3 = \tau$, the other coordinates of the parameter space amount to (1.39). In geometric language, the admissible parameter space is a family of one-dimensional linear spaces, a family of one-dimensional parallel straight lines dependent on $y = [y_1, y_2]'$, here $[2, 3]'$, in particular the family (1.40), including the null space $\mathbb{L}^1_{0,0} = \mathcal{N}(A)$.

$x_1 = 2\tau + 2y_1 - y_2,\quad x_2 = -3\tau - y_1 + y_2,$   (1.39)

$\mathbb{L}^1_{y_1,y_2} = \{x \in \mathbb{R}^3 \mid x_1 = 2x_3 + 2y_1 - y_2,\ x_2 = -3x_3 - y_1 + y_2\}.$   (1.40)
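The one-parameter family (1.39)-(1.40) is easily verified numerically; in the short Python sketch below (numpy, illustrative), every member of the family solves $Ax = y$:

    import numpy as np

    A = np.array([[1., 1., 1.], [1., 2., 4.]])
    y1, y2 = 2., 3.

    # the family L^1_{(y1, y2)} of (1.39), parameterized by the free coordinate tau = x3
    for tau in (-1.0, 0.0, 0.5, 2.0):
        x = np.array([2*tau + 2*y1 - y2, -3*tau - y1 + y2, tau])
        print(tau, A @ x)               # always [2. 3.]: every member solves Ax = y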

Figure 1.4 illustrates (i) the admissible parameter space $\mathbb{L}^1_{(y_1,y_2)}$, (ii) the line $\mathbb{L}^1_\perp$ which is orthogonal to the null space, called $\mathcal{N}^\perp(A)$, and (iii) the intersection $\mathbb{L}^1_{(y_1,y_2)} \cap \mathcal{N}^\perp(A)$, generating the solution point $x_m$, as will be proven now.

Fig. 1.4 The range space $\mathcal{R}(A^-)$ (the admissible parameter space), parallel straight lines $\mathbb{L}^1_{(y_1,y_2)}$, namely $\mathbb{L}^1_{(2,3)}$

Fig. 1.5 Orthogonal projection of an element of $\mathcal{N}(A)$ onto the range space $\mathcal{R}(A^-)$

The geometric interpretation of the minimum norm solution is the following. With reference to Fig. 1.5, we decompose the vector $x = x_{\mathcal{N}(A)} + x_{\mathcal{N}^\perp(A)}$, where $x_{\mathcal{N}(A)}$ is an element of the null space $\mathcal{N}(A)$ (here: the straight line $\mathbb{L}^1_{0,0}$), while $x_{\mathcal{N}^\perp(A)} = x_m$ is an element of the range space $\mathcal{R}(A^-)$ (here: the above outlined straight line $\mathbb{L}^1_{y_1,y_2}$, namely $\mathbb{L}^1_{2,3}$) of the generalized inverse matrix $A^-$ of type minimum norm solution, which in this book mostly is abbreviated as "MINOS". The norm $\|x\|_I^2 = \|x_{\mathcal{N}(A)} + x_{\mathcal{N}^\perp(A)}\|^2 = \|x_{\mathcal{N}(A)}\|_I^2 + 2\langle x_{\mathcal{N}(A)} \mid x_{\mathcal{N}^\perp(A)}\rangle + \|x_{\mathcal{N}^\perp(A)}\|_I^2$ is minimal if and only if the inner product $\langle x_{\mathcal{N}(A)} \mid x_{\mathcal{N}^\perp(A)}\rangle = 0$ vanishes, that is, if $x_{\mathcal{N}(A)}$ and $x_{\mathcal{N}^\perp(A)} = x_m$ are orthogonal. The solution point $x_m$ is the orthogonal projection of $x$ onto $\mathcal{R}(A^-)$: $P_{\mathcal{R}(A^-)}x = A^-Ax = A^-y$ for all $x \in \mathcal{D}(A)$. Alternatively, since the vector $x_m$ of minimal length is orthogonal to the null space $\mathcal{N}(A)$, being an element of $\mathcal{N}^\perp(A)$ (here: the line $\mathbb{L}^1_\perp$), we may say that $\mathcal{N}^\perp(A)$ intersects $\mathcal{R}(A^-)$ in the solution point $x_m$. Or we may say that the normal space $N\mathbb{L}^1_0$ with respect to the tangent space $T\mathbb{L}^1_0$ (which in linear models is identical to $\mathbb{L}^1_0$, the null space $\mathcal{N}(A)$) intersects the range space $\mathcal{R}(A^-)$ in the solution point $x_m$. In summary, $x_m \in \mathcal{N}^\perp(A) \cap \mathcal{R}(A^-)$.
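The orthogonality argument can be checked in a few lines. In the Python sketch below (numpy, illustrative), the MINOS point $x_m$ is perpendicular to the null space direction, and moving along the fibre only increases the norm:

    import numpy as np

    A = np.array([[1., 1., 1.], [1., 2., 4.]])
    y = np.array([2., 3.])

    x_m  = A.T @ np.linalg.solve(A @ A.T, y)   # the MINOS point, cf. (1.20)
    null = np.array([2., -3., 1.])             # direction of N(A), cf. (1.34)

    print(np.dot(x_m, null))                   # 0 (up to rounding): x_m lies in N^perp(A)
    for tau in (0.0, 0.5, 1.0):                # any null space component increases the norm
        print(tau, np.linalg.norm(x_m + tau * null))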

Let the algebraic partitioning and the geometric partitioning be merged to interpret the minimum norm solution of the consistent system of linear equations of type underdetermined MINOS. As a summary of such a merger, we refer to Box 1.7.

Box 1.7. (General solution of a consistent system of inhomogeneous linear equations $Ax = y$).

Consistent system of equations $f: x \mapsto y = Ax$, $x \in \mathbb{X} = \mathbb{R}^m$ (parameter space), $y \in \mathbb{Y} = \mathbb{R}^n$ (observation space), $r = \mathrm{rk}\,A = \dim\mathbb{Y}$; $A^-$ is a generalized inverse of type MINOS.

Condition #1:
$f(x) = f(g(y)) \ \Leftrightarrow\ f = f \circ g \circ f$; in matrix language: $Ax = AA^-Ax \ \Leftrightarrow\ AA^-A = A$.   (1.41)

Condition #2 (reflexive g-inverse):
$x = g(y) = g(f(x))$; in matrix language: $x^- = A^-y = A^-AA^-y \ \Leftrightarrow\ A^-AA^- = A^-$.   (1.42)

Condition #3:
$g(f(x)) = x_{\mathcal{R}(A^-)} \ \Leftrightarrow\ g \circ f = \mathrm{Proj}_{\mathcal{R}(A^-)}$; in matrix language: $A^-Ax = x_{\mathcal{R}(A^-)} \ \Leftrightarrow\ A^-A = \mathrm{Proj}_{\mathcal{R}(A^-)}$.   (1.43)

Regarding the above first condition $AA^-A = A$, let us here depart from MINOS of $y = Ax$, $x \in \mathbb{X} = \mathbb{R}^m$, $y \in \mathbb{Y} = \mathbb{R}^n$, $r = \mathrm{rk}\,A = n$, i.e., $x_m = A_m^-y = A'(AA')^{-1}y$: see (1.44). Concerning the above second condition $A^-AA^- = A^-$: see (1.45). Concerning the above third condition $A^-A = \mathrm{Proj}_{\mathcal{R}(A^-)}$: see (1.46).

$Ax_m = AA_m^-y = AA_m^-Ax_m \ \Longrightarrow\ Ax_m = AA_m^-Ax_m \ \Leftrightarrow\ AA^-A = A,$   (1.44)

$x_m = A'(AA')^{-1}y = A_m^-y = A_m^-Ax_m \ \Longrightarrow\ A_m^-y = A_m^-AA_m^-y \ \Leftrightarrow\ A^-AA^- = A^-,$   (1.45)

$x_m = A_m^-y = A_m^-Ax_m \ \Leftrightarrow\ A_m^-A = \mathrm{Proj}_{\mathcal{R}(A^-)}.$   (1.46)

How are $\mathrm{rk}\,A_m^- = \mathrm{rk}\,A$ and $A_m^-A$ interpreted? On the interpretation of $\mathrm{rk}\,A_m^- = \mathrm{rk}\,A$: the g-inverse of type MINOS is a generalized inverse of minimal rank, since in general $\mathrm{rk}\,A^- \geq \mathrm{rk}\,A$ holds. On the interpretation of $A_m^-A$: $A_m^-A$ is an orthogonal projection onto $\mathcal{R}(A^-)$, while $I - A_m^-A$ projects onto its orthogonal complement $\mathcal{R}^\perp(A^-)$.
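For the front page example, the three conditions are easy to confirm numerically. A minimal Python check (numpy, illustrative), with $A_m^- = A'(AA')^{-1}$:

    import numpy as np

    A  = np.array([[1., 1., 1.], [1., 2., 4.]])
    Am = A.T @ np.linalg.inv(A @ A.T)        # g-inverse of type MINOS, A'(AA')^{-1}

    print(np.allclose(A @ Am @ A, A))        # condition #1: A A^- A = A
    print(np.allclose(Am @ A @ Am, Am))      # condition #2: A^- A A^- = A^- (reflexivity)
    P = Am @ A                               # condition #3: A^- A is the projector onto R(A^-)
    print(np.allclose(P @ P, P), np.allclose(P, P.T))  # idempotent and symmetric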


Fig. 1.6 Venn diagram, trivial fibering of the domain $\mathcal{D}(f)$, trivial fibers $\mathcal{N}(f)$ and $\mathcal{N}^\perp(f)$, $f: \mathbb{R}^m = \mathbb{X} \longrightarrow \mathbb{Y} = \mathbb{R}^n$, $\mathbb{Y} = \mathcal{R}(f)$; $\mathbb{X}$ is the set system of the parameter space, $\mathbb{Y}$ is the set system of the observation space

If the linear mapping $f: x \mapsto y = f(x)$, $y \in \mathcal{R}(f)$, is given, we are aiming at a generalized inverse (linear) mapping $g: y \mapsto x = g(y)$ in such a manner that $y = f(x) = f(g(y)) = f(g(f(x)))$ or $f = f \circ g \circ f$ as a first condition is fulfilled. Alternatively, we are going to construct a generalized inverse $A^-: y \mapsto A^-y = x^-$ in such a manner that the first condition $y = Ax^- = AA^-y$ or $A = AA^-A$ holds. Though the linear mapping $f: x \mapsto y = f(x)$, $y \in \mathcal{R}(f)$, or the system of linear equations $Ax = y$, $\mathrm{rk}\,A = \dim\mathbb{Y}$, is consistent, it suffers from the (injectivity) deficiency of the linear mapping $f(x)$ or of the matrix $A$. Indeed, it recovers from the (injectivity) deficiency if we introduce the projection $x \mapsto g(f(x)) = q \in \mathcal{R}(g)$ or $x \mapsto A^-Ax = q \in \mathcal{R}(A^-)$ as the second condition. The projection matrix $A^-A$ is idempotent, which follows from $P^2 = P$, namely $(A^-A)(A^-A) = A^-(AA^-A) = A^-A$.

Let us finally outline here the set-theoretical partitioning, the fibering of the set system of points which constitute the parameter space $\mathbb{X}$, the domain $\mathcal{D}(f)$. Since the set system $\mathbb{X}$ (the parameter space) is $\mathbb{R}^m$, the fibering is called trivial. Non-trivial fibering is reserved for nonlinear models, in which case we are dealing with a parameter space $\mathbb{X}$ which is a differentiable manifold. Here, the fibering $\mathcal{D}(f) = \mathcal{N}(f) \cup \mathcal{N}^\perp(f)$ produces the trivial fibers $\mathcal{N}(f)$ and $\mathcal{N}^\perp(f)$, where the trivial fiber $\mathcal{N}^\perp(f)$ is the quotient set $\mathbb{R}^m/\mathcal{N}(f)$. By means of a Venn diagram (John Venn 1834-1923), also called Euler circles (Leonhard Euler 1707-1783), Fig. 1.6 illustrates the trivial fibers of the set system $\mathbb{X} = \mathbb{R}^m$ that are generated by $\mathcal{N}(f)$ and $\mathcal{N}^\perp(f)$. Of course, the set system of points which constitute the observation space $\mathbb{Y}$ is not subject to fibering, since all points of the set system $\mathcal{D}(f)$ are mapped into the range $\mathcal{R}(f)$.

    1-2 Minimum Norm Solution (MINOS)

The system of consistent linear equations $Ax = y$ ($A \in \mathbb{R}^{n\times m}$, $\mathrm{rk}\,A = n < m$) allows certain solutions which we introduce by means of Definition 1.1 as the solution of a certain optimization problem. Lemma 1.2 contains the normal equations of the optimization problem. The solution of such a system of normal equations is presented in Lemma 1.3 as the minimum norm solution (MINOS) with respect to the $G_x$-seminorm. Finally, we discuss the metric of the parameter space and alternative choices of its metric before we identify, by Lemma 1.4, the solution of the quadratic optimization problem in terms of the (1, 2, 4)-generalized inverse.


By Definition 1.1, we characterize $G_x$-MINOS of the consistent system of linear equations $Ax = y$ subject to $A \in \mathbb{R}^{n\times m}$, $\mathrm{rk}\,A = n < m$ (algebraic condition) or $y \in \mathcal{R}(A)$ (geometric condition). Loosely speaking, we are confronted with a system of linear equations with more unknowns $m$ than equations $n$, namely $n < m$. $G_x$-MINOS will enable us to find a solution of this underdetermined problem. By means of Lemma 1.2, we shall write down the "normal equations" of $G_x$-MINOS.

Definition 1.1. (The minimum norm solution (MINOS) with respect to the $G_x$-seminorm).

A vector $x_m$ is called $G_x$-MINOS (minimum norm solution with respect to the $G_x$-seminorm) of the consistent system of linear equations (1.47) if both $Ax_m = y$ and, in comparison to all other solution vectors $x \in \mathbb{X} \subseteq \mathbb{R}^m$, the inequality $\|x_m\|_{G_x}^2 := x_m'G_xx_m \leq x'G_xx =: \|x\|_{G_x}^2$ holds. The system of inverse linear equations $A^-y + i = x$, $\mathrm{rk}\,A^- \neq m$ or $x \notin \mathcal{R}(A^-)$, is inconsistent.

$Ax = y,\ y \in \mathbb{Y} \subseteq \mathbb{R}^n,\quad \begin{cases} \mathrm{rk}\,A = \mathrm{rk}[A, y] = n < m, \\ A \in \mathbb{R}^{n\times m}, \\ y \in \mathcal{R}(A). \end{cases}$   (1.47)

Lemma 1.2. (The minimum norm solution (MINOS) with respect to the $G_x$-seminorm).

A vector $x_m \in \mathbb{X} \subseteq \mathbb{R}^m$ is $G_x$-MINOS of (1.47) if and only if the system of normal equations (1.48) with the vector $\lambda \in \mathbb{R}^{n\times 1}$ of Lagrange multipliers is fulfilled. $x_m$ always exists and is unique in particular if $\mathrm{rk}[G_x, A'] = m$ holds or, equivalently, if the matrix $G_x + A'A$ is regular.

$\begin{bmatrix} G_x & A' \\ A & 0 \end{bmatrix}\begin{bmatrix} x_m \\ \lambda_m \end{bmatrix} = \begin{bmatrix} 0 \\ y \end{bmatrix}$   (1.48)

Proof. $G_x$-MINOS is based on the constrained Lagrangean (1.49) such that the first derivatives (1.50) constitute the necessary conditions. The second derivatives (1.51) generate, according to the positive semi-definiteness of the matrix $G_x$, the sufficiency condition for obtaining the minimum of the constrained Lagrangean. According to the assumption $\mathrm{rk}\,A = \mathrm{rk}[A, y] = n$ or $y \in \mathcal{R}(A)$, the existence of $G_x$-MINOS $x_m$ is guaranteed. In order to prove uniqueness of $G_x$-MINOS $x_m$, we have to consider case (i) ($G_x$ positive-definite) and case (ii) ($G_x$ positive semi-definite).

$\mathcal{L}(x, \lambda) = x'G_xx + 2\lambda'(Ax - y) = \min_{x,\lambda}$   (1.49)


$\frac{1}{2}\dfrac{\partial \mathcal{L}}{\partial x}(x_m, \lambda_m) = G_xx_m + A'\lambda_m = 0$ and $\frac{1}{2}\dfrac{\partial \mathcal{L}}{\partial \lambda}(x_m, \lambda_m) = Ax_m - y = 0$

$\Leftrightarrow\quad \begin{bmatrix} G_x & A' \\ A & 0 \end{bmatrix}\begin{bmatrix} x_m \\ \lambda_m \end{bmatrix} = \begin{bmatrix} 0 \\ y \end{bmatrix},$   (1.50)

$\frac{1}{2}\dfrac{\partial^2 \mathcal{L}}{\partial x \partial x'}(x_m, \lambda_m) = G_x \geq 0.$   (1.51)

Case (i) ($G_x$ positive-definite): Due to $\mathrm{rk}\,G_x = m$ and $|G_x| \neq 0$, the partitioned system of normal equations (1.52) is uniquely solved. The theory of inverse partitioned matrices (IPM) is presented in Appendix A.

$|G_x| \neq 0,\quad \begin{bmatrix} G_x & A' \\ A & 0 \end{bmatrix}\begin{bmatrix} x_m \\ \lambda_m \end{bmatrix} = \begin{bmatrix} 0 \\ y \end{bmatrix}.$   (1.52)

Case (ii) ($G_x$ positive semi-definite): Follow these algorithmic steps. Multiply the second normal equation by $A'$ in order to produce $A'Ax_m - A'y = 0$ or $A'Ax_m = A'y$, and add the result to the first normal equation, which generates $G_xx_m + A'Ax_m + A'\lambda_m = A'y$ and therefore $(G_x + A'A)x_m + A'\lambda_m = A'y$. The augmented first normal equation and the original second normal equation build up the equivalent system of normal equations (1.53), which is uniquely solved due to $\mathrm{rk}[G_x + A'A] = m$, $|G_x + A'A| \neq 0$.

$|G_x + A'A| \neq 0,\quad \begin{bmatrix} G_x + A'A & A' \\ A & 0 \end{bmatrix}\begin{bmatrix} x_m \\ \lambda_m \end{bmatrix} = \begin{bmatrix} A' \\ I_n \end{bmatrix}y.$   (1.53)
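For a genuinely singular $G_x$, the augmented system (1.53) can be assembled and solved directly. In the Python sketch below (numpy; the semi-definite weight matrix is our own illustrative choice), the bordered system delivers the $G_x$-MINOS even though $G_x$ itself is not invertible.

    import numpy as np

    A  = np.array([[1., 1., 1.], [1., 2., 4.]])
    y  = np.array([2., 3.])
    Gx = np.diag([1., 1., 0.])     # positive semi-definite, |Gx| = 0: case (ii)

    n, m = A.shape
    # augmented normal equations (1.53): [[Gx + A'A, A'], [A, 0]] [x_m; lambda_m] = [A'; I_n] y
    K   = np.block([[Gx + A.T @ A, A.T],
                    [A, np.zeros((n, n))]])
    rhs = np.concatenate([A.T @ y, y])
    x_m = np.linalg.solve(K, rhs)[:m]
    print(x_m)          # the Gx-seminorm MINOS, here [15/13, 10/13, 1/13]
    print(A @ x_m)      # [2. 3.]: the observations are reproduced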

The solution of the system of normal equations directly leads to the linear form $x_m = Ly$, which is $G_x$-MINOS of (1.47) and can be represented as follows.

Lemma 1.3. (The minimum norm solution (MINOS) with respect to the $G_x$-seminorm).

$x_m = Ly$ is $G_x$-MINOS of the consistent system of linear equations (1.47), i.e. $Ax = y$ with $\mathrm{rk}\,A = \mathrm{rk}[A, y] = n$ or $y \in \mathcal{R}(A)$, if and only if $L \in \mathbb{R}^{m\times n}$ is represented by the following relations.

Case (i) ($G_x = I_m$):

$L = A_g^- = A'(AA')^{-1}$ (right inverse),   (1.54)

$x_m = A_g^-y = A'(AA')^{-1}y,\quad x = x_m + i_m.$   (1.55)

$x = x_m + i_m$ is an orthogonal decomposition of the unknown vector $x \in \mathbb{X} \subseteq \mathbb{R}^m$ into the $I$-MINOS vector $x_m \in \mathbb{L}^n$ and the $I$-MINOS vector of inconsistency $i_m \in \mathbb{L}^d$ subject to $x_m = A'(AA')^{-1}Ax$ and $i_m = x - x_m = [I_m - A'(AA')^{-1}A]x$. (According to $x_m = A'(AA')^{-1}Ax$, $I$-MINOS has the reproducing property. As projection matrices, the matrix $A'(AA')^{-1}A$ with $\mathrm{rk}\,A'(AA')^{-1}A = \mathrm{rk}\,A = n$ and the matrix $[I_m - A'(AA')^{-1}A]$ with $\mathrm{rk}[I_m - A'(AA')^{-1}A] = m - \mathrm{rk}\,A = d$ are independent.) Their corresponding norms are positive semi-definite, namely $\|x_m\|_{I_m}^2$ and $\|i_m\|_{I_m}^2$ yield $\|x_m\|_{I_m}^2 = y'(AA')^{-1}y = x'A'(AA')^{-1}Ax = x'G_mx$ and $\|i_m\|_{I_m}^2 = x'[I_m - A'(AA')^{-1}A]x$. In particular, note that

$x_m = A'(AA')^{-1}Ax,\quad i_m = x - x_m = [I_m - A'(AA')^{-1}A]x.$   (1.56)

Case (ii) ($G_x$ positive-definite):

$L = G_x^{-1}A'(AG_x^{-1}A')^{-1}$ (weighted right inverse),   (1.57)

$x_m = G_x^{-1}A'(AG_x^{-1}A')^{-1}y,\quad x = x_m + i_m.$   (1.58)

$x = x_m + i_m$ is an orthogonal decomposition of the unknown vector $x \in \mathbb{X} \subseteq \mathbb{R}^m$ into the $G_x$-MINOS vector $x_m \in \mathbb{L}^n$ and the $G_x$-MINOS vector of inconsistency $i_m \in \mathbb{L}^d$ ($x_m = G_x^{-1}A'(AG_x^{-1}A')^{-1}Ax$, $i_m = [I_m - G_x^{-1}A'(AG_x^{-1}A')^{-1}A]x$). (Due to $x_m = G_x^{-1}A'(AG_x^{-1}A')^{-1}Ax$, $G_x$-MINOS has the reproducing property. As projection matrices, the matrix $G_x^{-1}A'(AG_x^{-1}A')^{-1}A$ with $\mathrm{rk}\,G_x^{-1}A'(AG_x^{-1}A')^{-1}A = \mathrm{rk}\,A = n$ and $[I_m - G_x^{-1}A'(AG_x^{-1}A')^{-1}A]$ with $\mathrm{rk}[I_m - G_x^{-1}A'(AG_x^{-1}A')^{-1}A] = m - \mathrm{rk}\,A = d$ are independent.) Their corresponding norms are positive semi-definite, namely $\|x_m\|_{G_x}^2 = y'(AG_x^{-1}A')^{-1}y = x'A'(AG_x^{-1}A')^{-1}Ax = x'G_mx$ and $\|i_m\|_{G_x}^2 = x'[G_x - A'(AG_x^{-1}A')^{-1}A]x$. In particular, note that

$x_m = G_x^{-1}A'(AG_x^{-1}A')^{-1}Ax,\quad i_m = [I_m - G_x^{-1}A'(AG_x^{-1}A')^{-1}A]x.$   (1.59)

Case (iii) ($G_x$ positive semi-definite):

$L = (G_x + A'A)^{-1}A'[A(G_x + A'A)^{-1}A']^{-1},$   (1.60)

$x_m = (G_x + A'A)^{-1}A'[A(G_x + A'A)^{-1}A']^{-1}y,$   (1.61)

$x = x_m + i_m.$   (1.62)

$x = x_m + i_m$ is an orthogonal decomposition of the unknown vector $x \in \mathbb{X} \subseteq \mathbb{R}^m$ into the $(G_x + A'A)$-MINOS vector $x_m \in \mathbb{L}^n$ and the $(G_x + A'A)$-MINOS vector of inconsistency $i_m \in \mathbb{L}^d$ subject to

$x_m = (G_x + A'A)^{-1}A'[A(G_x + A'A)^{-1}A']^{-1}Ax,$   (1.63)

$i_m = (I_m - (G_x + A'A)^{-1}A'[A(G_x + A'A)^{-1}A']^{-1}A)x.$   (1.64)

Due to $x_m = (G_x + A'A)^{-1}A'[A(G_x + A'A)^{-1}A']^{-1}Ax$, $(G_x + A'A)$-MINOS has the reproducing property. As projection matrices, the two matrices (1.65) and (1.66) are independent:

$(G_x + A'A)^{-1}A'[A(G_x + A'A)^{-1}A']^{-1}A,\quad \mathrm{rk}\,(G_x + A'A)^{-1}A'[A(G_x + A'A)^{-1}A']^{-1}A = \mathrm{rk}\,A = n,$   (1.65)

$I_m - (G_x + A'A)^{-1}A'[A(G_x + A'A)^{-1}A']^{-1}A,\quad \mathrm{rk}[I_m - (G_x + A'A)^{-1}A'[A(G_x + A'A)^{-1}A']^{-1}A] = m - \mathrm{rk}\,A = d.$   (1.66)

Their corresponding norms are positive semi-definite, namely the corresponding norms read

$\|x_m\|_{G_x + A'A}^2 = y'[A(G_x + A'A)^{-1}A']^{-1}y = x'A'[A(G_x + A'A)^{-1}A']^{-1}Ax = x'G_mx,$   (1.67)

$\|i_m\|_{G_x + A'A}^2 = x'(G_x + A'A - A'[A(G_x + A'A)^{-1}A']^{-1}A)x.$   (1.68)
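Before turning to the proof, a small numerical illustration of the lemma may help. The Python sketch below (numpy; the weight matrices are our own illustrative choices) evaluates case (i) and case (ii): for $G_x = I_3$ the weighted right inverse reproduces the I-MINOS of Sect. 1-13, while a different positive-definite metric moves the solution point along the fibre of solutions.

    import numpy as np

    A = np.array([[1., 1., 1.], [1., 2., 4.]])
    y = np.array([2., 3.])

    def gx_minos(A, y, Gx):
        """Gx-MINOS via the weighted right inverse (1.57): Gx^{-1} A' (A Gx^{-1} A')^{-1} y."""
        Gi = np.linalg.inv(Gx)
        return Gi @ A.T @ np.linalg.solve(A @ Gi @ A.T, y)

    print(gx_minos(A, y, np.eye(3)))                  # case (i): [8/7, 11/14, 1/14]
    print(gx_minos(A, y, np.diag([1., 4., 9.])))      # case (ii): another point of the fibre
    print(A @ gx_minos(A, y, np.diag([1., 4., 9.])))  # still [2. 3.]: A x_m = y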

Proof. A basis of the proof is C.R. Rao's Pandora Box, the theory of inverse partitioned matrices (Appendix A, Fact: Inverse Partitioned Matrix (IPM) of a symmetric matrix). Due to the rank identity $\mathrm{rk}\,A = \mathrm{rk}\,AG_x^{-1}A' = n < m$, the normal equations of case (i) and case (ii) ($\rightarrow$ (1.69)) can be directly solved by Gauss elimination. For case (iii), we add to the first normal equation the term $A'Ax_m = A'y$. Due to the rank identity $\mathrm{rk}\,A = \mathrm{rk}\,A(G_x + A'A)^{-1}A' = n < m$, the modified normal equations of case (iii) ($\rightarrow$ (1.70)) can be directly solved by Gauss elimination.

$\begin{bmatrix} G_x & A' \\ A & 0 \end{bmatrix}\begin{bmatrix} x_m \\ \lambda_m \end{bmatrix} = \begin{bmatrix} 0 \\ y \end{bmatrix},$   (1.69)

$\begin{bmatrix} G_x + A'A & A' \\ A & 0 \end{bmatrix}\begin{bmatrix} x_m \\ \lambda_m \end{bmatrix} = \begin{bmatrix} A' \\ I_n \end{bmatrix}y.$   (1.70)

Multiply the first equation of (1.69) by $AG_x^{-1}$ and subtract the second equation of (1.69) from the modified first one to obtain (1.71). The forward reduction step is followed by the backward reduction step. Implement $\lambda_m$ into the first normal equation and solve for $x_m$ to obtain (1.72). Thus, $G_x$-MINOS $x_m$ and $\lambda_m$ are represented by (1.73).

$Ax_m + AG_x^{-1}A'\lambda_m = 0,\quad Ax_m = y \ \Longrightarrow\ \lambda_m = -(AG_x^{-1}A')^{-1}y,$   (1.71)

$G_xx_m - A'(AG_x^{-1}A')^{-1}y = 0 \ \Longrightarrow\ x_m = G_x^{-1}A'(AG_x^{-1}A')^{-1}y,$   (1.72)

$x_m = G_x^{-1}A'(AG_x^{-1}A')^{-1}y,\quad \lambda_m = -(AG_x^{-1}A')^{-1}y.$   (1.73)

Multiply the first equation of (1.70) by $A(G_x + A'A)^{-1}$ and subtract the second equation of (1.70) from the modified first one to obtain (1.74). The forward reduction step is followed by the backward reduction step. Implement $\lambda_m$ into the first equation of (1.70) and solve for $x_m$ to obtain (1.75). Thus, $G_x$-MINOS $x_m$ and $\lambda_m$ are represented by (1.76).

$Ax_m + A(G_x + A'A)^{-1}A'\lambda_m = A(G_x + A'A)^{-1}A'y,\quad Ax_m = y$   (1.74)

$\Longrightarrow\ \lambda_m = [A(G_x + A'A)^{-1}A']^{-1}[A(G_x + A'A)^{-1}A' - I_n]y = -([A(G_x + A'A)^{-1}A']^{-1} - I_n)y,$

$(G_x + A'A)x_m - A'[A(G_x + A'A)^{-1}A']^{-1}y = 0 \ \Longrightarrow\ x_m = (G_x + A'A)^{-1}A'[A(G_x + A'A)^{-1}A']^{-1}y,$   (1.75)

$x_m = (G_x + A'A)^{-1}A'[A(G_x + A'A)^{-1}A']^{-1}y,\quad \lambda_m = -([A(G_x + A'A)^{-1}A']^{-1} - I_n)y.$   (1.76)
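As a final plausibility check of the proof: whenever $G_x$ happens to be regular, case (ii) and case (iii) must deliver the same solution point, because on the solution set $Ax = y$ the objective $x'(G_x + A'A)x = x'G_xx + y'y$ differs from $x'G_xx$ only by a constant. A short Python sketch (numpy, illustrative) confirms the agreement.

    import numpy as np

    A  = np.array([[1., 1., 1.], [1., 2., 4.]])
    y  = np.array([2., 3.])
    Gx = np.diag([1., 4., 9.])                  # regular, so (1.58) and (1.61) must agree

    Gi    = np.linalg.inv(Gx)
    x_ii  = Gi @ A.T @ np.linalg.solve(A @ Gi @ A.T, y)   # case (ii), (1.58)
    H     = np.linalg.inv(Gx + A.T @ A)
    x_iii = H @ A.T @ np.linalg.solve(A @ H @ A.T, y)     # case (iii), (1.61)

    print(np.allclose(x_ii, x_iii))             # True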

    1-21 A Discussion of the Metric of the Parameter Space X

With the completion of the proof, we have to discuss in more detail the basic results of Lemma 1.3.