NACoM-2003 Extended Abstracts
Proceedings of the International Conference on NUMERICAL ANALYSIS & COMPUTATIONAL MATHEMATICS (NACoM-2003)
Anglia Polytechnic University, 23 – 26 May 2003, Cambridge, UK
Editor: G. Psihoyios


NACoM-2003 Extended Abstracts 1 – 184

Contents

Preface 5

Conference Details 6

M. Abukhaled
Mean Square Stability of a Class of Runge-Kutta Methods for 2-Dimensional Stochastic Differential Systems 7

G. Ala, E. Francomano, A. Tortorici, and F. Viola
An Advanced Variant of an Interpolatory Graphical Display Algorithm 10

G. Ala, M.L. Di Silvestre, E. Francomano, E. Toscano, and F. Viola
Finite Difference Time-domain Simulation of Soil Ionization in Grounding Systems under Lightning Surge Conditions 12

A. Atieg and G.A. Watson
Numerical Methods for Fitting Curves and Surfaces to Data based on Calculating Orthogonal Distances 16

Sh. Balaji
Spline Based Electronic Modelling 20

G. Vanden Berghe, M. Van Daele, and H. Vande Vyver
Exponentially-fitted Algorithms: Fixed or Frequency Dependent Knot Points? 24

D.N. Bokov
Characteristic Directions Technique for the Scalar One-dimensional Non-linear Advection Equation with Non-convex Flow Function 28

C.E. Cadenas and V. Villamizar
Application of Least Squares Finite Element Method to Acoustic Scattering and Comparison with other Numerical Techniques 32

R. Cools
Subdivision Strategies in Adaptive Integration Algorithms 36

J. Cruz and P. Barahona
Constraint Reasoning with Differential Equations 38

A. Cuyt and R.B. Lenin
Verified Computation of Packet Loss Probabilities in Multiplexer Models using Rational Approximation 42

A. Cuyt, B. Verdonk, H. Waadeland, and J. Vervloet
Towards a Verified Library for Special Functions 46

© 2003 WILEY-VCH Verlag GmbH & Co. KGaA, Weinheim


A. Cuyt
Recent Applications of Rational Approximation Theory: A Guided Tour 50

R.L. Davidchack
Efficient Detection of Periodic Orbits in Chaotic Systems by Stabilising Transformations 53

M. Demiralp
Weighted Eigenvalue Problem Approach to the Critical Value Determination of Screened Coulomb Potential Systems 57

A. Dhooge, W. Govaerts, and Yu. Kuznetsov
Bifurcations of Periodic Solutions of ODEs using Bordered Systems 61

J. van den Eshof, G. Sleijpen, and M. van Gijzen
Efficient Iteration Methods for Schur Complement Systems 65

B. Fischer and J. Modersitzki
Fast Image Registration: A Variational Approach 69

I. Gerace, P. Pucci, N. Ceccarelli, M. Discepoli, and R. Mariani
A Preconditioned Finite Elements Method for the p-Laplacian's Parabolic Equation 75

H. Gu
Solving Parameter-dependent Elliptic Problems by Finite Element Method and Symbolic Computation 79

V.Y. Gusev, M.Y. Kozmanov, and V.V. Zav'yalov
On Acceleration of Iteration Convergence for the System of Radiative Heat Transfer in Kinetic Approximation 83

B. Hendrickson
Combinatorial Scientific Computing: Discrete Algorithms in Computational Science and Engineering 87

T. Hopkins and D. Barnes
Testing a Medium Sized Numerical Package: A Case Study 91

D.P. Jenkinson, J.C. Mason, and A. Crampton
Iteratively Weighted Approximation Algorithms for Nonlinear Problems 95

Z. Kalogiratou, Th. Monovasilis, and T.E. Simos
Numerical Solution of the Two-dimensional Time Independent Schrödinger Equation 99

T. Kaman and M. Demiralp
A Parametric Sensitivity Analysis for the Solution of Extrema Evaluation Problems via a Dimensionality Reducing Approximation Method 103

A. Kursunlu, I. Yaman, and M. Demiralp
Optimal Control of One-dimensional Quantum Harmonic Oscillator under an External Field with Quadratic Dipole Function and Penalty on Momentum 107

D. Liang and W. Zhao
The Weighted Upwinding Finite Volume Method for the Convection Diffusion Problem on a Nonstandard Covolume Grid 111

A. Mardegan, A. Sommariva, M. Vianello, and R. Zanovello
Adaptive Bivariate Chebyshev Approximation and Efficient Evaluation of Integral Operators 115


Th. Monovasilis, Z. Kalogiratou, and T.E. Simos
Numerical Solution of the Two-dimensional Time-independent Schrödinger Equation by Symplectic and Asymptotically Symplectic Schemes 121

T.L. van Noorden
A Jacobi-Davidson Continuation Method 125

K.C. Patidar
A Numerical Study of the Dispersion for the Two-dimensional Helmholtz Equation 129

M. Pato and P. Lima
Numerical Solution of Singular Nonlinear Boundary Value Problems for Shallow Membrane Caps 131

G. Psihoyios and T.E. Simos
Trigonometrically-fitted Symmetric Four-step Methods for the Numerical Solution of Orbital Problems 135

G. Psihoyios and T.E. Simos
Exponentially-fitted Multiderivative Methods for the Numerical Solution of the Schrödinger Equation 139

I. Rafatov and S. Sklyar
Difference Schemes for the Class of Singularly Perturbed Boundary Value Problems 145

V. Rotkin and S. Toledo
The Design and Implementation of a New Out-of-Core Sparse Cholesky Factorization Method 150

J. Santos
Positive Two-level Difference Schemes for One-dimensional Convection-Diffusion Equations 154

O. Shishkina and C. Wagner
Stability Analysis of High-order Finite Volume Schemes in Turbulent Simulations 158

M.A. Tunga and M. Demiralp
A Factorized High Dimensional Model Representation on the Partitioned Random Discrete Data 162

B. Tunga and M. Demiralp
Optimally Controlled Dynamics of One Dimensional Harmonic Oscillator: Linear Dipole Functions and Quadratic Penalty 166

R. Vigneswaran
Iterative Schemes with Extra Sup-steps for Implicit Runge-Kutta Methods 169

G.B. Wilson
Spatial Domain Fourier Description of Hand-written Signature Images by use of Iterative Dilation 173

I. Yaman and M. Demiralp
A High Dimensional Model Representation Approximation of an Evolution Operator with a First-order Partial Differential Operator Argument 177

E.F. Yetkin and H. Dag
A Comparison of the Model Order Reduction Techniques for Linear System arising from VLSI Interconnection Simulation 184


EDITOR’S PREFACE

The International Conference on Numerical Analysis & Computational Mathematics (NACoM-2003) was held at the Department of Mathematics, Anglia Polytechnic University (Cambridge Campus), over the four days from 23rd to 26th May 2003. NACoM-2003 proved to be a considerable success, despite its late announcement (due to technical difficulties beyond our control) and a complete absence of financial sponsors. Around 80 participants were registered from all over the globe (correct at the time of writing the present note), and the overall standard of the submitted papers was exceptionally high.

To ensure the very high quality of the contributions, the conference had a two-stage refereeing process: an extended abstract had to be submitted and, only if it was accepted after refereeing, was the submission of a full-length paper allowed. This volume contains all the extended abstracts that were accepted for publication (to facilitate the editor's work, the submission and acceptance dates are identical on all contributions). The extended abstracts, or short papers, included in this volume represent concise versions of the papers that were presented during the conference. Concise versions of most of the plenary papers are also included. The short papers published in this book have been appropriately refereed by at least two referees, in accordance with established practices.

NACoM-2003 hosted a special embedded event, the official launch of a new Wiley journal named "Applied Numerical Analysis and Computational Mathematics (ANACM)" (Editor-in-Chief T.E. Simos), which was a very significant highlight of the conference. We would like to thank the members of the scientific committee, the session chairs, and all participants for their contributions. In particular, we would like to thank the plenary speakers, who agreed to undertake such a demanding function at rather short notice.

The conference is also grateful to Anglia Polytechnic University for making available various university facilities throughout the bank holiday weekend. Finally, we are pleased to announce that the NACoM-2004 International Conference is expected to be held around the end of June/beginning of July 2004. Further details will be made available through the conference's website, which will be accessible in due course at: http://www.apu.ac.uk/appsci/maths/NACoM-2004/

Georgios Psihoyios
May 2003


CONFERENCE DETAILS

International Conference on

Numerical Analysis & Computational Mathematics (NACoM-2003)

Anglia Polytechnic University, 23 – 26 May 2003, Cambridge, United Kingdom

General Chair & Organiser
Dr Georgios Psihoyios, Anglia Polytechnic University, Cambridge, UK.

Vice-Chairs
Prof. Theodore E. Simos, University of Peloponnisos, Greece.
Boz Kempski, Anglia Polytechnic University, Cambridge, UK.

Scientific Committee and Plenary Speakers*
*Prof. Jeff R. Cash, Imperial College, London, UK.

Prof. Ronald Cools, Katholieke Universiteit Leuven, Belgium.

*Prof. Annie Cuyt, University of Antwerp, Belgium.

*Prof. Bernd Fischer, Medical University of Luebeck, Germany.

Prof. Roland W. Freund, Bell Laboratories, USA.

Prof. Ian Gladwell, Southern Methodist University, USA.

Prof. Bruce Hendrickson, Sandia National Laboratories, USA.

*Prof. Marlies Hochbruck, University of Duesseldorf, Germany.

*Dr William F. Mitchell, National Institute of Standards & Technology, USA.

*Prof. Guido Vanden Berghe, University of Gent, Belgium.

*Prof. G. Alistair Watson, University of Dundee, UK.


NACoM-2003 Extended Abstracts 7 – 9

Mean Square Stability of a Class of Runge-Kutta Methods for 2-Dimensional Stochastic Differential Systems

Marwan Abukhaled∗1

1 Department of Mathematics & Statistics, American University of Sharjah, Sharjah, United Arab Emirates

Received 28 February 2003, accepted 21 March 2003

Though numerical schemes for stochastic differential equations are now abundant, the numerical stability of these schemes has not been thoroughly investigated. On one front, substantial success has been achieved in the stability analysis of some numerical schemes applied to linear test SDEs. On the other front, the stability of these schemes applied to general n-dimensional linear SDE systems has not been studied. In [12], Saito and Mitsui made a breakthrough by establishing a criterion for the mean square stability of the Euler-Maruyama scheme for general 2-dimensional SDE systems. Following their guidelines, we study mean square stability for a class of explicit weak second-order Runge-Kutta methods for general 2-dimensional SDE systems.

1 Introduction

An Ito stochastic initial value problem has the form

dZ(t) = f(t, Z) dt + g(t, Z) dW(t),   Z(0) = Z0,   (1)

where W(t) is the standard Wiener process; f(t, Z(t)) and g(t, Z(t)) are referred to as the drift and diffusion coefficients, respectively.

The difficulty in dealing with this kind of equation arises from the non-differentiability of the sample paths, which originates from the white noise. The conditions that the drift and diffusion coefficients must satisfy in order for (1) to have a closed-form solution almost never hold in practical applications (see [5] for details).

Numerical schemes for SDEs are recursive methods in which trajectories of the solution are computed at discrete points. Such methods are nowadays abundant and are classified according to their nature and order of convergence [6]. Stability of numerical schemes for SDEs is essential to avoid possible explosion of the numerical solution. Out of the many stability measures, the one with respect to the second moment, usually called mean square stability, is our emphasis in this article.

Definition 1.1 The equilibrium position, Z(t) ≡ 0, is said to be asymptotically stable in the mean square sense (ASMS) if for every ε > 0 there exists a δ1 > 0 such that

‖Z(t)‖ < ε for all t ≥ 0 and |Z0| < δ1, (2)

and furthermore, if there exists a δ2 > 0 such that

limt→∞ ‖Z(t)‖ = 0 for all |Z0| < δ2, (3)

∗ Corresponding author: e-mail: [email protected], Phone: +971-6-515-2531, Fax: +971-6-515-5950


where ‖Z(t)‖ = √(E|Z(t)|²).

Definition 1.2 Suppose that the equilibrium position of (1) is ASMS. Then a numerical scheme that produces the iterates Zn to approximate the solution Z(t) is said to be ASMS if

limn→∞ ‖Zn‖ = 0. (4)

Mean square stability of many numerical schemes applied to the linear autonomous Ito equation (1), in which f(t, Z(t)) = λZ(t) and g(t, Z(t)) = σZ(t), has been discussed and stability regions have been established [1, 7, 11]. The investigation of mean square stability in the n-dimensional case in [7] was conducted under the restriction that the drift and diffusion coefficients are simultaneously diagonalizable. Mean square stability of the Euler-Maruyama scheme for more general 2-dimensional systems was further investigated in [12].
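For the scalar linear test equation just mentioned, the mean square stability of the Euler-Maruyama scheme can be checked in a few lines. The sketch below (NumPy assumed; the parameter values are illustrative, not taken from the references) computes the one-step growth factor of the second moment, a standard calculation for this scheme, and confirms the predicted decay by Monte Carlo:

```python
import numpy as np

# Euler-Maruyama for dZ = lam*Z dt + sig*Z dW:
#   Z_{n+1} = Z_n * (1 + h*lam + sqrt(h)*sig*xi_n),  xi_n ~ N(0, 1).
# Squaring and taking expectations gives E|Z_{n+1}|^2 = R * E|Z_n|^2 with
# R = (1 + h*lam)**2 + h*sig**2, so the scheme is mean square stable iff R < 1.
def ms_growth_factor(lam, sig, h):
    return (1.0 + h * lam) ** 2 + h * sig ** 2

def em_second_moment(lam, sig, h, n_steps, n_paths=20000, seed=0):
    """Monte Carlo estimate of E|Z_n|^2 starting from Z_0 = 1."""
    rng = np.random.default_rng(seed)
    z = np.ones(n_paths)
    for _ in range(n_steps):
        z *= 1.0 + h * lam + np.sqrt(h) * sig * rng.standard_normal(n_paths)
    return float(np.mean(z ** 2))

print(ms_growth_factor(-3.0, 1.0, 0.25))  # 0.3125 < 1: mean square stable
print(ms_growth_factor(-3.0, 1.0, 1.00))  # 5.0 > 1: mean square unstable
```

With h = 0.25 the Monte Carlo second moment after 40 steps is numerically indistinguishable from zero, in agreement with R^40.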

In this presentation, we investigate the mean square stability of an explicit weak second-order scheme for a general 2-dimensional SDE system. In particular, we focus our attention on a class of explicit Runge-Kutta methods. Let us first recall the definition of weak convergence of a numerical scheme for SDEs.

Definition 1.3 A discrete time approximation Zn converges weakly towards the exact solution Z(tn) with order β as h → 0 if, for every smooth function F, there exists a positive constant C, independent of h, such that

|E(F(Z(tn))) − E(F(Zn))| ≤ C h^β.

The class of explicit Runge-Kutta methods under investigation is given by

Z_{n+1} = Z_n + γ1 h f(Z_n) + √(γ1 h) g(Z_n) η1 + γ2 h f(K1) + √(γ2 h) g(K1) η2 + γ4 h f(K2) + √(γ4 h) g(K2) η3,   (5)

where

K1 = Z_n + γ3 h f(Z_n) + √(γ3 h) g(Z_n) η1,
K2 = Z_n + γ5 h f(Z_n) + √(γ5 h) g(Z_n) η1,   (6)

and η1, η2, and η3 are independent, normally distributed random variables with mean zero and variance one.

It was shown in [3] that this class of Runge-Kutta methods is of second-order accuracy in the weak sense provided that the γ's satisfy the nonlinear system

γ1 + γ2 + γ4 = 1,
γ2 γ3 + γ4 γ5 = 1/2,   (7)
√γ1 (γ2 √γ3 + γ4 √γ5) = 1/2.
The nonlinear system (7) has an infinite number of real and imaginary solutions (see [3] for more details). We now address the main questions of this study: Under what condition(s) does the numerical scheme given in (5)-(6) provide a mean square stable solution? Is it possible to enlarge the region of mean square stability of this scheme without sacrificing the order of convergence? Do all weak second-order methods share the same mean square stability regions? Numerical examples will be provided to support the analysis.
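The non-uniqueness of solutions to (7) is easy to see concretely. Setting γ3 = 0 and solving by hand (our own illustrative computation, not taken from [3]) yields a one-parameter family of real solutions, which the sketch below verifies numerically (NumPy assumed). Note that γ2 < 0 everywhere on this slice, so √(γ2 h) in (5) is imaginary, consistent with the remark that (7) admits both real and imaginary solutions:

```python
import numpy as np

# Residuals of the order conditions (7).
def residuals(g1, g2, g3, g4, g5):
    return np.array([
        g1 + g2 + g4 - 1.0,
        g2 * g3 + g4 * g5 - 0.5,
        np.sqrt(g1) * (g2 * np.sqrt(g3) + g4 * np.sqrt(g5)) - 0.5,
    ])

# Hand-derived slice of the solution set: with g3 = 0, the last two
# equations force g1 = g5 and g4 = 1/(2*g5); the first then fixes g2.
def family(s):
    return (s, 1.0 - s - 0.5 / s, 0.0, 0.5 / s, s)

for s in (0.25, 0.5, 1.0):
    assert np.allclose(residuals(*family(s)), 0.0), s
print(family(0.25))  # (0.25, -1.25, 0.0, 2.0, 0.25)
```

Each value of the free parameter s > 0 gives a distinct solution of (7), illustrating why the stability questions above must be posed over a whole family of coefficient choices.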


References

[1] M.I. Abukhaled, Mean Square Stability of Second-Order Weak Numerical Methods for Stochastic Differential Equations, submitted.
[2] M.I. Abukhaled and E.J. Allen, Expectation Stability of Second-Order Weak Numerical Methods for Stochastic Differential Equations, Stochastic Analysis and Applications, 20(4) (2002), pp. 693-707.
[3] M.I. Abukhaled and E.J. Allen, A Class of Second-Order Runge-Kutta Methods for Numerical Solution of Stochastic Differential Equations, Stochastic Analysis and Applications, 16 (1998), pp. 977-991.
[4] L. Arnold, Stochastic Differential Equations, Wiley, New York, 1974.
[5] T.C. Gard, Introduction to Stochastic Differential Equations, Marcel Dekker, New York, 1988.
[6] P. Kloeden and E. Platen, Numerical Solution of Stochastic Differential Equations, Springer-Verlag, Berlin, 1992.
[7] Y. Komori and T. Mitsui, Stable ROW-Type Weak Scheme for Stochastic Differential Equations, Monte Carlo Methods and Applic., 1 (1995), pp. 270-300.
[8] D.W. Lewis, Matrix Theory, World Scientific, Singapore, 1991.
[9] N.J. Newton, Asymptotically Efficient Runge-Kutta Methods for a Class of Ito and Stratonovich Equations, SIAM J. Appl. Math., 51 (1991), pp. 542-567.
[10] T. Mitsui, Contributions in Numerical Mathematics, World Scientific Publ., Singapore, 1993, pp. 333-344.
[11] Y. Saito and T. Mitsui, Stability Analysis of Numerical Schemes for Stochastic Differential Equations, SIAM J. Numer. Anal., 33 (1996), pp. 2254-2267.
[12] Y. Saito and T. Mitsui, Mean Square Stability of Numerical Schemes for Stochastic Differential Systems, preprint.


NACoM-2003 Extended Abstracts 10 – 11

An Advanced Variant of an Interpolatory Graphical Display Algorithm

G. Ala ∗1, E. Francomano ∗∗2,3, A. Tortorici 2,3, and F. Viola 1

1 Dipartimento di Ingegneria Elettrica - Universita degli Studi di Palermo, viale delle Scienze, 90128 Palermo - Italia

2 Dipartimento di Ingegneria Informatica - Universita degli Studi di Palermo, viale delle Scienze, 90128 Palermo - Italia

3 ICAR, Istituto per il CAlcolo e Reti ad alte prestazioni, CNR, viale delle Scienze, 90128 Palermo - Italia

Received 28 February 2003, accepted 21 March 2003

In this paper an advanced interpolatory graphical display algorithm based on centered cardinal B-spline functions is provided. It is well known that B-spline functions provide a flexible tool for designing various scale representations of a signal. The proposed method allows a function to be displayed at any desired resolution level by working only with the initial data sequence. The mask of a generic resolution level is generated so that the structure of the algorithm is independent across scales, which makes it suitable to run in a distributed environment. In this paper centered cardinal B-spline functions of order m = 4 have been taken into account and 1d computational masks for different resolution levels have been supplied.

1 Introduction

Let Bm(x) be a B-spline function of order m. The relation

Bm(x) = Σ_{k=0}^{m} 2^(−m+1) (m choose k) Bm(2x − k) = Σ_{k=0}^{m} p_{m,k} Bm(2x − k)   (1)

is known as one of the two-scale relations that give rise to a hierarchical representation of a signal at multiple scales. Because spline spaces provide a close and stable approximation of L²(R), it is reasonable to represent a discrete signal or image using B-spline bases. Given a discrete data set of points a_l^(j0) = f(l · 2^(−j0)), with f(x) ≈ Σ_l a_l^(j0) Bm(2^(j0) x − l), l ∈ Z, the aim is to compute all the values of the sequence

f(k · 2^(−j1)),  k ∈ Z,  for all j1 ≥ j0,   (2)

so that, to display the graph of f(x), it is adequate to display the sequence (2) when the fixed value j1 is sufficiently large. First of all, for j = j0, . . . , j1 − 1, the sequence a_l^(j) has to be upsampled by inserting a zero term between two consecutive elements; namely, the sequence a_l^(j) has to be regenerated with a_{2k}^(j) = a_k^(j) and a_{2k+1}^(j) = 0. Hence, by considering

f^(j)(x) = Σ_l a_l^(j) Bm(2^j x − l)   (3)

∗ e-mail: [email protected], Phone: +39 091 6615288, Fax: +39 091 488452∗∗ Corresponding author: e-mail: [email protected], Phone: +39 091 238266


and taking into account formula (1), the identity f^(j+1)(x) = f^(j)(x) generates the following relation:

a_l^(j+1) = Σ_k 2^(−m+1) (m choose l − k) a_k^(j)   (4)

Therefore, f(k · 2^(−j1)) ≈ Σ_l a_l^(j1) Bm(k − l). The coefficients a_l^(j+1) are recursively computed by involving the upsampled values of the previous level. This recursion has been avoided by mapping the process onto a convolution, at a generic level t, of the initial data set with a suitable weight vector. First of all, let f(l · 2^(−j0)) = f_l, l ∈ Z, be the initial data set. The weight vector is carried out by means of a binary tree of depth t. Namely, the root v^(0) = v_0^(0) is a vector of size m − 1 whose entries are the values of the B-spline function computed at the integer knots of the support [0, m]. The vector v^(t+1) is obtained by first computing the matrix A of size n × m, A = v_h^(t) ⊗ p_{m,k}, where h is a sequence of t + 1 binary digits and v_h^(t) ∈ R^n. Then

v^(t+1)(r) = Σ_{k=1}^{n} a_{k, r−k+1},  r = 1, . . . , n + m,   (5)

by assuming a_{i,j} = 0 for j ≤ 0. The leaves v_{h0}^(t+1) and v_{h1}^(t+1) are then composed of the entries of the vector v^(t+1) located in the odd and even positions, respectively. In the following, some values regarding the centered cardinal B-spline of order m = 4 are reported.

t = 0: v_0^(0) = (1, 4, 1)

t = 1: v_00^(1) = (8, 32, 8), v_01^(1) = (1, 23, 23, 1)

t = 2: v_000^(2) = (64, 256, 64), v_001^(2) = (8, 184, 184, 8), v_010^(2) = (27, 235, 121, 1), v_011^(2) = (1, 121, 235, 27)

t = 3: v_0000^(3) = (512, 2048, 512), v_0001^(3) = (64, 1472, 1472, 64), v_0010^(3) = (216, 1880, 968, 8), v_0011^(3) = (8, 968, 1880, 216), v_0100^(3) = (343, 2003, 725, 1), v_0101^(3) = (27, 1223, 1697, 125), v_0110^(3) = (125, 1697, 1223, 27), v_0111^(3) = (1, 725, 2003, 343)

In two dimensions the weight vectors can easily be generated by means of the tensor product. Significant improvements have been obtained by implementing the algorithm on a cluster of workstations using the MPI parallel programming paradigm. The method has shown itself to be synchronous, with good task balancing, and requires only a small amount of data transfer.
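As a minimal illustration of the refinement step (4) for m = 4 (a direct sketch of the level-by-level recursion, not the binary-tree mask generation described above; NumPy assumed), one can upsample the coefficient sequence by zero insertion and convolve it with the two-scale mask p_{4,k} = 2^(−3) (4 choose k):

```python
import numpy as np

# Two-scale mask for the cubic case m = 4: p_{4,k} = 2**(-3) * C(4, k).
MASK = np.array([1.0, 4.0, 6.0, 4.0, 1.0]) / 8.0

def refine(a):
    """One level of eq. (4): zero-insertion upsampling, then convolution."""
    up = np.zeros(2 * len(a) - 1)
    up[::2] = a                   # a_{2k} = a_k, a_{2k+1} = 0
    return np.convolve(up, MASK)

coeffs = np.ones(8)               # coefficients of a constant signal
fine = refine(coeffs)
print(fine[4:-4])                 # interior values are exactly 1 (partition of unity)
```

Each call to `refine` doubles the displayed resolution; the paper's contribution is precisely to replace this recursion with precomputed level-t masks applied directly to the initial data.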



NACoM-2003 Extended Abstracts 12 – 15

Finite Difference Time-domain Simulation of Soil Ionization in Grounding Systems under Lightning Surge Conditions

G. Ala ∗1, M.L.Di Silvestre1, E. Francomano ∗∗2,3, E. Toscano2, and F. Viola1

1 Dipartimento di Ingegneria Elettrica - Universita degli Studi di Palermo, viale delle Scienze, 90128 Palermo - Italia

2 Dipartimento di Ingegneria Informatica - Universita degli Studi di Palermo, viale delle Scienze, 90128 Palermo - Italia

3 ICAR, Istituto per il CAlcolo e Reti ad alte prestazioni, CNR, viale delle Scienze, 90128 Palermo - Italia

Received 28 February 2003, accepted 21 March 2003

This paper proposes a finite difference time domain (FDTD) approach based on the numerical solution of Maxwell's equations, in order to take into account the non-linear behaviour of earth electrodes under lightning surge conditions during ionization phenomena. The time-variable resistivity approach proposed by Liew and Darveniza is used for simulating the ionization and de-ionization processes. The proposed model makes it possible to study the electromagnetic transient behaviour of grounding systems directly in the time domain, without an a priori hypothesis about the geometrical shape of the ionized zone. The approach has been tested against results found in the technical literature, and good agreement has been found.

1 Introduction

Exact evaluation of electromagnetic transients in complex grounding systems is of fundamental importance for lightning protection design and for the evaluation of the electrical stress, due to lightning phenomena, in the electric and electronic equipment of substations. The earth electrodes constitute a fundamental part of the electric apparatus in industrial and civil structures, and they must have a suitable configuration in order to avoid step and touch voltage values that can pose a serious human hazard. Moreover, for a lightning protection system used in electric power installations, the shape and dimensions of the earth termination system are more important than a specific value of the resistance of the earth electrode, the aim being to disperse the lightning current into the earth without causing dangerous overvoltages. Experimental and theoretical studies reported in the technical literature have shown that, when the current leaking into the soil from the different parts of the electrodes increases, the electric field on the lateral surface of the electrodes can exceed the soil ionization gradient, so that dielectric breakdown in the ionized region takes place. This phenomenon has a great influence on the performance of concentrated electrodes fed by currents of high magnitude, but it has been shown that soil ionization can take place in extended electrodes as well, modifying the electrode behaviour with respect to the absence of soil breakdown. Usually, the soil ionization region starts at the electrode surface, where the current density has its highest value. This region extends up to the distance at which the current density decreases to a value that makes the electric field lower than the critical breakdown value in the soil. With respect to the absence of the ionization phenomenon, the presence of ionized zones around the electrodes significantly modifies the system behaviour, since it becomes non-linear.

Different models have been proposed in the technical literature to describe the soil ionization process. These models can be classified as based on a variable soil resistivity approach or

∗ e-mail: [email protected], Phone: +39 091 6615288, Fax: +39 091 488452∗∗ Corresponding author: e-mail: [email protected], Phone: +39 091 238266


on a variable electrode geometry approach. The first considers a time-variable soil resistivity in the region surrounding the electrode during the ionization and the subsequent de-ionization phenomena; this time-variable resistivity is also a non-linear function of the electric field. The variable electrode geometry approach, on the other hand, models a given electrode embedded in an ionized soil as an electrode of modified transversal dimensions in a non-ionized soil. This approach therefore considers the soil resistivity unchanged, and the non-linear behaviour is given by the dependence of the equivalent electrode geometry on the current flowing into the soil. For each value of the current, the effective transversal dimension of the electrode is obtained by assuming that the electric field may not exceed the critical value. With this approach, the ionized region is assimilated to the conductor and the electric field in this region is assumed to be null, as if the ionized region were short-circuited with the electrode. In this way, even if the approach enables a reduction of the earth resistance of the electrode to be obtained, it is far from the physics of the phenomenon. In order to correctly take into account the non-linear behaviour of the earth electrodes during the ionization and de-ionization processes, this paper proposes a finite difference time domain (FDTD) approach based on the numerical solution of Maxwell's equations. The time-variable resistivity approach proposed by Liew and Darveniza is used for simulating the ionization and de-ionization processes. In particular, for the ionization process the variable resistivity is expressed by the following:

ρ = ρ0 e^(−t/τ1)   (1)

where ρ0 is the initial resistivity value of the soil without ionization and τ1 is the ionization time constant of the soil. The ionization process is driven by the electric field E: at the instants when E ≥ Ec, the resistivity behaviour is driven by (1). For the de-ionization process the variable resistivity function is expressed by the following:

ρ = ρi + (ρ0 − ρi)(1 − e^(−t/τ2))(1 − E/Ec)²   (2)

where ρi is the minimum value reached by the soil resistivity during the ionization process, obtained by (1), τ2 is the de-ionization time constant of the soil, E is the electric field, and Ec is the breakdown electric field in the soil.

Fig. 1 Soil conductivity map for an impulsed 2 m vertical rod (the air half-space reticulus is not shown).

In the formulation proposed in this paper, the de-ionization process is directly driven by the electric field rather than by the current density, as proposed by Darveniza. The half-space Sommerfeld problem is exactly solved by simulating the two media with different electrical parameters and


by solving Maxwell's equations numerically. The proposed model makes it possible to study the electromagnetic transient behaviour of grounding systems directly in the time domain, without an a priori hypothesis about the geometrical shape of the ionized zone. On the contrary, this hypothesis is necessary in the variable electrode geometry approach, and it is also used in the papers related to the study of ionization phenomena of concentrated earths. As an example, in Fig. 1 the soil conductivity map is reported for a 2 m vertical rod in homogeneous soil, directly injected by a lightning current simulated with a 1.2-50 µs double exponential waveform with a peak value of 10 kA.

Fig. 2 Time profiles for an impulsed 2m vertical rod - observation point 20 cm away from the rod, on the soil surface.

The rod is modelled with 20 cells; the spatial reticulus dimensions are ∆x = ∆y = 5 cm, ∆z = 10 cm; the soil parameters are ρ0 = 100 Ωm, εr = 8, µ = µ0. The map of Fig. 1 refers to the time 0.25 µs. The critical value of the electric field is set to 3 kV/cm. Perfectly matched layers (PML) are used as absorbing boundary conditions for the space of interest. In Fig. 2 the time profiles of the electric field E and of the soil resistivity ρ, 20 cm away from the rod on the soil surface, are reported. These profiles are normalized with respect to the critical electric field Ec and to the initial resistivity value of the soil ρ0, respectively. The proposed approach has been tested against results found in the technical literature, and good agreement has been found, as will be reported in the full-length paper.
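The resistivity update of eqs. (1)-(2) is straightforward to prototype per grid cell. The sketch below (NumPy assumed) uses the parameter values quoted in the text where available (ρ0 = 100 Ωm, Ec = 3 kV/cm); the time constants τ1 and τ2 are illustrative placeholders, since the abstract does not give them:

```python
import numpy as np

RHO0 = 100.0    # initial soil resistivity [ohm*m] (from the text)
EC   = 3.0e5    # critical breakdown field: 3 kV/cm = 3e5 V/m (from the text)
TAU1 = 1.5e-6   # ionization time constant [s] -- illustrative placeholder
TAU2 = 0.5e-6   # de-ionization time constant [s] -- illustrative placeholder

def rho_ionization(t):
    """Eq. (1): resistivity decay while E >= EC, t measured from onset."""
    return RHO0 * np.exp(-t / TAU1)

def rho_deionization(t, e_field, rho_i):
    """Eq. (2): recovery towards RHO0 once the field drops below EC.

    rho_i is the minimum resistivity reached during ionization."""
    return rho_i + (RHO0 - rho_i) * (1.0 - np.exp(-t / TAU2)) * (1.0 - e_field / EC) ** 2
```

With e_field = 0 the recovery tends back to RHO0 as t grows, while during ionization the resistivity decays exponentially from RHO0, reproducing the qualitative behaviour of the normalized profiles in Fig. 2.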

References

[1] M. Darveniza, A.C. Liew, Dynamic model of impulse characteristic of concentrated earths, Proceedings of IEE, 121, No. 2, (1974).
[2] L.D. Grcev, Computer analysis of transient voltages in large grounding systems, IEEE Trans. on Power Delivery, 11, No. 2, (1996).
[3] G. Ala, M.L. Di Silvestre, A simulation model for electromagnetic transients in lightning protection systems, IEEE Trans. on Electromagnetic Compatibility, 44, No. 4, (2002), pp. 539-554.
[4] E.E. Oettl, A new general estimation curve for predicting the impulse impedance of concentrated earth electrodes, IEEE Trans. on Power Delivery, 3, No. 4, (1988), pp. 2020-2029.
[5] J. Cidras, A.F. Otero, C. Garrido, Nodal frequency analysis of grounding systems considering the soil ionization effect, IEEE Trans. on Power Delivery, 5, No. 1, (2000), pp. 103-107.
[6] Y. Gao, J. He, S. Chen, J. Zou, R. Zeng, X. Liang, Lightning electromagnetic environments of substation considering soil ionization around grounding systems, in: Proc. of International Conference on Power System Technology, Vol. 4, 2002, pp. 2096-2100.
[7] M. Mousa, The soil ionization gradient associated with discharge of high currents into concentrated electrodes, IEEE Trans. on Power Delivery, 9, No. 3, (1994), pp. 1669-1677.
[8] L.D. Grcev, F.E. Menter, Transient electromagnetic fields near large earthing systems, IEEE Trans. on Magnetics, 32, No. 3, (1996), pp. 1525-1528.
[9] M. Geri, Behaviour of grounding systems excited by high impulse currents: the model and its validation, IEEE Trans. on Power Delivery, 14, No. 3, (1999), pp. 1008-1017.
[10] M. Geri, G.M. Veca, E. Garbagnati, G. Sartorio, Non-linear behaviour of ground electrodes under lightning surge currents: computer modelling and comparison with experimental results, IEEE Trans. on Magnetics, 28, No. 2, (1992), pp. 1442-1445.
[11] M.E. Almeida, M.T. Correia De Barros, Accurate modelling of rod driven tower footing, IEEE Trans. on Power Delivery, 11, No. 3, (1996), pp. 1606-1609.
[12] D.M. Sullivan, Electromagnetic Simulation Using the FDTD Method, IEEE Press Series on RF and Microwave Technology, 2000.
[13] T. Noda, S. Yokoyama, Thin wire representation in finite difference time domain surge simulation, IEEE Trans. on Power Delivery, 17, No. 3, (2002), pp. 840-847.

NACoM-2003 Extended Abstracts 16 – 19

Numerical Methods for Fitting Curves and Surfaces to Data based on Calculating Orthogonal Distances

A. Atieg 1 and G. A. Watson ∗1

1 Department of Mathematics, University of Dundee, Dundee DD1 4HN, Scotland

Received 28 March 2003, accepted 21 March 2003

Given a family of curves or surfaces in $\mathbb{R}^s$, an important problem is that of finding a member of the family which gives a "best" fit to $m$ given data points. In many application areas, criteria are used which involve finding orthogonal distances from the data points to the curve. Some problems of this kind are considered.

1 Introduction

Let measured points $x_i \in \mathbb{R}^s$, $i = 1, \dots, m$, be given, and let a curve or surface in $\mathbb{R}^s$ be defined by $a \in \mathbb{R}^n$. For every $x_i$, associate a point $z_i(a)$ on the curve or surface. Such a point may be defined parametrically, when we will have

\[ z_i(a) = x(a, t_i), \]

for some set of parameters $t_i$, or implicitly, when $z_i$ satisfies

\[ f(a, z_i(a)) = 0, \]

for some scalar function $f$. Define

\[ z_i(a) = \arg\min_z \|x_i - z\|, \quad i = 1, \dots, m, \]

where unadorned norms denote least squares norms, so that the points $z_i(a)$ are at orthogonal distances from the data points $x_i$, $i = 1, \dots, m$. Then the basic fitting problems considered here are to find

\[ \min_{a \in \mathbb{R}^n} \left\| \begin{pmatrix} \|x_1 - z_1(a)\| \\ \|x_2 - z_2(a)\| \\ \vdots \\ \|x_m - z_m(a)\| \end{pmatrix} \right\|_p \tag{1} \]

or

\[ \min_{a \in \mathbb{R}^n} \left\| \begin{pmatrix} x_1 - z_1(a) \\ x_2 - z_2(a) \\ \vdots \\ x_m - z_m(a) \end{pmatrix} \right\|_p . \tag{2} \]

Attention has mainly focussed on the case $p = 2$. These problems are of course then identical, and methods of Gauss-Newton type have proved popular. Although the problems are identical, the direct

∗ Corresponding author: e-mail: [email protected], Phone: +44 (0)1382 344 472, Fax: +44 (0)1382 345 516

c© 2003 WILEY-VCH Verlag GmbH & Co. KGaA, Weinheim


treatment of (1) requires the Gauss-Newton method (or variants) in $\mathbb{R}^m$ (for example [5, 6, 7, 14, 15, 17]), while the direct treatment of (2) requires methods in $\mathbb{R}^{ms}$ (for example [2, 3, 11]). For a unified treatment, see [1]. Methods which impose orthogonality only in the limit by working with extra variables are given in [4, 12, 13]. In Section 2, we consider a special problem of this type.

However, the assumptions underlying the choice $p = 2$ may not be satisfied in practice, and in that case another criterion from this class may be preferable. In Section 3, we return to the general case, but for other values of $p$.

2 Least Squares Fitting with Circular Arcs

A problem in system control, or in using a computer controlled cutting machine that executes linear and circular arc cutting, is to fit circular arc segments to data in $\mathbb{R}^2$ (see for example [9, 10]). Interest in the use of the present criterion goes back at least to [8].

It is required to determine $m$ circular arcs $C_i$ with centres $(a_i, b_i)$ and radii $r_i$, $i = 1, \dots, m$, to fit data defined on an interval $[x_0, x_m]$ of the real line in the following way. The interval $[x_0, x_m]$ is assumed to be subdivided into $m$ subintervals $[x_{i-1}, x_i]$, $i = 1, \dots, m$, with data values $y_i$ given at intermediate points $x_i$, $i = 1, \dots, m-1$. In each interval $(x_{i-1}, x_i)$ we are given points

xij ∈ (xi−1, xi), j ∈ Ji,

and corresponding values yij , j ∈ Ji, and we have to determine Ci, i = 1, . . . ,m such that

Ci approximates the data (xij , yij), j ∈ Ji, i = 1, . . . ,m,

and in addition the points $(x_i, y_i)$ are interpolated by the arcs $C_i$ and $C_{i+1}$, $i = 1, \dots, m-1$, and there is $C^1$ smoothness at the join. Suppose the approximating criterion is that the sum of squares of orthogonal distances from the data points to the corresponding arcs must be minimized. Then the problem to be solved can be expressed as

minimize

\[ \sum_{i=1}^{m} \sum_{j \in J_i} \left( r_i - \sqrt{(x_{ij} - a_i)^2 + (y_{ij} - b_i)^2} \right)^2 \]

subject to

\[ (x_i - a_i)^2 + (y_i - b_i)^2 = r_i^2, \tag{3} \]

\[ (x_i - a_{i+1})^2 + (y_i - b_{i+1})^2 = r_{i+1}^2, \tag{4} \]

\[ (x_i - a_i)(y_i - b_{i+1}) = (y_i - b_i)(x_i - a_{i+1}), \tag{5} \]

for $i = 1, \dots, m-1$.

Note that there are $3m$ unknowns $a_i, b_i, r_i$, $i = 1, \dots, m$, and $3(m-1)$ constraints, so that we have a total of 3 degrees of freedom. We develop an efficient Gauss-Newton method for the orthogonal distance problem.
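As a minimal sketch of the unconstrained building block (a single arc, ignoring the interpolation and smoothness constraints (3)-(5) that couple the arcs), a Gauss-Newton iteration for the sum of squared orthogonal distances to one circle might look as follows; the function name and initialisation are our own illustration, not the method developed in the paper:

```python
import numpy as np

def fit_circle_odr(x, y, a0, b0, r0, iters=30):
    """Gauss-Newton for min sum_j (r - d_j)^2, d_j = dist((x_j,y_j),(a,b)).
    A sketch for one unconstrained circle; the paper couples m arcs
    through the constraints (3)-(5)."""
    p = np.array([a0, b0, r0], dtype=float)
    for _ in range(iters):
        a, b, r = p
        dx, dy = x - a, y - b
        d = np.hypot(dx, dy)
        e = r - d                                   # orthogonal-distance residuals
        J = np.column_stack([dx / d,                # de/da
                             dy / d,                # de/db
                             np.ones_like(d)])      # de/dr
        step, *_ = np.linalg.lstsq(J, -e, rcond=None)
        p += step
        if np.linalg.norm(step) < 1e-12:
            break
    return p
```

With noiseless data and a reasonable starting point the iteration converges rapidly, since the residuals vanish at the solution.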

3 lp Orthogonal Distance Regression

Consider the parametric problem, and the formulation (1), which may be written as the minimization of $\|r\|_p$ where

\[ r = \begin{pmatrix} \|u_1(a, t_1(a))\| \\ \|u_2(a, t_2(a))\| \\ \vdots \\ \|u_m(a, t_m(a))\| \end{pmatrix}, \]


where

\[ u_i(a, t_i(a)) = x_i - x(a, t_i(a)), \quad i = 1, \dots, m, \tag{6} \]

and $t_i(a)$ minimises $\|u_i\|$. (It may be appropriate to simplify the model by rotating the data, but we do not do that here.) The Gauss-Newton step $d_1$ at $a$ is given by the minimizer of

\[ \|r + J_1 d\|_p, \tag{7} \]

where the $i$th component of $r$ is $\|u_i\|$. Various modifications of this (Levenberg-Marquardt, trust region) may be preferable, but for simplicity we concentrate on (7). We will use the notation $\nabla_1$ (and $\nabla_2$) to refer to the operator which gives the matrix of partial derivatives with respect to the first (and second) group of variables of a function which has two vector arguments.

Provided $u_i \neq 0$, the $i$th row of $J_1$ is

\[ \nabla_a \|u_i(a, t_i(a))\| = \frac{u_i(a, t_i(a))^T}{\|u_i(a, t_i(a))\|}\, \nabla_a u_i(a, t_i(a)) = \frac{u_i(a, t_i(a))^T}{\|u_i(a, t_i(a))\|}\, \nabla_1 u_i(a, t_i(a)), \tag{8} \]

because $t_i$ minimizes $\|u_i(a, t_i(a))\|$, so that $\nabla_2 \|u_i(a, t_i(a))\| = 0$. Note that this simplification does not arise in using the Gauss-Newton method in $\mathbb{R}^{ms}$ to treat (2).

If any $u_i$ is zero, $J_1$ is not defined. Away from a limit point, we may regard that as a degenerate situation. Obviously more serious is such an occurrence at a limit point of the iteration. That may again be regarded as a degeneracy except when $p = 1$, as the interpolation characteristics of the $l_1$ solution mean that it is then an expected event. However, despite this fact, it is shown in [16] that the convergence property of the iteration is normally just as in the usual (smooth) case. Therefore the use of (7) does not present any particular difficulties here. The treatment of (2) rather than (1) can have advantages.

When $p = 1, 2, \infty$, then (7) is a finite problem. However, when $1 < p < \infty$, it is not a finite problem, so that the Gauss-Newton method then becomes a doubly infinite process, and this is clearly unsatisfactory. It is then necessary to replace (7) by an appropriate finite problem. We show how this can be done in such a way that (i) only a linear least squares problem has to be solved, and (ii) asymptotically the Gauss-Newton direction is generated. A similar argument applies to (2), although there seems to be no advantage here.
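One standard way to reduce the inner $l_p$ problem (7) to a sequence of linear least squares solves, in the spirit of requirement (i), is iteratively reweighted least squares (IRLS); the sketch below is illustrative only and is not claimed to be the specific finite reformulation developed in the paper:

```python
import numpy as np

def lp_linear_irls(J, r, p, iters=50, eps=1e-8):
    """Approximate argmin_d || r + J d ||_p by iteratively reweighted
    least squares: each pass solves a weighted linear least squares
    problem with weights |residual|^(p-2), smoothed by eps near zero."""
    d = np.zeros(J.shape[1])
    for _ in range(iters):
        res = r + J @ d
        w = (np.abs(res) + eps) ** ((p - 2) / 2.0)   # sqrt of IRLS weights
        d_new, *_ = np.linalg.lstsq(w[:, None] * J, -w * r, rcond=None)
        if np.linalg.norm(d_new - d) < 1e-10:
            d = d_new
            break
        d = d_new
    return d
```

For $p = 2$ the weights are constant and a single ordinary least squares solve is recovered, consistent with (7) being a finite problem in that case.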

References

[1] A. Atieg and G. A. Watson, A class of methods for fitting a curve or surface to data by minimizing the sum of squares of orthogonal distances, J. of Comp. and Appl. Math. (to appear).

[2] S. J. Ahn, W. Rauh and H.-J. Warnecke, Least-squares orthogonal distances fitting of circle, sphere, ellipse, hyperbola, and parabola, Pattern Recognition 34, pp. 2283–2303 (2001).

[3] S. J. Ahn, E. Westkamper and W. Rauh, Orthogonal distance fitting of parametric curves and surfaces, in Algorithms for Approximation IV, J. Levesley, I. Anderson and J. C. Mason (eds), University of Huddersfield, pp. 122–129 (2002).

[4] P. T. Boggs, R. H. Byrd and R. B. Schnabel, A stable and efficient algorithm for nonlinear orthogonal distance regression, SIAM J. Sci. Stat. Comp. 8, pp. 1052–1078 (1987).

[5] A. B. Forbes, Least squares best fit geometric elements, in Algorithms for Approximation II, eds. J. C. Mason and M. G. Cox, Chapman and Hall, London, pp. 311–319 (1990).

[6] H.-P. Helfrich and D. Zwick, A trust region method for implicit orthogonal distance regression, Numer. Alg. 5, pp. 535–545 (1993).

[7] H.-P. Helfrich and D. Zwick, A trust region algorithm for parametric curve and surface fitting, J. Comp. Appl. Math. 73, pp. 119–134 (1996).


[8] Huu-Hhon Huynh, Least square curve-fitting with circular arcs, Project RMIT (1985).

[9] L. Piegl, Curve fitting algorithm for rough cutting, Computer Aided Design 18(2), 79–82 (1986).

[10] L. Piegl, Fitting circular arcs to measured data, International J. of Shape Modeling, to appear.

[11] H. Spath, Orthogonal least squares fitting by conic sections, in Recent Advances in Total Least Squares and Errors-in-Variables Techniques, ed. S. Van Huffel (SIAM, Philadelphia, 1997).

[12] D. M. Sourlier, Three dimensional feature independent bestfit in coordinate metrology, PhD Thesis, ETH Zurich, 1995.

[13] R. Strebel, D. Sourlier and W. Gander, A comparison of orthogonal least squares fitting in coordinate metrology, in Recent Advances in Total Least Squares and Errors-in-Variables Techniques, ed. S. Van Huffel, SIAM, Philadelphia, 249–258, 1997.

[14] D. A. Turner, The Approximation of Cartesian Co-ordinate Data by Parametric Orthogonal Distance Regression, PhD Thesis, University of Huddersfield (1999).

[15] D. A. Turner, I. J. Anderson, J. C. Mason, M. G. Cox and A. B. Forbes, An efficient separation-of-variables approach to parametric orthogonal distance regression, in Advanced Mathematical and Computational Tools in Metrology IV, eds P. Ciarlini, A. B. Forbes, F. Pavese and D. Richter, Series on Advances in Mathematics for Applied Sciences, Volume 53, World Scientific, Singapore, pp. 246–255 (2000).

[16] G. A. Watson, On the Gauss-Newton method for l1 orthogonal distance regression, IMA J. of Num. Anal. 22, pp. 345–357 (2002).

[17] D. S. Zwick, Applications of orthogonal distance regression in metrology, in Recent Advances in Total Least Squares and Errors-in-Variables Techniques, ed. S. Van Huffel (SIAM, Philadelphia), pp. 265–272 (1997).

NACoM-2003 Extended Abstracts 20 – 23

Spline Based Electronic Modelling

G. Senthil Balaji ∗1

1 Sona College of Technology, Sona Nagar, Tpt Road, Salem-636005, Tamilnadu, India

Received 28 February 2003, accepted 21 March 2003

To The Lord, My Family, Institution, Friends and Loved ones

This paper utilizes a vital computational technique called 'splines' in a practical sense, replacing electronic devices such as diodes, transistors etc., and circuits such as rectifiers, clippers etc., with mathematical spline models that prove to be efficient. With the advent of such computational techniques employed as executable programs, there may not be a requirement for the aforementioned devices and circuits. The idea could result in the following merits:
i. electric circuit effects will not be palpable [4];
ii. the size of electronic equipment will be reduced tremendously when the computing algorithms are employed using Very Large Scale Integration (VLSI) and Ultra Large Scale Integration (ULSI) techniques.
The models are tested in the MATLAB simulation environment and some of the results are delineated in the paper.

1 Interpolating Polynomials

An interpolating polynomial results in a function s which forms an approximation to the function f [1], as shown in Figure 1. Usually, a single interpolating polynomial over the whole data set suffers from an oscillatory defect (see [1], [3]), but this is not evident or existent in the case of splines due to their smoothness property [1].

1.1 Spline Polynomials

A spline function consists of polynomial pieces on subintervals, joined together with certain continuity conditions (see [1], [3]). The rudimentary principle behind spline interpolation is linear interpolation, an often used technique. In this technique the interpolating polynomial p is defined by

\[ p(x) = y_0 + \frac{y_1 - y_0}{x_1 - x_0}(x - x_0), \tag{1} \]

where x represents the input values, y represents the output values, and $p(x_0) = y_0$, $p(x_1) = y_1$. As shown in Figure 1, the spline function S(x) is a combination of several polynomial pieces, whereas in a linear interpolation method the entire data domain uses a single interpolated function, obtained approximately using various computational techniques such as least squares fitting or Gregory-Newton forward and backward difference techniques (see [1], [2], [3]). The polynomial pieces that form a spline are given in expression (2), with $S_1(x), S_2(x), \dots, S_{n-1}(x)$ constituting the $n-1$ spline pieces and $t_1, t_2, t_3, \dots, t_n$ the $n$ knots [1]-[2].

\[ S(x) = \begin{cases} S_1(x), & x \in [t_1, t_2], \\ S_2(x), & x \in [t_2, t_3], \\ \;\vdots \\ S_{n-1}(x), & x \in [t_{n-1}, t_n]. \end{cases} \tag{2} \]

∗ Senthil Balaji: e-mail: [email protected], Phone: +91 427 244 3152. In all correspondence please use the following address: Mr G. Senthil Balaji, S/o S. Girimurugan, 2/129-B, Kumaran Nagar, Alagapuram Kattur Road, Fairlands Post, Salem-636016, Tamilnadu, India


It will be apparent that the spline depicted in Figure 1 contains 11 knots or data points (since the value of the function changes at these points [1]). Finally, we could infer that for n knots, the only possibility is to have n − 1 spline pieces. The spline mentioned above is of degree 1 and possesses the following properties:
i. the domain of S is an interval [a, b];
ii. S is continuous on [a, b];
iii. to the left of a and to the right of b, S is defined to take the same values of output as at a and b respectively.

Fig. 1 (a): Shows a function f(x) and its approximated spline with 11 knots and 10 spline pieces. (b): Shows the geometrical meaning behind the linear interpolating polynomial which is used in the spline function of degree 1 (with labelled points $(t_i, y_i)$, $(t, y)$, $(t_{i+1}, y_{i+1})$ and abscissas $t_i$, $t_{i+1}$).

2 Degree of Splines

A spline of degree k is one whose polynomial pieces are of degree k and all of whose first k − 1 derivatives are continuous [1]. The equation of a first degree spline piece is given by

\[ S_i(x) = y_i + m_i(x - t_i), \tag{3} \]

where

\[ m_i = \frac{y_{i+1} - y_i}{t_{i+1} - t_i}. \tag{4} \]

It is obvious that equations (3) and (4) are similar to equation (1); this makes it clear that the spline pieces in a first degree spline are indeed linear interpolating polynomials. This spline is made use of in the simulation discussed ahead. Depending upon the degree, the spline types include quadratic splines, cubic splines etc. Some special splines include natural splines and B-splines [1]-[2]. With respect to the intended application, taut or tension splines and natural splines are used in practical computation [2].
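Equations (3)-(4) can be evaluated directly; the following sketch (a helper of our own, with clamped end behaviour as in property iii above) shows a degree-1 spline evaluator:

```python
import numpy as np

def linear_spline(t, y, x):
    """Evaluate the degree-1 spline of eqs. (3)-(4):
    S_i(x) = y_i + m_i (x - t_i) on each knot interval [t_i, t_{i+1}].
    Knots t must be strictly increasing; x outside [t_1, t_n] is clamped,
    i.e. the end values are extended as in property iii."""
    t, y, x = map(np.asarray, (t, y, x))
    x = np.clip(x, t[0], t[-1])
    i = np.clip(np.searchsorted(t, x, side="right") - 1, 0, len(t) - 2)
    m = (y[i + 1] - y[i]) / (t[i + 1] - t[i])   # slope m_i of eq. (4)
    return y[i] + m * (x - t[i])                # eq. (3)
```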

3 Simulation

For the evaluation of the spline models' efficiency, MATLAB software is used. A PN-junction diode (under reverse bias) [4] model is created and its output is found to resemble the output of the device with little or


nil perturbation. The practical input and output values are considered for the devices and circuits (placed in the right column amongst the simulation figures) to determine the extent up to which the spline model produces an output akin to the original device or circuit. A half wave rectifier and a full wave rectifier circuit [4], which convert AC (a sinusoidal signal) to DC components by passing the input only during the positive cycle (half wave rectifier) or during both cycles (full wave rectifier), are also modelled using splines which are capable of producing best results.

The input-output data points for the spline interpolation are entered randomly but with relevance to the characteristics of the device or the circuit; for practical applications of the spline models, the data points can be obtained from the laboratory values of the devices or circuits.

For example, the data points of the spline model used for the half wave rectifier circuit are
V = [-50, -25, -10, -5, -4, -3, -2, -1, 0, .1, .5, .7, 1, 1.5, 2, 5, 10, 20, 25], the voltage inputs of a diode in the rectifier, and
i = [0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0.1, .4, .8, 1, 1.5, 2.2, 4, 5, 10], which represent the current outputs of the devices.
The final output of the spline model satisfying all the characteristics of the practical circuit is shown in Figure 3(a).
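Although the paper works in MATLAB, the same degree-1 spline model can be sketched with NumPy's piecewise-linear interpolation; the V and i lists below are the data quoted above, while the 10 V sinusoidal test input is our own assumption for illustration:

```python
import numpy as np

# Diode V-I data points quoted in the text (voltages in, currents out)
V = [-50, -25, -10, -5, -4, -3, -2, -1, 0, .1, .5, .7, 1, 1.5, 2, 5, 10, 20, 25]
I = [0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0.1, .4, .8, 1, 1.5, 2.2, 4, 5, 10]

def rectifier_model(v_in):
    """Degree-1 spline model of the half wave rectifier: np.interp performs
    exactly the piecewise-linear interpolation of eqs. (3)-(4)."""
    return np.interp(v_in, V, I)

t = np.linspace(0, 2 * np.pi, 200)
v = 10 * np.sin(t)            # assumed sinusoidal AC input
out = rectifier_model(v)      # negative half cycle maps to zero current
```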

Fig. 2 (a): Shows the spline model's reverse biased diode output. (b): Shows the practical output of the reverse biased diode.

Fig. 3 (a): Shows the spline model's half wave rectifier output. (b): Shows the practical output of the half wave rectifier circuit.

4 Conclusions

The paper has presented simulations that divulge the efficient performance of the spline based electronic models, and further models such as amplifiers, resistors, filters etc. are being investigated. Thus, the paper has disclosed a notion which, when employed using existing technologies, could make these spline models the "lynch pins" of the electronic industry. Other simulations are being worked out and may be presented at a later time.


Fig. 4 Spline Full Wave Rectifier Model Output

Acknowledgements I sincerely thank the Departments of Mathematics and Electronics of my College for their encouragement and support. My special thanks are to the Heads of the respective departments, professors and staff. My special thanks also go to Mrs. Renuka and the Management of Sona College of Technology.

References

[1] Ward Cheney and David Kincaid, Numerical Mathematics and Computing, Brooks-Cole Publishing, 1994.

[2] Numerical Analysis: Mathematics of Scientific Computing, Third Edition, Thomson Learning Company.

[3] Robert J. Schilling and Sandra L. Harries, Applied Numerical Methods for Engineers Using MATLAB and C, Thomson Learning, Brooks-Cole Publishing.

[4] Robert L. Boylestead, Louis Nashelsky, Electronic Devices and Circuit Theory, Sixth Eastern Economy Edition, Prentice-Hall Private Limited, 1996.

NACoM-2003 Extended Abstracts 24 – 27

Exponentially-fitted Algorithms: Fixed or Frequency Dependent Knot Points?

G. Vanden Berghe∗, M. Van Daele, and H. Vande Vyver

Ghent University, Vakgroep Toegepaste Wiskunde en Informatica, Krijgslaan 281-S9, B-9000 Gent, Belgium

Received 28 February 2003, accepted 21 March 2003

Exponentially-fitted algorithms are constructed for the derivation of Gauss formulae and implicit Runge-Kutta methods of collocation type, making them tuned for oscillatory (or exponential) functions. The weights and the abscissas of these formulae can depend naturally on the frequency ω by the very construction. For two-point Gauss formulae and two-step Runge-Kutta methods a detailed study of the obtained results is made. In particular, the difference in the numerical application of these algorithms with fixed points and/or frequency dependent nodes is analysed.

1 Introduction

Many phenomena exhibit a pronounced oscillatory or exponential character. The theoretical investigation of such phenomena necessarily implies operations on oscillatory or exponential functions, for instance differentiation, quadrature, solving differential equations, etc. In a previous paper [1] Ixaru focussed on the numerical formulae associated with these operations. He showed that a unifying treatment for deriving formulae for differentiation, quadrature and ODEs is available in the form of exponential fitting. In the context of ODEs he only introduced the method for the determination of multistep-like formulae. For the quadrature rules he considered an exponentially fitted extension of the Simpson rule. In a separate paper [2] a first discussion of Gauss quadrature rules for oscillatory integrands is given. In what follows we extend the technique to Runge-Kutta (RK) methods and study in more detail some of the Gauss formulae. The dependence of the abscissas on the frequency will be analysed in detail.

2 Gauss quadrature rules

We consider the integral

\[ I(f) = \int_{-1}^{1} f(x)\,dx, \tag{1} \]

where $f(x)$ shows an oscillatory behaviour, and its computation by means of an $N$-point quadrature formula, i.e.

\[ I(f) \approx h \sum_{k=1}^{N} w_k f(x_k). \tag{2} \]

∗ Corresponding author: e-mail: [email protected], Phone: +32 (9) 264 48 05 Fax: +32 (9) 264 49 95


In the exponentially fitted formalism one introduces the functional

\[ L[f(x); h; \mathbf{a}] = \int_{x-h}^{x+h} f(z)\,dz - h \sum_{k=1}^{N} w_k f(x + x_k h), \tag{3} \]

where $\mathbf{a}$ is the vector of at most $2N$ unknowns, $\mathbf{a} = [w_1, w_2, \dots, w_N, x_1, x_2, \dots, x_N]$, which can all depend on the frequency $\omega$ considered. Under certain circumstances the $x_i$, $i = 1, \dots, N$, can be fixed in advance, reducing the number of unknowns to $N$. To make the direct connection with the stated problem (1) on the interval $[-1, 1]$, finally $x = 0$ and $h = 1$ is taken. Related to this functional are the so-called moments $L_m(h, \mathbf{a})$, i.e. $L[x^m, h, \mathbf{a}]$ at $x = 0$. We consider the hybrid set of functions (see also [1]):

\[ 1, x, x^2, \dots, x^K, \tag{4} \]

\[ \exp(\pm\omega x),\; x \exp(\pm\omega x),\; \dots,\; x^P \exp(\pm\omega x), \tag{5} \]

where $K + 2P = M - 2$ and $M + 1$ represents the number of moments $L_m(h, \mathbf{a})$ which can be made simultaneously zero for a particular solution $\mathbf{a}$. By this definition the reference set is characterized by two integer parameters, $K$ and $P$. The set in which there is no classical component is identified by $K = -1$, while the set in which there is no exponential fitting component is identified by $P = -1$. As explained in [1] one can either consider the set of power functions (4), the exponential fitting set (5), or a hybrid set consisting partially of a polynomial and an exponentially fitted set, and compute $L$ for each of these functions. Finding the best values for the parameters in $\mathbf{a}$ is usually done by imposing that as many terms as possible in the expressions for $L$ vanish. For the set of power functions this gives rise to the following equations, i.e. $L_m(h, \mathbf{a}) = 0$, $m = 0, 1, 2, \dots, M$ (for the details of the derivation we refer to [2]):

\[ \frac{2}{2n-1} - \sum_{k=1}^{N} w_k x_k^{2(n-1)} = 0, \tag{6} \]

\[ \sum_{k=1}^{N} w_k x_k^{2n-1} = 0, \tag{7} \]

for $n = 1, 2, \dots$. If the whole set $\mathbf{a}$ is considered as unknowns, the maximal value of $n$ is $N$. For the exponential fitting set the following system of nonlinear equations occurs:

\[ 2\eta_0(Z) - \sum_{k=1}^{N} w_k\, \xi(x_k^2 Z) = 0, \tag{8} \]

\[ \sum_{k=1}^{N} w_k x_k\, \eta_0(x_k^2 Z) = 0, \tag{9} \]

\[ 2\eta_{n-1}(Z) - \sum_{k=1}^{N} w_k x_k^{2(n-1)}\, \eta_{n-2}(x_k^2 Z) = 0, \tag{10} \]

\[ \sum_{k=1}^{N} w_k x_k^{2n-1}\, \eta_{n-1}(x_k^2 Z) = 0, \tag{11} \]

where $Z = \omega^2 h^2$ and where for the last two equations $n = 2, 3, \dots$. The functions $\xi$ and $\eta_n$, $n = 0, 1, \dots$, and their properties are given in the appendix of [1, 2]. Remark that since we consider oscillatory integrands, $\omega = i\mu$. As shown in [1], the leading term of the error is given by

\[ \text{error} = (-1)^{P+1} \frac{h^{2(P+1)} L_{K+1}(h, \mathbf{a}(Z))}{(K+1)!\, Z^{P+1}}\, D^{K+1} (\mu^2 + D^2)^{P+1} f(x). \tag{12} \]


Let us consider as an example the two-point Gauss rule. There are four parameters involved: $x_1, x_2, w_1$ and $w_2$. From the classical theory [4] one knows that the weights are symmetric (i.e. $w_1 = w_2$) and that the abscissas are antisymmetric (i.e. $x_1 = -x_2$). It can easily be checked that $M = 3$ and that the following $K, P$ pairs emerge:

(a) $P = -1$, $K = 3$ (the classical case with set $1, x, x^2, x^3$).

(b) $P = 0$, $K = 1$ (with the hybrid set $1, x, \exp(\pm\omega x)$, i.e. $1, x, \sin(\mu x)$ and $\cos(\mu x)$).

(c) $P = 1$, $K = -1$ (the full exponential set $\exp(\pm\omega x), x\exp(\pm\omega x)$, i.e. $\sin(\mu x), \cos(\mu x), x\sin(\mu x)$ and $x\cos(\mu x)$).

By solving the appropriate system of equations the following results are obtained for the different classes:

(a) the well-known classical result: $x_2 = -x_1 = \frac{\sqrt{3}}{3}$ and $w_1 = w_2 = 1$;

(b) $x_2 = -x_1 = \frac{1}{\mu}\arccos\!\left(\frac{\sin\mu}{\mu}\right)$ and $w_1 = w_2 = 1$;

(c) $w_1 = w_2 = \frac{\sin\mu}{\mu\cos(x_2\mu)}$, while $x_2$ (with $x_1 = -x_2$) is the solution of the transcendental equation

\[ -\mu\cos\mu\,\cos(\mu x_2) + \sin\mu\,\cos(\mu x_2) + \sin\mu\;\mu x_2\sin(\mu x_2) = 0, \tag{13} \]

and is obviously frequency dependent. On the other hand, it is instructive to investigate what happens when the classical $x_2$ is used instead of the solution of (13).

Analogous results are obtained for three-, four-, etc. point formulae. For a numerical illustration we take $f(x) = \cos((\lambda + 1)x)$, whose integral over $[-1, 1]$ is $2\sin(\lambda + 1)/(\lambda + 1)$. The different numerical methods are applied with $\mu = \lambda$ and the evolution of the error with increasing $\lambda$ is studied.
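The two-point rules (a) and (b) can be compared directly on this test integrand. The sketch below assumes the node formula of case (b) as reconstructed above (the function names are ours); note that for small µ the frequency dependent node recovers the classical value 1/√3:

```python
import numpy as np

def gauss2_classical(f):
    """Two-point Gauss rule, case (a): x2 = -x1 = sqrt(3)/3, w1 = w2 = 1."""
    x2 = np.sqrt(3) / 3
    return f(-x2) + f(x2)

def gauss2_ef(f, mu):
    """Case (b): frequency dependent nodes x2 = -x1 = arccos(sin(mu)/mu)/mu,
    weights still 1 (a sketch of the rule described in the text)."""
    x2 = np.arccos(np.sin(mu) / mu) / mu
    return f(-x2) + f(x2)

lam = 5.0
f = lambda x: np.cos((lam + 1) * x)
exact = 2 * np.sin(lam + 1) / (lam + 1)
err_classical = abs(gauss2_classical(f) - exact)
err_ef = abs(gauss2_ef(f, mu=lam) - exact)   # applied with mu = lambda
```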

3 Implicit Runge-Kutta methods

For the description of exponentially fitted Runge-Kutta (EFRK) methods we use the classical Butcher notation [3, 4]:

\[ y_{n+1} = y_n + h \sum_{i=1}^{s} b_i f(x_n + c_i h, Y_i), \tag{14} \]

\[ Y_i = y_n + h \sum_{j=1}^{s} a_{ij} f(x_n + c_j h, Y_j), \tag{15} \]

with $i = 1, \dots, s$, or in tableau form

\[ \begin{array}{c|cccc}
c_1 & a_{11} & a_{12} & \dots & a_{1s} \\
c_2 & a_{21} & a_{22} & \dots & a_{2s} \\
\vdots & \vdots & \vdots & & \vdots \\
c_s & a_{s1} & a_{s2} & \dots & a_{ss} \\
\hline
 & b_1 & b_2 & \dots & b_s
\end{array} \tag{16} \]

Following Albrecht's approach [4, 5, 6] we observe that each of the $s$ internal stages (15) and the final stage (14) of an RK method is linear, in the sense that a linear multistep method is linear. Nonlinearity


only arises when one substitutes one stage into another. We can regard each of the $s$ stages and the final stage as being a generalized linear multistep method on a nonequidistant grid, and associate with it a linear functional in exactly the same way as has been done by Ixaru [1] for multistep methods, i.e.

\[ L_i[y(x); h; \mathbf{a}, \mathbf{c}] = y(x + c_i h) - y(x) - h \sum_{j=1}^{s} a_{ij}\, y'(x + c_j h), \quad i = 1, 2, \dots, s, \tag{17} \]

and

\[ L[y(x); h; \mathbf{b}, \mathbf{c}] = y(x + h) - y(x) - h \sum_{i=1}^{s} b_i\, y'(x + c_i h). \tag{18} \]

Also here one can either consider the set of power functions (4), the exponential fitting set (5), or a hybrid set consisting partially of a polynomial and an exponentially fitted set, and compute $L$ for each of these functions. Finding the best values for the parameters $\mathbf{a}$, $\mathbf{b}$ and $\mathbf{c}$ is usually done by imposing that as many terms as possible in the expressions for $L$ and $L_i$ vanish.

In the framework of this theory two different classes of EFRK collocation methods are considered: methods with fixed nodes and methods with frequency dependent abscissas. For both cases extensions of the classical two-stage Gauss, RadauIIA and LobattoIIIA methods have been constructed. Numerical examples reveal important differences between the two approaches.

References

[1] L. Gr. Ixaru, Operations on oscillatory functions, Comput. Phys. Commun. 105, 1–19 (1997).

[2] L. Gr. Ixaru, B. Paternoster, A Gauss quadrature rule for oscillatory integrands, Comput. Phys. Commun. 133, 177–188 (2001).

[3] J. C. Butcher, The Numerical Analysis of Ordinary Differential Equations, (John Wiley & Sons, Chichester, New York, Brisbane, Toronto, Singapore, 1987).

[4] J. D. Lambert, Numerical Methods for Ordinary Differential Equations, (Wiley, New York, 1991).

[5] P. Albrecht, The extension of the theory of A-methods to RK methods, in: K. Strehmel, ed., Numerical Treatment of Differential Equations, Proc. 4th Seminar NUMDIFF-4, Teubner-Texte zur Mathematik (Teubner, Leipzig, 1987) 8–18.

[6] P. Albrecht, A new theoretical approach to RK methods, SIAM J. Numer. Anal. 24, 391–406 (1987).

NACoM-2003 Extended Abstracts 28 – 31

Characteristic Directions Technique for the Scalar One-dimensional Non-linear Advection Equation with Non-convex Flow Function

D. N. Bokov∗

Russian Federal Nuclear Center - VNIITF, 456770, Snezhinsk, Chelyabinsk reg., P.O. Box 245, Russia

Received 28 February 2003, accepted 21 March 2003

A concept of the characteristic technique used to obtain a generalized solution of the scalar one-dimensional nonlinear advection equation with a non-convex flow function is presented. Two meshes, characteristic and Eulerian, are used to obtain the numerical solution. The characteristic mesh is adaptive both to the properties of the initial distribution function and to the properties of the boundary condition function. This allows: development of the algorithm for obtaining a numerical solution on the characteristic mesh using the properties of the solution of the nonlinear advection equation in smooth regions; and reproduction of the spatial location and solution value at the discontinuity points and extreme points to the accuracy determined by the interpolation and approximation of the initial values and boundary condition functions. For the non-convex flow function, algorithms are proposed for the definition of the sequence of Riemann problems (strong discontinuities) and for their solution. Refined expressions are derived for the velocity of a strong non-stationary discontinuity. Construction of the solution satisfying the integral conservation law for the non-convex flow function is presented.

1 Solution of Nonlinear Advection Equation

The pressing problem of finding new approaches to the numerical solution of the advection equation is noted in paper [1]. In the present paper, an algorithm for the numerical solution of the boundary-value Cauchy problem is described for the nonlinear advection equation:

\[ \frac{\partial u}{\partial t} + \frac{\partial F(u)}{\partial x} = 0, \quad t \in (0, T], \; x \in [0, \infty), \tag{1} \]

\[ u\big|_{t=0} = u_0(x), \qquad u(t, 0) = f(t), \qquad \frac{\partial F(u)}{\partial u}\bigg|_{t=0} \geq 0. \]

Here the function $F(u)$ is twice continuously differentiable in $u$. A piecewise-smooth function $u(t, x)$ is a generalized solution to (1) if and only if $u(t, x)$ satisfies the equation in the classical sense in the vicinity of each smooth point, and at each discontinuity line the Hugoniot condition is satisfied [2]:

\[ \frac{dx}{dt} = \frac{F(u^+) - F(u^-)}{u^+ - u^-}. \tag{2} \]

Here $x = x(t)$ is the equation of the discontinuity line, and $u^-$ and $u^+$ are the one-sided limits of the function $u(t, x)$ at the discontinuity line from the left- and right-hand sides along the $x$-axis, respectively. For the smooth regions of the

∗ Corresponding author: e-mail: [email protected], Phone: +7 35172 547 30, Fax: +7 35172 551 18


solution, (1) can be written in the form:

\[ \frac{\partial u}{\partial t} + c(u) \frac{\partial u}{\partial x} = 0, \qquad c(u) = \frac{\partial F(u)}{\partial u}. \tag{3} \]

A solution of (3) can be considered as a system of two ordinary differential equations,

\[ \frac{dx}{dt} = c(u), \qquad \frac{du}{dt} = 0, \tag{4} \]

with the initial conditions at $t = t_0$: $x = x_0$, $x_0 \in [0, \infty)$, $u_0 = u(x_0)$. This system has the exact solution $u(x, t) \equiv u(x_0, t_0)$ along the line $x = x_0 + c(t - t_0)$, where $c = \mathrm{const}$ along $dx = c\,dt$ (the characteristic line of (3)) [3].

2 Numerical Approximation

The function of the initial distribution $u_0(x)$ in the smooth region is approximated to a given accuracy $\varepsilon$ by a piecewise-linear function whose nodes contain the extreme points, the boundary point and the points of discontinuities. A discontinuity is described by two successive points with the same spatial coordinate carrying the values of the one-sided limits. Such an approximation can be obtained by different methods. All nodes, arranged in increasing order of the argument $x$, form the characteristic mesh. The function of the boundary condition $f(t)$ is approximated by a piecewise-linear function of the argument $t$ with the same conditions satisfied. The nodes of its approximation form a set of values which are the points of the time mesh.

3 Calculation of Strong Discontinuity

The classification of solutions at a strong discontinuity $[u^-, u^+]$ for a flow function $F(u)$ whose curvature $F_{uu}(u)$ is of constant sign is described in [4]. A strong discontinuity is stable if $c^- \geq c^+$ and unstable if $c^- < c^+$, with the formation of a centered rarefaction wave. In this case, between the fronts of the discontinuity $x_i^n$ and $x_{i+1}^n$, $S$ characteristic points are added, with coordinates $x_l^n = x_{l+1}^n = \dots = x_{l+S}^n = x_{l+S+1}^n$ for the centered rarefaction wave. The values at these $S$ intermediate points must change monotonically from $u_i$ to $u_{i+1}$ according to the combination of the initial approximations of $F(u)$ and $c(u)$. In the case of $F_{uu}(u)$ with alternating signs we get the Riemann problem of arbitrary discontinuity decay. Let $u_1, \dots, u_m$ be the zeroes of the equation $F_{uu}(u) = 0$ in the interval $(u^-, u^+)$. We arrange only those of them which are extreme points for $c(u)$:

\[ u^- > u_1 > \dots > u_m > u^+ \quad \text{or} \quad u^- < u_1 < \dots < u_m < u^+. \]

These zeros divide the interval (u−, u+) into regions of constant sign of F_uu(u), which alternate. Thus the arbitrary discontinuity disintegrates into a series of alternating stable strong discontinuities and centered rarefaction waves. Following [4], each of the stable strong discontinuities [u_i, u_{i+1}] ⊂ [u−, u+] can be represented as the segment joining the points (u_i, F(u_i)) and (u_{i+1}, F(u_{i+1})). To obtain the unique solution, each segment is transferred to a tangent of the graph of F(u). If two tangent segments cross, the regions of the two stable strong discontinuities are united into the region of a new stable strong discontinuity and a new tangent is formed. All strong discontinuities are considered in sequence. As a result we obtain a succession of centered rarefaction waves and strong discontinuities, joined so that the tangent condition is fulfilled. On the characteristic corresponding to the tangent condition, the velocity of the characteristic itself equals the velocity of the strong discontinuity.


4 Constructing a Numerical Solution

The transition from time level "n" to "n + 1" proceeds as follows: t^{n+1} = t^n + τ. The new spatial coordinates of the points of the characteristic mesh are x_i^{n+1} = x_i^n + τ c_i. If t^n = t_k (the current time coincides with a point of the boundary condition approximation), then an additional point of the characteristic mesh is added from the boundary: x_1^{n+1} = x_0^n + τ c_0. The values of u remain: u_i^{n+1} = u_i^n. The solution at the boundary is u_0^{n+1} = f(t^{n+1}). The values of u^{n+1} at the Eulerian points x_j are obtained by linear interpolation (the AINT operator) over the points of the characteristic mesh, augmented with the left boundary point:

u_j^{n+1} = AINT[ x_j | (u_i^{n+1}, x_i^{n+1}), (u_{i+1}^{n+1}, x_{i+1}^{n+1}) ],   x_i^{n+1} ≤ x_j ≤ x_{i+1}^{n+1},   j = 0, ..., J.
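In code, one such transition step plus the AINT interpolation might look as follows. This is a simplified sketch of ours that ignores boundary insertion and characteristic crossings.

```python
# Sketch of one time step: move characteristic nodes, then linearly
# interpolate u onto Eulerian points (the AINT operator as we read it).
import bisect

def step_and_interpolate(xs, us, cs, tau, x_euler):
    """Advance nodes xs with speeds cs over tau, then interpolate onto x_euler."""
    xn = [x + tau * c for x, c in zip(xs, cs)]
    out = []
    for xj in x_euler:
        # find the mesh interval [xn[i], xn[i+1]] containing xj
        i = max(0, min(len(xn) - 2, bisect.bisect_right(xn, xj) - 1))
        w = (xj - xn[i]) / (xn[i + 1] - xn[i])
        out.append((1 - w) * us[i] + w * us[i + 1])
    return xn, out

xn, uj = step_and_interpolate([0.0, 1.0], [0.0, 1.0], [1.0, 1.0], 0.5, [1.0])
print(uj)  # profile shifted right by 0.5, so u(1.0) = 0.5
```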

If at time t^{n+1} two characteristics x_l and x_{l+1} cross, then for these points the following condition is satisfied:

(x_{l+1}^n − x_l^n)(x_{l+1}^{n+1} − x_l^{n+1}) ≤ 0,   c_l ≥ c_{l+1}.

The time of crossing of the characteristics is

τ* = (x_{l+1}^n − x_l^n) / (c_l − c_{l+1}).

The values u_l and u_{l+1} form a stable (c_l ≥ c_{l+1}) strong discontinuity. The spatial position of the discontinuity at time t^{n+1} is calculated using condition (2):

D = (F(u_{l+1}) − F(u_l)) / (u_{l+1} − u_l),   x_l* = x_{l+1}* = D(τ − τ*) + x*.   (5)
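For a concrete feel of the crossing time and the discontinuity speed in (5), here is a small numeric illustration of ours with Burgers' flux F(u) = u²/2, for which c(u) = F_u(u) = u.

```python
# Illustration: crossing time tau* of two characteristics and the
# discontinuity speed D = [F]/[u], for Burgers' flux F(u) = u^2/2.

F = lambda u: 0.5 * u * u

def crossing_time(xl, xr, cl, cr):
    """Time at which characteristics from xl, xr (speeds cl > cr) meet."""
    return (xr - xl) / (cl - cr)

def shock_speed(ul, ur):
    """Rankine-Hugoniot speed D = (F(ur) - F(ul)) / (ur - ul)."""
    return (F(ur) - F(ul)) / (ur - ul)

tau_star = crossing_time(0.0, 1.0, 1.0, 0.0)  # nodes at 0 and 1, u = 1 and 0
D = shock_speed(1.0, 0.0)
print(tau_star, D)  # 1.0 0.5
```

As expected for this Riemann-type configuration, the shock forms at τ* = 1 and then travels at the average speed D = 1/2.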

The value of the solution at the discontinuity at t^{n+1} is obtained as follows:

u_l* = AINT[ x_l* | (u_{l−1}^n, x_{l−1}^{n+1}), (u_l^n, x_l^{n+1}) ],   x_{l−1}^{n+1} ≤ x_l* ≤ x_l^{n+1},   (6)

u_{l+1}* = AINT[ x_{l+1}* | (u_{l+1}^n, x_{l+1}^{n+1}), (u_{l+2}^n, x_{l+2}^{n+1}) ],   x_{l+1}^{n+1} ≤ x_{l+1}* ≤ x_{l+2}^{n+1}.   (7)

Then the points of the characteristic mesh are corrected:

x_l^{n+1} = x_l*;   u_l^{n+1} = u_l*;   x_{l+1}^{n+1} = x_{l+1}*;   u_{l+1}^{n+1} = u_{l+1}*.

If x_{l−1}^{n+1} = x_l*, then, after obtaining the solution at the stable strong discontinuity, the point x_l^{n+1} is removed from the characteristic mesh. If x_{l−1}^{n+1} > x_l*, that is, condition (6) is violated, then the time step τ is corrected so that the characteristic x_{l−1} crosses the strong discontinuity at t^{n+1}, that is

τ* = (x_l* − x_{l−1}^n) / (c_{l−1} − D).

Condition (7) is treated similarly. If τ* = 0, that is, at t^n the stable strong discontinuity already exists, then its trajectory is defined by the equation

x_l^{n+1} = x_{l+1}^{n+1} = D_v τ + x_{l+1}^n.

To obtain the velocity D_v of the discontinuity it is necessary to evaluate the line contour integral

∮ [ F(u(x, t)) dt − u(x, t) dx ] = 0


along the closed contour formed by the line t = t^n and the projections of the characteristics coming to the point (x_{l+1}^{n+1}, t^{n+1}) [2]. We obtain the following expression:

D_v = [ 2F(u_{l+1}^{n+1}) − F(u_l^{n+1}) − F_u(u_{l+1}^{n+1})(u_{l+1}^{n+1} − u_{l+1}^n) − F_u(u_l^{n+1})(u_l^{n+1} − u_l^n) ] / [ (u_{l+1}^{n+1} + u_{l+1}^n) − (u_l^{n+1} + u_l^n) ].

This expression coincides with the Hugoniot condition both in the case of a steady discontinuity (u^{n+1} = u^n) and in the limit τ → 0. To obtain D_v, an iteration process using (6), (7) with the initial value (5) is built. The calculation of a stable strong discontinuity for the boundary condition is similar. After the time of discontinuity formation has been reached, the stable strong discontinuity is inserted into the spatial characteristic mesh.

In the case of the linear advection equation F(u) = cu, c = const, the velocities of the characteristics and of the discontinuities are the same and equal to c. All discontinuities are stable. The values at the nodes of the piecewise-linear function are moved along the x-axis unchanged, so we obtain the exact solution u_i on the characteristic mesh. The solution u_j on the Eulerian mesh is obtained to interpolation accuracy.

5 Conclusions

The proposed technique offers approaches to the following problems:

• condensation of features, i.e. when several discontinuities of different types exist at one spatial point;
• preservation of the extremes of the distribution function of the solution;
• preservation of the asymptotic behavior of the solution;
• grid adaptation according to the properties of the solution.

The numerical solution obtained by the proposed technique has the following properties:

• it is independent of the different scales of the spatial grid;
• the time step is defined by the interaction of solution elements and does not depend on the spatial grid;
• there is no loss of resolution, i.e. the situation in which the spatial extent of a solution element is smaller than the spatial step of the grid is impossible;
• for a linear flow function, the technique gives the exact solution on the characteristic grid;
• for a nonlinear flow function, the accuracy of the solution is determined by the accuracy of the approximation of the initial distribution of the solution;
• the integral conservation law on the extended grid is satisfied.

References

[1] V. M. Goloviznin, A. A. Samarsky, Difference approximation of convective transport with spatial splitting of the time derivative, Mathematical Modelling, 1998, Vol. 10, No. 1, pp. 86–100.

[2] O. A. Oleinik, On a Cauchy problem for nonlinear equations in a class of discontinuous functions, Dokl. Akad. Nauk SSSR, 1954, Vol. 95, No. 3, pp. 451–454.

[3] G. B. Whitham, Linear and Nonlinear Waves, John Wiley & Sons, Inc., 1974.

[4] B. L. Rozhdestvensky, N. N. Yanenko, Systems of Quasilinear Equations and Their Applications to Gas Dynamics, Moscow, Nauka, 1978.


NACoM-2003 Extended Abstracts 32 – 35

Application of Least Squares Finite Element Method to Acoustic Scattering and Comparison with other Numerical Techniques

Carlos E. Cadenas ∗1 and Vianey Villamizar∗∗2

1 Universidad de Carabobo, FACYT, Departamento de Matemáticas, Valencia, Venezuela
2 Department of Mathematics, Brigham Young University, Provo, Utah 84602, USA

Received 28 February 2003, accepted 21 March 2003

A novel implementation of the least squares finite element method (LSFEM) for acoustic scattering problems is devised. First, the boundary value problem is written as a first order system of differential equations and the radiation condition is incorporated into the least squares finite element variational formulation. It is shown that the order of convergence to the exact solution is h². For comparison purposes, a mixed Galerkin finite element method and an implicit finite difference method are also applied to the acoustic scattering problem. A study of the order of convergence and the dispersion errors for the different methods is performed.

1 Introduction

We have chosen a simple one-dimensional problem to illustrate our implementation of the LSFEM for scattering problems and to compare with other well-known numerical methods. An incident plane pressure wave, p_inc(x, t) = e^{ikx} e^{−iωt} = p_inc(x) e^{−iωt}, where k is the wavenumber and ω represents the frequency, is scattered from an infinite rigid wall located at x = 0. After decomposing the pressure as p(x) = p_inc(x) + p_sc(x), the boundary value problem satisfied by the scattered pressure p_sc(x) reduces to

p_sc'' + k² p_sc = 0,   0 < x < 1,   (1)

p_sc'(0) = ik,   (2)

dp_sc/dx (1) − ik p_sc(1) = 0.   (3)

Equation (3) is the well-known Sommerfeld condition. It is usually applied at infinity for unbounded domains; for 1-D problems, however, it is satisfied exactly anywhere on the x-axis. The solution of the boundary value problem (1)–(3) is the stationary wave p_sc(x) = e^{ikx}.
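A quick numerical check of ours that p_sc(x) = e^{ikx} indeed satisfies (1)–(3), here with k = 9:

```python
# Verify that p(x) = exp(ikx) satisfies the Helmholtz equation and the
# boundary conditions (2)-(3) of the scattering problem.
import cmath

k = 9.0
p = lambda x: cmath.exp(1j * k * x)
dp = lambda x: 1j * k * cmath.exp(1j * k * x)          # p'
ddp = lambda x: -(k ** 2) * cmath.exp(1j * k * x)      # p''

print(abs(ddp(0.3) + k ** 2 * p(0.3)))   # ≈ 0 (Helmholtz equation (1))
print(abs(dp(0.0) - 1j * k))             # ≈ 0 (boundary condition (2))
print(abs(dp(1.0) - 1j * k * p(1.0)))    # ≈ 0 (Sommerfeld condition (3))
```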

The LSFEM and the two other numerical methods to be described later are not implemented directly on the above boundary value problem. Instead, the scalar Helmholtz equation (1) is reduced to a first order system of equations, and the numerical methods are implemented on the boundary value problem which has this first order linear system as its governing equations. In fact, by introducing the new dependent variable z = p', equation (1) is transformed into the first order system of differential equations

Lu = (A1 d/dx + A0) u = 0,   (4)

∗ Corresponding author: e-mail: [email protected], Phone: +58 416 647 8397, Fax: +58 241 867 7634∗∗ Corresponding author: e-mail: [email protected], Phone: +801 422 1754, Fax: +801 422 0504

© 2003 WILEY-VCH Verlag GmbH & Co. KGaA, Weinheim


where

A0 = | 1  0  |,   A1 = | 0  −1 |,   u = | z |,   and   0 = | 0 |.   (5)
     | 0  k² |         | 1   0 |        | p |              | 0 |

Also, the boundary conditions (2)-(3) in terms of the new variables are transformed into

z(0) = ik, and z(1) = ikp(1). (6)

2 Formulation of the LSFEM

Consider the boundary value problem defined by Au = f in a bounded domain Ω ⊂ R^n (n = 1, 2, or 3) with a piecewise smooth boundary Γ, subject to the boundary conditions Bu = g on Γ, where A is a linear first order operator, B is a boundary algebraic operator, f is a vector complex-valued function in L²(Ω), and g is a vector complex-valued function in L²(Γ). Let V be an appropriate subspace of L²(Ω) such that A maps V into L²(Ω). The LSFEM consists of finding a function u ∈ V [2] that minimizes the quadratic functional

I(u) = (1/2)‖Au − f‖² + (1/2)‖Bu − g‖²_Γ = (1/2)(Au − f, Au − f) + (1/2)⟨Bu − g, Bu − g⟩_Γ.

A necessary condition for the existence of such a u ∈ V is that

lim_{t→0} dI/dt (u + tv) = 0.

It can easily be proved that

lim_{t→0} dI/dt (u + tv) = (1/2)[ (Au, Av) + (Av, Au) − (f, Av) − (Av, f) ]
                         + (1/2)[ ⟨Bu, Bv⟩_Γ + ⟨Bv, Bu⟩_Γ − ⟨g, Bv⟩_Γ − ⟨Bv, g⟩_Γ ] = 0.   (7)

Using algebraic properties of complex numbers and the definition of the inner product in L²(Ω) for vector complex-valued functions, (7) leads to

Re(Au, Av) + Re⟨Bu, Bv⟩_Γ = Re(f, Av) + Re⟨g, Bv⟩_Γ.   (8)

For our particular boundary value problem,

A = A1 d/dx + A0,   B = [ 1  −ik ],   f = (0, 0)^T,   and   g = 0,   (9)

where A1 and A0 are defined according to equations (5).

3 Description of the Numerical Methods

For comparison purposes, the solution of the boundary value problem defined by (4) and (6) is approximated by an implicit finite difference method (IFDM), a mixed Galerkin finite element method (MFEM), and the LSFEM formulated in the previous section. In all cases a uniform partition into n subintervals with stepsize h is employed. All these methods lead to linear systems of equations. Except for a few equations (corresponding to the boundary conditions), each equation in these methods has one of the two forms

B z_{j−1} + 2A z_j + B z_{j+1} + D p_{j−1} − D p_{j+1} = 0,   j = 1, ..., n − 1,   (10)

−D z_{j−1} + D z_{j+1} + F p_{j−1} + 2E p_j + F p_{j+1} = 0,   j = 1, ..., n − 1,   (11)


Table 1 Coefficients

Method   A            B            D             E              F
IFDM     h            0            1             hk²            0
MFEM     h/3          h/6          1/2           hk²/3          hk²/6
LSFEM    h/3 + 1/h    h/6 − 1/h    (1 + k²)/2    hk⁴/3 + 1/h    hk⁴/6 − 1/h

where the coefficients A, B, D, E, and F for the different methods are defined in Table 1. The IFDM difference equations were generated using centered difference approximations for the derivatives, while for the MFEM and the LSFEM we employed piecewise linear basis functions in V. For the IFDM and the MFEM, linear systems of algebraic equations of order 2n were obtained; for the LSFEM the system obtained was of order 4n + 2. The reason for this increase in the number of equations is the need to consider both the real and the imaginary parts of the unknown variables z and p in equation (8).

The LSFEM stiffness matrix is given by K = [ C  D ; D^t  C ], where C is a banded symmetric matrix of order 2n + 1 and D has only two non-null entries, d_{2n,2n+1} = k and d_{2n+1,2n} = −k.

4 Numerical Results

4.1 Orders of Convergence

We performed numerical experiments and computed the order of convergence for the different methods analyzed in this work. This was accomplished by computing the error in the L² norm between the exact solution and the numerical solution for different values of h. More specifically, graphing −log(error) against −log(h) shows that they are linearly related, as can be seen in Fig. 1 for k varying between 1 and 9. Clearly, the slope of these lines corresponds to the order of convergence. One of the lines corresponds to the pressure field and the other to the secondary variable z. In Table 2 we have summarized the resulting orders of convergence for the different methods. All of them are of order two in the pressure p and in the secondary variable z, except the MFEM, which is only of order one in z.

Table 2 Orders of convergence in the variables z and p

        IFDM                   MFEM                   LSFEM
k   Order(p)  Order(z)     Order(p)  Order(z)     Order(p)  Order(z)
1   1.9959    2.0024       2.0002    1.0000       2.0638    2.0347
5   1.9932    1.9882       1.9989    0.9983       1.9977    1.9984
9   1.9769    1.9886       1.9993    1.0100       1.9541    1.9225

4.2 Numerical Dispersion

Fig. 1 Order of convergence for LSFEM: −log(error) versus −log(h), one panel for each k = 1, ..., 9.

It is well known [1] that discrete solutions of the Helmholtz equation are dispersive although exact solutions are not. This means that there is a numerical wavenumber k̃ different from the continuous wavenumber k. In this section we summarize our results on the phase velocity differences between k̃ and k. By appropriately combining the difference equations (10) and (11), we obtained fourth order algebraic equations for a parameter λ = |λ|e^{ik̃h}, with coefficients in terms of kh. Once these algebraic equations were solved, we were able to obtain the following expansions in powers of kh of the relative error between k̃ and k for the methods IFDM and MFEM, respectively.

E_{r,k̃} = (1/6)(kh)² + (3/40)(kh)⁴ + O((kh)⁶),   (12)

E_{r,k̃} = −(5/900)(kh)⁴ − (5291/7999992)(kh)⁶ + O((kh)⁸).   (13)

In Table 3 we list values of k̃ for k = 9 and different values of the stepsize h. The differences between the two wavenumbers are clearly visible for relatively large values of h. The practical importance of these results is that they provide a measure of the phase error for large wavenumbers k when the above numerical methods are applied.
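Expansion (12) is easy to check numerically. Substituting a discrete plane wave into (10)–(11) with the IFDM coefficients of Table 1 yields the dispersion relation sin(k̃h) = kh (our derivation, not spelled out in the abstract), so k̃ = arcsin(kh)/h:

```python
# Check of expansion (12) for the centered-difference scheme, assuming the
# dispersion relation sin(kt*h) = k*h, i.e. kt = arcsin(k*h)/h.
import math

k, h = 9.0, 0.01
kt = math.asin(k * h) / h
rel_err = (kt - k) / k
series = (k * h) ** 2 / 6 + 3 * (k * h) ** 4 / 40
print(kt)                      # ≈ 9.0121945, the IFDM entry for h = 0.01
print(abs(rel_err - series))   # residual is O((kh)^6)
```

The computed k̃ reproduces the IFDM column of Table 3, and the relative error matches the two-term series to the expected order.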

Table 3 Values of k̃ for k = 9

h        IFDM         MFEM         LSFEM
0.1      11.1976951   9.03689143   4.69021555
0.01     9.01219450   9.00000328   8.80553538
0.001    9.00012150   9.00000000   8.99807930
0.0001   9.00000122   9.00000000   8.99998079

References

[1] A. Deraemaeker, I. Babuška and Ph. Bouillard, Dispersion and Pollution of the FEM Solution for the Helmholtz Equation in One, Two and Three Dimensions, International Journal for Numerical Methods in Engineering, Vol. 46, 471–499 (1999).

[2] B. N. Jiang, The Least-Squares Finite Element Method: Theory and Applications in Computational Fluid Dynamics and Electromagnetics, Scientific Computation, Springer, New York, 418 pp., 1998.


NACoM-2003 Extended Abstracts 36 – 37

Subdivision Strategies in Adaptive Integration Algorithms

Ronald Cools∗

Dept. of Computer Science, K.U.Leuven, Celestijnenlaan 200A, B-3001 Heverlee, Belgium

Received 28 February 2003, accepted 21 March 2003

In this extended abstract we give an overview of subdivision strategies that are nowadays used as part ofglobal adaptive integration algorithms.

1 Introduction

The goal of quadrature (and cubature) software is to produce estimates for (multiple) integrals. Mostsoftware available nowadays is adaptive, i.e. the software selects at run-time the points where the integrandis evaluated based on the behavior of the integrand. It is usually globally adaptive, which means that ateach step the region with the largest absolute error estimate is selected for further processing. A high leveldescription of the classical globally adaptive algorithm is given in Algorithm 1.

Algorithm 1: A globally adaptive quadrature/cubature algorithm.

Initialize the collection of regions with the given regions;
Produce approximations for the integral on each given region, and estimate their errors;
Compute the global approximation Q and global error estimate E;
while E > ε (the requested accuracy) do
begin
  Take the subregion with largest absolute error from the collection;
  Process this region;
  Update Q and E;
  Update the region collection;
end
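Algorithm 1 can be sketched in a few lines of Python. This is our illustration, not software from the paper: Simpson's rule as the formula, the difference with two half-interval rules as error estimator, and a heap (with negated errors) so the region with the largest absolute error is processed first.

```python
# Minimal globally adaptive quadrature in 1-D, in the shape of Algorithm 1.
import heapq

def simpson(f, a, b):
    return (b - a) / 6 * (f(a) + 4 * f(0.5 * (a + b)) + f(b))

def adaptive_quad(f, a, b, eps=1e-10):
    def region(a, b):
        q = simpson(f, a, b)
        m = 0.5 * (a + b)
        q2 = simpson(f, a, m) + simpson(f, m, b)
        return (-abs(q2 - q), a, b, q2)        # negated error, refined value
    heap = [region(a, b)]
    while sum(-e for e, *_ in heap) > eps:     # global error estimate E
        e, a1, b1, _ = heapq.heappop(heap)     # largest absolute error first
        m = 0.5 * (a1 + b1)
        heapq.heappush(heap, region(a1, m))
        heapq.heappush(heap, region(m, b1))
    return sum(q for *_, q in heap)            # global approximation Q

import math
result = adaptive_quad(math.sin, 0.0, math.pi)
print(result)  # ≈ 2.0
```

Here "Process this region" is the fixed divide-then-approximate scheme the text describes; the alternative region processors of [1] would replace exactly that step.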

In most of the published algorithms the statement 'Process this region' is implemented as two independent parts that are executed sequentially: first divide, then compute new approximations for each region using an a priori chosen quadrature (cubature) formula and error estimator. In [1] we described alternative region processors:

1. a region processor that is allowed to decide whether a region should be subdivided or not;

2. a region processor that decides which subdivision to choose.

In this paper we give an overview of the different subdivision strategies that one nowadays encounters in software. The following section gives a preview.

∗ Corresponding author: e-mail: [email protected], Phone: +32 16 32 75 62, Fax: +32 16 32 79 96


2 Possible subdivision strategies for different regions

Most of the time, the subdivision strategy used in an adaptive integration routine divides the given region into a fixed number of similar regions. Some recent implementations do not fix this number of subregions in advance. In this section we only enumerate the strategies.

In the adaptive routines that are widely available for integration over a finite interval, the region processor starts by bisecting the interval into two equal parts. In the literature one can find only a few articles that investigate irregular subdivisions or subdivisions into more than two parts.

Since the introduction of HALF [7] it has been common to divide an n-cube or hyper-rectangle into 2 equal halves based on the direction with the largest fourth divided difference. It is assumed that a division into 2 is more adaptive than a division into 2^n regions congruent with the given hyper-rectangle. Besides, if n is large, a 2^n-division might be too expensive.
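As an illustration of ours (not the actual HALF code), the direction choice can be sketched with a centred fourth difference of the integrand along each coordinate axis through the region's centre:

```python
# Sketch: pick the bisection axis of an n-rectangle [lo, hi] as the one with
# the largest centred fourth difference of f at the region's centre.

def split_direction(f, lo, hi):
    """Return the axis whose centred fourth difference of f is largest."""
    n = len(lo)
    centre = [(a + b) / 2 for a, b in zip(lo, hi)]
    best_axis, best_val = 0, -1.0
    for d in range(n):
        h = (hi[d] - lo[d]) / 4
        vals = []
        for s in (-2, -1, 0, 1, 2):
            x = list(centre)
            x[d] += s * h
            vals.append(f(x))
        # centred fourth difference: f-2 - 4 f-1 + 6 f0 - 4 f1 + f2
        diff4 = abs(vals[0] - 4 * vals[1] + 6 * vals[2] - 4 * vals[3] + vals[4])
        if diff4 > best_val:
            best_axis, best_val = d, diff4
    return best_axis

# f curves like x0**4 in direction 0 and is linear in direction 1:
print(split_direction(lambda x: x[0] ** 4 + x[1], [0, 0], [1, 1]))  # 0
```

The linear direction has a vanishing fourth difference, so the routine bisects along the direction where the integrand is hardest to approximate, which is the intent of the strategy.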

In [4] it was suggested that the region processor decide not just on the subdivision direction but also on the number of subregions, 2^i with 1 ≤ i ≤ n. This strategy is nowadays implemented in software for 2 and 3 dimensions [2, 3].

Many implementations of Algorithm 1 are available for triangles. Until recently it was common to divide a triangle into 4 congruent triangles; similarly, for the n-simplex it was common to divide a simplex into 2^n simplices. In [5] a strategy was introduced to divide an n-simplex into 2. This strategy is generalized in [6] and implemented as part of CUBPACK [2]. Once a subregion has been selected for subdivision, the globally adaptive algorithm used by CUBPACK will recommend a subdivision into at most 2, 3 or 4 pieces, depending on the current progress of the integration. The subdivision procedure then divides the subregion by cutting one, two or three edges of the selected subregion to produce a 2-division, 3-division or 4-division, respectively.

If a circle appears naturally as the integration region in a practical problem, it is often not a good idea to transform it into a square; it might help to exploit the circular symmetry of the original problem. In Cubpack++ [3] a circle is divided into 5 subregions (1 circle and 4 polar rectangles) or 4 subregions (all polar rectangles), all with the same area.

In Cubpack++ a plane is dissected into a circle and its exterior. The radius of the circle is chosen such that the circle and its exterior give approximately equal contributions to the integral. The exterior of the circle is then mapped to a circle.

References

[1] R. Cools and A. Haegemans, CUBPACK: Progress report, in Numerical Integration – Recent Developments, Software and Applications, T. Espelid and A. Genz, Eds., NATO ASI Series C: Math. and Phys. Sciences, pp. 305–315 (1992).

[2] R. Cools and A. Haegemans, Algorithm 82x: CUBPACK: a package for automatic cubature; framework description, ACM Trans. Math. Software, to appear (2003).

[3] R. Cools, D. Laurie and L. Pluym, Algorithm 764: Cubpack++: A C++ package for automatic two-dimensional cubature, ACM Trans. Math. Software 23, pp. 1–15 (1997).

[4] R. Cools and B. Maerten, A hybrid subdivision strategy for adaptive integration routines, J. of Universal Computer Science 4, pp. 485–499 (1998).

[5] A. Genz, An adaptive numerical integration algorithm for simplices, in Computing in the 90s – Proceedings of the First Great Lakes Computer Science Conference, N. Sherwani, E. de Doncker, and J. Kapenga, Eds., Lecture Notes in Computer Science, vol. 507, Springer-Verlag, New York, pp. 279–292 (1991).

[6] A. Genz and R. Cools, An adaptive numerical cubature algorithm for simplices, ACM Trans. Math. Software, to appear (2003).

[7] P. Van Dooren and L. De Ridder, An adaptive algorithm for numerical integration over an n-dimensional cube, J. Comput. Appl. Math. 2, pp. 207–217 (1976).


NACoM-2003 Extended Abstracts 38 – 41

Constraint Reasoning with Differential Equations

Jorge Cruz∗1 and Pedro Barahona∗∗1

1 Centro de Inteligencia Artificial, Departamento de Informatica, Faculdade de Ciencias e Tecnologia daUniversidade Nova de Lisboa, 2829-516 Caparica, Portugal

Received 28 February 2003, accepted 21 March 2003

System dynamics is naturally expressed by means of differential equations. Despite their expressive power, they are difficult to reason about and to make decisions with, given their non-linearity and the important effects that uncertainty in the data may cause. In contrast with traditional numerical simulations, which may only provide a likelihood for the results obtained, we propose a constraint reasoning framework that enables safe decision support despite data uncertainty, and we illustrate the approach in the tuning of drug design.

1 Introduction

Parametric differential equations are a general and expressive mathematical means to model system dynamics. Notwithstanding their expressive power, reasoning with such models may be quite difficult, given their complexity. Analytical solutions are available only for the simplest models. The alternative, numerical simulation, requires precise numerical values for the parameters involved, often impossible to gather given the uncertainty of the available data.

To overcome this limitation (given non-linearity, small differences in input values may cause important differences in the output produced), Monte Carlo methods rely on a large number of simulations to estimate the likelihood of the options under study. However, they cannot provide safe conclusions, given the various sources of error accumulated in the simulations (both input and round-off errors).

In contrast, constraint reasoning models the uncertainty of numerical variables within intervals of real numbers and propagates them through a network of constraints on these variables, to decrease the underlying uncertainty (i.e. the width of the intervals). To be effective, these methods rely on advanced techniques to constrain uncertainty sufficiently for safe decisions to be possible.

Interval analysis techniques (e.g. the interval Newton method [8]) provide efficient and safe methods for solving Continuous Constraint Satisfaction Problems [6] (CCSPs), in which real variables are constrained by equalities and inequalities. These methods prune the variable domains to impose local consistency [1], guaranteed to lose no solutions (value combinations satisfying all constraints). The results obtained may be further improved with search techniques that impose stronger consistency requirements such as global hull consistency [3, 4].
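As a minimal 1-D illustration of ours of such safe pruning, consider one interval Newton step for f(x) = x² − 2 on X = [1, 2]: N(X) = m − f(m)/f'(X) intersected with X contains every root of f in X, so replacing X by X ∩ N(X) shrinks the domain without losing solutions.

```python
# One interval Newton step for f(x) = x^2 - 2 on an interval X = (lo, hi).
# f'(X) = 2X is positive on [1, 2], so no division by an interval with 0.

def interval_newton_step(lo, hi):
    m = 0.5 * (lo + hi)
    fm = m * m - 2.0
    dlo, dhi = 2.0 * lo, 2.0 * hi
    # m - fm / [dlo, dhi]; endpoints of the quotient interval:
    cand = sorted([m - fm / dlo, m - fm / dhi])
    return max(lo, cand[0]), min(hi, cand[1])   # intersect with X

X = (1.0, 2.0)
for _ in range(2):
    X = interval_newton_step(*X)
print(X)  # a tight enclosure of sqrt(2) ≈ 1.41421
```

Two steps already narrow [1, 2] to a sub-millimetric interval that still provably contains the root, which is the kind of guaranteed pruning the constraint methods above rely on.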

In the context of differential equations, validated [9] and constraint based [7] approaches provide safe methods for solving initial value problems, which verify the existence of unique solutions and produce guaranteed bounds for the true trajectory. We developed an approach [2, 5] that uses a validated method, Interval Taylor Series (ITS), to include Ordinary Differential Equations (ODEs) in the CCSP framework.

An ODE system y' = f(y, t) is considered a restriction on the sequence of values that y can take over t. Since it does not fully determine the sequence of values of y (but rather a family of such sequences), further information is usually provided in the form of initial or boundary conditions. An ODE system together with such related information is denoted a Constraint Satisfaction Differential Problem (CSDP).

∗ Corresponding author: e-mail: [email protected], Phone: +351 212 948 536, Fax: +351 212 948 541
∗∗ Second author: e-mail: [email protected], Phone: +351 212 948 536, Fax: +351 212 948 541


A brief introduction in section 2 shows the expressive power of the framework developed, and stresses the active use of less common constraints on the upper and lower values of the functions involved, and on the time during which, or the area under the curve over which, they exceed a certain threshold. The expressive power of the approach is illustrated in the tuning of drug design, presented in section 3. We show how the active use of constraints of the above types is sufficient to make safe decisions regarding the intended goals. The paper ends with a summary of the main conclusions.

2 Constraint Satisfaction Differential Problems

In a CSDP a special variable (x_ODE) is associated with an ODE system S for every t within the interval T through a special constraint, ODE_{S,T}(x_ODE). The variable x_ODE represents all functions that are solutions of S (during T) and satisfy all the additional restrictions. The other real valued variables of the CSDP, denoted restriction variables, are used to model a number of constraints of interest in many applications.

The restriction Value_{j,t}(x) associates a variable x with the value of trajectory component j at a particular time t, and can be used to model initial and boundary conditions. The restriction MaximumValue_{j,T}(x) associates x with the maximum value of trajectory component j within a time interval T (minimum restrictions are similar). The restriction Time_{j,T,≥θ}(x) constrains x to the time within T during which trajectory component j exceeds a threshold θ. Similarly, the restriction Area_{j,T,≥θ}(x) associates x with the area of trajectory component j, within time period T, above threshold θ.

The solving procedure for CSDPs that we developed maintains a safe enclosure for the set of possible solutions based on an ITS method for initial value problems. The improvement of this enclosure is combined with the enforcement of the ODE restrictions through constraint propagation on a set of narrowing functions associated with the CSDP. Some reduce the domain of a restriction variable given the current trajectory enclosure. The safety of such pruning is guaranteed by identifying functions within the current enclosure that maximise and minimise the restriction variable. For example, the area restriction variable is maximised by the function that for every value of t takes the maximum possible trajectory value within the current enclosure; consequently, the upper bound of this restriction variable cannot exceed the area computed for this extreme function. Other narrowing functions safely reduce the uncertainty of the trajectory according to the domain of a restriction variable; for example, the trajectory enclosure cannot exceed the upper bound of a maximum restriction variable. Still other narrowing functions reduce the uncertainty of the trajectory by successive application of the ITS method between consecutive time points.

The full integration of a CSDP within an extended CCSP is accomplished by sharing the restriction variables of the CSDP. The CSDP solving procedure is used as a safe narrowing procedure for reducing the domains of the restriction variables.

3 A Differential Model for Drug Design

The gastro-intestinal absorption process subsequent to the oral administration of a therapeutic drug is usually modelled by the following two-compartment model [10]:

dx(t)/dt = −p1 x(t) + D(t),
dy(t)/dt = p1 x(t) − p2 y(t),   (1)

where x is the concentration of the drug in the gastro-intestinal tract, y is the concentration of the drug in the blood stream, D is the drug intake regimen, and p1 and p2 are positive parameters.

The effect of the intake regimen D(t) on the concentration of the drug in the blood stream during the administration period is determined by the absorption and metabolic parameters p1 and p2. We assume that the drug is taken on a periodic basis (every six hours), providing a unit dosage that is uniformly dissolved into the gastro-intestinal tract during the first half hour. Maintaining such an intake regimen, the solution of the ODE system asymptotically converges to a six-hour periodic trajectory, the limit cycle, shown in Figure 1 for specific values of the ODE parameters.


Fig. 1 The periodic limit cycle with p1 = 1.2 and p2 = ln(2)/5.
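The limit cycle in Fig. 1 can be reproduced with a few lines of numerical integration. This is our sketch, not the authors' constraint solver; the regimen and parameter values are those stated above.

```python
# Simulate the two-compartment model (1): unit dose dissolved over the first
# half hour of each 6-hour period, p1 = 1.2, p2 = ln(2)/5.  Plain fixed-step RK4.
import math

p1, p2 = 1.2, math.log(2) / 5
D = lambda t: 2.0 if (t % 6.0) < 0.5 else 0.0      # unit dose over 0.5 h
f = lambda t, x, y: (-p1 * x + D(t), p1 * x - p2 * y)

def rk4(x, y, t0, t1, n):
    h = (t1 - t0) / n
    t = t0
    for _ in range(n):
        k1 = f(t, x, y)
        k2 = f(t + h/2, x + h/2 * k1[0], y + h/2 * k1[1])
        k3 = f(t + h/2, x + h/2 * k2[0], y + h/2 * k2[1])
        k4 = f(t + h, x + h * k3[0], y + h * k3[1])
        x += h / 6 * (k1[0] + 2*k2[0] + 2*k3[0] + k4[0])
        y += h / 6 * (k1[1] + 2*k2[1] + 2*k3[1] + k4[1])
        t += h
    return x, y

# Iterate whole 6-hour periods until the start-of-period state settles:
x, y = 0.0, 0.0
for period in range(40):
    x, y = rk4(x, y, 0.0, 6.0, 6000)
print(x, y)  # x ≈ 0.001, y ≈ 0.9 at the start of the cycle, as in Fig. 1
```

Because the period map is a contraction, a few dozen periods suffice for the state at t = 0 to converge to the limit cycle.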

In designing a drug, it is necessary to adjust the ODE parameters to guarantee that the drug concentrations are effective while causing no side effects. In general, it is sufficient to guarantee some constraints on the drug blood concentration during the limit cycle, namely to impose bounds on its values, on the area under the curve, and on the total time it remains above some threshold.

We show below how the extended CCSP framework can be used to support the drug design process. We will focus on the absorption parameter p1, which may be adjusted by appropriate time release mechanisms (the metabolic parameter p2 tends to be characteristic of the drug itself and cannot easily be modified). The tuning of p1 should satisfy the following requirements during the limit cycle: (i) the concentration in the blood bounded between 0.8 and 1.5; (ii) its area under the curve (and above 1.0) bounded between 1.2 and 1.3; (iii) it cannot exceed 1.1 for more than 4 hours.

3.1 Using the Extended CCSP for Parameter Tuning

The limit cycle and the different requirements may be represented in the extended CCSP framework. Due to the definition of the intake regimen D(t), the ODE system has a discontinuity at time t = 0.5 and is represented by two CSDP constraints in sequence.

The first, PS1, ranges from the beginning of the limit cycle (t = 0.0) to time t = 0.5, and the second, PS2, is associated with the remaining trajectory of the limit cycle (until t = 6.0). Both CSDP constraints include Value, Maximum Value, Minimum Value, Area and Time restrictions for associating variables with different trajectory properties. Besides variables representing the ODE parameters, the initial trajectory values and the final trajectory values, there are variables representing the maximum and minimum drug concentration values and the respective area above 1.0 and time above 1.1 during the segment of time associated with each constraint.

The extended CCSP P, shown below, connects the two ODE segments in sequence by assigning the same variables to both the final values of PS1 and the initial values of PS2 (parameters p1 and p2 are shared by both constraints). Moreover, the 6-hour period is guaranteed by assigning the same variables to both the initial values of PS1 and the final values of PS2. In addition to the restriction variables of each ODE segment, new variables for the whole trajectory sum up the values in each segment.

CCSP P = (X, D, C) where:

X = < x0, y0, p1, p2, x05, y05, ymax1, ymax2, ymin1, ymin2, ya1, ya2, yarea, yt1, yt2, ytime >
D = < Dx0, Dy0, Dp1, Dp2, Dx05, Dy05, Dymax1, Dymax2, Dymin1, Dymin2, Dya1, Dya2, Dyarea, Dyt1, Dyt2, Dytime >
C = { PS1(x0, y0, p1, p2, x05, y05, ymax1, ymin1, ya1, yt1), yarea = ya1 + ya2,
      PS2(x05, y05, p1, p2, x0, y0, ymax2, ymin2, ya2, yt2), ytime = yt1 + yt2 }

The tuning of drug design may be supported by solving P with an appropriate set of initial domains for its variables. We will assume p2 to be fixed to a five-hour half-life (Dp2 = [ln(2)/5]) and p1 to be adjustable up to about a ten-minute half-life (Dp1 = [0..4]). The initial value x0, always very small,



is safely bounded in interval Dx0 = [0.0..0.5]. Additionally, the following bounds are imposed by the previous drug requirements:

Dymin1 = [0.8..1.5], Dymax1 = [0.8..1.5], Dyarea = [1.2..1.3],
Dymin2 = [0.8..1.5], Dymax2 = [0.8..1.5], Dytime = [0.0..4.0]

Solving the extended CCSP P (enforcing global hull consistency) with a precision of 0.001 narrows the original p1 interval to [1.191..1.543] in less than 3 minutes (the tests were executed on a Pentium 4 computer at 1.5 GHz with 128 Mbytes of memory). Hence, for p1 outside this interval the set of requirements cannot be satisfied.

This may help to adjust p1 but offers no guarantees on specific choices within the obtained interval. However, guaranteed results may be obtained for particular choices of the p1 values. Solving P with initial domains Dx0 = [0.0..0.5], Dy0 = [0.8..1.5], Dp1 = [1.3..1.4] and Dp2 = [ln(2)/5] narrows the remaining unbounded domains to:

ymin1 ∈ [0.881..0.891], ymax1 ∈ [1.090..1.102], yarea ∈ [1.282..1.300],
ymin2 ∈ [0.884..0.894], ymax2 ∈ [1.447..1.462], ytime ∈ [3.908..3.967]

Notwithstanding the uncertainty, these results do prove that with p1 within [1.3..1.4] (an acceptable uncertainty in the manufacturing process), all limit cycle requirements are safely guaranteed. Moreover, they offer some insight on the requirements, showing, for instance, the area to be the most critical constraint.

4 Conclusion

This paper presents a framework for making decisions with models expressed by differential equations, using a constraint reasoning approach. In contrast to Monte Carlo and other stochastic techniques, which can only assign likelihoods to the different decision options, the enhanced propagation techniques developed here (enforcing global hull consistency) allow safe decisions to be made despite data uncertainty and approximation errors during the calculations. Whereas the traditional use of complex differential models for which there are no analytical solutions is currently unsafe, the constraint reasoning framework makes the practical introduction of this type of model in decision making possible, especially when safe decisions are required.

References

[1] Collavizza, H., Delobel, F., Rueher, M.: A Note on Partial Consistencies over Continuous Domains. Principles and Practice of Constraint Programming. Springer (1998) 147-161.

[2] Cruz, J., Barahona, P.: Handling Differential Equations with Constraints for Decision Support. Frontiers of Combining Systems. Springer (2000) 105-120.

[3] Cruz, J., Barahona, P.: Global Hull Consistency with Local Search for Continuous Constraint Solving. 10th Portuguese Conference on AI. Springer (2001) 349-362.

[4] Cruz, J., Barahona, P.: Maintaining Global Hull Consistency with Local Search for Continuous CSPs. 1st Int. Workshop on Global Constrained Optimization and Constraint Satisfaction, Sophia-Antipolis, France (2002).

[5] Cruz, J.: Constraint Reasoning for Differential Equations. PhD thesis, submitted (2003).

[6] Sam-Haroud, D., Faltings, B.V.: Consistency Techniques for Continuous Constraints. Constraints 1(1,2) (1996) 85-118.

[7] Janssen, M., Van Hentenryck, P., Deville, Y.: Optimal Pruning in Parametric Differential Equations. Principles and Practice of Constraint Programming. Springer (2001).

[8] Moore, R.E.: Interval Analysis. Prentice-Hall, Englewood Cliffs, NJ (1966).

[9] Nedialkov, N.S.: Computing Rigorous Bounds on the Solution of an Initial Value Problem for an Ordinary Differential Equation. PhD thesis, Univ. of Toronto, Canada (1999).

[10] Spitznagel, E.: Two-Compartment Pharmacokinetic Models. C-ODE-E. Harvey Mudd College, Claremont, CA (1992).


NACoM-2003 Extended Abstracts 42 – 45

Verified Computation of Packet loss Probabilities in Multiplexer Models using Rational Approximation

A. Cuyt∗1 and R.B. Lenin1

1 Department of Mathematics and Computer Science, University of Antwerp, Middelheimlaan 1, B2020 Antwerp, Belgium

Received 28 February 2003, accepted 21 March 2003

A statistical multiplexer is a basic model used in the design and the dimensioning of communication networks. The multiplexer model consists of a finite buffer, to store incoming packets, served by a single server with constant service time, and a more or less complicated arrival process. The aim is to determine the packet loss probability as a function of the capacity of the buffer. An exact analytic approach is infeasible, and hence we show how techniques from rational approximation can be applied to the computation of the packet loss. Since the parameters used in such networks may not yield precise probabilities of interest without drastic assumptions being introduced, we also need to carry out a perturbation analysis with respect to these parameters. This is done by means of interval arithmetic.

1 Introduction

Fixed length packet switches have been studied extensively in the context of ATM switching models. However, since the Internet is primarily TCP/IP with variable length packets, it is even more important to analyze switching in the new context. Variable bit rate (VBR) communications with real time constraints in general, and video communication services (video phone, video conferencing, television distribution) in particular, are expected to be a major class of services provided by the future Quality of Service (QoS) enabled Internet. The introduction of statistical multiplexing techniques offers the capability to efficiently support VBR connections by taking advantage of the variability of the bandwidth requirements of individual connections.

In order to assess the multiplexing gain, a variety of techniques have been developed in recent years to study these multiplexer models, based on exact analysis using matrix-analytic methods [3], fluid approximation [4] and simulation [5]. In particular, considerable work has been spent on the development of analytical techniques for evaluating packet loss probabilities, also called cell loss probabilities (CLP).

Recently another and very efficient approach to compute the CLP as a function of the system size has become available, based on the use of rational approximation [6]. The motivation behind this approach is that, using the matrix-analytic method, it is computationally feasible to evaluate the CLP as a function of the system size when the system size is small, and moreover it is often possible to obtain information about the asymptotic behavior [7]. In [6], the authors have employed rational approximants to compute the CLP, but only for models where the correlation between the cells was ignored. Considering a high degree of correlation is of major importance when the input consists of multiple video sources [3]. In [2], a sampling technique in combination with rational data fitting is proposed for models with more correlation between the cells. In rare cases it happens that the poles of the computed rational function disturb the fit of the packet loss probability. In [1], the authors have proposed a method to avoid this problem through an

∗ Corresponding author: e-mail: [email protected], Phone: +32 3 218 0898, Fax: +32 3 218 0777

© 2003 WILEY-VCH Verlag GmbH & Co. KGaA, Weinheim



a priori optimal placement of poles. The location of the poles is determined, indirectly, from the system parameters.

Since the parameters used in such networks may not yield precise probabilities of interest without drastic assumptions being introduced, it is interesting and important to carry out a perturbation analysis with respect to these parameters. This will be done here, concurrently with the computation of the rational model, using interval arithmetic.

2 Model Description

In the multiplexer environment, the arrival of packets to the switch happens in discrete time, with discrete service time, which makes the discrete-time Markov chain a natural modeling choice. We assume that the arrival of cells, which are transmitted by M independent and non-identical information sources to the multiplexer, can be modeled as a discrete-time batch Markovian arrival process (D-BMAP), the discrete-time version of the BMAP. Each information source is controlled by a Markov chain, called the background Markov chain. The basic queueing system which models the multiplexer is a D-BMAP/D/c/N queue with c discrete-time servers, where each server can serve at most one cell per time unit. These servers serve a buffer with a capacity of N cells which is fed by the M independent information sources. When the server is busy, a maximum of c cells will depart in each slot. Service starts at the beginning of each time slot.

The D-BMAP queueing model is an M/G/1-type queue which is basically a two-dimensional discrete-time Markov chain (Xn, Yn), n ≥ 0, where Xn is the number of cells in the buffer and Yn represents the state of the M sources during the n-th time slot. We are interested in the steady-state behavior (X, Y) ≡ lim_{n→∞} (Xn, Yn).

Let D be the transition probability matrix of the process Y and let Dm (m = 0, 1, . . . , M) denote the matrix corresponding to m arrivals during a time slot. These matrices can be calculated from the following system parameters [2]:

1. the number of sources M ;

2. the transition probabilities of the background Markov chains:

- p = [p1, p2, . . . , pM ] and q = [q1, q2, . . . , qM ], if the sources are heterogeneous,

- p and q, if the sources are homogeneous;

3. the cell generation probability:

- d = \begin{pmatrix} d_1(0) & d_1(1) \\ d_2(0) & d_2(1) \\ \vdots & \vdots \\ d_M(0) & d_M(1) \end{pmatrix}, if the sources are heterogeneous,

- d, if the sources are homogeneous.

The average arrival rate of cells at the multiplexer is given by

\lambda = \eta \Big( \sum_{m=0}^{M} m D_m \Big) e,   (1)

where e is a column vector of ones and the vector η is such that ηD = η with ηe = 1.

Under the condition of ergodicity of the chain (X, Y), i.e. the load ρ = λ/c < 1, the stationary distribution vector Π := (π0, π1, . . . , πN) with πi ∈ R^{2^M} satisfies

ΠP = Π and Πe = 1,   (2)



where the transition probability matrix P of the process (X,Y ) [2] is a square matrix of size

- (N + 1)2^M × (N + 1)2^M, if the sources are heterogeneous,

- (N + 1)(M + 1) × (N + 1)(M + 1), if the sources are homogeneous,

and is given by

P = \begin{pmatrix}
D_0 & D_1 & \cdots & D_{N-c} & \cdots & D_{N-1} & B_N \\
D_0 & D_1 & \cdots & D_{N-c} & \cdots & D_{N-1} & B_N \\
\vdots & \vdots & & \vdots & & \vdots & \vdots \\
D_0 & D_1 & \cdots & D_{N-c} & \cdots & D_{N-1} & B_N \\
0 & D_0 & \cdots & D_{N-c-1} & \cdots & D_{N-2} & B_{N-1} \\
0 & 0 & \cdots & D_{N-c-2} & \cdots & D_{N-3} & B_{N-2} \\
\vdots & \vdots & & \vdots & & \vdots & \vdots \\
0 & 0 & \cdots & D_0 & \cdots & D_{c-1} & B_c
\end{pmatrix},   (3)

with B_i = \sum_{j=i}^{M} D_j, i = c, . . . , N.
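As a concrete (purely illustrative) instance of these quantities, one can take a single two-state on/off source (M = 1), split D into D_0 and D_1 under one common convention (the source generates a cell with probability d while active; [2] may order the factors differently), and evaluate η and λ of (1); the parameter values are hypothetical:

```python
import numpy as np

# Hypothetical single on/off source (M = 1): state 0 = silent, state 1 = active.
p, q, d = 0.2, 0.4, 0.9           # illustrative transition / cell-generation probs
D = np.array([[1 - p, p],
              [q, 1 - q]])         # background Markov chain transition matrix

# Split D into D_0 (no arrival) and D_1 (one arrival): a cell is generated with
# probability d while the source is active (one possible ordering convention).
D1 = np.diag([0.0, d]) @ D
D0 = D - D1

# Stationary vector eta with eta D = eta and eta e = 1 (left eigenvector of D).
w, v = np.linalg.eig(D.T)
eta = np.real(v[:, np.argmax(np.real(w))])
eta = eta / eta.sum()

# Average arrival rate, eq. (1): lambda = eta (sum_m m D_m) e.
e = np.ones(2)
lam = eta @ (0 * D0 + 1 * D1) @ e
# for an on/off source this equals d * p / (p + q): active fraction times d
```

For this toy source the result can be checked in closed form, since the stationary probability of the active state is p/(p + q).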

The packet loss probability function, as a function of the buffer size N , is then given by

PL(N) := \frac{1}{\lambda} \sum_{n=0}^{N} \pi_n \sum_{k=0}^{M} [k + n - (N + \min(N, c))]^{+} D_k e,   (4)

where [x]^+ := max(0, x).

It has been proved that for M/G/1-type queues PL(N) decays exponentially [7], that is,

log PL(N) ≈ ξN, as N → ∞.   (5)

3 Rational Approximation

Because the function log PL(N) asymptotically behaves as ξN for large N, polynomial approximation techniques for log PL(N) are not suitable: every polynomial model of degree larger than one would blow up for large N. However, a rational function rn(N) of numerator degree n + 1 and denominator degree n has a similar asymptotic behavior to that of log PL(N). It remains to compute the coefficients in the numerator and denominator of the rational function

r_n(N) = \frac{p_n(N)}{q_n(N)} = \frac{\sum_{i=0}^{n+1} a_i N^i}{\sum_{i=0}^{n} b_i N^i},   (6)

mostly from computed or measured values of log PL(Nj) for small buffer sizes Nj, to fit the behavior of log PL(N) using (4).

The rational model is fully specified when we know its numerator and denominator coefficients b1, . . . , bn and a0, . . . , an+1, a total of 2n + 2 coefficients (the constant term b0 in the denominator is only a normalization constant for the rational function [10] and is therefore assigned the value 1, whenever possible). These coefficients are determined from sampling log PL(N) at chosen Nj for j = 0, . . . , 2n, while one value is determined from the asymptotic behavior

\lim_{N \to \infty} \log PL(N) \approx \xi N = \frac{a_{n+1}}{b_n} N.   (7)

In rare cases it happens that the poles of the computed rational function disturb the fit of the packet loss probability. To circumvent this problem, the authors in [1] have used a multipoint Padé-type approximation technique.
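The fitting procedure can be sketched on synthetic data (a made-up rational stand-in for log PL(N) with known slope ξ = 2, since generating real D-BMAP samples is beyond this note); the asymptotic constraint (7) eliminates one unknown and the interpolation conditions become linear:

```python
import numpy as np

def g(N):
    # synthetic stand-in for log PL(N): rational, with asymptotic slope xi = 2
    return (2 * N**2 + 3 * N + 1) / (N + 2)

xi = 2.0                            # asymptotic slope assumed known, eq. (5)
Ns = np.array([1.0, 2.0, 3.0])      # samples at small "buffer sizes"
gs = g(Ns)

# Model r_1(N) = (a0 + a1*N + a2*N^2) / (1 + b1*N) with b0 = 1 and, by (7),
# a2 = xi * b1.  The conditions q(Nj) g(Nj) = p(Nj) linearise to
#   a0 + a1*Nj + b1*(xi*Nj^2 - g(Nj)*Nj) = g(Nj).
A = np.column_stack([np.ones_like(Ns), Ns, xi * Ns**2 - gs * Ns])
a0, a1, b1 = np.linalg.solve(A, gs)
a2 = xi * b1

def r(N):
    return (a0 + a1 * N + a2 * N**2) / (1 + b1 * N)
# here r recovers g exactly and, in particular, extrapolates with slope xi
```

The same linearisation carries over to larger n, with 2n + 1 sample conditions plus the constraint (7) fixing the 2n + 2 coefficients.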



4 Interval Arithmetic

In this paper we study the effect of uncertainties in the parameters p (or p), q (or q) and d (or d) on one hand, and/or uncertainties in the values of log PL(Nj) on the other, when these come from measurements or from verified computations performed for minimal and maximal load ρ of the networks under investigation. To automatically incorporate this perturbation analysis in the computation of the rational fitting technique, we need to use tools from interval arithmetic. The verified computation of rn(N) requires enclosures for, or the verified computation of:

- p (or p), q (or q) and d (or d) which are involved in the matrices Dm;

- the samples logPL(Nj);

- the asymptotic slope ξ [9].

One of the important findings is that PL(N) is rather sensitive to small changes in the parameters. This of course has its effect on the data fitting problem. When using a rational model with prescribed poles to fit the interval data, as in the multipoint Padé-type approach, the verified computation of rn(N) can be based on the results developed in [8].
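The flavour of such an interval computation can be shown with a toy interval type (without the outward rounding that a genuinely verified implementation must add at every operation); the rational model and parameter boxes below are invented for illustration:

```python
class I:
    """Toy closed interval [lo, hi] with naive (not outward-rounded) arithmetic."""
    def __init__(self, lo, hi=None):
        self.lo, self.hi = lo, lo if hi is None else hi
    def __add__(self, o):
        return I(self.lo + o.lo, self.hi + o.hi)
    def __mul__(self, o):
        p = (self.lo * o.lo, self.lo * o.hi, self.hi * o.lo, self.hi * o.hi)
        return I(min(p), max(p))
    def __truediv__(self, o):
        assert o.lo > 0 or o.hi < 0       # divisor interval must exclude zero
        p = (self.lo / o.lo, self.lo / o.hi, self.hi / o.lo, self.hi / o.hi)
        return I(min(p), max(p))

# Enclosure of a rational model r(N) = (a0 + a1*N) / (1 + b1*N) when the
# fitted coefficients are only known up to small parameter boxes.
a0, a1, b1 = I(0.9, 1.1), I(-2.05, -1.95), I(0.95, 1.05)
N = I(10.0)
r = (a0 + a1 * N) / (I(1.0) + b1 * N)
# every point choice of the parameters yields a value inside [r.lo, r.hi]
```

By inclusion monotonicity, the resulting box [r.lo, r.hi] encloses the value of the model for every admissible parameter combination, which is exactly the guarantee the perturbation analysis needs.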

References

[1] A. Cuyt, R.B. Lenin, Computing packet loss probabilities in multiplexer models using adaptive rational interpolation with optimal pole placement, submitted.

[2] A. Cuyt, R.B. Lenin, G. Willems, C. Blondia, and P. Rousseeuw, Rational approximation technique to compute cell loss probabilities in multiplexer models, IEEE Trans. on Computers, 52, pp. 1-12 (2003).

[3] C. Blondia and O. Casals, Performance analysis of statistical multiplexing of VBR sources: A matrix-analytical approach, Performance Evaluation, 16, pp. 5-20 (1992).

[4] A.I. Elwalid and D. Mitra, Analysis, approximations and admission control of a multi-service multiplexing system with priorities, Proc. INFOCOM '95, pp. 463-472 (1995).

[5] C.S. Chang, P. Heidelberger, S. Juneja and P. Shahabuddin, Effective bandwidth and fast simulation of ATM intree networks, Proc. Performance '93, Rome (Italy, October 1993).

[6] W.B. Gong and H. Yang, On Global Rational Approximants for Stochastic Discrete Event Systems, International Journal of Discrete Event Dynamic Systems, 7, pp. 93-116 (1997).

[7] E. Falkenberg, On the asymptotic behavior of the stationary distribution of Markov chains of M/G/1-type, Commun. Statist. Stochastic Models, 10, pp. 75-97 (1994).

[8] S. Markov, E. Popova, U. Schneider, J. Schulze, On linear interpolation under interval data, Math. Comput. Simulation, 42, pp. 35-45 (1996).

[9] S.M. Rump, Computational error bounds for multiple or nearly multiple eigenvalues, Linear Algebra and Appl., 324, pp. 209-226 (2001).

[10] D. Warner, Hermite interpolation with rational functions, Ph.D. dissertation, University of California, 1974.


NACoM-2003 Extended Abstracts 46 – 49

Towards a Verified Library for Special Functions

Annie Cuyt∗ 1, Brigitte Verdonk∗∗ 1, H. Waadeland∗∗∗2, and Johan Vervloet†1

1 Dept Mathematics and Computer Science, University of Antwerp, Middelheimlaan 1, B2020 Antwerp, Belgium

2 Dept Mathematics, NTNU, NO-7491 Trondheim, Norway

Received 28 February 2003, accepted 21 March 2003

The technique to provide a floating-point implementation of a function differs substantially when going from a fixed precision context to a multiprecision context. In the former, the aim is to provide an optimal mathematical model, valid on a reduced argument range and requiring as few operations as possible. Here optimal means that, with respect to the model's complexity, the truncation error is as small as it can get. The total relative error should not exceed a prescribed threshold, round-off error and argument reduction effect included. In the latter, the goal is to provide a more generic technique, from which an approximant with the user-defined accuracy can be obtained at runtime. Hence best approximants are not an option, since these models would have to be recomputed every time the precision is altered and a function is evaluated. At the same time the generic technique should generate an approximant of as low complexity as possible.

We point out how continued fraction representations of functions can be helpful in the multiprecision context. The newly developed generic technique is mainly based on the use of sharpened a priori truncation error estimates. The technique is very efficient and even quite competitive when compared to the traditional fixed precision implementations. The implementation is reliable in the sense that it can return a sharp interval enclosure for the evaluation of the function.

In this work we outline, as far as space restrictions allow, the tools needed to achieve the reliable implementation of a number of elementary and special functions.

1 Tools

A lot of well-known constants in mathematics, physics and engineering, as well as elementary and special functions, enjoy very nice and rapidly converging continued fraction representations. We shall especially be interested in real-valued limit-periodic continued fractions and their use in the reliable multiprecision implementation of the functions they represent. This implementation is built on top of multiprecision floating-point arithmetic compliant with the principles of the IEEE 754-854 floating-point standards.

1.1 IEEE-based Arithmetic

We assume we have available a multiprecision floating-point implementation of the basic operations, comparisons, base and type conversions, which is compliant with the principles of the IEEE 754-854 standards. Such an implementation is characterised by four parameters: the base β, the precision t and the exponent range [L, U]. In the current context, we are at least aiming at non-standard precisions t > 64 when β = 2.

To provide an implementation of a function f(x) in a particular precision, one first needs to develop an efficient mathematical model or approximation F(x) for f(x). This is usually a very time-consuming effort, because the model changes whenever the precision does. The sum of the truncation

∗ e-mail: [email protected]
∗∗ e-mail: [email protected]
∗∗∗ e-mail: [email protected]
† e-mail: [email protected]




error |f(x) − F(x)|/|f(x)| and the rounding error |F(x) − 𝔽(x)|/|f(x)|, where 𝔽(x) denotes the machine implementation of the model F(x), should preferably not exceed a few ulp, where

1 ulp = β^{1−t}.

A typical double precision implementation (β = 2, t = 53) of the elementary functions achieves this in about 25 basic operations. When analyzing the efficiency of our multiprecision implementation, we shall compare the number of basic operations required in our approach when the precision is set to t = 53 to this reference.

1.2 Continued Fractions

We consider continued fraction representations of the form

f(x) = \mathop{K}_{n=1}^{\infty} \frac{a_n}{1},   a_n := a_n(x)   (1)

Here an is called the n-th partial numerator. Especially useful are continued fractions of the form (1) where an(x) = an x with an > 0. Such continued fractions are called S-fractions. The N-th approximant fN(w) of (1) and the N-th tail tN of (1) are given by

f_N(w) = \cfrac{a_1}{1 + \cfrac{a_2}{1 + \cdots + \cfrac{a_N}{1 + w}}}   (2)

t_N = \mathop{K}_{n=N+1}^{\infty} \frac{a_n}{1}   (3)

A continued fraction is said to converge if lim_{N→∞} fN(0) exists. Note that convergence to ∞ is allowed. The N-th approximant of a continued fraction can also be written as

f_N(w) = (s_1 \circ \cdots \circ s_N)(w),   s_n(w) = \frac{a_n}{1 + w},   n = N, . . . , 1

1.3 Useful Tails

Using the linear fractional transformations sn, one can define a sequence {Vn}_{n∈N} of value sets for f(x) by:

s_n(V_n) = \frac{a_n}{1 + V_n} \subseteq V_{n-1},   n = N, . . . , 1

The importance of such a sequence of sets lies in the fact that these sets keep track of where certain values lie. For instance, if w ∈ VN then fN(w) ∈ V0. More importantly, when {Vn}_{n∈N} is a sequence of value sets for a convergent continued fraction, tN ∈ V̄N and hence f(x) ∈ V̄0 [3, p. 111]. When carefully monitoring the behaviour of the continued fraction tails, very accurate approximants fN(w) for f(x) can be computed by making an appropriate choice for w.

We call a continued fraction (1) limit-periodic with period k, if

\lim_{p \to \infty} a_{pk+q} = \tilde{a}_q,   q = 1, . . . , k

More can be said about the tails of limit-periodic continued fractions with period one, also called one-limit-periodic continued fractions. Let a = lim_{n→∞} an and let w be the fixpoint with smallest modulus of the linear fractional transformation s(w) = a/(1 + w). It can be shown [3] that

\lim_{N \to \infty} t_N = w



and also

\lim_{N \to \infty} \left| \frac{f(x) - f_N(w)}{f(x) - f_N(0)} \right| = 0

Hence a suitable choice of w in (2) may result in more rapid convergence of the approximants (w = 0 is usually used as a reference).

1.4 Oval Sequence Theorem

Besides the sequence of value sets, an equally important role is played by the sequence of convergence sets {En}_{n∈N}, whose elements guarantee convergence of the continued fraction as long as each partial numerator an belongs to the respective set En:

\forall n \ge 1 : a_n \in E_n \;\Rightarrow\; \mathop{K}_{n=1}^{\infty} \frac{a_n}{1} \text{ converges}

Very sharp truncation error estimates can be obtained from the oval sequence theorem [3]. Here we cite only the real version of this theorem.

Theorem 1.1 Let 0 < Rn < |1 + Cn| and |Cn−1| Rn < |1 + Cn| Rn−1. Then {Vn}_{n∈N} with

V_n = \{ w \in \mathbb{R} : |C_n - w| < R_n \}

is a sequence of value sets for the sequence {En}_{n∈N} of convergence sets given by

E_n = \{ a \in \mathbb{R} : |a(1 + C_n) - C_{n-1}((1 + C_n)^2 - R_n^2)| + R_n |a| \le R_{n-1}((1 + C_n)^2 - R_n^2) \}

For w ∈ VN the truncation error |f(x) − fN(w)| is bounded by

|f(x) - f_N(w)| \le \frac{2 R_N (|C_0| + R_0)}{|1 + C_N| - R_N} \prod_{k=1}^{N-1} M_k

where M_k = \max\{ |w/(1+w)| : w \in \bar{V}_k \}.

The oval En given above actually reduces to an interval [pn, qn] in the real case. It is clear that the smaller the sets Vn, the smaller the values Mn and hence the smaller the upper bound on the truncation error |f(x) − fN(w)|. A key role herein is played by the radii Rn.

2 Results

When combining the above ingredients with the characteristic monotonicity behaviour of the partial numerators in a lot of continued fraction representations of elementary and special functions, we obtain extremely sharp truncation error bounds. The monotonicity properties of the partial numerators indeed make it possible to give explicit expressions for the radii Rk and the maxima Mk in the oval sequence theorem, and this for several classes of continued fraction representations. The truncation error bounds obtained are almost indistinguishable from the true truncation error. Other truncation error bounds which can be found in the literature either only hold for w = 0 [2], or are not equally sharp [1].

Since the accumulated rounding error is included in the total error which we bound by only a few ulp, the actual evaluation of f(x) needs to take place in a slightly larger working precision s > t. The optimal working precision shall be determined dynamically, depending on the target precision t, the rounding error analysis and the required accuracy. Our rounding error analysis also takes the effect of argument reduction into account and hence guarantees a fully reliable evaluation of f(x) over the entire domain.



References

[1] C. Baltus and W.B. Jones. Truncation error bounds for modified continued fractions with applications to special functions. Numer. Math., 55:281–307, 1989.

[2] W.B. Jones and W.J. Thron. Continued fractions: analytic theory and applications, volume 11 of Encyclopedia of Mathematics and its Applications. Addison-Wesley Publishing Company, New York, 1980.

[3] Lisa Lorentzen and Haakon Waadeland. Continued fractions with applications. North-Holland Publishing Company, Amsterdam, 1992.


NACoM-2003 Extended Abstracts 50 – 52

Recent Applications of Rational Approximation Theory: A Guided Tour

A. Cuyt∗1

1 Department of Mathematics and Computer Science, University of Antwerp, Middelheimlaan 1, B2020 Antwerp, Belgium

Received 28 February 2003, accepted 21 March 2003

Rational functions have a clear advantage compared to polynomials. They can simulate singularities and different kinds of asymptotic behaviour. These features of rational functions have proved to be very useful in a lot of applications including communication networks, EM models, shape reconstruction and the verified computation of special functions. We present a number of the more recent results.

1 Introduction

Rational approximation theory has been around for several centuries. To illustrate that its applications are still in demand and involve some technologically advanced problem domains, the following recently obtained results shall be discussed:

- reliable and highly efficient cell loss probability computation in the context of multiplexer models;

- adaptive multivariate rational interpolation requiring as few expensive-to-obtain engineering data points as possible;

- shape reconstruction from multidimensional moment information, compared to the inverse Radon transform using projections;

- a verified and fast multiprecision implementation of a class of special functions, based on the latest continued fraction results.

2 Cell loss probability

A statistical multiplexer is a basic model used in the design and the dimensioning of communication networks. The multiplexer model consists of a finite buffer, to store incoming packets, served by a single server with constant service time, and a more or less complicated arrival process. The aim is to determine the packet/cell loss probability (CLP) as a function of the capacity of the buffer. A variety of techniques have been developed in recent years to compute the CLP, based on exact analysis using matrix-analytic methods [1], fluid approximation [2] and simulation [3].

Recently another and very efficient approach to compute the CLP as a function of the system size has become available, based on the use of rational approximation [7]. The motivation behind this approach is that, using the matrix-analytic method, it is computationally feasible to evaluate the CLP as a function of the system size when the system size is small, and moreover it is often possible to obtain information about the asymptotic behavior [5]. In [7], the authors have employed rational approximants to compute the

∗ Corresponding author: e-mail: [email protected], Phone: +32 3 218 0898, Fax: +32 3 218 0777




CLP, but only for models where the correlation between the cells was ignored. Considering a high degree of correlation is of major importance when the input consists of multiple video sources [1]. A sampling technique in combination with rational data fitting for models with more correlation between the cells will be discussed. In rare cases it happens that the poles of the computed rational function disturb the fit of the packet loss probability. A method to avoid this problem through an a priori optimal placement of poles is presented. The location of the poles will be determined, indirectly, from the system parameters.

3 Adaptive multivariate sampling

During the design process of a complex physical system, computer-based simulations are often used to limit the number of expensive prototypes. However, despite the steady and continuing growth of computing speed and power, the computational cost of complex high-accuracy simulations, such as in electromagnetic modelling, can also be high. A single simulation of a design may take several minutes or hours to complete [10, 4].

We present an adaptive meta-modelling technique, in one or more dimensions, using as few data points as possible. Such a meta-model is a fast-running surrogate approximation, in our case a rational approximation, of a complex time-consuming computer simulation. The sparse data used to compute the rational model are carefully selected. It is clear that the location of the free denominator zeroes of the computed model can have a lot of impact on the overall accuracy of the model. Therefore the distance of the nearest poles to the design space is monitored and influences the choice of the optimal rational model.

4 Shape reconstruction

In shape reconstruction, the celebrated Fourier slice theorem plays an essential role. It allows the shape of a quite general object to be reconstructed from the knowledge of its Radon transform, in other words from the knowledge of projections of the object. In case the object is a polygon [6], or when it defines a quadrature domain in the complex plane [8], its shape can also be reconstructed from the knowledge of its moments. Essential tools in the solution of the latter inverse problem are quadrature rules and formal orthogonal polynomials.

We show how shape reconstruction of general compact objects can also be realized from the knowledge of the moments. To this end we use a less known homogeneous Padé slice property. Again integral transforms, in our case the Stieltjes transform, formal orthogonal polynomials in the form of Padé denominators, and multidimensional integration formulas or cubature rules play an essential role.

5 Verified multiprecision function library

The technique to provide a floating-point implementation of a function differs substantially when going from a fixed finite precision context to a finite multiprecision context. In the former, the aim is to provide an optimal mathematical model, valid on a reduced argument range and requiring as few operations as possible. Here optimal means that, with respect to the model's complexity, the truncation error is as small as it can get. The total relative error should not exceed a prescribed threshold, round-off error and argument reduction effect included. In the latter, the goal is to provide a more generic technique, from which an approximant yielding the user-defined accuracy can be deduced at runtime. Hence best approximants are not an option, since these models would have to be recomputed every time the precision is altered and a function evaluation is requested. At the same time the generic technique should propose an approximant of as low complexity as possible.

In the current approach we point out how continued fraction representations of functions can be helpful in the multiprecision context [9]. The developed generic technique is mainly based on the use of sharpened new a priori truncation error estimates. The technique is very efficient and even quite competitive when

Page 54: NACoM-2003 Extended Abstracts

52 A. Cuyt: Applications of Rational Approximation Theory

compared to the traditional fixed precision implementations. The implementation is reliable in the sensethat it allows to return a sharp interval enclosure for the requested function evaluation.
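The abstract does not spell out the algorithm, but the flavour of evaluating a continued fraction representation truncated at a runtime-chosen depth can be sketched with the classical Gauss continued fraction for arctan, evaluated by backward recurrence (a simplified illustration in ordinary double precision; the actual library of [9] works in multiprecision with verified truncation-error bounds):

```python
import math

def arctan_cf(x, n):
    """Backward-recurrence evaluation of the classical continued fraction
    arctan(x) = x/(1 + (1x)^2/(3 + (2x)^2/(5 + (3x)^2/(7 + ...)))),
    truncated after n partial quotients."""
    tail = 2.0*n + 1.0
    for k in range(n, 0, -1):
        tail = (2.0*k - 1.0) + (k*x)**2 / tail
    return x / tail

# a deeper truncation gives higher accuracy, with no model recomputation
print(abs(arctan_cf(1.0, 5) - math.atan(1.0)))
print(abs(arctan_cf(1.0, 30) - math.atan(1.0)) < 1e-14)
```

Increasing the truncation depth refines the approximant without recomputing a best approximation, which is the point of the generic approach.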

References

[1] C. Blondia and O. Casals, Performance analysis of statistical multiplexing of VBR sources: A matrix-analytical approach, Performance Evaluation, 16, pp. 5-20 (1992).

[2] A.I. Elwalid and D. Mitra, Analysis, approximations and admission control of a multi-service multiplexing system with priorities, Proc. INFOCOM '95, pp. 463-472 (1995).

[3] C.S. Chang, P. Heidelberger, S. Juneja and P. Shahabuddin, Effective bandwidth and fast simulation of ATM in tree networks, Proc. Performance '93 (Rome, Italy, October 1993).

[4] J. De Geest, T. Dhaene, and N. Fache, Adaptive CAD-model building algorithm for general planar microwave structures, IEEE Trans. Microwave Theory Tech., 47, pp. 1801-1809 (1999).

[5] E. Falkenberg, On the asymptotic behavior of the stationary distribution of Markov chains of M/G/1-type, Commun. Statist. Stochastic Models, 10, pp. 75-97 (1994).

[6] G.H. Golub, P. Milanfar, and J. Varah, A stable numerical method for inverting shape from moments, SIAM J. Sci. Statist. Comput., 21, pp. 1222-1243 (1999).

[7] W.B. Gong and H. Yang, On Global Rational Approximants for Stochastic Discrete Event Systems, International Journal of Discrete Event Dynamic Systems, 7, pp. 93-116 (1997).

[8] B. Gustafsson, C. He, P. Milanfar, and M. Putinar, Reconstructing planar domains from their moments, Inverse Problems, 16, pp. 1053-1070 (2000).

[9] L. Lorentzen and H. Waadeland, Continued Fractions with Applications. Studies in Computational Mathematics; 3, North-Holland, Amsterdam, 1992.

[10] R. Lehmensiek and P. Meyer, An efficient adaptive frequency sampling algorithm for model-based parameter estimation as applied to aggressive space mapping, Microwave and Optical Technology Letters, 24, pp. 71-78 (2000).


NACoM-2003 Extended Abstracts 53 – 56

Efficient Detection of Periodic Orbits in Chaotic Systems by Stabilising Transformations

Ruslan L. Davidchack ∗

Department of Mathematics and Computer Science, University of Leicester, Leicester LE1 7RH, UK

Received 28 February 2003, accepted 21 March 2003

A recently developed efficient algorithm for detecting periodic orbits in chaotic systems [1] combines the set of stabilising transformations proposed by Schmelcher and Diakonos [2] with a modified semi-implicit Euler iterative scheme and seeding with periodic orbits of neighbouring periods. The difficulty in applying the algorithm to higher-dimensional systems is mainly due to the fact that the number of stabilising transformations grows extremely fast with increasing dimension. Here we analyse the properties of stabilising transformations and propose an alternative approach to constructing a smaller set of transformations.

1 Introduction

Unstable periodic orbits (UPOs) are widely recognised as fundamental building blocks of chaotic dynamical systems. They form a "skeleton" for chaotic trajectories [3]. A well regarded definition of chaos [4] requires the existence of an infinite number of UPOs that are dense in the chaotic set. Different geometric and dynamical properties of chaotic sets, such as natural measure, Lyapunov exponents, fractal dimensions, and entropies [5], can be determined from the location and stability properties of the embedded UPOs. It is thus of paramount interest in the study of chaotic systems that a complete set of UPOs can be computed. In a limited number of cases, this can be achieved due to the special structure of the systems. Examples include the Biham-Wenzel method applicable to Henon-like maps [6], or systems with known and well ordered symbolic dynamics [7]. For generic systems, however, most methods described in the literature use some type of an iterative scheme that, given an initial condition (seed), converges to a periodic orbit of the chaotic system. In order to locate all UPOs, the convergence basin of each orbit for the chosen iterative scheme must contain at least one seed. The seeds are often chosen either at random from within the region of interest, from a regular grid, or from a chaotic trajectory with or without close recurrences. Typically, the iterative scheme is chosen from one of the "globally" convergent methods of quasi-Newton or secant type. However, experience suggests that even the most sophisticated methods of this type suffer from a common problem: with increasing period, the basin size of the UPOs becomes so small that placing a seed within the basin with one of the above listed seeding schemes is practically impossible [8].

Recently, a promising new approach to the detection of UPOs in generic chaotic systems has been proposed by Schmelcher and Diakonos (SD) [2, 9]. The basic idea is to transform the dynamical system in such a way that the UPOs of the original system become stable and can be located by simply following the evolution of the transformed dynamical system. That is, to locate period-p orbits of a discrete dynamical system

U : x_{j+1} = f(x_j) ,   f : R^n → R^n ,   (1)

∗ e-mail: [email protected], Phone: +44(0) 116 252 3819, Fax: +44(0) 116 252 3915

© 2003 WILEY-VCH Verlag GmbH & Co. KGaA, Weinheim


one considers an associated flow

Σ : dx/ds = C g(x) ,   (2)

where g(x) = f^p(x) − x and C is an n × n constant orthogonal matrix. It is easy to see that the map f^p(x) and the flow Σ have identical sets of fixed points for any C, while C can be chosen such that unstable period-p orbits of U become stable fixed points of Σ. Since it is not generally possible to choose a single matrix C that would stabilise all UPOs of U, the goal is to find the smallest possible set of matrices C = {C_k}, k = 1, ..., K, such that, for each UPO of U, there is at least one matrix C ∈ C that transforms the unstable orbit of U into a stable fixed point of Σ. Schmelcher and Diakonos have conjectured that, for hyperbolic orbits, this requirement is satisfied by the set C_SD of all possible n × n orthogonal matrices with only one non-zero entry ±1 in each row and column. Note that the set C_SD forms a group isomorphic to Weyl's reflection group B_n [10]. The number of such matrices is K = 2^n n!. This conjecture has been verified for n ≤ 2 and appears to be true for n > 2, but, thus far, no proof has been presented. The main advantage of the SD approach is that the basins of attraction of the stabilised UPOs appear to be much larger than the basins of convergence produced by other iterative schemes [9, 11, 12], making it much easier to select a useful seed. Moreover, depending on the choice of the stabilising transformation, the SD method may converge to several different UPOs from the same seed.
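For concreteness, the set C_SD consists exactly of the signed permutation matrices, which can be enumerated directly; a small sketch (illustrative code, not from the paper):

```python
import numpy as np
from itertools import permutations, product

def sd_matrices(n):
    """The SD set: all n x n orthogonal matrices with exactly one non-zero
    entry, equal to +1 or -1, in each row and column (signed permutations)."""
    mats = []
    rows = np.arange(n)
    for perm in permutations(range(n)):
        for signs in product((1.0, -1.0), repeat=n):
            C = np.zeros((n, n))
            C[rows, perm] = signs      # one signed entry per row/column
            mats.append(C)
    return mats

print(len(sd_matrices(2)), len(sd_matrices(3)))  # 2^n * n! -> 8, 48
print(all(np.allclose(C @ C.T, np.eye(2)) for C in sd_matrices(2)))
```

The count 2^n n! makes the rapid growth with dimension, mentioned below, explicit: 8 matrices for n = 2, but already 384 for n = 4.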

To increase the efficiency of the SD approach, and recognising the typical stiffness of the flow Σ, Davidchack and Lai have developed a modified semi-implicit Euler method for integrating the flow in Eq. (2) [1]:

D : x_{j+1} = x_j + [β s_j C^T − G_j]^{−1} g(x_j) ,   (3)

where β > 0 is a scalar parameter, s_j ≡ ||g(x_j)|| is an L2 norm, G_j ≡ Dg(x_j) is the Jacobian matrix, and "T" denotes transpose. Note that, away from the root of g, the scheme D is a semi-implicit Euler method with step size h = (β s_j)^{−1} for integrating the flow Σ, while close to the root it converges quadratically, analogous to the Newton-Raphson method.
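As a sketch of how the scheme D is applied in practice, the following code iterates Eq. (3) for period-1 orbits of the Hénon map (an assumed illustrative example, not taken from the abstract); for this particular fixed point the choice C = I already makes the associated flow stable, and the parameter values and seed are illustrative:

```python
import numpy as np

a, b = 1.4, 0.3                          # classic Henon parameters

def f(z):
    """One application of the Henon map."""
    x, y = z
    return np.array([1.0 - a*x*x + y, b*x])

def Df(z):
    """Jacobian of the Henon map."""
    x, _ = z
    return np.array([[-2.0*a*x, 1.0], [b, 0.0]])

def scheme_D(z, C, beta=2.0, iters=100):
    """Iterate x_{j+1} = x_j + [beta*s_j*C^T - G_j]^{-1} g(x_j) of Eq. (3),
    here for period p = 1, i.e. g = f - id and G = Df - I."""
    I = np.eye(2)
    for _ in range(iters):
        g = f(z) - z
        G = Df(z) - I
        s = np.linalg.norm(g)            # L2 norm of g
        z = z + np.linalg.solve(beta*s*C.T - G, g)
    return z

z = scheme_D(np.array([0.5, 0.15]), C=np.eye(2))
# the Henon fixed point is known in closed form
xs = (-(1.0 - b) + np.sqrt((1.0 - b)**2 + 4.0*a)) / (2.0*a)
print(np.allclose(z, [xs, b*xs]))        # converged to the fixed point
```

Far from the root the update is a damped flow step; near the root it behaves like Newton's method, which is what makes the scheme both robust and fast.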

Another important ingredient of the algorithm presented in [1] is the seeding with periodic orbits of neighbouring periods. This seeding scheme appears to be superior to the typically employed schemes and enables fast detection of all¹ periodic orbits of increasingly larger periods in generic low-dimensional chaotic systems. As an illustration, for the Ikeda map attractor at traditional parameter values, the algorithm presented in [1] was able to locate all periodic orbits up to period 22, for a total of over 10^6 orbit points. Obtaining a comparable result with generally employed techniques would require an estimated 10^5 times larger computational effort.

While the stabilisation approach is obviously extremely efficient for relatively low-dimensional systems, direct application to higher-dimensional systems is much less efficient due to the rapid growth of the number of matrices in C_SD. Even though it appears that, in practice, far fewer transformations are required to find all periodic orbits of a given chaotic system, the sufficient subset of transformations is not known a priori. It is thus clear that a better understanding of the stabilising transformations is necessary for an efficient extension of this approach to higher dimensions, which is the aim of this article.

2 Stabilising transformations in two dimensions²

¹ Even though we have no rigorous proof of the completeness of the detected orbits, the completeness has been verified in some cases using a less efficient but rigorous method of Galias [13].

² This case has already been investigated in [14]. However, here we adopt a different approach that allows better extrapolation of our understanding of the two-dimensional case to higher dimensions.

The stability of a fixed point x∗ of the flow Σ is determined by the real parts of the eigenvalues of the matrix CG, where G ≡ Dg(x∗) is the Jacobian matrix of g(x) evaluated at x∗. That is, for x∗ to be a stable fixed point of Σ, the matrix C has to be such that all the eigenvalues of CG have negative real parts. In order to understand which properties of G determine the choice of a particular stabilising transformation C, we use the following parametrisation of the general two-dimensional orthogonal matrices:

C_{s,α} = [ s cos α   sin α ;  −s sin α   cos α ]   (4)

where s = ±1. When α = −π/2, 0, π/2, or π, we obtain the set of matrices C_SD. For example, C_{1,π/2} = [ 0  1 ; −1  0 ] and C_{−1,π} = [ 1  0 ; 0  −1 ].

If we write G ≡ (g_{ij}), (i, j = 1, 2), then the eigenvalues of C_{s,α} G are given by the following equation:

σ_{1,2} = −A cos(α − θ) ± √( A² cos²(α − θ) − s det G )   (5)

where det G = g_{11} g_{22} − g_{12} g_{21}, A = (1/2) √( (s g_{11} + g_{22})² + (s g_{12} − g_{21})² ), and

θ = arctan[ (s g_{12} − g_{21}) / (−s g_{11} − g_{22}) ] ,   −π < θ ≤ π   (6)

It is clear from Eq. (5) that both eigenvalues have negative real parts when

s = s̄ ≡ sgn det G ,  and  |α − θ| < π/2   (7)

This analysis clearly shows that, for n = 2, the set C_SD of 8 matrices is sufficient to stabilise any fixed point, provided that det G ≠ 0. In fact, there are typically two matrices in C_SD that stabilise a given fixed point.
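This recipe is easy to check numerically: choosing s = sgn(det G) and the admissible angle closest to θ of Eq. (6), the matrix C_{s,α} of Eq. (4) stabilises random non-singular 2 × 2 Jacobians (a sketch; the tolerance used to skip near-singular G is arbitrary):

```python
import numpy as np

def stabilising_C(G):
    """Pick s and alpha following Eqs. (5)-(7): s = sgn(det G) and alpha the
    member of {-pi/2, 0, pi/2, pi} closest (mod 2*pi) to theta of Eq. (6)."""
    s = np.sign(np.linalg.det(G))
    theta = np.arctan2(s*G[0, 1] - G[1, 0], -s*G[0, 0] - G[1, 1])
    alpha = min((-np.pi/2, 0.0, np.pi/2, np.pi),
                key=lambda a: abs(np.angle(np.exp(1j*(a - theta)))))
    return np.array([[s*np.cos(alpha), np.sin(alpha)],
                     [-s*np.sin(alpha), np.cos(alpha)]])

rng = np.random.default_rng(1)
tested = stabilised = 0
for _ in range(1000):
    G = rng.standard_normal((2, 2))
    if abs(np.linalg.det(G)) < 1e-3:      # skip (near-)singular G
        continue
    tested += 1
    C = stabilising_C(G)
    stabilised += bool(np.all(np.linalg.eigvals(C @ G).real < 0))
print(stabilised == tested)               # every non-singular sample stabilised
```

The nearest admissible α is always within π/4 of θ, so the condition |α − θ| < π/2 of Eq. (7) is automatically met.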

The parameter θ clearly plays an important role in the above analysis, and can be shown to relate to the directions of the eigenvectors of the Jacobian matrix Df^p(x∗). This, in turn, indicates that for invertible maps³, where the foliation structure of the invariant manifolds does not permit self-intersections, a transformation matrix that stabilises a given fixed point x∗ of f^p will also stabilise fixed points of all periods in its vicinity. With the seeding strategy of using periodic orbit points as seeds, it is thus possible to use the information about invariant directions at the seed in order to construct stabilising transformations for the detection of neighbouring periodic orbits. A specific application of this idea in higher-dimensional systems is explored in the next Section.

3 Extension to higher-dimensional systems

To extend the analysis of the preceding Section to higher-dimensional systems, we note that the matrix C_{s̄,θ}, as defined by Eqs. (6) and (7), is closely related to the orthogonal part of the polar decomposition of G [15]. Recall that any non-singular n × n matrix can be uniquely represented as a product

G = QB , (8)

where Q is an orthogonal matrix and B is a symmetric positive definite matrix. It can be shown that C_{s̄,θ} is related to Q as follows:

C_{s̄,θ} = −Q^T ,   (9)

so that C_{s̄,θ} G = −B is indeed a matrix with all negative eigenvalues.
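A quick numerical confirmation of this identity, computing the polar factors from the SVD (G = U Σ V^T gives Q = U V^T and B = V Σ V^T) for an arbitrary illustrative matrix:

```python
import numpy as np

rng = np.random.default_rng(2)
G = rng.standard_normal((4, 4))      # non-singular with probability one

# polar decomposition G = Q B obtained from the SVD G = U diag(S) V^T
U, S, Vt = np.linalg.svd(G)
Q = U @ Vt                            # orthogonal factor
B = Vt.T @ np.diag(S) @ Vt            # symmetric positive definite factor

C = -Q.T                              # stabilising transformation of Eq. (9)
print(np.allclose(C @ G, -B))         # C G = -B
print(np.all(np.linalg.eigvals(C @ G).real < 0))
```

Since B is symmetric positive definite whenever G is non-singular, −B automatically has a strictly negative spectrum.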

³ Note that the maps obtained from flows by the Poincaré section are always invertible.


For n > 2, a representation of a general orthogonal matrix similar to that in Eq. (4) can be introduced, i.e. C = C_{s,{α_i}}, where s = ±1 and {α_i} = {α_1, α_2, . . . , α_{n(n−1)/2}} are the angle parameters representing rotations in the n(n − 1)/2 coordinate planes. If a fixed point x∗ of an n-dimensional flow has a non-singular matrix G ≡ Dg(x∗), then we can calculate the polar decomposition G = QB and define

C_{s̄,{θ_i}} = −Q^T .   (10)

By analogy with the two-dimensional case, we can expect that an orthogonal matrix C_{s,{α_i}} will stabilise the fixed point if s = s̄ and {α_i} is close⁴ to {θ_i}.

As mentioned in the previous Section, the relation between {θ_i} and the invariant directions of the linearised map at a fixed point allows us to construct transformations that will stabilise periodic orbits of all periods in the vicinity of the fixed point. With the seeding scheme proposed in [1], where the detected orbit points of period p are used as seeds to detect period-(p+1) orbits, we can determine the eigenvectors of the linearised map at the seed x∗

G = Df^p(x∗) − I = V (Λ − I) V^{−1} ,   (11)

where Λ ≡ diag(λ_1, . . . , λ_n) is the diagonal matrix of eigenvalues of Df^p(x∗) and I is the identity matrix. Next, we compute the polar decomposition G = QB and use C^T = −Q as the stabilising transformation in Eq. (3). In addition, note that the neighbourhood of the seed x∗ can also contain periodic orbits with the same invariant directions but with some eigenvalues having the opposite sign. Therefore it is often useful to apply stabilising transformations constructed from matrices of the form

G = V (SΛ − I) V^{−1} ,   (12)

where S = diag(±1, ±1, . . . , ±1) are diagonal matrices of plus or minus ones. The largest possible number of such transformations is 2^n, but in practice, based on the properties of the specific systems under consideration, not all of them are useful.

We have tested this stabilisation scheme on the four-dimensional kicked double rotor map [16]. The direct application of the stabilising scheme C_SD requires checking, for each seed, the stabilising property of each of the 384 matrices, the majority of which do not converge at all. The scheme presented here can detect the same orbits in a fraction of the computational effort needed by the SD method, using only 8 stabilising transformations for every seed. This shows the feasibility of extending the efficiency of the detection by stabilising transformations to higher-dimensional systems.

References

[1] R. L. Davidchack, Y.-C. Lai, Phys. Rev. E 60, 6172–6175 (1999).
[2] P. Schmelcher, F. K. Diakonos, Phys. Rev. Lett. 78, 4733–4736 (1997).
[3] P. Cvitanovic, Physica D 51, 138–151 (1991).
[4] R. L. Devaney, An Introduction to Chaotic Dynamical Systems, second ed. (Addison-Wesley, Reading, MA, 1989).
[5] E. Ott, Chaos in Dynamical Systems, (Cambridge University Press, Cambridge, 1993).
[6] O. Biham, W. Wenzel, Phys. Rev. Lett. 63, 819–822 (1989).
[7] K. T. Hansen, Phys. Rev. E 52, 2388–2391 (1995).
[8] J. R. Miller, J. A. Yorke, Physica D 135, 195–211 (2000).
[9] P. Schmelcher, F. K. Diakonos, Phys. Rev. E 57, 2739–2746 (1998).
[10] J. E. Humphreys, Reflection Groups and Coxeter Groups, (Cambridge University Press, Cambridge, 1990).
[11] A. Klebanoff, E. M. Bollt, Chaos, Solitons, and Fractals 12, 1305–1322 (2001).
[12] R. L. Davidchack, Y.-C. Lai, A. Klebanoff, E. M. Bollt, Phys. Lett. A 287, 99–104 (2001).
[13] Z. Galias, Int. J. of Bifurcation and Chaos 11, 2427–2450 (2001).
[14] D. Pingel, P. Schmelcher, F. K. Diakonos, Phys. Rev. E 64, 026214 (2001).
[15] P. Halmos, Finite Dimensional Vector Spaces, second ed. (Van Nostrand, Princeton, 1958).
[16] F. J. Romeiras, C. Grebogi, E. Ott, W. P. Dayawansa, Physica D 58, 165–192 (1992).

⁴ The exact nature of the proximity of the parameters {θ_i} and {α_i} needs further investigation. However, based on numerical evidence, it appears that an appropriate norm can be defined in order to formulate a condition similar to that in Eq. (7).


NACoM-2003 Extended Abstracts 57 – 60

Weighted Eigenvalue Problem Approach to the Critical Value Determination of Screened Coulomb Potential Systems

Metin Demiralp∗1

1 Computational Science and Engineering Program, Informatics Institute, Istanbul Technical University, Maslak, 80626, Istanbul, Turkey

Received 28 February 2003, accepted 21 March 2003

In this work, the radial time-independent Schrödinger equation of a screened Coulomb potential system in the zero energy limit is first converted to a weighted eigenvalue problem of an ordinary differential operator. Then, by using an appropriate coordinate transformation, the differential equation is transformed into a form whose first and second order derivative terms become the same as the corresponding terms of the Extended Jacobi Polynomials' differential equation. The only difference is the appearance of a multiplicative operator which can be considered as an effective potential. The work focuses on whether the solution can be obtained easily, depending on the structure of this potential.

The radial time-independent Schrödinger equation for a screened Coulomb potential system is given below

−(1/2) d²ψ/dr² − (1/r) dψ/dr + [l(l + 1)/(2r²)] ψ − [V(γr)/r] ψ = Eψ   (1)

where ψ, E, r, l and γ stand for the wave function, the system's energy, the radial variable, the azimuthal quantum number and the screening parameter, respectively. If we set E = 0, to determine the critical value of the screening parameter at the threshold of the continuous spectrum, then we write

−(1/2) d²ψ_cr/dr² − (1/r) dψ_cr/dr + [l(l + 1)/(2r²)] ψ_cr − [V(γ_cr r)/r] ψ_cr = 0   (2)

which can be transformed into the following equation by the change of variable r → γ_cr r:

−d²ψ_cr/dr² − (2/r) dψ_cr/dr + [l(l + 1)/r²] ψ_cr = (2/γ_cr) [V(r)/r] ψ_cr   (3)

This equation defines an eigenvalue problem for the linear ordinary differential operator which acts on ψ_cr on the left side of the above equation. This operator is Hermitian and positive definite over the Hilbert space of the bound state wavefunctions under the weight function r². The eigenvalue parameter is defined as 2/γ_cr. The right hand side of the above eigenvalue problem has a weight function, V(r)/r, which is a Hermitian algebraic operator and is positive definite as long as V(r) does not change sign over the entire domain of r. This means that the solution of the above eigenvalue problem produces real eigenvalues, without any guarantee of their positiveness. This must depend on the structure of the screening function V(r). In this work, V(r) is assumed to be given in such a way that the eigenvalues of the related problem remain on the positive real axis and belong to the discrete spectrum only.
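The reality (and, for a positive definite weight, positivity) of the eigenvalues can be illustrated on a finite-dimensional analogue of the weighted problem A v = λ W v, reduced to a standard symmetric problem through the Cholesky factor of W (a generic sketch, not the basis-set computation of the paper):

```python
import numpy as np

rng = np.random.default_rng(0)
n = 6
M = rng.standard_normal((n, n))
A = M @ M.T + n*np.eye(n)                   # Hermitian positive definite "operator"
W = np.diag(rng.uniform(0.5, 2.0, size=n))  # positive definite "weight"

# Reduce A v = lam W v to a standard symmetric problem: with W = L L^T and
# u = L^T v, it becomes (L^{-1} A L^{-T}) u = lam u.
L = np.linalg.cholesky(W)
Linv = np.linalg.inv(L)
lam = np.linalg.eigvalsh(Linv @ A @ Linv.T)
print(lam.dtype, np.all(lam > 0))           # real spectrum, all positive
```

If the weight were allowed to change sign, the reduction above would fail (no real Cholesky factor exists) and the generalized spectrum would no longer be guaranteed real, mirroring the sign condition on V(r) in the text.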

∗ Corresponding author: e-mail: [email protected], Phone: +90 212 285 70 82, Fax: +90 212 285 70 73

Even under these assumptions, the numerical solution of equation (3) is not as easy as expected, due to numerical instability. This instability can arise from the decay of the above weight function as r goes to infinity. Because of the discrete and finite capabilities of the computational tools employed in the numerical implementation, the matrix representation of the weight operator V(r)/r may have eigenvalues accumulating at zero. This increases the condition number of the matrix and makes the relevant numerical procedure unstable. The extent of the instability depends on how the screening function behaves at large values of r. In these cases one can soften the instability by constructing appropriate basis functions for the matrix representation. This task, of course, necessitates experience and sufficient insight to find the most appropriate structures for the basis functions. We do not intend to discuss this issue in detail.

The above discussion implies that the conversion of the weighted eigenvalue problem to a weightless one may be fruitful for the determination of the critical values of the screening parameter. For this purpose we may define a new independent variable u which is a continuous and monotonous function, u(r), of r, in such a way that u(0) = −1 and u(∞) = 1. Equation (3) takes the following form when we use u as the independent variable:

−u_r² d²ψ_cr/du² − [u_rr + (2/r) u_r] dψ_cr/du + [l(l + 1)/r²] ψ_cr = (2/γ_cr) [V(r)/r] ψ_cr   (4)

where the subscript r denotes differentiation with respect to r. If we assume

u_r² = K² (1 − u²) V(r)/r   (5)

for the determination of u(r) we can arrive at the following equation.

−(1 − u²) d²ψ_cr/du² − F_1(u) dψ_cr/du + F_2(u) ψ_cr = λ ψ_cr   (6)

where K is an arbitrary constant to be determined. The definitions of λ, F_1 and F_2 are given below.

λ ≡ 2/(K² γ_cr)   (7)

F_1(u) ≡ [r/(K² V(r))] [u_rr + (2/r) u_r] ;   F_2(u) ≡ l(l + 1)/(K² r V(r))   (8)

where F_1(u) and F_2(u) are assumed to be expressed as functions of u after rewriting the derivatives of u in terms of u. The solution of equation (5) under the conditions u(0) = −1 and u(∞) = 1, which impose a specific value on K, is given below.

u(r) = − cos( K ∫_0^r (V(t)/t)^{1/2} dt )   (9)

where K must be defined as follows.

K = π / [ ∫_0^∞ (V(t)/t)^{1/2} dt ]   (10)
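For a Yukawa-type screening function V(r) = e^{−r} (an assumed example, not specified in the abstract), Eqs. (9) and (10) can be evaluated by simple quadrature; the substitution t = w² removes the integrable singularity at the origin. In this case K = √(π/2) in closed form, which the sketch reproduces:

```python
import numpy as np

def V(r):                                  # assumed Yukawa-type screening
    return np.exp(-r)

def halfline_integral(wmax, m):
    """Trapezoid value of int_0^{wmax} 2*sqrt(V(w^2)) dw, which equals
    int_0^{wmax^2} (V(t)/t)^(1/2) dt after the substitution t = w^2."""
    w = np.linspace(0.0, wmax, m)
    y = 2.0*np.sqrt(V(w*w))
    h = w[1] - w[0]
    return h*(y.sum() - 0.5*(y[0] + y[-1]))

K = np.pi / halfline_integral(12.0, 200001)   # Eq. (10); tail beyond 12 negligible
print(round(K, 4))                            # analytically sqrt(pi/2) ~ 1.2533

def u(r):
    """Eq. (9): u(r) = -cos(K * int_0^r (V(t)/t)^(1/2) dt)."""
    return -np.cos(K * halfline_integral(np.sqrt(r), 20001))

print(round(u(1e-9), 3), round(u(200.0), 3))  # -1.0 near the origin, 1.0 at infinity
```

The boundary values u(0) = −1 and u(∞) = 1 come out automatically once K is chosen by Eq. (10), since the argument of the cosine then runs from 0 to π.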

Equation (6) can be put into such a form that its differential operator part (the sum of the terms proportional to the first and second order differentiation operators with respect to u) becomes completely the same as the corresponding part of the equation satisfied by the extended Jacobi polynomials. For this purpose the following transformation of ψ_cr can be defined:

ψ_cr(u) = f(u) g(u)   (11)


where f(u) and g(u) are to be determined. The resulting form of equation (6) can be written as a differential equation for f(u) as below.

−(1 − u²) d²f/du² − [ (1 − u²) (2/g) (dg/du) + F_1(u) ] df/du + [ −(1 − u²) (1/g) d²g/du² − F_1(u) (1/g) dg/du + F_2(u) ] f = λ f   (12)

where g(u) is to be determined in such a way that the coefficient of the first order derivative of f(u) with respect to u becomes equivalent to the corresponding term of the differential equation for the extended Jacobi polynomials. Thus we obtain

(1 − u²) (2/g) (dg/du) + F_1(u) = (β + 1)(1 − u) − (α + 1)(1 + u)   (13)

where α and β correspond to the parameters of the extended Jacobi polynomials and may give some flexibility to control the numerical instability (if it exists).

If we use equation (8) for F_1(u) in the above equation and make it totally integrable, then the following expression for the function g(u) is obtained after some standard steps.

g(u) = C [ (1 − u)^{(α+1)/2} (1 + u)^{(β+1)/2} / r ] (dr/du)^{1/2}   (14)

where C denotes the arbitrary integration constant. To determine this constant we can consider the following orthonormality condition for the functions belonging to the eigenfunction set of the weighted eigenvalue problem given by equation (6). If Φ_j and Φ_k stand for any two functions in this eigenfunction set, then we can write

∫_0^∞ Φ_j(r) [V(r)/r] Φ_k(r) r² dr = δ_{j,k}   (15)

where V(r) r, which is assumed to be nonnegative for all nonnegative values of r, stands for the weight function of the integration. If we transform the Φ functions through Φ_j(r) = φ_j(u(r)) g(u(r)), where g(u) is given via equation (14), then one can write the following equation after some intermediate algebra.

∫_{−1}^{1} (1 − u)^α (1 + u)^β φ_j(u) φ_k(u) (C²/K²) du = δ_{j,k}   (16)

where K is given by equation (10). Since the weightless eigenvalue problem we are trying to construct has a differential operator part which is composed of terms proportional to the first or second order differentiation operators with respect to u and is Hermitian over the weight function (1 − u)^α (1 + u)^β, it is quite natural to impose the orthonormality condition on the φ(u) functions over the same weight function. This, however, implies

C = K (17)

By using these results one can write the following equation for the weightless eigenvalue problem of f(u).

−(1 − u²) d²f/du² − [ (β + 1)(1 − u) − (α + 1)(1 + u) ] df/du + G(u) f = λ f   (18)

where G(u) is given below.

G(u) = −(1 − u²) (1/g) d²g/du² − F_1(u) (1/g) dg/du + F_2(u)   (19)


In all the formulae above it is necessary to express the first derivative of r with respect to u. This can be done analytically if the structure of u(r) permits. In such cases G(u) can be constructed analytically; otherwise numerical procedures become necessary for the construction of G(u). The function G(u) can be considered as an effective potential term which determines the spectral properties of the eigensolutions of the equation satisfied by f(u). Depending on G(u), the spectrum of λ may contain undesired parts, such as a negative discrete spectrum or a continuous spectrum (positive, negative, or both). The expected situation is the case where only a positive discrete spectrum exists. This case can be named the Normal Case, where each spectral value of λ defines a critical value for the screening parameter, referring to the hydrogen atom bound states. The unexpected situations, which can be named Abnormal Cases, may lead us to some anomalies. The work includes discussions of the relation between G(u) and V(r) and the ease of solution of equation (18).



NACoM-2003 Extended Abstracts 61 – 64

Bifurcations of Periodic Solutions of ODEs using Bordered Systems

A. Dhooge1, W. Govaerts∗1, and Yu. A. Kuznetsov2

1 Department of Applied Mathematics and Computer Science, Ghent University, Krijgslaan 281-S9, B-9000 Gent, Belgium

2 Mathematisch Instituut, Universiteit Utrecht, Boedapestlaan 6, 3584 CD Utrecht, The Netherlands

Received 28 February 2003, accepted 21 March 2003

We discuss the numerical computation and continuation of the fold, flip and Naimark-Sacker bifurcations of periodic solutions of ODEs. We use minimal extended systems based on bordered operators between function spaces. These are discretized by a method based on orthogonal collocation. The bordering functions and vectors of the continuous problem correspond to bordering vectors of the discretized matrices. We discuss the implementation of these methods in the Matlab software packages MATCONT and CL MATCONT.

1 Introduction

We consider a dynamical system of the form

dx/dt = f(x, α)   (1)

with x ∈ IR^n, f(x, α) ∈ IR^n, and α a vector of parameters.

Numerical continuation under parameter variation of equilibria, limit points, limit cycles, etcetera of (1) is a well-understood subject, see e.g. [4], [5]. Good software for this is also available [2], [6]. However, it is not easily accessible and its input and output are not compatible with standard software such as Matlab.

The new package MATCONT is a Matlab toolbox which has the look-and-feel of [6] but is completely rewritten. The current version of the package is freely available at:

http://allserv.rug.ac.be/~ajdhooge

where a slightly more general non-GUI version CL MATCONT is also available.

Details on the general structure of MATCONT are given in [1]. In the present paper we concentrate on the implementation in MATCONT of the new algorithms for the computation of flip, fold and Naimark-Sacker bifurcations of limit cycles which are studied in [3]. These use a minimal extended system, i.e. we append only a scalar equation to the definition of limit cycles in the case of flip and fold; we introduce an additional variable and append two equations in the case of Naimark-Sacker.

These algorithms are not implemented in any other publicly available package. The only existing software to perform the continuation is AUTO97-00 [2], which uses a maximal extended system, i.e. the number of state variables is approximately doubled (flip and fold) or tripled (Naimark-Sacker).

∗ Corresponding author: e-mail [email protected], Phone +32 9 264 48 93, Fax +32 9 264 49 95


2 Limit Cycles and their Bifurcations

2.1 Limit Cycles

A limit cycle is an isolated periodic solution of (1) with period T, i.e. x(0) = x(T). Since T is not known in advance, one usually considers an equivalent system defined on the fixed interval [0, 1] by rescaling time. Then the system reads

dx/dt − T f(x, α) = 0 ,   x(0) = x(1)   (2)

A phase shifted function φ(t) = x(t + s) is also a solution of (2) for any value of s. To obtain a unique solution an extra constraint is needed. The following integral constraint is often used [2], [6]:

∫_0^1 ⟨x(t), x_old(t)⟩ dt = 0   (3)

where x_old(t) is the tangent vector of a previously calculated limit cycle and is therefore known; ⟨x, v⟩ is just a different notation for x^T v. This condition tries to select the solution which has the smallest phase difference with respect to the previous solution x_old. The complete boundary value problem (BVP) defining a limit cycle consists of (2) and (3).

2.2 Bifurcations of Limit Cycles

In [3] it was shown that Flip, Fold and Naimark-Sacker bifurcations of limit cycles can be characterized by the fact that certain operators between function spaces are rank deficient. These operators can be embedded in one-to-one and onto operators by extending the definition and range spaces. In finite dimensions this corresponds to bordering their matrix representations. This, in turn, leads to test functions for the bifurcations in the spirit of bordering methods for matrices [4]. We now collect the main results, at the same time generalizing them slightly to bring them closer to the numerical implementation.

2.3 Flip Bifurcation of Limit Cycles

A Flip or Period Doubling bifurcation of limit cycles (PD) generically corresponds to a period doubling. It can be characterized by adding an extra constraint G = 0 to (2), (3), where G is the Flip test function. The complete BVP defining a PD point consists of (2), (3) and

G[x, T, α] = 0 (4)

where G is a scalar defined by requiring

N_PD [ v ; G ] = [ 0 ; 0 ; 1 ] .   (5)

Here v is a function, and

N_PD = [ D − T f_x(x(t), α)   w01 ;   δ_1 + δ_0   w02 ;   Int_{v01}   0 ] .   (6)

Here δ is the Dirac evaluation operator, i.e. δ_1(v) = v(1), etcetera. The bordering functions v01, w01 and the vector w02 are chosen so that N_PD is one-to-one and onto (for details see [3]).


2.4 Fold Bifurcation of Limit Cycles

A Fold bifurcation of limit cycles or Limit Point of Cycles (LPC) generically corresponds to a turning point of a curve of limit cycles. It can be characterized by adding an extra constraint G = 0 to (2), (3), where G is the Fold test function. The complete BVP defining an LPC point consists of (2), (3) and

G[x, T, α] = 0 (7)

where G is defined by requiring

N_LPC [ v ; S ; G ] = [ 0 ; 0 ; 0 ; 1 ] .   (8)

Here v is a function, S and G are scalars and

N_LPC = [ D − T f_x(x(t), α)   −f(x(t), α)   w01 ;   δ_1 − δ_0   0   w02 ;   Int_{f(x(·),α)}   0   w03 ;   Int_{v01}   v02   0 ]   (9)

where the bordering functions v01, w01, the vector w02 and the scalars v02 and w03 are chosen so that N_LPC is one-to-one and onto (for details see [3]).

2.5 Torus Bifurcation of Limit Cycles

A Torus or Naimark-Sacker bifurcation of limit cycles (NS) generically corresponds to a bifurcation to an invariant torus. To handle this case we formally extend the time interval [0, 1] by periodicity to the interval [0, 2]. Also, we introduce the new variable κ whose value at the NS point is given by

κ = (1/2)(e^{iθ} + e^{−iθ}) = cos(θ) = Re(e^{±iθ})

where e^{±iθ} are the Naimark-Sacker multipliers. The complete BVP defining a NS point consists of (2), (3) and

G[x, T, α] = 0 (10)

where G is a 2 × 2 matrix defined by requiring

$$
N^{NS}\begin{pmatrix} v_1 & v_2 \\ G_{11} & G_{12} \\ G_{21} & G_{22} \end{pmatrix} = \begin{pmatrix} 0 & 0 \\ 0 & 0 \\ 1 & 0 \\ 0 & 1 \end{pmatrix}. \qquad (11)
$$

Here v1, v2 are functions, and

$$
N^{NS} = \begin{pmatrix} D - T f_x(x(t), \alpha) & w_{01} & w_{02} \\ \delta_0 - 2\kappa\,\delta_1 + \delta_2 & w_{03} & w_{04} \\ Int_{v_{01}} & 0 & 0 \\ Int_{v_{02}} & 0 & 0 \end{pmatrix} \qquad (12)
$$

where the bordering functions v01, v02, w01, w02 and the vectors w03, w04 are chosen so that NNS is one-to-one and onto (for details see [3]).


64 A. Dhooge, W. Govaerts, and Yu. Kuznetsov: Bifurcations of Periodic Solutions

We note that the BVP for NS points also admits another type of solution, namely limit cycles with a pair µ1, µ2 of real multipliers with product µ1µ2 = 1; in this case κ = ½(µ1 + µ2). This situation is similar to that of Hopf points vs. neutral saddles, cf. [4], [5], [6].

We also note that (10) contains four equations while only two are expected. In numerical implementations the continuation code usually does not allow over-determined systems. This situation is similar to that of Hopf points, cf. [4], [5], [6], and can be treated in the same way, i.e. by selecting two out of the four components to make the linearized system for NS as well-conditioned as possible. The choice of the two components should be adaptive, i.e. it can be changed at every continuation point.

3 Implementation

In the February 2003 release of MATCONT and CL MATCONT the continuation of PD and LPC cycles is implemented.

We discretize the BVP exactly as is done in AUTO [2] and CONTENT [6], i.e. by orthogonal collocation. In the discretization the bordering functions and vectors that appear in (6), (9) and (12) are replaced by vectors without reference to the continuous situation; the borders are chosen simply to make the discretized matrices NPD, NLPC, NNS as well conditioned as possible. These borders can be adapted at each computed continuation point if desired.

The systems that arise in this way are typically sparse, and their sparsity increases with the number of test intervals used in the discretization. In MATCONT and CL MATCONT the sparsity of the linearized systems is exploited by using the Matlab sparse matrix routines.

To compare minimal extended systems and maximal extended systems we performed some tests in the PD case and found that, at least in the Matlab implementation, minimal extended systems are much more efficient.

References

[1] A. Dhooge, W. Govaerts and Yu. A. Kuznetsov, MATCONT: A MATLAB package for numerical bifurcation analysis of ODEs, to appear in ACM TOMS (2003).

[2] E. J. Doedel, A. R. Champneys, T. F. Fairgrieve, Yu. A. Kuznetsov, B. Sandstede and X. J. Wang, AUTO97-AUTO2000: Continuation and Bifurcation Software for Ordinary Differential Equations (with HomCont), User's Guide, Concordia University, Montreal, Canada (1997-2000). (http://indy.cs.concordia.ca).

[3] E. J. Doedel, W. Govaerts and Yu. A. Kuznetsov, Computation of Periodic Solution Bifurcations in ODEs using Bordered Systems, to appear in SIAM Journal on Numerical Analysis (2003).

[4] W. Govaerts, Numerical Methods for Bifurcations of Dynamical Equilibria, SIAM, Philadelphia (2000).

[5] Yu. A. Kuznetsov, Elements of Applied Bifurcation Theory, 2nd edition, Springer-Verlag, New York (1998).

[6] Yu. A. Kuznetsov and V. V. Levitin, CONTENT: Integrated Environment for Analysis of Dynamical Systems, CWI, Amsterdam (1997): ftp://ftp.cwi.nl/pub/CONTENT


NACoM-2003 Extended Abstracts 65 – 68

Efficient Iteration Methods for Schur Complement Systems

Jasper van den Eshof∗1, Gerard L. G. Sleijpen1, and Martin B. van Gijzen2
1 Department of Mathematics, Utrecht University, P.O. Box 80.010, NL-3508 TA Utrecht, The Netherlands
2 CERFACS, 42 Avenue Gaspard Coriolis, 31057 Toulouse Cedex 01, France

Received 28 February 2003, accepted 21 March 2003

Schur complement systems arise whenever a coupled system of linear equations is solved by eliminating one part of the variables. The resulting linear system can be solved by straightforwardly applying an iterative method in which, in every step, a linear system must be solved to a certain precision for the matrix-vector product. In this paper we discuss efficient iteration methods for solving these Schur complement systems. The first (and known) idea is to tune the precision of the approximate solves using a so-called relaxation strategy. In this work we show that a further significant reduction in computing time can be obtained by combining a relaxation strategy with the nesting of inexact Krylov methods. Flexible Krylov subspace methods allow variable preconditioning and can therefore be used in the outermost loop of our overall method. We derive, for several flexible Krylov methods, strategies for controlling the accuracy of both the inexact matrix-vector products and the inner iterations. The results of our analysis are illustrated with an example that models global ocean circulation.

1 Introduction

To explicitly form a Schur complement one has to calculate the inverse of a matrix. This makes the Schur complement relatively dense, despite the fact that it is normally expressed in terms of sparse matrices. For high-dimensional problems it is therefore not feasible to store this matrix. Here iterative methods, in particular Krylov subspace methods, come into the picture, since they only require the product of the Schur complement matrix with some vector, without the need to form the matrix explicitly. For every multiplication one has to approximately solve a linear system. In this work we discuss and analyze efficient iteration methods for Schur complement problems. Our observations are illustrated with an ocean circulation model for barotropic flow [8].

Solving a Schur complement problem amounts to solving a linear system where the accuracy of the matrix-vector product can be controlled. In recent technical reports different authors [1, 5, 4] have investigated the use of relaxation strategies for the precision of the matrix-vector product in Krylov subspace methods for linear systems. The goal of these relaxation strategies is, given a required residual precision ε, to minimize the amount of work spent in the computation of the matrix-vector product. From a practical point of view this means that these strategies try to allow the error in the product to be as large as possible without compromising the accuracy of the method or its convergence speed too much (with respect to ε). In this work we argue that the computational gain is often modest for Schur complement problems and, to overcome this limitation, we propose a modification to this approach.

2 A model problem: global ocean circulation

Our model problem stems from a finite element discretization of a model that describes the steady barotropic flow in a homogeneous ocean with constant depth and in near equilibrium, as described in [8]. The model

∗ Corresponding author: e-mail: [email protected], Phone: +31 30 253 1462, Fax: +31 30 251 8394

c© 2003 WILEY-VCH Verlag GmbH & Co. KGaA, Weinheim



is described by the following set of partial differential equations:

$$
-r\nabla^2\psi - \beta\frac{\partial\psi}{\partial x} - \alpha\nabla^2\zeta = \nabla\times F \quad \text{in } \Omega,
$$
$$
\nabla^2\psi + \zeta = 0,
$$

in which ψ is the stream-function and ζ the vorticity. The domain Ω is the part of the world that is covered by sea. The external force field F is a function of the wind stress, the average water depth and the water density. The other parameters in these equations are the lateral viscosity α, the bottom friction r and the Coriolis parameter β. The above equations are complemented with no-slip conditions on continent boundaries. The equations are commonly expressed in spherical coordinates, which maps the physical domain onto a rectangular domain with periodic boundary conditions. The resulting system is subsequently discretized with the method described in [8]. This gives the following linear system of equations:

$$
\begin{pmatrix} rL - C & \alpha L \\ -L^* & M \end{pmatrix}
\begin{pmatrix} \psi \\ \zeta \end{pmatrix}
=
\begin{pmatrix} f \\ 0 \end{pmatrix}.
$$

From this equation we can eliminate ψ, which gives the Schur complement for x = ζ:

$$
Sx = b \quad\text{with}\quad S \equiv M + \alpha L^*(rL - C)^{-1}L \quad\text{and}\quad b \equiv L^*(rL - C)^{-1}f. \qquad (1)
$$
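Written out, the elimination behind (1) proceeds in two short steps, assuming rL − C is invertible (this is only the algebra already stated above, made explicit):

```latex
\text{First block row: } (rL - C)\psi + \alpha L\zeta = f
  \;\Longrightarrow\; \psi = (rL - C)^{-1}\bigl(f - \alpha L\zeta\bigr).
\text{Second block row: } -L^*\psi + M\zeta = 0
  \;\Longrightarrow\; \bigl(M + \alpha L^*(rL - C)^{-1}L\bigr)\zeta = L^*(rL - C)^{-1}f,
\text{i.e. } S\zeta = b \text{ with } S \text{ and } b \text{ as in (1).}
```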

Solving the Schur complement system for the vorticity instead of for the stream-function has several advantages. The matrix M is diagonal, due to a lumping approach, and it turns out to be an effective preconditioner for S. Furthermore, rL − C is a discretized convection-diffusion operator, for which reasonably effective preconditioners are readily available. Another advantage of solving for the vorticity first is that, since M is diagonal, we can cheaply construct solutions ζ for various values of α by constructing the Krylov subspace only once, using ideas from the so-called class of multi-shift Krylov subspace methods, e.g., [2] and [4, Section 10].
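The matrix-free product with S can be sketched as follows. This is a dense toy with random stand-in matrices (real, so L* reduces to Lᵀ); in the paper the inner solve with rL − C is done approximately by preconditioned BiCGstab, while here a direct solve stands in for it.

```python
import numpy as np

rng = np.random.default_rng(1)
n, r, alpha = 8, 2.0, 0.5
L = rng.standard_normal((n, n))
C = rng.standard_normal((n, n))
M = np.diag(rng.uniform(1.0, 2.0, n))       # lumped mass matrix: diagonal

def schur_matvec(v):
    """Apply S = M + alpha * L^T (rL - C)^{-1} L without forming S.

    The direct inner solve stands in for the preconditioned
    BiCGstab solve used in the paper."""
    y = np.linalg.solve(r * L - C, L @ v)   # inner solve: (rL - C) y = L v
    return M @ v + alpha * (L.T @ y)        # L^T plays the role of L^*

# Consistency check against the explicitly formed Schur complement.
S = M + alpha * L.T @ np.linalg.solve(r * L - C, L)
v = rng.standard_normal(n)
print(np.linalg.norm(schur_matvec(v) - S @ v))   # ~ machine precision
```

Only `schur_matvec` is needed by a Krylov method; the dense S is formed here purely to verify the product.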

3 Inexact Krylov subspace methods and relaxation strategies

The central problem in this report is to find a vector x′ that approximately satisfies the equation

Sx = b such that ‖b − Sx′‖ < ε,

for some user-specified, predefined value of ε. Without loss of generality we assume that the vector b is of unit length. An important class of iterative solvers for linear systems are Krylov subspace solvers, in which in each step only basic linear algebra operations are required, including the matrix-vector product. In an inexact Krylov subspace method we have available, instead of the exact matrix-vector product, some device that computes an approximation Sη(v) to the matrix-vector product to a relative precision η as

Sη(v) = Sv + g with ‖g‖2 ≤ η‖S‖2‖v‖2.

For general inexact matrix-vector products within GMRES, Bouras and Fraysse reported in [1] various numerical results with a relative precision for the matrix-vector product in step j + 1 that was essentially given by

$$
\eta_j = \frac{\varepsilon}{\|r_j\|_2}. \qquad (2)
$$

Here, rj is the last computed residual in the GMRES method. An interesting property of this empirical choice for ηj is that it requires very accurate matrix-vector products in the beginning of the process, while the precision is relaxed as soon as the method starts converging, that is, as the residuals become small. For an impressive list of numerical experiments they observe that the GMRES method with tolerance (2) converges roughly as fast as the unperturbed version, and the norm of the true residual (‖b − Sxj‖) seemed to stagnate around a value of O(ε). In [5] a theoretical explanation is offered for these remarkable observations. For a more specific analysis of, and references on, using this criterion for Schur complement systems see [4].
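The behaviour just described can be reproduced with a toy inexact iteration. The sketch below uses Richardson iteration instead of GMRES (to keep the code short) together with a deliberately perturbed matrix-vector product whose relative tolerance follows (2); the matrix and tolerance values are illustrative, not from the paper.

```python
import numpy as np

rng = np.random.default_rng(2)
n, eps = 50, 1e-8
# A well-conditioned test matrix so that plain Richardson converges.
A = np.eye(n) + 0.3 * rng.standard_normal((n, n)) / np.sqrt(n)
b = rng.standard_normal(n)
b /= np.linalg.norm(b)                        # b of unit length, as in the text
normA = np.linalg.norm(A, 2)

def inexact_matvec(v, eta):
    """S_eta(v) = A v + g with ||g|| = eta ||A|| ||v||."""
    g = rng.standard_normal(n)
    g *= eta * normA * np.linalg.norm(v) / np.linalg.norm(g)
    return A @ v + g

x, r = np.zeros(n), b.copy()                  # r: recurred residual, as in Krylov codes
for j in range(60):
    eta = min(1.0, eps / np.linalg.norm(r))   # relaxation strategy (2)
    x = x + r                                 # Richardson update
    r = r - inexact_matvec(r, eta)            # residual by recurrence
print(np.linalg.norm(r), np.linalg.norm(b - A @ x))
```

The recurred residual keeps shrinking while η grows; the true residual ‖b − Ax‖ stagnates near O(ε), mirroring the observation quoted from [1].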



4 Nested inexact Krylov subspace methods

In this section we give an overview of our proposed talk. GMRES often shows superlinear convergence, which means that the residual decrease in the beginning is much smaller than at the final iterations. If in this case we use a relaxed tolerance for the matrix-vector product, then a lot of effort is spent in the first few iterations, despite the fact that progress is slow at this stage. Another point of concern about this approach is that the accumulation of the errors can be considerable if the number of iterations required to reach the defined precision is large. This problem can be compensated for by working with a smaller tolerance ε (see also [4]). Of course, this comes at some cost.

In order to reduce the number of necessary iterations of the inexact Krylov method, we consider the idea of 'preconditioning' the inexact Krylov subspace method by another inexact Krylov subspace method, set to a tolerance ξj > ε in step j + 1. Notice that this is a so-called flexible preconditioner that changes every step. Methods that can be used here are the so-called flexible methods. These are methods that are specially designed for dealing with variable preconditioning, and we combine them with an inexact matrix-vector product. We will consider in detail Richardson iteration, flexible GMRES [3] and GMRESR [6]. In our talk we address two important issues.

The first is concerned with the necessary precision of the matrix-vector product in the outer iteration. For the inner iteration we can use GMRES with the choice given by (2). By extending our ideas and analysis from [5] we will show that this is also a suitable choice for the considered flexible methods, at least in practically relevant situations.

A second issue is the choice of the precision ξj for the flexible preconditioner (or inner iteration). A drawback of nesting methods is that it generally affects, usually negatively, the efficiency with respect to the total number of matrix-vector products (the 'convergence speed'). The goal is to make a trade-off between choosing the ξj small, thereby needing few outer iterations and avoiding the computation of many accurate matrix-vector products in the outer iteration, and, on the other hand, keeping the ξj large, which is expected to reduce the total number of necessary matrix-vector products (since the optimality of the outer iteration becomes more effective). By assuming a simple model for the cost of the matrix-vector product and making certain assumptions on the convergence speed of the Krylov subspace methods, we derive a simple strategy for choosing the ξj.
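The inner-outer nesting can be sketched in its simplest form: an outer iterative-refinement loop (standing in for flexible GMRES or GMRESR) around an inner solve carried out only to the loose tolerance ξ. The inner solver is mimicked by an exact solve plus a perturbation of relative size ξ; all matrices and values are illustrative.

```python
import numpy as np

rng = np.random.default_rng(3)
n, xi, eps = 30, 1e-1, 1e-6
S = np.eye(n) + rng.standard_normal((n, n)) / (2 * np.sqrt(n))
b = rng.standard_normal(n)
b /= np.linalg.norm(b)

def inner_solve(rhs, tol):
    """Stand-in for an inner (inexact GMRES) solve of S c = rhs to
    relative residual tol: exact solve plus a perturbation of that size."""
    c = np.linalg.solve(S, rhs)
    e = rng.standard_normal(n)
    c += tol * np.linalg.norm(rhs) * e / (np.linalg.norm(e) * np.linalg.norm(S, 2))
    return c

x, r = np.zeros(n), b.copy()
outer = 0
while np.linalg.norm(r) > eps and outer < 50:
    x = x + inner_solve(r, xi)   # flexible 'preconditioner' at loose tolerance xi
    r = b - S @ x                # outer residual (assumed accurate here)
    outer += 1
print(outer, np.linalg.norm(r))  # only a handful of outer iterations
```

Each outer step contracts the residual by at least a factor ξ, so only a few (expensive, accurate) outer matrix-vector products are needed, which is the point of the nesting.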

In order to illustrate the efficiency of our proposal we present a numerical experiment for our model problem from Section 2. The physical parameters are chosen as in [8, Section 7.1], except for the viscosity parameter α, which is 10⁵ in this experiment. We have given a contour plot of the stream-function ψ for these parameters in the left picture of Figure 1. The grid resolution is set to two degrees, which results in a matrix of dimension 26455.

If (1) is solved using a Krylov subspace method, then a system with a discrete convection-diffusion operator has to be solved for every matrix-vector product with the Schur complement. In our experiments this was done using BiCGstab with an incomplete LU preconditioner [7]. The BiCGstab method was terminated when a relative residual precision of η was achieved. Note that it follows from the analysis in [4, Section 8] that this only approximately guarantees a relative error of η for the matrix-vector product, and ideally an additional factor should be taken into account.

The first method that we consider is the inexact (full) GMRES method with the relaxation strategy (2) of Bouras and Fraysse. As a preconditioner we have used the diagonal matrix M. The results for this strategy are plotted in Figure 1. The number of GMRES iterations is large for this problem, about 130, which limits the precision of the inexact Krylov method because of the accumulation of the errors in the matrix-vector product. For this reason we have chosen the empirical value ε = 10⁻⁷.

The alternative is to precondition (inexact) flexible GMRES and GMRESR with an inexact GMRES method set to a precision of ξ = 10⁻¹. For the GMRES methods we have used the Bouras-Fraysse relaxation strategy (2) both in the outer and in the inner iteration, with ε given by 10⁻⁵ and 10⁻¹, respectively. The results are given in Figure 1. For the nested method only a few outer iterations are necessary, and therefore the residual gap is less contaminated by errors in the matrix-vector product. The reduction



Fig. 1 Left: stream-function, contour/density plot. Right: norm of the true residual as a function of the total number of BiCGstab iterations, for the inexact GMRES method with fixed precision ε = 10⁻⁶ (+), the relaxed GMRES method with ε = 10⁻⁷ (squares), relaxed flexible GMRES preconditioned with relaxed GMRES set to a precision of 0.1 (*), and the same with GMRESR in the outer iteration (o).

in the total number of BiCGstab iterations is very large, which can partly be explained by the observed convergence behavior of the BiCGstab solver: the convergence curves show a very rapid residual reduction in the first iterations, followed by a phase of very slow convergence. This demonstrates a clear advantage of our approach, since it, roughly speaking, replaces accurate matrix-vector products by many less accurate matrix-vector products.

References

[1] A. Bouras and V. Fraysse, A relaxation strategy for inexact matrix-vector products for Krylov methods, Technical Report TR/PA/00/15, CERFACS, France, 2000.

[2] Biswa Nath Datta and Youcef Saad, Arnoldi methods for large Sylvester-like observer matrix equations, and an associated algorithm for partial spectrum assignment, Linear Algebra Appl. 154/156 (1991), 225–244. MR 92b:65032

[3] Youcef Saad, A flexible inner-outer preconditioned GMRES algorithm, SIAM J. Sci. Comput. 14 (1993), no. 2, 461–469. MR 1204241

[4] Valeria Simoncini and Daniel Szyld, Theory of inexact Krylov subspace methods and applications to scientific computing, Tech. Report 02-4-12, Department of Mathematics, Temple University, 2002. Revised version November 2002.

[5] Jasper van den Eshof and Gerard L. G. Sleijpen, Inexact Krylov subspace methods for linear systems, Preprint 1224, Dept. Math., Utrecht University, Utrecht, the Netherlands, February 2002. Submitted.

[6] H. A. van der Vorst and C. Vuik, GMRESR: a family of nested GMRES methods, Numer. Linear Algebra Appl. 1 (1994), no. 4, 369–386. MR 95j:65034

[7] Henk A. van der Vorst, Bi-CGSTAB: A fast and smoothly converging variant of Bi-CG for the solution of nonsymmetric linear systems, SIAM J. Sci. Stat. Comput. 13 (1992), no. 2, 631–644. MR 92j:65048

[8] M. B. van Gijzen, C. B. Vreugdenhil, and H. Oksuzoglu, The finite element discretization for stream-function problems on multiply connected domains, J. Comput. Phys. 140 (1998), no. 1, 30–46. MR 98k:76088


NACoM-2003 Extended Abstracts 69 – 74

Fast Image Registration: A Variational Approach

Bernd Fischer∗1 and Jan Modersitzki1

1 Institute of Mathematics, University of Lübeck, D-23560 Lübeck, Germany

Received 28 February 2003, accepted 21 March 2003

Image registration is central to many challenges in medical imaging and therefore has a vast range of applications. The purpose of this note is to provide a unified but extremely flexible framework for image registration. This framework is based on a variational formulation of the registration problem. We discuss the framework as well as some of its most important building blocks, including some of the most promising non-linear registration strategies used in today's medical imaging.

The overall goal of image registration is to compute a deformation such that a deformed version of an image becomes similar to a so-called reference image. Hence, the similarity measure is an important building block. Depending on the application at hand, it is inevitable to constrain the wanted deformation in an appropriate way. Thus, regularization is also a main building block. Finally, it is often desirable to incorporate higher-level information about the expected deformation. We show how such constraints or information can easily be integrated into our general framework and discuss some examples. Moreover, the proposed general framework allows for a unified algorithmic treatment of the various building blocks.

1 Introduction

Registration is the determination of a geometrical transformation that aligns points in one view of an object with corresponding points in another view of the same or a similar object. There exist many instances in a medical environment which demand a registration, including the treatment verification of pre- and post-intervention images, the study of temporal series of cardiac images, and the monitoring of the time evolution of an agent injection subject to patient motion. Another important area is the need for combining information from multiple images acquired using different modalities, like for example computer tomography (CT) and magnetic resonance imaging (MRI).

To be successful, each individual application should be treated by a specific registration technique. It is the purpose of this note to provide a general (theoretical) framework and a (practical) software toolbox for non-linear registration schemes, which may be adapted to the special problem class under consideration. The main building blocks of this toolbox resemble typical user demands and may be assembled in a consistent and intuitive fashion.

Due to the wide range of applications, a variety of different registration techniques have been developed. Here we focus on so-called intensity-driven approaches. These schemes aim to match intensity patterns between a deformed scan (template) and the target (reference) based on a mathematical similarity measure. For this type of problem we provide a toolbox of registration routines which enables the user to choose, in a consistent way, building blocks for schemes which cover a wide range of applications. The idea is to phrase each individual block in terms of a variational formulation. This not only allows for a unified treatment but also for a fast and reliable implementation. The various building blocks comprise three categories: smoothers and internal forces, distances and external forces, and "hard" or "soft" constraints.

Internal forces are defined for the wanted displacement field itself and are designed to keep the displacement field smooth (or "natural") during deformation. In contrast, external forces are computed from

∗ Corresponding author: e-mail: [email protected]




the image data and are defined to drive the displacement in order to arrive at the desired registration result. Whereas the internal forces implicitly constrain the displacement to obey a smoothness criterion, the additional constraints force the displacement to satisfy explicit criteria, like for example imposed landmark or volume-preserving constraints.

2 A variational concept

Given two images, a reference R and a template T, the aim of image registration is to find a global and/or local transformation from T onto R in such a way that the transformed template matches the reference. Ideally there exists a coordinate transformation u such that the reference R equals the transformed template Tu, where Tu(x) = T(x − u(x)). Given such a displacement u, the registration problem reduces to a simple interpolation task. However, in general it is not possible to come up with a perfect u, and the registration problem is to compute an application-conformal transformation u, given the reference and template image.

It should be pointed out that, apart from the fact that a solution may not exist, it is not necessarily unique. For an example, see MODERSITZKI [11]. In other words, intensity-based registration is inherently an ill-posed problem. That is, a regularization of the problem is necessary.

Another important issue is the fact that a displacement u which does produce a perfect or nearly perfect alignment of the given images is not necessarily a "good" displacement. For example, a computed displacement which interchanges the eyes of one patient when registered to a probabilistic atlas in order to produce a nearly perfect alignment obviously has to be discarded. Also, folding and cracks of the transformed template are typically not wanted. Therefore it is desirable to have the possibility to incorporate features into the registration model such that the computed displacement u does resemble the properties of the acquisition, like for example the elastic behavior of a human brain. Mimicking the elastic properties of the objects under consideration is a striking example of internal forces. These forces constrain the displacement to physically meaningful movements.

In contrast, the external forces are designed to push the deformable template into the direction of the reference. These forces are based upon the intensities of the images. The idea is to design a similarity measure which is ideally calculated from all voxel values. An intuitive measure is the sum of squares of intensity differences (SSD). This is a reasonable measure for some applications, like the serial registration of histological sections. If the intensities of corresponding voxels are no longer identical, the SSD measure may perform poorly. However, if the intensities are still linearly related, a correlation coefficient (CC) based measure is the measure of choice for monomodal situations. In contrast, the mutual information (MI) related measure is based on the co-occurrence of intensities in both images, as reflected by their joint intensity histogram. It appears to be the most successful similarity measure for multimodal imaging, like MR-PET.

Finally, one may want to guide the registration process by incorporating additional information which may be known beforehand. Among these are landmarks and fiducial markers. Sometimes it is also desirable to impose a local volume-preserving (incompressibility) constraint, which may, for example, compensate for registration artifacts frequently observed when processing pre- and post-contrast images. Depending on the application and the reliability of the specific information, one may want to insist on a perfect fulfillment of these constraints or on a relaxed treatment. For example, in practice it is a tricky (and time-consuming) problem to determine landmarks to subvoxel precision; see, e.g., ROHR [14]. Here it does not make sense to compute a displacement which produces a perfect one-to-one match between the landmarks.

Summarizing, the general registration problem may be phrased as follows.

(IR) image registration problem:

$$
J[u] = D[R, T; u] + \alpha S[u] = \min, \quad \text{subject to} \quad C_j[u] = 0, \quad j = 1, 2, \ldots, m.
$$

Here, D models the distance measure (external force, e.g., MI), S the smoother or regularizer (internal force, e.g., elasticity), and C explicit constraints (e.g., landmarks). The regularization parameter α may be used to control the strength of the smoothness of the displacement versus the similarity of the images.

3 The building blocks

Our approach is valid for images of any spatial dimension d, i.e., there is no restriction to d = 2, 3, 4. The reference and template images are represented by the compactly supported mappings R, T : Ω → R, where without loss of generality Ω = ]0, 1[ᵈ. Hence, T(x) denotes the intensity of the template at the spatial position x, where for ease of discussion we set R(x) = bR and T(x) = bT for all x ∉ Ω. Here, bR and bT are appropriately chosen background intensities. The overall goal is to find a displacement u such that ideally Tu is similar to R, where Tu is the deformed image, i.e., Tu(x) = T(x − u(x)). Note that u = (u1, . . . , ud) denotes a vector field.

The starting point of our numerical treatment is the minimization problem (IR). In order to compute a minimizer we apply a steepest descent method, where we take advantage of the calculus of variations. To end up with an efficient and fast converging scheme, we require explicit expressions for the derivatives of the building blocks D, S, and C. In the following subsections we exemplarily discuss the most popular building blocks as well as their derivatives.

3.1 Smoother and Internal Forces

The nature of the deformation depends strongly on the application under consideration. For example, a slice of paraffin-embedded histological tissue does deform elastically, whereas the deformation between the brains of two different individuals is most likely not elastic. Therefore, it is necessary to supply a model for the nature of the expected deformation. We now present some of the most prominent smoothers S. An important point is that we are not restricted to a particular smoother S. Any smoother can be incorporated into the toolbox, as long as it possesses a GATEAUX-derivative. In an abstract setting, the GATEAUX-derivative looks like

$$
dS[u; v] := \lim_{h \to 0} \frac{1}{h}\bigl(S[u + hv] - S[u]\bigr) = \int_\Omega \langle A[u], v \rangle_{\mathbb{R}^d} \, dx,
$$

where A denotes the associated linear partial differential operator. Note that for a complete derivation one also has to consider appropriate boundary conditions. However, these details are omitted here for presentation purposes; see MODERSITZKI [11] for details.

Elastic registration. This particular smoother measures the elastic potential of the deformation. In connection with image registration it was introduced by BROIT [2] and has been discussed by various image registration groups; see, e.g., BAJCSY & KOVACIC [1] or FISCHER & MODERSITZKI [4]. The partial differential operator associated with the GATEAUX-derivative of the elastic potential is the well-known NAVIER-LAME operator. For this smoother, two natural parameters, the so-called LAME constants, can be used in order to capture features of the underlying elastic body. A striking example, where the underlying physics suggests looking for deformations satisfying elasticity constraints, is the three-dimensional reconstruction of the human brain from a histological sectioning. Details are given in MODERSITZKI [11].

Fluid registration. Due to the fact that an elastic body memorizes its non-deformed initial state (rubber band), elastic registration schemes are only able to compensate for small deformations. The situation changes for the viscous fluid model. Here the body adapts to its current state (honey) and consequently is much more flexible than an elastic body. The viscous fluid approach was introduced to image registration by CHRISTENSEN [3]. His derivation was based on a specific linearization of the NAVIER-STOKES equation. However, there is yet another derivation of the underlying partial differential equations, which does fit into the "design rules" of our toolbox. Roughly speaking, one obtains these equations by considering the elastic potential of the velocity of the displacement field. It should come as no surprise that the partial



differential operator is again the NAVIER-LAME operator, this time, however, applied to the velocity. The wanted deformation is related to the velocity via the material derivative and is straightforward to recover.

Since the viscous fluid approach is quite flexible, it is mainly used when the focus is more on similarity than on a "natural" deformation process. For example, for the design of a probabilistic brain atlas, a biophysical model for the nature of the deformations is not available. However, fluid registration has proven to be a valuable tool.

Diffusion registration. For image registration problems FISCHER & MODERSITZKI [5] introduced the so-called diffusion regularization,

$$
S^{diff}[u] := \frac{1}{2} \sum_{l=1}^{d} \int_\Omega \|\nabla u_l\|^2 \, dx, \qquad (1)
$$

which is well known from optical flow applications; see HORN & SCHUNCK [10]. The associated GATEAUX-derivative leads to the well-studied LAPLACE operator, i.e.,

$$
A^{diff}[u] = \Delta u = (\Delta u_1, \ldots, \Delta u_d), \quad \text{where} \quad \Delta u_l = \partial_{x_1 x_1} u_l + \cdots + \partial_{x_d x_d} u_l.
$$

It measures the gradient of the deformation. The main reason for introducing this smoother was its exceptional computational complexity: FISCHER & MODERSITZKI [5] devised an O(N) (!) implementation of the registration scheme, where N denotes the number of image voxels. It is based on an additive operator splitting scheme (which parallelizes in a very natural way). Its outstanding computational speed makes the diffusion registration scheme a very attractive option for high-resolution, high-dimensional, and/or time-critical applications. Examples include the registration of a time series of three-dimensional MRIs or the online correction of the so-called brain shift during surgery.
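The GATEAUX-derivative just defined can be checked numerically for the diffusion smoother (1) in d = 1; for functions vanishing on the boundary it equals the integral of u'v' (which, integrated by parts, yields the LAPLACE operator quoted above). Grid size, test functions, and step size below are arbitrary choices.

```python
import numpy as np

# 1-D grid; u and v vanish at the boundary, so boundary terms drop out.
n = 400
x = np.linspace(0.0, 1.0, n)
h = x[1] - x[0]
u = np.sin(np.pi * x)
v = x * (1.0 - x)

def S_diff(w):
    """Discrete diffusion smoother (1/2) * integral |w'|^2 dx for d = 1."""
    return 0.5 * np.sum(np.diff(w) ** 2) / h    # (1/2) sum (dw/h)^2 * h

# Gateaux derivative via its limit definition ...
eps = 1e-6
dS_num = (S_diff(u + eps * v) - S_diff(u)) / eps
# ... versus the closed form dS[u; v] = integral u' v' dx.
dS_exact = np.sum(np.diff(u) * np.diff(v)) / h
print(dS_num, dS_exact)
```

For these particular u and v the exact value is 4/π, and the finite-difference quotient reproduces it up to O(eps) and discretization error.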

Curvature registration. As a last example, we present the curvature smoother,

S^curv[u] := (1/2) ∑_{l=1}^{d} ∫_Ω (∆u_l)² dx,   (2)

introduced by FISCHER & MODERSITZKI [7], which measures the curvature of the deformation. The design principle behind this choice was to make the non-linear registration phase more robust against a poor (affine linear) pre-registration. Since the smoother is based on second-order derivatives, affine linear maps do not contribute to its cost. In contrast to other non-linear registration techniques, affine linear deformations are corrected naturally by the curvature approach. Again the GATEAUX derivative is explicitly known and leads to the so-called bi-harmonic operator A^curv[u] = ∆²u.

3.2 Distances and External Forces

Another important building block is the similarity criterion. As for the smoothing operators, we concentrate on those measures D which allow for differentiation. Moreover, we assume that there exists a function f : R^d × R^d → R^d, the so-called force field, such that

dD[R, T; u; v] = lim_{h→0} (1/h)(D[R, T; u + hv] − D[R, T; u]) = ∫_Ω ⟨f(R, T, x, u(x)), v(x)⟩_{R^d} dx.

Again, we are not restricted to a particular distance measure. Any measure can be incorporated into our toolbox, as long as it permits a GATEAUX derivative and a force field. Among those are the most common choices for distance measures in image registration, namely the sum of squared differences, cross correlation, cross validation, and mutual information. Exemplarily, we discuss two of them; see MODERSITZKI [11] or ROCHE [12] for details.

Sum of squared differences. The measure is based on a point-wise comparison of image intensities,

D^SSD[R, T; u] := (1/2) ∫_Ω (R(x) − T_u(x))² dx,



and the force field is given by f^SSD(R, T, x, y) = (T(x − y) − R(x)) · ∇T(x − y). This measure is often used when images of the same modality have to be registered.
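As a quick plausibility check one can verify numerically that the GATEAUX derivative of D^SSD is indeed an integral of a force term against the perturbation. The sketch below uses synthetic 1-D "images" of our own choosing (sign conventions for the force differ in the literature; here we differentiate D^SSD exactly as defined above):

```python
import numpy as np

# Synthetic 1-D check: dD_SSD[u; v] equals the integral of the force against v,
# with force (R(x) - T(x - u(x))) * T'(x - u(x)) for D as defined above.
def T(x):
    return np.exp(-(x - 0.4) ** 2 / 0.02)        # synthetic template image

def Tprime(x):
    return T(x) * (-2.0 * (x - 0.4) / 0.02)      # its analytic derivative

def R(x):
    return np.exp(-(x - 0.5) ** 2 / 0.02)        # synthetic reference image

x = np.linspace(0.0, 1.0, 2001)
dx = x[1] - x[0]
u = 0.05 * np.sin(np.pi * x)                     # a displacement field
v = np.cos(2.0 * np.pi * x)                      # a perturbation direction

def D_ssd(u):
    return 0.5 * np.sum((R(x) - T(x - u)) ** 2) * dx

force = (R(x) - T(x - u)) * Tprime(x - u)
gateaux = np.sum(force * v) * dx                 # analytic directional derivative
h = 1e-6
fd = (D_ssd(u + h * v) - D_ssd(u)) / h           # finite-difference quotient
print(abs(gateaux - fd))
```

The two numbers agree up to the O(h) error of the one-sided difference quotient.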

Mutual information. Another popular choice is mutual information. It basically measures the entropy of the joint density ρ_{R,T}, where ρ_{R,T}(g₁, g₂) counts the number of voxels with intensity g₁ in R and g₂ in T. The precise formula is

D^MI[R, T; u] := −∫_{R²} p_{R,T_u} log( p_{R,T_u} / (p_R p_{T_u}) ) d(g₁, g₂),

where p_R and p_{T_u} denote the marginal densities. Typically, the density is replaced by a PARZEN window estimator; see, e.g., VIOLA [15]. The associated force field is given by

f^MI(R, T, x, y) = [Ψ_σ ∗ ∂_{g₂} L_{R,T_u}](R(x), T_u(x)) · ∇T_u(x),

where L_{R,T_u} := 1 + p_{R,T_u}(log p_{R,T_u} − log(p_R p_{T_u})) and Ψ_σ is the PARZEN window function; see, e.g., HERMOSILLO [9] or VIOLA [15]. This measure is useful when images of different modalities have to be registered.
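The qualitative behaviour of mutual information can be seen with a plain histogram estimate (a rough sketch with synthetic data of our own; real registration codes use the PARZEN smoothing mentioned above): MI is largest when the two images are aligned.

```python
import numpy as np

# Discrete mutual information from a joint intensity histogram (no Parzen
# smoothing -- a plain empirical estimate, for illustration only).
def mutual_info(a, b, bins=8):
    pj, _, _ = np.histogram2d(a.ravel(), b.ravel(), bins=bins)
    pj /= pj.sum()                                 # joint density p_{R,T}
    px = pj.sum(axis=1)                            # marginal p_R
    py = pj.sum(axis=0)                            # marginal p_T
    nz = pj > 0
    outer = px[:, None] * py[None, :]
    return float(np.sum(pj[nz] * np.log(pj[nz] / outer[nz])))

rng = np.random.default_rng(1)
img = rng.random((64, 64))
aligned = mutual_info(img, img)                    # perfectly registered pair
shifted = mutual_info(img, np.roll(img, 7, axis=0))  # misregistered pair
print(aligned > shifted)
```

For the aligned pair the joint histogram concentrates on the diagonal and MI equals the marginal entropy; shifting destroys the statistical dependence and MI drops towards zero.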

3.3 Additional Constraints

Often it is desirable to guide the registration process by incorporating additional information which may be known beforehand, like for example fiducial markers. To incorporate such information, the idea is to add additional constraints to the minimization problem. For example, to restrict the deformation to volume-preserving mappings, one has to add the quantity C[u] := (1/2) ∫_Ω (det ∇u)² dx to the smoother; see also ROHLFING & MAURER [13]. Note that the JACOBIAN det ∇u(x) has to vanish if the deformation at x is incompressible.

In other applications, one may want to incorporate landmarks or fiducial markers. Let r_j be a landmark in the reference image and t_j be the corresponding landmark in the template image. The toolbox allows for either adding explicit constraints C_j[u] := u(t_j) − t_j + r_j, j = 1, 2, ..., m, which have to be precisely fulfilled, C_j[u] = 0 ("hard" constraints), or adding an additional cost term C[u] := ∑_{j=1}^{m} λ_j ‖C_j[u]‖²_{R^d} to the smoother ("soft" constraints, since we allow for deviations). For a more detailed discussion, we refer to FISCHER & MODERSITZKI [6].

4 Numerical Treatment

As already pointed out, our numerical approach is based on the EULER-LAGRANGE equations for problem (IR),

A[u](x) + f(R, T, x, u(x)) + ∑_{j=1}^{m} λ_j dC_j[u](x) = 0 and C_j[u] = 0, j = 1, ..., m,

which basically state that all associated GATEAUX derivatives have to vanish. Here, A is related to the GATEAUX derivative of S and the λ_j's are LAGRANGE parameters. It remains to efficiently solve this system of non-linear partial differential equations. Of course, different solution schemes can be used; see, e.g., HENN & WITSCH [8]. We use a time-stepping approach. After an appropriate space discretization, we end up with a system of linear equations. As it turns out, these linear systems have a very rich structure, which allows one to come up with very fast and robust solution schemes for all of the above-mentioned building blocks; see MODERSITZKI [11]. It is important to note that the system matrix does not depend on the force field and the constraints. Thus, changing the similarity measure or adding additional constraints does not change the favorable computational complexity. Moreover, fast and parallel solution schemes can be applied to reduce the computation time even further.
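The key point that the system matrix is independent of the force field can be sketched in a toy 1-D setting (our own construction; grid size, step, and the nonlinear force are arbitrary illustrative choices): the matrix is factored once and reused for every time step, whatever the force is.

```python
import numpy as np

# Semi-implicit time stepping: each step solves (I - tau*L) u_new = u + tau*f(u).
# (I - tau*L) does not depend on the force, so it is factored (here simply
# inverted, for this toy size) once and reused in every step.
N, tau = 50, 0.01
h = 1.0 / (N + 1)
L = (np.diag(-2.0 * np.ones(N)) + np.diag(np.ones(N - 1), 1)
     + np.diag(np.ones(N - 1), -1)) / h**2       # 1-D Laplacian, zero BCs
M_inv = np.linalg.inv(np.eye(N) - tau * L)       # factor once, reuse below

x = np.linspace(h, 1 - h, N)
u = np.sin(np.pi * x)
for _ in range(100):                             # 100 steps, one matrix factor
    force = -np.sin(u)                           # hypothetical nonlinear force
    u = M_inv @ (u + tau * force)
print(float(np.max(np.abs(u))))
```

Swapping in a different force (i.e., a different similarity measure) changes only the right-hand side; the per-step cost is unaffected, which is the point made above.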



5 Conclusions

In this note we presented a general approach to image registration. Its flexibility enables one to integrate and to combine various registration modules in a consistent way. We discussed the use of different smoothers, distance measures, and additional constraints. The numerical treatment is based on the solution of a partial differential equation related to the EULER-LAGRANGE equations. These equations are well studied and allow for fast, stable, and efficient schemes. Various computed real-life examples may be found on the authors' home page: http://www.math.uni-luebeck.de/SAFIR.

References

[1] R. Bajcsy and S. Kovacic, Multiresolution elastic matching, Computer Vision, Graphics and Image Processing 46, 1–21 (1989).
[2] C. Broit, Optimal Registration of Deformed Images, PhD thesis, Computer and Information Science, University of Pennsylvania (1981).
[3] G. E. Christensen, Deformable Shape Models for Anatomy, PhD thesis, Sever Institute of Technology, Washington University (1994).
[4] B. Fischer and J. Modersitzki, Fast inversion of matrices arising in image processing, Num. Algo. 22, 1–11 (1999).
[5] B. Fischer and J. Modersitzki, Fast diffusion registration, in AMS Contemporary Mathematics, Inverse Problems, Image Analysis, and Medical Imaging 313, 117–129 (2002).
[6] B. Fischer and J. Modersitzki, Combination of automatic non-rigid and landmark based registration: the best of both worlds, Institute of Mathematics, University of Lübeck, Preprint A-03-01 (2003).
[7] B. Fischer and J. Modersitzki, Curvature based image registration, JMIV 18(1), 81–85 (2003).
[8] S. Henn and K. Witsch, A multigrid approach for minimizing a nonlinear functional for digital image matching, Computing 64(4), 339–348 (2000).
[9] G. Hermosillo, Variational methods for multimodal image matching, PhD thesis, Université de Nice, France (2002).
[10] B. K. Horn and B. G. Schunck, Determining optical flow, Artificial Intelligence 17, 185–204 (1981).
[11] J. Modersitzki, Numerical Methods for Image Registration, Oxford University Press, to appear 2003.
[12] A. Roche, Recalage d'images médicales par inférence statistique, PhD thesis, Université de Nice, France (2001).
[13] T. Rohlfing and C. R. Maurer, Jr, Volume-Preserving Non-Rigid Registration of MR Breast Images Using Free-Form Deformation with an Incompressibility Constraint, IEEE TMI, to appear 2003.
[14] K. Rohr, Landmark-based Image Analysis, Computational Imaging and Vision, Kluwer Pub. (2001).
[15] P. A. Viola, Alignment by Maximization of Mutual Information, PhD thesis, MIT (1995).



A Preconditioned Finite Elements Method for the p-Laplacian's Parabolic Equation

I. Gerace ∗1, P. Pucci1, N. Ceccarelli1, M. Discepoli1, and R. Mariani1

1 Dipartimento di Matematica e Informatica, Università degli Studi di Perugia, Via Vanvitelli 1, I-06123 Perugia (PG), Italy

Received 28 February 2003, accepted 21 March 2003

In this paper we propose a method for the discretization of the parabolic p-Laplacian equation. In particular we use alternately either the backward Euler scheme or the Crank-Nicolson scheme for the time discretization and the first-order Finite Elements Method for the space discretization, as in [7]. To obtain the numerical solution we have to invert a block Toeplitz with Toeplitz blocks matrix. To this aim we use a Conjugate Gradient (CG) algorithm preconditioned by a block circulant with circulant blocks matrix. A Two-Dimensional Discrete Sine-Cosine Fast Transform (2D-DSCFT) is applied to invert the block circulant with circulant blocks matrix. The experimental results show that the preconditioner reduces the number of CG iterations by about 56%–69%.

1 Introduction

The parabolic p-Laplacian equation models many physical processes, such as non-Newtonian fluid flows [8] and Smagorinsky-type meteorology models [9]. The study of the parabolic p-Laplacian equation is very delicate, because the p-Laplacian operator is strongly non-linear, is not self-adjoint and does not commute with many other operators. Hence many results known in the literature for semi-linear problems cannot be applied in this case. In this paper the numerical approximation of parabolic partial differential equations with the p-Laplacian operator is studied. In particular we consider the following problem

∂_t u(x, t) = div(|∇u(x, t)|^(p−2) ∇u(x, t)) + f,   (x, t) ∈ Ω × R⁺,
u(x, t) = 0,   (x, t) ∈ ∂Ω × R⁺,
u(x, 0) = u₀(x),   x ∈ Ω,

where Ω ⊂ R² is a domain with a regular boundary; u₀ ∈ W^{1,p}_0(Ω) = {v ∈ W^{1,p}(Ω) | v|_∂Ω = 0} and f ∈ W^{−1,p′}(Ω), where W^{−1,p′}(Ω) is the dual space of W^{1,p}_0(Ω), 1/p + 1/p′ = 1. The data u₀ and f are known; for simplicity, we suppose f time-independent.

In [1], the convergence of the Finite Element Method (FEM) combined with a backward Euler time discretization is proved under extra regularity assumptions on the solution. Moreover, this result is valid only for time t ∈ [0, T]. More recently, Ju [7] has proved, for p > 1, the convergence in L²(Ω) and W^{1,p}(Ω) of the full discretization obtained by combining the FEM with either the backward Euler or the Crank-Nicolson scheme. Moreover, Ju has proved that the backward Euler scheme is asymptotically stable.

∗ Corresponding author: e-mail: [email protected], Phone: +39 75 585 5047, Fax: +39 75 585 5024

© 2003 WILEY-VCH Verlag GmbH & Co. KGaA, Weinheim



In this paper we assume that Ω = [0, n]² and compute the continuous piecewise linear finite element approximation of the above problem. In particular, we define a triangulation of the domain by using right isosceles triangles with legs of fixed length h. We reduce the p-Laplacian problem to solving the following linear system (for details see [2]):

B(u^{t+∆t} − u^t) = d^t,   (1)

where u^t is a vector of dimension (n/h) × (n/h) whose entry (i, j) corresponds to the value u((hi, hj), t), and h is a positive parameter; the positive definite matrix B, of dimension n²/h² × n²/h², is a block Toeplitz with Toeplitz blocks matrix, and the vector d^t depends on the function f and on u^t. In particular we have

B = [ B₁   B₂   O   ···  O
      B₂ᵀ  B₁   B₂  ···  O
      O    B₂ᵀ  B₁  ···  O
      ···            ⋱   B₂
      O    O   ···  B₂ᵀ  B₁ ],

where

B₁ = [ a  b  0  ···  0
       b  a  b  ···  0
       0  b  a  ···  0
       ···        ⋱  b
       0  0  ···  b  a ],

B₂ = [ b     0     0  ···  0
       1/12  b     0  ···  0
       0     1/12  b  ···  0
       ···             ⋱   0
       0     0  ···  1/12  b ],

and

a := 1/2 + 2θ∆t[1 + (√2)^(p−2)],   b := 1/12 − (θ∆t/2)[1 + (√2)^(p−2)],

with θ = 1 if the backward Euler scheme is considered and θ = 1/2 if the Crank-Nicolson scheme is used.
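As a quick sanity check (our own sketch; the mesh size m = n/h = 5 and the parameter values below are arbitrary demo choices, not values from the paper), one can assemble B from these formulas and confirm it is symmetric positive definite, as needed for Conjugate Gradients:

```python
import numpy as np

# Assemble the block Toeplitz matrix B (blocks B1 tridiagonal, B2 lower
# bidiagonal with entries b and 1/12) and check it is symmetric positive
# definite via a Cholesky factorization. m = n/h is the per-direction size.
def coeffs(p, theta, dt):
    s = 1.0 + np.sqrt(2.0) ** (p - 2)            # factor 1 + (sqrt 2)^(p-2)
    return 0.5 + 2.0 * theta * dt * s, 1.0 / 12.0 - theta * dt * s / 2.0

def assemble_B(m, p, theta, dt):
    a, b = coeffs(p, theta, dt)
    B = np.zeros((m * m, m * m))
    for I in range(m):                           # block row index
        for i in range(m):                       # index inside a block
            r = I * m + i
            B[r, r] = a                          # diagonal of B1
            if i + 1 < m:                        # off-diagonals b of B1
                B[r, r + 1] = B[r + 1, r] = b
            if I + 1 < m:                        # blocks B2 / B2^T
                c = (I + 1) * m + i
                B[r, c] = B[c, r] = b            # diagonal of B2
                if i >= 1:                       # subdiagonal 1/12 of B2
                    B[r, c - 1] = B[c - 1, r] = 1.0 / 12.0
    return B

B = assemble_B(5, 2.0, 1.0, 0.01)
assert np.allclose(B, B.T)
np.linalg.cholesky(B)       # succeeds only if B is positive definite
print("B is symmetric positive definite")
```

For these parameters B is strictly diagonally dominant, which already guarantees positive definiteness; the Cholesky factorization is simply the constructive check.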

To solve system (1), at each iteration we must invert the matrix B. To this aim we use a Preconditioned Conjugate Gradient (PCG) algorithm. In particular we use as preconditioner a block circulant with circulant blocks matrix. Let C be the set of all block circulant with circulant blocks matrices; then we choose as preconditioner

C = arg min_{C∈C} ‖B − C‖₁,   where ‖A‖₁ = ∑_{i,j} |a_{i,j}|,

when Strang's preconditioner [3, 10] is used, and

C = arg min_{C∈C} ‖B − C‖_F,   where ‖A‖_F = ( ∑_{i,j} a²_{i,j} )^{1/2},

when Chan's preconditioner [4, 5] is adopted. The matrix C assumes the following form:

C = [ C₁   C₂   O   ···  C₂ᵀ
      C₂ᵀ  C₁   C₂  ···  O
      O    C₂ᵀ  C₁  ···  O
      ···            ⋱   C₂
      C₂   O   ···  C₂ᵀ  C₁ ],

where

C₁ = [ a   b*  0  ···  b*
       b*  a   b* ···  0
       0   b*  a  ···  0
       ···         ⋱   b*
       b*  0  ···  b*  a ],

C₂ = [ b*  0   0  ···  c*
       c*  b*  0  ···  0
       0   c*  b* ···  0
       ···         ⋱   0
       0   0  ···  c*  b* ],

and

b* = b for Strang's preconditioner,   b* = b·(n/h − 1)/(n/h) for Chan's preconditioner;
c* = 1/12 for Strang's preconditioner,   c* = (1/12)·(n/h − 1)²/(n²/h²) for Chan's preconditioner.

At each step of the PCG algorithm we invert a matrix similar to M = C⁻¹B. Note that M, in the case of Strang's preconditioner, has at least (n/h)² − (4n/h − 4) columns equal to those of the identity matrix. Hence M has (n/h)² − (4n/h − 4) eigenvalues equal to 1. Since the PCG algorithm converges in a number of steps almost equal to the number of distinct eigenvalues of the matrix M, the PCG algorithm converges in about 4n/h − 3 steps.



To invert the matrix C we have defined a Two-Dimensional Discrete Sine-Cosine Fast Transform (2D-DSCFT), by means of which it is possible to reduce the number of additions from 6(n²/h²) log₂(n/h) to 4(n²/h²) log₂(n/h) with respect to a classic 2D-FFT when the matrix under consideration is symmetric (for more details see [6]). In particular, the matrix C is diagonalizable as

C = T D Tᵗ,

where D is a diagonal matrix whose entries are the eigenvalues of C, and the entries of the matrix T are products of three terms: the first is a normalizing constant, and the second and third are alternately a sine or a cosine. The 2D-DSCFT is used to perform the product between the matrix T, or Tᵗ, and a vector.
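The role of the fast transform can be sketched in 1-D (we use the complex FFT here instead of the sine-cosine transform of the text, purely for illustration, with an arbitrary circulant of our own): a circulant matrix is diagonalized by the discrete Fourier basis, so C x = y is solved with a few fast transforms.

```python
import numpy as np

# Solve C x = y for a circulant C with first column `col` via FFT
# diagonalization: fft(C @ x) = fft(col) * fft(x), so x = ifft(fft(y)/fft(col)).
n = 16
k = np.arange(n)
col = np.where(k == 0, 2.5, 0.0) - ((k == 1) | (k == n - 1)) * 1.0
C = np.array([[col[(i - j) % n] for j in range(n)] for i in range(n)])

rng = np.random.default_rng(3)
y = rng.standard_normal(n)
lam = np.fft.fft(col)                   # eigenvalues of C in the Fourier basis
x = np.real(np.fft.ifft(np.fft.fft(y) / lam))
print(bool(np.allclose(C @ x, y)))
```

The eigenvalues here are 2.5 − 2 cos(2πj/n) > 0, so the division is safe; the solve costs a few O(n log n) transforms instead of an O(n³) factorization.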

Figure 1 shows the values of f and u₀ in one of our numerical experiments. Note that we represent every function on [0, n]² as an image, where a high value of the function at (i, j) corresponds to a light value of the pixel (i, j).

Fig. 1 a) Function f; b) function u₀.

Fig. 2 Representation of the numerical solution u for p = 1.1 at different times: a) t = 10, b) t = 35, c) t = 70 and d) t = 100.

Fig. 3 Representation of the numerical solution u for p = 5 at different times: a) t = 10, b) t = 35, c) t = 70 and d) t = 100.

Figure 2 shows the images of the numerical solution u at the instants 10, 35, 70 and 100, obtained with n = 128, p = 1.1, θ = 1, h = 1 and ∆t = 1. Figure 3 shows the images of u obtained numerically with the same parameters as in Figure 2, but with p = 5.



Table 1 Total number of iterations up to t = 100 required by the PCG and CG algorithms.

p     θ     CG (it)   Strang's PCG (spit)   spit/it   Chan's PCG (cpit)   cpit/it
1.1   1/2   803       317                   0.39477   349                 0.43462
1.1   1     1010      416                   0.41188   423                 0.41881
2     1/2   892       322                   0.36099   370                 0.41480
2     1     1110      433                   0.39009   432                 0.38919
5     1/2   1378      436                   0.31640   490                 0.35559
5     1     1641      526                   0.32054   604                 0.36807

Comparing the Preconditioned Conjugate Gradient (PCG) and the Conjugate Gradient (CG) algorithms, as shown in Table 1, we note that the gain of the PCG algorithm in terms of reduction of the number of iterations is significant. Indeed, the ratio between the total number of iterations of the PCG algorithm up to t = 100, denoted by spit in the case of Strang's preconditioner and by cpit in the case of Chan's preconditioner, and the total number of iterations of the CG algorithm over the same time, denoted by it, always remains between 31% and 44%.

References

[1] J. W. Barrett and W. B. Liu, Finite Element Approximation of the Parabolic p-Laplacian, SIAM J. Numer. Anal. 31, 413–428 (1994).
[2] N. Ceccarelli, Analisi Numerica per l'Equazione Parabolica del p-Laplaciano, Tesi di Laurea in Matematica A.A. 2000–2001, Univ. di Perugia (maggio 2002).
[3] R. Chan, Circulant Preconditioners for Hermitian Toeplitz Systems, SIAM J. Matrix Anal. Appl. 10, 542–550 (1989).
[4] R. Chan, A. M. Yip and M. K. Ng, The Best Circulant Preconditioners for Hermitian Toeplitz Matrices, SIAM J. Numer. Anal. 38, 876–896 (2001).
[5] T. Chan, An Optimal Circulant Preconditioner for Toeplitz Systems, SIAM J. Sci. Stat. Comput. 9, 766–771 (1988).
[6] I. Gerace and R. Mariani, Una Nuova Trasformata Veloce Seni-Coseni per Matrici Strutturate, Nota Interna del Dip. di Matematica e Informatica, n. 2, Univ. di Perugia (2003).
[7] N. Ju, Numerical Analysis of Parabolic p-Laplacian: Approximation of Trajectories, SIAM J. Numer. Anal. 37, 1861–1884 (2000).
[8] J. L. Lions, Quelques Méthodes de Résolution des Problèmes aux Limites Non Linéaires, Dunod, Paris (1969).
[9] J. L. Lions, R. Temam and S. Wang, Problèmes à Frontière Libre pour les Modèles Couplés de l'Océan et de l'Atmosphère, C. R. Acad. Sci. Paris, Sér. I Math. 318, 1165–1171 (1994).
[10] G. Strang, A Proposal for Toeplitz Matrix Calculations, Stud. Appl. Math. 74, 171–176 (1986).



Solving Parameter-dependent Elliptic Problems by Finite Element Method and Symbolic Computation

H. Gu∗1

1 RISC-Linz Institute, Johannes Kepler University, 4232 Hagenberg, Linz, Austria

Received 28 February 2003, accepted 21 March 2003

In this paper we solve parameter-dependent elliptic problems with the aid of the Finite Element Method and the use of symbolic computation.

Many geometric inverse problems can be represented by a system of nonlinear partial differential equations with boundary restrictions, like the famous Plateau problem (minimal surfaces) [2]. However, due to the high-order differential operator in the equations, it is always very expensive to compute the discrete solutions by the finite element approximation. For minimizing the nonlinear discrete form, the well-known Newton method works nicely only if the variational mapping is strictly convex and parameter-free. But for a parameter-dependent system (with, e.g., partially defined boundary conditions or viscous solutions), we cannot know in advance that the iteration will be globally convergent, and deciding on a good initial guess is not trivial, since a good guess can obviously save Newton steps and make the result as simple as possible. This makes it attractive to solve for the finite element solution directly by symbolic methods (e.g. [8, 9]).

In this paper, we initiate the idea by investigating a typical example: solving u from the following elliptic differential equation,

d²u/dx² − ε(1 + (du/dx)²) = 0,

on the domain [0, 1] with boundary conditions u(0) = 0 and u(1) = 0. If ε = 0, then the equation has only one solution. The finite element convergence theory for this special case is clearly known [1]. Nevertheless, we cannot generally extend the finite element convergence theory to the above system with a general parameter ε, since the real solutions of the above equation,

u(x, ε) = −(1/ε)(ln(cos(εx + ε/2 + kπ)) − ln(cos(ε/2))),   k = 0, 1,

are only well-defined on [0, 1] when −π/3 < ε < π/3 for k = 0, and −π < ε < −π/3 for k = 1, etc.

In case we approximate the solutions of the above differential equation by the finite element method, we can usually discretize it based on a variational form and then solve all the solutions directly by symbolic elimination approaches [8]. If we take the x-axis to represent the value of ε and the y-axis the value of the finite element solution u_h at the point x = 2/3, Figure 1 shows how u_h(2/3) changes with respect to the parameter ε.
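A finite-difference analogue (our own sketch, using standard central differences rather than the FEM/symbolic pipeline of the paper) already shows the role of ε: Newton's method from the zero initial guess converges quickly for a moderate ε inside (−π/3, π/3).

```python
import numpy as np

# Newton's method for the discretized problem u'' - eps*(1 + (u')^2) = 0,
# u(0) = u(1) = 0, on a uniform grid (illustrative only).
def solve(eps, N=40, tol=1e-10, max_iter=50):
    h = 1.0 / N
    u = np.zeros(N - 1)                          # interior unknowns, zero guess
    for _ in range(max_iter):
        ue = np.concatenate(([0.0], u, [0.0]))   # pad with boundary values
        up = (ue[2:] - ue[:-2]) / (2.0 * h)      # u' at interior nodes
        F = (ue[:-2] - 2.0 * ue[1:-1] + ue[2:]) / h**2 - eps * (1.0 + up**2)
        if np.max(np.abs(F)) < tol:
            break
        J = np.zeros((N - 1, N - 1))             # tridiagonal Jacobian of F
        for i in range(N - 1):
            J[i, i] = -2.0 / h**2
            if i > 0:
                J[i, i - 1] = 1.0 / h**2 + eps * up[i] / h
            if i < N - 2:
                J[i, i + 1] = 1.0 / h**2 - eps * up[i] / h
        u = u - np.linalg.solve(J, F)
    return u, float(np.max(np.abs(F)))

u, res = solve(0.5)                              # eps = 0.5 < pi/3
print(res)
```

For ε = 0.5 the residual drops to rounding level in a handful of Newton steps; since u'' = ε(1 + (u')²) > 0, the computed solution is convex and therefore negative in the interior.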

Figure 1 shows that we can only obtain a unique discrete solution when the parameter ε is very close to 0 or at the points ε ≈ 0.7 or ε ≈ −2.9 (roughly; we notice that 0.7 is close to π/3 and −2.9

∗ Corresponding author: e-mail: [email protected], Phone: +43 732 2468 9927, Fax: +43 732 2468 9930




Fig. 1 The graph of the finite element solution u_h(2/3, ε).

is close to −π). For ε > 0.7 or ε < −2.9, the equation does not have any solutions. Within the domain (−π/3, π/3), the function graph looks consistent with the antisymmetry property −u(x, ε) = u(x, −ε) for k = 0. And when the mesh partition is sufficiently refined, the above function graph becomes closer to the real solutions u(2/3, ε).

These investigation results also imply: to approximate the solution(s) of the above elliptic differential equation by the Newton method, it might only be globally convergent when the parameter ε is 0 or equals two other values, and it will not converge to any value when ε > 0.7 or ε < −2.9. Similarly we can also investigate the changing properties of the solutions on other domains.

If we extend the last example to a 2-dimensional domain, by denoting ∂_x u = u_x, ∂_y(∂_x u) = u_xy, etc., the elliptic equation appears as follows:

[(1 + u_x²)u_yy + (1 + u_y²)u_xx − 2u_x u_y u_xy] / [2(1 + u_x² + u_y²)^{3/2}] = ε / [2(1 + u_x² + u_y²)^{1/2}],   (1)

with the boundary value u = f on ∂Ω, where Ω is the domain. The geometric meaning of the left side is just the mean curvature of a surface represented explicitly in the vector form (x, y, u(x, y)). Typically, if ε = 0, the solution to the above 2-dimensional



boundary problem will be minimal surfaces subject to the Plateau problem. Approximating the Plateau problem by the finite element method on a convex domain is basically proved feasible in [7]. The discrete equation can be solved by the Newton method if the boundary condition f is completely defined and the equation is parameter-free, since the variational operator

∫_Ω (u_x v_x + u_y v_y) / (1 + u_x² + u_y²)^{1/2} dx = 0,   ∀ v ∈ H¹(Ω),   (2)

is a strictly convex mapping. But for a given parameter-dependent boundary condition f, to discretize the Plateau problem into an algebraic form in a finite element space S_h(Ω) (if that form is well-defined with regard to S_h(Ω)),

∫_Ω [(1 + u²_{h,y})u_{h,x}v_x + (1 + u²_{h,x})u_{h,y}v_y + 6u_{h,xy}u_{h,x}u_{h,y}v] dx = 0,   ∀ v ∈ S_h(Ω),   (3)

and then try the symbolic methods to get the solution(s), is the recent work [3, 5] (note that it is not convenient to use any symbolic computation based on the discrete scheme (2)). The symbolic methods for computing the discrete form (3) are promising, since the complexity is not affected by the set of indeterminates generated by f. This new idea has also been integrated into constructing a 2-grid speed-up algorithm [10]. Namely, we use an inexpensive symbolic approach to get in advance a good initial guess based on the coarse grid partitions, which is then re-corrected by only one Newton step on the refined mesh. This 2-grid algorithm is cheap but still maintains the high accuracy of the finite element solution [4]. Some parallel algorithms based on local mesh refinement [11] have also been proved and applied to solve the Plateau problems on large-scale domains [3].

In this paper, we will consider solving the general parameter-dependent system (1) based on the following discrete variational form:

∫_Ω [(1 + u²_{h,y})u_{h,x}v_x + (1 + u²_{h,x})u_{h,y}v_y + 6u_{h,xy}u_{h,x}u_{h,y}v − ε(1 + u²_{h,x} + u²_{h,y})v] dx = 0,

for all v in S_h(Ω) and u_h = f on ∂Ω. That is still an algebraic form, so we can compute it by the symbolic method. We could extend the convergence theory if the isomorphism property still holds for certain parameters ε. Consequently we can also prove and implement the 2-grid algorithms and parallel algorithms for such a parameter-dependent case, in order to speed up the computation. The graph of the finite element solution(s) at the nodes solved by the symbolic method will be 2-D curves in the single variable ε. In case the existence or uniqueness property of the solutions of the equation changes with respect to the value of ε, the breaking place can be clearly reflected by the possible singular point(s) on those curves, which are generated by the Maple "pacPlot" function based on the computer algebra software [6] (as Fig. 1 shows).

The research in this subject was motivated by a series of useful stimulating discussions and collaboration with Dr. Martin Burger (Dept. of Industrial Math., University of Linz, Austria). This work has been partially supported by the Austrian "Fonds zur Förderung der wissenschaftlichen Forschung (FWF)" under project no. SFB F013/F1304.

References

[1] P. Ciarlet, The Finite Element Method for Elliptic Problems, North-Holland (1978).
[2] J. Gray, The Hilbert Challenge, Oxford University Press (2000).
[3] H. Gu, Numerical Methods and Symbolic Computations for Generating Minimal Surfaces, PhD thesis, RISC-Linz (to appear).
[4] H. Gu, Graphical Generating of Minimal Surfaces Subject to the Plateau Problems, Reports of the SFB "Numerical and Symbolic Scientific Computing" 02-24, J. Kepler University, Nov. 2002.
[5] H. Gu, Generating Minimal Surfaces Subject to the Plateau Problems by Finite Element Method, Numerical Methods and Applications, 5th International Conference NMA'02, Borovets, Bulgaria, Springer Lecture Notes in Computer Science 2542, 471–478 (2002).
[6] R. Hemmecke, E. Hillgarter and F. Winkler, CASA, in Handbook of Computer Algebra: Foundations, Applications, Systems, J. Grabmeier, E. Kaltofen, V. Weispfenning (eds.), pp. 356–359, Springer-Verlag (2003).
[7] C. Johnson and V. Thomée, Error estimates for a finite element approximation of a minimal surface, Math. of Comp. 29(130), 343–349 (1975).
[8] D. Wang, Elimination Methods, Springer-Verlag, Vienna (2001).
[9] F. Winkler, Polynomial Algorithms in Computer Algebra, Series "Texts and Monographs in Symbolic Computation", Springer-Verlag, Wien, New York (1996).
[10] J. Xu, Two-grid discretization techniques for linear and nonlinear partial differential equations, SIAM J. Numer. Anal. 33, 1759–1777 (1996).
[11] J. Xu and A. Zhou, Local and parallel finite element algorithms based on two-grid discretizations for nonlinear problems, Adv. Comput. Math. 14, 393–327 (2001).



On Acceleration of Iteration Convergence for the System of Radiative Heat Transfer in Kinetic Approximation

V.Y. Gusev∗1, M.Y. Kozmanov∗∗1, and V.V. Zav’yalov∗∗∗1

1 Russian Federal Nuclear Center - VNIITF, 456770, Snezhinsk, Chelyabinsk reg., P.O. Box 245, Russia

Received 28 February 2003, accepted 21 March 2003

The authors of [1], [2] developed a rather economical iterative technique to solve systems of the radiation transport equation and the energy equation in different approximations. The degree of convergence was good both in transparent and in dense media. This paper describes a modification which makes the iterations converge faster, with the iteration step being a bit more "expensive" but without affecting the merits of the algorithm.

1 Problem Statement and Solution Algorithm

Consider the following system of spectral equations in the 1D plane case with no motion:

(1/c) ∂I_ν/∂t + µ ∂I_ν/∂x + (a_ν + k_ν) I_ν = a_ν B_ν + (k_ν/2) ∫_{−1}^{1} I_ν dµ,

∂E/∂t = ∫_0^∞ a_ν ( ∫_{−1}^{1} I_ν dµ − 2B_ν ) dν.

Here c is the velocity of light, x is the spatial coordinate, t is time, ν is frequency, µ is the cosine of the angle between the photon direction and the x-axis, I_ν(x, µ, ν, t) is the spectral intensity, a_ν(x, ν, T) > 0 is the coefficient of absorption with allowance for re-emission, k_ν(x, ν, T) ≥ 0 is the scattering coefficient, T(x, t) is the temperature of the medium, E(x, T) = AT is the specific internal energy, A(x) is heat capacity times density, and B_ν(ν, T) is the intensity of thermal radiation (Planck function). The solution is sought in the region

G = {(x, µ, ν, t) : 0 ≤ x ≤ l, −1 ≤ µ ≤ 1, 0 < ν < ∞, 0 ≤ t ≤ t_k}

for the following initial and boundary conditions:

T|_{t=0} = T⁰(x),   I_ν|_{t=0} = I⁰(x, µ, ν),   I_ν|_{x=0, µ≥0} = ϕ(x, µ, ν),   I_ν|_{x=l, µ<0} = ψ(x, µ, ν).

Subdivide G by a finite-difference mesh with steps h, µ and τ for the space, direction and time variables, respectively:

G_h = {(x_i, µ_j, ν_λ, t_n), i = 0, ..., N; j = 0, ..., M; λ = 0, ..., Λ; t_n = nτ, n = 0, 1, ...},

T⁰_i = T⁰(x_i),   I⁰_{ijλ} = I⁰(x_i, µ_j, ν_λ),

∗ Corresponding author: Phone: +7 35172 547 30, Fax: +7 35172 551 18∗∗ Corresponding author: e-mail: [email protected], Phone: +7 35172 547 30, Fax: +7 35172 551 18∗∗∗ Corresponding author: e-mail: [email protected], Phone: +7 35172 547 30, Fax: +7 35172 551 18




I^n_{0jλ}|_{µ_j≥0} = ϕ(x₀, µ_j, ν_λ, t_n),   I^n_{Njλ}|_{µ_j<0} = ψ(x_N, µ_j, ν_λ, t_n).

Let

U_ν = ∫_{−1}^{1} I_ν dµ

denote c times the spectral volume density of energy. The approximation is made with a first-order implicit upwind scheme, assuming the temperature T_{i+1/2} and the temperature-dependent quantities piecewise constant in the cell, and referring the radiation intensity to integer indices. Note that other schemes can be used too (see, for example, [3]).

(I^{n+1}_{ijλ} − I^n_{ijλ})/(cτ) + |µ_j|·(I^{n+1}_{ijλ} − I^{n+1}_{i−σ,jλ})/h_{i−σ/2} + (a^{n+1}_{i−σ/2,λ} + k^{n+1}_{i−σ/2,λ})·I^{n+1}_{ijλ} = a^{n+1}_{i−σ/2,λ}·B^{n+1}_{i−σ/2,λ} + 0.5·k^{n+1}_{i−σ/2,λ}·U^{n+1}_{i−σ/2,λ},

(E^{n+1}_{i+0.5} − E^n_{i+0.5})/τ = ∑_{λ=0}^{Λ} W_λ [a^{n+1}_{i+0.5,λ} (U^{n+1}_{i+0.5,λ} − 2B^{n+1}_{i+0.5,λ})].   (1)

Here W_λ is the weight factor of the relevant quadrature and σ = sign(µ_j). Let s denote the iteration index (in further manipulations only the index s + 1 remains). It should be noted that the temperature-dependent functions a_ν and k_ν are taken from the previous iteration. Following [1], we construct the following iterative algorithm. Express I^{s+1}_{ijλ}, replacing elsewhere the indices of the unknown functions in (1) by the relevant iteration indices and taking I^{n+1}_{i−σ,jλ} from the previous iteration:

I^{s+1}_{ijλ} = (I^n_{ijλ} + C_{i−σ/2,j}·I^s_{i−σ,jλ} + K_{i−σ/2,λ}·B^{s+1}_{i−σ/2,λ} + 0.5·S_{i−σ/2,λ}·U^{s+1}_{i−σ/2,λ}) / (1 + K_{i−σ/2,λ} + S_{i−σ/2,λ} + C_{i−σ/2,j}),   (2)

where

C_{i−σ/2,j} = cτ|µ_j|·h⁻¹_{i−σ/2},   S_{i−σ/2,λ} = cτ·k_{i−σ/2,λ},   K_{i−σ/2,λ} = cτ·a_{i−σ/2,λ},

and

U^{s+1}_{i+1/2,λ} = ∫_{−1}^{0} I^{s+1}_{ijλ} dµ + ∫_{0}^{1} I^{s+1}_{i+1,jλ} dµ.   (3)

Using (2) and (3), express U^{s+1}_{i+1/2,λ} in terms of B^{s+1}_{i+1/2,λ} and substitute into the energy equation. The temperature T^{s+1}_{i+1/2} is found with the Newton method [4]. The temperatures thus found are used to define B^{s+1}_{i+1/2,λ} and U^{s+1}_{i+1/2,λ}. Then the intensity I^{s+1}_{ijλ} is incrementally calculated, beginning from the right boundary for µ_j ≤ 0 and from the left one for µ_j > 0, using the specified boundary conditions. This iterative algorithm performs very well in our calculations.

Consider the following modification of the algorithm. Complete (1) with a chain of equations obtained from (1) by shifting the index by χσ, χ = 1, ..., m. Solving the new system of equations by m-fold multiple recursion yields a system of equations for the temperature. For m = 1, the system takes the form:

γ_{1/2}·T^{s+1}_{1/2} − β_{1/2}·T^{s+1}_{3/2} = δ_{1/2},   i = 0,

−α_{i+1/2}·T^{s+1}_{i−1/2} + γ_{i+1/2}·T^{s+1}_{i+1/2} − β_{i+1/2}·T^{s+1}_{i+3/2} = δ_{i+1/2},   i = 1, ..., N − 2,

−α_{N−1/2}·T^{s+1}_{N−3/2} + γ_{N−1/2}·T^{s+1}_{N−1/2} = δ_{N−1/2},   i = N − 1,



where $\alpha_{i+1/2}$, $\beta_{i+1/2}$, $\gamma_{i+1/2}$ and $\delta_{i+1/2}$ are known coefficients. This system can be easily solved by tridiagonal sweeping. Having found the temperature, we then follow the initial algorithm. In multidimensional cases the intensity can be found through point-to-point runs and the temperature can be found by sweeping at each iteration for the relevant direction.
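The "tridiagonal sweeping" referred to here is the classical forward-elimination/back-substitution recursion (often called the Thomas algorithm). The following Python sketch, with our own hypothetical names and the sign convention of the system for $T^{s+1}$ above, illustrates the idea; it is not the authors' code.

```python
def tridiagonal_sweep(alpha, gamma, beta, delta):
    """Solve a tridiagonal system by forward elimination and back
    substitution (the 'sweep', or Thomas, algorithm).

    Row i reads: -alpha[i]*T[i-1] + gamma[i]*T[i] - beta[i]*T[i+1] = delta[i],
    with alpha[0] = beta[-1] = 0, mirroring the system for T^{s+1} above.
    """
    n = len(gamma)
    # Forward sweep: find coefficients p, q with T[i] = p[i]*T[i+1] + q[i].
    p = [0.0] * n
    q = [0.0] * n
    p[0] = beta[0] / gamma[0]
    q[0] = delta[0] / gamma[0]
    for i in range(1, n):
        denom = gamma[i] - alpha[i] * p[i - 1]
        p[i] = beta[i] / denom
        q[i] = (delta[i] + alpha[i] * q[i - 1]) / denom
    # Back substitution, starting from the last unknown.
    T = [0.0] * n
    T[-1] = q[-1]
    for i in range(n - 2, -1, -1):
        T[i] = p[i] * T[i + 1] + q[i]
    return T
```

For the radiative problem, the coefficients α, β, γ, δ would come from the discretised energy equation at iteration s.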

2 Evaluation of the Degree of Convergence

Similarly to [1], consider the case of gray matter and try to evaluate the degree of convergence. Assume $E = AB$, $B = \int_0^\infty D_\nu\, d\nu = bT^4$, where A, a, k are positive constants, b is the Stefan-Boltzmann constant, and $|\mu| = 1$. Finite differences are formulated for the increments

$$\delta Z^{s+1} = Z^{n+1} - Z^{s+1},$$

where Z are the sought functions. Let

$$\delta I^{s+1} = \max_{ij} \left| I^{n+1}_{ij} - I^{s+1}_{ij} \right|$$

and

$$\delta B^{s+1} = \max_{i} \left| B^{n+1}_{i+1/2} - B^{s+1}_{i+1/2} \right|.$$

Then, with the above assumptions, we obtain

$$\delta I^{s+1} \le q_1\, \delta I^{s} \quad \text{and} \quad \delta B^{s+1} \le q_2\, \delta I^{s},$$

where

$$q_1 = \frac{c\tau C \left[ 2\tau a(a+k) + kA \right]}{\left[ 1 + c\tau(a+k) \right] \left[ A(1 + c\tau a + C) + 2\tau a(1 + C) \right]}, \qquad q_2 = \frac{2\tau a C}{A(1 + c\tau a + C) + 2\tau a(1 + C)}$$

for the iterative algorithm from [1]. For the single recursion ($m = 1$), we obtain

$$q_1 = \frac{c\tau C^2 \left[ 2\tau a(a+k) + kA \right]}{\left[ 1 + c\tau(a+k) \right] \left[ A(1 + c\tau a + C)^2 + 2c\tau a^2 + 2\tau a(1 + 2C + 2C^2) \right]}, \qquad q_2 = \frac{2\tau a C^2}{A(1 + c\tau a + C)^2 + 2c\tau a^2 + 2\tau a(1 + 2C + C^2)}.$$

3 Example

The second Fleck problem [5] was taken as an example. The problem considers the heating of a system consisting of dense and transparent layers. The results obtained confirm that the technique converges well.


V.Y. Gusev, M.Y. Kozmanov, and V.V. Zav'yalov: On Acceleration of Iteration Convergence

References

[1] V. Y. Gusev, M. Y. Kozmanov and E. B. Rachilov, A method to solve implicit finite-difference approximations to radiation transfer and diffusion equations, Journal of Computing Mathematics and Mathematical Physics, 1984, Vol. 24, No. 12, pp. 1842–1849.

[2] V. Y. Gusev and M. Y. Kozmanov, Methods for solving finite-difference approximations to thermal radiation transfer equations, Journal of Computing Mathematics and Mathematical Physics, 1986, Vol. 26, No. 11, pp. 1654–1660.

[3] M. Y. Kozmanov, Monotonic schemes for radiation transfer equations, Journal of Atomic Science and Engineering, 1989, Series Mathematical Modeling of Physical Processes, Is. 2, pp. 51–55.

[4] A. I. Zuyev, Application of the Newton-Kantorovich method to the transfer of non-equilibrium radiation, Journal of Computing Mathematics and Mathematical Physics, 1973, Vol. 13, No. 3, pp. 792–798.

[5] J. F. Fleck, Jr. and J. D. Cummings, An Implicit Monte-Carlo Scheme for Calculating Time and Frequency Dependent Nonlinear Radiation Transport, J. Comput. Phys., 1971, Vol. 8, No. 3, pp. 313–342.


Combinatorial Scientific Computing: Discrete Algorithms in Computational Science and Engineering

Bruce Hendrickson∗1

1 Sandia National Labs, Albuquerque, NM, USA

Received 28 February 2003, accepted 21 March 2003

Although scientific computing is generally viewed as the province of differential equations and numerical analysis, combinatorial techniques have long played a crucial role. For instance, graph theory is essential to the study of molecular structures and material science, many problems in linear algebra involve discrete algorithms, and the parallelization of scientific computations leads to numerous combinatorial problems. Some of these many successes are reviewed, and suggestions are made for new opportunities at this intersection of disciplines.

1 Combinatorial Scientific Computing

The history of scientific computing is suffused with examples of the enabling power of discrete mathematics and combinatorial algorithms. It is noteworthy that Cayley invented the terminology of graph theory in his studies of molecular structure. In more recent decades, discrete algorithms have played a pivotal supporting role in the steady advance of scientific and engineering computation. Some illustrative examples of such work are sketched below.

Unfortunately, as the field of scientific computing has grown in both breadth and depth, fragmentation and specialization of expertise has become inevitable. While the fundamental importance of linear algebra, differential equations and related areas is universally recognized, the role of discrete mathematics is often overlooked. Furthermore, combinatorial researchers play a diverse set of roles in scientific computing, and so are easily fragmented into small subdisciplines. Although these subdisciplines may be far apart in any taxonomy of scientific computing, they are closely related in their underlying aesthetic, mathematical and algorithmic methodologies.

In recognition of this commonality, the moniker of Combinatorial Scientific Computing has been adopted to refer to the development, analysis, and application of combinatorial algorithms to address problems in scientific computing. The remainder of this paper reviews several of the areas in which discrete algorithms have had a major impact on scientific computing, and contains suggestions for future research directions.

2 Some of the Many Roles of Discrete Algorithms

2.1 Linear Algebra

Linear algebra is at the heart of many problems in scientific computing. Combinatorial methods, graph algorithms in particular, play a very important supporting role in many aspects of linear algebra. A well-known example of this arises in the exploitation of sparsity in matrix factorizations. Cholesky factorization of a symmetric, positive-definite matrix remains numerically stable under any symmetric reordering of the matrix. This freedom allows for reorderings to maximally exploit sparsity. This reordering problem is naturally phrased in terms of the graph of the nonzero structure of the matrix. The two most widely used reordering strategies are instantiations of two of the most common algorithmic paradigms in computer science. Minimum degree and its many variants are greedy algorithms, while nested dissection is an example of a divide-and-conquer approach.

∗ e-mail: [email protected], Phone: +1 505 845 7599, Fax: +1 505 845 7442

c© 2003 WILEY-VCH Verlag GmbH & Co. KGaA, Weinheim

Other reordering problems on sparse matrices are also most naturally posed as graph problems. For some solvers, it is desirable to have all the nonzeros close to the diagonal. Algorithms for this objective often exploit breadth-first traversals of the graph of the matrix. Graph eigenvectors have also been applied to this problem.
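One breadth-first reordering of this kind is (reverse) Cuthill-McKee. The sketch below is an illustrative pure-Python version on an adjacency-set graph; the names and interface are ours, and production orderings add refinements such as pseudo-peripheral start vertices.

```python
from collections import deque

def cuthill_mckee(adj):
    """Breadth-first reordering of a graph given as {vertex: set(neighbours)}.

    Visiting vertices level by level, lowest degree first, tends to cluster
    each row's nonzeros near the diagonal, reducing matrix bandwidth.
    """
    order = []
    visited = set()
    # Handle disconnected graphs: start each component at a low-degree vertex.
    for start in sorted(adj, key=lambda v: len(adj[v])):
        if start in visited:
            continue
        visited.add(start)
        queue = deque([start])
        while queue:
            v = queue.popleft()
            order.append(v)
            for w in sorted(adj[v] - visited, key=lambda u: len(adj[u])):
                visited.add(w)
                queue.append(w)
    return order[::-1]  # reversing tends to reduce fill in factorizations
```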

To reduce pivoting when performing LU factorization, it is helpful to reorder the rows and columns of the matrix to place large values on the diagonal. This problem can be fruitfully recast as the identification of a heavy, maximum-cardinality matching in the bipartite graph of the matrix.
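A minimal sketch of the matching idea follows: maximum-cardinality bipartite matching by augmenting paths, with columns tried in decreasing magnitude as a greedy surrogate for the "heavy" objective (exact weighted matchings are used in practice). This is our own illustration, not a production algorithm.

```python
def diagonal_matching(A):
    """Match each row of sparse matrix A (dict {(i, j): value}) to a distinct
    column via augmenting paths, greedily preferring large magnitudes.
    Permuting columns by the matching places the matched entries on the
    diagonal.  A true weighted/bottleneck matching would optimise further.
    """
    rows = {}
    for (i, j), v in A.items():
        rows.setdefault(i, []).append((abs(v), j))
    for i in rows:
        rows[i].sort(reverse=True)  # try heavy entries first

    match_col = {}  # column -> row currently matched to it

    def augment(i, seen):
        for _, j in rows.get(i, []):
            if j in seen:
                continue
            seen.add(j)
            if j not in match_col or augment(match_col[j], seen):
                match_col[j] = i
                return True
        return False

    for i in rows:
        augment(i, set())
    return {i: j for j, i in match_col.items()}  # row -> matched column
```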

Iterative methods for solving linear systems also often lead to graph problems, particularly for preconditioning. Incomplete factorization preconditioners make use of many of the same graph ideas employed by sparse direct solvers. Efficient data structures for representing and exploiting the sparse structure, and reordering methods, are all relevant here. Domain decomposition preconditioners rely on good partitions of a global domain into subproblems, and this is commonly addressed by (weighted) graph or hypergraph partitioning. Algebraic multigrid methods make use of graph matchings and independent sets in their construction of coarse grids or smoothers. Support theory techniques for preconditioning often make use of spanning trees, graph embeddings or matroid bases.

2.2 Parallel Computing

Parallel computing has become a key technology for enabling scientific computations. The power of multiple CPUs and memories allows for larger and more detailed simulations to be performed quickly. However, harnessing the potential of parallel computing requires significant effort in algorithm design and software development. Commonly, solutions to the algorithmic challenges associated with parallelizing scientific codes involve combinatorial techniques.

One area in which discrete algorithms have made a major impact in parallel scientific computing is in partitioning for load balance. The challenge of decomposing an unstructured computation among the processors of a parallel machine can be naturally expressed as a graph (or hypergraph) partitioning problem. New algorithms and effective software for partitioning have been key enablers for parallel unstructured grid computations. Research in partitioning algorithms and models continues to be an active area.

Another graph problem that arises with some frequency in parallel scientific computing is vertex coloring (or its close cousin, independent set). A coloring is an assignment of a color to each vertex in a graph so that no adjacent vertices have the same color. The goal is to accomplish this by using only a small number of colors. The utility of a coloring arises from the observation that none of the vertices with the same color depend upon each other, and so operations can be performed on all of them simultaneously. This insight has been exploited in the parallelization of adaptive mesh codes, in parallel preconditioning and in other settings.
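A greedy sequential coloring illustrates the idea; on a graph of maximum degree Δ it uses at most Δ + 1 colors. This sketch is ours, not drawn from the paper.

```python
def greedy_coloring(adj):
    """Assign each vertex the smallest color not used by an already-colored
    neighbor, given adj = {vertex: set(neighbours)}.  Vertices sharing a
    color form an independent set and can be updated in parallel."""
    color = {}
    for v in adj:  # any vertex order works; smarter orders use fewer colors
        taken = {color[w] for w in adj[v] if w in color}
        c = 0
        while c in taken:
            c += 1
        color[v] = c
    return color
```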

Some problems, e.g. particle simulations, are described most naturally in terms of geometry instead of the language of graphs. A variety of geometric partitioning algorithms have been devised for such problems. In addition, space-filling curves and octree methods have been developed to parallelize multipole methods. In all these examples, and many more, the techniques required to parallelize a scientific simulation involve ideas and abstractions from theoretical computer science and combinatorial algorithms.
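As an illustration of the space-filling-curve idea, a Morton (Z-order) key interleaves coordinate bits so that sorting particles by key tends to keep spatially nearby particles together. The sketch and its names are ours.

```python
def morton_key(x, y, bits=16):
    """Interleave the bits of integer grid coordinates (x, y) to obtain a
    position on the Z-order space-filling curve.  Sorting by this key is a
    common device for partitioning particles among processors."""
    key = 0
    for b in range(bits):
        key |= ((x >> b) & 1) << (2 * b)      # x bits go to even positions
        key |= ((y >> b) & 1) << (2 * b + 1)  # y bits go to odd positions
    return key
```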

2.3 Mesh Generation

Mesh generation is an essential first step for many kinds of simulations on complex geometries. The mesh generation community makes use of many different areas of mathematics, but geometric algorithms are predominant. Triangular and tetrahedral mesh generation methods are usually based upon some form of


Delaunay tessellation. Convex hulls, intersection checking and other geometric primitives are widely used in mesh generation.

The topology of meshes is also an important consideration, and this is most naturally described in graph-theoretic terms, where the connectivity of either the mesh or its dual is represented as a graph. Particularly for hexahedral meshes, this topological description can be an important component of meshing algorithms.

As in the areas discussed above, mesh generation is not only a consumer of existing algorithms, but also an important source of new algorithmic questions or variants for consideration by theoretical computer scientists.

3 Future Directions

The breadth of applications of discrete algorithms in the computational sciences seems to be growing rapidly, with no end in sight. Here, I just touch on two topical examples.

3.1 Computational Biology

In recent years, biology has experienced a dramatic transformation into a computational and even an information-theoretic discipline. New technologies for genetic sequencing have played a key role in this transformation. Genetic analysis has motivated and required great advances in a wide variety of string algorithms. New algorithms for fragment assembly, sequence comparison and other fundamental operations have been essential to the biological revolution. Much of biology is now dependent upon advanced database searches for similarities between sequences.

Other aspects of biology are also being transformed by computer science. The study of proteins now relies on sophisticated algorithms for structural comparison, and graph algorithms play a key role in the interpretation of NMR and mass spectrometry experiments. Phylogenetics, the reconstruction of historical relationships between species or individuals, is now intensely computational, involving string and graph algorithms. The analysis of micro-array experiments, in which many different cell types can simultaneously be subjected to a range of environments, involves cluster analysis and techniques from learning theory.

In short, discrete algorithms have been a driving force behind many of the major advances in biology. There remain innumerable opportunities and needs for further work.

3.2 Information Analysis

Biology has relied on discrete algorithms to help deal with the overwhelming flood of data that has resulted from recent advances in experimental techniques. But a similar data explosion is impacting other sciences, and even the broader societies we live in. Graph algorithms are already playing a key role in indexing, searching and understanding the world wide web, and this trend seems likely to grow. Google's page ranking algorithm uses eigenvectors of the link graph of the web to identify important web sites. Understanding the output of large scale scientific simulations is increasingly demanding tools from learning theory and sophisticated visualization algorithms. The class of methods known as latent semantic analysis for discerning structure in sets of documents or complex data objects relies on linear algebra operations on the combinatorial structure of the objects.
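The page-ranking computation mentioned here is, in essence, power iteration on the damped link matrix. A toy sketch of our own follows, omitting refinements such as dangling-node handling; every link target must appear as a key of `links`.

```python
def pagerank(links, damping=0.85, iters=100):
    """Power iteration on the damped link graph: repeatedly distribute each
    page's score along its out-links.  The fixed point approximates the
    dominant eigenvector of the 'Google matrix'.  Illustrative only."""
    pages = list(links)
    n = len(pages)
    rank = {p: 1.0 / n for p in pages}
    for _ in range(iters):
        # Every page receives a baseline share plus damped link contributions.
        new = {p: (1.0 - damping) / n for p in pages}
        for p, outs in links.items():
            if not outs:
                continue  # dangling nodes ignored in this toy version
            share = damping * rank[p] / len(outs)
            for q in outs:
                new[q] += share
        rank = new
    return rank
```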

4 Conclusions

Although often under-appreciated, combinatorial algorithms have long played a crucial, enabling role in scientific computing. This importance seems to be growing. Not only are traditional areas of application continuing to expand, but new areas of scientific discovery seem to rely even more heavily upon discrete algorithms. New applications and models continue to invigorate the applied algorithms research community.


Combinatorial abstractions provide alternative ways of thinking about many important problems in scientific computing. Having multiple perspectives on a problem is almost invariably a good thing. But the continued expansion of discrete methods raises challenges as well. There are too few ties between researchers in scientific computing and those in discrete algorithms. This reflects an educational structure that treats these areas as completely unrelated. Improved efforts at cross training would be very welcome.

Professionals working in combinatorial scientific computing face many challenges. Relevant conferences, journals and professional societies are usually targeted to either discrete math or scientific computing, and often fail to serve their intersection well. And work at the boundary between fields is often not fully embraced by either discipline, leading to problems for young faculty in particular.

Despite these challenges, interdisciplinary work is often where the most important advances occur. Research in combinatorial scientific computing combines theoretical rigor with enormous potential for practical impact. It looks likely to be a thriving and important area for many years to come.

Due to the broad range of topics touched on very briefly in this paper, I have decided to forego citations. I will be happy to try to provide pointers to interested readers.


Testing a Medium Sized Numerical Package: A Case Study

Tim Hopkins∗ and David Barnes∗∗

1,2 Computing Laboratory, University of Kent, Canterbury, Kent, CT2 7NF, UK

Received 28 February 2003, accepted 21 March 2003

We report on our experiences of applying a number of software testing techniques and software quality metrics to a medium sized numerical package. This package includes its own testing routines and we report a number of areas where we believe both the testing process and the code may be improved. We also report a number of faults and discuss a testing regimen which we have developed that appears to be more effective, efficient and extensible than the one currently provided with the package.

1 Introduction

Hopkins [1] describes how the judicious setting of compiler flags for a number of modern compilers, coupled with a metric driven approach to software testing, uncovered a number of faults in a published numerical code. In this paper we extend this approach to the Lapack suite [2]. This consists of well over a thousand routines which solve a wide range of numerical linear algebra problems. In the work presented here we have restricted our attention to the single precision real and complex routines, which approximately halves the total amount of code to be inspected. Nevertheless, there is a great deal of common code between the two precisions, which means that in practice our results apply to more than half of the package.

The first release of Lapack was in 1992. Part of the intention of the package's designers was to provide a high quality and highly-portable library of numerical routines. In this, they have been very successful. Since its early releases, which were written in Fortran 77, the software has been modernized with the release of Lapack 95 [3]. Lapack 95 took advantage of features of Fortran 95, such as optional parameters and generic interfaces, to provide wrapper routines to the original Fortran 77 code. This exploited the fact that parameter lists of some existing routines were effectively configured by the values of their actual parameters, and that calls to similar routines with different precisions could be configured by type information. These wrappers provided the end user with improved and more robust calling sequences whilst preserving the investment in the underlying, older code.

Comprehensive test suites are available for both the Fortran 77 and Fortran 95 versions of the package and these have provided an excellent starting point for the focus of our work on the testing of the package. Furthermore, a package of this size and longevity is potentially a valuable source of data for the proving of software quality metrics. Over the past 10 years there have been eight releases of Lapack and this history has allowed us to track the changes made to the software since its initial release. This has provided us with, among other things, data on the number of faults corrected, and this has enabled us to investigate whether software quality metrics can be used to predict which routines are liable to require future maintenance. It has also been possible to determine whether the associated test suites have been influenced by the fault history.

∗ Corresponding author: e-mail: [email protected], Phone: +44 1227 823793, Fax: +44 1227 7628111

∗∗ Corresponding author: e-mail: [email protected], Phone: +44 1227 827696, Fax: +44 1227 7628112



2 Testing

The approach to testing that is supplied with the package is very much black-box and self checking. In the main, the purpose of these testing routines is to support the porting process. In many cases random matrices are generated and the output is identified as being 'correct' by performing further calculations using the input data and the generated results. In addition, tests are performed to ensure that, as far as possible, all illegal or inconsistent data is rejected by the numerical routines. While this is an excellent method for improving an installer's confidence in a porting exercise, it does not provide any quantitative feedback as to how well the code is actually being tested, in terms of coverage metrics such as statement coverage, basic block coverage, and so on.
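In the same spirit, a self-checking test judges a computed solution by a residual formed from the input data rather than by comparison with a reference output. The Python sketch below is our own construction, with Gaussian elimination standing in for a library solver; it only illustrates the pattern, not the package's actual tests.

```python
import random

def residual_check(n=6, tol=1e-8, seed=42):
    """Build a random system A x = b with a known solution, solve it, then
    declare success from a residual computed with the input data alone."""
    rng = random.Random(seed)
    x_true = [rng.uniform(-1, 1) for _ in range(n)]
    A = [[rng.uniform(-1, 1) for _ in range(n)] for _ in range(n)]
    for i in range(n):
        A[i][i] += n  # diagonal dominance keeps the random matrix well-posed
    b = [sum(A[i][j] * x_true[j] for j in range(n)) for i in range(n)]

    # Gaussian elimination with partial pivoting on the augmented matrix.
    M = [row[:] + [b[i]] for i, row in enumerate(A)]
    for k in range(n):
        p = max(range(k, n), key=lambda r: abs(M[r][k]))
        M[k], M[p] = M[p], M[k]
        for r in range(k + 1, n):
            f = M[r][k] / M[k][k]
            for c in range(k, n + 1):
                M[r][c] -= f * M[k][c]
    x = [0.0] * n
    for k in range(n - 1, -1, -1):
        x[k] = (M[k][n] - sum(M[k][c] * x[c] for c in range(k + 1, n))) / M[k][k]

    # 'Correctness' is judged from the residual, not from x_true.
    residual = max(abs(sum(A[i][j] * x[j] for j in range(n)) - b[i])
                   for i in range(n))
    return residual < tol
```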

An expectation of the installation process is that all of the supplied tests should pass without failures. We were keen to explore how safe this expectation was, by using state-of-the-art compile-time and runtime checks. We began by executing all the test software using a variety of compilers (including NagWare Fortran 95, Lahey Fortran 95 and Sun Fortran 95) that allowed a large number of internal consistency checks to be performed. This process detected a number of faults in both the testing routines and the numerical routines of the current release, including accessing array elements outside of their declared bounds, type mismatches between actual arguments and their definitions, and the use of variables before they have been assigned values. Such faults are all non-standard Fortran and could affect the final computed results. Although many of these faults were in the distributed package, some of the problems encountered were due to the way in which the testing software had been configured and this has, in turn, pointed to the need to re-order those checks within a number of the numerical routines that validate input parameter data.

Particular testing strategies should be chosen to fit specific goals and it is important to appreciate that the existing test strategy is strongly influenced by the desire to check portability. A feature of the testing strategy employed with the package is to batch up tests in two ways:

• using multiple data sets on a single routine

• using similar data on multiple routines.

It was discovered that the very act of batching up tests allowed some errors to be masked, typically through uninitialised variables used in later tests having been set by earlier tests. Thus, while it requires far more organization, there are very definite advantages to be gained from running each test separately. An approach using separate tests also supports the introduction of additional tests whenever a new error is discovered and then corrected. Such additional tests serve as regression tests for future releases.

In order to judge how effective a testing process is, quantitative measurements are required. The most basic white-box testing metric for the purpose of analysing test coverage is basic block coverage, where we attempt to derive test cases that ensure that all statements within the package are executed at least once. One purpose of using multiple data sets on a single routine is to attempt to achieve full coverage of that routine by following different paths through its basic blocks. The use of a source code profiler to generate basic block coverage data found that a large percentage of the test cases provided did not, in fact, improve this metric and could be pruned with respect to the coverage goal. This dramatically reduced the total number of tests to be run, with a consequent reduction in testing time. The time required to run a full test suite is often a factor influencing how often testing is performed. We will present data detailing both the coverage obtained using the current test data as well as the reduction in the number of test cases required to provide the same level of basic block coverage.
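The pruning step can be viewed as a greedy set cover over basic blocks: keep a test only if it adds coverage. An illustrative sketch with hypothetical names, not the authors' tooling:

```python
def prune_tests(coverage):
    """Greedy reduction of a test suite: keep a test only if it covers at
    least one basic block not already covered by the tests kept so far.
    `coverage` maps test name -> set of basic-block ids it executes."""
    kept, covered = [], set()
    # Largest coverage first; other orders give different (still valid) subsets.
    for test in sorted(coverage, key=lambda t: -len(coverage[t])):
        new_blocks = coverage[test] - covered
        if new_blocks:
            kept.append(test)
            covered |= new_blocks
    return kept, covered
```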

The need for dealing with a large number of test cases has led us to develop a more flexible testing environment which allows for the easy documentation and addition of test cases while keeping track of the code coverage obtained. Ideally, we would aim to extend the existing test suite in order to achieve 100% basic block coverage. Our major problem at present is finding data that will exercise the remaining unexecuted statements. The calling structure of the numerical routines is hierarchical and, in many cases, it is clear that unexecuted code in higher level routines would only be executed after the failure of some


lower level routine. It is not clear whether such situations are actually known to occur or if the higher level code is the result of defensive programming. While defensive programming practices are obviously not to be discouraged, they clearly potentially confuse the picture when considering how thorough a test suite is in practice.

Basic block coverage is considered by many experts [4] to be the weakest coverage criterion although, in the case of Lapack, it has proved extremely hard to approach 100% coverage. To supplement this, therefore, we briefly discuss a number of other test metrics (for example, branch coverage, condition coverage, and mutation testing) and provide some preliminary results.

3 Software Quality Metrics

There have been many attempts during the past three decades to quantify the quality of software. This has led to a huge number of software metrics being proposed; a review of many of these may be found in Zuse [5].

In earlier work, we have successfully used knot counts [6] to measure the decrease in code complexity when using automated tools to translate Fortran 66 to Fortran 77 and Fortran 90 ([7] and [8]). We have also shown a strong relationship between a routine's path count ([9] and [10]) and the number of faults reported after its release [11]. This work using NAG Library routines was extended to an earlier version of Lapack [12]. We now have further results which include the last major release (v3.0) and two minor releases (3.0a and 3.0b) of Lapack. This shows quite clearly the strong predictive properties of the path count metric as a means of identifying code that is liable to cause serious maintenance problems. We report on how restructuring tools (spag [13] and nag struct [14]) fared in attempting to reduce the complexity of these routines and provide data on the effects such restructuring has on testing. Such routines should provide a particular focus for testing efforts and this does not necessarily fit well with a generic approach to testing batches of routines.

4 Conclusions

We have shown that the use of a measurable approach to the testing of numerical software, coupled with the use of software tools, may be used to uncover faults that remain in good quality numerical software even after extensive black-box testing.

We have set up a framework for testing the Lapack suite of routines and have increased the basic block coverage. Even so, we are still some way from achieving 100% basic block coverage, which is considered to be the weakest coverage metric. Indeed Beizer [15] has argued that even if complete branch coverage is achieved then probably less than 50% of the faults left in the released software will have been found. We will be endeavouring to increase the basic block coverage and we intend to start work on improving the condition and branch coverage metrics.

We have also used the successive releases of the package to study the link between a small number of software metrics, that have proved useful in the past, and the occurrence of faults at a routine level. Our results show that the path count metric does appear to be a good predictive measure of which routines are liable to cause post-release maintenance problems and we recommend that these routines should be the focus of supplementary testing efforts.

References

[1] Tim Hopkins, A Comment on the Presentation and Testing of CALGO Codes and a Remark on Algorithm 639: To Integrate Some Infinite Oscillating Tails, ACM Trans. Math. Software 28, 285–300 (2002).

[2] E. Anderson, Z. Bai, C. Bischof, S. Blackford, J. Demmel, J. Dongarra, J. Du Croz, A. Greenbaum, S. Hammarling, A. McKenney, and D. Sorensen, LAPACK Users' Guide, 3rd ed. (SIAM, Philadelphia, 1999).


[3] V. A. Barker, L. S. Blackford, J. Dongarra, J. Du Croz, S. Hammarling, M. Marinova, J. Wasniewski, and P. Yalamov, LAPACK95 Users' Guide (SIAM, Philadelphia, 2001).

[4] C. Kaner, J. Falk, and H. Q. Nguyen, Testing Computer Software (Wiley, Chichester, 1999).

[5] H. Zuse, Software Complexity: Measures and Methods (W. de Gruyter, Berlin, 1991).

[6] M. R. Woodward, M. A. Hennell and D. Hedley, A Measure to Control Flow Complexity in Program Text, IEEE Transactions on Software Engineering SE-5, 45–50 (1979).

[7] Tim Hopkins, Is the Quality of Numerical Subroutine Code Improving? in Modern Software Tools for Scientific Computing, edited by E. Arge, A. M. Bruaset and H. P. Langtangen (Birkhauser Verlag, Basel, 1997) pp 311–324.

[8] Tim Hopkins, Restructuring Software: A Case Study, SP&E 26, 967–982 (1996).

[9] B. A. Nejmeh, NPATH: A Measure of Execution Path Complexity and its Applications, Commun. ACM 31, 188–200 (1988).

[10] Programming Research Ltd., Hersham, Surrey, UK, QA Fortran 6.0 (1992).

[11] Tim Hopkins and L. Hatton, Experiences with FLINT, a software metrication tool for Fortran 77, in The Symposium on Software Tools (Napier Polytechnic, Edinburgh, 1988).

[12] D. J. Barnes and Tim Hopkins, The Evolution and Testing of a Medium Sized Numerical Package, in Advances in Software Tools for Scientific Computing (Lecture Notes in Computational Science and Engineering, Volume 10), edited by H. P. Langtangen, A. M. Bruaset and E. Quak (Springer-Verlag, Berlin, 2000) pp 225–238.

[13] Polyhedron Software, Oxford, UK, plusFORT, Revision D (1997).

[14] Numerical Algorithms Group Ltd., Oxford, UK, NAGWare f77 Tools, Release 4.0 (1999).

[15] B. Beizer, Software System Testing and Quality Assurance (Van Nostrand Reinhold, New York, 1984).


Iteratively Weighted Approximation Algorithms for Nonlinear Problems

D.P. Jenkinson∗1, J. C. Mason∗∗ 1, and A. Crampton∗∗∗ 1

1 University of Huddersfield, Huddersfield HD1 3DH, UK.

Received 28 February 2003, accepted 21 March 2003

A number of rational or nonlinear approximation problems can be tackled by setting up an iterative procedure based on weighted linear estimators in the spirit of Loeb's algorithm [4]. In l∞, convergence is not always assured [2], but the method seems to converge consistently in l2 [5]. In the l2 norm this method usually yields a near-best rather than best approximation, which is nevertheless useful and superior to the initial approximation. We describe novel extensions of Loeb's scheme to approximations by a function of a linear form and, in particular, adopting a new form, a function of a radial basis function.

1 Introduction

Consider the l2 approximation on [0, ∞) of a naturally decaying function f by the form $g(P) \equiv A/[P]^R$, where A is a fixed function, R a constant and P is a polynomial form. For this form Appel [1] introduced a direct algorithm using an appropriate weight function w(x). Mason and Upton [5] then extended this algorithm to fit general transformations of linear forms in lp, l2 or l∞ norms. We extend this algorithm to fit an approximation g(L) to a linear combination of Radial Basis Functions (RBFs), where g is the transfer function and L is the linear combination of RBFs. The work of Mason and Upton [5], based on Carta [3], also applied an iterative algorithm of the form

$$\min_L \left\| \varepsilon^*_{i+1} - (\varepsilon^*_i - \varepsilon_i) \right\|, \qquad i = 0, 1, 2, \ldots$$

where εi is the previous true error, ε∗i is the previous modified error and ε∗i+1 is the current modified error for the solution of the transformed linear form. This aims to correct the modified error ε∗ at iteration step i by comparing the true error ε and ε∗ at the previous step. We also consider this algorithm and apply the new alternative iterative algorithm

$$\min_L \left\| \varepsilon^*_{i+1} \left( \frac{\varepsilon_i}{\varepsilon^*_i} \right) \right\|$$

to a transformed RBF form. This algorithm is simply based upon minimising ε∗ at the current step multiplied by the ratio of the error ε to the modified error ε∗ at the previous iteration. Throughout this paper we shall use

$$L(x) = \sum_{j=1}^{n} b_j\, \phi\left( \| x - \lambda_j \|_2 \right), \qquad x, \lambda_j \in \mathbb{R}^d$$

as the standard RBF approximation form, where φ is a standard univariate basis function, the bj are a set of coefficients, xi is the abscissa vector and the λj are the RBF centres. We concentrate on the cubic form in

∗ e-mail: [email protected], Phone: +44 1484 473047∗∗ e-mail: [email protected], Phone: +44 1484 472680,∗∗∗ e-mail: [email protected], Phone: +44 1484 472899,



which $\phi(r) = r^3$ and on the (squared) l2 norm

$$\| f(x_i) - L(x_i) \|^2 = \sum_i [f_i - L_i]^2.$$

The results of these methods are then compared with those of the Gauss-Newton algorithm.

2 Notation

Mason and Upton [5] adopt and detail the transformation functions which are needed. An absolute approximation error of ε gives

f − g (L) = ε (1)

and taking a Taylor series expansion of the error gives the weighted equation

    w(x) L(x) − w(x) g⁻¹(f(x)) = ε*,    (2)

where

    w(x) = [(g⁻¹)′(f(x))]⁻¹.    (3)

The weights in (2) ensure that the modified error ε∗ is almost equal to ε, given by (1).
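As a concrete instance of (3) (a worked check added here, using the transfer functions of Section 5), take g(L) = 1/L². Then

```latex
g^{-1}(f) = f^{-1/2}, \qquad
(g^{-1})'(f) = -\tfrac{1}{2}\, f^{-3/2}, \qquad
w(x) = \left[(g^{-1})'(f(x))\right]^{-1} = -2\, f(x)^{3/2},
```

which, up to sign (irrelevant in a least squares problem), is the weight w(x) = 2f^{3/2} quoted in Section 5; similarly g(L) = exp(L) gives g⁻¹(f) = ln f and hence w(x) = f(x).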

3 Iterative Algorithm

The original data is given by (1) and so

    g⁻¹(f − ε) = L.

As a least squares solution we ideally wish to minimise the nonlinear quantity

    min Σ_{i=1}^{m} ε_i² = Σ_{i=1}^{m} (f(x_i) − g(L_i))²,   L_i = L(x_i),

where by (1), (2), (3), we have

    ε* = w(x)[−L + g⁻¹(f)] = w(x)[−g⁻¹(f − ε) + g⁻¹(f)].

We therefore seek to minimise

    min Σ_i (ε*_{i+1})² (ε_i / ε*_i)²

on the assumption that, in the limit as i → ∞, this behaves like

    Σ_{i=1}^{m} ε_i².
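The scheme above can be sketched numerically. The following Python code (an illustration under assumed settings, not the authors' Matlab code) fits the cubic RBF through the transformed linear form for g(L) = 1/L² and then applies the ratio-based reweighting ‖ε*_{i+1}(ε_i/ε*_i)‖; the clipping of the ratio is a safeguard added here, not part of the paper.

```python
import numpy as np

rng = np.random.default_rng(0)
x = np.linspace(0.0, 1.0, 80)
f = np.exp(-10 * x) + 1e-4 * rng.standard_normal(x.size)
f = np.clip(f, 1e-6, None)               # keep f^(-1/2) defined despite noise

centers = np.linspace(0.0, 1.0, 5)       # lambda_j, the RBF centres
Phi = np.abs(x[:, None] - centers) ** 3  # cubic RBF design matrix, phi(r) = r^3

g_inv_f = f ** -0.5                      # g(L) = 1/L^2  =>  g^{-1}(f) = f^{-1/2}
w = 2 * f ** 1.5                         # weight from eq. (3), up to sign

ratio = np.ones_like(f)
for _ in range(8):
    W = w * ratio                        # reweight by eps_i / eps*_i
    b, *_ = np.linalg.lstsq(W[:, None] * Phi, W * g_inv_f, rcond=None)
    L = Phi @ b
    eps_star = w * (g_inv_f - L)         # modified (linearised) error
    eps = f - 1.0 / L ** 2               # true error
    ratio = np.clip(eps / np.where(np.abs(eps_star) < 1e-14, 1e-14, eps_star),
                    -5.0, 5.0)           # guard added here for robustness

rms_star = np.sqrt(np.mean(eps_star ** 2))
```

Replacing g, g⁻¹ and w swaps in the exponential transfer function with the same design matrix.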

4 Gauss Newton Algorithm

The Gauss–Newton method is a nonlinear modelling method that relies on first derivatives only. In the present context we iteratively minimise the sum of squares of the errors ε, and this should yield a best l2 approximation (for comparison purposes).


5 Method of Approximation

We consider two transfer functions

    g = 1/L²   and   g = exp(L),

where L is defined to be the cubic RBF approximation. Taking each transfer function in turn and using (2), the weights are given as w(x) = 2f^{3/2} and w(x) = f respectively. The iterative algorithm is then applied to each form

of g, and the method is repeated for two separate test functions with each data set containing added noise. The first test function is f(x) = exp(−10x) on [0, 1], using 5 and 10 RBF centres to approximate 80 values, and the second is f(x) = exp(−(x − 0.5)²) on [−1.5, 2.5], using 5 and 10 centres to approximate 100 values. Noise is added to the original function values in the proportions

    exp(−10x) → exp(−10x) + 0.0001 × randn(20, 1)

and also

    exp(−(x − 0.5)²) → exp(−(x − 0.5)²) + 0.005 × randn(30, 1)

using a Matlab code.

6 Discussion of Results

RBF domain [0, 1]                           5 centres          10 centres
L = cubic RBF                               0.03131            0.00688
g = 1/L²                                    2.99999 × 10⁻⁴     1.06409 × 10⁻⁴
‖ε*_{i+1} − (ε*_i − ε_i)‖ for g             2.99720 × 10⁻⁴     1.06026 × 10⁻⁴
‖ε*_{i+1}(ε_i/ε*_i)‖ for g                  2.99424 × 10⁻⁴     1.05635 × 10⁻⁴
Gauss–Newton algorithm for g                2.99113 × 10⁻⁴     1.05251 × 10⁻⁴
g = exp(L)                                  5.32166 × 10⁻⁴     1.47190 × 10⁻⁴
‖ε*_{i+1} − (ε*_i − ε_i)‖ for g             2.82659 × 10⁻⁴     1.05327 × 10⁻⁴
‖ε*_{i+1}(ε_i/ε*_i)‖ for g                  2.82208 × 10⁻⁴     1.04607 × 10⁻⁴
Gauss–Newton algorithm for g                2.74926 × 10⁻⁴     1.01608 × 10⁻⁴

Table 1  Results in approximating 80 abscissae to f(x) = exp(−10x)

Tables 1 and 2 show the residual mean square errors (RMS) recorded when approximating each function using a given number of abscissae and two sets of centres. From Table 1 it is immediately observed that both sets of centres record very accurate approximations. The cubic in its unaltered form performs quite modestly, but a significant improvement is achieved once either transfer function is fitted to the RBF form, as all the approximations are comparable to 10⁻⁴ in accuracy. With g = L⁻², for both sets of centres, the application of g reduces the approximation error significantly. When the iterative algorithms are applied to g, further improvements are obtained and all the errors are very close to the best approximation obtained by the Gauss–Newton method. The results are similar for g = exp(L), except that for 5 centres the RMS is 5.32166 × 10⁻⁴, which is higher and uncharacteristic of the other results. Both iterative algorithms are able to reduce the RMS significantly from the original cubic form and come quite close to the Gauss–Newton method. Table 2 shows that the transformation results using 5 centres with g = L⁻² are less impressive, but this is to be expected given the larger domain. The cubic RBF performs very well for both sets of centres, giving very small RMS values of 0.01418 and 0.00473 respectively. The


RBF domain [−1.5, 2.5]                      5 centres   10 centres
L = cubic RBF                               0.01418     0.00473
g = 1/L²                                    0.01379     0.00461
‖ε*_{i+1} − (ε*_i − ε_i)‖ for g             0.01334     0.00459
‖ε*_{i+1}(ε_i/ε*_i)‖ for g                  0.01293     0.00457
Gauss–Newton algorithm for g                0.01262     0.00455
g = exp(L)                                  0.00527     0.00457
‖ε*_{i+1} − (ε*_i − ε_i)‖ for g             0.00520     0.00456
‖ε*_{i+1}(ε_i/ε*_i)‖ for g                  0.00516     0.00454
Gauss–Newton algorithm for g                0.00512     0.00452

Table 2  Results in approximating 100 abscissae to f(x) = exp(−(x − 0.5)²)

function of the RBF reduces these approximations, though not greatly. Both iterative algorithms also further improve the approximations and, for both sets of centres, converge close to the best approximation. Using 5 centres the function g = exp(L) of an RBF performs very well initially, reducing the approximation error by roughly two thirds. Using ten centres the initial cubic approximation is very accurate and so no significant improvement is obtained using the function g. Both iterative algorithms are able to reduce the approximation error further, converging very close to the Gauss–Newton approximation on both occasions.

7 Conclusions

All of the methods considered have demonstrated that, for mild error in the data sets, the RMS approximation error can be reduced. On the test cases above, the most consistent transfer function is exp(L). The most impressive iterative norm is

    ‖ε*_{i+1} (ε_i / ε*_i)‖,

which performs better than ‖ε*_{i+1} − (ε*_i − ε_i)‖ in all cases of centres used and functions tested. All the methods improve the RMS, but they are not able in every case to converge to the solution obtained from the Gauss–Newton method.

References

[1] Appel, K., Rational Approximation of Decay-Type Functions, BIT 2, 69–75, 1962.
[2] Barrodale, I. and Mason, J. C., Two Simple Algorithms for Discrete Rational Approximation, Mathematics of Computation 24, 877–891, 1970.
[3] Carta, D., Minimax Approximation by Rational Functions of the Inverse Polynomial Type, BIT 18, 490–492, 1978.
[4] Loeb, H. L., On Rational Fraction Approximations at Discrete Points, Convair Astronautics, 1957.
[5] Mason, J. C. and Upton, N. K., Linear Algorithms for Transformed Linear Forms, Approximation Theory VI: Volume 2, 417–420, 1989.


NACoM-2003 Extended Abstracts 99 – 102

Numerical Solution of the Two-dimensional Time Independent Schrödinger Equation∗

Z. Kalogiratou1, Th. Monovasilis2, and T.E. Simos∗∗ ∗∗∗ 2

1 Department of International Trade, Technological Educational Institute of Western Macedonia at Kastoria, P.O. Box 30, GR-521 00, Kastoria, Greece
2 Department of Computer Science and Technology, Faculty of Sciences and Technology, University of Peloponnese, GR-221 00 Tripolis, Greece.

Received 28 February 2003, accepted 21 March 2003

The solution of the two-dimensional time-independent Schrödinger equation is considered by partial discretisation. The discretised problem is treated as an ordinary differential equation problem, and Numerov's method and a modified Numerov method with minimum phase-lag are used to solve it. Both methods are applied to the computation of the eigenvalues of the two-dimensional harmonic oscillator and the two-dimensional Hénon–Heiles potential. The results are compared with the results produced by full discretisation.

1 Introduction

The time-independent Schrödinger equation is one of the basic equations of quantum mechanics. Plenty of methods have been developed for the solution of the one-dimensional time-independent Schrödinger equation. Authors have treated the two-dimensional problem, which is a partial differential equation, by means of discretisation of both variables x and y, which transforms the problem into an eigenvalue problem for a block tridiagonal matrix; this is the well-known five-point method. Here we use partial discretisation only in the variable y, which leaves an ordinary differential equation problem. We apply to this problem Numerov's method and a modified Numerov method with minimum phase-lag. Both methods, as well as the full discretisation method, are applied in order to find the eigenvalues of the two-dimensional harmonic oscillator and the two-dimensional Hénon–Heiles potential. The Numerov-type methods prove to have superior performance, as expected.

2 Partial discretisation of the two-dimensional equation

The two-dimensional time-independent Schrödinger equation can be written in the form

    ∂²ψ/∂x² + ∂²ψ/∂y² + (2E − 2V(x, y)) ψ(x, y) = 0,    (1)

    ψ(x, ±∞) = 0,   −∞ < x < ∞,
    ψ(±∞, y) = 0,   −∞ < y < ∞,

∗ Funding by research project 71239 of Prefecture of Western Macedonia and the E.U. is gratefully acknowledged.
∗∗ Corresponding author: e-mail: [email protected], Phone: +30 210 94 20 091, Fax: +30 210 94 20 091. Please use the following address for all correspondence: Dr. T.E. Simos, 26 Menelaou Street, Amphithea - Paleon Faliron, GR-175 64 Athens, Greece.

∗∗∗ Active Member of the European Academy of Sciences and Arts


where E is the energy eigenvalue, V(x, y) is the potential and ψ(x, y) the wave function. The wave function ψ(x, y) asymptotically decays to zero away from the origin. We consider ψ(x, y) for y in the finite interval [−R_y, R_y] with boundary conditions

    ψ(x, −R_y) = 0   and   ψ(x, R_y) = 0.

We also consider a partition of the interval [−R_y, R_y],

    −R_y = y_{−N}, y_{−N+1}, …, y_{−1}, y_0, y_1, …, y_{N−1}, y_N = R_y,

where y_{j+1} − y_j = h = R_y/N. We approximate the partial derivative with respect to y by the difference quotient

    ∂²ψ/∂y² ≈ [ψ(x, y_{j+1}) − 2ψ(x, y_j) + ψ(x, y_{j−1})] / h²

and substitute into the original equation:

    ∂²ψ/∂x² = −(1/h²) ψ(x, y_{j+1}) − B(x, y_j) ψ(x, y_j) − (1/h²) ψ(x, y_{j−1}),

where

    B(x, y_j) = 2(E − V(x, y_j) − 1/h²).

We also define the length-(2N − 1) vector

    Ψ(x) = (ψ(x, y_{−N+1}), ψ(x, y_{−N+2}), …, ψ(x, y_0), …, ψ(x, y_{N−2}), ψ(x, y_{N−1}))ᵀ;

then equation (1) can be written as

    ∂²Ψ/∂x² = −S(x) Ψ(x),    (2)

where S(x) is the (2N − 1) × (2N − 1) tridiagonal matrix

    S(x) = ⎡ B(x, y_{−N+1})   1/h²                                      ⎤
           ⎢ 1/h²   B(x, y_{−N+2})   1/h²                               ⎥
           ⎢           ⋱             ⋱            ⋱                     ⎥
           ⎢               1/h²   B(x, y_{N−2})   1/h²                  ⎥
           ⎣                         1/h²   B(x, y_{N−1})               ⎦

The matrix S(x) can be written in terms of three matrices: the identity matrix I, the diagonal matrix V which contains the potential at the mesh points y_{−N+1}, …, y_{N−1}, and the tridiagonal matrix M with diagonal elements −2 and off-diagonal elements 1:

    S(x) = 2E I − 2V(x) + (1/h²) M.
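The tridiagonal structure and its three-matrix decomposition can be verified directly on a small grid. In the Python sketch below (parameter values are illustrative), S(x) is built entrywise from B(x, y_j) = 2(E − V − 1/h²) and compared with 2E I − 2V + (1/h²)M, the form whose diagonal matches B:

```python
import numpy as np

N, Ry, E = 3, 2.0, 1.0               # assumed small test values
h = Ry / N
k = 2 * N - 1
y = np.linspace(-Ry + h, Ry - h, k)  # interior mesh points y_{-N+1}..y_{N-1}
V = 0.5 * y ** 2                     # a 1D slice of the potential (x fixed)

# entrywise construction: diagonal B(x, y_j), off-diagonals 1/h^2
B = 2 * (E - V - 1 / h ** 2)
off = np.ones(k - 1) / h ** 2
S_entry = np.diag(B) + np.diag(off, 1) + np.diag(off, -1)

# decomposition S = 2E*I - 2V + M/h^2, with M = tridiag(1, -2, 1)
M = np.diag(-2.0 * np.ones(k)) + np.diag(np.ones(k - 1), 1) + np.diag(np.ones(k - 1), -1)
S_decomp = 2 * E * np.eye(k) - 2 * np.diag(V) + M / h ** 2
```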

3 Application of Numerov-type methods.

Now we consider x in the interval [−Rx, Rx] with boundary conditions

Ψ(−Rx) = 0, Ψ(Rx) = 0.

For convenience we consider R_x = R_y = R. We also take a partition of this interval,

    −R_x = x_{−N}, x_{−N+1}, …, x_{−1}, x_0, x_1, …, x_{N−1}, x_N = R_x;

then the step size is, as before, x_{n+1} − x_n = h = R/N.


3.1 Numerov’s method

The well known Numerov’s method is

ψn+1 − 2ψn + ψn−1 =h2

12(fn+1 + 10fn + fn−1)

where f is the right hand side function in our case f(x,Ψ) = −S(x)Ψ.We apply the method to equation (2)

Ψn+1 − 2Ψn + Ψn−1 =h2

12(−S(xn+1)Ψn+1 − 10S(xn)Ψn − S(xn−1)Ψn−1

)(3)

Each Ψ_n for n = −N+1, …, 0, …, N−1 is the vector Ψ(x) of length k = 2N − 1 evaluated at x_n, and S(x_n) for n = −N+1, …, 0, …, N−1 is a (2N−1) × (2N−1) matrix. Now let Ψ be the vector of length l = k² = (2N−1)²,

    Ψ = (Ψ_{−N+1}, Ψ_{−N+2}, …, Ψ_0, …, Ψ_{N−2}, Ψ_{N−1})ᵀ.

Also consider the block tridiagonal matrices A and B of size l × l, in which each block is a diagonal matrix of size k × k: the diagonal blocks of A are −2I, while the diagonal blocks of B are 10I, and the off-diagonal blocks of both A and B are the identity matrix I. Also consider the block diagonal matrix C with diagonal blocks M, and the diagonal matrix V with diagonal blocks V(x_{−N+1}) to V(x_{N−1}).

We rewrite (3) in matrix form:

    AΨ = −(1/6) h² E BΨ + (1/6) h² BVΨ − (1/12) CBΨ,

or

    (P + E h² Q) Ψ = 0,

where

    P = A − (1/6) h² BV + (1/12) CB,   and   Q = (1/6) B.

3.2 Numerov’s method with minimum phase-lag.

We also applied a modification of Numerov's method with minimum phase-lag,

    ȳ_n = y_n − α h² (f_{n+1} − 2f_n + f_{n−1}),

    y_{n+1} − 2y_n + y_{n−1} = (h²/12)(f_{n+1} + 10 f̄_n + f_{n−1}),

where f̄_n = f(x_n, ȳ_n) and α = 1/200 (see [4]).

Substitution into equation (2) gives the following generalized eigenvalue problem:

    (P + E h² Q − E² h⁴ R) Ψ = 0,

where

    P = A − (1/6) h² BV + (1/12) CB + α h² (5/6)(DA − 2VCA − 2CAV) + α h⁴ (10/3) VAV,

    Q = (1/6) B + α (10/3) CA − α h² (10/3)(AV + VA),   and   R = −α (10/3) A,

and D is a block diagonal matrix with each block equal to M². The matrices P, Q, R are real, symmetric and sparse; they are very large even for small N (e.g. l = 1521 for N = 20). In order to keep working as we increase N, we treat them as sparse matrices in terms of storage and computational work.


4 Numerical Results

We applied both numerical methods developed above to the calculation of the eigenvalues of the two-dimensional harmonic oscillator and the two-dimensional Hénon–Heiles potential. Results are compared with those produced using the full discretisation technique.

The potential of the two-dimensional harmonic oscillator is

    V(x, y) = (1/2)(x² + y²).

The Hénon–Heiles potential is

    V(x, y) = (1/2)(x² + y²) + (0.0125)^{1/2} (x²y − y³/3).

The results show that both methods give more accurate results than full discretisation. Also, the minimum phase-lag modification gives better results than Numerov's method. The computational time is almost the same (due to space limitations, the numerical results will be presented in the full-length version of this paper).
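For comparison, the full discretisation (five-point) approach can be sketched as follows (Python with scipy; grid parameters are illustrative, and this is a generic five-point scheme, not the authors' code). For the Hénon–Heiles coupling above, the lowest eigenvalue should come out slightly below the harmonic oscillator ground state E = 1:

```python
import numpy as np
from scipy.sparse import diags, identity, kron
from scipy.sparse.linalg import eigsh

R, n = 6.0, 60                        # box [-R, R]^2, h = 2R/n (assumed values)
h = 2 * R / n
pts = np.linspace(-R + h, R - h, n - 1)
D2 = diags([1.0, -2.0, 1.0], [-1, 0, 1], shape=(n - 1, n - 1)) / h ** 2
I = identity(n - 1)
lap = kron(D2, I) + kron(I, D2)       # five-point Laplacian, Dirichlet walls

X, Y = np.meshgrid(pts, pts, indexing="ij")
lam = 0.0125 ** 0.5
V = 0.5 * (X ** 2 + Y ** 2) + lam * (X ** 2 * Y - Y ** 3 / 3)
H = -0.5 * lap + diags(V.ravel())     # -(1/2)Delta + V
E = np.sort(eigsh(H, k=4, sigma=0, return_eigenvectors=False))
```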

References

[1] Davis, M. J., Heller, E. J., Semiclassical Gaussian basis set method for molecular vibration wave function, Journal of Chemical Physics 71 (1982), 5356–5364.
[2] Hajj, F. Y., Eigenvalues of the two-dimensional Schrödinger equation, Journal of Physics B: At. Mol. Phys. 15 (1982), 683–692.
[3] Liu, X. S., Liu, X. Y., Zhou, Z. Y., Ding, P. Z., Numerical solution of the two-dimensional time-independent Schrödinger equation by using symplectic schemes, International Journal of Quantum Chemistry 38 (2001), 303–309.
[4] Chawla, M. M., Rao, P. S., A Numerov-type method with minimal phase-lag for the integration of second order periodic initial-value problems, J. Comput. Appl. Math. 11 (1984), 277–281.


NACoM-2003 Extended Abstracts 103 – 106

A Parametric Sensitivity Analysis for the Solution of Extrema Evaluation Problems via a Dimensionality Reducing Approximation Method

Tulin Kaman∗1 and Metin Demiralp∗∗1

1 Computational Science and Engineering Program, Informatics Institute, Istanbul Technical University, Maslak, 80626, Istanbul, Turkey

Received 28 February 2003, accepted 21 March 2003

This work aims to apply High Dimensional Model Representation (HDMR) and Factorized High Dimensional Model Representation (FHDMR) to the determination of the sensitivity coefficients of the solutions of a multivariate extrema problem. The derivations are made for a general functional structure; however, the illustrative applications, which are not explicitly given in this abstract, are related to structures where the resulting extrema equations are matrix inversion or matrix eigenvalue problems.

1 Introduction

The extrema of a given multivariate function can be found by standard and straightforward methods, numerically or analytically. However, depending on the structure of the function under consideration, it is generally not possible to find analytical solutions. Instead, numerical methods are used to solve the equations obtained by setting the first order partial derivatives with respect to the parameters to zero. These equations are generally nonlinear and may produce incompatibilities in certain cases. Even when there is no incompatibility, the solution of the extrema finding equations may become difficult as the number of unknowns increases unboundedly. Hence it is desirable to develop a method which works well in the infinite dimension limit. There have been recent attempts to construct efficient methods for dealing with multivariate functions in such a way that the components of the approximation formula expressing a multivariate function are ordered, starting from a constant and gradually approaching full multivariance through univariate, bivariate and higher terms. This type of method was first suggested by Sobol [1], and its revised and generalized form was proposed and applied to various problems by Rabitz [2-8]. The method is called High Dimensional Model Representation (HDMR). Its basic philosophy can be given through the following general equation for a given multivariate function f(x_1, …, x_N):

    f(x_1, …, x_N) = f_0 + Σ_{i1=1}^{N} f_{i1}(x_{i1}) + Σ_{1≤i1<i2≤N} f_{i1 i2}(x_{i1}, x_{i2}) + ⋯ + f_{12…N}(x_1, …, x_N).    (1)

The right-hand side additive terms stand for the orthogonal components of the original function. The orthogonality condition states that each component except the constant term vanishes when it is integrated over a chosen interval under a prescribed weight. The weight is normalized to make its integral

∗ Corresponding author: e-mail: [email protected], Phone: +90 212 285 70 82, Fax: +90 212 285 70 73
∗∗ e-mail: [email protected], Phone: +90 212 285 70 82, Fax: +90 212 285 70 73


unity, with respect to one of its arguments. f(x_1, …, x_N) and the individual right-hand side components are assumed to be square integrable functions:

    ∫_{a_i}^{b_i} dx_i W_i(x_i) f_{i1 i2 … iN}(x_1, …, x_N) = 0,   1 ≤ i ≤ N.    (2)

The orthogonal separable geometry used in the definition of the vanishing integral properties is assumed to be a hyperprism located at some specific point in hyperspace:

    ∫_{a_i}^{b_i} dx_i W_i(x_i) = 1.    (3)

The location of this specific point and the size parameters of the hyperprism can be used to control the numerical accuracy obtained from the first few terms of the above expansion.
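A minimal numerical illustration of the expansion (1) (added here; the function, weight and interval are illustrative choices: uniform weight on [0, 1]²) computes the components by averaging; each non-constant component then integrates to zero, and the expansion reproduces f exactly:

```python
import numpy as np

n = 200
t = (np.arange(n) + 0.5) / n               # midpoint nodes; weight W_i = 1 on [0, 1]
X1, X2 = np.meshgrid(t, t, indexing="ij")
F = X1 + X2 + X1 * X2                      # example bivariate function

f0 = F.mean()                              # constant component
f1 = F.mean(axis=1) - f0                   # univariate component in x1
f2 = F.mean(axis=0) - f0                   # univariate component in x2
f12 = F - f0 - f1[:, None] - f2[None, :]   # bivariate remainder

recon = f0 + f1[:, None] + f2[None, :] + f12
```

For this f the bivariate remainder is exactly (x1 − 1/2)(x2 − 1/2), illustrating the orthogonality of the components under the uniform weight.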

2 Formulation

In this work we aim to evaluate not the unknowns themselves, but the sensitivity of the unknowns with respect to some scaling factors artificially inserted into each individual additive term. These scaling parameters α_j will be set to 1 when the actual calculations are carried out. That is, we assume that the function whose extrema are under consideration is given as follows:

    f(x_1, ⋯, x_N) = Σ_{j=1}^{N} α_j f_j(x_1, ⋯, x_N).    (4)

The extrema equations are obtained by taking first order partial derivatives of equation (4) with respect to the unknowns x_i:

    ∂f/∂x_i = Σ_{j=1}^{N} α_j ∂f_j/∂x_i = 0,   i = 1, ⋯, N.    (5)

The equations for the sensitivity coefficients are obtained by taking the partial derivatives of the extrema equations with respect to the related scaling factor:

    ∂f_j/∂x_i + Σ_{i1=1}^{N} [ Σ_{i2=1}^{N} α_{i2} ∂²f_{i2}/(∂x_{i1}∂x_i) ] ∂x_{i1}/∂α_j = 0,   i, j = 1, ⋯, N.    (6)
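For quadratic f_j the extrema equations (5) are linear and (6) can be checked directly. The Python sketch below (the matrices H_j and vectors g_j are illustrative data, not from the paper) computes the extremum, solves (6) for the sensitivities ∂x/∂α_j, and the result can be verified against finite differences in α:

```python
import numpy as np

# illustrative quadratic pieces f_j(x) = 0.5 x^T H_j x + g_j^T x  (assumed data)
H1 = np.array([[2.0, 0.3], [0.3, 1.0]])
H2 = np.array([[1.0, -0.2], [-0.2, 3.0]])
g1 = np.array([1.0, 0.0])
g2 = np.array([0.0, 1.0])

def extremum(a):
    # eq. (5): sum_j a_j (H_j x + g_j) = 0
    H = a[0] * H1 + a[1] * H2
    g = a[0] * g1 + a[1] * g2
    return np.linalg.solve(H, -g)

a = np.array([1.0, 1.0])
x = extremum(a)
H = a[0] * H1 + a[1] * H2
# eq. (6): (df_j/dx at the extremum) + [sum_j a_j H_j] (dx/dalpha_j) = 0
sens = [np.linalg.solve(H, -(Hj @ x + gj)) for Hj, gj in ((H1, g1), (H2, g2))]
```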

We consider a space spanned by α_1, ⋯, α_N and locate a hyperprism at a specific point, say (β_1, ⋯, β_N). The position of a point in this hyperprism is defined by the coordinates y_1, ⋯, y_N, defined as

    y_j ≡ α_j − β_j,   j = 1, ⋯, N.    (7)

Constant weight functions with values 1/γ_j (1 ≤ j ≤ N) are used because of the orthogonality requirement in HDMR:

    ∫_0^{γ_j} W_j(y_j) dy_j = 1  ⇒  W_j(y_j) = 1/γ_j,   j = 1, ⋯, N,    (8)

where the γ_j correspond to the right end points of the intervals. The HDMR expansion mentioned above can be inserted into the equations for the determination of the sensitivity coefficients, and the resulting equations can then be separated into their HDMR components by


using the same geometry and weights. The obtained components are used as the equations that determine the HDMR components. The equations are half integral, half algebraic, and can be solved by using standard approaches. This method has been extended to increase its approximation power for almost multiplicative functions, which can almost be expressed as the product of univariate functions of each independent variable. This form has been called Factorized High Dimensional Model Representation (FHDMR) by Demiralp [9] and Rabitz, within a collaboration of a subgroup of the Rabitz Group at Princeton University. In the HDMR expansion of ∂x_i/∂α_j, the constant, univariate and bivariate functions are denoted φ_{j0}^{(i)}, φ_{ji1}^{(i)}(α_{i1}), φ_{ji1,i2}^{(i)}(α_{i1}, α_{i2}) and so on. All these functions, called HDMR components, are orthogonal to each other:

    ∂x_i/∂α_j = φ_j^{(i)} = φ_{j0}^{(i)} + Σ_{i1=1}^{N} φ_{ji1}^{(i)}(α_{i1}) + Σ_{1≤i1<i2≤N} φ_{ji1,i2}^{(i)}(α_{i1}, α_{i2}) + ⋯ + φ_{j12…N}^{(i)},   i, j = 1, ⋯, N.    (9)

By using equation (9) in equation (6), the following equation is obtained:

    Σ_{i1=1}^{N} [ Σ_{i2=1}^{N} α_{i2} ∂²f_{i2}/(∂x_{i1}∂x_i) ] [ φ_{j0}^{(i1)} + Σ_{i3=1}^{N} φ_{ji3}^{(i1)}(α_{i3}) + ⋯ + φ_{j12…N}^{(i1)} ] = −∂f_j/∂x_i,   i, j = 1, ⋯, N.    (10)

Equation (10) will be integrated N times over the relative position variables y_j of the hyperprism throughout the closed interval [0, γ_j] (1 ≤ j ≤ N):

    Σ_{i1=1}^{N} [ Σ_{i2=1}^{N} (γ_{i2}/2 + β_{i2}) ∂²f_{i2}/(∂x_{i1}∂x_i) ] φ_{j0}^{(i1)} + Σ_{i1=1}^{N} [ Σ_{i2=1}^{N} ζ_{i2}^{(i1)} ∂²f_{i2}/(∂x_{i1}∂x_i) ] = −∂f_j/∂x_i,    (11)

    ζ_{i2}^{(i1)} = ∫_0^{γ_{i2}} W_{i2}(y_{i2}) y_{i2} φ_{ji2}^{(i1)}(y_{i2}) dy_{i2},   i1, i2, i, j = 1, ⋯, N.    (12)

To obtain the first order HDMR terms, both sides of equation (10) are integrated (N − 1) times, excluding the integration over the variable y_i (where 1 ≤ i ≤ N). To simplify the result we can define the following entities:

    A_{jk}^{(i)} = ∂²f_i/(∂x_j ∂x_k),    (13)

    B_{i2}^{(i1)} = ∫_0^{γ_{i2}} W_{i2}(y_{i2}) y_{i2} φ_{ji2,i}^{(i1)}(y_{i2}, y_i) dy_{i2},   i2 < i,    (14)

    δ_{i2}^{(i1)} = ∫_0^{γ_{i2}} W_{i2}(y_{i2}) y_{i2} φ_{ji,i2}^{(i1)}(y_i, y_{i2}) dy_{i2},   i2 > i,    (15)

    u_i^{(i1)} = (y_i − γ_i/2) φ_{j0}^{(i1)} − ζ_i^{(i1)} + (y_i + β_i) φ_{ji}^{(i1)}(y_i),    (16)


    v_{i2}^{(i1)} = (γ_{i2}/2 + β_{i2}) φ_{ji}^{(i1)}(y_i) + B_{i2}^{(i1)},    (17)

    w_{i2}^{(i1)} = (γ_{i2}/2 + β_{i2}) φ_{ji}^{(i1)}(y_i) + δ_{i2}^{(i1)}.    (18)

With all these definitions, the following equation is obtained:

    A_{jk}^{(i)} u_i^{(i1)} + Σ_{i2=1}^{i−1} A_{jk}^{(i2)} v_{i2}^{(i1)} + Σ_{i2=i+1}^{N} A_{jk}^{(i2)} w_{i2}^{(i1)} = 0.    (19)

Equation (11) contains φ_{j0}^{(i)}, which is the constant HDMR component of φ_j^{(i)}, and ζ_{i2}^{(i1)}, which is related to the univariate HDMR components. If it suffices to keep only the constant component, then we can approximate this equation by discarding the univariate-component-related items. This can be considered the zeroth order HDMR approximation. On the other hand, the first order HDMR approximation is defined in such a way that equation (11) is retained as it is and equation (10) is approximated by discarding the bivariate-component-related terms. The solutions of the approximated equations in the two cases, zeroth and first order HDMR, then produce zeroth and first order approximations to the sensitivity coefficients, respectively.

After obtaining the constant and univariate terms, we can use the first order approximate Factorized High Dimensional Model Representation [9],

    φ_j^{(i)} = φ_{j0}^{(i)} ∏_{i1=1}^{N} [ 1 + φ_{ji1}^{(i)}(α_{i1}) / φ_{j0}^{(i)} ],    (20)

to examine whether a better approximation is obtained. In fact, additive functions are better approximated by HDMR, while FHDMR works well for multiplicative functions.
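This contrast is easy to make concrete: for a purely multiplicative function, the first order FHDMR (20) is exact while the additive first order HDMR truncation is not. A Python sketch (illustrative function, uniform weight on [0, 1]²):

```python
import numpy as np

n = 200
t = (np.arange(n) + 0.5) / n               # midpoint nodes, uniform weight
X1, X2 = np.meshgrid(t, t, indexing="ij")
F = (1 + X1) * (1 + X2)                    # multiplicative test function

f0 = F.mean()                              # constant HDMR component
f1 = F.mean(axis=1) - f0                   # univariate component in x1
f2 = F.mean(axis=0) - f0                   # univariate component in x2

hdmr1 = f0 + f1[:, None] + f2[None, :]                         # first order HDMR
fhdmr1 = f0 * (1 + f1[:, None] / f0) * (1 + f2[None, :] / f0)  # eq. (20)

err_hdmr = np.max(np.abs(hdmr1 - F))
err_fhdmr = np.max(np.abs(fhdmr1 - F))
```

Here FHDMR recovers F to machine precision, while the additive truncation leaves the residual (x1 − 1/2)(x2 − 1/2).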

References

[1] Sobol, I. M., Sensitivity Estimates for Nonlinear Mathematical Models, MMCE, Vol. 1, No. 4, 407, 1993.
[2] Rabitz, H. and Alis, O., General Foundations of High Dimensional Model Representations, J. Math. Chem. 25, 197–233 (1999).
[3] Rabitz, H., Alis, O. F., Shorter, J., and Shim, K., Efficient input-output model representations, Computer Phys. Comm. 117, 11–20 (1999).
[4] Shorter, J. A., Ip, P. C., and Rabitz, H., An Efficient Chemical Kinetics Solver Using High Dimensional Model Representation, J. Phys. Chem. A 103, 7192–7198.
[5] Rabitz, H. and Alis, O., Managing the Tyranny of Parameters in Mathematical Modelling of Physical Systems, in Sensitivity Analysis, A. Saltelli, K. Chan, and M. Scott, eds., pp. 199–223 (John Wiley & Sons, Chichester, 2000).
[6] Alis, O. and Rabitz, H., Efficient Implementation of High Dimensional Model Representations, J. Math. Chem. 29, 127–142 (2001).
[7] Li, G., Rosenthal, C., and Rabitz, H., High Dimensional Model Representations, J. Phys. Chem. A 105, 7765–7777 (2001).
[8] Li, G., Wang, S.-W., Rosenthal, C., and Rabitz, H., High dimensional model representations generated from low dimensional data samples. I. mp-Cut-HDMR, J. Math. Chem. 30, 1–30 (2001).
[9] Demiralp, M. and Rabitz, H., Factorized High Dimensional Model Representation of Multivariate Functions (to be published).


NACoM-2003 Extended Abstracts 107 – 110

Optimal Control of a One-dimensional Quantum Harmonic Oscillator under an External Field with Quadratic Dipole Function and Penalty on Momentum

A. Kursunlu∗1, Irem Yaman∗∗1, and Metin Demiralp∗∗∗1

1 Computational Science and Engineering Program, Informatics Institute, Istanbul Technical University, Maslak, 80626, Istanbul, Turkey

Received 28 February 2003, accepted 21 March 2003

In this work, the optimal control of a harmonic oscillator is considered. The external field is assumed to be weak and is hence represented by the dipole interaction only. The dipole function is taken quadratic in the spatial coordinate. The penalty-term-related operator is taken as the momentum. Specific structures for the spatial dependence are assumed, and temporal equations are obtained for the unknowns. The equations represent forward and backward evolution; the connection between them is provided by an algebraic equation coming from the field-amplitude-related equation. Solutions are obtained iteratively.

1 Introduction

The Hamiltonian of a molecule in the presence of an external field which can be approximated by the dipole polarizability can be written as follows:

    H = H_0 + µ E(t).    (1)

Here H_0 represents the time-independent Hamiltonian and E(t) is the amplitude of the electrical field applied to the molecule. µ stands for the time-independent dipole function of the molecule under consideration, and the field amplitude varies only with time.

It is assumed that the interaction between the field and the molecule occurs over the time interval 0 < t ≤ T, and an observable characterized by a Hermitian operator O is aimed to arrive at a prescribed value. If we want the expectation value of O to become as close as possible to a predetermined value Ō, then the following objective term can be chosen as a part of the cost functional of the optimization:

    J_o = (1/2) (⟨ψ(T)|O|ψ(T)⟩ − Ō)².    (2)

The next step is the definition of the penalty terms. In this work, two different penalty terms will be considered. The first aims to suppress the expectation value of an undesired observable operator, denoted by O′, during the field-molecule interaction via an appropriately chosen weight function denoted by W_p(t), and is expressed as follows:

    J_p^{(1)} = (1/2) ∫_0^T dt W_p(t) ⟨ψ(t)|O′|ψ(t)⟩²,    (3)

∗ Corresponding author: e-mail: [email protected], Phone: +90 212 285 70 77, Fax: +90 212 285 70 73
∗∗ e-mail: [email protected], Phone: +90 212 285 70 77, Fax: +90 212 285 70 73
∗∗∗ e-mail: [email protected], Phone: +90 212 285 70 82, Fax: +90 212 285 70 73


    W_p(t) > 0,   t ∈ [0, T].    (4)

The second penalty term allows us to minimize the field fluence, and is given as follows:

    J_p^{(2)} = (1/2) ∫_0^T dt W_E(t) E(t)²,    (5)

    W_E(t) > 0,   t ∈ [0, T],    (6)

where W_E(t) is an appropriate weight function. The wave function must satisfy the fundamental equation of quantum mechanics. The Schrödinger equation may be introduced explicitly in the cost functional as a constraint term via a Lagrange multiplier λ, which varies temporally and spatially. Therefore, by considering a real-valued contribution, the following cost term can be written:

    J_{c,d} = ∫_0^T dt ⟨λ(t)| iℏ ∂/∂t − H(t) |ψ(t)⟩ + ∫_0^T dt ⟨λ*(t)| −iℏ ∂/∂t − H(t) |ψ*(t)⟩,    (7)

where bracket notation is used. The total cost can be written as the sum of these individual cost terms:

    J = J_o + J_p^{(1)} + J_p^{(2)} + J_{c,d}.    (8)

The dynamical equations of the system, which is optimally controlled through the above cost functional, are obtained from the stationary variational condition on J:

    δJ = 0.    (9)

Since J depends on λ(t) and ψ(t) in addition to the field amplitude E, the variation of J can be expressed as a linear combination of the variations of these variables. Therefore, the coefficients of this linear combination must individually vanish. These equations can be reduced to the following form:

    iℏ ∂ψ(t)/∂t = [H_0 + µE(t)] ψ(t),    (10)

    ψ(0) = ψ,    (11)

    iℏ ∂λ(t)/∂t = [H_0 + µE(t)] λ(t) − W_p(t) ⟨ψ(t)|O′|ψ(t)⟩ O′ψ(t),    (12)

    λ(T) = −(i/ℏ) η O ψ(T),    (13)

    E(t) = (2/W_E(t)) ℜ(⟨λ(t)|µ|ψ(t)⟩),    (14)

    ⟨ψ(T)|O|ψ(T)⟩ = Ō + η,    (15)

where ℜ denotes the real part. The intermediate constant η is introduced to facilitate further analysis.


To determine the wave function of an isolated one-dimensional quantum harmonic oscillator it is better to use physically dimensionless coordinates. That is, the following equation,

    iℏ ∂ψ(x, t)/∂t = −(ℏ²/2m) ∂²ψ(x, t)/∂x² + (1/2) k x² ψ(x, t),    (16)

can be converted into

    i ∂ψ(x, t)/∂t = −(1/2) ∂²ψ(x, t)/∂x² + (1/2) x² ψ(x, t)    (17)

by using the following transformations:

    (mk)^{1/4} / ℏ^{1/2} x ⟶ x,    (18)

    (k/m)^{1/2} t ⟶ t.    (19)

If an external field with amplitude E is applied to the harmonic oscillator, then equation (10) can be rewritten as follows:

    i ∂ψ(x, t)/∂t = −(1/2) ∂²ψ(x, t)/∂x² + ((1/2) x² + µE) ψ(x, t).    (20)

The initial form of the wave function accompanying the initial condition of (20) is chosen as the ground state of the harmonic oscillator:

    ψ(x, 0) = π^{−1/4} e^{−x²/2}.    (21)

In this work, we will consider a quadratic µ function of the form

    µ = µ_0 x + µ_1 x².    (22)

With this equality, equation (20) can be written as

    i ∂ψ(x, t)/∂t = −(1/2) ∂²ψ(x, t)/∂x² + ((1/2) x² + Ẽ(t)(x + ρx²)) ψ(x, t),    (23)

where Ẽ(t) = µ_0 E(t) and ρ = µ_1/µ_0. Let us assume that the function ψ(x, t) has the following form:

    ψ(x, t) = A(t) e^{α_1(t) x + α_2(t) x²},    (24)

as we can guess from the structure of equation (23). If this form is substituted into equation (23), the following equalities are obtained:

    i A′(t)/A(t) = −((1/2) α_1(t)² + α_2(t)),    (25)

    i α_1′(t) = −2 α_1(t) α_2(t) + Ẽ(t),    (26)

    i α_2′(t) = −2 α_2(t)² + 1/2 + Ẽ(t) ρ.    (27)

The next step is solving these three equations. From equation (25) it is found that

    A(t) = e^{ i ∫_0^t ((1/2) α_1(τ)² + α_2(τ)) dτ }.    (28)
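Equations (25)-(27) can also be integrated numerically for a given field. The Python sketch below (the field amplitude, the value of ρ and the step size are illustrative assumptions, not values from the paper) propagates the Gaussian form (24) from the ground state (21) with classical RK4 and evaluates the closed-form norm of ψ, which must remain 1 because the evolution is unitary:

```python
import numpy as np

rho = 0.1                                   # assumed ratio mu1/mu0
def E(t):                                   # assumed (scaled) field amplitude
    return 0.2 * np.sin(t)

def rhs(t, y):
    a1, a2, A = y                           # eqs. (26), (27), (25), divided by i
    return np.array([(-2 * a1 * a2 + E(t)) / 1j,
                     (-2 * a2 ** 2 + 0.5 + rho * E(t)) / 1j,
                     -A * (0.5 * a1 ** 2 + a2) / 1j])

y = np.array([0j, -0.5 + 0j, np.pi ** -0.25 + 0j])   # ground state (21)
t, dt = 0.0, 0.0025
for _ in range(2000):                       # classical RK4 up to t = 5
    k1 = rhs(t, y)
    k2 = rhs(t + dt / 2, y + dt / 2 * k1)
    k3 = rhs(t + dt / 2, y + dt / 2 * k2)
    k4 = rhs(t + dt, y + dt * k3)
    y = y + dt / 6 * (k1 + 2 * k2 + 2 * k3 + k4)
    t += dt

a1, a2, A = y
# ||psi||^2 for psi = A exp(a1 x + a2 x^2), valid while Re(a2) < 0
norm2 = (abs(A) ** 2 * np.sqrt(np.pi / (-2 * a2.real))
         * np.exp(a1.real ** 2 / (-2 * a2.real)))
```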


In this work we consider the following forms for the operators O and O′:

    O ≡ x,   O′ ≡ −i ∂/∂x.    (29)

These enable us to evaluate the related expectation values easily. The costate function λ(x, t) must have the same exponential structure due to the equation it has to satisfy. The same structure implies that λ(x, t) must have a linear polynomial in x as a factor, whose coefficients vary with time. We can obtain the corresponding equations by inserting the following structure into (12) and (13):

    λ(x, t) = (c_0(t) + c_1(t) x) e^{α_1(t) x + α_2(t) x²}.    (30)

This assumption does not create any incompatibility and can be considered the solution for λ(x, t) because of the uniqueness of the solutions. We do not intend to give the equations explicitly, but just state that they are final value problems; that is, the conditions are given at t = T. The equations for these and the previous temporal entities contain the unknown field amplitude E(t). Hence we need an extra equation for it, which can be obtained from (14) by using the forms for ψ(x, t) and λ(x, t). The result is an algebraic equation. The differential equations for the temporal unknown functions above and this algebraic equation form three structures, one for forward and one for backward evolution, with the algebraic one connecting them. So the newly resulting equations are in fact boundary value problems. The solution can be obtained iteratively.

References

[1] M. Demiralp and H. Rabitz, Phys. Rev. A, 47, 809 (1993).
[2] P. Gross, D. Neuhauser, and H. Rabitz, J. Chem. Phys., 98, 4557 (1993).
[3] C.D. Schwieters and H. Rabitz, J. Phys. Chem., 97, 8864 (1993).
[4] M. Demiralp and H. Rabitz, J. Math. Chem., 16, 185 (1994).


NACoM-2003 Extended Abstracts 111 – 114

The Weighted Upwinding Finite Volume Method for the Convection Diffusion Problem on a Nonstandard Covolume Grid

Dong Liang∗1 and Weidong Zhao2

1 Department of Mathematics and Statistics, York University, 4700 Keele Street, Toronto, Ontario, M3J 1P3, Canada

2 School of Mathematics and System Sciences, Shandong University, Jinan, 250100, P. R. China

Received 28 February 2003, accepted 21 March 2003

In this paper we propose a weighted upwinding finite volume method on a nonstandard covolume grid for variable coefficient convection-diffusion problems. We give a simple method of choosing the optimal weighted factors depending on the local Peclet numbers of the original problem. With the optimal factors the method overcomes numerical oscillation, avoids numerical dispersion, and has high-order computing accuracy. The conservation law and the maximum principle are proved. Second-order error estimates in the L2 and discrete H1 norms are obtained. Numerical experiments are given to illustrate the performance of the method.

1 Introduction

The convection-diffusion equations, which describe many realistic processes in science and technology, e.g., fluid mechanics, heat and mass transfer, groundwater modelling, petroleum reservoir simulation and environmental protection, are very important and difficult in numerical simulation (see, for example, [2], [3] and [4]). The numerical simulation of the convection-diffusion equations is especially difficult when the transport velocities are much larger than the diffusion coefficients. Standard finite difference or finite element methods introduce severe nonphysical oscillations into the numerical solutions, because the corresponding discrete schemes are unstable for the problem.

Because they satisfy both stability and conservation of mass, finite volume methods with an upwinding technique have been developed to effectively approximate convection diffusion problems, and methods of finite-volume type have been highly successful in their numerical simulation (see, for example, [1], [5], [6], [7], [8], [9], [10], etc.). Herbin [7] and Gallouet, Herbin & Vignal [6] studied the finite volume method on triangular meshes for convection-diffusion problems with general boundary conditions. Lazarov, Mishev & Vassilevski [8] studied modified upwind difference schemes as finite volume schemes with upwinding on rectangular meshes. Morton & Stynes [9] and Morton, Stynes & Suli [10] studied a cell-vertex finite element scheme for convection-diffusion problems. Recently, Chou, Kwak and Vassilevski [5] proposed and analysed mixed upwinding covolume methods on rectangular grids, in which the concentration approximation and the concentration flux approximation, based on the lowest order Raviart-Thomas space, are obtained at the same time.

The standard upwinding technique for treating convection terms usually yields first-order accurate schemes for convection diffusion problems. Hence, there is considerable interest in forming finite volume schemes for convection diffusion problems, especially for problems with large Peclet numbers, that satisfy the conservation of mass, are unconditionally stable, and have high-order accuracy. In this paper, we

∗ Corresponding author: e-mail: [email protected], Phone: +1 416 736 2100 ext.77743, Fax: +1 416 736 5757

c© 2003 WILEY-VCH Verlag GmbH & Co. KGaA, Weinheim


present a weighted upwinding finite volume method on a nonstandard covolume mesh for variable coefficient convection-diffusion problems on a one-dimensional domain. In order to obtain a high order finite volume scheme for the problem, we propose a nonstandard covolume dual mesh and introduce optimal weighted factors depending on the local Peclet numbers of the problem. Differently from normal methods, in which the covolume grid is defined as the cell-centred grid or vertex-centred grid, we construct the covolume dual grid as a nonstandard grid: the nodes of the covolumes are moved in the interior of the volumes of the primary grid according to the weighted factors, which in turn depend on the local Peclet number of the original problem. The convection term is treated by a weighted upwinding technique with the optimal weighted factors α^opt_{i−1/2}, which can be easily obtained from a function depending on the local Peclet numbers Pe_{i−1/2}. With these optimal weighted factors, the finite volume method leads to an effective discrete scheme for the convection diffusion problem. For the theoretical analysis, the conservation law and the maximum principle are proved for the weighted upwinding finite volume method, and second-order error estimates in the L2 and discrete H1 norms are proved for the optimal weighted upwinding finite volume method. Numerical experiments are presented to demonstrate the performance of the scheme.

2 The weighted upwinding finite volume method

Consider the variable coefficient convection-diffusion problem

−d/dx( a(x) dc/dx ) + d/dx( b(x)c ) = f(x),   x ∈ Ω ≡ (0, 1),
c(0) = c0,   c(1) = c1,    (1)

where c(x) is the transport quantity; a(x) ≥ a0 > 0, x ∈ Ω, is the diffusion coefficient; and b(x) is the transport velocity.

Take an arbitrary spatial partition Th of the domain Ω, called the primary volume grid, with nodes x_i, i = 0, 1, ..., N + 1, such that

0 = x_0 < x_1 < ··· < x_{i−1} < x_i < ··· < x_{N+1} = 1,

and let I_i = [x_{i−1}, x_i] be the volumes and h_i = x_i − x_{i−1} the step sizes. Define the covolume grid T*_h with nodes x_{i−1/2}, i = 0, 1, ..., N + 2, which satisfy x_{i−1} < x_{i−1/2} < x_i for i = 1, 2, ..., N + 1, and x_{−1/2} = x_0, x_{N+3/2} = x_{N+1} at the two endpoints of the domain Ω:

x_0 = x_{−1/2} < x_{1/2} < ··· < x_{i−1/2} < x_{i+1/2} < ··· < x_{N+3/2} = x_{N+1}.

Let I*_i = [x_{i−1/2}, x_{i+1/2}] be the covolumes. The nodes x_{i−1/2} of the covolume grid T*_h are not chosen as the midpoints of the volumes of the primary grid Th (see Remarks 2.1 and 2.2).

Take the trial space Vh as the piecewise linear function space on the primary volume grid Th with basis functions ϕ_i (i = 0, 1, ..., N + 1). Let the test space V*_h be the piecewise constant function space with basis functions ψ_i (i = 1, 2, ..., N + 1) on the covolume grid T*_h, with v|_{I*_0} = 0 and v|_{I*_{N+1}} = 0 for v ∈ V*_h. To obtain a high-order non-oscillatory finite volume scheme, we must treat the convection term well. Here, we introduce a weighted upwinding technique for the problem on the nonstandard covolume grid T*_h.

Let H(x) = 1 for x ≥ 0, H(x) = 0 for x < 0, and let α_{i−1/2} be the weighted factors, 1/2 ≤ α_{i−1/2} < 1, for i = 1, 2, ..., N + 1. Let

B*(c_h, ψ_i) = b_{i+1/2} { H(b_{i+1/2}) [ α_{i+1/2} c^h_i + (1 − α_{i+1/2}) c^h_{i+1} ] + (1 − H(b_{i+1/2})) [ (1 − α_{i+1/2}) c^h_i + α_{i+1/2} c^h_{i+1} ] }
    − b_{i−1/2} { H(b_{i−1/2}) [ α_{i−1/2} c^h_{i−1} + (1 − α_{i−1/2}) c^h_i ] + (1 − H(b_{i−1/2})) [ (1 − α_{i−1/2}) c^h_{i−1} + α_{i−1/2} c^h_i ] },    (2)


together with

A(c_h, ψ_i) = −[ a_{i+1/2} (c^h_{i+1} − c^h_i)/h_{i+1} − a_{i−1/2} (c^h_i − c^h_{i−1})/h_i ].    (3)

Define the bilinear and linear functionals, for c ∈ H1(Ω), f ∈ L2(Ω) and v_h ∈ V*_h,

A(c, v_h) = Σ_{i=1}^{N} v_h(x_i) A(c, ψ_i),   B*(c, v_h) = Σ_{i=1}^{N} v_h(x_i) B*(c, ψ_i),    (4)

F(v_h) = Σ_{i=1}^{N} v_h(x_i) (f, ψ_i).    (5)

Then the weighted upwinding finite volume method for problem (1) is defined as: find c_h ∈ Vh with c_h(0) = c0 and c_h(1) = c1, such that

A(c_h, v_h) + B*(c_h, v_h) = F(v_h),   ∀ v_h ∈ V*_h.    (6)

Remark 2.1 Differently from the normal finite volume method, in the construction of the covolume grid T*_h for the scheme (6) we introduce a nonstandard covolume grid. The nodes x_{i−1/2} (1 ≤ i ≤ N + 1) are not located at the centres of the primary volumes I_i (1 ≤ i ≤ N + 1); instead, they are defined as

x_{i−1/2} = x_i − h_i [ H(b_{i−1/2}) α_{i−1/2} + (1 − H(b_{i−1/2})) (1 − α_{i−1/2}) ],    (7)

so that they move depending on the weighted factor α_{i−1/2} (1 ≤ i ≤ N + 1).

Remark 2.2 Let Pe = Pe_{i−1/2} = |b_{i−1/2}| h_i / a_{i−1/2} be the local Peclet number. We can find the optimal weighted factors α^opt_{i−1/2} (1 ≤ i ≤ N + 1) satisfying α^2 − (1 − 2/Pe) α − 1/Pe = 0. With the optimal weighted factors the method is called the optimal weighted upwinding finite volume method.
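For a given local Peclet number, the quadratic of Remark 2.2 can be solved in closed form; the following short sketch (our illustration, not the authors' code) takes the larger root and shows it always falls in the admissible range [1/2, 1):

```python
import math

def alpha_opt(pe):
    """Larger root of  alpha^2 - (1 - 2/Pe)*alpha - 1/Pe = 0  (Remark 2.2)."""
    b = 1.0 - 2.0 / pe
    return 0.5 * (b + math.sqrt(b * b + 4.0 / pe))

# alpha_opt tends to 1/2 as Pe -> 0 (central weighting in the
# diffusion-dominated limit) and to 1 as Pe -> infinity (full upwinding).
for pe in (0.01, 0.1, 1.0, 10.0, 1000.0):
    print(pe, alpha_opt(pe))
```

For example, Pe = 2 gives α^opt = 1/√2 ≈ 0.7071, and the condition (8) of Theorem 3.2 below is then satisfied automatically.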

3 Theoretical results

In realistic flow, the transport quantity satisfies the maximum principle and the conservation law. Hence, it is very important for a numerical method to preserve these properties in the numerical simulation. We prove that the weighted upwinding finite volume scheme (6) satisfies both properties.

Theorem 3.1 The weighted upwinding finite volume scheme (6) is conservative.

Theorem 3.2 If the weighted factors satisfy

max( 1/2, 1 − 1/Pe_{i−1/2} ) ≤ α_{i−1/2} < 1,   1 ≤ i ≤ N + 1,    (8)

then the weighted upwinding finite volume scheme (6) satisfies the maximum principle for positive functions a(x) and nondecreasing functions b(x).

Moreover, for the optimal weighted upwinding finite volume method, we have the following theorem.

Theorem 3.3 Let c(x), the solution of problem (1), be in H3(Ω), let the primary grid Th be an arbitrary spatial partition of Ω, and let the nonstandard covolume grid T*_h be constructed with the optimal weighted factors α^opt_{i−1/2} (1 ≤ i ≤ N + 1). If c_h is the solution of the optimal weighted upwinding finite volume method (6), then

|c − c_h|_{1,h} ≤ (h^2 / (6√5)) ‖c^(3)‖,   ‖c − c_h‖ ≤ M_3 h^2 ( ‖c^(3)‖ + ‖c^(2)‖ ),    (9)

where h = max_{1≤i≤N+1} h_i and M_3 does not depend on h, a, b, or c(x).


4 Numerical Experiment

The standard finite volume method with upwinding overcomes the non-physical oscillations but decreases the convergence order. This motivated us to construct the optimal weighted upwinding finite volume method. Numerical experiments have shown the efficiency of the method; the following is one example.

Example 4.1 In problem (1), let a > 0 be the diffusion constant, the velocity b = −1, the right-side function f(x) = −2(x + a), and the boundary values c(0) = 0, c(1) = 4. The primary volume grid Th is chosen as a uniform partition with step size h > 0. For different a > 0 and h > 0, the maximum norm and discrete L2 norm errors of the central difference method (CDM), the full upwind difference method (FUDM), the modified upwind difference method (MUDM), which adds an artificial modification to the diffusion term (see [1] and [8]), and the optimal weighted upwinding finite volume method (OWUFVM) presented in this paper are given in Table 1. We can see that the optimal weighted upwinding finite volume method has high order computing accuracy.

Table 1 Maximum norm and discrete L2 norm errors with h = 1/100

a       | CDM             | FUDM            | MUDM            | OWUFVM
L∞ errors:
10      | 0.31238 × 10^−8 | 0.10618 × 10^−3 | 0.56214 × 10^−7 | 0.15624 × 10^−8
10^−1   | 0.92002 × 10^−3 | 0.47798 × 10^−1 | 0.14583 × 10^−2 | 0.45909 × 10^−3
10^−3   | 2.0001          | 0.26360         | 0.40930 × 10^−1 | 0.29278 × 10^−1
10^−5   | 15.16312        | 0.97969 × 10^−2 | 0.98742 × 10^−2 | 0.29999 × 10^−5
10^−7   | 1499.97194      | 0.98699 × 10^−2 | 0.98998 × 10^−2 | 0.29991 × 10^−9
L2 errors:
10      | 0.22811 × 10^−8 | 0.77549 × 10^−4 | 0.41054 × 10^−7 | 0.11409 × 10^−8
10^−1   | 0.39511 × 10^−3 | 0.19777 × 10^−1 | 0.58441 × 10^−3 | 0.19722 × 10^−3
10^−3   | 0.26833         | 0.26973 × 10^−1 | 0.62247 × 10^−2 | 0.29279 × 10^−2
10^−5   | 9.70357         | 0.56858 × 10^−2 | 0.57186 × 10^−2 | 0.29999 × 10^−6
10^−7   | 1059.60108      | 0.57296 × 10^−2 | 0.57300 × 10^−2 | 0.29991 × 10^−10

Acknowledgements This research was partly supported by the Natural Sciences and Engineering Research Council of Canada.

References

[1] O. Axelsson and I. Gustafsson, J. Inst. Math. Appl., 23, 867–889 (1979).
[2] K. Aziz and A. Settari, Petroleum Reservoir Simulation (Applied Science Publishers, London, 1979).
[3] B.R. Baliga and S.V. Patankar, Num. Heat Transfer, 3, 393–409 (1980).
[4] J. Bear, Dynamics of Fluids in Porous Media (American Elsevier Publishing Company, New York, 1972).
[5] S.H. Chou, D.Y. Kwak and P.S. Vassilevski, SIAM J. Sci. Comput., 21, 145–165 (1999).
[6] T. Gallouet, R. Herbin and M.H. Vignal, SIAM J. Numer. Anal., 37, 1935–1972 (2000).
[7] R. Herbin, Numer. Methods Partial Differential Equations, 11, 165–173 (1995).
[8] R.D. Lazarov, I.D. Mishev and P.S. Vassilevski, SIAM J. Numer. Anal., 33, 31–55 (1996).
[9] K.W. Morton and M. Stynes, Math. Model. Anal. Numer., 28, 699–724 (1994).
[10] K.W. Morton, M. Stynes, and E. Suli, Math. Comput., 66, 1389–1406.


NACoM-2003 Extended Abstracts 115 – 120

Adaptive Bivariate Chebyshev Approximation and Efficient Evaluation of Integral Operators

Alberto Mardegan1, Alvise Sommariva1, Marco Vianello∗1, and Renato Zanovello1

1 Dept. of Pure and Applied Mathematics, University of Padova, Via Belzoni 7, 35131 Padova, Italy

Received 28 February 2003, accepted 21 March 2003

We propose an adaptive algorithm which extends Chebyshev series approximation to bivariate functions on domains which are smooth transformations of a square. We apply the method to evaluating efficiently the action of linear and nonlinear bivariate integral operators.

The Chebyshev series expansion

f(x) = Σ_{k=0}^{∞} c_k T_k(x),   x ∈ [−1, 1],   c_k = (2/π) ∫_{−1}^{1} f(x) T_k(x) / √(1 − x^2) dx,   k > 0,    (1)

c_0 = (1/π) ∫_{−1}^{1} f(x) / √(1 − x^2) dx, is a useful and popular tool for the approximation of sufficiently regular univariate functions. In fact, its partial sums provide an asymptotically optimal polynomial approximation, which can be constructed in a very efficient way by resorting to the FFT algorithm; see, e.g., the classical book [11], and the various algorithms in the Netlib and Cernlib repositories for practical implementations. Concerning its extension to bivariate functions, only some scattered results, restricted to the case of rectangular domains (see, e.g., [10]), seem to have appeared in the literature. In particular, an adaptive algorithm for bivariate Chebyshev approximation, taking into account the different behaviors of the underlying function in different parts of the domain, and working also on a wide class of domain geometries, does not seem to be available yet.
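In practice the coefficients in (1) are computed not by quadrature but by interpolation at Chebyshev points, which amounts to a discrete cosine transform and is where the FFT enters. A minimal NumPy sketch (our illustration, not the Cernlib code) for f(x) = e^x:

```python
import numpy as np

def cheb_coeffs(f, n):
    """Coefficients of the degree-(n-1) Chebyshev interpolant of f on [-1, 1],
    via the discrete orthogonality of cos(j*theta) at theta_k = pi(k+1/2)/n."""
    theta = np.pi * (np.arange(n) + 0.5) / n
    fx = f(np.cos(theta))
    c = 2.0 / n * np.cos(np.outer(np.arange(n), theta)) @ fx
    c[0] /= 2.0          # the k = 0 coefficient carries weight 1/pi, not 2/pi
    return c

c = cheb_coeffs(np.exp, 20)
xs = np.linspace(-1.0, 1.0, 1001)
err = np.max(np.abs(np.polynomial.chebyshev.chebval(xs, c) - np.exp(xs)))
# For an entire function the coefficients decay super-geometrically,
# so 20 terms already reach close to machine precision.
```

A production code would use an FFT of length n instead of the O(n^2) matrix-vector product above, which is exactly the efficiency gain the text refers to.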

Let us begin with the simplest case, that of a function f(x, y) defined on the square [−1, 1]^2, the adaptation to rectangular domains being straightforward. The basic idea is simple: suppose that f(x, y) can be expanded in Chebyshev series in y (uniformly in x); then we get f(x, y) = Σ_{α=0}^{∞} c_α(x) T_α(y). Moreover, suppose that we can expand each coefficient c_α(x) in Chebyshev series (uniformly in α), i.e. c_α(x) = Σ_{β=0}^{∞} c_{αβ} T_β(x). It is not difficult to show that both expansions take place, for example, when log(n) osc(f; 1/n) → 0 as n → ∞, where osc(f; 1/n) := max_{|P−Q|≤1/n} |f(P) − f(Q)|, P, Q ∈ [−1, 1]^2. The latter is indeed the natural bivariate extension of the well-known Dini-Lipschitz condition for Chebyshev univariate approximation (see [11]), and is verified, for example, by any globally Holder-continuous function. Under this assumption, we obtain that, for every fixed tolerance ε > 0, there exist an index α(ε) and a sequence β(α) = β(α; ε), such that the following bivariate polynomial approximation of f(x, y) with degree max_α α + β(α) holds (see the forthcoming paper [12] for a proof):

‖f − p‖_∞ = max_{(x,y)∈[−1,1]^2} |f(x, y) − p(x, y)| ≤ ε,   p(x, y) = Σ_{α=0}^{α(ε)} Σ_{β=0}^{β(α)} c_{αβ} T_β(x) T_α(y).    (2)

As in the univariate case, the smoother the function f, the smaller the approximation degree at the same tolerance. Observe that all the information necessary to reconstruct the function f on the square is now

∗ Corresponding author: e-mail: [email protected]. This work has been partially supported by the University of Padova, in particular by the project CPDA028291 "Efficient approximation methods for nonlocal discrete transforms", and by the 2003 project "Innovative methods for structured problems in stationary and evolutive models" of the GNCS-INdAM.



compressed in the coefficient matrix {c_{αβ}}, which is in general not full, but has a "reversed nonincreasing histogram" shape.

The practical implementation of the method works as follows. First, one chooses a good univariate code for Chebyshev series approximation: we selected the robust Fortran code [7] in the Cernlib and translated it into the C language, with some minor adaptations in view of the bivariate embedding. The most remarkable of these are allowing discrete values of the function at the Chebyshev nodes as inputs (the original code requires an analytic definition) and introducing an additional stopping criterion based on a possible "stalling" of convergence. The "regular" exit is based on a classical criterion [3], looking for the first triple of consecutive coefficients whose sum of absolute values is below a given tolerance, while the output error estimate uses for safety the complete tail of the last computed coefficients. Other important features of the code [7] are that it iteratively doubles the number of computed coefficients, minimizing the computational effort via a clever use of the FFT, and moreover that it minimizes a posteriori the final number of output Chebyshev coefficients by eliminating those below a suitable threshold.

For the construction of the bivariate polynomial p(x, y) in (2), we take n "cuts" on the square, corresponding to the first n Chebyshev-Lobatto nodes ξ_j = cos(jπ/(n − 1)), j = 0, 1, ..., n − 1. The function f(x, y) is approximated by a truncated Chebyshev series f̃(x, y) = Σ_{α=0}^{α(x;θε)} c_α(x) T_α(y) at each cut x = ξ_j by the univariate code, up to a fraction θ of the global tolerance (say θ = 0.5); this step provides the values c_α(ξ_j) of the coefficient functions c_α(x), α = 0, 1, ..., α(θε) := max_j α(ξ_j; θε), at the Chebyshev x-nodes {ξ_j}. Then, these values are passed to the univariate code, which produces the Chebyshev approximation c̃_α(x) = Σ_{β=0}^{n−1} c_{αβ} T_β(x), and checks whether ‖c̃_α − c_α‖_∞ ≤ (1 − θ)ε/(1 + α(θε)) for every α = 0, 1, ..., α(θε). In fact, we can consider the error estimate ‖f − p‖_∞ ≤ ‖f − f̃‖_∞ + ‖f̃ − p‖_∞ ≈ max_j ‖f(ξ_j, ·) − f̃(ξ_j, ·)‖_∞ + Σ_{α=0}^{α(θε)} ‖c̃_α − c_α‖_∞ ≤ θε + (1 + α(θε)) max_α ‖c̃_α − c_α‖_∞. If the stopping criterion is not satisfied (the approximation in the x-direction is not satisfactory), then the number n of Chebyshev cuts is doubled and the procedure repeated. It is worth noting the full adaptivity of this algorithm, which is further improved by some implementation tricks. For example, as usual the global tolerance is taken as a combination of a relative and an absolute one, ε = ε_r ‖f‖_∞ + ε_a, where the estimate of the maximum norm of the function is dynamically updated when new cuts are added. Moreover, whenever a stalling of convergence occurs, for example during the Chebyshev approximation along a cut, the absolute tolerance ε_a is automatically set to a value close to the estimated size of the error at stalling, in order to avoid overprecision (and thus save computational work) on other cuts, or in the x-direction. The error stalling is detected by the univariate code simply by comparing two consecutive error estimates, and is typical of low regularity of the function, for example in the presence of noise, or in the application to discrete integral operators with weakly singular kernels, as we shall see below.

We can now face more general situations. Consider a function f defined on a bivariate compact domain Ω that corresponds to the square [−1, 1]^2 through a smooth surjective transformation [−1, 1]^2 → Ω, (X, Y) → (x(X, Y), y(X, Y)). Then we can construct, by the method described above, the Chebyshev-like polynomial approximation p(X, Y) ≈ F(X, Y) := f(x(X, Y), y(X, Y)) on the square [−1, 1]^2 ∋ (X, Y), and finally obtain an approximation like

f(x, y) ≈ φ(x, y) = p(X(x, y), Y(x, y)) = Σ_{α=0}^{α(ε)} Σ_{β=0}^{β(α)} c_{αβ} T_β(X(x, y)) T_α(Y(x, y)),    (3)

(x, y) ∈ Ω, which is in general no longer polynomial. In (3), (X(x, y), Y(x, y)) denotes the "inverse" transformation Ω → [−1, 1]^2, which is allowed to be undefined only at a finite number of points. Observe that, from the theoretical point of view, when the function f is globally Holder-continuous in Ω, a Holder-continuous transformation suffices to ensure convergence of the Chebyshev-like approximation method. However, a key point in order to avoid loss of smoothness in this process, and thus an artificial slowing down of convergence, is to choose a transformation as smooth as possible, and in any case with at least the same degree of regularity as the function f. This role of the transformation will be clarified by


the example in Table 1 below. Now we are ready to describe two important classes of domain geometries, with corresponding transformations.

• x-regular domains in Cartesian coordinates: the domain Ω is defined by a ≤ x ≤ b, g1(x) ≤ y ≤ g2(x), g1 and g2 being suitable functions (these are the typical domains over which double integrals can be split). Here the transformation is x(X, Y) = x(X) = a + (X + 1)(b − a)/2, y(X, Y) = g1(x) + (Y + 1)(g2(x) − g1(x))/2, and its inverse is X(x, y) = X(x) = −1 + 2(x − a)/(b − a), Y(x, y) = −1 + 2(y − g1(x))/(g2(x) − g1(x)); the latter is not defined at the possible points (x, y) where g1(x) = g2(x), but this is not a real problem, since in such cases the Chebyshev series on the corresponding X-cut is constant, and our algorithm manages the situation by computing p(X(x), Y(x, y)) = Σ_{β=0}^{β(0)} c_{0β} T_β(X(x)), cf. (3). The regularity of the transformation is clearly given by the regularity of the functions g1 and g2.

• θ-regular domains in polar coordinates: these are defined by θ1 ≤ θ ≤ θ2, ρ1(θ) ≤ ρ ≤ ρ2(θ), and the transformation is the composition of one analogous to that described above with the change from Cartesian to polar coordinates, with inverse X(x, y) = −1 + 2(θ − θ1)/(θ2 − θ1), Y(x, y) = −1 + 2(ρ − ρ1(θ))/(ρ2(θ) − ρ1(θ)), where ρ = ρ(x, y) = √(x^2 + y^2) and θ = θ(x, y) = arctan(y/x). The special case of the origin is managed by choosing θ(0, 0) = 0, while the angles where ρ1 = ρ2 are treated as above. Again, the regularity of the transformation is determined by the functions ρ1 and ρ2. The simplest case is that of a circle centered at the origin, i.e. 0 ≤ θ ≤ 2π, 0 ≤ ρ ≤ R (notice that the transformation is analytic in this case, while that corresponding to the circle represented directly in Cartesian coordinates is not even C1, since we have g1(x) = −√(R^2 − x^2), g2(x) = √(R^2 − x^2), which have singular derivatives at x = ±R).
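As a concrete check of the θ-regular construction, here is a sketch of the transformation pair for the disk of radius R (the helper names are ours), verifying that the inverse lands in [−1, 1]^2 and that the round trip is exact away from the origin:

```python
import numpy as np

R = 1.0

def to_cartesian(X, Y):
    """(X, Y) in [-1, 1]^2  ->  (x, y) in the disk of radius R."""
    theta = np.pi * (X + 1.0)          # theta1 = 0, theta2 = 2*pi
    rho = 0.5 * R * (Y + 1.0)          # rho1 = 0,  rho2 = R
    return rho * np.cos(theta), rho * np.sin(theta)

def to_square(x, y):
    """Inverse map; theta(0, 0) is conventionally set to 0 by arctan2."""
    rho = np.hypot(x, y)
    theta = np.mod(np.arctan2(y, x), 2.0 * np.pi)  # arctan(y/x) lifted to [0, 2*pi)
    return -1.0 + theta / np.pi, -1.0 + 2.0 * rho / R

pairs = [(0.5, 0.3), (-0.2, 0.4), (0.1, -0.7)]
round_trips = [to_cartesian(*to_square(x, y)) for (x, y) in pairs]
```

Sampling f at a Chebyshev grid in (X, Y) and composing with to_cartesian then reduces the disk problem to the square case, with an analytic transformation, as the text points out.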

A remarkable subclass of θ-regular domains is given by star domains, i.e. [θ1, θ2] = [0, 2π], ρ1(θ) ≡ 0 (up to a translation). Here a different transformation can be defined, which allows one to avoid unnecessary clustering of sampling nodes at the origin [9, 12]. For the sake of brevity, we omit a third type of domain, the triangle, whose importance stems from the possibility of triangulating efficiently domains with complex geometries; also in this case different transformations can be used, which lead to quite different performances of the approximation method; see again [9, 12], where domain splitting techniques are also discussed in order to manage possible singularities of the function f. The following table illustrates the importance of the choice of the transformation.

Table 1 Adaptive Chebyshev approximation of f(x, y) = e^x (sin(y) + x y^2) on the unit circle in Cartesian and polar coordinates; the absolute and relative tolerances have been set to ε_a = 10^−8, ε_r = 10^−6.

           | number of coeffs | number of nodes | relative error
Cartesian  | 6525             | 88409           | 8 · 10^−6
polar      | 197              | 457             | 2 · 10^−7

Notice that, while the function is extremely smooth, the choice of representing the unit circle as an x-regular domain leads to computational failure, since the singularity of the transformation entails very slow convergence (the relative error in ‖·‖_∞ is computed by comparison with the exact values on a suitable control grid). The overall number of function evaluations at the sampling Chebyshev nodes is higher than the number of output Chebyshev coefficients, since, as already observed, the code is able to discard insignificant coefficients.

At this point, we can summarize by observing that bivariate Chebyshev approximation is particularly useful when the domain is a smooth transformation of a square and the function exhibits the following features: it is sufficiently regular to guarantee a satisfactory convergence rate; it can be evaluated at any point of the domain; the evaluation is costly (indeed, after the approximation process all the information,


up to the tolerance, is compressed in the Chebyshev coefficients array). All these features are usually shown by functions coming from the action of bivariate integral operators, or of their discrete versions,

f(P) = ∫∫_Ω K(P, Q, u(Q)) dQ ≈ f_d(P) = Σ_{i=1}^{M} Σ_{j=1}^{N} w_{ij} K(P, Q_{ij}, u_{ij}),    (4)

where P = (x, y), Q = (t, s) ∈ Ω, u : Ω → D ⊆ R, K : Ω × Ω × D → R is a pentavariate kernel function, w_{ij} and Q_{ij} are suitable cubature weights and nodes, respectively, and u_{ij} ≈ u(Q_{ij}). Observe that computing the bivariate functions f or f_d in (4) at a large number of "target" points is a very costly process, since each evaluation corresponds to the computation of a double integral. For example, if the discrete integral transform f_d has to be evaluated at all the MN cubature points {Q_{ij}}, as is usual in the numerical solution of integral equations within iterative solvers of the corresponding discrete systems, a quadratic complexity like O(M^2 N^2) arises. Starting from the basic work of Rokhlin and Greengard in the '80s on the fast multipole method [6], which represented a turning point in the simulation of large scale linear integral models, several fast methods have been proposed, all sharing the task of accelerating the evaluation of dense discrete linear operators: we may quote, e.g., wavelet-based methods [2] and, more recently, H-matrix methods [8]. Roughly summarizing, these fast methods act on the kernel of the linear operator by approximation theory techniques, and are able to obtain impressive speed-ups in evaluating the target vector at a specified precision, reducing the complexity even from quadratic to linear. On the other hand, they are usually tailored to the specific structure of the kernel, and are conceived for linear operators.

In some recent papers [4, 5, 13], a different approach has been explored in the framework of univariate integral operators, namely accelerating the evaluation by approximating directly the action of the operator (i.e. the output function) via Chebyshev series or polynomial interpolation at Leja nodes. In the present paper we apply the same idea to bivariate integral operators, by means of our adaptive Chebyshev approximation algorithm described above. It is worth emphasizing some features of this approach (approximating the action instead of the kernel of the operator): we compress the functions f or f_d in (4) into the array of the µ Chebyshev coefficients c_{αβ}, µ = Σ_{α=0}^{α(ε)} (1 + β(α)), and consequently reduce the cost of evaluation of the discrete operator from O(M^2 N^2) to O((µ + ν)MN), µ + ν ≪ MN (where ν = ν(ε) denotes the overall number of sampling Chebyshev nodes used by the adaptive algorithm along the cuts); we exploit the smoothing effect of integration, a fact that has often been overlooked in the construction of fast methods; and we are able to treat linear as well as nonlinear problems, because we operate after integration. Even in linear instances, K(P, Q, u) = H(P, Q)u, we work in lower dimension, since we face a bivariate approximation problem while the kernel H is quadrivariate. It is worth recalling here that the idea of approximating directly the action of the operator (the potential) has already appeared in the framework of computational methods of potential theory, cf. e.g. [1].

In order to illustrate the effectiveness of our approach, we present some examples, collected in Tables 2-4 below. The computations have been performed on an AMD K6-III 400 MHz processor; in all the examples the absolute and relative tolerances in the Chebyshev approximation code have been set to ε_a = 10^−8, ε_r = 10^−6. For the adaptive cubatures we have used the C++ package CubPack++, while the discrete operators have been obtained by a simple trapezoidal-like cubature formula on an N × N uniform grid in the square [−1, 1]^2, via a suitable change of variables by the transformations described above.

Table 2 concerns the adaptive Chebyshev approximation and consequent compression of a logarithmic potential with a constant density, K(P, Q, u) = log(|P − Q|)u, u(Q) ≡ 1, and of a nonlinear transform with kernel K(x, y, t, s, u) = sin(yt + x|s| + u)/(1 + u) and argument u(t, s) = |sin(e^t + |s|(s + 1))|, the domain being the unit circle. As is known, logarithmic potentials are solutions of the Poisson equation ∆f = 2πu, cf. [14]. The relative error of the Chebyshev approximation has been computed in ‖·‖_∞ on a suitable control grid. Notice in particular that all the information necessary to reconstruct the logarithmic potential up to a relative error of the order of 10^−7 is completely contained in only 15 output Chebyshev


coefficients (and has required the computation of 45 double integrals). Here we have adopted a high-precision adaptive cubature method, and this corresponds to approximating directly the function f in (4).

On the contrary, in Tables 3 and 4 we approximate the action of operators that have been discretized by a cubature formula on a fixed N × N grid {Q_{ij}} (which is also the target grid), and this corresponds to working with the function f_d in (4). The transforms in Table 3 have, respectively, K(x, y, t, s, u) = exp(xt − ys)u, u(t, s) = sin(t) + 1 if t ≥ s and u(t, s) = sin(s) − 1 if t < s, and K(x, y, t, s, u) = exp(u sin(x + s) + y + t)/u, u(t, s) = e^t if t^2 + s^2 ≥ 1 and u(t, s) = sin(s) + 2 if t^2 + s^2 < 1. The domain, defined by −2 ≤ x ≤ 2, −sin(2x) − 2 ≤ y ≤ −sin(3x) + 2, is treated as an x-regular one. Observe that, due to the discontinuity of the operator arguments u(t, s), the cubature is not very precise, but the smoothness of the kernels entails that the bivariate Chebyshev approximation works satisfactorily, with errors very close to the required tolerance and good speed-ups with respect to direct evaluation. The speed-up is defined as (direct time)/(construction time + evaluation time), "direct" denoting the machine time for computing {f_d(Q_{ij})}, "construction" that for computing the Chebyshev coefficients {c_{αβ}} of the approximating function φ_d(x, y) ≈ f_d(x, y) (cf. (3)), and "evaluation" the machine time for evaluating the output vector {φ_d(Q_{ij})}. As for the relative errors, they are measured in the maximum norm, max_{ij} |f_d(Q_{ij}) − φ_d(Q_{ij})| / max_{ij} |f_d(Q_{ij})|. Notice that the evaluation time is a small fraction of the construction time, since the bulk of the algorithm is the computation of f_d. Indeed, the observed speed-ups are not far from the rough speed-up estimate N^2/(number of nodes): this means that taking, for example, a 500 × 500 grid, we could expect an increase of the speed-up by a factor 25.

Finally, it is worth commenting on the examples in Table 4, where two discrete logarithmic potentials have been computed on a 500 × 500 grid in polar coordinates on the unit circle. The first row in the table corresponds to a constant density, while the second to a C1 density which has partial second derivatives discontinuous on the Cartesian axes: u2(t, s) is equal to e^t + s^3 in the first quadrant, 1 + t + s^3 in the second, 1 + t − s^2 in the third, and e^t − s^2 in the fourth. Here we have a weakly singular kernel, and the Chebyshev approximation error is not able to reach the required tolerance (a stalling phenomenon appears: the Chebyshev error stagnates at the size of the cubature error). This fact has already been observed in univariate instances, cf. [4, 5, 13], where it is analyzed and qualitatively explained; the key point is that in weakly singular instances the discrete transform f_d is singular at the cubature points, while f can be regular. Observe that the stalling phenomenon does not represent a real disadvantage, since one is usually not interested in approximating beyond the underlying discretization error.

Table 2 Adaptive Chebyshev compression on the unit circle of a Urysohn-type nonlinear transform with smooth kernel and of a logarithmic potential, pointwise evaluated by adaptive cubature with tolerance 10^-7.

                  number of coeffs   number of nodes   relative error
  log potential          15                 45            2 · 10^-7
  Urysohn                99                289            4 · 10^-8

Table 3 Adaptive Chebyshev approximation of a linear discrete and of a Urysohn-type nonlinear discrete transform with smooth kernels and discontinuous arguments, on an x-regular domain (100 × 100 grid).

            coeffs   nodes   rel. err.   cub. err.   constr.   eval.      direct    speed-up
  linear      815    1329    4 · 10^-7   3 · 10^-3   43 sec    1.25 sec   319 sec      7.2
  Urysohn     418    1065    3 · 10^-6   6 · 10^-4   53 sec    0.75 sec   584 sec     10.9


Table 4 Adaptive Chebyshev approximation of two discrete logarithmic potentials on the unit circle (500 × 500 polar grid): K(P, Q, u) = log(|P − Q|)u, u1(Q) ≡ 1, u2 ∈ C1 has discontinuous second partial derivatives.

            coeffs   nodes   rel. err.   cub. err.   constr.   eval.      direct     speed-up
  u = u1      273     337    5 · 10^-4   5 · 10^-4   237 sec   12.5 sec   51 hours      734
  u = u2      352     825    1 · 10^-3   1 · 10^-3   738 sec    7.5 sec   61 hours      294

References

[1] G. Allasia, Approximating potential integrals by cardinal basis interpolants on multivariate scattered data, in: "Radial basis functions and partial differential equations", Comput. Math. Appl. 43 (2002), 275-287.
[2] B. Alpert, G. Beylkin, R. Coifman and V. Rokhlin, Wavelet-like bases for the fast solution of second-kind integral equations, SIAM J. Sci. Comput. 14 (1993), 159-184.
[3] C.W. Clenshaw and A.R. Curtis, A method for numerical integration on an automatic computer, Numer. Math. 2 (1960), 197-205.
[4] S. De Marchi and M. Vianello, Approximating the approximant: a numerical code for polynomial compression of discrete integral operators, Numer. Algorithms 28 (2001), 101-116.
[5] S. De Marchi and M. Vianello, Fast evaluation of discrete integral transforms by Chebyshev and Leja polynomial approximation, in: "Constructive Functions Theory" (Varna 2002), B. Bojanov Ed., DARBA, Sofia, 2003, pp. 347-353.
[6] L. Greengard, Fast algorithms for classical physics, Science 265 (1994), 909-914.
[7] T. Havie, Chebyshev series coefficients of a function, CERN Program Library, algorithm E406, 1986 (http://wwwinfo.cern.ch/asd/cernlib/mathlib.html).
[8] W. Hackbusch and B.N. Khoromskij, Towards H-matrix approximation of the linear complexity, in: Operator Theory: Advances and Applications, vol. 121, Birkhauser, 2001, pp. 194-220.
[9] A. Mardegan, "Bivariate Chebyshev series and approximation of integral operators", Laurea Thesis in Mathematics (Italian), University of Padova, 2002 (advisors M. Vianello and A. Sommariva).
[10] L. Reichel, Fast solution methods for Fredholm integral equations of the second kind, Numer. Math. 57 (1990), 719-736.
[11] T.J. Rivlin, "The Chebyshev polynomials", Pure and Applied Mathematics, Wiley-Interscience, New York-London-Sydney, 1974.
[12] A. Sommariva, M. Vianello and R. Zanovello, Adaptive bivariate Chebyshev approximation, in preparation.
[13] M. Vianello, Chebyshev-like compression of linear and nonlinear discretized integral operators, Neural, Parallel and Sci. Comput. 8 (2000), 327-353.
[14] V.S. Vladimirov, "Equations of mathematical physics", MIR, Moscow, 1984.


NACoM-2003 Extended Abstracts 121 – 124

Numerical Solution of the Two-dimensional Time-independent Schrödinger Equation by Symplectic and Asymptotically Symplectic Schemes∗

Th. Monovasilis1, Z. Kalogiratou2, and T.E. Simos∗∗,∗∗∗,1

1 Department of Computer Science and Technology, Faculty of Sciences and Technology, University of Peloponnese, GR-221 00 Tripolis, Greece

2 Department of International Trade, Technological Educational Institute of Western Macedonia at Kastoria, P.O. Box 30, GR-521 00 Kastoria, Greece

Received 28 February 2003, accepted 21 March 2003

The solution of the two-dimensional time-independent Schrödinger equation is considered by partial discretisation. The discretised problem is treated as an ordinary differential equation problem and solved numerically by asymptotically symplectic methods. The problem is then transformed into an algebraic eigenvalue problem involving real, symmetric, large sparse matrices. The eigenvalues of the two-dimensional harmonic oscillator and the two-dimensional Hénon-Heiles potential are computed by the application of the methods developed. The results are compared with the results produced by full discretisation.

1 Introduction

The time-independent Schrödinger equation is one of the basic equations in quantum mechanics. Plenty of methods have been developed for the solution of the one-dimensional time-independent Schrödinger equation. Authors have treated the two-dimensional problem, which is a partial differential equation, by means of discretization of both variables x and y; this transforms the problem into an eigenvalue problem for a block tridiagonal matrix and is the well-known five-point method. Here we use partial discretization only in the variable y, so that we obtain an ordinary differential equation problem. Symplectic integrators have proven to be suitable integrators for the numerical solution of the one-dimensional Schrödinger equation. Recently, Liu et al. [3] developed a method for the numerical solution of the two-dimensional time-independent Schrödinger equation; they applied Yoshida's [5] first and second order methods in order to transform the problem into an algebraic eigenvalue problem. The associated matrices are very large and sparse, so they have to be treated as sparse matrices in storage and computation. In this work we apply Ruth's third order method as well as asymptotically symplectic methods developed by the authors in [1]. The methods proposed here, as well as the full discretization method, are applied in order to find the eigenvalues of the two-dimensional harmonic oscillator and the two-dimensional Hénon-Heiles potential. The asymptotically symplectic methods have superior performance on these problems, as expected.
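For reference, the full-discretization (five-point) approach that the partial discretization is compared against can be sketched in a few lines of NumPy for the harmonic oscillator. The truncation radius R, the grid size and the dense eigensolver are illustrative choices, not taken from the paper:

```python
import numpy as np

R, n = 5.0, 31                        # truncation radius, grid points per axis
x = np.linspace(-R, R, n)
h = x[1] - x[0]
# 1-D second-difference matrix (Dirichlet boundary conditions)
D2 = (np.diag(np.ones(n - 1), -1) - 2 * np.eye(n)
      + np.diag(np.ones(n - 1), 1)) / h**2
I = np.eye(n)
X, Y = np.meshgrid(x, x, indexing="ij")
V = 0.5 * (X**2 + Y**2)               # 2-D harmonic oscillator potential
# five-point discretization of -0.5*Laplacian + V
H = -0.5 * (np.kron(D2, I) + np.kron(I, D2)) + np.diag(V.ravel())
E = np.linalg.eigvalsh(H)[:4]         # lowest eigenvalues; exact: 1, 2, 2, 3
```

Even at this coarse resolution the lowest eigenvalues land within a few percent of the exact values 1, 2, 2, 3, which illustrates both the simplicity of the five-point method and the size of the matrices it produces (here already n² = 961).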

∗ Funding by research project 71239 of Prefecture of Western Macedonia and the E.U. is gratefully acknowledged.
∗∗ Corresponding author: e-mail: [email protected], Phone: +30 210 94 20 091, Fax: +30 210 94 20 091. Please use the following address for all correspondence: Dr. T.E. Simos, 26 Menelaou Street, Amphithea - Paleon Faliron, GR-175 64 Athens, Greece.
∗∗∗ Active Member of the European Academy of Sciences and Arts

c© 2003 WILEY-VCH Verlag GmbH & Co. KGaA, Weinheim


2 Partial discretization of the two-dimensional equation

The two-dimensional time-independent Schrödinger equation can be written in the form

∂²ψ/∂x² + ∂²ψ/∂y² + (2E − 2V(x, y)) ψ(x, y) = 0,    (1)

ψ(x, ±∞) = 0,  −∞ < x < ∞,
ψ(±∞, y) = 0,  −∞ < y < ∞,

where E is the energy eigenvalue, V(x, y) is the potential and ψ(x, y) the wave function. The wave function ψ(x, y) vanishes asymptotically away from the origin. We consider ψ(x, y) for y in the finite interval [−Ry, Ry] and

ψ(x,−Ry) = 0 and ψ(x,Ry) = 0

the boundary conditions. We also consider a partition of the interval [−Ry, Ry]

−Ry = y_{−N}, y_{−N+1}, . . . , y_{−1}, y_0, y_1, . . . , y_{N−1}, y_N = Ry,

where y_{j+1} − y_j = h = Ry/N. We approximate the partial derivative with respect to y by the difference quotient

∂²ψ/∂y² = (ψ(x, y_{j+1}) − 2ψ(x, y_j) + ψ(x, y_{j−1})) / h²

and substitute into the original equation:

∂²ψ/∂x² = −(1/h²) ψ(x, y_{j+1}) − B(x, y_j) ψ(x, y_j) − (1/h²) ψ(x, y_{j−1}),

where

B(x, y_j) = 2(E − V(x, y_j) − 1/h²).

We also define the vector of length 2N − 1

Ψ(x) = (ψ(x, y_{−N+1}), ψ(x, y_{−N+2}), . . . , ψ(x, y_0), . . . , ψ(x, y_{N−2}), ψ(x, y_{N−1}))^T;

then equation (1) can be written as

∂²Ψ/∂x² = −S(x) Ψ(x),    (2)

where S(x) is the (2N − 1) × (2N − 1) tridiagonal matrix with diagonal entries B(x, y_{−N+1}), . . . , B(x, y_{N−1}) and off-diagonal entries 1/h².

The matrix S(x) can be written in terms of three matrices: the identity matrix I, the diagonal matrix V(x) which contains the potential at the mesh points y_{−N+1}, . . . , y_{N−1}, and the tridiagonal matrix M with diagonal elements −2 and off-diagonal elements 1:

S(x) = 2EI − 2V(x) + (1/h²)M.
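For a given x, the matrix can be assembled directly from its diagonal entries B(x, y_j) and off-diagonal entries 1/h². A small sketch with SciPy sparse matrices (the harmonic-oscillator potential and the values of E, Ry and N are placeholder choices):

```python
import numpy as np
import scipy.sparse as sp

def build_S(x, E, V, Ry, N):
    """Assemble the (2N-1) x (2N-1) tridiagonal matrix S(x) with diagonal
    B(x, y_j) = 2(E - V(x, y_j) - 1/h^2) and off-diagonals 1/h^2, where
    y_j = j*h, j = -N+1, ..., N-1 and h = Ry/N."""
    h = Ry / N
    y = h * np.arange(-N + 1, N)            # interior mesh points
    B = 2.0 * (E - V(x, y) - 1.0 / h**2)    # diagonal entries
    off = np.full(2 * N - 2, 1.0 / h**2)    # off-diagonal entries
    return sp.diags([off, B, off], [-1, 0, 1])

# placeholder example: harmonic-oscillator potential, evaluated at x = 0
S = build_S(0.0, E=1.0, V=lambda x, y: 0.5 * (x**2 + y**2), Ry=5.0, N=4)
```

Storing S(x) in sparse format matters here, because the eigenvalue problems assembled from these blocks grow like k² = (2N − 1)².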


3 Application of symplectic methods

Liu et al. [3] applied Yoshida's [5] second order method to transform the problem into an algebraic eigenvalue problem. As observed in their work, the computational time needed to compute accurate eigenvalues with a small step size h is very high. Notice that the matrix S(x) is of order k = 2N − 1, and the eigenvalue problem is of order l = k²; i.e., for an N = 20 point partition of the interval [−R, R] we need to find the eigenvalues of a matrix of size 1521.

We first have to write the methods as two-step methods

α₂Ψ_{n+1} − α₁Ψ_n + α₀Ψ_{n−1} = 0,

where each Ψ_i is the vector Ψ(x) of length k evaluated at x_i and S(x_i) is a k × k matrix.

Application of Ruth's [4] third order method transforms the problem into a generalised eigenvalue problem involving not only the eigenvalue E but also powers of the eigenvalue up to E⁴. This problem is transformed into a generalized algebraic problem involving only E, but the matrices become four times larger. Application of Yoshida's [5] fourth or sixth order method results in an even larger algebraic problem.

In this paper we apply only Ruth's method and our asymptotically symplectic methods [1] of third and fifth order.

The third order asymptotically symplectic method is

α₀ = 1 − (h²/6) S(x_{n+1/2}),
α₁ = 2 − (2h²/3) (S(x_{n−1/2}) + S(x_{n+1/2})),
α₂ = 1 − (h²/6) S(x_{n−1/2}).

The fifth order asymptotically symplectic method is

α₀ = 1 − (h²/3!) S(x_{n+1/2}) + (h⁴/5!) S²(x_{n+1/2}),
α₁ = 2 − (2h²/3) (S(x_{n−1/2}) + S(x_{n+1/2})) − (h⁴/20) (S²(x_{n−1/2}) + S²(x_{n+1/2})) − (h⁴/6) S(x_{n−1/2}) S(x_{n+1/2}),
α₂ = 1 − (h²/3!) S(x_{n−1/2}) + (h⁴/5!) S²(x_{n−1/2}).
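As a quick sanity check, the third order two-step recurrence can be run on the scalar test problem Ψ″ = −SΨ with S constant, so that S(x_{n−1/2}) = S(x_{n+1/2}) and the α's reduce to scalars. The step size and the test problem (S = 1, exact solution cos x) are illustrative choices, not taken from the paper:

```python
import math

def asymp_symplectic_3rd(S, h, psi0, psi1, nsteps):
    """Advance a2*psi_{n+1} - a1*psi_n + a0*psi_{n-1} = 0 with the
    third-order coefficients from the text, for constant scalar S:
    a0 = a2 = 1 - h^2*S/6, a1 = 2 - (2h^2/3)*(S + S)."""
    a0 = 1.0 - h**2 / 6.0 * S
    a1 = 2.0 - (2.0 * h**2 / 3.0) * (S + S)
    a2 = 1.0 - h**2 / 6.0 * S
    out = [psi0, psi1]
    for _ in range(nsteps - 1):
        out.append((a1 * out[-1] - a0 * out[-2]) / a2)
    return out

# psi'' = -psi with psi(0) = 1, psi'(0) = 0; exact solution cos(x)
h = 0.01
psi = asymp_symplectic_3rd(1.0, h, 1.0, math.cos(h), 700)
err = max(abs(p - math.cos(n * h)) for n, p in enumerate(psi))
```

The accumulated error over the interval [0, 7] stays far below the step size, consistent with the method tracking oscillatory solutions well.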

Now let Ψ be the vector of length l

Ψ = (Ψ_{−N+1}, Ψ_{−N+2}, . . . , Ψ_0, . . . , Ψ_{N−2}, Ψ_{N−1})^T.

Application of the third order asymptotically symplectic method to equation (2) gives

(P + Eh²Q) Ψ = 0,

where

P = A − (1/6)CB + (1/3)h²V  and  Q = −(1/3)B.

The matrices A and B are block tridiagonal matrices of size l × l, each block being a diagonal matrix of size k × k. The diagonal blocks of A are −2I, while the diagonal blocks of B are −8I. The off-diagonal blocks of both A and B are the identity matrix I. The block diagonal matrix C has diagonal blocks M. Finally, V is the block tridiagonal matrix

V =
[ −4(V_{−N+1/2} + V_{−N+3/2})      V_{−N+1/2}                                        ]
[ V_{−N+5/2}      −4(V_{−N+3/2} + V_{−N+5/2})      V_{−N+3/2}                        ]
[        .                 .                 .                                       ]
[                  V_{N−1/2}       −4(V_{N−1/2} + V_{N−3/2})                         ]

4 Numerical Results

We applied both numerical methods developed above to the calculation of the eigenvalues of the two-dimensional harmonic oscillator and the two-dimensional Hénon-Heiles potential. Results are compared with those produced using the full discretisation technique.

The potential of the two-dimensional harmonic oscillator is

V(x, y) = (1/2)(x² + y²).

The Hénon-Heiles potential is

V(x, y) = (1/2)(x² + y²) + (0.0125)^{1/2} (x²y − y³/3).

The results show that symplectic and asymptotically symplectic methods are suitable methods for the solution of the two-dimensional time-independent Schrödinger equation. (No numerical results are presented here due to space limitations.)

References

[1] Z. Kalogiratou, Th. Monovasilis and T.E. Simos, Asymptotically symplectic integrators of 3rd and 4th order for the numerical solution of the Schrödinger equation, Proceedings of the Second M.I.T. Conference on Computational Fluid and Solid Mechanics, June 17-20, 2003, Massachusetts Institute of Technology, Cambridge, MA 02139, U.S.A., Elsevier Science Publishers (2003).
[2] X.S. Liu, X.Y. Liu, Z.Y. Zhou, P.Z. Ding and S.F. Pan, Numerical solution of the one-dimensional time-independent Schrödinger equation by using symplectic schemes, International Journal of Quantum Chemistry 79 (2000), 343-349.
[3] X.S. Liu, X.Y. Liu, Z.Y. Zhou and P.Z. Ding, Numerical solution of the two-dimensional time-independent Schrödinger equation by using symplectic schemes, International Journal of Quantum Chemistry 83 (2001), 303-309.
[4] R.D. Ruth, A canonical integration technique, IEEE Transactions on Nuclear Science NS-30 (1983), 2669-2671.
[5] H. Yoshida, Construction of higher order symplectic integrators, Physics Letters A 150 (1990), 262-268.


NACoM-2003 Extended Abstracts 125 – 128

A Jacobi-Davidson Continuation Method

T.L. van Noorden∗1

1 Department of Mathematics, Universiteit Utrecht, Budapestlaan 6, 3584 CD Utrecht, The Netherlands

Received 28 February 2003, accepted 21 March 2003

We discuss a continuation method for large scale dynamical systems. The method computes curves of steady states and determines the stability of those steady states. The method uses Jacobi-Davidson iterations for the computation of the stability-determining eigenvalues. We focus on the performance of the Jacobi-Davidson method, and we discuss the efficient generation of initial search spaces for the Jacobi-Davidson method along the curve of steady states. We also consider the efficient updating of the preconditioner in the Jacobi-Davidson iterations. The application considered in this paper is a PDE model for the global ocean circulation.

1 Introduction

In this paper we discuss a Jacobi-Davidson continuation code for the bifurcation analysis of large scale dynamical systems. These large scale dynamical systems arise from the discretization of partial differential equations. The application considered in this paper is a PDE model for the global ocean circulation.

Suppose we have a PDE model, with a time variable and one or more spatial dimensions. The first step is to discretize the spatial dimensions so that we obtain a large system of ODEs. The spatially discretized model equations can be written in the form

M du/dt = F(u, µ),    (1)

where the vector u contains the unknowns at each grid point and µ is a parameter. The operator M is linear and can be singular. If M is singular, we obtain a system of DAEs (differential-algebraic equations). We would like to compute the steady states of (1) for different values of the parameter µ. These are solutions of the equation

F(u, µ) = 0.    (2)

To determine branches of steady solutions as the parameter µ is varied, the pseudo-arclength method [1] is used. The branch (u(s), µ(s)) is parameterized by an 'arclength' parameter s. The method consists in solving the additional 'parameterization' equation

u̇₀ᵀ(u − u₀) + µ̇₀(µ − µ₀) − ∆s = 0,    (3)

where (u₀, µ₀) is a known solution on the branch, (u̇₀, µ̇₀) is the normalized tangent vector to the branch at (u₀, µ₀), and ∆s is the step-length.

In order to determine the linear stability of a steady state u₀, we have to compute the eigenvalues with largest real part of the large sparse generalized eigenvalue problem

N x = λ M x    (4)

with N = ∂F/∂u (u₀).

∗ Corresponding author: e-mail: [email protected], Phone: +31 30 253 1519, Fax: +31 30 251 8394


2 Numerical Methods

2.1 Linear solver and Preconditioning

The equations (2) and (3) are simultaneously solved using Newton's method. The arising linear equations are solved with the iterative method BiCGSTAB, using the multilevel ILU preconditioning technique MRILU [2].
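MRILU itself is not wrapped in standard Python packages, but the same pattern (a Krylov solver preconditioned by an incomplete LU factorization) can be sketched with SciPy's `bicgstab` and `spilu`. The 2-D Poisson test matrix below is a stand-in for the Newton Jacobian:

```python
import numpy as np
import scipy.sparse as sp
import scipy.sparse.linalg as spla

# stand-in linear system: 2-D Poisson matrix on an n x n grid
n = 30
T = sp.diags([-np.ones(n - 1), 2 * np.ones(n), -np.ones(n - 1)], [-1, 0, 1])
A = (sp.kron(T, sp.identity(n)) + sp.kron(sp.identity(n), T)).tocsc()
b = np.ones(A.shape[0])

ilu = spla.spilu(A, drop_tol=1e-4)              # incomplete LU factorization
M = spla.LinearOperator(A.shape, ilu.solve)     # preconditioner as an operator
x, info = spla.bicgstab(A, b, M=M)              # preconditioned BiCGSTAB
```

The preconditioner is passed as a `LinearOperator` whose action is the approximate solve with the incomplete factors; the same hook is where a multilevel preconditioner such as MRILU would plug in.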

2.2 Eigenvalue solver

The Jacobi-Davidson (JD) method [3, 4] is an efficient method to compute approximations of an eigenpair (α, β) close to a target τ, and its corresponding eigenvector q, of the generalized eigenproblem (4). In each step a low dimensional search subspace V and a low dimensional test subspace W are constructed. The new approximations of the eigenpair and eigenvector are obtained from the projected generalized eigenvalue problem

β W*NV u = α W*MV u.    (5)

The search space is expanded in each step with the vector v that is orthogonal to u and satisfies

(I − zz*)(βN − αM)(I − uu*) v = −r,    (6)

where r = (βN − αM)u and z = κ₀Nu + κ₁Mu, with κ₀ = (1 + |τ|²)^{−1/2} and κ₁ = −τ(1 + |τ|²)^{−1/2}. The test space is expanded with the vector Mv. Equation (6) is solved iteratively using BiCGSTAB, and usually only approximately. For the Jacobi-Davidson method a good initial search space is important [5], as well as a good preconditioner for equation (6).
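A compact dense-algebra caricature of the scheme for the standard symmetric case M = I (exact solves of the correction equation instead of preconditioned BiCGSTAB, no restarts; the random test matrix is illustrative):

```python
import numpy as np

def jd_largest(A, tol=1e-10, maxit=60):
    """Minimal Jacobi-Davidson sketch for symmetric A x = lambda x (M = I):
    Ritz extraction on the search space V, then expansion by an exact solve
    of the projected correction equation (I - uu*)(A - theta I)(I - uu*) v = -r."""
    n = A.shape[0]
    rng = np.random.default_rng(0)
    V = rng.standard_normal((n, 1))
    V /= np.linalg.norm(V)
    for _ in range(maxit):
        w, s = np.linalg.eigh(V.T @ A @ V)       # projected problem (cf. (5))
        theta, u = w[-1], V @ s[:, -1]           # Ritz pair near the target
        r = A @ u - theta * u
        if np.linalg.norm(r) < tol:
            break
        P = np.eye(n) - np.outer(u, u)           # projector I - u u*
        v = np.linalg.lstsq(P @ (A - theta * np.eye(n)) @ P, -r, rcond=None)[0]
        v -= V @ (V.T @ v)                       # keep V orthonormal
        V = np.hstack([V, (v / np.linalg.norm(v))[:, None]])
    return theta, u

rng = np.random.default_rng(1)
B = rng.standard_normal((30, 30))
A = (B + B.T) / 2
theta, u = jd_largest(A)
```

In the large sparse setting the dense projected solve is replaced by a few inner BiCGSTAB iterations with a preconditioner, which is exactly where the initial-search-space and preconditioner-updating strategies of this paper enter.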

3 Application and Results

Climate has not always been as stable as it is today. During the last ice age (∼ 110,000 – 12,000 yr B.P.) a succession of warm episodes interrupted the basically cold climate. This type of variability is known as the Dansgaard-Oeschger cycles. But also during the present era, the Holocene (since 12,000 yr B.P.), climate fluctuations are recorded. The best-known example is the Little Ice Age, which inspired many painters of the Golden Age to paint winter scenes.

A characteristic time scale of variability during the Holocene and the last ice age is 1,500 yr. This suggests that the global ocean circulation may be involved. A few mechanisms have been proposed to explain the observed variability:

Regime switches: The Dansgaard-Oeschger cycles may reflect switches between equilibria of the global ocean circulation.

Oscillations: Holocene and Pleistocene variability may be caused by an oscillation of the global ocean circulation.

We use the methods discussed in the previous section to investigate the above-mentioned mechanisms in a model of the global ocean circulation. The ocean model we use has been developed at the Institute of Marine and Atmospheric Research Utrecht (IMAU) [6]. The ocean model solves for the three velocity components u = (u, v, w), temperature T, salinity S and pressure p. Schematically, the governing equations are:

2Ω × u = −(1/ρ₀)∇p + Q_τ − r u    (7)
∂p/∂z = −ρg    (8)
∇ · u = 0    (9)
DT/Dt = κ∇²T + Q_T    (10)
DS/Dt = κ∇²S + Q_S    (11)
ρ = ρ(T, S)    (12)

Here, Ω is the rotation of the earth, ρ is the density with ρ₀ its background value, r is a friction parameter, and κ is a diffusion coefficient. The ocean circulation is forced through wind forcing Q_τ, and heat and salt fluxes Q_T and Q_S. For more details about the model equations we refer to [6].

The model equations are discretized in space using a control volume discretization on a staggered grid that places the p, T and S points in the center of a grid cell, and the u, v and w points on its boundaries. After the spatial discretization, the number of unknowns is O(10⁴–10⁵).

We start off from the zero solution, which is a steady solution of the system when the forcing is set to zero. Using the pseudo-arclength continuation, we now increase the forcing to full strength and follow the steady solution to its final state. In this manner we arrive at a basic state that represents characteristics of the realistic ocean circulation. Figure 1 represents this basic state. It shows the main wind-driven circulation patterns in the subtropical gyres, and a strong circumpolar current. Further continuation in D₀, the parameter representing heat transport in the atmosphere, reveals the existence of multiple equilibria (see Fig. 2).


Fig. 1 Representation of the basic state of the global ocean circulation. It shows the main wind-driven circulation patterns in the subtropical gyres, and a strong circumpolar current.

As mentioned before, good initial search spaces can improve the performance of the Jacobi-Davidson method. We try to exploit the fact that we use the method in a continuation context. We compare different strategies for the generation of initial search spaces for the Jacobi-Davidson method along the branch of steady states. These strategies can be viewed as a predictor step for the computation of the eigenvectors. The different strategies that we compare are starting with an initial search space generated by:

• a random vector,

• the eigenvector (or complex conjugate pair of eigenvectors) corresponding to the eigenvalue with the largest real part, computed at the previous solution on the branch,



Fig. 2 A branch of steady states of the global ocean circulation. For the standard value of the atmospheric diffusivity D₀ (dotted line), three equilibria are possible: two are stable (solid circles) and one is unstable (open circle). Solid (dashed) lines denote stable (unstable) solutions.

• the space spanned by the previously computed eigenvectors,

• the space computed by extrapolation of two previously computed spaces.

We also discuss the updating of the preconditioner used for the Jacobi-Davidson method. In this way we hope to improve the quality of the preconditioner without reconstructing it from scratch. An updating method has been proposed in [7]. We investigate variations of this method.

References

[1] H. B. Keller, Numerical solution of bifurcation and nonlinear eigenvalue problems, in: Applications of Bifurcation Theory, edited by P. H. Rabinowitz (Academic Press, New York, U.S.A., 1977), pp. 359-389.
[2] E. F. F. Botta and F. W. Wubs, MRILU: An effective algebraic multi-level ILU-preconditioner for sparse matrices, SIAM J. Matrix Anal. Appl. 20, 1007-1026 (1999).
[3] G. L. G. Sleijpen and H. A. van der Vorst, A Jacobi-Davidson iteration method for linear eigenvalue problems, SIAM J. Matrix Anal. Appl. 17, 410-425 (1996).
[4] D. R. Fokkema, G. L. G. Sleijpen and H. A. van der Vorst, Jacobi-Davidson style QR and QZ algorithms for the reduction of matrix pencils, SIAM J. Sci. Comput. 20, 94-125 (1998).
[5] J. J. van Dorsselaer, Computing eigenvalues occurring in continuation methods with the Jacobi-Davidson QZ method, J. Comput. Phys. 138, 714-733 (1997).
[6] H. A. Dijkstra, H. Oksuzoglu, F. W. Wubs and E. F. F. Botta, A fully implicit model of the three-dimensional thermohaline ocean circulation, J. Comput. Phys. 173, 685-715 (2001).
[7] G. L. G. Sleijpen and F. W. Wubs, Effective preconditioning techniques for eigenvalue problems, Preprint 1117, Dept. of Mathematics, Utrecht University (1999).


NACoM-2003 Extended Abstracts 129 – 130

A Numerical Study of the Dispersion for the Two-dimensional Helmholtz Equation

Kailash C. Patidar ∗

Math. Institut, Universitaet Tuebingen, Auf der morgenstelle 10, 72076, Tuebingen, Germany

Received 28 February 2003, accepted 21 March 2003

Using Fourier analysis ([3, 4, 5]), a numerical study of the dispersion has been carried out for the two-dimensional Helmholtz equation. Using the Fourier transform, the linear partial differential operator with constant coefficients is converted into a multiplication operator in the frequency domain; this gives us the numerical symbol, whose roots are compared with those of the continuous symbol.

In this paper we present an extremely fast computational method to measure the 'dispersion', which is directly related to the 'pollution effect'. This pollution effect is observed by various researchers when the Helmholtz equation is solved for high wave numbers. The main effect of the 'pollution' is that the wave number of the solution obtained via a finite element method is different from the wave number of the exact solution; this difference is what is called 'dispersion'. We present this numerical method for three different types of finite element discretizations, viz., 'Standard Galerkin', 'Standard FOSLS' and 'Regularized FOSLS'.
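The symbol comparison can already be seen in one dimension: for the standard 3-point discretization of −p″ − k²p = 0, the numerical symbol (2 − 2cos(ξh))/h² − k² has a root at a discrete wavenumber that differs from the exact k. A tiny sketch of this 1-D analogue (the values of k and h are illustrative; the paper works with the 2-D symbols of the three discretizations above):

```python
import math

def discrete_wavenumber(k, h):
    """Root of the numerical symbol of the standard 3-point scheme for
    -p'' - k^2 p = 0: solve (2 - 2*cos(kt*h))/h**2 = k**2 for kt."""
    return math.acos(1.0 - 0.5 * (k * h) ** 2) / h

k = 50.0                                         # a high wave number
errs = [discrete_wavenumber(k, h) - k for h in (0.01, 0.005, 0.0025)]
```

The dispersion error behaves like k³h²/24, so it shrinks by roughly a factor 4 per halving of h but grows cubically with k; this is the quantity the pollution discussion is about, obtained without ever solving the discrete problem.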

Let D ⊂ ℝ² be a bounded domain, which will be assumed to have a C^{1,1} (i.e. the first derivative is Lipschitz continuous) or convex polygonal boundary Γ of positive measure. Consider the exterior Helmholtz boundary value problem

−∆p − k²p = f  in ℝ²\D,
∂p/∂r − ikp = o(‖x‖^{−1/2})  as ‖x‖ → ∞.    (0.1)

Here k is the wave number, f ∈ L²_loc(ℝ²\D), and ∂/∂r denotes the derivative in the radial direction used in the asymptotic Sommerfeld radiation condition.

To solve (0.1) numerically, the unbounded domain is truncated and the Sommerfeld radiation condition is approximated. Thus, let B be a ball containing D (D ⊂⊂ B) with C^{1,1} boundary Γ. Now consider Ω = B\D̄, assume that f ∈ L²(Ω), and let n be the outward unit vector normal to the boundary; the reduced boundary value problem is

−∆p − k²p = f  in Ω,
n · ∇p − ikp = g  on Γ.    (0.2)

We introduce a new 'field' variable:

u ≡ (u₁, u₂)ᵗ = (1/k)∇p.

∗ e-mail: [email protected], Phone: +49 7071 2978652, Fax: +49 7071 294322


With this defining equation and a corresponding curl-free condition (to create the ellipticity), together with the mixed boundary condition on u and p, we can reformulate problem (0.2) as the following augmented (due to the extra curl-free condition) first-order system:

u − (1/k)∇p = 0  in Ω,
(1/k)∇ · u + p = −f/k²  in Ω,
(1/k)∇ × u = 0  in Ω,
n · u − ip = g/k  on Γ.    (0.3)

With the above extra curl-free condition (which is called the regularity condition), the discretization of (0.3) will be termed the regularized FOSLS discretization; otherwise we will simply call it the standard FOSLS discretization.

It is observed by researchers that the standard Galerkin finite element method is polluted for one-dimensional problems, but by some suitable modifications of the corresponding discrete bilinear form this pollution can be eliminated. However, for two (and higher) dimensional problems no such modification can give a pollution-free FEM ([2]); one can at best obtain a FEM with minimal pollution ([1]). Thus there is a growing interest in the design of such pollution-free methods.

In this work we have carried out the dispersion analysis for the three methods described above. Our method does not require computing the numerical solution of the problem.

References

[1] I. M. Babuska, F. Ihlenburg, E. T. Paik and S. A. Sauter, A generalized finite element method for solving theHelmholtz equation in tw o dimensions with minimal pollution, Comput. Methods Appl. Mech. Engrg., 128(1995), pp. 325-359.

[2] I. M. Babuska and S. A. Sauter, Is the pollution effect of the FEM avoidable for the Helmholtz equation consid-ering high wave numbe rs?, SIAM Rev.(2000), Vol 42, No. 3, pp. 451-484.

[3] A. Brandt, Rigorous quantitative analysis of multigrid. I : Constant coefficients two level cycle with L2-norm,SIAM J. Numer. Anal., 31 (1994), pp. 1695-1730.

[4] W. Rudin, Fourier Analysis on Groups, J. Wiley & Sons., New York, 1962.[5] P. Wesseling, An Introduction to Multigrid Methods, John Wiley & Sons., New York, 1992.


NACoM-2003 Extended Abstracts 131 – 134

Numerical Solution of Singular Nonlinear Boundary Value Problems for Shallow Membrane Caps

Matilde Pos-de-Mina Pato∗1 and Pedro Miguel Rita da Trindade Lima∗∗2

1 Instituto Superior de Engenharia de Lisboa, Lisboa, Portugal
2 Instituto Superior Tecnico, Lisboa, Portugal

Received 28 February 2003, accepted 21 March 2003

In this paper we apply an efficient numerical method to the solution of a singular nonlinear boundary valueproblem for shallow membrane caps.

1 Introduction

In this work we apply an efficient numerical method to the solution of a singular nonlinear boundary valueproblem.

The problem describes a shallow membrane cap which is rotationally symmetric in its undeformed state. Assume that the strains are small, as well as the pressure, and that the undeformed shape of the membrane

Fig. 1 Membrane cap

is radially symmetric and described in cylindrical coordinates by z = C(1 − r^γ), where r is the radius, C > 0 is the height at the center of the cap and γ > 1.

∗ Corresponding author: e-mail: [email protected], Phone: +35 1 831 70 01, Fax: +35 1 831 72 67∗∗ e-mail: [email protected]


According to Dickey [5], the exact equation which describes rotationally symmetric deformations is

d/dr ( r √(2ε_θ + 1) ) = m S √(2ε_r + 1) / Σ_r,

where ε_r, ε_θ are the radial and circumferential strains, r is the radial variable, m² = 1 + (z′)², (rS)′ = mΣ_θ, Σ_r = σ_r √(2ε_θ + 1) and Σ_θ = σ_θ √(2ε_r + 1) (σ_r, σ_θ are the radial and circumferential stresses on the membrane).

From this equation, assuming some additional conditions and introducing new notation, the following equation may be obtained:

r² S_r″ + 3r S_r′ = λ² r^{2γ−2} / 2 + βνr² / S_r − r² / (8S_r²),    (1)

where ν is the Poisson ratio (0 ≤ ν < 0.5), and λ and β are positive constants depending on the pressure P, the thickness of the membrane, and Young's modulus [2].

Given a solution S_r of (1), the actual shape of the membrane is then described by

u(r) = (Kr/E) [ r S_r′(r) + (1 − ν) S_r(r) ],
w(r) = (P/(2EhK)) ∫_r^1 t / S_r(t) dt,

where u, w are the radial and vertical displacements of the membrane, E is Young's modulus and K = (EP²/h²)^{1/3} (h is the thickness of the membrane).

The physical boundary of the undeformed membrane is at r = 1. The boundary condition for the stress problem becomes

S_r(1) = S ≡ σ/K  (S > 0),

and the boundary condition for the displacement problem

S_r′(1) + (1 − ν) S_r(1) = Γ ≡ Eµ/K  (Γ ∈ ℝ).

The regularity condition must also be imposed at r = 0.

Baxley [1] has proved that for γ = 2, "small" ν and Γ = 0, the displacement problem has exactly one bounded positive solution.

The results of Baxley and Gu [2] provide a complete mathematical analysis of both the stress and displacement problems for γ = 2, 0 ≤ ν < 0.5, S > 0 and Γ real. For the stress problem with γ = 2, they show that there exists a unique bounded positive solution if S ≤ 1/(4βν) and at least one bounded positive solution if S > 1/(4βν). For the displacement problem, there exists exactly one bounded positive solution if Γ/(1 − ν) ≤ 1/(4βν), and at least one bounded positive solution if Γ/(1 − ν) > 1/(4βν). In this case multiple solutions may exist, as shown by numerical computation.

For the case ν > 0 and γ > 1, Baxley and Robinson [3] showed that for certain values of A and a₀ the solution is not monotone.

By introducing the variable substitutions x = r² and u(x) = S_r(r) x, equation (1) becomes

u″(x) = λ²/8 + βνx/(4u) − x²/(32u²),    (2)

u(0) = 0,  a₀u(1) − a₁u′(1) = A,    (3)

where a₀ = b₀ and a₁ = 2b₁. Thus, for the stress problem, a₀ = 1, a₁ = 0, and A > 0 is the stress on the boundary; for the displacement problem a₀ = 1 − ν, a₁ = 2, and A (any real number) is the prescribed boundary displacement.
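As an illustration of the kind of discretization involved, here is a plain finite-difference Newton solve of (2)-(3) for the stress problem (a₀ = 1, a₁ = 0). The parameter values λ, β, ν, A and the linear initial guess are made up, and this is ordinary Newton on the discretized system, not the monotone schemes discussed below:

```python
import numpy as np

lam2, beta, nu, A, N = 1.0, 1.0, 0.3, 1.0, 50      # made-up parameters
h = 1.0 / N
x = h * np.arange(1, N)                 # interior points; u(0)=0, u(1)=A
f = lambda u: lam2 / 8 + beta * nu * x / (4 * u) - x**2 / (32 * u**2)
fu = lambda u: -beta * nu * x / (4 * u**2) + x**2 / (16 * u**3)

# second-difference matrix with the Dirichlet data folded into the RHS
D2 = (np.diag(np.ones(N - 2), -1) - 2 * np.eye(N - 1)
      + np.diag(np.ones(N - 2), 1)) / h**2
bc = np.zeros(N - 1)
bc[-1] = A / h**2

u = A * x                               # made-up initial guess u(x) = A*x
for _ in range(20):
    G = D2 @ u + bc - f(u)              # residual of u'' = f(x, u)
    u -= np.linalg.solve(D2 - np.diag(fu(u)), G)   # Newton step
res = np.max(np.abs(D2 @ u + bc - f(u)))
```

Keeping u(0) = 0 only as boundary data (the nonlinearity is never evaluated at x = 0) sidesteps the singular point, which is also why the interior grid starts at x = h.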

In the cited works, the considered problems were solved by the shooting method. Here we introduce two iterative methods, Picard and Newton, which reduce the nonlinear problem to a sequence of linear problems. Based on these methods, we have constructed numerical algorithms for the solution of the considered problem, which enabled us to obtain accurate approximations of the solution of (2)-(3).

Each linear boundary value problem was discretized by a finite difference scheme. Since the discretization error admits an expansion in powers of h, we have used Richardson extrapolation (see [4]) to accelerate the convergence of the method.
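Richardson extrapolation combines approximations computed with two step sizes of a method of known order p to cancel the leading error term. A tiny sketch (the central-difference test is a made-up illustration of the h² → h⁴ gain, not the boundary value solver itself):

```python
import math

def richardson(a_h, a_h2, p=2):
    """Eliminate the O(h^p) error term from approximations A(h) and A(h/2)
    of a method of order p: A = (2^p * A(h/2) - A(h)) / (2^p - 1)."""
    return (2**p * a_h2 - a_h) / (2**p - 1)

# second-order central difference for f'(x), with f = sin, at x = 0.3
d = lambda h: (math.sin(0.3 + h) - math.sin(0.3 - h)) / (2 * h)
extrap = richardson(d(0.1), d(0.05))
```

The extrapolated value is markedly more accurate than either raw approximation, which is the effect used here to accelerate the finite difference solutions.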

2 Monotone iterative methods

In [6], Mooney has proposed monotone iterative methods for the solution of nonlinear boundary value problems and stated sufficient conditions for the convergence of these methods. In the present section we describe how these methods may be applied to problem (2)-(3). To apply the iterative methods proposed in the cited work, it is essential to know a suitable initial approximation, which is usually a subsolution or a supersolution of the considered problem.

Definition 2.1 u(0) is a subsolution of (2)-(3) if u(0) ∈ C²(]0, 1[) ∩ C([0, 1]) and

u″(x) + x²/(32u(x)²) − λ²/8 − βνx/(4u(x)) ≤ 0, ∀x ∈ ]0, 1[,

u(0) ≤ 0,  a0 u(1) − a1 u′(1) − A ≤ 0.

By reversing the inequalities we obtain the definition of a supersolution.

If u(0) is a subsolution of (2)-(3) then

• u(0)(x) ≤ u(x), ∀x ∈ [0, 1]

• the Picard and the Newton iterates converge monotonically upwards to the exact solution.

If u(0) is a supersolution, then

• u(0)(x) ≥ u(x), ∀x ∈ [0, 1]

• the Picard and the Newton iterates converge downwards to the true solution.

In the particular case of problem (2)-(3), the convergence results are not valid for all possible values of β and ν. However, if βν is sufficiently small, numerical experiments have shown that the methods converge.

For this problem, we have used a subsolution of the form suggested by Baxley and Gu [2]: u(0)(x) = Cx(δ − x), where C is an adjustable parameter and 1 < δ ≤ 1 + 2/(1 − ν). Here we do not work with a supersolution.


134 Matilde Pato and Pedro Lima: Numerical Solution for Singular Nonlinear BVPs

Once a subsolution of the considered problem is found, the iterative schemes of the Picard and Newton methods may be written for the boundary value problem (2)-(3). In order to guarantee convergence, we must first transform the nonlinear equation by adding the same term to each side. In the case of the displacement problem, the equations of the Picard method have the following form:

u(n+1)″(x) + [βνx/(4u(0)(x)²) − x²/(16u(0)(x)³)] u(n+1)(x)
  = λ²/8 − x²/(32u(n)(x)²) + βνx/(4u(n)(x)) + [βνx/(4u(0)(x)²) − x²/(16u(0)(x)³)] u(n)(x),

u(n+1)(0) = 0,  a0 u(n+1)(1) − a1 u(n+1)′(1) = A,  (4)

for n = 0, 1, 2, ....
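Each Picard (and likewise each Newton) step therefore amounts to a linear boundary value problem of the form u″ + c(x)u = g(x) with the boundary conditions of (3). The finite-difference sketch below of such a linear-step solver is our own illustration under assumed names (`solve_linear_step`, the grid size N); it is not the authors' code, which additionally applies Richardson extrapolation:

```python
import numpy as np

def solve_linear_step(c, g, a0, a1, A, N=200):
    """Solve u'' + c(x) u = g(x) on [0, 1] with u(0) = 0 and
    a0*u(1) - a1*u'(1) = A by second-order central differences;
    u'(1) is replaced by a second-order one-sided difference."""
    h = 1.0 / N
    x = np.linspace(0.0, 1.0, N + 1)
    M = np.zeros((N + 1, N + 1))
    rhs = np.zeros(N + 1)
    M[0, 0] = 1.0                                  # u(0) = 0
    for i in range(1, N):
        M[i, i - 1] = 1.0 / h**2
        M[i, i] = -2.0 / h**2 + c(x[i])
        M[i, i + 1] = 1.0 / h**2
        rhs[i] = g(x[i])
    # a0*u(1) - a1*u'(1) = A, u'(1) ~ (3u_N - 4u_{N-1} + u_{N-2}) / (2h)
    M[N, N] = a0 - 1.5 * a1 / h
    M[N, N - 1] = 2.0 * a1 / h
    M[N, N - 2] = -0.5 * a1 / h
    rhs[N] = A
    return x, np.linalg.solve(M, rhs)

# Example with c = 0, g = 2: the data below make u = x^2 + x the solution.
x, u = solve_linear_step(lambda t: 0.0, lambda t: 2.0, a0=1.0, a1=0.0, A=2.0)
```

In a Picard loop one would form c and g from u(0) and u(n) as in (4) and call such a solver repeatedly; a banded (tridiagonal) solve would replace the dense one in practice.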

When the Newton method is used, we must solve a sequence of linear boundary value problems of the form

u(n+1)″(x) + [x²/(32u(0)(x)³) − βνx/(4u(0)(x)²)] u(n+1)(x)
  = λ²/8 − x²/(32u(n)(x)²) + βνx/(4u(n)(x)) + [x²/(32u(0)(x)³) − βνx/(4u(0)(x)²)] u(n)(x)
  + [x²/(16u(n)(x)³) + x²/(32u(0)(x)³) − βνx/(4u(n)(x)²) − βνx/(4u(0)(x)²)] (u(n+1)(x) − u(n)(x)),

u(n+1)(0) = 0,  a0 u(n+1)(1) − a1 u(n+1)′(1) = A,  (5)

for n = 0, 1, 2, ....

References

[1] John V. Baxley, A singular nonlinear boundary value problem: membrane response of a spherical cap, SIAM J. Appl. Math., 48 (1988) 497-505.

[2] John V. Baxley, Yihong Gu, Nonlinear boundary value problems for shallow membrane caps, Comm. Appl. Anal., 3 (1999) 327-344.

[3] John V. Baxley, Stephen B. Robinson, Nonlinear boundary value problems for shallow membrane caps, II, SIAM J. Appl. Math., 88 (1998) 203-224.

[4] Claude Brezinski, Michela R. Zaglia, Extrapolation Methods, Theory and Practice, Elsevier Science Publishers B.V., 1991.

[5] R.W. Dickey, Membrane caps, Quart. of Appl. Math., 45 (1987) 697-712.

[6] J.W. Mooney, A unified approach to the solution of certain classes of nonlinear boundary value problems using monotone iterations, Nonlinear Analysis, Theory, Methods & Applications, Pergamon Press Ltd., Vol. 3, No. 4 (1979) 449-465.


NACoM-2003 Extended Abstracts 135 – 138

Trigonometrically-fitted Symmetric Four-step Methods for theNumerical Solution of Orbital Problems

G. Psihoyios∗1 and T.E. Simos∗∗2

1 Department of Mathematics, School of Applied Sciences, Anglia Polytechnic University, East Road, Cambridge CB1 1PT, United Kingdom

2 Department of Computer Science and Technology, Faculty of Sciences and Technology, University of Peloponnese, GR-221 00 Tripolis, Greece.

Received 28 February 2003, accepted 21 March 2003

An explicit hybrid symmetric four-step method of algebraic order six is developed in this paper. Comparative numerical results from the application of the new method to well-known periodic orbital problems demonstrate the efficiency of this new method. Due to space limitations, in this mini-paper we can only present some of the theory behind the development of the method.

1 Introduction

Much research has been done on the approximate solution of second order differential equations of the form

y′′(t) = f(t, y(t)), (1)

i.e. differential equations for which the function f is independent of the first derivative of y. Some of the most frequently used methods for the numerical solution of problem (1) are the symmetric multistep methods.

Symmetric multistep methods were first presented by Lambert and Watson [1]. In [1] it is shown that the interval of periodicity of symmetric multistep methods is non-vanishing, which ensures the existence of periodic solutions in it. [The interval of periodicity is determined by applying the symmetric multistep method to the test equation y″(t) = −q² y(t). If q²h² ∈ (0, T0²), where h is the step length of the integration, then this interval is called the interval of periodicity.] Lambert and Watson [1] developed symmetric multistep methods that are orbitally stable (when the number of steps exceeds two). Orbital instability is a property exhibited by the family of Stormer-Cowell multistep methods used for the solution of (1). The class of numerical methods that is frequently used for the long term integration of planetary orbits is that of symmetric multistep methods (see [2] and references therein). Quinlan and Tremaine [2] have constructed high order symmetric methods based on the work of Lambert and Watson.

We note here that the linear symmetric multistep methods developed by Lambert and Watson [1] and by Quinlan and Tremaine (see [2] and [3]) are much simpler than the hybrid (Runge-Kutta type) ones. For the long time integration of initial value problems with oscillating solutions, these methods are very important due to their simplicity and accuracy (especially for orbital problems).

∗ Corresponding author: e-mail: [email protected], Phone: +44 1223 363 271 ext. 2173, Fax: +44 1223 515 349.
∗∗ Active Member of the European Academy of Sciences and Arts and Visiting Professor, Department of Mathematics, Anglia Polytechnic University, East Road, Cambridge CB1 1PT, UK

© 2003 WILEY-VCH Verlag GmbH & Co. KGaA, Weinheim


136 G. Psihoyios and T.E. Simos: Numerical Solution of Orbital Problems

In this paper, we develop a two-stage trigonometrically-fitted and exponentially-fitted symmetric multistep method. We apply the new method to some established orbital-type problems. In the full-length version we also intend to discuss its stability characteristics.

2 The New Trigonometrically-Fitted Method

Consider the following general four-step formula:

yn+2 = 2 yn+1 − 2 yn + 2 yn−1 − yn−2 + h² [b0 (fn+1 + fn−1) + b1 fn],  (2)

yn+2 − 2 yn+1 + 2 yn − 2 yn−1 + yn−2 = h² [c0 (fn+2 + fn−2) + c1 (fn+1 + fn−1) + c2 fn],  (3)

where yn±i = y(x ± ih) and fn±i = y″(x ± ih) for i ∈ {0, 1, 2}, h is the step size, and bi, i ∈ {0, 1}, and ci, i ∈ {0, 1, 2}, are the parameters of the method, to be determined so that it is trigonometrically fitted.

For the first stage of the method and for the trigonometrically-fitted case, we demand to integrate exactlyany linear combination of the functions:

{1, x, x², x³, sin(±v x), cos(±v x)}.  (4)
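For this first-stage set, the fitting conditions reduce to one polynomial condition (2b0 + b1 = 2, from exactness on x²; the cubic gives the same relation) and one trigonometric condition obtained by inserting cos(vx) into (2). The closed forms below, and the classical limit b0 → 7/6, b1 → −1/3 as w = vh → 0, are our own derivation from that pair of conditions, included as a sketch (the abstract does not list the coefficients):

```python
import math

def first_stage_coeffs(w):
    """b0, b1 making y_{n+2} - 2y_{n+1} + 2y_n - 2y_{n-1} + y_{n-2}
    = h^2 [b0 (f_{n+1} + f_{n-1}) + b1 f_n] exact for
    1, x, x^2, x^3, sin(vx), cos(vx), where w = v*h."""
    b0 = -2.0 * math.cos(w) / w**2 - 1.0 / (math.cos(w) - 1.0)
    b1 = 2.0 - 2.0 * b0              # polynomial condition 2*b0 + b1 = 2
    return b0, b1

# Exactness check on y = cos(v x) at an arbitrary point x:
v, h, x = 1.3, 0.1, 0.4
b0, b1 = first_stage_coeffs(v * h)
lhs = (math.cos(v * (x + 2 * h)) - 2 * math.cos(v * (x + h))
       + 2 * math.cos(v * x) - 2 * math.cos(v * (x - h))
       + math.cos(v * (x - 2 * h)))
f = lambda t: -v**2 * math.cos(v * t)    # f = y'' for y = cos(vx)
rhs = h**2 * (b0 * (f(x + h) + f(x - h)) + b1 * f(x))
```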

For the second stage of the method and for the trigonometrically-fitted case, we demand to integrate exactly any linear combination of the functions:

{1, x, x², x³, x⁴, x⁵, sin(±v x), cos(±v x)}.  (5)

The behaviour of the produced coefficients has been investigated and the stability of the obtained method has been studied. Finally, we test the new method on the following problems:

2.1 A problem by Franco and Palacios

Consider the "almost" periodic orbit problem studied in [4]:

y″ + y = ε e^(iψx), y(0) = 1, y′(0) = i, y ∈ C,  (6)

that has an equivalent form:

u′′ + u = ε cos(ψx), u(0) = 1, u′(0) = 0, (7)

v′′ + v = ε sin(ψx), v(0) = 0, v′(0) = 1, (8)

where ε = 0.001 and ψ = 0.01. The analytical solution of problem (6) is given below:


y(x) = u(x) + i v(x), u, v ∈ R, (9)

u(x) = ((1 − ε − ψ²)/(1 − ψ²)) cos(x) + (ε/(1 − ψ²)) cos(ψx),  (10)

v(x) = ((1 − εψ − ψ²)/(1 − ψ²)) sin(x) + (ε/(1 − ψ²)) sin(ψx).  (11)

The solution (9)-(11) represents the motion of a perturbation of a circular orbit in the complex plane.

2.2 A problem by Stiefel and Bettis

Consider the "almost" periodic orbit problem studied in [5]:

y″ + y = 0.001 e^(ix), y(0) = 1, y′(0) = 0.9995 i, y ∈ C,  (12)

whose equivalent form is:

u′′ + u = 0.001 cos(x), u(0) = 1, u′(0) = 0, (13)

v′′ + v = 0.001 sin(x), v(0) = 0, v′(0) = 0.9995. (14)

The analytical solution of problem (12) is:

y(x) = u(x) + i v(x), u, v ∈ R, (15)

u(x) = cos(x) + 0.0005 x sin(x), (16)

v(x) = sin(x) − 0.0005 x cos(x). (17)

The solution of equations (15)-(17) represents the motion of a perturbation of a circular orbit in the complex plane.

2.3 Two-Body Problem

The following system of coupled differential equations is considered, which is well known as the two-body problem:

y″ = −y/r, y(0) = 1 − e, y′(0) = 0,  (18)

z″ = −z/r, z(0) = 0, z′(0) = √(1 − e²),  (19)

where r = √((y² + z²)³), and whose analytical solution is given by:


y(x) = cos(u) − e,  (20)

z(x) = √(1 − e²) sin(u),  (21)

where u − e sin(u) − x = 0. The results we have obtained from the above problems will be sufficient to demonstrate the efficiency of the new method. These numerical results cannot be presented here due to space limitations.
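Evaluating the reference trajectory (20)-(21) requires solving Kepler's equation u − e sin(u) = x at each output point. A small Newton-iteration sketch of ours (the name `two_body_exact` and the tolerance are illustrative assumptions):

```python
import math

def two_body_exact(x, e, tol=1e-14):
    """Evaluate the analytical solution (20)-(21): solve Kepler's equation
    u - e*sin(u) = x by Newton's method, then return (y, z)."""
    u = x                                 # adequate starting guess for moderate e
    for _ in range(50):
        du = (u - e * math.sin(u) - x) / (1.0 - e * math.cos(u))
        u -= du
        if abs(du) < tol:
            break
    return math.cos(u) - e, math.sqrt(1.0 - e**2) * math.sin(u)
```

At x = 0 this reproduces the initial conditions (18)-(19): y = 1 − e, z = 0.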

References

[1] Lambert, J. D., and Watson, I. A. 1976, J. Inst. Math. Applic., 18, 189
[2] Quinlan, G. D., and Tremaine, S. 1990, The Astronomical Journal, 100, 1694
[3] Quinlan, G. D. 2000, Resonances and instabilities in symmetric multistep methods, submitted.
[4] Franco, J. M., and Palacios, M. 1990, J. Comput. Appl. Math., 30, 1
[5] Stiefel, E., and Bettis, D. G. 1969, Numer. Math., 13, 154


NACoM-2003 Extended Abstracts 139 – 144

Exponentially-fitted Multiderivative Methods for theNumerical Solution of the Schrodinger Equation

G. Psihoyios∗1 and T.E. Simos∗∗2

1 Department of Mathematics, School of Applied Sciences, Anglia Polytechnic University, East Road, Cambridge CB1 1PT, United Kingdom

2 Department of Computer Science and Technology, Faculty of Sciences and Technology, University of Peloponnese, GR-221 00 Tripolis, Greece.

Received 28 February 2003, accepted 21 March 2003

In this paper an exponentially-fitted multiderivative method is developed for the numerical integration of the Schrodinger equation. The method is termed multiderivative since it uses derivatives of order two and four. An application to the resonance problem of the radial Schrodinger equation indicates that the new method is more efficient than Numerov's method and other known methods that can be found in the literature.

1 Introduction

The one-dimensional Schrodinger equation has the form:

y″(r) = [l(l + 1)/r² + V(r) − k²] y(r).  (1)

Models of this type, which represent a boundary value problem, occur frequently in theoretical physics and chemistry (see for example [1]).

In the following we present some notations for (1):

• The function W(r) = l(l + 1)/r² + V(r) denotes the effective potential; it satisfies W(r) → 0 as r → ∞.

• k2 is a real number denoting the energy

• l is a given integer representing angular momentum

• V is a given function which denotes the potential.

• The boundary conditions are:

y(0) = 0 (2)

and a second boundary condition, for large values of r, determined by physical considerations.

It is known from the literature that over the last decades many numerical methods have been constructed for the approximate solution of the Schrodinger equation (see for example [2], [3], [4]). The aim and scope of this activity was the development of fast and reliable methods.

The developed methods can be divided into two main categories:

∗ Corresponding author: e-mail: [email protected], Phone: +44 1223 363271 ext. 2173, Fax: +44 1223 515349.
∗∗ Active Member of the European Academy of Sciences and Arts and Visiting Professor, Department of Mathematics, Anglia Polytechnic University, East Road, Cambridge CB1 1PT, UK

© 2003 WILEY-VCH Verlag GmbH & Co. KGaA, Weinheim


140 G. Psihoyios and T.E. Simos: Multiderivative Methods for the Solution of the Schrodinger Equation

• Methods with constant coefficients

• Methods with coefficients dependent on the frequency of the problem¹.

In this paper we introduce an explicit multiderivative method for the numerical solution of the Schrodinger equation. The method is called multiderivative since it involves the second and fourth derivatives of the function. We also produce an explicit multiderivative method with minimal phase-lag. The application of the newly developed methods to the resonance problem of the Schrodinger equation shows their efficiency. For comparison purposes we use the well known Numerov method and the Numerov-type methods with minimal phase-lag developed by Chawla [5]-[6].

2 A New Family of Exponentially-Fitted Multiderivative Methods

Consider the following family of methods to integrate y″ = f(x, y):

yn+1 = 2 yn − yn−1 + a0 h² y″n + a1 h⁴ y(4)n,  (3)

yn+1 = 2 yn − yn−1 + h² [c0 y″n + c1 (y″n+1 + y″n−1)] + h⁴ [c2 y(4)n + c3 (y(4)n+1 + y(4)n−1)],  (4)

where y″n±i = fn±i yn±i, y(4)n±i = (f″n±i + f²n±i) yn±i + 2 f′n±i y′n±i, and i = −1(1)1. It is easy to see that in order for the above method (3)-(4) to be applicable, approximate schemes for the first derivatives of y are needed.

For the first stage of the method and for the exponentially-fitted case, we demand to integrate exactly any linear combination of the functions:

{1, x, x², x³, exp(±v x)}.  (5)

For the second stage of the method, we demand to integrate exactly any linear combination of the functions:

{1, x, x², x³, x⁴, x⁵, x⁶, x⁷, exp(±v x)}.  (6)

The behaviour of the coefficients has been investigated and the stability of the obtained method has been studied.
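For the first stage, the fitting set (5) determines the coefficients of (3) in closed form: exactness on the powers forces a0 = 1, and exactness on exp(±vx) then fixes a1; as w = vh → 0 one recovers the classical value a1 = 1/12. This little derivation and check is ours, included as a sketch (the abstract itself does not list the coefficients):

```python
import math

def stage_one_coeffs(w):
    """a0, a1 making y_{n+1} = 2y_n - y_{n-1} + a0 h^2 y''_n + a1 h^4 y(4)_n
    exact for 1, x, x^2, x^3 and exp(+-v x), where w = v*h."""
    a0 = 1.0                                    # forced by exactness on x^2
    a1 = (2.0 * math.cosh(w) - 2.0 - a0 * w**2) / w**4
    return a0, a1

# Exactness check on y = exp(v x): here y'' = v^2 y and y(4) = v^4 y.
v, h, x = 0.7, 0.2, 1.0
w = v * h
a0, a1 = stage_one_coeffs(w)
lhs = math.exp(v * (x + h)) - 2.0 * math.exp(v * x) + math.exp(v * (x - h))
rhs = (a0 * w**2 + a1 * w**4) * math.exp(v * x)
```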

3 Computational Implementation

As we have mentioned previously, in order for the above methods (3)-(4) to be applicable we need approximate schemes for the first derivatives of y. This is due to the following formula:

y(4)n±i = (f″n±i + f²n±i) yn±i + 2 f′n±i y′n±i, i = −1(1)1.  (7)

The general formulae for the first derivatives at the points xi, i = n − 1(1)n + 1, are given by:

h y′n+1 = a2,n+1 yn+1 + a1,n+1 yn + a0,n+1 yn−1 + h² (b2,n+1 y″n+1 + b1,n+1 y″n + b0,n+1 y″n−1),

h y′n = a2,n yn+1 + a1,n yn + a0,n yn−1 + h² (b2,n y″n+1 + b1,n y″n + b0,n y″n−1),

h y′n−1 = a2,n−1 yn+1 + a1,n−1 yn + a0,n−1 yn−1 + h² (b2,n−1 y″n+1 + b1,n−1 y″n + b0,n−1 y″n−1).  (8)

¹ In the case of the Schrodinger equation the frequency of the problem is equal to |l(l + 1)/r² + V(r) − k²|.


In order for the above methods to have maximal algebraic order, the following system of equations must hold:

−a2,n+1 − a0,n+1 − a1,n+1 = 0,
a0,n+1 + 1 − a2,n+1 = 0,
−b2,n+1 − b0,n+1 − b1,n+1 − (1/2) a2,n+1 − (1/2) a0,n+1 + 1 = 0,
b0,n+1 − (1/6) a2,n+1 + (1/6) a0,n+1 − b2,n+1 + 1/2 = 0,
−(1/2) b0,n+1 − (1/24) a2,n+1 − (1/24) a0,n+1 − (1/2) b2,n+1 + 1/6 = 0,  (9)

−a0,n − a2,n − a1,n = 0,
a0,n − a2,n + 1 = 0,
−(1/2) a2,n − b1,n − b0,n − (1/2) a0,n − b2,n = 0,
−(1/6) a2,n + b0,n + (1/6) a0,n − b2,n = 0,
−(1/2) b0,n − (1/24) a2,n − (1/24) a0,n − (1/2) b2,n = 0,  (10)

−a1,n−1 − a2,n−1 − a0,n−1 = 0,
1 − a2,n−1 + a0,n−1 = 0,
−1 − (1/2) a2,n−1 − (1/2) a0,n−1 − b2,n−1 − b1,n−1 − b0,n−1 = 0,
1/2 + (1/6) a0,n−1 − (1/6) a2,n−1 − b2,n−1 + b0,n−1 = 0,
−1/6 − (1/24) a0,n−1 − (1/24) a2,n−1 − (1/2) b2,n−1 − (1/2) b0,n−1 = 0.  (11)

The solution of the above system of equations for the case b1,n+1 = b1,n = b1,n−1 = 1 is given by:

a2,n+1 = 1/10, a1,n+1 = 4/5, a0,n+1 = −9/10, b2,n+1 = 11/30, b0,n+1 = 1/30,

a2,n = −7/10, a1,n = 12/5, a0,n = −17/10, b2,n = 1/60, b0,n = 11/60,

a2,n−1 = −3/2, a1,n−1 = 4, a0,n−1 = −5/2, b2,n−1 = 1/6, b0,n−1 = −1/6.  (12)

The local truncation error of the above formulae is given by:

LTEn+1 = LTEn = LTEn−1 = −(1/45) h⁵ y(5)n.  (13)


For the application of the first stage of the method (3)-(4), the following formula is also needed:

h y′n = aa1,n yn + aa0,n yn−1 + h² (bb1,n y″n + bb0,n y″n−1),

−aa1,n − aa0,n = 0,
aa0,n + 1 = 0,
−bb1,n − bb0,n − (1/2) aa0,n = 0,
bb0,n + (1/6) aa0,n = 0.  (14)

The solution of the above system of equations is given by:

bb0,n = 1/6, aa0,n = −1, bb1,n = 1/3, aa1,n = 1.  (15)

The local truncation error of the above formula is given by:

LTEn = −(1/24) h⁴ y(4)n.  (16)
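With the coefficients (15), the extra derivative formula reads h y′n = yn − yn−1 + h² (y″n/3 + y″n−1/6). As a quick sanity check (our illustration, not part of the paper), this is exact for cubic polynomials, and on y = x⁴ its defect matches the truncation error (16), namely −h⁴ y(4)n/24 = −h⁴:

```python
def dy_formula(y, ypp, x, h):
    """Formula (14) with coefficients (15):
    h*y'_n ~ y_n - y_{n-1} + h^2 * (y''_n / 3 + y''_{n-1} / 6)."""
    return y(x) - y(x - h) + h**2 * (ypp(x) / 3.0 + ypp(x - h) / 6.0)

x, h = 0.7, 0.1
# On a cubic the formula is exact:
cubic = dy_formula(lambda t: t**3 - 2 * t**2 + 5, lambda t: 6 * t - 4, x, h)
# On x^4 (where y(4) = 24) the defect h*y' - formula equals -h^4:
quartic = dy_formula(lambda t: t**4, lambda t: 12 * t**2, x, h)
```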

4 Problems used in Numerical Illustrations

In this section we present some problems chosen in order to obtain numerical results which illustrate the performance of our new methods. Due to space restrictions, no numerical results can be presented. Consider the numerical integration of the Schrodinger equation (1) using the well-known Woods-Saxon potential (see for example [2], [3]), which is given by

V(r) = Vw(r) = u0/(1 + z) − u0 z/[a(1 + z)²]  (17)

with z = exp[(r − R0)/a], u0 = −50, a = 0.6 and R0 = 7.0. In Figure 1 we give a graph of this potential. In the case of negative eigenenergies (i.e. when E ∈ [−50, 0]) we have the well-known bound-states problem, while in the case of positive eigenenergies (i.e. when E ∈ (0, 1000]) we have the well-known resonance problem (see [2], [3] and [4]).

We apply the new methods to the following problems:

4.1 Resonance Problem

In the asymptotic region, equation (1) effectively reduces to

y″(x) + (k² − l(l + 1)/x²) y(x) = 0,  (18)

for x greater than some value X. The above equation has linearly independent solutions kx jl(kx) and kx nl(kx), where jl(kx), nl(kx) are the spherical Bessel and Neumann functions respectively. Thus the solution of equation (1) has the asymptotic form (when x → ∞)

y(x) ≈ A kx jl(kx) − B kx nl(kx) ≈ D [sin(kx − πl/2) + tan δl cos(kx − πl/2)],  (19)

Fig. 1 The Woods-Saxon potential.

where δl is the phase shift, which may be calculated from the formula

tan δl = [y(x2) S(x1) − y(x1) S(x2)] / [y(x1) C(x2) − y(x2) C(x1)]  (20)

for x1 and x2 distinct points in the asymptotic region (we take x1 as the right-hand end point of the interval of integration and x2 = x1 − h, where h is the stepsize), with S(x) = kx jl(kx) and C(x) = kx nl(kx).

Since the problem is treated as an initial-value problem, one needs y0 and y1 before starting a two-step method. From the initial condition, y0 = 0. The value y1 is computed using the Runge-Kutta-Nystrom 12(10) method of Dormand et al. [9]-[10]. With these starting values, we evaluate the phase shift δl from the above relation at the point x1 of the asymptotic region.
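This reference computation can be sketched as follows for l = 0 (our illustration: the starting value y1 is taken as a simple small number rather than the authors' RKN 12(10) starter, which only changes the overall normalization, not the phase shift; the function name and step size are assumptions):

```python
import math

def phase_shift_l0(V, E, h=0.005, xmax=15.0):
    """Integrate y'' = (V(x) - E) y, y(0) = 0, with the classical Numerov
    scheme, then extract delta_0 from formula (20); for l = 0,
    kx*j_0(kx) = sin(kx) and kx*n_0(kx) = -cos(kx), so
    S(x) = sin(kx) and C(x) = -cos(kx)."""
    k = math.sqrt(E)
    f = lambda x: V(x) - E                 # y'' = f(x) * y
    n = int(round(xmax / h))
    y_prev, y_cur = 0.0, h                 # y(0) = 0; y(h) small, nonzero
    for i in range(1, n):
        xp, xc, xnxt = (i - 1) * h, i * h, (i + 1) * h
        y_next = ((2.0 * y_cur * (1.0 + 5.0 * h * h * f(xc) / 12.0)
                   - y_prev * (1.0 - h * h * f(xp) / 12.0))
                  / (1.0 - h * h * f(xnxt) / 12.0))
        y_prev, y_cur = y_cur, y_next
    x1, x2 = n * h, (n - 1) * h            # two points in the asymptotic region
    y1, y2 = y_cur, y_prev
    S = lambda x: math.sin(k * x)
    C = lambda x: -math.cos(k * x)
    tan_d = (y2 * S(x1) - y1 * S(x2)) / (y1 * C(x2) - y2 * C(x1))
    return math.atan(tan_d)
```

With V ≡ 0 the exact solution is proportional to sin(kx) and the computed phase shift vanishes; with the Woods-Saxon potential (17) one scans δ(E) over the energy range of interest.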

4.1.1 The Woods-Saxon Potential

As a test for the accuracy of our methods, we consider the numerical integration of the Schrodinger equation (1) with l = 0 in the well-known case where the potential V(r) is the Woods-Saxon potential (17).


One can investigate the problem considered here following two procedures. The first procedure consists of finding the phase shift δ(E) = δl for E ∈ [1, 1000]. The second consists of finding those E, for E ∈ [1, 1000], at which δ equals π/2. In our case we follow the first procedure, i.e. we try to find the phase shifts for given energies. The obtained phase shift is then compared to the analytic value of π/2.

The above problem is the so-called resonance problem when the positive eigenenergies lie under the potential barrier. We solve this problem using the technique fully described in [5].

The boundary conditions for this problem are:

y(0) = 0,

y(x) ∼ cos[√Ex] for large x.

The domain of numerical integration is [0, 15].

4.2 The Bound-States Problem

For negative energies we solve the so-called bound-states problem, i.e. equation (1) with l = 0 and boundary conditions given by

y(0) = 0,

y(x) ∼ exp(−√−Ex) for large x.

In order to solve this problem numerically, we use a strategy which was proposed by Cooley [8] and improved by Blatt [7]. This strategy involves integrating forward from the point x = 0, backward from the point xb = 15, and matching the solution at some internal point in the range of integration. As initial conditions for the backward integration we take:

y(xb) = exp(−√−E xb) and y(xb − h) = exp[−√−E (xb − h)],  (21)

where h is the steplength of integration of the numerical method.

References

[1] I. Prigogine, Stuart Rice (Eds): Advances in Chemical Physics Vol. 93: New Methods in Computational Quantum Mechanics, John Wiley & Sons, 1997.
[2] T.E. Simos, Atomic Structure Computations, in Chemical Modelling: Applications and Theory (Editor: A. Hinchliffe, UMIST), The Royal Society of Chemistry, 38-142 (2000).
[3] T.E. Simos, Numerical methods for 1D, 2D and 3D differential equations arising in chemical problems, Chemical Modelling: Application and Theory, The Royal Society of Chemistry, 2 (2002), 170-270.
[4] L.Gr. Ixaru and M. Rizea, A Numerov-like scheme for the numerical solution of the Schrodinger equation in the deep continuum spectrum of energies, Comput. Phys. Commun. 19, 23-27 (1980).
[5] M.M. Chawla, Numerov made explicit has better stability, BIT 24, 117-118 (1984).
[6] M.M. Chawla and P.S. Rao, A Numerov-type method with minimal phase-lag for the integration of second order periodic initial-value problems. II: Explicit method, J. Comput. Appl. Math. 15, 329-337 (1986).
[7] J.M. Blatt, Practical points concerning the solution of the Schrodinger equation, Journal of Computational Physics, 1, 382-396 (1967).
[8] J.W. Cooley, An improved eigenvalue corrector formula for solving Schrodinger's equation for central fields, Math. Comp. 15, 363-374 (1961).
[9] J.R. Dormand, M.E. El-Mikkawy and P.J. Prince, Families of Runge-Kutta-Nystrom formulae, IMA Journal of Numerical Analysis 7, 423-430 (1987).
[10] J.R. Dormand, M.E.A. El-Mikkawy and P.J. Prince, High-Order Embedded Runge-Kutta-Nystrom Formulae, IMA J. Numer. Anal. 7, 595-617.


NACoM-2003 Extended Abstracts 145 – 149

Difference Schemes for the Class of Singularly PerturbedBoundary Value Problems

Ismail R. Rafatov∗1 and Sergey N. Sklyar2

1 CWI, P.O. Box 94079, 1090 GB Amsterdam, The Netherlands
2 AUCA, 205 Abdymomunova st., 720000 Bishkek, Kyrgyzstan

Received 28 February 2003, accepted 21 March 2003

The work deals with the construction of difference schemes for the numerical solution of singularly perturbed boundary value problems, which appear in solving heat transfer equations with spherical symmetry. A projective version of the integral interpolation method (PVIIM) is used. The derived schemes approximate the solution of the problem and the derivatives of the solution at the same time. Moreover, they allow boundary conditions of general form to be approximated within the framework of the same method. The new schemes are tested in order to compare them with well-known difference schemes. Estimates for the rates of classical and uniform convergence are carried out.

1 Introduction

Let us consider the following boundary value problem:

(ε²/x²) (x²u′)′ − q(x)u = f(x), x ∈ (0, 1),
u′(0) = 0,  ξu(1) + ηεu′(1) = ψ.  (1)

We assume the functions q, f in (1) to be sufficiently smooth, and additionally the conditions

q(x) ≥ q0 > 0 for x ∈ [0, 1], (2)

ε ∈ (0, 1] , ξ ≥ 0, η ≥ 0, ξ + η > 0 (3)

are satisfied. The numerical solution of such problems (so-called singularly perturbed problems) requires special difference schemes which guarantee uniform convergence of the approximate solution to the exact one [1]. There are two fundamental ways to construct uniformly convergent numerical algorithms for singularly perturbed boundary problems. The first uses the construction of "special" difference schemes on uniform grids and starts from A.M. Ilyin's investigation [2]. The second is based on the use of nonuniform grids adapted to the properties of the solution, and is historically related to N.S. Bakhvalov's name [3]. PVIIM [4], [5], which permits combining both methods, is used in our paper. First, this method of discretization automatically preserves properties of the original differential problem, so the constructed schemes are of the special type. Second, in the framework of the proposed method, an algorithm for grid adaptation can be realized. Furthermore, the method allows the solution as well as its derivatives to be approximated at the same time.

∗ Corresponding author: e-mail: [email protected], Phone: +31 20 592 4077, Fax: +31 20 592 4199

© 2003 WILEY-VCH Verlag GmbH & Co. KGaA, Weinheim


146 I. Rafatov and S. Sklyar: Difference Schemes for Singularly Perturbed BVPs

New difference schemes for problem (1), which converge uniformly in ε with first order, were developed in [6] on the basis of PVIIM. Our aim is the construction of schemes of second order of uniform convergence on an arbitrary irregular grid.

Throughout the paper we assume that problem (1) has a unique solution from the class C¹[0, 1] ∩ C²(0, 1). Let the operator L of problem (1) be defined by the representations

Lv(0) ≡ −εv′(0),
Lv(x) ≡ −(ε²/x²) (x²v′)′ + q(x)v, x ∈ (0, 1),
Lv(1) ≡ ξv(1) + ηεv′(1),

for functions v from the above class. Using corresponding methods from [7] we can prove that L is an operator of monotonic type.

The following statement guarantees uniform boundedness (with respect to ε) of the solution of problem (1) (see [6]):

|u(x)| ≤ max_{0≤y≤1} |f(y)|/q0 + |ψ| (ξ + ηq0/(3 + √q0))⁻¹.

2 Difference schemes

Let us describe the idea of the method briefly. We introduce a grid 0 = x1 < x2 < ... < xN = 1 on the interval [0, 1] and denote hi ≡ xi+1 − xi (i = 1, 2, ..., N − 1), h ≡ max_{1≤i≤N−1} hi. Let v^h ≡ {v^h_i}_{i=1}^N denote a grid function with the corresponding norm

‖v^h‖_{h,∞} ≡ max_{1≤i≤N} |v^h_i|.

Moreover, we denote by (v)^h ≡ {v(xi)} ≡ {vi}_{i=1}^N the projection of a continuous function v(x) onto the grid. Let constants q̄ and f̄ approximate the functions q(x) and f(x) on the interval [xi, xi+1]. Multiplying equation (1) by −x²v(x), where v(x) is a sufficiently smooth test function, then integrating by parts on [xi, xi+1] and inserting the values q̄ and f̄ in the integral identity, we obtain:

[−ϕεx²v + uε²x²v′]_{xi}^{xi+1} + ∫_{xi}^{xi+1} u [−ε² (x²v′)′ + q̄x²v] dx = −f̄ ∫_{xi}^{xi+1} x²v dx + δ(xi, xi+1),

δ(xi, xi+1) ≡ ∫_{xi}^{xi+1} {f̄ − f(x) + [q̄ − q(x)] u(x)} x²v dx.  (4)

Here we denote ϕ(x) ≡ εu′(x). We choose test functions v(0)(x) and v(1)(x) in identity (4) according to

−ε² (x²v′)′ + q̄x²v = 0, x ∈ (xi, xi+1),  (5)

xv(0)|_{x=xi} = 1, xv(0)|_{x=xi+1} = 0,
xv(1)|_{x=xi} = 0, xv(1)|_{x=xi+1} = 1.  (6)

The solutions of problems (5), (6) can easily be found. Substituting q̄ = q(0), f̄ = f(0), v = v(0), and analogously q̄ = q(1), f̄ = f(1), v = v(1), into (4), and neglecting the errors of approximation δ(0)(xi, xi+1) and δ(1)(xi, xi+1), we obtain after some transformations the following discrete problem, corresponding to (1):

ϕ^h_1 = 0,

ε x_i ϕ^h_i − ε² x_{i+1} Du^h_i + h_i q^(0)_i [γ(R^(0)_i) x_{i+1} u^h_{i+1} + µ(R^(0)_i) x_i u^h_i] = −f^(0)_i σ^(0)_i,

−ε x_{i+1} ϕ^h_{i+1} + ε² x_i Du^h_i + h_i q^(1)_i [µ(R^(1)_i) x_{i+1} u^h_{i+1} + γ(R^(1)_i) x_i u^h_i] = −f^(1)_i σ^(1)_i,  (i = 1, 2, ..., N − 1)

ξ u^h_N + η ϕ^h_N = ψ.  (7)

Here u^h ≡ {u^h_i}_{i=1}^N and ϕ^h ≡ {ϕ^h_i}_{i=1}^N approximate the unknown grid functions (u)^h and (ϕ)^h, respectively, and we denote

R^(k)_i ≡ h_i √(q^(k)_i)/ε, k = 0, 1,

µ(z) = (z coth z − 1)/z², γ(z) = (1 − z/sinh z)/z²,

Du^h_i ≡ (u^h_{i+1} − u^h_i)/h_i,

σ^(0)_i ≡ h_i [γ(R^(0)_i) x_{i+1} + µ(R^(0)_i) x_i],
σ^(1)_i ≡ h_i [µ(R^(1)_i) x_{i+1} + γ(R^(1)_i) x_i], i = 1, 2, ..., N − 1.

Excluding the values ϕ^h_i (i = 1, 2, ..., N − 1) from equations (7), we can rewrite this problem in the traditional three-point form. The difference scheme (7) converges uniformly in ε with first order on any irregular grid (see [6]).

Let us now present one possible variant of a refinement of scheme (7). After transforming the local errors of approximation of (7) by separating the main asymptotic terms (k = 0, 1):

δ^(k)(x_i, x_{i+1}) = ∫_{xi}^{xi+1} [D(f)^h_i + u_0(x) D(q)^h_i] (x_{i+1/2} − x) x² v^(k)_i(x) dx + δ̄^(k)(x_i, x_{i+1}),

δ̄^(k)(x_i, x_{i+1}) ≡ ∫_{xi}^{xi+1} {(x_{i+1/2} − x) ∆(x) D(q)^h_i + ω_i(x) [f″(ρ_{1,i}) + u(x) q″(ρ_{2,i})]} x² v^(k)_i(x) dx,  (8)

and substituting into the corresponding equations in (7) all terms of δ^(0) and δ^(1) from (8), with the exception of δ̄^(0) and δ̄^(1), we obtain the modified difference scheme:

ϕ^h_1 = 0,

ε x_i ϕ^h_i − ε² x_{i+1} (R_{i+1/2}/sinh R_{i+1/2}) Du^h_i + S^(0)_i[q] u^h_i = −S^(0)_i[f] + T^(0)_i,

−ε x_{i+1} ϕ^h_{i+1} + ε² x_i (R_{i+1/2}/sinh R_{i+1/2}) Du^h_i + S^(1)_i[q] u^h_{i+1} = −S^(1)_i[f] + T^(1)_i,  (i = 1, 2, ..., N − 1)

ξ u^h_N + η ϕ^h_N = ψ.  (9)

Here S^(0)_i, S^(1)_i and T^(0)_i, T^(1)_i are introduced for brevity of the formulas. Excluding the values ϕ^h_i (i = 1, 2, ..., N − 1) from equations (9), we can write down the discrete problem for the grid function u^h alone.

The following statement, which contains an estimate of convergence for the family of schemes (9), may be proved.


Theorem 2.1 Assume that problem (1) satisfies conditions (2), (3) and q, f ∈ C²[0, 1]. Then the difference problem (9) has a unique solution u^h, and for this solution and the solution u(x) of problem (1) the estimate ‖u^h − (u)^h‖_{h,∞} ≤ Ch² holds, where C is a constant independent of ε and h.

Note that the method for constructing difference schemes used in this section may be generalized to systems of equations.

3 Numerical example

The experiments deal with the calculation of the orders of uniform and classical convergence according to the following algorithm (see also [1], [11]). Let vε(x) be the solution of the differential problem (1), which depends on the parameter ε and is determined on the interval [0, 1]. Let v^h_ε ≡ {v^h_{ε,i}}_{i=1}^N be the grid function that approximates vε(xi) at the nodes of the uniform grid xi = (i − 1)h (i = 1, 2, ..., N, N = 1/h + 1) and is calculated for h ∈ H ≡ {h0/2^j | j = 0, 1, ..., k} and ε ∈ E ≡ {ε0/2^j | j = 0, 1, ..., m}. Let us denote

δ(h, ε) ≡ ‖v^h_ε − (vε)^h‖_{h,∞},  ∆(h) ≡ ∆(h, v) ≡ max_{ε∈E} δ(h, ε).

The experimental orders of uniform and classical convergence ("p" and "p0") are determined by the formulas

p = ln[(1/k) Σ_{j=0}^{k−1} ∆(h0/2^j)/∆(h0/2^{j+1})] / ln 2,  (10)

p0 = ln[(1/k) Σ_{j=0}^{k−1} δ(h0/2^j, ε0)/δ(h0/2^{j+1}, ε0)] / ln 2,  (11)

for the values h0 = 1/8, ε0 = 1/2, k = 7, m = 8. Note that in the case of piecewise constant q(x) and f(x), the schemes (7) derived in Section 2 lead to the exact solution of problem (1).

The quantities (10) and (11) are calculated for vε(x) ≡ u(x) and vε(x) ≡ ϕ(x) = εu′(x). Samarskii's well-known scheme [8] and scheme (7) with two different approximations of f^(k)_i and q^(k)_i ((a): at the nodes and (b): at the centers of the cells) are tested here. In the case of Samarskii's scheme, the boundary values of the derivatives are calculated using the directed (right-point) difference and the formula

u′(0) = h (q(0)u(0) + f(0))/6.

Table: The experimental orders of convergence

Method          | u(x) uniform | u(x) classical | ϕ(x) uniform | ϕ(x) classical
Samarskii [8]   |     0.30     |      1.10      |     0.23     |      0.98
(7), (a)        |     1.22     |      2.00      |     0.84     |      1.97
(7), (b)        |     1.06     |      1.99      |     0.99     |      1.98

The analysis of the Table allows us to conclude that the numerical experiment confirms the statement about the uniform convergence (with first order) of the solution of the difference problem (7) to the solution of the initial continuous problem (1). Moreover, as a result of this experiment, a hypothesis about the uniform convergence (with first order) of the fluxes can be formulated.


References

[1] Doolan E.P., Miller J.J.H., Schilders W.H.A., Uniform numerical methods for problems with initial and boundary layers (Boole Press, Dublin, 1980).
[2] Ilyin A.M., Matem. Zametki 6, 237-248 (1969).
[3] Bakhvalov N.S., J. Comput. Math. Math. Phys. 9, 842-859 (1969).
[4] Sklyar S.N., in: Proceedings of the International Conference AMCA-95, Novosibirsk, 1995, edited by A.S. Alekseev and N.S. Bakhvalov (NCC Publisher, Novosibirsk, 1995), p. 380.
[5] Sklyar S.N. and Bakirov J.J., Izvestiya Acad. Nauk Kyrgyz. 2-3, 36-47 (1997).
[6] Sklyar S.N. and Rafatov I.R., J. Comput. Math. Math. Phys. 42, 1397-1407 (2002).
[7] Collatz L., Functional analysis and numerical mathematics (Mir, Moscow, 1969).
[8] Samarskii A.A., The theory of difference schemes (Nauka, Moscow, 1983).
[9] Samarskii A.A. and Nikolaev E.S., Numerical methods for grid equations (Nauka, Moscow, 1978).
[10] Ortega J. and Rheinboldt W., Iterative solution of nonlinear equations in several variables (Mir, Moscow, 1975).
[11] Farrell P.A., IMA J. Numer. Anal. 7, 459-472 (1987).

NACoM-2003 Extended Abstracts 150 – 153

The Design and Implementation of a New Out-of-Core Sparse Cholesky Factorization Method

Vladimir Rotkin 1 and Sivan Toledo ∗1

1 School of Computer Science, Tel-Aviv University, Tel-Aviv 69978, Israel.

Received 28 February 2003, accepted 21 March 2003

This work describes a new out-of-core sparse Cholesky factorization method and its implementation. The method is based on a supernodal left-looking factorization [6, 7], which allows it to fully exploit the sparsity of the matrix and the capabilities of the hardware.

1 Introduction

Out-of-core sparse direct solvers can solve large linear systems on machines with limited main memory by storing the factors of the coefficient matrix on disks. Disks are cheap (approximately 50–100 times cheaper than main memory per megabyte), so it is practical and cost-effective to equip a machine with tens or hundreds of gigabytes of disk space. As we show in this paper, and as others have shown [1, 2, 3, 4, 8, 9], access time to data on slow disks does not significantly slow down properly designed sparse factorization algorithms. The ability to store their data structures on disks makes out-of-core sparse direct solvers very reliable: like in-core direct solvers, they usually do not suffer from numerical problems, and unlike in-core solvers, they are less likely to run out of memory, since most of their data structures are stored on disks.

The main challenge in designing out-of-core solvers of any kind lies in minimizing I/O (input-output). Disks are slow, and it is essential to perform as little I/O as possible in order to achieve near-in-core performance levels. In addition, in order to achieve high performance it is essential to fully exploit the sparsity of the coefficient matrix and the capabilities of the computer's architecture (i.e., to exploit caches and the processor's functional units).

A new technique for minimizing I/O represents the main algorithmic novelty in our new method. This technique, first introduced in a pivoting LU factorization algorithm by Gilbert and Toledo [4], minimizes I/O by exploiting the elimination tree, a combinatorial structure that compactly represents dependencies in the factorization process. In this research we apply this technique to a supernodal Cholesky factorization (the Gilbert-Toledo LU factorization code is not supernodal, so it is relatively slow).
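The elimination tree mentioned above can be computed in nearly linear time from the nonzero pattern alone; the sketch below uses the standard path-compression algorithm (a textbook method shown for illustration, not the authors' TAUCS code):

```python
def elimination_tree(n, rows):
    """Elimination tree of an n-by-n sparse symmetric matrix from its
    nonzero pattern: rows[k] lists the columns j < k with A[k, j] != 0.
    Standard algorithm with path compression; parent[j] == -1 marks a root."""
    parent = [-1] * n
    ancestor = [-1] * n
    for k in range(n):
        for j in rows[k]:
            # climb from j toward the root, redirecting ancestors to k
            while j != -1 and j < k:
                jnext = ancestor[j]
                ancestor[j] = k          # path compression
                if jnext == -1:
                    parent[j] = k        # first k that eliminates into j's path
                j = jnext
    return parent

# Tridiagonal pattern: the tree degenerates to the chain 0 -> 1 -> 2 -> 3.
print(elimination_tree(4, [[], [0], [1], [2]]))   # -> [1, 2, 3, -1]
```

In a left-looking supernodal factorization, scheduling updates subtree by subtree along this tree is what lets the solver keep a working set in main memory and bound its I/O.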

Our code, implementing the new algorithm, is robust in that it copes better with small main memories than all previous solvers. The code is freely available as part of the TAUCS subroutine library [see http://www.tau.ac.il/~stoledo/taucs]. The library, which also contains in-core sparse direct solvers and in-core sparse iterative solvers, is portable, widely used, and well tested.

Due to space restrictions, we cannot describe the algorithm in detail. For the purpose of this work, it suffices to say that the new algorithm is an enhancement of the algorithm of Rothberg and Schreiber [9], which uses the technique of Gilbert and Toledo [4] to reduce I/O. The code is implemented in C; it uses

∗ Corresponding author. e-mail: [email protected], Phone: +972 3 640 5285, Fax: +972 3 640 9357. This research was supported in part by an IBM Faculty Partnership Award, by an IBM Equinox Equipment Award, and by grants 572/00 and 9060/99 from the Israel Science Foundation (founded by the Israel Academy of Sciences and Humanities).

c© 2003 WILEY-VCH Verlag GmbH & Co. KGaA, Weinheim


Table 1  The matrices that we used to evaluate the algorithm. The first three are from the PARASOL test-matrix collection, and the others are the Laplacians of regular 3D meshes of the dimensions shown. For each matrix, the table shows its dimension, the number of nonzeros in its lower triangle, the number of nonzeros in the lower triangle of its Cholesky factor (using METIS reordering prior to the factorization), the size of its factor file in bytes, the number of floating-point operations performed during the factorization, and the time (in seconds) to factor it using our algorithm on a 192 MB machine and on a 768 MB machine.

Matrix         dim(A)    nnz(A)    nnz(L)    bytes(L)  flops     T192   T768
inline1        5.05e+05  1.87e+07  1.76e+08  1.59e+09  1.52e+11  270    201
ldoor          9.52e+05  2.37e+07  1.43e+08  1.29e+09  7.39e+10  250    143
audikw1        9.44e+05  3.93e+07  1.26e+09  1.11e+10  5.95e+12  7629   4570
40-40-40       6.40e+04  2.51e+05  1.42e+07  1.36e+08  1.55e+10  23     21
50-50-50       1.25e+05  4.93e+05  3.83e+07  3.55e+08  6.69e+10  102    73
60-60-60       2.16e+05  8.53e+05  8.66e+07  7.95e+08  2.19e+11  310    208
70-70-70       3.43e+05  1.36e+06  1.70e+08  1.54e+09  5.84e+11  819    569
80-80-80       5.12e+05  2.03e+06  2.95e+08  2.59e+09  1.30e+12  1852   1290
90-90-90       7.29e+05  2.89e+06  4.89e+08  4.23e+09  2.72e+12  4254   2474
100-100-100    1.00e+06  3.97e+06  7.70e+08  6.57e+09  5.31e+12  9456   4508
110-110-110    1.33e+06  5.29e+06  1.16e+09  9.80e+09  9.54e+12  10863  8077
120-120-120    1.73e+06  6.87e+06  1.72e+09  1.44e+10  1.71e+13  55754  16295
130-130-130    2.20e+06  8.74e+06  2.40e+09  2.00e+10  2.76e+13  -      25234
140-140-140    2.74e+06  1.09e+07  3.28e+09  2.72e+10  4.37e+13  -      42840
500-30-30      4.50e+05  1.77e+06  1.18e+08  1.07e+09  1.62e+11  266    181
500-40-40      8.00e+05  3.16e+06  3.05e+08  2.73e+09  7.23e+11  1017   704
500-50-50      1.25e+06  4.95e+06  6.43e+08  5.71e+09  2.37e+12  3461   2164
500-60-60      1.80e+06  7.14e+06  1.14e+09  9.91e+09  5.87e+12  10152  5070
500-70-70      2.45e+06  9.73e+06  1.87e+09  1.59e+10  1.25e+13  -      10422
500-80-80      3.20e+06  1.27e+07  2.84e+09  2.38e+10  2.39e+13  -      20348
500-90-90      4.05e+06  1.61e+07  4.13e+09  3.44e+10  4.31e+13  -      39951
500-100-100    5.00e+06  2.00e+07  5.82e+09  4.82e+10  7.50e+13  -      79084

the BLAS and LAPACK. The factor is written out to disk in dense blocks (supernodes), which reduces the storage overhead.

The rest of this abstract summarizes our experimental results and discusses the benefits of the solver. For complete details of the algorithm, the data structures, and the full experimental results, see our full manuscript at http://www.tau.ac.il/~stoledo/Pubs/oocspch.pdf.

2 Experimental Results

This section summarizes our experimental results. The matrices and the most important solver statistics are shown in Table 1. The matrices we used include test matrices from the PARASOL test-matrix collection, as well as matrices generated from regular 7-point discretizations of the Poisson equation in 3D. The graphs of the latter are regular 3D meshes, either N-by-N-by-N meshes or 500-by-N-by-N meshes. The PARASOL matrices are the only test matrices that we could find that cannot be factored in-core on a machine with 768 MB of main memory.

We performed the experiments on two IBM Intel-based workstations, which are identical except for the amount of main memory they have. Both machines have a 2 GHz Pentium 4 processor. One machine has 768 MB of main memory, and the other has 192 MB. The 768 MB configuration is more typical of the machines we expect users to use. The 192 MB machine is used mostly to show how the algorithm behaves with very little main memory. On both machines, a 75 GB IDE disk was used for swap space and to store the Cholesky factors. The disk has an I/O bandwidth of about 36 MB/s at the fast end, which degrades to about 18 MB/s at the slow end.

The machines run Linux with a 2.4.19 kernel. We used ATLAS version 3.4.1 [10] by Clint Whaley and others for the BLAS. Using these BLAS routines and these machines, our in-core multifrontal sparse factorization code factors matrices whose graphs are 3D meshes at a rate of approximately 1.6×10^9 flops per second.


[Figure 1: two panels. Left, "Factorization Times": factorization time in hours versus the number of nonzeros in L, with curves OOC (192 MB) and OOC (768 MB). Right, "Paging vs. Explicit OOC": floating-point operations per second versus the number of nonzeros in L, with curves IC MF (192 MB), IC LL (192 MB), OOC (192 MB), and OOC (768 MB); data points are labeled with the matrix names of Table 1.]

Fig. 1  Left: Out-of-core factorization times, as a function of the number of nonzeros in the factor. Right: Performance of the out-of-core factorizations. The figure shows the performance of the out-of-core factorization on two machines with different amounts of memory, as well as the performance of two in-core algorithms that use the operating system's demand-paging mechanism.

We used METIS [5] version 4.0 to symmetrically reorder the rows and columns of the matrices prior to factoring them.

Figure 1 and Table 1 provide an overview of the performance of the solver. The data shows that on a machine with 768 MB of main memory, our algorithm can factor matrices whose factors have 3-4 billion nonzeros overnight (10-12 hours), and matrices whose factors have about 6 billion nonzeros in around 24 hours. The data also shows that the algorithm can factor matrices whose factors have around one billion nonzeros in an hour or two.

Figure 1 (right) shows the performance of the algorithm in floating-point operations per second. On the machine with 768 MB of main memory, the algorithm usually achieves a performance of 0.8-1 Gflop/s. This level of performance is about 50% of the performance of the in-core multifrontal algorithm on small matrices. The most important conclusion from this data is that the performance of the algorithm is acceptable despite the large amount of I/O that it performs.

The data in Figure 1 also shows that the amount of main memory has a significant impact on performance. The algorithm achieves significantly higher performance on the machine with 768 MB of main memory than on the machine with only 192 MB. This is an expected consequence of the fact that more memory allows the algorithm to use larger subtrees and to perform less I/O. Another memory-size-related phenomenon that is evident in Figure 1 is that as matrices grow, performance declines, although not dramatically. This is evident only on the 192 MB machine, on which large matrices leave little memory for subtrees of the factor: the amount of memory left for subtrees shrinks, and I/O activity increases. On the machine with 768 MB, none of these matrices fill a substantial amount of the main memory, so this behavior does not occur, although it would occur on even larger matrices.

Another important conclusion that we can draw from Figure 1 is that an explicit out-of-core algorithm is far more efficient than an in-core algorithm that uses the operating system's demand-paging mechanism. On large matrices, the in-core algorithms are slower by factors of up to 3. The difference is probably due to the better locality in the explicit out-of-core algorithm, which builds a schedule that is specifically designed for the machine's memory size. Note that the swap area (page file) was stored on the fastest area of the same disk that was also used to store the out-of-core factors. Obviously, we can factor in-core only matrices whose factors are smaller than 2 GB on these 32-bit machines, so the out-of-core algorithm has justification beyond performance.


3 Discussion and Conclusions

Our out-of-core algorithm can solve enormous systems within reasonable amounts of time on typical workstations. The out-of-core algorithm avoids the need for still-rare and still-expensive 64-bit workstations with tens of gigabytes of main memory, or for a cluster, one of which would be necessary for solving such systems in core.

The scaling behavior of sparse direct triangular factorization is superlinear, as known from theory and as demonstrated by our experiments. But while the asymptotic behavior is superlinear, the constants involved are small, which allows us to solve huge linear systems within reasonable amounts of time. In particular, our algorithm can solve a matrix such as AUDIKW, the largest matrix in any test-matrix collection, whose dimension is almost a million, in about an hour and 15 minutes. The algorithm can factor a matrix whose graph is a 140-by-140-by-140 mesh with over 2.7 million vertices in about 12 hours. We believe that the size of linear systems that our algorithm can solve on typical workstations overnight or during a lunch break makes the algorithm useful.

The algorithm is highly reliable, in that the amount of main memory it needs is proportional to n, the dimension of the system, and not to the number of nonzeros in A or in L. This implies that our algorithm can handle almost any matrix that any solver, whether iterative or direct, can solve.

The running time and storage requirements of the algorithm are predictable. The symbolic elimination phase determines the size of the factor and the number of floating-point operations needed to compute it. Given that with sufficient main memory the algorithm operates within a fairly narrow floating-point-operations-per-second range, the running time can be predicted to within a factor of about 2. This feature allows the user to decide whether to spend the computation time and disk space on a given problem before significant resources have been used. Since the column counts in L can be computed even before the symbolic elimination, this information can be made available to the user very quickly.
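This kind of prediction can be sketched in a few lines; the 0.8-1.0 Gflop/s bracket below is taken from the performance range quoted earlier for the 768 MB machine, and the flop count is the one Table 1 reports for the 500-100-100 mesh (an illustrative calculation, not part of the solver):

```python
def predict_time_hours(flops, low_rate=0.8e9, high_rate=1.0e9):
    """Bracket the factorization time from the operation count produced by
    the symbolic phase, assuming a sustained rate between low_rate and
    high_rate flops per second (the rates here are assumptions)."""
    return flops / high_rate / 3600.0, flops / low_rate / 3600.0

# Flop count of the 500-100-100 mesh from Table 1.
lo, hi = predict_time_hours(7.50e13)
print(f"{lo:.1f} to {hi:.1f} hours")   # -> 20.8 to 26.0 hours
```

The measured time for that matrix, 79084 s (about 22.0 hours), falls inside this bracket.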

Acknowledgements  Thanks to Ed Rothberg for discussions regarding the details of his sparse Cholesky factorization algorithms. Thanks to Didi Bar-David for configuring the operating system on the test machines.

References

[1] Petter E. Bjørstad. A large scale, sparse, secondary storage, direct linear equation solver for structural analysis and its implementation on vector and parallel architectures. Parallel Computing, 5(1):3-12, 1987.
[2] Alan George and Hamza Rashwan. Auxiliary storage methods for solving finite element systems. SIAM Journal on Scientific and Statistical Computing, 6(4):882-910, 1985.
[3] J. A. George, M. T. Heath, and R. J. Plemmons. Solution of large-scale sparse least squares problems using auxiliary storage. SIAM Journal on Scientific and Statistical Computing, 2(4):416-429, 1981.
[4] John R. Gilbert and Sivan Toledo. High-performance out-of-core sparse LU factorization. In Proceedings of the 9th SIAM Conference on Parallel Processing for Scientific Computing, San Antonio, Texas, 1999. 10 pages on CDROM.
[5] George Karypis and Vipin Kumar. A fast and high quality multilevel scheme for partitioning irregular graphs. SIAM Journal on Scientific Computing, 20:359-392, 1998.
[6] Esmond G. Ng and Barry W. Peyton. Block sparse Cholesky algorithms on advanced uniprocessor computers. SIAM Journal on Scientific Computing, 14(5):1034-1056, 1993.
[7] Edward Rothberg and Anoop Gupta. Efficient sparse matrix factorization on high-performance workstations: exploiting the memory hierarchy. ACM Transactions on Mathematical Software, 17(3):313-334, 1991.
[8] Edward Rothberg and Robert Schreiber. An alternative approach to sparse out-of-core factorization. Presented at the 2nd SIAM Conference on Sparse Matrices, Coeur d'Alene, Idaho, October 1996.
[9] Edward Rothberg and Robert Schreiber. Efficient methods for out-of-core sparse Cholesky factorization. SIAM Journal on Scientific Computing, 21:129-144, 1999.
[10] R. Clint Whaley and Jack J. Dongarra. Automatically tuned linear algebra software. Technical report, Computer Science Department, University of Tennessee, 1998. Available online at www.netlib.org/atlas.

NACoM-2003 Extended Abstracts 154 – 157

Positive Two-level Difference Schemes for One-dimensional Convection-Diffusion Equations

Joao Santos ∗

Department of Mathematics, University of Aveiro, 3810-193 Aveiro, Portugal

Received 28 February 2003, accepted 21 March 2003

In this paper we study two-level, three- and four-point numerical schemes for solving unsteady one-dimensional convection-diffusion equations dominated by convection. The main objective is to select schemes that produce positive and accurate numerical results. Unconditionally positive methods have maximal order one and, therefore, this compromise is difficult. For the construction of the schemes we start by defining general two-level three- and four-point methods with real parameters. A criterion based on the analysis of the modified equation and on sufficient conditions for positivity is defined and singles out a set of (two-level and three- or four-point) schemes. With the aim of testing certain properties, numerical examples with constant velocity and diffusion are included.

1 Introduction

Many standard numerical methods, when used to discretize convection-diffusion equations dominated by convection, produce oscillatory non-positive solutions. Positivity is frequently a natural requirement of the physical model, so the main objective of this paper is to select schemes that produce positive and accurate numerical results for strongly convective problems.

Several authors have studied this kind of problem. The work of Rigal [3] was the starting point of our work. The present paper presents two-level, three- and four-point schemes with a particular feature, namely positivity, approximating the unsteady linear convection-diffusion equation

u_t + β u_x − α u_xx = 0,   (P)

where β is the constant velocity and α is the diffusion coefficient. The most general two-level, three-point scheme is written

l_{−1} u^{n+1}_{j−1} + l_0 u^{n+1}_j + l_1 u^{n+1}_{j+1} = r_{−1} u^n_{j−1} + r_0 u^n_j + r_1 u^n_{j+1},   (1)

and, in the same way, the most general two-level, four-point scheme approximating (P) (β > 0) is written

l_{−2} u^{n+1}_{j−2} + l_{−1} u^{n+1}_{j−1} + l_0 u^{n+1}_j + l_1 u^{n+1}_{j+1} = r_{−2} u^n_{j−2} + r_{−1} u^n_{j−1} + r_0 u^n_j + r_1 u^n_{j+1}.   (2)

Fundamental properties, like consistency and stability, are established for the general schemes (1) and (2).

For the construction of the schemes we start by defining general numerical methods with real parameters. The process of selecting the parameters is divided into two phases:

∗ e-mail: [email protected], Phone: +351 234 370 359, Fax: +351 234 382 014


• conditions on the parameters are established based on the classical theory of the modified equation approach [5], in such a way that the conditions of consistency and stability are satisfied;

• the parameters that remain free are used to guarantee the positivity under certain conditions.

2 Three-point schemes

2.1 Fundamental properties

Consider the general scheme (1). Rigal established the stability condition in [3] and defined non-oscillatory schemes in [2], to discard schemes which present strongly oscillatory solutions.

2.2 Construction of a positive scheme

We define a general two-level three-point scheme

M_x ( (u^{n+1}_j − u^n_j) / k ) + β [ (1/2 + A_1) L_x u^n_j + (1/2 + A_2) L_x u^{n+1}_j ] − α [ (1/2 + B_1) L_xx u^n_j + (1/2 + B_2) L_xx u^{n+1}_j ] = 0   (3)

where M_x ≡ {δ, 1 − 2δ, δ}, L_x ≡ (1/2h) {−1, 0, 1}, L_xx ≡ (1/h²) {1, −2, 1}, and δ, A_1, A_2, B_1, B_2 are real parameters. For δ = A_1 = A_2 = B_1 = B_2 = 0, the classical Crank-Nicolson scheme is obtained.

2.2.1 Modified-equation approach and positivity

We use the modified-equation approach (Warming and Hyett [5]) to select the parameters. The behaviour of the solutions (the dissipative and dispersive tendencies of a particular scheme) can be analysed from this modified equation. In the modified equation, odd-order derivatives are associated with dispersion properties and even-order derivatives are associated with dissipation properties. We are also interested in the positivity of the scheme.

Using the modified equation approach it is possible to show that the general scheme (3) has as the leading terms in the truncation error

β (A_1 + A_2) u_x − [ α (B_1 + B_2) + β² k A_2 ] u_xx
+ β [ α k (A_2 + B_2) + (h²/6)(1 + A_1 + A_2 − 6δ) + (β² k²/2)(1/6 + A_2) ] u_xxx
− [ α² k B_2 + (α h²/12)(1 + B_1 + B_2 − 12δ) + (β² α k²/2)(1/2 + B_2 + 2A_2) + (β² k h²/6)(1/2 + A_2 − 3δ) + (β⁴ k³/6)(1/4 + A_2) ] u_xxxx.

Using the parameters to eliminate the coefficients of u_x, u_xx, u_xxx in the truncation error, we obtain the scheme (1) with

l_{−1} = (s + C²/2 − C/2) A_2 + 1/6 + C²/12 − C/4 − s/2;    l_0 = −(2s + C²) A_2 + 2/3 − C²/6 + s;
l_1 = (s + C²/2 + C/2) A_2 + 1/6 + C²/12 + C/4 − s/2;    r_{−1} = (s − C²/2 − C/2) A_2 + 1/6 + C²/12 + C/4 + s/2;
r_0 = (−2s + C²) A_2 + 2/3 − C²/6 − s;    r_1 = (s − C²/2 + C/2) A_2 + 1/6 + C²/12 − C/4 + s/2,

where C = βk/h is the Courant number, s = αk/h², and A_2 is a free parameter. This parameter will be used, whenever possible, to guarantee positivity. We recall that Rigal in [3] used all the parameters to eliminate totally or partially the term associated with u_xxxx (the even-order term associated with dissipation properties).

To guarantee that the scheme is stable, non-oscillatory and positive, we must impose conditions on A_2. For each C and R_cell (the cell Reynolds number, a Reynolds number based on the characteristic length, R_cell = C/s) there exists, in general, an admissible interval for A_2 where the properties of stability and positivity are guaranteed. We then present a selection criterion to choose A_2. This procedure allows us to obtain a scheme, the P3-scheme.
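For concreteness, the coefficient formulas above can be transcribed directly; a small Python sketch (the sample values of C, s and A2 are illustrative only), with the built-in sanity check that each stencil sums to one, as consistency requires:

```python
def three_point_coefficients(C, s, A2):
    """Stencil coefficients (l_-1, l_0, l_1) and (r_-1, r_0, r_1) of scheme (1),
    transcribed from the formulas above; C is the Courant number,
    s = alpha*k/h^2, and A2 is the free parameter."""
    l = [(s + C**2 / 2 - C / 2) * A2 + 1/6 + C**2 / 12 - C / 4 - s / 2,
         -(2 * s + C**2) * A2 + 2/3 - C**2 / 6 + s,
         (s + C**2 / 2 + C / 2) * A2 + 1/6 + C**2 / 12 + C / 4 - s / 2]
    r = [(s - C**2 / 2 - C / 2) * A2 + 1/6 + C**2 / 12 + C / 4 + s / 2,
         (-2 * s + C**2) * A2 + 2/3 - C**2 / 6 - s,
         (s - C**2 / 2 + C / 2) * A2 + 1/6 + C**2 / 12 - C / 4 + s / 2]
    return l, r

# Illustrative values; consistency requires each stencil to sum to one.
l, r = three_point_coefficients(C=0.5, s=0.05, A2=0.1)
print(round(sum(l), 12), round(sum(r), 12))   # -> 1.0 1.0
```

Scanning the admissible A2 interval for given C and s then amounts to checking the sign conditions on these six numbers.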

3 Four-point schemes

3.1 Fundamental properties

Consider

K_0 = −l_{−1} − l_1 − 4l_{−2} + (l_1 − l_{−1})² + 4l_{−2}(l_{−2} + l_{−1} − l_1) + r_{−1} + r_1 + 4r_{−2} − (r_1 − r_{−1})² − 4r_{−2}(r_{−2} + r_{−1} − r_1)

K_1 = 4 ( l_{−1} l_1 − r_{−1} r_1 − l²_{−2} + r²_{−2} − l_{−2} l_{−1} + r_{−2} r_{−1} + l_{−2} − r_{−2} + 5 l_{−2} l_1 − 5 r_{−2} r_1 )

K_2 = 16 ( r_{−2} r_1 − l_{−2} l_1 )

Lemma 3.1  The scheme (2) is stable if and only if the coefficients l_i and r_i satisfy

K_0 ≥ 0,
4 K_2 K_0 − K_1² ≥ 0,  if K_2 > 0 and 0 ≤ −K_1 ≤ 2K_2,
K_2 + K_1 + K_0 ≥ 0,  otherwise.

Lemma 3.2  The scheme (2) is non-oscillatory if the coefficients l_i and r_i satisfy

(l_1 − r_1)(l_{−1} − r_{−1}) ≥ 0,  if l_{−2} = r_{−2},
(l_{−2} − r_{−2})(l_0 − r_0) ≥ −(l_{−2} − r_{−2}) [ (l_{−2} − r_{−2}) + (l_{−1} − r_{−1}) ] ≥ 0,  if l_{−2} ≠ r_{−2}.

3.2 Construction of a positive scheme

We define a general two-level four-point scheme


(u^{n+1}_j − u^n_j) / k + β [ (1/2 + A_1) L^{(4)}_x u^n_j + (1/2 + A_2) L^{(4)}_x u^{n+1}_j ] − α [ (1/2 + B_1) L_xx u^n_j + (1/2 + B_2) L_xx u^{n+1}_j ] + β α k [ (1/2 + D_1) L_xxx u^n_j + (1/2 + D_2) L_xxx u^{n+1}_j ] = 0   (4)

where L^{(4)}_x = (1/6h) {1, −6, 3, 2}, L_xx = (1/h²) {1, −2, 1}, L_xxx = (1/h³) {−1, 3, −3, 1}, and A_1, A_2, B_1, B_2, D_1, D_2 are real parameters.

Using a procedure almost similar to the one defined for the three-point schemes, for each C and R_cell there exists, in general, an admissible interval for a parameter that remains free, where the properties of stability and positivity are guaranteed. We present a selection criterion to choose the value of the parameter. This procedure allows us to obtain a scheme, the P4-scheme.

4 Numerical examples

We are interested in pollutant-transport problems, where non-smooth initial solutions are very common. Three numerical tests are considered to compare the behaviour of the solutions produced by the P3- and P4-schemes with the numerical solutions obtained by the most accurate fourth-order scheme presented in [3], the R2 scheme.

In our numerical experiments we verify that, in general, for sharper profiles and strongly convective problems (R_cell >> 1), the schemes P3 and P4 produce more accurate numerical results than the R2 scheme. Moreover, the positivity of the numerical solution is always guaranteed.

Between P3 and P4 we observe a similar behaviour (both in the profiles and in the errors), but the P3-scheme is simpler and "cheaper".

Acknowledgements  It is a pleasure to acknowledge Paula de Oliveira for many kind suggestions and advice.

References

[1] C.A.J. Fletcher, Computational Techniques for Fluid Dynamics 1 (Springer-Verlag, 1996).
[2] A. Rigal, Int. J. Numer. Methods Engng. 30, 307-330 (1990).
[3] A. Rigal, J. Comput. Phys. 114, 59-76 (1994).
[4] C.B. Vreugdenhil, B. Koren (Eds.), Numerical Methods for Advection-Diffusion Problems, Notes on Numerical Fluid Mechanics Vol. 45 (Vieweg, Braunschweig, 1993).
[5] R.F. Warming, B.J. Hyett, J. Comput. Phys. 14, 159-179 (1974).


NACoM-2003 Extended Abstracts 158 – 161

Stability Analysis of High-order Finite Volume Schemes in Turbulent Simulations

Olga Shishkina∗1 and Claus Wagner∗∗2

1,2 DLR - Institute for Aerodynamics and Flow Technology, Bunsenstr. 10, D-37073 Göttingen, Germany

Received 28 February 2003, accepted 21 March 2003

Sufficient conditions for stability are found for a class of finite volume schemes of different orders. From the theorems presented in this work, all known stability conditions for the central high-order "Leap-Frog plus Euler-Forward" scheme follow immediately.

Among the schemes for solving convection-diffusion problems, the central high-order "Leap-Frog plus Euler-Forward" finite volume scheme is still one of the most popular. This fact can be explained by the following advantages of the scheme.

• High stability of the scheme for large values of the Peclet number Pe_j = C_j/D_j, where C_j is the Courant number, C_j = U_j Δt/Δx_j, and D_j is the diffusion number, D_j = μΔt/Δx_j²; here μ > 0 is the diffusion coefficient, U_j is the velocity field, Δx_j is the size of the finite volumes in direction x_j, and Δt is the time step. This property makes it possible to calculate turbulent flows at high Reynolds numbers and to simulate high-Rayleigh-number convection with this scheme.

• The scheme does not suffer from false diffusion.

• It is suitable for cheap explicit calculations.

Consider the following linearized convection-diffusion problem in finite volumes

(φ^{n+1}_{α,β,γ} − φ^{n−1}_{α,β,γ}) / 2 + C_1 (φ^n_{α+1/2,β,γ} − φ^n_{α−1/2,β,γ}) + C_2 (φ^n_{α,β+1/2,γ} − φ^n_{α,β−1/2,γ}) + C_3 (φ^n_{α,β,γ+1/2} − φ^n_{α,β,γ−1/2})
= D_1 Δx_1 ( (φ^{n−1}_{α+1/2,β,γ})′ − (φ^{n−1}_{α−1/2,β,γ})′ ) + D_2 Δx_2 ( (φ^{n−1}_{α,β+1/2,γ})′ − (φ^{n−1}_{α,β−1/2,γ})′ ) + D_3 Δx_3 ( (φ^{n−1}_{α,β,γ+1/2})′ − (φ^{n−1}_{α,β,γ−1/2})′ ),   (1)

where φ^n_{α+1/2,β,γ} denotes a high-order approximation to the true solution at time t = nΔt at the point ((α+1/2)Δx_1, βΔx_2, γΔx_3), i.e. on the cell boundary between the nodes (α, β, γ) and (α+1, β, γ); also, (φ^{n−1}_{α+1/2,β,γ})′ denotes a high-order approximation to the partial derivative of the true solution in direction x_1 at time t = (n−1)Δt at the point ((α+1/2)Δx_1, βΔx_2, γΔx_3). (The notations corresponding to the other directions are constructed analogously.)

∗ e-mail: [email protected], Phone: +49 551 709 2294, Fax: +49 551 709 2404∗∗ e-mail: [email protected], Phone: +49 551 709 2261, Fax: +49 551 709 2404


In order to investigate the stability of the scheme, we assume periodic boundary conditions and express the solution as

φ^n_{α,β,γ} = φ_0 e^{i(ωnΔt − k_1 αΔx_1 − k_2 βΔx_2 − k_3 γΔx_3)},

where i is the imaginary unit, k_1, k_2, and k_3 are the spatial wavenumbers, and ω is the temporal frequency. Further, the amplification factor G,

G = φ^{n+1}_{α,β,γ} / φ^n_{α,β,γ} = e^{iωΔt},

is introduced [1]. We divide (1) by φ^n_{α,β,γ} and get an equation for G. If all the solutions G of this equation satisfy |G| ≤ 1, then the scheme (1) is stable, i.e. the amplitude of the solution of (1) does not grow in time.

Theorem 1.  Suppose that the amplification factor G of a scheme for solving the three-dimensional convection-diffusion problem satisfies the following equation

G² − 2iΠG − S = 0,   (2)

where i is the imaginary unit,

Π ≡ Σ_{j=1}^{3} C_j α(θ_j),   S ≡ 1 − 2 Σ_{j=1}^{3} D_j β(θ_j).   (3)

The real functions α(θ_j) and β(θ_j) satisfy the equalities

β(θ_j) = β(−θ_j),  β(θ_j) ≥ 0;   α(θ_j) = −α(−θ_j);   ∀θ_j ∈ [−π, π], j = 1, 2, 3.   (4)

The following statements are equivalent:

a) the scheme is stable, i.e. for all solutions G of the equation (2),

|G| ≤ 1   ∀θ_j ∈ [−π, π], j = 1, 2, 3;   (5)

b) 2|Π| − 1 ≤ S ≤ 1   ∀θ_j ∈ [−π, π], j = 1, 2, 3;   (6)

c) Σ_{j=1}^{3} ( |C_j α(θ_j)| + D_j β(θ_j) ) ≤ 1   ∀θ_j ∈ [−π, π], j = 1, 2, 3.   (7)
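Condition (5) can be checked against condition (c) by brute force: the roots of the quadratic (2) are G = iΠ ± √(S − Π²). The sketch below samples the angles on a coarse grid (so it can refute, but not prove, stability) and uses the standard second-order symbols α(θ) = sin θ, β(θ) = 2(1 − cos θ); the sample Courant and diffusion numbers are illustrative:

```python
import cmath
import math

def max_amplification(Cs, Ds, alpha, beta, samples=21):
    """Largest |G| over a grid of angles, where G solves the quadratic (2),
    G^2 - 2i*Pi*G - S = 0, with Pi and S built from alpha, beta as in (3)."""
    grid = [-math.pi + 2.0 * math.pi * t / (samples - 1) for t in range(samples)]
    worst = 0.0
    for t1 in grid:
        for t2 in grid:
            for t3 in grid:
                thetas = (t1, t2, t3)
                Pi = sum(c * alpha(t) for c, t in zip(Cs, thetas))
                S = 1.0 - 2.0 * sum(d * beta(t) for d, t in zip(Ds, thetas))
                root = cmath.sqrt(S - Pi * Pi)   # G = i*Pi +/- sqrt(S - Pi^2)
                worst = max(worst, abs(1j * Pi + root), abs(1j * Pi - root))
    return worst

# Second-order symbols; C_j, D_j chosen so that sum(|C_j| + 4 D_j) <= 1.
g = max_amplification((0.2, 0.1, 0.1), (0.05, 0.05, 0.02),
                      math.sin, lambda t: 2.0 * (1.0 - math.cos(t)))
print(g <= 1.0 + 1e-9)   # -> True
```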

Theorem 2.  Suppose that the amplification factor G of a scheme satisfies the relations (2)-(4) and also

max_θ α(θ) Σ_{j=1}^{3} |C_j| + max_θ β(θ) Σ_{j=1}^{3} |D_j| ≤ 1.   (8)

Then the scheme is stable.

If we introduce the notations

C ≡ Σ_{j=1}^{3} |C_j|;   D ≡ Σ_{j=1}^{3} |D_j|;   C_pure ≡ 1 / max_θ α(θ);   D_pure ≡ 1 / max_θ β(θ),

then from Theorem 2 we obtain the following sufficient stability conditions.

Corollary 1.  Suppose that the scheme has the properties (2)-(4). Then the scheme is stable if


• C ≤ C_pure (in the case of pure convection);

• D ≤ D_pure (in the case of pure diffusion);

• C/C_pure + D/D_pure ≤ 1 (in the case of convection and diffusion).

Corollary 2 [1], [2].  If for the central second-order scheme with

φ^n_{α+1/2,β,γ} = (φ^n_{α+1,β,γ} + φ^n_{α,β,γ}) / 2;   (φ^{n−1}_{α+1/2,β,γ})′ = (φ^{n−1}_{α+1,β,γ} − φ^{n−1}_{α,β,γ}) / Δx_1   (9)

(and analogous treatment in the other directions) the requirement

Σ_{j=1}^{3} ( |C_j| + 4D_j ) ≤ 1

is fulfilled, then the scheme is stable.

Proof.  Since the central second-order scheme satisfies (2)-(4) with

α(θ_j) = sin θ_j;   β(θ_j) = 2(1 − cos θ_j),   (10)

the stability of the central second-order scheme follows from Theorem 2. End of proof.

Corollary 3.  When

Σ_{j=1}^{3} ( 1.37|C_j| + 4D_j ) ≤ 1,   (11)

the central scheme of fourth order (for the convective term) and second order (for the diffusive term),

φ^n_{α+1/2,β,γ} = ( −φ^n_{α−1,β,γ} + 7φ^n_{α,β,γ} + 7φ^n_{α+1,β,γ} − φ^n_{α+2,β,γ} ) / 12;   (φ^{n−1}_{α+1/2,β,γ})′ = (φ^{n−1}_{α+1,β,γ} − φ^{n−1}_{α,β,γ}) / Δx_1,   (12)

is stable.

Proof.  The scheme of fourth order (for the convective term) and second order (for the diffusive term)

satisfies (2)-(4) with

α(θ_j) = sin θ_j (4 − cos θ_j) / 3;   β(θ_j) = 2(1 − cos θ_j).   (13)

Since the maximum value of α(θ_j) is reached for cos θ_j = (2 − √6)/2 and equals

max_{θ_j} α(θ_j) = √( (8√6 + 3)/12 ) ≈ 1.37,

the stability of this scheme follows from Theorem 2. End of proof.

Corollary 4.  When

Σ_{j=1}^{3} ( 1.37|C_j| + 5.33 D_j ) ≤ 1,   (14)


the central scheme of fourth order (for the convective term) and second order (for the diffusive term),

φ^n_{α+1/2,β,γ} = ( −φ^n_{α−1,β,γ} + 7φ^n_{α,β,γ} + 7φ^n_{α+1,β,γ} − φ^n_{α+2,β,γ} ) / 12;   (φ^{n−1}_{α+1/2,β,γ})′ = ( φ^{n−1}_{α−1,β,γ} − 15φ^{n−1}_{α,β,γ} + 15φ^{n−1}_{α+1,β,γ} − φ^{n−1}_{α+2,β,γ} ) / (12Δx_1),   (15)

is stable.

Proof.  The scheme of fourth order (for the convective term) and second order (for the diffusive term) satisfies (2)-(4) with

α(θ_j) = sin θ_j (4 − cos θ_j) / 3;   β(θ_j) = ( (4 − cos θ_j)² − 9 ) / 3.   (16)

Since the maximum values of α(θ_j) and β(θ_j) are

max_{θ_j} α(θ_j) = √( (8√6 + 3)/12 ) ≈ 1.37,   max_{θ_j} β(θ_j) = 16/3,

the stability of this scheme follows from Theorem 2. End of proof.

Theorem 3 (main).  Suppose that the amplification factor G of a scheme satisfies the relations (2)-(4) and

also

maxθ β(θ)2

3∑j=1

(Dj +

√D2j +AC2

j

)≤ 1, (17)

where

$$A = \max_{\theta} \frac{\alpha^2(\theta)}{\beta(\theta)\left( \max_{\varphi}\beta(\varphi) - \beta(\theta) \right)}. \qquad (18)$$

Then the scheme is stable.

Corollary 5 [3], [2]. The second order scheme with (9) is stable when

$$\sum_{j=1}^{3}\left( 2D_j + \sqrt{4D_j^2 + C_j^2} \right) \le 1. \qquad (19)$$

Corollary 6 [2]. The scheme with (12) is stable when

$$\sum_{j=1}^{3}\left( 2D_j + \sqrt{4D_j^2 + \frac{25}{9}\,C_j^2} \right) \le 1. \qquad (20)$$

Note that for large Peclet numbers, namely for Pe ≥ 6.09 (the regime of interest in convection simulations), the requirement (11) is weaker than (20).
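The crossover value quoted here can be checked numerically; the sketch below (our own, using the rounded constant 1.37 from (11)) compares the per-direction left-hand sides of (11) and (20) as functions of the cell Peclet number Pe = |C_j|/D_j, with D_j scaled to 1:

```python
import numpy as np

# Per-direction left-hand sides of conditions (11) and (20) with D_j = 1, C_j = Pe.
def lhs_11(pe):
    return 1.37 * pe + 4.0

def lhs_20(pe):
    return 2.0 + np.sqrt(4.0 + (25.0 / 9.0) * pe ** 2)

pe = np.linspace(0.1, 20.0, 200001)
crossover = pe[np.argmin(np.abs(lhs_11(pe) - lhs_20(pe)))]
print(crossover)   # about 6.08: beyond this, (11) demands less than (20)
```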

Corollary 7. The fourth order scheme with (15) is stable when

$$\sum_{j=1}^{3}\left( \frac{8}{3}\,D_j + \sqrt{\frac{64}{9}\,D_j^2 + \frac{20}{9}\,C_j^2} \right) \le 1. \qquad (21)$$

References

[1] U. Schumann, J. Comput. Phys. 18, 465-470 (1975).
[2] P. Wesseling, IMA J. Numer. Anal. 16, 583-598 (1996).
[3] T.F. Chan, SIAM J. Numer. Anal. 21, 272-284 (1984).


A Factorized High Dimensional Model Representation on the Partitioned Random Discrete Data

M. Alper Tunga∗1 and Metin Demiralp∗∗2

1 Department of Computer Science and Engineering, Engineering Faculty, Isık University, Maslak, 34398, Istanbul, Turkey

2 Computational Science and Engineering Program, Informatics Institute, Istanbul Technical University, Maslak, 80626, Istanbul, Turkey

Received 28 February 2003, accepted 21 March 2003

The main purpose of this work is to obtain the general structure of a product type of multivariate function if the values of that function are given randomly at the nodes of a hyperprism. When the dimensionality of multivariate interpolation and the number of the data sets increase unboundedly, many problems can be encountered in the standard numerical methods. Factorized High Dimensional Model Representation (FHDMR) uses the Generalized High Dimensional Model Representation (GHDMR) components to obtain the general structure of the multivariate function. The given random discrete data is partitioned by the GHDMR method. This partitioned data produces a structure for the multivariate function by using the one-variable Lagrange interpolation formula, including only the constant term and the univariate terms of the GHDMR components. Finally, a general structure is obtained via FHDMR by using these components. In this way, the multidimensionality of a multivariate interpolation is approximated by lower dimensional interpolations such as univariate ones.

1 Introduction

When the values of a multivariate function f(x1, ..., xN) are given for only a finite number of points in the space of the independent variables x1, x2, ..., xN, and the analytical structure of that function is sought to pass through those given points, interpolation methods come to mind. Because of continuity considerations, the basis functions for the interpolation are chosen as multinomials of the lowest possible degrees. However, this routine approach may become quite cumbersome when the dimensionality increases unboundedly, due to rapidly growing complexities coming from high dimensionality. This urges us to develop a divide-and-conquer algorithm which approximates the function to be constructed first by a constant term, second by univariate functions, third by bivariate functions, and so on. This approach, which is called High Dimensional Model Representation, was first proposed by Sobol [1] and later by Rabitz [2]. This expansion can be defined by the following general equation for a given multivariate function.

$$f(x_1, \dots, x_N) = f_0 + \sum_{i_1=1}^{N} f_{i_1}(x_{i_1}) + \sum_{\substack{i_1, i_2 = 1 \\ i_1 < i_2}}^{N} f_{i_1 i_2}(x_{i_1}, x_{i_2}) + \cdots + f_{12\dots N}(x_1, \dots, x_N) \qquad (1)$$

The functions on the right hand side of this equality are orthogonal decomposition components of the original function. The orthogonality condition is defined over an inner product, and the square of the original function, f(x1, ..., xN), as well as the individual squares of the right hand side components, are assumed to be integrable functions.

∗ Corresponding author: e-mail: [email protected], Phone: +90 212 286 29 60, Fax: +90 212 285 28 75
∗∗ e-mail: [email protected], Phone: +90 212 285 70 82, Fax: +90 212 285 70 73

© 2003 WILEY-VCH Verlag GmbH & Co. KGaA, Weinheim


In this representation, the values of the function f(x1, ..., xN) are not known continuously; instead, they are given at the nodes of a finite hyperprismatic regular grid. Nodes are represented by N-tuples which are, in fact, the elements of a cartesian product. But there is an incompleteness in this application: if the data have repetitions on some values of any independent variable, then some incompatibilities may appear in the resulting equations. So, a new method of model representation is needed.

In this new method, the values of the function f(x1, ..., xN) are not known continuously; instead, a set of random discrete data is given in a finite hyperprism. The first step is to partition this data into low-variate data for GHDMR, the Generalized High Dimensional Model Representation [3], whose components are basically very similar to the standard HDMR's.

In this sense, the action to be taken is to define a general weight function which is a delta type function. By using the orthogonality, this weight function and an auxiliary weight function, the data are partitioned into low-variate data for the GHDMR components, and then the constant term, f0, the univariate functions, fm(xm), and the others can be determined.

Therefore, instead of an analytic structure for these functions, tables of appropriate tuples of data can be produced. These tables enable us to use interpolation according to what we desire to obtain.

After the GHDMR components given on the right hand side of equation (1) are obtained, a complete factorization scheme can be defined. One can develop a factorization formula to approximately represent the given function by using only the constant and univariate GHDMR components. This representation is called FHDMR, Factorized High Dimensional Model Representation. By using the GHDMR components, the FHDMR components can be obtained, and these GHDMR and FHDMR components will be used to represent the multivariate function, f(x1, ..., xN).

In this work, the method of obtaining the FHDMR components from the GHDMR components is given.

2 Generalized High Dimensional Model Representation

It is assumed that the HDMR expansion will be constructed under a general multivariate weight function, W(x1, ..., xN), to develop a Generalized High Dimensional Model Representation. This general weight function for GHDMR is represented via the following HDMR expansion.

$$W(x_1, \dots, x_N) = W_0 + \sum_{i_1=1}^{N} W_{i_1}(x_{i_1}) + \sum_{\substack{i_1, i_2 = 1 \\ i_1 < i_2}}^{N} W_{i_1 i_2}(x_{i_1}, x_{i_2}) + \cdots + W_{12\dots N}(x_1, \dots, x_N) \qquad (2)$$

Another, product type, weight function, an auxiliary weight function Ω(x1, ..., xN), is defined to obtain the GHDMR components as follows

$$\Omega(x_1, \dots, x_N) \equiv \prod_{j=1}^{N} \Omega_j(x_j) \qquad (3)$$

where the overall weight is normalized over a hyperprism whose corners are located at the points (a1, b1), ..., (aN, bN).

Firstly, to get a rule for obtaining the constant term, f0, the operator I0, whose explicit definition is given below, is applied to both sides of equation (1).

$$\mathcal{I}_0 F(x_1, \dots, x_N) \equiv \int_{a_1}^{b_1}\! dx_1 \cdots \int_{a_N}^{b_N}\! dx_N\; \Omega(x_1, \dots, x_N)\, F(x_1, \dots, x_N) \qquad (4)$$

The rule for univariate term determination can be produced by using the operator Ii, which is defined through an arbitrary square integrable function F(x1, ..., xN) as follows.

$$\mathcal{I}_i F(x_1, \dots, x_N) \equiv \int_{a_1}^{b_1}\! dx_1 \cdots \int_{a_{i-1}}^{b_{i-1}}\! dx_{i-1} \int_{a_{i+1}}^{b_{i+1}}\! dx_{i+1} \cdots \int_{a_N}^{b_N}\! dx_N\; \Omega_1(x_1) \cdots \Omega_{i-1}(x_{i-1})\,\Omega_{i+1}(x_{i+1}) \cdots \Omega_N(x_N)\, F(x_1, \dots, x_N), \qquad 1 \le i \le N \qquad (5)$$


The other GHDMR components of the given multivariate function can be obtained similarly.

3 Random Data Partitioning via GHDMR

Assume that the following (N+1)-tuples are taken as data to describe a multivariate function f(x1, ..., xN):

$$d_j \equiv \left( x^{(j)}_1, \dots, x^{(j)}_N, \varphi_j \right), \qquad \varphi_j \equiv f\!\left( x^{(j)}_1, \dots, x^{(j)}_N \right), \qquad 1 \le j \le m \qquad (6)$$

This means that all information about f(x1, ..., xN) is contained in these values, which implies that a weight function has to be used which picks up the values of the function it multiplies only at these points. This necessitates the use of a delta function type weight. For this problem the weight function can be defined as follows

$$W(x_1, \dots, x_N) \equiv \sum_{j=1}^{m} \alpha_j\, \delta\!\left( x_1 - x^{(j)}_1 \right) \cdots \delta\!\left( x_N - x^{(j)}_N \right) \qquad (7)$$

where the αj parameters make it possible to give different importance to each individual datum. Their values will be constrained to obtain the normalization of the weight function. An auxiliary weight function, whose definition is given in equation (3), is used for this normalization.

By using the concepts constructed so far, the general structures of the general weight function components and the values of the constant term and the univariate terms of the given multivariate function can be obtained.

These values enable us to use the Lagrange interpolation formula to determine the general structures of the univariate components of the multivariate function. Hence, the following approximation can be written

$$f(x_1, \dots, x_N) \approx f_0 + \sum_{i_1=1}^{N} f_{i_1}(x_{i_1}) \qquad (8)$$

where approximants are obviously polynomials.

4 Factorized High Dimensional Model Representation

The factorized form of GHDMR can be obtained by using the following equation of the Factorized High Dimensional Model Representation (FHDMR) [4] expansion for a given multivariate function, f(x1, ..., xN).

$$f(x_1, \dots, x_N) = r_0 \left[ \prod_{i_1=1}^{N}\left( 1 + r_{i_1}(x_{i_1}) \right) \right] \left[ \prod_{\substack{i_1, i_2 = 1 \\ i_1 < i_2}}^{N}\left( 1 + r_{i_1 i_2}(x_{i_1}, x_{i_2}) \right) \right] \times \cdots \times \left[ 1 + r_{12\dots N}(x_1, \dots, x_N) \right] \qquad (9)$$

New relations can be produced by comparing the additive expansion of the above relation with the right hand side of (1). To make the comparison, idempotent operators will be used as auxiliary tools. The properties of these operators are as follows,

$$\mathcal{I}^{(id)}_j \mathcal{I}^{(id)}_k \equiv \mathcal{I}^{(id)}_k \mathcal{I}^{(id)}_j, \qquad \left[ \mathcal{I}^{(id)}_j \right]^2 \equiv \mathcal{I}^{(id)}_j, \qquad j, k = 1, \dots, N \qquad (10)$$

By using these operators, the HDMR and FHDMR expansions are replaced by the following generalized ones.

$$S(x_1, \dots, x_N) \equiv f_0\,\mathcal{I} + \sum_{i_1=1}^{N} f_{i_1}(x_{i_1})\,\mathcal{I}^{(id)}_{i_1} + \sum_{\substack{i_1, i_2 = 1 \\ i_1 < i_2}}^{N} f_{i_1 i_2}(x_{i_1}, x_{i_2})\,\mathcal{I}^{(id)}_{i_1}\mathcal{I}^{(id)}_{i_2} + \cdots \qquad (11)$$


$$R(x_1, \dots, x_N) \equiv r_0 \left[ \prod_{i_1=1}^{N}\left( \mathcal{I} + r_{i_1}(x_{i_1})\,\mathcal{I}^{(id)}_{i_1} \right) \right] \prod_{\substack{i_1, i_2 = 1 \\ i_1 < i_2}}^{N}\left( \mathcal{I} + r_{i_1 i_2}(x_{i_1}, x_{i_2})\,\mathcal{I}^{(id)}_{i_1}\mathcal{I}^{(id)}_{i_2} \right) \cdots \qquad (12)$$

These two entities represent the same thing; hence their right hand sides must match for all idempotent operators. This permits us to determine the constant term, the univariate terms and the other terms of the FHDMR expansion. In this work, the structures of the constant and the univariate terms are obtained. The constant term is obtained by comparing the coefficients of the identity operator in the two relations.

$$r_0 = f_0 \qquad (13)$$

The next step is to find the structure of the univariate terms of the FHDMR expansion. The following result is obtained.

$$r_{i_1}(x_{i_1}) = \frac{f_{i_1}(x_{i_1})}{f_0} \qquad (14)$$
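To make (8), (13) and (14) concrete, the sketch below is our own toy construction: equal weights αj and simple binned conditional means stand in for the full GHDMR machinery, and the factorized univariate form with r0 = f0 and ri1 = fi1/f0 is compared against the purely additive truncation (8) on a product-type test function:

```python
import numpy as np

# Toy sketch (ours, not the paper's algorithm): binned conditional means
# stand in for GHDMR components; FHDMR uses r0 = f0 and r_i = f_i / f0.
rng = np.random.default_rng(0)
m, nbins = 20000, 20
x1 = rng.uniform(0.0, 1.0, m)
x2 = rng.uniform(0.0, 1.0, m)
f = (1.0 + x1) * (1.0 + x2 ** 2)        # product-type test function

f0 = f.mean()                           # constant term (equal weights)

def univariate(x):
    """Binned conditional mean of f given x, minus f0, at the data points."""
    idx = np.clip((x * nbins).astype(int), 0, nbins - 1)
    comp = np.array([f[idx == b].mean() for b in range(nbins)]) - f0
    return comp[idx]

f1, f2 = univariate(x1), univariate(x2)
additive = f0 + f1 + f2                              # truncated HDMR, eq. (8)
factorized = f0 * (1 + f1 / f0) * (1 + f2 / f0)      # FHDMR univariate terms

additive_rmse = np.sqrt(np.mean((additive - f) ** 2))
factorized_rmse = np.sqrt(np.mean((factorized - f) ** 2))
print(additive_rmse, factorized_rmse)   # the factorized error is smaller here
```

For an exactly product-type function the univariate FHDMR truncation recovers the function up to binning and sampling error, while the additive truncation misses the interaction term entirely.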

These comparisons can be rebuilt to include more general structures, containing some arbitrary parameters, by rewriting the relation given in (11)

$$S(x_1, \dots, x_N) \equiv f_0\,\mathcal{I} + \sum_{i_1=1}^{N}\left( f_{i_1}(x_{i_1}) + c^{(1)}_{i_1} \right)\mathcal{I}^{(id)}_{i_1} + \sum_{\substack{i_1, i_2 = 1 \\ i_1 < i_2}}^{N}\left( f_{i_1 i_2}(x_{i_1}, x_{i_2}) + c^{(2,1)}_{i_1 i_2}(x_{i_1}) + c^{(2,2)}_{i_1 i_2}(x_{i_2}) \right)\mathcal{I}^{(id)}_{i_1}\mathcal{I}^{(id)}_{i_2} + \cdots \qquad (15)$$

where

$$\sum_{i_1=1}^{N} c^{(1)}_{i_1} = 0, \qquad \sum_{\substack{i_1, i_2 = 1 \\ i_1 < i_2}}^{N} c^{(2,1)}_{i_1 i_2}(x_{i_1}) = 0, \qquad \sum_{\substack{i_1, i_2 = 1 \\ i_1 < i_2}}^{N} c^{(2,2)}_{i_1 i_2}(x_{i_2}) = 0 \qquad (16)$$

Now, a new comparison can be made between the relations (12) and (15). As a result, the following relations are obtained:

$$r_0 = f_0 \qquad (17)$$

which is the same as before, and

$$r_{i_1}(x_{i_1}) = \frac{f_{i_1}(x_{i_1})}{f_0} + \frac{c^{(1)}_{i_1}}{f_0} \qquad (18)$$

as the general structure of the univariate terms of the FHDMR expansion.

Some numerical applications have been carried out to test the convergence of the method. It is seen that when the constants, ci1 and the others, are selected as complex numbers and are determined by optimizing the square of the norm of the following function, the results are very similar to the original values.

$$F(x_1, x_2) = f(x_1, x_2) - f_{\mathrm{FHDMR}}(x_1, x_2) \qquad (19)$$

References

[1] I.M. Sobol, Sensitivity Estimates for Nonlinear Mathematical Models, MMCE, Vol. 1, No. 4, 407 (1993).
[2] H. Rabitz and O. Alıs, General Foundations of High Dimensional Model Representations, J. Math. Chem., 25, 197-233 (1999).
[3] M. Demiralp and H. Rabitz, Generalized High Dimensional Model Representation of Multivariate Functions, (to be published).
[4] M. Demiralp and H. Rabitz, Factorized High Dimensional Model Representation of Multivariate Functions, (to be published).


Optimally Controlled Dynamics of One Dimensional Harmonic Oscillator: Linear Dipole Functions and Quadratic Penalty

Burcu Tunga∗1 and Metin Demiralp∗∗1

1 Computational Science and Engineering Program, Informatics Institute, Istanbul Technical University, Maslak, 80626, Istanbul, Turkey

Received 28 February 2003, accepted 21 March 2003

In this work we investigate the optimally controlled quantum dynamics of a one dimensional harmonic oscillator. The external field is characterized by a linear dipole function. The penalty term is taken as the kinetic energy. Some specific spatial structures are assumed for the wave and costate functions. The resulting equations are temporal ordinary differential equations, beside an algebraic equation which connects the forward and backward evolutions.

The Hamiltonian of a system under the influence of an external field is defined as follows

$$H = H_0 + \mu E(t) \qquad (1)$$

where H0 is the reference Hamiltonian of the system unperturbed by the control field E(t). Here, this latter field is taken as interacting with the quantum mechanical system through the dipole operator µ.

The optimal control problem is constructed to find E(t) such that some particular objectives are reached and penalties are imposed. In this problem the following cost functional J is used to describe the system.

$$J = J_o + J^{(1)}_p + J^{(2)}_p + J_{c,d} \qquad (2)$$

where

$$J_o = \frac{1}{2}\left( \langle \psi(T) | O | \psi(T) \rangle - \tilde{O} \right)^2 \qquad (3)$$

is an objective term including the Hermitian operator O and a prescribed target value Õ for the expectation value of that operator,

$$J^{(1)}_p = \frac{1}{2} \int_0^T dt\; W_p(t)\, \langle \psi(t) | O' | \psi(t) \rangle^2, \qquad W_p(t) > 0;\quad t \in [\,0, T\,] \qquad (4)$$

$$J^{(2)}_p = \frac{1}{2} \int_0^T dt\; W_E(t)\, E(t)^2, \qquad W_E(t) > 0;\quad t \in [\,0, T\,] \qquad (5)$$

are the penalty terms, which suppress the expectation value of an undesired observable operator O' and minimize the field fluence, and

$$J_{c,d} = \int_0^T dt\, \left\langle \lambda(t) \left|\, i\hbar\frac{\partial}{\partial t} - H(t) \,\right| \psi(t) \right\rangle + \int_0^T dt\, \left\langle \lambda^*(t) \left|\, -i\hbar\frac{\partial}{\partial t} - H(t) \,\right| \psi^*(t) \right\rangle \qquad (6)$$

∗ Corresponding author: e-mail: burcu [email protected], Phone: +90 212 285 70 82, Fax: +90 212 285 70 73
∗∗ e-mail: [email protected], Phone: +90 212 285 70 82, Fax: +90 212 285 70 73


is the dynamic constraint.

The control equations are obtained by demanding that the first variation of J vanish,

$$\delta J = 0 \qquad (7)$$

The resulting variational equations are as follows.

$$i\hbar\frac{\partial \psi(t)}{\partial t} = [H_0 + \mu E(t)]\,\psi(t) \qquad (8)$$

$$\psi(0) = \psi_0 \qquad (9)$$

$$i\hbar\frac{\partial \lambda(t)}{\partial t} = [H_0 + \mu E(t)]\,\lambda(t) - W_p(t)\,\langle \psi(t) | O' | \psi(t) \rangle\, O'\psi(t) \qquad (10)$$

$$\lambda(T) = \frac{i}{\hbar}\,\eta\, O\,\psi(T) \qquad (11)$$

$$E(t) = \frac{2}{W_E(t)}\left( \langle \lambda(t) | \mu | \psi(t) \rangle \right) \qquad (12)$$

$$\langle \psi(T) | O | \psi(T) \rangle = \tilde{O} + \alpha\eta \qquad (13)$$

where the system under consideration is a one dimensional harmonic oscillator, whose Schrodinger equation, when the system is isolated, can be given as follows. This equation can be rewritten in a physically dimensionless form

$$i\hbar\frac{\partial \psi(x,t)}{\partial t} = -\frac{\hbar^2}{2m}\frac{\partial^2 \psi(x,t)}{\partial x^2} + \frac{1}{2}\,k\,x^2\,\psi(x,t) \qquad (14)$$

by using the following transformations.

$$\frac{(mk)^{1/4}}{\hbar^{1/2}}\; x \longrightarrow x, \qquad \sqrt{\frac{k}{m}}\; t \longrightarrow t \qquad (15)$$

If an external field with amplitude E(t) is applied to the harmonic oscillator, then equation (8) can be explicitly written as follows.

$$i\frac{\partial \psi(x,t)}{\partial t} = -\frac{1}{2}\frac{\partial^2 \psi(x,t)}{\partial x^2} + \left( \frac{1}{2}x^2 + \mu E(t) \right)\psi(x,t) \qquad (16)$$

The initial wave function selected for the harmonic oscillator corresponds to its ground state,

$$\psi(x, 0) = \pi^{-\frac{1}{4}}\, e^{-\frac{x^2}{2}} \qquad (17)$$

and the general form of the wave function is assumed to be

$$\psi(x, t) = A(t)\, e^{\alpha_1(t)\,x + \alpha_2(t)\,x^2} \qquad (18)$$

In this work, the µ function is taken to be linear as follows.

$$\mu = x \qquad (19)$$

By using the above equations, the results for the temporal unknowns are obtained as follows,

$$\alpha_2(t) = -\frac{1}{2} \qquad (20)$$


$$\alpha_1(t) = -i\,\alpha(t) \qquad (21)$$

$$\alpha(t) \equiv \int_0^t d\tau\; e^{-i(t - \tau)}\, E(\tau) \qquad (22)$$

$$A(t) = \pi^{-\frac{1}{4}}\, e^{-i\frac{t}{2} + \frac{i}{2}\int_0^t d\tau\, \alpha_1(\tau)^2} \qquad (23)$$

The operators O and O′ are taken as follows

$$O \equiv x^2, \qquad O' \equiv -\frac{\partial^2}{\partial x^2} \qquad (24)$$

This allows us to write

$$\langle \psi(t) | O' | \psi(t) \rangle \equiv \int_{-\infty}^{\infty} dx\; \psi^*(x,t)\left( -\frac{\partial^2}{\partial x^2} \right)\psi(x,t) \qquad (25)$$

$$\langle \psi(t) | O' | \psi(t) \rangle = -\alpha_c(t)^2 + \frac{1}{2} \qquad (26)$$

where

$$\alpha_c(t) \equiv \int_0^t d\tau\; \cos(t - \tau)\, E(\tau) \qquad (27)$$
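Since E(t) is real, αc(t) in (27) is just the real part of α(t) in (22); the short check below (our own illustration, with an arbitrary sample field that is not the optimal control) confirms this numerically:

```python
import numpy as np

# alpha(t) of eq. (22) and alpha_c(t) of eq. (27) for a sample real field E(t).
def sample_E(tau):
    return np.sin(2.0 * tau) * np.exp(-0.3 * tau)   # arbitrary test field

def trapezoid(y, x):
    return np.sum(0.5 * (y[1:] + y[:-1]) * np.diff(x))

def alpha(t, n=4000):
    tau = np.linspace(0.0, t, n + 1)
    return trapezoid(np.exp(-1j * (t - tau)) * sample_E(tau), tau)

def alpha_c(t, n=4000):
    tau = np.linspace(0.0, t, n + 1)
    return trapezoid(np.cos(t - tau) * sample_E(tau), tau)

checks = [abs(alpha(t).real - alpha_c(t)) for t in (1.0, 3.0, 5.0)]
print(max(checks))   # ~1e-16: alpha_c(t) equals Re alpha(t)
```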

The costate function satisfies the following equation

$$i\frac{\partial \lambda(x,t)}{\partial t} = -\frac{1}{2}\frac{\partial^2 \lambda(x,t)}{\partial x^2} + \left( \frac{1}{2}x^2 + E(t)\,x \right)\lambda(x,t) - W_p(t)\,\langle \psi(t) | O' | \psi(t) \rangle\, O'\psi(x,t) \qquad (28)$$

and we assume the following form for the solution

$$\lambda(x,t) \equiv \left( c_0(t) + c_1(t)\,x \right)\psi(x,t) \qquad (29)$$

This gives

$$c_0'(t) = i\,\langle \psi(t) | O' | \psi(t) \rangle\,\alpha(t)^2 + c_1(t)\,\alpha(t) + i\,\langle \psi(t) | O' | \psi(t) \rangle - i\,c_1(t) \qquad (30)$$

$$c_1'(t) = \left( 2\,\langle \psi(t) | O' | \psi(t) \rangle + c_0(t) \right)\alpha(t) \qquad (31)$$

$$E(t) = 1 + \frac{\langle \psi(t) | O' | \psi(t) \rangle}{c_1(t)} \qquad (32)$$

Equations (30) and (31) are accompanied by final conditions given at t = T; therefore they describe backward evolution. Forward evolution is described by α1(t), α2(t) and A(t). Equation (32) provides the connection between these opposite direction evolutions. Solutions are obtained by iteration.

References

[1] M. Demiralp and H. Rabitz, Phys. Rev. A, 47, 809 (1993).
[2] P. Gross, D. Neuhauser, and H. Rabitz, J. Chem. Phys., 98, 4557 (1993).
[3] C.D. Schwieters and H. Rabitz, J. Phys. Chem., 97, 8864 (1993).
[4] M. Demiralp and H. Rabitz, J. Math. Chem., 16, 185 (1994).


Iterative Schemes with Extra Sub-steps for Implicit Runge-Kutta Methods

R. Vigneswaran ∗1

1 Department of Mathematics, Eastern University, Chenkalady, Sri Lanka

Received 28 February 2003, accepted 21 March 2003

Various iterative schemes have been proposed to solve the non-linear equations arising in the implementation of implicit Runge-Kutta methods. In one scheme, when applied to an s-stage Runge-Kutta method, each step of the iteration still requires s function evaluations but consists of r (> s) sub-steps. An improved convergence rate was obtained for the case r = s + 1 only. This scheme is investigated here for the case r = ks, k = 2, 3, ..., and superlinear convergence is obtained in the limit k → ∞.

1 Introduction

Consider the initial value problem

$$x' = f(x(t)), \qquad x(t_0) = x_0, \qquad f : \mathbb{R}^n \to \mathbb{R}^n. \qquad (1)$$

Let xr be an approximation to x(tr), tr = t0 + rh, r = 1, 2, 3, .... In a general s-stage implicit Runge-Kutta method, xr+1 is obtained from xr in terms of y1, y2, ..., ys by solving a set of sn equations

$$Y = X + h\,(A \otimes I_n)\,F(Y) \qquad (2)$$

where X = xr ⊕ xr ⊕ ··· ⊕ xr, Y = y1 ⊕ y2 ⊕ ··· ⊕ ys and F(Y) = f(y1) ⊕ f(y2) ⊕ ··· ⊕ f(ys) are column vectors, and A ⊗ In is the tensor product of the matrix A with the n × n identity matrix In. In general, A ⊗ B = [aij B].

Equation (2) may be solved by a modified Newton iteration. Let J be the Jacobian of f evaluated at some recent point xp, updated infrequently. The modified Newton scheme evaluates ∆¹, ∆², ∆³, ..., and hence Y¹, Y², Y³, ..., to satisfy

$$(I_{sn} - hA \otimes J)\,\Delta^m = D(Y^{m-1}), \qquad Y^m = Y^{m-1} + \Delta^m, \qquad m = 1, 2, 3, \dots, \qquad (3)$$

where D(Z) = X − Z + h(A ⊗ In)F(Z), so that D(Y) = 0. Schemes have been developed to solve equation (3) which use the fact that J is constant [1], [7], [8]. In other schemes, advantage is taken of the special forms of some implicit methods [2], [5], [6], [12].
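As a minimal concrete illustration (ours, not from the paper), the iteration (3) for the 2-stage Gauss method applied to the scalar test problem x' = qx, where n = 1 and J = q; for a linear f the modified Newton step is exact, so the iteration reproduces the direct solution of (2):

```python
import numpy as np

# Modified Newton iteration (3) for the 2-stage Gauss method on x' = qx,
# where n = 1, J = q and D(Z) = X - Z + h(A (x) I_n)F(Z). Illustrative sketch.
q, h, x0, s = -5.0, 0.1, 1.0, 2
A = np.array([[1/4, 1/4 - np.sqrt(3)/6],
              [1/4 + np.sqrt(3)/6, 1/4]])   # 2-stage Gauss coefficients

X = np.full(s, x0)
F = lambda Y: q * Y
D = lambda Z: X - Z + h * (A @ F(Z))

N = np.eye(s) - h * A * q                    # I_{sn} - hA (x) J with n = 1
Y = X.copy()
for _ in range(5):
    Y = Y + np.linalg.solve(N, D(Y))         # iteration (3)

Y_direct = np.linalg.solve(np.eye(s) - h * q * A, X)   # direct solve of (2)
print(np.max(np.abs(Y - Y_direct)))          # ~0: exact in one step for linear f
```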

In another approach, schemes based directly on an iterative procedure have been developed [3], [9], [10], [13]. Cooper and Butcher [9] propose an iterative scheme

$$[I_s \otimes (I_n - h\lambda J)]\,E^m = (B_1 S^{-1} \otimes I_n)\,D(Y^{m-1}) + (L_1 \otimes I_n)\,E^m, \qquad Y^m = Y^{m-1} + (S \otimes I_n)\,E^m, \qquad m = 1, 2, 3, \dots, \qquad (4)$$

where B1 and S are real s × s non-singular matrices, L1 is a strictly lower triangular matrix of order s, and λ is a real constant. Cooper and Vignesvaran [10] propose a more efficient scheme where expensive

∗ e-mail:[email protected], Phone: +94-65-40753, Fax: +94-65-40758


vector transformations are avoided and Y^{m−1} and D(Y^{m−1}) are updated after each sub-step is completed. The rate of convergence of this scheme is examined when it is applied to the scalar test problem x' = qx, with rapid convergence required for all z ∈ C⁻ = {z ∈ C : Re(z) ≤ 0}, where z = hq. For this scheme, we have

$$Y - Y^m = S\,M_1(z)\,S^{-1}\,(Y - Y^{m-1}), \qquad m = 1, 2, 3, \dots,$$

where Y is the solution of D(Y) = 0 and M1 is the iteration matrix given by

$$M_1(z) = I_s - [(1 - \lambda z)I_s - L_1]^{-1} B_1 (I_s - z\hat{A}), \qquad (5)$$

where $\hat{A} = S^{-1} A S$.

2 Schemes with Extra Sub-steps

Cooper and Vignesvaran [11] propose another scheme which is a generalization of the basic scheme (4). They consider the scheme

$$[I_r \otimes (I_n - h\lambda J)]\,E^m = (B S^{-1} \otimes I_n)\,D(Y^{m-1}) + (L \otimes I_n)\,E^m, \qquad Y^m = Y^{m-1} + (S R \otimes I_n)\,E^m, \qquad m = 1, 2, 3, \dots, \qquad (6)$$

where B and R^T are real r × s matrices (r > s), each of column rank s, and L is a strictly lower triangular matrix of order r. In this scheme, E^m = E^m_1 ⊕ E^m_2 ⊕ ··· ⊕ E^m_r is computed, and then Y^m = y^m_1 ⊕ y^m_2 ⊕ ··· ⊕ y^m_s is computed, in each step of the iteration.

The convergence rate of the scheme is examined when it is applied to the scalar problem x' = qx, with rapid convergence required for all z ∈ C⁻. For the scheme (6),

$$Y - Y^m = S\,M(z)\,S^{-1}\,(Y - Y^{m-1}), \qquad m = 1, 2, 3, \dots,$$

and the rate of convergence depends on the spectral radius ρ[M(z)] of the matrix

$$M(z) = I_s - R\,[(1 - \lambda z)I_r - L]^{-1} B\,(I_s - z\hat{A}). \qquad (7)$$

Cooper and Vignesvaran [11] consider the partitioned form of the parameter matrices as follows:

$$B = \begin{bmatrix} I_s \\ B_{12} \end{bmatrix} B_{11}, \qquad L = \begin{bmatrix} L_{11} & 0 \\ L_{21} & L_{22} \end{bmatrix}, \qquad R = [\,I_s,\ R_{12}\,], \qquad (8)$$

where B11 is an s × s non-singular matrix with β = det B11, and L11 and L22 are strictly lower triangular matrices of order s and r − s respectively. Cooper and Vignesvaran [11] consider only the case r = s + 1 and obtain the parameters for the scheme (6) by solving the problem

$$\min\ \max_{z \in \mathbb{C}^-} \rho[M(z)], \qquad (9)$$

subject to the constraint that the iteration matrix M(z) has only one eigenvalue.

3 The Case r = ks

In this section, we consider the scheme (6) with r = ks, k = 2, 3, .... The parameter matrices in this scheme are chosen in an appropriate way to obtain superlinear convergence in the limit k → ∞ when the scheme is applied to the scalar test problem x' = qx.


First we consider the case r = 2s. In this case, the parameter matrices B, L, R of the scheme (6) are partitioned as

$$B = \begin{bmatrix} I_s \\ B_2 \end{bmatrix} B_1, \qquad L = \begin{bmatrix} L_1 & 0 \\ L_{21} & L_{22} \end{bmatrix}, \qquad R = [\,I_s,\ R_2\,], \qquad (10)$$

where B1 and L1 are the matrices of order s obtained from the basic scheme (4), L22 is a strictly lower triangular matrix of order s, and R2 is an s × s matrix. These parameter matrices have to be chosen so that ρ[M(z)] = ρ[M1(z)]². That is, these matrices should be chosen so that

$$I_s - M = I_s - M_1^2. \qquad (11)$$

This gives

$$R_2\,[(1 - \lambda z)I_s - L_{22}]^{-1}\left( L_{21} + B_2\,[(1 - \lambda z)I_s - L_1] \right) = M_1. \qquad (12)$$

With the choices R2 = Is and L22 = L1, equations (5) and (12) give

$$B_2 = I_s - \frac{1}{\lambda}\,B_1 \hat{A}, \qquad L_{21} = -B_1\left[ I_s - \frac{1}{\lambda}\,\hat{A}\,(I_s - L_1) \right]. \qquad (13)$$
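The effect of these choices can be verified numerically; the sketch below (our own check, with random matrices standing in for the B1, L1 and Â = S⁻¹AS of an actual method) builds M(z) from (7) with the partitioning (10) and the choices (13), and confirms that M(z) = M1(z)², hence ρ[M(z)] = ρ[M1(z)]²:

```python
import numpy as np

# Check that the r = 2s construction (10)+(13), with R2 = Is and L22 = L1,
# gives M(z) = M1(z)^2. Random matrices stand in for an actual RK method.
rng = np.random.default_rng(1)
s, lam, z = 3, 0.25, -2.0 + 1.5j

A_hat = rng.standard_normal((s, s))            # stands in for S^{-1} A S
B1 = rng.standard_normal((s, s))
L1 = np.tril(rng.standard_normal((s, s)), -1)  # strictly lower triangular
I = np.eye(s)

T = (1 - lam * z) * I - L1
M1 = I - np.linalg.solve(T, B1 @ (I - z * A_hat))          # eq. (5)

B2 = I - (B1 @ A_hat) / lam                                # eq. (13)
L21 = -B1 @ (I - (A_hat @ (I - L1)) / lam)                 # eq. (13)

B = np.vstack([I, B2]) @ B1                                # partitioning (10)
L = np.block([[L1, np.zeros((s, s))], [L21, L1]])
R = np.hstack([I, I])
I2s = np.eye(2 * s)

M = I - R @ np.linalg.solve((1 - lam * z) * I2s - L, B @ (I - z * A_hat))

rho = lambda X: max(abs(np.linalg.eigvals(X)))
print(np.allclose(M, M1 @ M1), np.isclose(rho(M), rho(M1) ** 2))
```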

Next we consider the general case r = ks. In this case the parameter matrices B, L and R are partitioned as follows:

$$B = \begin{bmatrix} I_s \\ B_2 \\ \vdots \\ B_k \end{bmatrix} B_1, \qquad L = \begin{bmatrix} L_1 & 0 & \cdots & 0 \\ L_{21} & L_{22} & \cdots & 0 \\ \vdots & \vdots & \ddots & \vdots \\ L_{k1} & L_{k2} & \cdots & L_{kk} \end{bmatrix}, \qquad R = [\,I_s,\ R_2,\ \dots,\ R_k\,], \qquad (14)$$

where B1 and L1 are obtained from the basic scheme (4), and B2 and L21 are given by (13) with L22 = L1. The other sub-matrices are of order s, and the matrices Lii, i = 3, 4, ..., k, are strictly lower triangular. We now consider how to choose the matrices in (14) so as to obtain ρ[M(z)] = ρ[M1(z)]^k. One possible choice is

$$B = \begin{bmatrix} I_s \\ B_2 \\ B_2^2 \\ \vdots \\ B_2^{k-1} \end{bmatrix} B_1, \qquad L = \begin{bmatrix} L_1 & 0 & \cdots & 0 & 0 \\ L_{21} & L_1 & \cdots & 0 & 0 \\ B_2 L_{21} & L_{21} & \cdots & 0 & 0 \\ \vdots & \vdots & \ddots & \vdots & \vdots \\ B_2^{k-2} L_{21} & B_2^{k-3} L_{21} & \cdots & L_{21} & L_1 \end{bmatrix}, \qquad R = [\,I_s,\ I_s,\ \dots,\ I_s\,]. \qquad (15)$$

Let $T = (1 - \lambda z)I_s - L_1$ and $\hat{M}_1 \equiv T M_1 T^{-1}$. Then we obtain from (5) and (13) that $L_{21} T^{-1} = \hat{M}_1 - B_2$, and hence we obtain

$$[(1 - \lambda z)I_{ks} - L]^{-1} = (I_k \otimes T^{-1}) \begin{bmatrix} I_s & 0 & \cdots & 0 & 0 \\ \hat{M}_1 - B_2 & I_s & \cdots & 0 & 0 \\ \hat{M}_1(\hat{M}_1 - B_2) & \hat{M}_1 - B_2 & \cdots & 0 & 0 \\ \vdots & \vdots & & \ddots & \vdots \\ \hat{M}_1^{k-2}(\hat{M}_1 - B_2) & \hat{M}_1^{k-3}(\hat{M}_1 - B_2) & \cdots & \hat{M}_1 - B_2 & I_s \end{bmatrix}.$$


This implies that

$$R\,[(1 - \lambda z)I_{ks} - L]^{-1} B B_1^{-1} T = T^{-1}\left( I_s + \hat{M}_1 + \hat{M}_1^2 + \cdots + \hat{M}_1^{k-1} \right) T = I_s + M_1 + M_1^2 + \cdots + M_1^{k-1},$$

since the j-th block of $[(1 - \lambda z)I_{ks} - L]^{-1} B B_1^{-1} T$ is $T^{-1}\!\left[ \hat{M}_1^{\,j-2}(\hat{M}_1 - B_2) + \hat{M}_1^{\,j-3}(\hat{M}_1 - B_2)B_2 + \cdots + B_2^{\,j-1} \right]\! T = T^{-1}\hat{M}_1^{\,j-1}T$, and R sums these blocks. Hence

$$R\,[(1 - \lambda z)I_{ks} - L]^{-1} B B_1^{-1} T\,(I_s - M_1) = I_s - M_1^k.$$

It follows from (5) and (7) that

$$I_s - M = I_s - M_1^k$$

and hence

$$\rho[M(z)] = \rho[M_1(z)]^k. \qquad (16)$$

This gives

$$\lim_{k \to \infty} \rho[M(z)] = 0 \quad \text{if} \quad \rho[M_1(z)] < 1.$$

This result is investigated for the cases k = 2 and k = 3. The numerical results show that the use of extra sub-steps gives increased convergence rates.

References

[1] T.A. Bickart, An efficient solution process for implicit Runge-Kutta methods, SIAM J. Numer. Anal. 14 (1977), 1022-1027.
[2] J.C. Butcher, On the implementation of implicit Runge-Kutta methods, BIT 16 (1976), 237-240.
[3] J.C. Butcher, Some implementation schemes for implicit Runge-Kutta methods, Proceedings of the Dundee Conference on Numerical Analysis 1979, Lecture Notes in Mathematics, Springer-Verlag, 773, 12-24.
[4] J.C. Butcher, K. Burrage and F.H. Chipman, STRIDE: Stable Runge-Kutta integrator for differential equations, Report Series No. 150 (1979), Dept. of Mathematics, Univ. of Auckland.
[5] J.C. Butcher, J.R. Cash, Towards efficient Runge-Kutta methods for stiff systems, SIAM J. Numer. Anal. 19 (1990), 753-761.
[6] J.R. Cash, On a class of implicit Runge-Kutta procedures, J. Inst. Maths. Applics. 19 (1977), 455-470.
[7] F.H. Chipman, The implementation of Runge-Kutta implicit processes, BIT 18 (1973), 391-393.
[8] A.G. Collings and G.J. Tee, An analysis of Euler and implicit Runge-Kutta numerical integration schemes for structural dynamic problems, Proceedings of the Sixth Australasian Conference on the Mechanics of Structures and Materials, 1977, 1, 147-154.
[9] G.J. Cooper and J.C. Butcher, An iteration scheme for implicit Runge-Kutta methods, IMA J. Numer. Anal. 3 (1983), 127-140.
[10] G.J. Cooper and R. Vignesvaran, A scheme for the implementation of implicit Runge-Kutta methods, Computing 45 (1990), 321-332.
[11] G.J. Cooper and R. Vignesvaran, Some schemes for the implementation of implicit Runge-Kutta methods, J. Comp. App. Math. 45 (1993), 213-225.
[12] W.H. Enright, Improving the efficiency of matrix operations in the numerical solution of ODEs, Technical Report 98 (1976), Computer Science Dept., Univ. of Toronto.
[13] R. Frank and C.W. Ueberhuber, Iterated defect correction for the efficient solution of stiff systems of ordinary differential equations, BIT 18 (1977), 146-159.


Spatial Domain Fourier Description of Hand-written Signature Images by use of Iterative Dilation

G. B. Wilson∗1

1 Department of Computer Science, Anglia Polytechnic University, East Rd, Cambridge CB1 1PT, UK

Received 28 February 2003, accepted 21 March 2003

The electronic recognition of an individual on the basis of their hand-written signature is an important area of biometric research. A purely image-based approach has received little attention because of the difficulty in identifying spatial features that are sufficiently sensitive to identify authorship. This work presents a novel image-based method which first segments the signature into one object, from which its perimeter co-ordinates are extracted and a number of spatial properties are measured. Fourier descriptors are used to regenerate the shape and to isolate specific elements of shape. Spectral data from the individual harmonics suggest that harmonics 5-15 are most useful for signature verification, encapsulated in a modified Descriptor Coefficient, DC(5-15). The three feature measures of Circularity, Roundness and DC(5-15) are invariant of shape-size and orientation and are shown to increase in sensitivity, in that order, to signature authorship.

1 Introduction

Although Optical Character Recognition (OCR) software is available for printed and written text, there is still much work to be done on the authentication of hand-written signatures by electronic means (Goder, 1999). Most success has been achieved by measurement of time-dependent properties of the signature, such as pen-up time, velocity and acceleration information (Plamondon and Lorette, 1989). A second approach is to measure only the spatial properties of the signature image (such as the change in curvature of a letter, or the ratio of signature length to height) after it is written (Qi and Hunt, 1994). Whilst most work has adopted a mixed approach, a purely spatial approach is less popular, probably because of the difficulty in identifying a group of spatial feature measures that are sufficiently sensitive to be used for classification purposes. The present work aims to show that a purely spatial approach is worthy of further study, based on features measured from a single shape obtained by iteratively dilating the signature image until one object (an ink-blot) is constructed. The perimeter co-ordinates of the object are then extracted, from which a range of feature measures describing the shape are obtained.

2 Feature measures and descriptors

Simple feature measures of a shape in a digital image are based on area, perimeter length, long axis versus short axis, and combinations of such parameters, such that the feature measure is encapsulated in one value. Two such measures were utilized for this work, namely Circularity (short axis / long axis) and Roundness (4π · area / perimeter²), these being independent of object size and orientation. However, they are only useful in classifying shape objects if the shapes are regular within narrow limits. Generally speaking, the more complex the shape and the greater the acceptable limits of variability that define it (such as a signature), the more complex the feature measure(s) required to effectively classify that shape. Such complex

∗ Corresponding author: e-mail: [email protected], Phone: +44 1223 363271 2439, Fax: +44 1223 363271 417712


[Fig. 1 image: two panels, "binary image" and "segmentation: D=7", each shown as pixel line vs. width.]

Fig. 1 Comparison of original and dilated signature image after binary thresholding

measures are referred to as descriptors, because they are usually boundary-based and describe the whole shape rather than some component of it.

One descriptor that has proved successful with respect to describing individual letter characters is the Fourier descriptor (Lord and Wilson, 1984). However, the potential use of a Fourier analysis in the spatial domain to verify an image-signature has received scant attention, despite its established success in classifying some objects in astronomy (Eppler et al., 1983) and in microscopy (Dowdeswell, 1982).
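For comparison with such boundary-based descriptors, the simple Roundness measure defined at the start of this section can be computed directly from pixel data. The sketch below (our own, not from the paper) evaluates it for a rasterized disc, estimating the perimeter by counting exposed pixel edges; this staircase estimate overstates the true circumference, so the value comes out well below 1:

```python
import numpy as np

# Roundness = 4*pi*area / perimeter^2 on a rasterized disc.
# Perimeter is estimated by counting exposed pixel edges (a staircase length).
n, r = 201, 80
y, x = np.mgrid[:n, :n]
disc = (x - n // 2) ** 2 + (y - n // 2) ** 2 <= r * r

area = disc.sum()
edges = np.count_nonzero(disc[:, 1:] != disc[:, :-1]) \
      + np.count_nonzero(disc[1:, :] != disc[:-1, :])   # exposed pixel edges
roundness = 4.0 * np.pi * area / edges ** 2
print(area, edges, roundness)   # roundness ~ 0.61 with this biased perimeter
```

Production code would use a chain-code or edge-following perimeter (as the paper does) to reduce this bias.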

In their study of shape description, Eppler et al. (1983) use a rotating vector (phaser) representation of the Fourier series. Constants (or Fourier descriptors) can be derived and provide a means of quantifying and potentially classifying a shape. In order to consolidate the descriptor information into a single term, Erlich and Weinberg (1970) suggest use of a Descriptor Coefficient (DC) that describes the roughness of the shape boundary. It is only necessary to evaluate the transform over the first 20 harmonics, since higher harmonic contributions to shape are likely to be of little significance.

3 Image capture and pre-processing

Three individuals submitted several hand-written signatures on paper, which were then scanned at 100 d.p.i., 256-greyscale and saved as bitmap files. Each image was then cropped to a standard size and thresholded to a binary image using a standard method (Gonzalez and Woods, 1993). A 3x3 dilation filter was then repeatedly applied to each image until the signature merged into one object. In order to automate this process and exclude user bias, a segmentation algorithm was implemented which involves repeated traverses of the image, each traverse testing for object-connectivity and dilating the image where more than one object was present (Fig. 1). Cartesian perimeter co-ordinates of the object were then extracted by use of an edge-following routine (Pitas, 1993), and these were directly used to determine Circularity, Roundness and some other feature measures. For the Fourier analysis, the cartesian co-ordinates were first transformed into polar co-ordinates referencing the shape's centroid.
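A compact sketch of this pipeline (our own reconstruction, not the authors' code; the blob geometry and the angular bin count are arbitrary) dilates a two-object binary image until it merges, then extracts a single-valued polar boundary about the centroid and its harmonic amplitudes Rn/R0:

```python
import numpy as np
from collections import deque

def dilate3x3(img):
    """One pass of a 3x3 binary dilation (zero padding at the border)."""
    p = np.pad(img, 1)
    out = np.zeros_like(img)
    h, w = img.shape
    for dy in range(3):
        for dx in range(3):
            out |= p[dy:dy + h, dx:dx + w]
    return out

def count_objects(img):
    """Number of 8-connected components, via breadth-first flood fill."""
    seen = np.zeros(img.shape, dtype=bool)
    h, w = img.shape
    count = 0
    for y0, x0 in zip(*np.nonzero(img)):
        if seen[y0, x0]:
            continue
        count += 1
        queue = deque([(y0, x0)])
        seen[y0, x0] = True
        while queue:
            cy, cx = queue.popleft()
            for dy in (-1, 0, 1):
                for dx in (-1, 0, 1):
                    ny, nx = cy + dy, cx + dx
                    if 0 <= ny < h and 0 <= nx < w and img[ny, nx] and not seen[ny, nx]:
                        seen[ny, nx] = True
                        queue.append((ny, nx))
    return count

# Two separate blobs standing in for a thresholded signature.
img = np.zeros((40, 60), dtype=bool)
img[15:25, 10:20] = True
img[18:28, 35:50] = True

blob = img
while count_objects(blob) > 1:        # the automated segmentation loop
    blob = dilate3x3(blob)

# Single-valued polar boundary r(theta) about the centroid, then FFT.
ys, xs = np.nonzero(blob)
cy, cx = ys.mean(), xs.mean()
theta = np.arctan2(ys - cy, xs - cx)
radius = np.hypot(ys - cy, xs - cx)
nbins = 64
idx = np.clip(((theta + np.pi) / (2 * np.pi) * nbins).astype(int), 0, nbins - 1)
r_theta = np.array([radius[idx == b].max() if np.any(idx == b) else 0.0
                    for b in range(nbins)])
amps = np.abs(np.fft.rfft(r_theta)) / nbins
print(count_objects(blob), amps[1:5] / amps[0])   # one object; low harmonics
```

In practice, library routines such as `scipy.ndimage.binary_dilation` and `scipy.ndimage.label` would replace the hand-rolled helpers; they are avoided here only to keep the sketch self-contained.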

4 Results and discussion

Fig. 2 (left) shows the phasor-regenerated signature shape of Fig. 1 (right) from the summation of the first 20 harmonics. The summed radius in a particular direction is augmented or reduced according to the individual harmonic amplitudes. The broader aspects of shape are described by the lower harmonics; Fig. 2 (right), for example, shows the contribution of harmonic 2, which describes elongation. The higher harmonics reflect the surface detail rather than the broader elements of shape.
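The truncation step can be illustrated directly: discarding all harmonics above 20 keeps the broad shape and removes fine surface detail (a sketch with a synthetic test boundary of our choosing):

```python
import numpy as np

def regenerate(radii, n_harmonics=20):
    """Rebuild r(theta) from harmonics 0..n_harmonics only (phasor summation)."""
    coeffs = np.fft.rfft(radii)
    coeffs[n_harmonics + 1:] = 0.0          # discard higher harmonics
    return np.fft.irfft(coeffs, n=len(radii))

theta = np.linspace(0.0, 2.0 * np.pi, 256, endpoint=False)
# Harmonic 2 (elongation) plus fine surface detail at harmonic 40.
r = 1.0 + 0.3 * np.cos(2 * theta) + 0.05 * np.cos(40 * theta)
r20 = regenerate(r, n_harmonics=20)
# The broad (elongated) shape survives; the harmonic-40 detail is removed.
assert np.allclose(r20, 1.0 + 0.3 * np.cos(2 * theta), atol=1e-10)
```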

An important consideration is to what extent signature information will be lost or obscured by the dilation and transformation method. Note that the transform only uses single-valued functions. This means that if any substantial embayments are present on the shape boundary, such that a straight line from the centroid intersects the boundary more than once, only one intersection can be processed. Strategies can be adopted to deal with


Fig. 2 Phasors showing the regenerated object-merged signature-shape (left) and the contribution of harmonic 2 to that shape (right)

Fig. 3 Amplitude spectra (mean absolute Rn/R0 versus harmonic number) for harmonics 0-20 for three authors of signatures (author 1: dashed; author 2: solid; author 3: dotted)

this: in the present work the second boundary point of any double-valued direction was ignored, hence clipping and smoothing the regenerated shape in that vicinity. Fig. 3 shows the averaged amplitude spectra for the three signatories. The greatest separation occurs between harmonics 5 and 15, which would indicate that a Descriptor Coefficient derived only over this harmonic range would be more sensitive to authorship. In fact, of the three invariant measures of shape used in this work, Circularity, Roundness and DC(5-15) increase in sensitivity to signature authorship in that order. This is demonstrated by the separation of all the raw feature and descriptor data into three groups in 3D-space (Fig. 4). A classifier such as a neural net would be able to verify signature authorship on this basis. Whilst a classification scheme is likely to work for a small group of signatures, it remains to be seen whether it can be practically scaled up to a larger


176 G.B. Wilson: Fourier Description of Hand-written Signature Images

Fig. 4 Feature and descriptor measures (Circularity, Roundness and DC(5-15)) expressed on a 3D plot (author 1: circle; author 2: square; author 3: cross)

implementation which can discriminate among many signatures at an acceptable cost in computing resources. This work demonstrates that a purely spatial-domain approach to signature verification is worthy of further study.

References

[1] J. A. Dowdeswell, Scanning electron micrographs of quartz sand grains from cold environments examined using Fourier shape analysis, J. Sed. Pet. (1982).

[2] R. Erlich and B. Weinberg, An exact method for the characterization of grain shape, J. Sed. Pet. 40, 205-212 (1970).

[3] D. T. Eppler, R. Ehrlich, D. Nummedal and P. H. Schultz, Sources of shape variation in lunar impact craters: Fourier shape analysis, Geol. Soc. Am. Bull. 94, 274-291 (1983).

[4] P. D. Goder, Lexicon-driven handwritten word recognition, in: Electronic Imaging Technology (SPIE Optical Engineering Press, 1999), chap. 9, pp. 317-347.

[5] R. C. Gonzalez and R. E. Woods, Digital Image Processing (Addison-Wesley, 1993), 716 pp.

[6] E. A. Lord and C. B. Wilson, The Mathematical Description of Shape and Form (John Wiley and Sons, 1984), 260 pp.

[7] I. Pitas, Digital Image Processing Algorithms (Prentice Hall, 1993), 362 pp.

[8] R. Plamondon and G. Lorette, Automatic signature verification and writer identification - the state of the art, Pattern Recognition 22, 107-131 (1989).

[9] Y. Qi and B. R. Hunt, Signature verification using global and grid features, Pattern Recognition 27, 1621-1629 (1994).


A High Dimensional Model Representation Approximation of an Evolution Operator with a First-order Partial Differential Operator Argument

Irem Yaman∗1 and Metin Demiralp∗∗1

1 Computational Science and Engineering Program, Informatics Institute, Istanbul Technical University, Maslak, 80626, Istanbul, Turkey

Received 28 February 2003, accepted 21 March 2003

In this work, a novel approximation scheme based on quite a new representation, High Dimensional Model Representation (HDMR), is proposed for evolution operators. The approximation is not developed at the operator level. Instead, the effect of the operator on an arbitrary function is approximated. The approximation can be constructed for any order of multivariance; however, for the present, only the constant, univariate and bivariate components of HDMR are considered.

1 Introduction

We consider the following evolution operator

Q \equiv e^{tL}, \qquad L \equiv \sum_{i=1}^{N} \varphi_i(x_1,\cdots,x_N)\,\frac{\partial}{\partial x_i} \qquad (1)

If g is assumed to be an infinitely differentiable function of the independent variables x_1, \cdots, x_N, we can consider the following spatially and temporally varying function [1, 4].

f(x_1,\cdots,x_N,t) = Q\, g(x_1,\cdots,x_N) \qquad (2)

Temporal differentiation of both sides of this equation yields the following partial differential equation

\left[ \frac{\partial}{\partial t} - \varphi_1(x_1,\cdots,x_N)\frac{\partial}{\partial x_1} - \cdots - \varphi_N(x_1,\cdots,x_N)\frac{\partial}{\partial x_N} \right] f(x_1,\cdots,x_N,t) = 0 \qquad (3)

The initial condition which accompanies equation (3) is considered to be

f(x_1,\cdots,x_N,0) = g(x_1,\cdots,x_N) \qquad (4)

2 High Dimensional Model Representation Approach

The following High Dimensional Model Representation (HDMR) of the function f is given through a multivariance ordering expansion [5, 6].

f(x_1,\cdots,x_N,t) = f_0(t) + \sum_{i=1}^{N} f_i(x_i,t) + \sum_{\substack{i_1,i_2=1\\ i_1<i_2}}^{N} f_{i_1 i_2}(x_{i_1},x_{i_2},t) + \cdots \qquad (5)

∗ Corresponding author: e-mail: [email protected], Phone: +90 212 285 70 77, Fax: +90 212 285 70 73∗∗ e-mail: [email protected], Phone: +90 212 285 70 82, Fax: +90 212 285 70 73

© 2003 WILEY-VCH Verlag GmbH & Co. KGaA, Weinheim


The functions on the right-hand side of this equality are the orthogonal components of the original function. The orthogonality condition is defined over an inner product and the square of the original function, and the right-hand side components are assumed to be square integrable functions.

\int_{a_{i_s}}^{b_{i_s}} d\alpha_{i_s}\, W_{i_s}(\alpha_{i_s})\, f_{i_1 i_2 \ldots i_N}(\alpha_1,\ldots,\alpha_N) = 0, \qquad 1 \le s \le N \qquad (6)

The univariate weight functions are chosen to be normalised,

\int_{a_i}^{b_i} d\alpha_i\, W_i(\alpha_i) = 1 \qquad (7)
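For a concrete two-variable example, the constant and univariate components and the orthogonality condition (6) can be checked symbolically, taking for the weights the exponential form used later in the paper (the sample function f below is our own choice):

```python
import sympy as sp

x1, x2 = sp.symbols('x1 x2', positive=True)
a1, a2 = sp.symbols('alpha1 alpha2', positive=True)
w1 = a1 * sp.exp(-a1 * x1)      # normalised weights on [0, oo)
w2 = a2 * sp.exp(-a2 * x2)

f = x1 * x2 + x1                # a sample multivariate function

# Constant and univariate HDMR components under the product weight:
f0 = sp.integrate(f * w1 * w2, (x1, 0, sp.oo), (x2, 0, sp.oo))
f1 = sp.integrate(f * w2, (x2, 0, sp.oo)) - f0
f2 = sp.integrate(f * w1, (x1, 0, sp.oo)) - f0

# Each component has zero weighted mean in its own variable, as in (6).
assert sp.simplify(sp.integrate(f1 * w1, (x1, 0, sp.oo))) == 0
assert sp.simplify(sp.integrate(f2 * w2, (x2, 0, sp.oo))) == 0
```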

Substituting the form given by (5) into equation (3), the following is obtained.

\frac{df_0(t)}{dt} + \frac{\partial}{\partial t}\sum_{i=1}^{N} f_i(x_i,t) - \sum_{i=1}^{N}\varphi_i(x_1,\cdots,x_N)\frac{\partial}{\partial x_i} f_i(x_i,t) + \frac{\partial}{\partial t}\sum_{\substack{i_1,i_2\\ i_1<i_2}}^{N} f_{i_1 i_2}(x_{i_1},x_{i_2},t) - \sum_{i=1}^{N}\varphi_i(x_1,\cdots,x_N)\frac{\partial}{\partial x_i}\sum_{\substack{i_1,i_2\\ i_1<i_2}}^{N} f_{i_1 i_2}(x_{i_1},x_{i_2},t) + \cdots = 0 \qquad (8)

Multiplying both sides of this equation by the product of the weight functions, which we choose as W_i(x_i) = \alpha_i e^{-\alpha_i x_i}, integrating over the range [0,\infty), and using the orthogonality properties of the components, the following formula is obtained.

\frac{df_0(t)}{dt} - \int_0^{\infty}\cdots\int_0^{\infty} \sum_{i=1}^{N} \varphi_i(x_1,\cdots,x_N)\,\frac{\partial}{\partial x_i} f_i(x_i,t) \prod_{j=1}^{N} \alpha_j e^{-\alpha_j x_j}\,dx_j - \cdots = 0 \qquad (9)

In this work it is assumed that the \varphi functions have the following quadratic form.

\varphi_i(x_1,\cdots,x_N) = \sum_{j=1}^{N}\sum_{k=j}^{N} b^{(i)}_{jk}\, x_j x_k \qquad (10)

Insertion of this equality into equation (9) enables us to write

\frac{df_0(t)}{dt} - \sum_{i=1}^{N}\left[ b^{(i)}_{ii}\,\rho_i(t) + \left( \sum_{j=1}^{i-1}\frac{b^{(i)}_{ji}}{\alpha_j} + \sum_{j=i+1}^{N}\frac{b^{(i)}_{ij}}{\alpha_j} \right)\beta_i(t) + \left( \sum_{\substack{j=1\\ j\neq i}}^{N}\frac{2 b^{(i)}_{jj}}{\alpha_j^2} + \sum_{\substack{j=1\\ j\neq i}}^{N-1}\frac{1}{\alpha_j}\sum_{\substack{k=j+1\\ k\neq i}}^{N}\frac{b^{(i)}_{jk}}{\alpha_k} \right)\gamma_i(t) \right] + \cdots = 0 \qquad (11)

where

\rho_i(t) = \int_0^{\infty} x_i^2\,\frac{\partial}{\partial x_i} f_i(x_i,t)\,\alpha_i e^{-\alpha_i x_i}\,dx_i \qquad (12)

\beta_i(t) = \int_0^{\infty} x_i\,\frac{\partial}{\partial x_i} f_i(x_i,t)\,\alpha_i e^{-\alpha_i x_i}\,dx_i \qquad (13)

\gamma_i(t) = \int_0^{\infty} \frac{\partial}{\partial x_i} f_i(x_i,t)\,\alpha_i e^{-\alpha_i x_i}\,dx_i \qquad (14)


The initial condition which accompanies equation (11) is obtained by applying HDMR to the initial condition (4).

f_0(0) = g_0 \qquad (15)

where g_0 is the constant HDMR component of g(x_1,\cdots,x_N). We can approximately solve the HDMR equations by omitting high-variate terms. If we define the zeroth order approximation as the case where only the constant contribution is retained, then the equation for the function f_0^{(0)} becomes

\frac{df_0^{(0)}(t)}{dt} = 0 \qquad (16)

Using equality (15), the solution of equation (16) is obtained as

f_0^{(0)}(t) = g_0 \qquad (17)

To obtain the first order HDMR terms, bivariate and higher terms are omitted. Both sides of equation (8) are integrated (N-1) times, excluding the integration over the variable x_k (where 1 \le k \le N). After neglecting bivariate and higher terms and replacing df_0(t)/dt by its counterpart in (9), this gives the following equation for the univariate terms.

\frac{\partial}{\partial t} f_k(x_k,t) - \frac{\partial}{\partial x_k} f_k(x_k,t)\left( A_0 + A_1 x_k + A_2 x_k^2 \right) = b_0(t) + b_1(t)\,x_k + b_2(t)\,x_k^2 \qquad (18)

where

A_0 = \sum_{\substack{j=1\\ j\neq k}}^{N} \frac{2 b^{(k)}_{jj}}{\alpha_j^2} + \sum_{\substack{j=1\\ j\neq k}}^{N-1} \frac{1}{\alpha_j} \sum_{\substack{l=j+1\\ l\neq k}}^{N} \frac{b^{(k)}_{jl}}{\alpha_l} \qquad (19)

A_1 = \sum_{j=1}^{k-1} \frac{b^{(k)}_{jk}}{\alpha_j} + \sum_{j=k+1}^{N} \frac{b^{(k)}_{kj}}{\alpha_j} \qquad (20)

A_2 = b^{(k)}_{kk} \qquad (21)

b_0(t) = -A_2\,\rho_k(t) - A_1\,\beta_k(t) - A_0\,\gamma_k(t) \qquad (22)

b_1(t) = \sum_{\substack{i=1\\ i\neq k}}^{N} b^{(i)}_{ik}\,\beta_i(t) + \left( \sum_{j=1}^{k-1}\frac{b^{(k)}_{jk}}{\alpha_j} + \sum_{j=k+1}^{N}\frac{b^{(k)}_{kj}}{\alpha_j} \right)\gamma_i(t) \qquad (23)

and

b_2(t) = \sum_{\substack{i=1\\ i\neq k}}^{N} b^{(i)}_{kk}\,\gamma_i(t) \qquad (24)

Equation (18) is a first order partial differential and integral equation with known parameters. The accompanying initial condition can be obtained from the univariate HDMR components of equation (4) and contains the univariate HDMR components of g. This solution can be fed into equation (9), after replacing f_0 by its first order approximation f_0^{(1)}, to find the equation for the constant component; the resulting equation is then integrated. It therefore becomes possible to approximate the effect of an evolution operator on any given function. That is, what we have done here is not to approximate the evolution operator itself but its image on a given function.


References

[1] M. Demiralp and H. Rabitz, Lie Algebraic Factorization of Multivariable Evolution Operators: Definition and the Solution of the Canonical Problem, Int. J. Eng. Sci. 31, 307 (1993).

[2] M. Demiralp and H. Rabitz, Lie Algebraic Factorization of Multivariable Evolution Operators: Convergence Theorems for the Canonical Case, Int. J. Eng. Sci. 31, 333 (1993).

[3] M. Demiralp and H. Rabitz, Factorization of Certain Evolution Operators Using Lie Operator Algebra: Convergence Theorems, J. Math. Chem. 6, 193 (1991).

[4] M. Demiralp and H. Rabitz, Factorization of Certain Evolution Operators Using Lie Algebra: Formulation of the Method, J. Math. Chem. 6, 164 (1991).

[5] I. M. Sobol, Sensitivity Estimates for Nonlinear Mathematical Models, MMCE 1, No. 4, 407 (1993).

[6] H. Rabitz and O. Alıs, J. Math. Chem. 25, 197 (1999).


A Comparison of the Model Order Reduction Techniques for Linear Systems arising from VLSI Interconnection Simulation

E. Fatih Yetkin∗1 and Hasan Dag1

1 Istanbul Technical University, Informatics Institute, Computational Science and Engineering

Received 28 February 2003, accepted 21 March 2003

In recent years, model order reduction schemes for linear systems have become very popular in computer aided design. In this study, we analyze existing model order reduction techniques used in Very Large Scale Integrated (VLSI) circuit interconnection modelling in terms of numerical stability, computational speed and accuracy.

1 Introduction

Today's sub-micron integrated circuits are formed using millions of semiconductor components on a single chip. To realize the circuit functions, these components are combined with interconnections made from conductors. Due to the high frequencies used, these interconnections create parasitic circuits arising from their physical and chemical properties and their geometries [1]. In the last decade the modelling and simulation of these parasitic circuits has become one of the most important research areas in Very Large Scale Integrated (VLSI) circuit simulation. VLSI circuit simulation is based on the numerical solution of very large, sparse, in general nonlinear, systems of time-dependent differential algebraic equations (DAE). After the addition of the sub-circuit effects arising from interconnections, the problem becomes bigger and harder. To solve such problems we take advantage of the separation between the nonlinear part of the equations and the linear interconnection circuit equations.

The linear model order reduction techniques used in VLSI interconnection modelling were triggered by this separation [2].

The basic approach of model order reduction is to build a new DAE system from the original DAE system. The new reduced system must retain all the dominant properties of the original system and must have a lower order for faster solution [3]. The accuracy, the order of reduction and the speed are the basic attributes of a successful model order reduction algorithm. In this study we compare the linear model order reduction methods developed to date from the perspective of numerical stability, accuracy, speed and ease of use.

The remainder of the paper is organized as follows. In Section 2, we define the problem that we are interested in. In Section 3, we introduce model order reduction methods such as implicit and explicit moment matching techniques, methods based on system gramians, and time domain model order reduction with polynomial approximations. In Section 4, we make some concluding remarks.

∗ Corresponding author: e-mail: [email protected], Phone: +90 212 285 7077, Fax: +90 212 285 7073



2 Definition of the Problem

Assume that the state equations of a linear time-invariant multi-input multi-output (MIMO) dynamic system are

C\dot{x} = G x + B u(t), \qquad y = L^T x \qquad (1)

where G, C \in \mathbb{R}^{N \times N} are the system matrices, x(t) \in \mathbb{R}^{N} is the state vector, u(t) \in \mathbb{R}^{m} is the input excitation vector, y(t) \in \mathbb{R}^{p} is the output vector, and B \in \mathbb{R}^{N \times m} and L \in \mathbb{R}^{N \times p} are the input and output distribution arrays, respectively. N is the state space dimension and m and p are the numbers of inputs and outputs. The matrices C and G are allowed to be singular, but the matrix pencil G + sC is singular only at a finite number of values s \in \mathbb{C}. The reduced order model of (1) is given below.

C_n \dot{x} = G_n x + B_n u(t), \qquad y = L_n^T x \qquad (2)

In (2) the state space order is n < N. In the frequency domain the input-output relation of the system is the Laplace transform of the time domain transfer function h(t). The Laplace transform is defined below.

Definition 2.1  F(s) = \int_0^{\infty} e^{-st} f(t)\, dt

The frequency domain transfer function is given by

H(s) = L^T (G + sC)^{-1} B, \qquad s \in \mathbb{C} \qquad (3)

where H(s) is a matrix valued function called the system transfer function. Nearly all the methods considered in this study work in the frequency domain, and their aim is to obtain a reduced order system transfer function of the form

H_n(s) = L_n^T (G_n + s C_n)^{-1} B_n, \qquad s \in \mathbb{C} \qquad (4)

The transfer function can also be written in factored pole-zero form. The zeros of the denominator of the transfer function are called the poles of the system, and the zeros of the numerator are called the zeros of the system. They play a very important role in system theory [4].

H(s) = K\,\frac{(z_1+s)(z_2+s)\cdots(z_m+s)}{(p_1+s)(p_2+s)\cdots(p_n+s)} \qquad (5)

The transfer function has some important properties that determine the system behavior. From the model order reduction perspective, there are two basic properties which must be preserved in the reduction cycle: the passivity and the stability of the system. Their definitions can be found in [5]. A stable system guarantees a bounded response to a bounded input. Preservation of the passivity and the stability of the transfer function are the most important criteria in the creation of reduced order models [6].
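As a small illustration of the stability requirement (with arbitrary test matrices of our choosing, not a real interconnect model), the eigenvalues of C^{-1}G govern the homogeneous response of the system C x' = G x in (1):

```python
import numpy as np

# Stability of C x' = G x + B u is governed by the generalized eigenvalues
# of the pencil (G, C): with C nonsingular, the eigenvalues of C^{-1} G.
C = np.eye(2)
G = np.array([[-2.0, 1.0],
              [0.0, -3.0]])
eigs = np.linalg.eigvals(np.linalg.solve(C, G))
# A bounded response to every bounded input requires Re(lambda) < 0.
assert np.all(eigs.real < 0)
```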

Another important concept in model order reduction is the moment. In Definition 2.1, we can expand e^{-st} in a power (Taylor) series about s = 0. If we do this, the transfer function can be written in the form below.

H(s) = \sum_{k=0}^{\infty} \frac{(-1)^k}{k!}\, s^k \int_0^{\infty} t^k h(t)\, dt \qquad (6)

In (6), the qth moment of the system is defined by

m_q \equiv \frac{(-1)^q}{q!} \int_0^{\infty} t^q h(t)\, dt \qquad (7)

Many model order reduction techniques are based on matching the system moments to those of a lower order system. It has been proved that matching the first 2k moments is sufficient to build a k-th order system from the original system [7].
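Expanding the transfer function (3) about s = 0 gives m_k = (-1)^k L^T (G^{-1}C)^k G^{-1} B, so the moments can be computed by repeated solves rather than by forming the series explicitly. A minimal sketch (function name and test data are ours):

```python
import numpy as np

def moments(G, C, B, L, n):
    """First n moments of H(s) = L^T (G + s C)^{-1} B about s = 0:
    m_k = (-1)^k L^T (G^{-1} C)^k G^{-1} B."""
    A = np.linalg.solve(G, C)          # G^{-1} C
    r = np.linalg.solve(G, B)          # G^{-1} B
    out = []
    for k in range(n):
        out.append(((-1) ** k * L.T @ r).item())
        r = A @ r
    return out

# Scalar sanity check: H(s) = 1/(2+s) has m_k = (-1)^k / 2^(k+1).
G = np.array([[2.0]]); C = np.array([[1.0]])
B = np.array([[1.0]]); L = np.array([[1.0]])
m = moments(G, C, B, L, 4)
assert np.allclose(m, [0.5, -0.25, 0.125, -0.0625])
```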


3 Methods of Model Order Reduction

Asymptotic Waveform Evaluation (AWE) builds the system moment matrix and uses explicit methods to match the first 2k moments of the system, obtaining a reduced order system [8]. AWE is a very useful method but it has some restrictions. It is numerically unstable for higher order models, and it has limitations because of the ill-conditioned structure of the moment matrix; therefore the desired accuracy may not be reachable for large systems [9]. AWE uses the moment matrix to match moments; to avoid matching moments with this ill-conditioned matrix, implicit methods were developed.

These implicit moment matching methods are mainly based on Krylov subspace techniques. One of them is the Pade via Lanczos (PVL) method [10]. PVL uses the well-known relation between Pade approximation and the Lanczos tridiagonalization process [11]. Although it is a numerically stable algorithm, it has some problems in preserving the passivity of the system, so some extensions were made to improve the PVL algorithm; SyPVL is one such improved version [12]. Moreover, PVL is a method for Single Input Single Output (SISO) systems; to handle MIMO systems, Matrix PVL (MPVL) was developed [13]. The other Krylov subspace approach to the model reduction problem is the use of the Arnoldi algorithm [14]. The Arnoldi algorithm can be used to obtain SISO approximations, and the Block Arnoldi method yields MIMO reduced order models [15].
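A one-sided Arnoldi projection can be sketched in a few lines (an illustration in the spirit of PRIMA-style projection, not a reference implementation; the test system is random, and the order-8 model should reproduce the full order-30 response near s = 0 because the projection matches the leading moments):

```python
import numpy as np

def arnoldi_reduce(G, C, B, L, n):
    """One-sided Arnoldi projection (SISO sketch): orthonormal basis of
    span{r, Ar, ..., A^(n-1) r} with A = G^{-1} C, r = G^{-1} B,
    used to project the system matrices by congruence."""
    N = G.shape[0]
    A = np.linalg.solve(G, C)
    r = np.linalg.solve(G, B).ravel()
    V = np.zeros((N, n))
    V[:, 0] = r / np.linalg.norm(r)
    for j in range(1, n):
        w = A @ V[:, j - 1]
        w -= V[:, :j] @ (V[:, :j].T @ w)   # orthogonalize against previous vectors
        V[:, j] = w / np.linalg.norm(w)
    return V.T @ G @ V, V.T @ C @ V, V.T @ B, V.T @ L

def H(G, C, B, L, s):
    """Evaluate the transfer function L^T (G + s C)^{-1} B."""
    return (L.T @ np.linalg.solve(G + s * C, B)).item()

rng = np.random.default_rng(0)
N = 30
G = 5.0 * np.eye(N) + 0.3 * rng.standard_normal((N, N))
C = np.eye(N)
B = rng.standard_normal((N, 1)); L = rng.standard_normal((N, 1))
Gn, Cn, Bn, Ln = arnoldi_reduce(G, C, B, L, 8)
assert abs(H(G, C, B, L, 0.1) - H(Gn, Cn, Bn, Ln, 0.1)) < 1e-6
```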

All the moment matching techniques have great advantages from the point of view of computational cost, and some of them also have great accuracy and numerical stability. However, they say little about the relation between the model order and the error: there is no exact relation between the error of the reduced order model and the reduction order. To avoid this problem, the Truncated Balanced Reduction (TBR) method can be used. This method uses the system observability gramian (W_O) and the system controllability gramian (W_C). They are defined as the solutions of the two Lyapunov equations

-G W_C C^T - C W_C G^T + B B^T = 0 \qquad (8)

-G^T W_O C - C^T W_O G + L L^T = 0 \qquad (9)

The observability gramian W_O can be factorized as W_O = R^T R. Using this factor, the singular values of the matrix R W_C R^T are obtained, and these singular values are then used to build a balancing transformation that creates the reduced order system. Although Truncated Balanced Reduction has an exact error bound, it is computationally inefficient [16], because it requires the solution of two large Lyapunov equations and one large singular value decomposition. Techniques based on the Smith method [17] and the Alternating-Direction Implicit (ADI) method [18] have been developed to reduce the computational effort; these two methods solve the original large Lyapunov equations approximately. Vector-ADI methods have also been developed to produce approximate gramian eigenspaces directly, in order to design a projector for the model reduction scheme [19]. Krylov subspace methods combined with TBR algorithms have relatively better computational time complexities [20].

There are also some time domain model order reduction algorithms using polynomials. They are based on finding an approximation of the impulse response of the system with polynomials such as Chebyshev or Laguerre polynomials [21, 22].

4 Concluding Remarks

In this work, we propose an extensive comparison of the reduced order modelling techniques arising in VLSI interconnection circuit simulation. Our comparison criteria are accuracy, numerical stability, preservation of passivity and computational speed. To realize this goal, we will use benchmark data and develop a software package that contains the basic methods described in this paper. We hope that, once developed, the software will be a useful tool for building reduced order models not only for circuit simulation but also for other dynamical systems.


References

[1] N. P. van der Meijs and T. Smedes, Accurate Interconnect Modeling: Towards Multi-million Transistor Chips As Microwave Circuits, Int. Conf. on CAD, Proc. of ICCAD'96, p. 244-251, 1996.

[2] R. W. Freund, Krylov Subspace Iterations for Reduced-Order Modeling in VLSI Circuit Simulation, Tech. Report, Bell Laboratories, Murray Hill.

[3] D. Skoogh, Krylov Subspace Methods for Linear Systems, Eigenvalues and Model Order Reduction, Tech. Rep., Department of Mathematics, Goteborg, 1997.

[4] J. Vlach and K. Singhal, Computer Methods for Circuit Analysis and Design, Van Nostrand Reinhold, New York, second edition, 1993.

[5] Z. Bai, Krylov subspace techniques for reduced-order modeling of large-scale dynamical systems, Applied Numerical Mathematics, Elsevier, 43, p. 9-44, 2002.

[6] Z. Bai, P. Feldmann and R. W. Freund, How to Make Theoretically Passive Reduced-Order Models Passive in Practice, Proc. IEEE 1998 Custom Int. Circ. Conf., p. 207-210.

[7] L. T. Pillage, R. A. Rohrer and C. Visweswariah, Electronic Circuit and System Simulation Methods, McGraw-Hill, New York, 1995.

[8] L. T. Pillage and R. A. Rohrer, Asymptotic Waveform Evaluation for Timing Analysis, IEEE Trans. on CAD, 9, pp. 352-366, 1990.

[9] Z. Bai, P. M. Dewilde and R. W. Freund, Reduced Order Modeling, Tech. Report, Bell Laboratories, Murray Hill, New Jersey, 2002.

[10] P. Feldmann and R. W. Freund, Efficient linear circuit analysis by Pade approximation via the Lanczos process, IEEE Trans. Computer-Aided Design, vol. 14, p. 639-649.

[11] P. V. Dooren, The Lanczos algorithm and Pade approximations, Notes, Dept. Math. Eng., Universite Catholique de Louvain, 1995.

[12] R. W. Freund and P. Feldmann, Reduced-order modeling of large passive linear circuits by means of the SyPVL algorithm, Tech. Dig. 1996 IEEE/ACM Int. Conf. on CAD, p. 280-287.

[13] R. W. Freund and P. Feldmann, Reduced-Order Modeling of Large Linear Passive Multi-terminal Circuits Using Matrix-Pade Approximation, Tech. Report, Bell Laboratories, Murray Hill, New Jersey, 1997.

[14] E. J. Grimme, Krylov Projection Methods for Model Reduction, Ph.D. Thesis, University of Illinois, 1997.

[15] A. Odabasıoglu, M. Celik and L. T. Pileggi, PRIMA: passive reduced-order interconnect macro-modelling algorithm, Tech. Dig. 1997 IEEE/ACM Int. Conf. on CAD, p. 58-65.

[16] R. Li, Model Reduction of Large Linear Systems via Low Rank System Gramians, Ph.D. thesis, M.I.T., 2000.

[17] R. A. Smith, Matrix Equation XA + BX = C, SIAM Journal on Applied Mathematics, 16(1), 198-201, 1968.

[18] A. Lu and E. L. Wachspress, Solution of Lyapunov equations by alternating direction implicit iteration, Computers Math. Appl., 21(9), 43-58, 1991.

[19] J. Li and J. White, Efficient Model Reduction of Interconnect via Approximate System Gramians, IEEE, 1999.

[20] Q. Su, V. Balakrishnan and C. K. Koh, Efficient Approximate Balanced Truncation of General Large-Scale RLC Systems via Krylov Methods, Proc. of VLSID, 2002.

[21] J. M. Wang and E. Kuh, Passive model order reduction algorithm based on Chebyshev expansion of impulse response of interconnect networks, Proc. Design Automation Conf., p. 520-525, June 2000.

[22] Y. Chen, V. Balakrishnan, C. K. Koh and K. Roy, Model Reduction in the Time-Domain using Laguerre Polynomials and Krylov Methods, Proc. of Design, Automation and Test in Europe Conf., 2002.