
The Japan Society for Industrial and Applied Mathematics

Vol.1 (2009) pp.1-79



Editorial Board

Chief Editor: Yoshimasa Nakamura (Kyoto University)

Vice-Chief Editor: Kazuo Kishimoto (Tsukuba University)

Associate Editors: Reiji Suda (University of Tokyo), Satoshi Tsujimoto (Kyoto University), Masashi Iwasaki (Kyoto Prefectural University), Norikazu Saito (University of Tokyo), Koh-ichi Nagao (Kanto Gakuin University), Koichi Kato (Japan Institute for Pacific Studies), Saburo Kakei (Rikkyo University), Atsushi Nagai (Nihon University), Takeshi Mandai (Osaka Electro-Communication University), Ryuichi Ashino (Osaka Kyoiku University), Ken Umeno (NiCT), Yuzuru Sato (Hokkaido University), Daisuke Takahashi (Waseda University), Katsuhiro Nishinari (University of Tokyo), Hitoshi Imai (University of Tokushima), Nobito Yamamoto (University of Electro-Communications), Takahiro Katagiri (University of Tokyo), Tetsuya Sakurai (Tsukuba University), Yoshitaka Watanabe (Kyushu University), Takeshi Ogita (Tokyo Woman's Christian University), Takashi Suzuki (Osaka University), Yoshihiro Shikata, Tatsuo Oyama (National Graduate Institute for Policy Studies), Tetsuo Ichimori (Osaka Institute of Technology), Masami Hagiya (University of Tokyo), Yasuyuki Tsukada (NTT Communication Science Laboratories), Hideyuki Azegami (Nagoya University), Kenji Shirota (Ibaraki University), Naoyuki Ishimura (Hitotsubashi University), Jiro Akahori (Ritsumeikan University), Ken Nakamula (Tokyo Metropolitan University), Miho Aoki (Okayama University of Science), Keiko Imai (Chuo University), Ichiro Kataoka (HITACHI), Shin-Ichi Nakano (Gunma University), Maiko Shigeno (Tsukuba University), Ichiro Hagiwara (Tokyo Institute of Technology), Fumiko Sugiyama (Kyoto University)


Contents

On a discrete optimal velocity model and its continuous and ultradiscrete relatives ・・・ 1-4 Daisuke Takahashi and Junta Matsukidaira

Numerical Inclusion of Optimum Point for Linear Programming ・・・ 5-8 Shin'ichi Oishi and Kunio Tanabe

2D tight framelets with orientation selectivity suggested by vision science ・・・ 9-12 Hitoshi Arai and Shinobu Arai

Analysis of Neuronal Dendrite Patterns Using Eigenvalues of Graph Laplacians ・・・ 13-16 Naoki Saito and Ernest Woei

The Gateaux derivative of cost functions in the optimal shape problems and the existence of the shape derivatives of solutions of the Stokes problems ・・・ 17-20 Satoshi Kaizu

On very accurate verification of solutions for boundary value problems by using spectral methods ・・・ 21-24 Mitsuhiro T. Nakao and Takehiko Kinoshita

On oscillatory solutions of the ultradiscrete Sine-Gordon equation ・・・ 25-27 Shin Isojima and Junkichi Satsuma

Computational and Symbolic Anonymity in an Unbounded Network ・・・ 28-31 Hubert Comon-Lundh, Yusuke Kawamoto and Hideki Sakurada

Reformulation of the Anderson method using singular value decomposition for stable convergence in self-consistent calculations ・・・ 32-35 Akitaka Sawamura

On the qd-type discrete hungry Lotka-Volterra system and its application to the matrix eigenvalue algorithm ・・・ 36-39 Akiko Fukuda, Emiko Ishiwata, Masashi Iwasaki and Yoshimasa Nakamura

Eigendecomposition algorithms solving sequentially quadratic systems by Newton method ・・・ 40-43 Koichi Kondo, Shinji Yasukouchi and Masashi Iwasaki

Block BiCGGR: a new Block Krylov subspace method for computing high accuracy solutions ・・・ 44-47 Hiroto Tadano, Tetsuya Sakurai and Yoshinobu Kuramashi

On parallelism of the I-SVD algorithm with a multi-core processor ・・・ 48-51 Hiroki Toyokawa, Kinji Kimura, Masami Takata and Yoshimasa Nakamura

A numerical method for nonlinear eigenvalue problems using contour integrals ・・・ 52-55 Junko Asakura, Tetsuya Sakurai, Hiroto Tadano, Tsutomu Ikegami and Kinji Kimura

Differential qd algorithm for totally nonnegative band matrices: convergence properties and error analysis ・・・ 56-59 Yusaku Yamamoto and Takeshi Fukaya

Algorithm for computing Kronecker basis ・・・ 60-63 Yoshiaki Kakinuma, Kazuyuki Hiraoka, Hiroki Hashiguchi, Yutaka Kuwajima and Takaomi Shigehara

Robust exponential hedging in a Brownian setting ・・・ 64-67 Keita Owari


A hybrid of the optimal velocity and the slow-to-start models and its ultradiscretization ・・・ 68-71 Kazuhito Oguma and Hideaki Ujino

A new compressible fluid model for traffic flow with density-dependent reaction time of drivers ・・・ 72-75 Akiyasu Tomoeda, Daisuke Shamoto, Ryosuke Nishi, Kazumichi Ohtsuka and Katsuhiro Nishinari

Error analysis for a matrix pencil of Hankel matrices with perturbed complex moments ・・・ 76-79 Tetsuya Sakurai, Junko Asakura, Hiroto Tadano and Tsutomu Ikegami


JSIAM Letters Vol.1 (2009) pp.1–4 ©2009 Japan Society for Industrial and Applied Mathematics

On a discrete optimal velocity model and its continuous and ultradiscrete relatives

Daisuke Takahashi1 and Junta Matsukidaira2

Department of Applied Mathematics, Waseda University, 3-4-1 Okubo, Shinjuku-ku, Tokyo 169-8555, Japan1

Department of Applied Mathematics and Informatics, Ryukoku University, Seta, Ohtsu, Shiga 520-2194, Japan2

E-mail [email protected], [email protected]

Received August 29, 2008, Accepted October 5, 2008 (INVITED PAPER)

Abstract

We propose a traffic flow model with discrete time. The continuum limit of this model is equivalent to the optimal velocity model. The model also has an ultradiscrete limit, from which a piecewise-linear traffic flow model is obtained. Both models show a phase transition from free flow to jam in a fundamental diagram. Moreover, the ultradiscrete model includes the Fukui–Ishibashi model as a special case.

Keywords optimal velocity model, discrete model, ultradiscretization

Research Activity Group Applied Integrable Systems

1. Introduction

There are various models of different levels of discreteness for analyzing traffic congestion [1]. A macroscopic model is defined by a partial differential equation based on fluid dynamics, and it describes traffic flow by the motion of a continuous medium. For example, Musha and Higuchi used the Burgers equation to describe fluctuations of traffic flow [2].

Systems of ordinary differential equations (ODEs), coupled map lattices (CML) and cellular automata (CA) are often used as microscopic models that describe each vehicle's motion directly. In ODE models, the time t and vehicle position x are continuous, and the vehicle number k is discrete. A CML is similar to an ODE model, but time is discretized [3]. All dependent and independent variables are discrete in CA models. For example, the Nagel–Schreckenberg model [4], the elementary CA of rule number 184 (ECA184) [5], the Fukui–Ishibashi (FI) model [6] and the slow-start model [7] are known as effective traffic models. Though the evolution rule of a CA model is simple due to its discreteness, the mechanism of congestion formation is presented sharply.

Bando et al. proposed a noticeable ODE model [8]. The model is now called the 'optimal velocity model' (OV model) and is defined as follows. Assume a finite number of vehicles moving on a one-way circuit of a single lane as shown in Fig. 1. The length of the circuit is L and the total number of vehicles is K. Introduce a one-dimensional coordinate along the circuit with an appropriate origin. Define x_k(t) as the position of the vehicle with vehicle number k (k = 1, 2, ···, K) at time t. The vehicle numbers are given sequentially so that the preceding vehicle has the larger number; note that the vehicle preceding k = K is k = 1. Then the evolution equation for x_k(t) is

ẍ_k = A(V(x_{k+1} − x_k) − ẋ_k),  (1)

Fig. 1. Circuit and vehicles.

where A is a constant representing the driver's sensitivity and V(∆x) is an optimal velocity representing the desired velocity of a driver at distance ∆x from the vehicle ahead. The acceleration of the kth vehicle is determined by (1) and is proportional to the difference between its optimal velocity and its current real velocity ẋ_k.

The typical profile of the optimal velocity is shown in Fig. 2. This profile reflects a driver's behavior: if the distance to the vehicle ahead is short (long), he wants to keep a low (high) speed. When the distance becomes long enough, he wants to keep the speed limit of the road. The results obtained by the optimal velocity model agree well with real traffic data.
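As an illustration, the OV dynamics (1) can be integrated with a simple explicit Euler scheme. The sketch below is not the authors' code: the tanh-shaped profile V is only an assumed stand-in with the qualitative shape of Fig. 2 (V(0) = 0, monotone, saturating at a speed limit), and all names are hypothetical.

```python
import math

def V(dx, vmax=2.0, c=2.0):
    # Assumed optimal-velocity profile with the Fig. 2 shape: V(0) = 0,
    # monotonically increasing, saturating at the speed limit vmax.
    return vmax * (math.tanh(dx - c) + math.tanh(c)) / (1.0 + math.tanh(c))

def ov_step(x, v, A, L, dt):
    # One explicit Euler step of (1): acceleration = A * (V(headway) - velocity),
    # with headways taken modulo L on the one-way circuit.
    K = len(x)
    acc = [A * (V((x[(k + 1) % K] - x[k]) % L) - v[k]) for k in range(K)]
    x_new = [(x[k] + dt * v[k]) % L for k in range(K)]
    v_new = [v[k] + dt * acc[k] for k in range(K)]
    return x_new, v_new

# K vehicles on a circuit of length L, nearly regular spacing, starting at rest.
L_road, K = 50.0, 25
x = [(L_road / K) * k + (0.1 if k == 0 else 0.0) for k in range(K)]
v = [0.0] * K
for _ in range(1000):
    x, v = ov_step(x, v, A=1.0, L=L_road, dt=0.05)
```

With this profile the uniform flow can be unstable, so the small initial disturbance may grow into jams, as in the paper's Fig. 4.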

Nishinari and Takahashi reported an interesting relation between the Burgers equation and ECA184 [9]. They proposed a difference equation called the 'discrete Burgers equation' and showed that the Burgers equation and ECA184 are obtained from the discrete Burgers equation by a continuum and an ultradiscrete limit, respectively.

Fig. 2. Typical profile of optimal velocity.

Ultradiscretization is a method utilizing a non-analytic limit defined by the following formula [10].

lim_{ε→+0} ε log(e^{A/ε} + e^{B/ε} + ···) = max(A, B, ···).  (2)

We obtain an equation of piecewise-linear type, called an 'ultradiscrete equation', by ultradiscretizing a difference equation. There is a correspondence between the basic operations of a difference equation and those of the ultradiscrete one: the usual operations +, × and / of the difference equation correspond to max, + and − of the ultradiscrete one, respectively. Thus we can make an 'analytic evaluation' for the ultradiscrete equation just as we do for the difference equation.
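The limit formula (2) is easy to check numerically: for fixed arguments, ε log(Σ e^{T/ε}) approaches max over the arguments as ε shrinks. A small sketch (the function name is ours; the max-shift is only for numerical stability):

```python
import math

def ud_limit(terms, eps):
    # epsilon * log(sum of e^(T/eps)); by (2) this tends to max(terms) as eps -> +0.
    m = max(terms)  # shift by the max so the exponentials cannot overflow
    return m + eps * math.log(sum(math.exp((t - m) / eps) for t in terms))

# Shrinking eps tightens the approximation of max(A, B, ...).
exact = max(1.0, 3.0, 2.5)
errs = [abs(ud_limit([1.0, 3.0, 2.5], e) - exact) for e in (1.0, 0.1, 0.01)]
```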

Moreover, the dependent variables can be discretized by using appropriate initial data and constants in the ultradiscrete equation. Therefore the ultradiscrete equation is a completely discretized equation in this sense. Utilizing this feature, we can show that ECA184, originally defined by a binary rule table, is equivalent to the ultradiscrete Burgers equation. Thus the asymptotic behavior of solutions to ECA184 can be proved by the analytic evaluation, reflecting that of the Burgers equation. As seen in this example, ultradiscretization gives a direct relation between CA and differential equations via difference equations, and offers a new perspective on CA which cannot be obtained by a closed analysis.

In this letter, we propose a difference equation relevant to the OV model and call it the 'discrete OV (dOV) equation'. If we take a continuum limit of this equation, we obtain (1) with a specific V(∆x). If we take an ultradiscrete limit, we obtain the 'ultradiscrete OV (uOV) equation', which includes ECA184 or the FI model as a special case. Since the uOV equation is of second order in the time difference, it can express an acceleration effect. Both the dOV and uOV equations show a phase transition from free flow to jam.

2. Discrete Optimal Velocity Model

Let us assume the same situation as for the OV model (1). The only difference is that the time variable is discrete. Denote the time step by n (n = 0, 1, ···) and the interval of a time step by δ (> 0). Using these notations, the dOV equation is defined by

x_k^{n+1} − 2x_k^n + x_k^{n−1} = A log(1 + δ² V(x_{k+1}^n − x_k^n)) − log(1 + δ(e^{x_k^n − x_k^{n−1}} − 1)).  (3)

If 1 + δ² V(x_{k+1}^n − x_k^n) or 1 + δ(e^{x_k^n − x_k^{n−1}} − 1) in the logarithmic terms is 0 or negative, (3) is not well-defined. However, if δ is small enough and if V(∆x) and the initial data are appropriately defined, we can easily exclude this problem.

Replacing x_k^n by x_k(nδ) and assuming δ ∼ 0, we obtain the following expansion.

ẍ_k = A(V(x_{k+1} − x_k) − ẋ_k) + (A/2)(ẍ_k − (ẋ_k)²)δ + O(δ²).  (4)

Thus (1) is derived from (3) by the continuum limit δ → 0, and (3) is a discrete analogue of (1). Considering this relation, V(∆x) is required to have roughly the profile shown in Fig. 2. Moreover, if we assume that (3) can be ultradiscretized, V(∆x) is required to have a more specific form. To realize both the continuum and the ultradiscrete limit, we fix the following form for V(∆x),

V(∆x) = a(1/(1 + e^{−b(∆x−c)}) − 1/(1 + e^{bc})),  (5)

where a, b and c are positive constants. Fig. 3 shows an example of the profile of V(∆x).

Fig. 3. Profile of V(∆x) defined by (5) with a = 2, b = 4, c = 2.

Fig. 4 shows an example of orbits of vehicles. The initial positions of the vehicles are set at nearly regular intervals with small disturbances. Coalescence of jams occurs at earlier times, and three major jams survive in this figure. Though not shown in the figure, more coalescences occur after a long time passes.
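A sketch of iterating the dOV equation (3) with the optimal velocity (5) on a circuit, under the constants used for Fig. 4. This is an assumed reimplementation, not the authors' program; positions are kept unwrapped and headways are taken modulo L.

```python
import math

def V5(dx, a=2.0, b=4.0, c=2.0):
    # Optimal velocity (5): a sigmoid shifted so that V5(0) = 0.
    return a * (1.0 / (1.0 + math.exp(-b * (dx - c))) - 1.0 / (1.0 + math.exp(b * c)))

def dov_step(x_now, x_prev, A, L, delta):
    # One step of the dOV equation (3); the circuit of length L enters only
    # through the headway, taken modulo L.
    K = len(x_now)
    x_next = []
    for k in range(K):
        headway = (x_now[(k + 1) % K] - x_now[k]) % L
        speed_term = math.exp(x_now[k] - x_prev[k]) - 1.0
        x_next.append(2.0 * x_now[k] - x_prev[k]
                      + A * math.log(1.0 + delta ** 2 * V5(headway))
                      - math.log(1.0 + delta * speed_term))
    return x_next

# Constants as in Fig. 4: L = 50, K = 25, delta = 0.1, A = 1; nearly regular
# initial spacing with a small disturbance, starting at rest.
L_road, K, delta = 50.0, 25, 0.1
x_prev = [(L_road / K) * k + 0.01 * math.sin(k) for k in range(K)]
x_now = list(x_prev)
for _ in range(500):
    x_now, x_prev = dov_step(x_now, x_prev, 1.0, L_road, delta), x_now
```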

The fundamental diagram is shown in Fig. 5 [1]. This diagram shows the dependence of the flow Q on the density ρ. The density ρ is the number of vehicles per unit length, and the flow Q is equivalent to the total momentum of vehicles per unit length. Both are defined by

ρ = (1/L) (number of vehicles),

Q = (1/((n₁ − n₀ + 1)Lδ)) Σ_{n=n₀}^{n₁} Σ_{k=1}^{K} (x_k^n − x_k^{n−1}).  (6)

We can observe three phases: (a) a free flow phase in the low density region, (b) a jam phase in the medium density region and (c) a tight jam phase in the high density region. Since these phases are also observed for the OV model (1), we can consider that the discretization of the time variable in the dOV model (3) works well.
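The measurement (6) can be sketched directly: given a recorded trajectory, ρ and Q are simple sums. The helper below is hypothetical; the toy trajectory moves every vehicle at a constant speed v per step, for which (6) should give Q = ρv.

```python
def flow_density(traj, L, delta, n0, n1):
    # Definitions (6): rho = vehicles per unit length; Q = average total
    # displacement per unit time per unit length over steps n0..n1.
    K = len(traj[0])
    rho = K / L
    total = sum(traj[n][k] - traj[n - 1][k]
                for n in range(n0, n1 + 1) for k in range(K))
    Q = total / ((n1 - n0 + 1) * L * delta)
    return rho, Q

# Toy trajectory: every vehicle advances v*delta per step, so Q = rho * v.
K, L, delta, v = 25, 50.0, 0.1, 1.5
traj = [[v * delta * n + (L / K) * k for k in range(K)] for n in range(11)]
rho, Q = flow_density(traj, L, delta, n0=1, n1=10)
```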


Fig. 4. Example of orbits of vehicles for L = 50, K = 25, δ = 0.1, A = 1, a = 2, b = 4 and c = 2.

Fig. 5. Fundamental diagram with L = 50, δ = 0.1, A = 1, a = 2, b = 4, c = 2, n₀ = 90000 and n₁ = 100000. Plotted points are obtained for 1 ≤ K ≤ 50.

3. Ultradiscrete Optimal Velocity Model

The dOV equation (3) with the optimal velocity (5) can be ultradiscretized. Let us introduce a transformation of the variable and constants, including a new parameter ε, defined by

x_k^n → (x_k^n + nδ)/ε,  δ → e^{−δ/ε},  a → e^{(a+2δ)/ε},  c → c/ε.  (7)

Substituting the transformation into (3) and (5), we obtain

x_k^{n+1} − 2x_k^n + x_k^{n−1} = A ε log(1 + e^{a/ε}/(1 + e^{−b(x_{k+1}^n − x_k^n − c)/ε}) − e^{a/ε}/(1 + e^{bc/ε})) − ε log(1 + e^{(x_k^n − x_k^{n−1})/ε} − e^{−δ/ε}).  (8)

If a, b, c, δ are positive and a < bc, we obtain the following ultradiscrete equation by taking the limit ε → +0.

x_k^{n+1} − 2x_k^n + x_k^{n−1} = A max(0, a − max(0, −b(x_{k+1}^n − x_k^n − c))) − max(0, x_k^n − x_k^{n−1}).  (9)

Moreover, this equation is equivalent to

x_k^{n+1} − 2x_k^n + x_k^{n−1} = A V(x_{k+1}^n − x_k^n) − max(0, x_k^n − x_k^{n−1}),  (10)

where

V(∆x) = max(0, b(∆x − c) + a) − max(0, b(∆x − c)).  (11)

Note that (10) does not include the parameter δ of (3): δ cannot remain an arbitrary independent parameter when we take the ultradiscrete limit, and it can be excluded by introducing a background speed δ into x_k^n and replacing δ by e^{−δ/ε}, as shown in (7). We also comment that the condition a < bc is necessary to keep the lower-speed part of the profile of V(∆x) for small ε in the region ∆x > 0.

We show a typical profile of V(∆x) in Fig. 6.

Fig. 6. Profile of V(∆x) in (11).

Fig. 7 shows an example of orbits of vehicles. The positions of the vehicles are random integers at the initial time step. However, x_k^n is generally non-integer, since A and a are not integers in this example. A fundamental diagram using the same constants except K is shown in Fig. 8. Surprisingly, the three phases clearly exist as in Fig. 5. Note that the numerical experiments are executed by double-precision calculation in a C program.

Fig. 7. Example of orbits of vehicles for (10) with L = 50, K = 25, A = 0.5, a = 1.9, b = 4, c = 3.


Fig. 8. Fundamental diagram for (10) with L = 100, A = 0.5, a = 1.9, b = 4, c = 3, n₀ = 1000, n₁ = 2000. Plotted points are obtained for 1 ≤ K ≤ 100 and 50 trials are executed for every K.

4. Special Case of Ultradiscrete Optimal Velocity Model

In this section, we discuss two special cases of the uOV model.

4.1 No Overtaking

Assume that A, a, b, c in (10) and (11) are positive. Then, if A V(∆x) ≤ ∆x, we can derive

x_k^{n+1} ≤ x_{k+1}^n + x_k^n − x_k^{n−1} − A max(0, x_k^n − x_k^{n−1}).  (12)

Denote the last three terms on the right-hand side by (a). Moreover, if A ≥ 1, (a) is always 0 or negative. Therefore we get x_k^{n+1} ≤ x_{k+1}^n under these assumptions. Furthermore, if the velocities of all vehicles are non-negative, overtaking does not occur. When we use the OV (dOV, uOV) model as a numerical simulator of concrete traffic flow, overtaking cannot occur on a one-way circuit of a single lane. Though we can avoid overtaking by choosing appropriate constants and initial data, an assurance of no overtaking is important for real applications.

4.2 Cellular Automaton

If the constants A, a, b, c and the initial positions x_k^0 are integers, any x_k^n calculated by (10) is also an integer. Therefore all the dependent and independent variables in (10) are discrete in this case. Moreover, if we set A = 1, (10) reduces to

x_k^{n+1} = x_k^n + V(x_{k+1}^n − x_k^n) + min(0, x_k^n − x_k^{n−1}).  (13)

Let us assume V(∆x) ≥ 0 as in Fig. 6. Moreover, if x_k^n − x_k^{n−1} ≥ 0 for every k at a certain n, the last term min(0, x_k^n − x_k^{n−1}) in (13) becomes 0 and x_k^{n+1} − x_k^n = V(x_{k+1}^n − x_k^n) ≥ 0 for every k. Therefore no vehicle goes backward if the initial velocity of every vehicle is non-negative. Under this condition, (13) again reduces to the first-order equation,

x_k^{n+1} = x_k^n + V(x_{k+1}^n − x_k^n).  (14)

Moreover, let us consider the case A = 1, a = v_max, b = 1, c = v_max + 1, where v_max is a positive integer. Then V(∆x) in (14) becomes

V(∆x) = max(0, ∆x − 1) − max(0, ∆x − v_max − 1).  (15)

Assuming the size of a vehicle is a unit cell size, x_{k+1}^n − x_k^n − 1 is the distance between the k-th and (k + 1)-th vehicles at time step n. Therefore every vehicle moves forward by its distance, up to v_max. This model is nothing but the FI model, and ECA184 for v_max = 1. We note that an analogy between the OV and some CA models is commented on in the references [11] and [12].

5. Concluding Remarks

We propose a new discrete OV model with discrete time. The continuum limit of this model is equivalent to the OV model. We show that the orbits of vehicles and the fundamental diagram agree qualitatively with those of the OV model. Moreover, this model has an ultradiscrete limit, from which a piecewise-linear evolution equation is obtained. We show by numerical calculation that the ultradiscrete OV model also gives a phase transition in its fundamental diagram. It includes the FI model as a special case.

We have only shown a definition, a few features and some numerical results for the dOV and uOV models in this letter. Detailed analysis using various combinations of constants is necessary to fully understand the dynamics of the models. Comparison with other models and with real data is also necessary. These points are future problems to be solved.

References

[1] D. Chowdhury, L. Santen and A. Schadschneider, Statistical physics of vehicular traffic and some related systems, Phys. Rep., 329 (2000), 199–329.

[2] T. Musha and H. Higuchi, Traffic current fluctuation and the Burgers equation, Jpn. J. Appl. Phys., 17 (1978), 811–816.

[3] S. Tadaki, M. Kikuchi, Y. Sugiyama and S. Yukawa, Coupled map traffic flow simulator based on optimal velocity functions, J. Phys. Soc. Jpn., 67 (1998), 2270–2276.

[4] K. Nagel and M. Schreckenberg, A cellular automaton model for freeway traffic, J. Physique I, 2 (1992), 2221–2229.

[5] S. Wolfram, Theory and Applications of Cellular Automata, World Scientific, Singapore, 1986.

[6] M. Fukui and Y. Ishibashi, Traffic flow in 1D cellular automaton model including cars moving with high speed, J. Phys. Soc. Jpn., 65 (1996), 1868–1870.

[7] M. Takayasu and H. Takayasu, 1/f noise in a traffic model, Fractals, 1 (1993), 860–866.

[8] M. Bando, K. Hasebe, A. Nakayama, A. Shibata and Y. Sugiyama, Dynamical model of traffic congestion and numerical simulation, Phys. Rev. E, 51 (1995), 1035–1042.

[9] K. Nishinari and D. Takahashi, Analytical properties of ultradiscrete Burgers equation and rule-184 cellular automaton, J. Phys. A, 31 (1998), 5439–5450.

[10] T. Tokihiro, D. Takahashi, J. Matsukidaira and J. Satsuma, From soliton equations to integrable cellular automata through a limiting procedure, Phys. Rev. Lett., 76 (1996), 3247–3250.

[11] D. Helbing and M. Schreckenberg, Cellular automata simulating experimental properties of traffic flow, Phys. Rev. E, 59 (1999), R2505–R2508.

[12] K. Nishinari, A Lagrange representation of cellular automaton traffic-flow models, J. Phys. A, 34 (2001), 10727–10736.


JSIAM Letters Vol.1 (2009) pp.5–8 ©2009 Japan Society for Industrial and Applied Mathematics

Numerical Inclusion of Optimum Point for Linear Programming

Shin'ichi Oishi1,2 and Kunio Tanabe1

Department of Applied Mathematics, Faculty of Science and Engineering, Waseda University, Tokyo 169-8555, Japan1 and CREST, JST, Japan2

E-mail [email protected], [email protected]

Received August 31, 2008, Accepted October 6, 2008 (INVITED PAPER)

Abstract

This paper is concerned with the following linear programming problem:

Maximize c^t x, subject to Ax ≦ b and x ≧ 0,

where A ∈ F^{m×n}, b ∈ F^m and c, x ∈ F^n. Here, F is the set of floating point numbers.

The aim of this paper is to propose a numerical method of including an optimum point of this linear programming problem, provided that a good approximation of an optimum point is given. The proposed method is based on Kantorovich's theorem and the continuous Newton method. Kantorovich's theorem is used to prove the existence of a solution of the complementarity equation, and the continuous Newton method is used to prove the feasibility of that solution. Numerical examples show that the computational cost of including an optimum point is about 4 times that of getting an approximate optimum solution.

Keywords numerical verification, continuous Newton method, Kantorovich’s theorem

Research Activity Group Quality of Computations

1. Introduction

In this paper, we are concerned with the following linear programming problem:

Maximize c^t x, subject to Ax ≦ b and x ≧ 0,  (1)

where A ∈ F^{m×n}, b ∈ F^m and c, x ∈ F^n. Here, F is the set of floating point numbers. The superscript t indicates transposition. The aim of this paper is to propose a numerical method of including an optimum point of this linear programming problem, provided that a good approximation of an optimum point is given.

Let x_f be a feasible point of the primal problem (1), i.e., a point satisfying

A x_f ≦ b and x_f ≧ 0.  (2)

It is clear that c^t x_f becomes a lower bound of the optimum value. A dual problem for (1) is given by

Minimize b^t y, subject to A^t y ≧ c and y ≧ 0,  (3)

where y ∈ F^m. Let y_f be a feasible point of the dual problem (3), i.e., a point satisfying

A^t y_f ≧ c and y_f ≧ 0.  (4)

It is clear that b^t y_f becomes an upper bound of the optimum value. Thus, an inclusion of the optimum value v* is given by

v* ∈ [c^t x_f, b^t y_f]  (5)

provided that feasible points x_f and y_f can be found. The duality theorem asserts that, in principle, the width of the interval [c^t x_f, b^t y_f] can be made as small as desired. This argument, which is rather well known (cf. Ref. [1]), gives a method of numerical inclusion of the optimum value.

This paper is concerned with the problem of numerically including an optimum point of (1). For this purpose, we shall consider the following complementarity problem

f(z) = (x(A^t y − c); y(b − Ax)) = 0 ∈ R^{n+m}  (6)

subject to

x ≧ 0, y ≧ 0, b − Ax ≧ 0 and A^t y − c ≧ 0,  (7)

where z = (x^t, y^t)^t. Here, for two vectors u, v of the same dimension, uv denotes the vector of the same dimension with components u_i v_i. Solving (6) subject to (7) is equivalent to solving the primal and dual problems (1) and (3). Tanabe [2] applied Kantorovich's theorem to (6) to estimate the error in an approximate solution. However, the solution proved to be included by this approach is not guaranteed to satisfy the feasibility condition (7). This paper resolves this difficulty by introducing a continuous Newton method. Namely, this paper proposes a method in which Kantorovich's theorem is used to prove the existence of a solution of the complementarity equation, and the continuous Newton method is exploited to prove the feasibility of that solution. Since Kantorovich's theorem for the Newton method is used, non-degeneracy of the optimum solution is required for our analysis. Degenerate cases will be considered in a separate paper.
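The complementarity map (6) and the sign conditions (7) are straightforward to assemble; a sketch (function name ours), using the data of Example 1 from Section 3 and an arbitrary trial point:

```python
def complementarity_residual(x, y, A, b, c):
    # The map f(z) of (6): componentwise products x*(A^t y - c) and y*(b - Ax),
    # plus a check of the sign conditions (7).
    n, m = len(x), len(y)
    Aty_c = [sum(A[j][i] * y[j] for j in range(m)) - c[i] for i in range(n)]
    b_Ax = [b[j] - sum(A[j][i] * x[i] for i in range(n)) for j in range(m)]
    f = [x[i] * Aty_c[i] for i in range(n)] + [y[j] * b_Ax[j] for j in range(m)]
    feasible = (all(v >= 0 for v in x + y) and all(v >= 0 for v in b_Ax)
                and all(v >= 0 for v in Aty_c))
    return f, feasible

# Data of Example 1 with a hypothetical trial point z = (x, y).
c = [3.0, 2.0]
A = [[-1.0, 3.0], [1.0, 1.0], [2.0, -1.0]]
b = [12.0, 8.0, 10.0]
f, feasible = complementarity_residual([1.0, 1.0], [1.0, 3.0, 1.0], A, b, c)
```

At an optimum point f would vanish; here it is merely the residual at the trial point.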


2. Verification Method

The center path of (6) (cf. for instance Refs. [3–5]) is defined by

f(z) = (x(A^t y − c); y(b − Ax)) = γe,  (8)

where e ∈ R^{m+n} is the vector with all elements equal to 1. The constant γ is defined by

γ = ‖f(z)‖₁/(m + n) = (b^t y − c^t x)/(m + n)  (9)

provided that z is a feasible point. Namely, if z is a feasible point, γ is the duality gap of the problem divided by m + n. This fact is pointed out in Refs. [4, 5].

The Fréchet derivative f′(z) is given by

f′(z) = ( [A^t y − c]   [x]A^t
          −[y]A         [b − Ax] ),  (10)

where for a vector x = (x₁, x₂, ···, x_n)^t, [x] denotes diag(x₁, x₂, ···, x_n). At a given approximate optimum point z, the Newton direction d_n and the centered direction d_c are defined by

f′(z) d_n = −(x(A^t y − c); y(b − Ax))  (11)

and

f′(z) d_c = −(x(A^t y − c); y(b − Ax)) + γe,  (12)

respectively.

In the proposed method we use the following verification procedures. First, an interior point of (6) is searched for along a search direction, which is a linear combination of d_n and d_c, based on the guiding cone method or the penalized norm method [4, 5]. Here, we assume that we can find an interior point z which is a good approximation of an optimum point. Then, the second step of our method is to check the conditions of the following theorem at the point z:

Theorem 1 Let z ∈ R^{m+n} be an interior point, namely a point satisfying (7) with strict inequalities:

x > 0, y > 0, b − Ax > 0 and A^t y − c > 0.  (13)

Let further the constants α and ω be defined by the inequalities α ≧ ‖f′(z)⁻¹‖∞ ‖f(z)‖∞ and ω ≧ 2 max(‖A‖∞, ‖A‖₁) ‖f′(z)⁻¹‖∞, respectively. If

αω ≦ 1/4,  (14)

there exists an optimal point z* = (x*^t, y*^t)^t ∈ B(z, ρ) = {z′ ∈ R^{m+n} | ‖z′ − z‖∞ ≦ ρ}, which is a point satisfying (6) and (7), where

ρ = (1 − √(1 − 3αω))/ω.  (15)

The optimum point z* is unique in B(z, ρ).
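The test (14) and radius (15) of Theorem 1 can be sketched as below. This assumes α and ω are already rigorous upper bounds for ‖f′(z)⁻¹‖∞‖f(z)‖∞ and 2 max(‖A‖∞, ‖A‖₁)‖f′(z)⁻¹‖∞; computing such bounds with controlled rounding is the substantive part of the method, which this sketch omits.

```python
import math

def inclusion_radius(alpha, omega):
    # Condition (14): alpha * omega <= 1/4. If it holds, Theorem 1 gives the
    # radius rho of (15) enclosing a unique optimum point around z.
    if alpha * omega > 0.25:
        return None  # verification fails; no conclusion can be drawn
    return (1.0 - math.sqrt(1.0 - 3.0 * alpha * omega)) / omega

rho_ok = inclusion_radius(1e-8, 2.0)   # tiny residual: a tight enclosure
rho_bad = inclusion_radius(1.0, 1.0)   # condition (14) violated
```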

We note that the first half of the assertion of Theorem 1 can be derived from the following Kantorovich theorem applied to the nonlinear equation (6):

Theorem 2 (Kantorovich's Theorem for (6)) Let f be defined by (6). We assume that the Fréchet derivative f′(z) is nonsingular and satisfies the inequality

α′ ≧ ‖f′(z)⁻¹ f(z)‖∞  (16)

for a certain positive α′. Furthermore, we assume that f satisfies

‖f′(z)⁻¹ (f′(z′) − f′(z″))‖∞ ≦ ω′ ‖z′ − z″‖∞ for all z′, z″ ∈ R^{m+n}  (17)

with a certain positive constant ω′. If

α′ω′ ≦ 1/2  (18)

and

ρ′ = (1 − √(1 − 2α′ω′))/ω′,  (19)

then there exists a point z* = (x*^t, y*^t)^t ∈ B(z, ρ′) satisfying (6). The solution z* of (6) is unique in B(z, ρ′).

Proof of Theorem 1 We assume that the conditions of Theorem 1 are satisfied.

First, we shall show that the conditions of Theorem 2 are satisfied. We note that f is defined on R^{m+n}. If we put α′ = 1.5α, then

‖f′(z)⁻¹ f(z)‖∞ ≦ ‖f′(z)⁻¹‖∞ ‖f(z)‖∞ ≦ α < α′.  (20)

It is further noted that for any z′, z″ ∈ R^{m+n}

f′(z′) − f′(z″) = ( [A^t(y′ − y″)]   [x′ − x″]A^t
                    −[y′ − y″]A      [−A(x′ − x″)] ).  (21)

Let e_k = (1, 1, ···, 1)^t ∈ R^k and let I_k be the identity matrix in R^k. Then, from (21), we have the following elementwise inequality:

|f′(z′) − f′(z″)| ≦ ‖z′ − z″‖∞ ( [|A|^t e_m]   [e_n] |A|^t
                                  [e_m] |A|     [|A| e_n] )
                 ≦ ‖z′ − z″‖∞ ( ‖A‖₁ I_n   |A|^t
                                 |A|        ‖A‖∞ I_m ),  (22)

which implies

‖f′(z′) − f′(z″)‖∞ ≦ 2 max(‖A‖∞, ‖A‖₁) ‖z′ − z″‖∞.  (23)

Here, for x = (x₁, x₂, ···, x_k)^t, y = (y₁, y₂, ···, y_k)^t ∈ R^k,

|x| = (|x₁|, |x₂|, ···, |x_k|)^t  (24)

and

x ≦ y ⇐⇒ x_i ≦ y_i (i = 1, 2, ···, k).  (25)

Hence, we can use the ω of Theorem 1 as ω′ in Theorem 2. If we put α′ = 1.5α and ω′ = ω, we have

α′ω′ = 1.5αω ≦ 3/8 < 1/2.  (26)

Further, ρ′ coincides with ρ. Thus, from Kantorovich's theorem (Theorem 2) it is seen that there exists a solution z* = (x*^t, y*^t)^t ∈ B(z, ρ) satisfying (6). Kantorovich's theorem also states that z* is the unique solution of (6) in the closed ball B(z, ρ).

Next, we show that z* is feasible, i.e., that it satisfies the inequality conditions (7). Let us consider a solution curve of the following continuous Newton method starting from a given interior point z:

dz(t)/dt = −f′(z(t))⁻¹ f(z(t)) with z(0) = z.  (27)

The elementary theory of differential equations, such as the Picard–Lindelöf theorem (see, for example, [6]), states that the solution curve z(t) exists for t ∈ [0, M) for a certain positive constant M.

Suppose T < M is the smallest value such that z(T) is on the boundary of the closed ball B(z, ρ). Then

‖z − z(T)‖∞ ≦ ∫₀^T ‖dz(t)/dt‖ dt < k ‖f(z)‖∞,  (28)

where k is defined by

k = max_{z′∈B} ‖f′(z′)⁻¹‖∞.  (29)

This result was used in Refs. [3, 7]. In fact, z(t) satisfies

df(z(t))/dt = −f(z(t)) with z(0) = z.  (30)

Thus,

f(z(t)) = f(z) e^{−t}  (31)

holds. Hence, we have

‖dz(t)/dt‖ ≦ ‖f′(z(t))⁻¹‖∞ ‖f(z(t))‖∞ ≦ k ‖f(z)‖∞ e^{−t},  (32)

which gives

∫₀^T ‖dz(t)/dt‖ dt ≦ k ‖f(z)‖∞ (1 − e^{−T}) < k ‖f(z)‖∞.  (33)

Furthermore, f(z(t)) = f(z) e^{−t} implies that z(t), starting from an interior point, remains an interior point for t ∈ [0, M).

We note that for z′ ∈ B(z, ρ),

‖f′(z)⁻¹ (f′(z) − f′(z′))‖∞ ≦ ω ‖z − z′‖∞  (34)

holds. We note also that (14) implies

ωρ < 1.  (35)

Thus, from (34), it follows that

k = max_{z′∈B} ‖f′(z′)⁻¹‖∞
  ≦ max_{z′∈B} ‖f′(z)⁻¹‖∞ / (1 − ‖I − f′(z)⁻¹ f′(z′)‖∞)
  = max_{z′∈B} ‖f′(z)⁻¹‖∞ / (1 − ‖f′(z)⁻¹ (f′(z) − f′(z′))‖∞)
  ≦ max_{z′∈B} ‖f′(z)⁻¹‖∞ / (1 − ω ‖z − z′‖∞)
  ≦ ‖f′(z)⁻¹‖∞ / (1 − ωρ).  (36)

Therefore, we have

k ‖f(z)‖∞ ≦ ‖f′(z)⁻¹‖∞ ‖f(z)‖∞ / (1 − ωρ) ≦ α / (1 − ωρ).  (37)

On the other hand, we can show the following inequality:

α / (1 − ωρ) ≦ ρ. (38)

In fact, since 4αω ≦ 1, we have the inequality

1 − 2αω ≦ √(1 − 3αω), (39)

which implies the inequality

α / √(1 − 3αω) ≦ (1 − √(1 − 3αω)) / ω, (40)

which is equivalent to the inequality (38). The inequalities (28), (37) and (38) imply

‖z − z(T)‖∞ < ρ, (41)

which contradicts the fact that z(T) is on the boundary of B(z, ρ). Therefore, there exists no such T and the solution curve is contained in the interior of the ball B(z, ρ). There is no singularity of the right-hand side of (27) in B(z, ρ). By the elementary theory of differential equations on extending solutions (see, for instance, [8]), the solution can be prolonged to the interval [0, ∞), i.e., M = ∞, and it converges to z∗ as t tends to ∞. In fact, let z∗∗ be a point in the limit set of the solution curve, which is obviously contained in the closed ball B(z, ρ). Then z∗∗ is a solution of (6) by (31). By the uniqueness of the solution of (6) in the closed ball B(z, ρ), it is identical to z∗. Therefore, the solution curve converges to z∗ as t tends to ∞.

Since the solution curve is contained in the feasible set, the limit point z∗ is also a feasible point.

(QED)

3. Numerical Examples

In this section, let us present numerical examples. For executing verified computation, we have used MATLAB on Windows XP on a personal computer with an Intel Core 2 Duo 1.2 GHz processor.

A verification function is programmed based on the rounding-mode-controlled numerical verification method proposed in Ref. [9].

Example 1 Let us consider the following problem:

Maximize ctx, subject to Ax ≦ b and x ≧ 0, (42)

where cᵗ = (3, 2),

A = [ −1  3
       1  1
       2 −1 ] (43)

and bᵗ = (12, 8, 10). It is known that the optimal solution


is x = (6, 2)ᵗ. In this case, we have an interior point

x = ( 5.999999999999999
      2.000000000000000 ),

y = ( 4.166666666666667 × 10⁻¹⁷
      2.333333333333334
      3.333333333333334 × 10⁻¹ ).

For this point, we have

αω < 7.93 × 10⁻¹⁴. (44)

Thus, there exists an optimum solution of (42) in the ball centered at z = (xᵗ, yᵗ)ᵗ with radius

ρ = 5.14 × 10⁻¹⁵. (45)

Further, the objective value is included in [22.00000000000000, 22.00000000000001]. These results are consistent with the exact solution x = (6, 2)ᵗ.
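As a plain cross-check (ours, not the verification program based on Ref. [9]), one can confirm that the exact primal optimum x = (6, 2)ᵗ, together with the exact dual vector y = (0, 7/3, 1/3)ᵗ behind the printed y above, satisfies primal feasibility, dual feasibility, and strong duality for (42):

```python
# Optimality check for Example 1 (ours): primal/dual feasibility and
# equal objective values certify that x = (6, 2)^t is optimal.
A = [[-1, 3], [1, 1], [2, -1]]
b = [12, 8, 10]
c = [3, 2]
x = [6, 2]                # exact primal optimum
y = [0.0, 7/3, 1/3]       # exact dual solution (cf. the printed y above)

# primal feasibility: Ax <= b and x >= 0
assert all(sum(A[i][j] * x[j] for j in range(2)) <= b[i] for i in range(3))
assert all(xj >= 0 for xj in x)
# dual feasibility: A^t y >= c and y >= 0
assert all(sum(A[i][j] * y[i] for i in range(3)) >= c[j] - 1e-12 for j in range(2))
assert all(yi >= 0 for yi in y)
# strong duality: c^t x = b^t y (= 22, matching the verified enclosure)
assert abs(sum(c[j] * x[j] for j in range(2)) - sum(b[i] * y[i] for i in range(3))) < 1e-9
```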

Example 2 Next, let us consider the following simple linear programming problem:

Maximize cᵗx, subject to Ax ≦ b and x ≧ 0, (46)

where cᵗ = (300, 300, 500),

A = [ 150 100 100
        1   2   1
        0   0 150 ] (47)

and bᵗ = (3000, 40, 1200). In this case, we have a feasible solution

x = ( 5.9999999999999973
      13.000000000000004
      8.0000000000000000 ),

y = ( 1.5000000000000000
      75.000000000000000
      1.8333333333333335 ). (48)

For this feasible point, we have

αω < 1.64 × 10⁻¹¹. (49)

Thus, there exists an optimum solution of (46) in the ball centered at z = (xᵗ, yᵗ)ᵗ with radius

ρ = 1.45 × 10⁻¹³. (50)

Example 3 In this example, we shall consider the following problem with n = 2m:

Maximize cᵗx, subject to Ax ≦ b and x ≧ 0, (51)

where

c = 10cᵣ, A = 10E + 5Aᵣ, b = 100bᵣ. (52)

Here, cᵣ ∈ Fⁿ is a pseudo-random vector whose elements are distributed uniformly in [0, 1], E ∈ F^{m×n} is a constant matrix whose elements are all one, Aᵣ ∈ F^{m×n} is a pseudo-random matrix whose elements are distributed according to the normal distribution with mean zero and variance one, and bᵣ ∈ Fᵐ is a pseudo-random

vector whose elements are distributed uniformly in [0, 1]. We examine the cases of m = 100, 200, 300, 400, 500, 600, 700, 800 and 900. We solve the problem on MATLAB. The results are summarized in Table 1. In this table, ta is the time for computing an approximate optimum solution and tv is that for numerical verification.
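Our reading of the test-problem generation (52) can be sketched as follows; the random generator and seed are our own choices, not specified in the paper:

```python
# Sketch (ours) of generating the random LP data in (52):
# c = 10 c_r, A = 10 E + 5 A_r, b = 100 b_r, with m rows and n = 2m columns.
import numpy as np

rng = np.random.default_rng(0)          # arbitrary seed
m = 100
n = 2 * m
c = 10 * rng.uniform(0.0, 1.0, size=n)                       # c_r uniform on [0, 1]
A = 10 * np.ones((m, n)) + 5 * rng.standard_normal((m, n))   # 10 E + 5 A_r
b = 100 * rng.uniform(0.0, 1.0, size=m)                      # b_r uniform on [0, 1]

assert A.shape == (m, n) and c.shape == (n,) and b.shape == (m,)
assert (c >= 0).all() and (c <= 10).all() and (b >= 0).all()
```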

Table 1. Results of Verification

m + n    αω             ρ              ta [sec]   tv [sec]
 300     3.78 × 10⁻⁶    3.4 × 10⁻¹²     0.30       0.29
 600     1.69 × 10⁻⁵    5.0 × 10⁻¹²     0.79       2.17
 900     4.22 × 10⁻⁴    2.9 × 10⁻¹¹     1.9        7.4
1200     0.0049         7.3 × 10⁻¹¹     3.9       17.4
1500     0.083          3.8 × 10⁻¹⁰     7.2       33.4
1800     2.91 × 10⁻⁴    1.8 × 10⁻¹¹    13         59
2100     0.189          4.7 × 10⁻¹⁰    21         91
2400     0.051          2.1 × 10⁻¹⁰    37        137
2700     0.029          1.4 × 10⁻¹⁰    54        184
3000     0.011          8.3 × 10⁻¹¹    72        270

In this numerical example, the computational cost of enclosing the optimum point is about four times that of computing an approximate optimum solution.

Acknowledgments

This research is supported by the Grant-in-Aid for Specially Promoted Research from the MEXT, Japan: "Establishment of Verified Numerical Computation" (No. 17002012). The authors express their sincere thanks to the referees for their valuable comments on this article.

References

[1] C. Jansson, Rigorous lower and upper bounds in linear programming, SIAM J. Optim., 14(3) (2004), 914–935.
[2] K. Tanabe, A posteriori error estimate for an approximate solution of a general linear programming, in: New Methods for Linear Programming 2, The Institute of Statistical Mathematics Cooperative Research Report 10, pp.118–120, 1988.
[3] K. Tanabe, A geometric method in nonlinear programming, J. Optim. Theory Appl., 30 (1980), 181–210.
[4] K. Tanabe, Complementarity-enforcing centered Newton method for mathematical programming: global method, in: New Methods for Linear Programming, The Institute of Statistical Mathematics Cooperative Research Report 5, pp.118–144, 1987.
[5] K. Tanabe, Centered Newton method for mathematical programming, in: System Modeling and Optimization, M. Iri and K. Yajima eds., pp.197–208, Springer-Verlag, Berlin, 1988.
[6] E. Zeidler, Nonlinear Functional Analysis and Its Applications: Part I Fixed Point Theorems, Springer-Verlag, New York, 1986, p.78.
[7] K. Tanabe, Continuous Newton-Raphson method for solving an underdetermined system of nonlinear equations, Nonlinear Anal. T.M.A., 3 (1979), 495–503.
[8] M. W. Hirsch and S. Smale, Differential Equations, Dynamical Systems, and Linear Algebra, Academic Press, London, 1974, p.171.
[9] S. Oishi and S. M. Rump, Fast verification of solutions of matrix equations, Numer. Math., 90(4) (2002), 755–773.


JSIAM Letters Vol.1 (2009) pp.9–12 ©2009 Japan Society for Industrial and Applied Mathematics

2D tight framelets with orientation selectivity suggested

by vision science

Hitoshi Arai1

and Shinobu Arai

Graduate School of Mathematical Sciences, The University of Tokyo, 3-8-1 Komaba, Meguro-ku, Tokyo 153-8914, Japan1

E-mail [email protected]

Received August 19, 2008, Accepted October 22, 2008 (INVITED PAPER)

Abstract

In this paper we will construct compactly supported tight framelets with orientation selectivity and Gaussian-derivative-like filters. These features are similar to those of simple cells in V1 revealed by recent vision science. In order to see the orientation selectivity, we also give a simple example of image processing of a test image.

Keywords wavelet frame, framelet, visual cortex, simple cell, orientation selectivity

Research Activity Group Wavelet Analysis

1. Introduction

Simple cells in V1 of the brain cortex play important roles in visual information processing, and some mathematical models of such cells have been studied by using the Gabor function or the DOG function. However, R. Young established that Gaussian derivative models are suitable for studying simple cells (see [1]). In this paper we construct compactly supported framelets which have graphs similar to Gaussian derivatives and good orientation selectivity. See [2] for the definition of framelets. In [3], B. Escalante-Ramírez and J. Silván-Cárdenas constructed a multi-channel model with orientation selectivity by means of Gaussian derivatives. However, the Gaussian function is not compactly supported. In the previous paper [4], we presented new wavelet frames with orientation selectivity and Gaussian-derivative-like shape. Those frames are defined only on the product of two finite abelian groups, Z/N₁Z × Z/N₂Z, where Z is the additive abelian group consisting of all integers, and N₁ and N₂ are positive integers. However, the framelet filters presented in this paper define not only tight frames of l²(Z/N₁Z × Z/N₂Z), but also ones of l²(Z × Z) and of L²(R²).

To describe our construction we mention our terminology. Let Z² = Z × Z. For a matrix A we denote by Aᵀ the transpose of A. For a 2 × 2 matrix M, let N(M) be the set {(x₁, x₂)M : x₁, x₂ ∈ [0, 1)} ∩ Z². In this paper we will be concerned with the following matrices:

Mr = [ 2 0 ]    Mq = [  1 1 ]    Mh = [ 1  1 ]
     [ 0 2 ],        [ −1 1 ],        [ 2 −2 ].

These matrices are related to decimation of 2D signals: Mr defines a rectangular decimation, Mq a quincunx decimation, and Mh a hexagonal decimation. Suppose M = Mr, Mq, or Mh. For a stable filter h = (h[m₁, m₂])_{m₁,m₂∈Z}, let H(ω₁, ω₂) be its frequency response function

H(ω₁, ω₂) = Σ_{m₁,m₂=−∞}^{∞} h[m₁, m₂] e^{−2πim₁ω₁} e^{−2πim₂ω₂}.
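The frequency response above can be evaluated directly for a finitely supported filter; here is a tiny sketch (ours, not from the paper) using a 2 × 2 averaging filter as a stand-in:

```python
# Minimal sketch (ours) of the frequency response H(w1, w2) of a 2D FIR
# filter h[m1, m2], following the definition above.
import cmath

def freq_resp(h, w1, w2):
    # h: dict mapping (m1, m2) -> filter coefficient
    return sum(coef * cmath.exp(-2j * cmath.pi * (m1 * w1 + m2 * w2))
               for (m1, m2), coef in h.items())

h = {(0, 0): 0.25, (0, 1): 0.25, (1, 0): 0.25, (1, 1): 0.25}  # 2D average
assert abs(freq_resp(h, 0.0, 0.0) - 1.0) < 1e-12  # DC gain = sum of taps
assert abs(freq_resp(h, 0.5, 0.5)) < 1e-12        # zero at (1/2, 1/2)
```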

The purpose of this paper is to give a simple construction of a finite number of FIR filters h_s = (h_s[n])_{n∈Z²}, s ∈ S, having the above-mentioned properties and the following conditions: for all ω ∈ R² and for all r ∈ N(M) with r ≠ 0,

Σ_{s∈S} |H_s(ω)|² = |det M|, (1)

Σ_{s∈S} H_s(ω) H̄_s(ω + rM⁻¹) = 0, (2)

where the bar denotes the complex conjugation. For u ∈ Z², let h_{s,u}[v] = h_s[v − uMᵀ], v ∈ Z², and h_{s,u} = (h_{s,u}[v])_{v∈Z²}. Obviously h_{s,u} ∈ l²(Z²). Since the filters h_s, s ∈ S, satisfy the conditions (1) and (2), {h_{s,u}}_{s∈S, u∈Z²} is a tight frame of l²(Z²). Moreover, as we will show later, when M = Mr, a constant multiple of our filters satisfies the unitary extension principle, and therefore we also gain a tight frame of L²(R²).
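Conditions (1) and (2) are easy to verify numerically for simple examples. As a one-dimensional analogue (ours; the paper's filters are 2D), the normalized Haar pair with dilation M = (2), so that |det M| = 2 and the nonzero shift is 1·M⁻¹ = 1/2, satisfies both conditions:

```python
# 1D analogue (ours) of conditions (1) and (2): for the normalized Haar
# pair h0 = (1, 1)/sqrt(2), h1 = (1, -1)/sqrt(2) and M = (2),
# sum_s |H_s(w)|^2 = 2 and sum_s H_s(w) * conj(H_s(w + 1/2)) = 0.
import cmath

def freq_resp(h, w):
    return sum(hm * cmath.exp(-2j * cmath.pi * m * w) for m, hm in enumerate(h))

filters = [(2**-0.5, 2**-0.5), (2**-0.5, -2**-0.5)]
for w in [0.0, 0.1, 0.37, 0.5, 0.9]:
    s1 = sum(abs(freq_resp(h, w))**2 for h in filters)
    s2 = sum(freq_resp(h, w) * freq_resp(h, w + 0.5).conjugate() for h in filters)
    assert abs(s1 - 2.0) < 1e-12   # analogue of condition (1)
    assert abs(s2) < 1e-12         # analogue of condition (2)
```

The same kind of pointwise check, run over a grid of ω and all nonzero r ∈ N(M), applies verbatim to 2D filter banks.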

We note that several wavelet frames with orientation selectivity have been constructed: for example, curvelets [5], contourlets [6], complex wavelets [7], the wavelet frames in [4], and so on. However, our framelet is completely different from them, and satisfies several properties similar to those of simple cells.

2. Construction

Suppose n is an integer with n ≥ 2. Let rₙ = 1 if n is odd, and let rₙ = 0 if n is even. Then there is a unique positive integer r such that n = 2r + rₙ. Let

Λ_f = {(0, 0), (0, n), (n, 0), (n, n)},

Λ_g = {(k, l)}_{k=0,n; l=1,2,···,n−1} ∪ {(k, l)}_{l=0,n; k=1,2,···,n−1},

Λ_a = {(k, l)}_{k=1,2,···,n−1; l=1,2,···,n−1}.

We abbreviate c(x) = cos(πx) and s(x) = sin(πx). We write (n choose k) for the binomial coefficient. Let α_k = (n choose k), β_k = (n−2 choose k−1), and c_M = |det M|^{1/2}. For (k, l) ∈ Λ_f, let

F_{k,l}(x, y) = c_M i^{k+l} e^{−rₙπi(x+y)} c(x)^{n−k} s(x)^k c(y)^{n−l} s(y)^l.

For (k, l) ∈ Λ_g, let

G_{k,l}(x, y) = 2^{−1/2} c_M i^{k+l} √(α_k α_l) e^{−rₙπi(x+y)} c(x)^{n−k} s(x)^k c(y)^{n−l} s(y)^l.

For (k, l) ∈ Λ_a, let

A^κ_{k,l}(x, y) = 2⁻¹ c_M i^{k+l+1} e^{−rₙπi(x+y)} c(x)^{n−k−1} s(x)^{k−1} c(y)^{n−l−1} s(y)^{l−1} ((−1)^κ √(α_k β_l) c(x)s(x) + √(α_l β_k) c(y)s(y)),

where κ = 1, 2. Let

δ^{n,k}_ν = Σ_{μ=max(ν+r−k,0)}^{min(ν+r, n−k)} (−1)^{ν+μ} (n−k choose μ) (k choose ν+r−μ),

γ^{n,l}_ν = Σ_{μ=max(ν+r−l,0)}^{min(ν+r−1, n−l−1)} (−1)^{ν+μ} (n−l−1 choose μ) (l−1 choose ν+r−1−μ).

By calculation, we have the following lemma.

Lemma 1 F_{k,l}, G_{k,l}, A¹_{k,l} and A²_{k,l} are frequency response functions of real-valued FIR filters as follows:

F_{k,l}(x, y) = (c_M / 2^{2n}) Σ_{ν=−r}^{r+rₙ} Σ_{λ=−r}^{r+rₙ} δ^{n,k}_ν δ^{n,l}_λ e^{−2πi(νx+λy)},

G_{k,l}(x, y) = (c_M √(α_k α_l) / 2^{2n+1/2}) Σ_{ν=−r}^{r+rₙ} Σ_{λ=−r}^{r+rₙ} δ^{n,k}_ν δ^{n,l}_λ e^{−2πi(νx+λy)},

A^κ_{k,l}(x, y) = (c_M / 2^{2n−1}) [ √(α_l β_k) Σ_{ν=−r+1}^{r−1+rₙ} Σ_{λ=−r}^{r+rₙ} γ^{n,k}_ν δ^{n,l}_λ e^{−2πi(νx+λy)} + (−1)^κ √(α_k β_l) Σ_{ν=−r}^{r+rₙ} Σ_{λ=−r+1}^{r−1+rₙ} δ^{n,k}_ν γ^{n,l}_λ e^{−2πi(νx+λy)} ].

Let f_{k,l}, g_{k,l}, a¹_{k,l} and a²_{k,l} be the filters whose frequency response functions are F_{k,l}, G_{k,l}, A¹_{k,l} and A²_{k,l}, respectively. Let

H = {f_{k,l}}_{(k,l)∈Λ_f} ∪ {g_{k,l}}_{(k,l)∈Λ_g} ∪ {a^κ_{k,l}}_{(k,l)∈Λ_a, κ∈{1,2}}.

We call this family of filters H the simple pinwheel framelet of degree n (abbr. SP framelet of degree n). This name comes from the famous "pinwheel structure" of simple cells. Our main theorem is the following:

Theorem 1 (i) If n is odd, H satisfies the conditions (1) and (2) for Mr, Mq and Mh.
(ii) If n is even and n ≥ 4, H satisfies the conditions (1) and (2) for Mr and Mq, but does not satisfy (2) for Mh. If n = 2, H satisfies the condition (1), but does not satisfy (2) for Mr, Mq and Mh.

Sketch of the proof For real numbers p and q, let

Φ(x, y; p, q) = Σ_{(k,l)∈Λ_f} F_{k,l}(x, y) F̄_{k,l}(x+p, y+q)
 + Σ_{(k,l)∈Λ_g} G_{k,l}(x, y) Ḡ_{k,l}(x+p, y+q)
 + Σ_{κ=1}^{2} Σ_{(k,l)∈Λ_a} A^κ_{k,l}(x, y) Ā^κ_{k,l}(x+p, y+q).

Suppose n is odd. By calculation we have that Φ(x, y; p, q) = 0 if p = 1/2 + m (m ∈ Z) or q = 1/2 + m′ (m′ ∈ Z), and that Φ(x, y; 0, 0) = |det M|. In particular,

Φ(x, y; 1/2, 0) = Φ(x, y; 0, 1/2) = Φ(x, y; 1/2, 1/2) = Φ(x, y; 1/2, 1/4) = Φ(x, y; 1/2, 3/4) = 0.

This implies (i). If n is even and n ≥ 4, then we have Φ(x, y; 0, 0) = |det M|, and

Φ(x, y; 1/2, 0) = Φ(x, y; 0, 1/2) = Φ(x, y; 1/2, 1/2) = 0.

However, Φ(x, y; 1/2, 1/4) and Φ(x, y; 1/2, 3/4) are not identically zero. Suppose n = 2. Then it is easy to show that Φ(x, y; 0, 0) = |det M|, and that Φ(x, y; 1/2, 1/2) is not identically zero.

(QED)

Suppose M = Mr and n is a positive integer with n ≥ 3. Let B₁(x) be the characteristic function of the interval [−1/2, 1/2) on R, and let B_{m+1}(x) = B_m ∗ B₁(x), m = 1, 2, ···, where ∗ is the convolution on R. We consider the SP framelet of degree n. Let

f_{0,0}(x, y) = B_n(x − 1/2) B_n(y − 1/2).

Then the Fourier transform of f_{0,0} is as follows:

f̂_{0,0}(ξ₁, ξ₂) = e^{−πi(ξ₁+ξ₂)} (s(ξ₁)/(πξ₁))ⁿ (s(ξ₂)/(πξ₂))ⁿ.

Hence we have

f̂_{0,0}(2ξ₁, 2ξ₂) = c_M⁻¹ F_{0,0}(ξ₁, ξ₂) f̂_{0,0}(ξ₁, ξ₂).
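This two-scale relation can be spot-checked numerically. The sketch below (ours, not from the paper) does so for n = 5 (so rₙ = 1), using F₀,₀(x, y) = c_M e^{−πi(x+y)} c(x)ⁿ c(y)ⁿ, the (0,0) case of the definition in Section 2, and the B-spline Fourier transform B̂_n(ξ) = (s(ξ)/(πξ))ⁿ:

```python
# Numeric spot-check (ours) of the two-scale relation
# fhat(2 x1, 2 x2) = cM^{-1} F00(x1, x2) fhat(x1, x2) for n = 5,
# where fhat is the Fourier transform of Bn(x - 1/2) Bn(y - 1/2).
import cmath
import math

n, cM = 5, 2.0                        # M = Mr, so cM = |det Mr|^{1/2} = 2

def sinc(x):
    return 1.0 if x == 0 else math.sin(math.pi * x) / (math.pi * x)

def fhat(x1, x2):                     # Fourier transform of Bn(x-1/2)Bn(y-1/2)
    return cmath.exp(-1j * math.pi * (x1 + x2)) * sinc(x1)**n * sinc(x2)**n

def F00(x1, x2):                      # (0,0) frequency response, n odd
    return cM * cmath.exp(-1j * math.pi * (x1 + x2)) * \
           math.cos(math.pi * x1)**n * math.cos(math.pi * x2)**n

for (x1, x2) in [(0.1, 0.2), (0.3, 0.05), (0.45, 0.4)]:
    assert abs(fhat(2 * x1, 2 * x2) - F00(x1, x2) * fhat(x1, x2) / cM) < 1e-12
```

The check rests on the identity sin(2πξ)/(2πξ) = cos(πξ)·sin(πξ)/(πξ), applied n times in each variable.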

Define

f̂_{k,l}(ξ₁, ξ₂) = c_M⁻¹ F_{k,l}(ξ₁/2, ξ₂/2) f̂_{0,0}(ξ₁/2, ξ₂/2),

ĝ_{k,l}(ξ₁, ξ₂) = c_M⁻¹ G_{k,l}(ξ₁/2, ξ₂/2) f̂_{0,0}(ξ₁/2, ξ₂/2),

â^κ_{k,l}(ξ₁, ξ₂) = c_M⁻¹ A^κ_{k,l}(ξ₁/2, ξ₂/2) f̂_{0,0}(ξ₁/2, ξ₂/2).

By the unitary extension principle [8] we have that {f_{k,l}}_{(k,l)∈Λ_f\{(0,0)}} ∪ {g_{k,l}}_{(k,l)∈Λ_g} ∪ {a^κ_{k,l}}_{(k,l)∈Λ_a, κ∈{1,2}} is a tight frame of L²(R²).


Fig. 1. Filters of MOGMRA (level 2, n = 5).

3. Discussion related to vision science and image processing

To apply our framelets to computational experiments for studying vision, we discuss here a maximal overlap version of the generalized multiresolution analysis (MOGMRA) defined on Z/N₁Z × Z/N₂Z, where N₁ and N₂ are positive even integers. We refer to our paper [9] for why MOGMRA is suitable for studying mathematical models of visual information processing. See [10] and [11] for the maximal overlap multiresolution analysis for wavelets. We begin by describing MOGMRA based on our SP framelet of degree n. Suppose M = Mr. For a positive integer N, let Z_N = {0, 1, ···, N−1}. For y = (y[m])_{m∈Z_{N₁/2}×Z_{N₂/2}}, we denote by y^M = (y^M[m])_{m∈Z_{N₁}×Z_{N₂}} the upsampling of y by the sampling matrix M, that is, y^M[mMᵀ] = y[m] for m ∈ Z_{N₁/2} × Z_{N₂/2}, and otherwise y^M[m] = 0. For x = (x[m])_{m∈Z_{N₁}×Z_{N₂}}, let x̃[m] = x[m] + x[m₁ + N₁/2, m₂ + N₂/2], m = (m₁, m₂) ∈ Z_{N₁/2} × Z_{N₂/2}, and let S(x) = (x̃)^M. Let S⁰(x) = x, and S^μ(x) = S(S^{μ−1}(x)) for μ = 1, 2, ···. For a stable filter h ∈ l¹(Z²), let p(h) be its periodization, that is, p(h) = (p(h)[m])_{m∈Z_{N₁}×Z_{N₂}}, where for m = (m₁, m₂) ∈ Z_{N₁} × Z_{N₂},

p(h)[m] = Σ_{k₁,k₂=−∞}^{∞} h[m₁ + N₁k₁, m₂ + N₂k₂].

Let T^j(h) = S^{j−1}(p(h)), j = 1, 2, ···. For x = (x[m])_{m∈Z_{N₁}×Z_{N₂}}, (k₁, k₂) ∈ Z², and μ = (μ₁, μ₂) ∈ Z_{N₁} × Z_{N₂}, let x^{per}[μ₁ + k₁N₁, μ₂ + k₂N₂] = x[μ]. Then x^{per} is identified with a signal defined on Z/N₁Z × Z/N₂Z. Let x^∨[m] = x^{per}[−m₁, −m₂]. For x = (x[m])_{m∈Z_{N₁}×Z_{N₂}} and y = (y[m])_{m∈Z_{N₁}×Z_{N₂}}, we denote by x ∗ y the cyclic convolution, that is, x ∗ y[m] = Σ_{k∈Z_{N₁}×Z_{N₂}} x^{per}[k] y^{per}[m − k], m ∈ Z_{N₁} × Z_{N₂}. For x = (x[m])_{m∈Z_{N₁}×Z_{N₂}}, the first stage of the decomposition of MOGMRA is defined by F¹_{k,l}(x) = T¹(f_{k,l})^∨ ∗ x, G¹_{k,l}(x) = T¹(g_{k,l})^∨ ∗ x, and A^{κ,1}_{k,l}(x) = T¹(a^κ_{k,l})^∨ ∗ x. The second stage is defined by F²_{k,l}(x) = T²(f_{k,l})^∨ ∗ F¹_{0,0}(x), G²_{k,l}(x) = T²(g_{k,l})^∨ ∗ F¹_{0,0}(x), and A^{κ,2}_{k,l}(x) = T²(a^κ_{k,l})^∨ ∗ F¹_{0,0}(x). In general, the j-th stage is as follows: F^j_{k,l}(x) = T^j(f_{k,l})^∨ ∗ F^{j−1}_{0,0}(x), G^j_{k,l}(x) = T^j(g_{k,l})^∨ ∗ F^{j−1}_{0,0}(x), and A^{κ,j}_{k,l}(x) = T^j(a^κ_{k,l})^∨ ∗ F^{j−1}_{0,0}(x). These constitute the decomposition phase. Let F^j = T^j(f_{0,0}). For a positive integer J, the synthesis phase is defined as follows:

F̃^J_{k,l}(x) = 4^{−J} F¹ ∗ ··· ∗ F^{J−1} ∗ T^J(f_{k,l}) ∗ F^J_{k,l}(x),

F̃^{J−1}_{k,l}(x) = 4^{−J+1} F¹ ∗ ··· ∗ F^{J−2} ∗ T^{J−1}(f_{k,l}) ∗ F^{J−1}_{k,l}(x),

...

F̃¹_{k,l}(x) = 4^{−1} T¹(f_{k,l}) ∗ F¹_{k,l}(x),

Fig. 2. A test image.


Fig. 3. MOGMRA decomposition of the test image (level 2).

G̃^J_{k,l}(x) = 4^{−J} F¹ ∗ ··· ∗ F^{J−1} ∗ T^J(g_{k,l}) ∗ G^J_{k,l}(x),

...

G̃¹_{k,l}(x) = 4^{−1} T¹(g_{k,l}) ∗ G¹_{k,l}(x),

and the Ã^{κ,j}_{k,l} are defined in the similar way. Then we obtain

x = F̃^J_{0,0}(x) + Σ_{j=1}^{J} [ Σ_{(k,l)∈Λ_f\{(0,0)}} F̃^j_{k,l}(x) + Σ_{(k,l)∈Λ_g} G̃^j_{k,l}(x) + Σ_{(k,l)∈Λ_a, κ=1,2} Ã^{κ,j}_{k,l}(x) ].

We call this decomposition the MOGMRA decomposition of x at level J. In the same way as in [4], we can define MOGMRA when N₁ and N₂ are not even.
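The cyclic convolution underlying the decomposition and synthesis phases can be sketched as follows (our illustration; a real implementation would use FFTs rather than the direct double sum):

```python
# Minimal sketch (ours) of the 2D cyclic convolution used above:
# (x * y)[m] = sum_k x_per[k] y_per[m - k] over Z_{N1} x Z_{N2}.
def cyclic_conv2(x, y):
    N1, N2 = len(x), len(x[0])
    return [[sum(x[k1][k2] * y[(m1 - k1) % N1][(m2 - k2) % N2]
                 for k1 in range(N1) for k2 in range(N2))
             for m2 in range(N2)]
            for m1 in range(N1)]

delta = [[1.0, 0.0], [0.0, 0.0]]        # 2D unit impulse on Z_2 x Z_2
sig = [[1.0, 2.0], [3.0, 4.0]]
assert cyclic_conv2(sig, delta) == sig  # convolving with the impulse is the identity
```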

Let δ be the 2D unit impulse supported at (N₁/2 + 1, N₂/2 + 1), and let δ′ = p(δ). Suppose n = 5. Fig. 1 depicts the plots of the outputs of δ′ by F²_{k,l}, G²_{k,l}, and A^{κ,2}_{k,l} arranged by the following rule:

F²_{5,0}, G²_{4,0}, G²_{3,0}, G²_{2,0}, G²_{1,0}, F²_{0,0},
G²_{5,1}, A^{1,2}_{4,1}, A^{1,2}_{3,1}, A^{1,2}_{2,1}, A^{1,2}_{1,1}, G²_{0,1}, A^{2,2}_{4,1}, A^{2,2}_{3,1}, A^{2,2}_{2,1}, A^{2,2}_{1,1},
G²_{5,2}, A^{1,2}_{4,2}, A^{1,2}_{3,2}, A^{1,2}_{2,2}, A^{1,2}_{1,2}, G²_{0,2}, A^{2,2}_{4,2}, A^{2,2}_{3,2}, A^{2,2}_{2,2}, A^{2,2}_{1,2},
G²_{5,3}, A^{1,2}_{4,3}, A^{1,2}_{3,3}, A^{1,2}_{2,3}, A^{1,2}_{1,3}, G²_{0,3}, A^{2,2}_{4,3}, A^{2,2}_{3,3}, A^{2,2}_{2,3}, A^{2,2}_{1,3},
G²_{5,4}, A^{1,2}_{4,4}, A^{1,2}_{3,4}, A^{1,2}_{2,4}, A^{1,2}_{1,4}, G²_{0,4}, A^{2,2}_{4,4}, A^{2,2}_{3,4}, A^{2,2}_{2,4}, A^{2,2}_{1,4},
F²_{5,5}, G²_{4,5}, G²_{3,5}, G²_{2,5}, G²_{1,5}, F²_{0,5}.

Next we consider a test image (Fig. 2). Fig. 3 is the MOGMRA decomposition of the test image at level 2. From this result of image processing we can conclude that our framelet has good orientation selectivity.

References

[1] R. Young, Oh say, can you see? The physiology of vision, SPIE, 1453 (1991), 92–123.
[2] I. Daubechies, B. Han, A. Ron and Z. Shen, Framelets: MRA-based construction of wavelet frames, Appl. Comput. Harmon. Anal., 14 (2003), 1–46.
[3] B. Escalante-Ramírez and J. Silván-Cárdenas, Advanced modeling of visual information processing: A multi-resolution directional-oriented image transform based on Gaussian derivatives, Signal Processing: Image Comm., 20 (2005), 801–812.
[4] H. Arai and S. Arai, Finite discrete, shift-invariant, directional filterbanks for visual information processing, I: construction, Interdisciplinary Information Sciences, 13 (2007), 255–273.
[5] E. J. Candès and D. Donoho, New tight frames of curvelets and optimal representations of objects with piecewise C² singularities, Comm. Pure and Appl. Math., 57 (2004), 219–266.
[6] M. N. Do and M. Vetterli, The contourlet transform: An efficient directional multiresolution image representation, IEEE Trans. Image Processing, 14 (2005), 2091–2106.
[7] N. G. Kingsbury, Image processing with complex wavelets, Phil. Trans. Roy. Soc. London, A357 (1999), 2543–2560.
[8] A. Ron and Z. Shen, Affine systems in L²(Rᵈ): the analysis of the analysis operator, J. Funct. Anal., 148 (1997), 408–447.
[9] H. Arai, A nonlinear model of visual information processing based on discrete maximal overlap wavelets, Interdisciplinary Information Sciences, 11 (2005), 177–190.
[10] G. P. Nason and B. W. Silverman, The stationary wavelet transform and some statistical applications, Lect. Notes Stat., Vol.103, pp.288–299, Springer-Verlag, 1995.
[11] D. B. Percival and A. T. Walden, Wavelet Methods for Time Series Analysis, Cambridge Univ. Press, 2000.


JSIAM Letters Vol.1 (2009) pp.13–16 ©2009 Japan Society for Industrial and Applied Mathematics

Analysis of Neuronal Dendrite Patterns Using Eigenvalues

of Graph Laplacians

Naoki Saito1

and Ernest Woei1

Department of Mathematics, University of California, Davis, CA 95616 USA1

E-mail [email protected], [email protected]

Received September 29, 2008, Accepted October 16, 2008 (INVITED PAPER)

Abstract

We report our current effort on extracting morphological features from neuronal dendrite patterns using the eigenvalues of their graph Laplacians, and on clustering neurons into different functional cell types using those features. Our preliminary results indicate the potential usefulness of such eigenvalue-based features, which we hope will replace the morphological features extracted by methods that require extensive human interactions.

Keywords pattern analysis, graph Laplacian, eigenvalues of Laplacian matrices

Research Activity Group Wavelet Analysis

1. Introduction

In recent years, the advent of new sensors and techniques has allowed one to image complicated interconnected structures in biology, such as dendrites connected to a single neuron, neuronal axon/fiber tracts in a human brain, and a network of blood vessels in a human body. Neuroscientists hope to gain insight into modeling and understanding brain functions by analyzing images of such network structures. The actual analysis of them, however, remains elusive. For example, vision scientists want to understand how the morphological properties of dendrite patterns of retinal ganglion cells (RGCs), such as those shown in Figure 1, relate to the functional types of these cells. Although such classification of neurons should ultimately be done on the basis of molecular or genetic markers of neuronal types, it has not been forthcoming. Hence, neuronal morphology has often been used as a neuronal signature that allows one to classify a neuron such as an RGC into different functional cell types [1]. The state-of-the-art procedure is still quite labor intensive and costly: automatic segmentation algorithms to trace dendrites in a given 3D image obtained by a confocal microscope only generate imperfect results due to occlusions and noise; moreover, one has to painstakingly extract many morphological and geometrical parameters (e.g., somal size, dendritic field size, total dendrite length, the number of branches, branch angle, etc.) with the help of an interactive software system. In fact, 14 morphological and geometric parameters were extracted from each cell in [1]. It takes roughly half a day to process a single cell from segmentation to parameter extraction!

In this paper, we examine how to analyze and characterize such neuronal dendrite structures automatically using computational harmonic analysis techniques, so that we can save the human interaction cost in this dendrite pattern analysis.

2. Analysis of Dendrite Structures via Graph Laplacian Eigenvalues

The segmentation and tracing software system used by our collaborator, Prof. Leo Chalupa and his group (Dept. Neurobiology, Physiology & Behavior, UC Davis), provides us with a sequence of 3D coordinates that represent points sampled along dendrite arbors (or paths) of RGCs with the branching information [1]. One of the most natural and simplest ways to model such a

Fig. 1. Dendrites of various types of retinal ganglion cells of a mouse; reprinted from [1] with permission from Elsevier.


network-like structure is to construct a graph. Hence, our first task is to convert such a sequence of 3D points to a connected graph G consisting of the vertex set V and the edge set E. To fix our notation, let G be a graph representing the dendrite patterns of an RGC, V = V(G) = {v₁, v₂, ..., vₙ}, where each vₖ ∈ R³ is a 3D sample point along the dendrite arbors of this RGC, and E = E(G) = {e₁, e₂, ..., eₘ}, where eₖ connects two vertices vᵢ, vⱼ for some 1 ≤ i, j ≤ n, and we write eₖ = (vᵢ, vⱼ). Let d_{vₖ} be the degree (or valency) of the vertex vₖ. In fact, the dendrite patterns of each RGC in our dataset can be converted to a tree rather than a general graph, since the graph is connected and contains no cycles. We also note that we only deal with unweighted graphs in this paper. In other words, we essentially examine the connectivities and complexity of the dendrite graphs, which may not reflect the physical lengths of the dendrite arbors. We will defer our investigation of models that reflect such physical realities to a future project, which includes weighted graphs where each edge e ∈ E has weight wₑ := ‖vᵢ − vⱼ‖⁻¹, i.e., the inverse of the physical distance between the two vertices of e.

Once we construct a graph per RGC, we proceed as follows:

Step 1: Construct the Laplacian matrix (often called the combinatorial Laplacian matrix) L(G) := D(G) − A(G), where D(G) := diag(d_{v₁}, ..., d_{vₙ}) is the diagonal matrix of vertex degrees and A(G) = (aᵢ,ⱼ) is the adjacency matrix of G, i.e., aᵢ,ⱼ = 1 if vᵢ and vⱼ are adjacent; otherwise it is 0.

Step 2: Compute the eigenvalues of L(G);

Step 3: Construct features using these eigenvalues;

Step 4: Repeat the above steps for all the RGCs and feed these feature vectors to clustering algorithms.
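Steps 1 and 2 can be sketched in a few lines (our illustration, not the authors' code); for a path graph the Laplacian eigenvalues are known in closed form, which makes a convenient correctness check:

```python
# Sketch (ours) of Steps 1-2: build L(G) = D(G) - A(G) for a path graph
# on n vertices and compute its eigenvalues; for a path they are
# 4 sin^2(k*pi/(2n)), k = 0, ..., n-1.
import numpy as np

def laplacian_from_edges(n, edges):
    A = np.zeros((n, n))
    for i, j in edges:
        A[i, j] = A[j, i] = 1.0      # unweighted adjacency matrix
    return np.diag(A.sum(axis=1)) - A

n = 6
L = laplacian_from_edges(n, [(k, k + 1) for k in range(n - 1)])
lam = np.sort(np.linalg.eigvalsh(L))

expected = np.sort(4 * np.sin(np.arange(n) * np.pi / (2 * n))**2)
assert np.allclose(lam, expected, atol=1e-10)
assert abs(lam[0]) < 1e-10           # m_G(0) = 1: the graph is connected
```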

Our rationale behind using the Laplacian eigenvalues is the following: they reflect various intrinsic geometric information about the graph, e.g., connectivity (or the number of separated components), diameter (the maximum distance over all pairs of vertices), mean distance, etc.; see, e.g., [2, 3] for the details on the graph Laplacian eigenvalues. In fact, we view the dendrites connected to a neuron as a musical instrument, try to "listen" to its sounds, and check if those can be used to characterize the dendrite patterns. We know that it is not possible to uniquely identify a graph from its Laplacian eigenvalues in general. In particular, "almost all trees are cospectral"; see, e.g., [3]. In practice, however, it is often possible to obtain a good approximation of a graph from them. Hence, we believe that features based on the Laplacian eigenvalues of a graph will be useful for various recognition and clustering purposes.

Before stating the facts or theorems in [3, 4] (see also [2]) that are used to construct our features, let us fix our notation and define several key quantities. Let |·| denote the size of a set. Let |V| = n, and let 0 = λ₀ ≤ λ₁ ≤ ··· ≤ λ_{n−1} be the sorted eigenvalues of L(G). Let m_G(λ) denote the multiplicity of λ as an eigenvalue of L(G), and let m_G(I) be the number of eigenvalues of L(G), multiplicities included, that belong to I, an interval of the real line. A vertex of degree 1 is called a pendant vertex, and a vertex adjacent to a pendant vertex is called a pendant neighbor. Let p(G) and q(G) be the number of pendant vertices and the number of pendant neighbors of G, respectively. For a nonempty subset of vertices S ⊂ V(G), let ∂S be the boundary of S defined as ∂S := {e = (u, v) ∈ E(G) | u ∈ S, v ∉ S}. Let i(G) be the isoperimetric number of G:

i(G) := inf{ |∂S| / |S| : ∅ ≠ S ⊂ V, |S| ≤ n/2 }. (1)

The isoperimetric number is closely related to the conductance of a graph, i.e., how fast a random walk on G converges to a stationary distribution. The Wiener index W(G) of a graph G is the sum of the entries in the upper triangular part of the distance matrix Δ(G) of G, where (Δ(G))ᵢ,ⱼ is the number of edges in a shortest path from vertex vᵢ to vertex vⱼ. The Wiener index of a molecular graph has been used in chemical applications because it may exhibit a good correlation with physical and chemical properties of the corresponding molecule.

We now list several theorems we use in this paper.

• m_G(0) is equal to the number of connected components of G.

• The number of pendant neighbors of G is bounded as:

p(G) − m_G(1) ≤ q(G) ≤ m_G((2, n]), (2)

where the second inequality holds if G is connected and satisfies 2q(G) < n.

• For n ≥ 4, the isoperimetric number i(G) satisfies

i(G) < √( (2 max_{v∈V(G)} d_v − λ₁(G)) λ₁(G) ). (3)

• Let G be a tree. Then

W(G) = Σ_{k=1}^{n−1} n/λₖ. (4)
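The tree identity (4) can be confirmed on a toy example (our sketch, not the authors' code): compute the Wiener index from BFS path distances and compare it with Σₖ n/λₖ.

```python
# Numeric check (ours) of (4) on a small tree: the Wiener index from
# pairwise path distances equals sum_{k=1}^{n-1} n / lambda_k.
from collections import deque
import numpy as np

edges = [(0, 1), (1, 2), (1, 3), (3, 4)]   # a tree on n = 5 vertices
n = 5
adj = {v: [] for v in range(n)}
for i, j in edges:
    adj[i].append(j)
    adj[j].append(i)

def bfs_dists(src):                        # distances in the unweighted tree
    d = {src: 0}
    q = deque([src])
    while q:
        u = q.popleft()
        for v in adj[u]:
            if v not in d:
                d[v] = d[u] + 1
                q.append(v)
    return d

W = sum(bfs_dists(i)[j] for i in range(n) for j in range(i + 1, n))

A = np.zeros((n, n))
for i, j in edges:
    A[i, j] = A[j, i] = 1.0
lam = np.sort(np.linalg.eigvalsh(np.diag(A.sum(axis=1)) - A))
assert abs(W - sum(n / l for l in lam[1:])) < 1e-8
```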

3. Numerical Experiments and Preliminary Results

In this section, we report our preliminary results, which we obtained very recently. We only use the dendrite patterns categorized into the so-called "monostratified" RGCs, meaning the dendrites of those RGCs are confined to either the On or the Off sublaminae of the inner plexiform layer (a layer immediately below the RGCs toward rods and cones) [1]. This should be contrasted with "bistratified" RGCs, whose dendrites span both the On and Off sublaminae.

The following features were used to characterize the dendrite patterns of 130 monostratified RGCs.

Feature 1: (p(G) − m_G(1))/|V(G)|, as a lower bound of the number of the pendant neighbors q(G) as shown in (2), with the normalization by |V(G)|;

Feature 2: The normalized Wiener index W(G)/|V(G)| via (4);

Feature 3: m_G((4, ∞))/|V(G)|, i.e., the number of eigenvalues of L(G) larger than 4 (normalized);


Fig. 2. Zoom up of a part of two RGCs belonging to Cluster 1 (a: RGC #60) and Cluster 6 (b: RGC #100); axes in µm. One can see some "spines" in (a).

Feature 4: The upper bound of the isoperimetric number i(G) shown in (3).

We normalized Features 1, 2, and 3 by the number of vertices in the graph because we wanted to make the features less dependent on the number of samples, or on how the dendrite arbors are sampled. Of course, the number of vertices itself could be a feature, although it may not be a decisive one. On the other hand, Feature 4 was not explicitly normalized because the isoperimetric number (1) itself is a normalized quantity in terms of the number of vertices.

Feature 1 was used because the number of pendant neighbors seems to be strongly related to the so-called "spines," short protrusions from the dendrite arbors. Figure 2(a) shows several spines as edges of length 1, each of which is attached to a terminal vertex of degree 1. Hence, we expect that the larger this lower bound p(G) − m_G(1) is, the more likely the RGC is to have spines. In contrast to Figure 2(a), Figure 2(b) shows an example of an RGC whose Feature 1 value is small. Apparently, there is no spine in this figure, and each of the pendant neighbors has exactly one pendant (or terminal) vertex. The reason why we used Feature 3, the normalized version of m_G((4, ∞)), is based on the following observations. The Laplacian eigenvalue distribution of each RGC dendrite graph typically looks like that in Figure 3. It consists of a smooth bell-shaped curve that ranges over the interval [0, 4] and a sudden burst above the value 4. We have observed that this value 4 is critical, since the eigenfunctions corresponding to the eigenvalues below 4 are semi-global oscillations (like Fourier cosines/sines) over the entire dendrites or one of the dendrite arbors, whereas those corresponding to the eigenvalues above 4 are much more localized (like wavelets) in branching regions. Figures 4 and 5 demonstrate our observation.
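The lower bound in (2) that Feature 1 uses can be illustrated on a small "spiny" tree (our sketch, not the authors' code): count p(G) and q(G) directly and compare with p(G) − m_G(1).

```python
# Illustrative check (ours) of the bound p(G) - m_G(1) <= q(G) from (2)
# on a path with two length-1 "spines" attached.
import numpy as np

edges = [(0, 1), (1, 2), (2, 3), (1, 4), (2, 5)]
n = 6
A = np.zeros((n, n))
for i, j in edges:
    A[i, j] = A[j, i] = 1.0
deg = A.sum(axis=1)
lam = np.linalg.eigvalsh(np.diag(deg) - A)

p = int((deg == 1).sum())                           # pendant vertices
q = len({j for i in range(n) if deg[i] == 1         # pendant neighbors
         for j in range(n) if A[i, j] == 1.0})
m1 = int(np.sum(np.isclose(lam, 1.0, atol=1e-8)))   # m_G(1)
assert p - m1 <= q
```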

Finally, Figures 6 and 7 show the scatter plots of these four features of 130 RGCs (we only show two such plots here out of six possible scatter plots). The numbers in the plots are the cluster numbers obtained by Coombs et al. [1] using the hierarchical clustering algorithm on the 14 morphological features. From these figures, we can observe that Cluster 6 RGCs separate themselves quite well from the other RGC clusters. In fact, the sparse and distributed dendrite patterns such as those in Clusters 6 and 10 are located below the major axis of the point clouds in Figure 6 and above the major axis of the point clouds in Figure 7. These imply that the dendrite patterns belonging to Clusters 6 and 10 have a smaller number of spines and smaller Wiener indices compared to the other, denser dendrite patterns such as Clusters 1 to 5. We also observe that the feature variability of the RGCs in Clusters 7 and 8 is higher than in the other clusters.

Fig. 3. A typical distribution of the Laplacian eigenvalues λk (plotted against the index k). RGC #100 in Cluster 6 was used for this figure.

Fig. 4. The Laplacian eigenfunction of RGC #100 corresponding to the eigenvalue λ1141 = 3.9994, immediately below the value 4, plotted in X (µm) vs Y (µm). Note that the support of this eigenfunction is semi-global, i.e., covers one whole dendrite arbor.

4. Discussion

Our results reported here are still preliminary. There are many things to be done. Among them, the most urgent is to answer the following natural questions: 1) Among the features derivable by directly analyzing a graph (e.g., those 14 features used in [1]), which ones can be derived from the Laplacian eigenvalues and which ones cannot? 2) Among the features derivable by both methods, which ones can be derived more easily using the Laplacian eigenvalues than by direct graph analysis? For example, computing the isoperimetric number i(G) of a given graph G is an NP-hard problem in terms of the number of vertices [4], and yet we can easily estimate its upper bound using the Laplacian eigenvalues, as shown in (3).

– 15 –

JSIAM Letters Vol. 1 (2009) pp.13–16 Naoki Saito and Ernest Woei

Fig. 5. The Laplacian eigenfunction of RGC #100 corresponding to the eigenvalue λ1142 = 4.3829, immediately above the value 4, plotted in X (µm) vs Y (µm). Note that the support of this eigenfunction is localized around the branching point.

Fig. 6. A scatter plot of the normalized lower bounds of the number of the pendant neighbors vs the normalized Wiener indices (both on log scales; the cluster numbers 1–10 label the points).

Fig. 7. A scatter plot of the normalized number of the eigenvalues larger than 4 vs the upper bounds of the isoperimetric numbers (both on log scales; the cluster numbers 1–10 label the points).
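The bound (3) itself is not reproduced in this excerpt; as a hedged illustration of estimating i(G) spectrally, the sketch below applies a Mohar-type upper bound i(G) ≤ √(λ2(2Δ − λ2)) (λ2 the second-smallest Laplacian eigenvalue, Δ the maximum degree — an assumed stand-in for (3)) to a cycle graph, where the exact isoperimetric number is known:

```python
import numpy as np

# Toy graph: the cycle C_8, whose exact isoperimetric number is known.
# The spectral estimate used here is the Mohar-type upper bound
#   i(G) <= sqrt(lambda_2 * (2*dmax - lambda_2)),
# an assumed stand-in for the excerpt's bound (3), which is not shown here.
n = 8
A = np.zeros((n, n))
for i in range(n):
    A[i, (i + 1) % n] = A[(i + 1) % n, i] = 1.0
L = np.diag(A.sum(axis=1)) - A
lam2 = np.linalg.eigvalsh(L)[1]            # second-smallest eigenvalue
dmax = A.sum(axis=1).max()

upper = np.sqrt(lam2 * (2.0 * dmax - lam2))   # cheap spectral upper bound
lower = lam2 / 2.0                             # classical Cheeger-type lower bound
i_exact = 2.0 / (n // 2)                       # cut C_n into two equal arcs
print(lower, i_exact, upper)
```

Both bounds come from a single symmetric eigenvalue computation, whereas computing i(G) exactly requires a combinatorial search over vertex subsets.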

Next, we also need to deepen our theoretical understanding of the sudden behavior change (like a phase transition) of the Laplacian eigenfunctions corresponding to the eigenvalues below and above 4, as demonstrated in Figures 3, 4, and 5. Note that this phenomenon occurs in each cell.

Another interesting thing we need to investigate is to “resample” the dendrite patterns so that each tree has the same number of vertices. If we can do so, then there is no need to normalize the above features by |V(G)|, and we can really examine whether those features reflect topological information of the dendrite patterns rather than the number of vertices. This resampling, however, must be done very carefully (e.g., not skipping the vertices with degree other than 2) so that we do not change the topology of the patterns.

Yet another investigation should be to consider the Dirichlet-Laplacian eigenvalue problems by explicitly imposing the Dirichlet boundary condition on the terminal nodes of the trees, and then compare the eigenvalues with those of the combinatorial Laplacians; see [2, 4] for more about the Dirichlet-Laplacian eigenvalues.

Finally, analysis using weighted graphs, as briefly mentioned at the beginning of Section 2, should be carefully done. On one hand, the weighted graphs reflect more of the physical reality of the RGCs, hence we can expect more accurate results. On the other hand, analysis of such graphs is expected to be tougher than that with the combinatorial Laplacian used in this paper because, for example, m_G(1) among the different RGCs no longer has the same meaning.

This is quite an interdisciplinary research project that taps into extremely rich mathematical ideas. We hope to report more results in the near future.

Acknowledgments

The authors would like to thank Prof. Leo Chalupa and Dr. Julie Coombs of UC Davis for providing us with the dendrite datasets and answering many questions. This research was partially supported by the US National Science Foundation grant DMS-0410406 and the US Office of Naval Research grant N00014-07-1-0166.

References

[1] J. Coombs, D. van der List, G.-Y. Wang and L. M. Chalupa, Morphological properties of mouse retinal ganglion cells, Neuroscience, 140 (2006), 123–136.
[2] F. R. K. Chung, Spectral Graph Theory, CBMS Regional Conference Series in Mathematics, No. 92, Amer. Math. Soc., Providence, RI, 1997.
[3] R. Merris, Laplacian matrices of graphs: A survey, Lin. Alg. Appl., 197-198 (1994), 143–176.
[4] H. Urakawa, Spectral geometry and graph theory, Bull. Japan SIAM, 12 (2002), 29–45 (in Japanese).

– 16 –


JSIAM Letters Vol.1 (2009) pp.17–20 ©2009 Japan Society for Industrial and Applied Mathematics

The Gateaux derivative of cost functions in the optimal shape problems and the existence of the shape derivatives of solutions of the Stokes problems

Satoshi Kaizu

Received November 14, 2008, Accepted December 16, 2008 (INVITED PAPER)

Abstract

In optimal shape problems the derivatives of costs with respect to shapes are important, because they give a direction of lower cost from an initial shape. The differentiability of costs strongly depends on the shape derivatives of solutions of mechanical problems, here stationary linearized flow problems, i.e., the Stokes problems. The shape derivatives are usually given automatically by the associated material derivatives. We show the convergence of shape difference quotients under sufficient conditions. These conditions are applied to the existence of the shape derivatives of the velocity and the pressure in the Stokes problems.

Keywords material derivative, shape derivative, Stokes problem

Research Activity Group Mathematical Design

1. Introduction

Let D be a bounded domain of R^d, d = 2, 3, having Lipschitz boundary Γ, let S be a proper subdomain of D such that Σ = ∂S and the closure S̄ is also a proper subset of D, and let Ω = D \ S̄ be the domain of concern. Some liquid with velocity u = (u_i)_{1≤i≤d} and pressure p fills Ω. Below we write the notation U_{i,j} = ∂U_i/∂x_j. Let U : D ∋ x ↦ U(x) ∈ R^d be given beforehand such that

U_{i,i} = 0 in Ω,   U ≢ 0 on Σ,   U ≡ 0 on Γ.   (1)

Let X = H^1(Ω)^d, X_0 = H^1_0(Ω)^d and M = {q ∈ L^2(Ω) | ∫_Ω q dx = 0}. We assume that this flow is determined by the problem (P)_Ω: find (u, p) in X × M such that

−∆u_i + p_{,i} = 0 in Ω,
u_{i,i} = 0 in Ω,
u − U ∈ X_0.   (2)

Using the solution (u, p) of (2), with the help of another function g : D × R^d × R^d ∋ (x, ξ, η) ↦ g(x, ξ, η) ∈ R we shall define a cost function J(Ω) = J(Ω, u, p) as follows:

J(Ω) = ∫_Ω g(x, u, ∇p) dx,   (3)

where sufficient regularity of g is assumed. Examples of such g are given by g = −u_i p_{,i}(x) and g = f_i u_i(x), x ∈ Ω, called the pressure loss and the compliance, respectively. Here, in the latter g, f = (f_i)_i is an outer force. The aim is to give sufficient conditions on the regularity of domains and domain perturbations which show the existence of the shape derivatives of the velocity and the pressure.

2. Domain perturbation

We differentiate the function J(Ω) in the sense of Gateaux at the initial domain Ω = Ω_0. We look for a family {Ω_ǫ}_ǫ, Ω_ǫ = Ω_{ǫρ}, having cost j(ǫ) = J(Ω_ǫ) lower than the initial cost j(0). Here, let ρ (= (ρ_i)_{1≤i≤d}) ∈ C^{k,1}(D)^d. The space C^{k,1}(D)^d is the totality of ρ having Lipschitz continuous k-th partial derivatives ∂^α ρ_i, where α denotes a multi-index with |α| = k, chosen k = 0, 1, 2, with support supp(ρ), the closure of {x ∈ D | ρ(x) ≠ 0}. We notice C^{k,1}(D)^d ⊂ X_{k+1} (= H^{k+1}(D)^d) with dual X′_{k+1}. Let

Ω_ǫ = Ω_{ǫρ} = {x_ǫ | x_ǫ = x + ǫρ(x), ∀x ∈ Ω_0}.

The Gateaux variation j′(0) is defined by

⟨j′(0), ρ⟩ = lim_{ǫ→+0} δj(ǫ)/ǫ,

if the limit on the right hand side exists for some ρ ∈ X_{k+1} with a certain k. If a Gateaux variation j′ is regarded as j′ ∈ X′_{k+1}, then j′ is called the Gateaux derivative. The condition j′(0) ∈ X′_{k+1}(D) is important for the traction method, which determines an element ρ_0 ∈ X_{k+1} uniquely, where ρ_0 gives the direction of lowest cost as an element of X_{k+1}(D). The traction method was first introduced by H. Azegami [1], and has been applied to various optimal shape problems (see [2] and [3]).

From here on, the velocity and the pressure in (2) with Ω = Ω_ǫ are denoted by u^ǫ and p^ǫ, respectively. In the next section let Σ_0 = ∂S_0 (= ∂Ω_0 \ ∂D). We denote by ν the unit normal vector on ∂Ω_0.
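The Gateaux variation defined above can be checked numerically for the simplest cost J(Ω) = |Ω| (an illustration, not a cost from the paper), whose variation is ⟨j′(0), ρ⟩ = ∫_Ω div ρ dx. The sketch below perturbs a polygonal approximation of the unit disk by x ↦ x + ǫρ(x) with ρ = (x, 0), so that div ρ = 1 and the exact variation is π:

```python
import numpy as np

# Finite-difference check of the Gateaux variation for J(Omega) = |Omega|
# (an illustrative cost, not one from the paper): <j'(0), rho> = area of Omega
# when rho = (x, 0), since div(rho) = 1.

def polygon_area(pts):
    # Shoelace formula for a closed polygon given by its vertex list.
    x, y = pts[:, 0], pts[:, 1]
    return 0.5 * np.abs(np.dot(x, np.roll(y, -1)) - np.dot(y, np.roll(x, -1)))

theta = np.linspace(0.0, 2.0 * np.pi, 20001)[:-1]
disk = np.column_stack([np.cos(theta), np.sin(theta)])   # polygonal unit disk

def j(eps):
    # Map x -> x + eps*rho(x): stretch the first coordinate by (1 + eps).
    return polygon_area(disk * np.array([1.0 + eps, 1.0]))

eps = 1e-6
gateaux_fd = (j(eps) - j(0.0)) / eps     # difference quotient delta j / eps
print(gateaux_fd)                         # ~ pi
```

The difference quotient δj(ǫ)/ǫ matches ∫_Ω div ρ dx = π up to the polygonal discretization error.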

3. The Gateaux derivative of costs

Let g_u = (∂g/∂u_i)_{1≤i≤d} and g_{∇p} = (∂g/∂p_{,i})_{1≤i≤d}. Under sufficient regularity of g, ρ ∈ C^{k,1}(D)^d and Ω_0, we see

⟨j′(0), ρ⟩ = ∫_{Σ_0} g(x, u^0, p^0) ρ·ν dΣ_0 + ∫_{Ω_0} (g_u(x, u^0, ∇p^0)·u′ + g_{∇p}(x, u^0, ∇p^0)·∇p′) dx,   (4)

– 17 –


JSIAM Letters Vol. 1 (2009) pp.17–20 Satoshi Kaizu

where u′ and p′ denote the shape derivatives of the velocity u^ǫ and the pressure p^ǫ, defined by

F′(x) = lim_{ǫ→+0} δF^ǫ(x)/ǫ.   (5)

Here F^ǫ(x) = u^ǫ_i(x) or p^ǫ(x), and δF^ǫ(x) = F^ǫ(x) − F^0(x).

The formula (4) is derived by the process below:

δj(ǫ) = ∫_{Ω_ǫ\Ω_0} g(x, u^ǫ, ∇p^ǫ) dx − ∫_{Ω_0\Ω_ǫ} g(x, u^0, ∇p^0) dx + ∫_{ω_ǫ} (g(x, u^ǫ, ∇p^ǫ) − g(x, u^0, ∇p^0)) dx,   (6)

where ω_ǫ denotes Ω_0 ∩ Ω_ǫ. The first term on the right hand side of (4) is obtained directly, after applying the well-known limit formula to the sum of the first and the second terms on the right hand side of (6). The remaining g term of (6) determines the last term on the right hand side of (4) through the following equality:

∫_{ω_ǫ} (g(x, u^ǫ, ∇p^ǫ) − g(x, u^0, ∇p^0)) dx
= ǫ ∫_{ω_ǫ} ∫_0^1 g_u(x, u^0 + tǫ (δu^ǫ/ǫ), ∇p^ǫ) dt · (δu^ǫ/ǫ) dx
+ ǫ ∫_{ω_ǫ} ∫_0^1 g_{∇p}(x, u^0, ∇p^0 + tǫ ∇(δp^ǫ/ǫ)) dt · ∇(δp^ǫ/ǫ) dx.

The above formula implies the identity (4) intrinsically, under some regularity of g and the convergence of both δu^ǫ/ǫ and ∇(δp^ǫ/ǫ).

4. Sufficient conditions for the existence of the shape derivatives

The existence of the shape derivatives is strongly connected to the existence of the material derivatives in general. The existence of the material derivatives and the shape derivatives is derived from the convergence of the material difference quotients and the shape difference quotients, respectively. Let F^ǫ : D ∋ x ↦ F^ǫ(x) ∈ R; for example, F^ǫ = u^ǫ_i(x), p^ǫ(x). The term δF^ǫ(x)/ǫ in (7) is called the shape difference quotient of F^ǫ at x. Let

F̄^ǫ(x) = F^ǫ ∘ T^ǫ_ρ(x) = F^ǫ(x + ǫρ(x)).

We define the material difference quotient by δF̄^ǫ(x)/ǫ. The definitions of the material difference quotient and the shape difference quotient directly imply

δF̄^ǫ/ǫ(x) = δF^ǫ/ǫ(x) + ρ_j F^0_{,j}(x) + ρ_j G^ǫ_j(x),
G^ǫ_j(x, t) = F^ǫ_{,j}(x + tǫρ(x)) − F^0_{,j}(x),
G^ǫ_j(x) = ∫_0^1 G^ǫ_j(x, t) dt.   (7)
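The identity (7) can be sanity-checked numerically in one dimension. The family F^ǫ and the field ρ below are hypothetical choices made only for this check; the remainder term ρG^ǫ should vanish as ǫ → 0:

```python
import numpy as np

# 1D numerical check of identity (7).  The family F^eps and the perturbation
# field rho here are hypothetical, chosen only for this illustration.
x = np.linspace(0.0, 1.0, 101)
rho = 0.5 * np.sin(x)

def F(eps, y):
    return np.sin(y) + eps * np.cos(y)   # F^eps; F^0(y) = sin(y)

eps = 1e-5
shape_q = (F(eps, x) - F(0.0, x)) / eps                  # delta F^eps / eps
material_q = (F(eps, x + eps * rho) - F(0.0, x)) / eps   # delta Fbar^eps / eps
F0_prime = np.cos(x)                                      # (F^0)'

# (7): material quotient = shape quotient + rho * (F^0)' + rho * G^eps,
# and the remainder rho * G^eps is O(eps).
residual = material_q - (shape_q + rho * F0_prime)
print(np.max(np.abs(residual)))
```

The residual is of order ǫ, confirming that the material quotient differs from the shape quotient by the transport term ρ(F^0)′ plus a vanishing remainder.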

The lemma below is shown using basic but important Lebesgue theory together with tedious computations.

Lemma 1 Let k = 0, 1, 2. We assume

ρ ∈ C^{k,1}(D)^d,
F^ǫ(x) → F^0(x) strongly in H^{k+1}(Ω_0).   (8)

Let ω be any domain such that the closure ω̄ ⊂ Ω_0. Then

ρ_j G^ǫ_j → 0 strongly in H^k(ω) as ǫ → 0.

Corollary 2 In Lemma 1 we further assume

∃Ḟ ∈ H^k(Ω_0), δF̄^ǫ/ǫ → Ḟ weakly in H^k(Ω_0).   (9)

Then

∃F′ ∈ H^k(Ω_0), δF^ǫ/ǫ → F′ weakly in H^k(ω).   (10)

The condition on the convergence of the shape difference quotients needs the strong convergence of δF^ǫ to 0 in H^{k+1}(Ω_0) and the weak convergence of the material difference quotients in H^k(Ω_0).

5. The material derivatives of the velocity and the pressure

We assume

U ∈ H^{k+1}(D)^d for some k = 0, 1, 2,   (11)
ρ ∈ C^{k,1}(D)^d,   (12)
Ω_0 is of class C^{k,1}.   (13)

Let X^ǫ_{k+1} = H^{k+1}(Ω_ǫ)^d, X^ǫ_0 = H^1_0(Ω_ǫ)^d, M^ǫ = {q^ǫ ∈ L^2(Ω_ǫ) | ∫_{Ω_ǫ} q^ǫ dx = 0} and M^ǫ_k = M^ǫ ∩ H^k(Ω_ǫ). We also write (v^ǫ_1, v^ǫ_2)_ǫ = ∫_{Ω_ǫ} v^ǫ_1 v^ǫ_2 dx and (v_1, v_2) = (v_1, v_2)_0. Let (u^ǫ, p^ǫ) ∈ X^ǫ_1 × M^ǫ be a pair of solutions of the Stokes problem (P)_{Ω_ǫ}:

u^ǫ − U ∈ X^ǫ_0,
(∇(u^ǫ − U)_i, ∇v^ǫ_i)_ǫ − (v^ǫ_{i,i}, p^ǫ)_ǫ = 0, ∀v^ǫ ∈ X^ǫ_0,
((u^ǫ − U)_{i,i}, q^ǫ)_ǫ = 0, ∀q^ǫ ∈ M^ǫ.   (14)

Under (11), (12) and (13) (see Temam [4]), the problem (14) admits a unique (u^ǫ, p^ǫ) ∈ X^ǫ_{k+1} × H^k(Ω_ǫ) such that

‖u^ǫ‖_{X^ǫ_{k+1}} + ‖p^ǫ‖_{M^ǫ_k} ≤ C_{ǫ,k} ‖U‖_{H^{k+1}(D)^d}.   (15)

A general method on the regularity of solutions of elliptic systems, using an open covering {W_i}_i of Ω_0 introduced in the paragraph below (28), is applied to (14) with ǫ = 0. Since both T^ǫ : X^ǫ_{k+1} ∋ v^ǫ(x_ǫ) ↦ v^ǫ(x) ∈ X^0_{k+1} and its inverse (T^ǫ)^{−1} are continuous between X^0_{k+1} and X^ǫ_{k+1}, we see the existence of ǫ_1, which depends on the constants C_{0,k}, C_3 and C_4 in (28) (see [5]), such that

C_k = sup_{0≤ǫ≤ǫ_1} C_{ǫ,k} < ∞.   (16)

Let ū^ǫ(x) = u^ǫ(x_ǫ), p̄^ǫ(x) = p^ǫ(x_ǫ) and let b^ǫ_{jk}, b^ǫ_0, R^ǫ_{jk} be certain functions such that

b^ǫ_{jk}(x) → −(ρ_{j,k} + ρ_{k,j})(x) strongly in L^∞(Ω_0),
b^ǫ_0(x) → ρ_{i,i}(x) strongly in L^∞(Ω_0),
R^ǫ_{jk}(x) → −ρ_{j,k}(x) strongly in L^∞(Ω_0).   (17)

Let M̄^ǫ = {q ∈ L^2(Ω_0) | ∫_{Ω_0} q(x)(1 + ǫ b^ǫ_0(x)) dx = 0} and also Ū^ǫ(x) = U(x_ǫ). Then the pair (ū^ǫ, p̄^ǫ) satisfies

(ū^ǫ − Ū^ǫ, p̄^ǫ) ∈ X_0 × M̄^ǫ,
(∇(ū^ǫ − Ū^ǫ)_i, ∇v_i) + ǫ(∇(ū^ǫ − Ū^ǫ)_i · ∇v_i, b^ǫ_0) + ǫ((ū^ǫ − Ū^ǫ)_{i,j} v_{i,k}, b^ǫ_{jk}) − (v_{i,i}, p̄^ǫ(1 + ǫ b^ǫ_0)) − ǫ(v_{i,j} R^ǫ_{ji}, p̄^ǫ(1 + ǫ b^ǫ_0)) = 0, ∀v ∈ X_0,
((ū^ǫ − Ū^ǫ)_{i,i}, q) + ǫ((ū^ǫ − Ū^ǫ)_{i,i}, q b^ǫ_0) + ǫ((ū^ǫ − Ū^ǫ)_{i,j} R^ǫ_{ji}, q(1 + ǫ b^ǫ_0)) = 0, ∀q ∈ M̄^ǫ.   (18)

– 18 –


Then the estimates (15) and (16) imply that there exists a constant C_1, independent of ǫ ∈ (0, ǫ_1], such that

‖ū^ǫ‖_{X_{k+1}} + ‖p̄^ǫ‖_{M_k} ≤ C_1.   (19)

Let X_0 = X^0_0, M = M^0. Then (u^0, p^0) satisfies

(u^0 − U, p^0) ∈ X_0 × M,
(∇(u^0 − U)_i, ∇v_i) − (v_{i,i}, p^0) = 0, ∀v ∈ X_0,
((u^0 − U)_{i,i}, q) = 0, ∀q ∈ M.   (20)

Let δū^ǫ = ū^ǫ − u^0, δŪ^ǫ = Ū^ǫ − U and δp̄^ǫ = p̄^ǫ − p^0. We subtract (20) from (18) and obtain

δ(ū^ǫ − Ū^ǫ) ∈ X_0,   (21)

∫_{Ω_0} δp̄^ǫ dx + ǫ ∫_{Ω_0} p̄^ǫ b^ǫ_0 dx = 0,   (22)

(∇δ(ū^ǫ − Ū^ǫ)_i, ∇v_i) + ǫ(∇(ū^ǫ − Ū^ǫ)_i · ∇v_i, b^ǫ_0) + ǫ((ū^ǫ − Ū^ǫ)_{i,j} v_{i,k}, b^ǫ_{jk}) − (v_{i,i}, δp̄^ǫ) − ǫ(v_{i,i}, p̄^ǫ b^ǫ_0) − ǫ(v_{i,j} R^ǫ_{ji}, p̄^ǫ(1 + ǫ b^ǫ_0)) = 0, ∀v ∈ X_0,   (23)

(δ(ū^ǫ − Ū^ǫ)_{i,i}, q) + ǫ((ū^ǫ − Ū^ǫ)_{i,i}, b^ǫ_0 q) + ǫ((ū^ǫ − Ū^ǫ)_{i,j} R^ǫ_{ji}, q(1 + ǫ b^ǫ_0)) = 0, ∀q ∈ M̄^ǫ.   (24)

The estimate (26) is implied by the lemma below.

Lemma 3 There exists a constant C_0 such that, for all q ∈ M,

‖q‖_M ≤ C_0 sup_{v≠0, v∈X_0} (v_{i,i}, q)/‖∇v‖_{L^2(Ω_0)}.   (25)

The relation (22) shows δp̄^ǫ + ǫ b^ǫ_0 p̄^ǫ ∈ M. The estimate on ‖δp̄^ǫ + ǫ b^ǫ_0 p̄^ǫ‖_M is implied by (23) through (25), just as in the proof of Theorem 4.1 of [6]. So the estimate of ‖δp̄^ǫ‖_M reduces to the estimate of ‖∇δ(ū^ǫ − Ū^ǫ)‖. Putting v = δ(ū^ǫ − Ū^ǫ) into (23), with the help of u^0_{i,i} = 0 and (24) (for an exact description see [5]), the estimate (19) implies

‖δ(ū^ǫ − Ū^ǫ)‖_{X_1} + ‖δp̄^ǫ‖_{L^2(Ω_0)} = O(ǫ).   (26)

This is an estimate for k = 0. For k = 1, 2, with the assumptions (11), (12) and (13), we obtain further estimates of ‖δ(ū^ǫ − Ū^ǫ)‖_{X_{k+1}} and ‖δp̄^ǫ‖_{H^k(Ω_0)}. A general estimate of solutions of the Stokes problem (21) to (24) (see, for example, Proposition 2.3, p. 25 of [4], or [5]) is described as

‖δ(ū^ǫ − Ū^ǫ)‖_{X_{k+1}} + ‖δp̄^ǫ‖_{H^k(Ω_0)} ≤ C_2 ǫ,   (27)

where C_2 depends on C_1 in (19) and on C_3 and C_4 below:

sup_{1≤i≤d, |α|≤k+1, x∈D} ∂^α ρ_i(x) = C_3 < ∞,
sup_{1≤i≤N} sup_{|α|≤k+1, ξ∈Z_i} ∂^α φ_i(ξ) = C_4 < ∞.   (28)

Here, let {W_i}_{0≤i≤N} and {Z_i}_{1≤i≤N} be two families of local coordinate open neighbourhoods and open sets in R^{d−1}, respectively, such that Ω̄_0 ⊂ ⋃_{i=0}^N W_i, Ω̄_0 \ ⋃_{i=1}^N W_i ⊂ W_0, and W_i ∩ ∂Ω_0 = {(ξ, φ_i(ξ)) ∈ R^d | ξ ∈ Z_i} with some functions {φ_i(ξ)}_{1≤i≤N}. We give an exact description of the regularity of (ū^ǫ, p̄^ǫ) in [5] using the above W_i, Z_i. By (27), for any sequence {ǫ_m} decreasing to 0 we have a subsequence {ǫ_{m_n}} such that, with ū^n = ū^{ǫ_{m_n}} and p̄^n = p̄^{ǫ_{m_n}},

(δū^n/ǫ_{m_n}, δp̄^n/ǫ_{m_n}) → (w, r) weakly in X_{k+1} × H^k(Ω_0).

Putting (δ(ū^n − Ū^n), δp̄^n) and v = ζ ∈ C^∞_0(Ω_0)^d into (21) to (24) (writing Ū^n = Ū^{ǫ_{m_n}}), we divide them by ǫ_{m_n} and get

δ(ū^n − Ū^n)/ǫ_{m_n} ∈ X_0,

∫_{Ω_0} (δp̄^n/ǫ_{m_n}) dx + ∫_{Ω_0} p̄^n b^{ǫ_{m_n}}_0 dx = 0,

(∇(δ(ū^n − Ū^n)_i/ǫ_{m_n}), ∇v_i) + (∇(ū^n − Ū^n)_i · ∇v_i, b^{ǫ_{m_n}}_0) + ((ū^n − Ū^n)_{i,j} v_{i,k}, b^{ǫ_{m_n}}_{jk}) − (v_{i,i}, δp̄^n/ǫ_{m_n}) − (v_{i,i}, b^{ǫ_{m_n}}_0 p̄^n) − (v_{i,j} R^{ǫ_{m_n}}_{ji}, p̄^n(1 + ǫ_{m_n} b^{ǫ_{m_n}}_0)) = 0, ∀v ∈ X_0,

(δ(ū^n − Ū^n)_{i,i}/ǫ_{m_n}, q) + ((ū^n − Ū^n)_{i,i}, b^{ǫ_{m_n}}_0 q) + ((ū^n − Ū^n)_{i,j} R^{ǫ_{m_n}}_{ji}, q(1 + ǫ_{m_n} b^{ǫ_{m_n}}_0)) = 0, ∀q ∈ M̄^ǫ.   (29)

Increasing n to ∞ in (29), with the help of (17), implies

w − U ∈ X_0,

∫_{Ω_0} r dx + ∫_{Ω_0} p^0 ρ_{i,i} dx = 0,

(∇(w − U)_i, ∇ζ_i) + (∇(u^0 − U)_i · ∇ζ_i, ρ_{i,i}) − ((u^0 − U)_{i,j} ζ_{i,k}, ρ_{j,k} + ρ_{k,j}) − (ζ_{i,i}, r) − (ζ_{i,i}, ρ_{i,i} p^0) + (ζ_{i,j} ρ_{j,i}, p^0) = 0, ∀ζ ∈ C^∞_0(Ω_0)^d,

((w − U)_{i,i}, η) + ((u^0 − U)_{i,i}, ρ_{i,i} η) − ((u^0 − U)_{i,j} ρ_{j,i}, η) = 0, ∀η ∈ C^∞_0(Ω_0).   (30)

Since the variational equalities (30) determine the pair (w, r) uniquely, the weak limit of the pairs of any subsequence (δū^n/ǫ_{m_n}, δp̄^n/ǫ_{m_n}) is the same one. This means that (δū^ǫ/ǫ, δp̄^ǫ/ǫ) itself converges weakly in X_{k+1} × H^k(Ω_0). Hence we get (w, r) = (u̇, ṗ), because (w, r) is the limit of the material difference quotients (δū^ǫ/ǫ, δp̄^ǫ/ǫ).

Theorem 4 We assume (11), (12) and (13) with k = 0, 1, 2. Let (u^ǫ, p^ǫ) be the pair uniquely determined by the problem (P)_{Ω_ǫ} and set (ū^ǫ, p̄^ǫ)(x) = (u^ǫ, p^ǫ)(x_ǫ), x ∈ Ω_0. Then we have

(δū^ǫ/ǫ, δp̄^ǫ/ǫ) → (u̇, ṗ) weakly in X_{k+1} × H^k(Ω_0).   (31)

The pair (u̇, ṗ) (∈ X_{k+1} × H^k) is uniquely determined by the problem: find (u̇, ṗ) ∈ X_1 × L^2(Ω_0) such that

u̇ − U ∈ X_0,   ∫_{Ω_0} ṗ dx + ∫_{Ω_0} p^0 ρ_{i,i} dx = 0,   (32)

(∇(u̇ − U)_i, ∇ζ_i) + (∇(u^0 − U)_i · ∇ζ_i, ρ_{i,i}) − ((u^0 − U)_{i,j} ζ_{i,k}, ρ_{j,k} + ρ_{k,j}) − (ζ_{i,i}, ṗ) − (ζ_{i,i}, ρ_{i,i} p^0) + (ζ_{i,j} ρ_{j,i}, p^0) = 0, ∀ζ ∈ C^∞_0(Ω_0)^d,   (33)

((u̇ − U)_{i,i}, η) + ((u^0 − U)_{i,i}, ρ_{i,i} η) − ((u^0 − U)_{i,j} ρ_{j,i}, η) = 0, ∀η ∈ C^∞_0(Ω_0).   (34)

6. The shape derivatives of the velocity and the pressure

Let k = 1, 2. All the preparation for the convergence of the shape difference quotients of the velocity and the


pressure has now been done, and we notice that

(δū^ǫ, δp̄^ǫ) → 0 strongly in X_{k+1} × H^k(Ω_0),   (35)

(δū^ǫ/ǫ, δp̄^ǫ/ǫ) → (u̇, ṗ) strongly in X_k × H^{k−1}(Ω_0),   (36)

as ǫ → +0, if all the assumptions in Theorem 4 are satisfied. The quotients (δu^ǫ/ǫ, δp^ǫ/ǫ) and (δū^ǫ/ǫ, δp̄^ǫ/ǫ) are related by the equality (7). Let ω be any subdomain of Ω_0 as in Lemma 1 and write X_k(ω) = {v|_ω | v ∈ X_k}. Applying Corollary 2 with F^ǫ = u^ǫ_i, p^ǫ, together with (35) and (36), we get

(δu^ǫ/ǫ, δp^ǫ/ǫ) → (u′, p′) strongly in X_k(ω) × H^{k−1}(ω).   (37)

In our setting we have

U′ ≡ 0 in Ω.   (38)

In the theorem, in spite of (38), the term U′ is written explicitly, because the general form can then be seen in this case.

Theorem 5 Let k = 1, 2. We assume (11), (12) and (13). Let ω be any domain such that ω̄ ⊂ Ω_0. Then

(δu^ǫ/ǫ, δp^ǫ/ǫ) → (u′, p′) strongly in X_k(ω) × H^{k−1}(ω),   (39)

as ǫ → +0. The shape derivative (u′, p′) is determined uniquely as the solution of the problem (Q)_{Ω_0}: find (u′, p′) ∈ X_1 × L^2(Ω_0) such that

u′ − U′ + ρ_j(u^0 − U)_{,j} ∈ X_0,
∫_{Ω_0} (p′ + ρ_j p^0_{,j} + p^0 ρ_{i,i}) dx = 0,
(∇(u′ − U′)_i, ∇ζ_i) − (ζ_{j,j}, p′) = 0, ∀ζ ∈ C^∞_0(Ω_0)^d,
((u′ − U′)_{i,i}, η) = 0, ∀η ∈ C^∞_0(Ω_0).   (40)

Remark 6 The variational equalities in (40) are written as

u′ − U′ + ρ_j(u^0 − U)_{,j} ∈ X_0,
∫_{Ω_0} (p′ + ρ_j p^0_{,j} + p^0 ρ_{i,i}) dx = 0,
−∆(u′ − U′) + ∇p′ = 0 in Ω,
(u′ − U′)_{i,i} = 0 in Ω, for (u′, p′) ∈ X_2 × H^1(Ω_0).

Proof Under the assumptions in Theorem 5 we have

(δū^ǫ_i/ǫ, δp̄^ǫ/ǫ) = (δu^ǫ_i/ǫ, δp^ǫ/ǫ) + ρ_j(u^0_{i,j}, p^0_{,j}) + (r^ǫ(u^ǫ_i), r^ǫ(p^ǫ)),   (41)

δŪ^ǫ_i/ǫ = δU^ǫ_i/ǫ + ρ_j U_{i,j} + r^ǫ(U^ǫ_i),   (42)

(r^ǫ(u^ǫ_i), r^ǫ(p^ǫ)) → 0 strongly in X_k(ω) × H^{k−1}(ω),   (43)

r^ǫ(U^ǫ_i) → 0 strongly in X_k(ω).   (44)

First, putting v = ζ ∈ C^∞_0(Ω_0)^d into (29), for small enough ǫ_2 such that supp(ζ) ⊂ Ω_ǫ ∩ Ω_0 for all ǫ ∈ (0, ǫ_2], then substituting (41) and (42) into (29), applying (43) and (44), and using integration by parts to move each factor of partial derivatives such as ∂^α ζ_i onto the other factor, after a lot of tedious computation we get the simple form below:

−(∆(u′ − U′)_i, ζ_i) + (ζ_i, p′_{,i}) = 0.

The relations (41) to (43) imply (u̇ − U)_{i,i} = (ρ_j(u_{i,j} − U_{i,j}))_{,i} + (u′ − U′)_{i,i}, and then we obtain the last equality of (40) with the help of (u − U)_{i,i} = 0. The first relation of (40) is given by the definition u′ = u̇ − ρ_j u_{,j} on Σ_0 (just as (2.163) or (2.169) of [7]), justified by the trace of u′. The second equality of (40) follows directly.

(QED)

A general description of the material and shape derivatives is given in [7] and [8]. In the latter, the convergence of the material difference quotients is implied by the regularity of associated functions assured by the implicit function theorem (the proof of Proposition 2.82 of [7], for example). In this note, estimates on solutions of elliptic systems are instead directly and fully applied to the convergence of the material and shape difference quotients; hence the results differ slightly from each other. The method of our paper can be applied to the behaviour of the solutions of the Poisson equation −∆u^ǫ = f in Ω_ǫ, u^ǫ = 0 on ∂Ω_ǫ. Propositions 2.82 and 2.83 in [7] show that δū^ǫ/ǫ → u̇ strongly in H^k(Ω_0) ∩ H^1_0(Ω_0), provided Ω_0 is of class C^k and ρ ∈ C^k(D̄) together with f ∈ H^1(D), for k = 1, 2 (rewritten in our notation). On the other hand, by our method, if Ω_0 is of class C^k and ρ ∈ C^k(D̄) together with f ∈ L^2(D), then δū^ǫ/ǫ → u̇ weakly in H^k(Ω_0) ∩ H^1_0(Ω_0).

References

[1] H. Azegami, A solution to domain optimization problems, Trans. Japan Soc. Mech. Engs., Ser. A, 60 (1994), 1479–1486 (in Japanese).
[2] H. Azegami and K. Takeuchi, A smoothing method for shape optimization: traction method using the Robin condition, Int. J. Comput. Meth., 31(1) (2006), 21–33.
[3] S. Kaizu and H. Azegami, Optimal shape problems and the traction method, Trans. Japan Soc. Indust. Appl. Math., 16(3) (2006), 277–290 (in Japanese).
[4] R. Temam, Navier-Stokes Equations: Theory and Numerical Analysis, AMS Chelsea Pub., 2001.
[5] S. Kaizu, Sensitivity analysis of costs including both the velocity and the pressure and a finite element method for the Stokes problems, in preparation.
[6] V. Girault and P.-A. Raviart, Finite Element Approximation of the Navier-Stokes Equations, Lect. Notes Math., Vol. 749, Springer-Verlag, Berlin, Heidelberg, New York, 1979.
[7] J. Sokolowski and J.-P. Zolesio, Introduction to Shape Optimization: Shape Sensitivity Analysis, Springer-Verlag, New York, 1991.
[8] J. Haslinger and R. A. E. Makinen, Introduction to Shape Optimization: Theory, Approximation, and Computation, SIAM, Philadelphia, 2003.

– 20 –


JSIAM Letters Vol.1 (2009) pp.21–24 ©2009 Japan Society for Industrial and Applied Mathematics

On very accurate verification of solutions for boundary value problems by using spectral methods

Mitsuhiro T. Nakao1 and Takehiko Kinoshita2

Faculty of Mathematics, Kyushu University, 33, Fukuoka 812-8581, Japan1

Graduate School of Mathematics, Kyushu University, 33, Fukuoka 812-8581, Japan2

E-mail [email protected]

Received October 31, 2008, Accepted December 13, 2008 (INVITED PAPER)

Abstract

In this paper, we consider a numerical verification method of solutions for nonlinear elliptic boundary value problems with very high accuracy. We derive constructive error estimates for the H^1_0-projection into polynomial spaces by using properties of the Legendre polynomials. On the other hand, the Galerkin approximation with higher degree polynomials enables us to get very small residual errors. Combining these results with existing verification procedures, several verification examples which confirm the actual effectiveness of the method are presented.

Keywords numerical verification, guaranteed error bound, spectral method

Research Activity Group Quality of Computations

1. Introduction

Spectral methods are well-known approximation techniques which achieve an arbitrary degree of accuracy, in contrast to other methods such as finite difference or finite element methods. On the other hand, in the numerical verification methods of solutions for boundary value problems, e.g., [1, 2] etc., the smaller the residual error, the finer the enclosure of exact solutions. Therefore, in the present paper, we formulate a method using the spectral technique with Legendre polynomials to get a highly accurate verification of solutions for nonlinear elliptic equations with Dirichlet boundary conditions. First, we derive some constructive a priori error estimates for the H^1_0-projection into polynomial spaces by using the properties of the Legendre polynomials, which play an essential role in our verification method. Next, describing briefly the verification method of solutions for nonlinear boundary value problems, we present some verification results on the existence and local uniqueness of solutions for Emden's equation. These results prove that the present method enables us to verify the solutions with very high accuracy which has not been attained up to now by other methods (e.g., [2, 3]).

2. Basis of H^1_0 by Legendre polynomials

As is well known, the Legendre polynomials on Λ = (a, b) ⊂ R are defined, for an arbitrary non-negative integer n, by

P_n(x) := ((−1)^n / (n! |Λ|^n)) (d/dx)^n (b − x)^n (x − a)^n,   (1)

where |Λ| := b − a. Let P_n(Λ) denote the set of polynomials on Λ with degree ≤ n.

We define the set of homogeneous polynomials by P^{1,0}_N(Λ) := {u_N ∈ P_N(Λ) ; u_N(a) = u_N(b) = 0}, which is a subspace of H^1_0(Λ). Moreover, for ∀n ≥ 2, φ_n ∈ P^{1,0}_n(Λ) is defined by

φ_n(x) := (√(2n − 1) / (n(n − 1) |Λ|^{1/2})) (b − x)(x − a) P′_{n−1}(x),   (2)

or equivalently, by (1),

φ_n(x) = ((−1)^n √(2n − 1) / ((n − 1)! |Λ|^{n−1/2})) (d/dx)^{n−2} (b − x)^{n−1} (x − a)^{n−1}.

Then, we have the following property.

Theorem 1 {φ_n}_{n≥2} ⊂ H^1_0(Λ) is a complete orthonormal system in H^1_0(Λ), i.e.,

(φ_m, φ_n)_{H^1_0(Λ)} := (φ′_m, φ′_n)_{L^2(Λ)} = δ_{m,n}, ∀m, n ≥ 2.

Proof (Orthogonality) For arbitrary m, n ≥ 2, by the well-known properties of P_n we have

(φ_m, φ_n)_{H^1_0(Λ)} = c_{m,n} (m(m − 1) P_{m−1}, n(n − 1) P_{n−1})_{L^2(Λ)},

where c_{m,n} := (−1)^{m+n} √((2m − 1)(2n − 1)) / (m(m − 1) n(n − 1) |Λ|). Moreover, from the orthogonality of {P_n}_{n≥0}, we have

(φ_m, φ_n)_{H^1_0(Λ)} = δ_{m,n}.

(Completeness) It suffices to show that, for arbitrary u ∈ H^1_0(Λ),

(u, φ_n)_{H^1_0(Λ)} = 0, ∀n ≥ 2 =⇒ u = 0 in H^1_0(Λ).

From the definition, we have

(u, φ_n)_{H^1_0(Λ)} = −(√(2n − 1)/|Λ|^{1/2}) (u′, P_{n−1})_{L^2(Λ)}, ∀n ≥ 2.

Therefore,

(u′, P_n)_{L^2(Λ)} = 0, ∀n ≥ 1.

– 21 –


JSIAM Letters Vol. 1 (2009) pp.21–24 Mitsuhiro T. Nakao et al.

It also holds that (u′, P_0)_{L^2(Λ)} = 0. Since {P_n}_{n≥0} is a complete orthogonal system in L^2(Λ), we have u′ = 0 in L^2(Λ), which implies u = 0 in H^1_0(Λ).
(QED)
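Theorem 1 can be cross-checked numerically. The sketch below (an illustration, assuming the concrete interval Λ = (0, 2)) evaluates (φ′_m, φ′_n)_{L^2(Λ)} by Gauss-Legendre quadrature, using the relation φ′_n = −√(2n − 1) |Λ|^{−1/2} P_{n−1} stated in the proof of Theorem 2:

```python
import numpy as np
from numpy.polynomial import legendre as Lg

# Quadrature check of Theorem 1 on Lambda = (0, 2) (an illustrative choice),
# using phi_n' = -sqrt(2n-1) * |Lambda|^{-1/2} * P_{n-1}, where P_{n-1} of (1)
# is the standard Legendre polynomial pulled back to the reference interval.
a, b = 0.0, 2.0
length = b - a

def phi_prime(n, x):
    t = 2.0 * (x - a) / length - 1.0            # affine map to [-1, 1]
    c = np.zeros(n)
    c[n - 1] = 1.0                               # coefficient vector of P_{n-1}
    return -np.sqrt(2.0 * n - 1.0) / np.sqrt(length) * Lg.legval(t, c)

# Gauss-Legendre quadrature nodes/weights transported to (a, b).
t, w = Lg.leggauss(60)
x = 0.5 * length * (t + 1.0) + a
w = 0.5 * length * w

for m in range(2, 7):
    for n in range(2, 7):
        ip = np.sum(w * phi_prime(m, x) * phi_prime(n, x))
        expected = 1.0 if m == n else 0.0
        assert abs(ip - expected) < 1e-10
print("(phi_m, phi_n)_{H1_0} = delta_{mn} verified for 2 <= m, n <= 6")
```

Since the integrands are polynomials of low degree, the 60-point rule is exact up to rounding, so the check confirms the orthonormality to machine precision.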

Now, we define the H^1_0-projection π^{1,0}_N : H^1_0(Λ) → P^{1,0}_N(Λ) by

(u − π^{1,0}_N u, v_N)_{H^1_0(Λ)} = 0, ∀v_N ∈ P^{1,0}_N(Λ).

Owing to the complete orthonormality of {φ_n}, the operator π^{1,0}_N coincides with the truncation operator. Namely, for arbitrary u ∈ H^1_0(Λ), we have

u = Σ_{n=2}^∞ a_n φ_n =⇒ π^{1,0}_N u = Σ_{n=2}^N a_n φ_n.

3. Constructive error estimates for the H^1_0-projection

Theorem 2 For arbitrary u ∈ H^1_0(Λ) ∩ H^2(Λ), we have

‖u − π^{1,0}_N u‖_{H^1_0(Λ)} ≤ C(N) |u|_{H^2(Λ)},   (3)

where the constant C(N) is defined as

C(N) = |Λ| / √(2(2N − 1)(2N + 1)) if N = 2, 3,
C(N) = |Λ| / √((2N + 1)(2N + 5)) if N ≥ 4.

Proof For each u ∈ H^1_0(Λ) ∩ H^2(Λ), we have the following expansion:

u = Σ_{n=2}^∞ a_n φ_n, a_n = (u, φ_n)_{H^1_0(Λ)}.   (4)

Here, the truncation operator π^{1,0}_N satisfies

π^{1,0}_N u = Σ_{n=2}^N a_n φ_n.

By the Parseval equality, we have

‖u − π^{1,0}_N u‖²_{H^1_0(Λ)} = ‖Σ_{n=N+1}^∞ a_n φ_n‖²_{H^1_0(Λ)} = Σ_{n=N+1}^∞ a_n².   (5)

Next, u″ ∈ L^2(Λ) can be expanded by {P_n} as follows:

u″ = Σ_{n=0}^∞ b_n P_n/‖P_n‖_{L^2(Λ)}, b_n = (u″, P_n/‖P_n‖_{L^2(Λ)})_{L^2(Λ)}.   (6)

Therefore, the Parseval equality implies

|u|²_{H^2(Λ)} = ‖u″‖²_{L^2(Λ)} = Σ_{n=0}^∞ b_n².   (7)

From the fact that φ′_n = −√(2n − 1) |Λ|^{−1/2} P_{n−1}, we have, for ∀n ≥ 2, by using well-known properties of P_n,

a_n = (u′, φ′_n)_{L^2(Λ)}
= −(√(2n − 1)/|Λ|^{1/2}) (u′, P_{n−1})_{L^2(Λ)}
= −(|Λ|^{1/2}/(2√(2n − 1))) (u′, P′_n − P′_{n−2})_{L^2(Λ)}
= (|Λ|^{1/2}/(2√(2n − 1))) (u″, P_n − P_{n−2})_{L^2(Λ)}
= (|Λ|^{1/2}/(2√(2n − 1))) (‖P_n‖_{L^2(Λ)} b_n − ‖P_{n−2}‖_{L^2(Λ)} b_{n−2})
=: (1/√2) α_n b_n − (1/√2) β_{n−2} b_{n−2}.

Here, we define the constants α_n, β_n by

α_n = |Λ|^{1/2} ‖P_n‖_{L^2(Λ)} / √(2(2n − 1)) = |Λ| / √(2(2n − 1)(2n + 1)),
β_n = |Λ|^{1/2} ‖P_n‖_{L^2(Λ)} / √(2(2n + 3)) = |Λ| / √(2(2n + 1)(2n + 3)).

Then, each term in (5) is estimated as follows:

a_n² = (1/2)(α_n b_n − β_{n−2} b_{n−2})²
= (1/2)(α_n² b_n² − 2 α_n b_n β_{n−2} b_{n−2} + β_{n−2}² b_{n−2}²)
≤ α_n² b_n² + β_{n−2}² b_{n−2}².

From the above estimates and (5), we have the error estimates

‖u − π^{1,0}_N u‖²_{H^1_0(Λ)} = Σ_{n=N+1}^∞ a_n²
≤ Σ_{n=N+1}^∞ (α_n² b_n² + β_{n−2}² b_{n−2}²)
= β_{N−1}² b_{N−1}² + β_N² b_N² + Σ_{n=N+1}^∞ (α_n² + β_n²) b_n²
≤ max_{N+1≤n<∞} {β_{N−1}², β_N², α_n² + β_n²} Σ_{n=N−1}^∞ b_n²
≤ max{β_{N−1}², α_{N+1}² + β_{N+1}²} |u|²_{H^2(Λ)}.

If N ≤ 7/2, then β_{N−1}² ≥ α_{N+1}² + β_{N+1}²; therefore C(N) = β_{N−1} = |Λ|/√(2(2N − 1)(2N + 1)). If N ≥ 7/2, then β_{N−1}² ≤ α_{N+1}² + β_{N+1}²; therefore C(N) = √(α_{N+1}² + β_{N+1}²) = |Λ|/√((2N + 1)(2N + 5)).
(QED)
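The constants of Theorem 2 and the estimate (3) can be checked numerically. The sketch below (taking |Λ| = 1 and the hypothetical test function u(x) = sin(πx), neither of which appears in the paper) verifies that the two-branch formula for C(N) equals max{β_{N−1}, √(α_{N+1}² + β_{N+1}²)} and that the coefficient tail obeys (3):

```python
import numpy as np
from numpy.polynomial import legendre as Lg

# alpha_n, beta_n and C(N) from Theorem 2, for |Lambda| = 1 on (0, 1).
def alpha(n): return 1.0 / np.sqrt(2.0 * (2 * n - 1) * (2 * n + 1))
def beta(n):  return 1.0 / np.sqrt(2.0 * (2 * n + 1) * (2 * n + 3))

def C(N):
    if N in (2, 3):
        return 1.0 / np.sqrt(2.0 * (2 * N - 1) * (2 * N + 1))
    return 1.0 / np.sqrt((2 * N + 1) * (2 * N + 5))

# The case split at N = 7/2 reproduces max{beta_{N-1}, sqrt(alpha_{N+1}^2 + beta_{N+1}^2)}.
for N in range(2, 12):
    assert np.isclose(C(N), max(beta(N - 1), np.hypot(alpha(N + 1), beta(N + 1))))

# Check estimate (3) for the illustrative choice u(x) = sin(pi*x):
# ||u - pi_N u||^2_{H1_0} = sum_{n>N} a_n^2, a_n = -sqrt(2n-1)*(u', P_{n-1})_{L2}.
t, w = Lg.leggauss(200)
x, w = 0.5 * (t + 1.0), 0.5 * w            # Gauss nodes/weights on (0, 1)
up = np.pi * np.cos(np.pi * x)             # u'(x)

def a(n):
    c = np.zeros(n); c[n - 1] = 1.0
    return -np.sqrt(2 * n - 1) * np.sum(w * up * Lg.legval(2.0 * x - 1.0, c))

coeffs = np.array([a(n) for n in range(2, 60)])
u_H2 = np.pi ** 2 / np.sqrt(2.0)           # |u|_{H2} = ||u''||_{L2} = pi^2/sqrt(2)
for N in (4, 8, 12):
    err = np.sqrt(np.sum(coeffs[N - 1:] ** 2))   # tail over n >= N+1
    assert err <= C(N) * u_H2
print("Theorem 2 constants and estimate (3) check out")
```

The spectral decay of the coefficients a_n makes the tail far smaller than the bound C(N)|u|_{H2}, consistent with the "very small residual errors" the paper relies on.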

Theorem 3 For arbitrary u ∈ H^1_0(Λ), we have

‖u − π^{1,0}_N u‖_{L^2(Λ)} ≤ C(N) ‖u − π^{1,0}_N u‖_{H^1_0(Λ)}.   (8)

Here, C(N) is the same constant as in Theorem 2.

We omit the proof of (8), because it is almost the same as the usual Aubin-Nitsche trick. For two- or three-dimensional domains of the form Λ_1 × · · · × Λ_d, d = 2, 3, by using the d-fold tensor product of the one-dimensional basis {φ_n}, the problem reduces to the one-dimensional case. Namely, we obtain the same results as in Theorems 2 and 3, with the same constant C(N), for those domains.

– 22 –


4. Verification for elliptic boundary value problems

In the below, we briefly describe the verification condition for nonlinear elliptic boundary value problems based on [2], which we applied for the actual verification in the present paper. Let Ω ⊂ R^d be a polygonal (polyhedral) domain, and let f : H^1_0(Ω) → L^2(Ω) be a Frechet differentiable map. Consider the boundary value problem:

−Δu = f(u) in Ω,  (9)

u = 0 on ∂Ω.  (10)

Let S_N be an m-dimensional subspace of H^1_0(Ω) ∩ H^2(Ω) and let u_N ∈ S_N be an appropriate approximate solution of (9), (10). We set u = w + u_N. Then, the residual equation of Newton type is given by

Lw := −Δw − f′(u_N)w = g(w) in Ω,  (11)

w = 0 on ∂Ω,  (12)

where g(w) = f(w + u_N) + Δu_N − f′(u_N)w. If the operator L is invertible, then (11), (12) are rewritten as the fixed point equation of the form w = L^{−1}g(w) =: F(w) for a compact map F on H^1_0(Ω). For an α > 0, define the set W_α ⊂ H^1_0(Ω) by

W_α := { w ∈ H^1_0(Ω) ; ‖w‖_{H^1_0(Ω)} ≤ α }.

If F(W_α) ⊂ W_α then, by Schauder's theorem, there exists a fixed point of F in the set W_α, which we call a candidate set. Furthermore, the local uniqueness condition of solutions on W_γ, for a γ > 0, is presented by

‖F(w_1) − F(w_2)‖_{H^1_0} ≤ k ‖w_1 − w_2‖_{H^1_0}, ∀w_1, w_2 ∈ W_γ,  (13)

for some 0 < k < 1.

We now give an invertibility condition for the linear operator L. Suppose that the linear operator L can be represented as

Lw ≡ −Δw − f′(u_N)w = −Δw + b · ∇w + cw,

where b ∈ W^{1,∞}(Ω)^d, c ∈ L^∞(Ω). Then we have the next theorem.

Theorem 4 ([2]) If the inequality

κ := C(N) ( C_1 M(N) K(N) + C_2 ) < 1

holds, then L is invertible. In the above expression, we have

C_1 = ‖b‖_{L^∞(Ω)^d} + C_p ‖c‖_{L^∞(Ω)},
C_2 = ‖b‖_{L^∞(Ω)^d} + C(N) ‖c‖_{L^∞(Ω)},
M(N) = ‖D^{T/2} G^{−1} D^{1/2}‖_E,
K(N) = C(N) ( C_p ‖div b‖_{L^∞} + ‖b‖_{L^∞} + C_p ‖c‖_{L^∞} ),
G_{i,j} = (∇φ_j, ∇φ_i)_{L^2} + (b · ∇φ_j, φ_i)_{L^2} + (cφ_j, φ_i)_{L^2},
D_{i,j} = (∇φ_j, ∇φ_i)_{L^2},

where G := (G_{i,j}), D := (D_{i,j}) are m × m matrices, and ‖·‖_E stands for the Euclidean norm of a matrix. Here, C_p is a Poincare constant. The following estimate also holds, which yields the norm of the inverse operator L^{−1}. That is, for arbitrary g ∈ L^2(Ω),

‖L^{−1}g‖_{H^1_0(Ω)} ≤ C_p ‖R‖_E^{1/2} ‖g‖_{L^2(Ω)}.

Here, the symmetric matrix R ∈ R^{2×2} is defined as

R = τ [ M(N)² ( C_1² C(N)² + (1 − C_2 C(N))² )                    (sym.)
        M(N) ( C_1 C(N) + (1 − C_2 C(N)) M(N) K(N) )    1 + M(N)² K(N)² ],

where τ := 1/(1 − κ)².

In general, from Theorem 4, the verification condition F(W_α) ⊂ W_α reduces to some nonlinear inequality with respect to the real parameter α. Furthermore, the local uniqueness condition is also represented by another kind of inequality in γ.

5. Numerical Example

We consider the following Emden equation,

−Δu = u² in Ω,

u = 0 on ∂Ω,

where Ω is the one-dimensional interval (0, 1) or the rectangle (0, 1) × (0, 1) in two dimensions. We define the finite-dimensional space S_N as P^{1,0}_N(0, 1) or its tensor product space P^{1,0}_N(0, 1)² in H^1_0(Ω). Let φ_i be the basis of S_N. In the below, we use the same symbols as before. First, we compute a numerical solution u_N ∈ S_N satisfying

(∇u_N, ∇v_N)_{L^2(Ω)^d} = (u²_N, v_N)_{L^2(Ω)}, ∀v_N ∈ S_N,  (14)

by using the usual Newton method with some appropriate initial value. Note that, in the present case, it is sufficient to compute the solution of the above nonlinear equation by the usual floating point arithmetic. Namely, it is not necessary to get a verified solution of (14). The linearized operator L is defined by Lw := −Δw − 2u_N w. Then, we compute each constant in Theorem 4 by using guaranteed computation based on interval arithmetic with C_p = 1/π. In this case, the verification condition for the existence of a solution in a candidate set W_α can be represented as the following quadratic inequality in α:

can be represented as the following quadratic inequalityin α.

C_p ‖R‖_E^{1/2} ( C_4² α² + ‖Δu_N + u²_N‖_{L^2} ) ≤ α,  (15)

where C_4 is an embedding constant in Sobolev's inequality satisfying

‖u‖_{L^4(Ω)} ≤ C_4 ‖u‖_{H^1_0(Ω)}, ∀u ∈ H^1_0(Ω).

From the inequality (15), one can find that the order of magnitude of α is almost the same as that of the residual norm.
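The remark above can be made concrete. Writing c = C_p ‖R‖_E^{1/2} and ρ for the residual norm, inequality (15) reads c C_4² α² − α + c ρ ≤ 0, so the smallest admissible α is the smaller root of this quadratic, and verification fails when the discriminant is negative. A sketch with hypothetical constant values (ours, not the paper's computed ones):

```python
import math

def smallest_alpha(c, C4, rho):
    """Smallest alpha with c*(C4^2*alpha^2 + rho) <= alpha, or None.

    Rearranged: c*C4^2*alpha^2 - alpha + c*rho <= 0; admissible alpha lie
    between the two roots, so verification fails if the roots are complex.
    """
    a, b, d = c * C4 ** 2, -1.0, c * rho
    disc = b * b - 4 * a * d
    if disc < 0:
        return None                     # corresponds to "Failed" in Table 1
    return (-b - math.sqrt(disc)) / (2 * a)

# Hypothetical values: for a small residual, the smallest root is close to
# c*rho, i.e. alpha has the same order of magnitude as the residual norm.
alpha = smallest_alpha(c=0.5, C4=0.6, rho=1.0e-4)
assert alpha is not None and abs(alpha - 0.5e-4) < 1e-8
```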

On the other hand, the verification condition (13) of the uniqueness for a set W_γ can be given by the following inequality in γ:

γ ≤ 1 / ( 2 C_p C_4² ‖R‖_E^{1/2} ).  (16)


Here, the matrix R is the same as in Theorem 4. After enclosing a solution in W_α, we can also obtain L^∞ a posteriori error estimates by using explicit Sobolev inequalities as below. Namely, for the error w := u − u_N, in the one-dimensional case, we have

‖u − u_N‖_{L^∞(Ω)} ≤ (1/√2) ‖u − u_N‖_{H^1_0(Ω)} ≤ α/√2.

And, in the two-dimensional case, it holds by Plum's estimate ([3]) that

‖w‖_{L^∞} ≤ C*_1 ‖∇w‖_{L^2} + C*_2 |w|_{H^2} ≤ C*_1 α + C*_2 ‖Δw‖_{L^2}.

Here, in the present case, the constants C*_1 and C*_2 satisfy C*_1 ≤ 2/√3 and C*_2 ≤ √(14/5)/6. Further, we have

‖Δw‖_{L^2} = ‖2u_N w + w² + Δu_N + u²_N‖_{L^2}
           ≤ 2 C_p α ‖u_N‖_{L^∞} + C_4² α² + ‖Δu_N + u²_N‖_{L^2}.

From the above estimates, it is seen that the L^∞ error is also of the same order as the residual error. Table 1 shows the one-dimensional verification results.

Table 1. Verification results: one dimension

 N | ‖u_N‖_{L^∞} | ‖u_N‖_{H^1_0} | ‖u″_N + u²_N‖_{L^2} | M(N)
 2 | 11.6667 | 26.9431 | 5.18519E+01 | 1.00001
 4 | 11.7178 | 25.6680 | 1.90865E+01 | 1.50865
 8 | 11.7959 | 25.6254 | 6.13925E−01 | 1.66595
16 | 11.7967 | 25.6254 | 1.31699E−04 | 1.66667
24 | 11.7967 | 25.6254 | 1.28287E−08 | 1.66667
32 | 11.7967 | 25.6254 | 3.45912E−11 | 1.66669

 N | κ | Existence | Uniqueness | ‖u − u_N‖_{L^∞}
 2 | 2.61658 | —— | —— | ——
 4 | 0.91785 | Failed | —— | ——
 8 | 0.32925 | Failed | —— | ——
16 | 0.09631 | 8.33030E−05 | 2.73837 | 5.89042E−05
24 | 0.04529 | 7.41547E−09 | 2.99641 | 5.24353E−09
32 | 0.02622 | 1.93076E−11 | 3.10356 | 1.36525E−11

These results are computed by using interval arithmetic with double precision coded with INTLIB [4]. In the table, "——" means no calculation due to the failure of the invertibility condition in Theorem 4, and "Failed" means that the verification condition (15) failed. The column N in Table 1 stands for the degree of the polynomials. It turns out that the residual norm decays with exponential order in N. If L is invertible, M(N) should converge to a certain constant; the slight increase of M(N) with N seen in the table would come from the influence of the interval arithmetic computations. "Existence" means the smallest α which satisfies the quadratic inequality (15), and "Uniqueness" the largest γ satisfying (16). The L^∞ error ‖u − u_N‖_{L^∞} is of almost the same order as "Existence". Table 2 shows the results for the two-dimensional case using bi-N degree polynomials. By our numerical observation using floating point arithmetic in double precision, M(N) converged to 2.746811···. In Table 2, however, this value tends to increase with N. Actually, in the computational process, due to the accumulation of rounding-error enclosures, some unexpected enlargement of the interval widths is caused, which brings about the failure of verification, e.g., for N = 40.

Table 2. Verification results: two dimensions

 N | ‖u_N‖_{L^∞} | ‖u_N‖_{H^1_0} | ‖Δu_N + u²_N‖_{L^2} | M(N)
 2 | 27.2223 | 64.9288 | 1.61353E+02 | 1.00001
 4 | 28.3239 | 59.2653 | 8.75145E+01 | 2.15455
 8 | 29.2334 | 58.8264 | 6.49785E+00 | 2.74009
16 | 29.2571 | 58.8259 | 7.61524E−03 | 2.74682
24 | 29.2571 | 58.8259 | 4.17816E−06 | 2.74719
32 | 29.2571 | 58.8259 | 1.70506E−08 | 3.19814

 N | κ | Existence | Uniqueness | ‖u − u_N‖_{L^∞}
 2 | 11.8261 | —— | —— | ——
 4 | 6.47153 | —— | —— | ——
 8 | 2.82214 | —— | —— | ——
16 | 0.82836 | 7.43965E−02 | 0.154966 | 1.39722E−00
24 | 0.38951 | 6.86979E−06 | 0.608198 | 1.32133E−04
32 | 0.26043 | 2.55080E−08 | 0.668438 | 4.92153E−07

Quadrature rule. In the actual numerical computations, in order to avoid the loss of significant digits due to the integration of polynomials of higher degree, we effectively used the Gauss-Legendre quadrature formula on the interval Λ, satisfying, for each integer m ≥ 1,

∫_Λ p(x) dx = Σ_{n=1}^m p(x_n) w_n, ∀p ∈ P_{2m−1}(Λ).  (17)

Here x_n is a zero of P_m and w_n is the weight at x_n, which are computed with guaranteed accuracy.

Computer environment. CPU: Intel Core2 Quad Q6700, Memory: DDR2 8GB, OS: Ubuntu Linux 7.10 AMD64, Compiler: Intel Fortran 10.1, LAPACK: version 3.1.1, BLAS: Goto BLAS 1.26, Interval arithmetic: INTLIB [4].

6. Conclusion

There are some existing verification results for the same problem. In [2], the corresponding H^1_0 error was 4.1569×10^{−2} for the piecewise bi-quadratic C^0 functions with 400 elements and, in [3], the error bound in the L^∞ sense was 8.460×10^{−4} for the piecewise bi-quintic polynomials of C^1-class with 64 elements. Therefore, by our computational results, it was confirmed that the spectral methods enable us to get highly precise approximations with guaranteed accuracy for Dirichlet problems with reasonable computational costs. However, for the present, we could not completely overcome the error propagation in the computations of polynomials with higher degree. It seems necessary to use some more precise interval techniques based on multi-precision arithmetic or other efficient approaches such as [5].

References

[1] M. T. Nakao and Y. Watanabe, An efficient approach to the numerical verification for solutions of elliptic differential equations, Numerical Algorithms, 37, Special issue for Proc. of SCAN2002 (2004), 311–323.
[2] M. T. Nakao, K. Hashimoto and Y. Watanabe, A numerical method to verify the invertibility of linear elliptic operators with applications to nonlinear problems, Computing, 75 (2005), 1–14.
[3] M. Plum, Numerical existence proofs and explicit bounds for solutions of nonlinear elliptic boundary value problems, Computing, 49 (1992), 25–44.
[4] R. Baker Kearfott, Algorithm 763: INTERVAL ARITHMETIC: A Fortran 90 module for an interval data type, ACM Trans. Math. Software, 22(4) (1996), 385–392.
[5] T. Ogita, S. M. Rump and S. Oishi, Accurate sum and dot product, SIAM J. Sci. Comput., 26 (2005), 1955–1988.


JSIAM Letters Vol.1 (2009) pp.25–27 c©2009 Japan Society for Industrial and Applied Mathematics

On oscillatory solutions

of the ultradiscrete Sine-Gordon equation

Shin Isojima1 and Junkichi Satsuma1

Department of Physics and Mathematics, College of Science and Engineering, Aoyama Gakuin University, 5-10-1 Fuchinobe, Sagamihara, Kanagawa, 229-8558, Japan1

E-mail [email protected], [email protected]

Received December 6, 2008, Accepted February 21, 2009

Abstract

Exact solutions of the ultradiscrete Sine-Gordon equation which have oscillating structure are constructed. They are considered to be a counterpart of the breather solution of the Sine-Gordon equation. They are given by setting specific parameters in the discrete soliton solutions and ultradiscretizing the resulting solutions.

Keywords soliton, cellular automaton, Sine-Gordon equation, ultradiscrete system, breather solution

Research Activity Group Applied Integrable Systems

1. Introduction

Cellular automaton (CA) is a discrete dynamical system which consists of a regular array of cells. Each cell takes a finite number of states updated by a given rule in discrete time steps. Although the updating rule is usually simple, CAs may give very complex evolution patterns (see for example [1]). Moreover, CAs are suitable for computer experiments since all variables take discrete values. Hence CAs may be good models to capture the essential mechanisms of physical, social or biological phenomena by simple rules.

Ultradiscretization [2] is a procedure transforming a given difference equation into a CA or an ultradiscrete system. In general, to apply this procedure, we first replace a dependent variable x_n in a given equation with a new variable X_n by

x_n = e^{X_n/ε}  (1)

upon introduction of a parameter ε > 0. Then, in the limit ε ↓ 0, addition, multiplication and division of the original variables are replaced with max, addition and subtraction for the new ones, respectively. Note that x_n should be positive definite for (1) and that there is no general way to cover subtraction in a discrete equation. In addition to overcoming these difficulties, it is also an open problem how to capture oscillatory phenomena in ultradiscrete systems. A partial answer is given in [3] and [4], in which ultradiscretization of the elliptic functions is discussed. The authors and coworkers reported an ultradiscrete analogue of the Airy function as the solution of an initial value problem in [5].
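The replacement of addition by max in the limit ε ↓ 0 can be observed numerically. The following sketch (ours) evaluates ε log(e^{X/ε} + e^{Y/ε}) with a numerically stable log-sum-exp:

```python
import numpy as np

def ud_add(X, Y, eps):
    # epsilon * log(e^{X/eps} + e^{Y/eps}), evaluated stably via logaddexp
    return eps * np.logaddexp(X / eps, Y / eps)

X, Y = 1.0, 2.5
# As eps decreases, addition of the original variables tends to max(X, Y):
vals = [ud_add(X, Y, eps) for eps in (1.0, 0.1, 0.001)]
assert abs(vals[-1] - max(X, Y)) < 1e-6
# Multiplication e^{X/eps} * e^{Y/eps} = e^{(X+Y)/eps} is exactly X + Y.
```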

It has already been reported that some ultradiscrete systems constructed from discrete soliton equations possess soliton solutions similar to those of the discrete or corresponding continuous systems (see for example [2, 6, 7]). However, an ultradiscrete solution propagating with oscillation, as the breather solution of the Sine-Gordon (SG) equation, has not been reported. In this letter, we propose solutions of an ultradiscrete analogue of the SG (udSG) equation [8] which have oscillating structure. They are considered to be a counterpart of the breather solution. They are constructed by proper setting of parameters in the known discrete soliton solutions and ultradiscretizing the resulting solutions.

2. Ultradiscrete Sine-Gordon Equation

The SG equation, one of the well-known soliton equations,

∂²ϕ/∂x∂t = sin ϕ  (2)

is famous for possessing the breather solution, which describes oscillatory phenomena and is given as a special case of the 2-soliton solution. Hirota proposed an integrable discrete analogue of the SG equation [9],

sin( (φ^{m+1}_{n+1} + φ^{m−1}_{n−1} − φ^{m−1}_{n+1} − φ^{m+1}_{n−1}) / 4 )
  = δ² sin( (φ^{m+1}_{n+1} + φ^{m−1}_{n−1} + φ^{m−1}_{n+1} + φ^{m+1}_{n−1}) / 4 ),  (3)

through the bilinearizing technique. Note that this equation also has the breather solution.

For the purpose of constructing an udSG equation, the authors and coworkers proposed another discrete SG equation [8]:

( (1 − δ²) u^{m−1}_{n−1} − 1 )( (1 − δ²) u^{m+1}_{n+1} − 1 )
  − ( (1 + δ²)/u^{m+1}_{n−1} − 1 )( (1 + δ²)/u^{m−1}_{n+1} − 1 ) = 0.  (4)


This equation is reduced to the trilinear form

| (1 − δ²) τ^{m−2}_{n−2}   τ^m_{n−2}   (1 + δ²) τ^{m+2}_{n−2} |
| τ^{m−2}_n                τ^m_n       τ^{m+2}_n              | = 0  (5)
| (1 + δ²) τ^{m−2}_{n+2}   τ^m_{n+2}   (1 − δ²) τ^{m+2}_{n+2} |

through the variable transformation

u^m_n = ( τ^{m+1}_{n+1} τ^{m−1}_{n−1} ) / ( τ^{m−1}_{n+1} τ^{m+1}_{n−1} ).  (6)

If we set

δ = tanh(L/(2ε)), τ^m_n = e^{T^m_n/ε}, u^m_n = e^{U^m_n/ε},  (7)

and take the limit ε ↓ 0, we have the udSG equation for U^m_n,

max[ −|L| + U^{m+1}_{n+1} + U^{m−1}_{n−1}, |L| − U^{m−1}_{n+1}, |L| − U^{m+1}_{n−1} ]
  = max[ |L| − U^{m−1}_{n+1} − U^{m+1}_{n−1}, U^{m+1}_{n+1}, U^{m−1}_{n−1} ]  (8)

from (4), and for T^m_n,

max[ −|L| + T^{m+2}_{n+2} + T^m_n + T^{m−2}_{n−2},
     |L| + T^{m−2}_{n+2} + T^{m+2}_n + T^m_{n−2},
     |L| + T^m_{n+2} + T^{m−2}_n + T^{m+2}_{n−2} ]
  = max[ |L| + T^{m−2}_{n+2} + T^m_n + T^{m+2}_{n−2},
         T^{m+2}_{n+2} + T^{m−2}_n + T^m_{n−2},
         T^m_{n+2} + T^{m+2}_n + T^{m−2}_{n−2} ]  (9)

from (5), and the relation between T^m_n and U^m_n,

U^m_n = T^{m+1}_{n+1} + T^{m−1}_{n−1} − T^{m−1}_{n+1} − T^{m+1}_{n−1},  (10)

from (6). Refer to [8] for more details about the udSG equation and its soliton solutions.

3. Oscillatory Solution

For the purpose of our discussion, we give the 2-soliton solution of (5). Let p_j, q_j be parameters satisfying the dispersion relation

δ² (p_j² + 1)(q_j² + 1) = (p_j² − 1)(q_j² − 1)  (11)

and a_j be arbitrary phase constants. Phases x_j and interaction factors b_{jk} are defined by

x_j = p_j^n q_j^m,  (12)

b_{jk} = (p_j² − p_k²)² / ((p_j p_k)² − 1)²,  (13)

respectively. In terms of these notations, the 2-soliton solution is written as

τ^m_n = 1 + a_1 x_1 + a_2 x_2 + a_1 a_2 b_{12} x_1 x_2.  (14)

Now, we construct the 2-periodic solution by a specific setting of parameters in (14). Let us set

p_2 = −p_1, q_2 = q_1, a_1 = α_1 + α_2, a_2 = α_2.  (15)

Then (14) is reduced to

τ^m_n = 1 + (α_1 + 2α_2) x_1   (n : even),
        1 + α_1 x_1            (n : odd).  (16)

The phase constant in (16) depends on whether n is an even number or an odd number. This structure plays a crucial role for the 2-periodic behaviour of the solution.
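The reduction from (14) to (16) rests on the facts that b_{12} vanishes and that x_2 = (−1)^n x_1 under the setting (15); this can be checked numerically. The parameter values below are arbitrary samples of ours, not taken from the paper:

```python
# Check the reduction of the 2-soliton tau function (14) under (15).
p1, q1 = 1.3, 0.7
alpha1, alpha2 = 0.4, 0.9
a1, a2 = alpha1 + alpha2, alpha2        # (15)
p2, q2 = -p1, q1                        # (15)

b12 = (p1**2 - p2**2)**2 / ((p1 * p2)**2 - 1)**2   # = 0 since p2^2 = p1^2

for n in range(-3, 4):
    for m in range(-3, 4):
        x1, x2 = p1**n * q1**m, p2**n * q2**m       # phases (12)
        tau = 1 + a1 * x1 + a2 * x2 + a1 * a2 * b12 * x1 * x2   # (14)
        # (16): the phase constant alternates with the parity of n
        expected = 1 + (alpha1 + 2 * alpha2) * x1 if n % 2 == 0 \
            else 1 + alpha1 * x1
        assert abs(tau - expected) < 1e-12
```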

Let us ultradiscretize (16). First, we put

p_1 = e^{P_1/ε}, q_1 = e^{Q_1/ε}, α_1 = e^{A_1/ε}, α_2 = e^{A_2/ε} (A_1 < A_2),  (17)

and take the limit ε ↓ 0. Then we have the ultradiscrete analogue of (16),

T^m_n = max(0, P_1 n + Q_1 m + A_2)   (n : even),
        max(0, P_1 n + Q_1 m + A_1)   (n : odd).  (18)

Note that P_1 and Q_1 should satisfy the dispersion relation

|P_1 + Q_1| = |L| + |P_1 − Q_1|,  (19)

which is obtained by ultradiscretizing (11). Substituting (18) into (10), we have U^m_n solving (8). For general parameters, the solution describes a travelling pulse with oscillation. In order to emphasize its periodic behaviour, we set P_1 = Q_1 = |L|/2, which satisfies (19), and introduce new independent variables (k, l) by

n = k − l, m = k + l.  (20)

Figs. 1–3 show the behaviour of U^m_n for various values of the parameters A_1, A_2. In all cases, the solution gives a localized pulse for fixed time l. Each pulse is almost stable and its shape changes with l in period 2. Hence, this solution clearly describes oscillatory phenomena. Furthermore, its behaviour is similar to that of the breather solution.
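The oscillatory solution can be reproduced directly from (18) and (10). The following sketch (ours) uses the parameter values of Fig. 1 (L = 2, P_1 = Q_1 = 1, A_1 = 1, A_2 = 2), checks (8) pointwise on a finite lattice patch, and exhibits the localized period-2 pulse in the frame (20):

```python
L, P1, Q1, A1, A2 = 2, 1, 1, 1, 2

def T(n, m):
    # (18): the phase constant depends on the parity of n
    return max(0, P1 * n + Q1 * m + (A2 if n % 2 == 0 else A1))

def U(n, m):
    # (10): U^m_n = T^{m+1}_{n+1} + T^{m-1}_{n-1} - T^{m-1}_{n+1} - T^{m+1}_{n-1}
    return T(n + 1, m + 1) + T(n - 1, m - 1) - T(n + 1, m - 1) - T(n - 1, m + 1)

# U solves the udSG equation (8) at each lattice site:
for n in range(-6, 7):
    for m in range(-6, 7):
        lhs = max(-abs(L) + U(n + 1, m + 1) + U(n - 1, m - 1),
                  abs(L) - U(n - 1, m + 1),
                  abs(L) - U(n + 1, m - 1))
        rhs = max(abs(L) - U(n - 1, m + 1) - U(n + 1, m - 1),
                  U(n + 1, m + 1),
                  U(n - 1, m - 1))
        assert lhs == rhs

# In the travelling frame (20), the pulse is localized with period 2 in l:
def profile(l):
    return [U(k - l, k + l) for k in range(-5, 6)]

assert profile(0) != profile(1)       # shape alternates with the parity of l
assert profile(0) == profile(2)       # ... with period 2
```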

[Figure: two panels (l : even, l : odd) plotting U^{k+l}_{k−l} against k.]

Fig. 1. An example of oscillatory solution. L = 2, P_1 = Q_1 = 1, A_1 = 1, A_2 = 2.

[Figure: two panels (l : even, l : odd) plotting U^{k+l}_{k−l} against k.]

Fig. 2. An example of oscillatory solution. L = 2, P_1 = Q_1 = 1, A_1 = 1, A_2 = 5.


[Figure: two panels (l : even, l : odd) plotting U^{k+l}_{k−l} against k.]

Fig. 3. An example of oscillatory solution. L = 2, P_1 = Q_1 = 1, A_1 = 1, A_2 = 10.

For the sake of constructing a solution with richer structure, we consider the 4-soliton solution

τ^m_n = 1 + Σ_{j=1}^4 a_j x_j + Σ_{j<k} a_j a_k b_{jk} x_j x_k
        + Σ_{j<k<l} a_j a_k a_l b_{jk} b_{jl} b_{kl} x_j x_k x_l
        + a_1 a_2 a_3 a_4 b_{12} b_{13} b_{14} b_{23} b_{24} b_{34} x_1 x_2 x_3 x_4.  (21)

If we put (15) and

p_4 = −p_3, q_4 = q_3, a_3 = α_3 + α_4, a_4 = α_4,  (22)

then we have

τ^m_n = 1 + (α_1 + 2α_2) x_1 + (α_3 + 2α_4) x_3
          + (α_1 + 2α_2)(α_3 + 2α_4) b_{13} x_1 x_3   (n : even),
        1 + α_1 x_1 + α_3 x_3 + α_1 α_3 b_{13} x_1 x_3   (n : odd).  (23)

Moreover, setting

p_j = e^{P_j/ε}, q_j = e^{Q_j/ε}, α_j = e^{A_j/ε} (A_1 < A_2, A_3 < A_4),  (24)

and taking the limit ε ↓ 0, we have

T^m_n = max[ 0, X_1 + A_2, X_3 + A_4,
             X_1 + X_3 + A_2 + A_4 + 2(|P_1 − P_3| − |P_1 + P_3|) ]   (n : even),
        max[ 0, X_1 + A_1, X_3 + A_3,
             X_1 + X_3 + A_1 + A_3 + 2(|P_1 − P_3| − |P_1 + P_3|) ]   (n : odd),  (25)

where X_j = P_j n + Q_j m.

The solution U^m_n constructed from (25) and (10) describes interaction among oscillating pulses. We consider the specific case P_1 = Q_1 = |L|/2, P_3 = Q_3 = −|L|/2 and introduce independent variables (k, l) defined by (20). We observe pulses which are almost stable and change their shape in period 2 (see Fig. 4).

We can obtain a solution which describes a larger number of oscillating pulses by starting from the (2N)-soliton solution. We would, however, comment that we have only two choices of P, Q such that P = Q and (19) holds, namely P = Q = ±|L|/2. Hence, the oscillatory solution constructed from the (2N)-soliton solution may be understood as a nonlinear superposition of the solutions given in this section.

[Figure: two panels (l : even, l : odd) plotting U^{k+l}_{k−l} against k.]

Fig. 4. An example of oscillatory solution with richer structure. L = 2, P_1 = Q_1 = 1, P_3 = Q_3 = −1, A_1 = 1, A_2 = 5, A_3 = 1, A_4 = 10.

4. Concluding Remarks

We have given exact solutions of the udSG equation which describe oscillatory phenomena. They are considered to be a counterpart of the breather solution. It is an interesting problem to construct oscillatory solutions for other ultradiscrete systems by applying the procedure developed in Section 3. We also comment that the period of oscillation of our solution is essentially 2 by its construction. It is a future problem to find ultradiscrete systems having solutions with arbitrary periods.

References

[1] S. Wolfram, A New Kind of Science, Wolfram Media, Inc., Champaign, 2002.
[2] T. Tokihiro, D. Takahashi, J. Matsukidaira and J. Satsuma, From soliton equations to integrable cellular automata through a limiting procedure, Phys. Rev. Lett., 76 (1996), 3247–3250.
[3] D. Takahashi, T. Tokihiro, B. Grammaticos, Y. Ohta and A. Ramani, Constructing solutions to the ultradiscrete Painleve equations, J. Phys. A: Math. Gen., 30 (1997), 7953–7966.
[4] A. Nobe, Ultradiscretization of elliptic functions and its applications to integrable systems, J. Phys. A: Math. Gen., 39 (2006), L335–L342.
[5] S. Isojima, B. Grammaticos, A. Ramani and J. Satsuma, Ultradiscretization without positivity, J. Phys. A: Math. Gen., 39 (2006), 3663–3672.
[6] J. Matsukidaira, J. Satsuma, D. Takahashi, T. Tokihiro and M. Torii, Toda-type cellular automaton and its N-soliton solution, Phys. Lett. A, 225 (1997), 287–295.
[7] M. Murata, S. Isojima, A. Nobe and J. Satsuma, Exact solutions for discrete and ultradiscrete modified KdV equations and their relation to box-ball systems, J. Phys. A: Math. Gen., 39 (2006), L27–L34.
[8] S. Isojima, M. Murata, A. Nobe and J. Satsuma, An ultradiscretization of the sine-Gordon equation, Phys. Lett. A, 331 (2004), 378–386.
[9] R. Hirota, Nonlinear partial difference equations III; discrete sine-Gordon equation, J. Phys. Soc. Japan, 43 (1977), 2079–2086.


JSIAM Letters Vol.1 (2009) pp.28–31 c©2009 Japan Society for Industrial and Applied Mathematics

Computational and Symbolic Anonymity

in an Unbounded Network

Hubert Comon-Lundh1,2, Yusuke Kawamoto3 and Hideki Sakurada4

Ecole Normale Superieure de Cachan1

Research Center for Information Security, National Institute of Advanced Industrial Science and Technology, Akihabara-Daibiru Room 1102, 1–18–13, Sotokanda, Chiyoda-ku, Tokyo 101–0021, Japan2

Graduate School of Information Science and Technology, The University of Tokyo, 7–3–1 Hongo, Bunkyo-ku, Tokyo 113–8656, Japan3

NTT Communication Science Laboratories, NTT Corporation, 3–1 Morinosato-Wakamiya, Atsugi, Kanagawa 243–0198, Japan4

E-mail h.comon-lundh at aist.go.jp

Received October 17, 2008, Accepted February 24, 2009 (INVITED PAPER)

Abstract

We provide a formal model for protocols using ring signatures and prove that this model is computationally sound: if there is an attack in the computational world, then there is an attack in the formal (abstract) model. Our original contribution is that we consider security properties, such as anonymity, which are not properties of a single execution trace, while considering an unbounded number of sessions of the protocol.

Keywords computational soundness, security protocols, communicating processes, ring signatures

Research Activity Group Formal Approach to Information Security

1. Introduction

There are two main approaches to protocol security. The first approach considers an attacker modeled as a probabilistic polynomial time interactive Turing machine (PPT) and the protocol is an unbounded number of copies of PPTs. The attacker is assumed to control the network and can schedule the communications and send fake messages. The security property is defined as an indistinguishability game: the protocol is secure if, for any attacker A, the probability that A gets an advantage in this game is negligible. A typical example is the anonymity property, by which an attacker should not be able to distinguish between two networks in one of which identities have been switched. The problem with such computational security notions is the difficulty in obtaining detailed proofs: they are in general unmanageable, and cannot be verified by automatic tools.

The second approach relies on a formal model: bitstrings are abstracted by formal expressions (terms), the attacker is any formal process, and security properties, such as anonymity, can be expressed by the observational equivalence of processes. This model is much simpler: there is no coin tossing, no complexity bounds, and the attacker is given only a fixed set of primitive operations (the function symbols in the term algebra). Therefore it is not surprising that security proofs become much simpler and can sometimes be automatized. However, the drawback is that we might miss some attacks because the model might be too rough.

Starting with the seminal work of Abadi and Rogaway [1], there have been several results showing the computational soundness of the formal models: we do not miss any attacks when considering the abstract model, provided that the security primitives satisfy certain properties, for instance IND-CPA or IND-CCA in the case of encryption. Such results allow one to perform formal symbolic proofs, while yielding computational security guarantees. It is therefore an approach that is relevant, in principle, to all protocol security proofs.

The present paper is a contribution to this line of research. Until recently, only a few security properties were considered in the soundness results. Roughly speaking, only passive attackers or the properties of execution traces were considered. However, several properties, such as anonymity, cannot be expressed as a property satisfied by all execution traces. Here we consider an active attacker and indistinguishability properties. In [1], the authors only consider a passive attacker and encryption schemes, while we are considering ring signatures and active intruders: we cannot rely on their results in the present paper.

This problem has been discussed in two recent papers. In [2], we reported a soundness result for the anonymity of ring signatures. However, we assumed only a fixed number of instances of the protocol, which is a strong simplification. Furthermore, the symbolic model gave quite a lot of power to the attacker and the soundness proof was dedicated to anonymity. In [3], there are no such restrictions; however, the results are limited to symmetric encryption, which does not provide any hint as regards an adequate formal model for ring signatures.

The current paper bridges these two recent studies: we


consider a formal model for ring signatures and prove the soundness of observational equivalence for an unbounded number of sessions.

2. Ring Signatures

The aim of a ring signature is to enable the verification of a signature without revealing the signer's identity within a group of signers.

A ring signature scheme RS = (G, S, V) consists of two probabilistic algorithms G and S, and a deterministic algorithm V:

• The key-generation algorithm G, given a security parameter 1^η, outputs a private signing key and a public verification key.

• The signing algorithm S, given a signing key, a set of verification keys and a message, outputs a signature for the message.

• The verification algorithm V, given a set of verification keys, a message, and a signature, outputs 0 or 1.

If a signature is produced by S with keys generated by G, then the verification of the signature always succeeds.

We consider two security notions for ring signature schemes: existential unforgeability and basic anonymity [4]. A ring signature scheme RS is existentially unforgeable if a signature cannot be forged without knowing the signing key: for any PPT attacker A having access to an oracle O, the following probability is negligible in η:

Pr[ (sk_1, vk_1) ← G(1^η); ··· ; (sk_n, vk_n) ← G(1^η);
    L_leg := {vk_1, . . . , vk_n};
    (L, m, σ) ← A^O(1^η, L_leg) :
    L ⊆ L_leg and V(L, m, σ) = 1 and,
    for any i with vk_i ∈ L, neither sign(i, L, m)
    nor corrupt(i) has been queried to O ],

where the oracle O returns σ ← S(sk_i, L, m) when queried with sign(i, L, m) and returns sk_i when queried with corrupt(i).

RS is basically anonymous if the signer of a message cannot be inferred: for any PPT attacker A having access to an oracle O (as above), the following quantity is negligible in η:

| Pr[ (sk_1, vk_1) ← G(1^η); ··· ; (sk_n, vk_n) ← G(1^η);
     L_leg := {vk_1, . . . , vk_n};
     (i_0, i_1, L, m, ω) ← A^O(1^η, L_leg);
     b ←$ {0, 1};
     σ ← S(sk_{i_b}, L, m);
     b′ ← A^O(ω, σ);
     neither corrupt(i_0) nor corrupt(i_1)
     has been queried to O :
     b = b′ ] − 1/2 |.
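The winning condition of the unforgeability game can be read as a predicate over the attacker's output and the oracle's query log. The following sketch is purely schematic bookkeeping (the list encoding and the always-accepting toy verifier are hypothetical stand-ins of ours, not a real scheme):

```python
def wins_unforgeability(L, m, sigma, L_leg, verify, sign_queries, corrupted):
    """Win condition of the existential unforgeability game.

    (L, m, sigma): the attacker's forgery; L_leg: honest verification keys;
    sign_queries: set of (i, L, m) triples queried to the oracle;
    corrupted: set of identities i for which corrupt(i) was queried.
    """
    if not set(L) <= set(L_leg):           # L must be a subset of L_leg
        return False
    if verify(L, m, sigma) != 1:           # V(L, m, sigma) must accept
        return False
    ring = [i for i, vk in enumerate(L_leg) if vk in L]
    # no ring member was corrupted or ever asked to sign (L, m)
    return all(i not in corrupted and (i, tuple(L), m) not in sign_queries
               for i in ring)

accept = lambda L, m, s: 1                 # toy verifier, always accepts
L_leg = ["vk0", "vk1", "vk2"]
assert wins_unforgeability(["vk0", "vk1"], "msg", "sig", L_leg, accept,
                           sign_queries=set(), corrupted=set())
assert not wins_unforgeability(["vk0", "vk1"], "msg", "sig", L_leg, accept,
                               sign_queries={(0, ("vk0", "vk1"), "msg")},
                               corrupted=set())
```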

In addition, we assume unpredictability, which means that no PPT attacker, even with the signing keys, can predict the output of the signing algorithm. Unpredictability is also assumed in the soundness of symbolic zero-knowledge proofs [5]. It is easily obtained by adding extra random bits to signatures.

3. Symbolic Model

We use a fragment [3] of the applied pi-calculus [6]. Below, we only give the definitions related to ring signatures; for other constructions, refer to [3].

3.1 Terms, predicates and equational theory

The names are split into several disjoint sets:

• identities: K; we confuse the identities and the private signing keys held by those identities,

• random symbols: R,

• nonces: N.

The set T of ground terms is obtained from the names by applying the following function symbols, with some restrictions on the types of their arguments.

• vk(k) constructs a verification key from a signing key k ∈ K,

• ⟨u, v⟩ is a pair consisting of two terms u, v,

• check(u, VK) checks the validity of a signature u w.r.t. a set of verification keys VK = {vk(k_1), . . . , vk(k_n)},

• [u]^r_{k,VK} constructs a signature for u ∈ T with a signing key k ∈ K, verification keys VK = {vk(k_1), . . . , vk(k_n)} and randomness r ∈ R; two signature terms with the same random symbol r must be identical,

• RR(u, r) modifies the random number used in a signature u, replacing it with r ∈ R,

• π_1(u), π_2(u) retrieve the components of a pair.

These function symbols satisfy certain equations, which we turn into rewrite rules:

π_1(⟨x, y⟩) → x,
π_2(⟨x, y⟩) → y,
check([x]^r_{y,Z}, Z) → x if vk(y) ∈ Z,
RR([x]^r_{y,Z}, r′) → [x]^{r′}_{y,Z}.

This defines an (infinite) convergent term rewriting system on terms. The normal form of u is written as u↓.
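The four rewrite rules can be prototyped on a small concrete term representation; the tuple encoding below is ours (hypothetical), not part of the formal model:

```python
# Terms as tuples: ("pair", u, v), ("sig", msg, key, rnd, VK), ("vk", k),
# ("pi1", u), ("pi2", u), ("check", u, VK), ("RR", u, r); names are strings.

def rewrite(t):
    # Normalize a term by applying the four rewrite rules bottom-up.
    if not isinstance(t, tuple):
        return t
    t = (t[0],) + tuple(rewrite(s) for s in t[1:])
    head, args = t[0], t[1:]
    if head == "pi1" and args[0][0] == "pair":
        return args[0][1]                       # pi1(<x, y>) -> x
    if head == "pi2" and args[0][0] == "pair":
        return args[0][2]                       # pi2(<x, y>) -> y
    if (head == "check" and args[0][0] == "sig"
            and args[1] == args[0][4] and ("vk", args[0][2]) in args[1]):
        return args[0][1]                       # check([x]^r_{y,Z}, Z) -> x
    if head == "RR" and args[0][0] == "sig":
        _, msg, key, _, VK = args[0]
        return ("sig", msg, key, args[1], VK)   # RR: replace the randomness
    return t

VK = (("vk", "k1"), ("vk", "k2"))
sig = ("sig", "m", "k1", "r", VK)
assert rewrite(("check", sig, VK)) == "m"
assert rewrite(("RR", sig, "r2")) == ("sig", "m", "k1", "r2", VK)
assert rewrite(("pi1", ("pair", ("pi2", ("pair", "a", "b")), "c"))) == "b"
```

Since each rule replaces a redex by one of its own normalized subterms, a single bottom-up pass already reaches the normal form u↓.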

We also introduce predicate symbols that reflect the (maximal) distinguishing capabilities of an attacker:

• M is the well-formedness predicate on ground terms: M(u) is true if u is in normal form and u does not contain the symbols π_1, π_2, check, RR.

• EQ is the strict equality predicate: EQ(u, v) holds if u = v and both terms are well-formed.

• SK is true on pairs of well-formed terms (k, [s]^r_{k,V}): an attacker who knows a signing key can check whether that key is used for signing a given message.

3.2 Frames and static equivalence

A frame is a sequence of ground terms in which some names (typically secret keys) n are hidden: φ = νn.u_1, . . . , u_k. We let bn(φ) be n. The frames will record the sequences of messages sent over the network. With each frame φ = νn.s_1, . . . , s_m, we associate a substitution σ_φ that replaces the variable x_i with s_i.


A term s is deducible from a frame φ, which we write as φ ⊢ s, if there is a term u with m variables, not using the names hidden in φ, and such that uσ_φ↓ = s. This captures the possible attacker's computations on a sequence of messages.

Two frames φ_1, φ_2 are equivalent, which is written as φ_1 ∼ φ_2, if, for any terms u, v (with m variables and not using the names hidden by the frames), M(uσ_{φ_1}↓) holds iff M(uσ_{φ_2}↓) holds and, for P ∈ {EQ, SK}, P(uσ_{φ_1}↓, vσ_{φ_1}↓) holds iff P(uσ_{φ_2}↓, vσ_{φ_2}↓) holds. In words: when we apply any combination of functions to the two frames, the results always look similar.

Examples

νn, k, r, k′, r′. [n]^{r′}_{k′,V}, [n]^r_{k,V} ∼ νn′, k, r, k′, r′. [n′]^r_{k,V}, [n′]^{r′}_{k′,V}

since the attacker can only observe an equality between the two signed messages.

[n]^{r′}_{k,V} ≁ [n′]^r_{k,V}

as soon as n ≠ n′ since, unlike the previous example, n, n′ are not hidden, and so can be used by the attacker: EQ(check(x, V), n) holds on the first message and not on the second.

νk.νr. [u]^r_{k,V}, k ≁ νk.νk′.νr. [u]^r_{k,V}, k′

since SK is true on the first sequence and not on the second.

3.3 Computation trees, symbolic equivalence

If φ is a frame, we let K(φ) be the set of keys deducible from φ.

A computation tree is a tree whose nodes are labeled with states (out of a set Q) and frames, and whose edges are labeled with terms. We write t −u→ t′ if there is an edge labeled with u departing from the root of t and yielding the subtree t′. φ(t) is the frame labeling the root of t and q(t) is the state labeling the root of t.

∼ is extended to computation trees: ∼ is the largest equivalence relation on trees such that, if t1 ∼ t2, then

• φ(t1) ∼ φ(t2),

• if t1 −u1→ t′1 then there exist u2, t′2 such that t2 −u2→ t′2 and t′1 ∼ t′2, and

• if t2 −u2→ t′2 then there exist u1, t′1 such that t1 −u1→ t′1 and t′1 ∼ t′2.

3.4 Symbolic equivalence of reduced trees

For each sequence of verification keys, we let the first non-compromised key be its representative. When all subterms [u]^r_{k,VK} of a frame φ are such that k is the representative of the keys in VK, we say that φ is reduced. A computation tree is reduced if all the frames labeling its nodes are reduced.

Let ≃ be the equivalence relation on frames defined by: νn1.u1 ≃ νn2.u2 iff there are renamings ρ1 of n1 and ρ2 of n2 such that ρ1(u1) = ρ2(u2). ≃ is extended to computation trees in the same way as ∼ was extended.

Lemma 1 Let t1, t2 be two reduced computation trees. Then t1 ∼ t2 iff t1 ≃ t2.

3.5 Processes

A protocol is specified as a simple process, which is a parallel composition of processes that repeatedly receive a message, test it, and send messages. Each test is specified by a conjunction of atomic predicates. Each message is assumed to include its intended recipient.

Each process P in the calculus can be associated with a computation tree tP that records all possible interactions with the network: the labels of edges are messages from the attacker, and nodes are labeled with the state of the network and the record of messages that have already been sent.

4. Computational Interpretation

4.1 Computational interpretation of terms

Given a security parameter η and an interpretation τ of names as bitstrings, a computational interpretation [[t]]^τ_η of each term t is defined as in [3]. We assume that the interpretation of a ring signature [u]^r_{k,VK} contains the interpretations of u and VK in addition to the signature bitstring ms: [[[u]^r_{k,VK}]]^τ_η = ([[u]]^τ_η, ms, [[VK]]^τ_η). We also assume that verification keys come with a certificate: the attacker cannot generate such keys itself and must get them from an authority.

4.2 Computational indistinguishability of computation trees

Given a security parameter η and an interpretation τ of names as bitstrings, we assume that there is a total injective parsing function κ^τ_η from bitstrings to terms. By injectivity, for every m, [[κ^τ_η(m)]]^τ_η = m.

Given a computation tree t and an assignment τ of names to bitstrings, the oracle O_{t,τ} is defined as follows:

• When queried for the first time with a bitstring m, it returns [[φ(t′)]]^τ_η if t −u→ t′ and u = κ^τ_η(m).

• If there is no edge labeled with κ^τ_η(m) departing from the root of t, it returns an error message.

• After the first query, it behaves as O_{t′,τ}.

t1 and t2 are computationally indistinguishable, which we write as t1 ≈ t2, if, for any PPT A^O,

| Pr[τ : A^{O_{t1,τ}}(0^η) = 1] − Pr[τ : A^{O_{t2,τ}}(0^η) = 1] |

is negligible in the security parameter η.

4.3 Tree soundness

We consider trees without dynamic corruption. In such a tree t, if ψ labels any node of t, we assume that K(ψ) = K(φ(t)): corrupted keys are identical along all branches of the tree.

Given a frame φ and a term u, ΨVK,φ(u) is the term obtained by replacing the signatures [s]^r_{k,VK} occurring in u with [s]^r_{k′,VK}, where k′ is a minimal element of {k1 ∈ bn(φ) \ K(φ) | vk(k1) ∈ VK}. ΨVK is the function that maps each frame φ to the frame in which all subterms u of φ are replaced with ΨVK,φ(u).

ΨVK is extended to computation trees as follows: φ(ΨVK(t)) = ΨVK(φ(t)), q(ΨVK(t)) = q(t) and, if t −u→ t′, then ΨVK(t) −ΨVK,φ(t)(u)→ ΨVK(t′).

Note that all labels of edges departing from a node in ΨVK(t) are distinct as soon as this is the case for t, because different random symbols must be used for different signatures.

Lemma 2 For any computation tree t without dynamic corruption, and any set of verification keys VK, ΨVK(t) ∼ t.

Lemma 3 If t1 ≃ t2, then t1 ≈ t2.

Lemma 4 Assuming basic anonymity, t ≈ ΨVK(t).

For this crucial lemma, we need to build a machine B, which breaks basic anonymity, from a machine A that distinguishes t and ΨVK(t). Roughly speaking, B simulates the network, keeping the state in its memory, and behaves as A: when A sends a query m, B parses m, computes the next state and obtains the symbolic reply u. Then B computes [[u]]^τ_η, possibly sending requests to the signing oracle. When such a request would yield different answers depending on whether A interacts with t or ΨVK(t), B requests a signed message and guesses the signer according to the guess of A.

Lemma 5 (Tree soundness for ring signatures) Assuming basic anonymity, if t, t′ are computation trees without dynamic corruption such that t ∼ t′, then t ≈ t′.

Proof sketch We successively apply ΨVK to all sets of verification keys VK occurring in the tree and apply Lemmas 1–4.

(QED)

4.4 Trace mapping

We assume here that there is no occurrence of RR or SK in the protocol and that M can be implemented in PTIME. For any simple process P, security parameter η and random tape τ, [[P]]^τ_η is the computational interpretation of P. It behaves as P, except that it sends, receives and compares bitstrings instead of terms. Given a PPT attacker A, a sample τ and the network [[P]]^τ_η, the execution of A‖[[P]]^τ_η yields a (computational) message sequence. Mesg(P, η, τ, A) is the set of messages produced by either the agents or the attacker along the execution of A‖[[P]]^τ_η. The execution of A‖[[P]]^τ_η is fully abstracted by a path p of a computation tree t if the message sequence is the computational interpretation of the sequence of symbolic messages in p.

We show that message sequences of A‖[[P]]^τ_η are fully abstracted by some path of tP, with overwhelming probability. First we identify the cases in which a computational trace cannot be fully abstracted:

Lemma 6 Let P be a simple process, A a PPT attacker, η a security parameter and τ a random tape. Then one of the following conditions holds:

• The execution of A‖[[P]]^τ_η is fully abstracted by some path p in t.

• In the execution of A‖[[P]]^τ_η, the attacker A sends a message m after receiving the messages in [[φ]]^τ_η, and there is a message m1, polynomially computable from m, that is neither a pair nor a signature of a group with a corrupted member and such that φ ⊬ κ^τ_η(m1).

• There are a subterm v of κ^τ_η(Mesg(P, η, τ, A)) and a verification key VK such that check(v, VK) is in normal form, while V([[v]]^τ_η) = 1.

• There are a name k occurring in P and a term u occurring as a subterm in κ^τ_η(Mesg(P, η, τ, A)) such that τ(k) = [[u]]^τ_η and k ≠ u.

• There are a name k occurring in P and a term vk(u) occurring as a subterm in κ^τ_η(Mesg(P, η, τ, A)) such that [[vk(k)]]^τ_η = [[vk(u)]]^τ_η and k ≠ u.

• There are a term [u]^r_{k,Z} occurring as a subterm in κ^τ_η(Mesg(P, η, τ, A)) and a term [u]^{r′}_{k′,Z} such that [[[u]^r_{k,Z}]]^τ_η = [[[u]^{r′}_{k′,Z}]]^τ_η and [u]^r_{k,Z} ≠ [u]^{r′}_{k′,Z}.

Lemma 7 Assume unforgeability and unpredictability. Let P be a simple process, t its process computation tree, and A a PPT attacker. With overwhelming probability over all samples τ, there is a path p in t that fully abstracts the computational message sequence of A‖[[P]]^τ_η.

Proof sketch Assume that the probability is not overwhelming. Then one of the cases in the previous lemma, other than the first, holds with non-negligible probability. For each of these cases, we can construct a PPT attacker that breaks either unforgeability or unpredictability by simulating [[P]]^τ_η and calling A as a subroutine.

(QED)

5. Soundness of Observational Equivalence

The anonymity of a protocol is specified by the equivalence between, for example, two simple processes P0(k0)‖P1(k1) and P0(k1)‖P1(k0), where k0 and k1 are the identities (signing keys) of two agents. (We omit the details of how we publish vk(k0) and vk(k1).) The symbolic anonymity P0(k0)‖P1(k1) ∼ P0(k1)‖P1(k0) implies the computational anonymity [[P0(k0)‖P1(k1)]] ≈ [[P0(k1)‖P1(k0)]] thanks to the soundness theorem below.

Theorem 1 Assume basic anonymity, unforgeability and unpredictability. Let P and Q be simple processes and A be a PPT attacker. If P ∼ Q, then [[P]] ≈ [[Q]].

Proof sketch As shown in [3], P ∼ Q implies tP ∼ tQ. Then tP ≈ tQ follows from Lemma 5, and [[P]] ≈ [[Q]] follows from Lemma 7.

(QED)

References

[1] M. Abadi and P. Rogaway, Reconciling two views of cryptography (the computational soundness of formal encryption), J. Cryptology, 15 (2) (2002), 103–127.
[2] Y. Kawamoto, H. Sakurada and M. Hagiya, Computationally sound symbolic anonymity of a ring signature, in: Proc. FCS-ARSPA-WITS'08, pp. 161–175, 2008.
[3] H. Comon-Lundh and V. Cortier, Computational soundness of observational equivalence, in: Proc. CCS'08, pp. 109–118, ACM, 2008.
[4] A. Bender, J. Katz and R. Morselli, Ring signatures: Stronger definitions, and constructions without random oracles, in: Theory of Cryptography, Proc. TCC'06, Lect. Notes Comp. Sci., pp. 60–79, Springer-Verlag, 2006.
[5] M. Backes and D. Unruh, Computational soundness of symbolic zero-knowledge proofs against active attackers, in: Proc. CSF'08, pp. 255–269, IEEE Computer Society, 2008.
[6] M. Abadi and C. Fournet, Mobile values, new names, and secure communication, in: Proc. POPL'01, pp. 104–115, ACM, 2001.


JSIAM Letters Vol.1 (2009) pp.32–35 ©2009 Japan Society for Industrial and Applied Mathematics

Reformulation of the Anderson method

using singular value decomposition

for stable convergence in self-consistent calculations

Akitaka Sawamura1

Sumitomo Electric Industries, Ltd., 1-1-3, Shimaya, Konohana-ku, Osaka 554-0024, Japan1

E-mail [email protected]

Received March 18, 2009, Accepted April 20, 2009

Abstract

The Anderson method provides a significant acceleration of convergence in solving nonlinear simultaneous equations by minimizing the residual norm in a least-square sense at each iteration step. In the present study I use singular value decomposition to reformulate the Anderson method. The proposed version contains only a single parameter which has to be determined in a trial-and-error way, whereas the original one contains two. This reduction leads to stable convergence in real-world self-consistent electronic structure calculations.

Keywords nonlinear simultaneous equations, least-square method, the Broyden method, the Pulay method, electronic-structure calculations

Research Activity Group Algorithms for Matrix / Eigenvalue Problems and their Applications

1. Introduction

In the past few years first-principles calculations based on density functional theory [1] have gained enormous interest among solid-state physicists, materials scientists, and quantum chemists. The Kohn-Sham equation [2], which plays a vital role within density functional theory, is not only an eigenvalue problem, but also an implicitly defined, nonlinear fixed-point problem for the interelectron potential [3–5], at least when the local density approximation [2] is introduced. In other words, the Kohn-Sham equation is solved when the self-consistent interelectron potential is found. The Anderson method [6] is frequently employed for this purpose. It should be noted that the Pulay method [7] and limited-memory modifications [8–11] of the second Broyden method [12] are essentially equivalent to the Anderson method [13], while the first Broyden method can also be cast into limited-memory form [14, 15].

Suppose that for a system of nonlinear equations ~F(~x) = ~0, there are independent variable column vectors,

~xn, ~xn−1, . . . , ~xn−k,

which are hopefully approaching a solution, and accompanying residual column vectors,

~yn, ~yn−1, . . . , ~yn−k,

where subscripts denote iteration steps. In a simple iteration method, the independent vector at the (n+1)th iteration step is given by

~xn+1 = ~xn + α~yn,   (1)

where α is a mixing factor, ranging from a scalar to a preconditioning matrix [16, 17] to a nonlinear procedure [18, 19]. In the Anderson method, however, a virtual residual vector,

~y⋆n = ~yn + Σ_{1≤ν≤k} γν (~yn−ν+1 − ~yn−ν)/‖~yn−ν+1 − ~yn−ν‖,   (2)

is introduced. Here the γν are parameters determined so that the virtual residual norm ‖~y⋆n‖ is minimized in a least-square sense. Then an accompanying virtual independent vector,

~x⋆n = ~xn + Σ_{1≤ν≤k} γν (~xn−ν+1 − ~xn−ν)/‖~yn−ν+1 − ~yn−ν‖,   (3)

is defined on the assumption of linearity. ~x⋆n is expected to be a minimizer of ‖~F‖ within the available subspace ~xn, ~xn−1, . . . , ~xn−k. Last, the independent variable vector for the next step is predicted by applying the simple iteration method to ~x⋆n and ~y⋆n as

~xn+1 = ~x⋆n + α~y⋆n.   (4)

In practice, a specialized linear solver should be used to determine the parameters γν reliably without encountering numerical instability. This means that a maximum condition number must be set for the linear solver beforehand. Moreover, a limit on the number of previous independent and residual vectors considered must also be set beforehand. Since the two parameters cannot be obtained a priori, they are determined in an ad hoc way. In the present study I eliminate the latter by reformulating the Anderson method based on singular value decomposition (SVD) [20]. This makes application of the Anderson method a little easier. Furthermore, stable convergence is achieved in the sense that the numbers of iteration steps required for self-consistency are less sensitive with respect to the remaining parameters, as confirmed by test calculations.

2. Conventional Method

For simplicity I define a rectangular matrix

Yn = [ (~yn − ~yn−1)/‖~yn − ~yn−1‖, (~yn−1 − ~yn−2)/‖~yn−1 − ~yn−2‖, . . . , (~yn−k+1 − ~yn−k)/‖~yn−k+1 − ~yn−k‖ ],   (5)

and a column vector containing the γν,

Γ = (γ1, γ2, . . . , γk)⊤.   (6)

I omit the right-pointing arrow above Γ to emphasize that in general Γ differs from ~xν and ~yν in the number of rows. Using Yn and Γ, (2) is rewritten as

~y⋆n = ~yn + YnΓ.   (7)

The formal solution Γ which minimizes ‖~y⋆n‖ is given by

Γ = −(Yn⊤Yn)⁻¹Yn⊤~yn.   (8)

Determining Γ using (8) literally should be discouraged, because of the potentially large condition number of Yn⊤Yn. Instead, Γ is computed in the following way. First, by the SVD, Yn is factorized into

Yn = UnΣnVn⊤,   (9)

where Un and Vn are matrices containing the left and right singular vectors of Yn, respectively, while Σn is a diagonal matrix of the singular values. Then a corresponding truncated factorization,

Yn ≈ Ȳn = ŪnΣ̄nV̄n⊤,   (10)

is considered. Here Σ̄n is a diagonal matrix of the l largest singular values of Σn, while Ūn and V̄n⊤ contain the l column vectors of Un and the l row vectors of Vn⊤ corresponding to the l largest singular values, respectively. l, the effective rank of Yn, is the largest integer such that the condition number of Σ̄n does not exceed the first predetermined limit smax. Of course, l can be equal to k. Last, Γ is given by

Γ = −V̄nΣ̄n⁻¹Ūn⊤~yn.   (11)

At the next iteration step, Yn+1 may be set to

Yn+1 = [ (~yn+1 − ~yn)/‖~yn+1 − ~yn‖, Yn ]
     = [ (~yn+1 − ~yn)/‖~yn+1 − ~yn‖, (~yn − ~yn−1)/‖~yn − ~yn−1‖, . . . , (~yn−k+1 − ~yn−k)/‖~yn−k+1 − ~yn−k‖ ].   (12)

The usual practice is, however, that once k has reached the second predetermined limit kmax, the rightmost (oldest) column of the right-hand side of (12) is removed:

Yn+1 = [ (~yn+1 − ~yn)/‖~yn+1 − ~yn‖, (~yn − ~yn−1)/‖~yn − ~yn−1‖, . . . , (~yn−k+2 − ~yn−k+1)/‖~yn−k+2 − ~yn−k+1‖ ],   (13)

to avoid excessive growth.

3. Proposed Method

Along with (5), I define a rectangular matrix containing the independent variable vectors,

Xn = [ (~xn − ~xn−1)/‖~yn − ~yn−1‖, (~xn−1 − ~xn−2)/‖~yn−1 − ~yn−2‖, . . . , (~xn−k+1 − ~xn−k)/‖~yn−k+1 − ~yn−k‖ ].   (14)

Since Ūn = YnV̄nΣ̄n⁻¹ holds, I write Ỹn := Ūn and introduce the similar quantity

X̃n = XnV̄nΣ̄n⁻¹.   (15)

~x⋆n and ~y⋆n are computed by working with X̃n and Ỹn as

~x⋆n = ~xn + X̃nΓ′   (16)

and

~y⋆n = ~yn + ỸnΓ′,   (17)

respectively, where Γ′ is obtained by

Γ′ = −Ỹn⊤~yn.   (18)

At the next iteration step, X̃n+1 and Ỹn+1 are updated by

X̃n+1 = [ (~xn+1 − ~xn)/‖~yn+1 − ~yn‖, X̃n ]   (19)

and

Ỹn+1 = [ (~yn+1 − ~yn)/‖~yn+1 − ~yn‖, Ỹn ],   (20)

respectively. X̃n and Ỹn consist of the l column vectors, and by construction l ≤ k holds. Therefore X̃n+1 and Ỹn+1 are unlikely to fatten endlessly even if no limit is imposed. No column of X̃n+1 or Ỹn+1 has to be discarded artificially.

Since Ỹn represents a numerically effective subspace of that spanned by Yn, replacing (12) with (20) loses little of the information carried into Ỹn+1, even in the case of l < k. This no longer holds when the oldest column is discarded as in (13). Nevertheless, if kmax is set too large while smax is kept moderate in the conventional method, the predicted ~xn+1 may be contaminated by excessively old ~xν and ~yν, because Γ in (11) is a minimum-norm least-square solution. Therefore the proposed method is expected to outperform the conventional one.

4. Test Calculations

The conventional and proposed methods are compared by applying them to first-principles calculations for wurtzite ZnO based on the plane-wave, pseudopotential


Table 1. Iterations required to reach self-consistency for wurtzite ZnO with various maximum singular values smax and history data limits kmax. Maximum numbers of history data reached in the proposed method are shown in parentheses. Note that lattice parameters and atomic positions in the unit cell are optimized.

α = 0.2
1/smax              3×10⁻¹  1×10⁻¹  3×10⁻²  1×10⁻²  3×10⁻³  1×10⁻³  3×10⁻⁴  1×10⁻⁴
Conventional
  kmax = 5            74      67      65      49      51      48      50      50
  kmax = 10           74      54      56      46      47      42      42      45
  kmax = 20           55      62      51      47      47      42      47      43
  kmax = 40           56      62      54      65      56      48      46      44
Proposed            43(14)  42(14)  42(15)  45(20)  52(27)  42(28)  42(28)  42(28)

α = 0.4
1/smax              3×10⁻¹  1×10⁻¹  3×10⁻²  1×10⁻²  3×10⁻³  1×10⁻³  3×10⁻⁴  1×10⁻⁴
Conventional
  kmax = 5            53      52      46      45      45      45      44      44
  kmax = 10           50      45      44      42      39      38      37      38
  kmax = 20           58      42      44      47      42      41      40      39
  kmax = 40           46      58      46      40      43      39      41      41
Proposed            38(12)  40(13)  36(15)  40(21)  40(23)  42(26)  41(27)  41(27)

α = 0.8
1/smax              3×10⁻¹  1×10⁻¹  3×10⁻²  1×10⁻²  3×10⁻³  1×10⁻³  3×10⁻⁴  1×10⁻⁴
Conventional
  kmax = 5            39      33      33      31      32      31      32      32
  kmax = 10           36      34      34      31      30      31      31      31
  kmax = 20           35      35      37      35      33      32      34      34
  kmax = 40           37      35      35      34      32      33      34      34
Proposed            32(6)   31(11)  32(16)  33(16)  34(20)  34(20)  34(20)  34(20)

α = 1.6
1/smax              3×10⁻¹  1×10⁻¹  3×10⁻²  1×10⁻²  3×10⁻³  1×10⁻³  3×10⁻⁴  1×10⁻⁴
Conventional
  kmax = 5           100      59      50      46      45      45      45      45
  kmax = 10          139      59      42      40      40      37      39      39
  kmax = 20          167      63      45      40      41      37      38      37
  kmax = 40           72      68      71      47      45      37      38      40
Proposed            41(13)  37(13)  38(18)  38(20)  39(23)  40(25)  40(26)  40(26)

approach [21, 22]. Lattice parameters and atomic positions in the unit cell are also optimized. The remaining technical details are explained elsewhere [23]. The mixing factor α is chosen to be a scalar parameter.

The parameters and results are shown in Table 1. For α = 0.8, both methods achieve fast convergence, with iteration counts below 40. Clearly, however, the proposed method is less sensitive to the selection of α and smax: almost always, self-consistency is reached within about 40 steps. In contrast, when the parameters are chosen poorly, for example at α = 1.6 and 1/smax = 3×10⁻¹, the conventional method requires more than 100 iteration steps depending on kmax. More importantly, finding the optimal kmax seems to be difficult, because although the iteration counts increase with kmax for kmax ≤ 20, the fastest convergence is achieved at kmax = 40. On the whole, whereas a larger smax is desirable for the conventional method, a guiding principle for kmax is unclear. Table 1 also shows the maximum l reached within the proposed method. These values might be taken as the best kmax for the conventional method. At kmax near these values, however, the conventional method does not necessarily show performance comparable to that of the proposed one. This is likely because discarding the oldest column is not the best strategy to keep Xn and Yn from excessive growth, as pointed out in the previous section.

5. Conclusion

A reformulation of the Anderson method for a system of nonlinear equations has been described. In practice the Anderson method requires two empirical parameters, controlling how stably the least-square problem appearing at each iteration step is solved and how many vectors containing the convergence history information are retained. In the proposed method the SVD is used to extract the effective information from the history vectors, rather than as a black-box tool for solving the least-square problem. The extracted vectors are chosen to play the role of storage space for the history information. Thereby the latter empirical parameter is no longer needed. This makes the proposed method less sensitive to the selection of the remaining parameter and the mixing factor, and more efficient because of a smarter way of discarding the redundant part of the history information, as supported by the stable convergence in the test calculations.


References

[1] P. Hohenberg and W. Kohn, Inhomogeneous electron gas, Phys. Rev., 136 (1964), B864–B871.
[2] W. Kohn and L. J. Sham, Self-consistent equations including exchange and correlation effects, Phys. Rev., 140 (1965), A1133–A1138.
[3] P. Bendt and A. Zunger, New approach for solving the density-functional self-consistent-field problem, Phys. Rev. B, 26 (1982), 3114–3137.
[4] J. F. Annett, Efficiency of algorithms for Kohn-Sham density functional theory, Comput. Mater. Sci., 4 (1995), 23–42.
[5] X. Gonze, Toward a potential-based conjugate gradient algorithm for order-N self-consistent total energy calculations, Phys. Rev. B, 54 (1996), 4383–4386.
[6] D. G. Anderson, Iterative procedures for nonlinear integral equations, J. Assoc. Comput. Mach., 12 (1965), 547–560.
[7] P. Pulay, Convergence acceleration of iterative sequences: the case of SCF iteration, Chem. Phys. Lett., 73 (1980), 393–398.
[8] G. P. Srivastava, Broyden's method for self-consistent field convergence acceleration, J. Phys. A: Math. Gen., 17 (1984), L317–L321.
[9] D. Vanderbilt and S. G. Louie, Total energies of diamond (111) surface reconstructions by a linear combination of atomic orbitals method, Phys. Rev. B, 30 (1984), 6118–6130.
[10] D. D. Johnson, Modified Broyden's method for accelerating convergence in self-consistent calculations, Phys. Rev. B, 38 (1988), 12807–12813.
[11] A. Sawamura, M. Kohyama, T. Keishi and M. Kaji, Acceleration of self-consistent electronic-structure calculations: Storage-saving and multiple-secant implementation of the Broyden method, Mater. Trans., JIM, 40 (1999), 1186–1192.
[12] C. G. Broyden, A class of methods for solving nonlinear simultaneous equations, Math. Comput., 19 (1965), 577–593.
[13] V. Eyert, A comparative study on methods for convergence acceleration of iterative vector sequences, J. Comput. Phys., 124 (1996), 271–285.
[14] M. S. Engelman, G. Strang and K.-J. Bathe, The application of quasi-Newton methods in fluid mechanics, Intern. J. Numer. Meth. Eng., 17 (1981), 707–718.
[15] R. H. Byrd, J. Nocedal and R. B. Schnabel, Representations of quasi-Newton matrices and their use in limited memory methods, Math. Program., 63 (1994), 129–156.
[16] G. P. Kerker, Efficient iteration scheme for self-consistent pseudopotential calculations, Phys. Rev. B, 23 (1981), 3082–3084.
[17] K.-M. Ho, J. Ihm and J. D. Joannopoulos, Dielectric matrix scheme for fast convergence in self-consistent electronic-structure calculations, Phys. Rev. B, 25 (1982), 4260–4262.
[18] D. Raczkowski, A. Canning and L. W. Wang, Thomas-Fermi charge mixing for obtaining self-consistency in density functional calculations, Phys. Rev. B, 64 (2001), 121101-1–121101-4.
[19] A. Sawamura and M. Kohyama, A second-variational prediction operator for fast convergence in self-consistent electronic structure calculations, Mater. Trans., JIM, 45 (2004), 1422–1428.
[20] G. H. Golub and C. F. van Loan, Matrix Computations, Second edition, Johns Hopkins Univ. Press, London, 1989.
[21] J. Ihm, A. Zunger and M. L. Cohen, Momentum-space formalism for the total energy of solids, J. Phys. C: Solid State Phys., 12 (1979), 4409–4422.
[22] W. E. Pickett, Pseudopotential methods in condensed matter applications, Comput. Phys. Rep., 9 (1989), 115–197.
[23] A. Sawamura, M. Kohyama and T. Keishi, An efficient preconditioning scheme for plane-wave-based electronic structure calculations, Comput. Mater. Sci., 14 (1999), 4–7.


JSIAM Letters Vol.1 (2009) pp.36–39 ©2009 Japan Society for Industrial and Applied Mathematics

On the qd-type discrete hungry Lotka-Volterra system

and its application to the matrix eigenvalue algorithm

Akiko Fukuda1, Emiko Ishiwata2, Masashi Iwasaki3 and Yoshimasa Nakamura4

Graduate School of Science, Tokyo University of Science, 1-3 Kagurazaka, Shinjuku-ku, Tokyo 162-8601, Japan1
Department of Mathematical Information Science, Tokyo University of Science, 1-3 Kagurazaka, Shinjuku-ku, Tokyo 162-8601, Japan2
Faculty of Life and Environmental Sciences, Kyoto Prefectural University, 1-5 Nagaragi-cho, Shimogamo, Sakyo-ku, Kyoto 606-8522, Japan3
Graduate School of Informatics, Kyoto University, Yoshida-Honmachi, Sakyo-ku, Kyoto 606-8501, Japan4

E-mail [email protected]

Received December 22, 2008, Accepted May 25, 2009

Abstract

The discrete hungry Lotka-Volterra (dhLV) system has already been shown to be applicable to a matrix eigenvalue algorithm. In this paper, we discuss a form of the dhLV system named the qd-type dhLV system and associate it with a matrix eigenvalue computation. In a way similar to the dqd algorithm, we also design a new algorithm without cancellation in terms of the qd-type dhLV system.

Keywords discrete hungry Lotka-Volterra system, dqd algorithm, matrix eigenvalue

Research Activity Group Applied Integrable Systems

1. Introduction

Integrable systems have some relationships to numerical algorithms. For example, the continuous-time Toda equation corresponds to one step of the QR algorithm [1] for computing eigenvalues of a symmetric tridiagonal matrix. A discretization of the Toda equation is just the quotient difference (qd) algorithm [2]. The discrete Toda (dToda) equation also leads to a new algorithm for the Laplace transformation [3]. The discrete relativistic Toda equation is applicable to continued fraction expansion [4].

Some of the authors designed a new algorithm named the dLV algorithm for computing singular values of a bidiagonal matrix in terms of the integrable discrete Lotka-Volterra (dLV) system [5]. For k = 1, 2, . . . , 2m − 1 and n = 0, 1, . . . ,

u_k^(n+1) (1 + δ^(n+1) u_{k−1}^(n+1)) = u_k^(n) (1 + δ^(n) u_{k+1}^(n)),   (1)
u_0^(n) ≡ 0,  u_{2m}^(n) ≡ 0,

where δ^(n) is the n-th discrete step-size and u_k^(n) denotes the number of the k-th species at the discrete time Σ_{j=0}^{n−1} δ^(j). It is shown in [6] that u_{2k−1}^(n) and u_{2k}^(n) converge to a certain positive constant and to zero, respectively, as n → ∞. The dLV algorithm is also surveyed in a recent review paper [7].

Now we introduce the new variables

q_k^(n) := (1/δ^(n)) (1 + δ^(n) u_{2k−2}^(n))(1 + δ^(n) u_{2k−1}^(n)),   (2)
e_k^(n) := δ^(n) u_{2k−1}^(n) u_{2k}^(n).   (3)

Then the dLV system (1) yields the recursion formula of the qd algorithm,

q_{k+1}^(n+1) = q_{k+1}^(n) − e_k^(n+1) + e_{k+1}^(n),
e_k^(n+1) = e_k^(n) q_{k+1}^(n) / q_k^(n+1).   (4)

As mentioned above, this recursion formula is equivalent to the dToda equation. Namely, the dLV system has a relationship to the dToda equation. Rutishauser introduced a modified version, named the dqd (differential qd) algorithm [2], for the purpose of avoiding numerical instability of the qd algorithm.

Recently, in [8, 9], we designed a new algorithm named the dhLV algorithm for computing complex eigenvalues of a certain band matrix. The dhLV algorithm is derived from the integrable discrete hungry Lotka-Volterra (dhLV) system [10]. For k = 1, 2, . . . , M_m and n = 0, 1, . . . ,

u_k^(n+1) ∏_{j=1}^{M} (1 + δ^(n+1) u_{k−j}^(n+1)) = u_k^(n) ∏_{j=1}^{M} (1 + δ^(n) u_{k+j}^(n)),   (5)
u_{1−M}^(n) ≡ 0, . . . , u_0^(n) ≡ 0,  u_{M_m+1}^(n) ≡ 0, . . . , u_{M_m+M}^(n) ≡ 0,

where M_k := (k − 1)M + k, and the meaning of u_k^(n) is the same as that in the dLV system. The dLV system (1) is a prey-predator model in which the k-th species is a predator of the (k + 1)-th species. On the other hand, the dhLV system (5) is derived by considering the case where the k-th species is a predator of the (k + 1)-th, (k + 2)-th, . . . , (k + M)-th species. Of course, if M = 1 then (5)


coincides with (1).

In this paper, we discuss a new algorithm for computing matrix eigenvalues from the viewpoint of the qd-type dhLV system based on (5). See Section 3 for the qd-type dhLV system. In a way similar to the dqd algorithm, we derive a recursion formula without subtraction.
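The dhLV evolution (5) can be computed sequentially, since the new values u_{k−j}^(n+1), j ≥ 1, are already available when u_k^(n+1) is computed. A sketch with a constant step size (illustrative code, not from the paper; the test simply re-substitutes the result into (5)):

```python
def dhlv_step(u, d, M):
    """One step of the dhLV system (5) with constant step size d.
    u[1..Mm] holds the species; indices outside 1..Mm are treated as zero."""
    get = lambda w, k: w[k] if 1 <= k < len(w) else 0.0
    v = [0.0] * len(u)
    for k in range(1, len(u)):
        rhs = u[k]
        for j in range(1, M + 1):
            rhs *= 1 + d * get(u, k + j)      # product over the M prey species
        lhs = 1.0
        for j in range(1, M + 1):
            lhs *= 1 + d * get(v, k - j)      # already-computed new values
        v[k] = rhs / lhs
    return v
```

Iterating this map with positive initial data exhibits the asymptotic behavior (13)–(14): the u_{M_k} components settle to positive constants while the others decay to zero.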

This paper is organized as follows. In Section 2, we describe some properties of the dhLV system. In Section 3, we show two invariants of the qd-type dhLV system. We clarify a relationship between the qd-type dhLV system and the matrix eigenvalue algorithm in Section 4. We design an algorithm for computing eigenvalues without cancellation and demonstrate a numerical example. In the final section, we give concluding remarks.

2. Some properties of the dhLV system

In this section, we briefly explain some properties of the dhLV system. The matrix representation of (5) is given as

R^(n) L^(n+1) = L^(n) R^(n),   (6)

L^(n) := (e_2, . . . , e_{M+1}, U_1^(n) e_1 + e_{M+2}, . . . , U_{M_m−1}^(n) e_{M_m−1} + e_{M_m+M}, U_{M_m}^(n) e_{M_m}),   (7)

R^(n) := (V_1^(n) e_1 + δ^(n) e_{M+2}, . . . , V_{M_m−1}^(n) e_{M_m−1} + δ^(n) e_{M_m+M}, V_{M_m}^(n) e_{M_m}, . . . , V_{M_m+M}^(n) e_{M_m+M}),   (8)

e_k := (0, . . . , 0, 1, 0, . . . , 0)⊤, where the 1 is the k-th entry,   (9)

U_k^(n) := u_k^(n) ∏_{j=1}^{M} (1 + δ^(n) u_{k−j}^(n)),   (10)

V_k^(n) := ∏_{j=0}^{M} (1 + δ^(n) u_{k−j}^(n)).   (11)

Eq. (6) is called the Lax form of the dhLV system (5), cf. [11, 12]. Assume that 0 < u_k^{(0)} < K_0 for k = 1, 2, \dots, M_m; then 0 < u_k^{(n)} < K, as is shown in [8, 9], where K_0 is an arbitrary positive constant and K is a positive constant related to K_0. In (11), if \delta^{(n)} > 0 holds for n = 0, 1, \dots, then V_k^{(n)} \ge 1 holds for k = 1, 2, \dots, M_m + M in the Lax matrix (8). Hence the inverse matrix of R^{(n)} exists, and (6) can be rewritten as

L^{(n+1)} = (R^{(n)})^{-1} L^{(n)} R^{(n)}.   (12)

This is a similarity transformation from L^{(n)} to L^{(n+1)}; namely, the eigenvalues of L^{(n)} are invariant under the evolution from n to n + 1. Moreover, the eigenvalues of L^{(n)} + dI are invariant for any n, where I is the unit matrix and d is an arbitrary constant.

The asymptotic behavior of the dhLV system is as follows:

\lim_{n\to\infty} u_{M_k}^{(n)} = c_k,  k = 1, 2, \dots, m,   (13)

\lim_{n\to\infty} u_{M_k+p}^{(n)} = 0,  p = 1, 2, \dots, M.   (14)

See [9] for the proof of (13) and (14). By combining (10) and (11) with (13) and (14), it is obvious that the limits of U_k^{(n)} and V_k^{(n)} also exist. As n \to \infty, the Lax matrix L^{(n)} + dI converges to

L(d) := \lim_{n\to\infty} (L^{(n)} + dI)
      = \begin{pmatrix} L_1(d) & & & 0 \\ E_M & L_2(d) & & \\ & \ddots & \ddots & \\ 0 & & E_M & L_m(d) \end{pmatrix},   (15)

where L_k(d) and E_M are the (M+1) \times (M+1) block matrices defined by

L_k(d) := \begin{pmatrix} d & & & c_k \\ 1 & d & & \\ & \ddots & \ddots & \\ 0 & & 1 & d \end{pmatrix},
\quad
E_M := \begin{pmatrix} 0 & \cdots & 0 & 1 \\ & & & 0 \\ & & & \vdots \\ & & & 0 \end{pmatrix}.

It is significant to note that L^{(n)} + dI can be divided into several block matrices. The characteristic polynomial of L(d) is given by

\det(\lambda I - L(d)) = \prod_{k=1}^{m} \bigl( (\lambda - d)^{M+1} - c_k \bigr).

Therefore, we obtain the eigenvalues \lambda_{k,l} of L^{(0)} + dI as follows:

\lambda_{k,l} = \sqrt[M+1]{c_k} \left( \cos\frac{2l\pi}{M+1} + i \sin\frac{2l\pi}{M+1} \right) + d,
\quad l = 1, 2, \dots, M+1, \quad k = 1, 2, \dots, m,

where i = \sqrt{-1}. For sufficiently large n, \lambda_{k,l} becomes an approximate value of an eigenvalue of L^{(0)} + dI. As a result, the dhLV algorithm is designed in [8, 9] for computing the eigenvalues of L^{(0)} + dI.
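Since L(d) is block lower triangular, its spectrum is the union of the spectra of the diagonal blocks L_k(d), and each block is companion-like. As a quick numerical sanity check (ours; M, c, d are arbitrary sample values), one block's spectrum can be compared with the closed-form eigenvalues above:

```python
import numpy as np

# Build one (M+1) x (M+1) diagonal block L_k(d) of the limit matrix (15):
# d on the diagonal, 1 on the subdiagonal, and the corner entry c_k.
M, c, d = 3, 2.5, 2.0                        # sample values (ours)
Lk = d * np.eye(M + 1) + np.diag(np.ones(M), -1)
Lk[0, M] = c

computed = np.linalg.eigvals(Lk)
# Closed form: lambda = c^{1/(M+1)} * exp(2*pi*i*l/(M+1)) + d
expected = [c ** (1.0 / (M + 1)) * np.exp(2j * np.pi * l / (M + 1)) + d
            for l in range(1, M + 2)]
```

Each computed eigenvalue matches one of the (M+1)-th roots of c shifted by d.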

3. Invariants of the qd-type dhLV system

In this section, we investigate some properties of a recursion formula derived from the Lax form (6). By comparing both sides of (6), the variables U_k^{(n)} in (7) and V_k^{(n)} in (8) satisfy the following relations:

\delta^{(n)} U_k^{(n+1)} + V_{k+M+1}^{(n)} = \delta^{(n)} U_{k+M+1}^{(n)} + V_{k+M}^{(n)},   (16)

V_k^{(n)} U_k^{(n+1)} = U_k^{(n)} V_{k+M}^{(n)},  k = 1, 2, \dots, M_m.   (17)

We call (16) and (17) the qd-type dhLV system. Let us here impose the boundary conditions

V_{k-M}^{(n)} \equiv 1,  U_{k-M}^{(n)} \equiv 0,  k = 0, 1, \dots, M,
V_{M_m+M+k}^{(n)} \equiv 1,  U_{M_m+k}^{(n)} \equiv 0,  k = 1, 2, \dots, M.

The existence of invariants is one of the characteristic properties of integrable systems. We now give two propositions concerning invariants independent of the discrete variable n.

Proposition 1  The variable U_k^{(n)} satisfies

\sum_{k=1}^{M_m} U_k^{(n+1)} = \sum_{k=1}^{M_m} U_k^{(n)}.   (18)


Proof  Take the sum of both sides of (16) for k = -M, -M+1, \dots, M_m:

\sum_{k=-M}^{M_m} \bigl( \delta^{(n)} U_k^{(n+1)} + V_{k+M+1}^{(n)} \bigr) = \sum_{k=-M}^{M_m} \bigl( \delta^{(n)} U_{k+M+1}^{(n)} + V_{k+M}^{(n)} \bigr).

Expanding the above equation and substituting the boundary conditions, we have (18).
(QED)

Proposition 2  The variable U_k^{(n)} satisfies

\prod_{k=1}^{m} U_{M_k}^{(n+1)} = \prod_{k=1}^{m} U_{M_k}^{(n)}.   (19)

Proof  Recall that, by (12), L^{(n+1)} has the same eigenvalues as L^{(n)} for n = 0, 1, \dots. Then it is obvious that

\det(L^{(n+1)}) = \det(L^{(n)}),  n = 0, 1, \dots.   (20)

By cofactor expansion, the determinants of L^{(n)} and L^{(n+1)} are given by

\det(L^{(n)}) = (-1)^{mM} U_{M_1}^{(n)} U_{M_2}^{(n)} \cdots U_{M_m}^{(n)},
\det(L^{(n+1)}) = (-1)^{mM} U_{M_1}^{(n+1)} U_{M_2}^{(n+1)} \cdots U_{M_m}^{(n+1)},

respectively. Substituting these expressions into (20), we have

(-1)^{mM} U_{M_1}^{(n+1)} U_{M_2}^{(n+1)} \cdots U_{M_m}^{(n+1)} = (-1)^{mM} U_{M_1}^{(n)} U_{M_2}^{(n)} \cdots U_{M_m}^{(n)}.

This leads to (19). (QED)

Let us assume that 0 < U_k^{(0)} < K_0 for k = 1, 2, \dots, M_m, where K_0 is an arbitrary positive constant. Then 0 < \sum_{k=1}^{M_m} U_k^{(0)} < K_1 and 0 < \prod_{k=1}^{m} U_{M_k}^{(0)} < K_2, where K_1, K_2 are positive constants related to K_0. Propositions 1 and 2 then imply that 0 < \sum_{k=1}^{M_m} U_k^{(n)} < K_1 and 0 < \prod_{k=1}^{m} U_{M_k}^{(n)} < K_2. Under the assumption 0 < u_k^{(0)} < K_0, it is concluded that 0 < U_k^{(n)} < K_3 for n = 0, 1, \dots, where K_3 is a positive constant related to K_0. Note that the time evolution is performed in arithmetic such that the positivity of the variables is assured. This property is important for designing numerical algorithms.

4. The qd-type dhLV system and matrix eigenvalues

In this section, we propose an application of the qd-type dhLV system to matrix eigenvalue computation. Assume that the limit of \delta^{(n)} as n \to \infty exists, and let \delta^* := \lim_{n\to\infty} \delta^{(n)}. Taking account of (10) and (11), the limits of U_k^{(n)} and V_k^{(n)} also exist as n \to \infty. Namely,

\lim_{n\to\infty} U_{M_k}^{(n)} = c_k,  k = 1, 2, \dots, m,
\lim_{n\to\infty} U_{M_k+p}^{(n)} = 0,  p = 1, 2, \dots, M,
\lim_{n\to\infty} V_{M_k+p}^{(n)} = \delta^* c_k + 1,  p = 0, 1, \dots, M.

We simply rewrite the qd-type dhLV system (16) and (17) as the following recursion formulas:

U_k^{(n+1)} = \frac{U_k^{(n)} V_{k+M}^{(n)}}{V_k^{(n)}},   (21)

V_k^{(n)} = \delta^{(n)} U_k^{(n)} + V_{k-1}^{(n)} - \delta^{(n)} \frac{U_{k-M-1}^{(n)} V_{k-1}^{(n)}}{V_{k-M-1}^{(n)}}.   (22)

The time evolution from n to n + 1 in (21) with (22) is applicable to computing the eigenvalues of L^{(0)} + dI. For U_k^{(0)} > 0 the time evolution in (21) with (22) generates the same matrix as (15), where V_k^{(0)} is calculated once U_k^{(0)} is given. In other words, the eigenvalues computed by (21) with (22) are theoretically equal to those computed by the dhLV algorithm.

In finite arithmetic, it is doubtful whether the time evolution in (21) with (22) can be performed with high accuracy, because cancellation by subtraction may occur. Subtraction also appears in the recursion formula of the qd algorithm.

Rutishauser [2] recognized some numerical instability of the qd algorithm (4), whose variables q_k^{(n)} and e_k^{(n)} are not related to the dLV variables u_k^{(n)}. He therefore introduced a modified version, named the dqd (differential qd) algorithm [2], for the purpose of avoiding this numerical instability. In a way similar to the dqd algorithm, we derive a recursion formula without subtraction. Let us introduce a new variable

P_k^{(n)} := V_{k-1}^{(n)} - \delta^{(n)} \frac{U_{k-M-1}^{(n)} V_{k-1}^{(n)}}{V_{k-M-1}^{(n)}}.   (23)

Then P_k^{(n)} satisfies the recursion formula

P_k^{(n)} = \frac{V_{k-1}^{(n)}}{V_{k-M-1}^{(n)}} P_{k-M-1}^{(n)},   (24)

where we set P_k^{(n)} = 1 for k = -M, -M+1, \dots, 0. By using P_k^{(n)}, (22) is rewritten as

V_k^{(n)} = \delta^{(n)} U_k^{(n)} + P_k^{(n)}.   (25)

Obviously, (24) and (25) involve no subtraction, so cancellation does not occur. The recursion formula (21) with (24) and (25) is essentially equivalent to the qd-type dhLV system (16) and (17). Note that the ratio V_k^{(n)} / V_{k-M}^{(n)} appears in both (21) and (24). Let Q_k^{(n)} := V_k^{(n)} / V_{k-M}^{(n)}, and set Q_0^{(n)} = 1 for n = 0, 1, \dots. Then the time evolution of the qd-type dhLV system is performed by the following Procedure 1. In Procedure 1, U_k^{(0)} is given by the entries of L^{(0)} + dI, and \delta^{(n)} for n = 0, 1, \dots is an optional parameter. The time evolution requires fewer operations in Procedure 1 than in the original (21) with (22). As shown in [8, 9], c_k = \lim_{n\to\infty} U_{M_k}^{(n)} is equal to an eigenvalue of L^{(0)} + dI. We call this algorithm the qd-type dhLV algorithm.

Table 1. Computed eigenvalues of the Toeplitz matrix T by the dhLV and the qd-type dhLV algorithms

by the dhLV algorithm                    | by the qd-type dhLV algorithm
2.00000000000000 + i 1.788617417884120   | 2.00000000000000 + i 1.788617417884119
0.211382582115879                        | 0.211382582115880
2.00000000000000 - i 1.788617417884120   | 2.00000000000000 - i 1.788617417884119
3.78861741788412                         | 3.78861741788412
2.00000000000000 + i 1.333397829783662   | 2.00000000000000 + i 1.333397829783662
0.666602170216338                        | 0.666602170216338
2.00000000000000 - i 1.333397829783662   | 2.00000000000000 - i 1.333397829783662
3.33339782978366                         | 3.33339782978366
2.00000000000000 + i 0.5683177818055106  | 2.00000000000000 + i 0.5683177818055107
1.43168221819449                         | 1.43168221819448
2.00000000000000 - i 0.5683177818055106  | 2.00000000000000 - i 0.5683177818055107
2.56831778180551                         | 2.56831778180551

Procedure 1

set boundary conditions of U_k^{(n)}, V_k^{(n)}, P_k^{(n)}, Q_k^{(n)}
for n := 0, 1, \dots, n_max do
    for k := 1, 2, \dots, M_m + M do
        P_k^{(n)} = Q_{k-1}^{(n)} P_{k-M-1}^{(n)}
        V_k^{(n)} = \delta^{(n)} U_k^{(n)} + P_k^{(n)}
        Q_k^{(n)} = V_k^{(n)} / V_{k-M}^{(n)}
    end for
    for k := 1, 2, \dots, M_m do
        U_k^{(n+1)} = Q_{k+M}^{(n)} U_k^{(n)}
    end for
end for
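Procedure 1 transcribes almost line by line into code. The following Python sketch (ours; the function name and the dictionary-based handling of boundary indices are implementation choices) runs the numerical example described below with M = 3, m = 3, U_k^{(0)} = 1.5 and \delta^{(n)} = 1, and can be used to check Propositions 1 and 2 and the positivity of all variables numerically.

```python
def qd_dhlv_sweep(U, M, Mm, delta):
    """One time evolution n -> n+1 of Procedure 1 (subtraction-free).

    U maps k -> U_k^{(n)} for k = 1..Mm; boundary values U_k = 0 for
    k > Mm and P_k = V_k = 1 for k <= 0 are supplied via the dicts.
    """
    V = {k: 1.0 for k in range(-M, 1)}       # V_{k-M} = 1 for k = 0..M
    P = {k: 1.0 for k in range(-M, 1)}       # P_k = 1 for k = -M..0
    Q = {0: 1.0}                             # Q_0 = 1
    for k in range(1, Mm + M + 1):
        P[k] = Q[k - 1] * P[k - M - 1]       # (24)
        V[k] = delta * U.get(k, 0.0) + P[k]  # (25): no subtraction
        Q[k] = V[k] / V[k - M]
    return {k: Q[k + M] * U[k] for k in range(1, Mm + 1)}   # (21)

# Example from the text: M = 3, m = 3, U_k^{(0)} = 1.5, delta = 1.
M, m = 3, 3
Mm = (m - 1) * M + m                         # M_m = 9
U = {k: 1.5 for k in range(1, Mm + 1)}
for n in range(500):
    U = qd_dhlv_sweep(U, M, Mm, delta=1.0)
```

After the sweeps, U_{M_k} for M_k = 1, 5, 9 approach the constants c_k whose fourth roots are the values 1.78861741788412, 1.33339782978366 and 0.56831778180551 appearing in Table 1, while the invariants (18) and (19) are preserved.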

We now present a numerical experiment carried out on a computer with OS: Windows XP, CPU: Genuine Intel (R) 1.66 GHz, RAM: 1.99 GB, using Wolfram Mathematica 6.0 with double-precision floating point arithmetic. As a numerical example, we consider a 12 \times 12 Toeplitz matrix T as L^{(0)} + dI with M = 3, m = 3, d = 2 and U_k^{(0)} = 1.5 for k = 1, 2, \dots, 9. Let \delta^{(n)} = 1.0 for n = 0, 1, \dots.

Table 1 shows the eigenvalues computed by the dhLV algorithm [8, 9] and by the qd-type dhLV algorithm. We see from Table 1 that both algorithms compute the same eigenvalues with almost the same accuracy. The operation counts of the dhLV algorithm and of the qd-type dhLV algorithm are 6M and 5, respectively, for the evolution from n to n + 1 of one variable. From the viewpoint of the operation count, the qd-type dhLV algorithm is therefore preferable to the dhLV algorithm.

5. Concluding remarks

In this paper, we discuss some properties of the qd-type dhLV system. Based on the qd-type dhLV system and its properties, we design a new algorithm for computing complex eigenvalues of a certain band matrix, similar to the dhLV algorithm. In a way similar to the dqd algorithm, we design the qd-type dhLV algorithm so that it involves no subtraction. We also confirm through a numerical example that the new algorithm computes the same eigenvalues as the dhLV algorithm. In order to compare the numerical accuracy and running time of the qd-type dhLV algorithm, with and without subtraction, and of the dhLV algorithm, more numerical experiments are necessary.

Acknowledgments

The authors thank the reviewer for careful reading and helpful suggestions. The authors would also like to thank Dr. S. Tsujimoto and Dr. A. Nagai for many fruitful discussions and helpful advice on this work. This work was partially supported by Grants-in-Aid for Young Scientists (B) No. 20740064 and Scientific Research (C) No. 20540137 of the Japan Society for the Promotion of Science.

References

[1] W. W. Symes, The QR algorithm and scattering for the finite nonperiodic Toda lattice, Physica D, 4 (1982), 275–280.
[2] H. Rutishauser, Lectures on Numerical Mathematics, Birkhäuser, Boston, 1990.
[3] Y. Nakamura, Calculating Laplace transforms in terms of the Toda molecule, SIAM J. Sci. Comput., 20 (1999), 306–317.
[4] Y. Minesaki and Y. Nakamura, The discrete relativistic Toda molecule equation and a Padé approximation algorithm, Numer. Algorithms, 27 (2001), 219–235.
[5] R. Hirota, Conserved quantities of a "random-time Toda equation", J. Phys. Soc. Japan, 66 (1997), 283–284.
[6] M. Iwasaki and Y. Nakamura, An application of the discrete Lotka-Volterra system with variable step-size to singular value computation, Inverse Problems, 20 (2004), 553–563.
[7] M. T. Chu, Linear algebra algorithms as dynamical systems, Acta Numerica, 17 (2008), 1–86.
[8] A. Fukuda, E. Ishiwata, M. Iwasaki and Y. Nakamura, A numerical factorization of characteristic polynomial by the discrete hungry Lotka-Volterra system (in Japanese), Trans. Japan Soc. Indust. Appl. Math., 18 (2008), 409–425.
[9] A. Fukuda, E. Ishiwata, M. Iwasaki and Y. Nakamura, The discrete hungry Lotka-Volterra system and a new algorithm for computing matrix eigenvalues, Inverse Problems, 25 (2009), 015007.
[10] Y. Nakamura (Ed.), Applied Integrable Systems (in Japanese), Shokabo, Tokyo, 2000.
[11] S. Tsujimoto, R. Hirota and S. Oishi, An extension and discretization of Volterra equation I (in Japanese), IEICE Tech. Rep., NLP 92-90 (1993), 1–3.
[12] S. Tsujimoto, On the discrete Toda lattice hierarchy and orthogonal polynomials (in Japanese), RIMS Kokyuroku, 1280 (2002), 11–18.


JSIAM Letters Vol.1 (2009) pp.40–43 c©2009 Japan Society for Industrial and Applied Mathematics

Eigendecomposition algorithms solving sequentially quadratic systems by Newton method

Koichi Kondo^1, Shinji Yasukouchi^1 and Masashi Iwasaki^2

^1 Faculty of Science and Engineering, Doshisha University, 1-3 Tatara Miyakodani, Kyotanabe City, 610-0394, Japan
^2 Faculty of Life and Environmental Sciences, Kyoto Prefectural University, 1-5 Nagaragi-cho Shimogamo, Sakyo-ku, Kyoto 606-8522, Japan

E-mail [email protected]

Received March 17, 2009, Accepted June 20, 2009

Abstract

In this paper, we design new algorithms for eigendecomposition. With the help of the Newton iterative method, we solve a nonlinear quadratic system whose solution is equal to an eigenvector on a hyperplane. By choosing the normal vector of the hyperplane in the orthogonal complement of the space spanned by the already obtained eigenvectors, all eigenpairs are sequentially obtained by solving the quadratic systems.

Keywords eigendecomposition, the Newton method, quadratic method

Research Activity Group Algorithms for Matrix / Eigenvalue Problems and their Applications

1. Introduction

The quadratic method is known as one of the methods for computing all eigenpairs [1]. In this method, the eigenvalue problem is replaced with nonlinear quadratic systems. For one eigenpair, the solution of the quadratic system is computed by using the Newton iterative method. For all eigenpairs, the continuation method is proposed in [1]. The continuation method requires not only a quadratic system to be solved for the original eigenvalue problem but also many perturbed ones, and furthermore it often fails to find the desired eigenpairs. Even if it succeeds, the obtained eigenpairs are not always computed with high accuracy. In this paper, we design new eigendecomposition algorithms, different from the continuation method, through solving the quadratic systems with the help of the Newton method. Our algorithms are also not equivalent to the standard inverse iteration method. In some numerical experiments, we show that all eigenvectors are computable by our algorithms.

2. Quadratic method

In this paper, we consider the eigenvalue problem

A x = \lambda x,  A \in \mathbb{C}^{n\times n},   (1)

where \lambda \in \mathbb{C} and x \in \mathbb{C}^n denote an eigenvalue and the corresponding eigenvector of A, respectively.

Let z be an n-dimensional vector, and let (z, x) = z^H x = C for some nonzero constant C, where (\cdot, \cdot) and the superscript H denote the inner product of two vectors and the complex conjugate transpose, respectively. The case where z = e_k is discussed in [1], where e_k is the unit vector whose k-th entry is unity. Noting that \lambda = \lambda(x) = (A^H z, x)/C for suitable z, the eigenvector x is given by solving the nonlinear quadratic system

F(x) := A x - \frac{(w, x)}{C} x = 0,  w = A^H z.   (2)

With the help of the Newton iterative method, the solution x is computable by the recurrence formula

x^{(\ell+1)} = \frac{C \tilde{x}^{(\ell+1)}}{(z, \tilde{x}^{(\ell+1)})},  \ell = 0, 1, \dots, \ell_{max},
\tilde{x}^{(\ell+1)} = x^{(\ell)} - J(x^{(\ell)})^{-1} F(x^{(\ell)}),
J(x^{(\ell)}) = A - \lambda(x^{(\ell)}) I - \frac{x^{(\ell)} w^H}{C},
\lambda(x^{(\ell)}) = \frac{(w, x^{(\ell)})}{C},   (3)

where I is the n-dimensional unit matrix and x^{(0)} is an initial vector; see Section 3 for the setting of x^{(0)}. Let \ell^* be the number in (3) such that

\| A x^{(\ell^*)} - \lambda(x^{(\ell^*)}) x^{(\ell^*)} \|_\infty < \epsilon_{itr} \| x^{(\ell^*)} \|_2   (4)

for small \epsilon_{itr}. Then x^{(\ell^*)} becomes a good approximation of x in (2). With the normalization x^{(\ell)} \to x^{(\ell)} / \| x^{(\ell)} \|_2 for each \ell in (3), the inequality (4) becomes

\| A x^{(\ell^*)} - \lambda(x^{(\ell^*)}) x^{(\ell^*)} \|_\infty < \epsilon_{itr}.   (5)

Note here that \| x^{(\ell)} \|_2 = 1 for each \ell. Moreover, by replacing C with C^{(\ell)} = (z, x^{(\ell)}) in (2) and (3), it follows that

x^{(\ell+1)} = \frac{\tilde{x}^{(\ell+1)}}{\| \tilde{x}^{(\ell+1)} \|_2},  \ell = 0, 1, \dots, \ell_{max},
\tilde{x}^{(\ell+1)} = x^{(\ell)} - J(x^{(\ell)})^{-1} F(x^{(\ell)}),
J(x^{(\ell)}) = A - \lambda(x^{(\ell)}) I - \frac{x^{(\ell)} w^H}{(z, x^{(\ell)})},
\lambda(x^{(\ell)}) = \frac{(w, x^{(\ell)})}{(z, x^{(\ell)})}.   (6)
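For illustration only, the iteration (6) can be sketched in Python/NumPy as follows (the function name neig_j and the 3 x 3 test matrix are ours, not the paper's). The bordered Jacobian J(x^{(\ell)}) is formed explicitly, and the Newton step is followed by normalization:

```python
import numpy as np

def neig_j(A, x0, z, lmax=50, eps=1e-12):
    """Illustrative sketch of the Newton iteration (6) for one eigenpair."""
    n = A.shape[0]
    w = A.conj().T @ z                       # w = A^H z
    x = x0 / np.linalg.norm(x0)
    lam = 0.0
    for _ in range(lmax):
        zx = z.conj() @ x                    # (z, x)
        lam = (w.conj() @ x) / zx            # lambda(x) = (w, x)/(z, x)
        F = A @ x - lam * x
        if np.linalg.norm(F, np.inf) < eps:
            break
        # Bordered Jacobian: A - lambda*I - x w^H / (z, x)
        J = A - lam * np.eye(n) - np.outer(x, w.conj()) / zx
        x = x - np.linalg.solve(J, F)
        x = x / np.linalg.norm(x)
    return lam, x

# Sample run (ours): start near the third coordinate axis.
A = np.diag([1.0, 2.0, 5.0])
lam, x = neig_j(A, np.array([0.1, 0.1, 1.0]), np.ones(3))
```

Starting sufficiently close to an eigenvector, the residual drops quadratically, as expected for a Newton method.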


At each \ell, the hyperplane (z, x^{(\ell)}) = C^{(\ell)} is translated without changing its normal vector. We call the algorithm for one eigenpair based on (6) the neig J algorithm. By applying the Sherman-Morrison formula

(M + u v^H)^{-1} = \left( I - \frac{M^{-1} u v^H}{1 + (v, M^{-1} u)} \right) M^{-1}   (7)
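As a quick numerical check of (7) (ours; random sample data, with the inner product (v, u) = v^H u as used throughout the paper):

```python
import numpy as np

rng = np.random.default_rng(0)
n = 6
Mat = rng.standard_normal((n, n)) + 1j * rng.standard_normal((n, n))
u = rng.standard_normal(n) + 1j * rng.standard_normal(n)
v = rng.standard_normal(n) + 1j * rng.standard_normal(n)

Minv = np.linalg.inv(Mat)
# Left-hand side: direct inverse of the rank-one update M + u v^H.
lhs = np.linalg.inv(Mat + np.outer(u, v.conj()))
# Right-hand side: Sherman-Morrison formula (7).
rhs = (np.eye(n) - np.outer(Minv @ u, v.conj())
       / (1 + v.conj() @ Minv @ u)) @ Minv
```

The two sides agree to machine precision for well-conditioned data.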

to the inverse J(x^{(\ell)})^{-1} in (6), we have

\tilde{x}^{(\ell+1)} = \frac{\lambda(x^{(\ell)})}{(w, \hat{x}^{(\ell)})/(z, x^{(\ell)}) - 1} \hat{x}^{(\ell)},
\hat{x}^{(\ell)} = (A - \lambda(x^{(\ell)}) I)^{-1} x^{(\ell)}.   (8)

Hence the following recurrence formula also generates the evolution from \ell to \ell + 1 of x^{(\ell)}:

x^{(\ell+1)} = \frac{\tilde{x}^{(\ell+1)}}{\| \tilde{x}^{(\ell+1)} \|_2},  \ell = 0, 1, \dots, \ell_{max},
\tilde{x}^{(\ell+1)} = (A - \lambda(x^{(\ell)}) I)^{-1} x^{(\ell)},
\lambda(x^{(\ell)}) = \frac{(w, x^{(\ell)})}{(z, x^{(\ell)})}.   (9)

In [2, p. 194], (9) is called a generalized Rayleigh quotient iteration. If \lambda(x^{(\ell)}) = \lambda is given, then the iteration (9) becomes

x^{(\ell+1)} = \frac{\tilde{x}^{(\ell+1)}}{\| \tilde{x}^{(\ell+1)} \|_2},  \tilde{x}^{(\ell+1)} = (A - \lambda I)^{-1} x^{(\ell)}.   (10)

This is well known as the inverse iteration for computing an eigenvector. The iteration (9) may be regarded as an inverse iteration that updates \lambda(x^{(\ell)}) at each \ell by the generalized Rayleigh quotient \lambda(x^{(\ell)}) = (z, A x^{(\ell)})/(z, x^{(\ell)}). We call the algorithm based on (9) the neig I algorithm. Though the eigenpair computed by the neig I algorithm is theoretically the same as that computed by the neig J algorithm, the two algorithms obviously differ with respect to numerical accuracy; see Section 4.
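A minimal sketch of (9) follows (ours; the function name neig_i is illustrative, and the 3 x 3 test matrix is the Toeplitz matrix of the form (15) in Section 4 with \gamma = 1.6, whose unique real eigenvalue is 2 + 1.6^{1/3}):

```python
import numpy as np

def neig_i(A, x0, z, lmax=50, eps=1e-12):
    """Illustrative sketch of (9): inverse iteration with the
    generalized Rayleigh quotient lambda = (z, A x)/(z, x)."""
    n = A.shape[0]
    x = x0 / np.linalg.norm(x0)
    lam = 0.0
    for _ in range(lmax):
        lam = (z.conj() @ (A @ x)) / (z.conj() @ x)
        if np.linalg.norm(A @ x - lam * x, np.inf) < eps:
            break
        y = np.linalg.solve(A - lam * np.eye(n), x)
        x = y / np.linalg.norm(y)
    return lam, x

A = np.array([[2.0, 1.0, 0.0],
              [0.0, 2.0, 1.0],
              [1.6, 0.0, 2.0]])
lam, x = neig_i(A, np.ones(3), np.ones(3))
```

With real starting data the iteration stays real and converges to the real eigenvalue of this matrix.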

3. Eigendecomposition algorithm

An eigenpair (\lambda, x) is computable if a suitable initial vector x^{(0)} is given in (6), (9). The other eigenpairs are also computed by changing x^{(0)} in (6), (9). Namely, we can theoretically compute all eigenpairs by using the neig ∗ algorithm. It is, however, not easy to compute all eigenpairs if x^{(0)} is chosen randomly. It is well known that a fractal graph arises from the relationship between the initial vector x^{(0)} and the limit \lim_{\ell\to\infty} x^{(\ell)} in the Newton iteration method (cf. [3, pp. 237–242]). Namely, we cannot expect to choose x^{(0)} so as to compute a desired eigenpair in the neig ∗ algorithm.

Let x_1, \dots, x_k be the already obtained eigenvectors, where k < n. We here consider the subspace W_k := \langle x_1, \dots, x_k \rangle_{\mathbb{C}} and its orthogonal complement W_k^\perp. Since the normal vector z of the hyperplane (z, x^{(\ell)}) = C^{(\ell)} is changeable, we may adopt a vector in W_k^\perp as z. It is remarkable that W_k^\perp does not include x_1, \dots, x_k. Let us assume that x^{(\ell)} converges as \ell \to \infty. Then it is obvious that C^{(\ell)} \ne 0 for \ell = 1, 2, \dots and \lim_{\ell\to\infty} C^{(\ell)} \ne 0. This implies that \lim_{\ell\to\infty} x^{(\ell)} \notin W_k. Hence x^{(\ell)} \to x_{k+1} and \lambda(x^{(\ell)}) \to \lambda_{k+1} as \ell \to \infty. Namely, the eigenpair (\lambda_{k+1}, x_{k+1}) is computable by the neig ∗ algorithm. Similarly, the others are obtained provided that x^{(\ell)} converges as \ell \to \infty for each k. Therefore, all eigenpairs are sequentially computed by the following algorithm.

Algorithm 1
01 function [X, D] = sneig ∗(A)
02 t := 0
03 Q = (q_1 \cdots q_n) := I
04 for k = 1, 2, \dots, n
05     z := q_k \in W_{k-1}^\perp
06     f := 0
07     do
08         x^{(0)} := random vec(n)
09         [x_k, \lambda_k, E_k] := neig ∗(A, x^{(0)}, z, \ell_max)
10         \theta := \min_{j=1,\dots,k-1} angle(x_k, x_j)
11         t := t + 1; f := f + 1
12         if f \ge f_max then stop % failed
13     while (E_k \ge \epsilon_good or \theta \le \theta_same)
14     r_k := x_k
15     for j = 1, \dots, k - 1
16         r_k := r_k - \alpha_j (h_j, r_k) h_j
17     end
18     [h_k, \alpha_k] := householder vec(r_k)
19     Q := Q - \alpha_k (Q h_k) h_k^H
20 end
21 X := (x_1 \cdots x_n); D := diag(\lambda_1, \dots, \lambda_n)

Here we call Algorithm 1 the sneig ∗ algorithm; the sneig J and sneig I algorithms employ the neig J and neig I algorithms, respectively.

In the 8th line of Algorithm 1, we choose the initial complex vector x^{(0)} randomly. In the 9th line, by the neig ∗ algorithm, we compute the k-th eigenvalue \lambda_k, the corresponding eigenvector x_k and the residual norm E_k := \| A x_k - \lambda_k x_k \|_\infty. As discussed above, the neig ∗ algorithm does not converge for an unsuitable x^{(0)}. We regard the neig ∗ algorithm as not having converged if E_k \ge \epsilon_good for small \epsilon_good, and we then rerun the neig ∗ algorithm after changing x^{(0)}. The operations from the 7th line to the 13th line are repeated until E_k < \epsilon_good. Theoretically, x_k is not equal to any of x_1, \dots, x_{k-1}; this property is, however, not always guaranteed in double precision arithmetic. In the 10th line, we compute the minimal angle \theta := \min_{j=1,\dots,k-1} angle(x_k, x_j), where

angle(x_k, x_j) := \frac{180}{\pi} \cos^{-1} \left( \frac{|(x_k, x_j)|}{\| x_k \|_2 \| x_j \|_2} \right).   (11)

We regard x_k as equal to one of x_1, \dots, x_{k-1} if \theta \le \theta_same for small \theta_same, and we then rerun the neig ∗ algorithm after changing x^{(0)}. Let f be the number of times the neig ∗ algorithm is run for one eigenpair. We regard that only a part of the eigenpairs is computable by the sneig ∗ algorithm if f \ge f_max for the maximal iteration number f_max; in this case, the sneig ∗ algorithm is forcibly stopped in the 12th line.

In the 5th line of Algorithm 1, we choose z in the orthogonal complement W_{k-1}^\perp. In this paper, for the choice of z we use the QR decomposition based on the Householder transformation. Let X_{k-1} = Q_{k-1} R_{k-1} be the QR decomposition of X_{k-1} = (x_1 \cdots x_{k-1}), where Q_{k-1} = (q_1 \cdots q_n) \in \mathbb{C}^{n\times n} and R_{k-1} = (r_1 \cdots r_{k-1}) \in \mathbb{C}^{n\times(k-1)} are the unitary and the upper triangular matrices, respectively. Let W_{k-1} = \langle q_1, \dots, q_{k-1} \rangle_{\mathbb{C}}. Then it is obvious that W_{k-1}^\perp = \langle q_k, \dots, q_n \rangle_{\mathbb{C}}. This implies that z should be a linear combination of the basis q_k, \dots, q_n. In Algorithm 1, we set z = q_k. From the viewpoint of running time, it is not desirable to compute the QR decomposition of X_k for each k. It is significant to note here that the columns from the 1st to the (k-1)-th of R_k, Q_k are equal to those of R_{k-1}, Q_{k-1}, respectively. Hence, in the k-th Householder transformation, we compute only the k-th column of R_k. In the lines from the 14th to the 17th, we compute the k-th column r_k = (r_{1,k} \cdots r_{k-1,k}\ r_{k,k} \cdots r_{n,k})^\top of Q_{k-1}^H X_k = (r_1 \cdots r_{k-1}\ r_k) from x_k. In the 18th line, we derive h_k and \alpha_k from r_k for computing the Householder matrix H_k := I - \alpha_k h_k h_k^H as follows:

h_k = (0 \cdots 0\ {-\zeta\xi}\ r_{k+1,k} \cdots r_{n,k})^\top,   (12)

\zeta := \frac{r_{k,k}}{|r_{k,k}|},  \eta := \sqrt{|r_{k,k}|^2 + \cdots + |r_{n,k}|^2},   (13)

\xi := \frac{|r_{k+1,k}|^2 + \cdots + |r_{n,k}|^2}{|r_{k,k}| + \eta},  \alpha_k = \frac{1}{\xi\eta},   (14)

where H_k : r_k \mapsto \bar{r}_k = (r_{1,k} \cdots r_{k-1,k}\ \zeta\eta\ 0 \cdots 0)^\top and H_k^H Q_{k-1}^H X_k = R_k = (r_1 \cdots r_{k-1}\ \bar{r}_k). In the 19th line, we compute Q_k as Q_k = Q_{k-1} H_k = Q_{k-1} - \alpha_k (Q_{k-1} h_k) h_k^H. It is remarkable that q_1, \dots, q_{k-1} are not changed in the 19th line, since the entries of h_k from the 1st to the (k-1)-th are 0. As a result, the sneig ∗ algorithm requires only the operations for one QR decomposition. The Lanczos method is also presented in [4] as a vector orthonormalization method using the Householder transformation without saving the upper triangular matrix.
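The formulas (12)-(14) can be checked numerically. The sketch below (ours; the sample vector and the index k are arbitrary) builds \zeta, \eta, \xi, \alpha_k and h_k, and applies H_k = I - \alpha_k h_k h_k^H to r_k; the result has its k-th entry mapped to \zeta\eta and its trailing entries annihilated.

```python
import numpy as np

rng = np.random.default_rng(1)
n, k = 7, 3                                  # sample sizes (ours)
r = rng.standard_normal(n) + 1j * rng.standard_normal(n)

eta = np.linalg.norm(r[k - 1:])              # eq. (13): sqrt(sum |r_jk|^2)
zeta = r[k - 1] / abs(r[k - 1])              # eq. (13)
xi = np.linalg.norm(r[k:]) ** 2 / (abs(r[k - 1]) + eta)   # eq. (14)
alpha = 1.0 / (xi * eta)                     # eq. (14)

h = np.zeros(n, dtype=complex)               # eq. (12)
h[k - 1] = -zeta * xi
h[k:] = r[k:]

Hr = r - alpha * h * (h.conj() @ r)          # H_k r = (I - alpha h h^H) r
```

One can verify directly that \alpha_k (h_k, r_k) = 1, so H_k r_k = r_k - h_k, which gives exactly the mapped vector \bar{r}_k stated after (14).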

4. Numerical experiments

In this section, we show some numerical experiments with the sneig ∗ algorithms and the inverse iteration based on (10). Let us call the inverse iteration based on (10) the sii algorithm for simplicity. Numerical experiments have been carried out on a computer with OS: Linux 2.6.26, CPU: Intel Core i7, RAM: 2 GB, using GNU C Compiler 4.3.2 and LAPACK 3.1.1 [5]. As the test matrix, we adopt the Toeplitz matrix

A = \begin{pmatrix}
2 & 1 & & & & \\
0 & 2 & 1 & & & \\
\gamma & 0 & 2 & \ddots & & \\
& \gamma & \ddots & \ddots & 1 & \\
& & \ddots & 0 & 2 & 1 \\
& & & \gamma & 0 & 2
\end{pmatrix}.   (15)

In [6], the Toeplitz matrix (15) appears in a numerical test for solvers of linear equations. In the sneig ∗ algorithms, we set \epsilon_itr = 10^{-13}, \ell_max = 50, \epsilon_good = 5 \times 10^{-13}, \theta_same = 0.3, f_max = 2n. The inverse matrices appearing in (6), (9), (10) are computed by using the linear equation solver of the LAPACK routine zgesv. In the sii algorithm, an eigenvalue and its corresponding eigenvector are computed by the LAPACK routine zgeev and the inverse iteration based on (10), respectively. The initial vector x^{(0)} in (10) is changed if E_k \ge \epsilon_good or \theta \le \theta_same. Let t be the iteration number of (6), (9), (10) for computing all eigenpairs.

Fig. 1. Maximal residual norm E_max in the case of test matrix (15) with \gamma = 1.6. ○ : sneig J, : sneig I, × : sii.

Fig. 2. Ratio of the iteration number t to the matrix size n in the case of test matrix (15) with \gamma = 1.6. ○ : sneig J, : sneig I, × : sii.

Fig. 3. Minimal angle \theta_min among the eigenvectors in the case of test matrix (15) with \gamma = 1.6. ○ : sneig J, : sneig I, × : sii, – : Maple.

Figs. 1–3 describe the numerical properties in the case where \gamma = 1.6. No plotted points exist for the cases where the sneig ∗ algorithms stop without computing all eigenpairs. Fig. 1 shows the maximal residual norm

E_max = \max_{k=1,\dots,n} E_k = \max_{k=1,\dots,n} \| A x_k - \lambda_k x_k \|_\infty.   (16)


Fig. 4. Computable matrix size n_max. ○ : sneig J, : sneig I, × : sii.

Fig. 5. Minimal angle \theta_min in the case of n = n_max. ○ : sneig J, : sneig I, × : sii.

By the sneig ∗ and sii algorithms, E_max becomes O(10^{-13}). Though the eigenvectors thus seem to be computed with high accuracy, it is necessary to investigate the angles among the computed eigenvectors; this is shown in the later discussion. Fig. 2 shows the ratio of t to the matrix size n for several n. For n \le 40, t increases slightly in the sneig ∗ algorithms. For n \ge 60, we observe that the eigenvector computed by the neig ∗ algorithm or by the inverse iteration for one eigenpair either is not computed with high accuracy or is almost equal to an already obtained one; the sneig ∗ and sii algorithms then require a change of the initial vector x^{(0)}. This behavior surely depends on the angles among the eigenvectors. Let \theta_min := \min_{1\le i<j\le n} angle(x_i, x_j) be the minimal angle among the eigenvectors. Fig. 3 shows the relationship between n and \theta_min. Fig. 3 also includes the numerical results by Maple, where 100-digit arithmetic is performed. For n \ge 60, \theta_min is about 1 degree; a part of the eigenvectors are nearly parallel. As the matrix size n increases, \theta_min by Maple becomes smaller. All eigenvectors computed by the sneig ∗ algorithms are near to those by Maple, while the minimal angle \theta_min by the sii algorithm is different from that by Maple. Let \theta^*_min be the minimal angle among the eigenvectors computed by Maple. In the sii algorithm, for n \ge 90, \theta_min does not satisfy |\theta_min - \theta^*_min| < 0.03.

Next we investigate the maximal computable matrix size n_max as the entry \gamma in (15) becomes larger. We regard the algorithms as failing if |\theta_min - \theta^*_min| \ge 0.03 as the matrix size n grows larger. Fig. 4 shows the relationship between \gamma and n_max. For \gamma > 1.2, n_max in the sneig J algorithm is much larger than that in the sneig I algorithm, and n_max in the sii algorithm is about 0.79 times that in the sneig J algorithm. Fig. 5 shows the relationship between \gamma and \theta_min in the case where the matrix size is equal to n_max. For all \gamma, \theta_min in the sneig J algorithm is almost 0.46. For \gamma > 1.2, \theta_min in the sneig I algorithm is larger than that in the sneig J algorithm, and \theta_min in the sii algorithm is slightly larger than that in the sneig J algorithm. Compared with the results by Maple, it is obvious that the sii algorithm does not attain high accuracy. Consequently, the sneig J algorithm generates the most accurate eigendecomposition among the three algorithms.

5. Conclusion

In this paper, we design new eigendecomposition algorithms based on solving nonlinear quadratic systems. In our algorithms, the space in which eigenvectors are sought is restricted to a suitable hyperplane, and the eigenvalue problem is replaced with solving quadratic systems. An eigenpair is computed through solving the quadratic systems with the help of the Newton iterative method. The normal vector of the hyperplane is taken from the orthogonal complement of the space spanned by the already obtained eigenvectors, so the solutions of the quadratic systems are not equal to the already obtained eigenvectors; indeed, for any initial vector, the vector computed by the Newton iterative method does not become one of the already obtained eigenvectors. Consequently, all eigenpairs are sequentially computable. Our algorithms are of two types: the sneig J algorithm, based on the Newton iterative method, and the sneig I algorithm, based on a modified inverse iteration. Our algorithms are compared with the standard inverse iteration from the viewpoint of numerical accuracy. It is shown that the sneig J algorithm is the best of the three for computing all eigenvectors with high accuracy in the case where the minimal angle among the eigenvectors is small.

Acknowledgments

The authors thank the reviewer for careful reading and helpful suggestions.

References

[1] M. B. Elgindi and A. Kharab, The quadratic method for computing the eigenpairs of a matrix, Int. J. Comput. Math., 73 (2000), 517–530.
[2] F. Chatelin, Eigenvalues of Matrices (in Japanese), Springer-Verlag, Tokyo, 2003.
[3] K. Falconer, Fractal Geometry: Mathematical Foundations and Applications, Second Edition, John Wiley & Sons, England, 2003.
[4] G. H. Golub and C. F. Van Loan, Matrix Computations, Third Edition, The Johns Hopkins Univ. Press, Baltimore and London, 1996.
[5] LAPACK, http://www.netlib.org/lapack/.
[6] M. H. Gutknecht, Variants of BiCGSTAB for matrices with complex spectrum, SIAM J. Sci. Comput., 14 (1993), 1020–1033.


JSIAM Letters Vol.1 (2009) pp.44–47 c©2009 Japan Society for Industrial and Applied Mathematics

Block BiCGGR: a new Block Krylov subspace method for computing high accuracy solutions

Hiroto Tadano^1, Tetsuya Sakurai^1 and Yoshinobu Kuramashi^2

^1 Department of Computer Science, University of Tsukuba, 1-1-1 Tennodai, Tsukuba, Ibaraki 305-8573, Japan
^2 Graduate School of Pure and Applied Sciences, University of Tsukuba, 1-1-1 Tennodai, Tsukuba, Ibaraki 305-8573, Japan

E-mail tadano, [email protected], [email protected]

Received March 19, 2009, Accepted June 15, 2009

Abstract

In this paper, the influence of errors which arise in matrix multiplications on the accuracyof approximate solutions generated by the Block BiCGSTAB method is analyzed. In orderto generate high accuracy solutions, a new Block Krylov subspace method named “BlockBiCGGR” is also proposed. Some numerical experiments illustrate that the Block BiCGGRmethod can generate high accuracy solutions compared with the Block BiCGSTAB method.

Keywords Block Krylov subspace methods, Block BiCGSTAB, linear systems with multiple right hand sides, high accuracy solutions

Research Activity Group Algorithms for Matrix / Eigenvalue Problems and their Applications

1. Introduction

Linear systems with multiple right hand sides

AX = B, (1)

where A ∈ C^{n×n} and B, X ∈ C^{n×L}, appear in many scientific applications, such as lattice quantum chromodynamics (lattice QCD) calculations of physical quantities [1] and an eigensolver using contour integration [2]. To solve these linear systems for X, several Block Krylov subspace methods (e.g., Block BiCG [3], Block BiCGSTAB [4], Block QMR [5]) have been proposed.

Block Krylov subspace methods can compute approximate solutions of linear systems with multiple right hand sides more efficiently than Krylov subspace methods for a single right hand side [5]. However, a gap may arise between the residual generated by the recursion of the Block BiCGSTAB method and the true residual. In this paper, the gap which arises in the Block BiCGSTAB method is analyzed, and a new Block Krylov subspace method named “Block BiCGGR” for reducing the gap is proposed.

This paper is organized as follows. In Section 2, a matrix-valued polynomial and an operation are defined. The Block BiCGSTAB method is briefly described in Section 3. In Section 4, the influence of errors which arise in matrix multiplications on the accuracy of approximate solutions of the Block BiCGSTAB method is analyzed. In Section 5, the Block BiCGGR method is proposed for reducing the gap between the residual generated by the recursion and the true residual; the true residual of the Block BiCGGR method is also evaluated. In Section 6, the accuracy of approximate solutions generated by both methods is verified by numerical experiments. The paper is concluded in Section 7.

2. Matrix-valued polynomial

Let M_k(z) be a matrix-valued polynomial of degree k defined by

M_k(z) ≡ Σ_{j=0}^{k} z^j M_j,

where M_j ∈ C^{L×L} and z ∈ C. The operation ◦ is used in this paper for the multiplication

M_k(A) ◦ V ≡ Σ_{j=0}^{k} A^j V M_j,

where V ∈ C^{n×L}. This operation satisfies the following properties [4].

Proposition 1 Let M(z) and N(z) be matrix-valued polynomials of degree k, and let V and ξ be an n×L matrix and an L×L matrix, respectively. Then, the following properties are satisfied:

(i) (M(A) ◦ V)ξ = (Mξ)(A) ◦ V,

(ii) (M + N)(A) ◦ V = M(A) ◦ V + N(A) ◦ V.
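The operation ◦ is straightforward to realize by accumulating matrix powers acting on V from the left and coefficient matrices from the right. The following NumPy sketch is illustrative only (the function name `poly_apply` and the test data are our own, not part of the paper); it also checks property (i) of Proposition 1 numerically:

```python
import numpy as np

def poly_apply(A, V, coeffs):
    """Compute M_k(A) ◦ V = sum_j A^j V M_j for coeffs = [M_0, ..., M_k]."""
    out = np.zeros_like(V)
    AjV = V.copy()                 # holds A^j V, starting from j = 0
    for Mj in coeffs:
        out = out + AjV @ Mj
        AjV = A @ AjV
    return out

rng = np.random.default_rng(2)
A = rng.standard_normal((4, 4))
V = rng.standard_normal((4, 2))
coeffs = [rng.standard_normal((2, 2)) for _ in range(3)]
xi = rng.standard_normal((2, 2))

# Property (i): (M(A) ◦ V) ξ = (Mξ)(A) ◦ V
lhs = poly_apply(A, V, coeffs) @ xi
rhs = poly_apply(A, V, [Mj @ xi for Mj in coeffs])
```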

3. The Block BiCGSTAB method

The (k+1)th residual R_{k+1} ∈ C^{n×L} of the Block BiCGSTAB method is defined by

R_{k+1} = B − AX_{k+1} ≡ (Q_{k+1}R_{k+1})(A) ◦ R_0, (2)

where R_0 = B − AX_0 is the initial residual. The matrix-valued polynomial R_{k+1}(z) of degree (k+1) which appears in (2) can be computed by the following recursions:

R_0(z) = P_0(z) = I_L,
R_{k+1}(z) = R_k(z) − zP_k(z)α_k,
P_{k+1}(z) = R_{k+1}(z) + P_k(z)β_k,


X_0 ∈ C^{n×L} is an initial guess,
Compute R_0 = B − AX_0,
Set P_0 = R_0,
Choose R̃_0 ∈ C^{n×L},
For k = 0, 1, . . . , until ‖R_k‖_F ≤ ε‖B‖_F do:
    V_k = AP_k,
    Solve (R̃_0^H V_k) α_k = R̃_0^H R_k for α_k,
    T_k = R_k − V_k α_k,
    Z_k = AT_k,
    ζ_k = Tr[Z_k^H T_k] / Tr[Z_k^H Z_k],
    X_{k+1} = X_k + P_k α_k + ζ_k T_k,
    R_{k+1} = T_k − ζ_k Z_k,
    Solve (R̃_0^H V_k) β_k = −R̃_0^H Z_k for β_k,
    P_{k+1} = R_{k+1} + (P_k − ζ_k V_k) β_k,
End

Fig. 1. Algorithm of the Block BiCGSTAB method.
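The iteration of Fig. 1 can be sketched as follows in NumPy. This is a minimal illustration, not the authors' implementation: the shadow residual R̃_0 is simply set to R_0, and the well-conditioned test matrix is our own assumption.

```python
import numpy as np

def block_bicgstab(A, B, tol=1e-12, maxit=500):
    n, L = B.shape
    X = np.zeros((n, L))
    R = B - A @ X                      # initial residual R_0
    P = R.copy()
    Rt = R.copy()                      # shadow residual R~_0 (here set to R_0)
    nB = np.linalg.norm(B, 'fro')
    for k in range(maxit):
        if np.linalg.norm(R, 'fro') <= tol * nB:
            break
        V = A @ P
        alpha = np.linalg.solve(Rt.conj().T @ V, Rt.conj().T @ R)
        T = R - V @ alpha
        Z = A @ T
        zeta = np.trace(Z.conj().T @ T) / np.trace(Z.conj().T @ Z)
        X = X + P @ alpha + zeta * T
        R = T - zeta * Z
        beta = np.linalg.solve(Rt.conj().T @ V, -(Rt.conj().T @ Z))
        P = R + (P - zeta * V) @ beta
    return X, R

# assumed test problem: a diagonally dominant random system with L = 2
rng = np.random.default_rng(0)
A = np.eye(50) + 0.05 * rng.standard_normal((50, 50))
B = rng.standard_normal((50, 2))
X, R = block_bicgstab(A, B)
```

Note that the matrix multiplications P_k α_k and V_k α_k appear explicitly; these are exactly the operations whose rounding errors are analyzed in Section 4.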

where P_{k+1}(z) is an auxiliary matrix-valued polynomial of degree (k+1), I_L is the L×L identity matrix, and α_k and β_k are L×L complex matrices. The polynomial Q_{k+1}(z) of degree (k+1) is defined as follows:

Q_0(z) = 1,
Q_{k+1}(z) = (1 − ζ_k z) Q_k(z),

where ζ_k ∈ C. The residual R_{k+1} can be computed by the following recursions,

R_{k+1} = T_k − ζ_k A T_k, (3)
P_{k+1} = R_{k+1} + (P_k − ζ_k A P_k) β_k,
T_k = R_k − A P_k α_k, (4)

where the matrices P_{k+1} and T_k are defined by P_{k+1} ≡ (Q_{k+1}P_{k+1})(A) ◦ R_0 and T_k ≡ (Q_k R_{k+1})(A) ◦ R_0, respectively. Proposition 1 is used to derive the above recursions. From Eqs. (2), (3), and (4), a recursion for the approximate solution X_{k+1} is obtained:

X_{k+1} = X_k + P_k α_k + ζ_k T_k. (5)

The L×L matrices α_k and β_k are determined so that the bi-orthogonality conditions

R̃_0^H A^j (R_k(A) ◦ R_0) = O_L, j = 0, 1, . . . , k−1, (6)
R̃_0^H A^{j+1} (P_k(A) ◦ R_0) = O_L, j = 0, 1, . . . , k−1, (7)

are satisfied. Here, R̃_0 is an arbitrary n×L nonzero matrix, O_L is the L×L zero matrix, and ‖·‖_F denotes the Frobenius norm of a matrix. Typically, R̃_0 is set to R_0 or given by random numbers. The scalar parameter ζ_k is determined so that ‖R_{k+1}‖_F is minimized. Fig. 1 shows the algorithm of the Block BiCGSTAB method. Here, Tr[·] denotes the trace of a matrix, and ε > 0 is a sufficiently small value for the stopping criterion.

4. Evaluation of the true residual of the Block BiCGSTAB method

In this section, it is assumed that computation errors arise in the multiplications with α_0, α_1, . . . , α_k which appear in the Block BiCGSTAB method. The influence of these errors on the true residual of the Block BiCGSTAB method is considered. A matrix enclosed by the symbol 〈·〉 denotes the perturbed matrix. Throughout this section, it is assumed that no calculation errors arise except for the multiplications with α_0, α_1, . . . , α_k.

The perturbed matrices 〈P_j α_j〉 and 〈(AP_j)α_j〉 are required for the computation of X_{j+1} and R_{j+1}, respectively. These matrices can be written as follows:

〈P_j α_j〉 = P_j α_j + F_j, (8)
〈(AP_j)α_j〉 = AP_j α_j + G_j, (9)

where F_j and G_j denote error matrices. From Eqs. (5) and (8), X_{k+1} is written as

X_{k+1} = X_k + 〈P_k α_k〉 + ζ_k T_k
        = X_0 + Σ_{j=0}^{k} (P_j α_j + ζ_j T_j) + Σ_{j=0}^{k} F_j. (10)

By using Eq. (9), the residual R_{k+1} generated by the recursion (3) is also written as

R_{k+1} = R_k − 〈(AP_k)α_k〉 − ζ_k A T_k
        = R_0 − Σ_{j=0}^{k} (AP_j α_j + ζ_j A T_j) − Σ_{j=0}^{k} G_j. (11)

By using Eqs. (10) and (11), the true residual B − AX_{k+1} of the Block BiCGSTAB method is given by

B − AX_{k+1} = R_0 − Σ_{j=0}^{k} (AP_j α_j + ζ_j A T_j) − Σ_{j=0}^{k} A F_j
             = R_{k+1} + Σ_{j=0}^{k} E_j, (12)

where the matrix E_j is defined by E_j ≡ G_j − AF_j. From Eqs. (8) and (9), the matrix E_j can be written as

E_j = 〈(AP_j)α_j〉 − A〈P_j α_j〉.

The error matrices E_0, E_1, . . . , E_k appear in (12) when computation errors arise in the multiplications with α_0, α_1, . . . , α_k. Eq. (12) implies that the true residual B − AX_{k+1} of the Block BiCGSTAB method approaches Σ_{j=0}^{k} E_j when the residual norm ‖R_{k+1}‖_F is sufficiently small.
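The error matrix E_j is precisely the discrepancy between the two ways of parenthesizing the product A P_j α_j, which is generally nonzero in finite precision arithmetic. A small NumPy illustration (single precision is used here only to make the effect easily visible; all variable names are ours):

```python
import numpy as np

rng = np.random.default_rng(3)
A = rng.standard_normal((30, 30)).astype(np.float32)
P = rng.standard_normal((30, 2)).astype(np.float32)
alpha = rng.standard_normal((2, 2)).astype(np.float32)

G_path = (A @ P) @ alpha    # plays the role of <(AP_j) alpha_j>
F_path = A @ (P @ alpha)    # plays the role of A <P_j alpha_j>
E = G_path - F_path         # E_j: zero in exact arithmetic, tiny but nonzero here
```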

5. The Block BiCGGR method

The error matrices F_j and G_j generate the gap between the residual and the true residual. To negate the influence of these matrices, the condition G_j = AF_j should be satisfied. In this section, a new Block Krylov subspace method is proposed to reduce the gap.

5.1 Construction of an algorithm

There are two ways of constructing the recursion for the residual R_{k+1} = (Q_{k+1}R_{k+1})(A) ◦ R_0. In the Block BiCGSTAB method, the polynomial Q_{k+1} is expanded first. In this case, as shown in Eq. (12), the true residual B − AX_{k+1} is not equal to the residual R_{k+1}


generated by the recursion. In the proposed method, the polynomial R_{k+1} is expanded first for computing Q_{k+1}R_{k+1}. The recursion of this polynomial is given by

Q_{k+1}R_{k+1} = Q_k R_k − ζ_k z Q_k R_k − z Q_{k+1} P_k α_k.

The polynomials Q_{k+1}P_k and Q_{k+1}P_{k+1} are computed by the following recursions:

Q_{k+1}P_k = Q_k P_k − ζ_k z Q_k P_k,
Q_{k+1}P_{k+1} = Q_{k+1}R_{k+1} + Q_{k+1}P_k β_k.

From the above recursions, the residual R_{k+1} and the auxiliary matrices can be computed by

R_{k+1} = R_k − ζ_k A R_k − A U_k, (13)
P_{k+1} = R_{k+1} + U_k α_k^{−1} β_k, (14)
S_k = P_k − ζ_k A P_k,

where S_k ≡ (Q_{k+1}P_k)(A) ◦ R_0 and U_k ≡ S_k α_k. From Eqs. (2) and (13), X_{k+1} can be computed by

X_{k+1} = X_k + ζ_k R_k + U_k. (15)

In the proposed method, the generation of the gap between the residual and the true residual can be avoided by computing the multiplication of S_k and α_k before the computation of X_{k+1} and R_{k+1}.

Matrices α_k and β_k are determined so that the bi-orthogonality conditions (6) and (7) are satisfied. From Eq. (6), the matrix α_k can be computed by

α_k = (R̃_0^H A P_k)^{−1} R̃_0^H R_k. (16)

By the bi-orthogonality condition (7) and the relation

R̃_0^H R_{k+1} = −ζ_k R̃_0^H A T_k,

the matrix β_k can be obtained by

β_k = (R̃_0^H A P_k)^{−1} R̃_0^H R_{k+1} / ζ_k. (17)

The matrix γ_k ≡ α_k^{−1} β_k appears in Eq. (14). By using Eqs. (16) and (17), γ_k can be obtained by

γ_k = (R̃_0^H R_k)^{−1} R̃_0^H R_{k+1} / ζ_k.

If the parameter ζ_k is determined so that ‖R_{k+1}‖_F is minimized, then extra multiplications with A are required in the proposed method. To avoid these multiplications, the parameter ζ_k is computed by

ζ_k = Tr[(AR_k)^H R_k] / Tr[(AR_k)^H AR_k].

In the proposed method, three multiplications with A are required in each iteration. To reduce the number of multiplications with A, the matrix AP_{k+1} is computed by the following recursion:

AP_{k+1} = AR_{k+1} + AU_k γ_k.

5.2 Evaluation of the true residual

As in the previous section, assume that no calculation errors arise except for the multiplications with α_0, α_1, . . . , α_k. The multiplication with α_j appears in the computation of U_j = S_j α_j. By using the symbol 〈·〉, the perturbed matrix 〈S_j α_j〉 is represented as

〈S_j α_j〉 = S_j α_j + H_j, (18)

X_0 ∈ C^{n×L} is an initial guess,
Compute R_0 = B − AX_0,
Set P_0 = R_0 and V_0 = W_0 = AR_0,
Choose R̃_0 ∈ C^{n×L},
For k = 0, 1, . . . , until ‖R_k‖_F ≤ ε‖B‖_F do:
    Solve (R̃_0^H V_k) α_k = R̃_0^H R_k for α_k,
    ζ_k = Tr[W_k^H R_k] / Tr[W_k^H W_k],
    S_k = P_k − ζ_k V_k,
    U_k = S_k α_k,
    Y_k = AU_k,
    X_{k+1} = X_k + ζ_k R_k + U_k,
    R_{k+1} = R_k − ζ_k W_k − Y_k,
    W_{k+1} = AR_{k+1},
    Solve (R̃_0^H R_k) γ_k = R̃_0^H R_{k+1} / ζ_k for γ_k,
    P_{k+1} = R_{k+1} + U_k γ_k,
    V_{k+1} = W_{k+1} + Y_k γ_k,
End

Fig. 2. Algorithm of the Block BiCGGR method.
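In NumPy, the iteration of Fig. 2 reads as follows (a minimal sketch under the same caveats as before, not the authors' code; note that U_k = S_k α_k is formed before X and R are updated, which is the point of the method):

```python
import numpy as np

def block_bicggr(A, B, tol=1e-12, maxit=500):
    n, L = B.shape
    X = np.zeros((n, L))
    R = B - A @ X                      # R_0
    P = R.copy()
    W = A @ R                          # W_0 = A R_0
    V = W.copy()                       # V_0 = A P_0
    Rt = R.copy()                      # shadow residual R~_0 (here set to R_0)
    nB = np.linalg.norm(B, 'fro')
    for k in range(maxit):
        if np.linalg.norm(R, 'fro') <= tol * nB:
            break
        alpha = np.linalg.solve(Rt.conj().T @ V, Rt.conj().T @ R)
        zeta = np.trace(W.conj().T @ R) / np.trace(W.conj().T @ W)
        S = P - zeta * V
        U = S @ alpha                  # formed *before* updating X and R
        Y = A @ U
        X = X + zeta * R + U
        Rn = R - zeta * W - Y
        Wn = A @ Rn
        gamma = np.linalg.solve(Rt.conj().T @ R, (Rt.conj().T @ Rn) / zeta)
        P = Rn + U @ gamma
        V = Wn + Y @ gamma
        R, W = Rn, Wn
    return X, R

# assumed test problem, identical in spirit to the experiments of Section 6
rng = np.random.default_rng(0)
A = np.eye(50) + 0.05 * rng.standard_normal((50, 50))
B = rng.standard_normal((50, 2))
X, R = block_bicggr(A, B)
```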

where H_j is an error matrix. From Eqs. (15) and (18), the approximate solution X_{k+1} is written as

X_{k+1} = X_k + ζ_k R_k + 〈S_k α_k〉
        = X_0 + Σ_{j=0}^{k} (ζ_j R_j + S_j α_j) + Σ_{j=0}^{k} H_j. (19)

By using Eqs. (13) and (19), R_{k+1} is represented as

R_{k+1} = R_k − ζ_k A R_k − A〈S_k α_k〉
        = R_0 − Σ_{j=0}^{k} (ζ_j A R_j + A S_j α_j) − Σ_{j=0}^{k} A H_j
        = B − A ( X_0 + Σ_{j=0}^{k} (ζ_j R_j + S_j α_j) + Σ_{j=0}^{k} H_j )
        = B − AX_{k+1}.

By regarding the matrices H_j and AH_j as F_j and G_j, it is confirmed that the proposed method satisfies E_j = G_j − AF_j = O. Since the proposed method can reduce the gap between the residual and the true residual, this method is named “Block Bi-Conjugate Gradient Gap-Reducing (Block BiCGGR)”. The algorithm of the Block BiCGGR method is shown in Fig. 2.

6. Numerical experiments

The test matrices used in the numerical experiments were PDE900, JPWH991, and CONF5.4-00L8X8-1000 [6]. The size and the number of nonzero elements of these matrices are shown in Table 1. The coefficient matrix of CONF5.4-00L8X8-1000 is constructed as I_n − κD, where D is an n×n non-Hermitian matrix and κ is a real-valued parameter. This parameter was set to 0.1782.

The initial solution X_0 was set to the zero matrix. The shadow residual R̃_0 was given by a random number generator. The right hand side B of (1) was given by B = [e_1, e_2, . . . , e_L], where e_j is the jth unit vector. The


Table 1. The size and the number of nonzero elements of the test matrices. NNZ denotes the number of nonzero elements.

Matrix name            Size    NNZ
PDE900                 900     4,380
JPWH991                991     6,027
CONF5.4-00L8X8-1000    49,152  1,916,928

Table 2. Results of the Block BiCGSTAB method.

PDE900
L  #Iter.  Time/L [s]  Res.          True Res.
1  53      0.0096      4.8 × 10^−15  4.8 × 10^−15
2  46      0.0067      1.1 × 10^−15  2.0 × 10^−13
4  41      0.0031      4.8 × 10^−15  1.8 × 10^−12

JPWH991
L  #Iter.  Time/L [s]  Res.          True Res.
1  56      0.0159      5.7 × 10^−15  1.2 × 10^−14
2  49      0.0083      8.3 × 10^−15  4.1 × 10^−13
4  43      0.0034      6.3 × 10^−15  5.9 × 10^−12

CONF5.4-00L8X8-1000
L  #Iter.  Time/L [s]  Res.          True Res.
1  555     13.9408     8.9 × 10^−15  9.5 × 10^−15
2  452     7.5609      7.3 × 10^−15  2.5 × 10^−13
4  406     6.1544      8.7 × 10^−15  2.8 × 10^−13

value ε for the stopping criterion was set to 1.0 × 10^−14.

All experiments were carried out in double precision arithmetic on a machine with CPU: Intel Core 2 Duo 2.4GHz, Memory: 4GBytes, Compiler: Intel Fortran ver. 10.1, compile options: -O3 -xT -openmp. The multiplication with the coefficient matrix was parallelized by OpenMP.

The results of the Block BiCGSTAB method are shown in Table 2. In this table, #Iter., Res., and True Res. denote the number of iterations, the relative residual norm ‖R_k‖_F/‖B‖_F, and the true relative residual norm ‖B − AX_k‖_F/‖B‖_F, respectively.

As shown in Table 2, the relative residual norms of the Block BiCGSTAB method satisfied the convergence criterion. However, the true residual norms did not reach 10^−14 when L = 2, 4.

The relation between the true relative residual norm and ‖Σ_{j=0}^{k} E_j‖_F/‖B‖_F for JPWH991 with L = 4 is shown in Fig. 3. The true relative residual norm became almost equal to the value ‖Σ_{j=0}^{k} E_j‖_F/‖B‖_F. Eq. (12) was verified through this numerical example.

The results of the Block BiCGGR method are shown in Table 3. The true relative residual norms reached 10^−14 except for JPWH991 with L = 1. By using the Block BiCGGR method, the gap between the residual and the true residual can be reduced compared with the Block BiCGSTAB method.

7. Conclusions

In this paper, we have evaluated the true residual of the Block BiCGSTAB method when computation errors arise in the multiplications with α_0, α_1, . . . , α_k. We have shown that the true residual of this method approaches the sum of the error matrices when the residual norm is sufficiently small. We have then proposed the Block BiCGGR method for reducing the gap between the residual and the true residual. Through some numerical experiments, we have verified that the Block BiCGGR method generates higher accuracy solutions than the Block BiCGSTAB method.

Table 3. Results of the Block BiCGGR method.

PDE900
L  #Iter.  Time/L [s]  Res.          True Res.
1  53      0.0107      3.2 × 10^−15  3.3 × 10^−15
2  46      0.0051      1.1 × 10^−15  1.4 × 10^−15
4  45      0.0031      5.5 × 10^−15  5.6 × 10^−15

JPWH991
L  #Iter.  Time/L [s]  Res.          True Res.
1  52      0.0134      8.4 × 10^−15  1.3 × 10^−14
2  51      0.0082      3.7 × 10^−15  6.1 × 10^−15
4  44      0.0035      1.5 × 10^−15  2.3 × 10^−15

CONF5.4-00L8X8-1000
L  #Iter.  Time/L [s]  Res.          True Res.
1  555     14.2714     7.4 × 10^−15  8.5 × 10^−15
2  456     8.1093      5.6 × 10^−15  6.7 × 10^−15
4  386     6.0348      7.4 × 10^−15  8.6 × 10^−15

Fig. 3. Relation between the true relative residual norm of Block BiCGSTAB and ‖Σ_{j=0}^{k} E_j‖_F/‖B‖_F (JPWH991, L = 4): true relative residual norm ‖B − AX_k‖_F/‖B‖_F, relative residual norm ‖R_k‖_F/‖B‖_F, and ‖Σ_{j=0}^{k} E_j‖_F/‖B‖_F.

Acknowledgments

This work was supported by Grant-in-Aid for Young Scientists (Start-up) (No. 20800009).

References

[1] PACS-CS Collaboration, S. Aoki et al., 2 + 1 Flavor Lattice QCD toward the Physical Point, arXiv:0807.1661v1 [hep-lat], 2008.
[2] T. Sakurai, H. Tadano, T. Ikegami and U. Nagashima, A parallel eigensolver using contour integration for generalized eigenvalue problems in molecular simulation, Tech. Rep. CS-TR-08-14, Univ. of Tsukuba, 2008.
[3] D. P. O'Leary, The block conjugate gradient algorithm and related methods, Lin. Alg. Appl., 29 (1980), 293–322.
[4] A. El Guennouni, K. Jbilou and H. Sadok, A block version of BiCGSTAB for linear systems with multiple right-hand sides, Elec. Trans. Numer. Anal., 16 (2003), 129–142.
[5] R. W. Freund and M. Malhotra, A block QMR algorithm for non-Hermitian linear systems with multiple right-hand sides, Lin. Alg. Appl., 254 (1997), 119–157.
[6] Matrix Market, http://math.nist.gov/MatrixMarket/


JSIAM Letters Vol.1 (2009) pp.48–51 ©2009 Japan Society for Industrial and Applied Mathematics

On parallelism of the I-SVD algorithm with a multi-core processor

Hiroki Toyokawa1,2, Kinji Kimura1, Masami Takata3 and Yoshimasa Nakamura1,2

1 Graduate School of Informatics, Kyoto University, Yoshida-honmachi, Sakyo-ku, Kyoto 606-8501, Japan
2 SORST, JST, Japan
3 Graduate School of Humanities and Sciences, Nara Women's University, Kitauoyanishi-machi, Nara 630-8506, Japan

E-mail [email protected]

Received February 23, 2009, Accepted June 4, 2009

Abstract

The I-SVD algorithm is a singular value decomposition algorithm consisting of the mdLVs scheme and the dLV twisted factorization. By assigning each piece of the computation to each core of a multi-core processor, the I-SVD algorithm is partly parallelized. The basic idea is a use of splitting and deflation in the mdLVs. The splitting divides a bidiagonal matrix into two smaller matrices. The deflation gives one of the singular values, and then the corresponding singular vector becomes computable by the dLV. Numerical experiments are done on a multi-core processor, and the algorithm runs about 5 times faster with 8 cores.

Keywords singular value decomposition, I-SVD, multi-core processor, parallelism

Research Activity Group Algorithms for Matrix / Eigenvalue Problems and their Applications

1. Introduction

Singular value decomposition (SVD) is one of the most important matrix operations in numerical linear algebra. It is applied in many fields of engineering, such as image processing and data search.

Several algorithms have been developed for SVD. The QR method, Divide and Conquer (DC), and a combination of bisection and inverse iteration have been the standard algorithms [1]. However, these algorithms leave something to be improved, for example, in computational cost, relative error of singular values, orthogonality of singular vectors, or memory usage.

Recently, a new SVD algorithm called the Integrable SVD (I-SVD) algorithm was designed in [2–5]. To solve a large-scale SVD problem, the I-SVD should be parallelized. However, it is difficult to parallelize the whole I-SVD algorithm efficiently [6], because the mdLVs scheme [2] is a serial algorithm. In [7,8], the double DC algorithm is proposed, which is a combination of a simplified DC and the dLV twisted factorization and is thus suitable for parallel SVD. A parallelism of the I-SVD algorithm itself is still an important open problem.

These days, multi-core processors are widely used. Therefore, a parallelism of the I-SVD algorithm with multi-core processors is desired.

In this article, a new parallelism of the I-SVD algorithm is established by regarding the I-SVD as continual job sequences and by assigning each job to each core. Numerical experiments are also done in order to evaluate this new parallelism.

2. Singular value decomposition and the I-SVD algorithm

2.1 Singular value decomposition

Singular value decomposition for a square matrix M ∈ R^{m×m} is described as follows:

M = UΣV^T,

where U, V ∈ R^{m×m} are orthogonal matrices and Σ ∈ R^{m×m} is a diagonal matrix whose elements are non-negative. It is known that any square matrix admits an SVD [9]. SVD can also be applied to m×n rectangular matrices, but in this article we treat the case of m×m square matrices. The kth diagonal element σ_k (in descending order) of Σ is the kth singular value of M. The kth columns of U and V are the kth left singular vector u_k and the kth right singular vector v_k, respectively.
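As a quick illustration of these definitions (using NumPy's built-in SVD routine, not the I-SVD algorithm):

```python
import numpy as np

rng = np.random.default_rng(1)
M = rng.standard_normal((5, 5))        # an arbitrary square matrix
U, s, Vt = np.linalg.svd(M)            # s holds sigma_1 >= ... >= sigma_m >= 0
M_rec = U @ np.diag(s) @ Vt            # reconstructs M = U Sigma V^T
```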

2.2 mdLVs scheme and the dLV twisted factorization

The mdLVs scheme is one of the efficient algorithms for computing singular values of an upper bidiagonal matrix B = (b_{i,j}) ∈ R^{m×m}. As the procedure continues, the matrix converges to a diagonal form: every superdiagonal element of B^{(n)} tends to 0, and

lim_{n→∞} b^{(n)}_{k,k} = √( σ_k^2 − Σ_{l=1}^{∞} (θ^{(l)})^2 ),

where θ^{(n)} is a non-negative value called the shift.

Fig. 1. Process flow of the I-SVD (Householder transformation: M → B; mdLVs scheme: B → Σ; dLV twisted factorization: B, Σ → U, V).

Compared to other algorithms, the mdLVs scheme converges faster than QR and bisection, and the resulting singular values have better relative accuracy than those of QR and dqds. The fast convergence is due to the shift, but the shift is also the cause of the seriality of the mdLVs scheme.

An arbitrary m×m matrix can be transformed into an upper bidiagonal matrix B by the Householder transformation [9]. This is the preconditioning process. Therefore the mdLVs scheme can be applied to an arbitrary dense matrix M.

The dLV twisted factorization is an algorithm for computing singular vectors of an upper bidiagonal matrix B for given singular values. Compared to other algorithms such as the inverse iteration, the singular vectors can be computed faster and without iteration [3].

2.3 Integrable singular value decomposition (I-SVD)

The I-SVD algorithm is accomplished by using the mdLVs scheme and the dLV twisted factorization [4,5]. First, the I-SVD algorithm computes the singular values σ_k of B by the mdLVs scheme. Second, it computes the corresponding singular vectors (u_k, v_k) by the dLV twisted factorization (Fig. 1). It is possible to parallelize the computation of singular vectors because the dLV twisted factorization is parallel executable. However, the mdLVs scheme is difficult to parallelize. Thus, the total parallelism is practically limited by the seriality of the mdLVs scheme [6].

3. How to parallelize

In this section, we explain how to parallelize the I-SVD algorithm.

3.1 Parallelization of the mdLVs scheme by splitting of matrices

During the iteration of the mdLVs scheme, a superdiagonal element, say b^{(n)}_{k,k+1}, may become less than a small positive value ε_c ≈ 0. Then this element can be regarded as 0, and the matrix B^{(n)} ∈ R^{m×m} can be separated into two smaller bidiagonal matrices B_1^{(n)} ∈ R^{m1×m1} and B_2^{(n)} ∈ R^{m2×m2}, where m2 = m − m1. We call this division a splitting of B^{(n)}. Replacing the small superdiagonal element ε by 0 turns B^{(n)} into the block diagonal form

B^{(n)} → diag(B_1^{(n)}, B_2^{(n)}), (ε ≈ 0).

The singular values of B_1^{(n)} and B_2^{(n)} are computable by the mdLVs scheme individually. Therefore, the mdLVs scheme can be partly parallelized by assigning the computations of the singular values of B_1^{(n)} and B_2^{(n)} to different cores.
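The splitting test itself is simple: scan the superdiagonal for entries below ε_c. A pure-Python sketch (the function name `split_points` and the sample data are ours, not from the paper):

```python
def split_points(diag, superdiag, eps_c=1e-15):
    """Indices k with |b_{k,k+1}| <= eps_c, at which B may be split in two."""
    return [k for k, b in enumerate(superdiag) if abs(b) <= eps_c]

# a 4x4 upper bidiagonal matrix whose second superdiagonal entry vanishes:
d = [2.0, 3.0, 1.5, 4.0]
e = [0.5, 0.0, 0.7]
print(split_points(d, e))   # -> [1]: B splits into two 2x2 bidiagonal blocks
```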

3.2 Parallel execution of the dLV twisted factorization based on a deflation

We call the splitting with m1 = m − 1 and m2 = 1 a deflation. In a deflation, the bottom row and the right column are separated off, so that B^{(n)} becomes diag(B_1^{(n)}, b^{(n)}_{m,m}), and the (m,m) element b^{(n)}_{m,m} converges to √(σ_m^2 − Σ_{l=1}^{n} (θ^{(l)})^2). Therefore the deflation gives rise to the singular value σ_m.

In the I-SVD algorithm, once a singular value σ_k has been computed, the computation of the corresponding singular vectors u_k, v_k can be started, independently of the computation of the other singular vectors. Thus, when a deflation occurs, the computation of the corresponding singular vectors can be started by an idle core.

3.3 Multi-core processor

A multi-core processor is a processor which is composed of several computation cores in one package. Compared to a traditional multi-processor system, a multi-core processor has several advantages. For example, its shared cache memory contributes to efficient memory access. Another point is that a multi-core processor consumes less electric power than other multi-processor systems such as PC clusters.

3.4 Assigning jobs to each core

In order to parallelize both the mdLVs scheme and the dLV twisted factorization, we regard the SVD process as the following continual job sequences.

(A) Continue the mdLVs scheme until the next splitting occurs.

(B) Compute the singular vectors corresponding to a computed singular value with the twisted factorization.

Fig. 2. Assigning jobs to each core: over time, each core takes jobs (A) (computing singular values by the mdLVs scheme) and jobs (B) (computing singular vectors by the dLV twisted factorization) from the list of unprocessed jobs; splittings and deflations during a job (A) create new jobs.

When a job appears, it is added to the list of unprocessed jobs immediately. An idle core takes a job from the list and processes it. If a core takes a job (A), the core executes the mdLVs scheme until a splitting occurs; when a splitting occurs, the core makes two jobs of type (A), one for each of the divided matrices. When a deflation occurs during the procedure, the core creates a job (B) for the computed singular value. If a core takes a job (B), the core executes the twisted factorization and computes the left and right singular vectors for a singular value. In this way, the jobs are processed in parallel and the I-SVD algorithm is partly parallelized (Fig. 2).

The program is made to execute as follows (Fig. 3):

(1) The main process makes working threads, allocates them to each of the cores, and prepares the list of unprocessed jobs (pre-processing).

(2) The working threads execute jobs (compute the singular values with the mdLVs scheme and the singular vectors with the dLV twisted factorization).

(3) When all the singular values and vectors have been computed, the working threads terminate and the main process prepares the result of the SVD (post-processing).

In the pre-processing, the upper limit of the number of working threads which the main process makes is the same as the number of cores of the computer. The working threads take jobs from the list of unprocessed jobs, execute them autonomously, and exchange results with other working threads through shared memory. Therefore, some jobs wait when the number of jobs is greater than the number of cores.
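The scheduling pattern above can be mimicked with a thread-safe queue. The sketch below uses Python's standard threading/queue modules instead of the paper's pthreads, and stand-in computations instead of the mdLVs/dLV kernels; only the job-list mechanism is the point (a job (A) may spawn jobs (B), idle workers pick up whatever is next).

```python
import queue
import threading

jobs = queue.Queue()        # the list of unprocessed jobs
results = []
lock = threading.Lock()

def worker():
    while True:
        job = jobs.get()
        if job is None:                       # sentinel: no more jobs
            jobs.task_done()
            break
        kind, payload = job
        if kind == "A":                       # stand-in for the mdLVs job:
            for sv in payload:                # each "deflation" spawns a job (B)
                jobs.put(("B", sv))
        else:                                 # stand-in for the dLV job
            with lock:
                results.append(payload * 2)
        jobs.task_done()

threads = [threading.Thread(target=worker) for _ in range(4)]
for t in threads:
    t.start()
jobs.put(("A", [1, 2, 3]))                    # one initial singular-value job
jobs.join()                                   # wait until all spawned jobs finish
for _ in threads:
    jobs.put(None)
for t in threads:
    t.join()
print(sorted(results))                        # -> [2, 4, 6]
```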

4. Numerical experiments

Tests have been carried out on the computer described in Table 1. As numerical examples, we consider the two types of matrices in Table 2. We set the step-size δ^{(n)} ≡ 1 in the mdLVs scheme, and the shift θ^{(n)2} is calculated by using the generalized Newton bound (without subtraction, p = 2) [10]. The test program is written in C++ and parallelized with pthread (libraries implementing the POSIX Threads standard), using neither MPI nor OpenMP.

Fig. 3. Process flow: after pre-processing the bidiagonal matrix B, the working threads repeatedly take a job from the list of unprocessed jobs and execute it until the SVD is completed; post-processing then yields the singular values / vectors of B.

Table 1. Specification of the test computer.

CPU       Intel Xeon E5430 2.66GHz Quad × 2 (4 cores × 2 processors)
Memory    64GBytes
OS        Linux (Fedora 7)
Compiler  g++ and gfortran 4.1.2-27 (-O3 option)

Table 2. Two types of upper bidiagonal matrices.

Matrix  Diagonal b_{k,k}  Subdiagonal b_{k,k+1}  Distribution of σ_k
B1      2.001             2                      separated
B2      random            random                 clustered

4.1 Results for B1

Table 3 and Fig. 4 show the computation time and the relative speed for the SVD of B1. According to them, the computation time for B1 does not decrease efficiently as the number of cores increases. Fig. 5 shows the number of working cores as the SVD of an 8000×8000 matrix B1 proceeds (the time for pre-processing and post-processing is excluded). It is to be noted that no splitting occurs during the computation for B1. Hence, not sufficiently many jobs (B) appear and some of the cores are idle: only several cores work at the same time, while the remaining cores have nothing to process.

Table 3. Computation time for B1 (in seconds; speed-up relative to one core in parentheses).

#Cores       1        2        4        8
m = 4000     9.28     4.74     2.51     1.87
             (1.00)   (1.96)   (3.70)   (4.96)
m = 8000     36.96    18.87    9.98     7.33
             (1.00)   (1.96)   (3.70)   (5.04)
m = 12000    83.08    42.60    23.03    16.45
             (1.00)   (1.95)   (3.61)   (5.05)
m = 16000    154.12   78.47    44.54    30.19
             (1.00)   (1.96)   (3.46)   (5.11)

Fig. 4. Relative speed for B1 (relative speed vs. number of cores, for m = 4000, 8000, 12000, 16000).

Fig. 5. Changes of the number of working cores for B1 (number of working cores vs. elapsed time).

4.2 Results for B2

Table 4, Fig. 6 and Fig. 7 show the results for B2. Fig. 7 shows the number of working cores during the SVD of an 8000×8000 matrix B2 (the time for pre-processing and post-processing is excluded). Different from the case of B1, splittings occur successively. Hence, jobs (A) and (B) appear frequently, and the cores are used a little more efficiently than in the case of B1.

5. Conclusion

It has been difficult to parallelize the mdLVs scheme [6]. In this paper, a new scheme is designed to parallelize the I-SVD algorithm on a multi-core processor. We achieve the parallelism by assigning the jobs of computing singular values and singular vectors to each of the cores.

Table 4. Computation time for B2 (in seconds; speed-up relative to one core in parentheses).

#Cores       1        2        4        8
m = 4000     9.96     5.23     2.82     1.75
             (1.00)   (1.90)   (3.53)   (5.69)
m = 8000     38.07    19.93    10.92    6.57
             (1.00)   (1.91)   (3.49)   (5.79)
m = 12000    87.67    46.30    27.21    16.48
             (1.00)   (1.89)   (3.22)   (5.32)
m = 16000    162.18   87.83    54.50    33.79
             (1.00)   (1.85)   (2.98)   (4.80)

Fig. 6. Relative speed for B2 (relative speed vs. number of cores, for m = 4000, 8000, 12000, 16000).

Fig. 7. Changes of the number of working cores for B2 (number of working cores vs. elapsed time).

References

[1] J. W. Demmel, Applied Numerical Linear Algebra, SIAM, Philadelphia, 1997.
[2] M. Iwasaki and Y. Nakamura, Accurate computation of singular values in terms of shifted integrable schemes, Japan J. Indust. Appl. Math., 23 (2006), 239–259.
[3] M. Iwasaki, S. Sakano and Y. Nakamura, Accurate twisted factorization of real symmetric tridiagonal matrices and its application to singular value decomposition (in Japanese), Trans. Japan Soc. Indust. Appl. Math., 15 (2005), 461–481.
[4] Y. Nakamura, Functionality of Integrable System (in Japanese), Kyoritsu Pub., Tokyo, 2006.
[5] M. Takata, K. Kimura, M. Iwasaki and Y. Nakamura, Performance of a new scheme for bidiagonal singular value decomposition of large scale, Proc. of 24th IASTED Int. Conf. on Parallel and Distributed Computing and Networks (PDCN2006), pp. 304–309, 2006.
[6] T. Konda, M. Takata, M. Iwasaki, S. Tsujimoto and Y. Nakamura, A parallelization of singular value computation algorithm by the Lotka-Volterra system (in Japanese), IPSJ SIG Tech. Rep., HPC-100 (2004), 13–18.
[7] T. Konda and Y. Nakamura, A new algorithm for singular value decomposition and its parallelization, Parallel Computing, 35 (2009), 331–344.
[8] T. Konda, M. Takata, M. Iwasaki and Y. Nakamura, A new singular value decomposition algorithm suited to parallelization and preliminary results, Proc. of 2nd IASTED Int. Conf. on Advances in Computer Science and Technology (ACST2006), pp. 79–85, 2006.
[9] G. Golub and C. Van Loan, Matrix Computations, Third Edition, Johns Hopkins Univ. Press, Baltimore, 1996.
[10] T. Yamashita, K. Kimura and Y. Nakamura, On subtraction-free formula for the diagonal elements of inverse powers of symmetric positive definite tridiagonal matrices, in preparation, 2009.


JSIAM Letters Vol.1 (2009) pp.52–55 ©2009 Japan Society for Industrial and Applied Mathematics

A numerical method for nonlinear eigenvalue problems using contour integrals

Junko Asakura¹, Tetsuya Sakurai², Hiroto Tadano², Tsutomu Ikegami³ and Kinji Kimura⁴

¹ Research and Development Division, Square Enix Co., Ltd., Shinjuku Bunka Quint Bldg., 3-22-7 Yoyogi, Shibuya-ku, Tokyo 151-8544, Japan

² Department of Computer Science, University of Tsukuba, 1-1-1 Tennoudai, Tsukuba, Ibaraki 305-8573, Japan

³ Information Technology Research Institute, AIST, 1-1-1 Umezono, Tsukuba, Ibaraki 305-8568, Japan

⁴ Graduate School of Informatics, Kyoto University, Yoshida-honmachi, Sakyo-ku, Kyoto 606-8501, Japan

E-mail [email protected]

Received February 14, 2009, Accepted May 1, 2009

Abstract

A contour integral method is proposed to solve nonlinear eigenvalue problems numerically. The target equation is F(λ)x = 0, where the matrix F(λ) is an analytic matrix function of λ. The method can extract only the eigenvalues λ in a domain defined by the integral path, by reducing the original problem to a linear eigenvalue problem that has identical eigenvalues in the domain. Theoretical aspects of the method are discussed, and we illustrate how to apply the method with some numerical examples.

Keywords nonlinear eigenvalue problem, contour integral, analytic matrix function

Research Activity Group Algorithms for Matrix / Eigenvalue Problems and their Applications

1. Introduction

We consider a numerical method using contour integrals to solve nonlinear eigenvalue problems. The nonlinear eigenvalue problem (NEP) involves finding eigenpairs (λ, x) that satisfy F(λ)x = 0, where the matrix F(λ) is an analytic matrix function of λ. NEPs appear in a variety of problems in science and engineering, such as accelerator designs [1] and delay differential equations [2].

We herein propose a numerical method using contour integrals to solve eigenvalue problems for analytic matrix functions. The method is closely related to the Sakurai-Sugiura (SS) method for generalized eigenvalue problems [3], and inherits many of its strong points, including suitability for execution on modern distributed parallel computers. We have already extended the SS method to polynomial eigenvalue problems [4]. In this paper, we further generalize the SS method to eigenvalue problems of analytic matrix functions. In the SS method, the original problem is converted to a generalized eigenvalue problem whose dimension is smaller than that of the original one. The converted problem is obtained numerically by solving a set of linear equations. These linear equations are derived from the original problem and can form a large system, but they are independent and can be solved in parallel. Moreover, the proposed method is free from the fixed-point iterations required in Newton's method. In this paper, the extension of the SS method for NEPs is discussed from a theoretical point of view. Some numerical examples are also reported, with results that are consistent with the theory.

The remainder of the present paper is organized as follows. In the next section, we introduce the Smith form for analytic matrix functions, which is a natural extension of the Smith form for matrix polynomials [5]. In Section 3, we present the numerical method for solving NEPs by means of the SS method and discuss theoretical results related to the proposed method. In Section 4, we present the algorithm of the SS method for the case where the integral path is given by a circle and the numerical integration is performed using the trapezoidal rule. Some numerical examples are shown in Section 5. Finally, conclusions and suggestions for future research are presented in Section 6.

2. Canonical form for analytic matrix functions

Let F(z) be an analytic matrix function defined in a simply connected domain Ω in C. The matrix F(z) is called regular if the determinant of F(z) is not identically zero in Ω.

We introduce the Smith form for analytic matrix functions [5].

Theorem 1  Let F(z) be an n × n regular analytic matrix function. Then F(z) admits the representation

P(z)F(z)Q(z) = D(z),  (1)

where D(z) = diag(d_1(z), …, d_n(z)) is a diagonal matrix of analytic functions d_j(z), j = 1, 2, …, n, such that d_j(z)/d_{j−1}(z) is an analytic function for j = 2, 3, …, n. In addition, P(z) and Q(z) are n × n regular analytic matrix functions with constant nonzero determinants.

The eigenpairs of the NEP are formally derived from the Smith form. Let q_j(z) be the column vectors of Q(z):

Q(z) = (q_1(z) … q_n(z)),  (2)

and let p_j(z) be the column vectors of P(z)^H:

P(z)^H = (p_1(z) … p_n(z)).  (3)

Let λ_1, …, λ_s be the distinct zeros of d_n(z) in Ω. Because d_j(z)/d_{j−1}(z) is an analytic function, d_j(z) can be represented in terms of the λ_i as

d_j(z) = h_j(z) · ∏_{i=1}^{s} (z − λ_i)^{α_{ji}},  j = 1, 2, …, n,

where the h_j(z) are analytic functions with h_j(z) ≠ 0 for z ∈ Ω. In addition, α_{ji} ∈ Z_+ (non-negative integers) and α_{ji} ≤ α_{j′i} for j < j′.

The eigenpairs of the NEP are related to the λ_i and the q_j(λ_i) above as follows.

Lemma 2  Let q_j(z) be the vector in (2), and let λ_i be a zero of d_j(z). Then the eigenpair (λ_i, q_j(λ_i)) is a solution of the NEP F(λ)x = 0.

Proof  Because P(z) and Q(z) are invertible,

F(λ_i) q_j(λ_i) = P(λ_i)^{−1} D(λ_i) Q(λ_i)^{−1} (Q(λ_i) e_j) = d_j(λ_i) P(λ_i)^{−1} e_j.

Since d_j(λ_i) = 0, we have the result of the lemma. (QED)

Note that if the eigenvalue λ_i is simple and not degenerate, i.e., λ_i is a simple zero of det F(z), we have α_{ji} = 0 for j ≠ n and α_{ni} = 1.

3. An eigensolver using contour integrals

In this section, we propose a numerical method using contour integrals for eigenvalue problems of analytic matrix functions.

Let F(z) be an n × n regular analytic matrix function. For nonzero vectors u, v ∈ C^n, we define

f(z) := u^H F(z)^{−1} v

for z ∈ Ω such that det F(z) ≠ 0, namely d_n(z) ≠ 0. The existence of the Smith form allows us to prove the following theorem.

Theorem 3  Let D(z) = diag(d_1(z), …, d_n(z)) be the Smith form for F(z), and let P(z) and Q(z) be defined by (1). Then f(z) admits the representation

f(z) = Σ_{j=1}^{n} χ_j(z)/d_j(z),  (4)

where the χ_j(z) are analytic functions in Ω.

Proof  By Theorem 1, we obtain

f(z) = u^H Q(z) D(z)^{−1} P(z) v = Σ_{j=1}^{n} (u^H q_j(z))(p_j(z)^H v)/d_j(z) = Σ_{j=1}^{n} χ_j(z)/d_j(z),

where χ_j(z) := (u^H q_j(z))(p_j(z)^H v). (QED)

Let Γ be a positively oriented closed Jordan curve in Ω. Without loss of generality, we may assume that λ_1, …, λ_m (m ≤ s) are the distinct eigenvalues in the interior of Γ ⊂ Ω. Assume that these eigenvalues are simple and not degenerate. Then we can suppose that α_{ji} = 0 for j ≠ n and α_{ni} = 1.

Definition 4  For a non-negative integer k, the moment μ_k is defined as

μ_k := (1/(2πi)) ∮_Γ z^k f(z) dz,  k = 0, 1, ….  (5)

Definition 5  Two m × m Hankel matrices H_m and H_m^< are defined as

H_m := [μ_{i+j−2}]_{i,j=1}^{m},  H_m^< := [μ_{i+j−1}]_{i,j=1}^{m}.

The following theorem is one of the main results of the present paper.

Theorem 6  If χ_n(λ_l) ≠ 0 for 1 ≤ l ≤ m, then the eigenvalues of the pencil H_m^< − λH_m are given by λ_1, …, λ_m.

Proof  By Theorem 3 and (5), we obtain

μ_k = (1/(2πi)) ∮_Γ z^k f(z) dz = Σ_{j=1}^{n} (1/(2πi)) ∮_Γ (χ_j(z)/d_j(z)) z^k dz = Σ_{l=1}^{m} ν_l λ_l^k,

where ν_l := χ_n(λ_l)/d_n′(λ_l). Let V_m := [λ_j^{i−1}]_{i,j=1}^{m} be the m × m Vandermonde matrix, and let D_m := diag(ν_1, …, ν_m) and Λ_m := diag(λ_1, …, λ_m). One can easily verify that

H_m^< − λ_l H_m = V_m D_m (Λ_m − λ_l I) V_m^T.  (6)

If χ_n(λ_l) ≠ 0 for 1 ≤ l ≤ m, then ν_l ≠ 0 for 1 ≤ l ≤ m. Therefore, λ_1, …, λ_m are the eigenvalues of the pencil H_m^< − λH_m. (QED)

Therefore, we can obtain the eigenvalues λ_1, …, λ_m of the analytic matrix function F(z) by solving the generalized eigenvalue problem H_m^< w = λH_m w. The proof of the above theorem for generalized eigenvalue problems is given in [3].
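This reduction can be checked numerically in a few lines. The sketch below (Python/NumPy; the toy matrix, contour and parameters are our own illustration, not taken from the paper) approximates the moments (5) for the linear function F(z) = A − zI by the trapezoidal rule on a circle, and recovers the two eigenvalues enclosed by the contour from the 2 × 2 Hankel pencil of Theorem 6:

```python
import numpy as np

# Toy NEP: F(z) = A - z*I with A = diag(0.4, 1.3, 5.0).
# The circle |z - 1| = 1 encloses exactly the eigenvalues 0.4 and 1.3.
A = np.diag([0.4, 1.3, 5.0])
n = A.shape[0]
rng = np.random.default_rng(1)
u, v = rng.standard_normal(n), rng.standard_normal(n)

def f(z):
    # f(z) = u^H F(z)^{-1} v
    return u @ np.linalg.solve(A - z*np.eye(n), v)

# Moments mu_k of (5) via the N-point trapezoidal rule on the circle:
# (1/(2*pi*i)) \oint g(z) dz  ~=  (1/N) sum_j g(z_j) (z_j - gamma).
gamma, rho, N = 1.0, 1.0, 64
zs = gamma + rho*np.exp(2j*np.pi*(np.arange(N) + 0.5)/N)
fz = np.array([f(z) for z in zs])
mu = [np.mean(zs**k * fz * (zs - gamma)) for k in range(4)]

# m = 2 eigenvalues inside, so the 2 x 2 pencil H2^< - lambda*H2
# has exactly these eigenvalues (Theorem 6).
H2 = np.array([[mu[0], mu[1]], [mu[1], mu[2]]])
H2lt = np.array([[mu[1], mu[2]], [mu[2], mu[3]]])
lam = np.linalg.eigvals(np.linalg.solve(H2, H2lt))
print(np.sort(lam.real))  # close to [0.4, 1.3]
```

Only the eigenvalues inside the contour are produced; the third eigenvalue of A, which lies outside, does not appear.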

Now, we evaluate eigenvectors. Let

s_k := (1/(2πi)) ∮_Γ z^k F(z)^{−1} v dz,  k = 0, 1, …, m − 1,  (7)

and let S := (s_0 … s_{m−1}). We obtain the following relationship between S and the q_n(z) of (2).

Lemma 7  Let q_n(z) be the vector in (2) and let (λ_l, w_l) (1 ≤ l ≤ m) be the eigenpairs of the pencil H_m^< − λH_m. Then

q_n(λ_l) = c_l S w_l,  c_l ∈ C∖{0},

for l = 1, 2, …, m.

Proof  From (6), we have

0 = (H_m^< − λ_l H_m) w_l = V_m D_m (Λ_m − λ_l I) V_m^T w_l.

Since V_m and D_m are nonsingular, and Λ_m e_l = λ_l e_l, V_m^T w_l admits the following representation:

V_m^T w_l = α_l e_l,  α_l ∈ C∖{0}.

Here, e_l is the l-th unit vector. Let p_n(z) be the vector in (3) and let

β_l := p_n(λ_l)^H v / d_n′(λ_l)

for l = 1, …, m. Note that β_l ≠ 0 if χ_n(λ_l) ≠ 0. As in the proof of Theorem 6, we can derive the following equation:

S = (s_0 … s_{m−1}) = (β_1 q_n(λ_1) … β_m q_n(λ_m)) V_m^T.

Therefore,

q_n(λ_l) = (1/β_l) S V_m^{−T} e_l = (1/β_l)(1/α_l) S w_l = c_l S w_l,

with c_l = 1/(α_l β_l) for l = 1, 2, …, m. (QED)

From Lemma 2 and Lemma 7, we have the following theorem.

Theorem 8  Let (λ_j, w_j) (j = 1, …, m) be the eigenpairs of the pencil H_m^< − λH_m. Then (λ_j, x_j) (j = 1, …, m) are the eigenpairs for the NEP F(λ)x = 0, where

x_j = S w_j,  j = 1, …, m.

4. A case where Γ is given by a circle

Let Γ = γ + ρe^{iθ} (0 ≤ θ < 2π) be a circle in Ω with center γ and radius ρ. To retain numerical accuracy, we use the shifted and scaled moments

μ_k := (1/(2πi)) ∮_Γ ((z − γ)/ρ)^k f(z) dz,  k = 0, 1, …,  (8)

instead of (5). We evaluate the integral using the N-point trapezoidal rule, leading to the approximations for μ_k,

μ_k ≈ μ̂_k := (1/N) Σ_{j=0}^{N−1} ((ω_j − γ)/ρ)^{k+1} f(ω_j),

where ω_j = γ + ρe^{2πi(j+1/2)/N} for j = 0, 1, …, N − 1. Note that, due to the shift and scaling, the eigenvalues λ_l (l = 1, …, m) are also shifted and scaled. The eigenvalues of the original NEP can be recovered from γ + ρλ_l.

The block version of the SS method for generalized eigenvalue problems was proposed in [6]. The numerical examples in [6] indicate that the block SS method has the potential to achieve greater accuracy.

Let U and V be n × L matrices whose column vectors are linearly independent. The block SS method is defined by replacing f(z) in (5) with the matrix U^H F(z)^{−1} V. Accordingly, the k-th moment μ_k in (5), the Hankel matrices H_m and H_m^<, the vector s_k in (7) and the matrix S = (s_0 … s_{m−1}) are replaced by the corresponding block versions:

M_k := (1/(2πi)) ∮_Γ z^k U^H F(z)^{−1} V dz,  k = 0, 1, …,  (9)

H_{mL} := [M_{i+j−2}]_{i,j=1}^{m},  H_{mL}^< := [M_{i+j−1}]_{i,j=1}^{m},

S_k := (1/(2πi)) ∮_Γ z^k F(z)^{−1} V dz,  k = 0, 1, …,

and S = (S_0 … S_{m−1}), respectively. Here m is a positive integer such that mL ≥ m. Note that M_k = U^H S_k by definition. Using the N-point trapezoidal rule, we obtain the following approximation for S_k:

Ŝ_k := (1/N) Σ_{j=0}^{N−1} ((ω_j − γ)/ρ)^{k+1} F(ω_j)^{−1} V,  k = 0, 1, ….  (10)

The algorithm for the block SS method is shown below.

Algorithm of the block SS method

Input: U, V ∈ C^{n×L}, N, K, L, δ, γ, ρ
Output: λ_1, …, λ_{m′}, x_1, …, x_{m′}
1.  Set ω_j ← γ + ρ exp(2πi(j + 1/2)/N), j = 0, 1, …, N − 1
2.  Compute F(ω_j)^{−1}V, j = 0, 1, …, N − 1
3.  Compute S_k, k = 0, …, 2K − 1, by (10)
4.  Form M_k = U^H S_k, k = 0, 1, …, 2K − 1
5.  Construct H_{KL} and H_{KL}^< ∈ C^{KL×KL}
6.  Perform a singular value decomposition of H_{KL}
7.  Omit small singular value components σ_j < δ · max_i σ_i, so that H_{m′} = H_{KL}(1:m′, 1:m′) and H_{m′}^< = H_{KL}^<(1:m′, 1:m′), where m′ ≤ KL
8.  Compute the eigenpairs (ζ_1, w_1), …, (ζ_{m′}, w_{m′}) of the pencil H_{m′}^< − λH_{m′}
9.  Construct S = (S_0 … S_{m′−1})
10. Compute x_j = S w_j, j = 1, 2, …, m′
11. Set λ_j ← γ + ρζ_j, j = 1, 2, …, m′

In practice, we assign random matrices to U and V. By the proposed method, we can obtain the eigenvectors corresponding to the eigenvalues whose algebraic multiplicity is less than L.
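As a concrete illustration, the algorithm above can be sketched in a few lines of Python/NumPy. This is a sketch of ours under simplifying assumptions (the paper's own experiments used MATLAB); the constant factor ρ omitted from the quadrature in (10) multiplies all moments equally and therefore cancels in the pencil:

```python
import numpy as np

def block_ss(F, n, gamma, rho, N=32, K=4, L=2, delta=1e-6, seed=0):
    """Sketch of the block SS algorithm. F: callable z -> n x n array."""
    rng = np.random.default_rng(seed)
    U = rng.standard_normal((n, L))    # random U and V, as in the text
    V = rng.standard_normal((n, L))
    # Step 1: quadrature points on the circle |z - gamma| = rho.
    omega = gamma + rho*np.exp(2j*np.pi*(np.arange(N) + 0.5)/N)
    # Step 2: N independent linear solves F(omega_j) Y_j = V.
    Y = [np.linalg.solve(F(w), V) for w in omega]
    # Step 3: block moments S_k, k = 0, ..., 2K - 1, as in (10).
    S = [sum(((w - gamma)/rho)**(k + 1)*y for w, y in zip(omega, Y))/N
         for k in range(2*K)]
    # Step 4: M_k = U^H S_k.
    M = [U.conj().T @ Sk for Sk in S]
    # Step 5: block Hankel matrices H_KL and H_KL^<.
    H = np.block([[M[i + j] for j in range(K)] for i in range(K)])
    Hlt = np.block([[M[i + j + 1] for j in range(K)] for i in range(K)])
    # Steps 6-7: rank decision by SVD; keep the leading m' x m' block.
    sv = np.linalg.svd(H, compute_uv=False)
    m = int(np.sum(sv > delta*sv[0]))
    # Step 8: eigenpairs of the pencil H^< - lambda*H.
    zeta, W = np.linalg.eig(np.linalg.solve(H[:m, :m], Hlt[:m, :m]))
    # Steps 9-11: eigenvectors x_j = S w_j; eigenvalues gamma + rho*zeta_j.
    X = np.hstack(S)[:, :m] @ W
    return gamma + rho*zeta, X
```

For instance, with the linear test function F(z) = A − zI and A = diag(0.5, 1.5, 3.0, 5.0), a circle of center 1 and radius 1 yields the two enclosed eigenvalues 0.5 and 1.5 together with their eigenvectors.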

5. Numerical Examples

In this section, we confirm the validity of the proposed method using some nonlinear eigenvalue problems. The algorithm was implemented in MATLAB 7.4. We generated a matrix V := (v_1 … v_L) using the MATLAB function rand and set U = V. The MATLAB command mldivide was used to evaluate F(z)^{−1}V numerically. The computed eigenvectors are normalized so that ‖x‖_2 = 1.

Example 1  We consider the NEP with the matrix F(z) that


Table 1. Relative errors and residuals for Example 1.

k    λ̂_k                     |λ̂_k − λ_k|      ‖F(λ̂_k)x̂_k‖₂ / (‖F(λ̂_k)‖₂‖x̂_k‖₂)
1    −3.141592653589789      4.00 × 10⁻¹⁵     2.58 × 10⁻¹²
2    −1.570796326794277      6.20 × 10⁻¹³     1.67 × 10⁻¹²
3     0.000000000000661      6.61 × 10⁻¹³     1.52 × 10⁻¹¹
4     1.570796326761298      3.36 × 10⁻¹¹     1.11 × 10⁻¹⁰
5     1.945910151338245      2.28 × 10⁻⁹      3.11 × 10⁻⁸
6     3.141592653589055      7.39 × 10⁻¹³     3.57 × 10⁻¹¹

Table 2. Residuals for Example 2.

k    λ̂_k^{1/2}                                   ‖F(λ̂_k)x̂_k‖_F / (‖F(λ̂_k)‖_F‖x̂_k‖_F)
1    0.059793132432759 + 0.000000862974322i      1.41 × 10⁻¹⁵
2    0.083768827897551 + 0.000019602073839i      6.38 × 10⁻¹⁷
3    0.084151690319656 + 0.000003399562592i      1.25 × 10⁻¹⁶
4    0.087765211962668 + 0.000038185170188i      3.47 × 10⁻¹⁷
5    0.088352686155210 + 0.000005726087041i      3.13 × 10⁻¹⁷
6    0.093424713463988 + 0.000393486671297i      5.55 × 10⁻¹⁷

was transformed using elementary transformations from diag(cos(z), sin(z), e^z − 7). The following list shows the elements of F(z):

(1,1): 2e^z + cos(z) − 14
(1,2): (z² − 1) sin(z) + (2e^z + 14) cos(z)
(1,3): 2e^z − 14
(2,1): (z + 3)(e^z − 7)
(2,2): sin(z) + (z + 3)(e^z − 7) cos(z)
(2,3): (z + 3)(e^z − 7)
(3,1): e^z − 7
(3,2): (e^z − 7) cos(z)
(3,3): e^z − 7

The integral path Γ was taken as follows:

Γ = γ + ρe^{iθ}  (γ = 0, ρ = 3.2).

There are six eigenvalues, λ_1 = −π, λ_2 = −π/2, λ_3 = 0, λ_4 = π/2, λ_5 = log 7 (≈ 1.9459) and λ_6 = π, inside Γ. We took N = 64, K = 8, L = 2 and δ = 10^{−12}.

The numerical results are shown in Table 1, where the eigenvalues λ̂_j obtained by the block SS method are compared to the exact eigenvalues λ_j. As shown in Table 1, we obtained all of the eigenvalues in Γ.
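The element list above can also be checked directly. Entering F(z) exactly as printed, the matrix should be numerically singular at each of the six exact eigenvalues; the quick NumPy snippet below (an independent check of ours, not part of the paper's MATLAB experiments) confirms this via the smallest singular value:

```python
import numpy as np

def F1(z):
    # The 3 x 3 matrix of Example 1, entered exactly as listed above.
    w = np.exp(z) - 7.0
    c, s = np.cos(z), np.sin(z)
    return np.array([
        [2*np.exp(z) + c - 14, (z**2 - 1)*s + (2*np.exp(z) + 14)*c, 2*np.exp(z) - 14],
        [(z + 3)*w,            s + (z + 3)*w*c,                     (z + 3)*w],
        [w,                    w*c,                                 w]])

# F(z) must be (numerically) singular at each exact eigenvalue.
for lam in [-np.pi, -np.pi/2, 0.0, np.pi/2, np.log(7.0), np.pi]:
    sv = np.linalg.svd(F1(lam), compute_uv=False)
    print(f"lambda = {lam:+.6f}:  sigma_min/sigma_max = {sv[-1]/sv[0]:.1e}")
```

Because F(z) was obtained from diag(cos(z), sin(z), e^z − 7) by determinant-preserving elementary transformations, det F(z) vanishes exactly at the zeros of cos(z), sin(z) and e^z − 7.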

Example 2  We consider the problem that models a radio-frequency gun cavity, given in [1], with

F(λ) = A_0 − λA_1 + i√(λ − σ_1²) W_1 + i√(λ − σ_2²) W_2,

where A_0, A_1, W_1, W_2 ∈ R^{9956×9956}. We took σ_1 = 0 and σ_2 = 0.043551. The integral path Γ was taken as follows:

Γ = γ + ρe^{iθ}  (γ = 0.00625, ρ = 0.00375).

We took N = 64, K = 8, L = 24 and δ = 10^{−12}. The numerical results are shown in Table 2; the Frobenius norm is used instead of the 2-norm. Table 2 shows that the proposed method found six eigenvalues in Γ. The largest residual of the computed eigenpairs was 1.41 × 10^{−15}.

Example 3  Lastly, we consider the problem derived from the delay-differential equation with a single delay

Table 3. Residuals for Example 3.

k    λ̂_k                       ‖F(λ̂_k)x̂_k‖₂ / (‖F(λ̂_k)‖₂‖x̂_k‖₂)
1     17.773906360548423       2.41 × 10⁻¹⁶
2     14.471490519110109       2.14 × 10⁻¹⁶
3      8.961335387916407       3.43 × 10⁻¹⁶
4      0.941336550782964       1.43 × 10⁻¹⁵
5    −10.407305274429442       1.08 × 10⁻¹⁵
6    −31.755615500815374       9.43 × 10⁻¹⁶

given in [2]:

F(λ) = −λI + A_0 + A_1 e^{−τλ},

where A_0, A_1 ∈ R^{1000×1000} are tridiagonal matrices and I is the identity matrix. We took τ = 0.05. The integral path Γ was taken as follows:

Γ = γ + ρe^{iθ}  (γ = −10, ρ = 30).

We took N = 48, K = 16, L = 4 and δ = 10^{−12}. It is known that a total of six real eigenvalues lie in [−40, 20].

The numerical results are shown in Table 3. As shown in Table 3, the proposed method found all eigenvalues in the specified domain. The largest residual of the computed eigenpairs was 1.43 × 10^{−15}.

6. Conclusion

In the present paper, we have proposed a numerical method using contour integrals for nonlinear eigenvalue problems of analytic matrix functions. The method can be considered an extension of the numerical method for polynomial eigenvalue problems proposed in [4]. It enables us to obtain the eigenpairs of analytic matrix functions by solving a generalized eigenvalue problem, which is derived by solving systems of linear equations. Since these linear systems are independent of each other, they can be solved in parallel. In addition, the proposed method does not need fixed-point iterations such as Newton's iteration. Error analysis for the proposed method and the estimation of suitable parameters remain as topics for future research.

References

[1] B. Liao, Subspace projection methods for model order reduction and nonlinear eigenvalue computation, PhD thesis, Department of Mathematics, Univ. of California at Davis, 2007.

[2] E. Jarlebring, The spectrum of delay-differential equations: numerical methods, stability and perturbation, PhD thesis, Inst. Comp. Math., TU Braunschweig, 2008.

[3] T. Sakurai and H. Sugiura, A projection method for generalized eigenvalue problems, J. Comput. Appl. Math., 159 (2003), 119–128.

[4] J. Asakura, T. Sakurai, H. Tadano, T. Ikegami and K. Kimura, A numerical method for polynomial eigenvalue problems using contour integral, submitted.

[5] I. Gohberg and L. Rodman, Analytic matrix functions with prescribed local data, J. d'Analyse Mathématique, 40 (1981), 90–128.

[6] T. Ikegami, T. Sakurai and U. Nagashima, A filter diagonalization for generalized eigenvalue problems based on the Sakurai-Sugiura method, Tech. Rep. CS-TR-08-13, Univ. of Tsukuba, 2008.


JSIAM Letters Vol.1 (2009) pp.56–59  ©2009 Japan Society for Industrial and Applied Mathematics

Differential qd algorithm for totally nonnegative band matrices: convergence properties and error analysis

Yusaku Yamamoto¹ and Takeshi Fukaya¹

¹ Department of Computational Science and Engineering, Nagoya University, Furo-cho, Chikusa-ku, Nagoya 464-8603, Japan

E-mail [email protected]

Received May 27, 2009, Accepted August 13, 2009

Abstract

We analyze convergence properties and numerical properties of the differential qd algorithm generalized for totally nonnegative band matrices. In particular, we show that the algorithm is globally convergent and can compute all eigenvalues to high relative accuracy.

Keywords eigenvalue, totally nonnegative, band matrix, qd algorithm, error analysis

Research Activity Group Algorithms for Matrix / Eigenvalue Problems and their Applications

1. Introduction

The dqds (differential quotient-difference with shift) algorithm [1] is one of the most successful algorithms for computing the eigenvalues of a symmetric positive-definite tridiagonal matrix. It is mathematically equivalent to the LR algorithm, but instead of working on the tridiagonal matrix T, it uses the elements of its bidiagonal factor B, where T = B^T B, as basic variables, and performs the LR step

(B^{(n+1)})^T B^{(n+1)} = B^{(n)} (B^{(n)})^T

without forming B^{(n)} (B^{(n)})^T explicitly. The dqds algorithm has the unique feature that it can compute all the eigenvalues to high relative accuracy, and it is used as one of the key ingredients in the MR³ algorithm for the symmetric tridiagonal eigenproblem [2].

The high relative accuracy of the dqds algorithm is made possible by the following two properties:

(i) The algorithm involves only positive variables and uses no subtractions except in the introduction of origin shifts. Thus the rounding errors arising in the computation of B^{(n)} → B^{(n+1)} are kept small in the sense of relative error.

(ii) Small relative errors in the elements of B^{(n)} cause only small relative perturbations in the eigenvalues.

It is therefore interesting to investigate whether the dqds algorithm can be extended to other types of matrices while retaining these useful properties.

In this paper, we consider a class of totally nonnegative band matrices. A matrix is called totally nonnegative (TN) if all of its minors are nonnegative [3]. TN band matrices can be regarded as a generalization of the symmetric positive-definite tridiagonal matrices, and the dqd (differential qd) algorithm, which is a shiftless version of the dqds, can be naturally extended to deal with this type of matrix. We study the convergence properties and numerical properties of the dqd algorithm applied to the TN band matrix. In particular, we prove its global convergence and show that it shares the properties (i) and (ii) of the dqds algorithm. Using these facts, we show that the algorithm can compute all eigenvalues of a TN band matrix to high relative accuracy.

Recently, Fukuda et al. formulated a new algorithm for eigenvalue computation based on an integrable system called the discrete hungry Lotka-Volterra (dhLV) system [4]. We also point out that there is a close relationship between this algorithm and the dqd algorithm for the TN band matrix.

2. The differential qd algorithm for a totally nonnegative band matrix

Let L_i (i = 1, …, m_L) and R_i (i = 1, …, m_R) be m × m lower and upper bidiagonal matrices defined by

L_i =
| q_{i1}                  |
| 1    q_{i2}             |
|      1    q_{i3}        |
|           ⋱    ⋱        |
|                1  q_{im} |

and

R_i =
| 1   e_{i1}                 |
|     1    e_{i2}            |
|          1    ⋱            |
|               ⋱  e_{i,m−1} |
|                  1         |,  (1)

respectively, where q_{ik} (i = 1, …, m_L, k = 1, …, m) and e_{ik} (i = 1, …, m_R, k = 1, …, m − 1) are some positive numbers. In this paper, we consider the problem of computing the eigenvalues of a matrix defined as the product of these bidiagonal factors:

A = L_1 L_2 ⋯ L_{m_L} R_1 R_2 ⋯ R_{m_R}.  (2)

Clearly, A is a nonsingular band matrix with lower bandwidth m_L and upper bandwidth m_R. Moreover, A is totally nonnegative, since it is a product of positive bidiagonal matrices [5]. From this fact, it can also be concluded that all the eigenvalues of A are simple, real and positive. When m_L = m_R = 1, A is similar to some symmetric positive-definite tridiagonal matrix. In this sense, the matrix in (2) can be regarded as a generalization of the symmetric positive-definite tridiagonal matrix.

Now, consider computing the eigenvalues of A with the LR algorithm [6]. In the LR algorithm, we first decompose the input matrix A into the product of lower and upper triangular matrices as A = L^{(0)}R^{(0)}. Then we reverse the order of the triangular factors to obtain the next iterate A^{(1)} = R^{(0)}L^{(0)}. This iterate is again decomposed as A^{(1)} = L^{(1)}R^{(1)}, and this process is continued until convergence.

In our case, because the original A is already defined as a product of multiple lower and upper triangular matrices, we can omit the first decomposition and write L^{(0)} and R^{(0)} as

L^{(0)} = L_1 ⋯ L_{m_L},  R^{(0)} = R_1 ⋯ R_{m_R}.  (3)

Furthermore, the decomposition A^{(1)} = L^{(1)}R^{(1)} can be performed stepwise, as the following example for m_L = m_R = 2 shows:

A^{(1)} = R_1^{(0)} R_2^{(0)} L_1^{(0)} L_2^{(0)}
        = R_1^{(0)} L_1^{(0,1)} R_2^{(0,1)} L_2^{(0)}
        = L_1^{(0,2)} R_1^{(0,1)} R_2^{(0,1)} L_2^{(0)}
        = L_1^{(0,2)} R_1^{(0,1)} L_2^{(0,1)} R_2^{(0,2)}
        = L_1^{(0,2)} L_2^{(0,2)} R_1^{(0,2)} R_2^{(0,2)} ≡ L_1^{(1)} L_2^{(1)} R_1^{(1)} R_2^{(1)}.

In summary, when the input matrix A is defined as a product of bidiagonal factors as in (2), one step of the LR algorithm can be performed as a sequence of m_L m_R LR transformations R_i L_j = L_j R_i for pairs of bidiagonal matrices, without forming A^{(1)} explicitly. Since each LR transformation can be done without subtractions by using the differential qd algorithm [1], one step of the LR algorithm can be carried out without subtractions. We call this the multiple dqd algorithm. The outline of this algorithm is shown below.

[Algorithm 1: The multiple dqd algorithm]
Initialization: q_{j,k}^{(0,0)} = q_{j,k} (1 ≤ j ≤ m_L, 1 ≤ k ≤ m),
                e_{i,k}^{(0,0)} = e_{i,k} (1 ≤ i ≤ m_R, 1 ≤ k ≤ m − 1)
1:  for n = 0, 1, … do
2:    for j = 1, …, m_L do
3:      for i = m_R, …, 1 do
4:        d_{j,1} = q_{j,1}^{(n, m_R−i)}
5:        for k = 1, …, m − 1 do
6:          q_{j,k}^{(n, m_R−i+1)} = d_{j,k} + e_{i,k}^{(n, j−1)}
7:          e_{i,k}^{(n, j)} = e_{i,k}^{(n, j−1)} · q_{j,k+1}^{(n, m_R−i)} / q_{j,k}^{(n, m_R−i+1)}
8:          d_{j,k+1} = d_{j,k} · q_{j,k+1}^{(n, m_R−i)} / q_{j,k}^{(n, m_R−i+1)}
9:        end for
10:       q_{j,m}^{(n, m_R−i+1)} = d_{j,m}
11:     end for
12:   end for
13:   q_{j,k}^{(n+1,0)} = q_{j,k}^{(n, m_R)} (1 ≤ j ≤ m_L, 1 ≤ k ≤ m)
14:   e_{i,k}^{(n+1,0)} = e_{i,k}^{(n, m_L)} (1 ≤ i ≤ m_R, 1 ≤ k ≤ m − 1)
15: end for
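Lines 4–10 of Algorithm 1 are the classical subtraction-free dqd transformation applied to a single bidiagonal pair. The sketch below (Python/NumPy; an illustration of ours, not the authors' Fortran code) implements this single-pair transformation together with the bidiagonal factors of (1):

```python
import numpy as np

def dqd_step(q, e):
    """One LR transformation R L = L_new R_new of differential qd type
    (lines 4-10 of Algorithm 1) for a single bidiagonal pair.
    Only additions, multiplications and divisions are used."""
    m = len(q)
    q_new = np.empty(m)
    e_new = np.empty(m - 1)
    d = q[0]                               # line 4
    for k in range(m - 1):
        q_new[k] = d + e[k]                # line 6
        e_new[k] = e[k]*q[k + 1]/q_new[k]  # line 7
        d = d*q[k + 1]/q_new[k]            # line 8
    q_new[m - 1] = d                       # line 10
    return q_new, e_new

def bidiag_L(q):
    # Lower bidiagonal factor with diagonal q and unit subdiagonal, as in (1).
    return np.diag(q) + np.diag(np.ones(len(q) - 1), -1)

def bidiag_R(e):
    # Unit upper bidiagonal factor with superdiagonal e, as in (1).
    return np.eye(len(e) + 1) + np.diag(e, 1)
```

Correctness can be checked by multiplying the factors back, since R L = L_new R_new characterizes the transformation exactly. With m_L = m_R = 1, repeatedly calling dqd_step realizes the LR iteration A^{(n+1)} = R^{(n)}L^{(n)}, and the q values converge to the eigenvalues of A = L R in descending order, in line with Theorem 1 below.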

3. Global convergence properties

In this section, we establish a theorem that guarantees global convergence of the multiple dqd algorithm. It is proved by extending a technique used in the proof of global convergence of the dqds algorithm [7].

Theorem 1  Let the eigenvalues of a TN matrix defined by (2) be λ_1 > λ_2 > ⋯ > λ_m > 0. When the multiple dqd algorithm is applied to this matrix, the following equalities hold:

lim_{n→∞} ∏_{j=1}^{m_L} q_{j,k}^{(n,0)} = λ_k  (1 ≤ k ≤ m),  (4)

lim_{n→∞} e_{i,k}^{(n,0)} = 0  (1 ≤ i ≤ m_R, 1 ≤ k ≤ m − 1),  (5)

lim_{n→∞} e_{i,k}^{(n+1,0)} / e_{i,k}^{(n,0)} = λ_{k+1}/λ_k  (1 ≤ k ≤ m − 1).  (6)

That is, each superdiagonal element e_{i,k}^{(n,0)} of R_i^{(n)} converges to zero linearly at a rate that depends only on k and not on i. The product of the k-th diagonal elements of L_1^{(n)}, …, L_{m_L}^{(n)} converges to the eigenvalue λ_k.

Proof  Algorithm 1 uses the differential qd transformation (lines 4 to 10) to implement the LR transformation

R_i^{(n,j−1)} L_j^{(n,m_R−i)} = L_j^{(n,m_R−i+1)} R_i^{(n,j)}.  (7)

However, for the purpose of the proof, it is more convenient to go back to the original equation (7). By comparing the diagonal elements of (7), we have

q_{j,k}^{(n,m_R−i+1)} = q_{j,k}^{(n,m_R−i)} − e_{i,k−1}^{(n,j)} + e_{i,k}^{(n,j−1)}  (8)

for 1 ≤ k ≤ m, with the boundary condition e_{i,0}^{(n,j)} = e_{i,m}^{(n,j)} = 0. By summing up (8) for 1 ≤ i ≤ m_R and 0 ≤ l ≤ n, and noting that q_{j,k}^{(n+1,0)} > 0, we have

0 < Σ_{l=0}^{n} Σ_{i=1}^{m_R} e_{i,k−1}^{(l,j)} < q_{j,k}^{(0,0)} + Σ_{l=0}^{n} Σ_{i=1}^{m_R} e_{i,k}^{(l,j−1)}.  (9)

Using (9) repeatedly, noting that e_{i,m}^{(l,j)} = 0 and taking the limit n → ∞ [7], we have Σ_{n=0}^{∞} Σ_{i=1}^{m_R} e_{i,k}^{(n,j)} < ∞ for 1 ≤ k ≤ m − 1. Hence e_{i,k}^{(n,j)} → 0, and (5) is proved. Thus R_1^{(n)}, …, R_{m_R}^{(n)} tend to identity matrices. From (8) and e_{i,k}^{(n,j)} → 0, we know that each q_{j,k}^{(n,i)} converges to a constant that depends neither on n nor on i. Consequently, A^{(n)} converges to a lower triangular matrix whose k-th diagonal element is ∏_{j=1}^{m_L} q_{j,k}^{(n,0)}, and these diagonal elements are in turn the eigenvalues of A. Using the same argument as in [7], we know that these eigenvalues appear in descending order of magnitude, as claimed by (4). Eq. (6) follows by multiplying line 7 of Algorithm 1 side by side from j = 1 to j = m_L and using (4). (QED)

4. Accuracy of the computed eigenvalues

Our next objective is to study the accuracy of the multiple dqd algorithm in finite precision arithmetic. To this end, we need to combine rounding error analysis with perturbation theory, as in the case of the dqds algorithm [1]. First, we quote a lemma concerning rounding errors resulting from a single LR transformation [1].

Lemma 2  Assume that {q̂_k}_{k=1}^{m} and {ê_k}_{k=1}^{m−1} are computed from {q_k}_{k=1}^{m} and {e_k}_{k=1}^{m−1} by the LR transformation RL = L̂R̂ of differential qd type using finite precision arithmetic. Then there exist {q̃_k}_{k=1}^{m}, {ẽ_k}_{k=1}^{m−1}, {q̄_k}_{k=1}^{m} and {ē_k}_{k=1}^{m−1} such that

• each q̃_k and ẽ_k differs from q_k and e_k by at most 3 ulps (units in the last place) and 1 ulp, respectively,

• each q̄_k and ē_k differs from q̂_k and ê_k by at most 2 ulps, respectively, and

• {q̄_k}_{k=1}^{m} and {ē_k}_{k=1}^{m−1} are computed from {q̃_k}_{k=1}^{m} and {ẽ_k}_{k=1}^{m−1} by an LR transformation in exact arithmetic.

As for the relative perturbation of the eigenvalues, Koev proves the following lemma [5].

Lemma 3  Let Ã be a matrix obtained by multiplying an arbitrary element of one of the bidiagonal factors on the right-hand side of (2) by 1 + ε, where |ε| ≪ 1. Denote the eigenvalue of Ã corresponding to λ_k by λ̃_k. Then the following bound holds:

|λ̃_k − λ_k| ≤ (2|ε|/(1 − 2|ε|)) λ_k  (1 ≤ k ≤ m).  (10)

By combining Lemmas 2 and 3, we can prove the following theorem.

Theorem 4  Assume that the multiple dqd algorithm is executed in finite precision arithmetic. Denote the matrix (defined implicitly as a product of bidiagonal factors) at the n-th step by A^{(n)} and its eigenvalues by λ_1^{(n)}, …, λ_m^{(n)}. Then for 1 ≤ k ≤ m,

|λ_k^{(n+1)} − λ_k^{(n)}| ≤ (16m m_L m_R u / (1 − 16m m_L m_R u)) λ_k^{(n)},  (11)

where u denotes the machine epsilon.

Proof  From Lemma 2, we can decompose each LR transformation (7) in one step of the multiple dqd algorithm into three (virtual) steps:

(a) Multiply each diagonal element of the matrix L_j^{(n,m_R−i)} by 1 + ε_k and each superdiagonal element of R_i^{(n,j−1)} by 1 + δ_k, where |ε_k| ≤ 3u and |δ_k| ≤ u, to obtain L̃_j and R̃_i.

(b) Perform an exact LR transformation to get L̄_j and R̄_i from L̃_j and R̃_i.

(c) Multiply each diagonal element of L̄_j by 1 + ε̄_k and each superdiagonal element of R̄_i by 1 + δ̄_k, where |ε̄_k| ≤ 2u and |δ̄_k| ≤ 2u, to obtain L_j^{(n,m_R−i+1)} and R_i^{(n,j)}.

We also recall a lemma from [8]: for positive integers m_1, m_2 and a sufficiently small positive number δ, if |δ_1| ≤ m_1δ/(1 − m_1δ) and |δ_2| ≤ m_2δ/(1 − m_2δ), then

|(1 + δ_1)(1 + δ_2) − 1| ≤ (m_1 + m_2)δ / (1 − (m_1 + m_2)δ).  (12)

Fig. 1. Convergence of e_{i,k}^{(n,0)}: semi-log plot of all e_{i,k}^{(n,0)} against the iteration number n (1 to 1000), decaying linearly to below 10^{−100}; the curves for e_{1,1}^{(n,0)}, e_{2,1}^{(n,0)} and e_{3,1}^{(n,0)} overlap.

Using Lemma 3 and (12) repeatedly, we find that the eigenvalues before and after step (a), which we denote by λ_k and λ̃_k respectively, are related as follows:

|λ̃_k − λ_k| ≤ (8mu/(1 − 8mu)) λ_k.  (13)

Clearly, step (b) does not change the eigenvalues. The eigenvalues after step (c), which we denote by λ̄_k, are related to λ̃_k as follows:

|λ̄_k − λ̃_k| ≤ (8mu/(1 − 8mu)) λ̃_k.  (14)

Combining (13) and (14) using (12) again, and noting that m_L m_R LR transformations are needed to complete one multiple dqd step, we arrive at (11). (QED)

Eq. (11) means that only a small relative error is introduced into each eigenvalue by one multiple dqd step. Consequently, by iterating the step until A^{(n)} becomes sufficiently close to a lower triangular matrix, all eigenvalues can be computed to high relative accuracy.

5. Numerical results

To confirm our analysis in the preceding sections, we performed preliminary numerical experiments. We set m = 10, m_L = 1 and m_R = 3, and set the values of e_{ik} and q_{ik} using random numbers in [0, 1]. The computation was done using Fortran in double-precision arithmetic. To check the accuracy of the computed eigenvalues, we also used Mathematica with 100 decimal digits, formed A = L_1 R_1 R_2 R_3 explicitly and computed its eigenvalues.

Fig. 1 shows the e_{i,k}^{(n,0)} as functions of n. Clearly, all of them converge to zero linearly. Though there are 27 of them, only 9 lines can be seen in Fig. 1. This is because the convergence rate of e_{i,k}^{(n,0)} depends only on k, so that the lines for e_{1,1}^{(n,0)}, e_{2,1}^{(n,0)} and e_{3,1}^{(n,0)}, for example, overlap. This is in accordance with Theorem 1. The eigenvalues computed by the multiple dqd algorithm, as well as those computed by Mathematica, are shown in Table 1. It is clear that all the eigenvalues are computed to high relative accuracy. This confirms the analysis in the previous section.


Table 1. Accuracy of the computed eigenvalues.

k    Eigenvalue (multiple dqd)     Eigenvalue (Mathematica)
1    4.91157038895942              4.91157038895942
2    4.39994910448026              4.39994910448027
3    2.78858778700010              2.78858778700009
4    1.70154451982839              1.70154451982839
5    1.54018002523324              1.54018002523323
6    7.87221915454847 × 10^{−1}    7.87221915454843 × 10^{−1}
7    5.05289068997342 × 10^{−1}    5.05289068997343 × 10^{−1}
8    2.99791761043647 × 10^{−1}    2.99791761043646 × 10^{−1}
9    2.55626925974812 × 10^{−2}    2.55626925974812 × 10^{−2}
10   1.24375120694785 × 10^{−4}    1.24375120694785 × 10^{−4}

6. Relationship with the dhLV algorithm

Let us consider the case of mL = 1 and mR = M, where M is some positive integer. In this case it is well known [9] that the eigenvalue problem of A is equivalent to the eigenvalue problem of an (M + 1)m × (M + 1)m matrix C defined by

    [                 L1 ]
    [ RM                 ]
C = [     ...            ]          (15)
    [         R2         ]
    [             R1     ]

that is, the (M + 1) × (M + 1) block matrix with R_M, . . . , R_1 on the first block subdiagonal and L_1 in the upper-right corner block.

More precisely, if λk is an eigenvalue of A, then (λk)^{1/(M+1)} exp(2πip/(M + 1)) (0 ≤ p ≤ M) are eigenvalues of C, and vice versa. Furthermore, express the row index i of C as i = (l − 1)m + b (1 ≤ l ≤ M + 1, 1 ≤ b ≤ m) and permute the ith row to the i′th row, where i′ = (b − 1)(M + 1) + l. Apply the same permutation also to the columns. This amounts to replacing C with F = PCP^T, where P is a permutation matrix, and is called shuffling [9]. It is easy to see that F is a matrix with only two nonzero diagonals as follows:

F_{i+1,i} = 1   (1 ≤ i ≤ (M + 1)m − 1),

F_{i,i+M} = F_{(b−1)(M+1)+l, b(M+1)+l−1}
          = { q_{1,b}       (l = 1, 1 ≤ b ≤ m),
            { e_{M+2−l,b}   (2 ≤ l ≤ M + 1, 1 ≤ b ≤ m − 1).

By rewriting F_{i,i+M} as U_i (1 ≤ i ≤ (m − 1)M + m), it can be seen that F is exactly the type of matrix for which the dhLV algorithm [4] has been designed. Thus we can say that the class of matrices whose eigenvalues can be computed by the dhLV algorithm is a subset of the class of matrices whose eigenvalues can be computed accurately by the multiple dqd algorithm.
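The index calculation behind the shuffling can be checked directly. The snippet below (an illustration, not code from the paper) verifies that i = (l − 1)m + b ↦ i′ = (b − 1)(M + 1) + l is a bijection of {1, . . . , (M + 1)m}, here with m = 10 and M = 3.

```python
import numpy as np

def shuffle_perm(m, M):
    # perm[i-1] = i'-1 for the shuffling i = (l-1)*m + b -> i' = (b-1)*(M+1) + l
    perm = np.empty((M + 1) * m, dtype=int)
    for l in range(1, M + 2):
        for b in range(1, m + 1):
            i = (l - 1) * m + b
            i_new = (b - 1) * (M + 1) + l
            perm[i - 1] = i_new - 1
    return perm

perm = shuffle_perm(10, 3)
assert np.array_equal(np.sort(perm), np.arange(40))   # a bijection
# F = P C P^T is then C with rows and columns reordered according to perm.
```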

7. Related work

It is widely recognized that by representing a TN matrix as a product of positive bidiagonal factors, various linear algebra operations can be performed without subtractions [10, 11]. Using this fact, several highly accurate algorithms have been proposed for linear simultaneous equations [11], eigenvalue problems [5] and singular value problems [12] with TN coefficient matrices.

Among them, Koev's algorithm [5] can compute all the eigenvalues of a general TN matrix to high relative accuracy. It first reduces a TN matrix in factored form to a product of a lower bidiagonal matrix and an upper bidiagonal matrix using subtraction-free operations and then computes the eigenvalues of the resulting matrix with the dqds algorithm. In this approach, the reduction phase requires O(m^3) flops. In contrast, the multiple dqd algorithm applies the LR transformation directly to a matrix represented by (2). The computational work of one LR transformation is O(m mL mR). Hence the latter approach may be advantageous when mL, mR ≪ m and only a small number of eigenvalues of the smallest magnitude are required.

8. Conclusion

In this paper, we studied convergence properties and numerical properties of the differential qd algorithm for totally nonnegative band matrices. Our analysis shows that the algorithm is globally convergent and can compute all eigenvalues to high relative accuracy. These properties were confirmed by numerical experiments. Our future work includes introducing origin shifts into this algorithm to speed up the convergence. It is also a subject of our future research to further investigate the relationship between the multiple dqd algorithm and the dhLV algorithm.

Acknowledgements

The authors would like to thank Prof. Masashi Iwasaki, Prof. Satoshi Tsujimoto, Prof. Yoshimasa Nakamura, Ms. Akiko Fukuda and Mr. Kensuke Aishima for valuable discussions. This work is partially supported by the Ministry of Education, Science, Sports and Culture, Grant-in-Aid for Scientific Research.

References

[1] K. V. Fernando and B. N. Parlett, Accurate singular values and differential qd algorithms, Numer. Math., 67 (1994), 191–229.

[2] I. S. Dhillon and B. N. Parlett, Multiple representations to compute orthogonal eigenvectors of symmetric tridiagonal matrices, Lin. Alg. Appl., 387 (2004), 1–28.

[3] T. Ando, Totally positive matrices, Lin. Alg. Appl., 90 (1987), 165–219.

[4] A. Fukuda, E. Ishiwata, M. Iwasaki and Y. Nakamura, The discrete hungry Lotka-Volterra system and a new algorithm for computing matrix eigenvalues, Inverse Problems, 25 (2009), 015007.

[5] P. Koev, Accurate eigenvalues and SVDs of totally nonnegative matrices, SIAM J. Matrix Anal. Appl., 27 (2005), 1–23.

[6] J. H. Wilkinson, The Algebraic Eigenvalue Problem, Clarendon Press, Oxford, 1965.

[7] K. Aishima, T. Matsuo, K. Murota and M. Sugihara, On convergence of the dqds algorithm for singular value computation, SIAM J. Matrix Anal. Appl., 30 (2008), 522–537.

[8] N. J. Higham, Accuracy and Stability of Numerical Algorithms, SIAM, Philadelphia, 1996.

[9] D. S. Watkins, Product eigenvalue problems, SIAM Review, 47 (2005), 3–40.

[10] A. Berenstein, S. Fomin and A. Zelevinsky, Parametrizations of canonical bases and totally positive matrices, Adv. Math., 122 (1996), 49–149.

[11] M. Gasca and J. M. Pena, Total positivity and Neville elimination, Lin. Alg. Appl., 165 (1992), 25–44.

[12] J. Demmel, Accurate singular value decomposition of structured matrices, SIAM J. Matrix Anal. Appl., 21 (1999), 562–580.


JSIAM Letters Vol.1 (2009) pp.60–63 © 2009 Japan Society for Industrial and Applied Mathematics

Algorithm for computing Kronecker basis

Yoshiaki Kakinuma1, Kazuyuki Hiraoka1, Hiroki Hashiguchi1, Yutaka Kuwajima1 and Takaomi Shigehara1

1 Graduate School of Science and Engineering, Saitama University, 255 Shimo-Okubo, Sakura-ku, Saitama City, Saitama 338-8570, Japan

E-mail [email protected]

Received March 18, 2009, Accepted September 16, 2009

Abstract

To clarify the geometrical structure of an arbitrarily given pencil, it is crucial to understand the Kronecker structure of the pencil. For this purpose, GUPTRI is the only practical numerical algorithm at present. However, although GUPTRI determines the Kronecker canonical form (KCF), it does not give any direct information on Kronecker bases (KB). In this paper, we propose a numerical algorithm which gives full information on the Kronecker structure, including a KB as well as the KCF. The main ingredient of the algorithm is the singular value decomposition, which guarantees the backward stability of the algorithm.

Keywords pencil, Kronecker canonical form, Kronecker basis

Research Activity Group Algorithms for Matrix / Eigenvalue Problems and their Applications

1. Introduction

Let (f, g)V,W be a pencil, namely a pair of linear mappings f, g between two finite-dimensional linear spaces V and W over C. It is known [1] that for an arbitrary (f, g)V,W, there exists a Kronecker basis (KB), namely a pair of bases ⟨xj⟩, ⟨yj⟩ of V and W composed of sequences, each of which satisfies one of the five diagrams

(R)  0 ←(f−µg)− x1 −(g)→ y1 ←(f−µg)− x2 −(g)→ · · · ←(f−µg)− xl −(g)→ yl,

(S1) 0 ←(f)− x1 −(g)→ y1 ←(f)− x2 −(g)→ · · · ←(f)− xl −(g)→ 0,

(S2) 0 ←(f)− x1 −(g)→ y1 ←(f)− x2 −(g)→ · · · ←(f)− xl −(g)→ yl,

(S3) y0 ←(f)− x1 −(g)→ y1 ←(f)− x2 −(g)→ · · · ←(f)− xl −(g)→ 0,

(S4) y0 ←(f)− x1 −(g)→ y1 ←(f)− x2 −(g)→ · · · ←(f)− xl−1 −(g)→ yl−1,

where µ is a nonzero constant associated to the R sequence and l ≥ 1 is the length of each sequence. Matrix representations of R, S1, S2, S3 and S4 sequences of length l lead to the Kronecker blocks Jl(µ), L_{l−1}, Jl(0), Nl and L^T_{l−1} in the Kronecker canonical form (KCF) for (f, g)V,W in the standard notation. If V = W and g = i (identity transformation), a KB is just a Jordan basis (JB) of V, composed of only R and S2 sequences. In this special case, the constant µ corresponds to a nonzero eigenvalue of f. For a general case, we will show later in this paper that µ corresponds to a nonzero eigenvalue associated to a regular linear transformation g_s^{−1} f_s naturally induced from the original pencil (f, g)V,W.

At present, the most reliable and the only practical numerical algorithm to compute the KCF for an arbitrary pencil is GUPTRI [2, 3]. However, it cannot give any direct information on KBs for the pencil. In this paper, we propose a novel numerical algorithm to compute a KB as well as the KCF for an arbitrary pencil under the premise that the eigenvalues of the linear transformation g_s^{−1} f_s are separately computed. The algorithm is based on a recently found constructive proof for the existence of a KB which reveals a multilayered geometrical structure inherent in pencils [4]. After outlining theoretical issues, we describe the algorithm in detail. Numerical examples to test the numerical accuracy of the algorithm are also reported.

The paper is organized as follows. In Section 2, we illustrate the essentials of [4] through a simple but generic example, which serves to understand the subsequent sections. On the basis of Section 2, the algorithm for computing a KB is presented in Section 3 in a form without relying on matrix representations; thereby it is described in a basis-independent, unique form. After a short discussion on a possible matrix representation in Section 4, numerical examples are shown to confirm the numerical accuracy of the algorithm in Section 5.

2. Sketch of theoretical aspects

Hereafter, we assume that V and W are unitary spaces over C. For a linear mapping h, in general, denote the kernel, the image and the adjoint mapping of h by N(h), R(h) and h∗, respectively.

Definition 1 For a pencil (f, g)V,W, define a pencil (f′, g′)V′,W′ by

V′ ≡ R(f) ∩ R(g) ⊂ W,   W′ ≡ R(f∗) ∩ R(g∗) ⊂ V,
f′ ≡ i∗_{R(g∗)←W′} d_g^{−1} i_{R(g)←V′},
g′ ≡ i∗_{R(f∗)←W′} d_f^{−1} i_{R(f)←V′},        (1)

where d_h : R(h∗) → R(h) is the restriction of h to R(h∗) for each h = f, g, and i_{U1←U2} is the inclusion from a subspace U2 of U1 to U1 in general.

Note that the operation of the adjoint mapping i∗_{U1←U2} on U1 is the orthogonal projection from U1 to U2. The assertion below represents the importance of the pencil (f′, g′)V′,W′.


Assertion 2 Every R sequence in a KB for (f, g)V,W is obtained by lifting a one-to-one corresponding R sequence with the same µ and l in a KB for (f′, g′)V′,W′, while every Si sequence with length l ≥ 2 in a KB for (f, g)V,W is obtained by lifting a one-to-one corresponding Si sequence with length l − 1 in a KB for (f′, g′)V′,W′ (i = 1, . . . , 4). Supplying Si sequences of length 1 (i = 1, . . . , 4), we can construct a KB for (f, g)V,W from a KB for (f′, g′)V′,W′.

To confirm this, Theorem 3 plays a crucial role.

Theorem 3 (i) Let x ∈ V and p be the orthogonal projection from V to W′. Then we have g′(f(x)) = p(x) if f(x) ∈ V′. Similarly, f′(g(x)) = p(x) if g(x) ∈ V′.
(ii) (a) and (b) are equivalent for y1, y2 ∈ V′: (a) there exists x ∈ V such that y1 = f(x) and y2 = g(x); (b) f′(y2) = g′(y1).

To illustrate Assertion 2, consider a simple but generic example. Suppose that dim V = 9, dim W = 8, dim V′ = 5, dim N(f) = dim N(g) = 3, dim(N(f) + N(g)) = 5 and that (f′, g′)V′,W′ has a KB composed of three sequences;

1') 0 ←(f′−µg′)− y1;1 −(g′)→ z1;1 ←(f′−µg′)− y1;2 −(g′)→ z1;2   (µ ≠ 0),

2') 0 ←(f′)− y2;1 −(g′)→ z2;1,

3') 0 ←(f′)− y3;1 −(g′)→ z3;1 ←(f′)− y3;2 −(g′)→ 0.

Note that the assumption leads to dim W′ = 4, dim(N(f) ∩ N(g)) = 1, dim R(f) = dim R(g) = 6 and dim(R(f) + R(g)) = 7. In this setting, we can find a KB for (f, g)V,W composed of six sequences;

1) 0 ←(f−µg)− x1;1 −(g)→ y1;1 ←(f−µg)− x1;2 −(g)→ y1;2,

2) 0 ←(f)− x2;1 −(g)→ y2;1 ←(f)− x2;2 −(g)→ y2;2,

3) 0 ←(f)− x3;1 −(g)→ y3;1 ←(f)− x3;2 −(g)→ y3;2 ←(f)− x3;3 −(g)→ 0,

4) 0 ←(f)− x4;1 −(g)→ 0,

5) y5;0 ←(f)− x5;1 −(g)→ 0,

6) y6;0 ∈ N(f∗) ∩ N(g∗).

To see this, we need three steps.

(I) Existence of x1;1, x1;2, x2;1, x2;2, x3;1, x3;2, x3;3 ∈ V, y2;2 ∈ W in 1)–3): By Theorem 3 (ii), there exist x2;1 in 2) and x3;1, x3;2, x3;3 in 3). Since y2;1 ∈ V′ in 2), there exists x2;2 such that y2;1 = f(x2;2). We set y2;2 = g(x2;2). Now we show the existence of x1;1, x1;2 in 1). The sequence 1') indicates

f′(y1;1) = µz1;1 = g′(µy1;1),
f′(y1;2) = µz1;2 + z1;1 = g′(µy1;2 + y1;1).

Hence, by Theorem 3 (ii), there exist x1;1, x1;2 such that

f(x1;1) = µy1;1,   g(x1;1) = y1;1,
f(x1;2) = µy1;2 + y1;1,   g(x1;2) = y1;2,

and these vectors satisfy the diagram 1).

(II) Construction of the basis of V: By the construction of x1;1, x1;2, x2;2, x3;2 and by Theorem 3 (i), we have

p(x1;1) = µz1;1,   p(x1;2) = µz1;2 + z1;1,
p(x2;2) = z2;1,    p(x3;2) = z3;1.

By the assumption, the four vectors on the right-hand side are a basis of W′ since µ ≠ 0. Since W′⊥ = N(f) + N(g), we confirm that x1;1, x1;2, x2;2,

Table 1. Properties of the KB for (f, g)V,W.

            ∈ N(g)         ∉ N(g)
∈ N(f)      x4;1           x2;1, x3;1
∉ N(f)      x3;3, x5;1     x1;1, x1;2, x2;2, x3;2

            ∈ R(g)                           ∉ R(g)
∈ R(f)      y1;1, y1;2, y2;1, y3;1, y3;2     y5;0
∉ R(f)      y2;2                             y6;0

x3;2 are a basis of a complementary space of N(f) + N(g) in V. Thus we can construct a basis of V by appending a basis of N(f) + N(g) to this. The vectors x2;1 in 2) and x3;1 in 3) belong to N(f) by construction. Furthermore, y2;1 = g(x2;1), y3;1 = g(x3;1) are a basis of N(f′). Thus we confirm that x2;1, x3;1 are a basis of a complementary space of N(f) ∩ N(g) in N(f), since dim N(f) = 3 and dim(N(f) ∩ N(g)) = 1. Similarly, we confirm x3;3 ∈ N(g) − N(f) ∩ N(g). By taking x4;1 ∈ N(f) ∩ N(g) (x4;1 ≠ 0), x3;3, x4;1 are a basis of a two-dimensional subspace of N(g). Furthermore, the subspace includes N(f) ∩ N(g). Hence, since dim N(g) = 3, we have a basis x3;3, x4;1, x5;1 of N(g) by appending x5;1 ∈ N(g) − N(f) ∩ N(g). Now we have a basis x1;1, x1;2, x2;1, x2;2, x3;1, x3;2, x3;3, x4;1, x5;1 of V which satisfies 1)–5) and the upper table in Table 1.

(III) Construction of the basis of W: Set y5;0 = f(x5;1). By construction of a basis of V in (II), the images of the six vectors x1;1, x1;2, x2;2, x3;2, x3;3, x5;1 ∈ V − N(f) by f, namely

f(x1;1) = µy1;1, f(x1;2) = µy1;2 + y1;1,
f(x2;2) = y2;1, f(x3;2) = y3;1,
f(x3;3) = y3;2, f(x5;1) = y5;0,

are a basis of R(f). Similarly, the images of the six vectors x1;1, x1;2, x2;1, x2;2, x3;1, x3;2 ∈ V − N(g) by g, namely

g(x1;1) = y1;1, g(x1;2) = y1;2, g(x2;1) = y2;1,
g(x2;2) = y2;2, g(x3;1) = y3;1, g(x3;2) = y3;2,

are a basis of R(g). Recalling that the five vectors y1;1, y1;2, y2;1, y3;1, y3;2 are a basis of V′ = R(f) ∩ R(g), we confirm that y5;0 ∈ R(f) − R(f) ∩ R(g), y2;2 ∈ R(g) − R(f) ∩ R(g) and that the seven vectors y1;1, y1;2, y2;1, y2;2, y3;1, y3;2, y5;0 are a basis of R(f) + R(g). Since dim W = 8, we have a basis of W by appending y6;0 ∈ (R(f) + R(g))⊥ = N(f∗) ∩ N(g∗) to this. Now we have a basis y1;1, y1;2, y2;1, y2;2, y3;1, y3;2, y5;0, y6;0 of W which satisfies 1)–6) and the lower table in Table 1.

By definition of (f′, g′)V′,W′, if and only if both f and g are bijective, we have V′ = W, W′ = V, leading to f′ = g^{−1}, g′ = f^{−1}. Otherwise, we have either dim V′ < dim W or dim W′ < dim V. Since V, W are finite-dimensional, by iterating the procedure to construct (fj, gj)Vj,Wj ≡ (f′_{j−1}, g′_{j−1})V′_{j−1},W′_{j−1} (j = 1, 2, . . .) with the initial pencil (f0, g0)V0,W0 ≡ (f, g)V,W several times (say s), we reach a pencil (fs, gs)Vs,Ws where both fs and gs are bijective. For this pencil, g_s^{−1} f_s is a regular


linear transformation on Vs, which has a JB of Vs. The JB immediately gives a KB for (fs, gs)Vs,Ws, composed of only R sequences. By applying the above process to this successively, we can construct a KB for the original pencil (f, g)V,W. Note that, if we know all the eigenvalues µ of g_s^{−1} f_s separately, a JB for g_s^{−1} f_s is obtained within the present framework, by finding S2 sequences for the pencil (fs − µgs, gs)Vs,Ws for each eigenvalue µ of g_s^{−1} f_s.

3. KB algorithm

The algorithm below computes a KB for (fj, gj)Vj,Wj (j = s, s−1, . . . , 1, 0) successively; the sequences in the five sets Rj, S1;j, S2;j, S3;j and S4;j supply the R, S1, S2, S3 and S4 sequences in the KB for (fj, gj)Vj,Wj, respectively. Hereafter, denote a sequence with the property

y0 ←(fj−µgj)− x1 −(gj)→ y1 ←(fj−µgj)− x2 −(gj)→ · · · ←(fj−µgj)− xl −(gj)→ yl

by cj(µ) ≡ (y0, x1, y1, . . . , xl, yl). For µ = 0, cj(0) is simply abbreviated to cj.

KB Algorithm (KBA)

1. Define a series of pencils (fj, gj)Vj,Wj (j = 0, 1, . . . , s) recursively by (fj, gj)Vj,Wj ≡ (f′_{j−1}, g′_{j−1})V′_{j−1},W′_{j−1} (j = 1, 2, . . . , s) from (f0, g0)V0,W0 ≡ (f, g)V,W, where s is the minimum integer such that both fs and gs are bijective.

2. If dim Vs = 0, set Rs = ∅. Otherwise, define the set Rs of R sequences for (fs, gs)Vs,Ws such that the sequences in Rs give a KB for (fs, gs)Vs,Ws. (See the last part of Section 2.) Set S1;s = S2;s = S3;s = S4;s = ∅.

3. Repeat the steps (a)–(f) for j = s, . . . , 1;

(a) If Rj = ∅, set Rj−1 = ∅. Otherwise, find the set Rj−1 from Rj as follows; for each cj(µ) = (z0 = 0, y1, z1, . . . , yl, zl) ∈ Rj, find a solution xk for each linear system

f_{j−1}(xk) = µyk + y_{k−1},   g_{j−1}(xk) = yk   (k = 1, . . . , l),   y0 ≡ 0.

Define LiftR(cj(µ)) ≡ (y0, x1, y1, . . . , xl, yl) and set Rj−1 = {LiftR(cj(µ)) | cj(µ) ∈ Rj}.

(b) Repeat the following procedure for i = 1, . . . , 4. If Si;j = ∅, set Si;j−1 = ∅. Otherwise, find the set Si;j−1 from Si;j as follows; for each cj = (z0, y1, z1, . . . , yl, zl) except cj = (z) ∈ S4;j (see Table 2 for z0, zl), find a solution xk for each linear system

f_{j−1}(xk) = y_{k−1},   g_{j−1}(xk) = yk   (k = 1, . . . , l + 1)

with y0, y_{l+1} in Table 2. Define LiftS(cj) ≡ (y0, x1, y1, . . . , xl, yl, x_{l+1}, y_{l+1}). For cj = (z) ∈ S4;j, define LiftS(cj) ≡ (f_{j−1}(z), z, g_{j−1}(z)). Finally, set Si;j−1 = {LiftS(cj) | cj ∈ Si;j}.

(c) Set S1;j−1 = S1;j−1 ∪ {cj−1 = (0, xt, 0) | t = 1, . . . , q1} with a basis x1, . . . , x_{q1} of N(f_{j−1}) ∩ N(g_{j−1}).

(d) Set S2;j−1 = S2;j−1 ∪ {cj−1 = (0, ut, g_{j−1}(ut)) | t = 1, . . . , q2}, where u1, . . . , u_{q2} are chosen such that the set {u1, . . . , u_{q2}} ∪ {x1 | cj−1 = (0, x1, y1, . . . , xl, yl) ∈ S1;j−1 ∪ S2;j−1} is a basis of N(f_{j−1}).

(e) Set S3;j−1 = S3;j−1 ∪ {cj−1 = (f_{j−1}(ut), ut, 0) | t = 1, . . . , q3}, where u1, . . . , u_{q3} are chosen such that the set {u1, . . . , u_{q3}} ∪ {xl | cj−1 = (y0, x1, y1, . . . , xl, 0) ∈ S1;j−1 ∪ S3;j−1} is a basis of N(g_{j−1}).

(f) Set S4;j−1 = S4;j−1 ∪ {(yt) | t = 1, . . . , q4} with a basis y1, . . . , y_{q4} of N(f∗_{j−1}) ∩ N(g∗_{j−1}).

Table 2. Property of z0, zl and definition of y0, y_{l+1}.

            z0          zl          y0              y_{l+1}
cj ∈ S1;j   0           0           0               0
cj ∈ S2;j   0           gj(yl)      0               g_{j−1}(x_{l+1})
cj ∈ S3;j   fj(y1)      0           f_{j−1}(x1)     0
cj ∈ S4;j   fj(y1)      gj(yl)      f_{j−1}(x1)     g_{j−1}(x_{l+1})

4. Matrix representation

We consider a possible matrix representation of the two core procedures required in KBA. One is to compute (f′, g′)V′,W′ from (f, g)V,W in step 1. The other is to solve the linear systems in steps (a) and (b) of step 3.

For an m × n matrix pencil (F, G)V,W with V = C^n, W = C^m, we can construct the m′ × n′ pencil (F′, G′)V′,W′ in three steps, where V′ = R(F) ∩ R(G) (n′ = dim V′) and W′ = R(F∗) ∩ R(G∗) (m′ = dim W′). Set rH = rank H for each H = F, G.

1. Calculate the singular value decomposition (SVD) for each H = F, G;

H = I_{W←R(H)} D_H I∗_{V←R(H∗)},

where D_H ∈ C^{rH×rH} is a diagonal matrix with the nonzero singular values of H as diagonal entries, and I_{V←R(H∗)} ∈ C^{n×rH} and I_{W←R(H)} ∈ C^{m×rH} are column-orthogonal matrices. Note that the column vectors of I_{V←R(H∗)} and I_{W←R(H)} are right and left singular vectors associated to the nonzero singular values of H, respectively.

2. For each (F̃, G̃, m̃, W̃, Ṽ′) = (F, G, m, W, V′), (F∗, G∗, n, V, W′), calculate a basis {c1, . . . , cr} of the kernel of (I_{W̃←R(F̃)}, −I_{W̃←R(G̃)}) ∈ C^{m̃×(rF+rG)}, and define I_{R(F̃)←Ṽ′} ∈ C^{rF×r} and I_{R(G̃)←Ṽ′} ∈ C^{rG×r} by setting

( I_{R(F̃)←Ṽ′} )
( I_{R(G̃)←Ṽ′} )  =  (c1, . . . , cr) ∈ C^{(rF+rG)×r}.

3. Finally, calculate the matrix products

F′ = I∗_{R(G∗)←W′} D_G^{−1} I_{R(G)←V′} ∈ C^{m′×n′},
G′ = I∗_{R(F∗)←W′} D_F^{−1} I_{R(F)←V′} ∈ C^{m′×n′}.

Note that since

I_{W←V′} ≡ I_{W←R(F)} I_{R(F)←V′} = I_{W←R(G)} I_{R(G)←V′} ∈ C^{m×r}

by construction in step 2, the column vectors of I_{W←V′} are a basis of V′ ⊂ W and in particular, we confirm r = dim V′. The computation of {c1, . . . , cr} is also carried out by SVD.
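Steps 1–3 above can be sketched with NumPy's SVD as follows. This is an illustrative implementation: the function names, the tolerance tol (playing the role of a cut-off parameter), and the normalization of the kernel basis are our choices, not the authors'.

```python
import numpy as np

def range_factors(H, tol=1e-10):
    # SVD H = I_{W<-R(H)} D_H I*_{V<-R(H*)}: column-orthogonal factors
    # and the nonzero singular values (step 1).
    U, s, Vh = np.linalg.svd(H, full_matrices=False)
    r = int(np.sum(s > tol * s[0]))
    return U[:, :r], s[:r], Vh[:r].conj().T

def common_range(A, B, tol=1e-10):
    # Orthonormal basis (c_1, ..., c_r) of the kernel of (A, -B),
    # split into its top and bottom coordinate blocks (step 2).
    K = np.hstack([A, -B])
    _, s, Vh = np.linalg.svd(K)
    s = np.concatenate([s, np.zeros(K.shape[1] - s.size)])
    C = Vh[s <= tol * s[0]].conj().T
    return C[: A.shape[1]], C[A.shape[1]:]

def pencil_prime(F, G, tol=1e-10):
    # One reduction step (F, G) -> (F', G') following steps 1-3.
    UF, sF, VF = range_factors(F, tol)
    UG, sG, VG = range_factors(G, tol)
    IF_V, IG_V = common_range(UF, UG, tol)    # coordinates of V' = R(F) cap R(G)
    IFs_W, IGs_W = common_range(VF, VG, tol)  # coordinates of W' = R(F*) cap R(G*)
    Fp = IGs_W.conj().T @ (IG_V / sG[:, None])  # I*_{R(G*)<-W'} D_G^{-1} I_{R(G)<-V'}
    Gp = IFs_W.conj().T @ (IF_V / sF[:, None])  # I*_{R(F*)<-W'} D_F^{-1} I_{R(F)<-V'}
    return Fp, Gp

rng = np.random.default_rng(1)
F = rng.standard_normal((5, 4))
G = rng.standard_normal((5, 4))
Fp, Gp = pencil_prime(F, G)
```

For a generic 5 × 4 pencil, dim V′ = rF + rG − m = 3 and dim W′ = 4, so (F′, G′) is a 4 × 3 pencil, one size smaller, in line with the iteration of Section 2.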

Let yf, yg ∈ V′. The linear system for x ∈ V

[ F ]       [ yf ]
[ G ] x  =  [ yg ]        (2)


is written as

[ I_{W←R(F)} D_F I∗_{V←R(F∗)} ]       [ I_{W←V′} y′f ]
[ I_{W←R(G)} D_G I∗_{V←R(G∗)} ] x  =  [ I_{W←V′} y′g ] ,

where y′f, y′g ∈ C^{n′} are the coordinate vectors of yf, yg with respect to the basis of V′ determined by the column vectors of I_{W←V′}. Thus the linear system in (2) is equivalent to

[ D_F I∗_{V←R(F∗)} ]       [ I_{R(F)←V′} y′f ]
[ D_G I∗_{V←R(G∗)} ] x  =  [ I_{R(G)←V′} y′g ] .

In finite-precision computation, the equation might be overdetermined in general. A possible numerical solver is the least squares method, where the SVD (Moore-Penrose inverse) plays a crucial role.
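As a small illustration (with hypothetical data, not taken from the paper), the stacked system can be solved through the Moore-Penrose inverse, where rcond plays the role of a cut-off parameter:

```python
import numpy as np

rng = np.random.default_rng(0)
F = rng.standard_normal((3, 4))
G = rng.standard_normal((3, 4))
x_true = rng.standard_normal(4)
yf, yg = F @ x_true, G @ x_true        # a consistent right-hand side

A = np.vstack([F, G])                  # the stacked matrix (F; G)
b = np.concatenate([yf, yg])
x = np.linalg.pinv(A, rcond=1e-8) @ b  # minimum-norm least-squares solution
```

Since (F; G) here has full column rank and the right-hand side is consistent, the least-squares solution recovers x_true; in KBA the right-hand sides are only approximately consistent, which is why the least-squares formulation is used.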

5. Numerical experiment

Numerical computation is carried out in double-precision arithmetic. As described in the previous section, the main ingredient of KBA is the SVD. To keep numerical accuracy, a cut-off parameter ε is required for removing small singular values whose size relative to the maximum singular value is less than ε. At each stage involved in (Fj, Gj)Vj,Wj (j = 1, . . . , s), we introduce two parameters: εj;1 for computing the SVDs of Fj, Gj and the kernels, and εj;2 for solving the linear systems. For the moment, we use a common cut-off parameter εj;1 = εj;2 = 10^−8 (j = 1, . . . , s). This value is adopted as the default value of the cut-off parameter EPSU for SVD in the double-precision GUPTRI routine in LAPACK.

To confirm the numerical accuracy, we examine the maximum relative error involved in the sequences in the KB;

EK ≡ max_{c∈K} { ‖FXc − µYc;g − Yc;f‖∞ / ‖FXc‖∞ , ‖GXc − Yc;g‖∞ / ‖GXc‖∞ }.

Here K ≡ R0 ∪ (∪_{i=1}^{4} Si;0) is the output of KBA, namely the set of the sequences giving rise to a KB for the input pencil (F, G)V,W. For each c ≡ c0(µ) = (y0, x1, y1, . . . , xl, yl) ∈ K, we set Xc = (x1, . . . , xl), Yc;f = (y0, . . . , yl−1) and Yc;g = (y1, . . . , yl). Note that FXc − µYc;g − Yc;f = GXc − Yc;g = O in infinite-precision computation.
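The residual of a single sequence can be evaluated as below (a sketch; the sequence container and the function name are our own). The toy check uses G = I, for which an R sequence is just a Jordan chain of F.

```python
import numpy as np

def seq_error(F, G, c, mu=0.0):
    # c = (y0, x1, y1, ..., xl, yl); relative residuals in the infinity
    # norm, following the definition of E_K.
    ys, xs = c[0::2], c[1::2]
    X = np.column_stack(xs)
    Yf = np.column_stack(ys[:-1])   # (y0, ..., y_{l-1})
    Yg = np.column_stack(ys[1:])    # (y1, ..., y_l)
    rf = np.linalg.norm(F @ X - mu * Yg - Yf, np.inf) / np.linalg.norm(F @ X, np.inf)
    rg = np.linalg.norm(G @ X - Yg, np.inf) / np.linalg.norm(G @ X, np.inf)
    return max(rf, rg)

# Exact R sequence for G = I: F x1 = mu x1, F x2 = mu x2 + x1, y_k = x_k.
mu = 2.0
F = np.array([[mu, 1.0], [0.0, mu]])
G = np.eye(2)
e1, e2 = np.eye(2)
c = (np.zeros(2), e1, e1, e2, e2)
err = seq_error(F, G, c, mu)   # exactly zero for this chain
```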

We examine two types of test matrices;

Type-A (generic): m × n matrices F, G with random integers m, n ∈ [100, 110] (m ≠ n) and random numbers in the range [−1, 1] for the elements.

Type-B (non-generic): F − λG = PK(λ)Q^{−1}, where P, Q are invertible matrices with random numbers in the range [−1, 1] for the elements, and

K(λ) = ( ⊕_{k1=1}^{n1} J_{l_{k1}}(µ_{k1}) ) ⊕ ( ⊕_{k2=1}^{n2} L_{l_{k2}} ) ⊕ ( ⊕_{k3=1}^{n3} J_{l_{k3}}(0) ) ⊕ ( ⊕_{k4=1}^{n4} N_{l_{k4}} ) ⊕ ( ⊕_{k5=1}^{n5} L^T_{l_{k5}} )

is a KCF with random integers nj ∈ [1, 5] (j = 1, . . . , 5), random integers l_{kj} ∈ [1, 5] (kj = 1, . . . , nj; j = 1, . . . , 5) and random numbers µ_{k1} ∈ (0, 10] (k1 = 1, . . . , n1).
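For reference, the Kronecker blocks entering K(λ) can be generated as coefficient pairs (A, B) of the λ-matrix A − λB; the sign convention for the singular blocks varies in the literature, so the builders below are one illustrative choice rather than the paper's exact construction.

```python
import numpy as np

def nil(l):                       # l x l nilpotent Jordan block N
    return np.diag(np.ones(l - 1), 1) if l > 1 else np.zeros((1, 1))

def J(l, mu):                     # J_l(mu): (mu*I + N) - lambda*I
    return mu * np.eye(l) + nil(l), np.eye(l)

def Nblk(l):                      # N_l: I - lambda*N (infinite eigenvalue)
    return np.eye(l), nil(l)

def L(l):                         # L_l: l x (l+1) singular block
    A = np.zeros((l, l + 1)); B = np.zeros((l, l + 1))
    A[np.arange(l), np.arange(1, l + 1)] = 1.0
    B[np.arange(l), np.arange(l)] = 1.0
    return A, B

def LT(l):                        # L_l^T: (l+1) x l singular block
    A, B = L(l)
    return A.T, B.T

# Structural sanity checks:
A, B = J(3, 0.7); assert np.linalg.matrix_rank(A - 0.7 * B) == 2  # rank drop at mu
A, B = Nblk(2);   assert abs(np.linalg.det(A - 5.0 * B)) > 0.5    # regular for all lambda
A, B = L(2);      assert np.linalg.matrix_rank(A - 1.3 * B) == 2  # full row rank
```

A Type-B test pencil is then assembled by placing such random blocks on a block diagonal and conjugating with random invertible P, Q.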

Table 3. Distribution of EK.

                                frequency
relative error              Type-A    Type-B
10^−2 < EK                       0        14
10^−4 < EK ≤ 10^−2               0         0
10^−6 < EK ≤ 10^−4               0         6
10^−8 < EK ≤ 10^−6               0         3
10^−10 < EK ≤ 10^−8              0        33
10^−12 < EK ≤ 10^−10             0       411
10^−14 < EK ≤ 10^−12            72       533
EK ≤ 10^−14                    928         0

As is known for non-square m × n generic pencils (d ≡ |n − m| ≠ 0), for n − m > 0 we have K(λ) = (⊕_{k=1}^{s} L_l) ⊕ (⊕_{k′=1}^{s′} L_{l+1}) with l = [m/d], s′ = m − dl, s = d − s′, while for m − n > 0, K(λ) = (⊕_{k=1}^{s} L^T_l) ⊕ (⊕_{k′=1}^{s′} L^T_{l+1}) with l = [n/d], s′ = n − dl, s = d − s′. Type A is expected to simulate a generic case. Meanwhile, Type B has a non-trivial general Kronecker structure by construction. The middle (right) column in Table 3 shows the distribution of EK for 1000 samples of Type-A (Type-B) matrix pencils. Note for Type-B that, at the final stage involved in the regular pencil (Fs, Gs)Vs,Ws, we use the exact eigenvalues determined from K(λ) as input for computing a KB for (Fs, Gs)Vs,Ws. Thus the experimental results of EK for Type-B below directly show the numerical error caused by KBA.

For Type-A, we can confirm EK ≃ 10^−12 even in the worst case. We also numerically confirmed that the KCF is of generic type in all cases, as expected.

For Type-B, we can confirm EK ≤ 10^−8 (the value of the common cut-off parameter) in 977 cases. Though EK > 10^−8 in 23 cases, we confirmed in all of these cases that EK becomes less than 10^−8 if the two cut-off parameters εj;1, εj;2 are appropriately adjusted in the range [10^−15, 10^−7] at each iterative step (j = 1, . . . , s). In addition, we observed that KBA works well even with the eigenvalues numerically computed for (Fs, Gs)Vs,Ws, if we use an average for closely-spaced eigenvalues.

As is well known, the determination of Kronecker structure is an essentially ill-conditioned problem in general. In particular, round-off error in numerics might reduce non-generic Kronecker structures to generic ones. In the present implementation, we numerically confirmed for Type-B matrix pencils that KBA succeeds in reproducing the original KCF in 97% of all cases. An extensive analysis of the numerical stability of KBA is one of the main issues for future work.

References

[1] F. R. Gantmacher, The Theory of Matrices, Vol. II, Chelsea, New York, 1959.

[2] J. Demmel and B. Kagstrom, The generalized Schur decomposition of an arbitrary pencil A − λB: Robust software with error bounds and applications. Part I: Theory and algorithms, ACM Trans. Math. Software, 19 (1993), 160–174.

[3] J. Demmel and B. Kagstrom, The generalized Schur decomposition of an arbitrary pencil A − λB: Robust software with error bounds and applications. Part II: Software and applications, ACM Trans. Math. Software, 19 (1993), 175–201.

[4] H. Hashiguchi, K. Hiraoka and T. Shigehara, An elementary proof for the existence of Kronecker basis, preprint.


JSIAM Letters Vol.1 (2009) pp.64–67 © 2009 Japan Society for Industrial and Applied Mathematics

Robust exponential hedging in a Brownian setting

Keita Owari1

1 Graduate School of Economics, Hitotsubashi University, 2-1, Naka, Kunitachi, 186-8601, Japan

E-mail [email protected]

Received September 29, 2009, Accepted October 15, 2009

Abstract

This paper studies robust exponential hedging in a Brownian factor model, giving a solvable example using a PDE argument. The dual problem is reduced to a standard stochastic control problem, of which the HJB equation admits a classical solution. The optimal strategy is then expressed in terms of the solution to the HJB equation.

Keywords robust utility maximization, stochastic control, duality

Research Activity Group Mathematical Finance

1. Introduction

This paper aims to provide a solvable example for the robust exponential hedging problem studied by [1]:

minimize sup_{P∈P} E_P[e^{−α(θ·S_T − H)}] over θ ∈ Θ.        (1)

Here S is a d-dimensional càdlàg locally bounded semimartingale on a filtered probability space (Ω, F, (Ft)_{t∈[0,T]}, R), P is a convex set of probability measures absolutely continuous w.r.t. R, H is a random variable and Θ is a set of admissible integrands for S. The set P is a mathematical expression of model uncertainty, and (1) is equivalent to the maximization of the robust exponential utility of the net terminal wealth for the seller of the claim H.

The problem (1) is solved via its dual:

minimize H(Q|P) − αE_Q[H] over (Q, P) ∈ Qf × P,        (2)

where H(·|·) denotes the relative entropy, and Qf is the set of R-absolutely continuous local martingale measures for S having finite relative entropy with respect to some P ∈ P.

Assume:

(A1) {dP/dR : P ∈ P} is weakly compact in L1(R).

(A2) Q^e_f(S) := {Q ∈ Qf : Q ∼ R} ≠ ∅.

(A3) {e^{α|H|} dP/dR : P ∈ P} is uniformly integrable and sup_{P∈P} E_P[e^{(α+ε)|H|}] < ∞ for some ε > 0.

Under (A1)–(A3), [1] shows that the dual problem (2) of (1) admits a solution (Q̂H, P̂H) ∈ Qf × P which is maximal in the sense that if (Q̃, P̃) ∈ Qf × P is another solution, then P̃ ≪ P̂H and dQ̃/dP̃ = dQ̂H/dP̂H, P̃-a.s. This solution has a kind of martingale representation:

dQ̂H/dP̂H = c · e^{−α(θ̂·S_T − H)},  Q̂H-a.s.,        (3)

where c is a constant, and θ̂ is a predictable (S, Q̂H)-integrable process such that θ̂ · S is a Q̂H-martingale. Finally, if we additionally assume:

(A4) Q̂H ∼ R,

the strategy θ̂ is shown to be optimal for (1) with the admissible class ΘH defined as the set of all (S, R)-integrable predictable processes θ such that θ · S is a martingale under all Q ∈ Qf with H(Q|P̂H) < ∞.

In the sequel, we investigate this problem in a specific setting for which the optimal strategy θ̂ is explicitly represented, using a standard stochastic control technique.

2. Main results

This section states the main results of this paper. All proofs are collected in Section 4.

2.1 Setup

Let W = (W^1, W^2) be a 2-dimensional R-Brownian motion and (Ft)_{t∈[0,T]} be its augmented natural filtration. Suppose that the price process S is given by the SDE:

dSt = St(b(Yt)dt + σ(Yt)dW^1_t),
dYt = g(Yt)dt + ρ dW^1_t + ρ̄ dW^2_t,        (4)

where ρ ∈ [−1, 1] and ρ̄ = √(1 − ρ²). The set P of candidate models is given as follows. Let C be a convex compact subset of R² containing the origin, and IP be the set of 2-dimensional predictable C-valued processes. Then we set

P := { P^ν ∼ R : dP^ν/dR = E_T(−ν · W), ν ∈ IP },        (5)

where E(M) := exp(M − ⟨M⟩/2) denotes the Doleans-Dade exponential of a continuous local martingale M. Finally, the claim H is assumed to be of the form H = h(YT) for some measurable function h.
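A path of the factor model (4) under R can be simulated with a plain Euler-Maruyama scheme; the coefficient functions and parameters below are illustrative choices satisfying (B1)-(B2), not values from the paper.

```python
import numpy as np

def simulate(b, sigma, g, rho, S0, Y0, T=1.0, n=1000, seed=0):
    # Euler-Maruyama discretization of (4): dS = S(b dt + sigma dW1),
    # dY = g dt + rho dW1 + rho_bar dW2.
    rng = np.random.default_rng(seed)
    rho_bar = np.sqrt(1.0 - rho ** 2)
    dt = T / n
    S, Y = S0, Y0
    for _ in range(n):
        dW1, dW2 = rng.normal(0.0, np.sqrt(dt), size=2)
        S += S * (b(Y) * dt + sigma(Y) * dW1)
        Y += g(Y) * dt + rho * dW1 + rho_bar * dW2
    return S, Y

S_T, Y_T = simulate(b=lambda y: 0.05, sigma=lambda y: 0.2,
                    g=lambda y: -0.5 * y, rho=0.6, S0=1.0, Y0=0.0)
```

The claim H = h(Y_T) is then a function of the untradable factor only, while hedging is done through the correlated asset S.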

Remark 1 A typical situation underlying our setup is as follows. A financial institution sells an option written on an untradable index Y, and wants to maximize her utility by trading an asset S which is correlated to Y. However, the probabilistic model of the assets (S, Y) is uncertain in its expected rate of return (drift, in mathematical language). Actually, the dynamics under the probability P^ν is:

dSt = St((b(Yt) − ν¹_t σ(Yt))dt + σ(Yt)dW^{1,ν}_t),
dYt = (g(Yt) − ρν¹_t − ρ̄ν²_t)dt + ρ dW^{1,ν}_t + ρ̄ dW^{2,ν}_t.

In this context, we can know only the range of the drift through the set C appearing in the definition of P.

In what follows, we assume:

(B1) b, σ, g ∈ C²_b(R), where C²_b(R) = {f ∈ C²(R) : f, f′, f′′ are bounded}.

(B2) For some k > 0, σ(y) ≥ k for all y.

(B3) h ∈ C²(R), h′ is bounded and h′′ has polynomial growth.

Our first task is to check that:

Lemma 2 Under (B1)–(B3), the conditions (A1)–(A4) of [1] are satisfied.

Once this lemma is established, an optimal strategy θ̂ will be derived via (i) solving the dual problem (2), and (ii) finding θ̂ satisfying (3).

Remark 3
(I) In this setting, we can show that

H(Q|P) < ∞ for some P ∈ P  ⇔  H(Q|R) < ∞,        (6)

for all local martingale measures Q. In particular, ΘH is characterized as the class of predictable (S, R)-integrable processes θ such that θ · S is a martingale under all absolutely continuous local martingale measures Q with H(Q|R) < ∞. This condition is further reduced to "all equivalent martingale measures with...". Therefore, the class ΘH is actually independent of P̂H, hence of H. This point is conceptually important since the dependence of Θ on P̂H, which is a part of the solution to the dual problem, would imply that we cannot specify the admissible class for the primal problem until we solve the dual problem.

(II) For our purpose, it suffices to consider Q^e_f as the domain of the dual problem since we already know that a solution to the dual problem is obtained in Q^e_f × P.

Let IM be the set of predictable processes η with E_R[∫₀^T η²_t dt] < ∞ and E_R[E_T(−(λ(Y), η) · W)] = 1, where λ := b/σ, and

dQ^η/dR := E_T(−(λ(Y), η) · W),  η ∈ IM.        (7)

Then Q^e_f = {Q^η : η ∈ IM}.

2.2 Dual problem

Let

J^{η,ν}_t := E^η[ αh(Y_T) − (1/2) ∫_t^T ‖ν_s − (λ(Y_s), η_s)′‖² ds | F_t ],

where E^η[·] denotes the expectation under Q^η, "′" is the transpose, and ‖·‖ is the Euclidean norm of R². The dual problem (2) is now reduced to the following stochastic control problem:

maximize J^{η,ν}_0 among (η, ν) ∈ I_M × I_P. (8)

For each constant η ∈ R, set

A^η := (g − ρλ − ρ̄η)∂_y + (1/2)∂_yy = A^0 − ρ̄η∂_y, (9)

where ∂_y := ∂/∂y, ∂_yy := ∂²/∂y², etc. Then the HJB equation for (8) is formally given by

v_t + sup_{(η,ν)∈R×C} ( A^η v − (1/2)‖ν − (λ, η)′‖² ) = 0,
v(T, y) = αh(y). (10)

Theorem 4 The HJB equation (10) admits a unique classical solution v ∈ C^{1,2}((0, T) × R) ∩ C([0, T] × R) such that v_y := ∂_y v is bounded. Then we can choose measurable functions ν : [0, T] × R → C and η : [0, T] × R → R so that

ν(t, y) ∈ arg inf_{ν∈C} ( (1/2)(ν^1 − λ(y))² + ν^2 ρ̄ v_y(t, y) ),
η(t, y) = ν^2(t, y) − ρ̄ v_y(t, y),

and (ν_·, η_·) := (ν(·, Y_·), η(·, Y_·)) is an optimal control for (8). In particular, (Q^η, P^ν) is a solution to (2).

2.3 Optimal strategy

We now give a representation of an optimal strategy θ via Theorem 4 and the duality result of [1].

Theorem 5 An optimal strategy for the problem (1) is given by

θ_t = (ρ v_y(t, Y_t) + λ(Y_t) − ν^1(t, Y_t)) / (α σ(Y_t) S_t). (11)

Remark 6 Here we give a brief review of the related literature. In the case without uncertainty, i.e., P = {R} (⇔ C = {(0, 0)} in our setup), explicit solutions to exponential hedging through duality are studied by [2] using BSDE arguments with the help of Malliavin calculus, and by [3] using PDE arguments close to ours.

There are also a few recent works deriving explicit forms of optimal strategies for robust utility maximization. Our setup and the idea for the proof of Theorem 4 are due to [4], where robust power utility maximization is considered. See also [5] for the case of logarithmic utility.

3. Explicit examples

This section provides two explicit examples which may be reduced to linear PDEs, hence can be computed via both elementary numerical schemes and the Feynman-Kac formula. Recall that our model is characterized by the compact set C, and the HJB equation takes the form:

v_t + A^0 v + (ρ̄²/2) v_y² − l(y, v_y) = 0,
v(T, y) = αh(y),

where

l(y, p) := inf_{ν∈C} ( (1/2)(ν^1 − λ(y))² + ρ̄ ν^2 p ).

Thus, if l(y, p) can be explicitly calculated, then we can expect an explicit solution.


3.1 The case of disk

We first consider the case where the set C is a disk in R² with radius r:

C = {x ∈ R² : ‖x‖ ≤ r}. (12)

But due to a technical difficulty, we assume the drift b of S under R is identically zero, or equivalently, λ is identically zero. In this case,

l(y, p) = inf_{‖ν‖≤r} ( (ν^1)²/2 + ρ̄ ν^2 p ) = −r ρ̄ |p|,

and ν(y, p) = (0, −r · sgn(p)) is a minimizer. Then the HJB equation is written as:

v_t + A^0 v + (ρ̄²/2) v_y² + r ρ̄ |v_y| = 0.
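The closed form of l in the disk case can be sanity-checked numerically. The sketch below brute-forces the infimum of (ν^1)²/2 + ρ̄ν^2 p over the disk ‖ν‖ ≤ r on a polar grid and compares it with −rρ̄|p|; the concrete values of r, ρ̄ and p are arbitrary illustrative choices, not taken from the paper.

```python
import math

def l_disk_bruteforce(p, r=0.5, rho_bar=0.8, n_angles=2000, n_radii=50):
    """Brute-force inf over the disk ||nu|| <= r of (nu1)^2/2 + rho_bar*nu2*p."""
    best = float("inf")
    for i in range(n_angles + 1):
        th = 2.0 * math.pi * i / n_angles
        for j in range(1, n_radii + 1):
            s = r * j / n_radii            # sample radii up to r
            n1, n2 = s * math.cos(th), s * math.sin(th)
            best = min(best, 0.5 * n1 * n1 + rho_bar * n2 * p)
    return best

p = -1.3
closed_form = -0.5 * 0.8 * abs(p)          # -r * rho_bar * |p|
assert abs(l_disk_bruteforce(p) - closed_form) < 1e-4
```

The minimizer found by the grid sits at ν^1 = 0, ν^2 = −r·sgn(p), in agreement with the formula above.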

Now suppose that the payoff function h is non-increasing. Then, noting that the one-dimensional stochastic flow associated to Y is order-preserving under (B1) and (B2), the value function is also decreasing in the y variable, hence v_y ≤ 0. Therefore the term rρ̄|v_y| in the equation is replaced by −rρ̄v_y. Moreover, changing the drift, the equation becomes:

v_t + A^{rρ̄} v + (ρ̄²/2) v_y² = 0.

Here A^{rρ̄} is the generator of Y under Q^{rρ̄}. Note that a simple calculation using the Ito formula yields:

d e^{ρ̄² v(t,Y_t)} = ρ̄² e^{ρ̄² v(t,Y_t)} v_y(t, Y_t) dW^{rρ̄}_t.

Thus e^{ρ̄² v(t,Y_t)} is a martingale, and since v(T, y) = αh(y),

v(t, y) = (1/ρ̄²) log E^{rρ̄}[ e^{αρ̄² h(Y_T)} | Y_t = y ] =: (1/ρ̄²) log v̄(t, y).

Now the Feynman-Kac formula yields:

Corollary 7 Suppose that C is given by (12), λ ≡ 0 and h is non-increasing. Then the value function is represented as

v(t, y) = (1/ρ̄²) log v̄(t, y),

where v̄ is the solution to the Cauchy problem:

v̄_t + A^{rρ̄} v̄ = 0,
v̄(T, y) = e^{αρ̄² h(y)}. (13)

Furthermore, (η, ν) = (r − ρ̄(v̄_y/v̄)(·, Y), (0, r)) is an optimal control, and an optimal portfolio strategy is given by

θ_t = (ρ/(αρ̄²)) · v̄_y(t, Y_t) / (v̄(t, Y_t) σ(Y_t) S_t). (14)

Remark 8 The case of non-decreasing h can be treated in a symmetric way.
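Since v̄ in Corollary 7 solves a linear Cauchy problem, it can also be estimated by plain Monte Carlo instead of a PDE solver. The sketch below is our own illustration: the drift g, the payoff h, the parameters ρ, r, α, and the use of the linearized drift g(y) − rρ̄ for the simulated factor process are all assumptions made for the example, not specifications from the paper.

```python
import math
import random

def v_bar_mc(t, y, T=1.0, alpha=1.0, rho=0.6, r=0.5,
             g=lambda y: -y, h=lambda y: max(1.0 - y, 0.0),
             n_paths=4000, n_steps=40, seed=0):
    """Monte Carlo sketch of the Feynman-Kac representation in Corollary 7:
    v_bar(t, y) = E[exp(alpha * rho_bar^2 * h(Y_T)) | Y_t = y],
    with Y simulated under the linearized drift g(y) - r*rho_bar
    (our reading of the generator appearing in the Cauchy problem (13))."""
    rho_bar = math.sqrt(1.0 - rho * rho)
    rng = random.Random(seed)
    dt = (T - t) / n_steps
    acc = 0.0
    for _ in range(n_paths):
        yv = y
        for _ in range(n_steps):
            # Euler-Maruyama step for dY = (g(Y) - r*rho_bar) dt + dW
            yv += (g(yv) - r * rho_bar) * dt + math.sqrt(dt) * rng.gauss(0.0, 1.0)
        acc += math.exp(alpha * rho_bar ** 2 * h(yv))
    return acc / n_paths

def value_fn(t, y, rho=0.6, **kw):
    # value function v = (1/rho_bar^2) * log(v_bar)
    rho_bar2 = 1.0 - rho * rho
    return math.log(v_bar_mc(t, y, rho=rho, **kw)) / rho_bar2
```

A quick sanity check: with α = 0 the terminal exponential is identically 1, so the estimator returns exactly 1 and the value function is 0 regardless of the simulated paths; with the non-negative payoff chosen here the value is non-negative.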

3.2 The case of rectangle

Let C be a rectangle in R², that is:

C = {x ∈ R² : |x^1| ≤ m_1, |x^2| ≤ m_2}. (15)

In this case,

l(y, p) = (1/2)(ν^1(y) − λ(y))² + ρ̄ ν^2(p) p = k(y; m_1)/2 − ρ̄ m_2 |p|,

where

ν^1(y) = sgn(λ(y)) (|λ(y)| ∧ m_1),
ν^2(p) = −m_2 sgn(p),  k(y; m_1) := ((|λ(y)| − m_1)^+)².

Therefore, the HJB equation is written as:

v_t + A^0 v + (ρ̄²/2) v_y² + ρ̄ m_2 |v_y| − k(y; m_1)/2 = 0. (16)

As in the case of the disk, if the value function is monotone (e.g., h is non-increasing and λ is constant), the linearization procedure of the previous subsection yields a linear PDE and a Feynman-Kac representation.

4. Proofs

Proof of Lemma 2 (A1) is guaranteed by [4, Lemma 3.1] and [6, Lemma 3.2]. The function b/σ =: λ is bounded by the assumptions (B1) and (B2). Therefore dQ^0/dR := E_T(−(λ(Y), 0) · W) defines an equivalent local martingale measure. Since R ∈ P and H(Q^0|R) = E_R[∫_0^T λ(Y_s)² ds]/2 < ∞, (A2) is satisfied. Also, (B3) implies that h is globally Lipschitz continuous, hence admits a constant K_h such that |h(y)| ≤ K_h(1 + |y|) for all y ∈ R. Then (A3) will be verified by checking that {e^{γ|h(Y_T)|} E_T(−ν · W) : ν ∈ I_P} is bounded in L²(R) for any γ > α. By the Cauchy-Schwarz inequality,

E_R[ (e^{γ|h(Y_T)|} E_T(−ν · W))² ] ≤ E_R[ e^{4γ|h(Y_T)|} ]^{1/2} E_R[ e^{−4ν·W_T} ]^{1/2}. (17)

Introducing another R-Brownian motion W̄ = ρW^1 + ρ̄W^2,

e^{4γ|h(Y_T)|} ≤ e^{4γK_h(1+|Y_T|)} ≤ e^{4γK_h(1+|Y_0|+‖g‖_∞ T+|W̄_T|)}.

Therefore, the first factor on the RHS of (17) is bounded by √2 e^{2γK_h(1+|Y_0|+(‖g‖_∞+2γK_h)T)}. For the second, we can apply [7, Th. III.39] to get the upper bound e^{8T(diam C)²}. Thus (A3) is verified, and the dual problem admits a maximal solution (Q_H, P_H). Finally, (A4) is trivially satisfied since all P ∈ P are equivalent.
(QED)

For the proof of Theorem 4, we first consider a family of auxiliary control problems, restricting the domain of η. For each closed interval I ⊂ R, set I^I_M := {η ∈ I_M : η_t ∈ I for all t, a.s.}, and consider the equation:

∂_t v^I + sup_{η∈I, ν∈C} ( A^η v^I − (1/2)‖ν − (λ(y), η)′‖² ) = 0,
v^I(T, y) = αh(y). (18)

If I is compact, then so is I × C, hence we can apply Theorems VI.4.1 and VI.6.2 of [8] to get:


Lemma 9 For each compact I ⊂ R, (18) admits a unique classical solution v^I ∈ C^{1,2}_p((0, T) × R) ∩ C([0, T] × R). Then, taking (η^I(t, y), ν^I(t, y)) ∈ arg sup_{η∈I, ν∈C} ( A^η v^I − ‖ν − (λ(y), η)′‖²/2 ), we have

v^I(t, Y_t) = ess sup_{η∈I^I_M, ν∈I_P} J^{η,ν}_t = J^{η^I(·,Y), ν^I(·,Y)}_t.

Lemma 10 There exists a constant K_v such that |v^I_y| ≤ K_v for all compact I.

Proof Let J^{η,ν}_t(y) := E^η[ αh(Y_{t,T}(y)) − (1/2) ∫_t^T ‖ν_s − (λ(Y_{t,s}(y)), η_s)′‖² ds ], where Y_{t,·}(y) denotes the stochastic flow associated to Y. Then, noting that |sup_x f(x) − sup_x g(x)| ≤ sup_x |f(x) − g(x)|, it suffices to show the existence of a constant K_v such that |J^{η,ν}_t(y) − J^{η,ν}_t(y′)| ≤ K_v |y − y′| for all t ∈ [0, T], y, y′ ∈ R and (η, ν) ∈ I_M × I_P. Since h, g, λ ∈ C²_b, a simple computation yields that

|J^{η,ν}_t(y) − J^{η,ν}_t(y′)| ≤ αK_h E^η[|Y_{t,T}(y) − Y_{t,T}(y′)|] + K̃K_λ ∫_t^T E^η[|Y_{t,s}(y) − Y_{t,s}(y′)|] ds,

where K_h, K_λ are Lipschitz constants for h, λ, respectively, and K̃ = diam(C) + max λ. Also, for all s ∈ [t, T],

E^η[|Y_{t,s}(y) − Y_{t,s}(y′)|] ≤ |y − y′| + E^η[ ∫_t^s |g(Y_{t,u}(y)) − g(Y_{t,u}(y′))| du ] ≤ |y − y′| + K_g ∫_t^s E^η[|Y_{t,u}(y) − Y_{t,u}(y′)|] du,

where K_g is a Lipschitz constant for g. Then the Gronwall inequality shows that E^η[|Y_{t,s}(y) − Y_{t,s}(y′)|] ≤ e^{K_g(s−t)} |y − y′| ≤ e^{K_g T} |y − y′| for any t ≤ s ≤ T. Hence we get the result with K_v = e^{K_g T}(αK_h + K̃K_λ T).
(QED)

Proof of Theorem 4 The inside of the bracket in (18) is written as:

A^0 v^I + (ρ̄²/2)(v^I_y)² − (1/2)( η − (ν^2 − ρ̄v^I_y) )² − ( (1/2)(λ(y) − ν^1)² + ν^2 ρ̄ v^I_y ).

Here the third term attains its global maximum at η^I = ν^2 − ρ̄v^I_y, which is bounded by diam(C) + K_v independently of I. Thus, taking I_0 := [−diam(C) − K_v, diam(C) + K_v], we have

−∂_t v^{I_0} = sup_{η∈I_0, ν∈C} ( A^η v^{I_0} − (1/2)‖ν − (λ(y), η)′‖² ) = sup_{η∈R, ν∈C} ( A^η v^{I_0} − (1/2)‖ν − (λ(y), η)′‖² ).

Hence v := v^{I_0} is a desired classical solution to (10). The rest of the proof is a standard verification argument, and we omit it.
(QED)

Proof of Theorem 5 By the duality, it suffices to show that θ ∈ Θ and

dQ^η/dP^ν = e^{−α(θ·S_T − h(Y_T))} / E_{P^ν}[ e^{−α(θ·S_T − h(Y_T))} ].

Since v satisfies the HJB equation (10), the Ito formula yields:

αh(Y_T) = v(0, Y_0) + ∫_0^T (∂_t + A^η)v(s, Y_s) ds + ∫_0^T v_y(s, Y_s) dW^η_s
        = v(0, Y_0) + log(dQ^η/dP^ν) + ∫_0^T (ρv_y + λ − ν^1)(s, Y_s) dW^{1,η}_s
        = v(0, Y_0) + log(dQ^η/dP^ν) + αθ · S_T.

Therefore we get dQ^η/dP^ν = e^{−v(0,Y_0)} e^{−α(θ·S_T − h(Y_T))}. Finally,

∫_0^T θ²_s d⟨S⟩_s = (1/α²) ∫_0^T (ρv_y + λ − ν^1)(s, Y_s)² ds

is bounded, hence θ · S is a martingale under every Q ∈ Q^e_f. This concludes the proof.
(QED)

Acknowledgments

The author is grateful for the financial support from the Global Center of Excellence (COE) program "the Research Unit for Statistical and Empirical Analysis in Social Sciences (G-COE Hi-Stat)" of Hitotsubashi University.

References

[1] K. Owari, Robust exponential hedging and indifference valuation, Discussion paper No. 2008-09, Hitotsubashi Univ., 2008.
[2] J. Sekine, On exponential hedging and related quadratic backward stochastic differential equations, Appl. Math. Optim., 54 (2006), 131–158.
[3] M. H. A. Davis, Optimal hedging with basis risk, in: From Stochastic Calculus to Mathematical Finance, Y. Kabanov, R. Liptser and J. Stoyanov, eds., pp. 169–187, Springer-Verlag, Berlin, 2006.
[4] D. Hernandez-Hernandez and A. Schied, Robust utility maximization in a stochastic factor model, Statist. Decisions, 24 (2006), 109–125.
[5] D. Hernandez-Hernandez and A. Schied, A control approach to robust utility maximization with logarithmic utility and time-consistent penalties, Stochastic Process. Appl., 117 (2007), 980–1000.
[6] A. Schied and C.-T. Wu, Duality theory for optimal investment under model uncertainty, Statist. Decisions, 23 (2005), 199–217.
[7] P. E. Protter, Stochastic Integration and Differential Equations, Second Edition, Springer-Verlag, Berlin, 2004.
[8] W. H. Fleming and R. W. Rishel, Deterministic and Stochastic Optimal Control, Springer-Verlag, Berlin, 1975.


JSIAM Letters Vol.1 (2009) pp.68–71 ©2009 Japan Society for Industrial and Applied Mathematics

A hybrid of the optimal velocity and the slow-to-start models and its ultradiscretization

Kazuhito Oguma^1 and Hideaki Ujino^2

^1 Department of Mathematical Engineering and Information Physics, Faculty of Engineering, The University of Tokyo, 7-3-1 Hongo, Bunkyo-ku, Tokyo 113-8656, Japan
^2 Gunma National College of Technology, 580 Toriba, Maebashi, Gunma 371-8530, Japan

E-mail [email protected]

Received August 24, 2009, Accepted October 6, 2009

Abstract

Through an extension of an ultradiscrete optimal velocity (OV) model, we introduce an ul-tradiscretizable traffic flow model, which is a hybrid of the OV and the slow-to-start (s2s)models. Its ultradiscrete limit gives a generalization of a special case of the ultradiscrete OV(uOV) model recently proposed by Takahashi and Matsukidaira. A phase transition from freeto jam phases as well as the existence of multiple metastable states are observed in numeri-cally obtained fundamental diagrams for cellular automata (CA), which are special cases ofthe ultradiscrete limit of the hybrid model.

Keywords optimal velocity (OV) model, slow-to-start (s2s) effect, ultradiscretization

Research Activity Group Applied Integrable Systems

1. Introduction

Studies on microscopic models for vehicle traffic have provided a good point of view on the phase transition from free to congested traffic flow, and the related self-driven many-particle systems have attracted considerable interest not only from engineers but also from physicists [1, 2]. Among such models, the optimal velocity (OV) model [3], which successfully shows the formation of "phantom traffic jams" in the high-density regime, is a car-following model describing an adaptation to the optimal velocity that depends on the distance from the vehicle ahead.

Whereas the OV model consists of ordinary differ-ential equations (ODE), cellular automata (CA) suchas the Nagel-Schreckenberg model [4], the elementaryCA of Rule 184 (ECA184) [5], the Fukui-Ishibashi (FI)model [6] and the slow-to-start (s2s) model [7] are ex-tensively used in analyses of traffic flow. Recently, Taka-hashi and Matsukidaira proposed a discrete OV (dOV)model, which enables an ultradiscretization of the OVmodel [8]. The resultant ultradiscrete OV (uOV) modelincludes both the ECA184 and the FI model as its spe-cial cases. However, the s2s effect remains to be includedin their ultradiscretization. The aim of this letter is topresent an ultradiscretizable hybrid of the OV and thes2s models.

2. The OV model and the s2s effect

Imagine many cars running in one direction on asingle-lane highway. Let xk(t) denote the position of thek-th car at time t. No overtaking is assumed so thatxk(t) ≤ xk+1(t) holds for arbitrary time t. The time-evolution of the OV model [3] is given by

dv_k(t)/dt = (1/t_0) [v_opt(∆x_k(t)) − v_k(t)], (1)

where v_k := dx_k/dt and ∆x_k := x_{k+1} − x_k are the velocity of the k-th car and the interval between the cars k and k+1, respectively. The function v_opt and the constant t_0 represent an optimal velocity and the sensitivity of drivers (in other words, the delay of the drivers' response).

Since the current velocity and the current interval to the car ahead determine the acceleration through the time-evolution and the optimal velocity, we classify the OV model (1) as the acceleration-control type (aOV). On the other hand, the OV model of the velocity-control type (vOV) was proposed in earlier studies of car-following models [9]:

v_k(t) = v_opt(∆x_k(t − t_0)). (2)

Replacement of t in the above equation (2) with t + t_0 and the Taylor expansion of v_k(t + t_0) yield

v_opt(∆x_k(t)) = v_k(t + t_0) = v_k(t) + (dv_k(t)/dt) t_0 + (1/2)(d²v_k(t)/dt²) t_0² + · · · ,

which is rewritten as

dv_k(t)/dt + (1/2)(d²v_k(t)/dt²) t_0 + · · · = (1/t_0) [v_opt(∆x_k(t)) − v_k(t)].

Thus the aOV model (1) is given by neglecting the higher-order terms in the Taylor series of (2). Though the aOV model is more common in studies on vehicle traffic, we shall concentrate on an ultradiscretizable hybrid of the vOV and the s2s models. Thus we call the vOV model (2) simply the OV model hereafter.

Note that the input to the OV function vopt(x) in theOV model (2) is the headway at a single point of timet − t0 that is prior to the present time t. Thus we maysay that the OV model describes, in a sense, “reckless”


drivers since the model pays no attention to the head-way between the time t − t0 and the present time t. Onthe other hand, “cautious” drivers governed by the s2smodel [7] keep watching and require enough length ofheadway to go on for a certain period of time before theyrestart their cars. The contrast between the two modelssuggests the idea that the s2s effect and the OV modelcan be brought together by appropriately choosing an ef-fective distance ∆effxk(t) containing information on theheadway for a certain period of time going back fromthe present as an input to the OV function vopt(x). Weshall see this idea works in what follows.

What is crucial in the ultradiscretization of the aOV model [8] is the choice of the OV function,

v_opt(x) := v_0 ( 1/(1 + e^{−(x−x_0)/δx}) − 1/(1 + e^{x_0/δx}) ), (3)

where v_0, x_0 and δx are positive constants. In terms of the auxiliary functions

v̄_opt(x) := v_0 dx̄_opt(x)/dx, (4)
x̄_opt(x) := δx log(1 + e^{(x−x_0)/δx}), (5)

the OV function (3) is expressed as

v_opt(x) = v̄_opt(x) − v̄_opt(x = 0).

A naive discretization of the auxiliary function (4),

v̄^d_opt(x) := (x̄_opt(x) − x̄_opt(x − v_0δt))/δt,

introduces the OV function for the discrete OV (dOV) model,

v^d_opt(x) = v̄^d_opt(x) − v̄^d_opt(x = 0)
           = (δx/δt) log( [(1 + e^{(x−x_0)/δx})/(1 + e^{−x_0/δx})] / [(1 + e^{(x−x_0−v_0δt)/δx})/(1 + e^{−(x_0+v_0δt)/δx})] ), (6)

which is found to be ultradiscretizable [8].

Let x^n_k := x_k(t = nδt) and v^n_k := (x^{n+1}_k − x^n_k)/δt, where n (= 0, 1, 2, · · · ) and δt (> 0) are the integral time and the discrete time-step, respectively. Employing the effective distance

∆^d_eff x^n_k := δx log( ( (1/(n_0+1)) Σ_{n′=0}^{n_0} e^{−∆x^{n−n′}_k/δx} )^{−1} ), (7)

where n_0 := t_0/δt, we extend the OV model (2) in a time-discretized form as

v^n_k = v^d_opt(∆^d_eff x^n_k), (8)

which is equivalent to

x^{n+1}_k = x^n_k + δx [ log( 1 + ( (1/(n_0+1)) Σ_{n′=0}^{n_0} e^{−(∆x^{n−n′}_k − x_0)/δx} )^{−1} ) − log( 1 + ( (1/(n_0+1)) Σ_{n′=0}^{n_0} e^{−(∆x^{n−n′}_k − x_0 − v_0δt)/δx} )^{−1} ) − log(1 + e^{−x_0/δx}) + log(1 + e^{−(x_0+v_0δt)/δx}) ].

It is straightforward to confirm that the continuum limit δt → 0 of the above discrete s2s–OV (ds2s–OV) model (8) reduces to the integro-differential equation which we call the s2s–OV model,

dx_k(t)/dt = v_opt(∆_eff x_k(t))
           = v_0 ( 1 + (1/t_0) ∫_0^{t_0} e^{−(∆x_k(t−t′)−x_0)/δx} dt′ )^{−1} − v_0 (1 + e^{x_0/δx})^{−1}, (9)

where the corresponding effective distance is given by

∆_eff x_k := δx log( ( (1/t_0) ∫_0^{t_0} e^{−∆x_k(t−t′)/δx} dt′ )^{−1} ).

We shall see that the s2s effect is indeed built into the OV model in the ultradiscrete limit of the ds2s–OV model.
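Before taking limits, the consistency of the discretization itself can be checked numerically: as δt → +0, the dOV function (6) should recover the OV function (3). The sketch below does this for one arbitrary choice of parameters (all numbers are illustrative, not from the paper).

```python
import math

def x_opt(x, x0=1.0, dx=0.5):
    # auxiliary function (5)
    return dx * math.log(1.0 + math.exp((x - x0) / dx))

def v_d_opt(x, v0=2.0, dt=1e-4, x0=1.0, dx=0.5):
    # discrete OV function (6), built from (5) by the finite difference in (4)
    bar = lambda z: (x_opt(z, x0, dx) - x_opt(z - v0 * dt, x0, dx)) / dt
    return bar(x) - bar(0.0)

def v_opt(x, v0=2.0, x0=1.0, dx=0.5):
    # continuous OV function (3)
    return v0 * (1.0 / (1.0 + math.exp(-(x - x0) / dx))
                 - 1.0 / (1.0 + math.exp(x0 / dx)))

# for small dt the two functions agree
assert abs(v_d_opt(2.0) - v_opt(2.0)) < 1e-3
```

The agreement improves linearly in δt, as expected from the forward-difference construction.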

3. Ultradiscretization

Ultradiscretization [10] is a scheme for obtaining a piecewise-linear equation from a difference equation via the limit formula

lim_{δx→+0} δx log( e^{A/δx} + e^{B/δx} + · · · ) = max(A, B, · · · ).
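The limit can be observed numerically at finite δx; the sketch below (with arbitrary sample values) evaluates the left-hand side with the largest term factored out for numerical stability.

```python
import math

def ud(terms, dx):
    """delta_x * log(sum_i exp(A_i/delta_x)), which tends to max(A_i)
    as dx -> +0; the maximum is factored out to avoid overflow."""
    m = max(terms)
    return m + dx * math.log(sum(math.exp((a - m) / dx) for a in terms))

for dx in (1.0, 0.1, 1e-3):
    print(dx, ud([1.0, 3.0, 2.0], dx))   # tends to max = 3.0

# the same limit turns the auxiliary function (5) into max(0, x - x0)
x, x0 = 2.5, 1.0
assert abs(ud([0.0, x - x0], 1e-3) - max(0.0, x - x0)) < 1e-2
```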

In order to go forward to the ultradiscretization of the ds2s–OV model (8), it is a good choice for us to begin with the ultradiscrete limit δx → +0 of the auxiliary function (5):

x̄^u_opt(x) := lim_{δx→+0} x̄_opt(x) = max(0, x − x_0). (10)

In the same way that the OV function for the dOV model (6) is obtained from the auxiliary function (5), we obtain the OV function for the uOV model [8] as

v^u_opt(x) = v̄^u_opt(x) − v̄^u_opt(x = 0)
           = max(0, (x − x_0)/δt) − max(0, (x − x_0)/δt − v_0), (11)

where v̄^u_opt(x) := (x̄^u_opt(x) − x̄^u_opt(x − v_0δt))/δt. The effective distance (7), on the other hand, is ultradiscretized in the same manner:

∆^u_eff x^n_k := lim_{δx→+0} ∆^d_eff x^n_k = −max_{n′=0}^{n_0}( −∆x^{n−n′}_k ) = min_{n′=0}^{n_0}( ∆x^{n−n′}_k ). (12)

Thus we obtain the ultradiscrete equation

v^n_k = v^u_opt(∆^u_eff x^n_k), (13)

which is equivalent to

x^{n+1}_k = x^n_k + max[ 0, min_{n′=0}^{n_0}(∆x^{n−n′}_k) − x_0 ] − max[ 0, min_{n′=0}^{n_0}(∆x^{n−n′}_k) − x_0 − v_0δt ],


as the ultradiscrete limit of the ds2s–OV model (8). We name it the ultradiscrete s2s–OV (us2s–OV) model. When the monitoring period n_0 is fixed at zero, the us2s–OV model reduces to a special case of the uOV model [8]. As we can see from (11), (12) and (13), the velocity v^n_k is determined by the optimal velocity for the minimum headway in the period between n − n_0 and n. Thus cars will not restart nor accelerate unless enough clearance persists for a certain period of time. On the other hand, cars immediately stop or slow down when their headways become too small to keep their velocities. The s2s effect and a "cautious" manner of driving are built into the uOV model in this way.

Now let us see how a CA comes out from the us2s–OV model. Let x_0 be the discretization step of the headway ∆x^n_k, or equivalently, the size of the unit cell of the CA. Then, with no loss of generality, we may set x_0 = 1. Assume that the number of vacant cells between the cars k and k+1, ∆̃x^n_k := ∆x^n_k − x_0, is non-negative, ∆̃x^n_k ≥ 0, which prohibits car crashes. Then the us2s–OV model (13) reduces to

x^{n+1}_k = x^n_k + min[ min_{n′=0}^{n_0}( ∆̃x^{n−n′}_k ), v_0δt ]. (14)

Fixing v0δt at an integer, we call this model the s2s–OV cellular automaton (CA). The s2s–OV CA reducesto the FI model [6] when n0 = 0 and to the ECA184 [5]when n0 = 0 and v0δt = 1(= x0). The s2s model [7] alsocomes out from the s2s–OV CA by choosing n0 = 1 andv0δt = 1(= x0). Thus the s2s–OV CA is regarded as ahybrid of the FI model and an extended s2s model.
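The update rule (14) translates directly into code. The function below is our own minimal sketch (the function name and the deque-based handling of the monitored headways are implementation choices, not from the paper); it advances all cars on a ring of length L by one time step.

```python
from collections import deque

def s2s_ov_step(positions, history, v_max, n0, L):
    """One time step of the s2s-OV CA (14) on a ring of length L (x0 = 1).
    positions: integer positions x_k, ordered along the ring (no overtaking);
    history: deque of headway lists ~dx_k = dx_k - 1 for the monitored steps;
    v_max: the integer maximum velocity v0*dt; n0: the monitoring period."""
    K = len(positions)
    gaps = [(positions[(k + 1) % K] - positions[k]) % L - 1 for k in range(K)]
    history.append(gaps)
    while len(history) > n0 + 1:          # keep only steps n-n0, ..., n
        history.popleft()
    # each car advances by the smallest headway seen over the monitoring
    # period, capped by the maximum velocity -- exactly rule (14)
    moves = [min(min(h[k] for h in history), v_max) for k in range(K)]
    return [(positions[k] + moves[k]) % L for k in range(K)]

# with n0 = 0 and v_max = 1 the rule reduces to ECA184:
hist = deque()
state = [0, 1]                             # two cars on a ring of length 4
state = s2s_ov_step(state, hist, 1, 0, 4)  # -> [0, 2]
state = s2s_ov_step(state, hist, 1, 0, 4)  # -> [1, 3]
```

Setting n0 = 1 with v_max = 1 recovers the s2s model, and larger n0 and v_max give the general hybrid, as described in the text.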

4. Numerical experiments

We shall numerically investigate the s2s–OV CA (14). Throughout this section, the length of the circuit L is fixed at L = 100 and the periodic boundary condition is assumed, so that x^n_k + L is identified with x^n_k.

Spatio-temporal patterns showing trajectories of eachvehicle are given in Fig. 1. We choose the parameters andinitial conditions so that jams appear in the trajectories.The two figures in the top share the same monitoring pe-riod n0 = 2 but their maximum velocities are different.The top left trajectories show that the velocities of thevehicles are zero or one, which is less than or equal toits maximum velocity v0δt = 1. In the top right trajec-tories whose maximum velocity v0δt = 3, on the otherhand, the velocities of the vehicles read zero, one, twoand three. Thus we notice that the vehicles driven bythe s2s–OV CA can run at any allowed integral velocitywhich is less than or equal to its maximum velocity v0δt.

The other two figures, in the bottom of Fig. 1, share the same maximum velocity v_0δt = 2, but their monitoring periods are different. As observed there, the longer the monitoring period is, the longer it takes for the cars to get out of the traffic jam. The jam front propagates against the stream of vehicles at the constant velocity x_0/((n_0 + 1)δt), since cars have to wait n_0 + 1 time-steps to restart after their preceding cars have restarted, as is depicted in Fig. 2.

Fig. 3 shows fundamental diagrams giving the relation


Fig. 1. The spatio-temporal patterns of the s2s–OV CA. For all four patterns, the number of cars K is fixed at K = 30. The maximum velocities v_0δt and the monitoring periods n_0 for these patterns are (top left) v_0δt = 1, n_0 = 2, (top right) v_0δt = 3, n_0 = 2, (bottom left) v_0δt = 2, n_0 = 1 and (bottom right) v_0δt = 2, n_0 = 3, respectively.


Fig. 2. Backward propagation of the jam front at the constant velocity x_0/((n_0 + 1)δt) = 1/4 for the case v_0δt = 2, n_0 = 3 and x_0 = 1.

between the vehicle flow

Q := (1/((n_1 − n_0 + 1)L)) Σ_{k=1}^{K} Σ_{n=n_0}^{n_1} (x^{n+1}_k − x^n_k)/δt,

which is equivalent to the total momentum of vehiclesper unit length, and the vehicle density ρ := K/L, whereK is the number of vehicles. The fundamental diagramsclearly show phase transitions from free to jam phasesas well as metastable states, which are also observedin empirical flow-density relations [1,2]. It is remarkablethat the fundamental diagrams have multiple metastablebranches. This feature is similar to that reported byNishinari et al. [11]. We observe that each fundamentaldiagram has v0δt metastable branches and a jammingline. The branches and the jamming line correspond tointegral velocities that are less than or equal to the maxi-mum velocity v0δt. Let us confirm it with Fig. 3. The toptwo figures share the same monitoring period n0 = 3, buttheir maximum velocities are different. The top left dia-



Fig. 3. The fundamental diagrams of the s2s–OV CA. The flows Q are computed by averaging over the time period 800 ≤ n ≤ 1000. The maximum velocities v_0δt and the monitoring periods n_0 for these patterns are (top left) v_0δt = 2, n_0 = 3, (top right) v_0δt = 4, n_0 = 3, (bottom left) v_0δt = 3, n_0 = 2 and (bottom right) v_0δt = 3, n_0 = 4, respectively. The inclination of the free line equals the maximum velocity v_0δt. The jamming line has a negative inclination.

gram corresponding to v_0δt = 2 has three branches. This number equals that of all the integral velocities: two, one and zero, as depicted in the diagram. The number of metastable branches in the top right diagram, as well as in the bottom two, is explained in the same manner. This observation also suggests that the monitoring period is irrelevant to the number of metastable branches.

All the end points of the branches, as well as of the jamming line, lie on the line ρ + Q (= ρ + Q(δt/x_0)) = 1. This is because the density at an end point is the maximum density ρ_max(v) that allows the velocity of the slowest car to be vδt. The maximum density ρ_max(v) is determined by

ρ_max(v) = x_0/(vδt + x_0).

Since all the cars flow at the velocity vδt when ρ = ρ_max(v), the corresponding flow is given by Q(ρ_max) = vρ_max. Thus the relation ρ_max + Q(ρ_max)(δt/x_0) = 1 holds.
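This relation can be verified in exact rational arithmetic; the short sketch below uses units with x_0 = δt = 1 (the helper names are our own).

```python
from fractions import Fraction

def rho_max(v):
    # maximum density that still lets every car move v cells per step:
    # rho_max = x0 / (v*dt + x0) with x0 = dt = 1
    return Fraction(1, v + 1)

def flow_at_rho_max(v):
    # at the end point all cars move at v, so Q = v * rho_max
    return v * rho_max(v)

# every end point (rho_max, Q) lies on the line rho + Q = 1
for v in range(1, 7):
    assert rho_max(v) + flow_at_rho_max(v) == 1
```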

The free line is the branch whose inclination equals the maximum velocity v_0δt. Every other metastable branch and the jamming line branch out from the free line. We observe that the density at the branch point of the branch corresponding to the velocity vδt reads

ρ_b = x_0/((v_0δt − vδt)n_0 + v_0δt + x_0).

This observation is explained as follows. Suppose one car, say the car k, runs at the velocity v and the other K − 1 cars run at the maximum velocity v_0. At the moment the k-th car slows down to v, the headway between the cars k and k + 1 is vδt + x_0. Since it takes at least n_0 + 1 time-steps for the car k to speed up to v_0, the headway between the cars k and k + 1 expands up to H = (v_0δt − vδt)(n_0 + 1) + vδt + x_0 = x_0/ρ_b ≥ v_0δt by the time the k-th car speeds up to v_0. If all the cars can obtain the headway H, slow cars running at the velocity v disappear in the end. Thus the density at the branch point of the branch corresponding to the velocity vδt is given by ρ_b = x_0/H. Note that the density at the branch point becomes smaller as the monitoring period becomes larger.

5. Concluding remarks

Through an extension of the ultradiscrete OV model [8], we introduced the ds2s–OV (8) and s2s–OV (9) models as ultradiscretizable traffic flow models. They are hybrids of the OV [3] and the s2s [7] models, whose ultradiscrete limit gives a generalization of a special case of the uOV model by Takahashi and Matsukidaira [8]. The phase transition from free to jam phases as well as the existence of multiple metastable states were observed in the numerically obtained fundamental diagrams for the s2s–OV CA (14), which are special cases of the us2s–OV model (13).

Detailed studies of the properties of the hybrid models (8), (9), (13) and (14), such as exact solutions and comparisons with other traffic flow models as well as with empirical data, remain to be carried out.

Acknowledgments

The authors are grateful to D. Takahashi, J. Mat-sukidaira, A. Tomoeda, D. Yanagisawa and R. Nishi fortheir valuable comments at the spring meeting of JSIAMin March, 2009.

References

[1] D. Chowdhury, L. Santen and A. Schadschneider, Statistical physics of vehicular traffic and some related systems, Phys. Rep., 329 (2000), 199–329.
[2] D. Helbing, Traffic and related self-driven many-particle systems, Rev. Mod. Phys., 73 (2001), 1067–1141.
[3] M. Bando, K. Hasebe, A. Nakayama, A. Shibata and Y. Sugiyama, Dynamical model of traffic congestion and numerical simulation, Phys. Rev. E, 51 (1995), 1035–1042.
[4] K. Nagel and M. Schreckenberg, A cellular automaton model for freeway traffic, J. Physique I, 2 (1992), 2221–2229.
[5] S. Wolfram, Theory and Applications of Cellular Automata, World Scientific, Singapore, 1986.
[6] M. Fukui and Y. Ishibashi, Traffic flow in 1D cellular automaton model including cars moving with high speed, J. Phys. Soc. Jpn., 65 (1996), 1868–1870.
[7] M. Takayasu and H. Takayasu, 1/f noise in a traffic model, Fractals, 1 (1993), 860–866.
[8] D. Takahashi and J. Matsukidaira, On a discrete optimal velocity model and its continuous and ultradiscrete relatives, JSIAM Letters, 1 (2009), 1–4.
[9] G. F. Newell, Nonlinear effects in the dynamics of car following, Oper. Res., 9 (1961), 209–229.
[10] T. Tokihiro, D. Takahashi, J. Matsukidaira and J. Satsuma, From soliton equations to integrable cellular automata through a limiting procedure, Phys. Rev. Lett., 76 (1996), 3247–3250.
[11] K. Nishinari, M. Fukui and A. Schadschneider, A stochastic cellular automaton model for traffic flow with multiple metastable states, J. Phys. A: Math. Gen., 37 (2004), 3101–3110.


JSIAM Letters Vol.1 (2009) pp.72–75 ©2009 Japan Society for Industrial and Applied Mathematics

A new compressible fluid model for traffic flow with density-dependent reaction time of drivers

Akiyasu Tomoeda^{1,2}, Daisuke Shamoto^3, Ryosuke Nishi^3, Kazumichi Ohtsuka^2 and Katsuhiro Nishinari^{2,4}

^1 Meiji Institute for Advanced Study of Mathematical Sciences, Meiji University, 1-1-1 Higashi-Mita, Tama-ku, Kawasaki, Kanagawa 214-8571, Japan
^2 Research Center for Advanced Science and Technology, The University of Tokyo, 4-6-1 Komaba, Meguro-ku, Tokyo 153-8904, Japan
^3 Department of Aeronautics and Astronautics, School of Engineering, The University of Tokyo, 7-3-1 Hongo, Bunkyo-ku, Tokyo 113-8656, Japan
^4 PRESTO, JST

E-mail [email protected]

Received September 30, 2009, Accepted November 8, 2009

Abstract

In this paper, we propose a new compressible fluid model for one-dimensional traffic flow that takes into account the variation of the reaction time of drivers, based on actual measurements. The model is a generalization of the Payne model obtained by introducing a density-dependent reaction-time function. A linear stability analysis of the new model shows the instability of homogeneous flow around a critical density of vehicles, which is observed in real traffic flow. Moreover, the condition for the nonlinear saturation of density against small perturbations is derived theoretically by the reductive perturbation method.

Keywords jamology, traffic flow, compressible fluid, stability analysis, reaction time

Research Activity Group Applied Integrable Systems

1. Introduction

Among various kinds of jamming phenomena, a traffic jam of vehicles is a very familiar one, causing several losses in our daily life such as decreased efficiency of transportation, waste of energy, serious environmental degradation, etc. In particular, highway traffic dynamics has attracted many researchers and has been investigated as a nonequilibrium system of interacting particles over the last few decades [1]. Many mathematical models for one-dimensional traffic flow have been proposed [2–8], and these models are classified into microscopic and macroscopic models in terms of the treatment of particles. In microscopic models, e.g., the car-following model [2, 3] and the cellular automaton model [4, 5], the dynamics of traffic flow is described by the movement of individual vehicles, whereas in macroscopic models the dynamics is treated as an effectively one-dimensional compressible fluid, focusing on the collective behavior of vehicles [6–8]. Moreover, it is widely known that some of these mathematical models are related to each other, which is shown by using mathematical methods such as the ultra-discretization method [9] or the Euler-Lagrange transformation [10]. That is, the rule-184 elementary cellular automaton model [4] is derived from the Burgers equation [6] by the ultra-discretization method, and a specific case of the optimal velocity (OV) model [3] is formally derived from the rule-184 cellular automaton model via the Euler-Lagrange transformation. Quite recently, ultra-discrete versions of the OV model have been presented by Takahashi et al. [11] and Kanai et al. [12].

In contrast to the practically reasonable microscopic models, only a small number of macroscopic models with reasonable expression have been proposed, even among the various traffic models based on the hydrodynamic theory of fluids [6-8]. In previous fluid models, one has no choice but to introduce a diffusion term, as in the Kerner-Konhauser model [8], in order to represent the stabilized density wave that indicates the formation of a traffic jam. However, a serious problem then emerges: some vehicles move backward even under heavy traffic, since the diffusion term is spatially isotropic. As mentioned by Daganzo in [13], the most essential difference between traffic and fluids is as follows: "A fluid particle responds to the stimulus from the front and from behind, however a vehicle is an anisotropic particle that mostly responds to frontal stimulus." That is, vehicles exhibit anisotropic behavior, although the behavior of fluid particles with simple diffusion is isotropic. Therefore, unfortunately, we have to conclude that traffic models which include a diffusion term are not reasonable for a realistic description of traffic flow. Given these factors, we suppose that a traffic jam forms as a result of the plateaued growth of a small perturbation by a nonlinear saturation effect.

Now let us return to the Payne model [7], which is one of the most fundamental and significant diffusion-free fluid models of traffic flow. The Payne model is given


JSIAM Letters Vol. 1 (2009) pp.72–75 Akiyasu Tomoeda et al.

[Fig. 1. The plots of stability in the case of (3) with ρmax = 1.0 and V0 = 1.0, for τ = 1.0, 2.0, 3.0 (density on the horizontal axis, stability on the vertical axis).]

by

\[ \frac{\partial \rho}{\partial t} + \frac{\partial}{\partial x}(\rho v) = 0, \qquad (1) \]

\[ \frac{\partial v}{\partial t} + v \frac{\partial v}{\partial x} = \frac{1}{\tau}\left(V_{\rm opt}(\rho) - v\right) + \frac{1}{2\tau\rho}\frac{dV_{\rm opt}(\rho)}{d\rho}\frac{\partial \rho}{\partial x}, \qquad (2) \]

where ρ(x, t) and v(x, t) correspond to the spatial vehicle density and the average velocity at position x and time t, respectively. τ is the reaction time of drivers, which is a positive constant, and V_opt(ρ) is the optimal velocity function, which represents the desired velocity of drivers at density ρ.

As the optimal velocity function, Payne employs

\[ V_{\rm opt}(\rho) = V_0\left(1 - \frac{\rho}{\rho_{\max}}\right), \qquad (3) \]

where V0 is the maximum velocity in the free-flow phase and ρmax corresponds to the maximum density, at which all cars are completely still.

The linear stability analysis of the Payne model gives the following dispersion relation [7]:

\[ \omega = -k V_{\rm opt}(\rho_0) + \frac{i\left[1 \pm \sqrt{1 + 4a_0^2\tau^2(2k\rho_0 i - k^2)}\right]}{2\tau}, \qquad (4) \]

where

\[ a_0^2 = -\frac{1}{2\tau}\left.\frac{dV_{\rm opt}(\rho)}{d\rho}\right|_{\rho=\rho_0} > 0. \qquad (5) \]

Furthermore, the linear stability condition is calculated from the dispersion relation (4):

\[ \frac{1}{2\tau} > -\rho_0^2 \left.\frac{dV_{\rm opt}(\rho)}{d\rho}\right|_{\rho=\rho_0}. \qquad (6) \]

If one applies the velocity-density relation (3) to this stability condition, the following linear stability condition is obtained:

\[ \frac{\rho_{\max}}{2\tau V_0} > \rho_0^2. \qquad (7) \]

Here, let us define the stability function S(ρ0) by

\[ S(\rho_0) = \frac{\rho_{\max}}{2\tau V_0} - \rho_0^2. \qquad (8) \]

In this function, the condition S(ρ0) > 0 (S(ρ0) < 0) corresponds to the stable (unstable) state.

Fig. 1 shows the stability plots for several constant values of τ. From this figure, we can observe that an instability region of homogeneous flow appears beyond a critical density of vehicles.
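The stability boundary in Fig. 1 follows directly from (8). As an illustration, here is a minimal sketch (the function name and default parameters are ours; ρmax = V0 = 1.0 as in the figure) that evaluates S(ρ0) and the critical density for each τ:

```python
import math

def stability(rho0, tau, rho_max=1.0, v0=1.0):
    """Stability function S(rho0) of Eq. (8): S > 0 stable, S < 0 unstable."""
    return rho_max / (2.0 * tau * v0) - rho0 ** 2

# The critical density is where S changes sign: rho_c = sqrt(rho_max / (2 tau V0)).
for tau in (1.0, 2.0, 3.0):
    rho_c = math.sqrt(1.0 / (2.0 * tau))
    print(f"tau={tau}: rho_c={rho_c:.4f}, S(0.9)={stability(0.9, tau):+.4f}")
```

Homogeneous flow at ρ0 = 0.9 is linearly unstable for every τ shown, in agreement with the figure.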

[Fig. 2. The synthetograph of time-series velocity data in two phases. The left panel corresponds to the free-flow phase (from 430 sec to 560 sec) and the right one to the jam phase (from 600 sec to 720 sec); each panel plots the velocities of Car 1 and Car 2 against time, with the delay τ indicated.]

However, the Payne model shows condensation of vehicles due to the momentum equation (2). That is, as the density increases, the optimal velocity in the first term of the right-hand side goes to zero, and the second term also goes to zero. Thus, the vehicles gather in one place due to the nonlinear effect vv_x, and a small perturbation blows up without stabilization. Therefore, the Payne model is also incomplete as a description of the realistic dynamics of traffic flow.

Thus, in this paper we propose a new compressible fluid model, improving the Payne model in terms of the reaction time of drivers on the basis of the following actual measurements.

2. New compressible fluid model based on experimental data

We have performed a car-following experiment on a highway. The leading vehicle cruises at the legal velocity and the following vehicle pursues it. The time-series data of the velocity and position (latitude and longitude) of each vehicle are recorded every 0.2 seconds (5 Hz) by an on-board global positioning system (GPS) receiver with high precision (< 60 centimeters).

By dividing the time-series data into two phases, i.e. the free-flow phase and the jam phase, based on the velocity, we have obtained the synthetograph of the two time series shown in Fig. 2, which shows that drivers clearly react to the car in front with a slight delay in both phases. Here, regarding the reaction time of drivers as this slight delay of behavior, we calculate the correlation coefficient, which is given by

\[ r_{i+1}(\tau) = \langle v_i(t)\, v_{i+1}(t+\tau) \rangle_t \qquad (9) \]

\[ = \frac{\sum_k \bigl(v_i(t_{(k)}) - \overline{v_i}\bigr)\bigl(v_{i+1}(t_{(k)}+\tau) - \overline{v_{i+1}}\bigr)}{\sqrt{\sum_k \bigl(v_i(t_{(k)}) - \overline{v_i}\bigr)^2}\,\sqrt{\sum_k \bigl(v_{i+1}(t_{(k)}+\tau) - \overline{v_{i+1}}\bigr)^2}}, \qquad (10) \]

where v_i(t) denotes the velocity of the i-th car at time t. Note that the i-th car drives in front of the (i+1)-th car. The symbol ⟨·⟩_t and the bar indicate an ensemble average and a time average, respectively. Finally, we have obtained the correlation coefficient for each given τ, as shown in Fig. 3. From this figure, we find that the peak of the correlation coefficient shifts according to the situation of the road. Since the delay τ at the peak is regarded as the reaction time of a driver, the reaction time is not constant but clearly changes according to the situation of the road. That is, if the traffic state is free (jammed), the reaction time of a driver is longer (shorter).

[Fig. 3. The plots of the correlation coefficient based on (9) for each given reaction time τ, in the free and jam phases.]
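The lag-correlation procedure of (9)-(10) can be sketched as follows; this is our own illustrative helper, not the authors' code, exercised on synthetic velocity data with a known 1-second delay:

```python
import numpy as np

def reaction_time(v_lead, v_follow, dt=0.2, max_lag_s=5.0):
    """Estimate the reaction time as the lag tau that maximizes the correlation
    coefficient between v_i(t) and v_{i+1}(t + tau), cf. Eqs. (9)-(10).
    dt = 0.2 s matches the 5 Hz GPS sampling of the experiment."""
    best_lag, best_r = 0, -np.inf
    for lag in range(1, int(max_lag_s / dt) + 1):
        # Normalized cross-correlation at this lag.
        r = np.corrcoef(v_lead[:-lag], v_follow[lag:])[0, 1]
        if r > best_r:
            best_lag, best_r = lag, r
    return best_lag * dt, best_r

# Synthetic check: the follower reproduces the leader's speed 1.0 s later.
t = np.arange(0.0, 60.0, 0.2)
v_lead = 20.0 + np.sin(0.3 * t)
v_follow = 20.0 + np.sin(0.3 * (t - 1.0))
tau, r = reaction_time(v_lead, v_follow)
print(tau, r)   # tau ≈ 1.0 s
```

On the real GPS data, the peak location of r(τ) plays the role of the reaction time discussed above.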

Based on this result, it is reasonable to assume that the reaction time of drivers depends on the density on the road. Under this assumption, we have extended the Payne model and propose a new compressible fluid model as follows:

\[ \frac{\partial \rho}{\partial t} + \frac{\partial}{\partial x}(\rho v) = 0, \qquad (11) \]

\[ \frac{\partial v}{\partial t} + v \frac{\partial v}{\partial x} = \frac{1}{\tau(\rho)}\left(V_{\rm opt}(\rho) - v\right) + \frac{1}{2\rho\,\tau(\rho)}\frac{dV_{\rm opt}(\rho)}{d\rho}\frac{\partial \rho}{\partial x}. \qquad (12) \]

The difference from the Payne model is that the reaction time of drivers is changed from a constant value to a density-dependent function τ(ρ).

2.1 Linear stability analysis

Now let us perform the linear stability analysis of our new dynamical model to investigate the instability of homogeneous flow. The homogeneous flow and a small perturbation are given by

\[ \rho = \rho_0 + \varepsilon\rho_1, \qquad v = V_{\rm opt}(\rho_0) + \varepsilon v_1. \qquad (13) \]

One then obtains the dispersion relation

\[ \omega = -k V_{\rm opt} + \frac{i\left[1 \pm \sqrt{1 + 4a_0^2\tau(\rho_0)^2(2k\rho_0 i - k^2)}\right]}{2\tau(\rho_0)}, \qquad (14) \]

where

\[ a_0^2 = -\frac{1}{2\tau(\rho_0)}\left.\frac{dV_{\rm opt}(\rho)}{d\rho}\right|_{\rho=\rho_0} > 0. \qquad (15) \]

Hence, the stability condition

\[ \frac{1}{2\tau(\rho_0)} > -\rho_0^2 \left.\frac{dV_{\rm opt}(\rho)}{d\rho}\right|_{\rho=\rho_0} \qquad (16) \]

is obtained. The only difference between (6) and (16) is that the reaction time changes from τ to τ(ρ0), which is determined solely by the initial density of the homogeneous flow. Therefore, this stability condition (16) and the stability condition (6) of the Payne model are essentially equivalent; that is, our new model also shows the instability of homogeneous flow. Substituting (3) into (16), the stability condition leads to

\[ \frac{\rho_{\max}}{2\tau(\rho_0) V_0} > \rho_0^2. \qquad (17) \]

The most important point of our new model is that the perturbation can be stabilized by the nonlinear effect created by the function τ(ρ), whereas this stabilizing mechanism fails in the Payne model. In order to exhibit this nonlinear effect, the evolution equation of a small perturbation is derived in the next subsection.

2.2 Reductive perturbation analysis

Let us define the slowly varying variables X and T by the Galilei transformation

\[ X = \varepsilon(x - c_g t), \qquad T = \varepsilon^2 t, \qquad (18) \]

where c_g = dω/dk is the group velocity. Next, we assume that ρ(x, t) and v(x, t) can be expressed as power series in ε, i.e.,

\[ \rho \sim \rho_0 + \varepsilon\rho_1 + \varepsilon^2\rho_2 + \varepsilon^3\rho_3 + \cdots, \qquad (19) \]

\[ v \sim v_0 + \varepsilon v_1 + \varepsilon^2 v_2 + \varepsilon^3 v_3 + \cdots. \qquad (20) \]

Substituting (19) and (20) into (11) and (12), we have, for each order of ε,

\[ \varepsilon^3:\ \frac{\partial \rho_1}{\partial T} + \frac{\partial}{\partial X}\bigl(\rho_2(v_0 - c_g) + \rho_1 v_1 + \rho_0 v_2\bigr) = 0, \qquad (21) \]

\[ \varepsilon^4:\ \frac{\partial \rho_2}{\partial T} + \frac{\partial}{\partial X}\bigl(\rho_3(v_0 - c_g) + \rho_2 v_1 + \rho_1 v_2 + \rho_0 v_3\bigr) = 0, \qquad (22) \]

and

\[ \varepsilon^2:\ (v_0 - c_g)\frac{\partial v_1}{\partial X} = \frac{V''_{\rm opt}\rho_1^2 + 2V'_{\rm opt}\rho_2 - 2v_2}{2\tau(\rho_0)} + \frac{V'_{\rm opt}}{2\tau(\rho_0)\rho_0}\frac{\partial \rho_1}{\partial X}, \qquad (23) \]

\[ \varepsilon^3:\ \frac{\partial v_1}{\partial T} + (v_0 - c_g)\frac{\partial v_2}{\partial X} + v_1\frac{\partial v_1}{\partial X} = \frac{1}{\tau(\rho_0)}\left(\frac{V'''_{\rm opt}\rho_1^3}{6} + V''_{\rm opt}\rho_1\rho_2 + V'_{\rm opt}\rho_3 - v_3\right) - \frac{\rho_1\tau'(\rho_0)}{\tau(\rho_0)^2}\left(V'_{\rm opt}\rho_2 + \frac{V''_{\rm opt}\rho_1^2}{2} - v_2\right) + \frac{1}{2\tau(\rho_0)}\left(\frac{V'_{\rm opt}}{\rho_0}\frac{\partial \rho_2}{\partial X} + \frac{V''_{\rm opt}\rho_1}{\rho_0}\frac{\partial \rho_1}{\partial X} - \frac{V'_{\rm opt}\rho_1}{\rho_0^2}\frac{\partial \rho_1}{\partial X} - \frac{V'_{\rm opt}\rho_1\tau'(\rho_0)}{\rho_0\tau(\rho_0)}\frac{\partial \rho_1}{\partial X}\right). \qquad (24) \]

Note that the primes denote derivatives with respect to ρ, evaluated at ρ = ρ0.

Putting φ1 = ρ1 as the first-order perturbation quantity and eliminating the second-order quantities (ρ2, v2) in (21) and (23), we obtain the Burgers equation

\[ \frac{\partial \phi_1}{\partial T} = \left[\frac{2(v_0 - c_g)}{\rho_0} - V''_{\rm opt}\rho_0\right]\phi_1\frac{\partial \phi_1}{\partial X} + \left[\frac{v_0 - c_g}{2\rho_0} - \tau(\rho_0)(v_0 - c_g)^2\right]\frac{\partial^2 \phi_1}{\partial X^2} \qquad (25) \]

as the evolution equation of the first-order quantity.

Table 1. Classification table based on the coefficient of the diffusion term.

P | Q | P − εQΦ | Time evolution
P < 0 (linearly unstable) | Q < 0 | P − εQΦ > 0 | Saturation
P < 0 (linearly unstable) | Q < 0 | P − εQΦ < 0 | Amplification
P < 0 (linearly unstable) | Q > 0 | P − εQΦ < 0 | Amplification
P > 0 (linearly stable) | Q < 0 | P − εQΦ > 0 | Damping
P > 0 (linearly stable) | Q > 0 | P − εQΦ > 0 | Damping
P > 0 (linearly stable) | Q > 0 | P − εQΦ < 0 | Amplification

Moreover, eliminating the third-order quantities (ρ3, v3) in (22) and (24) and defining the perturbation Φ = φ1 + εφ2, which includes the first- and second-order perturbations, we obtain the higher-order Burgers equation

\[ \frac{\partial \Phi}{\partial T} = \frac{2(v_0 - c_g)}{\rho_0}\Phi\frac{\partial \Phi}{\partial X} + \left[\frac{v_0 - c_g}{2\rho_0} - (v_0 - c_g)^2\tau(\rho_0)\right]\frac{\partial^2 \Phi}{\partial X^2} - \varepsilon(v_0 - c_g)^2\left[\frac{2\tau(\rho_0)}{\rho_0} + \tau'(\rho_0)\right]\frac{\partial}{\partial X}\left(\Phi\frac{\partial \Phi}{\partial X}\right) + \left[\frac{\tau(\rho_0)}{\rho_0} - 2(v_0 - c_g)\tau(\rho_0)^2\right]\frac{\partial^3 \Phi}{\partial X^3}, \qquad (26) \]

is obtained. Note that, in this derivation, we put V″opt = V‴opt = 0 because of the relation (3).

Although the first-order equation (25) of our model is essentially equivalent to that of the Payne model, the second-order equation (26) differs from that of the Payne model in the coefficient of the third term on the right-hand side.

In order to analyze the nonlinear effect of our model, let us consider the coefficient of the diffusion term of the second-order equation. Let us denote the coefficient of the second term on the right-hand side of (26) by

\[ P = \frac{v_0 - c_g}{2\rho_0} - \tau(\rho_0)(v_0 - c_g)^2, \qquad (27) \]

and also put the coefficient of the third term as

\[ Q = \frac{2(v_0 - c_g)^2\tau(\rho_0)}{\rho_0} + (v_0 - c_g)^2\tau'(\rho_0). \qquad (28) \]

Thus, the diffusion term of (26) is given by

\[ (P - \varepsilon Q\Phi)\frac{\partial^2 \Phi}{\partial X^2}. \qquad (29) \]

Since P = 0 corresponds to the neutrally stable condition, we assume that P is negative, which corresponds to the linearly unstable case. In the case of the Payne model, Q is always positive because τ is constant, i.e. τ′ is always zero. Therefore, the diffusion coefficient (29) is always negative under the linearly unstable condition of the Payne model, which makes the model difficult to treat numerically. In our model, however, τ′(ρ) is always negative, since τ(ρ) is a monotonically decreasing function. If Q is negative, the diffusion coefficient becomes positive as Φ increases, even under the linearly unstable condition. In this situation, the small perturbation is saturated by the nonlinear effect created by the density dependence of the reaction time of drivers. The conditions for nonlinear saturation are P < 0 and Q < 0, which are transformed into the following expressions:

\[ \tau(\rho_0) > \frac{1}{2\rho_0(v_0 - c_g)}, \qquad \tau'(\rho_0) < -\frac{2\tau(\rho_0)}{\rho_0}. \qquad (30) \]

All cases, including the remaining ones, are summarized in Table 1.
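The classification of Table 1 can be checked mechanically from (27)-(30). The following sketch (the function and all numerical values are our own illustrations, not from the paper) evaluates the sign of the effective diffusion coefficient for a perturbation of effective amplitude εΦ:

```python
def classify(rho0, v0, cg, tau, dtau, eps_phi):
    """Classify the evolution via the sign of P - eps*Q*Phi in (29), as in
    Table 1.  Here tau = tau(rho0), dtau = tau'(rho0), eps_phi = eps*Phi."""
    P = (v0 - cg) / (2.0 * rho0) - tau * (v0 - cg) ** 2            # Eq. (27)
    Q = 2.0 * (v0 - cg) ** 2 * tau / rho0 + (v0 - cg) ** 2 * dtau  # Eq. (28)
    diff = P - eps_phi * Q
    if P < 0:  # linearly unstable
        return "saturation" if diff > 0 else "amplification"
    return "damping" if diff > 0 else "amplification"

# Nonlinear-saturation regime (30): P < 0 and Q < 0.  With rho0 = 0.5 the
# second condition reads tau'(rho0) < -2*tau/rho0 = -12.
rho0, v0, cg, tau, dtau = 0.5, 1.0, 0.2, 3.0, -15.0
print(classify(rho0, v0, cg, tau, dtau, eps_phi=0.7))   # prints "saturation"
print(classify(rho0, v0, cg, tau, dtau, eps_phi=0.1))   # prints "amplification"
```

With these illustrative values a small perturbation first amplifies, and once εΦ exceeds P/Q the diffusion coefficient changes sign and the growth saturates, exactly the mechanism described above.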

3. Conclusion

A new compressible fluid model for one-dimensional traffic flow has been proposed by introducing a density-dependent function τ(ρ) for the reaction time of drivers, based on actual measurements. Our new model does not include a diffusion term, which would exhibit unrealistic isotropic behavior of vehicles, since vehicles mostly respond to the stimulus from the car in front. The linear stability analysis of our new model establishes the instability of homogeneous flow, and we have found that the stability condition is essentially equivalent to that of the Payne model. Moreover, the behavior of a small density perturbation is classified according to the diffusion coefficient of the higher-order Burgers equation, which is derived from our new model by the reductive perturbation method. From this classification, we have obtained the specific condition under which the small perturbation is saturated by the nonlinear effect.

References

[1] D. Chowdhury, L. Santen and A. Schadschneider, Statistical physics of vehicular traffic and some related systems, Phys. Rep., 329 (2000), 199-329.
[2] G. F. Newell, Nonlinear effects in the dynamics of car following, Oper. Res., 9 (1961), 209-229.
[3] M. Bando, K. Hasebe, A. Nakayama, A. Shibata and Y. Sugiyama, Dynamical model of traffic congestion and numerical simulation, Phys. Rev. E, 51 (1995), 1035-1042.
[4] K. Nishinari and D. Takahashi, Analytical properties of ultradiscrete Burgers equation and rule-184 cellular automaton, J. Phys. A: Math. Gen., 31 (1998), 5439-5450.
[5] M. Kanai, K. Nishinari and T. Tokihiro, Stochastic optimal velocity model and its long-lived metastability, Phys. Rev. E, 72 (2005), 035102.
[6] G. B. Whitham, Linear and Nonlinear Waves, Wiley-Interscience, New York, 1974.
[7] H. J. Payne, Models of freeway traffic and control, in: Simulation Council Proc., G. A. Bekey ed., Mathematical Models of Public Systems, 1 (1971), 51-61.
[8] B. S. Kerner and P. Konhauser, Cluster effect in initially homogeneous traffic flow, Phys. Rev. E, 48 (1993), R2335-R2338.
[9] T. Tokihiro, D. Takahashi, J. Matsukidaira and J. Satsuma, From soliton equations to integrable cellular automata through a limiting procedure, Phys. Rev. Lett., 76 (1996), 3247-3250.
[10] J. Matsukidaira and K. Nishinari, Euler-Lagrange correspondence of cellular automaton for traffic-flow models, Phys. Rev. Lett., 90 (2003), 088701.
[11] D. Takahashi and J. Matsukidaira, On a discrete optimal velocity model and its continuous and ultradiscrete relatives, JSIAM Letters, 1 (2009), 1-4.
[12] M. Kanai, S. Isojima, K. Nishinari and T. Tokihiro, Ultradiscrete optimal velocity model: A cellular-automaton model for traffic flow and linear instability of high-flux traffic, Phys. Rev. E, 79 (2009), 056108.
[13] C. F. Daganzo, Requiem for second-order fluid approximations of traffic flow, Trans. Res. B, 29 (1995), 277-286.


JSIAM Letters Vol. 1 (2009) pp.76-79 © 2009 Japan Society for Industrial and Applied Mathematics

Error analysis for a matrix pencil of Hankel matrices with perturbed complex moments

Tetsuya Sakurai1, Junko Asakura2, Hiroto Tadano1 and Tsutomu Ikegami3

Department of Computer Science, University of Tsukuba, 1-1-1 Tennoudai, Tsukuba, Ibaraki 305-8573, Japan1

Research and Development Division, Square Enix Co. Ltd., Shinjuku Bunka Quint Bldg. 3-22-7 Yoyogi, Shibuya-ku, Tokyo 151-8544, Japan2

Information Technology Research Institute, AIST, 1-1-1 Umezono, Tsukuba, Ibaraki 305-8568, Japan3

E-mail [email protected]

Received September 30, 2009, Accepted December 6, 2009

Abstract

In this paper, we present perturbation results for the eigenvalues of a matrix pencil of Hankel matrices whose elements are given by complex moments. These results are extended to the case where the matrices have a block Hankel structure. The influence of quadrature error on eigenvalues that lie inside a given integral path can be reduced by using Hankel matrices of an appropriate size. These results are useful for discussing the numerical behavior of root-finding methods and eigenvalue solvers which make use of contour integrals. Results from some numerical experiments are consistent with the theoretical results.

Keywords perturbation results, eigenvalues, block Hankel matrix, matrix-valued moments

Research Activity Group Algorithms for Matrix / Eigenvalue Problems and their Applications

1. Introduction

We consider the problem of determining poles and respective residues from a sequence of complex moments. This problem appears in methods for finding roots of analytic functions [1-3] and in eigenvalue solvers [4-8] using contour integrals. In these methods, the problem of determining the zeros or eigenvalues in a given circle is reduced to an eigenvalue problem for a matrix pencil of Hankel matrices.

In this paper, we present perturbation results for the eigenvalues of a matrix pencil of Hankel matrices associated with complex moments. We extend these results to the case where the matrices have a block Hankel structure with matrix-valued moments. These results are useful in discussing the numerical behavior of moment-based methods, and they can be used to determine parameters for these methods.

Our results suggest that the use of Hankel matrices of an appropriate size reduces the influence of quadrature error on eigenvalues that lie inside a given integral path. Hankel matrices are known to be very ill-conditioned [9]; indeed, the condition number of Hankel matrices often increases exponentially. However, our element-wise error analysis shows that the eigenvalues inside the unit circle of a matrix pencil of Hankel matrices can be obtained accurately. This result can be generalized to an arbitrary circle by a shift and scale transformation.

The rest of this paper is organized as follows. In Section 2, we present perturbation results for a matrix pencil of Hankel matrices. In Section 3, we extend the results to the case where the matrix pencil consists of block Hankel matrices. Some numerical experiments, the results of which are consistent with the theoretical results, are reported in Section 4.

2. Perturbation results for a matrix pencil of Hankel matrices

Let f(z) be a rational function with n simple poles η_i ∈ C for 1 ≤ i ≤ n, and let ν_i ∈ C for 1 ≤ i ≤ n be their residues, where C denotes the set of complex numbers. Throughout this paper, we assume that η_1, . . . , η_n are mutually distinct and that ν_i ≠ 0 for 1 ≤ i ≤ n.

Define the sequence of complex moments as follows:

\[ \mu_k = \frac{1}{2\pi i}\int_{T} z^k f(z)\, dz, \qquad k = 0, 1, \ldots, \qquad (1) \]

where T is the unit circle. Let m poles η_1, . . . , η_m be located inside the unit circle, with the rest located outside. Then, from the residue theorem, μ_k is given by

\[ \mu_k = \sum_{i=1}^{m} \nu_i \eta_i^k, \qquad k = 0, 1, \ldots. \]

Let the Hankel matrix H_m ∈ C^{m×m} associated with {μ_k}_{k=0}^{2m−2} and the shifted Hankel matrix H^<_m ∈ C^{m×m} associated with {μ_k}_{k=1}^{2m−1} be

\[ H_m = [\mu_{i+j-2}]_{i,j=1}^{m}, \qquad H_m^{<} = [\mu_{i+j-1}]_{i,j=1}^{m}, \]

respectively.

respectively. Let Vm ∈ Cm×m be the Vandermonde ma-

trix Vm = [ηj−1

i ]mi,j=1. The eigenvalues and eigenvectors ofthe matrix pencil H<

m−λHm can be expressed as follows:

Theorem 1 The eigenvalues of the matrix pencil H^<_m − λH_m are given by η_1, . . . , η_m. The right eigenvector x_i with respect to η_i is given by

\[ x_i = \frac{1}{\sqrt{\nu_i}}\, V_m^{-1} e_i, \]

and the left eigenvector y_i is given by y_i = \bar{x}_i, with y_i^* H_m x_i = 1. Here e_i is the i-th unit vector.

Proof Let u_m ∈ C^m be

\[ u_m = [\nu_1^{1/2}, \ldots, \nu_m^{1/2}]^{\rm T}, \]

and let Δ_m = diag(η_1, . . . , η_m). It is easily seen that

\[ \mu_k = u_m^{\rm T} \Delta_m^k u_m, \qquad k = 0, 1, \ldots. \]

It follows that the Hankel matrices can be factorized as follows:

\[ H_m = \phi_m^{\rm T}\phi_m, \qquad H_m^{<} = \phi_m^{\rm T}\Delta_m\phi_m, \]

where

\[ \phi_m = [u_m\ \Delta_m u_m\ \cdots\ \Delta_m^{m-1} u_m] \in C^{m \times m}. \]

This implies that

\[ H_m^{<} - \lambda H_m = \phi_m^{\rm T}(\Delta_m - \lambda I_m)\phi_m, \]

where I_m is the m × m identity matrix. The matrix φ_m is nonsingular, because it can be expressed as

\[ \phi_m = {\rm diag}(\nu_1^{1/2}, \ldots, \nu_m^{1/2})\, V_m, \]

where η_1, . . . , η_m are all distinct and ν_1, . . . , ν_m are nonzero. Thus, the eigenvalues of the matrix pencil H^<_m − λH_m are given by η_1, . . . , η_m.

Since Δ_m e_i = η_i e_i, it can be verified that

\[ x_i = \phi_m^{-1} e_i = V_m^{-1}\, {\rm diag}(\nu_1^{1/2}, \ldots, \nu_m^{1/2})^{-1} e_i = \frac{1}{\sqrt{\nu_i}}\, V_m^{-1} e_i. \]

We can also verify that y_i = \overline{\phi_m^{-1} e_i} = \bar{x}_i. From these results, we have

\[ y_i^* H_m x_i = e_i^{\rm T} (\phi_m^{\rm T})^{-1}\, \phi_m^{\rm T}\, \phi_m\, \phi_m^{-1} e_i = 1. \]

This proves the theorem. (QED)
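Theorem 1 is easy to check numerically. The sketch below (pole and residue values chosen arbitrarily by us; the pencil is reduced to a standard eigenproblem via H_m^{-1}H_m^< rather than calling a generalized solver) recovers the poles from exact moments:

```python
import numpy as np

# Poles inside the unit circle and their residues (illustrative values).
eta = np.array([0.2, 0.4 + 0.3j, -0.5])
nu = np.array([1.0, 2.0, 0.5])
m = len(eta)

# Exact moments mu_k = sum_i nu_i * eta_i^k for k = 0, ..., 2m-1.
mu = np.array([(nu * eta ** k).sum() for k in range(2 * m)])

# Hankel pencil H^< - lambda*H of Theorem 1.
H = np.array([[mu[i + j] for j in range(m)] for i in range(m)])
Hs = np.array([[mu[i + j + 1] for j in range(m)] for i in range(m)])

# Generalized eigenvalues, computed as eigenvalues of H^{-1} H^<.
lam = np.linalg.eigvals(np.linalg.solve(H, Hs))
print(np.sort_complex(lam))   # recovers eta up to ordering
```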

An error estimate for the eigenvalues of a perturbed matrix pencil, when all the eigenvalues are simple, is given in [2]. Let λ_1, . . . , λ_n be the eigenvalues of the matrix pencil A − λB, and let x_i and y_i be the right and left eigenvectors with respect to λ_i, respectively. Then the eigenvalue λ̂_i of the perturbed matrix pencil (A + ΔA) − λ(B + ΔB), where ‖ΔA‖₂ ≤ δ and ‖ΔB‖₂ ≤ δ for sufficiently small δ > 0, satisfies the following relation:

\[ |\hat{\lambda}_i - \lambda_i| \leq \delta(1 + |\lambda_i|)\frac{\|x_i\|_2 \cdot \|y_i\|_2}{|y_i^* B x_i|} + O(\delta^2). \qquad (2) \]

Define

\[ \tau_i(A, B) = (1 + |\lambda_i|)\frac{\|x_i\|_2 \cdot \|y_i\|_2}{|y_i^* B x_i|}; \]

then τ_i(A, B) expresses the condition of the i-th eigenvalue of the matrix pencil A − λB. From Theorem 1, we have the following expression.

Lemma 2

\[ \tau_i(H_m^{<}, H_m) = \frac{1 + |\eta_i|}{|\nu_i|}\, \|V_m^{-1} e_i\|_2^2. \qquad (3) \]

Suppose that the contour integral (1) is approximated using the N-point trapezoidal rule:

\[ \hat{\mu}_k = \frac{1}{N}\sum_{j=0}^{N-1} \theta_j^{k+1} f(\theta_j), \qquad k = 0, 1, \ldots, \]

with the equi-distributed points on the unit circle:

\[ \theta_j = e^{\frac{2\pi i}{N}\left(j + \frac{1}{2}\right)}, \qquad j = 0, 1, \ldots, N-1. \]

The approximate moments μ̂_k suffer from quadrature error. For the error analysis of the trapezoidal rule, we use the following estimate.

Lemma 3 Let η be a complex number with |η| ≠ 1. For any integer k with 0 ≤ k < N, the following holds:

\[ \frac{1}{N}\sum_{j=0}^{N-1} \frac{\theta_j^{k+1}}{\theta_j - \eta} = \frac{\eta^k}{1 + \eta^N}. \qquad (4) \]

Proof If |η| < 1, we have

\[ \frac{1}{N}\sum_{j=0}^{N-1} \frac{\theta_j^{k+1}}{\theta_j - \eta} = \frac{1}{N}\sum_{j=0}^{N-1} \frac{\theta_j^{k}}{1 - \frac{\eta}{\theta_j}} = \sum_{p=0}^{\infty} \eta^p\, \frac{1}{N}\sum_{j=0}^{N-1} \theta_j^{k-p} = \sum_{q=0}^{\infty} (-1)^q \eta^{Nq+k}. \qquad (5) \]

The last step follows from the fact that

\[ \frac{1}{N}\sum_{j=0}^{N-1} \theta_j^{p} = \begin{cases} (-1)^q & \text{if } p = qN \text{ for } q \in Z, \\ 0 & \text{otherwise}, \end{cases} \]

where Z denotes the set of integers. Similarly, for the case in which |η| > 1, we have

\[ \frac{1}{N}\sum_{j=0}^{N-1} \frac{\theta_j^{k+1}}{\theta_j - \eta} = \frac{1}{N}\sum_{j=0}^{N-1} \left(-\frac{1}{\eta}\right) \frac{\theta_j^{k+1}}{1 - \frac{\theta_j}{\eta}} = \sum_{p=0}^{\infty} \left(-\frac{1}{\eta^{p+1}}\right) \frac{1}{N}\sum_{j=0}^{N-1} \theta_j^{p+k+1} = \sum_{q=1}^{\infty} (-1)^{q-1} \eta^{-Nq+k}. \qquad (6) \]

From (5) and (6), we have (4). (QED)
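Lemma 3 can also be verified numerically for poles on either side of the unit circle; the following check (our own, with arbitrarily chosen η, k, and N) confirms the identity (4) to machine precision:

```python
import numpy as np

def lhs(eta, k, N):
    """Left-hand side of (4): the N-point sum over the shifted nodes theta_j."""
    idx = np.arange(N)
    theta = np.exp(2j * np.pi * (idx + 0.5) / N)   # theta_j^N = -1
    return np.mean(theta ** (k + 1) / (theta - eta))

N = 16
for eta in (0.3 + 0.2j, 2.0 - 1.0j):               # |eta| < 1 and |eta| > 1
    for k in (0, 3, 7):
        rhs = eta ** k / (1.0 + eta ** N)
        assert abs(lhs(eta, k, N) - rhs) < 1e-12
print("identity (4) holds")
```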

From this Lemma, we derive the following equation.

\[ \hat{\mu}_k = \sum_{i=1}^{n} \left(\frac{\nu_i}{1 + \eta_i^N}\right) \eta_i^k, \qquad k = 0, 1, \ldots, N-1. \qquad (7) \]

This equation implies that μ̂_k is a moment with new weights ν̂_i = ν_i/(1 + η_i^N) instead of ν_i. Therefore, we see that the quadrature error affects the weights; the poles η_1, . . . , η_n, however, are unchanged if the computations are performed without any numerical error.

For η_i such that |η_i^N| ≫ 1, the weight ν̂_i = ν_i/(1 + η_i^N) is close to zero. Suppose that η_1, . . . , η_n are ordered such that |η_1| ≤ · · · ≤ |η_n|. Let m′ be an integer such that ν̂_i = O(ε) for any i with m′ < i ≤ n, for sufficiently small ε > 0. Then (7) can be expressed as

\[ \hat{\mu}_k = \sum_{i=1}^{m'} \hat{\nu}_i \eta_i^k + O(\varepsilon). \]


Let μ̃_k = Σ_{i=1}^{m′} ν̂_i η_i^k; then μ̂_k can be regarded as a perturbation of the moment μ̃_k, which is obtained from the m′ poles η_1, . . . , η_{m′} with weights ν̂_1, . . . , ν̂_{m′}. Let

\[ F_{m'} = {\rm diag}\left((1 + \eta_1^N)^{-\frac{1}{2}}, \ldots, (1 + \eta_{m'}^N)^{-\frac{1}{2}}\right), \]

then we have

\[ \tilde{\mu}_k = (F_{m'} u_{m'})^{\rm T} \Delta_{m'}^k (F_{m'} u_{m'}). \]

Therefore, H̃_{m′} and H̃^<_{m′}, the m′ × m′ Hankel matrices associated with μ̃_k, can be factorized as follows:

\[ \tilde{H}_{m'} = (F_{m'}\phi_{m'})^{\rm T}(F_{m'}\phi_{m'}), \qquad (8) \]

\[ \tilde{H}_{m'}^{<} = (F_{m'}\phi_{m'})^{\rm T}\Delta_{m'}(F_{m'}\phi_{m'}). \qquad (9) \]

The right eigenvector of H̃^<_{m′} − λH̃_{m′} with respect to η_i is given by

\[ \tilde{x}_i = \frac{1}{\sqrt{\hat{\nu}_i}}\, V_{m'}^{-1} e_i = \sqrt{\frac{1 + \eta_i^N}{\nu_i}}\, V_{m'}^{-1} e_i, \]

and the left eigenvector is given by ỹ_i = \bar{\tilde{x}}_i. From these results and Lemma 2, the following relation is obtained.

Theorem 4 Let Ĥ_{m′} = H̃_{m′} + O(ε) and Ĥ^<_{m′} = H̃^<_{m′} + O(ε) with sufficiently small ε > 0. Then

\[ \tau_i(\hat{H}_{m'}^{<}, \hat{H}_{m'}) = |1 + \eta_i^N| \times \tau_i(H_{m'}^{<}, H_{m'}) + O(\varepsilon). \]

This theorem shows that the condition of the i-th eigenvalue of the matrix pencil Ĥ^<_{m′} − λĤ_{m′}, which is constructed from the moments calculated by numerical integration, is magnified by a factor |1 + η_i^N| > 1. However, the influence of the quadrature error on our target eigenvalues, those lying inside the unit circle, is small, since for them |1 + η_i^N| ≈ 1 if ν_i = O(1). We should take m′ large enough that ν_{m′}/(1 + η_{m′}^N) = O(ε); this condition can be assessed via the singularity of Ĥ_{m′}.

3. Extension to block Hankel matrices with matrix-valued moments

Now we extend the results of the previous section to the case of matrix-valued moments. Let L be a positive integer with L ≤ n, and let N_i ∈ C^{L×L}, 1 ≤ i ≤ n, be given by N_i = d_i c_i^T with vectors c_i, d_i ∈ C^L, 1 ≤ i ≤ n. Define the matrix-valued moments M_k ∈ C^{L×L} by

\[ M_k = \frac{1}{2\pi i}\int_{T} z^k F(z)\, dz, \qquad k = 0, 1, \ldots, \]

where F(z) ∈ C^{L×L} is the matrix-valued function defined by F(z) = Σ_{i=1}^{n} N_i/(z − η_i). This function appears in the block Sakurai-Sugiura method for both generalized and nonlinear eigenvalue problems [4-6].

It can be verified that

\[ M_k = \sum_{i=1}^{m} N_i \eta_i^k = D_m^{\rm T} \Delta_m^k C_m, \qquad k = 0, 1, \ldots, \]

where

\[ C_m = [c_1\ c_2\ \cdots\ c_m]^{\rm T}, \qquad D_m = [d_1\ d_2\ \cdots\ d_m]^{\rm T}. \]

Here, we assume that the column vectors of C_m and those of D_m are each linearly independent.

Let K be an integer such that m ≤ KL ≤ n. Define the block Hankel matrices H_{KL}, H^<_{KL} ∈ C^{KL×KL} with blocks M_k by H_{KL} = [M_{i+j−2}]_{i,j=1}^{K} and H^<_{KL} = [M_{i+j−1}]_{i,j=1}^{K}. Let Φ_{m,KL}, Ψ_{m,KL} ∈ C^{m×KL} be

\[ \Phi_{m,KL} = [C_m\ \Delta_m C_m\ \ldots\ \Delta_m^{K-1} C_m], \]

\[ \Psi_{m,KL} = [D_m\ \Delta_m D_m\ \ldots\ \Delta_m^{K-1} D_m]. \]

We define the m × m leading submatrices as follows:

\[ H_m = H_{KL}(1\!:\!m, 1\!:\!m), \qquad H_m^{<} = H_{KL}^{<}(1\!:\!m, 1\!:\!m), \]

and also

\[ \Phi_m = \Phi_{m,KL}(1\!:\!m, 1\!:\!m), \qquad \Psi_m = \Psi_{m,KL}(1\!:\!m, 1\!:\!m). \]

Then H_m and H^<_m, the m × m block Hankel matrices corresponding to M_k, can be factorized as follows:

\[ H_m = \Psi_m^{\rm T}\Phi_m, \qquad H_m^{<} = \Psi_m^{\rm T}\Delta_m\Phi_m. \]

These relations lead to the following theorem.

Theorem 5 The eigenvalues of H^<_m − λH_m are given by η_1, . . . , η_m. The right and left eigenvectors x_i and y_i with respect to η_i are given by x_i = Φ_m^{-1} e_i and y_i = \overline{\Psi_m^{-1} e_i}, respectively, and y_i^* H_m x_i = 1.

The approximations of M_k are calculated by

\[ \hat{M}_k = \frac{1}{N}\sum_{j=0}^{N-1} \theta_j^{k+1} F(\theta_j), \qquad k = 0, 1, \ldots. \qquad (10) \]

Similarly to the case of μ_k, we have

\[ \hat{M}_k = \sum_{i=1}^{m'} \frac{N_i}{1 + \eta_i^N}\, \eta_i^k + O(\varepsilon). \]

Therefore, we can see that M̂_k approximately consists of the m′ poles η_1, . . . , η_{m′} with the matrix-valued weights N̂_1, . . . , N̂_{m′}, where N̂_i = N_i/(1 + η_i^N); the remaining quadrature error is O(ε), which is small enough.

Setting M̃_k = Σ_{i=1}^{m′} N_i/(1 + η_i^N) η_i^k, we have the following theorem.

Theorem 6 Let Ĥ_{m′} = H̃_{m′} + O(ε) and Ĥ^<_{m′} = H̃^<_{m′} + O(ε) for sufficiently small ε > 0. Then

\[ \tau_i(\hat{H}_{m'}^{<}, \hat{H}_{m'}) = |1 + \eta_i^N| \times \tau_i(H_{m'}^{<}, H_{m'}) + O(\varepsilon). \]

Thus, we obtain a result similar to that of the scalar-moment case. The influence of the quadrature error of the matrix-valued moments depends on the location of each eigenvalue η_i. For eigenvalues that lie outside the unit circle, the influence of the quadrature error is magnified by |1 + η_i^N| > 1; however, the perturbation resulting from quadrature error is not large for eigenvalues inside the unit circle.

4. Numerical examples

In this section, some numerical experiments are considered. The computations are performed in MATLAB in double-precision arithmetic. The matrix pencil is solved by the MATLAB function eig, and systems of linear equations are solved by mldivide.
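For readers without MATLAB, the scalar-moment workflow of Section 2 can be sketched in Python as follows. This is an illustrative reimplementation with data of our own choosing, not the authors' code: the trapezoidal rule with the shifted nodes (θ_j^N = −1) damps the weights of the outside poles by 1/(1 + η_i^N), and the size-m pencil then returns the inside poles:

```python
import numpy as np

# Rational function with two poles inside and two outside the unit circle.
eta = np.array([0.5, -0.3, 2.0, -4.0])
nu = np.array([1.0, 0.5, 1.0, 2.0])
f = lambda z: (nu / (z - eta)).sum()

N, m = 32, 2
idx = np.arange(N)
theta = np.exp(2j * np.pi * (idx + 0.5) / N)     # shifted nodes: theta_j^N = -1
fz = np.array([f(z) for z in theta])

# Approximate moments hat(mu)_k by the N-point trapezoidal rule.
mu = np.array([(theta ** (k + 1) * fz).mean() for k in range(2 * m)])

# Hankel pencil of size m; its eigenvalues approximate the inside poles.
H = np.array([[mu[i + k] for k in range(m)] for i in range(m)])
Hs = np.array([[mu[i + k + 1] for k in range(m)] for i in range(m)])
lam = np.linalg.eigvals(np.linalg.solve(H, Hs))
print(np.sort(lam.real))     # close to [-0.3, 0.5]
```

The recovered eigenvalues differ from the inside poles only by the O(ε) weight perturbation discussed above, in line with Theorem 4.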

Example 1 The first example simply verifies the error estimate (3). Let n = m = 5, and let η_1, . . . , η_m and ν_1, . . . , ν_m be −1.0, 0.5 + i, 0.5 − i, 1.0, 2.0 and 10^{−14}, 10^{−8}, 1.0, 1.0, 10^{−6}, respectively.

Table 1. Results of Example 1 (incorrect digits were underlined in the original).

i | Real(η̂_i) | |η̂_i − η_i| | ν_i
1 | −1.012073233465553 | 1.2 × 10^{−2} | 10^{−14}
2 | 0.499999980832560 | 4.5 × 10^{−8} | 10^{−8}
3 | 0.500000000000000 | 5.5 × 10^{−16} | 1.0
4 | 1.000000000000000 | 8.5 × 10^{−16} | 1.0
5 | 1.999999999978390 | 4.7 × 10^{−11} | 10^{−6}

Table 2. Results for the case of m′ = 12 in Example 2. Parameters are set as N = 32 and L = 5.

i | Real(η̂_i) | |η̂_i − η_i| | τ̂_i
1 | 0.199999999999951 | 4.9 × 10^{−14} | 1.2 × 10^{−13}
2 | 0.399999999999759 | 2.4 × 10^{−13} | 7.1 × 10^{−13}
3 | 0.600000000000011 | 2.0 × 10^{−14} | 1.1 × 10^{−12}
4 | 0.799999999999916 | 9.4 × 10^{−14} | 3.6 × 10^{−12}
5 | 1.000000000000462 | 4.6 × 10^{−13} | 4.6 × 10^{−12}
6 | 1.200000000176180 | 1.8 × 10^{−10} | 2.8 × 10^{−9}
7 | 1.400000019943357 | 2.0 × 10^{−8} | 2.8 × 10^{−7}
8 | 1.600000150508887 | 1.6 × 10^{−7} | 4.3 × 10^{−6}
9 | 1.799956725555216 | 4.3 × 10^{−5} | 1.8 × 10^{−4}
10 | 2.000315575880587 | 3.2 × 10^{−4} | 2.7 × 10^{−3}
11 | 2.207688176008069 | 7.9 × 10^{−3} | 1.9 × 10^{−1}
12 | 2.442759435060872 | 5.0 × 10^{−2} | 5.0 × 10^{0}

The values η̂_1, . . . , η̂_m are obtained by solving the generalized eigenvalue problem H^<_m x = λ H_m x. The moments are calculated by μ_k = Σ_{i=1}^{n} ν_i η_i^k. In Table 1, we show η̂_i and |η̂_i − η_i| for each i. The condition number of H_m is cond(H_m) = 1.9 × 10^{14}; however, η_3 and η_4 are computed with sufficient accuracy from the matrix pencil. The other poles suffer from numerical error whose magnitude is proportional to 1/ν_i.

Example 2 Let n = 20 and η_i = 0.2 × i for 1 ≤ i ≤ n. Here we set m = 5. The elements of c_1, . . . , c_n and d_1, . . . , d_n are set by a random number generator from a uniform distribution over the interval [0, 1]. M̂_k, k = 0, 1, . . . are calculated by the N-point trapezoidal rule (10). The parameters are set as N = 32 and L = 5.

The error is evaluated by max_{1≤i≤m} |η̂_i − η_i|. To estimate the perturbation in the Hankel matrices of size m′, we computed σ_{m′}/σ_1 for various m′, where σ_1, . . . , σ_{m′} are the singular values of Ĥ_{m′}.

In Table 2, we present the results for the case of m′ = 12. Instead of calculating τ_i(H^<_{m′}, H_{m′}), we calculated τ̂_i by using the eigenvectors of Ĥ^<_{m′} − λĤ_{m′}. Note that η_5 = 1 is located on the unit circle; however, it can be obtained because it does not coincide with any quadrature node. The condition number of Ĥ_{m′} is cond(Ĥ_{m′}) = 1.1 × 10^{17}.

The results for various m′ are shown in Table 3. The maximum error for the eigenvalues inside the unit circle decreases as the matrix size m′ increases. The ratio of singular values σm′/σ1 gives a good estimate of the perturbation of the coefficients of Hm′.
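A sketch of how such a size might be chosen automatically (the function name and the threshold are illustrative choices, not from the paper):

```python
# Sketch: pick the Hankel size m by monitoring the singular-value ratio
# sigma_m / sigma_1 of H_m = [mu_{i+j}].  Function name and tolerance
# are illustrative, not from the paper.
import numpy as np
from scipy.linalg import hankel, svdvals

def size_by_sv_ratio(mu, max_size, tol=1e-12):
    """Smallest m with sigma_m/sigma_1 of H_m below tol (else max_size).

    Requires len(mu) >= 2 * max_size - 1.
    """
    for m in range(1, max_size + 1):
        H = hankel(mu[:m], mu[m - 1:2 * m - 1])
        s = svdvals(H)               # singular values, descending
        if s[-1] / s[0] < tol:
            return m
    return max_size
```

For exact moments of r poles the ratio collapses at m = r + 1, where Hm becomes rank deficient; with quadrature-perturbed moments the ratio instead levels off near the size of the perturbation, consistent with the trend seen in Table 3.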

Table 3. Maximum error of η̂i for 1 ≤ i ≤ 5 for various m′ in Example 2. Parameters are set as N = 32 and L = 5.

m′   max |ηi − η̂i|   σm′/σ1         (σm′/σ1) × max(τi)
 5   3.5 × 10^−3     1.3 × 10^−2    1.3 × 10^−1
 6   4.8 × 10^−4     9.7 × 10^−4    5.2 × 10^−2
 7   6.9 × 10^−7     1.1 × 10^−5    4.1 × 10^−4
 8   9.9 × 10^−9     9.5 × 10^−8    1.7 × 10^−6
 9   5.5 × 10^−9     3.2 × 10^−9    1.4 × 10^−8
10   3.8 × 10^−11    3.3 × 10^−10   1.7 × 10^−9
11   3.0 × 10^−12    7.2 × 10^−12   4.1 × 10^−11
12   2.4 × 10^−13    1.4 × 10^−13   3.6 × 10^−12
13   2.0 × 10^−13    9.0 × 10^−15   1.4 × 10^−13
14   2.2 × 10^−13    5.0 × 10^−15   1.4 × 10^−13

5. Conclusions

Perturbation results for the eigenvalues of a matrix pencil of Hankel matrices associated with complex moments have been given. We extended these results to the case where the matrices have a block Hankel structure.

From these results, we ascertain that the use of Hankel matrices of an appropriate size reduces the influence of the quadrature error on eigenvalues that lie inside a given integral path. In this case the Hankel matrices are ill-conditioned; however, an element-wise error analysis shows that the target eigenvalues can still be obtained accurately. The singular values of the Hankel matrix provide useful information on the quadrature errors, from which an appropriate size of the Hankel matrix can be estimated.

Numerical examples are consistent with the theoretical results. More detailed error estimates and applications to practical problems are subjects for future study.

Acknowledgments

This research was supported in part by a Grant-in-Aid for Scientific Research from the Ministry of Education, Culture, Sports, Science and Technology, Japan, Grant numbers 21246018, 21105502 and 19300001.

References

[1] P. Kravanja, T. Sakurai and M. Van Barel, On locating clusters of zeros of analytic functions, BIT, 39 (1999), 646–682.
[2] P. Kravanja, T. Sakurai, H. Sugiura and M. Van Barel, A perturbation result for generalized eigenvalue problems and its application to error estimation in a quadrature method for computing zeros of analytic functions, J. Comput. Appl. Math., 161 (2003), 339–347.
[3] T. Sakurai, P. Kravanja, H. Sugiura and M. Van Barel, An error analysis of two related quadrature methods for computing zeros of analytic functions, J. Comput. Appl. Math., 152 (2003), 467–480.
[4] J. Asakura, T. Sakurai, H. Tadano, T. Ikegami and K. Kimura, A numerical method for polynomial eigenvalue problems using contour integral, Japan J. Indust. Appl. Math., to appear.
[5] J. Asakura, T. Sakurai, H. Tadano, T. Ikegami and K. Kimura, A numerical method for nonlinear eigenvalue problems using contour integral, JSIAM Letters, 1 (2009), 52–55.
[6] T. Ikegami, T. Sakurai and U. Nagashima, A filter diagonalization for generalized eigenvalue problems based on the Sakurai-Sugiura projection method, J. Comput. Appl. Math., to appear.
[7] T. Sakurai and H. Sugiura, A projection method for generalized eigenvalue problems using numerical integration, J. Comput. Appl. Math., 159 (2003), 119–128.
[8] T. Sakurai and H. Tadano, CIRR: a Rayleigh-Ritz type method with contour integral for generalized eigenvalue problems, Hokkaido Math. J., 36 (2007), 745–757.
[9] E. E. Tyrtyshnikov, How bad are Hankel matrices?, Numer. Math., 67 (1994), 261–269.



JSIAM Letters Vol.1 (2009)

ISBN : 978-4-9905076-0-2

ISSN : 1883-0617

©2009 The Japan Society for Industrial and Applied Mathematics

Publisher :

The Japan Society for Industrial and Applied Mathematics

4F, Nihon Gakkai Center Building

2-4-16, Yayoi, Bunkyo-ku, Tokyo, 113-0032 Japan

tel. +81-3-5684-8649 / fax. +81-3-5684-8663
