The Japan Society for Industrial and Applied Mathematics Vol.6 (2014) pp.1-92


Editorial Board

Chief Editor Hideyuki Azegami (Nagoya University)

Vice-Chief Editor Yoshimasa Nakamura (Kyoto University)

Tetsuya Sakurai (University of Tsukuba)

Secretary Editors Reiji Suda (University of Tokyo)

Ken'ichiro Tanaka (Future University Hakodate)

Kenji Shirota (Aichi Prefectural University)

Tomohiro Sogabe (Aichi Prefectural University)

Akira Imakura (University of Tsukuba)

Associate Editors Kazuo Kishimoto (Tsukuba University)

Satoshi Tsujimoto (Kyoto University)

Masashi Iwasaki (Kyoto Prefectural University)

Norikazu Saito (University of Tokyo)

Koh-ichi Nagao (Kanto Gakuin University)

Koichi Kato (Japan Institute for Pacific Studies)

Nagai Atsushi (Nihon University)

Takeshi Mandai (Osaka Electro-Communication University)

Kiyoshi Mizohata (Doshisha University)

Tamotu Kinoshita (University of Tsukuba)

Yuzuru Sato (Hokkaido University)

Ken Umeno (Kyoto University)

Kazuyuki Yoshimura (NTT Communication Science Laboratories)

Katsuhiro Nishinari (University of Tokyo)

Tetsu Yajima (Utsunomiya University)

Narimasa Sasa (Japan Atomic Energy Agency)

Fumiko Sugiyama (Kyoto University)

Jun Mitani (University of Tsukuba)

Hitoshi Imai (University of Tokushima)

Takuya Tsuchiya (Ehime University)

Daisuke Furihata (Osaka University)

Takayasu Matsuo (Tokyo University)

Hiroto Tadano (University of Tsukuba)

Takafumi Miyata (Nagoya University)

Ken Hayami (National Institute of Informatics)

Kensuke Aishima (University of Tokyo)

Yoshitaka Watanabe (Kyushu University)

Katsuhisa Ozaki (Shibaura Institute of Technology)

Naoya Yamanaka (Waseda University)

Takaaki Nara (University of Electro-Communications)

Takashi Suzuki (Osaka University)

Tetsuo Ichimori (Osaka Institute of Technology)

Tatsuo Oyama (National Graduate Institute for Policy Studies)

Eiji Katamine (Gifu National College of Technology)

Junichi Matsumoto (National Institute of Advanced Industrial Science and Technology)

Mitsuharu Yamamoto (Chiba University)

Maki Yoshida (Osaka University)


Hideki Sakurada (NTT Communication Science Laboratories)

Naoyuki Ishimura (Hitotsubashi University)

Jiro Akahori (Ritsumeikan University)

Kiyomasa Narita (Kanagawa University)

Ken Nakamula (Tokyo Metropolitan University)

Toru Komatsu (Tokyo University of Science)

Kazuto Matsuo (Kanagawa University)

Hiroshi Kawaharada (Chuo University)

Ichiro Kataoka (Hitachi)

Naoshi Nishimura (Kyoto University)

Hiromichi Itou (Tokyo University of Science)

Shuji Kijima (Kyushu University)

Akiyoshi Shioura (Tohoku University)

Takeshi Ogita (Tokyo Woman's Christian University)

Maho Nakata (Riken)

Takaharu Yaguchi (Kobe University)


Contents

Primality testing of Woodall numbers ・・・ 1-4

Kazuki Azami, Shigenori Uchiyama

The elliptic curve Diffie-Hellman problem and an equivalent hard problem for elliptic divisibility sequences ・・・ 5-7

Junichi Yarimizu, Yukihiro Uchida, Shigenori Uchiyama

Explicit error bound for the tanh rule and the DE formula for integrals with logarithmic singularity ・・・ 9-11

Tomoaki Okayama

Efficient system parameters for Identity-Based Encryption using supersingular elliptic curves ・・・ 13-16

Takumi Tomita, Tsuyoshi Takagi

Affine term structure as multi-soliton ・・・ 17-20

Hidemi Aihara, Jirô Akahori, Edouard Grenier

Hierarchical graph Laplacian eigen transforms ・・・ 21-24

Jeff Irion, Naoki Saito

Strong $L^p$ convergence associated with Rellich-type discrete compactness for discontinuous Galerkin FEM ・・・ 25-28

Fumio Kikuchi, Daisuke Koyama

Shape derivative of cost function for singular point: Evaluation by the generalized J integral ・・・ 29-32

Hideyuki Azegami, Kohji Ohtsuka, Masato Kimura

On ramifications of Artin-Schreier extensions of surfaces over algebraically closed fields of positive characteristic I ・・・ 33-36

Masao Oi

Scalar multiplication for twisted Edwards curves using the extended double-base number system ・・・ 37-40

Yasunori Mineo, Shigenori Uchiyama

Some examples of multidimensional Shintani zeta distributions ・・・ 41-44

Takahiro Aoyama, Kazuhiro Yoshikawa

Some results of multidimensional discrete probability measures represented by Euler products ・・・ 45-48

Takahiro Aoyama, Nobutaka Shimizu

Credit risk valuation model for real estate non-recourse loan ・・・ 49-52

Suguru Yamanaka, Masaaki Otaka

An experiment of number field sieve for discrete logarithm problem over GF($p^n$) ・・・ 53-56

Kenichiro Hayasaka, Kazumaro Aoki, Tetsutaro Kobayashi, Tsuyoshi Takagi

Convergence analysis of the parallel classical block Jacobi method for the symmetric eigenvalue problem ・・・ 57-60

Yusaku Yamamoto, Zhang Lan, Shuhei Kudo

Improvement of the accuracy of the approximate solution of the Block BiCR method ・・・ 61-64

Hiroto Tadano, Youichi Ishikawa, Akira Imakura


Development of the Block BiCGSTAB($\ell$) method for solving linear systems with multiple right hand sides ・・・ 65-68

Shusaku Saito, Hiroto Tadano, Akira Imakura

Computing fixed argument pairings with the elliptic net algorithm ・・・ 69-72

Yang Liu, Naoki Kanayama, Kazutaka Saito, Tadanori Teruya, Shigenori Uchiyama, Eiji Okamoto

Heuristic counting of Kachisa-Schaefer-Scott curves ・・・ 73-76

Yutaro Kiyomura, Noriyasu Iwamoto, Shun'ichi Yokoyama, Kenichiro Hayasaka, Yuntao Wang, Takanori Yasuda, Katsuyuki Takashima, Tsuyoshi Takagi

Some results on Parisian walks ・・・ 77-80

Jirô Akahori, Yuuki Ida

Accelerated multiple precision matrix multiplication using Strassen's algorithm and Winograd's variant ・・・ 81-84

Tomonori Kouya

A key exchange protocol based on Diophantine equations and S-integers ・・・ 85-88

Attila Bérczes, Lajos Hajdu, Noriko Hirata-Kohno, Tünde Kovács, Attila Pethő

Shape optimization of a rubber bushing ・・・ 89-92

Kouhei Shintani, Hideyuki Azegami


JSIAM Letters Vol.6 (2014) pp.1–4 ©2014 Japan Society for Industrial and Applied Mathematics

Primality testing of Woodall numbers

Kazuki Azami1 and Shigenori Uchiyama1

1 Tokyo Metropolitan University, Tokyo 192-0397, Japan

E-mail azami-kazuki ed.tmu.ac.jp

Received July 13, 2013, Accepted October 8, 2013

Abstract

In 1969, Riesel proposed a primality test for natural numbers of the form $h \cdot 2^n - 1$, which includes the case $h = n$, the Woodall numbers $W_n = n \cdot 2^n - 1$. In this paper, we utilize Riesel's primality test for Woodall numbers, and propose a primality testing algorithm for Woodall numbers. The implementation of the algorithm and its optimization are discussed.

Keywords Woodall number, Lucasian criteria, deterministic primality test

Research Activity Group Algorithmic Number Theory and Its Applications

1. Introduction

In 1917, Woodall and Cunningham defined the Woodall numbers as $W_n = n \cdot 2^n - 1$ [1]. To date, only 33 prime Woodall numbers ("Woodall primes") have been discovered, the largest of which is $W_n$ for $n = 3{,}752{,}948$ [2]. However, it is conjectured that there are infinitely many Woodall primes.

In 1969, Riesel considered a primality test for natural numbers of the form $h \cdot 2^n - 1$ [3]. Riesel's criteria for the primality of those numbers in [3] are based on Fermat's little theorem in real quadratic fields. In that paper, no algorithms based on Riesel's criteria are explicitly shown.

In this paper, we propose a primality testing algorithm for Woodall numbers based on Riesel's criteria, and consider how to speed it up. We also give a characterization of the discriminants of real quadratic fields which are effective for the proposed algorithm.

In Section 2, we review the Lucasian criteria for the primality of natural numbers $h \cdot 2^n - 1$ [3], and the associated primality test. In Section 3, using the Chinese Remainder Theorem (CRT), we introduce our specialized version of Riesel's primality test, and explain its use in the case of the Woodall numbers. In Section 4, we explain our proposed algorithm for primality testing of Woodall numbers, and discuss a speeding-up of the algorithm. Our conclusions are presented in Section 5.

2. Lucasian criteria for the primality of $N = h \cdot 2^n - 1$

In this section, we briefly review the Lucasian criteria for $N := h \cdot 2^n - 1$, where $h, n \in \mathbb{N}$ are such that $n \ge 2$, $h$ is odd, and $h < 2^n$. In what follows, let $D \in \mathbb{N}$ be square-free with $D > 1$, let $a, b$ be rational integers, and define $r := |a^2 - b^2 D|$ and $\alpha := (a + b\sqrt{D})^2 / r$.

Additionally, define the sequence $(u_\nu)$ by:

$$u_0 := \alpha^h + \alpha^{-h}, \qquad u_\nu := u_{\nu-1}^2 - 2.$$

In [3], Riesel gave the following theorem:

Theorem 1 (Lucasian criteria for $N = h \cdot 2^n - 1$) With the above notation, if $(D/N) = -1$ and $(r/N) \cdot (a^2 - b^2 D)/r = -1$, then

$$u_{n-2} \equiv 0 \pmod{N} \iff N \in \mathbb{P},$$

where $\mathbb{P}$ denotes the set of prime numbers, and $(a/m)$ is the Jacobi symbol.

Remark 2 For $D$ as above and a fundamental unit $\epsilon$ of $\mathbb{Q}(\sqrt{D})$, the simplest choice of $\alpha$ is $\epsilon$ or $\epsilon^2$. (This is explained in detail in [3].) Moreover, by the method of finding fundamental units for $\mathbb{Q}(\sqrt{D})$ (Section 5.7 of [4]), this determines $a$, $b$, and $r$ uniquely. Moreover, if $r$ is square and $(r/N) \cdot (a^2 - b^2 D)/r = -1$, then it suffices to consider only $(D/N) = -1$.

For example, when $D = 5$, a fundamental unit of $\mathbb{Q}(\sqrt{5})$ is $\epsilon = (1 + \sqrt{5})/2$. In this case, $\epsilon$ does not have a representation of the form $\epsilon = (a + b\sqrt{D})^2/r$, so $\alpha = \epsilon^2 = (1 + \sqrt{5})^2/4 = (3 + \sqrt{5})/2$, and

$$u_0 = \alpha^h + \alpha^{-h} = \left[(3 + \sqrt{5})/2\right]^h + \left[(3 - \sqrt{5})/2\right]^h.$$

The conditions $(D/N) = -1$ and $(r/N) \cdot (a^2 - b^2 D)/r = -1$ then restrict $(h, n)$ to the following pairs:

$h \equiv 1 \pmod 5$, $n \equiv 2, 3 \pmod 4$,
$h \equiv 2 \pmod 5$, $n \equiv 1, 2 \pmod 4$,
$h \equiv 3 \pmod 5$, $n \equiv 0, 3 \pmod 4$,
$h \equiv 4 \pmod 5$, $n \equiv 0, 1 \pmod 4$.
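The list of admissible pairs for $D = 5$ can be reproduced numerically. The following is a minimal sketch rather than anything taken from [3]: the helper names `jacobi` and `admissible_pairs` are hypothetical, and one odd representative $h$ and one exponent $n \ge 2$ are assumed to stand for each residue class. For $D = 5$ we have $a = b = 1$ and $r = 4$, so $r$ is square and only $(D/N) = -1$ needs to be checked.

```python
def jacobi(a, n):
    """Jacobi symbol (a/n) for a positive odd integer n."""
    if n <= 0 or n % 2 == 0:
        raise ValueError("n must be a positive odd integer")
    a %= n
    result = 1
    while a:
        while a % 2 == 0:            # remove factors of 2 using the value of (2/n)
            a //= 2
            if n % 8 in (3, 5):
                result = -result
        a, n = n, a                  # quadratic reciprocity for odd a, n
        if a % 4 == 3 and n % 4 == 3:
            result = -result
        a %= n
    return result if n == 1 else 0


def admissible_pairs(D=5, h_mod=5, n_mod=4):
    """Map h mod 5 to the list of n mod 4 with (D / (h*2^n - 1)) = -1."""
    pairs = {}
    for h_res in range(1, h_mod):
        # assumed representative: an odd h in the residue class
        h = h_res if h_res % 2 == 1 else h_res + h_mod
        for n_res in range(n_mod):
            n = n_res + n_mod        # assumed representative with n >= 2
            N = h * 2**n - 1
            if jacobi(D, N) == -1:
                pairs.setdefault(h_res, []).append(n_res)
    return pairs


print(admissible_pairs())
# {1: [2, 3], 2: [1, 2], 3: [0, 3], 4: [0, 1]}
```

The printed dictionary agrees with the four congruence conditions listed above; the class $h \equiv 0 \pmod 5$ is omitted because it never gives $(5/N) = -1$.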

3. Primality testing of the Woodall numbers

In this section, we apply the above Lucasian criteria to the Woodall numbers, so $h = n$ in the above, which naturally leads us to apply the CRT to the list of possible $(h, n)$ given by Theorem 1. These then give conditions on $n$ for which primality testing is applicable.

In the prior example, this would lead to:

$$h = n \equiv 2, 3, 4, 6, 8, 9, 11, 17 \pmod{20}.$$

For such $n$ and $u_0 = \alpha^n + \alpha^{-n} = [(3 + \sqrt{5})/2]^n + [(3 - \sqrt{5})/2]^n$, we can then conduct primality testing of $W_n = n \cdot 2^n - 1$ as follows: letting $v_k = \alpha^k + \alpha^{-k}$ for any $k \in \mathbb{Z}$, we have $v_0 = 2$, $v_1 = 3$, and $v_k = v_1 \cdot v_{k-1} - v_{k-2}$ (see [3]), which means that $v_n = u_0$.

As we saw above, when $D = 5$, we can do primality testing for 8 out of 20 possible values of $n \pmod{20}$. If we redo the calculation for $D = 13$, we find 72 out of 156 possible values of $n \pmod{156}$. To get some control on this range of possible values for $n$, we shall consider the following special cases of $D$:
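The residue classes of $n$ quoted above can be checked by direct enumeration. The sketch below is an illustration under stated assumptions, not the authors' procedure: it handles only primes $p \equiv 1 \pmod 4$, for which quadratic reciprocity gives $(p/N) = (N/p)$, and it evaluates $(N/p)$ with Euler's criterion. The function names are hypothetical.

```python
def is_nonresidue(x, p):
    """True if x is a quadratic non-residue modulo the odd prime p (Euler's criterion)."""
    return pow(x % p, (p - 1) // 2, p) == p - 1


def admissible_residues(p):
    """Residues n mod p(p-1) with (p / (n*2^n - 1)) = -1, assuming p ≡ 1 (mod 4)."""
    if p % 4 != 1:
        raise ValueError("this sketch only covers primes p ≡ 1 (mod 4)")
    out = []
    for n in range(p * (p - 1)):
        # n*2^n mod p depends only on n mod p and n mod (p-1), by Fermat's little theorem
        w = (n % p) * pow(2, n % (p - 1), p) - 1
        if is_nonresidue(w, p):
            out.append(n)
    return out


print(admissible_residues(5))        # [2, 3, 4, 6, 8, 9, 11, 17]
print(len(admissible_residues(13)))  # 72
```

For $D = 5$ this reproduces the eight classes modulo 20, and for $D = 13$ it gives 72 classes modulo 156, matching the counts stated in the text.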

Definition 3 In Theorem 1, if $r = |a^2 - b^2 D|$ is square, $(a^2 - b^2 D)/r = -1$, and $D \in \mathbb{P}$ or $2\mathbb{P}$, then we call this $D$ good.

The reason for considering good $D$, as in Def. 3, is explained by the following two theorems:

Theorem 4 If $D$ is good (in the sense of Def. 3), and $D = p$ or $2p$ for $p \in \mathbb{P}$, then the number of $(h, n) \in (\mathbb{Z}/p\mathbb{Z}) \times (\mathbb{Z}/(p-1)\mathbb{Z})$ satisfying $[D/(h \cdot 2^n - 1)] = -1$ and $n \ge 3$ is $(p-1)^2/2$.

Proof Let $g \in \mathbb{Z}/p\mathbb{Z}$ be such that $(\mathbb{Z}/p\mathbb{Z})^\times = \langle g \rangle$. To consider the condition $[D/(h \cdot 2^n - 1)] = -1$, it suffices to consider $D = p$; for, if $D = 2p$, then $[2p/(h \cdot 2^n - 1)] = [2/(h \cdot 2^n - 1)][p/(h \cdot 2^n - 1)] = 1 \cdot [p/(h \cdot 2^n - 1)]$. We then prove the theorem by cases:

1) If $p \equiv 1 \pmod 4$, then $[p/(h \cdot 2^n - 1)] = [(h \cdot 2^n - 1)/p]$ by the Quadratic Reciprocity Theorem, so if $[D/(h \cdot 2^n - 1)] = -1$, then $[(h \cdot 2^n - 1)/p] = -1$, and thus, to find conditions on $(h, n)$, we only need to consider $(h, n)$ such that:

$$h \cdot 2^n \in \{\, g^k + 1 \bmod p \mid k \in 2\mathbb{Z} + 1 \,\}.$$

By inspection, $h \cdot 2^n \bmod p$ attains each value of $\mathbb{Z}/p\mathbb{Z}$ exactly $(p-1)$ times as $(h, n)$ ranges over $0 \le h \le p-1$, $0 \le n \le p-2$. Moreover, it is also clear that:

$$\#\{\, g^k + 1 \bmod p \mid k \in 2\mathbb{Z} + 1 \,\} = \frac{p-1}{2}.$$

As such, the number of $(h, n) \in (\mathbb{Z}/p\mathbb{Z}) \times (\mathbb{Z}/(p-1)\mathbb{Z})$ satisfying the condition $h \cdot 2^n \in \{\, g^k + 1 \bmod p \mid k \in 2\mathbb{Z} + 1 \,\}$ must be $(p-1) \cdot (p-1)/2$.

2) If $p \equiv 3 \pmod 4$, then $[p/(h \cdot 2^n - 1)] = -[(h \cdot 2^n - 1)/p]$ by the Quadratic Reciprocity Theorem, so

$$-1 = \left(\frac{p}{h \cdot 2^n - 1}\right) \iff \left(\frac{h \cdot 2^n - 1}{p}\right) = 1 \iff h \cdot 2^n \in \{\, g^k + 1 \bmod p \mid k \in 2\mathbb{Z} \,\},$$

so the number of such $(h, n)$ is the same as in 1). (QED)

Corollary 5 Let $D$ be good (in the sense of Def. 3), and $D = p$ or $D = 2p$ for $p \in \mathbb{P}$. Then the number of $n \in \mathbb{Z}/p(p-1)\mathbb{Z}$ satisfying $[D/(n \cdot 2^n - 1)] = -1$ is $(p-1)^2/2$.

Proof Apply the CRT to $(h, n)$ in Theorem 4. (QED)
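The count $(p-1)^2/2$ in Theorem 4 can be sanity-checked by brute force. The following sketch is only an illustration: `legendre` and `count_pairs` are hypothetical names, and the test primes are chosen arbitrarily from both classes modulo 4, since the counting argument itself depends only on $p$. The reciprocity step assumes $N = h \cdot 2^n - 1 \equiv 3 \pmod 4$, as in the proof.

```python
def legendre(x, p):
    """Legendre symbol (x/p) for an odd prime p, via Euler's criterion."""
    s = pow(x % p, (p - 1) // 2, p)   # s is 0, 1, or p-1
    return -1 if s == p - 1 else s


def count_pairs(p):
    """Count (h, n) in Z/pZ x Z/(p-1)Z with (p / (h*2^n - 1)) = -1."""
    # Case 1 of the proof: p ≡ 1 (mod 4) requires (N/p) = -1.
    # Case 2 of the proof: p ≡ 3 (mod 4) requires (N/p) = +1.
    target = -1 if p % 4 == 1 else 1
    return sum(
        1
        for h in range(p)
        for n in range(p - 1)
        if legendre(h * pow(2, n, p) - 1, p) == target
    )


for p in (5, 7, 13, 17, 29):
    print(p, count_pairs(p), (p - 1) ** 2 // 2)
```

Each printed row shows the brute-force count next to $(p-1)^2/2$; the two columns should coincide.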

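Theorem 6 Let $D \in \mathbb{P}$ or $2\mathbb{P}$.

• If $D = k^2 + 1$, $k \in \mathbb{Z}_{>2}$, then $D$ is good.

• If $D = k^2 + 4$, $k \in 2\mathbb{Z} + 1$, then $D$ is good.

Proof By Chapter 3 in [5] and Section 5.7 in [4], the fundamental unit $\epsilon$ of $\mathbb{Q}(\sqrt{D})$ is given as

$$\epsilon = \begin{cases} k + \sqrt{D}, & D = k^2 + 1,\ k \in \mathbb{Z}_{>2}, \\ (k + \sqrt{D})/2, & D = k^2 + 4,\ k \in 2\mathbb{Z} + 1. \end{cases}$$

1) If $D = k^2 + 1$ and $k \in \mathbb{Z}_{>2}$, then $\epsilon^2 = (k + \sqrt{D})^2$. This means $a = k$, $b = 1$, $r = |a^2 - b^2 D| = 1$ in Theorem 1. Thus $r$ is square and $(a^2 - b^2 D)/r = [k^2 - (k^2 + 1)]/1 = -1$.

2) If $D = k^2 + 4$ and $k \in 2\mathbb{Z} + 1$, then $\epsilon^2 = (k + \sqrt{D})^2/4$. This means $a = k$, $b = 1$, $r = |a^2 - b^2 D| = 4$ in Theorem 1. Thus $r$ is square and $(a^2 - b^2 D)/r = [k^2 - (k^2 + 4)]/4 = -1$.

(QED)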
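The two families in Theorem 6 are easy to enumerate. The sketch below lists every $D < 1000$ of the form $k^2 + 1$ ($k > 2$) or $k^2 + 4$ ($k$ odd) that is a prime or twice a prime, together with $(a, b, r) = (k, 1, 1)$ or $(k, 1, 4)$ as in the proof. The function names and the trial-division primality check are assumptions made for illustration only.

```python
def is_prime(m):
    """Trial division; adequate for the small bounds used here."""
    if m < 2:
        return False
    d = 2
    while d * d <= m:
        if m % d == 0:
            return False
        d += 1
    return True


def prime_or_twice_prime(D):
    return is_prime(D) or (D % 2 == 0 and is_prime(D // 2))


def good_from_theorem6(bound):
    """Rows (D, a, b, r) for good D < bound covered by Theorem 6."""
    rows = []
    k = 1
    while k * k + 1 < bound:
        if k > 2 and prime_or_twice_prime(k * k + 1):
            rows.append((k * k + 1, k, 1, 1))    # D = k^2 + 1, r = 1
        if k % 2 == 1 and k * k + 4 < bound and prime_or_twice_prime(k * k + 4):
            rows.append((k * k + 4, k, 1, 4))    # D = k^2 + 4, r = 4
        k += 1
    return sorted(rows)


for D, a, b, r in good_from_theorem6(1000):
    assert a * a - b * b * D == -r               # (a^2 - b^2 D)/r = -1
    print(D, a, b, r)
```

With the bound 1000, the output has 24 rows, from $D = 5$ up to $D = 842$.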

Remark 7 The converse of Theorem 6 is not true; see Table 2 for some counterexamples. Here, $(a^2 - b^2 D)/r = -1 \iff (a^2 + r)/b^2 = D$, so taking $b = 1$, $r = l^2$ ($l \in \mathbb{N}$) for simplicity, in Table 2, we see that $\exists a, b \in \mathbb{Z}$ such that $D = k^2 + l^2$ is good. The reason that we cannot determine whether $D = k^2 + l^2$ is good or not for all $k, l \in \mathbb{Z}$ is related to the problem of solving the Pell equation:

$$a^2 - b^2 D = -1, \quad D \equiv 2, 3 \pmod 4,$$
$$a^2 - b^2 D = -4, \quad D \equiv 1 \pmod 4;$$

see [4] for details.

Accordingly, to get conditions on $n$ for $D$, we shall consider $D \in \mathbb{P}$ or $2\mathbb{P}$ of the form $k^2 + 1$ or $k^2 + 4$, and calculate $n$ satisfying $[D/(n \cdot 2^n - 1)] = -1$. Table 1 shows all values of $a, b, r$ for good $D = k^2 + 1, k^2 + 4 < 10000$. Since Corollary 5 tells us that, for each $D$, the number of $n \pmod{p(p-1)}$ for which primality testing can be applied is $(p-1)^2/2$, we may speak of the "probability" of $n$ being applicable to primality testing, by the ratio:

$$\frac{\#\left\{\, n \bmod p(p-1) \;\middle|\; \left(\frac{D}{n \cdot 2^n - 1}\right) = -1 \,\right\}}{\#(\mathbb{Z}/p(p-1)\mathbb{Z})} = \frac{\frac{(p-1)}{2}(p-1)}{p(p-1)} = \frac{p-1}{2p}.$$

Thus, for all $D$ listed in Table 1, the probability of $n$ is:

$$1 - \prod_{D \in \mathbb{P} \cap \text{Table 1}} \left(1 - \frac{p-1}{2p}\right) \sim 0.999\,999\,90.$$

4. Algorithm

Let us recall two compositeness criteria for Woodall numbers.

Theorem 8 ([6]) Let $h_p$ be the smallest positive integer $h$ such that $2^h \equiv 1 \pmod p$, and define $n_k := -(2^k + k)(p - 1) - k$ for $k \ge 0$, and $n'_k := n_k \bmod p h_p$ for $k = 0, 1, \ldots, h_p - 1$, with $0 < n'_k < p h_p$. Then $W_{n'_k + r p h_p} \equiv 0 \pmod p$ for all $r \ge 0$.

Theorem 9 ([6]) Let $p \in \mathbb{P}$.

• $\left(\frac{2}{p}\right) = +1 \Rightarrow W_{(p+1)/2} \equiv 0 \pmod p$.

• $\left(\frac{2}{p}\right) = -1 \Rightarrow W_{(3p-1)/2} \equiv 0 \pmod p$.

In short, we can find out if $W_n$ is composite without using a primality test, for certain $n$.
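Both criteria can be checked numerically for small odd primes. The following is a minimal sketch with hypothetical helper names; it assumes $p$ is an odd prime and uses the standard fact that $(2/p) = +1$ exactly when $p \equiv \pm 1 \pmod 8$.

```python
def woodall_mod(n, p):
    """W_n = n*2^n - 1 reduced modulo p."""
    return (n * pow(2, n, p) - 1) % p


def order_of_two(p):
    """Smallest h > 0 with 2^h ≡ 1 (mod p), for an odd prime p."""
    h, x = 1, 2 % p
    while x != 1:
        x = (2 * x) % p
        h += 1
    return h


def theorem9_index(p):
    """Index n from Theorem 9 for which p divides W_n."""
    return (p + 1) // 2 if p % 8 in (1, 7) else (3 * p - 1) // 2


def theorem8_indices(p, r_max=2):
    """Indices n'_k + r*p*h_p from Theorem 8 for 0 <= r <= r_max."""
    hp = order_of_two(p)
    out = []
    for k in range(hp):
        nk = -(2**k + k) * (p - 1) - k
        nk_red = nk % (p * hp)       # Python's % returns a value in [0, p*hp)
        out.extend(nk_red + r * p * hp for r in range(r_max + 1))
    return out


for p in (3, 5, 7, 11, 13, 17, 19, 23):
    assert woodall_mod(theorem9_index(p), p) == 0
    assert all(woodall_mod(n, p) == 0 for n in theorem8_indices(p))
print("all checks passed")
```

In a sieve, the indices produced this way would be discarded before the Lucasian test is run; the bound `r_max` is an arbitrary choice for the demonstration.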


4.1 Algorithm

Our algorithm for the primality of Woodall numbers for $D = 5$ is as follows:

Algorithm Primality testing of the Woodall numbers

Input: Natural number n
Output: “PRIME” or “COMPOSITE” or “FAILURE”
Step 1
  if n ≡ 2, 3, 4, 6, 8, 9, 11, 17 (mod 20) then
    go to Step 2
  else
    OUTPUT “FAILURE”
  end if
Step 2
  v_0 ← 2
  v_1 ← 3
  for i = 2 → n do
    v_i ← 3 v_{i−1} − v_{i−2} mod W_n
  end for
  u_0 ← v_n
Step 3
  for i = 1 → n − 2 do
    u_i ← u_{i−1}^2 − 2 mod W_n
  end for
  if u_{n−2} ≡ 0 (mod W_n) then
    OUTPUT “PRIME”
  else
    OUTPUT “COMPOSITE”
  end if
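The three steps above can be sketched in Python as follows. This is a direct, unoptimized transcription for $D = 5$ under the assumption $n \ge 2$; the function name `woodall_test_d5` is hypothetical, and the sketch uses Python's built-in big integers rather than the NTL library mentioned later in the text.

```python
def woodall_test_d5(n):
    """Return 'PRIME', 'COMPOSITE' or 'FAILURE' for W_n = n*2^n - 1 (n >= 2)."""
    if n < 2:
        raise ValueError("n must be at least 2")
    # Step 1: the D = 5 criterion only applies to these residues of n mod 20.
    if n % 20 not in (2, 3, 4, 6, 8, 9, 11, 17):
        return "FAILURE"
    w = n * (1 << n) - 1
    # Step 2: v_k = 3*v_{k-1} - v_{k-2} (mod W_n), so that u_0 = v_n.
    v_prev, v = 2, 3
    for _ in range(2, n + 1):
        v_prev, v = v, (3 * v - v_prev) % w
    u = v
    # Step 3: u_i = u_{i-1}^2 - 2 (mod W_n) for i = 1, ..., n-2.
    for _ in range(n - 2):
        u = (u * u - 2) % w
    return "PRIME" if u == 0 else "COMPOSITE"


for n in (2, 3, 4, 5, 6, 8):
    print(n, n * 2**n - 1, woodall_test_d5(n))
```

For example, $W_2 = 7$, $W_3 = 23$ and $W_6 = 383$ are reported as prime, $W_4 = 63$ and $W_8 = 2047$ as composite, and $n = 5$ gives "FAILURE" because $5 \bmod 20$ is not an admissible residue for $D = 5$.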

As an example, we show the case $D = 5$. However, before implementing the algorithm, we should make a table of conditions on $n$ with respect to $D$ using the argument seen in the proof of Theorem 4 (Table 3). Additionally, we should sieve out composite Woodall numbers using Theorems 8 and 9.

In Step 1, we evaluate the conditions on $n$.

In Step 2, for general $D$, recall that $\alpha = (a + b\sqrt{D})^2/r$, so $v_1 \leftarrow \alpha + \alpha^{-1}$.

Remark 10 If the output is “FAILURE”, we can try different values of $D$ from Table 1. We may also add more values to Table 1 easily, and thereby increase the “probability”.

4.2 Speeding-up

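In Step 3, a trivial implementation takes a lot of time, so we describe a speeding-up of Step 3. Here, we must compute the sequence $u_{i+1} = u_i^2 - 2 \pmod{W_n}$ repeatedly, and the complexity of one round of computation is $2(\lfloor \lg n \rfloor + n)^2$. Since $n \cdot 2^n = W_n + 1 \equiv 1 \pmod{W_n}$, our speeding-up is as follows:

$$\begin{aligned} u_{i+1} &= u_i^2 - 2 \\ &= A \cdot 2^n + B \\ &= (n \cdot A_0 + A_1) \cdot 2^n + B \\ &\equiv A_1 \cdot 2^n + A_0 + B \pmod{W_n} \quad (< 2W_n). \end{aligned}$$

Thus, all that remains is to compute $K = A_1 \cdot 2^n + A_0 + B$. If $K \ge W_n$, then $K \leftarrow K - W_n$.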
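The reduction above can be sketched with shifts and one small division. This is an illustrative sketch only: the function name is hypothetical, the case $u_i^2 - 2 < 0$ (that is, $u_i \in \{0, 1\}$) is handled by adding $W_n$ first, and the final correction uses a loop with `>=` so that the result always lies in $[0, W_n)$.

```python
def woodall_square_step(u, n):
    """Compute (u*u - 2) mod W_n, W_n = n*2^n - 1, using n*2^n ≡ 1 (mod W_n)."""
    w = n * (1 << n) - 1
    x = u * u - 2
    if x < 0:                     # only happens for u = 0 or u = 1
        x += w
    a, b = x >> n, x & ((1 << n) - 1)   # x = A*2^n + B with 0 <= B < 2^n
    a0, a1 = divmod(a, n)               # A = n*A0 + A1 with 0 <= A1 < n
    k = (a1 << n) + a0 + b              # K = A1*2^n + A0 + B ≡ x (mod W_n)
    while k >= w:                       # at most a couple of subtractions
        k -= w
    return k


# Compare against the straightforward reduction for small parameters.
for n in range(2, 11):
    w = n * (1 << n) - 1
    for u in list(range(min(w, 300))) + [w - 1]:
        assert woodall_square_step(u, n) == (u * u - 2) % w
print("reduction agrees with the direct computation")
```

The only division left is by $n$, which has about $\lg n$ bits, instead of a division by the $(n + \lg n)$-bit modulus $W_n$.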

Table 1. Values of $a, b, r$ for good $D = k^2 + 1, k^2 + 4 < 1000$.

D = k^2 + 1 or k^2 + 4    D          a     b    r
5 = 1^2 + 4               5          1     1    4
10 = 3^2 + 1              2 · 5      3     1    1
13 = 3^2 + 4              13         3     1    4
17 = 4^2 + 1              17         4     1    1
26 = 5^2 + 1              2 · 13     5     1    1
29 = 5^2 + 4              29         5     1    4
37 = 6^2 + 1              37         6     1    1
53 = 7^2 + 4              53         7     1    4
82 = 9^2 + 1              2 · 41     9     1    1
101 = 10^2 + 1            101        10    1    1
122 = 11^2 + 1            2 · 61     11    1    1
173 = 13^2 + 4            173        13    1    4
197 = 14^2 + 1            197        14    1    1
226 = 15^2 + 1            2 · 113    15    1    1
229 = 15^2 + 4            229        15    1    4
257 = 16^2 + 1            257        16    1    1
293 = 17^2 + 4            293        17    1    4
362 = 19^2 + 1            2 · 181    19    1    1
401 = 20^2 + 1            401        20    1    1
577 = 24^2 + 1            577        24    1    1
626 = 25^2 + 1            2 · 313    25    1    1
677 = 26^2 + 1            677        26    1    1
733 = 27^2 + 4            733        27    1    4
842 = 29^2 + 1            2 · 421    29    1    1

Note that if $D = p$, $D' = 2p$ ($p \in \mathbb{P}$), the conditions on $n$ from $D$ are the same as those from $D'$.

Table 2. Values of $a, b, r$ for good $D = k^2 + l^2$, $b < 10000$, $l = 3, 4, 5$.

D = k^2 + 9               D          a         b       r
10 = 1^2 + 9              2 · 5      3         1       1
13 = 2^2 + 9              13         3         1       4
58 = 7^2 + 9              2 · 29     99        13      1
73 = 8^2 + 9              73         2136      250     4
109 = 10^2 + 9            109        261       25      4
538 = 23^2 + 9            2 · 269    69051     2977    1

D = k^2 + 16              D          a         b       r
17 = 1^2 + 16             17         8         2       4
41 = 5^2 + 16             41         64        10      4
97 = 9^2 + 16             97         11208     1138    4
137 = 11^2 + 16           137        3488      298     4

D = k^2 + 25              D          a         b       r
26 = 1^2 + 25             2 · 13     5         1       1
29 = 2^2 + 25             29         5         1       4
41 = 4^2 + 25             41         64        10      4
61 = 6^2 + 25             61         39        5       4
74 = 7^2 + 25             74         43        5       1
89 = 8^2 + 25             89         1000      106     4
106 = 9^2 + 25            2 · 53     4005      389     1
314 = 17^2 + 25           2 · 157    443       25      1
349 = 18^2 + 25           349        18420     986     4
509 = 22^2 + 25           509        925       41      4
554 = 23^2 + 25           2 · 277    174293    7405    1
701 = 26^2 + 25           701        23564     890     4
1181 = 34^2 + 25          1181       29039     845     4
1546 = 39^2 + 25          2 · 773    43605     1109    1

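Consequently, the complexity of one round of computation becomes $2\lfloor \lg n \rfloor^2 + 3n\lfloor \lg n \rfloor + n^2$, which results in a theoretical speeding-up of 2 times (the ratio of $2(\lfloor \lg n \rfloor + n)^2$ to this quantity tends to 2 as $n$ grows).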

4.3 Implementation

Here, we explain the implementation of the primality testing of Woodall numbers. Table 4 shows the average running time of the algorithm for 50 randomly chosen Woodall numbers $W_n = n \cdot 2^n - 1$, where the bit length of $n$


Table 3. Conditions on n for good D = 5, 13, 17, 29.

D Conditions on n

5 n ≡ 2, 3, 4, 6, 8, 9, 11, 17 (mod 20)

13 n ≡ 2, 4, 6, 7, 8, 9, 10, 11, 12, 15, 16, 18, 22, 27, 28, 30, 31, 33, 37, 38, 40, 41, 44, 48, 49, 53, 55, 57, 58, 60, 66, 67, 69, 71, 72, 73, 76, 77, 83, 84, 86, 87, 92, 97, 99, 101, 106, 107, 114, 120, 121, 122, 123, 124, 125, 127, 128, 129, 131, 133, 134, 137, 139, 140, 141, 142, 146, 147, 150, 152, 154, 155 (mod 156)

17 n ≡ 2, 3, 4, 5, 7, 8, 10, 13, 14, 18, 24, 27, 28, 31, 32,

33, 35, 36, 37, 39, 40, 42, 43, 44, 45, 47, 49, 50, 53, 57,58, 59, 60, 62, 63, 64, 65, 69, 72, 74, 78, 80, 86, 89, 94,96, 97, 105, 107, 108, 111, 117, 118, 121, 122, 123, 124,

126, 127, 131, 132, 133, 134, 135, 138, 139, 140, 141,143, 144, 146, 149, 150, 154, 160, 163, 164, 167, 168,169, 171, 172, 173, 175, 176, 178, 179, 180, 181, 183,185, 186, 189, 193, 194, 195, 196, 198, 199, 200, 201,

205, 208, 210, 214, 216, 222, 225, 230, 232, 233, 241,243, 244, 247, 253, 254, 257, 258, 259, 260, 262, 263,267, 268, 269, 270, 271 (mod 272)

29 n ≡ 5, 8, 10, 14, 15, 17, 21, 23, 24, 27, 28, 30, 31, 32,

33, 40, 42, 44, 46, 47, 50, 53, 54, 55, 56, 57, 60, 61, 64,68, 69, 71, 77, 78, 83, 88, 90, 91, 92, 93, 95, 98, 101,105, 110, 111, 117, 118, 120, 124, 126, 129, 131, 132,133, 134, 135, 137, 138, 143, 144, 149, 150, 152, 154,

155, 157, 159, 161, 162, 164, 166, 167, 169, 170, 175,176, 177, 179, 180, 181, 183, 184, 185, 186, 187, 188,190, 192, 196, 198, 205, 206, 210, 211, 212, 214, 215,

216, 219, 222, 225, 227, 229, 233, 234, 236, 237, 239,241, 244, 245, 246, 249, 250, 252, 253, 258, 262, 265,271, 279, 280, 281, 284, 287, 291, 293, 295, 300, 301,304, 305, 306, 308, 309, 310, 312, 313, 314, 316, 317,

318, 321, 324, 326, 328, 331, 332, 333, 334, 338, 339,342, 343, 344, 346, 347, 350, 352, 353, 354, 361, 362,363, 364, 370, 371, 373, 375, 376, 378, 383, 384, 385,386, 387, 389, 392, 393, 395, 399, 402, 403, 404, 405,

407, 412, 413, 414, 415, 418, 423, 424, 425, 426, 427,430, 431, 437, 440, 441, 443, 448, 449, 451, 452, 454,456, 457, 458, 461, 467, 469, 471, 472, 473, 475, 476,478, 479, 481, 484, 485, 490, 491, 492, 494, 496, 504,

509, 510, 511, 513, 514, 515, 516, 518, 525, 526, 528,529, 531, 533, 534, 537, 538, 541, 544, 545, 548, 549,550, 554, 555, 556, 558, 559, 560, 561, 562, 563, 565,566, 567, 571, 573, 576, 578, 579, 584, 586, 587, 589,

590, 592, 597, 601, 603, 604, 605, 606, 611, 613, 615,617, 619, 622, 624, 629, 633, 635, 636, 637, 640, 641,646, 648, 649, 651, 654, 655, 658, 659, 660, 661, 662,

663, 664, 668, 669, 673, 674, 676, 679, 680, 682, 683,684, 685, 687, 688, 689, 690, 691, 694, 695, 697, 698,699, 700, 703, 704, 705, 707, 708, 709, 711, 714, 715,719, 721, 723, 724, 726, 728, 730, 731, 735, 736, 739,

740, 742, 743, 744, 746, 750, 751, 752, 753, 755, 758,759, 760, 762, 768, 770, 772, 773, 774, 776, 778, 779,780, 785, 786, 787, 788, 789, 790, 791, 792, 793, 794,795, 796, 797, 800, 802, 803, 804, 806 (mod 812)

is from 11 to 18.

In this implementation, the improved version is faster than the trivial version, and for larger $n$, the speeding-up ratio is observed to increase. This is due to the fact that the NTL library, which has rapid multiplication, is employed as the implementation environment.

Table 4. Speeding-up of implementation for $W_n = n \cdot 2^n - 1$.

n [bit]               11       12        13        14
Trivial [s]           0.282    0.836     4.894     36.824
Speeding-up [s]       0.246    0.458     1.598     8.698
Actual speeding-up    1.15     1.824     3.062     4.234

n [bit]               15       16        17        18
Trivial [s]           228.47   1791.22   15249.3   52269.4
Speeding-up [s]       42.32    255.25    1647.8    4850.4
Actual speeding-up    5.399    7.018     9.25      10.78

5. Conclusion

In this paper, we proposed a primality testing algorithm for the Woodall numbers based on Riesel's primality test for natural numbers of the form $N = h \cdot 2^n - 1$. In Section 3, we characterized the discriminants and considered a specific approach to obtaining conditions on $(h, n)$ in Riesel's primality test, and found that almost all $N = h \cdot 2^n - 1$ are applicable (Theorem 4); in particular, $W_n = n \cdot 2^n - 1$ are applicable. In Section 4, we considered an optimization of our algorithm, and found that the theoretical speeding-up is 2 times relative to the trivial implementation. Future work on this problem includes studying the distribution of the $(h, n)$ for which primality testing can be applied, and further optimizations of our algorithm. In fact, there is a special algorithm to perform modulo $N$ arithmetic for numbers of the form $N = k \cdot 2^q + c$ provided $c$ is small [7]. This would improve the running time of the implementation, and is also left as future work.

Acknowledgments

The authors would like to thank the anonymous reviewer for his/her valuable comments. This work was supported in part by Grant-in-Aid for Scientific Research (C)(20540135).

References

[1] A. J. C. Cunningham and H. J. Woodall, Factorisation of $Q = (2^q \pm q)$ and $q \cdot 2^q \pm 1$, Math. Mag., 47 (1917), 1–38.

[2] C. Caldwell, The Top Twenty: Woodall Primes, The Prime Pages, http://primes.utm.edu/top20/page.php?id=7.

[3] H. Riesel, Lucasian criteria for the primality of $h \cdot 2^n - 1$, Math. Comput., 23 (1969), 869–875.

[4] H. Cohen, A Course in Computational Algebraic Number Theory, 4th Corrected Printing, Graduate Texts in Math., 138, Springer-Verlag, Berlin, 2000.

[5] W. Narkiewicz, Elementary and Analytic Theory of Algebraic Numbers, Third Edition, Springer Monographs in Math., Springer-Verlag, Berlin, 2004.

[6] W. Keller, New Cullen primes, Math. Comput., 64 (1995), 1733–1741.

[7] R. Crandall and C. Pomerance, Prime Numbers–a Computational Approach, Second Edition, Springer, New York, 2005.


JSIAM Letters Vol.6 (2014) pp.5–7 ©2014 Japan Society for Industrial and Applied Mathematics

The elliptic curve Diffie-Hellman problem and an equivalent hard problem for elliptic divisibility sequences

Junichi Yarimizu1, Yukihiro Uchida1 and Shigenori Uchiyama1

1 Tokyo Metropolitan University, Tokyo 192-0397, Japan

E-mail yarimizu-junichi ed.tmu.ac.jp

Received September 5, 2013, Accepted October 14, 2013

Abstract

In 1948, Ward defined elliptic divisibility sequences satisfying a certain recurrence relation. An elliptic divisibility sequence arises from any choice of elliptic curve and initial point on that curve. In this paper, we define a hard problem in the theory of elliptic divisibility sequences (EDS-DH problem), which is computationally equivalent to the elliptic curve Diffie-Hellman problem.

Keywords elliptic curve, elliptic divisibility sequence, elliptic curve Diffie-Hellman problem

Research Activity Group Algorithmic Number Theory and Its Applications

1. Introduction

In 1948, Ward defined the concept of an elliptic divisibility sequence (EDS for short) [1]. This is a sequence of integers satisfying a certain divisibility property and a non-linear recurrence relation, which is related to division polynomials. In 2008, Lauter and Stange defined some hard problems in the theory of EDSs, each of which is computationally equivalent to the elliptic curve discrete logarithm problem (ECDLP) [2]. However, they did not consider the elliptic curve Diffie-Hellman problem (ECDHP). In this paper, we define a hard problem (the EDS-DH problem) for EDSs, which is computationally equivalent to the ECDHP. In Section 2, we begin with an introduction to EDSs and how to calculate general terms of EDSs. In Section 3, we introduce the ECDHP and define the EDS-DH problem. In Section 4, we explain the equivalence of the ECDHP and the EDS-DH problem. Our conclusion is presented in Section 5.

2. Elliptic divisibility sequences

In this section, we briefly review EDSs according to [2]. See [1–3] for details.

2.1 Elliptic divisibility sequences

Let us begin with the definition of an EDS.

Definition 1 ([2]) An EDS $(W(n))$ is a sequence in a field $K$ satisfying

$$W(m+n)W(m-n) = W(m+1)W(m-1)W(n)^2 - W(n+1)W(n-1)W(m)^2 \quad (\forall m, n \in \mathbb{Z}).$$

EDSs satisfy the same relation as the division polynomials of elliptic curves. We now need the following two theorems.

Theorem 2 ([4]) If $(W(n))$ is a non-trivial EDS, then $W(0) = 0_K$, $W(1) = \pm 1_K$, and $W(-n) = -W(n)$ ($\forall n \in \mathbb{Z}$).

This theorem means that we only need to consider the terms of an EDS with positive index and with $W(1) = 1_K$, where $1_K$ denotes the unit element for multiplication and $0_K$ denotes the unit element for addition in the field $K$; we assume this throughout this paper.

Theorem 3 ([4]) If the initial five terms $W(0)$, $W(1)$, $W(2)$, $W(3)$, $W(4)$ of an EDS $(W(n))$ are known, then the whole sequence is well defined.

Since we always have $W(0) = 0_K$, $W(1) = 1_K$, this is equivalent to knowing the three terms $W(2)$, $W(3)$, $W(4)$.

2.2 Calculating a general term

It is then important to know how to calculate a general term of an EDS defined by the three terms $W(2)$, $W(3)$, $W(4)$. For this purpose, we use the recurrence relations below.

Definition 4 By Definition 1, we have the following recurrence relations for all $k \in \mathbb{Z}$:

• $W(2k+1)W(1) = W(k+2)W(k)^3 - W(k-1)W(k+1)^3$.

• $W(2k)W(2) = W(k)\left(W(k+2)W(k-1)^2 - W(k-2)W(k+1)^2\right)$.

These formulae are called the doubling formulae.
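The doubling formulae give a simple way to reach a term $W(t)$ while visiting only about $\log t$ groups of indices. The sketch below is an illustration under stated assumptions, not the algorithm of [4]: it works modulo a prime $q$ with $W(2)$ invertible, uses memoized recursion, and requires Python 3.8+ for `pow(w2, -1, q)`. The name `make_eds`, the modulus $q = 10007$, and the initial values $W(2) = 1$, $W(3) = -1$, $W(4) = 1$ are choices made for the example.

```python
from functools import lru_cache


def make_eds(w2, w3, w4, q):
    """Return a function n -> W(n) mod q for the EDS with W(1) = 1 and given W(2..4)."""
    inv_w2 = pow(w2, -1, q)              # W(2) must be invertible modulo q
    base = (0, 1, w2 % q, w3 % q, w4 % q)

    @lru_cache(maxsize=None)
    def W(n):
        if n < 0:
            return (-W(-n)) % q          # W(-n) = -W(n)
        if n <= 4:
            return base[n]
        k, odd = divmod(n, 2)
        if odd:                          # W(2k+1) W(1) = W(k+2) W(k)^3 - W(k-1) W(k+1)^3
            return (W(k + 2) * W(k) ** 3 - W(k - 1) * W(k + 1) ** 3) % q
        # W(2k) W(2) = W(k) (W(k+2) W(k-1)^2 - W(k-2) W(k+1)^2)
        return (W(k) * (W(k + 2) * W(k - 1) ** 2 - W(k - 2) * W(k + 1) ** 2) * inv_w2) % q

    return W


q = 10007
W = make_eds(1, -1, 1, q)
print([W(n) if W(n) < q // 2 else W(n) - q for n in range(13)])
# [0, 1, 1, -1, 1, 2, -1, -3, -5, 7, -4, -23, 29]
```

The printed values are the residues mapped back to small signed integers; for these initial values, $W(5) = 2$ follows from the first doubling formula with $k = 2$, and $W(6) = -1$ from the second with $k = 3$.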

Let $\Psi_n$ denote the $n$-th division polynomial of an elliptic curve $E$ over a field $K$. The sequence $W_{E,P} : \mathbb{Z} \to K$ of the form $W_{E,P}(n) = \Psi_n(P)$ for some fixed point $P \in E(K)$ is an elliptic divisibility sequence. Ward showed that almost all elliptic divisibility sequences arise in this way for the case $K = \mathbb{Q}$. This relationship is the basis of our work here.

In this paper, we assume naive arithmetic in $\mathbb{F}_q$; namely, we bound the time to do basic $\mathbb{F}_q$ operations by $O((\log q)^2)$ for simplicity.

Theorem 5 ([4, Theorem 3.4.1]) Let $E$ be an elliptic curve over $K = \mathbb{F}_q$, and $P \in E(K)$ a point of order not less than 4. Given a value $t$, the term $W_{E,P}(t)$ in the elliptic divisibility sequence associated to $E, P$ can be calculated in $O((\log t)(\log q)^2)$ time.


3. ECDHP and EDS-DH problem

In this section, we introduce the ECDHP and define the EDS-DH problem.

Problem 6 Let $E$ be an elliptic curve over a finite field $K$. Suppose there are points $P, [a]P, [b]P \in E(K)$ ($a, b \in \mathbb{Z}$). Determine $[ab]P \in E(K)$.

This problem is called the elliptic curve Diffie-Hellman problem (ECDHP). In order to define the EDS-DH problem, we need the following theorem.

Theorem 7 ([2]) Let $K$ be a finite field of $q$ elements, and $E$ an elliptic curve defined over $K$. For all points $P \in E(K)$ of order relatively prime to $q - 1$ and greater than 3, define:

$$\phi(P) = \left(\frac{W_{E,P}(q-1)}{W_{E,P}(q-1+\mathrm{ord}(P))}\right)^{\frac{1}{\mathrm{ord}(P)^2}}.$$

For a point $P$ of order relatively prime to $q - 1$ and greater than 3, the sequence $\phi([n]P)$ is an EDS. Specifically:

$$\phi([n]P) = \phi(P)^{n^2}\, W_{E,P}(n) \quad (\forall n \in \mathbb{Z}).$$

In light of this theorem we will use the following convenient notation:

$$\widetilde{W}_{E,P}(n) = \frac{\phi([n]P)}{\phi(P)}.$$

$\widetilde{W}_{E,P}(n)$ can be calculated as a function of the point $[n]P$ on the curve without knowledge of $n$.

Problem 8 Let $K$ be a finite field of $q$ elements, and $E$ an elliptic curve defined over $K$. Let $P \in E(K)$ be a point of order relatively prime to $q - 1$ and greater than 3. Suppose there are points $P, [a]P, [b]P \in E(K)$ ($a, b \in \mathbb{Z}$). Determine $\widetilde{W}_{E,P}(ab) \in K$.

We call this problem the EDS-DH problem.

4. Equivalence of two hard problems

In this section, we prove the following theorem.

Theorem 9 Let $E$ be an elliptic curve over a finite field $K = \mathbb{F}_q$ of characteristic $\neq 2$. For all points $P \in E(K)$ of order relatively prime to $q - 1$ and greater than 3, the ECDHP is computationally equivalent to the EDS-DH problem.

Proof ECDHP $\Longrightarrow$ EDS-DH problem: For simplicity, and from a cryptographic point of view, we only consider the case where the order of $P$ is prime. Setting $n = ab$ in the equation of Theorem 7, and writing $\widetilde{W}_{E,P}(n) = \phi([n]P)/\phi(P)$ as in the notation introduced after that theorem, we obtain the expression

$$\widetilde{W}_{E,P}(ab) = \frac{1}{\phi(P)} \left( \frac{W_{E,[ab]P}(q-1)}{W_{E,[ab]P}(q-1+\mathrm{ord}([ab]P))} \right)^{\frac{1}{\mathrm{ord}([ab]P)^2}}.$$

Using Theorem 5 to calculate the ratio of terms inside the parentheses takes $\log(q-1+\mathrm{ord}([ab]P)) + \log(q-1)$ steps. Since $\mathrm{ord}([ab]P)$ is on the order of $q$, this is $O((\log q)^3)$ time at worst. The other necessary operations are to find the inverse of $\mathrm{ord}([ab]P)^2$ modulo $q - 1$, and to raise to that exponent. Both of these are also $O(\log q)$ finite field operations.

EDS-DH problem $\Longrightarrow$ ECDHP: See [2, Lemma 1] for the following identity:

$$\frac{W_{E,P}(n-1)\,W_{E,P}(n+1)}{W_{E,P}(n)^2} = x(P) - x([n]P).$$

Set $n = ab$ in this equation, and apply Theorem 7:

$$\frac{\widetilde{W}_{E,P}(ab-1)\,\widetilde{W}_{E,P}(ab+1)}{\widetilde{W}_{E,P}(ab)^2} = \phi(P)^2 \left( x(P) - x([ab]P) \right).$$

The term $\widetilde{W}_{E,P}(ab)$ can be calculated from the assumption that the EDS-DH problem is solvable. With knowledge of the product $\widetilde{W}_{E,P}(ab-1)\widetilde{W}_{E,P}(ab+1)$, the x-coordinate of $[ab]P$, $x([ab]P)$, can be calculated without requiring knowledge of $[ab]P$.

The sequence $\widetilde{W}_{E,P}$ satisfies the recurrence

$$\widetilde{W}_{E,P}(i+j)\,\widetilde{W}_{E,P}(i-j) = \widetilde{W}_{E,P}(i+1)\,\widetilde{W}_{E,P}(i-1)\,\widetilde{W}_{E,P}(j)^2 - \widetilde{W}_{E,P}(j+1)\,\widetilde{W}_{E,P}(j-1)\,\widetilde{W}_{E,P}(i)^2 \quad (\forall i, j \in \mathbb{Z}).$$

Setting $i = ab$ and $j = a$ in this equation gives:

$$\widetilde{W}_{E,P}(a(b+1))\,\widetilde{W}_{E,P}(a(b-1)) = \widetilde{W}_{E,P}(ab+1)\,\widetilde{W}_{E,P}(ab-1)\,\widetilde{W}_{E,P}(a)^2 - \widetilde{W}_{E,P}(a+1)\,\widetilde{W}_{E,P}(a-1)\,\widetilde{W}_{E,P}(ab)^2.$$

All of these terms except the product $\widetilde{W}_{E,P}(ab+1)\widetilde{W}_{E,P}(ab-1)$ can be calculated under the assumption that the EDS-DH problem is solvable. Solving this relation for the unknown product determines it, and hence also determines $x([ab]P)$. We can calculate the corresponding possible values for $y$ in probabilistic time $O((\log q)^4)$ [2, Theorem 9]. To determine which of the two points with this x-coordinate is actually $[ab]P$, first take one of the two candidate points, and proceed on the assumption that it is $[ab]P$. Using the EDS-DH problem oracle, calculate $\widetilde{W}_{E,P}(ab)$ from the three points $P$, $[a]P$, and $[b]P$. Also calculate $\widetilde{W}_{E,P}(ab)$ from $P$ and the candidate for $[ab]P$ by Theorem 7. Then, if the two values are equal, our assumption about the point we chose is correct. If the two values are not equal, then the point we chose was incorrect, and the other one is the point $[ab]P$ we seek.

(QED)

5. Conclusion

We defined a hard problem in the theory of EDSs (the EDS-DH problem), which is computationally equivalent to the ECDHP. Future work is to propose cryptographic schemes based on the proposed hard problem.

Acknowledgments

The authors would like to thank the anonymous reviewer for his/her valuable comments. This work was supported in part by Grant-in-Aid for Scientific Research (C)(20540135).

References

[1] M. Ward, Memoir on elliptic divisibility sequences, Amer. J. Math., 70 (1948), 31–74.

[2] K. E. Lauter and K. E. Stange, The elliptic curve discrete logarithm problem and equivalent hard problems for elliptic divisibility sequences, in: Proc. of SAC 2008, LNCS-5381, pp. 309–327, Springer-Verlag, Berlin, 2009.

[3] N. Sakurada, J. Yarimizu, N. Ogura and S. Uchiyama, An integer factoring algorithm based on elliptic divisibility sequences, JSIAM Letters, 4 (2012), 21–23.

[4] R. Shipsey, Elliptic divisibility sequences, Ph.D. thesis, The Univ. of London, London, 2000.


JSIAM Letters Vol.6 (2014) pp.9–11 ©2014 Japan Society for Industrial and Applied Mathematics

Explicit error bound for the tanh rule and the DE formula for integrals with logarithmic singularity

Tomoaki Okayama1

1 Graduate School of Economics, Hitotsubashi University, 2-1, Naka, Kunitachi, Tokyo 186-8601, Japan

E-mail tokayama econ.hit-u.ac.jp

Received September 30, 2013, Accepted December 7, 2013

Abstract

The tanh rule and the double-exponential (DE) formula are known empirically and theoretically as quite efficient quadrature formulas, especially for integrals with endpoint singularity, including algebraic singularity and logarithmic singularity. Furthermore, in the case of integrals with algebraic singularity, explicit error bounds have been given for those formulas, which enables us to guarantee their approximation accuracy. In the case of integrals with logarithmic singularity, however, such explicit error bounds have not been given thus far, although those formulas should work accurately in this case as well. This paper presents the desired theoretical explicit error bounds, with numerical experiments.

Keywords Sinc quadrature, trapezoidal rule, numerical integration, weakly singular kernel

Research Activity Group Quality of Computations

1. Introduction

This paper is concerned with efficient approximation of integrals with logarithmic singularity of the form

$$I = \int_0^T \log(t) f(t)\,\mathrm{d}t, \tag{1}$$

together with an explicit error bound. The function $f$ may have endpoint singularity. Due to those singularities, we cannot assume that the integrand is continuously differentiable over the given interval $[0, T]$, or is analytic on a complex domain that includes $[0, T]$. This makes it difficult to construct an efficient numerical integration library with guaranteed accuracy.

One idea to avoid the difficulty was shown by Yamanaka et al. [1], where $f$ is approximated by a power series as

$$f(t) \approx a_0 + a_1 t + a_2 t^2 + \cdots + a_n t^n, \tag{2}$$

with guaranteed accuracy. Then, the integral of each term of the approximated integrand ($\log(t)\, a_k t^k$) is obtained analytically. This approach should work well when $T$ is sufficiently small, but less well otherwise, since (2) is the Taylor expansion. In addition, such an approximation performs badly in the case where $f$ also has singularity at the endpoint, e.g., $f(t) = \cos(t^{1/\pi})$.

In order to treat those endpoint singularities, this paper considers the tanh rule [2] and the double-exponential (DE) formula [3], which are well known as efficient quadrature rules for integrals with such singularities. The idea of those rules is the combination of the following two techniques: (i) apply a variable transformation using a map $\psi : \mathbb{R} \to [0, T]$ as

$$\int_0^T F(t)\,\mathrm{d}t = \int_{-\infty}^{\infty} F(\psi(\tau))\,\psi'(\tau)\,\mathrm{d}\tau,$$

where $|\psi'(\tau)|$ decays quickly enough as $\tau \to \pm\infty$ to suppress the divergence of $|F(\psi(\tau))|$, and (ii) apply the (truncated) trapezoidal rule as

$$\int_{-\infty}^{\infty} F(\psi(\tau))\,\psi'(\tau)\,\mathrm{d}\tau \approx h \sum_{k=-N}^{M} F(\psi(kh))\,\psi'(kh).$$

As the map $\psi$, the tanh transformation

$$t = \psi_{\mathrm{SE}}(\tau) = \frac{T}{2} \tanh\left(\frac{\tau}{2}\right) + \frac{T}{2}$$

is used in the tanh rule, and the DE transformation

$$t = \psi_{\mathrm{DE}}(\tau) = \frac{T}{2} \tanh\left(\frac{\pi}{2} \sinh \tau\right) + \frac{T}{2}$$

is used in the DE formula. Both quadrature rules work accurately (more precisely, can converge exponentially) even when $T$ is large, and in either of the following cases: the integrand has logarithmic singularity as in (1), or the integrand has algebraic singularity of the form

$$I = \int_0^T t^{\alpha-1}(T-t)^{\beta-1} f(t)\,\mathrm{d}t, \tag{3}$$

where $\alpha$ and $\beta$ are positive constants. This behavior is supported theoretically in the literature [4, 5]. Furthermore, in the case of algebraic singularity (3), explicit (computable) error bounds of those rules have recently been given [6], and the results were utilized to construct a verified numerical integration library [7] for that case.

The objective of this paper is to give such explicit error bounds for the two rules in the case of logarithmic singularity (1). Although it is known empirically and theoretically that those rules can converge exponentially in that case, no explicit error bound has been given thus far. In order to construct a verified numerical integration library, computable, mathematically rigorous error bounds are desired, and they are given in this paper.

This paper is organized as follows. Main results are stated in Section 2, and their proofs are given in Section 4. Numerical results are shown in Section 3.

2. Main results: explicit error bounds

The following function space should be introduced to state the main results. In this paper, $\mathscr{D}$ is supposed to be either $\psi_{\mathrm{SE}}(\mathscr{D}_d)$ or $\psi_{\mathrm{DE}}(\mathscr{D}_d)$, which is a translated domain from the strip domain $\mathscr{D}_d = \{\zeta \in \mathbb{C} : |\operatorname{Im} \zeta| < d\}$.

Definition 1 Let $\mathscr{D}$ be a bounded and simply connected domain (or Riemann surface), and let $K$ be a positive constant. Then $\mathbf{H}^{\infty}_{K}(\mathscr{D})$ denotes the family of all functions $f$ that are analytic on $\mathscr{D}$, and satisfy $|f(z)| \le K$ for all $z \in \mathscr{D}$.

The main results of this paper are stated as follows.

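Theorem 2 Let $f \in \mathbf{H}^{\infty}_{K}(\psi_{\mathrm{SE}}(\mathscr{D}_d))$ with $0 < d < \pi$. Let $\alpha = (2\pi - 1)/(2\pi)$, let $N$ be a positive integer, and let $h$ and $M$ be selected by

$$h = \sqrt{\frac{2\pi d}{\alpha N}}, \quad M = \lceil \alpha N \rceil. \tag{4}$$

Then it holds that

$$\left| I - h \sum_{k=-N}^{M} \log(\psi_{\mathrm{SE}}(kh))\, f(\psi_{\mathrm{SE}}(kh))\, \psi'_{\mathrm{SE}}(kh) \right| \le C_0 C_1 \left[ \frac{2}{\left(1 - e^{-\sqrt{2\pi d \alpha}}\right) \cos^{\alpha+1}\left(\frac{d}{2}\right)} + 1 \right] e^{-\sqrt{2\pi d \alpha N}},$$

where $C_0 = 2KT^{\alpha}/\alpha$ and

$$C_1 = \left( \frac{T}{\cos\left(\frac{d}{2}\right)} \right)^{\frac{1}{2\pi}} \left( \pi^2 + \log^2\left( \frac{T}{\cos\left(\frac{d}{2}\right)} \right) \right)^{\frac{1}{2}}. \tag{5}$$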

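Theorem 3 Let $f \in \mathbf{H}^{\infty}_{K}(\psi_{\mathrm{DE}}(\mathscr{D}_d))$ with $0 < d < \pi/2$. Let $\alpha = (2\pi - 1)/(2\pi)$, let $N$ be a positive integer, and let $h$ and $M$ be selected by

$$h = \frac{\log(4dN/\alpha)}{N}, \quad M = N - \left\lfloor \frac{\log(1/\alpha)}{h} \right\rfloor. \tag{6}$$

Then it holds that

$$\left| I - h \sum_{k=-N}^{M} \log(\psi_{\mathrm{DE}}(kh))\, f(\psi_{\mathrm{DE}}(kh))\, \psi'_{\mathrm{DE}}(kh) \right| \le C_0 C_1 \left[ \frac{C_2}{1 - e^{-\frac{\pi}{2} \alpha e}} + e^{\frac{\pi}{2}} \right] e^{-2\pi d N / \log(4dN/\alpha)},$$

where $C_0 = 2KT^{\alpha}/\alpha$ and

$$C_1 = \left( \frac{T}{\cos\left(\frac{\pi}{2} \sin d\right)} \right)^{\frac{1}{2\pi}} \left( \pi^2 + \log^2\left( \frac{T}{\cos\left(\frac{\pi}{2} \sin d\right)} \right) \right)^{\frac{1}{2}}, \tag{7}$$

$$C_2 = \frac{2}{\cos^{1+\alpha}\left(\frac{\pi}{2} \sin d\right) \cos d}. \tag{8}$$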

3. Numerical results

As an example in which $T$ is not sufficiently small and $f(t)$ has derivative singularity at the origin, consider $f(t) = \cos(t^{1/\pi})\sqrt{t^2 - 2t + 2}$ and the following integral

$$\int_0^2 \log(t) f(t)\,\mathrm{d}t = -0.870621268307117216836724471909871167\cdots, \tag{9}$$

where the value on the right hand side was calculated by Mathematica with WorkingPrecision→50. The function $f$ satisfies the assumptions in Theorem 2 with d = π/2 and K = 2, and also satisfies the assumptions in Theorem 3 with d = π/6 and K = 2π/3 (in both cases, $d$ is determined by the branch points of $\sqrt{t^2 - 2t + 2}$, and $K$ can be found by using the maximum-modulus principle). Therefore, we can compute the error bounds according to the theorems, and Fig. 1 confirms that they bound the observed errors. All computation programs were written in C with quadruple-precision floating-point arithmetic by using the “long double” type on a PowerPC CPU.

[Figure 1: error (logarithmic scale, from 1e-30 to 100000) plotted against N (from 0 to 140). Curves: error bound of tanh rule, tanh rule, error bound of DE formula, DE formula.]

Fig. 1. Errors of the tanh rule and the DE formula for the integral (9) and their error bounds.
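The experiment can be imitated in double precision as follows. This is a hedged sketch, not the author's C program: the function names are hypothetical, the bounds are coded from Theorems 2 and 3 as reconstructed above, the reference value is the one quoted in (9), and the constant K for the DE case is read from the text as 2π/3. Double precision limits the attainable accuracy to about 1e-15, so only moderate N are meaningful here.

```python
import math

T = 2.0
ALPHA = (2 * math.pi - 1) / (2 * math.pi)
I_REF = -0.870621268307117216836724471909871167  # value quoted in (9)


def f(t):
    return math.cos(t ** (1 / math.pi)) * math.sqrt(t * t - 2 * t + 2)


def transformed_trapezoid(h, N, M, x_of, dx_of):
    """h * sum_{k=-N}^{M} log(psi) f(psi) psi' with psi(tau) = T / (1 + exp(-x(tau))).

    x(tau) = tau gives the tanh rule, x(tau) = pi*sinh(tau) the DE formula.
    w = exp(-|x|) keeps the evaluation free of overflow for large |x|.
    """
    total = 0.0
    for k in range(-N, M + 1):
        tau = k * h
        x = x_of(tau)
        w = math.exp(-abs(x))
        t = T / (1 + w) if x >= 0 else T * w / (1 + w)
        dt = T * dx_of(tau) * w / (1 + w) ** 2
        total += math.log(t) * f(t) * dt
    return h * total


def tanh_rule(N, d):
    h = math.sqrt(2 * math.pi * d / (ALPHA * N))          # selection (4)
    M = math.ceil(ALPHA * N)
    return transformed_trapezoid(h, N, M, lambda tau: tau, lambda tau: 1.0)


def de_formula(N, d):
    h = math.log(4 * d * N / ALPHA) / N                   # selection (6)
    M = N - math.floor(math.log(1 / ALPHA) / h)
    return transformed_trapezoid(h, N, M,
                                 lambda tau: math.pi * math.sinh(tau),
                                 lambda tau: math.pi * math.cosh(tau))


def c1(R):
    # Constants (5) and (7) share this form with R the bound on |z|.
    return R ** (1 / (2 * math.pi)) * math.sqrt(math.pi ** 2 + math.log(R) ** 2)


def tanh_bound(N, d, K):
    c0 = 2 * K * T ** ALPHA / ALPHA
    s = math.sqrt(2 * math.pi * d * ALPHA)
    bracket = 2 / ((1 - math.exp(-s)) * math.cos(d / 2) ** (ALPHA + 1)) + 1
    return c0 * c1(T / math.cos(d / 2)) * bracket * math.exp(-s * math.sqrt(N))


def de_bound(N, d, K):
    c0 = 2 * K * T ** ALPHA / ALPHA
    cs = math.cos(math.pi / 2 * math.sin(d))
    c2 = 2 / (cs ** (1 + ALPHA) * math.cos(d))            # constant (8)
    bracket = c2 / (1 - math.exp(-math.pi * ALPHA * math.e / 2)) + math.exp(math.pi / 2)
    rate = 2 * math.pi * d * N / math.log(4 * d * N / ALPHA)
    return c0 * c1(T / cs) * bracket * math.exp(-rate)
```

Comparing `abs(tanh_rule(N, math.pi / 2) - I_REF)` with `tanh_bound(N, math.pi / 2, 2.0)`, and likewise for the DE formula with `d = math.pi / 6`, gives a rough double-precision counterpart of the curves in Fig. 1.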

4. Proofs

The important function space for the error analysis is defined as follows.

Definition 4 Let $\mathscr{D}$ be a bounded and simply connected domain (or Riemann surface), and let $L$, $\alpha$, $\beta$ be positive constants. Then $\mathbf{L}_{L,\alpha,\beta}(\mathscr{D})$ denotes the family of all functions $f$ that are analytic on $\mathscr{D}$, and satisfy $|f(z)| \le L|Q_{\alpha,\beta}(z)|$ for all $z \in \mathscr{D}$, where $Q_{\alpha,\beta}(z) = z^{\alpha}(T - z)^{\beta}$.

For functions that belong to this function space, the following error estimates are known.

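Theorem 5 (Okayama et al. [6, Theorem 2.6]) Let $FQ_{1,1} \in \mathbf{L}_{L,\alpha,1}(\psi_{\mathrm{SE}}(\mathscr{D}_d))$ for $d$ with $0 < d < \pi$ and $\alpha \le 1$. Let $N$ be a positive integer, and let $h$ and $M$ be selected by (4). Then it holds that

$$\left| \int_0^T F(t)\,\mathrm{d}t - h \sum_{k=-N}^{M} F(\psi_{\mathrm{SE}}(kh))\, \psi'_{\mathrm{SE}}(kh) \right| \le C_0 \left[ \frac{2}{\left(1 - e^{-\sqrt{2\pi d \alpha}}\right) \cos^{\alpha+1}\left(\frac{d}{2}\right)} + 1 \right] e^{-\sqrt{2\pi d \alpha N}},$$

where $C_0 = 2LT^{\alpha}/\alpha$.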


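Theorem 6 (Okayama et al. [6, Theorem 2.14]) Let $FQ_{1,1} \in \mathbf{L}_{L,\alpha,1}(\psi_{\mathrm{DE}}(\mathscr{D}_d))$ for $d$ with $0 < d < \pi/2$ and $1/(2\pi) \le \alpha \le 1$. Let $N$ be a positive integer, and let $h$ and $M$ be selected by (6). Then it holds that

$$\left| \int_0^T F(t)\,\mathrm{d}t - h \sum_{k=-N}^{M} F(\psi_{\mathrm{DE}}(kh))\, \psi'_{\mathrm{DE}}(kh) \right| \le C_0 \left( \frac{C_2}{1 - e^{-\frac{\pi}{2} \alpha e}} + e^{\frac{\pi}{2}} \right) e^{-2\pi d N / \log(4dN/\alpha)},$$

where $C_0 = 2LT^{\alpha}/\alpha$ and $C_2$ is the constant defined in (8).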

In view of these theorems, we find that the main task here is to show $FQ_{1,1} \in \mathbf{L}_{L,\alpha,1}(\mathscr{D})$ with $F(t) = \log(t) f(t)$ under the assumptions in Theorems 2 and 3. The following lemmas state the desired results.

Lemma 7 Let the assumptions in Theorem 2 be fulfilled, and let $F(t) = \log(t) f(t)$. Then $FQ_{1,1} \in \mathbf{L}_{L,\alpha,1}(\psi_{\mathrm{SE}}(\mathscr{D}_d))$ with $L = KC_1$, where $C_1$ is the constant defined in (5).

Lemma 8 Let the assumptions in Theorem 3 be fulfilled, and let $F(t) = \log(t) f(t)$. Then $FQ_{1,1} \in \mathbf{L}_{L,\alpha,1}(\psi_{\mathrm{DE}}(\mathscr{D}_d))$ with $L = KC_1$, where $C_1$ is the constant defined in (7).

In order to prove the lemmas above, what we need to show is the following inequalities.

Lemma 9 Let $0 < d < \pi$. Then for all $z \in \psi_{\mathrm{SE}}(\mathscr{D}_d)$

$$|\log z| \le C_1 \frac{1}{|z|^{\frac{1}{2\pi}}}$$

holds, where $C_1$ is the constant defined in (5).

Lemma 10 Let $0 < d < \pi/2$. Then for all $z \in \psi_{\mathrm{DE}}(\mathscr{D}_d)$

$$|\log z| \le C_1 \frac{1}{|z|^{\frac{1}{2\pi}}}$$

holds, where $C_1$ is the constant defined in (7).
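As a short worked step (added here for orientation, using only the definitions above), the inequalities of Lemmas 9 and 10 yield the membership required by Theorems 5 and 6 as follows. For $|f(z)| \le K$, $F(z) = \log(z) f(z)$ and $\alpha = (2\pi - 1)/(2\pi) = 1 - 1/(2\pi)$,

$$|F(z)\,Q_{1,1}(z)| = |\log z|\,|f(z)|\,|z|\,|T - z| \le K C_1\, |z|^{1 - \frac{1}{2\pi}}\, |T - z| = K C_1\, |Q_{\alpha,1}(z)|,$$

which is the defining inequality of Definition 4 with $L = KC_1$ and $\beta = 1$.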

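The next lemma essentially shows those inequalities.

Lemma 11 Let $R$ be a positive constant, and let $z \in \mathbb{C}$ be bounded as $|z| \le R$. Then it holds that

$$\left| z^{1/(2\pi)} \log z \right| \le R^{\frac{1}{2\pi}} \sqrt{\log^2 R + \pi^2}.$$

Proof Let $z = r e^{i\theta}$, where $r$ and $\theta$ are real numbers with $0 \le r \le R$ and $-\pi \le \theta < \pi$. Then we have

$$\left| z^{1/(2\pi)} \log z \right|^2 = r^{\frac{1}{\pi}} \left( \log^2 r + \theta^2 \right) \le r^{\frac{1}{\pi}} \left( \log^2 r + \pi^2 \right) \le R^{\frac{1}{\pi}} \left( \log^2 R + \pi^2 \right),$$

which shows the desired result.

(QED)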
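The last inequality in the proof of Lemma 11 uses monotonicity in $r$, which can be checked directly (a supplementary step, not spelled out in the proof). With $g(r) = r^{\frac{1}{\pi}}(\log^2 r + \pi^2)$ for $r > 0$,

$$g'(r) = \frac{1}{\pi} r^{\frac{1}{\pi} - 1} \left( \log^2 r + \pi^2 \right) + r^{\frac{1}{\pi}} \cdot \frac{2 \log r}{r} = \frac{1}{\pi} r^{\frac{1}{\pi} - 1} \left( \log r + \pi \right)^2 \ge 0,$$

so $g$ is nondecreasing and $g(r) \le g(R)$ for $0 < r \le R$. The perfect square appears precisely because the exponent is $1/(2\pi)$.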

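What is left here is to give the explicit bound $R$ of Lemma 11, which is done by the following lemmas.

Lemma 12 Let $0 < d < \pi$. Then it holds for all $z \in \psi_{\mathrm{SE}}(\mathscr{D}_d)$ that

$$|z| \le \frac{T}{\cos\left(\frac{d}{2}\right)}.$$

Proof From $z \in \psi_{\mathrm{SE}}(\mathscr{D}_d)$, we can put $z = \psi_{\mathrm{SE}}(x + iy)$ with $x \in \mathbb{R}$ and $y \in [-d, d]$. Then we have

$$|z| = |\psi_{\mathrm{SE}}(x + iy)| = \frac{T}{|1 + e^{-x - iy}|} \le \frac{T}{(1 + e^{-x}) \cos\left(\frac{y}{2}\right)} \le \frac{T}{\cos\left(\frac{d}{2}\right)}.$$

The non-trivial inequality here is the first one, which is shown in Okayama et al. [6, Lemma 4.21].

(QED)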

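Lemma 13 Let $0 < d < \pi/2$. Then it holds for all $z \in \psi_{\mathrm{DE}}(\mathscr{D}_d)$ that

$$|z| \le \frac{T}{\cos\left(\frac{\pi}{2} \sin d\right)}.$$

Proof From $z \in \psi_{\mathrm{DE}}(\mathscr{D}_d)$, we can put $z = \psi_{\mathrm{DE}}(x + iy)$ with $x \in \mathbb{R}$ and $y \in [-d, d]$. Then we have

$$|z| = |\psi_{\mathrm{DE}}(x + iy)| = \frac{T}{|1 + e^{-\pi \sinh(x + iy)}|} \le \frac{T}{\left(1 + e^{-\pi \sinh(x) \cos y}\right) \cos\left(\frac{\pi}{2} \sin y\right)} \le \frac{T}{\cos\left(\frac{\pi}{2} \sin d\right)}.$$

The non-trivial inequality here is the first one, which is shown in Okayama et al. [6, Lemma 4.22].

(QED)

These lemmas give the desired $R$, which completes the proofs.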

Acknowledgments

This work was supported by JSPS Grant-in-Aid forYoung Scientists (B) No. 24760060.

References

[1] N. Yamanaka, M. Kashiwagi and S. Oishi, Verified numerical integration algorithm for logarithmic singularity, in: Proc. of the 41st Numerical Analysis Symposium, pp. 118–121, 2012.

[2] C. Schwartz, Numerical integration of analytic functions, J. Comput. Phys., 4 (1969), 19–29.

[3] H. Takahasi and M. Mori, Double exponential formulas for numerical integration, Publ. Res. Inst. Math. Sci., 9 (1974), 721–741.

[4] F. Stenger, Numerical Methods Based on Sinc and Analytic Functions, Springer-Verlag, New York, 1993.

[5] K. Tanaka, M. Sugihara, K. Murota and M. Mori, Function classes for double exponential integration formulas, Numer. Math., 111 (2009), 631–655.

[6] T. Okayama, T. Matsuo and M. Sugihara, Error estimates with explicit constants for Sinc approximation, Sinc quadrature and Sinc indefinite integration, Numer. Math., 124 (2013), 361–394.

[7] N. Yamanaka, T. Okayama, S. Oishi and T. Ogita, A fast verified automatic integration algorithm using double exponential formula, Nonlinear Theory and Its Applications, IEICE, 1 (2010), 119–132.


JSIAM Letters Vol.6 (2014) pp.13–16 ©2014 Japan Society for Industrial and Applied Mathematics

Efficient system parameters for Identity-Based Encryption using supersingular elliptic curves

Takumi Tomita1 and Tsuyoshi Takagi2

1 Graduate School of Mathematics, Kyushu University, 744, Motooka, Nishi-ku, Fukuoka 819-0395, Japan

2 Institute of Mathematics for Industry, Kyushu University, 744, Motooka, Nishi-ku, Fukuoka 819-0395, Japan

E-mail t-tomita math.kyushu-u.ac.jp

Received October 2, 2013, Accepted November 27, 2013

Abstract

Boneh and Franklin proposed a practical Identity-Based Encryption (IBE) scheme in 2001 [1]. In order to embed the identity of a user in the IBE, we need a hash function, called HashToPoint. The dominant computation of HashToPoint is the scalar multiplication by a large cofactor c, which is relatively expensive compared with other cryptographic functions in the IBE. In this paper, we present a list of cofactors c with Hamming weight two, which can accelerate the computation of HashToPoint. Indeed, the timing of our implementation of HashToPoint using the proposed cofactor is reduced by about 30% on a desktop PC.

Keywords pairing, Identity-Based Cryptography, HashToPoint

Research Activity Group Algorithmic Number Theory and Its Applications

1. Introduction

Pairing-based cryptography has received a lot of attention since its introduction to the construction of the Identity-Based Encryption (IBE) scheme by Boneh and Franklin in 2001 [1]. The IBE scheme is a public-key encryption technology that allows a user to calculate a public key from an arbitrary string such as a name, a domain name or a physical IP address. In recent years, various cryptographic protocols related to the IBE scheme have been proposed [2]. In practice, the Internet Engineering Task Force (IETF), which develops and promotes Internet standards, published its IBE scheme as RFC5091 [3]. We focus on this scheme in this paper.

RFC5091 is restricted to the pairing constructed from a family of supersingular elliptic curves over finite fields of large prime characteristic. For a prime p ≥ 5, there are two types of supersingular elliptic curves, represented by (a) $y^2 = x^3 + 1$ where p ≡ 2 (mod 3) or (b) $y^2 = x^3 + x$ where p ≡ 3 (mod 4) [4, §3]. RFC5091 adopts (a) as the Type-1 curve. However, we use curve (b) in place of curve (a), because the library (PBC) that we use to check the IBE performance supports only curve (b). The differences between (a) and (b) do not affect the efficiency of the IBE discussed in this paper.

For the practical use of the IBE, various methods for faster computation have been proposed [2, 5–7]. In particular, there are many contributions that reduce the cost of computing the pairing, related to the Miller algorithm and Montgomery multiplication. In fact, Nakajima et al. proposed efficient primes p that have low Hamming weight (for the definition of Hamming weight, see §2.1) to speed up the Montgomery multiplication inside the Miller algorithm, and achieved a speedup of about 22% in the computation of the Miller algorithm [8].

However, we found that there is another parameter, called the cofactor (defined in §2.3), that can reduce the computational cost of the IBE. Boneh and Franklin describe in [1] how to compute HashToPoint, which maps an identity to a point of an elliptic curve. However, their HashToPoint typically requires a modular exponentiation that is relatively expensive compared with other cryptographic functions in the IBE. Therefore we focus on speeding up HashToPoint.

In this paper we propose system parameters that allow HashToPoint to be computed efficiently without losing the speed of other cryptographic functions in the IBE. Most of the processing time of HashToPoint is spent on the scalar multiplication of a point of an elliptic curve by a large integer called the cofactor. In order to speed up the scalar multiplication, we choose a cofactor with low Hamming weight. From the distribution of primes in arithmetic progressions, we estimate the number of such cofactors of a fixed size (§3.2). From an industrial point of view, however, it is important to list such cofactors explicitly. We first search for such cofactors and find several with Hamming weight 2. Next we measure the timing of HashToPoint using one of the cofactors with Hamming weight 2 on a desktop PC using the pairing library PBC [9]. In order to compare the improvement fairly, we also choose a random cofactor with Hamming weight 182 using the PBC library, and measure the timing of HashToPoint using both cofactors. Finally we find that the timing of our implementation of HashToPoint using the cofactor with Hamming weight 2 is reduced by about 30% on a desktop PC.


2. IBE scheme and its computation

In this section, we give an overview of the IBE scheme. First we define signed binary representations and explain the algorithm related to scalar multiplication of a point of an elliptic curve. Next we review the IBE scheme briefly and make clear the positioning of HashToPoint. Last we explain how to calculate HashToPoint.

2.1 Preliminary

2.1.1 Signed binary representations

In this subsection, we define a signed-binary representation. We denote a signed-binary representation by $\sum_{i=0}^{m-1} n_i 2^i$ with $n_{m-1} = 1$ and $n_i \in \{-1, 0, 1\}$ for $i = 0, 1, \ldots, m-2$. The number of nonzero digits ($n_i \neq 0$) in the signed-binary representation is called the Hamming weight.

2.1.2 Pairing of a supersingular elliptic curve

Let $p$ be a prime with $p \equiv 3 \pmod 4$, and let $GF(p)$ be the finite field of prime order $p$. In this paper, we use the supersingular curve $y^2 = x^3 + x$ over $GF(p)$, which is denoted by $E(GF(p))$. Let $\#E(GF(p))$ be the number of $GF(p)$-rational points. It is known that $\#E(GF(p)) = p + 1$ including the point at infinity ($P_\infty$) [7, Ch. IX.10]. Let $l$ be the largest prime divisor of $p + 1$. $E[l](GF(p))$ denotes a cyclic subgroup of $E(GF(p))$ with order $l$. $GF(p^2)^{\times}$ is the multiplicative group of the extension field over $GF(p)$ with degree 2. Let $e$ be the pairing on $E(GF(p))$, namely $e$ is a map $E[l](GF(p)) \times E[l](GF(p)) \to GF(p^2)^{\times}$, and it is computed by Miller's Algorithm [8, §2] and [7, Ch. IX.8].

Algorithm 1 Left-to-right binary method for point multiplication

Require: Signed binary representation $k = \sum_{i=0}^{m-1} n_i 2^i$ and $P \in E(GF(p))$
Ensure: $kP \in E(GF(p))$
  Q ← P
  for i = m − 2 down to 0 do
    Q ← 2Q
    if n_i = 1 then
      Q ← Q + P
    end if
    if n_i = −1 then
      Q ← Q − P
    end if
  end for
  return Q

2.1.3 Scalar multiplication of a point of an elliptic curve

We consider methods for computing $kP$, where $k$ is a positive integer and $P$ is a point of an elliptic curve. Algorithm 1 is the well-known left-to-right binary method for computing $kP$. For a positive integer $k$, we can calculate its unique signed binary representation called the NAF ([10, §3.3 Algorithm 3.30]). Let $k = \sum_{i=0}^{m-1} n_i 2^i$ be the NAF of $k$, where $m$ is the length of its signed binary representation. Algorithm 1 requires $m - 1$ point doublings and $w - 1$ point additions, where $w$ is the Hamming weight of its signed binary representation.
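The operation counts above can be checked with a small sketch. It is illustrative only: the function names are hypothetical, the NAF routine is the standard digit-by-digit construction rather than a transcription of [10], and the group operations are passed in as callables so that plain integers under addition can stand in for curve points.

```python
def naf(k):
    """Non-adjacent form of k > 0 as a list of digits n_0, ..., n_{m-1} in {-1, 0, 1}."""
    digits = []
    while k > 0:
        if k & 1:
            d = 2 - (k % 4)   # +1 if k = 1 (mod 4), -1 if k = 3 (mod 4)
            k -= d
        else:
            d = 0
        digits.append(d)
        k >>= 1
    return digits


def scalar_mult(digits, P, add, neg, dbl):
    """Left-to-right signed binary method.

    Returns (kP, number of doublings, number of additions). The leading
    digit n_{m-1} = 1 is consumed by the initialisation Q = P.
    """
    assert digits[-1] == 1
    Q = P
    doublings = additions = 0
    for d in reversed(digits[:-1]):
        Q = dbl(Q)
        doublings += 1
        if d == 1:
            Q = add(Q, P)
            additions += 1
        elif d == -1:
            Q = add(Q, neg(P))
            additions += 1
    return Q, doublings, additions


# Stand-in group for illustration: integers under addition, so kP is just k * P.
int_group = dict(add=lambda a, b: a + b, neg=lambda a: -a, dbl=lambda a: 2 * a)
```

For a scalar of the form $2^{352} + 2^{31}$ the NAF coincides with the binary expansion, so the loop performs 352 doublings and a single addition.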

Table 1. System parameters of the IBE.

| Notation | Comments |
|---|---|
| $n$ | Positive integer, length of plaintext (in bits) |
| $s$ | Integer in $\mathbb{Z}_l$, master secret |
| $G$ | $E[l](GF(p))$, a cyclic subgroup with order $l$ |
| $G_T$ | $GF(p^2)^{\times}$, the multiplicative group of the extension field over $GF(p)$ with degree 2 |
| $e$ | $G \times G \to G_T$, pairing |
| $sP$ | $sP \in E(GF(p))$, master public key |
| $h_1$ | $h_1 : \{0, 1\}^n \to G$, HashToPoint (see §2.3) |
| $h_2$ | $h_2 : G_T \to \{0, 1\}^n$, cryptographic hash function |

Fig. 1. Layer structure of the IBE.

2.1.4 Security parameters

In order to achieve the security level of $2^{80}$, we usually choose a prime $p$ of 512 bits or more and a prime $l$ of 160 bits or more [11]. In this paper we fix the sizes of the primes as a 512-bit $p$ and a 160-bit $l$. Note that RFC 5091 chooses $l$ as a Solinas prime of the form $2^{159} \pm 2^t \pm 1$ for $t = 1, 2, \ldots, 158$.

2.2 IBE scheme

We review the IBE in [3]. The IBE consists of three steps: setup, encryption and decryption. These three steps are essentially constructed by arithmetic over the elliptic curve $E(GF(p))$ and the finite field $GF(p)$. Fig. 1 describes the layer structure of functions used in the IBE. Next we explain each step in the IBE scheme:

2.2.1 Setup

We can define the public parameters of the IBE to be $(n, s, G, G_T, e, sP, h_1, h_2)$ as follows (for a summary, see Table 1). Let $n$ be a positive integer and $s \in \mathbb{Z}_l := \{0, 1, \ldots, l-1\}$. We randomly pick a point $P \in E[l](GF(p))$. Let $G := \langle P \rangle$ and $G_T := \langle e(P, P) \rangle$, where $P \neq P_\infty$ and $e$ is the pairing in §2.1. The map $h_1$ is HashToPoint, which will be defined in §2.3. The map $h_2 : G_T \to \{0, 1\}^n$ is a cryptographic hash function that maps elements of $G_T$ into a bit string of length $n$.

2.2.2 Encryption

Let $M$ be the message in $\{0, 1\}^n$. There are five steps as follows:

(e1) Generate a random $r \in \mathbb{Z}_l$ and compute $rP$

(e2) Calculate $Q_{ID} = h_1(ID)$ from the recipient's identity $ID \in \{0, 1\}^n$

(e3) Calculate $rQ_{ID}$

(e4) Calculate the pairing $S := e(rQ_{ID}, sP)$

(e5) Let $C$ be $M \oplus h_2(S)$ and return $(rP, C)$, where the symbol $\oplus$ means bitwise XOR.


2.2.3 Decryption

Let $(rP, C)$ be the received message. The recipient can decrypt it by the following four steps.

(d1) (Extract private key) Calculate $Q_{ID} = h_1(ID)$ from the identity $ID \in \{0, 1\}^n$

(d2) Calculate $sQ_{ID}$

(d3) Calculate the pairing $T := e(rP, sQ_{ID})$

(d4) Extract the message by $M = C \oplus h_2(T)$
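As a brief check of why decryption recovers the message (a supplementary derivation using only bilinearity of $e$ and the fact that $Q_{ID}$ lies in the cyclic group $G = \langle P \rangle$, say $Q_{ID} = tP$ for some integer $t$):

$$T = e(rP, sQ_{ID}) = e(P, P)^{rst} = e(rQ_{ID}, sP) = S,$$

and therefore

$$C \oplus h_2(T) = M \oplus h_2(S) \oplus h_2(S) = M.$$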

2.3 Calculation of HashToPoint

Here we explain how to compute HashToPoint explicitly. First we define $g_1 : \{0, 1\}^n \to E(GF(p))$ as follows. For any $ID \in \{0, 1\}^n$, which is a bit string of $n$ bits, we can embed $ID$ into the x-coordinate of a point $Q = (x, y) \in E(GF(p))$ as an integer modulo $p$. Then we calculate the y-coordinate of $Q$ by $y = (x^3 + x)^{1/2}$. Note that if $x^3 + x$ is not a quadratic residue in $GF(p)$, we increment $ID$ by 1 and try again. Next we define $g_2 : E(GF(p)) \to E[l](GF(p))$ as follows. We know that $\#E(GF(p)) = p + 1 = lc$, so we get $Q_{ID} := ((p + 1)/l)Q = cQ \in E[l](GF(p))$. HashToPoint is defined by the composition $g_2 \circ g_1$. Focusing on $g_2$, we can use Algorithm 1 for computing the scalar multiplication $cQ$. If we choose a cofactor $c$ with low Hamming weight, then the computation of $cQ$ becomes faster. We discuss the existence of such cofactors $c$ with low Hamming weight in the following section.
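The two maps can be sketched on a toy curve. Everything below is an illustrative assumption rather than the RFC5091 or PBC procedure: the parameters p = 43, l = 11, c = 4 are far too small for any security, the identity is simply read as a big-endian integer, a plain binary method replaces Algorithm 1, and the function names are hypothetical. The point at infinity is represented by `None`.

```python
P_MOD = 43                          # toy prime with p = 3 (mod 4)
L_ORD = 11                          # prime l dividing p + 1
COFACTOR = (P_MOD + 1) // L_ORD     # c = 4


def ec_add(P, Q, p=P_MOD):
    """Affine addition on y^2 = x^3 + x over GF(p); None is the point at infinity."""
    if P is None:
        return Q
    if Q is None:
        return P
    x1, y1 = P
    x2, y2 = Q
    if x1 == x2 and (y1 + y2) % p == 0:
        return None                 # P + (-P), including doubling a point with y = 0
    if P == Q:
        lam = (3 * x1 * x1 + 1) * pow(2 * y1 % p, -1, p) % p
    else:
        lam = (y2 - y1) * pow((x2 - x1) % p, -1, p) % p
    x3 = (lam * lam - x1 - x2) % p
    return (x3, (lam * (x1 - x3) - y1) % p)


def ec_mul(k, P):
    """Right-to-left binary scalar multiplication (k >= 0)."""
    R = None
    while k > 0:
        if k & 1:
            R = ec_add(R, P)
        P = ec_add(P, P)
        k >>= 1
    return R


def hash_to_point(identity: bytes, p=P_MOD):
    """g2(g1(ID)): embed ID as x, step x until x^3 + x is a square, then multiply by c."""
    x = int.from_bytes(identity, "big") % p
    while True:
        rhs = (x ** 3 + x) % p
        y = pow(rhs, (p + 1) // 4, p)   # square-root candidate, valid since p = 3 (mod 4)
        if y * y % p == rhs:
            return ec_mul(COFACTOR, (x, y))
        x = (x + 1) % p
```

Because the group order is $p + 1 = lc$, the result of `hash_to_point` is always annihilated by $l$. In this toy setting it can occasionally be the point at infinity, a case a real implementation would have to handle.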

3. Proposed cofactor

In this section, we discuss the existence of the cofactors $c$ with low Hamming weight introduced in Section 2.3. We give a list of cofactors with Hamming weight less than three for $c = (p + 1)/l$ with a 512-bit $p$ and a 160-bit $l$. We then estimate the expected number of such cofactors using the prime number theorem.

3.1 Search cofactor with low Hamming Weight

Algorithm 2 Searching algorithm for cofactors with Hamming weight less than three

Require: A positive integer $u$ and a Solinas prime $l$
Ensure: The set of cofactors $C = \{c_1, c_2, \ldots, c_N\}$ where the Hamming weight of each $c_i$ is less than three
  C ← ∅
  for i = 1 to u − 1 do
    for k ∈ {−1, 0, 1} do
      c ← 2^u + k·2^i
      p ← cl − 1
      if p is a prime and p ≡ 3 (mod 4) then
        C ← C ∪ {c}
      end if
    end for
  end for
  return C

In this paper we fix the bit length of $p$ as 512 and that of $l$ as 160, with $p = lc - 1$, so the bit length of $c$ becomes 352. Recall that RFC 5091 chooses the prime $l$ as a Solinas prime of the form $2^{159} \pm 2^t \pm 1$ for $t = 1, 2, \ldots, 158$. There are ten Solinas primes, which are listed in the second column of Table 2. Using Algorithm 2, we have found 23 cofactors with Hamming weight less than three (see the third column of Table 2).

Table 2. Cofactor with Hamming weight less than three.

| # | Solinas prime $l$ ($2^{159} \pm 2^t \pm 1$) | Proposed cofactor $c$ ($2^u \pm 2^v$) |
|---|---|---|
| 1 | $2^{159} + 2^{17} + 1$ | NA |
| 2 | $2^{159} + 2^{19} + 1$ | $2^{352} - 2^{150}$, $2^{352} - 2^{198}$, $2^{352} - 2^{208}$ |
| 3 | $2^{159} + 2^{59} + 1$ | $2^{352} + 2^{127}$, $2^{352} - 2^{134}$ |
| 4 | $2^{159} + 2^{63} + 1$ | $2^{352} - 2^{18}$, $2^{352} - 2^{24}$, $2^{352} - 2^{88}$, $2^{352} - 2^{108}$ |
| 5 | $2^{159} + 2^{88} - 1$ | $2^{352} - 2^{24}$, $2^{352} - 2^{176}$ |
| 6 | $2^{159} + 2^{107} + 1$ | $2^{352} - 2^{12}$, $2^{352} - 2^{156}$ |
| 7 | $2^{159} + 2^{110} - 1$ | $2^{352} + 2^{33}$, $2^{352} - 2^{162}$ |
| 8 | $2^{159} + 2^{116} - 1$ | $2^{352} + 2^{19}$, $2^{352} - 2^{246}$, $2^{352} + 2^{335}$ |
| 9 | $2^{159} + 2^{135} + 1$ | $2^{352} + 2^{31}$ |
| 10 | $2^{159} + 2^{138} - 1$ | $2^{352} + 2^{13}$, $2^{352} + 2^{89}$, $2^{352} + 2^{269}$, $2^{352} + 2^{321}$ |
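The search of Algorithm 2 can be sketched as follows. This is not the authors' program: the primality test is a Miller-Rabin test with a fixed set of bases (deterministic only for moderately sized inputs and probabilistic for 512-bit candidates), and the function names are hypothetical.

```python
def is_probable_prime(n, bases=(2, 3, 5, 7, 11, 13, 17, 19, 23, 29, 31, 37)):
    """Miller-Rabin test with fixed bases (an assumption of this sketch)."""
    if n < 2:
        return False
    for q in bases:
        if n % q == 0:
            return n == q
    d, s = n - 1, 0
    while d % 2 == 0:
        d //= 2
        s += 1
    for a in bases:
        x = pow(a, d, n)
        if x in (1, n - 1):
            continue
        for _ in range(s - 1):
            x = x * x % n
            if x == n - 1:
                break
        else:
            return False        # a is a witness that n is composite
    return True


def search_cofactors(u, l):
    """Return the sorted cofactors c = 2^u + k*2^i (k in {-1, 0, 1}, 1 <= i < u)
    for which p = c*l - 1 is a (probable) prime with p = 3 (mod 4)."""
    found = set()               # a set, since k = 0 yields c = 2^u for every i
    for i in range(1, u):
        for k in (-1, 0, 1):
            c = 2 ** u + k * 2 ** i
            p = c * l - 1
            if p % 4 == 3 and is_probable_prime(p):
                found.add(c)
    return sorted(found)
```

Calling `search_cofactors(352, 2**159 + 2**135 + 1)` runs the search for row 9 of Table 2; small toy values such as `search_cofactors(8, 11)` are convenient for checking the routine against trial division.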

3.2 Prime Number Theorem

Here we estimate the number of cofactors with Hamming weight less than three using the prime number theorem. Let $\pi_{a,n}(x)$ be the number of primes less than $x$ in the arithmetic progression $a, a + n, a + 2n, \ldots$, where $a$ and $n$ are coprime positive integers. The prime number theorem for arithmetic progressions states that $\phi(n)^{-1} \mathrm{Li}(x)$ is an approximation to $\pi_{a,n}(x)$, where $\phi$ is Euler's totient function and $\mathrm{Li}(x)$ is the logarithmic integral defined by $\int_2^x (1/\log t)\,\mathrm{d}t$ [12]. Using this theorem, the number of primes $p$ where $p = 2^{511} \pm 2^{a_1} \pm 2^{a_2} \cdots$ and $p \equiv 3 \pmod 4$ is nearly equal to $\phi(4)^{-1}(\mathrm{Li}(2^{512} - 1) - \mathrm{Li}(2^{511} - 1))$. For each $l = 2^{159} + 2^a \pm 1$ where $a \in \{17, 19, 59, 63, 88, 107, 110, 116, 135, 138\}$, we try to find the cofactor $c$ that has the form $c = 2^{352} \pm 2^x$ ($1 \le x \le 351$). Therefore the total number of cofactors is estimated by

$$\left[ \frac{1}{\phi(4)} \left( \mathrm{Li}(2^{512}) - \mathrm{Li}(2^{511}) \right) \right] \times \left( \frac{351 \times 2}{2^{511}} \right) \times 10 = 9.89.$$

The total number of cofactors found in our experiment in Table 2 is 23, which is of the same order of magnitude as the estimate 9.89.
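The estimate can be re-evaluated numerically. The sketch below is an independent check under stated assumptions, not the authors' computation: it evaluates $\mathrm{Li}(2^{512}) - \mathrm{Li}(2^{511})$ by Simpson's rule after the substitution $x = 2^{511+s}$, and the function names are hypothetical. Small differences in the last digit relative to the quoted figure can arise from how the logarithmic integral is approximated.

```python
def scaled_li_difference(bits, steps=1000):
    """Approximate (Li(2^bits) - Li(2^(bits-1))) / 2^(bits-1).

    With x = 2^(bits-1+s), the integral of dx / log x over the dyadic interval
    becomes 2^(bits-1) times the integral of 2^s / (bits-1+s) over 0 <= s <= 1.
    """
    g = lambda s: 2.0 ** s / (bits - 1 + s)
    hs = 1.0 / steps                      # steps must be even for Simpson's rule
    total = g(0.0) + g(1.0)
    for j in range(1, steps):
        total += (4 if j % 2 else 2) * g(j * hs)
    return total * hs / 3


def expected_cofactors(bits=512, candidates_per_l=351 * 2, num_l=10):
    """Expected number of prime p = c*l - 1 among all candidates, with 1/phi(4) = 1/2."""
    return scaled_li_difference(bits) / 2 * candidates_per_l * num_l
```

`expected_cofactors()` evaluates to a value close to the 9.89 quoted above.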

4. Timing results

In this section we show the timing results of procedures (e1)-(e4) and (d1)-(d3) of the IBE. First we compare the timing results obtained with our cofactor with those of the previous method. All tests were run on a desktop PC (Mac mini) with an Intel Core i7 2.6 GHz processor (with four cores) and 16 GBytes of RAM using OS X 10.8.4 (Mountain Lion). To implement the IBE algorithm, we wrote programs in ANSI C using the GNU GCC compiler without specific optimizations. We use the Pairing Based Crypto (PBC) library developed at Stanford University by Ben Lynn [9].

Table 3. Timing result of procedures in the IBE (§2.2).

| Procedure | previous c [ms] | proposed c′ [ms] | ratio c′/c [%] |
|---|---|---|---|
| Encryption (e1) $rP$ | 1.89 | 1.88 | 99.5 |
| Encryption (e2) $Q_{ID}$ | 4.35 | 2.99 | 68.7 |
| Encryption (e3) $rQ_{ID}$ | 1.87 | 1.87 | 100 |
| Encryption (e4) $S$ | 1.06 | 1.05 | 99.1 |
| Decryption (d1) $Q_{ID}$ | 4.35 | 2.99 | 68.7 |
| Decryption (d2) $sQ_{ID}$ | 1.87 | 1.87 | 100 |
| Decryption (d3) $T$ | 1.06 | 1.05 | 99.1 |

4.1 Test parameters

Let the Solinas prime $l$ be $2^{159} + 2^{135} + 1$. Using the PBC library, we can get a random cofactor $c$ for which $p$ ($= lc - 1$) is a 512-bit prime. We compute encryption and decryption, which consist of procedures (e1)-(e4) and (d1)-(d3) in Table 3. The timings in Table 3 are the average values over 1,000 random runs.

c = 0x 00000001 7be47a80 a1765ba4 0e2adb45\

3c57a4f8 e1681db3 5312ca58 8ebf5713\

a601797b b11cf7d3 cdcbe91f 8eb03e90

The above cofactor c has Hamming weight 182 and corresponds to p as follows:

p = 0x bdf23dfe 42f86e22 c2433fa9 b399751a\

9c868b4b d97cb956 723b3375 40d721d6\

15a2debb 56c9fc2a 005f7967 e6de9f13\

a601797b b11cf7d3 cdcbe91f 8eb03e8f

The above prime $p$ has Hamming weight 279. In general, the average Hamming weight is equal to half of the bit length, so our random cofactor $c$ and prime $p$ are typical enough for a fair comparison. Next we choose the following proposed cofactor $c' = 2^{352} + 2^{31}$ from line #9 of Table 2.

c′ = 0x 00000001 00000000 00000000 00000000\

00000000 00000000 00000000 00000000\

00000000 00000000 00000000 80000000

Then the corresponding prime p′, with Hamming weight 36, is as follows:

p′ = 0x 80000080 00000000 00000000 00000000\

00000001 00000000 00000000 00000000\

00000000 00000000 40000040 00000000\

00000000 00000000 00000000 7fffffff.
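The relation p′ = l c′ − 1 and the stated Hamming weights can be checked with a few lines of arbitrary-precision arithmetic. The following is only a sanity-check sketch in Python (not the authors' ANSI C code); the helper name `hamming_weight` is ours, and primality of p′ is taken from the text rather than tested.

```python
# Parameters as stated in Section 4.1; primality is NOT checked here.
l = 2**159 + 2**135 + 1          # Solinas prime l
c_prime = 2**352 + 2**31         # proposed cofactor c'
p_prime = l * c_prime - 1        # corresponding prime p' = l*c' - 1


def hamming_weight(n: int) -> int:
    """Number of 1 bits in the binary expansion of n (hypothetical helper)."""
    return bin(n).count("1")


# p' as printed in the text, 32-bit words from most to least significant.
p_prime_printed = int(
    "80000080" "00000000" "00000000" "00000000"
    "00000001" "00000000" "00000000" "00000000"
    "00000000" "00000000" "40000040" "00000000"
    "00000000" "00000000" "00000000" "7fffffff",
    16,
)

print(p_prime == p_prime_printed)
print(hamming_weight(c_prime), hamming_weight(p_prime), p_prime.bit_length())
```

The same check could be applied to the random pair (c, p), whose hexadecimal values are listed above.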

4.2 Results and evaluation

In Table 3, the timing of HashToPoint ((e2) in Encryption and (d1) in Decryption) improves by more than 30% when the proposed cofactor c′ is used instead of c. Next we examine the reason for this improvement. Let M and I be the cost of field multiplication and inversion on GF(p), respectively. According to [6, Ch. IV.1], the costs of point addition and point doubling on an elliptic curve E are 3M + I and 4M + I, respectively. We can estimate the running time of HashToPoint with c as follows. The dominant part of the running time of HashToPoint is the scalar multiplication cQ (see §2.3). Computing cQ with Algorithm 1 requires 182 point additions and 352 point doublings, because the Hamming weight of c is 182 and the bit length of c is 352. Therefore the running time of HashToPoint with c is 182(3M + I) + 352(4M + I). Similarly, the running time of HashToPoint with c′ is 2(3M + I) + 352(4M + I), because the Hamming weight of c′ is 2. Assuming I = 20M, as is commonly done, we can estimate the ratio c′/c by

\[ \frac{2(3M + 20M) + 352(4M + 20M)}{182(3M + 20M) + 352(4M + 20M)} \approx 0.67. \]

This is the reason why the running cost of HashToPoint improves by more than 30% on a desktop PC.
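The estimate above is simple arithmetic and can be reproduced directly. The sketch below is illustrative Python; the function name `hash_to_point_cost` and its parameters are our own, and costs are counted in units of one field multiplication M under the assumption I = 20M.

```python
def hash_to_point_cost(hamming_weight, bit_length, inv_to_mul=20):
    """Estimated cost of the scalar multiplication cQ, in multiples of M.

    Assumes one point addition per set bit (3M + I) and one point doubling
    per bit (4M + I), with I = inv_to_mul * M.
    """
    add_cost = 3 + inv_to_mul      # 3M + I
    dbl_cost = 4 + inv_to_mul      # 4M + I
    return hamming_weight * add_cost + bit_length * dbl_cost


cost_c = hash_to_point_cost(182, 352)        # random cofactor c
cost_c_prime = hash_to_point_cost(2, 352)    # proposed cofactor c'
ratio = cost_c_prime / cost_c
print(cost_c, cost_c_prime, round(ratio, 2))
```

The predicted ratio is close to the measured 68.7% for (e2) and (d1) in Table 3.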

5. Conclusions

In this paper, we proposed efficient system parameters for the Identity-Based Encryption (IBE) standardized as RFC5091 by the IETF. In particular, we presented a list of 23 primes whose cofactors have Hamming weight 2, which enable an efficient implementation of HashToPoint in the IBE. We then implemented the cryptographic functions used in the IBE in the C language with the PBC library. With the proposed system parameters, the running time of our implementation of HashToPoint is reduced by about 30% on a desktop PC, without slowing down the other cryptographic functions in the IBE.

References

[1] D. Boneh and M. Franklin, Identity-based encryption from the Weil pairing, in: Proc. of CRYPTO 2001, J. Kilian ed., LNCS, Vol. 2139, pp. 213–229, Springer-Verlag, Berlin, 2001.

[2] L. Martin, Introduction to Identity-Based Encryption, Artech House Publishers, Massachusetts, 2008.

[3] X. Boyen and L. Martin, Identity-Based Cryptography Standard (IBCS) #1: Supersingular Curve Implementations of the BF and BB1 Cryptosystems, Internet Engineering Task Force, RFC5091 (Informational), 2007. http://www.ietf.org/rfc/rfc5091.txt.

[4] D. Freeman, M. Scott and E. Teske, A taxonomy of pairing-friendly elliptic curves, J. Cryptol., 23 (2010), 224–280.

[5] P. Barreto, H. Kim, B. Lynn and M. Scott, Efficient algorithms for pairing-based cryptosystems, in: Proc. of CRYPTO 2002, M. Yung ed., LNCS, Vol. 2442, pp. 354–369, Springer-Verlag, Berlin, 2002.

[6] I. Blake, G. Seroussi and N. Smart, Elliptic Curves in Cryptography, London Mathematical Society Lecture Note Series, Vol. 265, Cambridge University Press, Cambridge, 1999.

[7] I. Blake, G. Seroussi and N. Smart, Advances in Elliptic Curve Cryptography, London Mathematical Society Lecture Note Series, Vol. 317, Cambridge University Press, Cambridge, 2005.

[8] T. Nakajima, T. Izu and T. Takagi, An efficient algorithm for pairing cryptography with supersingular elliptic curves over prime fields, IPSJ Journal, 50 (2009), 1745–1756.

[9] B. Lynn, The Pairing-Based Cryptography (PBC) library, Stanford University. http://crypto.stanford.edu/pbc/.

[10] D. Hankerson, A. Menezes and S. Vanstone, Guide to Elliptic Curve Cryptography, Springer-Verlag, New York, 2004.

[11] E. Barker, W. Barker, W. Burr, W. Polk and M. Smid, Recommendation for key management - Part 1: General (Revised), NIST Special Publication 800–57, NIST, 2007.

[12] A. Walfisz, Zur additiven Zahlentheorie. II., Math. Z., 40 (1936), 592–607.


JSIAM Letters Vol.6 (2014) pp.17–20 ©2014 Japan Society for Industrial and Applied Mathematics

Affine term structure as multi-soliton

Hidemi Aihara1, Jiro Akahori1 and Edouard Grenier2

1 Department of Mathematical Sciences, Ritsumeikan University, 1-1-1, Nojihigashi, Kusatsu, Shiga 525-8577, Japan

2 Ecole Nationale Superieure de Techniques Avancees, 828, Boulevard des Marechaux, 91762 Palaiseau Cedex

E-mail hide.v3v.pooh gmail.com

Received December 6, 2013, Accepted December 25, 2013

Abstract

In the real market, the term structure of forward rates exhibits some humps. The quadratic Gaussian term structure models or affine term structure models explain this phenomenon well. In this research, we give a new insight, where we understand the humps as multi-solitons that are related to KdV solitons.

Keywords term structure of interest rates, humps, affine class, quadratic Gaussian model, solitons

Research Activity Group Mathematical Finance

1. Introduction

The spot interest rate r(t, T) is the rate per unit of time (normally one year) at which one can borrow (lend) cash at time t and repay (be repaid) at time T. In practice, the rate can vary depending on who the parties are and how it is agreed, but we ignore such credit risks/counterparty risks here. Theoretically it is related to the price P(t, T) of the zero-coupon bond maturing at T as

\[ r(t, T) = -\frac{1}{T - t} \log P(t, T). \]

In practice, the rate so defined is called the zero rate. The function

\[ T \mapsto r(t, T) \]

is what we call the term structure of spot rates; in practice it is rather regarded as a function of x = T − t,

\[ x \mapsto r(t, t + x), \]

which is often referred to as the yield curve.

In theoretical finance, one rather works with the term structure of (instantaneous) forward rates, which is given by

\[ T \mapsto f(t, T) = -\frac{\partial}{\partial T} \log P(t, T), \]

or

\[ x \mapsto f(t, t + x) = -\frac{\partial}{\partial T} \log P(t, T) \Big|_{T = t + x}. \]

This is because the forward rate is easier to handle mathematically, in particular when imposing the arbitrage-free property on the term structure.

In the real market, however, the term structure of spot rates behaves more nicely. According to the series of studies by N. L. Liu and her collaborators [1–3], when a principal component analysis (or one of its variants) is applied, only two or three factors account for almost 99% of the term structure of spot rates, while that of forward rates exhibits more than 10, sometimes 15, or even more factors. A much more straightforward peculiarity is that the samples of the term structure of forward rates often have more humps than those of spot rates.

Fig. 1. Typical forward rate movement: EU zero rate.

Fig. 2. Spot rate movement of the same data as Fig. 1.

The main aim of the present paper is to propose a new point of view where the humps are understood as a kind of soliton.

The rest of the paper is organized as follows. In Section


JSIAM Letters Vol. 6 (2014) pp.17–20 Hidemi Aihara et al.

2, we illustrate our idea by a primitive one-dimensional example. In Section 2.1, we present a brief introduction to solitons. In Section 3, we give a multi-dimensional version of the observation made in Section 2. We emphasize that a class of affine (quadratic Gaussian) models exhibits multi-soliton shaped term structures. Finally, in Section 4, we remark that the solitons appearing in the term structure models are related to a nonlinear partial differential equation called the KdV equation.

2. A primitive example

To explain the idea, we start with a primitive example. Let

\[ P(t, T) = E\left[ e^{-\frac{1}{2} \int_t^T c^2 |W_s|^2 \, ds} \,\Big|\, W_t \right], \quad 0 \le t \le T, \tag{1} \]

where W is a 1-dimensional Brownian motion. This formula defines an arbitrage-free bond market, which is the simplest example of the quadratic Gaussian model and, at the same time, an affine term structure model (see e.g. [4]) where we consider |W|^2 to be a state variable. In fact, we have the explicit expression

\[ P(t, T) = \left( \cosh(c(T - t)) \right)^{-\frac{1}{2}} e^{-\frac{c}{2} \tanh(c(T - t)) |W_t|^2}, \]

and the (instantaneous) forward rate f(t, T) = −∂_T log P(t, T) is then expressed as

\[ f(t, T) = \frac{c}{2} \tanh(c(T - t)) + \frac{c^2 |W_t|^2}{2} \operatorname{sech}^2(c(T - t)), \tag{2} \]

which is an affine function of the state variable.

By (1), we know that

\[ T \mapsto -\log P(t, T) \]

is increasing, and therefore the term structure of spot rates under this model behaves nicely, while one notices that

\[ T \mapsto f(t, T) \]

is a rational function of e^{c(T−t)} and e^{−c(T−t)}, which is what we will call, in local terminology, a soliton.

Fig. 3 exemplifies a sample path of the affine forward rate.
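As a numerical cross-check of (2), one can differentiate −log P(t, T) by finite differences and compare with the closed form. The following Python sketch is ours and only illustrative: the function names are hypothetical, `tau` stands for T − t, `w` for the current state W_t, and the values c = 0.1 and w = 8 are borrowed from the caption of Fig. 3.

```python
import math


def bond_price(c, w, tau):
    """Explicit bond price: cosh(c*tau)^(-1/2) * exp(-(c/2) tanh(c*tau) w^2)."""
    return math.cosh(c * tau) ** -0.5 * math.exp(-0.5 * c * math.tanh(c * tau) * w * w)


def forward_rate(c, w, tau):
    """Closed form (2): (c/2) tanh(c*tau) + (c^2 w^2 / 2) sech^2(c*tau)."""
    return 0.5 * c * math.tanh(c * tau) + 0.5 * c * c * w * w / math.cosh(c * tau) ** 2


def forward_rate_fd(c, w, tau, h=1e-6):
    """Central finite difference of -log P with respect to maturity."""
    return -(math.log(bond_price(c, w, tau + h)) - math.log(bond_price(c, w, tau - h))) / (2 * h)


c, w = 0.1, 8.0
for tau in (0.0, 0.78, 5.0):
    print(tau, forward_rate(c, w, tau), forward_rate_fd(c, w, tau))
```

For these values the forward curve first rises and then falls in the maturity direction, which is the hump discussed in the text.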

2.1 Solitons

In general, a traveling wave solution to a nonlinear (evolution-type) differential equation is not stable; it collapses from the top. The soliton solutions are exceptions. They have (sometimes more than two) solitary waves, that is, humps, and the humps are quite stable even after "collisions". In this respect they behave like particles, which is why they are called "solitons"; see Fig. 4. Mathematically, solitons can be defined as certain rational functions of exponentials (see [5]). More precisely, a soliton is something like

\[ u(t, x) = \frac{f}{g} = \frac{\sum_i K_i e^{A_i t - B_i x}}{\sum_i L_i e^{C_i t - D_i x}}, \tag{3} \]

where A_i, B_i, C_i, D_i, K_i and L_i are constants, and the summations are finite. Here we assume max_i C_i ≥ max_i A_i and min_i C_i ≤ min_i A_i to ensure the existence of the limits at x → ±∞. If we require the inequalities to be strict, then the graph x ↦ u(t, x) is hump-shaped. Note that solitons in the sense of this definition are stable under summation, multiplication, and differentiation. Note also that the forward rate (2) in the previous section is a soliton in T, or in x = T − t, in this sense.

Fig. 3. A sample path of the forward rate given by (2) with W_0 = 8, c = 0.1 (axes: t, f(t,T)).

Fig. 4. Solitons.

3. Affine term structure as multi-soliton

We generalize the observation made in Section 2. Let W = (W^1, . . . , W^n) be an n-dimensional Brownian motion starting at x = (x_1, . . . , x_n) ∈ R^n, defined on a filtered probability space (Ω, F, P, {F_t}), let Λ = diag(λ_1, λ_2, . . . , λ_n) with λ_i ∈ R for each i = 1, 2, . . . , n, and let C ∈ M(n) be a positive definite matrix.

Let

\[ P(t, T) := e^{\langle C W_t, W_t \rangle} \, E\left[ e^{-\frac{1}{2} \int_t^T |\Lambda W_s|^2 \, ds - \langle C W_T, W_T \rangle} \,\Big|\, W_t \right]. \tag{4} \]

Then P(·, T) defines an arbitrage-free bond market with

\[ \pi_t = e^{-\frac{1}{2} \int_0^t |\Lambda W_s|^2 \, ds - \langle C W_t, W_t \rangle} \]

being a state price density.

Proposition 1 Under the model (4), the forward rate is an n-soliton; that is, a rational function in e^{±(T−t)λ_i} (i = 1, 2, . . . , n), of degree at most 2n, for any state W_t.


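Proof Let

\[ K(t) = -\cosh(t\Lambda) C - \frac{1}{2} \Lambda \sinh(t\Lambda), \qquad L(t) = 2 \sinh(t\Lambda) \Lambda^{-1} C + \cosh(t\Lambda), \tag{5} \]

and

\[ H(t) = K(t) \cdot L(t)^{-1}. \tag{6} \]

Note that

\[ K'(t) = -\frac{1}{2} \Lambda^2 L(t) \tag{7} \]

and

\[ L'(t) = -2 K(t). \tag{8} \]

We will show that

\[ P(t, T) = \left( \det L(T - t) \right)^{-\frac{1}{2}} e^{\langle (H(T - t) + C) W_t, W_t \rangle}. \tag{9} \]

By the Feynman-Kac formula,

\[ u(t, x) := E\left[ e^{-\frac{1}{2} \int_0^t |\Lambda W_s|^2 \, ds - \langle C W_t, W_t \rangle} \,\Big|\, W_0 = x \right], \]

where x = (x_1, . . . , x_n), satisfies the following differential equation:

\[ \frac{\partial u}{\partial t} = \frac{1}{2} \Delta u - \frac{1}{2} \langle \Lambda^2 x, x \rangle u, \qquad u(0, x) = e^{-\langle C x, x \rangle}, \tag{10} \]

where ∆ is the Laplacian. Note that

\[ P(t, T) = e^{\langle C W_t, W_t \rangle} u(T - t, W_t). \tag{11} \]

It is well known that the solution u to (10) is expressed by

\[ e^{H_0(t) + \langle H(t) x, x \rangle}, \tag{12} \]

where H is a symmetric-matrix valued differentiable function satisfying

\[ \frac{dH}{dt}(t) = 2 H(t)^2 - \frac{1}{2} \Lambda^2, \qquad H(0) = -C, \tag{13} \]

and H_0 is given by

\[ \frac{dH_0}{dt}(t) = \operatorname{tr} H(t), \qquad H_0(0) = 0. \tag{14} \]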

Now we see that H given by (5) and (6) is the unique solution to (13). In fact, by (7) and (8), we have

\[ H' = (K L^{-1})' = -K L^{-1} L' L^{-1} + K' L^{-1} = 2 (K L^{-1})^2 - \frac{1}{2} \Lambda^2 = 2 H^2 - \frac{1}{2} \Lambda^2, \]

and also L(0) = I and K(0) = −C, which imply H(0) = −C.

Further, by (14),

\[ e^{H_0(t)} = e^{\operatorname{tr}\left( -\frac{1}{2} \int_0^t L'(s) L(s)^{-1} \, ds \right)} = \det\left( e^{-\frac{1}{2} \int_0^t L'(s) L(s)^{-1} \, ds} \right) = \left( \det\left( e^{\int_0^t L'(s) L(s)^{-1} \, ds} \right) \right)^{-\frac{1}{2}} = (\det L)^{-\frac{1}{2}}. \]

The last equality needs some more lines of explanation, which we omit here.

Thus we have confirmed (9), and at the same time (11) with (12), by which we have

\[ f(t, T) = -\frac{\partial}{\partial T} \Big( H_0(T - t) + \langle H(T - t) W_t, W_t \rangle \Big). \]

Then, by substituting (13) and (14), we get

\[ f(t, T) = -\operatorname{tr} H(T - t) - \frac{1}{2} \langle (4 H(T - t)^2 - \Lambda^2) W_t, W_t \rangle. \tag{15} \]

We note that the (i, j)-th entries k_{ij} and l_{ij} of K(t) and L(t) are given by

\[ k_{ij} = -\cosh(t\lambda_i) c_{ij} - \frac{1}{2} \delta_{ij} \lambda_i \sinh(t\lambda_i), \]

and

\[ l_{ij} = 2 \sinh(t\lambda_i) \lambda_i^{-1} c_{ij} + \delta_{ij} \cosh(t\lambda_i), \]

and thus they are polynomials in e^{±tλ_i}. Since

\[ H(t) = K(t) L(t)^{-1} = K(t) \tilde{L}(t) (\det L(t))^{-1}, \]

where \tilde{L}(t) is the cofactor matrix of L(t), we see that each entry of H(t) is a rational function in e^{±tλ_i} (i = 1, . . . , n), with degree n. Hence, by the expression (15), we have the assertion.

(QED)
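The key identity in the proof, namely that H = K L^{-1} built from (5) and (6) solves the Riccati equation (13), can be checked numerically for a small case. The sketch below is our own illustration in plain Python for n = 2; the values of Λ and C are arbitrary assumptions (C symmetric positive definite), and the helper names are hypothetical.

```python
import math

LAM = (0.5, 1.5)                      # diagonal of Lambda (illustrative values)
C = ((0.30, 0.10), (0.10, 0.20))      # symmetric positive definite (illustrative)


def matmul(a, b):
    return tuple(tuple(sum(a[i][k] * b[k][j] for k in range(2)) for j in range(2))
                 for i in range(2))


def inverse(a):
    d = a[0][0] * a[1][1] - a[0][1] * a[1][0]
    return ((a[1][1] / d, -a[0][1] / d), (-a[1][0] / d, a[0][0] / d))


def K(t):
    # K(t) = -cosh(t Lambda) C - (1/2) Lambda sinh(t Lambda), see (5)
    return tuple(tuple(-math.cosh(t * LAM[i]) * C[i][j]
                       - (0.5 * LAM[i] * math.sinh(t * LAM[i]) if i == j else 0.0)
                       for j in range(2)) for i in range(2))


def L(t):
    # L(t) = 2 sinh(t Lambda) Lambda^{-1} C + cosh(t Lambda), see (5)
    return tuple(tuple(2.0 * math.sinh(t * LAM[i]) / LAM[i] * C[i][j]
                       + (math.cosh(t * LAM[i]) if i == j else 0.0)
                       for j in range(2)) for i in range(2))


def H(t):
    return matmul(K(t), inverse(L(t)))   # definition (6)


def riccati_residual(t, h=1e-5):
    """Largest entry of H'(t) - (2 H(t)^2 - Lambda^2 / 2), with H' by central differences."""
    hp, hm, hh = H(t + h), H(t - h), matmul(H(t), H(t))
    return max(abs((hp[i][j] - hm[i][j]) / (2 * h)
                   - (2.0 * hh[i][j] - (0.5 * LAM[i] ** 2 if i == j else 0.0)))
               for i in range(2) for j in range(2))


print(riccati_residual(0.7))
```

A residual at the level of the finite-difference error is consistent with (13); the initial condition H(0) = −C can be read off in the same way.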

Remark 2 It is known that the forward rates stay positive if π is a strict supermartingale. In fact, for T_1 > T_2 we have

\[ E[\pi_{T_1} | F_t] < E[\pi_{T_2} | F_t] \]

by the supermartingale property of π, and the formula reads

\[ P(t, T_1) = \frac{E[\pi_{T_1} | F_t]}{\pi_t} \le \frac{E[\pi_{T_2} | F_t]}{\pi_t} = P(t, T_2), \]

meaning that P(t, ·), and hence log P(t, ·), is decreasing. This in turn implies that f(t, T) = −∂_T log P(t, T) is positive.

We give a sufficient condition that ensures the positivity. Since

\[ d\pi_t = \pi_t \left( -d\langle C W_t, W_t \rangle - \frac{1}{2} |\Lambda W_t|^2 \, dt + \frac{1}{2} d[\langle C W_t, W_t \rangle]_t \right) = \pi_t \left( -2 \langle C W_t, dW_t \rangle - \operatorname{tr} C \, dt - \frac{1}{2} |\Lambda W_t|^2 \, dt + \frac{2^2}{2} |C W_t|^2 \, dt \right), \]

we see that π is a supermartingale, and hence the forward rates stay positive, if

\[ \Lambda^2 - 4 C^2 > 0, \]

since C > 0 is already assumed.

4. Remarks on a relation with KdV equation

Let \tilde{f}(t, T) := f(\frac{c^2}{2^4} t, \frac{1}{2^2} T). Then, we have

\[ \tilde{f}(t, T) = \frac{c}{2^3} \tanh\left( \frac{1}{2} \left( \frac{c}{2} T - \frac{c^3}{2^3} t \right) \right) + \frac{c^2 |W_t|^2}{2^3} \operatorname{sech}^2\left( \frac{1}{2} \left( \frac{c}{2} T - \frac{c^3}{2^3} t \right) \right) =: v(t, T) + |W_t|^2 u(t, T). \]

By this scale change, the functions u and v satisfy 4 ∂v/∂T = u and

\[ \frac{\partial u}{\partial t} = -6 u \frac{\partial u}{\partial T} - \frac{\partial^3 u}{\partial T^3}. \tag{16} \]

Eq. (16) is known as the Korteweg-de Vries equation (KdV equation for short), which describes waves on shallow water surfaces. The KdV equation is mathematically as well as physically quite important in that it has many infinite dimensional symmetries, which allow it to have a great many explicit solutions including elliptic ones, rational ones, and, most importantly in our context, soliton ones.

The relation has been extensively studied, especially by N. Ikeda and S. Taniguchi [6–10]. An extended relation to KP solitons using stochastic areas is given in [11].
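That the function u(t, T) = (c^2/2^3) sech^2((1/2)((c/2)T − (c^3/2^3)t)) satisfies (16) can be confirmed numerically. The following Python sketch is ours and purely illustrative: it evaluates the residual u_t + 6 u u_T + u_TTT with central finite differences, using c = 1 and a step size h chosen as assumptions.

```python
import math


def u(t, x, c=1.0):
    """u(t, x) = (c^2/8) sech^2((1/2)((c/2) x - (c^3/8) t)), with x in the role of T."""
    k = c / 2.0
    theta = 0.5 * (k * x - k ** 3 * t)
    return (k * k / 2.0) / math.cosh(theta) ** 2


def kdv_residual(t, x, c=1.0, h=1e-3):
    """Finite-difference value of u_t + 6 u u_x + u_xxx, which should be near zero."""
    u_t = (u(t + h, x, c) - u(t - h, x, c)) / (2 * h)
    u_x = (u(t, x + h, c) - u(t, x - h, c)) / (2 * h)
    u_xxx = (u(t, x + 2 * h, c) - 2 * u(t, x + h, c)
             + 2 * u(t, x - h, c) - u(t, x - 2 * h, c)) / (2 * h ** 3)
    return u_t + 6.0 * u(t, x, c) * u_x + u_xxx


for point in ((0.0, 0.0), (1.0, 0.7), (2.0, -3.0)):
    print(point, kdv_residual(*point))
```

The residuals are at the level of the discretization error, in agreement with (16).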

5. Concluding remark

We have pointed out that the forward rates of some (but actually almost all) affine term structures are multi-solitons. This observation may give new insights into the fitting or calibration of affine term structures.

Acknowledgments

This work was supported by JSPS KAKENHI Grant Numbers 23330109, 24340022, 23654056 and 25285102. The authors thank Nobutaka Shimizu for providing the figures.

References

[1] N. L. Liu, A comparative study of principal component analysis on term structure of interest rates, JSIAM Letters, 2 (2010), 57–60.

[2] N. L. Liu and M. E. Mancino, Fourier estimation method applied to forward interest rates, JSIAM Letters, 4 (2012), 17–20.

[3] N. L. Liu, I. Shin and Y. Yasuda, Principal component analysis applied to forward rates I: experiments with empirical data, in: Proc. of the 43rd ISCIE International Symposium on Stochastic Systems Theory and Its Application, pp. 235–241, 2012.

[4] D. Duffie, Dynamic Asset Pricing Theory, 3rd ed., Princeton University Press, N. J., 2001.

[5] R. Hirota, The Direct Method in Soliton Theory, Cambridge Tracts in Mathematics, Vol. 155, Cambridge University Press, Cambridge, 2004.

[6] N. Ikeda and S. Taniguchi, Quadratic Wiener functionals, Kalman-Bucy filters, and the KdV equation, Adv. Stud. Pure Math., 41 (2004), 167–187.

[7] S. Taniguchi, On Wiener functionals of order 2 associated with soliton solutions of the KdV equation, J. Funct. Anal., 216 (2004), 212–229.

[8] S. Taniguchi, Brownian sheet and reflectionless potentials, Stochastic Process. Appl., 116 (2006), 293–309.

[9] S. Taniguchi, Stochastic analysis and the KdV equation, Contemp. Math., 429 (2007), 245–256.

[10] N. Ikeda and S. Taniguchi, The Ito-Nisio theorem, quadratic Wiener functionals, and 1-solitons, Stochastic Process. Appl., 120 (2010), 605–621.

[11] H. Aihara, J. Akahori, H. Fujii and Y. Nitta, Tau functions of KP solitons realized in Wiener space, Bulletin of the London Mathematical Society, 2013, DOI: 10.1112/blms/bdt056.


JSIAM Letters Vol.6 (2014) pp.21–24 ©2014 Japan Society for Industrial and Applied Mathematics

Hierarchical graph Laplacian eigen transforms

Jeff Irion1 and Naoki Saito1

1 Department of Mathematics, University of California, Davis, CA 95616, USA

E-mail jlirion math.ucdavis.edu, saito math.ucdavis.edu

Received September 11, 2013, Accepted December 27, 2013

Abstract

We describe a new transform that generates a dictionary of bases for handling data on a graph by combining recursive partitioning of the graph and the Laplacian eigenvectors of each subgraph. Similar to the wavelet packet and local cosine dictionaries for regularly sampled signals, this dictionary of bases on the graph allows one to select an orthonormal basis that is most suitable to one's task at hand using a best-basis type algorithm. We also describe a few related transforms including a version of the Haar wavelet transform on a graph, each of which may be useful in its own right.

Keywords graph Laplacian eigenvectors, Fiedler vectors, spectral graph partitioning, a dictionary of orthonormal bases, wavelet-like transforms on graphs

Research Activity Group Wavelet Analysis

1. Introduction

For signal processing on regular domains, wavelets have both a well-developed theory and a proven track record of success. Accordingly, efforts have been made to extend classical wavelets and wavelet techniques to the ever-expanding realm of data on graphs. Such datasets include structural/morphological data (e.g., tracings of neuronal dendrites), traffic and transportation data, and social networks. The motivation for developing these so-called "second generation wavelets" is simple: to determine whether they afford the same advantages offered by classical wavelets for approximation/compression, denoising, and classification in this more general setting.

However, a key difficulty in extending wavelets to graphs is that we lack the notion of "frequency" in general, i.e., we cannot apply the Littlewood-Paley theory directly. Therefore, a common strategy has been to develop wavelet-like transforms rather than true generalizations of classical wavelets; see e.g., [1–9]. In this article, we propose a new redundant transform for data on graphs, along with two variations, and then show the basis vectors computed on a particular graph.

2. Definitions and notation

Let G be an undirected connected graph, let V(G) and E(G) denote its vertices and edges, respectively, and let N := |V(G)|. Let W(G) = (W_{ij}) ∈ R^{N×N} be the symmetric weight matrix of G, where W_{ij} denotes the edge weight between vertices i and j. In an unweighted (i.e., combinatorial) graph, W_{ij} is either 0 or 1, depending on whether there is an edge between the two vertices. By contrast, in a weighted graph, W_{ij} indicates the proximity of vertices i, j or the affinity of information measured at i, j. Let f = (f(1), . . . , f(N))^T ∈ R^N be a data vector, where f(i) is the value measured at the vertex i of the graph. Let 1 := (1, . . . , 1)^T ∈ R^N.

A standard technique for working with data on a graph is to utilize the eigenvectors of the Laplacian matrix of the graph, which is defined as L(G) := D(G) − W(G), where D(G) = diag(d_i) is the (diagonal) degree matrix with d_i := ∑_j W_{ij}. Alternatively, we may use the random-walk normalized Laplacian, which is defined as L_rw(G) := D(G)^{-1} L(G) = I − D(G)^{-1} W(G). The eigenvectors of both L(G) and L_rw(G) form a basis of R^N and can thus be used for representation, approximation, and analysis of data on G. The simple path graph P_N consisting of N vertices provides an important insight for the development of our new transform. As pointed out in [10], the eigenvectors of L(P_N) are nothing but the basis vectors of the Discrete Cosine Transform (DCT) Type II, which is used in the JPEG image compression standard. In general, it is difficult to know the essential support of the Laplacian eigenvectors a priori, since it strongly depends on the structure of the graph: sometimes they are completely global, like those of P_N, while at other times they may be quite localized, as shown in [10]. Hence, it is worth controlling the support of the eigenvectors explicitly.
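The statement about the path graph can be illustrated directly. The sketch below, in plain Python with hypothetical function names, builds L(P_N) = D − W for an unweighted path and checks that the unnormalized DCT-II vectors cos(πk(i + 1/2)/N), i = 0, . . . , N − 1, are eigenvectors with eigenvalues 4 sin^2(πk/(2N)); the 0-based vertex indexing is our assumption for convenience.

```python
import math


def path_laplacian(n):
    """Unweighted Laplacian L = D - W of the path graph P_n."""
    lap = [[0.0] * n for _ in range(n)]
    for i in range(n - 1):
        lap[i][i] += 1.0
        lap[i + 1][i + 1] += 1.0
        lap[i][i + 1] = lap[i + 1][i] = -1.0
    return lap


def dct2_vector(n, k):
    """k-th (unnormalized) DCT-II basis vector of length n."""
    return [math.cos(math.pi * k * (i + 0.5) / n) for i in range(n)]


def eigen_residual(n, k):
    """max_i |(L phi)(i) - lambda phi(i)| for the k-th DCT-II vector."""
    lap, phi = path_laplacian(n), dct2_vector(n, k)
    lam = 4.0 * math.sin(math.pi * k / (2 * n)) ** 2
    return max(abs(sum(lap[i][j] * phi[j] for j in range(n)) - lam * phi[i])
               for i in range(n))


print(max(eigen_residual(8, k) for k in range(8)))
```

The case k = 1 gives the Fiedler vector of P_N, whose sign pattern splits the path into two halves.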

3. Hierarchical graph Laplacian eigen

transform (HGLET)

We now introduce our Hierarchical Graph Laplacian Eigen Transform (HGLET). First, we compute the complete set of eigenvectors of L(G): ϕ^0_{0,0}, ϕ^0_{0,1}, . . . , ϕ^0_{0,N−1} with corresponding eigenvalues 0 = λ^0_{0,0} < λ^0_{0,1} ≤ · · · ≤ λ^0_{0,N−1}. As this is a multiscale transform, we adopt the notation (λ^j_{k,l}, ϕ^j_{k,l}) for the eigenpairs, with j denoting the level (or depth) of the partition, k denoting the region number on level j, and l indexing the eigenvectors for region k on level j. Then we partition the graph into two disjoint subgraphs (or regions) according to the sign of the Fiedler vector, ϕ^0_{0,1}. Partitioning the graph in this manner is supported by the theory discussed in [11]. Furthermore, the Fiedler vector of L(G) (or L_rw(G)) is the solution of the relaxed RatioCut (or Normalized Cut)


JSIAM Letters Vol. 6 (2014) pp.21–24 Jeff Irion and Naoki Saito

minimization problem; see e.g., [12] and the references therein.

Let G^1_0 and G^1_1 be the two disjoint regions of G obtained by this partitioning process; note V(G) = V(G^1_0) ∪ V(G^1_1) but E(G) ⊋ E(G^1_0) ∪ E(G^1_1). From here we repeat the process recursively. The whole process can be summarized as follows:

Algorithm 1 (HGLET)

Step 0: Set G^0_0 = G and N^0_0 = N = |V(G)|; initialize K^0 = 1 and K^1 = 0; set j = 0 and k = 0.

Step 1: Construct the Laplacian matrix L(G^j_k).

Step 2: Compute its eigenvectors, {ϕ^j_{k,l}}_{l=0}^{N^j_k − 1}.

Step 3: If N^j_k > 1, then partition G^j_k by the sign of the Fiedler vector ϕ^j_{k,1} into G^{j+1}_{K^{j+1}} and G^{j+1}_{K^{j+1}+1}; set N^{j+1}_{K^{j+1}} = |V(G^{j+1}_{K^{j+1}})|, N^{j+1}_{K^{j+1}+1} = |V(G^{j+1}_{K^{j+1}+1})|, and K^{j+1} = K^{j+1} + 2; else set G^{j+1}_{K^{j+1}} = G^j_k, N^{j+1}_{K^{j+1}} = |V(G^j_k)|, and K^{j+1} = K^{j+1} + 1.

Step 4: If k + 1 < K^j, then set k = k + 1 and go back to Step 1; else go to Step 5.

Step 5: If |V(G^{j+1}_k)| = 1 for k = 0, . . . , K^{j+1} − 1, then finish; else set j = j + 1, k = 0, K^{j+1} = 0, and go back to Step 1.
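The partitioning part of Algorithm 1 can be sketched as follows. This is a hedged illustration in plain Python, not the authors' implementation: the eigenpairs are computed with a small cyclic Jacobi routine (our substitute for a library eigensolver), the names `jacobi_eigh`, `laplacian` and `hglet_levels` are hypothetical, and every region with two or more vertices is assumed to be connected so that its Fiedler vector has entries of both signs.

```python
import math


def jacobi_eigh(a, sweeps=100):
    """Eigenpairs of a symmetric matrix via cyclic Jacobi rotations, sorted by eigenvalue."""
    n = len(a)
    a = [row[:] for row in a]
    v = [[float(i == j) for j in range(n)] for i in range(n)]
    for _ in range(sweeps):
        if sum(a[i][j] ** 2 for i in range(n) for j in range(n) if i != j) < 1e-24:
            break
        for p in range(n - 1):
            for q in range(p + 1, n):
                if a[p][q] == 0.0:
                    continue
                diff = a[q][q] - a[p][p]
                theta = math.pi / 4 if diff == 0.0 else 0.5 * math.atan(2 * a[p][q] / diff)
                c, s = math.cos(theta), math.sin(theta)
                for m in (a, v):                  # rotate columns p, q of A and of V
                    for k in range(n):
                        mkp, mkq = m[k][p], m[k][q]
                        m[k][p], m[k][q] = c * mkp - s * mkq, s * mkp + c * mkq
                for k in range(n):                # rotate rows p, q of A
                    apk, aqk = a[p][k], a[q][k]
                    a[p][k], a[q][k] = c * apk - s * aqk, s * apk + c * aqk
    pairs = [(a[m][m], [v[k][m] for k in range(n)]) for m in range(n)]
    return sorted(pairs, key=lambda pair: pair[0])


def laplacian(weights, region):
    """L = D - W of the subgraph induced by the vertex list `region`."""
    n = len(region)
    lap = [[0.0] * n for _ in range(n)]
    for r, u in enumerate(region):
        for s, w in enumerate(region):
            if r != s:
                lap[r][s] = -weights[u][w]
                lap[r][r] += weights[u][w]
    return lap


def hglet_levels(weights):
    """Regions (vertex lists) on each level, split by the sign of the Fiedler vector."""
    levels = [[list(range(len(weights)))]]
    while any(len(region) > 1 for region in levels[-1]):
        next_level = []
        for region in levels[-1]:
            if len(region) == 1:
                next_level.append(region)         # singletons are copied (else branch of Step 3)
                continue
            fiedler = jacobi_eigh(laplacian(weights, region))[1][1]
            pos = [u for u, x in zip(region, fiedler) if x >= 0]
            neg = [u for u, x in zip(region, fiedler) if x < 0]
            if not pos or not neg:
                raise ValueError("region could not be split; is it connected?")
            next_level.extend([pos, neg])
        levels.append(next_level)
    return levels


# Unweighted path graph P_8 as a small test case.
path8 = [[1.0 if abs(i - j) == 1 else 0.0 for j in range(8)] for i in range(8)]
for level, regions in enumerate(hglet_levels(path8)):
    print(level, sorted(sorted(region) for region in regions))
```

For the full transform, all eigenvectors returned by the eigensolver for each region would be kept (Step 2), not only the Fiedler vector used here for the split.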

Several remarks on this algorithm are in order.

• L(G^j_k) in Step 1 above can be replaced by L_rw(G^j_k), which may result in better partitions; see [12].

• Similar to dictionaries of orthonormal bases such as wavelet packet or local cosine dictionaries for regularly-sampled signals, our HGLET yields a highly overcomplete basis set for data measured on the vertices V(G) (after extending each eigenvector ϕ^j_{k,l} from its original support V(G^j_k) to V(G) by zeros). There are in fact more than 2^{⌊N/2⌋} possible bases choosable from this overcomplete set, which allow us to select a basis most suitable for the task at hand via the best-basis type algorithms originally developed for regularly-sampled signals; see e.g., [13].

• The HGLET eigenvectors {ϕ^j_{k,l}}_{l=0}^{N^j_k − 1} for each fixed (j, k) form an orthonormal basis for R^{N^j_k} if one uses the usual Laplacian matrix L(G^j_k) in Step 1. On the other hand, if one uses the random-walk version L_rw(G^j_k), the resulting eigenvectors are neither mutually orthogonal nor normalized to have unit ℓ^2-norm in general. To generate a set of orthonormal vectors, one only needs to multiply each such eigenvector by D(G^j_k)^{1/2}; see also [12].

• The actual HGLET transform of a given data vector f ∈ R^N can be done either on the fly in Algorithm 1 (i.e., immediately after the orthonormal eigenvectors are computed at each j and k) by taking inner products ⟨f|_{V(G^j_k)}, ϕ^j_{k,l}⟩, where f|_{V(G^j_k)} ∈ R^{N^j_k} is the portion of f supported on V(G^j_k); or one can do the same after Algorithm 1 is completed.

[Figure 1: four map panels (a) ϕ^1_{1,4}, (b) ϕ^2_{2,1}, (c) ϕ^3_{5,2}, (d) ϕ^3_{5,9}; color scales of order 10^{−3}.]

Fig. 1. The HGLET eigenvectors computed on the Minnesota road map (N = 2636). The random-walk Laplacians were used for this set of experiments with the inverse physical (Euclidean) distances between vertices as edge weights. (a)-(c) show the eigenvectors at different scales covering the densely connected region. (d) shows the eigenfunctions with higher oscillations whose support is the same as that of (c).

• The computational cost of generating the whole set of eigenvectors in the HGLET is clearly O(N^3).

• For an unweighted path graph P_N, the HGLET exactly yields a dictionary of the block DCT-II bases; in other words, the HGLET can be viewed as a true generalization of the block DCT-II dictionary.

• Finally, it is easy to see that, over all levels of the partition tree, this scheme yields a total of N − 1 subgraphs (including the initial graph G) containing two or more vertices; see also Section 4.

Fig. 1 shows some HGLET eigenvectors on the Minnesota road network.

4. Variations of HGLET

In this section, we will discuss two variants of the HGLET that are, strictly speaking, not members of the HGLET dictionary, i.e., not directly choosable using the best-basis type algorithms.

4.1 Generalized Haar transform

The first one, which we call the Generalized Haar Transform (GHT), provides a complete orthonormal basis (i.e., no redundancy) of R^N comprised of one piecewise constant vector from each subgraph G^j_k containing two or more vertices, along with the constant (i.e., DC) vector on the entire graph G. This variation proceeds as Algorithm 1 except Step 2, which is modified as follows:

Step 2: Compute only the Fiedler vector, ϕ^j_{k,1}; define

\[ \tilde{\psi}^j_k(i) := \begin{cases} 1 & \text{if } \phi^j_{k,1}(i) \ge 0; \\ -r^j_k & \text{if } \phi^j_{k,1}(i) < 0, \end{cases} \qquad i = 1, \dots, N^j_k, \]

where

\[ r^j_k := \frac{|\{ m \in [1, N^j_k] \mid \phi^j_{k,1}(m) \ge 0 \}|}{|\{ m \in [1, N^j_k] \mid \phi^j_{k,1}(m) < 0 \}|}; \]

then set ψ^j_k = \tilde{ψ}^j_k / ∥\tilde{ψ}^j_k∥.

This modified Algorithm 1 yields a GHT orthonormal basis: {φ^0_0} ∪ {{ψ^j_k}_{k=0}^{K^j − 1}}_{j=0}^{J}, where φ^0_0 = 1/N^{1/2} is the constant vector on G, the ψ^j_k vectors are extended by zeros to V(G^0_0) \ V(G^j_k), and J is the deepest level of the partitioning (note that J may be larger than log_2 N in general, unlike the regularly-sampled signals). We note that in this GHT, each of the N − 1 subgraphs G^j_k with |V(G^j_k)| ≥ 2 contributes a single ψ^j_k basis vector to the GHT basis, with the initial graph G also contributing φ^0_0. This is quite a contrast to the general HGLET dictionary case because not all the subgraphs generated in Algorithm 1 contribute their eigenvectors to a basis chosen by the best-basis type algorithm from this dictionary.

The computational cost for generating this GHT is O(N log2N), which should be contrasted with the full HGLET case. This speed up is mainly due to the fact that we only need to compute one eigenvector in Step 2 of the GHT algorithm.
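The modified Step 2 of the GHT is easy to state in code. The following Python sketch (the function name `ght_vector` is ours) takes the Fiedler vector of one region, assumed to contain entries of both signs, and returns the normalized piecewise constant vector; by construction it sums to zero and hence is orthogonal to the constant vector on that region.

```python
import math


def ght_vector(fiedler):
    """Piecewise constant GHT vector from a Fiedler vector (modified Step 2).

    Entries are 1 where the Fiedler vector is >= 0 and -r where it is < 0,
    with r = (#nonnegative entries) / (#negative entries), then normalized.
    Assumes the Fiedler vector has at least one negative entry.
    """
    n_pos = sum(1 for x in fiedler if x >= 0)
    n_neg = len(fiedler) - n_pos
    r = n_pos / n_neg
    raw = [1.0 if x >= 0 else -r for x in fiedler]
    norm = math.sqrt(sum(x * x for x in raw))
    return [x / norm for x in raw]


# Illustrative Fiedler-like vector with 2 nonnegative and 3 negative entries.
psi = ght_vector([0.5, 0.1, -0.2, -0.1, -0.3])
print(psi)
```

Only the sign pattern of the Fiedler vector enters, which is why a single eigenvector per region suffices for this transform.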

4.2 Orthonormalized hierarchical Fiedler transform

Each eigenvector ϕ^j_{k,l} in the HGLET dictionary is a discretized version of a function that is "continuous" within its support V(G^j_k). On the other hand, the basis vectors in the GHT are all binary-valued (i.e., "discontinuous") except the DC vector φ^0_0. Hence it is natural to consider a smoother version of the GHT. Here, we propose the Orthonormalized Hierarchical Fiedler Transform (OHFT), which proceeds similarly to the GHT with the following modification and addition:

Step 2: Compute only the Fiedler vector ϕ^j_{k,1}; then set ψ^j_k := ϕ^j_{k,1}.

Step 6: Form \tilde{ψ}^j_k by extending each ψ^j_k (except j = 0) by zeros to V(G^0_0) \ V(G^j_k); define φ^0_0 := 1/N^{1/2}; form a matrix Ψ := [φ^0_0 | ψ^0_0 | \tilde{ψ}^1_0 | · · · | \tilde{ψ}^J_{K^J − 1}] ∈ R^{N×N}, where J is the deepest level of the partitions; finally, orthonormalize the columns of Ψ using the QR factorization.

Step 6 is necessary to form an orthonormal basis since the extended vectors \tilde{ψ}^j_k are not mutually orthogonal in general.

We note that the OHFT provides a single orthonormal basis for R^N, and its basis vectors are continuous within their original support. However, due to the orthogonalization procedure in Step 6, the support of each basis vector is extended beyond its original support by nonzero values, and moreover, this extension by orthogonalization may not provide a continuous extension to the outside of the original support. We are currently investigating whether we can provide smoother and continuous extensions while keeping the orthogonality.

The cost of computing the OHFT is O(N^3) due to the orthogonalization procedure in Step 6.

Fig. 2 compares a GHT basis vector and the corresponding OHFT basis vector.
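The orthonormalization in Step 6 can be illustrated on a tiny example. The sketch below is our own Python illustration and uses modified Gram-Schmidt in place of a QR factorization (for linearly independent columns, processed in the same order, the two give the same orthonormal vectors up to sign); the columns are the constant vector and the zero-extended Fiedler vectors for the path graph P_4, written out by hand.

```python
import math


def gram_schmidt(columns):
    """Modified Gram-Schmidt on a list of linearly independent vectors."""
    basis = []
    for col in columns:
        v = list(col)
        for q in basis:
            proj = sum(a * b for a, b in zip(q, v))
            v = [a - proj * b for a, b in zip(v, q)]
        norm = math.sqrt(sum(a * a for a in v))
        basis.append([a / norm for a in v])
    return basis


n = 4
const = [1.0 / math.sqrt(n)] * n                                   # constant vector 1/N^(1/2)
fiedler = [math.cos(math.pi * (i + 0.5) / n) / math.sqrt(n / 2.0)  # Fiedler vector of P_4
           for i in range(n)]
psi_1_0 = [1.0 / math.sqrt(2.0), -1.0 / math.sqrt(2.0), 0.0, 0.0]  # region {0, 1}, zero-extended
psi_1_1 = [0.0, 0.0, 1.0 / math.sqrt(2.0), -1.0 / math.sqrt(2.0)]  # region {2, 3}, zero-extended

basis = gram_schmidt([const, fiedler, psi_1_0, psi_1_1])
gram = [[sum(a * b for a, b in zip(p, q)) for q in basis] for p in basis]
for row in gram:
    print([round(x, 12) for x in row])
```

Before orthonormalization the Fiedler vector of the whole path and the zero-extended Fiedler vector of a half are not orthogonal, which is why Step 6 is needed; afterwards the last two vectors are no longer zero outside their original regions.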

[Figure 2: two map panels (a) ψ^2_2 (GHT), (b) ψ^2_2 (OHFT).]

Fig. 2. Comparison between one of the GHT basis vectors and the corresponding OHFT basis vector on the MN road map. We again used the random-walk Laplacian, similar to Fig. 1.

5. Discussion

The purpose of this article has been to introduce our newly developed HGLET method and its variations.

In this section, let us first briefly discuss the relation of our work to the previous works. First of all, the hierarchical partitioning of a graph using the Fiedler vectors is a natural idea and obviously not new. For example, Simon [14] discussed such a recursive bi-partitioning of a graph. However, his aim was to create hierarchical (bi)partitions of unstructured grids for the purpose of parallel processing, and he was concerned about neither basis constructions nor data analysis.

There are several constructions of wavelet-like transforms on a graph. Diffusion wavelets [1] are based on the bottom-up approach using the diffusion/random walk on a graph, which have been generalized to wavelet packets in [2]. By contrast, our HGLET and its variants utilize the top-down approach, and can be viewed as a simple generalization of the classical block DCT-II dictionary.

The spectral graph wavelet transform (SGWT) [7] directly applies the Littlewood-Paley theory by viewing the eigenvalues and eigenvectors of the global graph Laplacian as the "frequencies" and "Fourier modes," respectively. However, for a general graph, the eigenvalue indices cannot be viewed as natural frequencies of the graph, and moreover some eigenvectors of L(G) may be localized [10]. In other words, one cannot explicitly control localization properties of such SGWT wavelets, which may lead to unexpected problems.

Jansen et al. developed a wavelet-like transform for signals on graphs by generalizing the classical lifting scheme [5]. Since their method proceeds vertex-by-vertex, a discrete (e.g., dyadic) notion of scale no longer exists, which is quite a contrast to our HGLETs.

Rustamov recently constructed two different wavelet-like transforms on graphs. The first one [8] is based upon the average-interpolation wavelets and also uses a top-down partitioning, like our HGLETs. However, it is fundamentally different from ours due to the average-interpolation procedure. The second one [9] used the lifting scheme whose update and prediction operators are adaptively learned from a given set of signals so that the resulting wavelet coefficients of a signal belonging to the same signal class become sparse. The aim of this adapted wavelet construction is the same as that of the best basis selected from the full HGLET dictionary, but

– 23 –

Page 29: The Japan Society for Industrial and Applied Mathematicsjsiaml.jsiam.org/ebooks/JSIAMLetters_vol6-2014.pdfJSIAM Letters Vol.6 (2014) pp.1{4 ⃝c 2014 Japan Society for Industrial and

JSIAM Letters Vol. 6 (2014) pp.21–24 Jeff Irion and Naoki Saito

these two constructions are quite different.We also acknowledge that the GHT is similar to those

found in [3,4,6]. The differences are that [3] and [4] utilizebottom-up clustering methods whereas our transformsuse a top-down partitioning; [6] assumes that the parti-tion tree has already been computed and is provided asan input.Finally, after our work for this article was completed,

we noticed the article by Szlam et al. [15], which pro-vided the closest idea to ours. In that article, the au-thors proposed a top-down approach using the Laplacianeigenfunctions satisfying the Neumann boundary condi-tion, briefly mentioned the construction of the Haar ba-sis, and then proposed a generalization of the local co-sine dictionary to manifolds and graphs. The construc-tion part of their Haar basis seems to be identical toours, although they did not describe their algorithm indetail; in particular, they never explicitly mentioned theconversion of the Fiedler vectors (or the Neumann eigen-functions in their terminology) to their Haar basis vec-tors. As for their construction of the local cosine dic-tionary on manifolds and graphs, they transported thefolding/unfolding operators on the regular lattice to thegeneral manifold and graph setting. Considering our ex-periences of the local cosine dictionary on the regularlattice [16], such generalized local cosines may not workwell in practice. In fact, such folding/unfolding oper-ations may be unnecessary or even harmful for appli-cations. Moreover, they did not notice that the usualgraph Laplacian eigenvectors are the true generalizationof DCT Type II.Much work remains to be done in order to fine tune

these methods and apply them to cutting-edge problemsranging from data approximation and denoising to clas-sification and regression on graphs. Beyond such applica-tions, we would like to mention two possible extensionsof the HGLET. The first one is to develop the Gener-alized Haar-Walsh Transform, which is a generalizationof the Haar-Walsh dictionary to the graph setting, andwhich includes the GHT as just one possible basis. First,we perform a full partitioning of the graph using Fiedlervectors so that all regions at the finest level consist ofa single vertex. The basis vectors at this level are sim-ply Kronecker deltas. From here we perform average anddifference operations on the basis vectors correspondingto each pair of children regions to generate the basis vec-tors of their parent region, and we iterate this processfrom bottom to top until we reach the root level j = 0.This process yields an overcomplete basis set that is ageneralization of the Haar-Walsh dictionary.The second extension is to adopt a more flexible graph

partitioning scheme. In our HGLET in this article, wehave focused only on using Fiedler vectors to split eachsubgraph into two smaller subgraphs. However, notethat only our OHFT has a crucial dependence on theFiedler vector, whereas the other transforms simply usethe Fiedler vector as a means for partitioning the graph.Each of our transforms could just as easily utilize a dif-ferent partitioning scheme. Furthermore, it is entirelypermissible within the general structure of our trans-forms to allow G and subsequent subgraphs to be split

into an arbitrary non-fixed number of subgraphs; ideallythis number of partitions of a given subgraph should bethe same as the number of clusters in that subgraph ifsuch clusters are clearly formed. Since graph partition-ing is a highly evolving field, it is important that ourtransforms be independent of the particular choice ofhierarchical graph partitioning scheme.We are currently investigating the above extensions

and examining the performance of our HGLETs for var-ious applications including simultaneous segmentationand compression of regularly-sampled signals. We hopeto report results of more extensive and challenging ex-periments in our future article.
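As an illustration of the bottom-up "average and difference" step mentioned above for the Generalized Haar-Walsh Transform, the following minimal sketch merges the unit-norm indicator vectors of two sibling regions into one parent "average" vector and one "difference" vector. The function name `merge_children` and the size-dependent weights are assumptions made here for illustration (they are the usual Haar-type normalization); the text does not specify the weights, and the sketch covers only one average/difference pair, not the full dictionary.

```python
import math

def merge_children(s1, n1, s2, n2):
    """Combine two sibling regions into (parent_average, difference).

    s1, s2: unit-norm indicator vectors of the two child regions (lists of floats).
    n1, n2: number of vertices in each child region.
    The weights below are an assumed Haar-type normalization.
    """
    n = n1 + n2
    a, b = math.sqrt(n1 / n), math.sqrt(n2 / n)
    # Weighted average: the unit-norm indicator vector of the parent region.
    parent = [a * x + b * y for x, y in zip(s1, s2)]
    # Weighted difference: unit norm and orthogonal to the parent vector.
    diff = [b * x - a * y for x, y in zip(s1, s2)]
    return parent, diff

def dot(u, v):
    """Euclidean inner product of two equal-length lists."""
    return sum(x * y for x, y in zip(u, v))

# Toy graph with three vertices: child regions {0, 1} and {2}.
s1 = [1 / math.sqrt(2), 1 / math.sqrt(2), 0.0]
s2 = [0.0, 0.0, 1.0]
parent, diff = merge_children(s1, 2, s2, 1)
```

When both children are single vertices (Kronecker deltas), the weights reduce to the plain average and difference scaled by 1/sqrt(2).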

Acknowledgments

This research was partially supported by the ONR grant N00014-12-1-0177 and the NDSEG fellowship.

References

[1] R. R. Coifman and M. Maggioni, Diffusion wavelets, Appl. Comput. Harm. Anal., 21 (2006), 53–94.

[2] J. C. Bremer, R. R. Coifman, M. Maggioni and A. Szlam, Diffusion wavelet packets, Appl. Comput. Harm. Anal., 21 (2006), 95–112.

[3] F. Murtagh, The Haar wavelet transform of a dendrogram, J. Classification, 24 (2007), 3–32.

[4] A. Lee, B. Nadler and L. Wasserman, Treelets—an adaptive multi-scale basis for sparse unordered data, Ann. Appl. Stat., 2 (2008), 435–471.

[5] M. Jansen, G. P. Nason and B. W. Silverman, Multiscale methods for data on graphs and irregular multidimensional situations, J. R. Stat. Soc. Ser. B, 71 (2008), 97–125.

[6] M. Gavish, B. Nadler and R. Coifman, Multiscale wavelets on trees, graphs and high dimensional data: Theory and applications to semi supervised learning, in: Proc. of 27th Intern. Conf. Machine Learning, J. Fürnkranz et al. eds., pp. 367–374, Omnipress, Haifa, 2010.

[7] D. K. Hammond, P. Vandergheynst and R. Gribonval, Wavelets on graphs via spectral graph theory, Appl. Comput. Harm. Anal., 30 (2011), 129–150.

[8] R. M. Rustamov, Average interpolating wavelets on point clouds and graphs, arXiv:1110.2227v1 [math.FA], 2011.

[9] R. M. Rustamov and L. Guibas, Wavelets on graphs via deep learning, in: Advances in Neural Information Processing Systems, Vol. 26, 2013, to appear.

[10] Y. Nakatsukasa, N. Saito and E. Woei, Mysteries around the graph Laplacian eigenvalue 4, Linear Algebra Appl., 438 (2013), 3231–3246.

[11] M. Fiedler, A property of eigenvectors of nonnegative symmetric matrices and its application to graph theory, Czechoslovak Math. J., 25 (1975), 619–633.

[12] U. von Luxburg, A tutorial on spectral clustering, Stat. Comput., 17 (2007), 395–416.

[13] N. Saito, Local feature extraction and its applications using a library of bases, in: Topics in Analysis and Its Applications: Selected Theses, R. Coifman ed., pp. 269–451, World Scientific Pub. Co., Singapore, 2000.

[14] H. D. Simon, Partitioning of unstructured problems for parallel processing, Comput. Sys. Eng., 2 (1991), 135–148.

[15] A. D. Szlam, M. Maggioni, R. R. Coifman and J. C. Bremer, Jr., Diffusion-driven multiscale analysis on manifolds and graphs: top-down and bottom-up constructions, in: Proc. of SPIE 5914, Wavelets XI, M. Papadakis et al. eds., Paper # 59141D, 2005.

[16] N. Saito and J.-F. Remy, The polyharmonic local sine transform: A new tool for local image analysis and synthesis without edge effect, Appl. Comput. Harm. Anal., 20 (2006), 41–73.


JSIAM Letters Vol.6 (2014) pp.25–28 ©2014 Japan Society for Industrial and Applied Mathematics

Strong $L^p$ convergence associated with Rellich-type discrete compactness for discontinuous Galerkin FEM

Fumio Kikuchi1 and Daisuke Koyama2

1 Hitotsubashi University, 2-1 Naka, Kunitachi, Tokyo 186-8601, Japan
2 The University of Electro-Communications, 1-5-1 Chofugaoka, Chofu, Tokyo 182-8585, Japan

E-mail kikuchi ms.u-tokyo.ac.jp

Received December 1, 2013, Accepted December 30, 2013

Abstract

In a preceding paper, we proved the discrete compactness properties of Rellich type for some 2D discontinuous Galerkin finite element methods (DGFEM), that is, the strong $L^2$ convergence of some subfamily of finite element functions bounded in an $H^1$-like mesh-dependent norm. In this note, we will show the strong $L^p$ convergence of the above subfamily for $1 \le p < \infty$. To this end, we will utilize the duality mappings and special auxiliary problems. The results are applicable to numerical analysis of various semi-linear problems.

Keywords discontinuous Galerkin FEM, polygonal FEM, discrete Rellich theorem, strong $L^p$ convergence

Research Activity Group Scientific Computation and Numerical Analysis

1. Introduction

Various discontinuous Galerkin finite element methods (DGFEM) have been developed and analyzed in recent years [1, 2]. Since they use discontinuous approximation functions, some important results in the conventional functional analysis are not directly available, so that we are obliged to establish their discrete analogs.

In [3], we proved discrete compactness properties of Rellich type for some 2D DGFEM. That is, from a mesh-dependent family of functions bounded in a broken $H^1$-like Sobolev norm, we can choose a subfamily which is strongly convergent and whose approximate first-order derivatives are weakly convergent in the $L^2$ sense. The obtained results can be applied to justification of numerical approximations to various linear problems. However, in the 2D cases, the original Rellich theorem also assures the strong $L^p$ convergence for $1 \le p < \infty$, and this property is effective for analysis of some semi-linear problems. So we will derive such a property for some DGFEM by making use of the duality maps and regularity results of special auxiliary boundary value problems.

2. Preliminaries

2.1 Function spaces

Let $\Omega \subset \mathbb{R}^2$ be a bounded polygonal domain with boundary $\partial\Omega$. We assume that its maximum interior angle is strictly less than $2\pi$. For $\Omega$, we can define the Lebesgue and Sobolev spaces $L^p(\Omega)$ and $W^{s,p}(\Omega)$ ($s \ge 0$, $1 \le p \le \infty$, $L^p(\Omega) = W^{0,p}(\Omega)$), where fractional cases ($s \notin \mathbb{N} \cup \{0\}$) are included [2, 4]. We will also use $H^s(\Omega) := W^{s,2}(\Omega)$. The inner products of both $L^2(\Omega)$ and $L^2(\Omega)^2$ are designated by $(\cdot,\cdot)_\Omega$, with the associated norms by $\|\cdot\|_\Omega$, and the norm and the standard semi-norm of $W^{s,p}(\Omega)$, as well as those of $W^{s,p}(\Omega)^2$, are denoted by $\|\cdot\|_{s,p,\Omega}$ and $|\cdot|_{s,p,\Omega}$, respectively. For domains other than $\Omega$, notations of the above spaces, norms, etc. will be used with $\Omega$ replaced appropriately.

Let us consider a subset $\partial\Omega_D$ of $\partial\Omega$, which either is empty or consists of finitely many closed segments. Then we introduce a closed subspace $H^1_D(\Omega)$ of $H^1(\Omega)$ by

\[ H^1_D(\Omega) = \{ v \in H^1(\Omega);\ v = 0 \text{ on } \partial\Omega_D \}. \tag{1} \]

2.2 Definitions and notations for triangulations

We first construct a family of triangulations $\{\mathcal{T}^h\}_{h>0}$ of $\Omega$ by polygonal finite elements (or shortly elements): each $\mathcal{T}^h$ consists of a finite number of elements, and each element $K \in \mathcal{T}^h$ is a bounded $m$-polygonal (open) domain, where $m$ is an integer which can differ with $K$ such that $3 \le m \le M$ for an integer $M \ge 3$ common to the considered family $\{\mathcal{T}^h\}_{h>0}$. Thus the boundary $\partial K$ of $K \in \mathcal{T}^h$ is a closed simple polygonal curve composed of $m$ edges. We do not avoid non-convex cases for $K$ unlike in the classical quadrilateral elements, cf. [5].

We use the notation $e$ to denote an edge of $K$, which is assumed here to be an open segment. The sets of edges of $K \in \mathcal{T}^h$ and $\mathcal{T}^h$ are respectively denoted by $\mathcal{E}^K$ and $\mathcal{E}^h$. For each triangulation $\mathcal{T}^h$, we define its “skeleton” $\Gamma^h$ as $\Gamma^h = \bigcup_{e \in \mathcal{E}^h} \bar{e}$. We assume that the triangulations are so constructed that any edge $e \in \mathcal{E}^h$ such that $e \cap \partial\Omega_D \ne \emptyset$ is entirely contained in $\partial\Omega_D$.

The diameter of $K$ is denoted by $h_K$, and the length of $e \in \mathcal{E}^K$ by $|e|$. Moreover, $h = \max_{K \in \mathcal{T}^h} h_K$. We will designate the inner products of $L^2(\partial K)$ and $L^2(\partial K)^2$ by $[\cdot,\cdot]_{\partial K}$, and the associated norms by $|\cdot|_{\partial K}$. For $e \in \mathcal{E}^K$, $[\cdot,\cdot]_e$ and $|\cdot|_e$ are defined similarly, and the norm of $L^p(e)$ ($1 \le p \le \infty$) is denoted by $|\cdot|_{p,e}$ ($|\cdot|_e = |\cdot|_{2,e}$).

We will also impose the “regularity” conditions on $\{\mathcal{T}^h\}_{h>0}$ presented in [3], cf. also [5]. In particular, we adopt the chunkiness condition [2], the triangle condition, and the local quasi-uniformity of edge lengths.


2.3 Function spaces associated to triangulations

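Over $\mathcal{T}^h$, we consider the broken Sobolev spaces [1, 2]:

\[ W^{s,p}(\mathcal{T}^h) = \{ v \in L^p(\Omega);\ v|_K \in W^{s,p}(K)\ (\forall K \in \mathcal{T}^h) \}, \quad H^s(\mathcal{T}^h) = W^{s,2}(\mathcal{T}^h) \quad (s \ge 0,\ 1 \le p \le \infty). \tag{2} \]

Here, $W^{s,p}(\mathcal{T}^h)$ can be identified with $\prod_{K \in \mathcal{T}^h} W^{s,p}(K)$. For $v \in H^{1/2+\sigma}(\mathcal{T}^h)$ ($\sigma > 0$) and $K \in \mathcal{T}^h$, the trace of $v|_K$ to $\partial K$ is well defined as an element of $L^2(\partial K)$ and denoted by $v|_{\partial K}$ or simply $v$, which can be double-valued on edges shared by two elements [1, 2].

On $\Gamma^h$, we consider a kind of flux $\hat{v} \in L^2(\Gamma^h)$, which is single-valued on each edge shared by two elements [1, 2]. To deal with the boundary condition in (1), define

\[ L^2_D(\Gamma^h) = \{ \hat{v} \in L^2(\Gamma^h);\ \hat{v} = 0 \text{ on } \partial\Omega_D \}. \tag{3} \]

In the hybrid(ized) DGFEM, the flux $\hat{v}$ is independent of $v$, and they are used as a pair. On the other hand, in some genuine (non-hybridized) DGFEM like IP and LDG methods [1, 2], we make $\hat{v}$ be subject to $v$ by introducing appropriate constraints between them. A typical approach is as follows: first define $\bar{v} \in L^2(\Gamma^h)$ for $v \in H^1(\mathcal{T}^h)$ edge by edge. For an edge $e \in \mathcal{E}^h$, we set $\bar{v}|_e = v|_e$ if $e \subset \partial\Omega$, while we take the simple average if $e$ is shared by two elements $K_1, K_2 \in \mathcal{T}^h$:

\[ \bar{v}|_e = (v_1 + v_2)/2, \tag{4} \]

where $v_1$ (resp. $v_2$) is the trace of $v|_{K_1}$ (resp. $v|_{K_2}$) to $e$. Then we can use such $\bar{v}|_e$ as $\hat{v}|_e$ when $e \not\subset \partial\Omega_D$.

For each $\mathcal{T}^h$, let us define a mesh-dependent semi-norm for $\{v, \hat{v}\} \in H^1(\mathcal{T}^h) \times L^2(\Gamma^h)$ by

\[ |\{v, \hat{v}\}|_h^2 = \|\nabla_h v\|_\Omega^2 + \sum_{K \in \mathcal{T}^h} \sum_{e \in \mathcal{E}^K} |e|^{-1} |v - \hat{v}|_e^2, \tag{5} \]

where $v$ on $e \in \mathcal{E}^K$ implies the trace of $v|_K$ to $e$, and $\nabla_h : H^1(\mathcal{T}^h) \to L^2(\Omega)^2$ is characterized by $(\nabla_h v)|_K = \nabla(v|_K)$ for $v \in H^1(\mathcal{T}^h)$ and $K \in \mathcal{T}^h$.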

2.4 Finite element spaces

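To approximate $\{v, \hat{v}\} \in H^{3/2+\sigma}(\mathcal{T}^h) \times L^2(\Gamma^h)$ ($0 < \sigma \le 1/2$) associated to $\mathcal{T}^h$, let us prepare two concrete finite dimensional spaces for a specified $k \in \mathbb{N}$:

\[ U^h = \prod_{K \in \mathcal{T}^h} P^k(K) \subset W^{2,\infty}(\mathcal{T}^h) \subset H^{3/2+\sigma}(\mathcal{T}^h), \tag{6} \]

\[ \hat{U}^h = \prod_{e \in \mathcal{E}^h} P^k(e) \subset L^\infty(\Gamma^h) \quad \text{or} \quad \prod_{e \in \mathcal{E}^h} P^k(e) \cap C(\Gamma^h), \tag{7} \]

where $P^k(K)$ and $P^k(e)$ are the spaces of polynomials of degree $\le k$ on $K$ and $e$, respectively, and $C(\Gamma^h)$ is the space of continuous functions on $\Gamma^h$.

To deal with the Dirichlet condition in (1), define also

\[ \hat{U}^h_D = \{ \hat{v}_h \in \hat{U}^h;\ \hat{v}_h = 0 \text{ on } \partial\Omega_D \} = \hat{U}^h \cap L^2_D(\Gamma^h). \tag{8} \]

We will employ the finite element spaces given by

\[ V^h = U^h \times \hat{U}^h, \quad V^h_D = U^h \times \hat{U}^h_D. \tag{9} \]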

2.5 Lifting operators

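First let us introduce, for the same $k \in \mathbb{N}$ as in Section 2.4,

\[ Q^K = P^k(K) \quad \text{or} \quad P^{k-1}(K). \tag{10} \]

Then the local lifting operator $R_K : g \in L^2(\partial K) \mapsto \xi \in (Q^K)^2$ is defined as: given $g \in L^2(\partial K)$, find $\xi = \{\xi_1, \xi_2\} \in (Q^K)^2$ such that, for all $\eta = \{\eta_1, \eta_2\} \in (Q^K)^2$,

\[ (\xi, \eta)_K = [g, \eta \cdot n]_{\partial K} \quad (\eta \cdot n = \eta_1 n_1 + \eta_2 n_2), \tag{11} \]

where $n = \{n_1, n_2\}$ is the outward unit normal on $\partial K$.

Identifying $Q^h := \prod_{K \in \mathcal{T}^h} Q^K$ with a subspace of $L^2(\Omega)$ and further $\prod_{K \in \mathcal{T}^h} (Q^K)^2$ with $(Q^h)^2$, the global lifting operator $R_h$ is defined by

\[ R_h : g = \{g_{\partial K}\}_{K \in \mathcal{T}^h} \in \prod_{K \in \mathcal{T}^h} L^2(\partial K) \mapsto \{R_K g_{\partial K}\}_{K \in \mathcal{T}^h} \in (Q^h)^2 \subset L^2(\Omega)^2. \tag{12} \]

Since $\hat{v} \in L^2(\Gamma^h)$ is single-valued on $e \in \mathcal{E}^h$, it can be naturally identified with an element of $\prod_{K \in \mathcal{T}^h} L^2(\partial K)$. On the other hand, the trace of $v \in H^1(\mathcal{T}^h)$ to $e \not\subset \partial\Omega$ may be double-valued. To use $R_h$ for such $v$, we define

\[ S_h : v \in H^1(\mathcal{T}^h) \mapsto \{v|_{\partial K}\}_{K \in \mathcal{T}^h} \in \prod_{K \in \mathcal{T}^h} L^2(\partial K). \tag{13} \]

For the present choice of the discrete spaces, we can show that $R_h$ in (12) satisfies [1, 3]

\[ \|R_h g\|_\Omega \le C \Big( \sum_{K \in \mathcal{T}^h} \sum_{e \in \mathcal{E}^K} |e|^{-1} |g_{\partial K}|_e^2 \Big)^{1/2}. \tag{14} \]

Here $C > 0$ is independent of $h > 0$ and $g$, and, along with $C^*$ and $c$, will denote generic positive constants [2].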
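To make the local lifting operator in (11) concrete, here is a minimal sketch for the lowest-order case in which the local space consists of constants (the choice $P^{k-1}(K)$ with $k = 1$). Testing (11) with constant vectors gives the lifted vector as $(1/|K|) \int_{\partial K} g\, n\, ds$. The function name `lift_p0`, the restriction to edgewise constant data $g$, and the counter-clockwise vertex ordering are assumptions made only for this illustration.

```python
def lift_p0(vertices, g_edges):
    """Lowest-order local lifting on a polygon K.

    vertices: list of (x, y) in counter-clockwise order (assumed).
    g_edges[i]: constant value of g on the edge from vertex i to vertex i + 1.
    Returns xi = (1/|K|) * sum_e g_e * |e| * n_e, the constant vector solving (11).
    """
    m = len(vertices)
    area = 0.0
    xi = [0.0, 0.0]
    for i in range(m):
        x0, y0 = vertices[i]
        x1, y1 = vertices[(i + 1) % m]
        # Shoelace formula for the signed area (positive for CCW ordering).
        area += 0.5 * (x0 * y1 - x1 * y0)
        # |e| * n_e: the edge vector rotated by -90 degrees points outward for CCW polygons.
        xi[0] += g_edges[i] * (y1 - y0)
        xi[1] += g_edges[i] * (-(x1 - x0))
    return [xi[0] / area, xi[1] / area]

# Unit square; g = 1 on the right edge only.
square = [(0.0, 0.0), (1.0, 0.0), (1.0, 1.0), (0.0, 1.0)]
xi_right = lift_p0(square, [0.0, 1.0, 0.0, 0.0])
# Constant data on the whole boundary lifts to zero, since the normals integrate to zero.
xi_const = lift_p0(square, [1.0, 1.0, 1.0, 1.0])
```

For higher-degree local spaces, (11) becomes a small mass-matrix solve on each element; that case is not sketched here.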

3. Rellich type discrete compactness

In [3], we showed the following results.

Theorem 1 We employ the above finite element spaces and assume the regularity conditions in [3]. Let $\{\{u_h, \hat{u}_h\} \in V^h_D\}_{h>0}$ be a family associated to $\{\mathcal{T}^h\}_{h>0}$ such that $|\{u_h, \hat{u}_h\}|_h^2 + \|u_h\|_\Omega^2 \le 1$. Then there exist a function $u_0 \in H^1_D(\Omega)$ and a subfamily, denoted again by $\{\{u_h, \hat{u}_h\}\}_{h>0}$ for convenience, such that, as $h \downarrow 0$,

\[ u_h \to u_0 \ \text{in } L^2(\Omega), \tag{15} \]

\[ u_h|_{\partial\Omega_D} \to u_0|_{\partial\Omega_D} = 0 \ \text{in } L^2(\partial\Omega_D) \ \text{if } \partial\Omega_D \ne \emptyset, \tag{16} \]

\[ \nabla_h u_h + R_h(\hat{u}_h - S_h u_h) \rightharpoonup \nabla u_0 \ \text{in } L^2(\Omega)^2, \tag{17} \]

where $\to$ and $\rightharpoonup$ respectively denote the strong and weak convergences.

To prove the above, we used some assumptions on the family of finite element spaces, which can be assured for the present types of triangulations and piecewise polynomial spaces [3], cf. also [1]. However, we should supplement the techniques used there. In the former proof [3], we utilized an $h$-family ($h > 0$) of problems $-\Delta u^h + u^h = u_h$ ($u_h \in U^h$) under the mixed Dirichlet-Neumann boundary conditions associated to $H^1_D(\Omega)$. But this choice yields such severe regularity results for $\Omega$ of a general shape [4] that some arguments employed there may lose their validity unless we put additional restrictions on the interior angles of the polygonal domain $\Omega$. Instead, we can use the pure Neumann condition without any essential changes of the proof.

Remark 2 Another approach of showing the discrete compactness for some genuine DGFEM is to use the reconstruction operators, see e.g. [6]. Probably, we can also apply such techniques to various hybrid(ized) DGFEM.

4. Strong $L^p$ convergence for $1 \le p < \infty$

Let us show that the subfamily $\{u_h\}_{h>0}$ in Theorem 1 also converges strongly to $u_0$ in $L^p(\Omega)$ for $p \in [1, \infty[$. Since $\Omega$ is bounded, the conclusion is obvious for $p \in [1, 2[$, so that we will consider only the case $p \in\ ]2, \infty[$. We will use the notation $q \in\ ]1, 2[$ characterized by $1/p + 1/q = 1$. Notice here the following lemma [7].

Lemma 3 If $f \in L^2(\Omega) \cap L^p(\Omega)$ ($p \in\ ]2, \infty[$), it also belongs to $L^{p^*}(\Omega)$ where $1/p^* = (1-\alpha)/2 + \alpha/p$ for $\alpha \in\ ]0, 1[$, and the following “interpolation inequality” holds:

\[ \|f\|_{0,p^*,\Omega} \le \|f\|_\Omega^{1-\alpha} \|f\|_{0,p,\Omega}^{\alpha}. \tag{18} \]
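The interpolation inequality of Lemma 3 is a consequence of the Hölder inequality and therefore holds on any measure space. The short sketch below checks it numerically on a finite weighted point set, which stands in for $\Omega$ purely for illustration; the helper names `lp_norm` and `interpolation_sides` and the sample data are assumptions, not part of the original analysis.

```python
def lp_norm(values, weights, p):
    """Weighted discrete L^p norm: (sum_i w_i * |f_i|^p)^(1/p)."""
    return sum(w * abs(f) ** p for f, w in zip(values, weights)) ** (1.0 / p)

def interpolation_sides(values, weights, p, alpha):
    """Return (p_star, lhs, rhs) for ||f||_{p*} <= ||f||_2^(1-alpha) * ||f||_p^alpha."""
    p_star = 1.0 / ((1.0 - alpha) / 2.0 + alpha / p)
    lhs = lp_norm(values, weights, p_star)
    rhs = (lp_norm(values, weights, 2.0) ** (1.0 - alpha)
           * lp_norm(values, weights, p) ** alpha)
    return p_star, lhs, rhs

# Sample "function" on five points carrying equal weights (total measure 1).
f = [0.3, -1.7, 2.5, 0.0, 4.1]
w = [0.2, 0.2, 0.2, 0.2, 0.2]
p_star, lhs, rhs = interpolation_sides(f, w, p=6.0, alpha=0.5)
```

With `p = 6` and `alpha = 0.5` the intermediate exponent is `p_star = 3`, strictly between 2 and `p`, as in the lemma.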

Thus we can conclude the strong convergence of $\{u_h\}_{h>0}$ in $L^{p^*}(\Omega)$ for all $p^* \in\ ]2, p[$ by deriving the $L^p$ boundedness for $p > 2$. Moreover, to our aim, it suffices to show such boundedness for each sufficiently large $p$.

Let $J_p : L^p(\Omega) \to L^q(\Omega)$ be the duality map characterized for each $v \in L^p(\Omega)$ by

\[ \int_\Omega (J_p v)\, v\, dx = \|v\|_{0,p,\Omega}^2, \quad \|J_p v\|_{0,q,\Omega} = \|v\|_{0,p,\Omega}, \tag{19} \]

where $x = \{x_1, x_2\}$ denotes the variable in $\mathbb{R}^2$, and $J_p v$ is uniquely given by $J_p v = v \cdot |v|^{p-2} / \|v\|_{0,p,\Omega}^{p-2}$ [7].
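The two identities in (19) can be verified directly from the explicit formula for the duality map. The sketch below does so on a finite weighted point set used as a stand-in for $\Omega$; the names `lp_norm` and `duality_map` and the sample values are illustrative assumptions.

```python
def lp_norm(values, weights, p):
    """Weighted discrete L^p norm: (sum_i w_i * |v_i|^p)^(1/p)."""
    return sum(w * abs(v) ** p for v, w in zip(values, weights)) ** (1.0 / p)

def duality_map(values, weights, p):
    """Discrete analogue of J_p v = v * |v|^(p-2) / ||v||_p^(p-2), for nonzero v."""
    norm = lp_norm(values, weights, p)
    return [v * abs(v) ** (p - 2.0) / norm ** (p - 2.0) for v in values]

v = [0.5, -2.0, 1.5, 3.0]
w = [0.25, 0.25, 0.25, 0.25]
p = 4.0
q = p / (p - 1.0)  # conjugate exponent, 1/p + 1/q = 1

jv = duality_map(v, w, p)
# Discrete version of the pairing: integral of (J_p v) * v.
pairing = sum(wi * a * b for wi, a, b in zip(w, jv, v))
norm_p = lp_norm(v, w, p)
norm_q_of_jv = lp_norm(jv, w, q)
```

Here `pairing` should agree with `norm_p ** 2` and `norm_q_of_jv` with `norm_p`, up to rounding, which mirrors the two relations in (19).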

For each $\{u_h, \hat{u}_h\} \in V^h$ with $|\{u_h, \hat{u}_h\}|_h^2 + \|u_h\|_\Omega^2 \le 1$, define $u^{h,p} \in W^{1,q}(\Omega)$ ($p \in\ ]2, \infty[$, $1/p + 1/q = 1$) such that

\[ \int_\Omega \Big( \sum_{i=1}^{2} \frac{\partial u^{h,p}}{\partial x_i} \frac{\partial v}{\partial x_i} + u^{h,p} v \Big) dx = \int_\Omega (J_p u_h)\, v\, dx \quad \forall v \in W^{1,p}(\Omega). \tag{20} \]

This is a variational formulation of $-\Delta u^{h,p} + u^{h,p} = J_p u_h$ with the homogeneous Neumann condition. For $p = 2$, it reduces to the auxiliary problem in Sec. 3. Here $v$ is taken from $W^{1,p}(\Omega)$, a natural space to apply the Green formula along with functions in $W^{1,q}(\Omega)$ [4], but can also be taken from wider spaces such as $H^1(\Omega)$.

Since we are dealing with a bounded polygonal domain $\Omega$ whose maximum interior angle is strictly smaller than $2\pi$, we have the following existence and regularity results for elliptic problems with possible corner singularities (cf. [4, Theorems 1.4.5.3 and 4.4.3.7]).

Lemma 4 For the present domain $\Omega$ and any sufficiently large $p < \infty$, there exists a unique solution $u^{h,p} \in W^{1,q}(\Omega)$ of (20), which also satisfies

\[ u^{h,p} \in W^{s,q}(\Omega) \quad (s = \min\{2,\ \tfrac{1}{2} + \tfrac{2}{q} + \delta\}), \quad \|u^{h,p}\|_{s,q,\Omega} \le C_{p,\Omega} \|J_p u_h\|_{0,q,\Omega} = C_{p,\Omega} \|u_h\|_{0,p,\Omega}. \tag{21} \]

Here, $\delta > 0$ depends only on $p$ and the maximum interior angle of $\Omega$, and $C_{p,\Omega} > 0$ depends only on $p$ and $\Omega$.

Remark 5 The present results may not hold for some finite $p$ [4]. Moreover, for $2 < p < \infty$, the number $1/2 + 2/q$ in the definition of $s$ is evaluated as $3/2 < 1/2 + 2/q < 5/2$.
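The bounds quoted in Remark 5 follow from the conjugacy relation alone. As a short worked check (a sketch that uses only $1/p + 1/q = 1$, $2 < p < \infty$ and $\delta > 0$ from Lemma 4):

```latex
\[
2 < p < \infty
\;\Longrightarrow\;
\frac{1}{2} < \frac{1}{q} = 1 - \frac{1}{p} < 1
\;\Longrightarrow\;
1 < \frac{2}{q} < 2
\;\Longrightarrow\;
\frac{3}{2} < \frac{1}{2} + \frac{2}{q} < \frac{5}{2}.
\]
\[
\text{Consequently, since } \delta > 0:\qquad
\frac{3}{2} < s = \min\Big\{2,\ \frac{1}{2} + \frac{2}{q} + \delta\Big\} \le 2 .
\]
```

In particular $s - 1 > 1/2 > 1/p$ and, because $1/q < 1$, also $qs > 2$, which are the kinds of exponent conditions used in the estimates of this section.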

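Let us integrate $-\Delta u^{h,p} + u^{h,p} = J_p u_h$ over $\Omega$ after multiplying both sides by $u_h$, and then apply the Green formula while carefully handling the singularities around the vertices [2]. To justify such calculations, we should notice that $u^{h,p} \in W^{s,q}(\Omega)$, $u_h \in W^{1,\infty}(\mathcal{T}^h)$ and $\hat{u}_h \in L^\infty(\Gamma^h)$ for any $h > 0$, and in addition, for any $K \in \mathcal{T}^h$ and $e \in \mathcal{E}^K$, $(\nabla u^{h,p})|_e \in W^{s-1-1/q,q}(e)^2$ and $u_h|_e \in L^\infty(e)$, where $u_h|_e$, for example, denotes the trace of $u_h|_K$ to $e$, and the trace theorem from $W^{s-1,q}(K)$ to $W^{s-1-1/q,q}(e)$ is used by taking into account that $s - 1 - 1/q = \min\{1 - 1/q,\ -1/2 + 1/q + \delta\} > 0$ [4]. Using also (19), we obtain

\[ \|u_h\|_{0,p,\Omega}^2 = \int_\Omega (J_p u_h)\, u_h\, dx = I_1 + I_2; \]

\[ I_1 = \sum_{K \in \mathcal{T}^h} \sum_{i=1}^{2} \int_K \frac{\partial u^{h,p}}{\partial x_i} \frac{\partial u_h}{\partial x_i}\, dx + \int_\Omega u^{h,p} u_h\, dx, \]

\[ I_2 = \sum_{K \in \mathcal{T}^h} \int_{\partial K} [(\nabla u^{h,p}) \cdot n](\hat{u}_h - u_h)\, ds, \tag{22} \]

where $ds$ denotes the infinitesimal line element.

By the Sobolev imbedding theorem ([4, Theorem 1.4.4.1]), we have the continuous inclusions

\[ W^{s,q}(\Omega) \subseteq H^{s+1-2/q}(\Omega) \subseteq H^1(\Omega), \tag{23} \]

since $s + 1 - 2/q = \min\{3 - 2/q,\ 3/2 + \delta\} > 1$. Thus $I_1$ in (22) can be expressed by $I_1 = (\nabla u^{h,p}, \nabla_h u_h)_\Omega + (u^{h,p}, u_h)_\Omega$, and is estimated as, for a generic constant $C > 0$,

\[ |I_1| \le C \|u^{h,p}\|_{s,q,\Omega} \big( \|\nabla_h u_h\|_\Omega^2 + \|u_h\|_\Omega^2 \big)^{1/2}. \tag{24} \]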

To estimate $I_2$ in (22), we need some inverse inequalities and trace theorems to $e$ and $K$ along with Lemma 4. To this end, recall the triangle condition in [3]: Let $T_0$ be a fixed isosceles triangle with unit base length. For each $h > 0$ and each edge $e$ of any $K \in \mathcal{T}^h$, there exists an isosceles triangle $T_e \subset K$ that is similar to $T_0$ with the similarity ratio $|e|$ and whose base coincides with $e$.

Let us first show a trace theorem related to $K \in \mathcal{T}^h$.

Lemma 6 Let $e$ be an arbitrary edge of $K \in \mathcal{T}^h$, and $v$ be an arbitrary element of $W^{t,r}(K)$ with

\[ 1 < r < \infty, \quad 1/r < t \le 1. \tag{25} \]

Then the trace of $v$ to $e$ exists as an element of $L^r(e)$ and satisfies, with $C > 0$ independent of $h > 0$ and $v$,

\[ |v|_{r,e} \le C \big( |e|^{-1/r} \|v\|_{0,r,K} + |e|^{t-1/r} |v|_{t,r,K} \big). \tag{26} \]

Proof Consider the reference triangle $T_0$ in the triangle condition, whose base $e_0$ has unit length. By the trace theorem for $T_0$ [1, 4], any $\tilde{v} \in W^{t,r}(T_0)$ has a trace to $e_0$ as an element of $L^r(e_0)$, and satisfies for an appropriate $C > 0$

\[ |\tilde{v}|_{r,e_0} \le C \big( \|\tilde{v}\|_{0,r,T_0} + |\tilde{v}|_{t,r,T_0} \big). \]

Let us introduce a suitable similarity transformation from $T_0$ to $T_e \subset K$ in the triangle condition. By relating $v$ to an appropriate $\tilde{v}$ and using the scaling arguments, we can derive the desired results.

(QED)
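The scaling argument in the proof of Lemma 6 can be spelled out as follows. This is a sketch under the assumption that the fractional semi-norm is the usual Sobolev-Slobodeckij one; the symbols $\tilde{x}$ and $\tilde{v}$ for the reference variable and the pulled-back function are introduced here for illustration only.

```latex
% Similarity map T_0 -> T_e: x = |e| \tilde{x} composed with a rigid motion,
% and \tilde{v}(\tilde{x}) = v(x). In two dimensions one has
\[
|v|_{r,e} = |e|^{1/r}\, |\tilde{v}|_{r,e_0}, \qquad
\|v\|_{0,r,T_e} = |e|^{2/r}\, \|\tilde{v}\|_{0,r,T_0}, \qquad
|v|_{t,r,T_e} = |e|^{2/r - t}\, |\tilde{v}|_{t,r,T_0}.
\]
% Inserting these into the reference trace inequality gives
\[
|v|_{r,e}
\le C\, |e|^{1/r} \big( |e|^{-2/r} \|v\|_{0,r,T_e} + |e|^{\,t - 2/r} |v|_{t,r,T_e} \big)
= C \big( |e|^{-1/r} \|v\|_{0,r,T_e} + |e|^{\,t - 1/r} |v|_{t,r,T_e} \big),
\]
% and T_e \subset K allows the norms over T_e to be bounded by those over K.
```

The constant $C$ is the one from the fixed reference triangle $T_0$, which is why it does not depend on $h$, $K$ or $e$.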

We also need the following inverse inequalities.

Lemma 7 Let $e$ be an arbitrary edge of $K \in \mathcal{T}^h$, and $\{v_h, \hat{v}_h\}$ be an arbitrary element of $V^h$. For any $p$ with $2 < p < \infty$, $(v_h|_K)|_e$ and $\hat{v}_h|_e$ can be regarded as elements of $L^p(e)$, and satisfy

\[ |v_h - \hat{v}_h|_{p,e} \le C |e|^{1/p - 1/2} |v_h - \hat{v}_h|_e, \tag{27} \]

where $C > 0$ depends on $p$ and the polynomial degree $k$ but is independent of $h > 0$, $e$ and $\{v_h, \hat{v}_h\}$.

Proof Using some notations in the preceding proof, we find for $u \in P^k(T_0)$ and $\hat{v} \in P^k(e_0)$ ($k \in \mathbb{N}$)

\[ |u - \hat{v}|_{p,e_0} \le C |u - \hat{v}|_{e_0} \quad (u = u|_{e_0}), \]

since $u|_{e_0} - \hat{v}$ belongs to the finite-dimensional space $P^k(e_0)$. Here, $C > 0$ depends on $k$ but not on $u$ and $\hat{v}$. Connecting $u$ and $\hat{v}$ respectively with $v_h$ and $\hat{v}_h$ by an appropriate similarity transformation between $T_0$ and $T_e$, we have the desired estimate.

(QED)

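In addition, let us define $\xi_h \in (Q^h)^2$ such that $\xi_K = \xi_h|_K$ for each $K \in \mathcal{T}^h$ is given by

\[ \xi_K \in P^0(K)^2, \quad \xi_K(x) = \frac{1}{|K|} \int_K \nabla u^{h,p}\, dx \quad (x \in K), \tag{28} \]

where $|K|$ is the measure of $K$. Then we find that [2]

\[ \|\xi_h\|_\Omega \le \|\nabla u^{h,p}\|_\Omega, \tag{29} \]

\[ \|\nabla u^{h,p} - \xi_K\|_{0,q,K} \le C h_K^{s-1} |u^{h,p}|_{s,q,K}, \tag{30} \]

where $C > 0$ is independent of $h$ and $K$. Using the above $\xi_h$, we split $I_2$ into $I_3 + I_4$ with

\[ I_3 = \sum_{K \in \mathcal{T}^h} \int_{\partial K} (\xi_h \cdot n)(\hat{u}_h - u_h)\, ds = (R_h(\hat{u}_h - S_h u_h), \xi_h)_\Omega \quad (\text{by (11)}), \]

\[ I_4 = \sum_{K \in \mathcal{T}^h} \int_{\partial K} [(\nabla u^{h,p} - \xi_h) \cdot n](\hat{u}_h - u_h)\, ds. \tag{31} \]

By (5), (14), (23) and (29), $I_3$ is estimated as

\[ |I_3| \le C \|\nabla u^{h,p}\|_\Omega \cdot |\{u_h, \hat{u}_h\}|_h \le C^* \|u^{h,p}\|_{s,q,\Omega}. \tag{32} \]

Finally, let us estimate $I_4$.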

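Lemma 8 For $\{u_h, \hat{u}_h\} \in V^h$ and $u^{h,p}$ in (20), we have

\[ |I_4| \le C \|u^{h,p}\|_{s,q,\Omega}, \tag{33} \]

where $C > 0$ is independent of $h > 0$ and $\{u_h, \hat{u}_h\}$.

Proof For $\alpha, \beta, \gamma \ge 1$ and the present $\{u_h, \hat{u}_h\}$, define

\[ J_h(\alpha, \beta, \gamma) = \sum_{K \in \mathcal{T}^h} \sum_{e \in \mathcal{E}^K} |e|^{-\alpha/2} |u_h - \hat{u}_h|_{\beta,e}^{\gamma}. \]

By the Hölder inequality, we have, with $\eta_h = \nabla u^{h,p} - \xi_h$,

\[ |I_4| \le \sum_{K \in \mathcal{T}^h} \sum_{e \in \mathcal{E}^K} \int_e |\eta_h \cdot n| \cdot |u_h - \hat{u}_h|\, ds \le \Big( \sum_{K \in \mathcal{T}^h} \sum_{e \in \mathcal{E}^K} |e|^{q/p} |\eta_h|_{q,e}^q \Big)^{1/q} J_h(2, p, p)^{1/p}, \tag{34} \]

where $|\eta_h|_{q,e} = \big( \sum_{i=1}^{2} |\eta_{h,i}|_{q,e}^q \big)^{1/q}$ ($\eta_h = \{\eta_{h,1}, \eta_{h,2}\}$). By Lemma 7, $|u_h - \hat{u}_h|_{p,e}^p \le C^p |e|^{1-p/2} |u_h - \hat{u}_h|_e^p$, so that

\[ J_h(2, p, p) \le C^p J_h(p, 2, p). \tag{35} \]

It follows from $|\{u_h, \hat{u}_h\}|_h^2 + \|u_h\|_\Omega^2 \le 1$ and (5) that

\[ J_h(2, 2, 2) = \sum_{K \in \mathcal{T}^h} \sum_{e \in \mathcal{E}^K} |e|^{-1} |u_h - \hat{u}_h|_e^2 \le 1. \]

Then we have $|e|^{-1/2} |u_h - \hat{u}_h|_e \le 1$, and hence, for $p > 2$, $|e|^{-p/2} |u_h - \hat{u}_h|_e^p \le |e|^{-1} |u_h - \hat{u}_h|_e^2$, which means that

\[ J_h(p, 2, p) \le J_h(2, 2, 2) \le 1. \tag{36} \]

On the other hand, by noting $1/q < s - 1 \le 1$ and applying Lemma 6 to $\nabla u^{h,p} \in W^{s-1,q}(\Omega)^2$, we find that

\[ |\eta_h|_{q,e} = |\nabla u^{h,p} - \xi_K|_{q,e} \le C \big( |e|^{-1/q} \|\nabla u^{h,p} - \xi_K\|_{0,q,K} + |e|^{s-1-1/q} |u^{h,p}|_{s,q,K} \big). \]

Then, by (30) and $|e|/h_K \ge c$ for some $c > 0$, we obtain

\[ \sum_{K \in \mathcal{T}^h} \sum_{e \in \mathcal{E}^K} |e|^{q/p} |\eta_h|_{q,e}^q \le C^* \sum_{K \in \mathcal{T}^h} h_K^{q/p - 1 + q(s-1)} |u^{h,p}|_{s,q,K}^q \le C^* h^{qs-2} \|u^{h,p}\|_{s,q,\Omega}^q \tag{37} \]

since $h_K \le h$ and $qs > 2$, where $C^* > 0$ is independent of $h$. By (34), (35), (36) and (37), we have (33).

(QED)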
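Two elementary steps in the proof of Lemma 8 may be worth writing out. The following is a sketch using only $1/p + 1/q = 1$, $p > 2$, and the bounds $c\,h_K \le |e| \le h_K$; the abbreviation $\tau_e$ is introduced here for illustration.

```latex
% (i) From the normalized jump to (36): put
%     \tau_e = |e|^{-1/2} |u_h - \hat{u}_h|_e .  Since 0 \le \tau_e \le 1 and p > 2,
\[
|e|^{-p/2}\, |u_h - \hat{u}_h|_e^{\,p} = \tau_e^{\,p} \le \tau_e^{\,2}
= |e|^{-1}\, |u_h - \hat{u}_h|_e^{\,2}.
\]
% (ii) Exponent bookkeeping in (37): q/p = q - 1, hence
\[
\frac{q}{p} - 1 + q(s-1) = (q - 1) - 1 + qs - q = qs - 2 .
\]
% The two contributions to |e|^{q/p} |\eta_h|_{q,e}^q are then bounded, using (30), by
\[
|e|^{q/p - 1}\, h_K^{\,q(s-1)} \le c^{\,q-2}\, h_K^{\,qs-2}
\quad (\text{as } q - 2 < 0,\ |e| \ge c\,h_K),
\qquad
|e|^{q/p + q(s-1) - 1} = |e|^{\,qs-2} \le h_K^{\,qs-2}
\quad (\text{as } qs > 2,\ |e| \le h_K),
\]
% each multiplied by |u^{h,p}|_{s,q,K}^q and a constant coming from (a+b)^q \le 2^{q-1}(a^q + b^q).
```

Summing over the at most $M$ edges of each element and over all elements then gives the middle expression in (37).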

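By (21), (24), (32), (33) and $|\{u_h, \hat{u}_h\}|_h^2 + \|u_h\|_\Omega^2 \le 1$, $\|u_h\|_{0,p,\Omega}^2 = I_1 + I_3 + I_4$ in (22) is bounded from above by $C \|u^{h,p}\|_{s,q,\Omega} \le C^* \|u_h\|_{0,p,\Omega}$ ($C, C^* > 0$). Dividing by $\|u_h\|_{0,p,\Omega}$ whenever it is nonzero, we see that $\|u_h\|_{0,p,\Omega}$ for each sufficiently large $p < \infty$ is uniformly bounded for $h > 0$, so that we obtain the theorem below.

Theorem 9 The subfamily $\{u_h\}_{h>0}$ in Theorem 1 also converges strongly to $u_0$ in $L^p(\Omega)$ ($\forall p \in [1, \infty[$) as $h \downarrow 0$.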

5. Concluding remarks

We have proved the strong $L^p$ convergence associated with the Rellich type discrete compactness for some discontinuous Galerkin FEM. The results can be applied to justification of numerical computations of various semi-linear problems by DGFEM. To give a firm foundation to DGFEM, we are also planning to show the discrete Korn inequalities, which play essential roles in applications to solid mechanics and fluid dynamics [2, 8]. Moreover, our results on $L^p$ boundedness are only “qualitative” since we have not shown, for example, the dependence of $\|u_h\|_{0,p,\Omega}$ on $p$. Such refined results may be required in certain cases, and we will continue our studies.

References

[1] D. N. Arnold, F. Brezzi, B. Cockburn and L. D. Marini, Unified analysis of discontinuous Galerkin methods for elliptic problems, SIAM J. Numer. Anal., 39 (2002), 1749–1779.

[2] S. C. Brenner and L. R. Scott, The Mathematical Theory of Finite Element Methods, 3rd ed., Springer, New York, 2008.

[3] F. Kikuchi, Rellich-type discrete compactness for some discontinuous Galerkin FEM, Jpn J. Indust. Appl. Math., 29 (2012), 269–288.

[4] P. Grisvard, Elliptic Problems in Nonsmooth Domains, SIAM, Philadelphia, 2011.

[5] P. G. Ciarlet, The Finite Element Method for Elliptic Problems, 2nd ed., SIAM, Philadelphia, 2002.

[6] A. Buffa and C. Ortner, Compact embeddings of broken Sobolev spaces and applications, IMA J. Numer. Anal., 29 (2009), 827–855.

[7] H. Brezis, Functional Analysis, Sobolev Spaces and Partial Differential Equations, Springer, New York, 2011.

[8] S. C. Brenner, Korn’s inequalities for piecewise $H^1$ vector fields, Math. Comp., 73 (2003), 1067–1087.


JSIAM Letters Vol.6 (2014) pp.29–32 ©2014 Japan Society for Industrial and Applied Mathematics

Shape derivative of cost function for singular point: Evaluation by the generalized J integral

Hideyuki Azegami1, Kohji Ohtsuka2 and Masato Kimura3

1 Nagoya University, A4-2 (780) Furo-cho, Chikusa-ku, Nagoya 464-8601, Japan
2 Hiroshima Kokusai Gakuin University, 6-20-1 Nakano, Aki-ku, Hiroshima 739-0321, Japan
3 Kanazawa University, Kakuma, Kanazawa 920-1192, Japan

E-mail azegami is.nagoya-u.ac.jp

Received September 30, 2013, Accepted January 9, 2014

Abstract

This paper presents analytic solutions of the shape derivatives (Fréchet derivatives with respect to domain variation) for singular points of cost functions in shape-optimization problems for the domain in which the boundary value problem of a partial differential equation is defined. A design variable is given by a domain mapping. Cost functions are defined as functionals of the design variable and the solution to the boundary value problem. The analytic solutions for singular points such as crack tips and boundary points of the mixed boundary conditions on a smooth boundary are obtained by using the generalized J integral.

Keywords calculus of variations, boundary value problem, shape optimization, generalized J integral, $H^1$ gradient method

Research Activity Group Mathematical Design

1. Introduction

Determining the optimum shape of the domain in which a boundary value problem of a partial differential equation is defined is called a shape-optimization problem. One way to formulate this problem is to choose the domain mapping as the design variable. Cost functions are defined as functionals of the design variable and the solution to the boundary value problem. The shape derivatives of the cost functions, which are defined as the Fréchet derivatives with respect to domain variation, can be evaluated assuming appropriate regularity in the boundary value problem. A solution method using the shape derivative is presented in [1].

On the other hand, in research on evaluating the singularity of a crack, the generalized J integral was proposed [2], and its relation to the shape derivative of a cost function has been presented [3–5]. However, the analytic solution at the singular point has not been shown.

The present paper is dedicated to obtaining the analytic solutions of the shape derivatives for singular points such as crack tips and boundary points of the mixed boundary conditions on a smooth boundary by the use of the generalized J integral.

2. Set of design variable

Let $\Omega_0$ depicted in Fig. 1 be a two-dimensional bounded domain, where the boundary $\partial\Omega_0$ consists of Dirichlet boundary $\Gamma_{D0} \subset \partial\Omega_0$ and Neumann boundary $\Gamma_{N0} = \partial\Omega_0 \setminus \Gamma_{D0}$.

For $j \in \Theta_N = \{1, \ldots, |\Theta_N|\}$, let $x_{j0}$ (note, $x_{10}$ is hidden in Fig. 1) be corner points on $\Gamma_{N0}$ having concave angles of $\alpha_{0j} \in (\pi, 2\pi]$. In the same manner, for $j \in \Theta_D = \{|\Theta_N| + 1, \ldots, |\Theta_N| + |\Theta_D|\}$, let $x_{j0}$ be corner points inside of $\Gamma_{D0}$ with $\alpha_{0j} \in (\pi, 2\pi]$. Moreover, for $j \in \Theta_M = \{|\Theta_N| + |\Theta_D| + 1, \ldots, |\Theta_N| + |\Theta_D| + |\Theta_M|\}$, let $x_{j0}$ be corner points on the boundary of the mixed boundary conditions having an opening angle of $\alpha_{0j} \in (\pi/2, 2\pi]$. In the present paper, we call these points the singular points, and define the set of their indexes as $\Theta = \Theta_N \cup \Theta_D \cup \Theta_M$. The remaining part of the boundary is assumed to be sufficiently smooth.

[Figure 1: drawing not reproduced.]

Fig. 1. Varying 2-dimensional domain with corner points.

We define the design variable in a shape optimization problem by domain variation $\phi$, with which a varied domain is created by a continuous one-to-one mapping $i + \phi : \Omega_0 \to \mathbb{R}^2$ as $\Omega(\phi) = \{ (i + \phi)(x) \mid x \in \Omega_0 \}$. The symbol $i$ denotes the identity mapping in the present paper. The notation $(\,\cdot\,)(\phi)$ is used as $\{ (i + \phi)(x) \mid x \in (\,\cdot\,)_0 \}$ for domains and boundaries. To keep the continuous one-to-one mapping property, we define the admissible set of $\phi$ as

\[ \mathcal{D} = \{ \phi \in Y \mid \|\phi\|_Y < \sigma \}, \tag{1} \]

where $Y$ is defined by $W^{1,\infty}(\mathbb{R}^2; \mathbb{R}^2)$, and $\sigma > 0$ is chosen such that $(i + \phi)$ is a bijection [4, Proposition 1.39].
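To indicate why a smallness condition on $\phi$ keeps the mapping one-to-one, the following is a sketch of one standard sufficient condition. It is offered as an illustration only: the precise choice of $\sigma$ is the one in [4, Proposition 1.39], which is not reproduced here, and the Lipschitz constant $L$ is a symbol introduced for this sketch.

```latex
% Assume \phi is Lipschitz on R^2 with constant L < 1 (a function in
% W^{1,\infty}(R^2; R^2) is Lipschitz, with L controlled by the essential
% supremum of its gradient). Then, for all x, y in R^2,
\[
|(i+\phi)(x) - (i+\phi)(y)|
\ge |x - y| - |\phi(x) - \phi(y)|
\ge (1 - L)\,|x - y| ,
\]
% so (i+\phi)(x) = (i+\phi)(y) forces x = y, i.e. i + \phi is injective.
% Surjectivity onto R^2 follows from the Banach fixed-point theorem applied to
% the contraction x \mapsto z - \phi(x) for each target point z.
```

Under this assumption the inverse of $i + \phi$ is also Lipschitz, with constant at most $1/(1 - L)$.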


The domain of $\phi$ is extended to $\mathbb{R}^2$ by Calderón's extension theorem [6]. In the present paper, $Y$ is used as the Banach space for the perturbation $\varphi$ of $\phi$ in order to define the Fréchet derivatives as shown later.

3. Main problem

For simplicity, we use the Poisson problem as the main problem. The solution to the main problem is called the state variable in the shape optimization problem. We denote the outer unit normal by $\nu$, and $\partial_\nu = \nu \cdot \nabla$.

Problem 1 (Main problem) Let $b : \mathbb{R}^2 \to \mathbb{R}$ be a sufficiently smooth function not depending on $\phi$. For a given $\phi \in D$, find $u(\phi) : \Omega(\phi) \to \mathbb{R}$ such that

−∆u (ϕ) = b in Ω(ϕ) ,

∂νu (ϕ) = 0 on ΓN (ϕ) ,

u (ϕ) = 0 on ΓD (ϕ) .

If $b$ is given appropriately, the weak solution $u(\phi)$ to Problem 1 lies within $U = H^1(\Omega(\phi);\mathbb{R})$. The domain of $u(\phi)$ can be extended to $\mathbb{R}^2$ by Calderón's extension theorem. Moreover, in the present paper, we define the admissible set of the state variable $u(\phi)$ by

\[ S = W^{1,2q}(\mathbb{R}^2;\mathbb{R}) \tag{2} \]

for some $q > 2$. In [1], $u(\phi) \in S$ is used as a necessary condition in order to obtain the domain variation in $Y$ without singular points by the $H^1$ gradient method. In the present paper, we clarify the conditions on singular points under which $u(\phi)$ is included in $S$.

4. Shape optimization problem

Using the design variable $\phi \in D$ and the state variable $u = u(\phi) \in S$, we define cost functions as

\[ f_i(\phi, u) = \int_{\Omega(\phi)} \zeta_i(\phi, u, \nabla u)\,dx + c_i \tag{3} \]

for $i \in \{0, 1, \ldots, m\}$, where $\zeta_i$ and their derivatives are sufficiently smooth, and $c_0, \ldots, c_m$ are given constants. Among the $m+1$ cost functions, $f_0$ is called an objective function, and $f_1, \ldots, f_m$ are called constraint functions. Using the cost functions $f_0, \ldots, f_m$, we define the shape optimization problem as follows.

Problem 2 (Shape optimization) Let $D$ and $S$ be given by (1) and (2), respectively, and $f_0, \ldots, f_m$ be as defined in (3). Find $\Omega(\phi)$ with $\phi$ such that

\[ \phi = \arg\min_{\phi \in D} \{ f_0(\phi, u) \mid f_i(\phi, u) \le 0 \text{ for } i \in \{1, \ldots, m\},\ u \in S,\ \text{Problem 1} \}. \]

5. Shape derivative of cost functions

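For $i \in \{1, \ldots, m\}$, the Fréchet derivatives (we follow [4, Definition 1.8]) of $f_i$ with respect to an arbitrary domain variation $\varphi \in Y$ are obtained in [1] as

\[ \langle g_i, \varphi \rangle = \int_{\Omega(\phi)} \Big[ \nabla u \cdot (\nabla\varphi^T \nabla v_i) + \nabla v_i \cdot (\nabla\varphi^T \nabla u) + (\zeta_{i\phi}(\phi, u, \nabla u) + v_i \nabla b) \cdot \varphi + (\zeta_i - \nabla u \cdot \nabla v_i + b v_i)\, \nabla \cdot \varphi \Big]\,dx. \tag{4} \]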

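Here, $v_i \in U$ is called the adjoint variable for $f_i$, and is given as the weak solution of the following problem.

Problem 3 (Adjoint problem for $f_i$) For a given $\phi \in D$, let $u$ be the solution to Problem 1 and $\zeta_i$ be the function in (3). Find $v_i : \Omega(\phi) \to \mathbb{R}$ such that

\[ -\Delta v_i(\phi) = \zeta_{iu}(\phi, u, \nabla u) + \nabla \cdot \zeta_{i\nabla u}(\phi, u, \nabla u) \quad \text{in } \Omega(\phi), \]

\[ \partial_\nu v_i(\phi) = 0 \quad \text{on } \Gamma_N(\phi), \]

\[ v_i(\phi) = 0 \quad \text{on } \Gamma_D(\phi). \]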

In [1], it is shown that if $u$ and $v_i$ are in $S$, then $g_i$ belongs to $L^q(\Omega(\phi);\mathbb{R}^2)$, and the domain variation obtained by the $H^1$ gradient method belongs to $Y$ without singular points.

In the present paper, we pay attention to the range of the opening angles at singular points for which $u$ and $v_i$ are included in $S$, and obtain the analytic solutions of the shape derivative at the crack tip and at the boundary point of the mixed boundary conditions on a smooth boundary.

6. Regularity of u and vi at corner

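For the regularities of $u$ and $v_i$, the following results are known [7–9].

We suppose that $\partial\Omega(\phi) \setminus \{x_j(\phi)\}_{j \in \Theta}$ is sufficiently smooth in the following argument. If we let $B(x_j(\phi), \epsilon)$ be the disc of radius $\epsilon$ centered at $x_j(\phi)$, then for a point $x - x_j(\phi) = r e^{i\theta} \in B(x_j(\phi), \epsilon) \cap \Omega(\phi)$, $u$ has the expression

\[ u(r e^{i\theta}) = k_j(\phi)\, r^{\pi/\alpha_j(\phi)} \cos\frac{\pi}{\alpha_j(\phi)}\theta + u_R \quad \text{for } j \in \Theta_N, \tag{5} \]

\[ u(r e^{i\theta}) = k_j(\phi)\, r^{\pi/\alpha_j(\phi)} \sin\frac{\pi}{\alpha_j(\phi)}\theta + u_R \quad \text{for } j \in \Theta_D, \tag{6} \]

\[ u(r e^{i\theta}) = k_j(\phi)\, r^{\pi/(2\alpha_j(\phi))} \sin\frac{\pi}{2\alpha_j(\phi)}\theta + u_R \quad \text{for } j \in \Theta_M, \tag{7} \]

where $k_j(\phi)$ are constants, and $u_R$ stands for a term in $H^2(B(x_j(\phi), \epsilon) \cap \Omega(\phi);\mathbb{R})$.

A derivative of $u = r^{\omega}\psi(\theta)$ behaves as a finite sum of functions of the form $r^{\omega-1}\psi(\theta)$ with $\psi \in C^\infty([0, \alpha_j];\mathbb{R})$. The $p$-th power of $r^{\omega-1}\psi(\theta)$ is integrable in $B(x_j(\phi), \epsilon) \cap \Omega(\phi)$ iff $p(\omega - 1) + 1 > -1$. This means

\[ u \in W^{1,p}(B(x_j(\phi), \epsilon) \cap \Omega(\phi);\mathbb{R}) \quad \text{for } \omega > 1 - \frac{2}{p}. \tag{8} \]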

We now obtain the following.

Theorem 4 (Regularity of $u$ and $v_i$ at corner) For $j \in \Theta_N \cup \Theta_D$, the weak solutions $u$ and $v_i$ to Problem 1 and Problem 3, respectively, lie within $S$ if $\alpha_j(\phi) \in (0, 2\pi)$. For $j \in \Theta_M$, the weak solutions $u$ and $v_i$ lie within $S$ if $\alpha_j(\phi) \in (0, \pi)$.
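The angle ranges in Theorem 4 follow from condition (8) with $p = 2q$ and $q > 2$: the singular exponent $\omega$ must exceed $1/2$. The following minimal sketch makes this arithmetic explicit. The function names are hypothetical and only the leading singular term of (5)-(7) is considered, so this is an illustration of the inequality, not a proof.

```python
import math

def max_integrability_exponent(omega):
    """Supremum of p for which r**omega * psi(theta) satisfies (8), i.e. omega > 1 - 2/p."""
    if omega >= 1.0:
        return math.inf          # no restriction from the leading term
    return 2.0 / (1.0 - omega)

def in_S_for_some_q(alpha, mixed=False):
    """True if the leading term lies in W^{1,2q} for some q > 2 (assumed reading of S)."""
    # Exponent taken from (5)/(6) for Neumann or Dirichlet corners, from (7) for mixed ones.
    omega = math.pi / (2.0 * alpha) if mixed else math.pi / alpha
    # Some q > 2 with 2q below the supremum exists iff the supremum exceeds 4.
    return max_integrability_exponent(omega) > 4.0

print(in_S_for_some_q(1.5 * math.pi))            # concave corner below 2*pi
print(in_S_for_some_q(2.0 * math.pi))            # crack
print(in_S_for_some_q(math.pi, mixed=True))      # smooth mixed boundary
```

The crack ($\alpha_j = 2\pi$) and the smooth mixed boundary ($\alpha_j = \pi$) are exactly the borderline cases $\omega = 1/2$, which is why they are excluded from Theorem 4.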

The case $\alpha_j(\phi) = 2\pi$ for $j \in \Theta_N \cup \Theta_D$ corresponds to a crack. The case $\alpha_j(\phi) = \pi$ for $j \in \Theta_M$ corresponds to the boundary point of the mixed boundary conditions on a smooth boundary, which we call the smooth mixed boundary. In the next section, we shall show how to evaluate the shape derivative $g_i$ in these cases.

7. Evaluation of gi by generalized J-integral

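To evaluate the shape derivative $g_i$ in the cases $\alpha_j(\phi) = 2\pi$ for $j \in \Theta_N \cup \Theta_D$ and $\alpha_j(\phi) = \pi$ for $j \in \Theta_M$, we use the generalized J integral. The generalized J integral is defined in terms of the solution to an elliptic boundary value problem and a domain variation. Here, using the solution $u = u(\phi) \in U$ to Problem 1 and a domain variation $\varphi \in Y$, and following [3–5], we define the generalized J integral as

\[ J(\Omega(\phi), \varphi, u) = P(\partial\Omega(\phi), \varphi, u) + R(\Omega(\phi), \varphi, u), \tag{9} \]

where

\[ P(\partial\Omega(\phi), \varphi, u) = \int_{\partial\Omega(\phi)} \Big[ \frac{1}{2}(\nabla u \cdot \nabla u)\, \nu \cdot \varphi - \partial_\nu u\, \nabla u \cdot \varphi \Big]\,d\gamma, \tag{10} \]

\[ R(\Omega(\phi), \varphi, u) = -\int_{\Omega(\phi)} \Big[ b \nabla u \cdot \varphi - \nabla u \cdot (\nabla\varphi^T \nabla u) + \frac{1}{2}(\nabla u \cdot \nabla u)\, \nabla \cdot \varphi \Big]\,dx. \tag{11} \]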

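For $J$, the following properties have been obtained [3].

Theorem 5 (Properties of gen. J-integral) For $\phi \in D$, let $J(\Omega(\phi), \varphi, u)$ be defined in (9) with the weak solution $u \in U$ to Problem 1 and domain variation $\varphi \in Y$. For all $\varphi \in Y$, the following hold.

(1) $R(\Omega(\phi), \varphi, u)$ has a finite value for $u \in U$.

(2) For a Lipschitz domain $\Sigma \subset \mathbb{R}^2$, if $u|_{\Sigma \cap \Omega(\phi)}$ is of class $H^2$, then

\[ J(\Sigma \cap \Omega(\phi), \varphi, u) = 0 \tag{12} \]

holds.

(3) Let $\Sigma \subset \mathbb{R}^2$ be separated into $\Sigma_1$ and $\Sigma_2$ such that $\Sigma_1 \cap \Sigma_2 = \emptyset$ and $\Sigma = \Sigma_1 \cup \Sigma_2$. If $u$ is of class $H^2$ on a neighborhood of $\partial\Sigma_1$ and $\partial\Sigma_2$, then

\[ J(\Sigma \cap \Omega(\phi), \varphi, u) = J(\Sigma_1 \cap \Omega(\phi), \varphi, u) + J(\Sigma_2 \cap \Omega(\phi), \varphi, u) \tag{13} \]

holds.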

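Let us rewrite $g_i$ using the properties in Theorem 5. The partial Fréchet derivatives of $P$ and $R$ with respect to an arbitrary variation $v_i \in U$ of $u$ can be written as

\[ -P_u(\partial\Omega(\phi), \varphi, u)[v_i] = \int_{\partial\Omega(\phi)} \big[ (\nabla u \cdot \nabla v_i)\, \nu \cdot \varphi - \partial_\nu u\, \nabla v_i \cdot \varphi - \partial_\nu v_i\, \nabla u \cdot \varphi \big]\,d\gamma, \tag{14} \]

\[ R_u(\Omega(\phi), \varphi, u)[v_i] = -\int_{\Omega(\phi)} \big[ b \nabla v_i \cdot \varphi - \nabla u \cdot (\nabla\varphi^T \nabla v_i) - \nabla v_i \cdot (\nabla\varphi^T \nabla u) + (\nabla u \cdot \nabla v_i)\, \nabla \cdot \varphi \big]\,dx. \tag{15} \]

Fig. 2. Path for the boundary integral of Pu.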

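Here, by comparing (4) and (15), we have

\[ \langle g_i, \varphi \rangle = R_u(\Omega(\phi), \varphi, u)[v_i] + \langle g_{iR}, \varphi \rangle, \tag{16} \]

where

\[ \langle g_{iR}, \varphi \rangle = \int_{\partial\Omega(\phi)} b v_i\, \nu \cdot \varphi\,d\gamma + \int_{\Omega(\phi)} \big( \zeta_{i\phi}(\phi, u, \nabla u) \cdot \varphi + \zeta_i \nabla \cdot \varphi \big)\,dx. \tag{17} \]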

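Moreover, denoting the $\epsilon$-neighborhood of the singular points by $B_\Theta = \bigcup_{j \in \Theta} B(x_j(\phi), \epsilon)$, separating $\Omega(\phi)$ into $\Omega(\phi) \setminus B_\Theta$ and $\Omega(\phi) \cap B_\Theta$, and applying properties (2) and (3) in Theorem 5, we have

\[ R_u(\Omega(\phi), \varphi, u)[v_i] = -P_u(\partial(\Omega(\phi) \setminus B_\Theta), \varphi, u)[v_i] + \sum_{j \in \Theta} R_u(B(x_j(\phi), \epsilon) \cap \Omega(\phi), \varphi, u)[v_i]. \tag{18} \]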

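The dotted line in Fig. 2 shows the path for the boundary integral of $P_u$ around the boundary point of the mixed boundary conditions on the smooth boundary ($\alpha_j(\phi) = \pi$). Here, when $\epsilon \to 0$, the second term on the right-hand side of (18) converges to 0. The first term on the right-hand side of (18) can be written as

\[ -P_u(\partial\Omega(\phi), \varphi, u)[v_i] + \sum_{j \in \Theta} \langle g_{ij}, \varphi \rangle, \tag{19} \]

where

\[ \langle g_{ij}, \varphi \rangle = \lim_{\epsilon \to 0} -\int_0^{\alpha_j(\phi)} \big[ (\nabla u \cdot \nabla v_i)\, \nu \cdot \varphi - \partial_\nu u\, \nabla v_i \cdot \varphi - \partial_\nu v_i\, \nabla u \cdot \varphi \big]\, \epsilon\,d\theta. \tag{20} \]

Hence, if the right-hand side of (20) converges, we have

\[ \langle g_i, \varphi \rangle = -P_u(\partial\Omega(\phi), \varphi, u)[v_i] + \sum_{j \in \Theta} \langle g_{ij}, \varphi \rangle + \langle g_{iR}, \varphi \rangle. \tag{21} \]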

8. gi at crack tip and smooth mixed boundary

Based on the result in (21), we show the analytic solutions of $g_{ij}$ in two cases as follows.

One case is that of a crack tip on $\Gamma_N(\phi) \cup \Gamma_D(\phi)$, i.e., $x_j$ with $\alpha_j(\phi) = 2\pi$ for $j \in \Theta_N \cup \Theta_D$. In a neighborhood of the point, we have the solution $u$ to Problem 1 by (5) and the solution $v_i$ to Problem 3 by

\[ v_i(r e^{i\theta}) = l_{ij}(\phi)\, r^{\pi/\alpha_j(\phi)} \cos\frac{\pi}{\alpha_j(\phi)}\theta + v_{iR}, \tag{22} \]

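where $l_{ij}(\phi)$ is a constant, and $v_{iR}$ is a term in $H^2(B(x_j(\phi), \epsilon) \cap \Omega(\phi))$. In the following, we neglect the regular terms $u_R$ and $v_{iR}$ by taking a sufficiently small $\epsilon$. Here, putting $r = \epsilon$, $\alpha_j(\phi) = 2\pi$, and calculating the derivatives of (5) and (22), we have

\[ \nabla u = \begin{pmatrix} \cos\theta\, \dfrac{\partial}{\partial r} - \dfrac{\sin\theta}{r} \dfrac{\partial}{\partial\theta} \\[1ex] \sin\theta\, \dfrac{\partial}{\partial r} + \dfrac{\cos\theta}{r} \dfrac{\partial}{\partial\theta} \end{pmatrix} u = \frac{k_j}{2\epsilon^{1/2}} \begin{pmatrix} \cos(\theta/2) \\ \sin(\theta/2) \end{pmatrix}, \tag{23} \]

\[ \nabla v_i = \frac{l_{ij}}{2\epsilon^{1/2}} \begin{pmatrix} \cos(\theta/2) \\ \sin(\theta/2) \end{pmatrix}. \tag{24} \]

From these results, we have

\[ \nabla u \cdot \nabla v_i = \frac{k_j l_{ij}}{4\epsilon}. \tag{25} \]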

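Then, for all $\varphi = (\varphi_1, \varphi_2)^T \in \mathbb{R}^2$,

\[ -\int_0^{2\pi} (\nabla u \cdot \nabla v_i)\, \nu \cdot \varphi\, \epsilon\,d\theta = \int_0^{2\pi} \frac{k_j l_{ij}}{4} (\varphi_1 \cos\theta + \varphi_2 \sin\theta)\,d\theta = 0 \tag{26} \]

holds. Moreover, we have

\[ \partial_\nu u = \nu \cdot \nabla u = \frac{k_j}{2\epsilon^{1/2}} \begin{pmatrix} -\cos\theta \\ -\sin\theta \end{pmatrix} \cdot \begin{pmatrix} \cos(\theta/2) \\ \sin(\theta/2) \end{pmatrix} = -\frac{k_j}{2\epsilon^{1/2}} \cos\frac{\theta}{2}, \tag{27} \]

\[ \partial_\nu u\, \nabla v_i = -\frac{k_j l_{ij}}{4\epsilon} \cos\frac{\theta}{2} \begin{pmatrix} \cos(\theta/2) \\ \sin(\theta/2) \end{pmatrix}. \tag{28} \]

Then, for all $\varphi = (\varphi_1, \varphi_2)^T \in \mathbb{R}^2$,

\[ \int_0^{2\pi} \partial_\nu u\, \nabla v_i \cdot \varphi\, \epsilon\,d\theta = \int_0^{2\pi} \partial_\nu v_i\, \nabla u \cdot \varphi\, \epsilon\,d\theta = -\frac{k_j l_{ij}}{4} \begin{pmatrix} \pi \\ 0 \end{pmatrix} \cdot \begin{pmatrix} \varphi_1 \\ \varphi_2 \end{pmatrix} \tag{29} \]

holds. From these results, the analytic solution at the crack tip can be obtained as

\[ \langle g_{ij}, \varphi \rangle = -\frac{k_j l_{ij}}{2} \begin{pmatrix} \pi \\ 0 \end{pmatrix} \cdot \begin{pmatrix} \varphi_1 \\ \varphi_2 \end{pmatrix}. \tag{30} \]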

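We can confirm that $g_{ij}$ points to the crack plane.

The other case is that of the boundary point of the mixed boundary conditions on a smooth boundary, i.e., $x_j$ with $\alpha_j(\phi) = \pi$ for $j \in \Theta_M$. In a neighborhood of the point, using (7) for $u(r e^{i\theta})$, and (22) for $v_i(r e^{i\theta})$ in which $\cos$ and $\alpha_j(\phi)$ are replaced by $\sin$ and $2\alpha_j(\phi)$, we use

\[ \int_0^{\pi} \frac{k_j l_{ij}}{4} (\varphi_1 \cos\theta + \varphi_2 \sin\theta)\,d\theta = \frac{k_j l_{ij}}{2} \varphi_2 \]

instead of (26). Moreover, we have

\[ \int_0^{\pi} \partial_\nu u\, \nabla v_i \cdot \varphi\, \epsilon\,d\theta = \int_0^{\pi} \partial_\nu v_i\, \nabla u \cdot \varphi\, \epsilon\,d\theta = \frac{k_j l_{ij}}{8} \begin{pmatrix} \pi \\ -2 \end{pmatrix} \cdot \begin{pmatrix} \varphi_1 \\ \varphi_2 \end{pmatrix} \]

instead of (29). Then, the analytic solution at the boundary point can be obtained as

\[ \langle g_{ij}, \varphi \rangle = \frac{k_j l_{ij}}{4} \begin{pmatrix} \pi \\ 0 \end{pmatrix} \cdot \begin{pmatrix} \varphi_1 \\ \varphi_2 \end{pmatrix}. \tag{31} \]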

From the equations above, we have the following.

Theorem 6 ($g_i$ at crack tip and s. mixed bound.) If $\alpha_j(\phi) = 2\pi$ at $x_j(\phi)$ for $j \in \Theta_N \cup \Theta_D$, $g_i$ defined by (4) is given by (21), where $g_{ij}$ is given by (30). If $\alpha_j(\phi) = \pi$ at $x_j(\phi)$ for $j \in \Theta_M$, $g_i$ defined by (4) is given by (21), where $g_{ij}$ is given by (31).
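The closed forms (30) and (31) can be checked numerically by evaluating the arc integral in (20) with only the singular terms of $u$ and $v_i$ retained. The sketch below does this with a midpoint rule; the function name `g_ij`, the sample constants and the choice of quadrature are assumptions made for illustration only.

```python
import math

def _dot(a, b):
    return a[0] * b[0] + a[1] * b[1]

def g_ij(kind, k, l, phi, eps=1e-3, n=4000):
    """Midpoint-rule value of the arc integral in (20), regular terms neglected.

    kind is "crack" (alpha = 2*pi, cos-type terms as in (5) and (22)) or
    "mixed" (alpha = pi, sin-type terms as in (7)).
    """
    alpha = 2.0 * math.pi if kind == "crack" else math.pi
    h = alpha / n
    scale = 1.0 / (2.0 * math.sqrt(eps))
    total = 0.0
    for step in range(n):
        t = (step + 0.5) * h
        c, s = math.cos(t / 2.0), math.sin(t / 2.0)
        if kind == "crack":
            direction = (c, s)       # gradient direction from (23)
        else:
            direction = (-s, c)      # gradient of r**0.5 * sin(theta / 2)
        grad_u = (k * scale * direction[0], k * scale * direction[1])
        grad_v = (l * scale * direction[0], l * scale * direction[1])
        nu = (-math.cos(t), -math.sin(t))   # normal used in (27)
        integrand = (_dot(grad_u, grad_v) * _dot(nu, phi)
                     - _dot(nu, grad_u) * _dot(grad_v, phi)
                     - _dot(nu, grad_v) * _dot(grad_u, phi))
        total -= integrand * eps * h
    return total

k, l, phi = 2.0, 3.0, (1.0, 5.0)
print(g_ij("crack", k, l, phi), -k * l * math.pi * phi[0] / 2.0)   # compare with (30)
print(g_ij("mixed", k, l, phi), k * l * math.pi * phi[0] / 4.0)    # compare with (31)
```

Because only the $r^{1/2}$ terms are kept, the value does not depend on `eps`; with the regular terms included, the same quantity would be obtained only in the limit $\epsilon \to 0$.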

From the calculation above, it becomes clear that the shape derivative cannot be evaluated in the case of an opening angle greater than $\pi$ for the boundary point of the mixed boundary conditions, because $g_{ij} \to \infty$ as $\epsilon \to 0$.

9. Conclusions

In the present paper, we showed the following.

(1) If the assumption in Theorem 4 is satisfied, then the solutions to the main problem and the adjoint problem are included in the admissible set of state variable $S$ in (2).

(2) The shape derivatives at the crack tip and the boundary point of the mixed boundary conditions on a smooth boundary are obtained as stated in Theorem 6.

Acknowledgments

The present study was supported by JSPS KAKENHI (23540258-1).

References

[1] H. Azegami, Regularized solution to shape optimization problem (in Japanese), Trans. JSIAM, 24 (2014), 83–138.

[2] K. Ohtsuka, Generalized J-integral and its applications I –Basic theory–, Japan J. Appl. Math., 2 (1985), 329–350.

[3] K. Ohtsuka and A. Khludnev, Generalized J-integral method for sensitivity analysis of static shape design, Control and Cybernetics, 29 (2000), 513–533.

[4] M. Kimura, Shape derivative of minimum potential energy: abstract theory and applications, in: Proc. of Jindřich Nečas Center for Mathematical Modeling Lecture Notes Volume IV, Topics in Mathematical Modeling, M. Beneš and E. Feireisl eds., pp. 1–38, Matfyzpress, Prague, 2008.

[5] M. Kimura and I. Wakano, Shape derivative of potential energy and energy release rate in fracture mechanics, J. Math-for-Industry, 3 (2011), 21–31.

[6] R. A. Adams and J. J. F. Fournier, Sobolev Spaces, 2nd ed., Academic Press, Amsterdam, 2003.

[7] R. S. Lehman, Developments at an analytic corner of solutions of elliptic partial differential equations, J. Math. Mech., 8 (1959), 727–760.

[8] G. Strang and G. J. Fix, An Analysis of the Finite Element Method, Prentice-Hall, Englewood Cliffs, N.J., 1973.

[9] P. Grisvard, Elliptic Problems in Nonsmooth Domains, Pitman Advanced Pub. Program, Boston, 1985.


JSIAM Letters Vol.6 (2014) pp.33–36 ©2014 Japan Society for Industrial and Applied Mathematics

On ramifications of Artin-Schreier extensions of surfaces over algebraically closed fields of positive characteristic I

Masao Oi1

1 Department of Mathematics, Kyoto University, Oiwake-cho, Sakyo-ku, Kyoto 606-8502, Japan

E-mail ooimasao math.kyoto-u.ac.jp

Received September 25, 2013, Accepted December 26, 2013

Abstract

We give an algorithm which computes r, defined by K. Kato in the paper [1], which is an important invariant for Artin-Schreier extensions of surfaces X over fields of positive characteristic. The Swan conductor gives the invariant of ramifications concerning codimension 1 subvarieties of X. This r gives the invariant of ramifications concerning codimension 2 subvarieties of X. The invariant r is important for calculating the Euler-Poincaré characteristic of a smooth l-adic sheaf of rank 1 on an open dense subscheme U of X.

Keywords Swan conductor, ramification, surface, Artin-Schreier extension

Research Activity Group Algorithmic Number Theory and Its Applications

1. Introduction

For a perfect field $F$ of characteristic $p > 0$, a smooth surface $X$ over $\mathrm{Spec}(F)$, and an open subscheme $U$ of $X$ such that $D := (X \setminus U)_{\mathrm{red}}$ is a simple normal crossing divisor, we consider an Artin-Schreier extension of $X$, namely a degree $p$ étale covering of $U$. Such an extension is characterized by a character $\chi : \pi_1(U) \to \mathbb{Q}_l$ of exact order $p$, where $l$ is a prime not equal to $p$. The ramification of $\chi$ is described by the Swan conductor $\mathrm{Sw}(\chi)$, but this is not enough to understand the ramification of the Artin-Schreier extension. Kato defined an important invariant $r$ depending on a closed point $x$ of $X$ and on $\chi$. We review Kato's definition of $r_x$ in [1, Remark 5.7], and we define $r'_x$, which satisfies $r_x \ge r'_x \ge 0$ and $r_x = r'_x$ for “almost all” extensions. The purpose of this paper is to give an algorithm to compute $r'_x$ in the case $X = \mathrm{Spec}(F[x, y])$. Here $F$ is an algebraically closed field of characteristic $p > 0$.

The Euler-Poincaré characteristic of a smooth $l$-adic sheaf $\mathcal{F}_\chi$ is calculated by $r_x$, where $\mathcal{F}_\chi$ is the rank 1 sheaf on $U_{\mathrm{et}}$ corresponding to $\chi$. This is written in Section 4.

I would like to thank the referee for useful comments.

2. Review of the Swan conductor of a discrete valuation ring

2.1 Basic notation

Let $F$ be an algebraically closed field of characteristic $p > 0$. Let $X$ be a smooth surface over $\mathrm{Spec}(F)$. Let $D$ be a simple normal crossing divisor on $X$. Let $\mathrm{cov}_p : Y \to X$ be a morphism of schemes of degree $p$, which is étale over $U := X - D$. Let $K$ be the function field of $X$, and $K'$ be the function field of $Y$. By the Artin-Schreier theory, there exists $f \in K$ such that $K' = K(\alpha)$, where $\alpha$ is a solution of the equation

\[ \alpha^p - \alpha = f. \tag{1} \]

2.2 Review of the Swan conductor of an imperfect residue field

Let $L$ be a discrete valuation field whose residue field $F'$ is a field of characteristic $p > 0$. Let $O_L$ be the integer ring of $L$. Let $v_L$ be the normalized additive valuation of $L$. We define the Swan conductor of an Artin-Schreier extension $L'/L$ as follows. We may assume $L' = L(\alpha)$, where $\alpha^p - \alpha = f \in L$. Then we define

\[ \mathrm{Sw}(L'/L) := \min\{ \max\{-v_L(g), 0\} \mid g \equiv f \bmod B(L) \}. \tag{2} \]

Here we put $B(L) := \{x^p - x \mid x \in L\}$. We also use the notation $\mathrm{Sw}(f)$, $\mathrm{Sw}_L(f)$ or $\mathrm{Sw}(\chi)$ instead of $\mathrm{Sw}(L'/L)$, where $\chi$ is a non-trivial character of $\mathrm{Gal}(L'/L)$.

Moreover, let $X$, $Y$, $K$, and $K'$ be as in Section 2.1, and $D$ be an irreducible codimension 1 subvariety of $X$. Then, the normalized additive valuation of $K$ is denoted by $v_D$. For a non-trivial character $\chi$ of $\mathrm{Gal}(K'/K)$, we define $\mathrm{Sw}_D(\chi)$ as above, and define $\mathrm{Sw}(\chi)$ by

\[ \mathrm{Sw}(\chi) = \sum_{D \subset X} \mathrm{Sw}_D(\chi) \cdot D, \tag{3} \]

where $D$ runs through all irreducible codimension 1 subvarieties of $X$. We give a concrete description of $\mathrm{Sw}(f)$ next.

2.3 A concrete description of the Swan conductor of a discrete valuation field

Replacing $f$ by $g$ such that $g \equiv f \bmod B(L)$, we may assume that one of the following (i), (ii) or (iii) holds:

(i) $f \in O_L$.

(ii) $v_L(f) = -n$ such that $n > 0$ and $(n, p) = 1$.

(iii) $f = a t^{-n}$ where $a \in O_L^\times$, $n > 0$, $p \mid n$, $t$ is a prime element of $L$, and the residue class of $a$ in $F'$ does not belong to $F'^p := \{x^p \mid x \in F'\}$.

In the case (i), we see $\mathrm{Sw}(f) = 0$. In the case (ii) or (iii), we see $\mathrm{Sw}(f) = n$.
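As an illustration of the reduction to cases (i) and (ii), the sketch below treats a simplified setting that is an assumption of this example and not the setting of the paper: $L = \mathbb{F}_p((t))$, whose residue field $\mathbb{F}_p$ is perfect, and $f$ a Laurent polynomial in $t$. There a term $c\,t^{-pk}$ is congruent to $c\,t^{-k}$ modulo $B(L)$, because $(c\,t^{-k})^p - c\,t^{-k} \in B(L)$ and $c^p = c$ in $\mathbb{F}_p$, so case (iii) never occurs. The function name and the input encoding are hypothetical.

```python
def swan_conductor(poles, p):
    """Swan conductor of f = sum of c * t**(-n) over F_p((t)).

    poles maps a pole order n > 0 to its coefficient c in F_p; terms in O_L
    are irrelevant and may be omitted.
    """
    terms = {n: c % p for n, c in poles.items() if n > 0 and c % p}
    while True:
        divisible = [n for n in terms if n % p == 0]
        if not divisible:
            break
        n = max(divisible)
        c = terms.pop(n)
        # Replace c * t**(-n) by c * t**(-n // p), which differs by an element of B(L).
        merged = (terms.get(n // p, 0) + c) % p
        if merged:
            terms[n // p] = merged
        else:
            terms.pop(n // p, None)
    # Case (i) if no pole remains, case (ii) otherwise.
    return max(terms, default=0)

print(swan_conductor({9: 1}, 3))          # t**-9 reduces to t**-1
print(swan_conductor({9: 1, 1: 2}, 3))    # t**-9 - t**-1 lies in B(L)
print(swan_conductor({6: 1, 4: 1}, 3))    # pole order 4 survives
```

When the residue field is imperfect, as in Section 2.2, a coefficient that is not a $p$-th power blocks this reduction, which is exactly case (iii).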


2.4 The definition of cleanness

Let $X$, $F$, $D$, and $Y$ be as in Section 2.1. The definition of cleanness is written in (3.6) of [1]. We define the cleanness at the closed point $x \in D$ following Kato's paper [1]. We say $(X, D)$ is “Case (I)” at $x$ if $D$ has one irreducible component $D'_1$ at $x$, and we say $(X, D)$ is “Case (II)” at $x$ if $D$ has two irreducible components $D'_1$ and $D'_2$ at $x$. From now on, we always assume there exists $t_1 \in O_{X,x}$ such that $t_1$ is a generator of the defining ideal of $D$ in $O_{X,x}$ in Case (I), and there exist $t_1, t_2 \in O_{X,x}$ such that $t_1 t_2$ is a generator of the defining ideal of $D$ in $O_{X,x}$ in Case (II).

Case (I): We say (1) is clean at $x \in D$ if and only if there exists $g \in K$ such that $g - f \in B(K)$ and one of the following holds:

(I-1) $g \in O_{X,x}$.
(I-2) $g = u/t_1^n$ ($u \in O_{X,x}^\times$, $n \ge 1$, $\gcd(n, p) = 1$).
(I-3) $g = t/t_1^n$ ($n \ge 1$, $t \in O_{X,x}$ and $(t_1, t)$ is the maximal ideal).

Case (II): We say (1) is clean at $x \in D$ if and only if there exists $g \in K$ such that $g - f \in B(K)$ and one of the following holds:

(II-1) $g \in O_{X,x}$.
(II-2) $g = u/(t_1^a t_2^b)$ ($u \in O_{X,x}^\times$, $a, b \ge 1$ and $\gcd(a, b, p) = 1$).

Here $t_1 t_2 \in O_{X,x}$ is a generator of the defining ideal of $D$ in $O_{X,x}$.

3. Review of the definition of r

Let $X$, $F$, $D$, $x \in D$, and $f$ be as in Section 2.4. Then $r_{(X,D)}(x, f)$ is inductively defined as follows. If (1) is clean at $x$, $r_{(X,D)}(x, f) := 0$. Otherwise,

\[ r_{(X,D)}(x, f) := \mu + \sum_{x' \in \mathrm{pr}^{-1}(x)} r_{(X', \mathrm{pr}^{-1}(D))}(x', f), \tag{4} \]

where $X'$ is the blow-up of $X$ at $x$, and $\mathrm{pr} : X' \to X$ is the natural map. Here we put $\mu := e(e - 1)$ (resp. $\mu := e^2$) in Case (I) (resp. in Case (II)) and

\[ e := \mathrm{Sw}_{D'_1}(f) - \mathrm{Sw}_{\mathrm{pr}^{-1}(x)}(f) \quad \text{in Case (I)}, \]

\[ e := \mathrm{Sw}_{D'_1}(f) + \mathrm{Sw}_{D'_2}(f) - \mathrm{Sw}_{\mathrm{pr}^{-1}(x)}(f) \quad \text{in Case (II)}. \]

It is known that $r_{(X,D)}(x, f)$ is well defined (by finite steps) and finite.

3.1 The definition of r′ (Special case)

Throughout this section, we always assume $X = \mathrm{Spec}(F[x, y])$, $D := D_1$ or $D_2$, and $\mathrm{pt} = (0, 0)$, where

\[ D_1 := \{(x, y) \in F^2 \mid x = 0\}, \]

\[ D_2 := \{(x, y) \in F^2 \mid x = 0 \text{ or } y = 0\}. \]

Let $X'$ be the blow-up of $X$ at $\mathrm{pt} = (0, 0)$. Then $X'$ is covered by two open sets $U_1$ and $U_2$, where $U_1 = \mathrm{Spec}(F[y, w])$ and $U_2 = \mathrm{Spec}(F[x, s])$. In $U_1 \cap U_2$, $x = yw$ and $y = xs$ are satisfied.

Now $r'_{(X,D)}(\mathrm{pt}, f)$ is inductively defined as follows. It is known that $r'$ is well defined (by finite steps) and finite. If (1) is clean at $\mathrm{pt}$, $r'_{(X,D)}(\mathrm{pt}, f) := 0$. Otherwise,

\[ r'_{(X,D)}(\mathrm{pt}, f) := \mu + \sum_{i=1,2} r'_{(U_i, \mathrm{pr}^{-1}(D) \cap U_i)}(\mathrm{pt}_i, f). \]

Here $\mathrm{pt}_1 = (0, 0) \in U_1$ in the coordinates $(y, w)$, and $\mathrm{pt}_2 = (0, 0) \in U_2$ in the coordinates $(x, s)$. Note that $U_i \cong X$ as schemes for $i = 1, 2$, and

\[ \mathrm{pr}^{-1}(D_i) \cap U_j \cong D_1 \quad \text{if } i = 1,\ j = 2, \]

\[ \mathrm{pr}^{-1}(D_i) \cap U_j \cong D_2 \quad \text{otherwise}. \]

Note that we only treat the case $X = \mathrm{Spec}(F[x, y])$, but we can easily generalize the definition of $r'$ to any smooth surface $X$.

3.2 An algorithm to compute r′ (I )

For simplicity, we may assume that $X = \mathrm{Spec}(F[x, y])$, $D := D_1$ or $D_2$, $\mathrm{pt} = (0, 0)$, and

\[ f = c_0 \frac{y^m}{x^n} + c_1 \frac{y^{m+b_1}}{x^{n+a_1}} + \cdots + c_k \frac{y^{m+b_k}}{x^{n+a_k}} \]

for some natural number $k$ and $c_i \in F^\times$ ($i = 0, 1, 2, \ldots, k$). Throughout this paper, we assume $a_0 = b_0 = 0$. We may assume $(a_i, b_i) \ne (a_j, b_j)$ for $0 \le i \ne j \le k$. From now on, we always assume $p \nmid \gcd(n + a_i, m + b_i)$ for any $i$.

Theorem 1 Let $X$, $D$, $\mathrm{pt}$, and $f$ be as above. Then $r'_{(X,D_i)}(\mathrm{pt}, f)$ depends only on $n, m, a_1, b_1, \ldots, a_k, b_k$.

Proof The function $f$ is written as

\[ f = \sum_{i=0}^{k} c_i \frac{y^{m-n+b_i-a_i}}{w^{n+a_i}} \quad \text{in Case (I)}, \]

\[ f = \sum_{i=0}^{k} c_i \frac{s^{m+b_i}}{x^{n-m+a_i-b_i}} \quad \text{in Case (II)}. \]

In this blow-up, $e$ is written as

\[ e = \max_{0 \le i \le k}\{n + a_i, 0\} + \max_{0 \le i \le k}\{-m - b_i, 0\} - \max_{0 \le i \le k}\{n - m + a_i - b_i, 0\}. \]

From the definition of $r'$, we see

\[ r'_D(X, \mathrm{pt}, f) = r'_{D_2}(X', \mathrm{pt}_1, f) + r'_D(X', \mathrm{pt}_2, f) + \mu. \tag{5} \]

It is easy to see that $\mu$ depends only on $n, m, a_1, b_1, \ldots, a_k, b_k$. By induction (explained in detail in the next section), we see that $r'$ depends only on $n, m, a_1, b_1, \ldots, a_k, b_k$. This completes the proof of the theorem. It allows us to write $r'_{D_i}(n, m; a_1, b_1, \ldots, a_k, b_k)$ or $r'_{D_i}(n, m; (a_i, b_i)_{1 \le i \le k})$ instead of $r'_{(X,D_i)}(\mathrm{pt}, f)$.

(QED)

4. Some mathematical properties of r′ and an algorithm to compute r′

In this section, we give a theorem and an algorithm to compute r′. The difference between r and r′ is described in Section 4.3.

4.1 Some concrete example for r′

Theorem 2 In Case (I), $r'_{D_1}(n, m) = \max\{m - 1, 0\}\, n$. In Case (II), if $nm > 0$, then $r'_{D_2}(n, m) = nm$, and if $nm \le 0$, then $r'_{D_2}(n, m) = 0$.

Proof We prove this theorem by induction on $|n| + |m|$. First we treat Case (I). As in the proof of Theorem 1, we obtain $r'_{D_1}(n, m) = r'_{D_2}(n, m - n) + r'_{D_1}(n - m, m) + e(e - 1)$. If $n > m \ge 1$, the right-hand side of this equality is $(n - m)(m - 1) + m(m - 1) = n(m - 1)$ by the induction hypothesis, and if $m \ge n \ge 1$, the right-hand side is $n(m - n) + n(n - 1) = n(m - 1)$ by the induction hypothesis. Next, we treat Case (II). By the previous subsection, we obtain $r'_{D_2}(n, m) = r'_{D_2}(n, m - n) + r'_{D_2}(n - m, m) + e^2$. The right-hand side of this equality is $(n - m)m + m^2 = nm$ if $n > m \ge 1$, and $n(m - n) + n^2 = nm$ if $m \ge n \ge 1$. The proof for $mn \le 0$ is similar.

(QED)
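The two recurrences used in this proof can be run directly for the monomial case $k = 0$, $f = y^m/x^n$. The sketch below is only an illustration: the base cases encode an assumed reading of cleanness for a monomial (clean in Case (I) when $n \le 0$ or $0 \le m \le 1$, clean in Case (II) when $nm \le 0$), and the function name `r_prime` is hypothetical.

```python
from functools import lru_cache

@lru_cache(maxsize=None)
def r_prime(case, n, m):
    """r' for f = y**m / x**n; case 1 means D = D1 (with m >= 0), case 2 means D = D2."""
    if case == 1:
        if n <= 0 or m <= 1:          # assumed clean: (I-1), (I-2) or (I-3)
            return 0
        e = n - max(n - m, 0)         # Sw along x = 0 minus Sw along the exceptional curve
        return e * (e - 1) + r_prime(2, n, m - n) + r_prime(1, n - m, m)
    if n * m <= 0:                    # assumed clean in Case (II)
        return 0
    e = max(n, 0) + max(-m, 0) - max(n - m, 0)
    return e * e + r_prime(2, n, m - n) + r_prime(2, n - m, m)

# Compare with the closed forms of Theorem 2 on a small range.
ok_case_1 = all(r_prime(1, n, m) == max(m - 1, 0) * n
                for n in range(1, 10) for m in range(0, 10))
ok_case_2 = all(r_prime(2, n, m) == (n * m if n * m > 0 else 0)
                for n in range(-6, 7) for m in range(-6, 7))
print(ok_case_1, ok_case_2)
```

The arguments $(n, m - n)$ and $(n - m, m)$ are the exponents of $f$ in the charts $U_1$ and $U_2$ of Section 3.1, so each recursive call corresponds to one chart of the blow-up.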

4.2 An algorithm to compute r′ (II )

The following theorem is essential to compute r′.

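Theorem 3 If $a_j \ge a_i$ and $b_j \le b_i$ ($i \ne j$), then we have

\[ r'_D(n, m; (a_{i'}, b_{i'})_{1 \le i' \le k}) = r'_D(n, m; (a_{i'}, b_{i'})_{1 \le i' (\ne i) \le k}). \]

Proof We assume that

\[ f \in \frac{y^{m'}}{x^{n'}} O_{X,\mathrm{pt}}^\times + \frac{y^{m'+b'_1}}{x^{n'+a'_1}} O_{X,\mathrm{pt}}^\times + \cdots + \frac{y^{m'+b'_l}}{x^{n'+a'_l}} O_{X,\mathrm{pt}}^\times \]

for some natural number $l$. We see that $r'_{(X,D)}(\mathrm{pt}, f) = r'(n', m'; a'_1, b'_1, \ldots, a'_l, b'_l)$ by the equality (5) and that $O_{X,\mathrm{pt}}^\times \subset O_{U_i,\mathrm{pt}_i}^\times$ for $i = 1, 2$. Note that if $a_j \ge a_i$ and $b_j \le b_i$, then we have

\[ c_j \frac{y^{m'+b'_j}}{x^{n'+a'_j}} O_{X,\mathrm{pt}}^\times + c_i \frac{y^{m'+b'_i}}{x^{n'+a'_i}} O_{X,\mathrm{pt}}^\times \in \frac{y^{m'+b'_j}}{x^{n'+a'_j}} O_{X,\mathrm{pt}}^\times. \]

This completes the proof of the theorem.

(QED)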

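We denote

\[ J_a := \Big\{ j \ \Big|\ b_j < \max_{j+1 \le i \le k} b_i \Big\}, \]

\[ J_b := \Big\{ j \ \Big|\ a_j > \max_{0 \le i \le j-1} a_i \Big\}. \]

Corollary 4 If $0 < a_1 < \cdots < a_k$, then

\[ r'_{D_i}(n, m; (a_j, b_j)_{1 \le j \le k}) = r'_{D_i}(n, m; (a_j, b_j)_{j \in J_a}), \]

and if $b_{j_1} \le 0$,

\[ r'_{D_i}(n, m; (a_j, b_j)_{1 \le j \le k}) = r'_{D_i}(n - a_{j_1}, m - b_{j_1}; (a_j - a_{j_1}, b_j - b_{j_1})_{j (\ne j_1) \in J_a}). \]

Here we put $j_1 = \min J_a$. If $0 < b_1 < \cdots < b_k$, then

\[ r'_{D_i}(n, m; (a_j, b_j)_{1 \le j \le k}) = r'_{D_i}(n, m; (a_j, b_j)_{j \in J_b}). \]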

Similar statements hold for $r$. By Theorem 3, it suffices to compute $r'$ when $0 < a_1 < \cdots < a_k$ and $0 < b_1 < \cdots < b_k$.

We call $k$ the length of $r'(n, m; a_1, b_1, \ldots, a_k, b_k)$, and we call $\#J_a$ (or $\#J_a - 1$ in the case $b_{j_1} \le 0$) the essential length of $r'(n, m; a_1, b_1, \ldots, a_k, b_k)$. We call $a_1 + b_1 + \cdots + a_k + b_k$ the depth of $r'$, and we call

\[ \mathrm{depth}_{\mathrm{ess}}(r') := \sum_{j \in J_a} (a_j + b_j) \]

the essential depth of $r'$ if $b_{j_1} > 0$. We call

\[ \mathrm{depth}_{\mathrm{ess}}(r') := \sum_{j_1 \ne j \in J_a} (a_j - a_{j_1} + b_j - b_{j_1}) \]

the essential depth of $r'$ if $b_{j_1} \le 0$. We describe an algorithm to compute $r'_{(X,D)}(n, m; a_1, b_1, \ldots, a_k, b_k)$. It is calculated by induction on the pair of length and essential depth, where the pairs are ordered lexicographically.

(Step 1) By Theorem 3, we may assume that $0 < a_1 < \cdots < a_k$ and $0 < b_1 < \cdots < b_k$.

(Step 2) Using the definition of $r'$, to compute $r'(n, m; a_1, b_1, \ldots, a_k, b_k)$, we need to calculate the values of

(1) $r'(n - m, m; a_1 - b_1, b_1, \ldots, a_k - b_k, b_k)$,
(2) $r'(n, m - n; a_1, b_1 - a_1, \ldots, a_k, b_k - a_k)$.

(Step 3) Note that the essential lengths of (1) and (2) are less than or equal to $k$.

(Case 3-1) If the essential lengths of (1) and (2) are less than $k$, then use the induction on $k$.

(Case 3-2) If the essential length of (1) or (2) is $k$, then all $a_i - b_i$ ($i = 1, 2, \ldots, k$) have the same sign. Then we use the induction on the essential depth of $r'$.

4.3 Difference between r and r′

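Let $X$ be a smooth surface, and $X'$ be the blow-up of $X$ at the closed point $x \in X$. Note that $\mathrm{pr}^{-1}(x) \simeq \mathbb{P}^1_F$, where $\mathrm{pr} : X' \to X$ is the projection. We describe when $r$ and $r'$ differ. For simplicity let $X = \mathrm{Spec}(F[x, y])$, $D := D_1$ or $D_2$, $\mathrm{pt} = (0, 0)$ and

\[ f \in \frac{1}{x^{n_1} y^{n_2}} \Big( \sum_{w=0}^{a} d_w x^w y^{a-w} + \mathfrak{m}^{a+1} \Big) \tag{6} \]

for some non-negative integers $n_1$, $n_2$. Here $\mathfrak{m} = (x, y)$ is the maximal ideal of $O_{X,\mathrm{pt}}$.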

Theorem 5 Let the assumptions be as above. Then $(X', \mathrm{pr}^{-1}(D), f)$ is not clean at a point of $\mathrm{pr}^{-1}(\mathrm{pt})$ other than $(0, 1)$ in Case (I) (other than $(0, 1)$ and $(1, 0)$ in Case (II)) if and only if $h(x, y) := \sum_{w=0}^{a} d_w x^w y^{a-w}$ has a double root at that point. More precisely, the roots of $h(x, y)$ correspond one-to-one with the points of $\mathbb{P}^1_F$. The double roots of $h$ correspond one-to-one with the non-clean points of $\mathbb{P}^1_F$.

Proof This is merely the definition of cleanness.

(QED)

But this difficulty is easily solved by a coordinate transformation (a translation).

Theorem 6 For simplicity, let $X$ and $f$ be as in Section 3.1. We assume $D = D_2$. Let $X_s \to X_{s-1} \to \cdots \to X_0 = X$ be a successive blow-up of $X$ at closed points of Case (II). If $X_s$ is a clean model, then $r_{(X,D)}(\mathrm{pt}, f) = r'_{(X,D)}(\mathrm{pt}, f)$.

Proof This theorem is a consequence of Theorem 5.

(QED)

4.4 Application

Let $X$, $F$, $D$, $Y$, and $\chi$ be as in Section 2. Let $\mathcal{F}_\chi$ be the smooth sheaf on $U_{\mathrm{et}}$ corresponding to $\chi$. Let $\chi_c(U, \mathcal{F})$ be the (compactly supported étale cohomological) Euler number of the sheaf $\mathcal{F}$ on $U_{\mathrm{et}}$. By the calculation of $r$, we can understand explicitly the formula (5.7.1) of [1]. More precisely, we obtain the following equality (cf. [2]).

Theorem 7 (Saito [2]) Let $X$ be a smooth proper surface over $\mathrm{Spec}(F)$. Then

\[ \chi_c(U, \mathcal{F}_\chi) - \chi_c(U) = -\deg c_{\mathcal{F}_\chi}. \tag{7} \]

Here $c_{\mathcal{F}_\chi} \in \mathrm{CH}_0(X)$ is defined by

\[ c_{\mathcal{F}_\chi} := \sum_{x \in X} r_D(x, \chi) \cdot x - \big( \mathrm{Sw}(\chi),\ \mathrm{Sw}(\chi) + K_X^{\log} \big). \tag{8} \]

Note that $K_X^{\log}$ is the log canonical divisor of $(X, D)$, and that $(\ ,\ )$ is the intersection pairing. As is well known, $\chi_c(U, \mathcal{F})$ is defined by

\[ \chi_c(U, \mathcal{F}) = \sum_{i=0}^{\infty} (-1)^i H^i_c(U_{\mathrm{et}}, \mathcal{F}). \tag{9} \]

We simply write $\chi_c(U)$ for $\chi_c(U, \mathbb{Q}_l)$.

4.5 Further topics

Further discussion is given in [3]. In that paper, we treat a general smooth surface X, and give a mathematical description of r.

References

[1] K. Kato, Class field theory, D-modules, and ramification on higher-dimensional schemes, Part I, Amer. J. Math., 116 (1994), 757–784.

[2] T. Saito, The Euler numbers of ℓ-adic sheaves of rank 1 in positive characteristic, in: Proc. of ICM-90 Satellite Conference, A. Fujiki et al. eds., Algebraic Geometry and Analytic Geometry, pp. 165–181, Springer-Verlag, Tokyo, 1991.

[3] M. Oi, On ramifications of Artin-Schreier extensions of surfaces over algebraically closed fields of positive characteristic II, in preparation.


JSIAM Letters Vol.6 (2014) pp.37–39 ©2014 Japan Society for Industrial and Applied Mathematics

Scalar multiplication for twisted Edwards curves using the extended double-base number system

Yasunori Mineo1 and Shigenori Uchiyama1

1 Tokyo Metropolitan University, 1-1 Minami-Osawa, Hachioji, Tokyo 192-0397, Japan

E-mail mineo-yasunori ed.tmu.ac.jp

Received December 13, 2013, Accepted January 17, 2014

Abstract

This paper analyzes the problem of speeding up single-scalar multiplication on a recently introduced type of elliptic curve, the so-called “twisted Edwards curve”, and also presents a new construction of addition chains using the extended double-base number system. Our method uses the Fibonacci sequence. It was found through numerical investigation that our double-base chains can save time compared with other methods in previous work.

Keywords twisted Edwards curves, extended double-base number system, Fibonacci sequence, single-scalar multiplication

Research Activity Group Algorithmic Number Theory and Its Applications

1. Introduction

In [1], Edwards introduced a new normal form for elliptic curves, now known as Edwards curves, for which the addition law is efficient. In [2], Doche and Imbert introduced a new number system called the extended double-base number system. The idea is to expand a positive integer $n$ as a sum $\sum_i d_i 2^{a_i} 3^{b_i}$ of as few terms as possible, where $d_i$ or $-d_i$ is chosen from a coefficient set $S$ larger than $\{1\}$, with the restrictions $a_1 \ge a_2 \ge \cdots$ and $b_1 \ge b_2 \ge \cdots$. Then, one can express a scalar multiple $[n]P$ as a sum $\sum_i [d_i 2^{a_i} 3^{b_i}]P$ of very few points. In [3], Bernstein et al. analyzed the best speeds that can be obtained for single-scalar multiplication with various elliptic curves by using the extended double-base number system.

In this paper, we analyze the best speeds with twisted Edwards curves, introduced in [4], using the conventional coefficient set $S$, as well as another one previously unseen in the literature. Our coefficient set includes a subset of the Fibonacci sequence. By using our new double-base chains, we can speed up single-scalar multiplication.

The plan of the paper is as follows. In Section 2, we recall the definition of the twisted Edwards curves, and of three coordinate systems on these curves [4,5]: projective twisted Edwards; inverted twisted Edwards; and extended twisted Edwards. We also show new tripling formulas that are needed in making up double-base chains. In Section 3, we review the extended double-base number system, and present a new choice of $S$. Our experiments and results are described in Section 4 before concluding with Section 5.
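To make the role of the restrictions $a_1 \ge a_2 \ge \cdots$ and $b_1 \ge b_2 \ge \cdots$ concrete, the sketch below evaluates a double-base chain by a Horner-like scheme on integers, which stand in for curve points: multiplying by 2 or 3 plays the role of a doubling or tripling, and adding $d_i$ plays the role of adding a precomputed multiple $[d_i]P$. The function name, the tuple encoding and the sample chains are assumptions of this illustration, not the chains studied in the paper.

```python
def evaluate_chain(terms):
    """Evaluate sum of d * 2**a * 3**b for terms (d, a, b) listed in chain order.

    Returns (value, doublings, triplings, additions).
    """
    for (_, a0, b0), (_, a1, b1) in zip(terms, terms[1:]):
        if a0 < a1 or b0 < b1:
            raise ValueError("exponents must be non-increasing along the chain")
    acc = 0
    doublings = triplings = 0
    # A sentinel with exponents (0, 0) makes the final scaling step uniform.
    steps = list(terms) + [(0, 0, 0)]
    for (d, a, b), (_, a_next, b_next) in zip(steps, steps[1:]):
        acc += d                                   # add the multiple [d]P
        acc *= 2 ** (a - a_next) * 3 ** (b - b_next)
        doublings += a - a_next
        triplings += b - b_next
    return acc, doublings, triplings, len(terms) - 1

# 127 = 2^2 * 3^3 + 2 * 3^2 + 1
print(evaluate_chain([(1, 2, 3), (1, 1, 2), (1, 0, 0)]))
# 23 = 2^3 * 3 - 1, using a negative coefficient
print(evaluate_chain([(1, 3, 1), (-1, 0, 0)]))
```

With the monotonicity restrictions, the total number of doublings and triplings equals $a_1$ and $b_1$, so the cost of a chain is governed by its leading term and its length.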

2. Twisted Edwards curves

Definition 1 ([4]) Let $k$ be a field of odd characteristic, and $a, d \in k$ with $ad(a - d) \ne 0$. The twisted Edwards curve with coefficients $a$ and $d$ is the curve $E_{E,a,d} : ax^2 + y^2 = 1 + dx^2y^2$.

Let $(0, 1)$ be the neutral element of the group. Then the inverse of $P = (x_1, y_1)$ is $(-x_1, y_1)$.

2.1 Addition law

Let $P = (x_1, y_1)$, $Q = (x_2, y_2)$ be points on the twisted Edwards curve $E_{E,a,d}$. The sum of these points on $E_{E,a,d}$ is

\[ P + Q = \left( \frac{x_1 y_2 + y_1 x_2}{1 + d x_1 x_2 y_1 y_2},\ \frac{y_1 y_2 - a x_1 x_2}{1 - d x_1 x_2 y_1 y_2} \right). \]

This formula also works for doubling, i.e., $P = Q$. From it we can obtain tripling formulas. One can triple a point by first doubling it and then adding the result to itself, applying the curve equation as in doubling. This gives $(x_3, y_3) = [3](x_1, y_1)$, with

\[ x_3 = \frac{(a x_1^2 + y_1^2)^2 - (2 y_1)^2}{4a(a x_1^2 - 1) x_1^2 - (a x_1^2 - y_1^2)^2}\, x_1, \]

\[ y_3 = \frac{(a x_1^2 + y_1^2)^2 - a (2 x_1)^2}{-4 (y_1^2 - 1) y_1^2 + (a x_1^2 - y_1^2)^2}\, y_1. \]
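As a sanity check, the affine tripling formulas can be compared with $P + [2]P$ computed from the addition law on a small curve. The sketch below enumerates all points over $\mathbb{F}_{13}$ for $a = 3$, $d = 2$; these parameters, the brute-force enumeration and all function names are assumptions chosen for illustration. Points at which a denominator vanishes are skipped.

```python
def inv(v, p):
    return pow(v % p, -1, p)      # modular inverse (Python 3.8+)

def add(P, Q, a, d, p):
    """Affine addition law on a*x^2 + y^2 = 1 + d*x^2*y^2 over F_p; None if undefined."""
    (x1, y1), (x2, y2) = P, Q
    t = d * x1 * x2 * y1 * y2 % p
    if (1 + t) % p == 0 or (1 - t) % p == 0:
        return None
    x3 = (x1 * y2 + y1 * x2) * inv(1 + t, p) % p
    y3 = (y1 * y2 - a * x1 * x2) * inv(1 - t, p) % p
    return x3, y3

def triple(P, a, p):
    """Affine tripling formulas of Section 2.1; None if a denominator vanishes."""
    x, y = P
    ax2, y2 = a * x * x % p, y * y % p
    s = (ax2 + y2) ** 2
    den_x = (4 * (ax2 - 1) * ax2 - (ax2 - y2) ** 2) % p
    den_y = (-4 * (y2 - 1) * y2 + (ax2 - y2) ** 2) % p
    if den_x == 0 or den_y == 0:
        return None
    x3 = (s - 4 * y2) * inv(den_x, p) * x % p
    y3 = (s - 4 * ax2) * inv(den_y, p) * y % p
    return x3, y3

def compare(p, a, d):
    """Return (number of points checked, number of mismatches)."""
    checked = mismatches = 0
    for x in range(p):
        for y in range(p):
            if (a * x * x + y * y - 1 - d * x * x * y * y) % p:
                continue                      # not on the curve
            two_p = add((x, y), (x, y), a, d, p)
            three_p = add((x, y), two_p, a, d, p) if two_p else None
            direct = triple((x, y), a, p)
            if three_p is None or direct is None:
                continue
            checked += 1
            mismatches += three_p != direct
    return checked, mismatches

print(compare(13, 3, 2))
```

A mismatch count of zero on every checked point is consistent with the formulas, although a check on one small field is of course not a proof.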

2.2 Coordinates

Bernstein and Lange gave efficient formulas for the group operations. They introduced projective coordinates and inverted coordinates in [4]. In [5], Hisil et al. proposed a new system called extended twisted Edwards coordinates. We review these coordinates and show tripling algorithms.

2.3 Projective twisted Edwards coordinatesTo avoid inversions, Bernstein and Lange work on the

projective twisted Edwards curve

(aX2 + Y 2)Z2 = Z4 + dX2Y 2. (1)

For Z1 = 0, the homogeneous point (X1 : Y1 : Z1) rep-resents the affine point (X1/Z1, Y1/Z1) on EE,a,d.

2.4 Inverted twisted Edwards coordinates

Another way to avoid inversions is to use a point $(X_1 : Y_1 : Z_1)$ with $X_1Y_1Z_1 \neq 0$ to represent the affine point $(Z_1/X_1, Z_1/Y_1)$ on $E_{E,a,d}$.

2.5 Extended twisted Edwards coordinates

Hisil et al. proposed using a point $(X_1 : Y_1 : T_1 : Z_1)$ with $Z_1 \neq 0$, which satisfies (1) and corresponds to the extended affine point $(X_1/Z_1, Y_1/Z_1, T_1/Z_1)$. Here, $T_1 = X_1Y_1/Z_1$. Next, we show new tripling algorithms for these coordinates. In the cost counts, M is a field multiplication, S is a field squaring, and D is a multiplication by $a$ or $d$.

2.6 Tripling in inverted twisted Edwards coordinates

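The following sets of formulas compute $(X_3 : Y_3 : Z_3) = [3](X_1 : Y_1 : Z_1)$. The first costs 9M + 4S + 2D, while the second costs 7M + 7S + 2D. Here are the 9M + 4S + 2D formulas for tripling:

\[ \begin{aligned}
& A \leftarrow X_1^2, \quad B \leftarrow aY_1^2, \quad C \leftarrow Z_1^2, \quad D \leftarrow A + B, \\
& E \leftarrow 4(D - d \cdot C), \quad H \leftarrow 2D \cdot (B - A), \\
& P \leftarrow D^2 - A \cdot E, \quad Q \leftarrow D^2 - B \cdot E, \\
& X_3 \leftarrow (H + Q) \cdot Q \cdot X_1, \quad Y_3 \leftarrow (H - P) \cdot P \cdot Y_1, \\
& Z_3 \leftarrow P \cdot Q \cdot Z_1.
\end{aligned} \]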

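Here are the 7M + 7S + 2D formulas for tripling:

\[ \begin{aligned}
& A \leftarrow X_1^2, \quad B \leftarrow aY_1^2, \quad C \leftarrow Z_1^2, \quad D \leftarrow A + B, \\
& E \leftarrow 4(D - d \cdot C), \quad H \leftarrow 2D \cdot (B - A), \\
& P \leftarrow D^2 - A \cdot E, \quad Q \leftarrow D^2 - B \cdot E, \\
& X_3 \leftarrow (H + Q) \cdot [(Q + X_1)^2 - Q^2 - A], \\
& Y_3 \leftarrow 2(H - P) \cdot P \cdot Y_1, \\
& Z_3 \leftarrow P \cdot [(Q + Z_1)^2 - Q^2 - C].
\end{aligned} \]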
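The two inverted-coordinate tripling variants can be checked against each other and against the affine addition law. The sketch below assumes illustrative parameters (p = 13, a = 4, d = 2) and hypothetical function names. It represents an affine point $(x, y)$ with $xy \neq 0$ as $(X_1 : Y_1 : Z_1) = (y : x : xy)$, so that $(Z_1/X_1, Z_1/Y_1) = (x, y)$.

```python
# Sketch only: parameters and names are illustrative assumptions.

def affine_add(P, Q, a, d, p):
    """Affine twisted Edwards addition law; None if a denominator vanishes."""
    (x1, y1), (x2, y2) = P, Q
    t = d * x1 * x2 * y1 * y2 % p
    den_x, den_y = (1 + t) % p, (1 - t) % p
    if den_x == 0 or den_y == 0:
        return None
    return ((x1 * y2 + y1 * x2) * pow(den_x, -1, p) % p,
            (y1 * y2 - a * x1 * x2) * pow(den_y, -1, p) % p)

def _shared(X1, Y1, Z1, a, d, p):
    """Intermediate values A, C, H, P, Q common to both variants."""
    A, B, C = X1 * X1 % p, a * Y1 * Y1 % p, Z1 * Z1 % p
    D = (A + B) % p
    E = 4 * (D - d * C) % p
    H = 2 * D * (B - A) % p
    P_ = (D * D - A * E) % p
    Q_ = (D * D - B * E) % p
    return A, C, H, P_, Q_

def tripling_9m(X1, Y1, Z1, a, d, p):
    _, _, H, P_, Q_ = _shared(X1, Y1, Z1, a, d, p)
    return ((H + Q_) * Q_ * X1 % p, (H - P_) * P_ * Y1 % p, P_ * Q_ * Z1 % p)

def tripling_7m(X1, Y1, Z1, a, d, p):
    A, C, H, P_, Q_ = _shared(X1, Y1, Z1, a, d, p)
    X3 = (H + Q_) * ((Q_ + X1) ** 2 - Q_ * Q_ - A) % p
    Y3 = 2 * (H - P_) * P_ * Y1 % p
    Z3 = P_ * ((Q_ + Z1) ** 2 - Q_ * Q_ - C) % p
    return (X3, Y3, Z3)

def check_inverted_tripling(p, a, d):
    checked = 0
    for x in range(1, p):
        for y in range(1, p):
            if (a * x * x + y * y - 1 - d * x * x * y * y) % p != 0:
                continue
            doubled = affine_add((x, y), (x, y), a, d, p)
            expected = affine_add(doubled, (x, y), a, d, p) if doubled is not None else None
            X1, Y1, Z1 = y, x, x * y % p
            t9 = tripling_9m(X1, Y1, Z1, a, d, p)
            t7 = tripling_7m(X1, Y1, Z1, a, d, p)
            # The second variant returns every coordinate doubled.
            assert t7 == tuple(2 * v % p for v in t9)
            X3, Y3, Z3 = t9
            if expected is None or X3 == 0 or Y3 == 0:
                continue
            got = (Z3 * pow(X3, -1, p) % p, Z3 * pow(Y3, -1, p) % p)
            assert got == expected
            checked += 1
    return checked

print(check_inverted_tripling(13, 4, 2))
```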

2.7 Tripling in extended twisted Edwards coordinates

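The following sets of formulas compute $(X_3 : Y_3 : T_3 : Z_3) = [3](X_1 : Y_1 : T_1 : Z_1)$. The first costs 11M + 4S + 1D, while the second costs 9M + 7S + 1D. Here are the 11M + 4S + 1D formulas for tripling:

\[ \begin{aligned}
& A \leftarrow aX_1^2, \quad B \leftarrow Y_1^2, \quad C \leftarrow (2Z_1)^2, \quad D \leftarrow A + B, \\
& E \leftarrow D^2, \quad F \leftarrow 2D \cdot (A - B), \quad G \leftarrow E - B \cdot C, \\
& H \leftarrow E - A \cdot C, \quad I \leftarrow F + H, \quad J \leftarrow F - G, \\
& X_3 \leftarrow G \cdot J \cdot X_1, \quad Y_3 \leftarrow H \cdot I \cdot Y_1, \\
& T_3 \leftarrow G \cdot H \cdot T_1, \quad Z_3 \leftarrow I \cdot J \cdot Z_1.
\end{aligned} \]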

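Here are the 9M + 7S + 1D formulas for tripling:

\[ \begin{aligned}
& A \leftarrow aX_1^2, \quad B \leftarrow Y_1^2, \quad C \leftarrow Z_1^2, \quad D \leftarrow A + B, \\
& E \leftarrow D^2, \quad F \leftarrow 2D \cdot (A - B), \quad K \leftarrow 4C, \\
& L \leftarrow E - B \cdot K, \quad M \leftarrow E - A \cdot K, \quad N \leftarrow F + M, \\
& O \leftarrow N^2, \quad P \leftarrow F - L, \quad X_3 \leftarrow 2L \cdot P \cdot X_1, \\
& Y_3 \leftarrow M \cdot [(N + Y_1)^2 - O - B], \quad T_3 \leftarrow 2L \cdot M \cdot T_1, \\
& Z_3 \leftarrow P \cdot [(N + Z_1)^2 - O - C].
\end{aligned} \]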
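The extended-coordinate formulas can be checked in the same brute-force way. The sketch below (illustrative parameters p = 13, a = 4, d = 2, hypothetical names, and an arbitrary projective scaling factor of 3) verifies that the two variants agree up to a factor of 2, that the output satisfies $T_3Z_3 = X_3Y_3$, and that $(X_3/Z_3, Y_3/Z_3)$ matches the affine addition law.

```python
# Sketch only: parameters, scaling factor, and names are illustrative assumptions.

def affine_add(P, Q, a, d, p):
    """Affine twisted Edwards addition law; None if a denominator vanishes."""
    (x1, y1), (x2, y2) = P, Q
    t = d * x1 * x2 * y1 * y2 % p
    den_x, den_y = (1 + t) % p, (1 - t) % p
    if den_x == 0 or den_y == 0:
        return None
    return ((x1 * y2 + y1 * x2) * pow(den_x, -1, p) % p,
            (y1 * y2 - a * x1 * x2) * pow(den_y, -1, p) % p)

def tripling_11m(X1, Y1, T1, Z1, a, p):
    A, B, C = a * X1 * X1 % p, Y1 * Y1 % p, (2 * Z1) ** 2 % p
    D = (A + B) % p
    E = D * D % p
    F = 2 * D * (A - B) % p
    G, H = (E - B * C) % p, (E - A * C) % p
    I, J = (F + H) % p, (F - G) % p
    return (G * J * X1 % p, H * I * Y1 % p, G * H * T1 % p, I * J * Z1 % p)

def tripling_9m(X1, Y1, T1, Z1, a, p):
    A, B, C = a * X1 * X1 % p, Y1 * Y1 % p, Z1 * Z1 % p
    D = (A + B) % p
    E = D * D % p
    F = 2 * D * (A - B) % p
    K = 4 * C % p
    L, M = (E - B * K) % p, (E - A * K) % p
    N = (F + M) % p
    O = N * N % p
    P_ = (F - L) % p
    return (2 * L * P_ * X1 % p,
            M * ((N + Y1) ** 2 - O - B) % p,
            2 * L * M * T1 % p,
            P_ * ((N + Z1) ** 2 - O - C) % p)

def check_extended_tripling(p, a, d, scale=3):
    checked = 0
    for x in range(p):
        for y in range(p):
            if (a * x * x + y * y - 1 - d * x * x * y * y) % p != 0:
                continue
            doubled = affine_add((x, y), (x, y), a, d, p)
            expected = affine_add(doubled, (x, y), a, d, p) if doubled is not None else None
            # Extended representation with Z1 = scale and T1 = X1*Y1/Z1.
            point = (scale * x % p, scale * y % p, scale * x * y % p, scale % p)
            t11 = tripling_11m(*point, a, p)
            t9 = tripling_9m(*point, a, p)
            assert t9 == tuple(2 * v % p for v in t11)
            X3, Y3, T3, Z3 = t11
            assert T3 * Z3 % p == X3 * Y3 % p
            if expected is None or Z3 == 0:
                continue
            z_inv = pow(Z3, -1, p)
            assert (X3 * z_inv % p, Y3 * z_inv % p) == expected
            checked += 1
    return checked

print(check_extended_tripling(13, 4, 2))
```

Dropping the third output coordinate gives a tripling in projective coordinates, as discussed in the text.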

To triple in projective twisted Edwards coordinates, one can use the extended-coordinate formulas and simply ignore $T$; this saves 2M compared with tripling in extended coordinates. Using tripling formulas costs fewer multiplications than combining the doubling and addition formulas. For addition, extended coordinates are faster than projective and inverted coordinates, whereas for doubling and tripling, projective and inverted coordinates are faster than extended coordinates.

3. Extended double-base number system

This section reviews the extended double-base number system (extended DBNS, for short) for computing $[n]P$ given $P$. Let $S$ be a set containing 1. Every positive integer $n$ can be represented as

\[ n = \sum_{i=1}^{m} d_i 2^{a_i} 3^{b_i}, \]

with $|d_i| \in S$, $a_0 \geq a_1 \geq a_2 \geq \cdots \geq a_m \geq 0$, and $b_0 \geq b_1 \geq b_2 \geq \cdots \geq b_m \geq 0$. This approach is called extended DBNS. The representation is not unique.

Bernstein et al. optimized single-scalar multiplication using extended DBNS. They analyzed various elliptic curve shapes, including Edwards curves, but not twisted Edwards curves. We analyze projective twisted Edwards coordinates, inverted twisted Edwards coordinates, and extended twisted Edwards coordinates. We also propose a new precomputation set $S$ for single-scalar multiplication and determine the best speeds that can be obtained.

3.1 A new choice of a coefficient set S

We reviewed extended DBNS above. Choosing a proper coefficient set $S$ is important: with a better set, single-scalar multiplication can be computed faster. We propose a new coefficient set consisting of $F_2 = 1, F_3 = 2, \ldots$, where $F_i$ is the $i$-th Fibonacci number. For example, if $\#S = 6$, we take $S = \{1, 2, 3, 5, 8, 13\}$. If $S$ has more elements, one can choose larger coefficients, which makes more efficient extended DBNS expansions possible. Moreover, by computing the precomputation points $[2]P$, $[3]P$, $[5]P = [2]P + [3]P$, $[8]P = [3]P + [5]P, \ldots$ in this order, the initial computation of $[c]P$ for each $c \in S$ can be done efficiently.
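The construction of the Fibonacci coefficient set and its precomputation order can be sketched in a few lines of Python. The function names are hypothetical, and the plan only records which two earlier multiples are added; how $[2]P$ and $[3]P$ themselves are obtained is left open here.

```python
# Sketch only: function names are illustrative, not from the paper.

def fibonacci_coefficient_set(size):
    """Return [F_2, F_3, ...] = [1, 2, 3, 5, ...] with `size` elements."""
    coeffs = [1, 2]
    while len(coeffs) < size:
        coeffs.append(coeffs[-1] + coeffs[-2])
    return coeffs[:size]

def precomputation_plan(size):
    """For each c >= 5 in S, list the pair (u, v) with [c]P = [u]P + [v]P."""
    S = fibonacci_coefficient_set(size)
    return [(S[i], (S[i - 2], S[i - 1])) for i in range(3, len(S))]

print(fibonacci_coefficient_set(6))
print(precomputation_plan(6))
```

Each multiple from $[5]P$ onward costs a single point addition, because both summands have already been computed.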

3.2 Example.

Take the integer $n = 264290$. We consider two coefficient sets $S_1 = \{1, 2, 3, 5, 7, 9\}$ and $S_2 = \{1, 2, 3, 5, 8, 13\}$. The extended DBNS expansion with $S_1$ can be written as $5 \cdot 2^{11}3^3 - 7 \cdot 2^6 3^3 - 5 \cdot 2^1 3^2 - 2 \cdot 2^1 3^0$. Assuming that $[2]P$, $[5]P$, and $[7]P$ are precomputed, it is possible to obtain $[264290]P$ as $[2]([3^2]([2^5 3]([2^5][5]P - [7]P) - [5]P) - [2]P)$ with 11 doublings, 3 triplings, and 3 additions. On the other hand, the extended DBNS expansion with $S_2$ can be written as $13 \cdot 2^8 3^4 - 8 \cdot 2^3 3^4 - 8 \cdot 2^2 3^1 + 2 \cdot 2^0 3^0$. Assuming that $[2]P$, $[8]P$, and $[13]P$ are precomputed, one can obtain $[264290]P$ as $[2^2 3^1]([2^1 3^3]([2^5][13]P - [8]P) - [8]P) + [2]P$ with 8 doublings, 4 triplings, and 3 additions. The latter expansion can be computed faster.

To compute extended double-base chains, we used the greedy-type algorithm in [2] (Algorithm 1). In this algorithm, one first chooses the best approximation $d_1 2^{a_1} 3^{b_1}$ of the given integer $n$. The total cost of single-scalar multiplication using extended DBNS depends on $a_1$, $b_1$, and the length of the chain; if these values can be reduced, the computation costs less. Let $d_1$ be the largest number in $S$, and choose the other $d_i$ in $S$ appropriately. Then we can reduce $a_1$ and $b_1$, as in the above example. We carried out experiments and present the results in the next section.
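The two expansions of $n = 264290$ in the example, and their operation counts, can be checked with a short Horner-style evaluation. This is only a sketch: the term lists restate the expansions from the text, the function and constant names are hypothetical, and integers stand in for multiples of $P$.

```python
# Sketch only: integers stand in for multiples of P; names are illustrative.

# Each term (d, a, b) means d * 2**a * 3**b, listed with non-increasing a and b.
EXPANSION_S1 = [(5, 11, 3), (-7, 6, 3), (-5, 1, 2), (-2, 1, 0)]
EXPANSION_S2 = [(13, 8, 4), (-8, 3, 4), (-8, 2, 1), (2, 0, 0)]

def evaluate_chain(terms):
    """Evaluate the chain Horner-style and count doublings, triplings, additions."""
    value = terms[0][0]
    doublings = triplings = additions = 0
    for (_, a, b), (d_next, a_next, b_next) in zip(terms, terms[1:]):
        # Multiply by 2^(a - a_next) * 3^(b - b_next), then add the next coefficient.
        value = value * 2 ** (a - a_next) * 3 ** (b - b_next) + d_next
        doublings += a - a_next
        triplings += b - b_next
        additions += 1
    _, a_last, b_last = terms[-1]
    value *= 2 ** a_last * 3 ** b_last
    return value, doublings + a_last, triplings + b_last, additions

print(evaluate_chain(EXPANSION_S1))
print(evaluate_chain(EXPANSION_S2))
```

The counts exclude the precomputation of the multiples $[c]P$, $c \in S$, which the paper accounts for separately.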


Table 1. Total multiplication counts for each curve shape.

Shape  l (bit)  M (New results)  M (Bernstein)  a0
Proj   160      1142.08706       1149.18034     156
Proj   200      1392.19286       1402.05392     196
Proj   256      1739.61144       1749.63448     252
Proj   300      2012.27620       2022.30108     296
Proj   400      2653.80934       2682.76172     396
Proj   500      3273.85562       3302.77824     496
Inv    160      1126.75536       1134.59694     156
Inv    200      1376.36036       1386.94202     196
Inv    256      1723.79004       1734.50618     252
Inv    300      1996.43160       2007.16798     296
Inv    400      2633.84604       2663.45662     396
Inv    500      3253.88962       3283.49154     496
Ext    160      1272.09880       1283.00072     156
Ext    200      1560.78106       1574.78950     196
Ext    256      1964.27612       1978.37394     252
Ext    300      2280.83532       2295.00180     296
Ext    400      3011.23480       3047.20314     396
Ext    500      3731.27792       3767.28818     496

Table 2. Choices of the sets.

l (bit)  #S  S (New results)             S (Bernstein)
160      8   {1, 2, 3, 5, 8, ..., 34}    {1, 2, 3, 5, 7, ..., 13}
200      9   {1, 2, 3, 5, 8, ..., 55}    {1, 2, 3, 5, 7, ..., 15}
256      9   {1, 2, 3, 5, 8, ..., 55}    {1, 2, 3, 5, 7, ..., 15}
300      9   {1, 2, 3, 5, 8, ..., 55}    {1, 2, 3, 5, 7, ..., 15}
400      14  {1, 2, 3, 5, 8, ..., 610}   {1, 2, 3, 5, 7, ..., 25}
500      14  {1, 2, 3, 5, 8, ..., 610}   {1, 2, 3, 5, 7, ..., 25}

4. Experiments and results

In this section, we explain our experiments and summarize our results. Our experiments are based on those carried out by Bernstein et al. in [3]. We generated 10000 uniform random integers $n \in \{0, 1, \ldots, 2^l - 1\}$ for each of the bit sizes $l$ = 160, 200, 256, 300, 400, and 500. Next, we converted each integer into a double-base chain as specified by $a_0$ and $S$. Note that $b_0$ can be obtained as $\lceil (\log_2 n - a_0)/\log_2 3 \rceil$. In our experiments, we used the sets $S$ optimized by Bernstein et al. and, in addition, the sets $\{F_2, F_3, \ldots\}$. Finally, we checked that each constructed chain indeed computed $n$, starting the chain from 1, and counted the number of triplings, doublings, and additions for those 10000 choices of $n$.

Our experiments covered three curve shapes: projective twisted Edwards (Proj), inverted twisted Edwards (Inv), and extended twisted Edwards (Ext). We follow the standard practice of counting S = 0.8M and disregarding other field operations. We included the multiplication counts for the initial computation of $[c]P$ for each $c \in S$.

The results are presented in tables. Table 1 shows the total multiplication counts for each curve shape and each $l$, and Table 2 lists our choices of the coefficient sets for each $l$. Our multiplication counts are approximately 1% lower than those of Bernstein et al. for each curve shape and each $l$, and our sets tend to be more efficient for larger $l$. Extended twisted Edwards coordinates are slower than the other coordinates in our experiment. This is because doublings account for most of the multiplication counts, and doubling costs more in extended coordinates than in projective or inverted coordinates.

5. Conclusion

In this paper, we presented explicit tripling formulas for twisted Edwards curves. We also proposed a new coefficient set for extended DBNS chains and optimized single-scalar multiplication for twisted Edwards curves. Future work includes:

• analyzing the mixed coordinates introduced in [5] for twisted Edwards curves,

• optimizing single-scalar multiplication for other elliptic curve shapes by using our sets,

• speeding up the algorithm that constructs extended DBNS chains.

Acknowledgments

We would like to thank the anonymous reviewer for his/her valuable comments. This work was supported in part by Grant-in-Aid for Scientific Research (C) (20540135).

References

[1] H. M. Edwards, A normal form for elliptic curves, B. Am. Math. Soc., 44 (2007), 393–422.

[2] C. Doche and L. Imbert, Extended double-base number system with applications to elliptic curve cryptography, in: Proc. of INDOCRYPT 2006, Rana Barua et al. eds., LNCS, Vol. 4329, pp. 335–348, Springer-Verlag, Berlin, 2006.

[3] D. J. Bernstein, P. Birkner, T. Lange and C. Peters, Optimizing double-base elliptic-curve single-scalar multiplication, in: Proc. of INDOCRYPT 2007, K. Srinathan et al. eds., LNCS, Vol. 4859, pp. 167–182, Springer-Verlag, Berlin, 2007.

[4] D. J. Bernstein, P. Birkner, T. Lange and C. Peters, Twisted Edwards curves, in: Proc. of AFRICACRYPT 2008, Serge Vaudenay ed., LNCS, Vol. 5023, pp. 389–405, Springer-Verlag, Berlin, 2008.

[5] H. Hisil, K. K. Wong, G. Carter and E. Dawson, Twisted Edwards curves revisited, in: Proc. of ASIACRYPT 2008, Josef Pieprzyk ed., LNCS, Vol. 5350, pp. 326–343, Springer, Berlin, 2008.


JSIAM Letters Vol.6 (2014) pp.41–44 ©2014 Japan Society for Industrial and Applied Mathematics

Some examples of multidimensional Shintani zeta distributions

Takahiro Aoyama1 and Kazuhiro Yoshikawa2

1 Faculty of Culture and Education, Saga University, 1 Honjo-machi, Saga 840-8502, Japan
2 Graduate School of Science and Engineering, Ritsumeikan University, 1-1-1 Noji-higashi, Kusatsu, Shiga 525-8577, Japan

E-mail 1aoyama cc.saga-u.ac.jp 2ra009059 ed.ritsumei.ac.jp

Received January 22, 2014, Accepted January 29, 2014

Abstract

In mathematical statistics, we often consider discrete distributions and their corresponding stochastic processes. In particular, probabilistic limit theorems for them may lead to progress in mathematical finance. Not many properties of discrete distributions on $\mathbb{R}^d$ are known. In this paper, we use multiple zeta functions to define several forms of discrete distributions on $\mathbb{R}^d$, including those with infinitely many mass points. Our purpose is to obtain new methods relating multiple infinite series and high-dimensional integral calculus, which can provide more opportunities to handle high-dimensional phenomena.

Keywords Lévy measure, multinomial distribution, zeta function

Research Activity Group Mathematical Finance

1. Infinitely divisible distributions

Infinitely divisible distributions are among the most important classes of distributions in probability theory. They are the marginal distributions of stochastic processes having independent and stationary increments, such as Brownian motion and Poisson processes. In the 1930s, such stochastic processes were studied extensively by P. Lévy, and they are now usually called Lévy processes. They often appear in mathematical finance as standard stochastic processes. Details on Lévy processes can be found in Sato [1].

In this section, we mention some known properties of infinitely divisible distributions.

Definition 1 (Infinitely divisible distribution) A probability measure $\mu$ on $\mathbb{R}^d$ is infinitely divisible if, for any positive integer $n$, there is a probability measure $\mu_n$ on $\mathbb{R}^d$ such that

\[ \mu = \mu_n^{n*}, \]

where $\mu_n^{n*}$ is the $n$-fold convolution of $\mu_n$.

Example 2 Normal, degenerate and Poisson distributions are infinitely divisible.

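Denote by $I(\mathbb{R}^d)$ the class of all infinitely divisible distributions on $\mathbb{R}^d$. Let $\hat{\mu}(t) := \int_{\mathbb{R}^d} e^{i\langle t,x\rangle}\mu(dx)$, $t \in \mathbb{R}^d$, be the characteristic function of a distribution $\mu$, where $\langle\cdot,\cdot\rangle$ is the standard inner product in $\mathbb{R}^d$. We also write $a \wedge b = \min\{a, b\}$.

The following is well known.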

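Proposition 3 (Lévy–Khintchine representation (see, e.g., Sato [1])) (i) If $\mu \in I(\mathbb{R}^d)$, then

\[ \hat{\mu}(t) = \exp\left( -\frac{1}{2}\langle t, At\rangle + i\langle \gamma, t\rangle + \int_{\mathbb{R}^d} \left( e^{i\langle t,x\rangle} - 1 - \frac{i\langle t,x\rangle}{1+|x|^2} \right) \nu(dx) \right), \quad t \in \mathbb{R}^d, \tag{1.1} \]

where $A$ is a symmetric nonnegative-definite $d \times d$ matrix, $\nu$ is a measure on $\mathbb{R}^d$ satisfying

\[ \nu(\{0\}) = 0 \quad \text{and} \quad \int_{\mathbb{R}^d} \left( |x|^2 \wedge 1 \right) \nu(dx) < \infty, \tag{1.2} \]

and $\gamma \in \mathbb{R}^d$.

(ii) The representation of $\hat{\mu}$ in (i) by $A$, $\nu$, and $\gamma$ is unique.

(iii) Conversely, if $A$ is a symmetric nonnegative-definite $d \times d$ matrix, $\nu$ is a measure satisfying (1.2), and $\gamma \in \mathbb{R}^d$, then there exists an infinitely divisible distribution $\mu$ whose characteristic function is given by (1.1).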

The measure $\nu$ is called the Lévy measure, and it generates a jump-type Lévy process. The following is also known as one of the most important classes of infinitely divisible distributions.

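Definition 4 (Compound Poisson distribution) A distribution $\mu$ on $\mathbb{R}^d$ is called compound Poisson if, for some $c > 0$ and some probability measure $\rho$ on $\mathbb{R}^d$ with $\rho(\{0\}) = 0$,

\[ \hat{\mu}(t) = \exp\left( c\,[\hat{\rho}(t) - 1] \right), \quad t \in \mathbb{R}^d. \]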

Here the Lévy measure of the compound Poisson distribution $\mu$ is $c\rho$, which is finite. The Poisson distribution is the special case where $d = 1$ and $\rho = \delta_1$, where $\delta_x$ is the delta measure at $x$.

Remark 5 Note that any infinitely divisible distribution can be expressed as the weak limit of a certain sequence of compound Poisson distributions.


2. Zeta distributions

In the one-dimensional case, there exists a class of discrete distributions generated by the Riemann zeta function. Our research focuses on this class and extends it to obtain several exact expressions for discrete multidimensional distributions, together with their Lévy measures when they exist. In mathematical statistics, it is rare for both a discrete distribution on $\mathbb{R}^d$ and its Lévy measure to be computable, and such relations between multiple series and high-dimensional measure theory are rare even in pure mathematics.

First, we introduce the Riemann zeta function and distribution. The basic properties of zeta functions can be found in Apostol [2].

Definition 6 (Riemann zeta function) The Riemann zeta function is a function of a complex variable $s = \sigma + it \in \mathbb{C}$, for $\sigma > 1$, $t \in \mathbb{R}$, given by

\[ \zeta(s) := \sum_{n=1}^{\infty} \frac{1}{n^s}. \]

The series defining the Riemann zeta function converges absolutely in the region $\sigma > 1$. In this region of absolute convergence, we have the following well-known distribution on $\mathbb{R}$.

Definition 7 (Riemann zeta distribution) For each $\sigma > 1$, a probability measure $\mu_\sigma$ on $\mathbb{R}$ is called a Riemann zeta distribution if

\[ \mu_\sigma(\{-\log n\}) = \frac{n^{-\sigma}}{\zeta(\sigma)}, \quad n \in \mathbb{N}. \]

Then its characteristic function $f_\sigma$ can be written as follows:

\[ f_\sigma(t) = \int_{\mathbb{R}} e^{itx}\mu_\sigma(dx) = \frac{\zeta(\sigma + it)}{\zeta(\sigma)}, \quad t \in \mathbb{R}. \]

This class of distributions was first introduced, without normalization, in Jessen and Wintner [3] as an example in the study of infinite convolutions. As a probability distribution, it first appeared in Khintchine [4].

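Proposition 8 (See, e.g., Gnedenko and Kolmogorov [5]) The characteristic function $f_\sigma(t)$ is that of a compound Poisson distribution with a finite Lévy measure $N_\sigma$ on $\mathbb{R}$:

\[ \log f_\sigma(t) = \int_0^{\infty} \left( \exp(-itx) - 1 \right) N_\sigma(dx), \]

where

\[ N_\sigma(dx) = \sum_{p \in \mathbb{P}} \sum_{r=1}^{\infty} \frac{p^{-r\sigma}}{r}\, \delta_{r \log p}(dx). \]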
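The two descriptions of $f_\sigma$ can be compared numerically: the ratio $\zeta(\sigma + it)/\zeta(\sigma)$ computed from truncated Dirichlet series, and the exponential of the integral against $N_\sigma$ computed from a truncated sum over primes. The Python sketch below does this; the truncation limits, the cutoff for negligible masses, the values $\sigma = 3$, $t = 1.5$, and all function names are illustrative assumptions.

```python
# Sketch only: truncation limits and names are illustrative assumptions.
import cmath
import math

def primes_up_to(limit):
    """Simple sieve of Eratosthenes."""
    is_prime = [True] * (limit + 1)
    is_prime[0] = is_prime[1] = False
    for i in range(2, int(limit ** 0.5) + 1):
        if is_prime[i]:
            for multiple in range(i * i, limit + 1, i):
                is_prime[multiple] = False
    return [i for i, flag in enumerate(is_prime) if flag]

def zeta_partial(s, terms):
    """Truncated Dirichlet series for zeta(s), Re(s) > 1."""
    return sum(n ** (-s) for n in range(1, terms + 1))

def char_fn_series(sigma, t, terms=20000):
    """f_sigma(t) = zeta(sigma + it) / zeta(sigma) from truncated series."""
    return zeta_partial(complex(sigma, t), terms) / zeta_partial(sigma, terms)

def char_fn_levy(sigma, t, prime_limit=2000):
    """exp of the integral of (exp(-itx) - 1) against the truncated N_sigma."""
    exponent = 0j
    for p in primes_up_to(prime_limit):
        r = 1
        while True:
            mass = p ** (-r * sigma) / r      # weight of the atom at r*log(p)
            if mass < 1e-18:                  # remaining atoms are negligible
                break
            exponent += mass * (cmath.exp(-1j * t * r * math.log(p)) - 1)
            r += 1
    return cmath.exp(exponent)

print(char_fn_series(3.0, 1.5))
print(char_fn_levy(3.0, 1.5))
```

For $\sigma$ close to 1 both truncations converge slowly, so larger limits would be needed there.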

This proposition shows that the Riemann zeta function can be treated within the theory of Lévy processes, like some other well-known functions. Further properties of this class are studied in Hu and Lin [6].

The Riemann zeta function has been extended in various ways, for example to the Hurwitz and Barnes types (see, e.g., Apostol [2] for details). Several generalized zeta distributions have also been introduced, but most of them are not infinitely divisible. The cases that are infinitely divisible are the following.

A special case of the Hurwitz zeta function generates a compound Poisson distribution on $\mathbb{R}$, which is given in Hu and Lin [7].

Other cases are given by multivariable zeta functions, and the corresponding distributions are on $\mathbb{R}^d$. Aoyama and Nakamura [8] introduced multidimensional Shintani zeta functions, which are multivariable multiple infinite series defined as follows.

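Definition 9 (Multidimensional Shintani zeta function (Aoyama and Nakamura [8])) Let $d, m, r \in \mathbb{N}$, $s \in \mathbb{C}^d$ and $(n_1, \ldots, n_r) \in \mathbb{Z}^r_{\geq 0}$. For $\lambda_{lj}, u_j > 0$, $c_l \in \mathbb{R}^d$, where $1 \leq j \leq r$ and $1 \leq l \leq m$, and a function $\theta(n_1, \ldots, n_r) \in \mathbb{C}$ satisfying $|\theta(n_1, \ldots, n_r)| = O((n_1 + \cdots + n_r)^{\varepsilon})$ for any $\varepsilon > 0$, we define a multidimensional Shintani zeta function $Z_S(s)$ by

\[ Z_S(s) := \sum_{n_1,\ldots,n_r=0}^{\infty} \frac{\theta(n_1, \ldots, n_r)}{\prod_{l=1}^{m} \left( \sum_{j=1}^{r} \lambda_{lj}(n_j + u_j) \right)^{\langle c_l, s\rangle}}. \]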

We call the function $\theta(n_1, \ldots, n_r)$ a generalized Dirichlet character of the multidimensional Shintani zeta function and write $\langle c, s\rangle := \langle c, \sigma\rangle + i\langle c, t\rangle$ for $c \in \mathbb{R}^d$ and $s \in \mathbb{C}^d$, where $\sigma, t \in \mathbb{R}^d$ and $s = \sigma + it$. The series $Z_S(s)$ converges absolutely in the region $\min_{1 \leq l \leq m}\langle c_l, \sigma\rangle > r/m$ (see Aoyama and Nakamura [8]), which we denote by $D_S$. If $\theta(n_1, \ldots, n_r)$ is non-negative or non-positive definite, then we can define the following class of distributions on $\mathbb{R}^d$.

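Definition 10 (Multidimensional Shintani zeta distribution (Aoyama and Nakamura [8])) For each $\sigma \in D_S$, a probability measure $\mu_\sigma$ on $\mathbb{R}^d$ is called a multidimensional Shintani zeta distribution if, for all $(n_1, \ldots, n_r) \in \mathbb{Z}^r_{\geq 0}$,

\[ \mu_\sigma\left( \left\{ \left( -\sum_{l=1}^{m} c_{l1} \log\left( \sum_{k=1}^{r} \lambda_{lk}(n_k + u_k) \right), \ldots, -\sum_{l=1}^{m} c_{ld} \log\left( \sum_{k=1}^{r} \lambda_{lk}(n_k + u_k) \right) \right) \right\} \right) = \frac{\theta(n_1, \ldots, n_r)}{Z_S(\sigma)} \prod_{l=1}^{m} \left( \sum_{k=1}^{r} \lambda_{lk}(n_k + u_k) \right)^{-\langle c_l, \sigma\rangle}. \]

Then its characteristic function $f_\sigma$ is given by

\[ f_\sigma(t) := \int_{\mathbb{R}^d} e^{i\langle t,x\rangle}\mu_\sigma(dx) = \frac{Z_S(\sigma + it)}{Z_S(\sigma)}, \quad t \in \mathbb{R}^d, \]

which can be regarded as a generalization of the Riemann zeta distribution.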

Remark 11 This class contains both infinitely divisible and non-infinitely divisible distributions on $\mathbb{R}^d$. By applying Euler products, some simple examples of the compound Poisson case on $\mathbb{R}^2$ and generalized cases on $\mathbb{R}^d$ are given in [9] and [10], respectively.

3. Relation between distributions and characters

Many kinds of discrete distributions can be represented in terms of multidimensional Shintani zeta functions by choosing suitable characters, so their characteristic functions can be written as multiple infinite series. In this section, we take up the multinomial distribution, a well-known multidimensional discrete distribution, and show its relation with the character.

3.1 Main result 1 (A character corresponding to a multinomial distribution)

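Fix $N \in \mathbb{N}$. For each $1 \leq l, k \leq m$, let $u_l = 1$, $\lambda_{ll} = 1$, and $\lambda_{lk} = 0$ ($l \neq k$). We also take $\sigma \in D_S$, $c_l = (c_{lj})_{j=1}^{d} \in \mathbb{R}^d$, $\phi(l) \in \mathbb{R}$, and pairwise relatively prime $j(1), \ldots, j(m) \in \mathbb{N} \setminus \{1\}$. Define a character by

\[ \theta_N(n_1, \ldots, n_m) := \begin{cases} N! \displaystyle\prod_{l=1}^{m} \frac{(\phi(l))^{k_l}}{k_l!} & \left( n_l + 1 = (j(l))^{k_l},\ \displaystyle\sum_{l=1}^{m} k_l = N \right), \\ 0 & (\text{otherwise}). \end{cases} \]

Then, for each $s = \sigma + it \in \mathbb{C}^d$, $t \in \mathbb{R}^d$,

\[ Z_S(s) = \sum_{k_1 + \cdots + k_m = N} N! \prod_{l=1}^{m} \frac{(\phi(l))^{k_l} \left( j(l)^{-\langle c_l, s\rangle} \right)^{k_l}}{k_l!} = \left( \sum_{l=1}^{m} \phi(l)(j(l))^{-\langle c_l, s\rangle} \right)^{N}, \]

\[ f_\sigma(t) := \frac{Z_S(\sigma + it)}{Z_S(\sigma)} = \left( \sum_{l=1}^{m} q(l) e^{i\langle x_l, t\rangle} \right)^{N}, \]

where

\[ q(l) := \frac{\phi(l)(j(l))^{-\langle c_l, \sigma\rangle}}{\sum_{l'=1}^{m} \phi(l')(j(l'))^{-\langle c_{l'}, \sigma\rangle}}, \quad x_l := (x_{lk})_{k=1}^{d}, \quad x_{lk} := -c_{lk} \log j(l). \]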

If $\phi(1), \ldots, \phi(m)$ have the same sign, then the character $\theta_N$ is non-negative or non-positive definite, so $f_\sigma$ is the characteristic function of a Shintani zeta distribution. That is, it is the characteristic function of a random variable $X_\sigma$ defined by

\[ \Pr\left( X_\sigma = \left( \sum_{l=1}^{m} x_{l1} n_l, \ldots, \sum_{l=1}^{m} x_{ld} n_l \right) \right) = N! \prod_{l=1}^{m} \frac{(q(l))^{n_l}}{n_l!}, \quad \text{when } \sum_{l=1}^{m} n_l = N. \]

In particular, if $m = d$ and $x_1, \ldots, x_d$ are the standard basis of $\mathbb{R}^d$, then $X_\sigma$ follows a multinomial distribution.

Since multinomial distributions have mass at only finitely many points, their characteristic functions are finite multiple sums. However, multidimensional Shintani zeta distributions, whose characteristic functions are defined by multiple infinite series, may have mass at countably many points. In the following, we give such an example and discuss whether it is infinitely divisible.
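The identity in Main result 1 can be illustrated numerically by comparing $Z_S(\sigma + it)/Z_S(\sigma)$, computed from the closed form of $Z_S$, with the characteristic function obtained by summing over the mass points of $X_\sigma$. The following Python sketch does so for one hypothetical choice of parameters ($m = 3$, $d = 2$, $N = 4$, and the values of $j$, $\phi$, $c_l$, $\sigma$ below); none of these values come from the paper.

```python
# Sketch only: all parameter values and names are illustrative assumptions.
import cmath
import math
from itertools import product

J = (2, 3, 5)                                  # pairwise relatively prime j(l)
PHI = (1.0, 2.0, 0.5)                          # phi(l), all of the same sign
C = ((1.0, 0.0), (0.0, 1.0), (1.0, 1.0))       # c_l in R^2
SIGMA = (2.0, 2.0)                             # min <c_l, sigma> = 2 > r/m = 1
N = 4

def inner(u, v):
    return sum(a * b for a, b in zip(u, v))

def shintani_ratio(t):
    """Z_S(sigma + it) / Z_S(sigma) using the closed form of Z_S."""
    def Z(s):
        return sum(ph * jl ** (-inner(cl, s)) for ph, jl, cl in zip(PHI, J, C)) ** N
    s = [complex(sg, tt) for sg, tt in zip(SIGMA, t)]
    return Z(s) / Z(SIGMA)

def mass_point_char_fn(t):
    """Characteristic function of X_sigma summed over its mass points."""
    weights = [ph * jl ** (-inner(cl, SIGMA)) for ph, jl, cl in zip(PHI, J, C)]
    q = [w / sum(weights) for w in weights]
    x = [[-clk * math.log(jl) for clk in cl] for jl, cl in zip(J, C)]
    value = 0j
    for counts in product(range(N + 1), repeat=len(q)):
        if sum(counts) != N:
            continue
        prob = float(math.factorial(N))
        for ql, nl in zip(q, counts):
            prob *= ql ** nl / math.factorial(nl)
        point = [sum(nl * xl[k] for nl, xl in zip(counts, x)) for k in range(len(t))]
        value += prob * cmath.exp(1j * inner(t, point))
    return value

t = (0.7, -1.3)
print(shintani_ratio(t), mass_point_char_fn(t))
```

Both values agree up to rounding, which is the multinomial theorem applied to $\left(\sum_l q(l) e^{i\langle x_l, t\rangle}\right)^N$.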

3.2 Main result 2 (A character corresponding to a compound Poisson distribution on $\mathbb{R}^d$)

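We use the settings in Main result 1. For any non-negative integer-valued random variable $T$, define a character

\[ \theta_T(n_1, \ldots, n_m) := \sum_{N=0}^{\infty} \frac{\Pr(T = N)\, \theta_N(n_1, \ldots, n_m)}{\left( \sum_{l=1}^{m} \phi(l)(j(l))^{-\langle c_l, \sigma\rangle} \right)^{N}}. \]

Then the characteristic function $F_{\sigma,T}$ of a multidimensional Shintani zeta distribution with character $\theta_T$ has the form

\[ \begin{aligned}
F_{\sigma,T}(t) &= \sum_{N=0}^{\infty} \frac{\Pr(T = N)}{\left( \sum_{l=1}^{m} \phi(l)(j(l))^{-\langle c_l, \sigma\rangle} \right)^{N}} \times \sum_{n_1,\ldots,n_m=0}^{\infty} \frac{\theta_N(n_1, \ldots, n_m)}{\prod_{l=1}^{m} \left( \sum_{k=1}^{m} \lambda_{lk}(n_k + u_k) \right)^{\langle c_l, \sigma + it\rangle}} \\
&= \sum_{N=0}^{\infty} \frac{\Pr(T = N)}{\left( \sum_{l=1}^{m} \phi(l)(j(l))^{-\langle c_l, \sigma\rangle} \right)^{N}} \times \left( \sum_{l=1}^{m} \phi(l)(j(l))^{-\langle c_l, \sigma + it\rangle} \right)^{N} \\
&= \sum_{N=0}^{\infty} \Pr(T = N) \left( \sum_{l=1}^{m} q(l)\, e^{i\langle x_l, t\rangle} \right)^{N}, \quad t \in \mathbb{R}^d.
\end{aligned} \]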

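In particular, if $T$ follows a Poisson distribution with mean $\lambda$, then

\[ F_\sigma(t) = \sum_{N=0}^{\infty} \frac{\lambda^N}{N!} e^{-\lambda} \left( \sum_{l=1}^{m} q(l)\, e^{i\langle x_l, t\rangle} \right)^{N} = \exp\left( \lambda \left( \sum_{l=1}^{m} q(l)\, e^{i\langle x_l, t\rangle} - 1 \right) \right), \quad t \in \mathbb{R}^d. \]

This is the characteristic function of a compound Poisson distribution with a finite Lévy measure $N_\sigma$ on $\mathbb{R}^d$ given by

\[ N_\sigma(dx) = \lambda \sum_{l=1}^{m} q(l)\, \delta_{x_l}(dx). \]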
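The passage from the Poisson mixture to the exponential form can be checked numerically. In the Python sketch below, the weights $q(l)$, the points $x_l$, the mean $\lambda$, and the truncation length are hypothetical illustrative values, and the function names are not from the paper.

```python
# Sketch only: q, x, lam and the truncation length are illustrative assumptions.
import cmath

Q = (0.2, 0.3, 0.5)                               # weights q(l), summing to 1
X = ((-0.7, 0.0), (0.0, -1.1), (-1.6, -1.6))      # mass points x_l in R^2

def base_char_fn(t):
    """sum_l q(l) * exp(i <x_l, t>)."""
    return sum(ql * cmath.exp(1j * sum(a * b for a, b in zip(xl, t)))
               for ql, xl in zip(Q, X))

def poisson_mixture(t, lam, terms=80):
    """Truncated sum over N of e^{-lam} lam^N / N! * base_char_fn(t)^N."""
    phi = base_char_fn(t)
    term = cmath.exp(-lam)          # N = 0 term
    total = term
    for n in range(1, terms):
        term *= lam * phi / n       # ratio of consecutive terms
        total += term
    return total

def compound_poisson(t, lam):
    """Closed form exp(lam * (base_char_fn(t) - 1))."""
    return cmath.exp(lam * (base_char_fn(t) - 1))

t = (0.9, -0.4)
print(poisson_mixture(t, 2.5), compound_poisson(t, 2.5))
```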

Remark 12 In mathematical finance, models driven by one-dimensional Lévy processes have been studied. We now have discrete infinitely divisible distributions on $\mathbb{R}^d$ with finite Lévy measures, which makes it possible to simulate models associated with $\mathbb{R}^d$-valued Lévy processes.

4. Conditions to be characteristic functions

Non-negative or non-positive definiteness of the character is not, in general, a necessary condition for a distribution to be defined by a multidimensional Shintani zeta function. Therefore, we now consider the case where the character is neither non-negative nor non-positive definite. The following lemma holds under the settings of Main results 1 and 2.

Lemma 13 Suppose that the vectors $c_1, \ldots, c_m \in \mathbb{R}^d$ are linearly independent over $\mathbb{R}$ or $c_1 = \cdots = c_m$ ($\neq 0$). If $\phi(1), \ldots, \phi(m)$ do not have the same sign, then there exist vectors $t_1, t_2 \in \mathbb{R}^d$ such that

\[ |f_\sigma(t_1)| > 1, \quad |F_\sigma(t_2)| > 1. \]

It is known that the characteristic function $\hat{\mu}$ of any probability measure $\mu$ on $\mathbb{R}^d$ satisfies $|\hat{\mu}(t)| \leq 1$, $t \in \mathbb{R}^d$. Hence, if the normalized functions $f_\sigma(t)$ and $F_\sigma(t)$ admit vectors $t_1, t_2$ such that $|f_\sigma(t_1)| > 1$ and $|F_\sigma(t_2)| > 1$, then they cannot be characteristic functions. We thus have the following result.

Theorem 14 (A necessary and sufficient condition to be characteristic functions) Suppose that the vectors $c_1, \ldots, c_m \in \mathbb{R}^d$ are linearly independent over $\mathbb{R}$ or $c_1 = \cdots = c_m$ ($\neq 0$). Then $f_\sigma$ and $F_\sigma$ are characteristic functions if and only if $\phi(1), \ldots, \phi(m)$ have the same sign.

We give the proof of Lemma 13 for $f_\sigma$ in the case $c_1 = \cdots = c_m$ ($\neq 0$), in the same way as in Aoyama and Nakamura [9,10]. The following proposition plays a key role in the proof.

Proposition 15 (Kronecker's approximation theorem (see, e.g., [11])) If $r_1, \ldots, r_n$ are arbitrary real numbers, if real numbers $\theta_1, \ldots, \theta_n$ are linearly independent over the rationals, and if $\epsilon > 0$ is arbitrary, then there exist a real number $t$ and integers $h_1, \ldots, h_n$ such that

\[ |t\theta_k - h_k - r_k| < \epsilon, \quad 1 \leq k \leq n. \]

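Proof of Lemma 13 Since $\phi(1), \ldots, \phi(m)$ do not have the same sign, there exists $l_0$ such that $q(l_0) < 0$. Put

\[ L := \sum_{l \neq l_0} q(l) - q(l_0) > \sum_{l=1}^{m} q(l) = 1, \]

and take $n_0 \in \mathbb{N}$ and $\epsilon > 0$ such that $L - \epsilon > 1$. Define

\[ \theta_l = \frac{\log j(l)}{2\pi} \quad (1 \leq l \leq m). \]

Then $\theta_1, \ldots, \theta_m$ are linearly independent over the rationals. Therefore, Kronecker's approximation theorem shows that there exists $T_0 \in \mathbb{R}$ such that

\[ \left| e^{i2\pi T_0\theta_{l_0}} + 1 \right| < \frac{\epsilon}{\sum_{l=1}^{m} |q(l)|}, \qquad \left| e^{i2\pi T_0\theta_l} - 1 \right| < \frac{\epsilon}{\sum_{l=1}^{m} |q(l)|} \quad (l \neq l_0). \]

Thus, we have

\[ \left| \sum_{l=1}^{m} q(l) e^{i2\pi T_0\theta_l} - L \right| \leq \sum_{l \neq l_0} |q(l)| \left| e^{i2\pi T_0\theta_l} - 1 \right| + |q(l_0)| \left| e^{i2\pi T_0\theta_{l_0}} + 1 \right| < \epsilon, \]

and hence

\[ \left| \mathrm{Re}\left( \sum_{l=1}^{m} q(l) e^{i2\pi T_0\theta_l} - L \right) \right| \leq \left| \sum_{l=1}^{m} q(l) e^{i2\pi T_0\theta_l} - L \right| < \epsilon. \]

Take $t_1 \in \mathbb{R}^d$ such that $T_0 = \langle c_1, t_1\rangle$. Then

\[ \mathrm{Re}\left( \sum_{l=1}^{m} q(l) e^{i\langle x_l, t_1\rangle} \right) = \mathrm{Re}\left( \sum_{l=1}^{m} q(l) e^{i2\pi T_0\theta_l} \right) > L - \epsilon > 1. \]

Hence

\[ |f_\sigma(t_1)| = \left| \sum_{l=1}^{m} q(l) e^{i\langle x_l, t_1\rangle} \right|^{N} > 1. \]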

(QED)

The rest of the proofs and further results are given in Aoyama and Yoshikawa [12].

As this account shows, characters appear to be an important key to multidimensional Shintani zeta functions from the viewpoint of defining distributions. New facts and methods concerning them are still needed to make stochastic models clearer and more useful.

Acknowledgments

The authors would like to express their sincere appreciation to Professor Jiro Akahori for his valuable comments. This work was partially supported by JSPS KAKENHI Grant Numbers 23330190, 24340022, 23654056 and 25285102.

References

[1] K. Sato, Lévy Processes and Infinitely Divisible Distributions, Cambridge Univ. Press, Cambridge, 1999.

[2] T. M. Apostol, Introduction to Analytic Number Theory, Undergraduate Texts in Mathematics, Springer-Verlag, New York, 1976.

[3] B. Jessen and A. Wintner, Distribution functions and the Riemann zeta function, Trans. Amer. Math. Soc., 38 (1935), 48–88.

[4] A. Ya. Khinchine, Limit Theorems for Sums of Independent Random Variables (in Russian), GONTI, Moscow, 1938.

[5] B. V. Gnedenko and A. N. Kolmogorov, Limit Distributions for Sums of Independent Random Variables (Translated from the Russian by Kai Lai Chung), Addison-Wesley, London, 1968.

[6] G. D. Lin and C.-Y. Hu, The Riemann zeta distribution, Bernoulli, 7 (2001), 817–828.

[7] C.-Y. Hu, A. M. Iksanov, G. D. Lin and O. K. Zakusylo, The Hurwitz zeta distribution, Aust. N. Z. J. Stat., 48 (2006), 1–6.

[8] T. Aoyama and T. Nakamura, Multidimensional Shintani zeta functions and zeta distributions on Rd, Tokyo J. Math., 36 (2013), 521–538.

[9] T. Aoyama and T. Nakamura, Behaviors of multivariable finite Euler products in probabilistic view, Math. Nachr., 286 (2013), 1691–1700.

[10] T. Aoyama and T. Nakamura, Multidimensional polynomial Euler products and infinitely divisible distributions on Rd, submitted, http://arxiv.org/abs/1204.4041.

[11] T. M. Apostol, Modular Functions and Dirichlet Series in Number Theory, Graduate Texts in Mathematics 41, Springer-Verlag, New York, 1990.

[12] T. Aoyama and K. Yoshikawa, Multinomial distributions in Shintani zeta class, preprint.


JSIAM Letters Vol.6 (2014) pp.45–48 ©2014 Japan Society for Industrial and Applied Mathematics

Some results of multidimensional discrete probability measures represented by Euler products

Takahiro Aoyama1 and Nobutaka Shimizu2

1 Faculty of Culture and Education, Saga University, 1 Honjo-machi, Saga 840-8502, Japan
2 Graduate School of Science and Engineering, Ritsumeikan University, 1-1-1 Noji-higashi, Kusatsu, Shiga 525-8577, Japan

E-mail 1aoyama cc.saga-u.ac.jp 2ra005065 ed.ritsumei.ac.jp

Received February 21, 2014, Accepted March 28, 2014

Abstract

There are not many tractable multivariable functions associated with multidimensional discrete distributions. In this paper, we take up multidimensional finite Euler products and show when they generate characteristic functions. Moreover, we study their infinite divisibility, as well as non-infinite divisibility, which is rarely seen in the multidimensional discrete case with infinitely many mass points. The relation with series representations of zeta functions is also studied by adapting them to the Shintani zeta type.

Keywords characteristic function, Euler product, Lévy measure

Research Activity Group Mathematical Finance

1. Introduction

1.1 Infinite divisibilities

In mathematical statistics, there exists an important class of probability distributions. It is defined as follows.

Definition 1 (Infinitely divisible distributions) A probability measure $\mu$ on $\mathbb{R}^d$ is infinitely divisible if, for any $n \in \mathbb{N}$, there exists a probability measure $\mu_n$ on $\mathbb{R}^d$ such that

\[ \mu = \mu_n^{n*}, \]

where $\mu_n^{n*}$ is the $n$-fold convolution of $\mu_n$.

Remark 2 Normal and Poisson distributions are infinitely divisible. This class of distributions is also known to consist of the marginal distributions of stochastic processes with independent and stationary increments, such as Brownian motion and Poisson processes. Such stochastic processes often appear in mathematical finance and are usually called Lévy processes. Details on Lévy processes can be found in [1].

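Let $\hat{\mu}(t) := \int_{\mathbb{R}^d} e^{i\langle t,x\rangle}\mu(dx)$, $t \in \mathbb{R}^d$, be the characteristic function of a distribution $\mu$, where $\langle\cdot,\cdot\rangle$ is the standard inner product in $\mathbb{R}^d$. We also write $a \wedge b = \min\{a, b\}$. Then, we have the following.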

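Proposition 3 (Lévy–Khintchine representation (see, e.g., [1])) (i) If a probability measure $\mu$ on $\mathbb{R}^d$ is infinitely divisible, then

\[ \hat{\mu}(t) = \exp\left( -\frac{1}{2}\langle t, At\rangle + i\langle \gamma, t\rangle + \int_{\mathbb{R}^d} \left( e^{i\langle t,x\rangle} - 1 - \frac{i\langle t,x\rangle}{1+|x|^2} \right) \nu(dx) \right), \quad t \in \mathbb{R}^d, \tag{1} \]

where $A$ is a symmetric nonnegative-definite $d \times d$ matrix, $\gamma \in \mathbb{R}^d$, and $\nu$ is a measure on $\mathbb{R}^d$ satisfying

\[ \nu(\{0\}) = 0 \quad \text{and} \quad \int_{\mathbb{R}^d} (|x|^2 \wedge 1)\, \nu(dx) < \infty. \tag{2} \]

(ii) The representation given by (1) is unique.

(iii) Conversely, if $A$ is a symmetric nonnegative-definite $d \times d$ matrix, $\gamma \in \mathbb{R}^d$, and $\nu$ is a measure on $\mathbb{R}^d$ satisfying (2), then there exists an infinitely divisible distribution $\mu$ whose characteristic function is given by (1).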

Remark 4 The measure $\nu$ is called the Lévy measure of an infinitely divisible distribution $\mu$. It describes the structure of jump-type Lévy processes and is often treated in studies of stochastic models in mathematical finance.

As a generalization of infinitely divisible distributions, we have the following class.

Definition 5 (Quasi-infinitely divisible probability measures (see, e.g., [2])) A probability measure $\mu$ on $\mathbb{R}^d$ is called quasi-infinitely divisible if $\hat{\mu}$ has the form (1) and the corresponding measure $\nu$ is a signed measure on $\mathbb{R}^d$ with total variation $|\nu|$ satisfying $\nu(\{0\}) = 0$ and $\int_{\mathbb{R}^d} (|x|^2 \wedge 1)\, |\nu|(dx) < \infty$.

Remark 6 When the measure $\nu$ is not a nonnegative measure, the corresponding measure $\mu$ is not infinitely divisible. This fact is used to show that certain distributions are not infinitely divisible. This class also appears in the study of convolution powers of distributions. The properties of this class, however, are still not well known. Some results on the relation between multidimensional quasi-infinite divisibility and multivariable Euler products are given in [2].

1.2 Zeta functions and distributions

Zeta functions are one of the main subjects in number theory. The Riemann zeta function is regarded as the prototype and has a representation by an infinite product called the Euler product. They are given as follows.


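Definition 7 (Riemann zeta function, Euler product) For $s = \sigma + it$, $\sigma > 1$, $t \in \mathbb{R}$,

\[
\zeta(s) = \sum_{n=1}^{\infty} \frac{1}{n^s} \tag{3}
\]

\[
\phantom{\zeta(s)} = \prod_{p \in \mathbb{P}} (1 - p^{-s})^{-1}, \tag{4}
\]

where $\mathbb{P}$ is the collection of all prime numbers. The series (3) is called the Riemann zeta function and the product (4) the Euler product.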

Remark 8 The Riemann zeta function $\zeta$ converges absolutely and has no zeros in the region $\sigma > 1$. Zeta functions are now generalized in many ways and often appear in other fields of mathematics. The basic properties of zeta functions can be found in [3].

There exists an infinitely divisible discrete probability distribution on $\mathbb{R}$ generated by the Riemann zeta function.

Definition 9 (Riemann zeta distribution) For $\sigma > 1$, a probability distribution on $\mathbb{R}$ is called the Riemann zeta distribution with parameter $\sigma$ if it is given by

\[
\Pr(X_\sigma = -\log n) = \frac{n^{-\sigma}}{\zeta(\sigma)}, \quad n \in \mathbb{N}.
\]

Then, we have the following.

Proposition 10 (See, e.g., [4]) The characteristic function $f_\sigma$ of the Riemann zeta distribution is given by

\[
f_\sigma(t) = E\left[e^{itX_\sigma}\right] = \frac{\zeta(\sigma + it)}{\zeta(\sigma)}, \quad t \in \mathbb{R}.
\]

Moreover, it is compound Poisson (hence infinitely divisible) and its Lévy measure $N_\sigma$ is finite and given by

\[
N_\sigma(dx) = \sum_{p \in \mathbb{P}} \sum_{r=1}^{\infty} \frac{p^{-r\sigma}}{r}\, \delta_{r \log p}(dx).
\]
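Proposition 10 can be checked numerically. The following is a minimal sketch, not part of the original paper: it compares the characteristic function computed directly from the probabilities in Definition 9 with the compound Poisson form built from the atoms $p^{-r\sigma}/r$. The function names, truncation bounds and the test point are illustrative assumptions. The code places the atoms at $-r \log p$, an assumed sign convention chosen to match $X_\sigma = -\log n$ together with $E[e^{itX_\sigma}]$.

```python
import cmath
import math


def primes_up_to(n):
    """Sieve of Eratosthenes: all primes <= n."""
    sieve = [True] * (n + 1)
    sieve[0:2] = [False, False]
    for i in range(2, int(n ** 0.5) + 1):
        if sieve[i]:
            sieve[i * i :: i] = [False] * len(range(i * i, n + 1, i))
    return [i for i, is_prime in enumerate(sieve) if is_prime]


def char_fn_direct(sigma, t, terms=20_000):
    """E[exp(i t X)] with Pr(X = -log n) proportional to n^(-sigma), truncated."""
    weights = [n ** (-sigma) for n in range(1, terms + 1)]
    norm = sum(weights)  # truncated zeta(sigma)
    total = sum(w * cmath.exp(-1j * t * math.log(n))
                for n, w in enumerate(weights, start=1))
    return total / norm


def char_fn_compound_poisson(sigma, t, prime_bound=5_000, r_max=60):
    """exp( sum_p sum_r p^(-r sigma)/r * (exp(-i t r log p) - 1) ), truncated.

    Assumption: atoms sit at -r log p (sign chosen to match X = -log n).
    """
    exponent = 0j
    for p in primes_up_to(prime_bound):
        for r in range(1, r_max + 1):
            mass = p ** (-r * sigma) / r
            exponent += mass * (cmath.exp(-1j * t * r * math.log(p)) - 1)
    return cmath.exp(exponent)


if __name__ == "__main__":
    sigma, t = 3.0, 1.7  # illustrative values
    print(char_fn_direct(sigma, t))
    print(char_fn_compound_poisson(sigma, t))
```

With $\sigma = 3$ both truncations converge quickly, so the two values agree to several decimal places; for $\sigma$ close to 1 much larger bounds would be needed.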

Remark 11 This proposition first appeared in [5]. Before that, the unnormalized case was introduced as an example in the study of infinite convolutions of measures in [6].

The Lévy measure of the Riemann zeta distribution comes from the Euler product. This suggests that Euler products may provide new ways to treat jump-type Lévy processes. In order to obtain multivariable products, the multidimensional polynomial Euler product is introduced in [7] to extend the Riemann zeta distribution to multidimensional cases.

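Definition 12 (Multidimensional polynomial Euler products [7]) Let $d, k \in \mathbb{N}$ and $s \in \mathbb{C}^d$. For a prime $p$, $-1 \le \alpha_h(p) \le 1$, $a_h \in \mathbb{R}^d$ and $1 \le h \le k$, we define multidimensional polynomial Euler products by

\[
Z_E(s) = \prod_{p \in \mathbb{P}} \prod_{h=1}^{k} \left(1 - \alpha_h(p)\, p^{-\langle a_h, s \rangle}\right)^{-1}. \tag{5}
\]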

Our aim is to obtain several definable multidimensional discrete distributions by such products and see whether they are infinitely divisible or not.

2. Main results

Denote by

$ID$: the class of $\mathbb{R}^2$-valued infinitely divisible characteristic functions.

$ID^0$: the class of $\mathbb{R}^2$-valued characteristic functions that are quasi-infinitely divisible but not infinitely divisible.

$ND$: the class of $\mathbb{R}^2$-valued functions that are not even characteristic functions.

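Definition 13 Put $s_1 := \sigma_1 + it_1$, $s_2 := \sigma_2 + it_2$, $\sigma := (\sigma_1, \sigma_2)$ and $t := (t_1, t_2)$, where $\sigma_1, \sigma_2 > 0$ and $t_1, t_2 \in \mathbb{R}$. Then, for $s = (s_1, s_2)$, we have $s = \sigma + it$. For $p \in \mathbb{P}$, define two functions

\[
g^{\sharp}_p(\sigma, t) := \frac{1}{\left(1 - p^{-(\sigma_1 + it_1)}\right)\left(1 - p^{-(\sigma_2 + it_2)}\right)},
\qquad
g^{*}_p(\sigma, t) := \frac{1}{1 + p^{-(\sigma_1 + it_1) - (\sigma_2 + it_2)}}
\]

and the corresponding normalized functions

\[
G^{\sharp}_p(\sigma, t) := \frac{g^{\sharp}_p(\sigma, t)}{g^{\sharp}_p(\sigma, 0)},
\qquad
G^{*}_p(\sigma, t) := \frac{g^{*}_p(\sigma, t)}{g^{*}_p(\sigma, 0)}.
\]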

Following the definition of the Riemann zeta distribution, it seems natural to examine $G^{\sharp}_p$ and $G^{*}_p$ to find out whether they can generate characteristic functions of distributions on $\mathbb{R}^2$ or not. Some results on them are as follows.

Proposition 14 ([2]) We have the following.

(1) $G^{\sharp}_p \in ID$.

(2) $G^{*}_p \in ND$.

(3) $G^{\sharp}_p G^{*}_p \in ID^0$.

In this paper, we consider the following case.

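Definition 15 Let $p_1, p_2 \in \mathbb{P}$, $\sigma_1, \sigma_2 > 0$, $t_1, t_2 \in \mathbb{R}$. For $s_1 = \sigma_1 + it_1$, $s_2 = \sigma_2 + it_2$, $\sigma = (\sigma_1, \sigma_2)$, $t = (t_1, t_2)$, we define multidimensional polynomial Euler products $g^{\sharp}_{p_1p_2}$, $g^{*}_{p_1p_2}$ by

\[
g^{\sharp}_{p_1p_2}(\sigma, t) := \frac{1}{(1 - p_1^{-s_1})(1 - p_1^{-s_2})} \times \frac{1}{(1 - p_2^{-s_1})(1 - p_2^{-s_2})},
\]

\[
g^{*}_{p_1p_2}(\sigma, t) := \frac{1}{(1 + p_1^{-s_1 - s_2})(1 + p_2^{-s_1 - s_2})}
\]

and their normalized functions

\[
G^{\sharp}_{p_1p_2}(\sigma, t) := \frac{g^{\sharp}_{p_1p_2}(\sigma, t)}{g^{\sharp}_{p_1p_2}(\sigma, 0)},
\qquad
G^{*}_{p_1p_2}(\sigma, t) := \frac{g^{*}_{p_1p_2}(\sigma, t)}{g^{*}_{p_1p_2}(\sigma, 0)}.
\]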

Then, we can describe their behavior in the same way as above.

Theorem 16 We have the following.

(1) $G^{\sharp}_{p_1p_2} \in ID$.

(2) $G^{*}_{p_1p_2} \in ND$.

(3) $G^{\sharp}_{p_1p_2} G^{*}_{p_1p_2} \in ID^0$.

Now we add some products.


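Definition 17 Under the same conditions as in Definition 15, we define

\[
f_{p_1p_2}(\sigma, t) := \frac{1}{(1 - p_1^{-s_1 - s_2})(1 - p_2^{-s_1 - s_2})},
\]

\[
g_{p_1p_2}(\sigma, t) := \prod_{j=1}^{2} \frac{1}{(1 - p_j^{-s_1})(1 - p_j^{-s_2})(1 + p_j^{-s_1 - s_2})},
\]

\[
h_{p_1p_2}(\sigma, t) := \prod_{j=1}^{2} \frac{1}{(1 + p_j^{-s_1})(1 + p_j^{-s_2})(1 - p_j^{-s_1 - s_2})}
\]

and their corresponding normalized functions

\[
F_{p_1p_2}(\sigma, t) := \frac{f_{p_1p_2}(\sigma, t)}{f_{p_1p_2}(\sigma, 0)},
\quad
G_{p_1p_2}(\sigma, t) := \frac{g_{p_1p_2}(\sigma, t)}{g_{p_1p_2}(\sigma, 0)},
\quad
H_{p_1p_2}(\sigma, t) := \frac{h_{p_1p_2}(\sigma, t)}{h_{p_1p_2}(\sigma, 0)}.
\]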

Then, we obtain the following theorems.

Theorem 18

(1) $F_{p_1p_2} \in ID$.

(2) $G_{p_1p_2} \in ID^0$.

(3) $H_{p_1p_2} \in ND$.

Theorem 19

(1) $F_{p_1p_2} G_{p_1p_2} \in ID$.

(2) $G_{p_1p_2} H_{p_1p_2} \in ID$.

(3) $H_{p_1p_2} F_{p_1p_2} \in ND$.

Remark 20 As further results, the behavior of products of these functions is studied in Aoyama and Shimizu [9].

We need the following facts to show our results.

Proposition 21 Let $\mu$ be a probability measure on $\mathbb{R}^d$. Then $|\hat{\mu}(t)| \le 1$ for any $t \in \mathbb{R}^d$, where $\hat{\mu}$ is the characteristic function of $\mu$.

For $n \in \mathbb{N}$, let $a_k = \psi_k a$ $(1 \le k \le n)$ be vectors in $\mathbb{R}^d$, where $\psi_1, \ldots, \psi_n$ are real algebraic numbers. We say that $a_1, \ldots, a_n$ are linearly dependent but linearly independent over $\mathbb{Q}$ if $c_1\psi_1 + \cdots + c_n\psi_n = 0$ with $c_1, \ldots, c_n \in \mathbb{Q}$ is equivalent to $c_1 = \cdots = c_n = 0$. Denote by

LI: Linearly independent,

LR: Linearly dependent but linearly independent over $\mathbb{Q}$.

Put $f_\sigma(t) := Z_E(\sigma + it)/Z_E(\sigma)$. The following lemma is one of the keys to the proofs given later.

Lemma 22 ([7]) Suppose that $a_1, \ldots, a_k$ satisfy LI or LR. If there exists a pair $(h, q)$ with $1 \le h \le k$ and $q \in \mathbb{P}$ such that $\alpha_h(q) < 0$, then there exists $t_0 \in \mathbb{R}^d$ such that $|f_\sigma(t_0)| > 1$.

They also gave the following.

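Proposition 23 ([7]) Suppose that $a_1, \ldots, a_k$ satisfy LI or LR. Then $f_\sigma$ is a characteristic function if and only if $\alpha_h(p) \ge 0$ for all $1 \le h \le k$ and $p \in \mathbb{P}$. Moreover, $f_\sigma$ is a compound Poisson characteristic function with a finite Lévy measure $N_\sigma$ on $\mathbb{R}^d$ given by

\[
N_\sigma(dx) = \sum_{p \in \mathbb{P}} \sum_{r=1}^{\infty} \sum_{h=1}^{k} \frac{1}{r}\, \alpha_h(p)^r\, p^{-r\langle a_h, \sigma \rangle}\, \delta_{r \log p\, a_h}(dx).
\]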

Now we give the proof of Theorem 16.

Proof of Theorem 16 (1) It follows from Proposition 23. (QED)

Proof of Theorem 16 (2) By Lemma 22, there exists $t_0$ such that $|G^{*}_{p_1p_2}(t_0)| > 1$. This contradicts Proposition 21. (QED)
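The argument in the proof of Theorem 16 (2) can be illustrated numerically. The sketch below is an illustration under assumed values ($p_1 = 2$, $p_2 = 3$, $\sigma = (1, 1)$) and is not taken from the paper: it evaluates the normalized function $G^{*}$ at the point $t_0 = (\pi/\log p_1, 0)$, where $p_1^{-i(t_1 + t_2)} = -1$, and shows that its modulus exceeds 1, so it cannot be a characteristic function. The function names are hypothetical.

```python
import math


def g_star(primes, sigma, t):
    """prod_p 1 / (1 + p^(-s1 - s2)) with s_k = sigma_k + i t_k."""
    s_sum = complex(sigma[0] + sigma[1], t[0] + t[1])
    value = 1 + 0j
    for p in primes:
        value /= 1 + p ** (-s_sum)
    return value


def G_star(primes, sigma, t):
    """Normalized function g*(sigma, t) / g*(sigma, 0)."""
    return g_star(primes, sigma, t) / g_star(primes, sigma, (0.0, 0.0))


if __name__ == "__main__":
    sigma = (1.0, 1.0)                 # illustrative choice
    t0 = (math.pi / math.log(2), 0.0)  # makes 2^(-i(t1+t2)) equal to -1
    print(abs(G_star((2,), sigma, t0)))    # single prime: (1 + 1/4) / (1 - 1/4)
    print(abs(G_star((2, 3), sigma, t0)))  # two primes: still larger than 1
```

For the single prime $p = 2$ the modulus is exactly $(1 + a)/(1 - a)$ with $a = p^{-\sigma_1 - \sigma_2} = 1/4$. For two primes this particular $t_0$ happens to work for the chosen values; in general Lemma 22 only guarantees that some $t_0$ exists.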

We need the following definitions before we prove Theorem 16 (3).

Definition 24 (Multidimensional Shintani zeta function [8]) Let $d, m, r \in \mathbb{N}$, $s \in \mathbb{C}^d$ and $(n_1, \ldots, n_r) \in \mathbb{Z}^r_{\ge 0}$. For $\lambda_{lj}, u_j > 0$, $c_l \in \mathbb{R}^d$, where $1 \le j \le r$ and $1 \le l \le m$, and a function $\theta(n_1, \ldots, n_r) \in \mathbb{C}$ satisfying $|\theta(n_1, \ldots, n_r)| = O((n_1 + \cdots + n_r)^{\varepsilon})$ for any $\varepsilon > 0$, we define a multidimensional Shintani zeta function $Z_S(s)$ by

\[
Z_S(s) := \sum_{n_1, \ldots, n_r = 0}^{\infty} \frac{\theta(n_1, \ldots, n_r)}{\prod_{l=1}^{m} \left[\sum_{j=1}^{r} \lambda_{lj}(n_j + u_j)\right]^{\langle c_l, s \rangle}}.
\]

The series $Z_S(s)$ converges absolutely in the region $\min_{1 \le l \le m} \langle c_l, \sigma \rangle > r/m$ (see Aoyama and Nakamura [8]), which we denote by $D_S$. Suppose that $\theta(n_1, \ldots, n_r)$ is non-negative or non-positive definite. Then we have the following class of distributions on $\mathbb{R}^d$.

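Definition 25 (Multidimensional Shintani zeta distribution [8]) For each $\sigma \in D_S$, a probability measure $\mu_\sigma$ on $\mathbb{R}^d$ is called a multidimensional Shintani zeta distribution if, for all $(n_1, \ldots, n_r) \in \mathbb{Z}^r_{\ge 0}$,

\[
\mu_\sigma\left( -\sum_{l=1}^{m} c_{l1} \log\Bigl(\sum_{k=1}^{r} \lambda_{lk}(n_k + u_k)\Bigr), \ldots, -\sum_{l=1}^{m} c_{ld} \log\Bigl(\sum_{k=1}^{r} \lambda_{lk}(n_k + u_k)\Bigr) \right)
= \frac{\theta(n_1, \ldots, n_r)}{Z_S(\sigma)} \prod_{l=1}^{m} \Bigl[\sum_{k=1}^{r} \lambda_{lk}(n_k + u_k)\Bigr]^{-\langle c_l, \sigma \rangle}.
\]

Then its characteristic function $f_\sigma$ is given by

\[
f_\sigma(t) := \int_{\mathbb{R}^d} e^{i\langle t, x \rangle} \mu_\sigma(dx) = \frac{Z_S(\sigma + it)}{Z_S(\sigma)}, \quad t \in \mathbb{R}^d.
\]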

Now we prove Theorem 16 (3).

Proof of Theorem 16 (3) First we show that $G^{\sharp}_{p_1p_2} G^{*}_{p_1p_2}$ is a characteristic function. We have

\[
g^{\sharp}_{p_1p_2}(\sigma, t)\, g^{*}_{p_1p_2}(\sigma, t)
= \prod_{j=1}^{2} \frac{1}{1 - p_j^{-2s_1 - 2s_2}} \times \frac{1 - p_j^{-s_1 - s_2}}{(1 - p_j^{-s_1})(1 - p_j^{-s_2})}.
\]

For any $X, Y$ with $|X|, |Y| < 1$, we also have $1/(1 - X) = \sum_{n=0}^{\infty} X^n$ and

\[
\frac{1 - XY}{(1 - X)(1 - Y)} = \frac{1}{1 - X} + \frac{1}{1 - Y} - 1 = 1 + \sum_{n=1}^{\infty} (X^n + Y^n).
\]

Therefore, we


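obtain

\[
g^{\sharp}_{p_1p_2}(\sigma, t)\, g^{*}_{p_1p_2}(\sigma, t)
= \sum_{l_1=0}^{\infty} \frac{1}{p_1^{2l_1(s_1 + s_2)}} \left(1 + \sum_{m_1=1}^{\infty} \frac{1}{p_1^{m_1 s_1}} + \sum_{n_1=1}^{\infty} \frac{1}{p_1^{n_1 s_2}}\right)
\times \sum_{l_2=0}^{\infty} \frac{1}{p_2^{2l_2(s_1 + s_2)}} \left(1 + \sum_{m_2=1}^{\infty} \frac{1}{p_2^{m_2 s_1}} + \sum_{n_2=1}^{\infty} \frac{1}{p_2^{n_2 s_2}}\right)
\]

\[
= \sum_{l_1, m_1, n_1 = 1}^{\infty} \frac{A_1(l_1) A_1(m_1, n_1)}{l_1^{s_1 + s_2} m_1^{s_1} n_1^{s_2}}
\times \sum_{l_2, m_2, n_2 = 1}^{\infty} \frac{A_2(l_2) A_2(m_2, n_2)}{l_2^{s_1 + s_2} m_2^{s_1} n_2^{s_2}},
\]

where, for $a, b, c \in \mathbb{N}$ and $j = 1, 2$,

\[
A_j(l_j) := \begin{cases} 1 & l_j = 1,\ p_j^{2a}, \\ 0 & \text{otherwise}, \end{cases}
\qquad
A_j(m_j, n_j) := \begin{cases} 1 & (m_j, n_j) = (1, 1),\ (1, p_j^{b}),\ (p_j^{c}, 1), \\ 0 & \text{otherwise}. \end{cases}
\]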

We can see that both

\[
\sum_{l_1, m_1, n_1 = 1}^{\infty} \frac{A_1(l_1) A_1(m_1, n_1)}{l_1^{s_1 + s_2} m_1^{s_1} n_1^{s_2}}
\quad \text{and} \quad
\sum_{l_2, m_2, n_2 = 1}^{\infty} \frac{A_2(l_2) A_2(m_2, n_2)}{l_2^{s_1 + s_2} m_2^{s_1} n_2^{s_2}}
\]

are multidimensional Shintani zeta functions with the above nonnegative sequences as the character $\theta$. Thus, $G^{\sharp}_{p_1p_2} G^{*}_{p_1p_2}$ is a characteristic function of a multidimensional Shintani zeta distribution.

Next, we show that $G^{\sharp}_{p_1p_2} G^{*}_{p_1p_2}$ is quasi-infinitely divisible but not infinitely divisible. We have that

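\[
\log G^{\sharp}_{p_1p_2}(\sigma, t)\, G^{*}_{p_1p_2}(\sigma, t)
= \sum_{j=1}^{2} \Bigl( \log(1 - p_j^{-\sigma_1}) - \log(1 - p_j^{-\sigma_1 - it_1}) + \log(1 - p_j^{-\sigma_2}) - \log(1 - p_j^{-\sigma_2 - it_2}) + \log(1 + p_j^{-\sigma_1 - \sigma_2}) - \log(1 + p_j^{-\sigma_1 - it_1 - \sigma_2 - it_2}) \Bigr)
\]

\[
= \sum_{j=1}^{2} \left( \sum_{r=1}^{\infty} \frac{1}{r} p_j^{-r\sigma_1} \bigl(p_j^{-rit_1} - 1\bigr) + \sum_{r=1}^{\infty} \frac{1}{r} p_j^{-r\sigma_2} \bigl(p_j^{-rit_2} - 1\bigr) + \sum_{r=1}^{\infty} \frac{(-1)^r}{r} p_j^{-r(\sigma_1 + \sigma_2)} \bigl(p_j^{-ri(t_1 + t_2)} - 1\bigr) \right)
\]

\[
= \int_{\mathbb{R}^2} \bigl(e^{i\langle t, x \rangle} - 1\bigr)\, N^{G^{\sharp}_{p_1p_2} G^{*}_{p_1p_2}}_{\sigma}(dx),
\]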

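where

\[
N^{G^{\sharp}_{p_1p_2} G^{*}_{p_1p_2}}_{\sigma}(dx)
:= \sum_{j=1}^{2} \left( \sum_{r=1}^{\infty} \frac{1}{r} p_j^{-r\sigma_1}\, \delta_{r \log p_j (1,0)}(dx)
+ \sum_{r=1}^{\infty} \frac{1}{r} p_j^{-r\sigma_2}\, \delta_{r \log p_j (0,1)}(dx)
+ \sum_{r=1}^{\infty} \frac{(-1)^r}{r} p_j^{-r(\sigma_1 + \sigma_2)}\, \delta_{r \log p_j (1,1)}(dx) \right).
\]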

The measure $N^{G^{\sharp}_{p_1p_2} G^{*}_{p_1p_2}}_{\sigma}$ is a quasi-Lévy measure on $\mathbb{R}^2$ with negative components, since the 3rd and 6th terms are negative signed measures which cannot be canceled by the other terms, and we easily obtain

\[
\int_{\mathbb{R}^2} \left| N^{G^{\sharp}_{p_1p_2} G^{*}_{p_1p_2}}_{\sigma} \right| (dx) < \infty.
\]

Hence $G^{\sharp}_{p_1p_2} G^{*}_{p_1p_2} \in ID^0$. (QED)
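The series representation used in this proof can be verified numerically. The following sketch is an illustration with assumed values ($p_1 = 2$, $p_2 = 3$, $\sigma = (0.8, 1.1)$), not the authors' code: it builds the atoms of the signed measure, evaluates $\exp\bigl(\sum \text{mass} \cdot (\text{phase} - 1)\bigr)$ with the phases $p_j^{-rit_1}$, $p_j^{-rit_2}$, $p_j^{-ri(t_1 + t_2)}$ exactly as they appear in the second line of the expansion, and compares the result with the normalized product computed directly. All function names are hypothetical.

```python
import cmath


def product_direct(primes, s1, s2):
    """g^sharp * g^star = prod_p 1 / ((1 - p^-s1)(1 - p^-s2)(1 + p^(-s1-s2)))."""
    value = 1 + 0j
    for p in primes:
        value /= (1 - p ** (-s1)) * (1 - p ** (-s2)) * (1 + p ** (-s1 - s2))
    return value


def normalized_direct(primes, sigma, t):
    """G^sharp * G^star evaluated from the definition."""
    num = product_direct(primes, complex(sigma[0], t[0]), complex(sigma[1], t[1]))
    den = product_direct(primes, complex(sigma[0], 0.0), complex(sigma[1], 0.0))
    return num / den


def signed_atoms(primes, sigma, r_max=200):
    """Atoms (p, r, direction, mass) of the signed measure, truncated at r_max."""
    atoms = []
    for p in primes:
        for r in range(1, r_max + 1):
            atoms.append((p, r, (1, 0), p ** (-r * sigma[0]) / r))
            atoms.append((p, r, (0, 1), p ** (-r * sigma[1]) / r))
            # The (1, 1) direction carries the alternating sign (-1)^r.
            atoms.append((p, r, (1, 1), (-1) ** r * p ** (-r * (sigma[0] + sigma[1])) / r))
    return atoms


def normalized_from_atoms(atoms, t):
    """exp( sum mass * (p^(-i r <direction, t>) - 1) )."""
    exponent = 0j
    for p, r, (d1, d2), mass in atoms:
        phase = p ** (-1j * r * (d1 * t[0] + d2 * t[1]))
        exponent += mass * (phase - 1)
    return cmath.exp(exponent)


if __name__ == "__main__":
    primes, sigma, t = (2, 3), (0.8, 1.1), (0.7, -1.3)  # illustrative values
    atoms = signed_atoms(primes, sigma)
    print(normalized_direct(primes, sigma, t))
    print(normalized_from_atoms(atoms, t))
    print(min(mass for *_, mass in atoms))       # negative: not a Levy measure
    print(sum(abs(mass) for *_, mass in atoms))  # finite total variation
```

The negative masses come from odd $r$ in the $(1, 1)$ direction, which is what prevents the product from being infinitely divisible even though it is a characteristic function.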

The proofs of Theorems 18 and 19 and further results are given in [9]. Such properties may provide new methods to construct stochastic models in mathematical finance.

Acknowledgments

The authors would like to express their sincere appreciation to Professor Jiro Akahori for his valuable comments. This work was partially supported by JSPS KAKENHI Grant Numbers 23330190, 24340022, 23654056 and 25285102.

References

[1] K. Sato, Lévy Processes and Infinitely Divisible Distributions, Cambridge Univ. Press, Cambridge, 1999.

[2] T. Aoyama and T. Nakamura, Behaviors of multivariable finite Euler products in probabilistic view, Math. Nachr., 286 (2013), 1691–1700.

[3] T. M. Apostol, Introduction to Analytic Number Theory, Undergraduate Texts in Mathematics, Springer-Verlag, New York, 1976.

[4] B. V. Gnedenko and A. N. Kolmogorov, Limit Distributions for Sums of Independent Random Variables (Translated from the Russian by Kai Lai Chung), Addison-Wesley, London, 1968.

[5] A. Ya. Khinchine, Limit Theorems for Sums of Independent Random Variables (in Russian), GONTI, Moscow, 1938.

[6] B. Jessen and A. Wintner, Distribution functions and the Riemann zeta function, Trans. Amer. Math. Soc., 38 (1935), 48–88.

[7] T. Aoyama and T. Nakamura, Multidimensional polynomial Euler products and infinitely divisible distributions on Rd, submitted, http://arxiv.org/abs/1204.4041.

[8] T. Aoyama and T. Nakamura, Multidimensional Shintani zeta functions and zeta distributions on Rd, Tokyo J. Math., 36 (2013), 521–538.

[9] T. Aoyama and N. Shimizu, Properties of multidimensional discrete distributions having Euler products, preprint.


JSIAM Letters Vol.6 (2014) pp.49–52 ©2014 Japan Society for Industrial and Applied Mathematics

Credit risk valuation model for real estate non-recourse loan

Suguru Yamanaka1 and Masaaki Otaka1

1 Mitsubishi UFJ Trust Investment Technology Institute Co., 2-6, Akasaka 4-Chome, Minato-ku, Tokyo 107-0052, Japan

E-mail yamanaka mtec-institute.co.jp, otaka mtec-institute.co.jp

Received March 6, 2014, Accepted April 15, 2014

Abstract

In this paper we propose a practical cost-effective model to estimate the credit risk of a large portfolio of real estate non-recourse loans. It uses information that is as easy to get and update as possible, such as real estate investment indices and macroeconomic indices. Empirical characteristics of real estate can be taken into account, such as serial correlations, cross-sectional correlations within individual properties, and lagged effects of macroeconomic factors.

Keywords credit risk, non-recourse loan, real estate, LTV, macroeconomic factor

Research Activity Group Mathematical Finance

1. Introduction

A real estate non-recourse loan (NRL) is a loan that is secured by a pledge of real properties as collateral. A financial institution that manages many real estate NRLs needs to estimate probable losses from defaults or from changes in mark-to-market loan values.

For an NRL, if the borrower defaults, the lender can seize the collateral, but the recovery is limited to its value. Thus, the value of an NRL depends on the value of the mortgaged properties rather than on the financial or credit condition of the borrower. To estimate the risk of real estate NRLs, it is natural to employ structural approaches, in which an NRL's default is triggered by the loan-to-value ratio (LTV) exceeding some threshold. This kind of approach was first introduced by Merton [1] to model the defaults of corporate bonds, and has been employed for modeling defaults of commercial mortgage backed securities (CMBS), which are securities backed by NRLs. Liu et al. [2], Kau et al. [3] and Shiu et al. [4] employed the first passage time model to estimate default probabilities of CMBS on the basis of LTV. They described the dynamics of a property value as a simple lognormal process. However, considering the application to stress testing, it is more desirable to use a model that can reflect macroeconomic scenarios.

In appraisal practice, the value of a real property is computed by adding up the discounted values of its net cash flows. Kanzaki and Sasaki [5] proposed a dynamic model combined with a widely used appraisal method, the capitalization rate (cap-rate) model. The "cap-rate of a standard property" was statistically estimated by the use of a large database of individual trade information. They also developed a time series model of the cap-rate process lagged with macroeconomic factors. Although this type of model is a useful tool to evaluate the appraised value of each collateral property in credit administration, it is costly to develop or purchase a large database and to continuously update and maintain it.

In the risk management practice of financial institutions that hold a large number of real estate NRLs, it is often necessary to quickly estimate rough distributions of losses or the effects of macroeconomic scenarios on the basis of a relatively simple model. We developed a new practical cost-effective model which uses information that is as easy to get and update as possible, such as real estate investment indices published by the Association for Real Estate Securitization Japan (ARES) and macroeconomic indices published by government organizations. It can take empirical characteristics of real estate into account, such as serial correlations, cross-sectional correlations within individual properties, and lagged effects of macroeconomic factors. In our model, we make the following assumptions: (1) All of the characteristics of each individual property are reflected in the appraised value at the loan origination. (2) A real estate NRL defaults as soon as the appraised value of the collateral falls short of the remaining principal, i.e. a first passage time model. (3) After the default, there exists the liquidity risk that the sales value of the collateral differs from the appraised value.

The outline of this paper is as follows. Section 2 presents the model formulation, and Section 3 provides a description of the data and estimates the model for real estate in Japan. In Section 4, we show numerical examples of the risk measurement of virtual NRLs with our model. A conclusion and future extensions are provided in Section 5.

2. Model

In this section, we introduce a credit risk valuation model for NRLs, which enables us to calculate the losses at the default time. First, we model the time series of real


estate investment indices, which determine the approximate level of property values. Then, we model the appraised value of the collateral properties of the target NRL. In our real estate investment index modeling, we consider serial correlations of the indices. Also, we incorporate the macroeconomic factors, which precede the property values, in the model to capture the turning points of the indices. Real estate investment indices are calculated by property type, such as office-type, residential-type, retail-type, etc. We describe the time series of the indices with a vector autoregressive (VAR) model:

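\[
\begin{pmatrix}
\Delta \log(Z^1_t) \\
\Delta \log(Z^2_t) \\
\vdots \\
\Delta \log(Z^C_t)
\end{pmatrix}
=
\begin{pmatrix}
\alpha_1 + \beta_1 \Delta \log(Z^1_{t-1}) + \gamma_1 \Delta M_{t-1-\ell} + \epsilon_1 \\
\alpha_2 + \beta_2 \Delta \log(Z^2_{t-1}) + \gamma_2 \Delta M_{t-1-\ell} + \epsilon_2 \\
\vdots \\
\alpha_C + \beta_C \Delta \log(Z^C_{t-1}) + \gamma_C \Delta M_{t-1-\ell} + \epsilon_C
\end{pmatrix},
\]

where $Z^c_t$ denotes the index value of property type $c \in \{1, 2, \ldots, C\}$ at time $t$ and $\Delta \log(Z^c_t) := \log(Z^c_t) - \log(Z^c_{t-1})$. The coefficients $\alpha$, $\beta$ and $\gamma$ are the parameters to be estimated. The random variable $M_t$ stands for a macroeconomic variable and $\ell \in \{0, 1, 2, \ldots\}$ is the time lag. The random variables $(\epsilon_1, \epsilon_2, \ldots, \epsilon_C)$ denote random noise with a normal distribution with mean 0, namely $(\epsilon_1, \epsilon_2, \ldots, \epsilon_C) \sim N(0, \Sigma)$.

With the real estate investment indices $Z$, the appraised value of collateral property $V^i_t$ is modeled as follows: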

\[
\log(V^i_t) = \log(V^i_{t-1}) + \sqrt{R_{C(i)}}\, \Delta \log(Z^{C(i)}_t) + \sqrt{1 - R_{C(i)}}\, \epsilon_{i,t},
\]

where $i \in \{1, 2, \ldots, I\}$ denotes the label of each property, $C(i)$ denotes the property type to which the collateral property $i$ belongs, and $R_c$ is the square of the correlation coefficient. The random variable $\epsilon_{i,t}$ represents the error term, with $\epsilon_{i,t} \sim N(0, \delta_{C(i)})$. Let us denote the principal of the NRL at time $t$ by $P_{NRL,t}$. Let $\tau$ denote the time of default, which is defined as the first time that the underlying process, the sum of all collateral values $V_{NRL,t} = \sum_i V^i_t$, crosses the barrier $P_{NRL,t}(1 + A)$:

\[
\tau = \inf\{t \mid V_{NRL,t} < P_{NRL,t}(1 + A)\}.
\]

Here, the additional constant term $A \ge 0$ is introduced for sound risk management. In the usual case, we set $A = 0$.

We obtain the sales values of the collateral by adjusting the appraised values for liquidity risk:

\[
V^{SELL}_{NRL,t} = \sum_i (1 - \ell_i) \times V^i_t.
\]

Here, the random variable $\ell_i \sim N(0, \sigma_\ell)$ represents the liquidity risk, which means the risk of a difference between the sales value and the appraised value. If $(1 - \ell_i) > 1$ (i.e. $\ell_i < 0$), the sales value of property $i$ is higher than the appraised value. On the other hand, if $(1 - \ell_i) < 1$ (i.e. $\ell_i > 0$), the sales value of property $i$ is lower than the appraised value. The constant $\sigma_\ell$ denotes the standard deviation of the spread rates, (sales value − appraised value)/appraised value. Then, we obtain the loss given default (LGD) as

\[
LGD = \max(P_{NRL,\tau} - V^{SELL}_{NRL,\tau},\ 0).
\]
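The model of this section can be sketched as a small simulation. The code below is a minimal illustration, not the authors' implementation: it assumes a single property type (so the VAR reduces to one equation), a constant principal, and that $\delta$ and $\sigma_\ell$ are standard deviations. The function name, argument names and all numerical values in the usage part are hypothetical.

```python
import math
import random


def simulate_nrl_loss(v0, principal, steps, alpha, beta, gamma, sigma_index,
                      r_sq, delta, sigma_liq, macro_changes, buffer_a=0.0, rng=None):
    """Simulate one path; return (default step or None, loss given default).

    v0            -- initial appraised values of the collateral properties
    macro_changes -- macro_changes[k] plays the role of the lagged change
                     of the macroeconomic variable entering step k + 1
    """
    rng = rng or random.Random()
    values = list(v0)
    prev_dlog_z = 0.0  # assumed starting value for the index log-return
    for step in range(steps):
        # Index equation: dlogZ_t = alpha + beta * dlogZ_{t-1} + gamma * dM + eps
        dlog_z = (alpha + beta * prev_dlog_z + gamma * macro_changes[step]
                  + rng.gauss(0.0, sigma_index))
        # Property equation: common index part plus idiosyncratic noise
        for i in range(len(values)):
            eps = rng.gauss(0.0, delta)
            values[i] *= math.exp(math.sqrt(r_sq) * dlog_z
                                  + math.sqrt(1.0 - r_sq) * eps)
        prev_dlog_z = dlog_z
        # First passage: total appraised value falls below the barrier
        if sum(values) < principal * (1.0 + buffer_a):
            sale = sum((1.0 - rng.gauss(0.0, sigma_liq)) * v for v in values)
            return step + 1, max(principal - sale, 0.0)
    return None, 0.0


if __name__ == "__main__":
    rng = random.Random(42)
    n_paths, defaults, total_loss = 10_000, 0, 0.0
    for _ in range(n_paths):
        tau, loss = simulate_nrl_loss(
            v0=[1900.0, 1900.0], principal=2700.0, steps=2,
            alpha=0.001, beta=0.62, gamma=0.002, sigma_index=0.016,
            r_sq=0.857, delta=0.146, sigma_liq=0.1368,
            macro_changes=[0.0, 0.0], rng=rng)
        defaults += tau is not None
        total_loss += loss / 2700.0
    print("default frequency:", defaults / n_paths)
    print("mean standardized loss:", total_loss / n_paths)
```

The usage part only shows how default frequency and mean standardized loss could be accumulated over paths; its output depends on the assumed inputs and is not a reproduction of the paper's results.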

3. Model estimation

In this section, we show the model estimation procedure using sample data from Japan.

3.1 Data

We use sample data of the Tokyo stock exchange J-REIT index and the ARES Japan Property Index (AJPI). AJPI is a real estate investment performance index, and generally shows the investment return in a certain investment period. In particular, we use the capital return of AJPI, with property types of office-type, residential-type and retail-type, for the real estate investment indices Z. Also, we use appraised value samples of properties which constitute J-REITs. The sample period is from March 2002 to March 2010. We set the time unit of the time series models to 6 months in order to keep the sample series independent. For the leading indicator, we considered the following candidate macroeconomic variables: the Tokyo stock exchange REIT index, the business sentiment diffusion index of the real estate sector, the diffusion index on the lending attitude of financial institutions toward the real estate sector, the number of new housing starts, the money stock, and the spread between long and short interest rates. The candidate time lags are 0, 6, 9, 12, 15, 18 and 24 months. We selected the appropriate variable and time lag in two steps. First, we plotted the time series of AJPI and the variables and chose the time lag under which the turning points of AJPI and the variables are close. Then, we calculated the correlation coefficient of AJPI and the variables and chose the combination of variable and time lag from both points of view. As a result, we selected the business sentiment diffusion index of real estate with a 6 month lag, the diffusion index on the lending attitude of financial institutions toward real estate with a 12 month lag, and the Tokyo stock exchange REIT index with a 9 month lag. Figs. 1, 2 and 3 are time-series plots of AJPI and these variables, and they show that the turning points of AJPI and the variables are close.

3.2 Estimation result

We estimate the real estate index time series model with multiple linear regression analysis. Table 1 shows the estimation result of the AJPI time series model with the business sentiment diffusion index with a 6 month lag. Table 1 shows that the adjusted coefficient of determination R2 exceeds 50% for each property type. In addition, since the estimated coefficient of the leading indicator is positive for the business sentiment diffusion index, the turning point of the business sentiment diffusion index captures the turning point of the estimated AJPI values. The estimated coefficients for the leading indicators of the lending attitude of financial institutions toward the real estate sector and the Tokyo stock exchange REIT index were not significant, thus we do not focus on them hereafter.

Fig. 1. Time series plots of AJPI (office-type) and business sentiment diffusion index of real estate sector with 6 month lag.

Fig. 2. Time series plots of AJPI (office-type) and diffusion index on lending attitude of financial institute against real estate sector with 12 month lag.

Fig. 3. Time series plots of AJPI (office-type) and Tokyo stock exchange REIT index with 9 month lag.

The estimate of the covariance matrix $\Sigma$ for the business sentiment diffusion index is

\[
\Sigma =
\begin{pmatrix}
0.000264 & & \\
0.000319 & 0.000436 & \\
0.000267 & 0.000299 & 0.000225
\end{pmatrix}.
\]

To estimate the appraised value model, we use the maximum likelihood estimation method with sample data $V_{i,T(s)}$ $(s = 0, 1, \ldots, n(i))$ of property $i$. The log-likelihood function $LL$ is obtained as follows:

\[
LL_c(R_c, \delta_c) = \sum_{i : C(i) = c} L_i,
\]

Table 1. Estimation result of AJPI time series model with business sentiment diffusion index with 6 month lag.

Office
               coefficient   t-value   p-value   sig.
α                    0.001     0.178     0.862   −
β                    0.620     3.994     0.002   ∗∗
γ                    0.002     2.885     0.014   ∗
Adjusted R2          0.818

Retail
               coefficient   t-value   p-value   sig.
α                   −0.001    −0.149     0.885   −
β                    0.429     1.467     0.176   −
γ                    0.002     1.513     0.165   −
Adjusted R2          0.576

Residential
               coefficient   t-value   p-value   sig.
α                   −0.006    −1.044     0.331   −
β                    0.324     0.877     0.410   −
γ                    0.002     1.485     0.181   −
Adjusted R2          0.626

Table 2. MLE of appraised value model.

         Office    Retail    Residential
Rc        0.857     0.965          0.987
δc        0.146     0.279          0.385
LLc      6157.7    1766.9         9456.2

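where

\[
L_i = \sum_{s=2}^{n(i)} \left[ -\frac{1}{2} \ln\left(2\pi S_c^2\right) - \frac{\left(\Delta \ln V_{i,T(s)} - \mu_{c,s}\right)^2}{2 S_c^2} \right],
\]

\[
\mu_{c,s} = \sqrt{R_c} \left( \alpha_c + \beta_c \Delta \ln Z^c_{T(s-1)} + \gamma_c \Delta M_{s-1-\ell} \right),
\]

\[
\Delta M_{s-1-\ell} = M_{T(s)-\ell} - M_{T(s-1)-1-\ell},
\]

\[
S_c^2 = R_c \sigma_c^2 + (1 - R_c) \delta_c^2.
\]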

Here, $\sigma_c$ is the square root of the corresponding diagonal element of $\Sigma$. Table 2 shows the estimated parameters of the appraised value model. Finally, we estimate the liquidity risk parameter $\sigma_\ell$ using traded records after 2008 of properties which constitute J-REITs, and obtain $\sigma_\ell = 13.68\%$.
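The per-property log-likelihood $L_i$ above translates directly into code. The following is a minimal sketch with hypothetical argument names: it assumes the caller has already aligned the three input sequences so that the $k$-th entries correspond to $\Delta \ln V_{i,T(s)}$, $\Delta \ln Z^c_{T(s-1)}$ and $\Delta M_{s-1-\ell}$ for the same $s$. It is not the authors' estimation code and leaves out the numerical maximization over $(R_c, \delta_c)$.

```python
import math


def property_log_likelihood(dlog_v, dlog_z_prev, d_macro,
                            alpha, beta, gamma, r_sq, sigma_c, delta_c):
    """Gaussian log-likelihood L_i of one property's log-value changes.

    dlog_v      -- observed changes of ln V for s = 2, ..., n(i)
    dlog_z_prev -- index log-changes of the previous period, aligned with dlog_v
    d_macro     -- lagged macro changes, aligned with dlog_v
    r_sq        -- R_c (squared correlation coefficient)
    """
    # S_c^2 = R_c * sigma_c^2 + (1 - R_c) * delta_c^2
    s_sq = r_sq * sigma_c ** 2 + (1.0 - r_sq) * delta_c ** 2
    total = 0.0
    for dv, dz, dm in zip(dlog_v, dlog_z_prev, d_macro):
        mu = math.sqrt(r_sq) * (alpha + beta * dz + gamma * dm)
        total += -0.5 * math.log(2.0 * math.pi * s_sq) - (dv - mu) ** 2 / (2.0 * s_sq)
    return total


def type_log_likelihood(properties, alpha, beta, gamma, r_sq, sigma_c, delta_c):
    """LL_c: sum of L_i over all properties of one type.

    properties -- iterable of (dlog_v, dlog_z_prev, d_macro) triples
    """
    return sum(property_log_likelihood(dv, dz, dm, alpha, beta, gamma,
                                       r_sq, sigma_c, delta_c)
               for dv, dz, dm in properties)
```

A maximum likelihood estimate of $(R_c, \delta_c)$ could then be obtained by passing the negative of `type_log_likelihood` to any numerical optimizer over $0 \le R_c \le 1$, $\delta_c > 0$.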

4. Numerical examples of credit risk valuation of NRLs

In this section, we show numerical examples of calculating losses at the default time of sample NRLs. We obtain the loss distribution of an NRL by Monte Carlo simulation with the parameters estimated in Section 3. In particular, we use the business sentiment diffusion index as the leading indicator. In order to generate scenarios of the business sentiment diffusion index, we model its time series with an autoregressive model:

\[
\Delta M_t = \alpha_0 + \alpha_1 \Delta M_{t-1} + \epsilon, \quad \epsilon \sim N(0, \sigma).
\]

We obtain the estimated parameters $\alpha_0 = -0.791$, $\alpha_1 = 0.427$, $\sigma = 7.22$ with the least squares method. In order to consider the high volatility in the real estate bubble period in the 1980s, we adjust the volatility as follows:

\[
\tilde{\Sigma} = \left(\frac{\sigma_{LT}}{\sigma_{ST}}\right)^2 \times \Sigma,
\qquad
\tilde{\sigma} = \left(\frac{\sigma_{LT}}{\sigma_{ST}}\right)^2 \times \sigma_{C(i)}.
\]

Here, $\Sigma$ and $\sigma_{C(i)}$ on the right-hand sides are the volatilities estimated in Section 3. $\sigma_{LT} = 12.85\%$ is the volatility of the MU-CBex capital return from 1970 to 2009. $\sigma_{ST} = 6.58\%$ is the volatility of the MU-CBex capital return from 2002 to 2009.

We set up two sample NRLs with a single tranche and no amortization. The details of the NRLs are described in Table 3. The LTVs of both NRLs, where LTV is defined by $P_{NRL,t}/V_{NRL,t}$, are at almost the same level. We assume that all collateral properties are office-type. The evaluation date is March 31, 2010 and the time horizon is 1 year. The number of simulation scenarios is 50,000. For simplicity, we assume that default does not occur before the time horizon.

Table 3 shows the risk measures of the sample NRLs. Here, PD is the default probability of the NRL. EL is the expected value of the standardized LGD, where the standardized LGD is obtained as $LGD/P_{NRL,\tau}$. 99.9% VaR is the 99.9% percentile of the distribution of the standardized LGD. While the LTVs of both NRLs are at almost the same level, the loss of NRL 2 is much smaller than that of NRL 1, owing to the diversification effect of the collateral.

Table 3. Risk valuation result for sample NRLs.

                                  NRL 1     NRL 2
Number of collateral                  2         7
Principal                          2700     30000
Total Appraised Value (t = 0)      3800     39000
LTV                               71.1%     76.9%
PD                                1.52%     1.29%
EL                                0.11%     0.07%
99.9% VaR                        24.50%    15.99%
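The scenario generation and the volatility adjustment described in this section can be sketched as follows. This is an illustration rather than the authors' code: the constants are the estimates quoted in the text, $\sigma = 7.22$ is assumed to be a standard deviation, and the function names and the starting value of the recursion are hypothetical.

```python
import random

# Estimates quoted in the text
ALPHA0, ALPHA1, SIGMA = -0.791, 0.427, 7.22
SIGMA_LT, SIGMA_ST = 0.1285, 0.0658


def volatility_scale(sigma_lt=SIGMA_LT, sigma_st=SIGMA_ST):
    """Scaling factor (sigma_LT / sigma_ST)^2 applied to the estimated volatilities."""
    return (sigma_lt / sigma_st) ** 2


def simulate_macro_changes(steps, d_m0=0.0, rng=None):
    """One scenario of dM_t = alpha0 + alpha1 * dM_{t-1} + eps, eps ~ N(0, sigma)."""
    rng = rng or random.Random()
    path, previous = [], d_m0
    for _ in range(steps):
        current = ALPHA0 + ALPHA1 * previous + rng.gauss(0.0, SIGMA)
        path.append(current)
        previous = current
    return path


def stationary_mean():
    """Long-run mean of the AR(1) process, alpha0 / (1 - alpha1)."""
    return ALPHA0 / (1.0 - ALPHA1)


def loan_to_value(principal, appraised_value):
    """LTV as principal divided by total appraised value."""
    return principal / appraised_value


if __name__ == "__main__":
    print("volatility scale:", volatility_scale())
    print("AR(1) long-run mean:", stationary_mean())
    print("LTV NRL 1:", loan_to_value(2700, 3800))
    print("LTV NRL 2:", loan_to_value(30000, 39000))
    print(simulate_macro_changes(2, rng=random.Random(0)))
```

The two LTV values computed from the principal and appraised value in Table 3 match the LTV row of that table after rounding, which is consistent with LTV being principal over appraised value.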

5. Concluding remarks

We proposed a credit risk valuation model for non-recourse loans in this paper. By introducing a waterfall structure model, our model can be applied to the risk management of a tranched CMBS. Moreover, it will be possible to estimate the credit risks of loans for REITs, with some categorization by maturities and simplification of the loan portfolio within one REIT. Although we have adopted LTV as a default trigger this time, the model can be extended to "a double trigger model" by adding net cash flow dynamics and determining the other default condition with the "debt service coverage ratio".

Acknowledgments

We thank Takeshi Hirose, Toshiyuki Kobayashi, Tomohiko Yamashita, Kouhei Wada, and Masayuki Nishio at Mitsubishi UFJ Trust and Banking Co. Ltd. for helpful conversations.

Note

The views and opinions expressed here are those of the authors and do not reflect the views of the authors' employer.

References

[1] R. C. Merton, On the pricing of corporate debt: the risk structure of interest rates, J. Financ., 29 (1974), 449–470.

[2] Y. Liu, G. Jabbour and R. Green, The performance of option-based default risk models on commercial mortgages: an empirical investigation, J. Fixed Income, 17 (2007), 63–76.

[3] J. Kau, D. Keenan and Y. Yildrim, Estimating default probabilities implicit in commercial mortgage backed securities, J. Real Estate Finance Econ., 39 (2009), 107–117.

[4] P. Shiu, U. Luong and Y. Rozov, CMBS tranche valuation framework: correlated geometric Brownian motions simulation, J. Fixed Income, 21 (2011), 55–66.

[5] T. Sasaki and K. Kanzaki, On risk management in real estate finance (in Japanese), ARES Certified Master Journal, 20 (2009), 80–87.


JSIAM Letters Vol.6 (2014) pp.53–56 ©2014 Japan Society for Industrial and Applied Mathematics

An experiment of number field sieve for discrete logarithm problem over GF(p^n)

Kenichiro Hayasaka1, Kazumaro Aoki2, Tetsutaro Kobayashi2 and Tsuyoshi Takagi3

1 Graduate School of Mathematics, Kyushu University, 744, Motooka, Nishiku, Fukuoka-shi, Fukuoka 819-0395, Japan

2 NTT Secure Platform Laboratories, 3-9-11 Midori-cho, Musashino-shi, Tokyo 180-8585, Japan

3 Institute of Mathematics for Industry, Kyushu University, 744, Motooka, Nishiku, Fukuoka-shi, Fukuoka 819-0395, Japan

E-mail k-hayasaka math.kyushu-u.ac.jp

Received October 7, 2013, Accepted January 19, 2014

Abstract

The security of the optimal Ate pairing using the BN curves is based on the hardness of the DLP over GF(p^12). At CRYPTO 2006, Joux et al. proposed the number field sieve over GF(p^n), but the number field sieve needs multi-dimensional sieving. In this paper, we deal with the multi-dimensional sieving, and discuss its parameter sizes, such as the dimension of sieving and the size of the sieving region, based on experiments with the multi-dimensional sieving. Using efficient parameters, we have solved the DLP over GF(p^12) of 203 bits in about 43 hours using a PC with 16 CPU cores.

Keywords pairing, discrete logarithm problem, number field sieve, extension field, lattice sieve

Research Activity Group Algorithmic Number Theory and Its Applications

1. Introduction

Pairing-based cryptography has attracted attention due to novel cryptographic protocols such as ID-based cryptography, functional encryption, etc. Many efficient implementations of pairings have been reported, and one of the most efficient algorithms for computing a pairing is the optimal Ate pairing [1] using the BN curves [2]. The security of pairing-based cryptography using the BN curves is based on the hardness of the discrete logarithm problem (DLP) over finite fields GF(p^12).

The asymptotically fastest algorithm for solving the DLP over prime fields GF(p) is the number field sieve [3]. At CRYPTO 2006, Joux et al. extended the number field sieve to the case of extension fields GF(p^n) of degree n and characteristic p [4]. The complexity of solving the DLP over finite fields GF(p^12) of 3072 bits by the number field sieve is estimated to be 2^128 [2]. There are two experimental reports on the implementation of the number field sieve over extension fields GF(p^n) of degrees n = 3 [4] and n = 6 [5, 6]. However, to the best of our knowledge, there is no experimental report on the hardness of the DLP over finite fields GF(p^12) by the number field sieve. In order to correctly estimate the security of pairing-based cryptography, we need experimental evaluations of the number field sieve over the finite field GF(p^12).

The number field sieve over an extension field GF(p^n) has a substantially different sieving step from that over a prime field GF(p). There are two sieving algorithms, called the line sieve and the lattice sieve [7]. Large-scale implementations of the number field sieve over prime fields GF(p) deploy the lattice sieve of dimension two, but we have to construct a lattice sieve of dimension higher than two for the number field sieve over extension fields GF(p^12). The currently known reports on multi-dimensional sieving have discussed only the case of dimension three [5, 6].

In this paper, we propose lattice sieving of dimension higher than two for the number field sieve over extension fields GF(p^12) by naturally extending the lattice sieve of dimension two. We implemented the proposed multi-dimensional lattice sieve over an extension field GF(p^12) of 203 bits, and we show experimental data for accelerating the number field sieve by choosing suitable dimensions and sizes of the sieving region. Consequently, we have solved the DLP over the extension field GF(p^12) of 203 bits by the number field sieve using a PC with 16 CPU cores in about 43 hours.

2. Number field sieve over $\mathrm{GF}(p^n)$ [4]

2.1 DLP over $\mathrm{GF}(p^n)$

We denote by $\mathrm{GF}(p^n)^*$ the multiplicative group of a finite field of cardinality $p^n$, where $p$ is a prime number and $n$ is an extension degree. The DLP over a finite field $\mathrm{GF}(p^n)$ is to find the smallest non-negative integer $x$ that satisfies $\gamma^x = \delta$ for given $\delta, \gamma$ in $\mathrm{GF}(p^n)^*$. This discrete logarithm $x$ is written as $\log_\gamma \delta$ in this paper.

2.2 Polynomial selection

We generate two irreducible polynomials $f_1, f_2 \in \mathbb{Z}[X] \setminus \{0\}$ that satisfy the following conditions: $f_1 \neq f_2$, $\deg f_1 = n$, $f_1$ is irreducible in $\mathrm{GF}(p)[X]$, and $f_1 \mid f_2 \bmod p$. From these conditions, there exists $v \in \mathrm{GF}(p^n)$ such that $f_1(v) = f_2(v) = 0$ in $\mathrm{GF}(p^n)$. Let $\alpha_1$ and $\alpha_2 \in \mathbb{C}$ be roots of $f_1(X) = 0$ and $f_2(X) = 0$, respectively. There are homomorphisms $\phi_1 : \mathbb{Z}[\alpha_1] \to \mathrm{GF}(p^n)$, $\alpha_1 \mapsto v$, and $\phi_2 : \mathbb{Z}[\alpha_2] \to \mathrm{GF}(p^n)$, $\alpha_2 \mapsto v$.

JSIAM Letters Vol. 6 (2014) pp.53–56 Kenichiro Hayasaka et al.

2.3 Searching relations

In the step of searching relations, we try to find many relations of certain polynomials of degree $t \geq 1$. Let $B_1, B_2 \in \mathbb{R}_{>0}$ be smoothness bounds associated with the polynomials $f_1, f_2$ in Section 2.2. We define the factor bases $\mathcal{B}_1, \mathcal{B}_2$ by

$$
\mathcal{B}_i = \{ (q, g) \mid q \text{ prime},\ q \leq B_i,\ g \text{ irreducible monic polynomial in } \mathrm{GF}(q)[X],\ g \mid f_i \bmod q,\ \deg g \leq t \}.
$$

In this paper, we represent a polynomial $h_a(X) = \sum_{j=0}^{t} a_j X^j \in \mathbb{Z}[X]$ as a vector $a = (a_0, a_1, \ldots, a_t)^T \in \mathbb{Z}^{t+1}$. For a given $H = (H_0, H_1, \ldots, H_t) \in \mathbb{R}^{t+1}_{>0}$, we define a $(t+1)$-dimensional region $\mathcal{H}_a(H)$ as

$$
\mathcal{H}_a(H) = \{ (a_0, a_1, \ldots, a_t)^T \in \mathbb{Z}^{t+1} \mid |a_i| \leq H_i \ (0 \leq i \leq t),\ a_t \geq 0 \}.
$$

Here $H$ and $\mathcal{H}_a$ are called a sieving interval and a sieving region, respectively. Next, the norm of $h_a(\alpha_i)$ is defined by $N(h_a(\alpha_i)) = |\mathrm{Res}(h_a, f_i)|$, where $\mathrm{Res}(h_a, f_i)$ is the resultant of $h_a(X)$ and $f_i(X)$ for $i = 1, 2$. In the step of searching relations, for the given sieving interval $H$ and the smoothness bounds $B_1, B_2$, we try to find $a \in \mathbb{Z}^{t+1}$ (called a hit tuple) that satisfies the following conditions: $N(h_a(\alpha_1))$ is $B_1$-smooth, $N(h_a(\alpha_2))$ is $B_2$-smooth, and $\gcd(a_0, a_1, \ldots, a_t) = 1$, where an integer is $B$-smooth if and only if all of its prime factors are at most $B$. We denote by $S$ the set of all hit tuples gathered in searching relations. In order to obtain the correct discrete logarithm, the size of $S$ is chosen as

$$
\sharp S \geq \sharp \mathcal{B}_1 + \sharp \mathcal{B}_2 + 2n. \tag{1}
$$

From $\phi_1(h_a(\alpha_1)) = \phi_2(h_a(\alpha_2))$, using $a \in S$ and the homomorphisms in Section 2.2, we obtain relations of discrete logarithms. Consequently, we can compute the discrete logarithms of $\mathfrak{q} \in \mathcal{B}_i$ by solving the linear equations obtained from the relations.
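As a rough illustration of the hit-tuple test described above, the following sketch computes the norm $|\mathrm{Res}(h_a, f_i)|$ from a Sylvester determinant in exact arithmetic and checks smoothness by trial division. The polynomials `F1` and `F2` are the pair reported in Section 6.1; the function names (`resultant`, `norm`, `is_smooth`, `is_hit`) and the tiny bounds in the usage note are assumptions made for illustration only and are not taken from the authors' implementation.

```python
from fractions import Fraction
from functools import reduce
from math import gcd

# f_1 from Section 6.1, coefficients listed from degree 0 upward.
F1 = [2, -9, -9, 9, -3, 0, 0, 0, 0, 0, 0, 0, 1]
# f_2 = f_1 - p with p = 122663 (only the constant term changes).
F2 = [2 - 122663] + F1[1:]


def _high_first(coeffs):
    """Reverse to highest-degree-first order and drop leading zeros."""
    out = list(reversed(coeffs))
    while len(out) > 1 and out[0] == 0:
        out.pop(0)
    return out


def resultant(h, f):
    """Res(h, f) as the determinant of the Sylvester matrix."""
    h, f = _high_first(h), _high_first(f)
    m, n = len(h) - 1, len(f) - 1
    size = m + n
    rows = [[0] * i + h + [0] * (n - 1 - i) for i in range(n)]
    rows += [[0] * i + f + [0] * (m - 1 - i) for i in range(m)]
    a = [[Fraction(x) for x in row] for row in rows]
    det = Fraction(1)
    for c in range(size):  # Gaussian elimination over the rationals
        pivot = next((r for r in range(c, size) if a[r][c] != 0), None)
        if pivot is None:
            return 0
        if pivot != c:
            a[c], a[pivot] = a[pivot], a[c]
            det = -det
        det *= a[c][c]
        for r in range(c + 1, size):
            factor = a[r][c] / a[c][c]
            for k in range(c, size):
                a[r][k] -= factor * a[c][k]
    return int(det)


def norm(a, f):
    """N(h_a(alpha)) = |Res(h_a, f)| for a = (a_0, ..., a_t)."""
    return abs(resultant(a, f))


def is_smooth(value, bound):
    """True if every prime factor of value is at most bound."""
    if value == 0:
        return False
    q = 2
    while q <= bound and value > 1:
        while value % q == 0:
            value //= q
        q += 1
    return value == 1


def is_hit(a, b1, b2):
    """Hit-tuple test of Section 2.3 for smoothness bounds b1, b2."""
    coprime = reduce(gcd, a) == 1
    return coprime and is_smooth(norm(a, F1), b1) and is_smooth(norm(a, F2), b2)
```

For example, `norm([0, 1], F1)` is the norm of $h_a(X) = X$, which equals $|f_1(0)| = 2$, and `norm([-1, 1], F1)` equals $|f_1(1)| = 9$. A real sieve would not factor every norm this way; the trial division here only makes the definition concrete.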

3. Searching relations by multi-dimensional sieving

In the following, we describe the line sieve presented by Zajac [6]. If $q \mid N(h_a(\alpha_i))$ holds for a prime $q < B_i$ and $i = 1, 2$, then $q \mid N(h(\alpha_i))$ holds for any polynomial $h(X) = h_a(X) + kq$, where $k$ is any integer. From this fact, we can search for a hit tuple $a$ whose norm is divisible by $q$ in the sieving region without performing integer division. Similarly, for $\mathfrak{q} = (q, g) \in \mathcal{B}_i$ $(i = 1, 2)$, we have the property

$$
g \mid h_a \bmod q \ \Rightarrow\ q^{\deg g} \mid N(h_a(\alpha_i)). \tag{2}
$$

For every $\mathfrak{q} = (q, g) \in \mathcal{B}_i$, where $i \in \{1, 2\}$ is fixed, we accumulate $\deg g \cdot \log q$ in a variable $L[a]$ if the sufficient condition in (2) is satisfied for $(q, g)$ and $a$. Then, we can find a candidate $a \in \mathcal{H}_a$ whose norm $N(h_a(\alpha_i))$ is $B_i$-smooth by checking that $\log N(h_a(\alpha_i)) - L[a]$ is sufficiently small.

Let $I_d$ be the identity matrix of size $d \times d$. For every $\mathfrak{q} = (q, g) \in \mathcal{B}_i$ $(i = 1, 2)$, where $g = \sum_{j=0}^{\deg g} g_j X^j$, the set of all polynomials in $\mathbb{Z}[X]$ of degree less than $t+1$ that satisfy the sufficient condition in (2) is generated by the integer linear combinations of the columns of the following $(t+1) \times (t+1)$ matrix:

$$
\left(\begin{array}{c|ccc}
 & g_0 & & 0 \\
q I_{\deg g} & g_1 & \ddots & \\
 & \vdots & \ddots & g_0 \\
\hline
 & g_{\deg g} & & g_1 \\
0 & & \ddots & \vdots \\
 & 0 & & g_{\deg g}
\end{array}\right). \tag{3}
$$

Since $g$ is a monic polynomial (see Section 2.3), we can transform the $i$-th columns $(\deg g + 1 \leq i \leq t+1)$ of the matrix (3) by integer linear combinations of columns as follows:

$$
M_{\mathfrak{q}} =
\begin{pmatrix}
q I_{\deg g} & T_{\mathfrak{q}} \\
0 & I_{t - \deg g + 1}
\end{pmatrix}, \tag{4}
$$

where $T_{\mathfrak{q}}$ is a $\deg g \times (t - \deg g + 1)$ integer matrix. Conversely, for any $c = (c_0, c_1, \ldots, c_t)^T \in \mathbb{Z}^{t+1}$, the polynomial vector $a = M_{\mathfrak{q}} c$ satisfies the sufficient condition in (2). Therefore, for the matrix $T_{\mathfrak{q}}$ and $c \in \mathbb{Z}^{t+1}$, we can represent $a = M_{\mathfrak{q}} c$ as follows:

$$
(a_0, a_1, \ldots, a_{\deg g - 1})^T = q\,(c_0, c_1, \ldots, c_{\deg g - 1})^T + T_{\mathfrak{q}}\,(a_{\deg g}, a_{\deg g + 1}, \ldots, a_t)^T. \tag{5}
$$

For $T_{\mathfrak{q}}$ and $(a_{\deg g}, a_{\deg g + 1}, \ldots, a_t)^T \in \mathbb{Z}^{t - \deg g + 1}$, we set $(u_0, u_1, \ldots, u_{\deg g - 1})^T = T_{\mathfrak{q}}\,(a_{\deg g}, a_{\deg g + 1}, \ldots, a_t)^T$. Then, we can search for $a$ that satisfies the sufficient condition in (2) by repeatedly adding multiples of $q$ to $u_0, u_1, \ldots, u_{\deg g - 1}$ within the sieving region $\mathcal{H}_a$, that is, by stepping through the tuples $(u_0 + q c_0, \ldots, u_{\deg g - 1} + q c_{\deg g - 1}, a_{\deg g}, a_{\deg g + 1}, \ldots, a_t)$.
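The line sieve above can be sketched for the simplest case $\deg g = 1$, that is, $g = X - s$ with $s$ a root of $f_i$ modulo $q$. In that case $T_{\mathfrak{q}}$ is the single row $(-s, -s^2, \ldots, -s^t) \bmod q$, and the sieve walks over $a_0 = u_0 + kq$. The code below is a minimal sketch under that assumption; the function names and the small sieving interval in the usage note are hypothetical, and `F1` is the $f_1$ reported in Section 6.1.

```python
from itertools import product

# f_1 from Section 6.1, coefficients listed from degree 0 upward.
F1 = [2, -9, -9, 9, -3, 0, 0, 0, 0, 0, 0, 0, 1]


def eval_mod(coeffs, x, q):
    """Evaluate sum_j coeffs[j] * x**j modulo q with Horner's rule."""
    acc = 0
    for c in reversed(coeffs):
        acc = (acc * x + c) % q
    return acc


def roots_mod(f, q):
    """All s in [0, q) with f(s) = 0 mod q (degree-one factors X - s)."""
    return [s for s in range(q) if eval_mod(f, s, q) == 0]


def t_row(s, q, t):
    """Row T_q for g = X - s, so that a_0 = sum_j T_q[j-1] * a_j (mod q)."""
    return [(-pow(s, j, q)) % q for j in range(1, t + 1)]


def line_sieve(s, q, H):
    """Tuples a in the sieving region with (X - s) | h_a mod q.

    H = (H_0, ..., H_t) is the sieving interval; the region is
    |a_i| <= H_i for all i and a_t >= 0, as in Section 2.3.
    """
    t = len(H) - 1
    T = t_row(s, q, t)
    tail_ranges = [range(-H[j], H[j] + 1) for j in range(1, t)]
    tail_ranges.append(range(0, H[t] + 1))
    hits = []
    for tail in product(*tail_ranges):          # (a_1, ..., a_t)
        u0 = sum(Tj * aj for Tj, aj in zip(T, tail)) % q
        a0 = -H[0] + (u0 + H[0]) % q            # smallest a_0 >= -H_0 in the class
        while a0 <= H[0]:
            hits.append((a0,) + tail)           # no division needed here
            a0 += q
    return hits
```

With `q = 5`, `roots_mod(F1, 5)` contains `3` because $f_1(-2) = 3960$ is divisible by 5, so `line_sieve(3, 5, (6, 3, 2))` lists every tuple in that small region whose polynomial vanishes at 3 modulo 5. A full sieve would add $\log q$ to $L[a]$ for each listed tuple instead of storing the tuples.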

4. Proposed multi-dimensional lattice sieve

The lattice sieve tries to find candidates of hit tuples in the lattice whose elements are divisible by $\mathfrak{q} \in \mathcal{B}_i$ (called special-$\mathfrak{q}$). For a special-$\mathfrak{q} = (q, g) \in \mathcal{B}_i$, let $M_{\mathfrak{q}}$ be the matrix of equation (4), and let $M^{\mathrm{LLL}}_{\mathfrak{q}}$ be the matrix generated by the LLL algorithm [8] from $M_{\mathfrak{q}}$.

In this paper, we call the search space of dimension $t+1$ for hit tuples $a \in \mathcal{H}_a$ the $a$-space. On the other hand, the $(t+1)$-dimensional lattice $M^{\mathrm{LLL}}_{\mathfrak{q}}$, which is generated by $M^{\mathrm{LLL}}_{\mathfrak{q}} c$ for $c \in \mathbb{Z}^{t+1}$, is called the $c$-space. Moreover, for a sieving interval $H_c \in \mathbb{R}_{>0}$, we define the sieving region over the $c$-space by

$$
\mathcal{H}_c(H_c) = \{ (c_0, c_1, \ldots, c_t)^T \in \mathbb{Z}^{t+1} \mid |c_i| \leq H_c \ (0 \leq i \leq t),\ c_t \geq 0 \}.
$$

The lattice sieve for the special-$\mathfrak{q}$ searches for candidates of hit tuples in the sieving region $\mathcal{H}_c$ in the $c$-space.

Next, we construct the matrix $M_{\mathfrak{r}}$ from an element $\mathfrak{r} = (r, h) \in \mathcal{B}_i$ of the factor base that is different from $\mathfrak{q}$. By the same method as for generating $M_{\mathfrak{q}}$ from $\mathfrak{q}$, we can obtain equation (5) corresponding to $M_{\mathfrak{r}}$, and by reducing the vector $r\,(c_0, c_1, \ldots, c_{\deg h - 1})^T$ modulo $r$, we obtain the following equation:

$$
(a_0, a_1, \ldots, a_{\deg h - 1})^T \equiv T_{\mathfrak{r}}\,(a_{\deg h}, a_{\deg h + 1}, \ldots, a_t)^T \pmod{r}. \tag{6}
$$

Here, we decompose the $(t+1) \times (t+1)$ matrix $M^{\mathrm{LLL}}_{\mathfrak{q}}$ into the $\deg h \times (t+1)$ matrix $M^{\mathrm{LLL}}_{\mathfrak{q},1}$ and the $(t - \deg h + 1) \times (t+1)$ matrix $M^{\mathrm{LLL}}_{\mathfrak{q},2}$ as

$$
M^{\mathrm{LLL}}_{\mathfrak{q}} =
\begin{pmatrix}
M^{\mathrm{LLL}}_{\mathfrak{q},1} \\
M^{\mathrm{LLL}}_{\mathfrak{q},2}
\end{pmatrix}.
$$

The set of all elements $a$ divisible by $\mathfrak{q}$ is represented by $a = M^{\mathrm{LLL}}_{\mathfrak{q}} c$ for $c \in \mathbb{Z}^{t+1}$, namely

$$
(a_0, a_1, \ldots, a_{\deg h - 1})^T = M^{\mathrm{LLL}}_{\mathfrak{q},1}\, c, \tag{7}
$$

$$
(a_{\deg h}, a_{\deg h + 1}, \ldots, a_t)^T = M^{\mathrm{LLL}}_{\mathfrak{q},2}\, c. \tag{8}
$$

Therefore, from equations (6), (7) and (8), we obtain

$$
(M^{\mathrm{LLL}}_{\mathfrak{q},1} - T_{\mathfrak{r}} M^{\mathrm{LLL}}_{\mathfrak{q},2})\, c \equiv 0 \pmod{r}. \tag{9}
$$

Next, let $M_{\mathfrak{q},\mathfrak{r}}$ be the lattice generated by the vectors $c$ satisfying equation (9), namely $M_{\mathfrak{q},\mathfrak{r}}$ is the kernel of the linear map $(M^{\mathrm{LLL}}_{\mathfrak{q},1} - T_{\mathfrak{r}} M^{\mathrm{LLL}}_{\mathfrak{q},2})$ modulo $r$. Note that $a = M^{\mathrm{LLL}}_{\mathfrak{q}} M_{\mathfrak{q},\mathfrak{r}}\, e$ for any $e = (e_0, e_1, \ldots, e_t)^T \in \mathbb{Z}^{t+1}$ satisfies the sufficient condition in (2) for both $\mathfrak{q}$ and $\mathfrak{r}$. We can compute $M_{\mathfrak{q},\mathfrak{r}}$ from $M^{\mathrm{LLL}}_{\mathfrak{q},1} - T_{\mathfrak{r}} M^{\mathrm{LLL}}_{\mathfrak{q},2}$ corresponding to equation (9).

Fig. 1. Here $V_H$ is the size of the sieving region and $V_B$ is that of the factor bases of the multi-dimensional lattice sieve for the number field sieve over the extension field $\mathrm{GF}(p^{12})$ of 203 bits. (Plot of $V_B$ against $V_H$ for $p^n$ of 203 bits, with curves for $t = 6, 7, 8, 9$.)
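The construction of $M_{\mathfrak{q},\mathfrak{r}}$ can be sketched in the degree-one case, where $\mathfrak{q} = (q, X - s_q)$ and $\mathfrak{r} = (r, X - s_r)$ with distinct primes $q \neq r$. The sketch below makes two simplifying assumptions that are not in the paper: it uses $M_{\mathfrak{q}}$ of equation (4) directly instead of an LLL-reduced basis, and it writes the kernel of the single congruence (9) in closed form by inverting its first coefficient modulo $r$. All function names are hypothetical.

```python
def eval_mod(coeffs, x, q):
    """Evaluate sum_j coeffs[j] * x**j modulo q with Horner's rule."""
    acc = 0
    for c in reversed(coeffs):
        acc = (acc * x + c) % q
    return acc


def basis_q(s, q, t):
    """M_q of equation (4) for g = X - s: first row (q, T_q), then identity rows."""
    rows = [[q] + [(-pow(s, j, q)) % q for j in range(1, t + 1)]]
    for j in range(1, t + 1):
        rows.append([1 if k == j else 0 for k in range(t + 1)])
    return rows


def basis_qr(Mq, s_r, r):
    """Basis of {c : h_a(s_r) = 0 mod r for a = Mq c}, i.e. the kernel in (9)."""
    n = len(Mq)
    w = [pow(s_r, i, r) for i in range(n)]           # h_a(s_r) = w . a
    v = [sum(w[i] * Mq[i][k] for i in range(n)) % r  # so the condition is v . c = 0
         for k in range(n)]
    inv = pow(v[0], -1, r)   # v[0] = q mod r is invertible because q != r
    M = [[1 if i == k else 0 for k in range(n)] for i in range(n)]
    M[0][0] = r
    for k in range(1, n):
        M[0][k] = (-v[k] * inv) % r
    return M


def mat_vec(M, x):
    """Matrix-vector product over the integers."""
    return [sum(row[k] * x[k] for k in range(len(x))) for row in M]
```

For instance, with $f_1$ of Section 6.1 one may take $(q, s_q) = (5, 3)$ and $(r, s_r) = (11, 9)$, since $f_1(-2) = 3960$ is divisible by both 5 and 11. Then `mat_vec(basis_q(3, 5, 3), mat_vec(basis_qr(basis_q(3, 5, 3), 9, 11), e))` gives, for any integer vector `e` of length 4, a tuple $a$ whose polynomial vanishes at 3 modulo 5 and at 9 modulo 11.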

5. How to select parameters $t, H, B_1, B_2$

In this section, we explain how to select the parameters of the lattice sieve in Section 2.3 for two given polynomials $f_1, f_2$ obtained by the polynomial selection in Section 2.2. In particular, we discuss suitable sizes of the dimension $t + 1$, the sieving interval $H$, and the smoothness bounds $B_1, B_2$ that satisfy inequality (1) for the number field sieve over extension fields $\mathrm{GF}(p^{12})$. If we select parameters that accelerate both the relation search step and the linear algebra step simultaneously, then the total running time of the number field sieve decreases.

5.1 Selection of t

We denote by $V_H$ the size of the sieving region $\mathcal{H}_a(H)$, namely $V_H = 2^t \prod_{j=0}^{t} H_j$. We extend the estimation of the average norm in the two-dimensional lattice sieve [9] to our multi-dimensional case. The average norm $N_{\mathrm{ave}}(h_a(\alpha_i))$ of the polynomial $f_i$ $(i = 1, 2)$ in the lattice sieve of dimension $t + 1$ is evaluated by the formula

$$
N_{\mathrm{ave}}(h_a(\alpha_i)) = \sqrt{\frac{\displaystyle\int_{0}^{H_t} \int_{-H_{t-1}}^{H_{t-1}} \cdots \int_{-H_0}^{H_0} \bigl(\mathrm{Res}(h_a, f_i)\bigr)^2 \, da_0 \cdots da_t}{V_H}}.
$$

Moreover, we approximate the probability $\rho(x, y)$ that an integer smaller than $x$ is $y$-smooth by $(\log_y x)^{-\log_y x}$, and we assume that the total size of the factor bases $\mathcal{B}_1, \mathcal{B}_2$ is $V_B = \pi(B_1) + \pi(B_2)$, where $\pi(B_i)$ is the number of primes smaller than or equal to $B_i$ $(i = 1, 2)$. Let $R$ be the number of hit tuples in the sieving region $\mathcal{H}_a(H)$. Then $R$ is calculated by

$$
R = \rho(N_{\mathrm{ave}}(h_a(\alpha_1)), B_1)\, \rho(N_{\mathrm{ave}}(h_a(\alpha_2)), B_2)\, V_H. \tag{10}
$$

Here, we have to find parameters that satisfy (1), namely $R > V_B$. Fig. 1 shows the minimal $V_B$ that satisfies $R > V_B$ for $V_H$ in the lattice sieve of dimension $t + 1$ in the extension field $\mathrm{GF}(p^{12})$ of 203 bits. In order to reduce the time of searching for such a bound $V_B$, we set $H_0 = H_1 = \cdots = H_t$ and $B_1 = B_2$. From Fig. 1, we can select smaller sizes $V_H$ of the sieving region and $V_B$ of the factor bases that satisfy inequality (1) by using dimension 8 for the extension field $\mathrm{GF}(p^{12})$ of 203 bits.
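The quantities in this estimate are easy to evaluate once the average norms are known. The sketch below implements $V_H$, the approximation $\rho(x, y) = (\log_y x)^{-\log_y x}$ and equation (10); the average norms are passed in as plain numbers because evaluating the integral of the squared resultant is outside the scope of this illustration. The function names, and the choice to return probability 1 when $x \leq y$, are assumptions of this sketch.

```python
import math


def sieving_volume(H):
    """V_H = 2^t * prod_j H_j for the sieving interval H = (H_0, ..., H_t)."""
    t = len(H) - 1
    return 2 ** t * math.prod(H)


def rho(x, y):
    """Approximate probability that an integer below x is y-smooth."""
    if x <= y:
        return 1.0  # assumption of this sketch: everything below y is y-smooth
    u = math.log(x) / math.log(y)
    return u ** (-u)


def expected_hits(n_ave_1, n_ave_2, b1, b2, H):
    """R of equation (10) for given average norms and smoothness bounds."""
    return rho(n_ave_1, b1) * rho(n_ave_2, b2) * sieving_volume(H)
```

A parameter search in the spirit of this section would evaluate `expected_hits` over a grid of candidate intervals and bounds and keep those with $R > V_B$.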

5.2 Selection of H and B1, B2

For fixed sizes $V_H$ of the sieving region and $V_B$ of the factor bases, we first select a sieving interval $H$ and then smoothness bounds $B_1, B_2$. The sieving interval $H$ is chosen so that the probability $\rho(N_{\mathrm{ave}}(h_a(\alpha_1)), B_1)\,\rho(N_{\mathrm{ave}}(h_a(\alpha_2)), B_2)$ of a hit tuple in equation (10) is maximum for fixed $B_1, B_2$ with $B_1 = B_2$. For this $H$, we then select $B_1, B_2$ so that the number of hit tuples in equation (10) becomes maximum.

6. Our experiment on number field sieve over $\mathrm{GF}(p^{12})$

In this section, we report our experiment on solving the DLP over the extension field $\mathrm{GF}(p^{12})$ of 203 bits using the number field sieve in Section 2. We chose the characteristic $p = 122663$ of 17 bits, namely the cardinality of the extension field $\mathrm{GF}(p^{12})$ is

$$
p^{12} = 11602804790149348991289364161245260072909585140266491307794081.
$$

The computational environment in our experiment is as follows. We used one PC equipped with four CPUs (Intel Xeon X7350 2.93 GHz; Core2 microarchitecture; 16 cores in total) and 64 GBytes of RAM. We utilize gmp-5.0.5 for the arithmetic of multi-precision integers, openmpi-1.6 for parallel implementation between processes, pari-2.5.1 for the decomposition of ideals in the number fields, and ntl-5.5.2 for the computation of lattice reduction using the LLL algorithm. We use C++ with compiler gcc-4.7.1 on Linux OS (64 bits).

Table 1 presents the experimental data of our implementation and of the previous implementations of the number field sieve over extension fields $\mathrm{GF}(p^n)$.


Table 1. Comparison of known experiments of the number field sieve over extension field $\mathrm{GF}(p^n)$.

| Finite Field | $\mathrm{GF}(p^3)$ | $\mathrm{GF}(p^6)$ | $\mathrm{GF}(p^{12})$ |
|---|---|---|---|
| Authors | Joux et al. [4] | Zajac [5] | This paper |
| Year | 2006 | 2008 | 2012 |
| CPU | Alpha (1.15 GHz) × 8 | Sempron (2.01 GHz) × 8 | Xeon (2.93 GHz) × 4 |
| Days | 19 days | 5 days | 2 days |
| Bit Length | 394 | 242 | 203 |
| Sieving | 2-dim. lattice sieve | 3-dim. line sieve | 7-dim. lattice sieve |

6.1 Polynomial selection

In order to select the two polynomials $f_1, f_2$ of Section 2.2, we use a polynomial selection similar to that of the previous experiments [4] and [5]. First, an irreducible polynomial $f_1 \in \mathbb{Z}[X]$ of degree 12 with small coefficients is chosen, and then we set $f_2 = f_1 + p$ or $f_2 = f_1 - p$.

In this paper, Murphy's $\alpha$ function [9] is used for selecting a more suitable pair of polynomials $f_1, f_2$. If the value of Murphy's $\alpha$ function for $f_i$ $(i = 1, 2)$ is smaller, then the norm $N(h_a(\alpha_i))$ $(i = 1, 2)$ is expected to be smoother, namely it is divisible by small primes with higher probability. The coefficients of the polynomial $f_1$ are searched in the range of $\pm 10$, and the sum of Murphy's $\alpha$ values of the following polynomials $f_1, f_2$ is the smallest within the range of our search:

$$
f_1(X) = X^{12} - 3X^4 + 9X^3 - 9X^2 - 9X + 2, \qquad
f_2(X) = X^{12} - 3X^4 + 9X^3 - 9X^2 - 9X - 122661.
$$

6.2 Searching relations

In the estimation of Section 5.1, the suitable dimension of the lattice sieve for the extension field $\mathrm{GF}(p^{12})$ of 203 bits was estimated to be eight. We performed some experiments with the lattice sieve of dimensions 6, 7 and 8 for a random special-$\mathfrak{q}$ with fixed $V_H$ and $V_B$. In these experiments, the lattice sieve of dimension 7 yields the largest number of hit tuples for one special-$\mathfrak{q}$, and we therefore select $H = (443, 427, 304, 140, 70, 24, 9)$ and smoothness bounds $B_1 = 114547$ and $B_2 = 148859$.

We run the lattice sieve using the above polynomials $f_1, f_2$ and the above parameters $t, H, B_1, B_2$. Our experiment has generated 32,241 hit tuples in about 42 hours using only 6 cores in our computational environment. This is about 1.3 times larger than the sufficient number $\sharp\mathcal{B}_1 + \sharp\mathcal{B}_2 + 2n$ of hit tuples.

6.3 Linear algebra

From the hit tuples obtained in the relation search, we construct a matrix of linear equations modulo $\ell = 6118607636866573789$ (63 bits), which is the maximum prime divisor of $p^{12} - 1$. The size of the matrix is $32241 \times 24463$, and it is shrunk to $16579 \times 15073$ by a filtering process such as eliminating duplicated hit tuples. Then, we solve it by the Lanczos method.

We found the solutions of the linear equations in about 25 minutes using the 16 cores in our computational environment, and the logarithms of $\mathfrak{q} \in \mathcal{B}_i$ were obtained.

Finally, we present an example of the discrete logarithm. Let $\gamma = x^2 + x - 7$ be a generator of $\mathrm{GF}(p^{12})^* = (\mathrm{GF}(p)[X]/f_1(X))^*$. Let $\delta = x^2 - 5x + 7$ be a target element for which we solve the discrete logarithm $\log_\gamma \delta$ in $\mathrm{GF}(p^{12})$. Note that both $\gamma$ and $\delta$ are $B_1$-smooth. The above linear equations modulo $\ell$ yield $\log \delta = 3540036734608022534$ and $\log \gamma = 3897708711757659596$, and thus the discrete logarithm $\log_\gamma \delta$ in $\mathrm{GF}(p^{12})$ is computed as $\log \delta / \log \gamma = 3161374319443177763 \bmod \ell$.
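The final division modulo $\ell$ can be reproduced with plain integer arithmetic. The short script below uses only the numbers quoted in this section; the variable names are ours. It also checks that $\ell$ divides $p^{12} - 1$, which follows from $p^4 - p^2 + 1 = 37\,\ell$ (the 12th cyclotomic polynomial evaluated at $p$). The primality of $\ell$ is taken from the text and is not tested here.

```python
# Values quoted in Sections 6 and 6.3.
p = 122663
ell = 6118607636866573789
log_delta = 3540036734608022534
log_gamma = 3897708711757659596
reported = 3161374319443177763

# ell divides p^12 - 1 through the cyclotomic factor p^4 - p^2 + 1.
cyclotomic = p**4 - p**2 + 1
divides = (p**12 - 1) % ell == 0

# log_gamma(delta) = log(delta) / log(gamma) mod ell, using a modular inverse
# (three-argument pow with exponent -1 needs Python 3.8 or later).
x = log_delta * pow(log_gamma, -1, ell) % ell

print("bits of p, p^12, ell:", p.bit_length(), (p**12).bit_length(), ell.bit_length())
print("cofactor of ell in p^4 - p^2 + 1:", cyclotomic // ell)
print("ell divides p^12 - 1:", divides)
print("computed log:", x)
print("agrees with the reported value:", x == reported)
```

The value `x` determines $\log_\gamma \delta$ only modulo $\ell$; recovering the logarithm modulo the full group order $p^{12} - 1$ would also require the residues modulo the remaining, smaller prime factors, which are not discussed here.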

7. Conclusion

In this paper, we presented an implementation of the number field sieve for solving the DLP over extension fields $\mathrm{GF}(p^n)$, which underpins the security of pairing-based cryptography. In particular, we proposed an implementation of the lattice sieve of dimension higher than two. In our experiment, we discussed the dimension and the size of the sieving region suitable for the number field sieve over extension fields $\mathrm{GF}(p^{12})$. Finally, we have solved the DLP over an extension field $\mathrm{GF}(p^{12})$ of 203 bits using a PC with 16 CPU cores in about 43 hours.

In future work, we will discuss how to select the sieving region for the DLP over extension fields $\mathrm{GF}(p^{12})$ of larger bit sizes. We will also extend the efficient lattice sieve of dimension two to the lattice sieve of dimension higher than two.

References

[1] F. Vercauteren, Optimal pairings, IEEE Trans. Inform. Theory, 56 (2010), 455–461.

[2] P. S. L. M. Barreto and M. Naehrig, Pairing-friendly elliptic curves of prime order, in: Proc. of SAC 2005, LNCS, Vol. 3897, pp. 319–331, Springer-Verlag, Berlin, 2006.

[3] A. Joux and R. Lercier, Improvements to the general number field sieve for discrete logarithms in prime fields. A comparison with the Gaussian integer method, Math. Comput., 72 (2003), 953–967.

[4] A. Joux, R. Lercier, N. P. Smart and F. Vercauteren, The number field sieve in the medium prime case, in: Proc. of CRYPTO 2006, LNCS, Vol. 4117, pp. 326–344, Springer-Verlag, Berlin, 2006.

[5] P. Zajac, Discrete logarithm problem in degree six finite fields, Ph.D. thesis, Slovak Univ. of Technology, 2008.

[6] P. Zajac, On the use of the lattice sieve in the 3D NFS, Tatra Mt. Math. Publ., 45 (2010), 161–172.

[7] J. M. Pollard, The lattice sieve, in: Lecture Notes in Math., A. K. Lenstra and H. W. Lenstra eds., Vol. 1554, pp. 43–49, Springer-Verlag, Berlin, 1993.

[8] A. K. Lenstra, H. W. Lenstra and L. Lovasz, Factoring polynomials with rational coefficients, Math. Ann., 261 (1982), 515–534.

[9] B. Murphy, Polynomial selection for the number field sieve integer factorisation algorithm, Ph.D. thesis, The Australian National Univ., 1999.


JSIAM Letters Vol.6 (2014) pp.57–60 ©2014 Japan Society for Industrial and Applied Mathematics

Convergence analysis of the parallel classical block Jacobi method for the symmetric eigenvalue problem

Yusaku Yamamoto$^{1,2}$, Zhang Lan$^{3}$ and Shuhei Kudo$^{3}$

$^1$ The University of Electro-Communications, 1-5-1 Chofugaoka, Chofu, Tokyo 182-8585, Japan
$^2$ JST CREST, 4-1-8 Kawaguchi-Honcho, Kawaguchi, Saitama 332-0012, Japan
$^3$ Kobe University, 1-1 Rokko-dai, Nada, Kobe 657-8501, Japan

E-mail yusaku.yamamoto uec.ac.jp

Received July 25, 2014, Accepted September 22, 2014

Abstract

We analyze convergence properties of the parallel classical block Jacobi method for the symmetric eigenvalue problem using the dynamic ordering strategy of Becka et al. It is shown that the method is globally convergent. It is also shown that the order of convergence is ultimately quadratic if there are no multiple eigenvalues.

Keywords symmetric eigenvalue problem, Jacobi method, parallel computing

Research Activity Group Algorithms for Matrix / Eigenvalue Problems and their Applications

1. Introduction

In this paper, we consider the problem of computing all the eigenvalues and eigenvectors of a real symmetric $n \times n$ matrix $A$. For this problem, algorithms based on tri-diagonalization of the coefficient matrix have been used as a standard procedure. However, the algorithm of tri-diagonalization has small parallel granularity and requires $O(n)$ inter-processor synchronizations. Due to this, when the matrix is not very large, the overhead of synchronization becomes dominant and the performance tends to saturate with a small number of processors.

Recently, block Jacobi methods have attracted attention as an alternative to the tri-diagonalization based approach. They are block versions of the well known Jacobi methods for the symmetric eigenvalue problem and are based on the idea of making the matrix close to diagonal by eliminating off-diagonal blocks by orthogonal transformations. Although they require more computational work than the tri-diagonalization based methods, they have desirable properties from the viewpoint of high performance computing, such as large parallel granularity and efficient use of level-3 BLAS routines [1].

There are several versions of the block Jacobi methods, which differ mainly in the order of selecting the off-diagonal blocks to be eliminated. Among them, the classical block Jacobi method, which eliminates the off-diagonal block with the largest Frobenius norm at each stage, is known for its fast convergence. Becka et al. proposed to parallelize this method using a strategy called dynamic ordering [2]. In this strategy, one selects a set of off-diagonal blocks at each stage in such a way that the sum of squares of their Frobenius norms is maximal under the constraint that they can be eliminated simultaneously, and eliminates these blocks in parallel. According to numerical experiments, this strategy has proved efficient in terms of both convergence speed and parallel efficiency. However, convergence properties of this strategy have yet to be elucidated. This is in contrast to the case of the block cyclic Jacobi method, whose convergence has been analyzed recently by Drmac [3].

In this paper, we analyze global and local convergence properties of the parallel classical block Jacobi method using dynamic ordering. In particular, we show that this method is globally convergent. We also show that the order of convergence is ultimately quadratic if there are no multiple eigenvalues. Hence, we can ensure theoretically that this method has excellent convergence properties.

This paper is structured as follows. In Section 2, we explain the parallel classical block Jacobi method with dynamic ordering. Its global and local convergence properties are discussed in Sections 3 and 4, respectively. Numerical results that support our analysis are shown in Section 5. Section 6 gives some concluding remarks.

2. The parallel classical block Jacobi method using dynamic ordering

Let $P$ be the number of processors and assume that $n$ is divisible by $2P$, though extension to a more general case is straightforward. Let $L = n/2P$ and suppose that the matrix $A^{(0)} = A$ is partitioned into blocks of size $L \times L$. We denote the $(I, J)$ block of $A$ by $A_{IJ}$. In the $k$th step of the (sequential) classical block Jacobi method, one eliminates the block $A^{(k)}_{XY}$ with the largest Frobenius norm (F-norm) by an orthogonal transformation:

$$
A^{(k+1)} = (P^{(k)})^{\top} A^{(k)} P^{(k)}. \tag{1}
$$

The $n \times n$ orthogonal matrix $P^{(k)}$ is defined as follows. Let us define a $2L \times 2L$ matrix $\tilde{A}^{(k)}$ by

$$
\tilde{A}^{(k)} =
\begin{pmatrix}
A^{(k)}_{XX} & A^{(k)}_{XY} \\
A^{(k)}_{YX} & A^{(k)}_{YY}
\end{pmatrix} \tag{2}
$$

and denote the eigenvector matrix of $\tilde{A}^{(k)}$ by $\tilde{P}^{(k)}$. Now, partition $\tilde{P}^{(k)}$ into four $L \times L$ blocks $\tilde{P}^{(k)}_{XX}$, $\tilde{P}^{(k)}_{XY}$, $\tilde{P}^{(k)}_{YX}$ and $\tilde{P}^{(k)}_{YY}$ and construct the matrix $P^{(k)}$ by embedding these blocks into the $(X,X)$, $(X,Y)$, $(Y,X)$ and $(Y,Y)$ blocks of the $n \times n$ identity matrix $I_n$. It is easy to see that by using the $P^{(k)}$ thus constructed, the $(X,Y)$ and $(Y,X)$ blocks of $A^{(k+1)}$ become zero.

It is to be noted that only the $X$th and $Y$th block rows and the $X$th and $Y$th block columns of $A^{(k)}$ are updated by the transformation (1). Using this fact, one can eliminate another off-diagonal block, say $A^{(k)}_{X'Y'}$, simultaneously with $A^{(k)}_{XY}$ if $X' \neq X, Y$ and $Y' \neq X, Y$. More generally, $P$ off-diagonal blocks $\{A^{(k)}_{X_\ell, Y_\ell}\}_{\ell=1}^{P}$ can be eliminated in parallel if $X_1, Y_1, \ldots, X_P, Y_P$ are all different, that is, if they are some permutation of $1, 2, \ldots, 2P$. In the dynamic ordering strategy of Becka et al., $X_1, Y_1, \ldots, X_P, Y_P$ are determined under this constraint to maximize $\sum_{\ell=1}^{P} \|A^{(k)}_{X_\ell, Y_\ell}\|_F^2$, where $\|\cdot\|_F$ denotes the F-norm. Thus this method can be viewed as a generalization of the classical block Jacobi method.

The problem of finding such $X_1, Y_1, \ldots, X_P, Y_P$ can be formulated as a maximum weight matching problem on the complete graph with $2P$ vertices, where the edges and weights correspond to the off-diagonal blocks and their F-norms, respectively. In the implementation of Becka et al., this problem is solved approximately using a greedy algorithm, that is, by selecting the off-diagonal block with the largest F-norm first and then selecting the off-diagonal block with the largest F-norm from the not yet selected block rows and columns, and so on. In the following, we analyze the convergence of the parallel classical block Jacobi method under this greedy strategy.
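The greedy selection described above can be sketched as follows. The function takes a symmetric table of block F-norms and returns disjoint index pairs; the name `greedy_pairs`, the 0-based block indices and the sample table in the usage note are assumptions of this sketch rather than details of the implementation of Becka et al.

```python
def greedy_pairs(norms):
    """Greedy approximation of the maximum weight matching.

    norms[i][j] is the F-norm of the off-diagonal block (i, j) of a
    matrix with w = 2P block rows; blocks are indexed from 0 here.
    Returns P disjoint pairs (X, Y) with X < Y.
    """
    w = len(norms)
    # All blocks of the upper triangular part, largest F-norm first.
    candidates = sorted(
        ((norms[i][j], i, j) for i in range(w) for j in range(i + 1, w)),
        key=lambda item: -item[0],
    )
    used, pairs = set(), []
    for _, i, j in candidates:
        # Take the block only if its block row and column are still free.
        if i not in used and j not in used:
            pairs.append((i, j))
            used.update((i, j))
    return pairs
```

For the table `[[0, 5, 4, 1], [5, 0, 1, 4], [4, 1, 0, 1], [1, 4, 1, 0]]` the greedy rule picks the block (0, 1) first and is then forced to take (2, 3), giving a sum of squares of 26, whereas the pairs (0, 2) and (1, 3) would give 32. This shows why the greedy rule solves the matching problem only approximately, while still always including the block with the largest F-norm.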

3. Global convergence

We first consider the global convergence of the sequential classical block Jacobi method. Let $w = 2P$ and $W = w(w-1)/2$, the number of off-diagonal blocks in the upper triangular part. The following theorem holds.

Theorem 1 In the classical block Jacobi method, the sum of squares of the F-norms of the off-diagonal blocks satisfies the following inequality and therefore converges to zero as $k \to \infty$.

$$
\sum_{I \neq J} \|A^{(k+1)}_{IJ}\|_F^2 \leq \left(1 - \frac{1}{W}\right) \sum_{I \neq J} \|A^{(k)}_{IJ}\|_F^2. \tag{3}
$$

Proof This is a direct extension of the global convergence theorem for the classical Jacobi method [4] to the block case. Let us rewrite $A^{(k)}$ and $A^{(k+1)}$ as $A$ and $B$, respectively, for simplicity. Since $A$ and $B$ are unitarily equivalent and $\|B_{XY}\|_F^2 = \|B_{YX}\|_F^2 = 0$, it follows that

$$
\|B_{XX}\|_F^2 + \|B_{YY}\|_F^2 = \|A_{XX}\|_F^2 + \|A_{YY}\|_F^2 + 2\|A_{XY}\|_F^2. \tag{4}
$$

Noting that the diagonal blocks other than $A_{XX}$ and $A_{YY}$ are unchanged and $\|A\|_F^2 = \|B\|_F^2$, we have

$$
\begin{aligned}
\sum_{I \neq J} \|B_{IJ}\|_F^2 &= \sum_{I \neq J} \|A_{IJ}\|_F^2 - 2\|A_{XY}\|_F^2 \\
&\leq \sum_{I \neq J} \|A_{IJ}\|_F^2 - \frac{2}{w(w-1)} \sum_{I \neq J} \|A_{IJ}\|_F^2 \\
&\leq \left(1 - \frac{1}{W}\right) \sum_{I \neq J} \|A_{IJ}\|_F^2,
\end{aligned} \tag{5}
$$

where we used the fact that $A_{XY}$ is the block with the largest F-norm in the first inequality.

(QED)
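Inequality (3) can be checked numerically in the simplest setting $L = 1$, where every block is a single element, $w = n$, and the method reduces to the classical Jacobi method with plane rotations. The sketch below is only such a special case, written in plain Python with our own function names; it is not the block algorithm itself, which would diagonalize a $2L \times 2L$ submatrix at each step.

```python
import math


def off_norm_sq(a):
    """Sum of squares of the off-diagonal elements."""
    n = len(a)
    return sum(a[i][j] ** 2 for i in range(n) for j in range(n) if i != j)


def classical_jacobi_step(a):
    """Annihilate the off-diagonal element of largest modulus (block size L = 1)."""
    n = len(a)
    p, q = max(
        ((i, j) for i in range(n) for j in range(i + 1, n)),
        key=lambda ij: abs(a[ij[0]][ij[1]]),
    )
    # Rotation angle chosen so that the (p, q) element of R^T A R vanishes.
    phi = 0.5 * math.atan2(2.0 * a[p][q], a[q][q] - a[p][p])
    c, s = math.cos(phi), math.sin(phi)
    rot = [[float(i == j) for j in range(n)] for i in range(n)]
    rot[p][p], rot[p][q], rot[q][p], rot[q][q] = c, s, -s, c
    # B = R^T (A R), formed with explicit products for clarity.
    ar = [[sum(a[i][k] * rot[k][j] for k in range(n)) for j in range(n)]
          for i in range(n)]
    return [[sum(rot[k][i] * ar[k][j] for k in range(n)) for j in range(n)]
            for i in range(n)]


if __name__ == "__main__":
    A = [[4.0, 1.0, 2.0, 0.5],
         [1.0, 3.0, 0.3, 1.5],
         [2.0, 0.3, 2.0, 0.7],
         [0.5, 1.5, 0.7, 1.0]]
    W = len(A) * (len(A) - 1) // 2
    for step in range(12):
        before = off_norm_sq(A)
        A = classical_jacobi_step(A)
        print(step, off_norm_sq(A) <= (1 - 1 / W) * before + 1e-12)
```

Each printed flag reports whether the off-diagonal sum of squares shrank by at least the factor $1 - 1/W$ in that step, up to a small rounding tolerance.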

From this result, we immediately have the following theorem.

Theorem 2 In the parallel classical block Jacobi method with the greedy strategy, the off-diagonal elements of $A^{(k)}$ converge to zero as $k \to \infty$.

Proof The matrices generated by the parallel greedy method also satisfy (3) because the off-diagonal block with the largest F-norm is included in the set of $P$ blocks eliminated at the $k$th step. Hence, the off-diagonal blocks converge to zero as $k \to \infty$. On the other hand, the off-diagonal elements of all the diagonal blocks become zero after the first step and remain zero thereafter. This is because $A^{(k+1)}_{X_\ell, X_\ell}$ and $A^{(k+1)}_{Y_\ell, Y_\ell}$ become diagonal matrices for $\ell = 1, 2, \ldots, P$ from the construction of the $2L \times 2L$ orthogonal transformations.

(QED)

This establishes global convergence of the parallel classical block Jacobi method with the greedy strategy. The eigenvectors can be computed from the product of the orthogonal matrices, as usual.

4. Local quadratic convergence

Let the eigenvalues of $A$ be $\lambda_1, \lambda_2, \ldots, \lambda_n$. In the following, we assume that there are no multiple eigenvalues and define $d = \min_{i \neq j} |\lambda_i - \lambda_j|$. Before going into the analysis, we first quote the famous $\sin\Theta$ theorem [4], which will be a useful tool in our analysis.

Theorem 3 ($\sin\Theta$ theorem) Let $A \in \mathbb{R}^{n \times n}$ be a symmetric matrix and $y \in \mathbb{R}^n$ be a vector with $\|y\| = 1$. Define $\alpha = y^{\top} A y$ and $r(y) = Ay - \alpha y$. Now, let $\lambda_i$ be the eigenvalue of $A$ that is closest to $\alpha$, $x_i$ be the corresponding eigenvector, $\theta = \angle(y, x_i)$ and $\mathrm{gap}(\alpha) = \min_{\lambda_j \neq \lambda_i} |\lambda_j - \alpha|$. Then the following inequality holds.

$$
|\sin\theta| \leq \frac{\|r(y)\|}{\mathrm{gap}(\alpha)}. \tag{6}
$$

We first give a theorem on quadratic convergence of the sequential classical block Jacobi method. The outline of the proof closely follows that of the quadratic convergence proof of the classical Jacobi method [5]. The main difference is that we need a more intricate discussion using the $\sin\Theta$ theorem to bound the F-norms of the off-diagonal blocks of $\tilde{P}$.

Theorem 4 Let $\delta = \sum_{I \neq J} \|A^{(m)}_{IJ}\|_F$. If $w\sqrt{LW}\,\delta \leq d/4$, then the matrix obtained by applying $W$ steps of the classical block Jacobi method to $A^{(m)}$ satisfies

$$
\|A^{(m+W)}_{IJ}\|_F \leq \frac{2W^2\sqrt{L}}{d}\,\delta^2 \quad (I \neq J), \tag{7}
$$

that is, the F-norms of the off-diagonal blocks converge to zero quadratically after every $W$ steps.

Proof The proof will be given in several steps.

(i) Upper bound on $\|A^{(m+k)}_{IJ}\|_F$ The sum of squares of the F-norms of the off-diagonal blocks at step $m$ satisfies

$$
\sum_{J \neq I} \|A^{(m)}_{IJ}\|_F^2 \leq 2W\delta^2. \tag{8}
$$

From (3), it is clear that the same bound holds for $\sum_{J \neq I} \|A^{(m+k)}_{IJ}\|_F^2$ $(k \geq 0)$. By taking the symmetry into account, the upper bound of $\|A^{(m+k)}_{IJ}\|_F$ is given as

$$
\|A^{(m+k)}_{IJ}\|_F \leq \sqrt{W}\,\delta \quad (I \neq J,\ k \geq 0). \tag{9}
$$

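(ii) Lower bound on the difference between two diagonal elements of $A^{(m+k)}$ Let the $(i, j)$ element of $A^{(k)}$ be denoted by $a^{(k)}_{ij}$. The $i$th Gershgorin circle $C_i$ of $A^{(m+k)}$ is defined by

$$
C_i : |z - a^{(m+k)}_{ii}| \leq \sum_{j \neq i} |a^{(m+k)}_{ij}|. \tag{10}
$$

Let the number of the block to which $i$ belongs be $I_1$. Noting that the diagonal blocks of $A^{(m+k)}$ are diagonal, the right-hand side can be evaluated as

$$
\begin{aligned}
\sum_{j \neq i} |a^{(m+k)}_{ij}|
&= \sum_{J \neq I_1} \sum_{j=(J-1)L+1}^{JL} |a^{(m+k)}_{ij}| \\
&\leq \sum_{J \neq I_1} \sqrt{\sum_{j=(J-1)L+1}^{JL} 1}\ \sqrt{\sum_{j=(J-1)L+1}^{JL} |a^{(m+k)}_{ij}|^2} \\
&\leq \sum_{J \neq I_1} \sqrt{L}\,\sqrt{W}\,\delta = (w-1)\sqrt{LW}\,\delta \leq \frac{w-1}{w} \cdot \frac{d}{4},
\end{aligned} \tag{11}
$$

where we used the Cauchy-Schwarz inequality, (9) and the assumption $w\sqrt{LW}\,\delta \leq d/4$. Since the radius of each circle is smaller than $d/4$ and the minimum distance between the eigenvalues is $d$, each circle contains at most one eigenvalue. This means that each circle contains exactly one eigenvalue. Let the eigenvalue contained in $C_i$ be $\lambda_i$. Then,

$$
|a^{(m+k)}_{ii} - \lambda_i| \leq \frac{w-1}{w} \cdot \frac{d}{4} \quad (1 \leq i \leq n). \tag{12}
$$

Using the triangle inequality, we have for $i \neq j$,

$$
\begin{aligned}
|a^{(m+k)}_{ii} - a^{(m+k)}_{jj}|
&\geq |\lambda_i - \lambda_j| - |\lambda_i - a^{(m+k)}_{ii}| - |a^{(m+k)}_{jj} - \lambda_j| \\
&\geq d - \frac{w-1}{w} \cdot \frac{d}{4} - \frac{w-1}{w} \cdot \frac{d}{4} = \frac{w+1}{w} \cdot \frac{d}{2}.
\end{aligned} \tag{13}
$$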

(iii) Change of the F-norm of an off-diagonal block by one step Suppose that an off-diagonal block $A^{(m+k)}_{IJ}$ is changed by the elimination of $A^{(m+k)}_{XY}$. As a representative case, we consider the case of $I = X$ and $J \neq X, Y$. Let us rewrite $A^{(k+m)}$ and $A^{(k+m+1)}$ as $A$ and $B$, respectively, for simplicity. By denoting the eigenvector matrix of $\tilde{A}$ by $\tilde{P}$, we have from (1),

$$
B_{XJ} = \tilde{P}_{XX}^{\top} A_{XJ} + \tilde{P}_{YX}^{\top} A_{YJ} \quad (J \neq X, Y). \tag{14}
$$

Hence,

$$
\begin{aligned}
\|B_{XJ}\|_F &\leq \|\tilde{P}_{XX}^{\top}\|_2 \|A_{XJ}\|_F + \|\tilde{P}_{YX}^{\top}\|_2 \|A_{YJ}\|_F \\
&\leq \|A_{XJ}\|_F + \|\tilde{P}_{YX}\|_2 \|A_{YJ}\|_F,
\end{aligned} \tag{15}
$$

where we used the fact that $\tilde{P}_{XX}$ is a submatrix of an orthogonal matrix $\tilde{P}$ and therefore $\|\tilde{P}_{XX}\|_2 \leq 1$.

Now, we evaluate $\|\tilde{P}_{YX}\|_2$. Denote the diagonal elements of $\tilde{A}$ by $\tilde{a}_{ii}$ $(i = 1, 2, \ldots, 2L)$. Then, from (9) and the fact that both $A_{XX}$ and $A_{YY}$ are diagonal, we have the following Gershgorin circles for $\tilde{A}$:

$$
\tilde{C}_i : |z - \tilde{a}_{ii}| \leq \sqrt{LW}\,\delta, \tag{16}
$$

where we again used the Cauchy-Schwarz inequality to derive the right-hand side. Since $\sqrt{LW}\,\delta \leq d/(4w)$ from the assumption, the radius of each circle is smaller than half of the minimum distance between the diagonal elements of $A$ given in (13). Thus all the circles are disjoint and each circle contains exactly one eigenvalue of $\tilde{A}$. Let the eigenvalue of $\tilde{A}$ contained in $\tilde{C}_i$ be $\mu_i$ and the eigenvector corresponding to $\mu_i$ be $p_i$.

Now, denote the $i$th column of the identity matrix of order $2L$ by $e_i$. We apply the $\sin\Theta$ theorem by substituting $\tilde{A}$ and $e_i$ for $A$ and $y$, respectively, in the theorem. Then $\alpha$ and $r(e_i)$ in the theorem can be calculated as

$$
\alpha_i = e_i^{\top} \tilde{A} e_i = \tilde{a}_{ii}, \tag{17}
$$

$$
r(e_i) = \tilde{A} e_i - \alpha_i e_i = \tilde{a}_i - \tilde{a}_{ii} e_i, \tag{18}
$$

where $\tilde{a}_i$ is the $i$th column vector of $\tilde{A}$. Thus, if $1 \leq i \leq L$, we have

$$
\|r(e_i)\| = \sqrt{\sum_{j \neq i} \tilde{a}_{ji}^2} \leq \|A_{YX}\|_F \leq \sqrt{W}\,\delta. \tag{19}
$$

A similar result follows in the case of $L + 1 \leq i \leq 2L$. For $\mathrm{gap}(\alpha_i)$, we can derive the following lower bound:

$$
\begin{aligned}
\mathrm{gap}(\alpha_i) &= \min_{\mu_j \neq \mu_i} |\mu_j - \tilde{a}_{ii}| \\
&\geq \min_{\mu_j \neq \mu_i} \bigl( |\tilde{a}_{ii} - \tilde{a}_{jj}| - |\mu_j - \tilde{a}_{jj}| \bigr) \\
&\geq \frac{w+1}{w} \cdot \frac{d}{2} - \frac{d}{4w} \geq \frac{d}{2}.
\end{aligned} \tag{20}
$$

By inserting (19) and (20) into (6), we have

$$
|\sin\theta_i| \leq \frac{\|r(e_i)\|}{\mathrm{gap}(\alpha_i)} \leq \frac{2\sqrt{W}\,\delta}{d}, \tag{21}
$$

where $\theta_i = \angle(e_i, p_i)$. On the other hand,

$$
\sqrt{\sum_{j \neq i} p_{ji}^2} = \sqrt{1 - p_{ii}^2} = \sqrt{1 - (p_i \cdot e_i)^2} = |\sin\theta_i|. \tag{22}
$$

Combining (21) and (22), we have

$$
\|\tilde{P}_{YX}\|_2 \leq \sqrt{\sum_{i=1}^{L} \sum_{j=L+1}^{2L} p_{ji}^2} \leq \sqrt{\sum_{i=1}^{L} \sum_{j \neq i} p_{ji}^2} = \sqrt{\sum_{i=1}^{L} \sin^2\theta_i} \leq \frac{2\sqrt{LW}}{d}\,\delta. \tag{23}
$$

Finally, insertion of (23) and (9) into (15) leads to

$$
\|A^{(m+k+1)}_{XJ}\|_F \leq \|A^{(m+k)}_{XJ}\|_F + \frac{2W\sqrt{L}}{d}\,\delta^2, \tag{24}
$$

which means that the F-norm of an off-diagonal block increases by at most $2W\sqrt{L}\,\delta^2/d$ at each elimination.

(iv) Change of the F-norm of an off-diagonalblock after W steps We can show by induction thatamong the W off-diagonal blocks in the upper triangu-lar part of of A(k+m), there are at least k blocks whoseF-norm is smaller than or equal to 2W (k − 1)

√Lδ2/d.

In fact, this proposition clearly is true for k = 1. Assumethat it is true for some k ≥ 1. Then we can consider twocases.(a) If the block to be eliminated next is chosen fromthese k blocks, it means that the F-norm of the block issmaller than or equal to 2W (k−1)

√Lδ2/d. But because

this is the off-diagonal block with the largest F-norm,all the off-diagonal blocks have F-norm smaller than orequal to 2W (k−1)

√Lδ2/d. The F-norms of these blocks

will increase by at most 2W√Lδ2/d by the elimination.

Thus, the proposition holds true also for k + 1.(b) Otherwise, the F-norms of these k blocks will in-crease by at most 2W

√Lδ2/d by the elimination. On

the other hand, the F-norm of the block to be elimi-nated will become zero. Thus the proposition holds truefor k + 1 also in this case.Now that we have proved the proposition, we can put

k =W and readily obtain (7).(QED)
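The bound (21) used in step (iii) can be checked numerically on a small example. The following is a minimal sketch, assuming the sinΘ theorem has the usual form $|\sin\theta| \le \|r(y)\|/\mathrm{gap}(\alpha)$ that (21) suggests; the test matrix, its size, and the helper name `sin_theta_check` are illustrative assumptions, not taken from the paper.

```python
import numpy as np

rng = np.random.default_rng(0)
n = 6
# Nearly diagonal symmetric matrix: separated diagonal plus a small symmetric perturbation.
D = np.diag(np.arange(1.0, n + 1.0))
E = 0.01 * rng.standard_normal((n, n))
A = D + (E + E.T) / 2

mu, P = np.linalg.eigh(A)  # eigenvalues mu[k], orthonormal eigenvectors P[:, k]

def sin_theta_check(i):
    """Return (|sin(angle(e_i, p_i))|, ||r(e_i)|| / gap(alpha_i)) for the i-th unit vector."""
    e = np.zeros(n)
    e[i] = 1.0
    alpha = e @ A @ e                      # Rayleigh quotient, equals a_ii as in (17)
    r = A @ e - alpha * e                  # residual vector as in (18)
    k = np.argmin(np.abs(mu - alpha))      # eigenvalue closest to alpha
    gap = np.min(np.abs(np.delete(mu, k) - alpha))
    sin_t = np.linalg.norm(np.delete(P[:, k], i))  # left-hand side of (22)
    return sin_t, np.linalg.norm(r) / gap

results = [sin_theta_check(i) for i in range(n)]
for i, (s, b) in enumerate(results):
    print(f"i={i}: |sin theta| = {s:.3e}, bound = {b:.3e}")
```

For each i, the first printed number should not exceed the second.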

Finally, we give a theorem on quadratic convergence of the parallel classical block Jacobi method.

Theorem 5 Under the same condition as in Theorem 4, the matrix obtained by applying W steps of the parallel classical block Jacobi method to $A^{(m)}$ satisfies

\[ \|A^{(m+W)}_{IJ}\|_F \le \frac{4W^2\sqrt{L}}{d}\,\delta^2 \quad (I \ne J), \tag{25} \]

that is, the F-norms of the off-diagonal blocks converge to zero quadratically after every W steps.

Proof Steps (i) through (iii) of the proof are the same as those in the proof of Theorem 4, except that the second term in the right-hand side of (24) is replaced by $4W\sqrt{L}\,\delta^2/d$. This is because in the parallel algorithm, w/2 off-diagonal blocks are eliminated at once and therefore each off-diagonal block is updated both from the left and right. Also, it is to be noted that the w/2 off-diagonal blocks include the block with the largest F-norm.

As for step (iv), we consider the same proposition as in the proof of Theorem 4, where $2W(k-1)\sqrt{L}\,\delta^2/d$ is replaced by $4W(k-1)\sqrt{L}\,\delta^2/d$, and prove it by induction. We assume that the proposition is true for some k ≥ 1 and consider two cases: (a) the block with the largest F-norm among the w/2 off-diagonal blocks to be eliminated next is chosen from the k blocks specified in the proposition, or (b) otherwise. In either case, it can be shown with the same logic as in the proof of Theorem 4 that there are at least k + 1 off-diagonal blocks in the upper triangular part of $A^{(k+m+1)}$ whose F-norm is smaller than or equal to $4Wk\sqrt{L}\,\delta^2/d$. Hence, the proposition is true also for k + 1. The theorem follows by putting k = W. (QED)

Fig. 1. Convergence of the three block Jacobi methods.

5. Numerical results

In this section, we show an example of convergence behavior of the block Jacobi methods. As the test matrix, we use a real symmetric random matrix of order n = 1,200, whose elements follow the uniform distribution in [0, 10]. Fig. 1 shows the convergence of three types of block Jacobi methods, namely, the sequential classical block Jacobi method, the parallel classical block Jacobi method, and the sequential cyclic block Jacobi method. It is clear that the convergence of both the sequential and parallel classical block Jacobi methods is quadratic, as predicted by Theorems 4 and 5. The convergence of the cyclic method is also quadratic, as analyzed in [3], but the speed is slower.
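The sequential classical block Jacobi iteration can be illustrated on a much smaller matrix. The sketch below is an assumption-laden toy version, not the authors' implementation: it uses `numpy.linalg.eigh` to diagonalize the 2L × 2L subproblem and makes no attempt to order the eigenvectors so that the rotation stays close to the identity, which the convergence analysis may require. The function names and the 12 × 12 test matrix are illustrative.

```python
import numpy as np

def off_block_norm(A, L):
    """Frobenius norm of A with its L-by-L diagonal blocks zeroed out."""
    B = A.copy()
    for s in range(0, A.shape[0], L):
        B[s:s + L, s:s + L] = 0.0
    return np.linalg.norm(B)

def classical_block_jacobi(A, L, steps):
    """Repeatedly eliminate the off-diagonal block with the largest F-norm."""
    A = A.copy()
    w = A.shape[0] // L                      # number of block rows (n = w * L assumed)
    history = [off_block_norm(A, L)]
    for _ in range(steps):
        best, I, J = -1.0, 0, 1
        for p in range(w):
            for q in range(p + 1, w):
                nrm = np.linalg.norm(A[p * L:(p + 1) * L, q * L:(q + 1) * L])
                if nrm > best:
                    best, I, J = nrm, p, q
        idx = np.r_[I * L:(I + 1) * L, J * L:(J + 1) * L]
        _, P = np.linalg.eigh(A[np.ix_(idx, idx)])   # diagonalize the 2L-by-2L subproblem
        A[:, idx] = A[:, idx] @ P                    # apply the rotation from the right
        A[idx, :] = P.T @ A[idx, :]                  # and from the left
        history.append(off_block_norm(A, L))
    return A, history

rng = np.random.default_rng(0)
M = rng.uniform(0.0, 10.0, (12, 12))
A0 = (M + M.T) / 2                                   # small real symmetric test matrix
A_final, history = classical_block_jacobi(A0, L=2, steps=200)
print(f"off-diagonal block norm: {history[0]:.3e} -> {history[-1]:.3e}")
```

Each elimination removes exactly the squared F-norm of the chosen block (counted twice, by symmetry) from the squared off-diagonal norm, so the recorded history should be non-increasing up to rounding.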

6. Conclusion

In this paper, we presented theoretical convergence results for the parallel classical block Jacobi method using the dynamic ordering strategy. Our future work includes extension of the results to the case of multiple eigenvalues and evaluation of the algorithm on large-scale parallel machines.

Acknowledgment

We express our sincere gratitude to the anonymous referee, whose comments helped us to improve the paper. We also thank Prof. Takaharu Yaguchi, Prof. Mitsuo Yokokawa and Prof. Yoshio Oyanagi of Kobe University for fruitful discussions.

References

[1] Y. Takahashi, Y. Hirota and Y. Yamamoto, Performance of the block Jacobi method for the symmetric eigenvalue problem on a modern massively parallel computer, in: Proc. of ALGORITMY 2012, A. Handlovicova, Z. Minarechova and D. Sevcovic eds., pp. 151–160, House of STU, Bratislava, 2012.

[2] M. Becka, G. Oksa and M. Vajtersic, Dynamic ordering for a parallel block-Jacobi SVD algorithm, Parallel Comput., 28 (2002), 243–262.

[3] Z. Drmac, A global convergence proof for cyclic Jacobi method with block rotations, SIAM J. Matrix Anal. Appl., 31 (2009), 1329–1350.

[4] B. N. Parlett, The Symmetric Eigenvalue Problem, 3rd ed., SIAM, Philadelphia, 1987.

[5] P. Henrici, On the speed of convergence of cyclic and quasi-cyclic Jacobi methods for computing eigenvalues of Hermitian matrices, J. SIAM, 6 (1958), 144–162.


JSIAM Letters Vol.6 (2014) pp.61–64 ©2014 Japan Society for Industrial and Applied Mathematics

Improvement of the accuracy of the approximate solution of the Block BiCR method

Hiroto Tadano1, Youichi Ishikawa2 and Akira Imakura1

1 Faculty of Engineering, Information and Systems, University of Tsukuba, 1-1-1 Tennodai, Tsukuba, Ibaraki 305-8573, Japan

2 FBS Corporation, 7 Kanda Mitoshirocho, Chiyoda-ku, Tokyo 101-0053, Japan

E-mail tadano cs.tsukuba.ac.jp

Received April 1, 2014, Accepted July 10, 2014

Abstract

Block Krylov subspace methods are efficient solvers for linear systems with multiple right-hand sides in terms of the number of iterations and computational time. As one of the Block Krylov subspace methods, the Block BiCR method was proposed by Zhang et al. in 2013. This method often shows smoother convergence behavior than the Block BiCG method. However, the accuracy of the approximate solution generated by the Block BiCR method often deteriorates. In this paper we propose a modified Block BiCR method in order to improve the accuracy of the approximate solutions.

Keywords Block Krylov subspace methods, linear systems, multiple right-hand sides

Research Activity Group Algorithms for Matrix / Eigenvalue Problems and their Applications

1. Introduction

In this paper we consider solving linear systems with multiple right-hand sides:

\[ AX = B, \tag{1} \]

where $A \in \mathbb{C}^{n\times n}$ is a non-Hermitian matrix, and $X, B \in \mathbb{C}^{n\times L}$. Linear systems (1) appear in many applications such as lattice quantum chromodynamics (QCD) calculation [1], and the eigensolver based on the contour integral [2]. In these applications, it is necessary to compute a high accuracy approximate solution of (1).

Numerical methods for (1) are roughly divided into direct methods and iterative methods. As iterative methods for (1), Block Krylov subspace methods such as the Block Bi-Conjugate Gradient (BiCG) method [3], the Block BiCGSTAB method [4], and the Block GMRES method [5] have been proposed. Block Krylov subspace methods are efficient solvers for (1) in terms of the number of iterations and computational time.

As one of the Block Krylov subspace methods, the Block Bi-Conjugate Residual (BiCR) method [6] has been proposed by Zhang et al. This method is a natural extension of the BiCR method [7] for linear systems with a single right-hand side proposed by Sogabe et al. The Block BiCR method often shows smoother convergence behavior than the Block BiCG method. However, the accuracy of the approximate solution generated by the Block BiCR method may deteriorate due to an error matrix that arises from the matrix multiplication with respect to the coefficient matrix A. In this paper, a modified Block BiCR method is proposed in order to improve the accuracy of the approximate solutions. Moreover, this method is stabilized by using the residual orthonormalization technique [8].

This paper is organized as follows. In Section 2, the Block BiCG method and the Block BiCR method are briefly described. In Section 3, the influence of the error matrix on the accuracy of the approximate solution is analyzed. The modified Block BiCR method is proposed in Section 4. In Section 5, the modified Block BiCR method is stabilized by the residual orthonormalization technique. Section 6 provides the results of numerical experiments to show the efficiency of the proposed method. This paper is concluded in Section 7.

2. The Block BiCG method and the Block BiCR method

Let $X_{k+1} \in \mathbb{C}^{n\times L}$ be a (k+1)th approximate solution of the linear systems (1). $X_{k+1}$ is computed so that the following condition is satisfied.

\[ X_{k+1} = X_0 + Z_{k+1}, \quad Z_{k+1} \in \mathcal{K}_{k+1}(A; R_0). \]

Here, $R_0 = B - AX_0$ is an initial residual, and $\mathcal{K}_{k+1}(A; R_0)$ is a block Krylov subspace defined as follows:

\[ \mathcal{K}_{k+1}(A; R_0) \equiv \left\{ \sum_{j=0}^{k} A^{j} R_0 \gamma_j \;\middle|\; \gamma_j \in \mathbb{C}^{L\times L} \ (j = 0, 1, \dots, k) \right\}. \]

The (k+1)th residual $R_{k+1} = B - AX_{k+1}$ of the Block BiCG method [3] and the Block BiCR method [6] is computed by the following recurrence relations.

\[ R_0 = P_0 = B - AX_0 \in \mathcal{K}_1(A; R_0), \]
\[ R_{k+1} = R_k - AP_k\alpha_k \in \mathcal{K}_{k+2}(A; R_0), \tag{2} \]
\[ P_{k+1} = R_{k+1} + P_k\beta_k \in \mathcal{K}_{k+2}(A; R_0). \]

Here, $P_{k+1} \in \mathbb{C}^{n\times L}$ and $\alpha_k, \beta_k \in \mathbb{C}^{L\times L}$. The (k+1)th approximate solution $X_{k+1}$ is updated by the following recurrence relation.

\[ X_{k+1} = X_k + P_k\alpha_k. \tag{3} \]

In the Block BiCG method and the Block BiCR method, the residual $\tilde{R}_{k+1}$ and the auxiliary matrix $\tilde{P}_{k+1}$ of the linear system $A^{H}X = B$ are computed simultaneously in order to compute $\alpha_k$ and $\beta_k$. The matrices $\tilde{R}_{k+1}$ and $\tilde{P}_{k+1}$ are computed by the following recurrence relations.

\[ \tilde{R}_0 = \tilde{P}_0 = B - A^{H}X_0 \in \mathcal{K}_1(A^{H}; \tilde{R}_0), \]
\[ \tilde{R}_{k+1} = \tilde{R}_k - A^{H}\tilde{P}_k\tilde{\alpha}_k \in \mathcal{K}_{k+2}(A^{H}; \tilde{R}_0), \tag{4} \]
\[ \tilde{P}_{k+1} = \tilde{R}_{k+1} + \tilde{P}_k\tilde{\beta}_k \in \mathcal{K}_{k+2}(A^{H}; \tilde{R}_0). \]

The matrices $\alpha_k$, $\beta_k$, $\tilde{\alpha}_k$, and $\tilde{\beta}_k$ are determined by imposing the bi-orthogonality conditions shown in Table 1. Figs. 1 and 2 show the algorithms of the Block BiCG method and of the Block BiCR method, respectively. In these algorithms, $\|\cdot\|_F$ denotes the Frobenius norm of a matrix, and ε is a sufficiently small value defined by users.

Table 1. Conditions for determining the L × L matrices.

Matrix: $\alpha_k, \beta_k$
  Block BiCG: $R_k \perp \mathcal{K}_k(A^{H}; \tilde{R}_0)$, $AP_k \perp \mathcal{K}_k(A^{H}; \tilde{R}_0)$
  Block BiCR: $R_k \perp A^{H}\mathcal{K}_k(A^{H}; \tilde{R}_0)$, $AP_k \perp A^{H}\mathcal{K}_k(A^{H}; \tilde{R}_0)$

Matrix: $\tilde{\alpha}_k, \tilde{\beta}_k$
  Block BiCG: $\tilde{R}_k \perp \mathcal{K}_k(A; R_0)$, $A^{H}\tilde{P}_k \perp \mathcal{K}_k(A; R_0)$
  Block BiCR: $\tilde{R}_k \perp A\mathcal{K}_k(A; R_0)$, $A^{H}\tilde{P}_k \perp A\mathcal{K}_k(A; R_0)$

The Block BiCR method requires three matrix multiplications $AP_k$, $AR_k$, and $A^{H}\tilde{P}_k$ in each iteration. The matrices $AR_k$ and $A^{H}\tilde{P}_k$ are computed by the explicit matrix multiplication. To reduce the computational complexity, the matrix $AP_k$ is computed by the recurrence relation.

3. The influence of the error matrix on the approximate solution

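$X_0 \in \mathbb{C}^{n\times L}$ is an initial guess,
Compute $R_0 = B - AX_0$,
Choose $\tilde{R}_0 \in \mathbb{C}^{n\times L}$,
Set $P_0 = R_0$, $\tilde{P}_0 = \tilde{R}_0$, $U_0 = AP_0$, $\tilde{U}_0 = A^{H}\tilde{P}_0$,
For $k = 0, 1, \dots$, until $\|R_k\|_F/\|B\|_F \le \varepsilon$ do:
  Solve $(\tilde{P}_k^{H}U_k)\alpha_k = \tilde{R}_k^{H}R_k$ for $\alpha_k$,
  Solve $(P_k^{H}\tilde{U}_k)\tilde{\alpha}_k = R_k^{H}\tilde{R}_k$ for $\tilde{\alpha}_k$,
  $X_{k+1} = X_k + P_k\alpha_k$,
  $R_{k+1} = R_k - U_k\alpha_k$, $\tilde{R}_{k+1} = \tilde{R}_k - \tilde{U}_k\tilde{\alpha}_k$,
  Solve $(\tilde{R}_k^{H}R_k)\beta_k = \tilde{R}_{k+1}^{H}R_{k+1}$ for $\beta_k$,
  Solve $(R_k^{H}\tilde{R}_k)\tilde{\beta}_k = R_{k+1}^{H}\tilde{R}_{k+1}$ for $\tilde{\beta}_k$,
  $P_{k+1} = R_{k+1} + P_k\beta_k$, $\tilde{P}_{k+1} = \tilde{R}_{k+1} + \tilde{P}_k\tilde{\beta}_k$,
  $U_{k+1} = AP_{k+1}$, $\tilde{U}_{k+1} = A^{H}\tilde{P}_{k+1}$,
End For

Fig. 1. Algorithm of the Block BiCG method.

$X_0 \in \mathbb{C}^{n\times L}$ is an initial guess,
Compute $R_0 = B - AX_0$,
Choose $\tilde{R}_0 \in \mathbb{C}^{n\times L}$,
Set $P_0 = R_0$, $\tilde{P}_0 = \tilde{R}_0$, $U_0 = V_0 = AR_0$, $\tilde{U}_0 = A^{H}\tilde{R}_0$,
For $k = 0, 1, \dots$, until $\|R_k\|_F/\|B\|_F \le \varepsilon$ do:
  Solve $(\tilde{U}_k^{H}U_k)\alpha_k = \tilde{R}_k^{H}V_k$ for $\alpha_k$,
  Solve $(U_k^{H}\tilde{U}_k)\tilde{\alpha}_k = V_k^{H}\tilde{R}_k$ for $\tilde{\alpha}_k$,
  $X_{k+1} = X_k + P_k\alpha_k$,
  $R_{k+1} = R_k - U_k\alpha_k$, $\tilde{R}_{k+1} = \tilde{R}_k - \tilde{U}_k\tilde{\alpha}_k$,
  $V_{k+1} = AR_{k+1}$,
  Solve $(\tilde{R}_k^{H}V_k)\beta_k = \tilde{R}_{k+1}^{H}V_{k+1}$ for $\beta_k$,
  Solve $(V_k^{H}\tilde{R}_k)\tilde{\beta}_k = V_{k+1}^{H}\tilde{R}_{k+1}$ for $\tilde{\beta}_k$,
  $P_{k+1} = R_{k+1} + P_k\beta_k$, $\tilde{P}_{k+1} = \tilde{R}_{k+1} + \tilde{P}_k\tilde{\beta}_k$,
  $U_{k+1} = V_{k+1} + U_k\beta_k$,
  $\tilde{U}_{k+1} = A^{H}\tilde{P}_{k+1}$,
End For

Fig. 2. Algorithm of the Block BiCR method.

In this section, the influence of the error matrix on the approximate solution is analyzed. Expanding (2) and (3), the (k+1)th approximate solution $X_{k+1}$ and the corresponding residual $R_{k+1}$ are rewritten as follows:

\[ X_{k+1} = X_0 + \sum_{j=0}^{k} P_j\alpha_j, \tag{5} \]

\[ R_{k+1} = R_0 - \sum_{j=0}^{k} (AP_j)\alpha_j. \tag{6} \]

In (6), the matrix enclosed by the brackets denotes the matrix computed in advance.

From (5) and (6), the relationship between the true residual $B - AX_{k+1}$ and the residual $R_{k+1}$ can be obtained as follows:

\[ B - AX_{k+1} = R_{k+1} + E_{k+1}, \tag{7} \]

\[ E_{k+1} = \sum_{j=0}^{k} \left[ (AP_j)\alpha_j - A(P_j\alpha_j) \right]. \]

Theoretically, the error matrix $E_{k+1}$ does not exist because the residual $R_{k+1}$ computed by the recurrence relation is equal to the true residual $B - AX_{k+1}$. However, in numerical computation, the matrix $E_{k+1}$ appears because $(AP_j)\alpha_j \ne A(P_j\alpha_j)$ in floating-point arithmetic.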
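The origin of the error matrix $E_{k+1}$ in (7) can be seen in a single term. The following sketch, with randomly generated stand-ins for $A$, $P_j$ and $\alpha_j$ (the sizes and names are illustrative assumptions), evaluates $(AP_j)\alpha_j$ and $A(P_j\alpha_j)$ in double precision and compares them.

```python
import numpy as np

rng = np.random.default_rng(1)
n, L = 500, 8
A = rng.standard_normal((n, n))       # stand-in for the coefficient matrix
P = rng.standard_normal((n, L))       # stand-in for P_j
alpha = rng.standard_normal((L, L))   # stand-in for alpha_j

left = (A @ P) @ alpha                # (A P_j) alpha_j, the form used by the residual recurrence (2)
right = A @ (P @ alpha)               # A (P_j alpha_j), the form implied by the solution update (3)
E = left - right                      # one term of the error matrix E_{k+1}
rel = np.linalg.norm(E) / np.linalg.norm(left)
print(f"relative difference of the two products: {rel:.2e}")
```

The two products agree in exact arithmetic; in floating point the difference is typically of the order of the unit round-off, and such terms can accumulate over the iterations.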

4. Modification of the Block BiCR method for improving the accuracy of the approximate solution

As mentioned in the previous section, the error matrix $E_{k+1}$ of (7) has an influence on the accuracy of the approximate solution. In order to improve the accuracy of the approximate solution, we need to reduce the influence of the error matrix $E_{k+1}$.

The error matrix $E_{k+1}$ includes the matrix $AP_j$. In the Block BiCR method, the matrix $AP_k$ is computed by the recurrence relation. The accuracy of the approximate solution may deteriorate if the matrix $AP_k$ computed by the recurrence relation differs appreciably from the one computed by the explicit matrix multiplication. In this section, the Block BiCR method is modified to compute the matrix $AP_k$ by the explicit matrix multiplication without increasing the computational complexity.

In the modified Block BiCR method, the matrix $A^{H}\tilde{P}_k$ is computed by the recurrence relation, and the matrix $AP_k$ is instead computed by the explicit matrix multiplication. Since the matrix $A^{H}\tilde{R}_k$ is required to compute $A^{H}\tilde{P}_k$ by the recurrence relation, $A^{H}\tilde{R}_k$ is computed by the explicit matrix multiplication. The matrix $AR_k$ is not required, owing to the following transformation.

\[ \tilde{R}_k^{H}AR_k = (A^{H}\tilde{R}_k)^{H}R_k, \quad (AR_k)^{H}\tilde{R}_k = R_k^{H}A^{H}\tilde{R}_k. \]

Fig. 3 shows the algorithm of the modified Block BiCR method.

$X_0 \in \mathbb{C}^{n\times L}$ is an initial guess,
Compute $R_0 = B - AX_0$,
Choose $\tilde{R}_0 \in \mathbb{C}^{n\times L}$,
Set $P_0 = R_0$, $\tilde{P}_0 = \tilde{R}_0$, $U_0 = AR_0$, $\tilde{U}_0 = \tilde{V}_0 = A^{H}\tilde{R}_0$,
For $k = 0, 1, \dots$, until $\|R_k\|_F/\|B\|_F \le \varepsilon$ do:
  Solve $(\tilde{U}_k^{H}U_k)\alpha_k = \tilde{V}_k^{H}R_k$ for $\alpha_k$,
  Solve $(U_k^{H}\tilde{U}_k)\tilde{\alpha}_k = R_k^{H}\tilde{V}_k$ for $\tilde{\alpha}_k$,
  $X_{k+1} = X_k + P_k\alpha_k$,
  $R_{k+1} = R_k - U_k\alpha_k$, $\tilde{R}_{k+1} = \tilde{R}_k - \tilde{U}_k\tilde{\alpha}_k$,
  $\tilde{V}_{k+1} = A^{H}\tilde{R}_{k+1}$,
  Solve $(\tilde{V}_k^{H}R_k)\beta_k = \tilde{V}_{k+1}^{H}R_{k+1}$ for $\beta_k$,
  Solve $(R_k^{H}\tilde{V}_k)\tilde{\beta}_k = R_{k+1}^{H}\tilde{V}_{k+1}$ for $\tilde{\beta}_k$,
  $P_{k+1} = R_{k+1} + P_k\beta_k$, $\tilde{P}_{k+1} = \tilde{R}_{k+1} + \tilde{P}_k\tilde{\beta}_k$,
  $\tilde{U}_{k+1} = \tilde{V}_{k+1} + \tilde{U}_k\tilde{\beta}_k$,
  $U_{k+1} = AP_{k+1}$,
End For

Fig. 3. Algorithm of the modified Block BiCR method.

$X_0 \in \mathbb{C}^{n\times L}$ is an initial guess,
Compute $Q_0\xi_0 = B - AX_0$,
Choose $\tilde{R}_0 \in \mathbb{C}^{n\times L}$ and compute $\tilde{Q}_0\tilde{\xi}_0 = \tilde{R}_0$,
Set $S_0 = Q_0$, $\tilde{S}_0 = \tilde{Q}_0$, $U_0 = AQ_0$, $\tilde{U}_0 = \tilde{V}_0 = A^{H}\tilde{Q}_0$,
For $k = 0, 1, \dots$, until $\|\xi_k\|_F/\|B\|_F \le \varepsilon$ do:
  Solve $(\tilde{U}_k^{H}U_k)\alpha'_k = \tilde{V}_k^{H}Q_k$ for $\alpha'_k$,
  Solve $(U_k^{H}\tilde{U}_k)\tilde{\alpha}'_k = Q_k^{H}\tilde{V}_k$ for $\tilde{\alpha}'_k$,
  $X_{k+1} = X_k + S_k\alpha'_k\xi_k$,
  $Q_{k+1}\tau_{k+1} = Q_k - U_k\alpha'_k$, $\tilde{Q}_{k+1}\tilde{\tau}_{k+1} = \tilde{Q}_k - \tilde{U}_k\tilde{\alpha}'_k$,
  $\xi_{k+1} = \tau_{k+1}\xi_k$,
  $\tilde{V}_{k+1} = A^{H}\tilde{Q}_{k+1}$,
  Solve $(\tilde{V}_k^{H}Q_k)\beta'_k = \tilde{\tau}_{k+1}^{H}\tilde{V}_{k+1}^{H}Q_{k+1}$ for $\beta'_k$,
  Solve $(Q_k^{H}\tilde{V}_k)\tilde{\beta}'_k = \tau_{k+1}^{H}Q_{k+1}^{H}\tilde{V}_{k+1}$ for $\tilde{\beta}'_k$,
  $S_{k+1} = Q_{k+1} + S_k\beta'_k$, $\tilde{S}_{k+1} = \tilde{Q}_{k+1} + \tilde{S}_k\tilde{\beta}'_k$,
  $\tilde{U}_{k+1} = \tilde{V}_{k+1} + \tilde{U}_k\tilde{\beta}'_k$,
  $U_{k+1} = AS_{k+1}$,
End For

Fig. 4. Algorithm of the modified Block BiCR method with residual orthonormalization.
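The iteration of the modified Block BiCR method (Fig. 3) can be sketched in NumPy as follows. This is a minimal sketch under stated assumptions, not the authors' code: the shadow residual is chosen as $\tilde{R}_0 = R_0$, the initial guess is zero, the small L × L systems are solved with `numpy.linalg.solve`, and no residual orthonormalization or breakdown handling is included. The function name, tolerance, and the well-conditioned test matrix are illustrative.

```python
import numpy as np

def modified_block_bicr(A, B, tol=1e-10, max_iter=200):
    """Sketch of the modified Block BiCR iteration; returns X and the recurrence residual R."""
    AH = A.conj().T
    X = np.zeros(B.shape, dtype=np.result_type(A, B))
    R = B - A @ X
    Rt = R.copy()                 # shadow residual, assumed choice R~_0 = R_0
    P, Pt = R.copy(), Rt.copy()
    U = A @ R                     # U_0 = A R_0
    Vt = AH @ Rt                  # V~_0 = A^H R~_0
    Ut = Vt.copy()                # U~_0 = V~_0
    normB = np.linalg.norm(B)
    for _ in range(max_iter):
        if np.linalg.norm(R) / normB <= tol:
            break
        G = Ut.conj().T @ U                                    # U~^H U
        alpha = np.linalg.solve(G, Vt.conj().T @ R)
        alphat = np.linalg.solve(G.conj().T, R.conj().T @ Vt)
        X = X + P @ alpha
        R_new = R - U @ alpha
        Rt_new = Rt - Ut @ alphat
        Vt_new = AH @ Rt_new                                   # explicit product with A^H
        beta = np.linalg.solve(Vt.conj().T @ R, Vt_new.conj().T @ R_new)
        betat = np.linalg.solve(R.conj().T @ Vt, R_new.conj().T @ Vt_new)
        P = R_new + P @ beta
        Pt = Rt_new + Pt @ betat
        Ut = Vt_new + Ut @ betat                               # A^H P~ by recurrence
        U = A @ P                                              # A P by explicit product
        R, Rt, Vt = R_new, Rt_new, Vt_new
    return X, R

rng = np.random.default_rng(2)
n, L = 200, 3
A = 4.0 * np.eye(n) + 0.5 * rng.standard_normal((n, n)) / np.sqrt(n)
B = rng.random((n, L))
X, R = modified_block_bicr(A, B)
true_rel = np.linalg.norm(B - A @ X) / np.linalg.norm(B)
print(f"true relative residual: {true_rel:.2e}")
```

The test matrix here is a mild perturbation of a scaled identity, chosen only so that the iteration converges in a few steps; it says nothing about behavior on the matrices of Table 2.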

5. Improvement of numerical stability

The residual norms of Block Krylov subspace methods may not converge due to numerical instability when the number L of right-hand sides is large. This numerical instability comes from the loss of linear independence among column vectors of n × L matrices which appear in the methods. In this section, numerical stability of the modified Block BiCR method is improved by the residual orthonormalization. This approach is also used in [8].

The residual $R_k$ and the matrix $\tilde{R}_k$ are factored as $R_k = Q_k\xi_k$ and $\tilde{R}_k = \tilde{Q}_k\tilde{\xi}_k$ by the QR factorization, respectively. The matrices $Q_k$ and $\tilde{Q}_k$ satisfy $Q_k^{H}Q_k = I_L$ and $\tilde{Q}_k^{H}\tilde{Q}_k = I_L$, respectively. Here, $I_L$ denotes the identity matrix of order L, and $\xi_k, \tilde{\xi}_k \in \mathbb{C}^{L\times L}$. From (2) and (4), the following equations can be obtained.

\[ Q_{k+1}\tau_{k+1} = Q_k - AS_k\alpha'_k, \]
\[ \tilde{Q}_{k+1}\tilde{\tau}_{k+1} = \tilde{Q}_k - A^{H}\tilde{S}_k\tilde{\alpha}'_k. \]

Here, $\tau_{k+1} \equiv \xi_{k+1}\xi_k^{-1}$, $\tilde{\tau}_{k+1} \equiv \tilde{\xi}_{k+1}\tilde{\xi}_k^{-1}$, $\alpha'_k \equiv \xi_k\alpha_k\xi_k^{-1}$, $\tilde{\alpha}'_k \equiv \tilde{\xi}_k\tilde{\alpha}_k\tilde{\xi}_k^{-1}$, $S_k \equiv P_k\xi_k^{-1}$, and $\tilde{S}_k \equiv \tilde{P}_k\tilde{\xi}_k^{-1}$.

The algorithm of the modified Block BiCR method with residual orthonormalization is shown in Fig. 4. The matrices $\beta'_k$ and $\tilde{\beta}'_k$ in this algorithm are defined as $\beta'_k \equiv \xi_k\beta_k\xi_{k+1}^{-1}$ and $\tilde{\beta}'_k \equiv \tilde{\xi}_k\tilde{\beta}_k\tilde{\xi}_{k+1}^{-1}$, respectively. Since the Frobenius norm of $R_k$ satisfies $\|R_k\|_F = \|\xi_k\|_F$, the residual norm is monitored by $\|\xi_k\|_F$ instead of $\|R_k\|_F$.

Table 2. Test matrices.

Name          Size n   NNZ       Structure
Si5H12        19,896   738,598   Real symmetric
pde2961        2,961    14,585   Real non-symmetric
poisson3Da    13,514   352,762   Real non-symmetric
waveguide3D   21,036   303,468   Complex non-Hermitian
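The factorization $R_k = Q_k\xi_k$ and the identity $\|R_k\|_F = \|\xi_k\|_F$ used for monitoring convergence can be checked directly. The sketch below uses `numpy.linalg.qr` in its default reduced mode on a random complex matrix standing in for $R_k$; the sizes and variable names are illustrative assumptions.

```python
import numpy as np

rng = np.random.default_rng(3)
n, L = 300, 4
R = rng.standard_normal((n, L)) + 1j * rng.standard_normal((n, L))  # stand-in for R_k

Q, xi = np.linalg.qr(R)   # reduced QR: Q is n-by-L, xi is L-by-L upper triangular

orth_err = np.linalg.norm(Q.conj().T @ Q - np.eye(L))  # deviation of Q^H Q from I_L
norm_R = np.linalg.norm(R)                             # Frobenius norm of R_k
norm_xi = np.linalg.norm(xi)                           # Frobenius norm of xi_k
print(f"||Q^H Q - I|| = {orth_err:.1e}, ||R||_F = {norm_R:.6f}, ||xi||_F = {norm_xi:.6f}")
```

Because $Q_k$ has orthonormal columns, multiplying by it preserves the Frobenius norm, so only the small L × L factor needs to be inspected in the stopping test.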

6. Numerical experiments

In this section we evaluate the performance of the Block BiCG method, the Block BiCR method and the modified Block BiCR method through some numerical experiments. In the numerical experiments, the methods with residual orthonormalization are used to improve numerical stability. In the rest of this paper, we call these methods “Block BiCGrQ”, “Block BiCRrQ”, and “modified Block BiCRrQ”, respectively.

Test matrices used in the numerical experiments are Si5H12, pde2961, poisson3Da, and waveguide3D [9]. The size n, the number of nonzero elements (NNZ), and the structure of the test matrices are shown in Table 2. All experiments are carried out in double precision arithmetic on CPU: Intel Xeon X5650 2.67GHz, Memory: 24GiB DDR3 1333MHz, Software: MATLAB R2014a.

The initial solution $X_0$ and the matrix $\tilde{R}_0$ are set as $X_0 = O$ and $\tilde{R}_0 = R_0$, respectively. The right-hand side B is given by the MATLAB function rand. The iteration is stopped if the condition $\|R_k\|_F/\|B\|_F \le 10^{-14}$ is satisfied.

Table 3 shows the results of the Block BiCGrQ method, the Block BiCRrQ method, and the modified Block BiCRrQ method. “#Iter.”, “Res.” and “True Res.” denote the number of iterations, the relative residual norm $\|R_k\|_F/\|B\|_F$ and the true relative residual norm $\|B - AX_k\|_F/\|B\|_F$, respectively.

From Table 3, we can see that the number of iterations and the computational time of the three methods are almost the same. However, the true residual norms of the modified Block BiCRrQ method are smaller than those of the Block BiCRrQ method when the number L of right-hand sides is large. Hence, the modified Block BiCRrQ method can generate more accurate approximate solutions than the Block BiCRrQ method.

Fig. 5 shows the true relative residual histories for Si5H12 in the case of L = 32. The true relative residual norm of the Block BiCR method stagnates around $3.0 \times 10^{-11}$. On the other hand, that of the modified Block BiCR method decreases to $4.3 \times 10^{-13}$.

Fig. 5. True relative residual histories for Si5H12 (L = 32), plotted for Block BiCGrQ, Block BiCRrQ, and modified Block BiCRrQ.

7. Conclusion

In this paper, we have proposed the modified Block BiCR method in order to improve the accuracy of the approximate solutions of the Block BiCR method [6]. Through some numerical experiments, we verified that the proposed method can generate more accurate approximate solutions than the Block BiCR method.

Acknowledgments

This work was supported by Strategic Programs for Innovative Research (SPIRE) Field 5 “The origin of matter and the universe”, MEXT KAKENHI (No. 22104003), JSPS KAKENHI (Nos. 25870099, 25286097).

References

[1] PACS-CS Collaboration, S. Aoki et al., 2+1 flavor lattice QCD toward the physical point, arXiv:0807.1661v1 [hep-lat], 2008.

[2] T. Ikegami, T. Sakurai and U. Nagashima, A filter diagonalization for generalized eigenvalue problems based on the Sakurai-Sugiura projection method, J. Comput. Appl. Math., 233 (2010), 1927–1936.

[3] D. P. O’Leary, The block conjugate gradient algorithm and related methods, Linear Algebra Appl., 29 (1980), 293–322.

[4] A. El Guennouni, K. Jbilou and H. Sadok, A block version of BiCGSTAB for linear systems with multiple right-hand sides, Electron. Trans. Numer. Anal., 16 (2010), 129–142.

[5] B. Vital, Etude de quelques methodes de resolution de problemes lineaires de grande taille sur multiprocesseur, Ph.D. Thesis, Universite de Rennes I, Rennes, 1990.

[6] J. Zhang and J. Zhao, A novel class of block methods based on the block AA^T-Lanczos bi-orthogonalization process for matrix equations, Int. J. Comput. Math., 90 (2013), 341–359.

[7] T. Sogabe, M. Sugihara and S.-L. Zhang, An extension of the conjugate residual method to nonsymmetric linear systems, J. Comput. Appl. Math., 226 (2009), 103–113.

[8] A. A. Dubrulle, Retooling the method of block conjugate gradients, Electron. Trans. Numer. Anal., 12 (2001), 216–233.

[9] The University of Florida Sparse Matrix Collection, http://www.cise.ufl.edu/research/sparse/matrices/.

Table 3. Results of the Block BiCGrQ method, the Block BiCRrQ method, and the modified Block BiCRrQ method.

Si5H12
Method                  L    #Iter.   Time/L [s]   Res.         True Res.
Block BiCGrQ            1    653      2.06         9.2×10⁻¹⁵    3.0×10⁻¹²
Block BiCGrQ            8    244      0.77         3.0×10⁻¹⁵    9.5×10⁻¹³
Block BiCGrQ            32   144      0.52         7.6×10⁻¹⁵    2.9×10⁻¹²
Block BiCRrQ            1    663      1.98         9.4×10⁻¹⁵    2.6×10⁻¹²
Block BiCRrQ            8    232      0.76         9.0×10⁻¹⁵    4.9×10⁻¹²
Block BiCRrQ            32   134      0.49         6.1×10⁻¹⁵    3.0×10⁻¹¹
modified Block BiCRrQ   1    663      1.96         9.6×10⁻¹⁵    9.5×10⁻¹³
modified Block BiCRrQ   8    222      0.69         9.2×10⁻¹⁵    5.5×10⁻¹³
modified Block BiCRrQ   32   140      0.49         5.3×10⁻¹⁵    4.3×10⁻¹³

pde2961
Method                  L    #Iter.   Time/L [s]   Res.         True Res.
Block BiCGrQ            1    363      0.20         8.2×10⁻¹⁵    7.6×10⁻¹³
Block BiCGrQ            8    250      0.09         9.3×10⁻¹⁵    3.0×10⁻¹¹
Block BiCGrQ            32   149      0.06         7.7×10⁻¹⁵    1.7×10⁻¹⁰
Block BiCRrQ            1    373      0.19         9.2×10⁻¹⁵    4.6×10⁻¹³
Block BiCRrQ            8    279      0.10         6.3×10⁻¹⁵    2.6×10⁻⁹
Block BiCRrQ            32   150      0.06         3.9×10⁻¹⁵    4.2×10⁻⁹
modified Block BiCRrQ   1    367      0.18         5.0×10⁻¹⁵    4.3×10⁻¹²
modified Block BiCRrQ   8    299      0.09         7.8×10⁻¹⁵    5.9×10⁻¹¹
modified Block BiCRrQ   32   145      0.06         4.0×10⁻¹⁵    9.0×10⁻¹¹

poisson3Da
Method                  L    #Iter.   Time/L [s]   Res.         True Res.
Block BiCGrQ            1    256      0.49         3.8×10⁻¹⁵    2.4×10⁻¹³
Block BiCGrQ            8    159      0.35         9.1×10⁻¹⁵    2.6×10⁻¹²
Block BiCGrQ            32   97       0.23         6.8×10⁻¹⁵    1.5×10⁻¹³
Block BiCRrQ            1    251      0.48         6.7×10⁻¹⁵    2.5×10⁻¹³
Block BiCRrQ            8    169      0.38         8.0×10⁻¹⁵    3.4×10⁻¹¹
Block BiCRrQ            32   94       0.24         8.4×10⁻¹⁵    2.7×10⁻¹³
modified Block BiCRrQ   1    251      0.50         6.7×10⁻¹⁵    2.5×10⁻¹³
modified Block BiCRrQ   8    169      0.37         9.8×10⁻¹⁵    2.1×10⁻¹³
modified Block BiCRrQ   32   93       0.23         7.4×10⁻¹⁵    1.4×10⁻¹³

waveguide3D
Method                  L    #Iter.   Time/L [s]   Res.         True Res.
Block BiCGrQ            1    6,419    27.30        7.5×10⁻¹⁵    1.2×10⁻¹²
Block BiCGrQ            8    3,066    14.78        6.6×10⁻¹⁵    1.5×10⁻¹²
Block BiCGrQ            32   1,660    13.36        9.7×10⁻¹⁵    1.0×10⁻¹¹
Block BiCRrQ            1    6,240    27.61        9.6×10⁻¹⁵    1.3×10⁻¹²
Block BiCRrQ            8    3,127    15.82        9.1×10⁻¹⁵    1.2×10⁻⁹
Block BiCRrQ            32   1,725    14.43        8.8×10⁻¹⁵    3.4×10⁻⁸
modified Block BiCRrQ   1    6,263    27.58        1.0×10⁻¹⁴    1.2×10⁻¹²
modified Block BiCRrQ   8    3,036    15.89        9.2×10⁻¹⁵    4.6×10⁻¹²
modified Block BiCRrQ   32   1,769    14.75        8.3×10⁻¹⁵    4.4×10⁻¹¹


JSIAM Letters Vol.6 (2014) pp.65–68 ©2014 Japan Society for Industrial and Applied Mathematics

Development of the Block BiCGSTAB(ℓ) method for solving linear systems with multiple right hand sides

Shusaku Saito1, Hiroto Tadano1 and Akira Imakura1

1 University of Tsukuba, 1-1-1 Tennodai, Tsukuba, Ibaraki 305-8573, Japan

E-mail shusaku hpcs.cs.tsukuba.ac.jp

Received May 30, 2014, Accepted August 3, 2014

Abstract

In this paper, we derive the Block BiCGSTAB(ℓ) method, which is developed by extending the BiCGSTAB(ℓ) method. We also propose some techniques to improve convergence properties by applying orthogonalization and to improve the accuracy of the approximate solutions by additional matrix multiplications. Some numerical experiments indicate that the performance of the Block BiCGSTAB(ℓ) method with those stabilization techniques can be higher than that of the Block BiCGSTAB method.

Keywords Block Krylov subspace methods, BiCGSTAB(ℓ), linear systems with multiple right hand sides

Research Activity Group Algorithms for Matrix / Eigenvalue Problems and their Applications

1. Introduction

In this paper, we consider linear systems with multiple right hand sides

\[ AX = B, \quad A \in \mathbb{R}^{n\times n}, \quad X, B \in \mathbb{R}^{n\times s}, \tag{1} \]

where A is typically sparse and s is the number of right hand sides. To solve (1), the Block Krylov subspace methods often show better convergence behavior than the Krylov subspace methods. Well-known examples are the Block BiCG method [1] and the Block BiCGSTAB method [2]. One of the Krylov subspace methods is the BiCGSTAB(ℓ) method [3], which often shows better convergence behavior than the BiCGSTAB method. In this paper, we develop the block version of the BiCGSTAB(ℓ) method, named “Block BiCGSTAB(ℓ)”. We also propose some techniques which improve the convergence properties and the accuracy of the approximate solution.

This paper is organized as follows. In Section 2, we develop the Block BiCGSTAB(ℓ) method. In Section 3, we propose some techniques which improve convergence properties by applying orthogonalization. In Section 4, we propose a method which raises the accuracy of the approximate solution by multiplying the coefficient matrix explicitly. In Section 5, we show some numerical results and verify the effectiveness of these stabilization methods. Finally, in Section 6, we conclude this paper.

2. The Block BiCGSTAB(ℓ) Method

The Block BiCGSTAB(ℓ) method can be developed by naturally extending the BiCGSTAB(ℓ) method. The residual matrix is defined as

\[ R_{m\ell} := B - AX_{m\ell} = S_{m\ell}(A)\,R^{(\mathrm{BiCG})}_{m\ell}, \]

where $R^{(\mathrm{BiCG})}_{m\ell}$ is the residual matrix of the Block BiCG method. $S_{m\ell}(\lambda)$ is a stabilization polynomial and is defined as $S_{(m+1)\ell}(\lambda) := \left(1 - \sum_{i=1}^{\ell} \omega_{m,i}\lambda^{i}\right) S_{m\ell}(\lambda)$, just as in the BiCGSTAB(ℓ) method. Parameters $\omega_{m,i}$ are determined to minimize $\|R_{(m+1)\ell}\|_F$. Here, $\|\cdot\|_F$ denotes the Frobenius norm of a matrix.

Let $R^{(m)}_{m\ell+j,i} := A^{i}S_{m\ell}(A)R^{(\mathrm{BiCG})}_{m\ell+j}$ and $P^{(m)}_{m\ell+j,i} := A^{i}S_{m\ell}(A)P^{(\mathrm{BiCG})}_{m\ell+j}$, where $P^{(\mathrm{BiCG})}_{m\ell+j}$ is the direction matrix of the Block BiCG method. Then, the residual matrix can be written as $R^{(m)}_{m\ell,0} = S_{m\ell}(A)R^{(\mathrm{BiCG})}_{m\ell}$.

For minimizing $\|R_{(m+1)\ell}\|_F$, the matrices $R^{(m)}_{(m+1)\ell,i}$ $(i = 0, 1, \dots, \ell)$ are required, and can be computed by

\[ R^{(m)}_{m\ell+j+1,i} = R^{(m)}_{m\ell+j,i} - P^{(m)}_{m\ell+j,i+1}\alpha_{m\ell+j}, \tag{2} \]
\[ P^{(m)}_{m\ell+j,i} = R^{(m)}_{m\ell+j,i} + P^{(m)}_{m\ell+j-1,i}\beta_{m\ell+j-1}, \tag{3} \]
\[ R^{(m)}_{m\ell+j+1,j+1} = AR^{(m)}_{m\ell+j+1,j}, \]
\[ P^{(m)}_{m\ell+j,j+1} = AP^{(m)}_{m\ell+j,j}, \]

where $i < j \le \ell - 1$. Here, the approximate solution is

\[ X_{m\ell+j+1} = X_{m\ell+j} + P^{(m)}_{m\ell+j,0}\alpha_{m\ell+j}. \tag{4} \]

The matrices $\alpha_{m\ell+j}, \beta_{m\ell+j} \in \mathbb{R}^{s\times s}$ are determined to satisfy the bi-orthogonal conditions, such that

\[ O_s = R_0^{*T}A^{i}R^{(\mathrm{BiCG})}_{k}, \quad O_s = R_0^{*T}A^{i+1}P^{(\mathrm{BiCG})}_{k}, \tag{5} \]

where $i = 0, 1, \dots, k-1$, $O_s$ is the s × s zero matrix, and $R_0^{*}$ is the n × s initial shadow residual matrix. Here, let $\gamma^{(m)}_{j} := R_0^{*T}R^{(m)}_{m\ell+j,j}$. From the conditions (5), $\alpha_{m\ell+j}$ and $\beta_{m\ell+j-1}$ can be obtained as

\[ \alpha_{m\ell+j} = \left(R_0^{*T}P^{(m)}_{m\ell+j,j+1}\right)^{-1}\gamma^{(m)}_{j}, \tag{6} \]

\[ \beta_{m\ell-1} = \frac{\alpha_{m\ell-1}}{\omega_{m-1,\ell}}\left(\gamma^{(m-1)}_{\ell-1}\right)^{-1}\gamma^{(m)}_{0} \quad (j = 0), \qquad \beta_{m\ell+j-1} = -\alpha_{m\ell+j-1}\left(\gamma^{(m)}_{j-1}\right)^{-1}\gamma^{(m)}_{j} \quad (j \ne 0). \tag{7} \]

This part is called “BiCG Part”. After the calculation of BiCG Part, $\omega_{m,i}$ $(i = 1, 2, \dots, \ell)$ are calculated to minimize $\|R^{(m+1)}_{(m+1)\ell,0}\|_F$. This part is called “MR Part”. Then, the residual matrix, the direction matrix and the approximate solution are updated as follows:

\[ R^{(m+1)}_{(m+1)\ell,0} = R^{(m)}_{(m+1)\ell,0} - \sum_{i=1}^{\ell}\omega_{m,i}R^{(m)}_{(m+1)\ell,i}, \tag{8} \]

\[ P^{(m+1)}_{(m+1)\ell-1,0} = P^{(m)}_{(m+1)\ell-1,0} - \sum_{i=1}^{\ell}\omega_{m,i}P^{(m)}_{(m+1)\ell-1,i}, \tag{9} \]

\[ X_{(m+1)\ell} = X_{(m+1)\ell} + \sum_{i=1}^{\ell}\omega_{m,i}R^{(m)}_{(m+1)\ell,i-1}. \tag{10} \]

One cycle consisting of the BiCG part, the MR part and the updates is called an “outer iteration”. In each outer iteration, the approximate solution and the corresponding residual are updated. The number of matrix multiplications per outer iteration is 2ℓ. According to the above, the method named “Block BiCGSTAB(ℓ)” is obtained. The Block BiCGSTAB(ℓ) algorithm is displayed in Algorithm 1.

Algorithm 1 Block BiCGSTAB(ℓ)

$X_0$ is an initial guess and $R_0^{*}$ is an arbitrary matrix,
$R_0 = B - AX_0$, $R^{(0)}_{0,0} = R_0$, $P^{(0)}_{-1,0} = O$, $\alpha_{-1} = O_s$, $\omega_{-1,\ell} = 1$,
for $m = 0, 1, \dots$ until $\|R^{(m)}_{m\ell,0}\|_F/\|B\|_F$ is small enough do
  for $j = 0, 1, \dots, \ell - 1$ do
    $\gamma^{(m)}_{j} = R_0^{*T}R^{(m)}_{m\ell+j,j}$,
    $\beta_{m\ell-1} = \frac{\alpha_{m\ell-1}}{\omega_{m-1,\ell}}\left(\gamma^{(m-1)}_{\ell-1}\right)^{-1}\gamma^{(m)}_{0}$, $(j = 0)$,
    $\beta_{m\ell+j-1} = -\alpha_{m\ell+j-1}\left(\gamma^{(m)}_{j-1}\right)^{-1}\gamma^{(m)}_{j}$, $(j \ne 0)$,
    for $i = 0, 1, \dots, j$ do
      $P^{(m)}_{m\ell+j,i} = R^{(m)}_{m\ell+j,i} + P^{(m)}_{m\ell+j-1,i}\beta_{m\ell+j-1}$,
    end for
    $P^{(m)}_{m\ell+j,j+1} = AP^{(m)}_{m\ell+j,j}$,
    $\alpha_{m\ell+j} = \left(R_0^{*T}P^{(m)}_{m\ell+j,j+1}\right)^{-1}\gamma^{(m)}_{j}$,
    for $i = 0, 1, \dots, j$ do
      $R^{(m)}_{m\ell+j+1,i} = R^{(m)}_{m\ell+j,i} - P^{(m)}_{m\ell+j,i+1}\alpha_{m\ell+j}$,
    end for
    $R^{(m)}_{m\ell+j+1,j+1} = AR^{(m)}_{m\ell+j+1,j}$,
    $X_{m\ell+j+1} = X_{m\ell+j} + P^{(m)}_{m\ell+j,0}\alpha_{m\ell+j}$,
  end for
  calculate $\omega_{m,1}, \omega_{m,2}, \dots, \omega_{m,\ell}$,
  $R^{(m+1)}_{(m+1)\ell,0} = R^{(m)}_{(m+1)\ell,0} - \sum_{i=1}^{\ell}\omega_{m,i}R^{(m)}_{(m+1)\ell,i}$,
  $P^{(m+1)}_{(m+1)\ell-1,0} = P^{(m)}_{(m+1)\ell-1,0} - \sum_{i=1}^{\ell}\omega_{m,i}P^{(m)}_{(m+1)\ell-1,i}$,
  $X_{(m+1)\ell} = X_{(m+1)\ell} + \sum_{i=1}^{\ell}\omega_{m,i}R^{(m)}_{(m+1)\ell,i-1}$,
end for
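The MR Part amounts to a small least-squares problem: find scalars $\omega_{m,1}, \dots, \omega_{m,\ell}$ that minimize the Frobenius norm of the updated residual in (8). The paper does not state how this minimization is carried out, so the following is only one possible sketch: it vectorizes the matrices and calls `numpy.linalg.lstsq`. The helper name `mr_coefficients` and the test data, in which $R_i = A^{i}R_0$ stands in for $R^{(m)}_{(m+1)\ell,i}$, are illustrative assumptions.

```python
import numpy as np

def mr_coefficients(R_list):
    """Given [R_0, R_1, ..., R_ell], return omega minimizing ||R_0 - sum_i omega_i R_i||_F."""
    r0 = R_list[0].ravel()
    M = np.column_stack([Ri.ravel() for Ri in R_list[1:]])  # one column per vectorized R_i
    omega, *_ = np.linalg.lstsq(M, r0, rcond=None)
    return omega

rng = np.random.default_rng(4)
n, s, ell = 50, 3, 2
A = 3.0 * np.eye(n) + 0.3 * rng.standard_normal((n, n)) / np.sqrt(n)
R0 = rng.standard_normal((n, s))

R_list = [R0]
for _ in range(ell):
    R_list.append(A @ R_list[-1])          # R_i = A^i R_0

omega = mr_coefficients(R_list)
R_new = R_list[0] - sum(w * Ri for w, Ri in zip(omega, R_list[1:]))  # update as in (8)
print(f"||R_0||_F = {np.linalg.norm(R0):.3e}, ||R_new||_F = {np.linalg.norm(R_new):.3e}")
```

At the minimizer, the new residual is orthogonal to every $R_i$ $(i \ge 1)$ in the Frobenius inner product, which gives a simple check of the computed coefficients.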

3. Improvement of the convergence properties

The residual norms of the Block Krylov subspace methods often diverge or stagnate due to the influence of round-off error. In this section, we propose some stable methods which prevent the accumulation of round-off error.

The small matrices $\alpha_{m\ell+j}$ and $\beta_{m\ell+j-1}$ are computed by solving small-scale linear systems. These linear systems are formed from the residual matrix or the direction matrix. When the linear independence of the columns of these matrices is lost numerically during the iteration, the linear systems become ill-conditioned. Some strategies which improve the convergence properties and the accuracy of the BiCGSTAB(ℓ) method have been proposed, for example in [4]. It has also been reported that orthogonalization of the residual matrix or the direction matrix can improve the convergence properties of the Block CG method [5]. In this paper, we try to apply orthogonalization to the Block BiCGSTAB(ℓ) method by using the ideas in [5].

3.1 Orthogonalization of $P^{(m)}_{m\ell+j}$

Let $V^{(m)}_{m\ell+j,0}$ be an n × s matrix with orthonormal columns generated by applying the QR decomposition to $P^{(m)}_{m\ell+j,0}$ as follows:

\[ V^{(m)}_{m\ell+j,0}\xi_{m\ell+j} = P^{(m)}_{m\ell+j,0}, \]

where $\xi_{m\ell+j}$ is an s × s upper triangular matrix. Then, from $P^{(m)}_{m\ell+j,i} = A^{i}P^{(m)}_{m\ell+j,0}$, we have the following formula

\[ A^{i}V^{(m)}_{m\ell+j,0} = P^{(m)}_{m\ell+j,i}\xi^{-1}_{m\ell+j}. \]

Here, $V^{(m)}_{m\ell+j,i}$ and $\tilde{\beta}_{m\ell+j}$ are defined by $V^{(m)}_{m\ell+j,i} := A^{i}V^{(m)}_{m\ell+j,0}$ $(i = 1, 2, \dots, j)$, and $\tilde{\beta}_{m\ell+j} := \xi_{m\ell+j}\beta_{m\ell+j}$, respectively. Then, (3) can be rewritten as

\[ V^{(m)}_{m\ell+j,i} = \left(R^{(m)}_{m\ell+j,i} + V^{(m)}_{m\ell+j-1,i}\tilde{\beta}_{m\ell+j-1}\right)\xi^{-1}_{m\ell+j}. \tag{11} \]

Here, $\tilde{\alpha}_{m\ell+j}$ is defined by $\tilde{\alpha}_{m\ell+j} := \xi_{m\ell+j}\alpha_{m\ell+j}$. Then, $R^{(m)}_{m\ell+j+1,i}$ can be rewritten as

\[ R^{(m)}_{m\ell+j+1,i} = R^{(m)}_{m\ell+j,i} - V^{(m)}_{m\ell+j,i+1}\tilde{\alpha}_{m\ell+j}. \tag{12} \]

By using (6) and (7), the matrices $\tilde{\alpha}_{m\ell+j}$ and $\tilde{\beta}_{m\ell+j-1}$ are written as

\[ \tilde{\alpha}_{m\ell+j} = \left(R_0^{*T}V^{(m)}_{m\ell+j,j+1}\right)^{-1}\gamma^{(m)}_{j}, \]

\[ \tilde{\beta}_{m\ell-1} = \frac{\tilde{\alpha}_{m\ell-1}}{\omega_{m-1,\ell}}\left(\gamma^{(m-1)}_{\ell-1}\right)^{-1}\gamma^{(m)}_{0} \quad (j = 0), \qquad \tilde{\beta}_{m\ell+j-1} = -\tilde{\alpha}_{m\ell+j-1}\left(\gamma^{(m)}_{j-1}\right)^{-1}\gamma^{(m)}_{j} \quad (j \ne 0). \]

From (4), $X_{m\ell+j+1}$ is written as

\[ X_{m\ell+j+1} = X_{m\ell+j} + V^{(m)}_{m\ell+j,0}\tilde{\alpha}_{m\ell+j}. \]

In addition, (9) is rewritten as

\[ P^{(m+1)}_{(m+1)\ell-1,0}\xi^{-1}_{(m+1)\ell-1} = V^{(m)}_{(m+1)\ell-1,0} - \sum_{i=1}^{\ell}\omega_{m,i}V^{(m)}_{(m+1)\ell-1,i}. \tag{13} \]

From (13), $P^{(m+1)}_{(m+1)\ell-1,0}\xi^{-1}_{(m+1)\ell-1}$ can be obtained when the mth iteration is finished. The first computation of $P^{(m+1)}_{(m+1)\ell,0}$ at the (m+1)th iteration can be written as

\[ V^{(m+1)}_{(m+1)\ell,0}\xi_{(m+1)\ell} = R^{(m+1)}_{(m+1)\ell,0} + P^{(m+1)}_{(m+1)\ell-1,0}\xi^{-1}_{(m+1)\ell-1}\tilde{\beta}_{(m+1)\ell-1}. \]

Thus, (13) is utilized as it is. This improved algorithm is named “Block BiCGSTAB(ℓ)-P”.
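The change of variables in this subsection can be verified on random data. The sketch below is illustrative only: `P0`, `P1`, and `alpha` are random stand-ins for $P^{(m)}_{m\ell+j,0}$, $P^{(m)}_{m\ell+j,1}$, and $\alpha_{m\ell+j}$, and `numpy.linalg.qr` is assumed as the QR routine. It checks that $AV_{\cdot,0} = P_{\cdot,1}\xi^{-1}$ and that the solution update is unchanged when $P\alpha$ is replaced by $V\tilde{\alpha}$ with $\tilde{\alpha} = \xi\alpha$.

```python
import numpy as np

rng = np.random.default_rng(5)
n, s = 100, 3
A = rng.standard_normal((n, n)) / np.sqrt(n)
P0 = rng.standard_normal((n, s))       # stand-in for P_{ml+j,0}
P1 = A @ P0                            # P_{ml+j,1} = A P_{ml+j,0}
alpha = rng.standard_normal((s, s))    # stand-in for alpha_{ml+j}

V0, xi = np.linalg.qr(P0)              # V0 xi = P0, V0 has orthonormal columns
V1 = A @ V0                            # V_{ml+j,1} = A V_{ml+j,0}
alpha_t = xi @ alpha                   # alpha~ = xi alpha

err_power = np.linalg.norm(V1 @ xi - P1)               # checks A V0 = P1 xi^{-1}
err_update = np.linalg.norm(P0 @ alpha - V0 @ alpha_t)  # checks P alpha = V alpha~
print(f"err_power = {err_power:.1e}, err_update = {err_update:.1e}")
```

The check multiplies by ξ rather than forming $\xi^{-1}$ explicitly, which avoids an unnecessary inversion in the test itself.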

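3.2 Orthogonalization of $R^{(m)}_{m\ell+j}$

Let $V^{(m)}_{m\ell+j,0}$ be an orthogonal matrix generated by applying the QR decomposition to $R^{(m)}_{m\ell+j,0}$ as follows:

$$V^{(m)}_{m\ell+j+1,0}\,\xi_{m\ell+j+1} = R^{(m)}_{m\ell+j+1,0}.$$

Let $V^{(m)}_{m\ell+j,i} := A^{i}V^{(m)}_{m\ell+j,0}$ $(i = 1, 2, \ldots, j)$, $W^{(m)}_{m\ell+j,i} := P^{(m)}_{m\ell+j,i}\,\xi^{-1}_{m\ell+j}$ and $\tilde{\beta}_{m\ell+j-1} := \xi_{m\ell+j-1}\beta_{m\ell+j-1}\xi^{-1}_{m\ell+j}$; then (3) can be rewritten as

$$W^{(m)}_{m\ell+j,i} = V^{(m)}_{m\ell+j,i} + W^{(m)}_{m\ell+j-1,i}\,\tilde{\beta}_{m\ell+j-1}.$$

Also, let $\tilde{\alpha}_{m\ell+j} := \xi_{m\ell+j}\alpha_{m\ell+j}\xi^{-1}_{m\ell+j}$ and $\tau_{m\ell+j+1} := \xi_{m\ell+j+1}\xi^{-1}_{m\ell+j}$; then (2) can be rewritten as

$$V^{(m)}_{m\ell+j+1,i}\,\tau_{m\ell+j+1} = V^{(m)}_{m\ell+j,i} - W^{(m)}_{m\ell+j,i+1}\,\tilde{\alpha}_{m\ell+j}. \quad (14)$$

Let $\gamma^{(m)}_{j} := R_0^{*\mathrm{T}} V^{(m)}_{m\ell+j,j}$; then $\tilde{\alpha}_{m\ell+j}$ and $\tilde{\beta}_{m\ell+j-1}$ can be computed as follows:

$$\tilde{\alpha}_{m\ell+j} = \left(R_0^{*\mathrm{T}} W^{(m)}_{m\ell+j,j+1}\right)^{-1}\gamma^{(m)}_{j},$$

$$\tilde{\beta}_{m\ell-1} = \frac{\tilde{\alpha}_{m\ell-1}}{\omega_{m-1,\ell}}\left(\gamma^{(m-1)}_{\ell-1}\right)^{-1}\gamma^{(m)}_{0}, \quad (j = 0),$$

$$\tilde{\beta}_{m\ell+j-1} = -\tilde{\alpha}_{m\ell+j-1}\left(\gamma^{(m)}_{j-1}\right)^{-1}\gamma^{(m)}_{j}, \quad (j \neq 0).$$

Eq. (4) can be written as

$$X_{m\ell+j+1} = X_{m\ell+j} + W^{(m)}_{m\ell+j,0}\,\tilde{\alpha}_{m\ell+j}\,\xi_{m\ell+j}.$$

Now, when the $m$th iteration is finished, (8) and (9) can be rewritten as

$$R^{(m+1)}_{(m+1)\ell,0}\,\xi^{-1}_{(m+1)\ell} = V^{(m)}_{(m+1)\ell,0} - \sum_{i=1}^{\ell}\omega_{m,i}V^{(m)}_{(m+1)\ell,i}, \quad (15)$$

$$W^{(m+1)}_{(m+1)\ell-1,0} = W^{(m)}_{(m+1)\ell-1,0} - \sum_{i=1}^{\ell}\omega_{m,i}W^{(m)}_{(m+1)\ell-1,i}. \quad (16)$$

The first computation of $R^{(m+1)}_{(m+1)\ell+1,0}$ and $P^{(m+1)}_{(m+1)\ell,0}$ at the $(m+1)$th iteration can be written as

$$V^{(m+1)}_{(m+1)\ell+1,0}\,\tau_{(m+1)\ell+1} = R^{(m+1)}_{(m+1)\ell,0}\,\xi^{-1}_{(m+1)\ell} - W^{(m+1)}_{(m+1)\ell,1}\,\tilde{\alpha}_{(m+1)\ell}, \quad (17)$$

$$W^{(m+1)}_{(m+1)\ell,0} = R^{(m+1)}_{(m+1)\ell,0}\,\xi^{-1}_{(m+1)\ell} + W^{(m+1)}_{(m+1)\ell-1,0}\,\tilde{\beta}_{(m+1)\ell-1}.$$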

Thus, (15) and (16) are utilized as they are. This improved algorithm is named “Block BiCGSTAB(ℓ)-R”.

4. Improvement of the accuracy

The true residual matrix $B - AX_{m\ell}$ is mathematically equal to $R^{(m)}_{m\ell,0}$. However, in actual computation, this relation is liable to be lost under the influence of round-off error. In [6], it was proposed that modifying the relation between the variation of the approximate solution and that of the residual matrix improves the accuracy of the approximate solution. We try to apply this technique to the Block BiCGSTAB(ℓ)-P method and the Block BiCGSTAB(ℓ)-R method.

Let $\Delta X^{(\mathrm{BiCG})}_{m\ell}$ and $\Delta R^{(\mathrm{BiCG})}_{m\ell}$ be the variations of the approximate solution and the residual matrix in the BiCG part, and let $\Delta X^{(\mathrm{MR})}_{m\ell}$ and $\Delta R^{(\mathrm{MR})}_{m\ell}$ be those in the MR part. Then $X_{(m+1)\ell}$ and $R^{(m+1)}_{(m+1)\ell,0}$ can be written as follows:

$$X_{(m+1)\ell} = X_{m\ell} + \Delta X^{(\mathrm{BiCG})}_{m\ell} + \Delta X^{(\mathrm{MR})}_{m\ell}, \quad (18)$$

$$R^{(m+1)}_{(m+1)\ell,0} = R^{(m)}_{m\ell,0} + \Delta R^{(\mathrm{BiCG})}_{m\ell} + \Delta R^{(\mathrm{MR})}_{m\ell}. \quad (19)$$

Then, the following equation is mathematically true:

$$B - AX_{(m+1)\ell} = R^{(m+1)}_{(m+1)\ell,0}.$$

From (18), (19) and $B - AX_{m\ell} = R^{(m)}_{m\ell,0}$, the following equations are obtained:

$$\Delta R^{(\mathrm{BiCG})}_{m\ell} = -A\,\Delta X^{(\mathrm{BiCG})}_{m\ell}, \qquad \Delta R^{(\mathrm{MR})}_{m\ell} = -A\,\Delta X^{(\mathrm{MR})}_{m\ell}.$$

These relations break down during the iteration because of round-off errors. Computing the variation of the residual by multiplying the variation of the approximate solution by the coefficient matrix explicitly can therefore improve the accuracy of the approximate solution.
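The explicit update described above can be sketched as follows. This is an illustrative toy in exact integer arithmetic, assuming small dense matrices stored as lists of rows; the function names are hypothetical. The residual change is formed as −A·ΔX rather than taken from a separate recurrence, so the pair (X, R) keeps satisfying R = B − AX.

```python
def matmul(a, b):
    """Dense matrix product for lists of rows."""
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)] for row in a]


def combine(a, b, sign=1):
    """Entrywise a + sign * b."""
    return [[x + sign * y for x, y in zip(ra, rb)] for ra, rb in zip(a, b)]


def explicit_update(A, X, R, dX):
    """Return (X + dX, R - A dX): the residual variation is computed
    by an explicit product with the coefficient matrix."""
    return combine(X, dX), combine(R, matmul(A, dX), sign=-1)


A = [[4, 1], [2, 3]]
B = [[1, 2], [3, 4]]
X = [[0, 0], [0, 0]]
R = combine(B, matmul(A, X), sign=-1)   # R = B - A X
dX = [[1, 0], [0, 1]]                   # an arbitrary variation, for illustration
X_new, R_new = explicit_update(A, X, R, dX)
```

The price of this consistency is one extra product with A per update, which is the source of the higher MV counts reported for the -PH and -RH variants.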

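4.1 In case of orthogonalization of $P^{(m)}_{m\ell+j}$

Eq. (12) can be rewritten as

$$R^{(m)}_{m\ell+j+1,0} = R^{(m)}_{m\ell+j,0} - A\left(V^{(m)}_{m\ell+j,0}\,\tilde{\alpha}_{m\ell+j}\right), \quad (i = 0),$$

$$R^{(m)}_{m\ell+j+1,i} = R^{(m)}_{m\ell+j,i} - V^{(m)}_{m\ell+j,i+1}\,\tilde{\alpha}_{m\ell+j}, \quad (i \neq 0).$$

Similarly, (8) can be rewritten as

$$R^{(m+1)}_{(m+1)\ell,0} = R^{(m)}_{(m+1)\ell,0} - A\left(\sum_{i=1}^{\ell}\omega_{m,i}R^{(m)}_{(m+1)\ell,i-1}\right).$$

This improved algorithm is named “Block BiCGSTAB(ℓ)-PH”.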

4.2 In case of orthogonalization of $R^{(m)}_{m\ell+j}$

Eqs. (14) and (17) can be rewritten as

$$V^{(m)}_{m\ell+1,0}\,\tau_{m\ell+1} = R^{(m)}_{m\ell,0}\,\xi^{-1}_{m\ell} - A\zeta_{0,0}, \quad (i = j = 0),$$

$$V^{(m)}_{m\ell+j+1,0}\,\tau_{m\ell+j+1} = V^{(m)}_{m\ell+j,0} - A\zeta_{j,0}, \quad (i = 0),$$

$$V^{(m)}_{m\ell+j+1,i} = \left(V^{(m)}_{m\ell+j,i} - \zeta_{j,i+1}\right)\tau^{-1}_{m\ell+j+1}, \quad (\text{otherwise}),$$

where $\zeta_{j,i} = W^{(m)}_{m\ell+j,i}\,\tilde{\alpha}_{m\ell+j}$. Similarly, (15) can be rewritten as

$$R^{(m+1)}_{(m+1)\ell,0}\,\xi^{-1}_{(m+1)\ell} = V^{(m)}_{(m+1)\ell,0} - A\left(\sum_{i=1}^{\ell}\omega_{m,i}V^{(m)}_{(m+1)\ell,i-1}\right).$$

This improved algorithm is named “Block BiCGSTAB(ℓ)-RH”. Note that the number of matrix multiplications per outer iteration of these methods becomes 3ℓ + 1. The algorithm of the Block BiCGSTAB(ℓ)-RH method is shown in Algorithm 2.

5. Numerical experiments

In this section, we present some numerical experiments. The test matrices are af23560 and dw2048 [7]. In these numerical experiments, the convergence condition is that the relative residual norm falls below 1.00E-14, and the right-hand side B of (1) and the shadow residual $R_0^{*}$ are generated randomly, each with s = 16 column vectors. The initial solution is set to the zero


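Algorithm 2 Block BiCGSTAB(ℓ)-RH

$X_0$ is an initial guess and $R_0^{*}$ is an arbitrary matrix,
$R_0 = B - AX_0$, $V^{(0)}_{0,0}\,\xi_0 = R^{(0)}_{0,0} = R_0$, $W^{(0)}_{-1,0} = O$, $\tilde{\alpha}_{-1} = O_s$, $\omega_{-1,\ell} = 1$,
for $m = 0, 1, \ldots$ until $\|R^{(m)}_{m\ell,0}\|_F / \|B\|_F$ is small enough do
    for $j = 0, 1, \ldots, \ell-1$ do
        $\gamma^{(m)}_{j} = R_0^{*\mathrm{T}} V^{(m)}_{m\ell+j,j}$,
        $\tilde{\beta}_{m\ell-1} = \frac{\tilde{\alpha}_{m\ell-1}}{\omega_{m-1,\ell}}\left(\gamma^{(m-1)}_{\ell-1}\right)^{-1}\gamma^{(m)}_{0}$, $(j = 0)$,
        $\tilde{\beta}_{m\ell+j-1} = -\tilde{\alpha}_{m\ell+j-1}\left(\gamma^{(m)}_{j-1}\right)^{-1}\gamma^{(m)}_{j}$, $(j \neq 0)$,
        for $i = 0, 1, \ldots, j$ do
            $W^{(m)}_{m\ell,0} = R^{(m)}_{m\ell,0}\,\xi^{-1}_{m\ell} + W^{(m)}_{m\ell-1,0}\,\tilde{\beta}_{m\ell-1}$, $(i = j = 0)$,
            $W^{(m)}_{m\ell+j,i} = V^{(m)}_{m\ell+j,i} + W^{(m)}_{m\ell+j-1,i}\,\tilde{\beta}_{m\ell+j-1}$, (otherwise),
        end for
        $W^{(m)}_{m\ell+j,j+1} = AW^{(m)}_{m\ell+j,j}$,
        $\tilde{\alpha}_{m\ell+j} = \left(R_0^{*\mathrm{T}} W^{(m)}_{m\ell+j,j+1}\right)^{-1}\gamma^{(m)}_{j}$,
        for $i = 0, 1, \ldots, j$ do
            $\zeta_{j,i} = W^{(m)}_{m\ell+j,i}\,\tilde{\alpha}_{m\ell+j}$,
            $V^{(m)}_{m\ell+1,0}\,\tau_{m\ell+1} = R^{(m)}_{m\ell,0}\,\xi^{-1}_{m\ell} - A\zeta_{0,0}$, $(i = j = 0)$,
            $V^{(m)}_{m\ell+j+1,0}\,\tau_{m\ell+j+1} = V^{(m)}_{m\ell+j,0} - A\zeta_{j,0}$, $(i = 0)$,
            $V^{(m)}_{m\ell+j+1,i} = \left(V^{(m)}_{m\ell+j,i} - \zeta_{j,i+1}\right)\tau^{-1}_{m\ell+j+1}$, (otherwise),
        end for
        $V^{(m)}_{m\ell+j+1,j+1} = AV^{(m)}_{m\ell+j+1,j}$,
        $X_{m\ell+j+1} = X_{m\ell+j} + W^{(m)}_{m\ell+j,0}\,\tilde{\alpha}_{m\ell+j}\,\xi_{m\ell+j}$,
        $\xi_{m\ell+j+1} = \tau_{m\ell+j+1}\,\xi_{m\ell+j}$,
    end for
    calculate $\omega_{m,1}, \omega_{m,2}, \ldots, \omega_{m,\ell}$,
    $\eta = \sum_{i=1}^{\ell}\omega_{m,i}V^{(m)}_{(m+1)\ell,i-1}$,
    $R^{(m+1)}_{(m+1)\ell,0}\,\xi^{-1}_{(m+1)\ell} = V^{(m)}_{(m+1)\ell,0} - A\eta$,
    $W^{(m+1)}_{(m+1)\ell-1,0} = W^{(m)}_{(m+1)\ell-1,0} - \sum_{i=1}^{\ell}\omega_{m,i}W^{(m)}_{(m+1)\ell-1,i}$,
    $X_{(m+1)\ell} = X_{(m+1)\ell} + \eta\,\xi_{(m+1)\ell}$,
end for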

matrix. The maximum number of outer iterations is 10,000. The experiments were run in the following environment: the CPU is an Intel Core i5 2.5 GHz, the memory is 8 GB, and we use MATLAB 2014a. The results of the Block BiCGSTAB method and the five proposed methods are shown in Table 1. The residual matrix orthogonalization and the explicit multiplication by the coefficient matrix are also applied to the Block BiCGSTAB method. From Table 1, both test cases indicate that applying orthogonalization improves the convergence properties, and that orthogonalization of $R^{(m)}_{m\ell+j+1,0}$ shows better convergence behavior than orthogonalization of $P^{(m)}_{m\ell+j,0}$. Multiplying by the coefficient matrix explicitly yields high accuracy, but the MVs of those methods nearly double. In comparison to the Block BiCGSTAB method, the MVs and QRs of the Block BiCGSTAB(4)-RH method decrease.
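The MV counts reported in Table 1 can be cross-checked against the per-iteration costs. The sketch below rests on two assumptions that are not stated in the text: that one QR factorization is performed per inner step (so the number of outer iterations is QRs/ℓ), and that the variants without the explicit product use 2ℓ products with A per outer iteration, as in the standard BiCGSTAB(ℓ) method. The 3ℓ + 1 count for the -RH variant and s = 16 are taken from the paper.

```python
S = 16  # number of right-hand sides used in the experiments

# (matrix, variant, ell, MVs, QRs) copied from Table 1 for the -R and -RH rows
ROWS = [
    ("af23560", "R", 2, 26880, 840),
    ("af23560", "R", 4, 25728, 804),
    ("af23560", "RH", 1, 91648, 1432),
    ("af23560", "RH", 2, 50288, 898),
    ("af23560", "RH", 4, 45552, 876),
    ("dw2048", "R", 1, 10528, 329),
    ("dw2048", "R", 2, 7424, 232),
    ("dw2048", "R", 4, 7040, 220),
    ("dw2048", "RH", 1, 20160, 315),
    ("dw2048", "RH", 2, 13664, 244),
    ("dw2048", "RH", 4, 11856, 228),
]


def products_per_outer_iteration(variant, ell):
    """3*ell + 1 for -RH (from the text); 2*ell for -R (assumed)."""
    return 3 * ell + 1 if variant == "RH" else 2 * ell


def predicted_mvs(variant, ell, qrs):
    """MVs implied by the QR count, assuming one QR per inner step."""
    outer_iterations = qrs // ell
    return outer_iterations * products_per_outer_iteration(variant, ell) * S


mismatches = [row for row in ROWS if predicted_mvs(row[1], row[2], row[4]) != row[3]]
overhead = {ell: (3 * ell + 1) / (2 * ell) for ell in (1, 2, 4)}
```

Under these assumptions the cost per outer iteration grows by the factor (3ℓ + 1)/(2ℓ), which equals 2 for ℓ = 1 and decreases toward 1.5 as ℓ grows; this is consistent with the remark that the MVs nearly double.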

6. Conclusion

In this paper, we developed the Block BiCGSTAB(ℓ) method to solve linear systems with multiple right-hand sides. Moreover, we proposed some techniques that improve the convergence behavior and the accuracy of the approximate solution. From numerical experiments, we verified that the Block BiCGSTAB(ℓ)-RH method with an appropriate parameter ℓ can perform better than other variants of the Block BiCGSTAB(ℓ)

Table 1. Results of the numerical experiments. “nnz” is the number of nonzero matrix elements. “MVs” and “QRs” are the numbers of matrix-vector multiplications and QR factorizations. “Res.” is the relative residual norm. “—” means that the method did not converge. “+++” means divergence. MVs counts a product of an n × n matrix and an n × s matrix as s MVs.

af23560 (n = 23,560, nnz = 460,598)

Method           ℓ   MVs      QRs     Res.         True Res.
Bl STAB          -   87,808   1,372   8.87E-15     3.32E-11
Bl STAB(ℓ)       1   +++      +++     +++          +++
                 2   +++      +++     +++          +++
                 4   +++      +++     +++          +++
Bl STAB(ℓ)-P     1   —        —       (1.30E+0)    (1.30E+0)
                 2   +++      +++     +++          +++
                 4   +++      +++     +++          +++
Bl STAB(ℓ)-PH    1   —        —       (2.19E+0)    (2.19E+0)
                 2   +++      +++     +++          +++
                 4   +++      +++     +++          +++
Bl STAB(ℓ)-R     1   +++      +++     +++          +++
                 2   26,880   840     4.90E-15     1.20E-5
                 4   25,728   804     5.62E-15     1.12E-5
Bl STAB(ℓ)-RH    1   91,648   1,432   9.85E-15     9.94E-11
                 2   50,288   898     9.43E-15     1.16E-10
                 4   45,552   876     6.51E-15     5.58E-11

dw2048 (n = 2,048, nnz = 10,114)

Method           ℓ   MVs      QRs     Res.         True Res.
Bl STAB          -   20,672   323     6.91E-15     1.02E-12
Bl STAB(ℓ)       1   +++      +++     +++          +++
                 2   +++      +++     +++          +++
                 4   +++      +++     +++          +++
Bl STAB(ℓ)-P     1   11,220   350     9.49E-15     2.87E-13
                 2   16,128   252     6.19E-15     1.60E-8
                 4   7,680    240     3.28E-16     5.56E-8
Bl STAB(ℓ)-PH    1   +++      +++     +++          +++
                 2   17,808   318     7.39E-15     1.84E-12
                 4   12,480   240     7.64E-15     7.26E-14
Bl STAB(ℓ)-R     1   10,528   329     2.36E-15     2.23E-10
                 2   7,424    232     7.07E-15     1.13E-8
                 4   7,040    220     9.09E-15     6.28E-9
Bl STAB(ℓ)-RH    1   20,160   315     4.89E-15     3.05E-12
                 2   13,664   244     9.84E-15     1.61E-13
                 4   11,856   228     3.40E-15     1.27E-13

method and the Block BiCGSTAB method.

References

[1] D. P. O'Leary, The block conjugate gradient algorithm and related methods, Linear Algebra Appl., 29 (1980), 293–322.

[2] A. El Guennouni, K. Jbilou and H. Sadok, A block version of BiCGSTAB for linear systems with multiple right-hand sides, Electron. Trans. Numer. Anal., 16 (2003), 129–142.

[3] G. L. G. Sleijpen and D. R. Fokkema, BiCGSTAB(ℓ) for linear equations involving unsymmetric matrices with complex spectrum, Electron. Trans. Numer. Anal., 1 (1993), 11–32.

[4] G. L. G. Sleijpen and H. A. van der Vorst, Maintaining convergence properties of BiCGstab methods in finite precision arithmetic, Numer. Algorithms, 10 (1995), 203–223.

[5] A. A. Dubrulle, Retooling the method of block conjugate gradients, Electron. Trans. Numer. Anal., 12 (2001), 216–233.

[6] K. Aihara, K. Abe and E. Ishiwata, A variant of IDRstab with reliable update strategies for solving sparse linear systems, J. Comput. Appl. Math., 259 (2014), 244–258.

[7] The University of Florida Sparse Matrix Collection, http://www.cise.ufl.edu/research/sparse/matrices/.


JSIAM Letters Vol.6 (2014) pp.69–72 ©2014 Japan Society for Industrial and Applied Mathematics

Computing fixed argument pairings with the elliptic net algorithm

Yang Liu$^{1}$, Naoki Kanayama$^{1}$, Kazutaka Saito$^{2}$, Tadanori Teruya$^{3}$, Shigenori Uchiyama$^{4}$ and Eiji Okamoto$^{1}$

1 University of Tsukuba, 1-1-1 Ten-nohdai, Tsukuba, Ibaraki 305-8573, Japan

2 Internet Initiative Japan Inc., Jinbocho Mitsui Bldg., 1-105 Kanda Jinbo-cho, Chiyoda-ku, Tokyo 101-0051, Japan

3 National Institute of Advanced Industrial Science and Technology, 1-1-1 Umezono, Tsukuba, Ibaraki 305-8568, Japan

4 Tokyo Metropolitan University, 1-1 Minami-Ohsawa, Hachioji, Tokyo 192-0397, Japan

E-mail kanayama risk.tsukuba.ac.jp

Received March 25, 2014, Accepted June 17, 2014

Abstract

The pairing-based cryptosystem was proposed in 2001, and it provides efficient implementations of identity-based encryption (IBE) and attribute-based encryption (ABE). In 2010, Costello and Stebila introduced the concept of fixed argument pairing, which can be applied to many applications of pairings, and, to compute these pairings, they proposed an efficient algorithm based on the Miller algorithm. In this paper, we propose a method for computing fixed argument pairings, based on the elliptic net method proposed by Stange.

Keywords pairing-based cryptography, fixed argument pairing, elliptic net algorithm

Research Activity Group Algorithmic Number Theory and Its Applications

1. Introduction

Boneh and Franklin proposed ID-based encryption based on cryptographic pairings in 2001 [1], which made the practical value of cryptographic pairings clear. Ever since, pairing-based cryptosystems have been a hot topic in cryptographic research. Although acceleration of pairing computation has been widely researched, it remains an important field of interest. Currently, the pairings that result in the most efficient implementations of pairing-based cryptographic schemes are the Tate pairing and some of its variants, for example, the $\eta_T$ [2], Ate [3], Ate$_i$ [4], R-Ate [5], and optimal [6] pairings.

A standard algorithm for computing pairings is Miller's algorithm [7, 8]. A generic implementation of Miller's algorithm uses the classic double-and-add line-and-tangent method. Most improvements of the computation of pairings attempt to reduce the number of iterations of the so-called Miller loop in the Miller algorithm.

A fixed argument pairing is a pairing in which one argument is fixed while the other is allowed to change. Note that although the second argument may be the fixed one, in this paper we discuss the situation where the first one is fixed.

Such pairings can be applied to many scenarios. For example, when we use the Boneh-Franklin ID-based cryptosystem, in the decryption phase, the decryption key $d_{ID}$ can be fixed as the sender has only one ID. The decryption key serves as the first argument in the pairing computation, so the receiver will have to compute a fixed argument pairing.

In 2010, Costello and Stebila [9] proposed a scheme based on the Miller algorithm that can be used to compute such pairings. To compute the Miller function $f_{m,P}()$, where m is an integer, they used two sets $G_{DBL}$ and $G_{ADD}$ to store the precomputed values. They also adopted a strategy similar to parallel computation (performing n iterations simultaneously). With P fixed, it is less expensive to compute $f_{m,P}(Q)$. According to Costello and Stebila's result, the Miller loop can be computed with between 25% and 37% fewer field operations.

In 2007, Stange [10] defined elliptic nets and proposed an alternative method based on elliptic nets for computing Tate pairings. Ogura et al. [11] presented formulas that use elliptic nets to compute cryptographic pairings.

However, elliptic nets have not become popular for pairing computations because, for the time being, the Double and DoubleAdd algorithms [10] with elliptic nets are more expensive than the classic double-and-add line-and-tangent method in Miller's algorithm.

In this paper, we propose an algorithm that uses elliptic nets to compute fixed argument pairings and show that this is an example in which a pairing computation using an elliptic net becomes more efficient. In [10], Stange considered a block V(i) for an integer i, called a block centered on i, that consists of 11 elliptic net values with respect to A and B to be used for computing the Tate pairing $e_r(A,B)$. Given the block V(i), we can compute the block V(2i) or V(2i + 1) by using the Double/DoubleAdd algorithm in [10]. Hence, we can compute the block V(r) in polynomial time. Each block contains eight values that depend on A only and three


that depend on A and B. Therefore, if we precompute and store the subblock that depends only on A, then we can compute the (A,B)-dependent part by referring to the table. In this way, we can efficiently compute fixed argument pairings e(A, ∗).

This strategy is efficient when we use pairings defined on $G_2 \times G_1$, or roughly when $G_2$ (resp. $G_1$) is a cyclic group of $\mathbb{F}_{q^k}$ (resp. $\mathbb{F}_q$)-rational points on an elliptic curve $E/\mathbb{F}_q$. In this case, there are eight elliptic net values in $\mathbb{F}_{q^k}$ that depend only on A, and we can reduce the total cost of computing the fixed argument pairings. Many cryptographic pairings are defined on $G_2 \times G_1$ and are used widely in the implementation of pairing-based cryptosystems. In this paper, we mainly consider the case of the Ate pairing.

The rest of this paper is organized as follows. In Section 2, we briefly introduce the basic principles of the Miller algorithm and the elliptic net algorithm, on which our scheme is based. In Section 3, we present the principles and details of our scheme. In Section 4, we evaluate the speed and storage cost of our scheme.

2. Preliminaries

2.1 Pairings

Let E be an elliptic curve defined over a finite field $\mathbb{F}_q$ with q elements. The set of $\mathbb{F}_q$-rational points of E is denoted by $E(\mathbb{F}_q)$, and the point at infinity on E is denoted by O. Consider a large prime r such that $r \mid \#E(\mathbb{F}_q)$, and let $E(\mathbb{F}_q)[r]$ denote the subgroup of r-torsion points in $E(\mathbb{F}_q)$. The embedding degree k is the smallest positive integer such that r divides $q^k - 1$.

Let $\pi_q$ be the Frobenius endomorphism $\pi_q : E \to E$, $(x, y) \mapsto (x^q, y^q)$. We denote the trace of Frobenius by t, i.e., $\#E(\mathbb{F}_q) = q + 1 - t$.
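The definition of the embedding degree translates directly into a small search. The following is a minimal sketch with a hypothetical function name; it assumes r is a prime not dividing q and is only meant for toy parameters, since it runs in time proportional to k.

```python
from math import gcd


def embedding_degree(q, r):
    """Smallest positive k with r | q^k - 1 (the order of q modulo r)."""
    if r < 2 or gcd(q, r) != 1:
        raise ValueError("r must be at least 2 and coprime to q")
    k, power = 1, q % r
    while power != 1:
        power = power * q % r
        k += 1
    return k


k_toy = embedding_degree(7, 5)    # powers of 7 mod 5: 2, 4, 3, 1
k_one = embedding_degree(11, 5)   # 11 = 1 (mod 5)
```

For cryptographic sizes, k is normally fixed by the curve family rather than found by search.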

2.1.1 Tate Pairing:

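Let $P \in E(\mathbb{F}_{q^k})[r]$ and $Q \in E(\mathbb{F}_{q^k})$. Choose a point $R \in E(\mathbb{F}_{q^k})$ such that the supports of $\mathrm{div}(f_{r,P}) = r(P) - r(O)$ and $D_Q := (Q + R) - (R)$ are disjoint. Then, the Tate pairing is defined by

$$\langle\cdot,\cdot\rangle_r : E(\mathbb{F}_{q^k})[r] \times E(\mathbb{F}_{q^k})/rE(\mathbb{F}_{q^k}) \to \mathbb{F}^{\times}_{q^k}/(\mathbb{F}^{\times}_{q^k})^r,$$

$$(P,Q) \mapsto \langle P,Q\rangle_r := f_{r,P}(D_Q) \bmod (\mathbb{F}^{\times}_{q^k})^r.$$

It has been shown that $\langle P,Q\rangle_r$ is bilinear and nondegenerate.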

2.1.2 Ate Pairing:

The Ate pairing, proposed by Hess et al. [3], is a generalization of the $\eta_T$ pairing [2]. The Ate pairing can be applied not only to supersingular elliptic curves but also to ordinary ones.

For cryptographic applications, it is usually assumed that the points P and Q are, respectively, elements of the following groups:

$$G_1 = E(\mathbb{F}_q)[r] = E(\mathbb{F}_{q^k})[r] \cap \mathrm{Ker}(\pi_q - 1),$$

$$G_2 = E(\mathbb{F}_{q^k})[r] \cap \mathrm{Ker}(\pi_q - q).$$

Let $T = t - 1$. We choose integers N and L such that $N = \gcd(T^k - 1, q^k - 1)$ and $T^k - 1 = LN$. We assume that $r^2$ does not divide $q^k - 1$. Then the Ate pairing is defined by $f^{\mathrm{norm}}_{T,Q}(P)$ for $Q \in G_2$ and $P \in G_1$. Here, $f^{\mathrm{norm}}_{T,Q}$ is the normalization of $f_{T,Q}$. We denote by $\alpha(Q,P)$ the reduced Ate pairing: $\alpha(Q,P) := f_{T,Q}(P)^{(q^k-1)/r}$. The length of the Miller loop for computing the Ate pairing $f_{T,Q}(P)$ is $\log_2 |T|$.

Remark 1 Note that the Ate pairing is a “point-evaluation” pairing, whereas the Tate pairing $f_{r,P}(D_Q)$ is a “divisor-evaluation” pairing. The rational function with $\mathrm{div}(f_{T,Q}) = T(Q) - ([T]Q) - (T - 1)(O)$ is determined uniquely up to a constant. When the point Q is in $E(\mathbb{F}_{q^k})$, the constant is in $\mathbb{F}_{q^k}$, and it will not vanish during the final exponentiation. We therefore need to normalize the function. We can obtain the normalized function by $f^{\mathrm{norm}}_{T,Q} = f_{T,Q}/(z^{T-1}f_{T,Q})(O)$, where z is called the uniformizer of E at O.

2.2 Elliptic net algorithm

At this point, strategies for accelerating the Miller algorithm seem to have been exhausted, so it is necessary to seek other methods for computing pairings. At the conference Pairing 2007, Stange defined an elliptic net and proposed a method for using it to compute a Tate pairing.

An elliptic net W is a map from a finitely generated free Abelian group A to an integral domain R that satisfies a certain recursive equation (see [10]). Stange presented the elliptic net $W_{E,P_1,P_2,\ldots,P_n}$ associated with an elliptic curve E/K, where K is a field, and $P_1, P_2, \ldots, P_n$ are points. The elliptic net $W_{E,P_1,P_2,\ldots,P_n}$ gives a function from $\mathbb{Z}^n$ to K. In this paper, we consider only the case where n = 2.

Suppose there is an elliptic curve defined over a finite field $\mathbb{F}_q$ by the Weierstrass equation

$$E/\mathbb{F}_q : y^2 = x^3 + Ax + B,$$

and $P = (x_1, y_1)$, $Q = (x_2, y_2) \in E$. Then we can make an elliptic net system consisting of $W_{P,Q}(i, 0)$ and $W_{P,Q}(j, 1)$. For the initial values and recursive equations to compute the elliptic net system, see [10]. For ease of reading, here we simplify by writing $W(i, j)$ instead of $W_{P,Q}(i, j)$.

Stange also demonstrated an efficient method for computing elliptic nets. Consider a block consisting of the two vectors $W_{P,Q}(i_1, 0)$ with $k - 3 \le i_1 \le k + 4$ and $W_{P,Q}(i_2, 1)$ with $k - 1 \le i_2 \le k + 1$. This is called an “elliptic net block centered on k”. We will use “V” to denote such a block.

Stange gave two algorithms for calculating elliptic net blocks, as follows.

• Double(V): outputs an elliptic net block centered on 2k given the one centered on k.

• DoubleAdd(V): outputs an elliptic net block centered on 2k + 1 given the one centered on k.

With the Double and DoubleAdd algorithms, it is possible to compute the elliptic net block centered on k given the one centered on 1 in O(log k) steps. For details, see [10].
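The O(log k) bound follows from reading the binary expansion of the target index from the most significant bit downward: a 0 bit calls Double and a 1 bit calls DoubleAdd. The sketch below tracks only the block centres, not the net values; `block_centres` is a hypothetical helper name.

```python
def block_centres(m):
    """Centres visited when going from the block centred on 1 to the block
    centred on m: Double on a 0 bit, DoubleAdd on a 1 bit."""
    if m < 1:
        raise ValueError("m must be a positive integer")
    centre, visited = 1, [1]
    for pos in range(m.bit_length() - 2, -1, -1):
        bit = (m >> pos) & 1
        centre = 2 * centre + bit   # Double gives 2k, DoubleAdd gives 2k + 1
        visited.append(centre)
    return visited


path = block_centres(13)  # 13 = (1101)_2
```

The number of Double/DoubleAdd calls is bitlength(m) − 1, which is the quantity t used for the table STR[t][6] later in the paper.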

2.3 Cryptographic pairings using elliptic nets

Stange provided the formula for using an elliptic net to find the Tate pairing.


2.3.1 Using an elliptic net to find a Tate pairing:

Theorem 2 ([10, Corollary 1]) Let E be an elliptic curve over a finite field K. For $P \in E(K)[r]$, $Q \in E(K)$,

$$f_{r,P}(D_Q) = \frac{W_{P,Q}(r + 1, 1)\,W_{P,Q}(1, 0)}{W_{P,Q}(r + 1, 0)\,W_{P,Q}(1, 1)}. \quad (1)$$

Stange also gave an algorithm for using an elliptic net to compute the Tate pairing.

2.3.2 Using an elliptic net to find an Ate pairing:

Ogura et al. [11] provided formulas for using elliptic nets to find cryptographic pairings. In this paper, we review only the formula for the Ate pairing. Before presenting this, we first need to formulate the normalization for elliptic nets.

Proposition 3 ([11]) Let $\widetilde{W}_{P,Q}(s, 1)$ denote the normalization of the elliptic net $W_{P,Q}(s, 1)$. For $s \in \mathbb{Z}$, assume $[s]P \neq O$. Then

$$\widetilde{W}_{P,Q}(s, 1) = \frac{W_{P,Q}(s, 1)}{2^{s-1}\,W_{P,Q}(s, 0)}.$$

For practical uses of pairings, we can assume k > 1. In this case, $2^{(q^k-1)/r} = 1$, and so we have

$$\widetilde{W}_{P,Q}(s, 1)^{\frac{q^k-1}{r}} = \left(\frac{W_{P,Q}(s, 1)}{W_{P,Q}(s, 0)}\right)^{\frac{q^k-1}{r}}.$$
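The claim that the power of 2 disappears under the final exponentiation can be checked on toy numbers. When k > 1, the prime r does not divide q − 1, so (q^k − 1)/r is a multiple of q − 1 and every nonzero element of the base field is mapped to 1. The sketch below assumes a prime q and uses illustrative parameters (q = 7, r = 5, k = 4) that are not from the paper.

```python
def final_exponent(q, k, r):
    """The exponent (q^k - 1) / r of the final exponentiation."""
    if (q ** k - 1) % r != 0:
        raise ValueError("r must divide q^k - 1")
    return (q ** k - 1) // r


q, k, r = 7, 4, 5                 # toy values: 5 | 7^4 - 1 and k = 4 > 1
e = final_exponent(q, k, r)       # (2401 - 1) / 5 = 480
images = [pow(c, e, q) for c in range(1, q)]
```

In particular pow(2, e, q) equals 1 here, so the factor $2^{s-1}$ in Proposition 3 does not affect the reduced pairing value.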

Theorem 4 ([11, Theorem 4]) Let E be an elliptic curve over a finite field $\mathbb{F}_q$, and let $\pi_q : (x, y) \mapsto (x^q, y^q)$ be the q-Frobenius endomorphism on E. We assume that the embedding degree k > 1. Let r be a large prime number with $r \mid \#E(\mathbb{F}_q)$ and $(r, q) = 1$, and let $T \equiv q \pmod{r}$. For $P \in G_1 = E(\mathbb{F}_{q^k})[r] \cap \mathrm{Ker}(\pi_q - 1)$ and $Q \in G_2 = E(\mathbb{F}_{q^k})[r] \cap \mathrm{Ker}(\pi_q - q)$,

$$\alpha(Q,P) = f^{\mathrm{norm}}_{T,Q}(P)^{\frac{q^k-1}{r}} = \widetilde{W}_{Q,P}(T, 1)^{\frac{q^k-1}{r}}.$$

3. Our scheme

3.1 P -dependent and (P,Q)-dependent vectors

The computations in Stange's Double/DoubleAdd algorithm are based on the recursive formulas for elliptic nets. An elliptic net block can be divided into two parts, $W_{P,Q}(i, 0)$ and $W_{P,Q}(j, 1)$. Although these are computed simultaneously in the elliptic net method, we found that $W_{P,Q}(i, 0)$ can be computed independently of $W_{P,Q}(j, 1)$. Consider the initial values and recursive formulas above, and note that the computation of $W_{P,Q}(i, 0)$ has nothing to do with $Q = (x_2, y_2)$; we say that $W_{P,Q}(i, 0)$ is a P-dependent vector. However, the computation of $W_{P,Q}(j, 1)$ requires the values of not only $Q = (x_2, y_2)$, but also $W_{P,Q}(i, 0)$ and $P = (x_1, y_1)$, so we call $W_{P,Q}(j, 1)$ a (P,Q)-dependent vector. Note that here P-dependent and (P,Q)-dependent correspond to the standard expression of the Miller function $f_{m,P}()$.

With this advantage of elliptic net blocks, we can easily formulate an efficient algorithm for the computation of a fixed argument pairing, in which P is fixed. When P is fixed, we can first carry out the P-dependent operations, which we will call precomputing. After obtaining these values, we can perform the (P,Q)-dependent computations, which we will call computing, at a very low cost. Our scheme includes two parts:

• Precomputing: output the values associated with the fixed argument.

• Computing: output the values associated with the changing argument. This requires the values computed in precomputing.

Our algorithms are presented in Algorithm 1 and Algorithm 2. Note that in the expression of the Miller function $f_{m,P}()$, it is important to fix m to ensure that the same computation (Double or DoubleAdd) is performed in each loop. Since m is fixed, we can use a matrix STR[t][6], in which t = bitlength(m) − 1, to store the precomputed values of each loop; this separates the computation of $W_{P,Q}(i, 0)$ from the elliptic net method.

Table 1. Operation Counts in Each Loop.

Types              Precomputing                                              Computing
$G_1 \times G_2$   $6S_{\mathbb{F}_q} + 26M_{\mathbb{F}_q}$                  $1S_{\mathbb{F}_{q^k}} + 1M_{\mathbb{F}_{q^k}} + 8M_{\mathbb{F}_{q^k} * \mathbb{F}_q}$
$G_2 \times G_1$   $6S_{\mathbb{F}_{q^k}} + 26M_{\mathbb{F}_{q^k}}$          $1S_{\mathbb{F}_{q^k}} + 9M_{\mathbb{F}_{q^k}}$

Algorithm 1 Precomputing

Input: $a = W_{P,Q}(2, 0)$, $b = W_{P,Q}(3, 0)$, $c = W_{P,Q}(4, 0)$, $A = W_{P,Q}(2, 0)^{-1}$, $m = (d_t d_{t-1} d_{t-2} d_{t-3} \ldots d_0)_2$
Output: Matrix STR[t][6] for storing precomputed data, $W_{P,Q}(m, 0)$

V0 = [−a, −1, 0, 1, a, b, c, a^3 c − b^3]
for p = t − 1 down to 0 do
    S = [0, 0, 0, 0, 0, 0]
    P = [0, 0, 0, 0, 0, 0]
    for i = 0 to 5 do
        S[i] = V0[i + 1]^2
        P[i] = V0[i] V0[i + 2]
    end for
    if d_p == 0 then
        for i = 1 to 4 do
            V0[2i − 2] = S[i − 1] ∗ P[i] − S[i] ∗ P[i − 1]
            V0[2i − 1] = (S[i − 1] ∗ P[i + 1] − S[i + 1] ∗ P[i − 1]) A
        end for
        for j = 0 to 2 do
            STR[p][2j] = S[j + 1]
            STR[p][2j + 1] = P[j + 1]
        end for
    else
        for i = 1 to 4 do
            V0[2i − 2] = (S[i − 1] ∗ P[i + 1] − S[i + 1] ∗ P[i − 1]) A
            V0[2i − 1] = S[i] ∗ P[i + 1] − S[i + 1] ∗ P[i]
        end for
        for j = 0 to 2 do
            STR[p][2j] = S[j + 2]
            STR[p][2j + 1] = P[j + 2]
        end for
    end if
end for
return STR, V0[3] ($W_{P,Q}(m, 0)$)
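Algorithm 1 can be transcribed into Python for experimentation. The following is a minimal sketch under stated assumptions, not the authors' implementation: it works over a prime field F_p, assumes P is not a 2-torsion point (so that W(2,0) = 2y is invertible), and takes the initial values W(2,0), W(3,0), W(4,0) to be the classical division polynomials of the curve y^2 = x^3 + Ax + B evaluated at P. The names `initial_values` and `precompute` are hypothetical.

```python
def initial_values(x, y, A, B, p):
    """W(2,0), W(3,0), W(4,0) for P = (x, y), taken as the division
    polynomials psi_2, psi_3, psi_4 of y^2 = x^3 + A x + B over F_p."""
    a = 2 * y % p
    b = (3 * x**4 + 6 * A * x**2 + 12 * B * x - A * A) % p
    c = 4 * y * (x**6 + 5 * A * x**4 + 20 * B * x**3 - 5 * A * A * x**2
                 - 4 * A * B * x - 8 * B * B - A**3) % p
    return a, b, c


def precompute(a, b, c, m, p):
    """Transcription of Algorithm 1. Returns (STR, W(m, 0)), where STR has
    t = bitlength(m) - 1 rows of six precomputed values."""
    inv_a = pow(a, -1, p)                      # A = W(2,0)^{-1}
    t = m.bit_length() - 1
    # Block centred on 1: W(-2,0), ..., W(5,0)
    v = [-a % p, p - 1, 0, 1, a % p, b % p, c % p, (a**3 * c - b**3) % p]
    table = [None] * t
    for pos in range(t - 1, -1, -1):
        sq = [v[i + 1] * v[i + 1] % p for i in range(6)]   # S[i]
        pr = [v[i] * v[i + 2] % p for i in range(6)]       # P[i]

        def odd(i):
            return (sq[i - 1] * pr[i] - sq[i] * pr[i - 1]) % p

        def even(i):
            return (sq[i - 1] * pr[i + 1] - sq[i + 1] * pr[i - 1]) * inv_a % p

        if (m >> pos) & 1 == 0:                # Double: new centre 2k
            v = [w for i in range(1, 5) for w in (odd(i), even(i))]
            first = 1
        else:                                  # DoubleAdd: new centre 2k + 1
            v = [w for i in range(1, 5) for w in (even(i), odd(i + 1))]
            first = 2
        table[pos] = [w for j in range(first, first + 3) for w in (sq[j], pr[j])]
    return table, v[3]
```

A convenient sanity check is that W(m, 0) vanishes exactly when [m]P = O, so the returned value is zero when m is the order of P and nonzero for smaller positive m.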


Algorithm 2 Computing

Input: $d = W_{P,Q}(2, 1)$, $g = W_{P,Q}(1, 1)$, $E = W_{P,Q}(-1, 1)^{-1}$, $F = W_{P,Q}(2, -1)^{-1}$, $G = W_{P,Q}(1, 1)^{-1}$, $m = (d_t d_{t-1} d_{t-2} d_{t-3} \ldots d_0)_2$, precomputed data STR[t][6]
Output: $W_{P,Q}(m, 1)$

V1 = [1, g, d]
for p = t − 1 down to 0 do
    S0 = V1[1]^2
    P0 = V1[0] V1[2]
    if d_p == 0 then
        V1[0] = (P0 ∗ STR[p][0] − STR[p][1] ∗ S0) G
        V1[1] = STR[p][2] ∗ P0 − S0 ∗ STR[p][3]
        V1[2] = (STR[p][4] ∗ P0 − S0 ∗ STR[p][5]) E
    else
        V1[0] = STR[p][0] ∗ P0 − S0 ∗ STR[p][1]
        V1[1] = (STR[p][2] ∗ P0 − S0 ∗ STR[p][3]) E
        V1[2] = (S0 ∗ STR[p][5] − STR[p][4] ∗ P0) F
    end if
end for
return V1[1] ($W_{P,Q}(m, 1)$)

4. Evaluation of the proposed method

4.1 Computational cost

There are different types of computation in the Precomputing and Computing phases. The types of computation also differ between different kinds of pairings. We begin by listing the operation counts of precomputing and computing in Table 1. The meanings of the notations used in the table are as follows:

$M_{\mathbb{F}_q}$: The multiplication of two elements in $\mathbb{F}_q$;

$M_{\mathbb{F}_{q^k}}$: The multiplication of two elements in $\mathbb{F}_{q^k}$;

$M_{\mathbb{F}_{q^k} * \mathbb{F}_q}$: The multiplication of an element in $\mathbb{F}_{q^k}$ and one in $\mathbb{F}_q$;

$S_{\mathbb{F}_q}$: The square of an element in $\mathbb{F}_q$;

$S_{\mathbb{F}_{q^k}}$: The square of an element in $\mathbb{F}_{q^k}$.

Our scheme can reduce the number of computations in each loop of the elliptic net method to $1S_{\mathbb{F}_{q^k}} + 9M_{\mathbb{F}_{q^k}}$. The reduction is equal to the operations in precomputing. The types of operations saved depend on the type of pairing. We see that the time needed for the computing phase for a $G_2 \times G_1$ pairing, denoted as $t_{\mathrm{pre}}$, is exactly $\frac{1S + 9M}{7S + 35M}\,t_{\mathrm{no\_pre}}$, where $t_{\mathrm{no\_pre}}$ denotes the time needed for computing a $G_2 \times G_1$ pairing using the elliptic net method. When the characteristic p is neither 2 nor 3, we can take S = 0.8M. In this situation, $t_{\mathrm{pre}} = (9.8/40.6)\,t_{\mathrm{no\_pre}} = 0.241\,t_{\mathrm{no\_pre}}$. However, in $G_1 \times G_2$ pairings, because the types of operations are different in precomputing and computing (the operations in computing require more time), the relationship between $t_{\mathrm{no\_pre}}$ and $t_{\mathrm{pre}}$ is $t_{\mathrm{pre}} > 0.241\,t_{\mathrm{no\_pre}}$.
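The ratio quoted above is simple arithmetic on the operation counts, and it can be reproduced exactly with rational numbers. The sketch below takes the counts 1S + 9M and 7S + 35M from the text; the function name and the parameterization by the S/M cost ratio are illustrative choices.

```python
from fractions import Fraction


def computing_fraction(s_over_m):
    """(1S + 9M) / (7S + 35M) expressed in units of M, for a given S/M ratio."""
    s = Fraction(s_over_m)
    return (1 * s + 9) / (7 * s + 35)


ratio = computing_fraction(Fraction(4, 5))    # S = 0.8 M
saving = 1 - ratio
```

With S = 0.8M the ratio is 9.8/40.6 = 7/29, about 0.241, so roughly three quarters of the per-loop cost moves into the precomputation. Changing the S/M ratio barely moves the result; with S = M it is 10/42, about 0.238.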

4.2 Storage cost

For each $W_{P,Q}(i, 0)$ vector, there are eight elements. However, since it is necessary to fix certain elements, it is only necessary to store six elements for each loop. The storage costs for $G_1 \times G_2$ and $G_2 \times G_1$ pairings are listed in Table 2.

Table 2. Storage Cost.

Types              Field                    Number
$G_1 \times G_2$   $\mathbb{F}_q$           $6 * \log_2 m$
$G_2 \times G_1$   $\mathbb{F}_{q^k}$       $6 * \log_2 m$
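To give a feel for the sizes in Table 2, the count can be turned into bits. This is a rough sketch under assumptions that are not in the paper: log2 m is taken as bitlength(m) − 1 (the number of rows of STR), a field element of F_q is charged the bit length of q, and an element of F_{q^k} is charged k times that. The parameter values in the example are hypothetical.

```python
def stored_elements(m):
    """Six field elements per loop, one loop per bit of m below the top bit."""
    return 6 * (m.bit_length() - 1)


def storage_bits(m, q_bits, k, pairing_type):
    """Approximate table size in bits; pairing_type is 'G1xG2' or 'G2xG1'."""
    if pairing_type not in ("G1xG2", "G2xG1"):
        raise ValueError("unknown pairing type")
    element_bits = q_bits * (k if pairing_type == "G2xG1" else 1)
    return stored_elements(m) * element_bits


m = 2 ** 64 + 1   # hypothetical loop parameter with 65 bits
bits_small = storage_bits(m, q_bits=256, k=12, pairing_type="G1xG2")
bits_large = storage_bits(m, q_bits=256, k=12, pairing_type="G2xG1")
```

For these illustrative values the table holds 384 field elements, about 12 KiB for a $G_1 \times G_2$ pairing and about 144 KiB for a $G_2 \times G_1$ pairing.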

5. Conclusion

In this paper, we proposed a method for computing fixed argument pairings, based on the elliptic net method proposed by Stange in [10]. According to our analysis, our method saves about 70% of the time cost, relative to the elliptic net method without precomputation, for computing $G_2 \times G_1$ pairings.

Acknowledgments

This work was supported by JSPS KAKENHI GrantNumbers 22300002, 22500005, 24540135.

References

[1] D. Boneh and M. Franklin, Identity-based encryption from the Weil pairing, in: Proc. of CRYPTO 2001, J. Kilian ed., LNCS, Vol. 2139, pp. 213–369, Springer-Verlag, Berlin, 2001.

[2] P. S. L. M. Barreto, S. D. Galbraith, C. OhEigeartaigh and M. Scott, Efficient pairing computation on supersingular abelian varieties, Des. Codes Cryptogr., 42 (2007), 239–271.

[3] F. Hess, N. P. Smart and F. Vercauteren, The Eta pairing revisited, IEEE Trans. Inform. Theory, 52 (2006), 4595–4602.

[4] C.-A. Zhao, F. Zhang and J. Huang, A note on the Ate pairing, Int. J. Inf. Secur., 6 (2008), 379–382.

[5] E. Lee, H. S. Lee and C. M. Park, Efficient and generalized pairing computation on abelian varieties, IEEE Trans. Inform. Theory, 55 (2009), 1793–1803.

[6] F. Vercauteren, Optimal pairings, IEEE Trans. Inform. Theory, 56 (2010), 455–461.

[7] V. S. Miller, Short programs for functions on curves, 1986, http://crypto.stanford.edu/miller/miller.pdf.

[8] V. S. Miller, The Weil pairing and its efficient calculation, J. Cryptology, 17 (2004), 235–261.

[9] C. Costello and D. Stebila, Fixed argument pairings, in: Proc. of LATINCRYPT 2010, M. Abdalla et al. eds., LNCS, Vol. 6212, pp. 92–108, Springer-Verlag, Berlin, 2010.

[10] K. E. Stange, The Tate pairing via elliptic nets, in: Proc. of Pairing 2007, T. Takagi et al. eds., LNCS, Vol. 5475, pp. 329–348, Springer-Verlag, Berlin, 2007.

[11] N. Ogura, N. Kanayama, S. Uchiyama and E. Okamoto, Cryptographic pairings based on elliptic nets, in: Proc. of IWSEC 2011, T. Iwata et al. eds., LNCS, Vol. 7038, pp. 65–78, Springer-Verlag, Berlin, 2011.


JSIAM Letters Vol.6 (2014) pp.73–76 ©2014 Japan Society for Industrial and Applied Mathematics

Heuristic counting of Kachisa-Schaefer-Scott curves

Yutaro Kiyomura$^{1}$, Noriyasu Iwamoto$^{2}$, Shun'ichi Yokoyama$^{1}$, Kenichiro Hayasaka$^{1}$, Yuntao Wang$^{1}$, Takanori Yasuda$^{3}$, Katsuyuki Takashima$^{4}$ and Tsuyoshi Takagi$^{5}$

1 Graduate School of Mathematics, Kyushu University, 744 Motooka, Nishi-ku, Fukuoka 819-0395, Japan

2 Graduate School of Engineering, Kyushu University, 744 Motooka, Nishi-ku, Fukuoka 819-0395, Japan

3 Institute of Systems, Information Technologies and Nanotechnologies, 2-1-22 Momochihama, Fukuoka 814-0001, Japan

4 Information Technology R&D Center, Mitsubishi Electric, 5-1-1 Ofuna, Kamakura, Kanagawa 247-8501, Japan

5 Institute of Mathematics for Industry, Kyushu University, 744 Motooka, Nishi-ku, Fukuoka 819-0395, Japan

E-mail ma212014 math.kyushu-u.ac.jp

Received March 25, 2014, Accepted June 25, 2014

Abstract

Estimating the number of pairing-friendly elliptic curves is important for obtaining such a curve with a suitable security level and high efficiency. For the 128-bit security level, M. Naehrig and J. Boxall estimated the number of Barreto-Naehrig (BN) curves. For future use, we extend their results to higher security levels, that is, we count Kachisa-Schaefer-Scott (KSS) curves with 192- and 224-bit security levels. Our efficient counting is based on a number-theoretic conjecture called the Bateman-Horn conjecture. We verify the validity of using the conjecture and confirm that a sufficient number of KSS curves can be obtained for practical use.

Keywords pairing-based cryptography, pairing-friendly elliptic curve, KSS curves,Bateman-Horn conjecture

Research Activity Group Algorithmic Number Theory and Its Applications

1. Introduction

Pairing-based cryptography is one of the popular topics in public key cryptography. A pairing is given as a map e from $G_1 \times G_2$ to $G_T$, where $G_1$ and $G_2$ are subgroups of order r on an elliptic curve over a finite field $\mathbb{F}_q$ and $G_T$ is a multiplicative group of prime order r in $\mathbb{F}_{q^k}$ (k: embedding degree). This map e has the bilinearity property, i.e., $e(aP, bQ) = e(P,Q)^{ab}$ for all $P \in G_1$, $Q \in G_2$ and $a, b \in \mathbb{Z}$. The schemes using such a pairing have various applications including ID-based encryption [1], keyword searchable encryption [2], and functional encryption [3]. For a secure pairing-based cryptosystem, both the discrete logarithm problem in the group of $\mathbb{F}_q$-rational points on an elliptic curve (ECDLP) and that in the multiplicative group of $\mathbb{F}_{q^k}$ (DLP) must be computationally infeasible. To achieve the same security level in both groups, the prime order r and the size $q^k$ of the extension field should be balanced appropriately. For this purpose, the ratio $\rho = \log q / \log r$ is an important parameter. Usually ρ satisfies 1 ≤ ρ ≤ 2, and ρ ≈ 1 is the most desirable for high efficiency. For the embedding degree k, Freeman et al. recommended that 1 ≤ k ≤ 50 [4]. For practical cryptosystems, we need suitable elliptic curves with small embedding degrees k and large r. Curves with such properties are called “pairing-friendly elliptic curves”.

Due to future improvements in hardware performance, it is necessary to construct elliptic curves with varying embedding degrees for various security levels. In the literature, Barreto-Naehrig (BN) curves [5] (k = 12) are recommended for the 128-bit security level, and Kachisa-Schaefer-Scott (KSS) curves [6] are recommended for the 192- and 224-bit security levels (as indicated in [7]). It is important to estimate the number of prime pairs (q, r) of pairing-friendly elliptic curves for efficient curve parameters with suitable security levels. M. Naehrig and J. Boxall estimated the number of prime pairs (q, r) for BN curves [8, 9].

In this paper, we extend their results to KSS curves and estimate the number of prime pairs (q, r) for KSS curves. Our efficient counting is based on a number-theoretic conjecture called the Bateman-Horn conjecture [10]. To verify the conjecture, we estimate the number of prime pairs (q, r) in certain short intervals. Moreover, by calculating the Hardy-Littlewood constant given in the conjecture, we confirmed that a sufficient number of KSS curves with appropriate parameter sizes can be obtained for practical use.

2. Pairing-Friendly Elliptic Curves

2.1 Definition

For a random elliptic curve $E$ over a random field $\mathbb{F}_q$ and a prime $r \approx q$, the probability that $E$ has embedding degree less than $(\log q)^2$ with respect to $r$ is vanishingly small, and in general the embedding degree can be expected to be around $r$ [11]. The computation of pairings is then infeasible, since pairings on a random elliptic curve take values in a field of size $2^{2^{160}}$ if $r$ and $q$ are around $2^{160}$ (80-bit security level). In order to construct practical pairing-based cryptography, we need suitable elliptic curves with small embedding degree $k$ and large group order $r$. Such an elliptic curve is called a pairing-friendly elliptic curve [4].

Definition 1 Suppose $E$ is an elliptic curve defined over a finite field $\mathbb{F}_q$. We say that $E$ is a pairing-friendly elliptic curve if the following two conditions hold.

(1) There is a prime $r \ge \sqrt{q}$ dividing $\#E(\mathbb{F}_q)$.

(2) The embedding degree of $E$ with respect to $r$ is less than $\log_2(r)/8$.

Freeman, Scott and Teske recommended that $1 \le k \le 50$ [4]. BN curves [5] and KSS curves [6] are well-known examples of pairing-friendly elliptic curves.

2.2 KSS Curves

Kachisa, Schaefer and Scott presented some new families of pairing-friendly elliptic curves [6]. These curves are called KSS curves. KSS curves are suitable for calculating pairings at high security levels. KSS-18 curves ($k = 18$) use the following parameters.

$$q(x) = \frac{1}{21}\left(x^8 + 5x^7 + 7x^6 + 37x^5 + 188x^4 + 259x^3 + 343x^2 + 1763x + 2401\right),$$

$$r(x) = \frac{1}{343}\left(x^6 + 37x^3 + 343\right),$$

$$t(x) = \frac{1}{7}\left(x^4 + 16x + 7\right).$$

We get the following polynomials $q^+(x)$ and $r^+(x)$ with integer coefficients by substituting $42x + 14$ for $x$ in $q(x)$ and $r(x)$.

$$q^+(x) = 461078666496x^8 + 1284433428096x^7 + 1564374047040x^6 + 1088278335648x^5 + 473078255328x^4 + 131624074008x^3 + 22896702948x^2 + 2277529014x + 99213811,$$

$$r^+(x) = 16003008x^6 + 32006016x^5 + 26671680x^4 + 11862072x^3 + 2971512x^2 + 397800x + 22249.$$
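The substitution above can be reproduced with exact rational arithmetic. The following is a minimal sketch, not the authors' Magma code; the helper name `compose_affine` is hypothetical, and the coefficient lists are transcribed from the KSS-18 formulas for $q(x)$ and $r(x)$.

```python
from fractions import Fraction

def compose_affine(coeffs, a, b):
    """Coefficients (lowest degree first) of p(a*x + b), for p given lowest degree first."""
    result = [Fraction(0)]
    # Horner's scheme on polynomials: result <- result * (a*x + b) + c
    for c in reversed(coeffs):
        times_ax = [Fraction(0)] + [a * r for r in result]
        times_b = [b * r for r in result] + [Fraction(0)]
        result = [s + t for s, t in zip(times_ax, times_b)]
        result[0] += c
    while len(result) > 1 and result[-1] == 0:   # drop padding zeros
        result.pop()
    return result

# KSS-18 polynomials q(x) and r(x), lowest degree first
Q18 = [Fraction(c, 21) for c in (2401, 1763, 343, 259, 188, 37, 7, 5, 1)]
R18 = [Fraction(c, 343) for c in (343, 0, 0, 37, 0, 0, 1)]

q_plus = compose_affine(Q18, 42, 14)   # q(42x + 14)
r_plus = compose_affine(R18, 42, 14)   # r(42x + 14)

print([int(c) for c in r_plus])
print([int(c) for c in q_plus])
```

Every coefficient comes out with denominator 1, which is the point of the substitution: $q^+$ and $r^+$ take integer values at every integer $x$.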

KSS-16 curves ($k = 16$) use the following parameters.

$$q(x) = \frac{1}{980}\left(x^{10} + 2x^9 + 5x^8 + 48x^6 + 152x^5 + 240x^4 + 625x^2 + 2398x + 3125\right),$$

$$r(x) = \frac{1}{61250}\left(x^8 + 48x^4 + 625\right),$$

$$t(x) = \frac{1}{35}\left(2x^5 + 41x + 35\right).$$

We get the following polynomials $q^+(x)$ and $r^+(x)$ with integer coefficients by substituting $70x + 25$ for $x$ in $q(x)$ and $r(x)$.

$$q^+(x) = 2882400500000000x^{10} + 10376641800000000x^9 + 16812042100000000x^8 + 16143123500000000x^7 + 10173492949900000x^6 + 4396843858680000x^5 + 1319757590130000x^4 + 271665747150000x^3 + 36701968956250x^2 + 2938629980082x + 105890880565,$$

$$r^+(x) = 9411920000x^8 + 26891200000x^7 + 33614000000x^6 + 24010000000x^5 + 10718768816x^4 + 3062526880x^3 + 546889400x^2 + 55807000x + 2491537.$$

KSS-16 and KSS-18 curves are recommended for the 192- and 224-bit security levels, respectively [7].

3. Heuristic Counting

3.1 Counting (q, r) Based on the Bateman-Horn Con-jecture

A conjecture by Bateman and Horn [10] allows us toestimate the number of prime pairs (q, r) of KSS curves.

Conjecture 2 For large $y \in \mathbb{N}$, we heuristically expect the number of positive integers $x$ with $1 \le x \le y$ for which $(q, r) = (q^+(x), r^+(x))$ provides a prime pair of KSS curves to be

$$Q(y) = \frac{C}{\deg q^+ \cdot \deg r^+} \int_2^y \frac{dx}{(\log x)^2}. \tag{1}$$

The constant $C$ is given as

$$C = \prod_p \left[\left(1 - \frac{1}{p}\right)^{-2}\left(1 - \frac{N_p}{p}\right)\right], \tag{2}$$

where the product is taken over all primes $p$ and $N_p$ denotes the number of solutions of $q^+(x)\,r^+(x) \equiv 0 \pmod{p}$. $C$ is called the Hardy-Littlewood constant.
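To make formula (1) concrete, the following sketch evaluates the integral numerically with Simpson's rule after the substitution $x = e^u$. The function names and the choice of quadrature are assumptions made for illustration; the constants $C$ and the bit lengths are the ones reported later in this paper, and the result is only expected to agree with the tabulated $Q_\ell$ to within a fraction of a percent.

```python
import math

def integral_inv_log_sq(a, b, n=2000):
    """Simpson approximation of the integral of 1/(log x)^2 over [a, b].

    With x = exp(u) the integrand becomes exp(u) / u^2, which is smooth
    on the short interval [log a, log b]. n must be even.
    """
    lo, hi = math.log(a), math.log(b)
    h = (hi - lo) / n
    g = lambda u: math.exp(u) / (u * u)
    total = g(lo) + g(hi)
    for i in range(1, n):
        total += (4 if i % 2 else 2) * g(lo + i * h)
    return total * h / 3

def bateman_horn_count(C, deg_q, deg_r, a, b):
    """Heuristic number of prime pairs with a < x <= b, i.e. Q(b) - Q(a) from (1)."""
    return C / (deg_q * deg_r) * integral_inv_log_sq(a, b)

# KSS-18 (deg q+ = 8, deg r+ = 6) with 76-bit x; KSS-16 (10, 8) with 50-bit x
q76 = bateman_horn_count(19.8653404773766746, 8, 6, 2**75 - 1, 2**76 - 1)
q50 = bateman_horn_count(15.7765767223748379, 10, 8, 2**49 - 1, 2**50 - 1)
print(f"{q76:.4e}  {q50:.4e}")
```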

The product (2) defining $C$ converges only conditionally and is therefore unsuitable for numerical computation. Instead, we use the formula given by the theorem of Davenport and Schinzel [12] shown below.

Theorem 3 Let $D_{q\cdot r}$ be the discriminant of $q^+(x) \cdot r^+(x)$. Let $K_q$ (resp. $K_r$) be the number field generated by the polynomial $q^+(x)$ (resp. $r^+(x)$), and let $\rho(K_q)$ (resp. $\rho(K_r)$) be the residue of the Dedekind zeta function of the number field $K_q$ (resp. $K_r$). Then the Hardy-Littlewood constant (2) is given by

$$C = \frac{\gamma(D_{q\cdot r})}{\rho(K_q) \cdot \rho(K_r)} \times \prod_{p \nmid D_{q\cdot r}} \left(1 - \frac{N_p}{p}\right)\left(1 - \frac{1}{p}\right)^{-N_p} \prod_{j \ge 2}\left(1 - \frac{1}{p^j}\right)^{-N_p^{(j)}},$$

where the product is taken over the primes $p$ not dividing $D_{q\cdot r}$, $N_p^{(j)}$ denotes the number of irreducible factors of $q^+(x) \cdot r^+(x) \pmod{p}$ that are of degree $j$, and

$$\gamma(D_{q\cdot r}) = \prod_{p \mid D_{q\cdot r}} \left(1 - \frac{N_p}{p}\right) \prod_{j \ge 1}\left(1 - \frac{1}{p^j}\right)^{-N_p^{q,(j)} - N_p^{r,(j)}},$$

where $N_p^{q,(j)}$ (resp. $N_p^{r,(j)}$) denotes the number of distinct prime ideal factors of $p$ in $K_q$ (resp. $K_r$) that are of degree $j$.

Table 1. The value of each parameter for the Hardy-Littlewood constant in the case of KSS-18 curves.

p s.t. p | D_{q·r}   N_p   j   N_p^{q,(j)}   N_p^{r,(j)}
2                    0     2   1             0
                           4   1             0
                           6   0             1
3                    0     1   3             1
7                    1     1   2             0
                           2   1             0
                           3   1             2
1879                 3     1   3             0
                           2   2             0
                           3   0             2

Table 2. The value of each parameter for the Hardy-Littlewood constant in the case of KSS-16 curves.

p s.t. p | D_{q·r}   N_p   j   N_p^{q,(j)}   N_p^{r,(j)}
2                    0     1   2             1
5                    1     1   2             0
                           4   2             2
7                    0     2   5             4
29                   2     1   2             0
                           2   2             0
                           3   1             0
                           4   0             2
37                   2     1   2             0
                           3   1             0
                           4   1             2
41                   3     1   4             0
                           2   3             4
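As a consistency check on the factor $\gamma(D_{q\cdot r})$, the finite product over the primes dividing the discriminant can be evaluated directly from the tabulated parameters. The sketch below is an illustration only: the dictionary layout and the function name `gamma_factor` are hypothetical, and the entries are transcribed from Tables 1 and 2 (one row per prime $p$, then one triple $(j, N_p^{q,(j)}, N_p^{r,(j)})$ per degree).

```python
def gamma_factor(rows):
    """Product over p | D of (1 - N_p/p) * prod_j (1 - p^-j)^-(Nq_j + Nr_j)."""
    value = 1.0
    for p, (n_p, degrees) in rows.items():
        value *= 1.0 - n_p / p
        for j, nq, nr in degrees:
            value *= (1.0 - p ** (-j)) ** (-(nq + nr))
    return value

# p -> (N_p, [(j, N_p^{q,(j)}, N_p^{r,(j)}), ...]), transcribed from Table 1
TABLE1_KSS18 = {
    2: (0, [(2, 1, 0), (4, 1, 0), (6, 0, 1)]),
    3: (0, [(1, 3, 1)]),
    7: (1, [(1, 2, 0), (2, 1, 0), (3, 1, 2)]),
    1879: (3, [(1, 3, 0), (2, 2, 0), (3, 0, 2)]),
}

# transcribed from Table 2
TABLE2_KSS16 = {
    2: (0, [(1, 2, 1)]),
    5: (1, [(1, 2, 0), (4, 2, 2)]),
    7: (0, [(2, 5, 4)]),
    29: (2, [(1, 2, 0), (2, 2, 0), (3, 1, 0), (4, 0, 2)]),
    37: (2, [(1, 2, 0), (3, 1, 0), (4, 1, 2)]),
    41: (3, [(1, 4, 0), (2, 3, 4)]),
}

gamma18 = gamma_factor(TABLE1_KSS18)
gamma16 = gamma_factor(TABLE2_KSS16)
print(gamma18, gamma16)
```

Both products agree to several decimal places with the values of $\gamma(D_{q\cdot r})$ reported in Section 3.2. The remaining factors of $C$ (the residues $\rho(K_q)$, $\rho(K_r)$ and the product over the other primes) need number-field computations and are not reproduced here.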

3.2 Calculating the Hardy-Littlewood constants

To estimate the number of prime pairs $(q, r)$ of KSS curves, we calculated the Hardy-Littlewood constants by using a PC with the following specifications: OS: Mac OS X Lion 10.7.5, CPU: Intel Core i7 2.7GHz, RAM: 4GB, Software with library: Magma V2.19-8 [13].

First, Table 1 shows the value of each parameter for the Hardy-Littlewood constant in the case of KSS-18 curves. We took the product over all primes $p$ with $2 \le p \le 10^4$. We calculated the Hardy-Littlewood constant $C$ by using the following intermediate values:

$D_{q\cdot r}$ = −10913687659404569926389844331456521164779960295692592512261703195412279432
    9891358472986345289788089026869871171
    1616465957225142569839124675574807303
    0450869955536097935844131873491149953
    3599336893569238811479553570367044914
    7418630141106978816,

$\gamma(D_{q\cdot r})$ = 8.78774551785595663724757783419,

$\rho(K_q)$ = 1.33604815126442435955504593364,

$\rho(K_r)$ = 1.33604815126442435955504593364,

$C$ = 19.8653404773766746324265660471.

Fig. 1. Sampling of $x$ in Method 1 (sample points $x$ taken from the $\ell$-bit range between $2^{\ell-1}$ and $2^\ell - 1$).

Next, Table 2 shows the value of each parameter for the Hardy-Littlewood constant in the case of KSS-16 curves. We took the product over all primes $p$ with $2 \le p \le 10^4$. We calculated the Hardy-Littlewood constant $C$ by using the following intermediate values:

$D_{q\cdot r}$ = −994448982037183529599478476734451066522547740134431401467723847358399525
    397583402219171754265964718092075464
    356265057345133343530464697577619970
    219454622541437510700205658099921263
    603800551509975789545500316881833862
    3248610440248058699898159104 $\times 10^{164}$,

$\gamma(D_{q\cdot r})$ = 12.4523611636175763409190620554,

$\rho(K_q)$ = 1.66518610121698748654705521849,

$\rho(K_r)$ = 0.464556703287590109955192064205,

$C$ = 15.7765767223748378545315731421.

4. Verification of the Heuristic Estimates

It is important to estimate the number of prime pairs $(q, r)$ of pairing-friendly elliptic curves when generating such a curve. We verify the heuristic estimates by two methods, using a PC with the same specifications as given in Section 3.2. In the case of KSS-18 curves, we suppose the 224-bit security level, so we take $x$ of 76-bit length [7, TABLE 20]. In the case of KSS-16 curves, we suppose the 192-bit security level, so we take $x$ of 50-bit length [7, TABLE 19].

4.1 Method 1

In Method 1, we estimate the number of prime pairs $(q, r) = (q^+(x), r^+(x))$ by taking random $x$ with certain bit lengths. Steps (1) and (2) are repeated $3 \times 10^6$ times.

(1) Choose a random integer $x \in [2^{\ell-1}, 2^\ell - 1]$.

(2) Check if $q^+(x)$ and $r^+(x)$ are prime for $x$.

Fig. 1 shows a rough sketch of the sampling of $x$ in Method 1.

Table 3 shows the heuristic estimate obtained by the above steps. Let $N$ be the number of prime pairs found. Define $P_\ell = N/(3 \times 10^6)$, $R_\ell = P_\ell \times \#[2^{\ell-1}, 2^\ell - 1]$ and $Q_\ell = Q(2^\ell - 1) - Q(2^{\ell-1} - 1)$, where $Q(x)$ is given by (1). We denote the error of the sample survey $R_\ell$ by $E_\ell$, defined as $E_\ell = |R_\ell - Q_\ell| / R_\ell \times 100$.
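Steps (1) and (2) can be sketched as follows. This is an illustration under stated assumptions, not the Magma program used for the experiments: primality is replaced by a Miller-Rabin probable-prime test with fixed small bases, the names `is_probable_prime`, `sample_prime_pairs` and `error_percent` are hypothetical, and only the KSS-18 polynomials from Section 2.2 are included.

```python
import random

SMALL_PRIMES = (2, 3, 5, 7, 11, 13, 17, 19, 23, 29, 31, 37)

def is_probable_prime(n):
    """Miller-Rabin probable-prime test using the fixed bases in SMALL_PRIMES."""
    if n < 2:
        return False
    for p in SMALL_PRIMES:
        if n % p == 0:
            return n == p
    d, s = n - 1, 0
    while d % 2 == 0:
        d //= 2
        s += 1
    for a in SMALL_PRIMES:
        x = pow(a, d, n)
        if x in (1, n - 1):
            continue
        for _ in range(s - 1):
            x = pow(x, 2, n)
            if x == n - 1:
                break
        else:
            return False        # base a is a witness: n is composite
    return True

def eval_poly(coeffs, x):
    """Horner evaluation; coefficients listed from highest to lowest degree."""
    acc = 0
    for c in coeffs:
        acc = acc * x + c
    return acc

# q^+(x) and r^+(x) for KSS-18, highest degree first
Q_PLUS_18 = (461078666496, 1284433428096, 1564374047040, 1088278335648,
             473078255328, 131624074008, 22896702948, 2277529014, 99213811)
R_PLUS_18 = (16003008, 32006016, 26671680, 11862072, 2971512, 397800, 22249)

def sample_prime_pairs(bits, trials, rng):
    """Count sampled x in [2^(bits-1), 2^bits - 1] giving a probable-prime pair."""
    hits = 0
    for _ in range(trials):
        x = rng.randrange(2 ** (bits - 1), 2 ** bits)
        if (is_probable_prime(eval_poly(R_PLUS_18, x))
                and is_probable_prime(eval_poly(Q_PLUS_18, x))):
            hits += 1
    return hits

def error_percent(R, Q):
    """E = |R - Q| / R * 100, as defined for E_l."""
    return abs(R - Q) / R * 100

hits = sample_prime_pairs(76, 200, random.Random(0))
print(hits, error_percent(5200899619800435929, 5711425858113572801))
```

With a hit probability of roughly $10^{-4}$, a run of a few hundred trials will usually find no pair at all, which is why the experiment uses $3 \times 10^6$ samples.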

4.2 Method 2

In Method 2, we estimate the number of prime pairs $(q, r) = (q^+(x), r^+(x))$ by taking all $x$ in specific intervals $I_1, I_2, I_3$. Fig. 2 shows the intervals for $x$ in Method 2. Table 4 shows the heuristic estimate obtained as follows: for all integers $x \in [x_{i,1}, x_{i,2}] = I_i$ ($i = 1, 2, 3$), where $\#I_i = 5 \times 10^5$ and $3 \times 10^6$, we check if $q^+(x)$ and $r^+(x)$ are prime. Define $Q_{I_i} = Q(x_{i,2}) - Q(x_{i,1} - 1)$, where $Q(x)$ is given by (1). We denote the number of actually existing prime pairs in $I_i$ ($i = 1, 2, 3$) by $R_{I_i}$, the probability of the actually existing prime pairs by $P_{I_i} = R_{I_i} / \#I_i$, and the error between $R_{I_i}$ and $Q_{I_i}$ by $E_{I_i}$, defined as $E_{I_i} = |R_{I_i} - Q_{I_i}| / R_{I_i} \times 100$.

Table 3. Experimental results obtained by Method 1.

Curves   ℓ    P_ℓ            R_ℓ                   Q_ℓ                   E_ℓ
KSS-18   76   1.38 × 10^−4   5200899619800435929   5711425858113572801   9.8%
KSS-16   50   1.46 × 10^−4   82003043215           94172570023           14.8%

Table 4. Experimental results obtained by Method 2.

                                                    #I_i = 5 × 10^5                       #I_i = 3 × 10^6
Curves   ℓ    Interval   x_{i,1}                    R_Ii   P_Ii           Q_Ii   E_Ii     R_Ii   P_Ii           Q_Ii    E_Ii
KSS-18   76   I1         2^75                       62     1.24 × 10^−4   76.7   23.7%    413    1.38 × 10^−4   460.3   11.4%
              I2         (2^75 + 2^76)/2 − #I_i/2   83     1.66 × 10^−4   75.5   9.0%     434    1.45 × 10^−4   453.2   4.4%
              I3         2^76 − #I_i                51     1.02 × 10^−4   74.7   46.5%    388    1.29 × 10^−4   448.3   15.5%
KSS-16   50   I1         2^49                       65     1.30 × 10^−4   85.5   31.5%    420    1.40 × 10^−4   513.2   22.1%
              I2         (2^49 + 2^50)/2 − #I_i/2   84     1.68 × 10^−4   83.5   0.5%     437    1.46 × 10^−4   501.2   14.7%
              I3         2^50 − #I_i                60     1.20 × 10^−4   82.2   37.0%    405    1.35 × 10^−4   493.0   21.7%

Fig. 2. Intervals $I_1, I_2, I_3$ for $x$ in Method 2 (each $I_i = [x_{i,1}, x_{i,2}]$ lies in the $\ell$-bit range between $2^{\ell-1}$ and $2^\ell - 1$).
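The heuristic count $Q_{I_i}$ for a short interval can be approximated without any quadrature, because the density $1/(\log x)^2$ is practically constant over $5 \times 10^5$ consecutive 76-bit integers. The sketch below uses this midpoint approximation, which is an assumption of the example rather than the computation used for Table 4; the function names are hypothetical.

```python
import math

def expected_pairs(C, deg_q, deg_r, x_start, width):
    """Midpoint approximation of Q(x_start + width - 1) - Q(x_start - 1) from (1)."""
    mid = x_start + width / 2
    return C / (deg_q * deg_r) * width / math.log(mid) ** 2

def error_percent(R, Q):
    """E = |R - Q| / R * 100, as defined for E_{I_i}."""
    return abs(R - Q) / R * 100

# Interval I1 for KSS-18: x_{1,1} = 2^75, width 5 * 10^5, C from Section 3.2
q_i1 = expected_pairs(19.8653404773766746, 8, 6, 2**75, 5 * 10**5)
print(round(q_i1, 1), round(error_percent(62, 76.7), 1))
```

The approximation lands within a few tenths of the tabulated $Q_{I_1} = 76.7$, and the error formula reproduces the tabulated 23.7% for the observed count $R_{I_1} = 62$.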

4.3 Discussion

The errors occurring in Method 1 (resp. Method 2) seem to be caused by an insufficient number of primes $p$, i.e., $2 \le p \le 10^4$, in calculating the product in the Hardy-Littlewood constant (2). If we take the product over more primes $p$ with a larger upper bound, the errors would become slightly smaller, while the computation time would increase.

Moreover, we consider statistical effects on the errors obtained in the experiments as follows.

The errors $E_\ell$ occurring in Method 1, which are given in Table 3, seem to be caused by accidental biases due to the lack of samples. The elapsed times for Method 1 are 2870 (s) for KSS-18 and 2074 (s) for KSS-16, respectively. We need more time for experimenting with more samples.

The errors $E_{I_i}$ occurring in Method 2, which are given in Table 4, seem to be caused by narrow intervals. For the verification, we experiment with two widths of $I_i$, i.e., $5 \times 10^5$ and $3 \times 10^6$, in Method 2. The dispersion of the errors $E_{I_i}$ with the latter width is smaller than that with the former. The errors with the larger width get closer to 10–15% (resp. 15–20%) in the case of KSS-18 (resp. KSS-16). With the width $\#I_i = 3 \times 10^6$, the elapsed times are 2609–2952 (s) for KSS-18 and 2107–2123 (s) for KSS-16, respectively. We need more time for experimenting with wider intervals.

5. Conclusion

In this paper, we counted the number of prime pairs $(q, r)$ of KSS curves by using two methods, and estimated the probability of the prime pairs of KSS curves. The estimation shows that KSS curves with appropriate sizes of primes $(q, r)$ exist in sufficient numbers for practical use.

References

[1] D. Boneh and M. Franklin, Identity-based encryption from the Weil pairing, in: Proc. of CRYPTO 2001, J. Kilian ed., LNCS, Vol. 2139, pp.213–229, Springer-Verlag, Berlin, 2001.

[2] D. Boneh, G. Di Crescenzo, R. Ostrovsky and G. Persiano, Public key encryption with keyword search, in: Proc. of EUROCRYPT 2004, C. Cachin and J. Camenisch eds., LNCS, Vol. 3027, pp.506–522, Springer-Verlag, Berlin, 2004.

[3] T. Okamoto and K. Takashima, Fully secure functional encryption with general relations from the decisional linear assumption, in: Proc. of CRYPTO 2010, T. Rabin ed., LNCS, Vol. 6223, pp.191–208, Springer-Verlag, Berlin, 2010.

[4] D. Freeman, M. Scott and E. Teske, A taxonomy of pairing-friendly elliptic curves, J. Cryptology, 23 (2010), 224–280.

[5] P. Barreto and M. Naehrig, Pairing-friendly elliptic curves of prime order, in: Proc. of SAC 2005, B. Preneel and S. Tavares eds., LNCS, Vol. 3897, pp.319–331, Springer-Verlag, Berlin, 2006.

[6] E. J. Kachisa, E. F. Schaefer and M. Scott, Constructing Brezing-Weng pairing friendly elliptic curves using elements in the cyclotomic field, in: Proc. of Pairing 2008, S. D. Galbraith and K. G. Paterson eds., LNCS, Vol. 5209, pp.126–135, Springer-Verlag, Berlin, 2008.

[7] C. Costello, Particularly friendly members of family trees, Cryptology ePrint Archive, Report 2012/072, 2012.

[8] J. Boxall, Heuristics on pairing-friendly elliptic curves, J. Math. Cryptol., 6 (2012), 81–104.

[9] M. Naehrig, Constructive and computational aspects of cryptographic pairings, Ph.D. thesis, Technische Universiteit Eindhoven, 2009.

[10] P. T. Bateman and R. A. Horn, A heuristic asymptotic formula concerning the distribution of prime numbers, Math. Comp., 16 (1962), 363–367.

[11] R. Balasubramanian and N. Koblitz, The improbability that an elliptic curve has subexponential discrete log problem under the Menezes-Okamoto-Vanstone algorithm, J. Cryptology, 11 (1998), 141–145.

[12] H. Davenport and A. Schinzel, A note on certain arithmetical constants, Illinois J. Math., 10 (1966), 181–185.

[13] Computational Algebra Group, University of Sydney, The MAGMA Computational Algebra System for Algebra, Number Theory, and Geometry.


JSIAM Letters Vol.6 (2014) pp.77–80 ©2014 Japan Society for Industrial and Applied Mathematics

Some results on Parisian walks

Jiro Akahori1 and Yuuki Ida1

1 Department of Mathematical Sciences, Ritsumeikan University, 1-1-1 Nojihigashi, Kusatsu, Shiga 525-8577, Japan

E-mail akahori se.ritsumei.ac.jp

Received September 23, 2014, Accepted September 24, 2014

Abstract

In the present paper, we introduce a framework of a discrete stochastic calculus based on the Parisian walk, a special kind of symmetric random walk in the complex plane, listing some results analogous to those for complex Brownian motion. We also discuss, as an application to mathematical finance, a Parisian-walk analogue of Heston’s stochastic volatility model.

Keywords Parisian walk, discrete stochastic calculus, conformal martingale, Heston model

Research Activity Group Mathematical Finance

1. Introduction

Let $\tau_1, \dots, \tau_n, \dots$ be an i.i.d. sequence with $P(\tau = 1) = P(\tau = \zeta) = P(\tau = \zeta^2) = 1/3$, where we denote $\zeta = (-1 + \sqrt{-3})/2$. The filtration generated by $\tau$ will be denoted by $\mathbf{F} \equiv \{\mathcal{F}_t\}$.

Definition 1 An $\mathbf{F}$-adapted complex valued process $Z_t$ is called Parisian (walk) if (i) it is a martingale starting from a point in $\mathbb{Z}[\zeta] \equiv \{a + b\zeta : a, b \in \mathbb{Z}\}$, and (ii) $Z_{t+1} - Z_t =: \Delta Z_t \in \{1, \zeta, \zeta^2\}$ for all $t$.
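A Parisian walk is easy to simulate exactly if points of $\mathbb{Z}[\zeta]$ are stored as integer pairs $(a, b)$ standing for $a + b\zeta$, using $\zeta^2 = -1 - \zeta$. The following sketch is illustrative only; the function names and the use of Python's `random` module are assumptions of the example.

```python
import random

# Lattice coordinates (a, b) stand for a + b*zeta; note zeta^2 = -1 - zeta.
STEPS = ((1, 0), (0, 1), (-1, -1))          # the increments 1, zeta, zeta^2
ZETA = complex(-0.5, 3 ** 0.5 / 2)          # zeta = (-1 + sqrt(-3)) / 2

def to_complex(point):
    """Map lattice coordinates (a, b) to the complex number a + b*zeta."""
    a, b = point
    return a + b * ZETA

def parisian_walk(n_steps, start=(0, 0), rng=None):
    """Return a path of n_steps + 1 lattice points with i.i.d. uniform steps."""
    rng = rng or random.Random()
    path = [start]
    for _ in range(n_steps):
        da, db = rng.choice(STEPS)
        a, b = path[-1]
        path.append((a + da, b + db))
    return path

path = parisian_walk(50, rng=random.Random(1))
print(to_complex(path[-1]))
```

The three increments sum to zero, which is the martingale property in condition (i): each step has mean $(1 + \zeta + \zeta^2)/3 = 0$.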

Thus, a Parisian walk (PW for short) is a random walk on $\mathbb{Z}[\zeta]$. Note that there are many Parisian walks as functions of $\tau$, but the law is unique up to the initial point. The main aim of the present paper is to claim that PW is a discrete analogue of the complex Brownian motion (cBM for short), just as the simple symmetric random walk on $\mathbb{Z}$ is to the real one-dimensional Brownian motion.

The latter analogy, the real case, is already established since many similar properties are known (see e.g. [1,2]). For our complex case, we have the following “evidences” so far:

(i) The scaling limit of PW converges to cBM.

(ii) The Ito formula for PW (Proposition 4 below) looks very much like (symbolically the same as) the one for cBM.

(iii) We can define a discrete analogue of the conformal martingale, which is shown to be a time changed Parisian walk (Theorem 10). The result is analogous to the well-known fact about conformal martingales.

The first fact (i) is due to the orthogonality between $\mathrm{Re}(\Delta Z)$ and $\mathrm{Im}(\Delta Z)$. The second one comes from the fact that the martingale dimension of $\mathbf{F}$ is two. We give a proof for (ii) in Section 2, and to discuss (iii) we first argue the conformality in $\mathbb{Z}[\zeta]$ in Section 3.2, then give a proof of (iii) in Section 4.

In Section 5, we present a potential application of our Parisian discrete stochastic calculus to mathematical finance. We first remark that a parametric restriction of the Heston model, one of the most popular stochastic volatility models, has a representation in terms of the squared norm and the area of a two-dimensional Ornstein-Uhlenbeck process (Proposition 11). Then, relying on the fact that the pair of the squared norm and the area of a two-dimensional process has a representation in terms of a complex martingale (Lemma 12), we construct a Parisian discrete analogue of the Heston model.

Fig. 1. Parisian walkways $\mathbb{Z}[\zeta]$.

Remark 2 A fully general discrete Ito formula is given in [3], where the convergence rate of a scaling limit is also discussed, but analogies with complex stochastic calculus were out of its scope. Discrete analogues of Malliavin calculus are also studied in [4–6], etc., but, again, none of them is concerned with analogies with complex stochastic calculus.

2. An Ito formula for Parisian walks

We begin with a lemma.

Lemma 3 Let $Z$ be a Parisian walk. Then the two-dimensional process $(Z, \bar{Z})$ enjoys the martingale representation property; every complex valued $\mathbf{F}$-martingale is represented as a stochastic integral with respect to $(Z, \bar{Z})$.

Proof Denote $\Delta Z_t := Z_t - Z_{t-1}$ for $t \in \mathbb{Z}_{>0}$. Fix $t$ and set

$$\Delta Z_S := \prod_{s_i = \zeta} \Delta Z_i \prod_{s_i = \zeta^2} \overline{\Delta Z_i}$$

for $S = (s_1, \dots, s_t) \in \{1, \zeta, \zeta^2\}^t$. Then we have $E[\Delta Z_S \overline{\Delta Z_{S'}}] = 1$ if $S = S'$ and $= 0$ otherwise, because of the martingale property and of the fact that $(\Delta Z_t)^2 = \overline{\Delta Z_t}$. Therefore $\{\Delta Z_S \mid S \in \{1, \zeta, \zeta^2\}^t\}$ forms an orthonormal basis (ONB) of $L^2(\mathcal{F}_t)$ since $\#\{1, \zeta, \zeta^2\}^t = \dim L^2(\mathcal{F}_t) = 3^t$.

For an adapted $X_t$, expanding $X_t - X_{t-1}$ with respect to this ONB and denoting

$$E[(X_t - X_{t-1})\overline{\Delta Z_S}] = x_S,$$

we have

$$X_t - X_{t-1} = \sum_{s_t = \zeta} x_S \Delta Z_S + \sum_{s_t = \zeta^2} x_S \Delta Z_S + \sum_{s_t = 1} x_S \Delta Z_S$$

$$= \left(\sum_{s_t = \zeta} x_S \Delta Z_{(s_1, \dots, s_{t-1})}\right) \Delta Z_t + \left(\sum_{s_t = \zeta^2} x_S \Delta Z_{(s_1, \dots, s_{t-1})}\right) \overline{\Delta Z_t} + \sum_{s_t = 1} x_S \Delta Z_{(s_1, \dots, s_{t-1})}. \tag{1}$$

By summing up the above equations, we obtain the Doob decomposition of $X$, and this completes the proof.

(QED)

The above lemma can be easily extended to general roots of unity. The point here is that the Parisian walk is the right discrete analogue of planar Brownian motion, as can be seen from the following proposition.

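Proposition 4 Let $Z_t$ be a Parisian walk, and let $f$ be a complex valued function on $\mathbb{Z}[\zeta]$. Then we have the following formula, which corresponds to an Ito formula in $\mathbf{F}$. For $t = 0, 1, 2, \dots$, we have

$$f(Z_{t+1}) - f(Z_t) = \frac{1}{3}(Z_{t+1} - Z_t)\left(f(Z_t + 1) + \zeta^2 f(Z_t + \zeta) + \zeta f(Z_t + \zeta^2)\right)$$
$$+ \frac{1}{3}\overline{(Z_{t+1} - Z_t)}\left(f(Z_t + 1) + \zeta f(Z_t + \zeta) + \zeta^2 f(Z_t + \zeta^2)\right)$$
$$+ \frac{1}{3}\left(f(Z_t + 1) + f(Z_t + \zeta) + f(Z_t + \zeta^2) - 3f(Z_t)\right). \tag{2}$$

Proof As in the expression (1),

$$f(Z_{t+1}) - f(Z_t) = \alpha \Delta Z_{t+1} + \beta \overline{\Delta Z_{t+1}} + \gamma$$

for some $\mathcal{F}_t$-measurable $\alpha$, $\beta$ and $\gamma$. On the sets $\Delta Z_{t+1} = 1$, $\Delta Z_{t+1} = \zeta$, and $\Delta Z_{t+1} = \zeta^2$ respectively, we have

$$f(Z_t + 1) - f(Z_t) = \alpha + \beta + \gamma,$$
$$f(Z_t + \zeta) - f(Z_t) = \alpha\zeta + \beta\zeta^2 + \gamma, \tag{3}$$
$$f(Z_t + \zeta^2) - f(Z_t) = \alpha\zeta^2 + \beta\zeta + \gamma.$$

Solving (3) in terms of $(\alpha, \beta, \gamma)$, we obtain (2).

(QED)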
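Formula (2) is an exact algebraic identity, so it can be checked numerically for any test function. In the sketch below the conjugates in the second line of (2) are written out explicitly; the function name `ito_rhs` and the particular test function are hypothetical choices made for the example.

```python
import cmath

ZETA = cmath.exp(2j * cmath.pi / 3)
ROOTS = (1 + 0j, ZETA, ZETA ** 2)           # the possible increments 1, zeta, zeta^2

def ito_rhs(f, z, step):
    """Right-hand side of (2) for the increment `step` taken from the point z."""
    vals = [f(z + w) for w in ROOTS]
    alpha = sum(w.conjugate() * v for w, v in zip(ROOTS, vals)) / 3   # coefficient of the step
    beta = sum(w * v for w, v in zip(ROOTS, vals)) / 3                # coefficient of its conjugate
    gamma = sum(vals) / 3 - f(z)                                      # drift term
    return alpha * step + beta * step.conjugate() + gamma

def f(z):
    """An arbitrary, non-analytic test function."""
    return z ** 3 + 2 * z.conjugate() + abs(z) ** 2

z = 2 + 3 * ZETA
for step in ROOTS:
    print(abs(f(z + step) - f(z) - ito_rhs(f, z, step)))
```

The printed residuals are zero up to floating-point rounding, because $1 + \zeta^m + \zeta^{-m}$ equals 3 when $m \equiv 0 \pmod 3$ and 0 otherwise.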

3. A discrete analogue of conformality

3.1 Analogy in Ito formulas

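If we put

$$Df(z) = \frac{1}{3}\sum_{j=0,1,2} \zeta^{-j} f(z + \zeta^j),$$

$$\bar{D}f(z) = \frac{1}{3}\sum_{j=0,1,2} \zeta^{j} f(z + \zeta^j),$$

and

$$Lf(z) = \frac{1}{3}\sum_{j=0,1,2} \left[f(z + \zeta^j) - f(z)\right],$$

the formula (2) becomes

$$\Delta f(Z_t) := f(Z_t) - f(Z_{t-1}) = Df(Z_{t-1})\Delta Z_t + \bar{D}f(Z_{t-1})\overline{\Delta Z_t} + Lf(Z_{t-1}). \tag{4}$$

The discrete Ito formula symbolically coincides with the one for cBM $\mathcal{Z}_t$; for $f(x + iy) = f_1(x, y) + i f_2(x, y)$ with $f_1, f_2 \in C^2(\mathbb{R})$, we have that

$$df(\mathcal{Z}_t) = \partial_z f(\mathcal{Z}_t)\,d\mathcal{Z} + \partial_{\bar{z}} f(\mathcal{Z}_t)\,d\bar{\mathcal{Z}} + Lf(\mathcal{Z}_t)\,dt,$$

where

$$\partial_{\bar{z}} = \frac{1}{2}(\partial_x + i\partial_y), \tag{5}$$

and

$$\partial_z = \frac{1}{2}(\partial_x - i\partial_y), \tag{6}$$

the right-hand sides of (5) and (6) acting on $f_1$ and $f_2$, and

$$L = \frac{1}{2}\partial_z \partial_{\bar{z}}$$

is the Laplacian (see e.g. [7]).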

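3.2 Conformality in $\mathbb{Z}[\zeta]$

If $f$ is analytic, or equivalently $\partial_{\bar{z}} f = 0$, we have that

$$df(\mathcal{Z}_t) = f'(\mathcal{Z}_t)\,d\mathcal{Z}.$$

With this in mind, we define the conformality in $\mathbb{Z}[\zeta]$ as follows:

Definition 5 We say a map $f : \mathbb{Z}[\zeta] \to \mathbb{Z}[\zeta]$ is Parisian conformal or p-conformal if $\bar{D}f(z) = 0$ and $Lf(z) = 0$ for all $z \in \mathbb{Z}[\zeta]$.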

We give the following basic result, which shows that our definition of conformality is proper in a geometric sense.

Proposition 6 A map $f : \mathbb{Z}[\zeta] \to \mathbb{Z}[\zeta]$ is p-conformal if and only if it has the following property: any triangle of the form $\{z + 1, z + \zeta, z + \zeta^2\}$ is mapped to a triangle $\{f(z) + c, f(z) + c\zeta, f(z) + c\zeta^2\}$ for some $c \in \mathbb{Z}[\zeta]$ with $f(z + \zeta^j) = f(z) + c\zeta^j$, $j = 0, 1, 2$.

Proof The “if” part is straightforward since we can calculate $\bar{D}f$ and $Lf$ directly from the assumed property. The converse is also easy to see; since $\bar{D}f(z) = Lf(z) = 0$, we obtain by the Ito formula (4),

$$f(z + \zeta^j) = f(z) + Df(z)\zeta^j, \quad j = 0, 1, 2.$$

Here we notice in particular that $Df(z) \in \mathbb{Z}[\zeta]$, which we state separately as a corollary.

(QED)

Corollary 7 If $f : \mathbb{Z}[\zeta] \to \mathbb{Z}[\zeta]$ is p-conformal, then the image of the map $Df$ is in $\mathbb{Z}[\zeta]$.

Example 8 Except for the trivial ones like constant or linear functions in $z$, the simplest example of a p-conformal map would be $z^2 - \bar{z}$. Note that $z^2$ is not p-conformal. In general, among monic polynomials of a fixed degree, there is only one p-conformal map. A proof will be given in [8].
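The two conditions of p-conformality can be tested numerically on a finite patch of the lattice. The sketch below reads the example map as $z^2 - \bar{z}$, which is an assumption about the intended formula, and checks the vanishing of the conjugate operator $\bar{D}f$ and of $Lf$; the function names are hypothetical, and a finite patch is of course a check, not a proof.

```python
import cmath

ZETA = cmath.exp(2j * cmath.pi / 3)
ROOTS = (1 + 0j, ZETA, ZETA ** 2)

def d_bar(f, z):
    """Conjugate difference operator: (1/3) * sum_j zeta^j f(z + zeta^j)."""
    return sum(w * f(z + w) for w in ROOTS) / 3

def lap(f, z):
    """Discrete Laplacian: (1/3) * sum_j [f(z + zeta^j) - f(z)]."""
    return sum(f(z + w) - f(z) for w in ROOTS) / 3

def is_p_conformal_on_patch(f, radius=3, tol=1e-9):
    """Check d_bar f = 0 and L f = 0 at all z = a + b*zeta with |a|, |b| <= radius."""
    for a in range(-radius, radius + 1):
        for b in range(-radius, radius + 1):
            z = a + b * ZETA
            if abs(d_bar(f, z)) > tol or abs(lap(f, z)) > tol:
                return False
    return True

print(is_p_conformal_on_patch(lambda z: z * z - z.conjugate()))   # example map, as read here
print(is_p_conformal_on_patch(lambda z: z * z))                   # plain square
print(is_p_conformal_on_patch(lambda z: 3 * z + 1))               # linear map
```

For $f(z) = z^2 - \bar{z}$ one finds $f(z + \zeta^j) = f(z) + 2z\zeta^j$, so the triangle property of Proposition 6 holds with $c = 2z$, whereas for $z^2$ alone the conjugate operator evaluates to 1 at every point.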

4. Discrete analogue of conformal martingales

In this section, we give a probabilistic “credit” that ours is a discrete analogue of conformality. First we give a definition of a “Parisian” conformal martingale.

Definition 9 We say an $\mathbf{F}$-martingale $M$ is p-conformal if it is $\mathbb{Z}[\zeta]$-valued and is represented by a martingale transform with respect to $Z$.

From the definitions and the Ito formula (4), it is straightforward that $f(Z)$ is a p-conformal martingale if and only if $f : \mathbb{Z}[\zeta] \to \mathbb{Z}[\zeta]$ is p-conformal.

We also have the following theorem, which again indicates that the definition is proper, since it is a discrete analogue of the fact that “a conformal martingale is a time changed cBM” (see e.g. [7, Proposition 3.6.2]).

Theorem 10 For a p-conformal martingale $M$ with respect to a Parisian walk $Z$, there exist another Parisian walk $\tilde{Z}$ and a sequence of stopping times $T_1 < T_2 < \cdots < \infty$ such that $M$ is identically distributed as $\tilde{Z}_{T_t}$ as a stochastic process.

Proof We first note that a Parisian walk is recurrent, which can be proven in a similar way as in the case of the simple random walk on $\mathbb{Z}^2$. We choose a sequence of stopping times $T_1 < T_2 < \cdots < \infty$ recursively as $T_0 = 0$,

$$T_k := \inf\left\{t > T_{k-1} : \tilde{Z}_t \in \{\tilde{Z}_{T_{k-1}} + M_{k-1}\zeta^j,\ j = 0, 1, 2\}\right\}, \quad k = 1, 2, \dots$$

Then it is easy to see that the sequence satisfies the desired property.

(QED)

5. Parisian analogue of Heston’s stochastic volatility model

In this section, we discuss a potential application of our Parisian stochastic calculus to mathematical finance.

5.1 Heston’s stochastic volatility model

In Heston’s model [9], the stock price at time $t$ is given by

$$S_t = S_0 \exp\left(\int_0^t \sqrt{\nu_s}\,dB_s + rt - \frac{1}{2}\int_0^t \nu_s\,ds\right),$$

where $r > 0$ stands for the risk-free rate, $B$ is a 1-dimensional standard Brownian motion (under an equivalent martingale measure), and the variance $\nu_t$ (the square of the volatility) is a Cox-Ingersoll-Ross process;

$$d\nu_t = \xi\sqrt{\nu_t}\,dB'_t + \kappa(\theta - \nu_t)\,dt, \tag{7}$$

where $\xi$ is the volatility of volatility, assumed to be constant, $\kappa$ is the rate at which $\nu_t$ reverts to $\theta$, and $\theta$ is the long-run variance. Here $B'$ is another standard Brownian motion such that

$$d\langle B, B'\rangle = \rho\,dt$$

for some $\rho \in [-1, 1]$.

It is known that when $\xi^2 \le 2\kappa\theta$, the process stays strictly positive (see e.g. [10, Chapter 6, Section 3.1]).

5.2 A decomposition of a Heston process

We work on the special case $\xi^2 = 2\kappa\theta$. It is also well known that in this case the unique strong solution to (7) is given by

$$\nu_t = |O_t|^2,$$

where $O_t = (O^1_t, O^2_t)$ is a two-dimensional Ornstein-Uhlenbeck process; $O^j$, $j = 1, 2$, solve

$$dO^j_t = \frac{\xi}{2}\,dW^j_t - \frac{\kappa}{2}O^j_t\,dt. \tag{8}$$

Here $(W^1, W^2)$ is a two-dimensional Brownian motion. In fact, since

$$d(|O_t|^2) = 2O^1\,dO^1_t + 2O^2\,dO^2_t + \frac{\xi^2}{2}\,dt$$
$$= \xi(O^1\,dW^1 + O^2\,dW^2) - \kappa\{(O^1)^2 + (O^2)^2\}\,dt + \frac{\xi^2}{2}\,dt$$
$$= \xi|O_t|\,\frac{O^1\,dW^1 + O^2\,dW^2}{|O_t|} + \kappa(\theta - |O_t|^2)\,dt,$$

and since

$$\left\langle \frac{O^1\,dW^1 + O^2\,dW^2}{|O_t|} \right\rangle = dt,$$

so that it is a Brownian motion, we see that $|O_t|^2$ solves (7).

Further, we have the following fact, which may not be new.

Proposition 11 When $\xi^2 = 2\kappa\theta$, $X := \log S_t$ has the following identity in law as a stochastic process:

$$X_t - X_0 = \sqrt{1 - \rho^2}\left(\int_0^t O^1_s\,dO^2_s - \int_0^t O^2_s\,dO^1_s\right) + \frac{\rho}{\xi}\left(|O_t|^2 - |O_0|^2\right) + \left(\frac{\rho\xi}{\theta} - \frac{1}{2}\right)\int_0^t |O_s|^2\,ds + \left(r - \frac{\rho\xi}{2}\right)t. \tag{9}$$

Proof Observe that

$$O^1_t\,dO^2_t - O^2_t\,dO^1_t = O^1_t\,dW^2_t - O^2_t\,dW^1_t = |O_t|\,\frac{O^1_t\,dW^2_t - O^2_t\,dW^1_t}{|O_t|} =: |O_t|\,dB''_t.$$

Here $B''$ is a Brownian motion independent of $B'$ since it is a martingale, $d\langle B''\rangle = dt$, and $d\langle B', B''\rangle = 0$. On the other hand, since $d\langle B, B'\rangle = \rho\,dt$, we have

$$X_t - X_0 \overset{d}{=} \int_0^t \sqrt{\nu_s}\left(\rho\,dB'_s + \sqrt{1 - \rho^2}\,dB''_s\right) + rt - \frac{1}{2}\int_0^t \nu_s\,ds$$
$$= \frac{\rho}{\xi}(\nu_t - \nu_0) + \sqrt{1 - \rho^2}\int_0^t \sqrt{\nu_s}\,dB''_s + \left(r - \frac{\rho\xi}{2}\right)t + \left(\frac{\rho\xi}{\theta} - \frac{1}{2}\right)\int_0^t \nu_s\,ds,$$

which leads to (9).

(QED)

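5.3 Discrete analogue of the stochastic area and the squared Bessel processes

For a discrete deterministic process $X : \mathbb{Z}_{\ge 0} \to \mathbb{C}$, define $H(X) : \mathbb{Z}_{\ge 0} \to \mathbb{C}$ by

$$H_0(X) = |X_0|^2, \quad \Delta H_t(X) = \overline{X_{t-1}}\,\Delta X_t, \quad t = 1, 2, \dots$$

Then, we have the following.

Lemma 12 (i) The squared distance from 0 of $X_t$ is represented by the real part of $H(X)_t$ for each $t > 0$;

$$|X_t|^2 = 2\,\mathrm{Re}\,H_t(X) + \sum_{j=1}^{t} |\Delta X_j|^2,$$

and (ii) the aggregation of the (oriented) areas of the triangles drawn by $X_{s-1}$, $X_s$ and 0, $s = 1, \dots, t$, is represented by the imaginary part of $H(X)_t$ for each $t > 0$;

$$A(X)_t := \frac{1}{2}\sum_{s=1}^{t} \left[(\mathrm{Re}\,X_{s-1})(\mathrm{Im}\,X_s) - (\mathrm{Re}\,X_s)(\mathrm{Im}\,X_{s-1})\right] = \frac{1}{2}\,\mathrm{Im}\,H_t(X).$$

Proof For (i),

$$\overline{(X_0 + \Delta X_1 + \cdots + \Delta X_t)}\,(X_0 + \Delta X_1 + \cdots + \Delta X_t)$$
$$= |X_0|^2 + \sum_{j=1}^{t} |\Delta X_j|^2 + 2\,\mathrm{Re}\left(\overline{X_0}\sum_{j=1}^{t} \Delta X_j\right) + 2\,\mathrm{Re}\sum_{j=1}^{t}\sum_{i=1}^{j-1} \overline{\Delta X_i}\,\Delta X_j$$
$$= 2\,\mathrm{Re}\,H_t(X) + \sum_{j=1}^{t} |\Delta X_j|^2.$$

The relation (ii) is obvious.

(QED)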
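Both parts of Lemma 12 can be checked on a small deterministic path. The sketch below takes the increment of $H$ to be the conjugate of $X_{t-1}$ times $\Delta X_t$ (the conjugation is an assumption about the intended definition) and starts the path at $X_0 = 0$, so that the initial term $H_0(X)$ plays no role in the comparison; the function names are hypothetical.

```python
def h_process(path):
    """H_0 = |X_0|^2 and H_t = H_{t-1} + conj(X_{t-1}) * (X_t - X_{t-1})."""
    h = [complex(abs(path[0]) ** 2)]
    for prev, cur in zip(path, path[1:]):
        h.append(h[-1] + prev.conjugate() * (cur - prev))
    return h

def oriented_area(path):
    """Sum of the oriented areas of the triangles (0, X_{s-1}, X_s)."""
    return 0.5 * sum(p.real * c.imag - c.real * p.imag
                     for p, c in zip(path, path[1:]))

# A short deterministic path starting at the origin
path = [0j, 1 + 0j, 1 + 1j, 0 + 2j, -1 + 0.5j]
h = h_process(path)

for t in range(1, len(path)):
    quad_var = sum(abs(path[j] - path[j - 1]) ** 2 for j in range(1, t + 1))
    print(abs(path[t]) ** 2, 2 * h[t].real + quad_var)   # part (i): both columns agree

print(oriented_area(path), 0.5 * h[-1].imag)              # part (ii): both values agree
```

Part (ii) follows from $\mathrm{Im}(\overline{X_{s-1}}\,\Delta X_s) = \mathrm{Im}(\overline{X_{s-1}}\,X_s)$, since $\overline{X_{s-1}}\,X_{s-1}$ is real.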

Since the Ornstein-Uhlenbeck process (8) can be approximated by our Parisian walk by taking a scaling limit with a Girsanov-Maruyama type measure change, we may claim that

$$S^P_t := \sqrt{1 - \rho^2}\,\mathrm{Im}\,H_t(Z) + \frac{\rho}{\xi}\left(2\,\mathrm{Re}\,H_t(Z) - |Z_0|^2\right) + \left(\frac{2\rho\xi}{\theta} - 1\right)\sum \mathrm{Re}\,H_t(Z)\,\Delta t + \left(r - \frac{\rho\xi}{2}\right)t, \tag{10}$$

where $Z$ is a Parisian walk, is a discrete analogue of Heston’s model, with a proper change of measures.

Acknowledgments

This work was partially supported by JSPS KAKENHI Grant Numbers 23330109, 24340022, 23654056 and 25285102.

References

[1] T. Fujita, A random walk analogue of Levy’s theorem, Studia Sci. Math. Hungar., 45 (2008), 223–233.

[2] T. Fujita and M. Yor, On the remarkable distributions of maxima of some fragments of the standard reflecting random walk and Brownian Motion, Probab. Math. Statist., 27 (2007), 89–104.

[3] J. Akahori, A discrete Ito calculus approach to He’s framework for multi-factor discrete market, Asia-Pacific Finan. Markets, 12 (2005), 273–287.

[4] J. Akahori, T. Amaba and K. Okuma, A discrete-time Clark-Ocone formula and its application to an error analysis, arXiv:1307.0673v2 [math.PR], 2013.

[5] T. Amaba, A discrete-time Clark-Ocone formula for Poisson functionals, Asia-Pacific Finan. Markets, 21 (2013), 97–120.

[6] N. Privault, Stochastic Analysis in Discrete and Continuous Settings, Springer-Verlag, Berlin, 2009.

[7] N. Ikeda and S. Watanabe, Stochastic Differential Equations and Diffusion Processes, North-Holland, Amsterdam, 1989.

[8] J. Akahori, Y. Ida and G. Markowsky, work in progress.

[9] S. L. Heston, A closed-form solution for options with stochastic volatility with applications to bond and currency options, Rev. Finan. Stud., 6 (1993), 327–343.

[10] M. Jeanblanc, M. Yor and M. Chesney, Mathematical Methods for Financial Markets, Springer-Verlag, London, 2009.


JSIAM Letters Vol.6 (2014) pp.81–84 ©2014 Japan Society for Industrial and Applied Mathematics

Accelerated multiple precision matrix multiplication using Strassen’s algorithm and Winograd’s variant

Tomonori Kouya1

1 Shizuoka Institute of Science and Technology, 2200-2 Toyosawa, Fukuroi, Shizuoka 437-8555, Japan

E-mail tkouya cs.sist.ac.jp

Received August 28, 2014, Accepted October 7, 2014

Abstract

The Strassen algorithm and Winograd’s variant accelerate matrix multiplication by using fewer arithmetic operations than standard matrix multiplication. Although many papers have been published on accelerating single- as well as double-precision matrix multiplication by using these algorithms, no research to date has been undertaken to accelerate multiple precision matrix multiplication. In this paper, we propose a multiple precision matrix multiplication program for matrices of any size and test its performance. We also reveal special properties of our program through its application to LU decomposition.

Keywords matrix multiplication, multiple precision arithmetic, Strassen’s algorithm

Research Activity Group Algorithms for Matrix / Eigenvalue Problems and their Applications

1. Introduction

Current large-scale scientific computations use multiple precision (MP) floating-point arithmetic beyond the IEEE 754 single-precision (SP) and double-precision (DP) computation standard to obtain precise numerical solutions. Because MP arithmetic libraries, such as the GNU Multiple Precision Floating-Point Reliable Library (MPFR) and the GNU Multiple Precision Arithmetic Library (GMP), are software-based implementations, their MP numerical computations are typically much slower than hardware-based SP and DP computations. To offset the consequent increase in computational cost, efficient MP numerical computation requires acceleration techniques, such as effective use of cache memory and algorithms that reduce the complexity of the computations.

Matrix multiplication is one of the most important parts of numerical computation. It is well known through research in DP matrix multiplication [1,2] that its computational cost can be reduced by using Strassen’s algorithm [3] and Winograd’s variant [4]. Based on these past results, we can expect that MP matrix multiplication using these algorithms is even more effective than in the case of DP arithmetic. On the other hand, less precise numerical results may be obtained by applying Strassen’s algorithm and its variant [5].

In this paper, we propose the acceleration of MP matrix multiplication using Strassen’s algorithm, comparing it with block matrix multiplication, which increases the hit ratio of the cache memory in the CPU. We apply this accelerated MP matrix multiplication to LU decomposition, and examine both well-conditioned and ill-conditioned examples in order to study its numerical properties.

2. Algorithms of Matrix Product

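We consider the real matrix multiplication $C := AB = [c_{ij}] \in \mathbb{R}^{m \times n}$, where $A = [a_{ij}] \in \mathbb{R}^{m \times l}$ and $B = [b_{ij}] \in \mathbb{R}^{l \times n}$ in this paper. We use the following algorithm to calculate $c_{ij}$:

\[ c_{ij} := \sum_{k=1}^{l} a_{ik} b_{kj}. \tag{1} \]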

Eq. (1) is called “simple matrix multiplication” (“Simple,” for short).

To increase the hit ratio of the cache memory in the processor, “block matrix multiplication” (Block), in which A and B are divided into blocks, is used in well-tuned Basic Linear Algebra Subprograms (BLAS) libraries, such as the Automatically Tuned Linear Algebra Software (ATLAS) and the Intel Math Kernel Library. In this paper, we divide A and B into $ML$ small blocks $A_{ik}$ and $LN$ small blocks $B_{kj}$, respectively. We can hence obtain the blocks $C_{ij}$ by the following matrix multiplication:

\[ C_{ij} := \sum_{k=1}^{L} A_{ik} B_{kj}. \]
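The blocked product can be sketched in Python as follows. This is only an illustrative sketch, not the program benchmarked in this paper: the function names `matmul_simple` and `matmul_block` and the tile size `bs` are assumptions, and plain Python integers stand in for MPFR numbers.

```python
def matmul_simple(A, B):
    """Simple three-loop product, Eq. (1)."""
    m, l, n = len(A), len(B), len(B[0])
    C = [[0] * n for _ in range(m)]
    for i in range(m):
        for j in range(n):
            s = 0
            for k in range(l):
                s += A[i][k] * B[k][j]
            C[i][j] = s
    return C


def matmul_block(A, B, bs=2):
    """Blocked product: accumulate C_ij += A_ik * B_kj tile by tile.

    bs is a hypothetical tile size; edge tiles may be smaller.
    """
    m, l, n = len(A), len(B), len(B[0])
    C = [[0] * n for _ in range(m)]
    for i0 in range(0, m, bs):
        for j0 in range(0, n, bs):
            for k0 in range(0, l, bs):
                # multiply one tile of A by one tile of B
                for i in range(i0, min(i0 + bs, m)):
                    for j in range(j0, min(j0 + bs, n)):
                        s = C[i][j]
                        for k in range(k0, min(k0 + bs, l)):
                            s += A[i][k] * B[k][j]
                        C[i][j] = s
    return C


A = [[i + j + 1 for j in range(5)] for i in range(3)]   # 3 x 5
B = [[i - j for j in range(4)] for i in range(5)]       # 5 x 4
assert matmul_block(A, B, bs=2) == matmul_simple(A, B)
```

Both functions perform the same scalar operations; only the order in which the elements are visited differs, which is what affects cache behavior.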

These simple and blocked matrix multiplication procedures have identical computational cost.

On the other hand, Strassen’s algorithm, which reduces the computational cost of matrix multiplication, is recursive [3]. For even-dimensional matrices A and B (m, n, and l are even), we divide A and B as follows:

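\[ A = \begin{bmatrix} A_{11} & A_{12} \\ A_{21} & A_{22} \end{bmatrix}, \quad B = \begin{bmatrix} B_{11} & B_{12} \\ B_{21} & B_{22} \end{bmatrix}. \tag{2} \]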

We calculate intermediate block matrices $P_i$ ($i = 1, 2, \ldots, 7$) by using the four blocks $A_{ij}$ and $B_{ij}$ ($i, j = 1, 2$) as follows:

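\begin{align*}
P_1 &:= (A_{11} + A_{22})(B_{11} + B_{22}), \\
P_2 &:= (A_{21} + A_{22}) B_{11}, \\
P_3 &:= A_{11} (B_{12} - B_{22}), \\
P_4 &:= A_{22} (B_{21} - B_{11}), \\
P_5 &:= (A_{11} + A_{12}) B_{22}, \\
P_6 &:= (A_{21} - A_{11})(B_{11} + B_{12}), \\
P_7 &:= (A_{12} - A_{22})(B_{21} + B_{22}).
\end{align*}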

By using these matrices $P_1, P_2, \ldots, P_7$, we can calculate C as blocks $C_{ij}$ ($i, j = 1, 2$) as follows:

\[ C := \begin{bmatrix} P_1 + P_4 - P_5 + P_7 & P_3 + P_5 \\ P_2 + P_4 & P_1 + P_3 - P_2 + P_6 \end{bmatrix}. \]
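The recursion defined by $P_1, \ldots, P_7$ and the blocks of C can be sketched in Python as follows. This is a minimal sketch under stated assumptions: the helper names, the cut-off rule using `n_min`, and the fallback to the simple product for odd dimensions are hypothetical (the implementation described in this paper uses padding and peeling instead), and Python integers stand in for MP numbers.

```python
def add(X, Y):
    return [[x + y for x, y in zip(rx, ry)] for rx, ry in zip(X, Y)]


def sub(X, Y):
    return [[x - y for x, y in zip(rx, ry)] for rx, ry in zip(X, Y)]


def simple(A, B):
    # Eq. (1): c_ij = sum_k a_ik * b_kj
    cols = list(zip(*B))
    return [[sum(a * b for a, b in zip(row, col)) for col in cols] for row in A]


def split(M):
    # 2 x 2 partition of an even-dimensional matrix, as in (2)
    h, w = len(M) // 2, len(M[0]) // 2
    return ([r[:w] for r in M[:h]], [r[w:] for r in M[:h]],
            [r[:w] for r in M[h:]], [r[w:] for r in M[h:]])


def strassen(A, B, n_min=2):
    m, l, n = len(A), len(B), len(B[0])
    # hypothetical base case: small or odd dimensions use the simple product
    if min(m, l, n) <= n_min or m % 2 or l % 2 or n % 2:
        return simple(A, B)
    A11, A12, A21, A22 = split(A)
    B11, B12, B21, B22 = split(B)
    P1 = strassen(add(A11, A22), add(B11, B22), n_min)
    P2 = strassen(add(A21, A22), B11, n_min)
    P3 = strassen(A11, sub(B12, B22), n_min)
    P4 = strassen(A22, sub(B21, B11), n_min)
    P5 = strassen(add(A11, A12), B22, n_min)
    P6 = strassen(sub(A21, A11), add(B11, B12), n_min)
    P7 = strassen(sub(A12, A22), add(B21, B22), n_min)
    C11 = add(sub(add(P1, P4), P5), P7)      # P1 + P4 - P5 + P7
    C12 = add(P3, P5)
    C21 = add(P2, P4)
    C22 = add(add(sub(P1, P2), P3), P6)      # P1 + P3 - P2 + P6
    top = [r1 + r2 for r1, r2 in zip(C11, C12)]
    bottom = [r1 + r2 for r1, r2 in zip(C21, C22)]
    return top + bottom


A = [[(3 * i + j) % 7 - 3 for j in range(8)] for i in range(8)]
B = [[(i * j) % 5 - 2 for j in range(8)] for i in range(8)]
assert strassen(A, B) == simple(A, B)
```

With exact integer arithmetic the two products agree exactly; in floating-point arithmetic they generally differ in the last digits.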

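By applying Strassen’s algorithm to matrix multiplication, the number of real multiplications Mul(m, l, n) and of real addition-subtraction operations Addsub(m, l, n) needed to calculate matrix C from A and B is reduced as follows:

\begin{align*}
\mathrm{Mul}(m, l, n) &= 7\,\mathrm{Mul}\left(\frac{m}{2}, \frac{l}{2}, \frac{n}{2}\right), \\
\mathrm{Addsub}(m, l, n) &= 5\,\mathrm{Addsub}\left(\frac{m}{2}, \frac{l}{2}\right) + 5\,\mathrm{Addsub}\left(\frac{l}{2}, \frac{n}{2}\right) + 8\,\mathrm{Addsub}\left(\frac{m}{2}, \frac{n}{2}\right).
\end{align*}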

Winograd proposed “Winograd’s variant” (Winograd), an algorithm that requires fewer matrix addition and subtraction operations than Strassen’s algorithm [4]. Winograd’s variant is constructed with divided even-dimensional matrices in the same manner as in Strassen’s algorithm (2). It computes matrix multiplication in the following three steps:

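\[
\begin{aligned}
S_1 &:= A_{21} + A_{22}, & S_2 &:= S_1 - A_{11}, \\
S_3 &:= A_{11} - A_{21}, & S_4 &:= A_{12} - S_2, \\
S_5 &:= B_{12} - B_{11}, & S_6 &:= B_{22} - S_5, \\
S_7 &:= B_{22} - B_{12}, & S_8 &:= S_6 - B_{21},
\end{aligned}
\tag{3}
\]

\[
\begin{aligned}
M_1 &:= S_2 S_6, & M_2 &:= A_{11} B_{11}, & M_3 &:= A_{12} B_{21}, \\
M_4 &:= S_3 S_7, & M_5 &:= S_1 S_5, & M_6 &:= S_4 B_{22}, \\
M_7 &:= A_{22} S_8,
\end{aligned}
\tag{4}
\]

\[ T_1 := M_1 + M_2, \quad T_2 := T_1 + M_4. \tag{5} \]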

Through (3)→(4)→(5), we can obtain C as follows:

\[ C := \begin{bmatrix} M_2 + M_3 & T_1 + M_5 + M_6 \\ T_2 - M_7 & T_2 + M_5 \end{bmatrix}. \]
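One level of steps (3), (4), and (5) can be sketched in Python as follows. The sketch is illustrative only: the helper names and the `counter` dictionary are assumptions, the seven block products use the simple algorithm instead of recursion, and Python integers stand in for MP numbers. The counter records how many block additions and subtractions one level needs.

```python
counter = {"addsub": 0}   # hypothetical tally of block additions/subtractions


def add(X, Y):
    counter["addsub"] += 1
    return [[x + y for x, y in zip(rx, ry)] for rx, ry in zip(X, Y)]


def sub(X, Y):
    counter["addsub"] += 1
    return [[x - y for x, y in zip(rx, ry)] for rx, ry in zip(X, Y)]


def simple(A, B):
    cols = list(zip(*B))
    return [[sum(a * b for a, b in zip(row, col)) for col in cols] for row in A]


def split(M):
    h, w = len(M) // 2, len(M[0]) // 2
    return ([r[:w] for r in M[:h]], [r[w:] for r in M[:h]],
            [r[:w] for r in M[h:]], [r[w:] for r in M[h:]])


def winograd_one_level(A, B):
    A11, A12, A21, A22 = split(A)
    B11, B12, B21, B22 = split(B)
    # step (3): four operations on blocks of A, four on blocks of B
    S1 = add(A21, A22)
    S2 = sub(S1, A11)
    S3 = sub(A11, A21)
    S4 = sub(A12, S2)
    S5 = sub(B12, B11)
    S6 = sub(B22, S5)
    S7 = sub(B22, B12)
    S8 = sub(S6, B21)
    # step (4): seven block products
    M1 = simple(S2, S6)
    M2 = simple(A11, B11)
    M3 = simple(A12, B21)
    M4 = simple(S3, S7)
    M5 = simple(S1, S5)
    M6 = simple(S4, B22)
    M7 = simple(A22, S8)
    # step (5) and assembly of C: seven operations on blocks of C
    T1 = add(M1, M2)
    T2 = add(T1, M4)
    C11 = add(M2, M3)
    C12 = add(add(T1, M5), M6)
    C21 = sub(T2, M7)
    C22 = add(T2, M5)
    top = [r1 + r2 for r1, r2 in zip(C11, C12)]
    bottom = [r1 + r2 for r1, r2 in zip(C21, C22)]
    return top + bottom


A = [[(2 * i + 3 * j) % 7 - 3 for j in range(6)] for i in range(4)]   # 4 x 6
B = [[(i - j) % 5 - 2 for j in range(2)] for i in range(6)]           # 6 x 2
assert winograd_one_level(A, B) == simple(A, B)
assert counter["addsub"] == 15    # 4 + 4 + 7 block additions/subtractions
```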

Winograd’s variant involves the following arithmetical operations:

\begin{align*}
\mathrm{Mul}(m, l, n) &= 7\,\mathrm{Mul}\left(\frac{m}{2}, \frac{l}{2}, \frac{n}{2}\right), \\
\mathrm{Addsub}(m, l, n) &= 4\,\mathrm{Addsub}\left(\frac{m}{2}, \frac{l}{2}\right) + 4\,\mathrm{Addsub}\left(\frac{l}{2}, \frac{n}{2}\right) + 7\,\mathrm{Addsub}\left(\frac{m}{2}, \frac{n}{2}\right).
\end{align*}

Table 1. Relative complexity of Strassen’s and Winograd’s algorithms (vs. Simple and Block algorithms), nmin = 32.

                 Strassen              Winograd
m × n            Mul     Add & Sub     Mul     Add & Sub
255 × 255        0.678   0.781         0.678   0.764
256 × 256        0.670   0.772         0.670   0.755
257 × 257        0.674   0.775         0.674   0.758
511 × 511        0.590   0.688         0.590   0.672
512 × 512        0.586   0.684         0.586   0.668
513 × 513        0.589   0.686         0.589   0.670
1023 × 1023      0.514   0.605         0.514   0.590
1024 × 1024      0.513   0.603         0.513   0.588
1025 × 1025      0.514   0.604         0.514   0.589
2047 × 2047      0.449   0.531         0.449   0.517
2048 × 2048      0.449   0.530         0.449   0.516
2049 × 2049      0.450   0.531         0.450   0.517

As we can observe, Winograd’s variant saves one Addsub(m/2, l/2), one Addsub(l/2, n/2), and one Addsub(m/2, n/2) operation per recursion level compared with Strassen’s algorithm.

In addition to Strassen’s algorithm and Winograd’s variant, we implement two matrix multiplication algorithms: a simple three-loop algorithm based on (1) (Simple) and a block algorithm (Block). The four algorithms can obtain matrix products of any precision for matrices of any size. Strassen and Winograd recursively divide matrices A and B until the row and column dimensions are smaller than nmin, the minimal dimension. When A or B has an odd number of rows or columns, we make the dimensions even by using a mixture of dynamic padding and peeling [1].

Table 1 shows the reduction rates of Strassen’s algorithm and Winograd’s variant in comparison with Simple and Block in the case of nmin = 32. For m × n = 2048 × 2048, both recursive algorithms reduce the number of multiplications to about 45% and the number of addition-subtraction operations to about 52 to 53% of those of Simple and Block. There is a difference of a few percentage points in the efficiency of the addition and subtraction operations between Strassen’s algorithm and Winograd’s variant, which manifests itself as a more significant difference in computational time, as shown in the next section.
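The ratios in Table 1 for power-of-two sizes can be recomputed with a short recursion. The following Python sketch rests on assumptions that are not stated in the text: the recursion stops at n = nmin, and the simple algorithm is charged n^3 multiplications and n^3 additions for an n × n product. Under these assumptions, each level contributes 18 (Strassen) or 15 (Winograd) additions of (n/2) × (n/2) blocks. The function names are hypothetical.

```python
def counts(n, n_min, addsub_blocks_per_level):
    """Return (multiplications, additions/subtractions) for an n x n product.

    Assumes n = n_min * 2**k and that the base case costs n**3 of each.
    """
    if n <= n_min:
        return n**3, n**3
    half = n // 2
    mul, addsub = counts(half, n_min, addsub_blocks_per_level)
    # seven recursive products plus the block additions of this level
    return 7 * mul, 7 * addsub + addsub_blocks_per_level * half * half


def ratios(n, n_min=32):
    """Operation counts relative to the simple algorithm, rounded to 3 digits."""
    out = {}
    for name, blocks in (("Strassen", 18), ("Winograd", 15)):
        mul, addsub = counts(n, n_min, blocks)
        out[name] = (round(mul / n**3, 3), round(addsub / n**3, 3))
    return out


for n in (256, 512, 1024, 2048):
    print(n, ratios(n))
```

For n = 256 the multiplication ratio is (7/8)^3 ≈ 0.670 for both algorithms, and the addition-subtraction ratios come out as 0.772 (Strassen) and 0.755 (Winograd).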

3. Benchmark tests of square and rectangular matrix multiplications

In this section, we use the following elements $a_{ij}$ and $b_{ij}$ of A and B, respectively:

\[ a_{ij} = \sqrt{5}\,(i + j - 1), \quad b_{ij} = \sqrt{3}\,(n - i + 1). \]

We then show the results of C := AB. Our numericalcomputational environment was as follows:

H/W Intel Core i7 3850 (3.6 GHz), 64 GB RAM.

S/W Scientific Linux 6.3 x86_64, Intel C Compiler Ver. 13.0.1, BNCpack ver. 0.8, MPFR 3.1.2, GMP 5.1.3.


Table 2. Computation time: Block algorithm (128 bits).

m × n            Simple     Block(16)   Block(32)   Block(64)
255 × 255        1.06       1.20        1.22        1.24
256 × 256        1.25       1.22        1.22        1.25
257 × 257        1.04       1.25        1.28        1.37
511 × 511        9.60       9.71        9.61        10.02
512 × 512        10.83      9.70        9.68        9.96
513 × 513        10.02      9.89        9.97        10.44
1023 × 1023      107.78     77.63       77.80       79.36
1024 × 1024      213.09     77.77       77.72       79.51
1025 × 1025      94.62      78.92       78.41       81.48
2047 × 2047      756.81     627.75      619.21      648.31
2048 × 2048      1679.04    624.86      618.87      639.71
2049 × 2049      632.74     623.24      625.69      640.84

Table 3. Computation time: Strassen’s and Winograd’s algorithms (128 bits).

m × n            min(Simple, Block)   Strassen   Winograd
255 × 255        1.06                 0.72       0.63
256 × 256        1.22                 0.70       0.57
257 × 257        1.04                 0.74       0.60
511 × 511        9.60                 4.84       4.06
512 × 512        9.68                 4.77       3.73
513 × 513        9.89                 4.92       3.88
1023 × 1023      77.63                32.02      25.57
1024 × 1024      77.72                31.53      24.10
1025 × 1025      78.41                32.21      24.77
2047 × 2047      619.21               211.80     163.87
2048 × 2048      618.87               211.19     155.67
2049 × 2049      623.24               212.79     157.52

Table 4. Computation time: Strassen’s and Winograd’s algorithms (1024 bits).

n × n            min(Simple, Block)   Strassen   Winograd
255 × 255        5.46                 2.33       1.95
256 × 256        5.53                 2.31       1.72
257 × 257        5.61                 2.41       1.81
511 × 511        43.81                13.40      10.57
512 × 512        44.20                13.02      9.44
513 × 513        44.37                13.38      9.81
1023 × 1023      352.79               76.93      57.98
1024 × 1024      355.99               74.58      52.47
1025 × 1025      356.58               76.36      54.22
2047 × 2047      2820.16              454.02     329.41
2048 × 2048      2824.34              446.87     302.56
2049 × 2049      2829.95              456.08     307.05

All computations were serially executed without any parallelization. Since MPFR is a binary multiple precision floating-point library, we specified the precision as the binary length of the mantissa, within a range of 128 to 8192 bits.

We first discuss the results of square matrix multiplication (m = n = l). Table 2 is obtained by using Simple and Block. Block(nmin) denotes Block with nmin as the minimal dimension of the divided block matrices $A_{ik}$ and $B_{kj}$, for the three values nmin = 16, 32, 64. Block is the most effective algorithm in the case of 128-bit precision arithmetic, but we cannot recognize a difference between Simple and Block in the case of 1024-bit precision. In the case of 128 bits, the largest relative error in the elements of C was $1.34 \times 10^{-37}$ and the smallest was $5.23 \times 10^{-39}$. As a result, nmin = 32 gave the shortest computation time in most cases.

Table 5. Computation time: Rectangular matrix multiplication (nmin = 32, Unit: seconds).

128 bits computation
m(= n), l      Simple    Block(32)   Strassen   Winograd
1024, 63       5.09      5.86        5.31       4.29
1024, 64       5.18      5.91        5.03       3.99
1024, 65       5.26      6.30        5.24       4.21
1024, 127      10.47     11.8        9.43       6.45
1024, 128      10.54     11.86       8.71       5.42
1024, 129      10.63     12.24       8.90       5.65
1024, 255      52.51     23.69       14.67      9.39
1024, 256      52.27     23.77       13.13      7.51
1024, 257      52.56     24.07       13.39      7.70
1024, 511      110.75    47.42       26.40      16.55
1024, 512      106.19    47.44       21.85      11.83
1024, 513      110.82    47.96       22.08      12.09

1024 bits computation
m(= n), l      Simple    Block(32)   Strassen   Winograd
1024, 63       24.71     26.74       19.92      14.88
1024, 64       25.77     27.13       19.49      14.31
1024, 65       26.24     27.82       20.04      14.91
1024, 127      52.37     53.75       32.26      19.02
1024, 128      53.17     54.23       30.69      16.74
1024, 129      53.59     54.97       30.79      17.37
1024, 255      105.04    108.90      47.43      24.15
1024, 256      106.34    108.64      43.12      19.83
1024, 257      106.92    109.83      43.63      20.48
1024, 511      244.07    216.82      71.30      34.18
1024, 512      245.69    216.65      61.01      25.22
1024, 513      247.96    217.69      62.14      25.91

We show Table 3 (128-bit precision) and Table 4 (1024 bits) for the sake of comparison. These results of Strassen’s algorithm and Winograd’s variant are obtained with nmin = 32, chosen on the basis of the Block results.

The maximum relative errors in the elements $c_{ij}$ are as follows:

128 bits Strassen: $3.20 \times 10^{-36}$, Winograd: $2.25 \times 10^{-35}$.

1024 bits Strassen: $6.30 \times 10^{-306}$, Winograd: $3.92 \times 10^{-305}$.

On occasion, the results of Winograd’s variant are worse than those of Strassen’s algorithm by one decimal digit.

We list the computation times of rectangular matrix multiplication for the four algorithms in Table 5. All matrix multiplication operations yield 1024-dimensional square matrices as their final result. In these cases, Block is faster than Simple from l = 255 (128 bits) or l = 511 (1024 bits) upward, and Winograd is always faster than Strassen, by up to about 37 seconds.

4. Application to LU decomposition

It is well known that matrix multiplication can be applied to LU decomposition [6]. In this section, no LU decomposition involves any pivoting.

We consider the linear equation (6) with $A \in \mathbb{R}^{n \times n}$, $b \in \mathbb{R}^{n}$:

\[ Ax = b. \tag{6} \]

We use direct methods for the LU decomposition of the coefficient matrix by setting the block size to K. LU decomposition with matrix multiplication (the product $L_{21}U_{12}$ in step (3)) is then as follows:


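(1) Divide A into $A_{11} \in \mathbb{R}^{K \times K}$, $A_{12} \in \mathbb{R}^{K \times (n-K)}$, $A_{21} \in \mathbb{R}^{(n-K) \times K}$, and $A_{22} \in \mathbb{R}^{(n-K) \times (n-K)}$.

(2) Decompose $A_{11}$ into $L_{11}U_{11}$ ($= A_{11}$), and then transform $A_{12}$ to $U_{12}$ and $A_{21}$ to $L_{21}$.

(3) $A_{22}^{(1)} := A_{22} - L_{21}U_{12}$.

After substituting $A := A_{22}^{(1)}$, repeat the above algorithm as long as the remaining dimension n − K is positive.

We employ a random matrix as an instance of a well-conditioned matrix and a Lotkin matrix as that of an ill-conditioned one.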

Random Matrix $a_{ij}$ is a random number in $[-1, 1]$.

Lotkin Matrix

\[ a_{ij} = \begin{cases} 1 & (i = 1) \\ \dfrac{1}{i + j - 1} & (i \ge 2) \end{cases}. \]
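Steps (1) to (3) of the block LU decomposition, applied to a small Lotkin matrix, can be sketched in Python as follows. This is a minimal sketch under stated assumptions: exact rational arithmetic (`fractions.Fraction`) replaces MP floating-point arithmetic so that the result can be checked exactly, the trailing update in step (3) uses a plain product where the paper uses Strassen’s algorithm or Winograd’s variant, and the function names are hypothetical.

```python
from fractions import Fraction


def lotkin(n):
    """Lotkin matrix: first row all ones, a_ij = 1/(i + j - 1) for i >= 2."""
    return [[Fraction(1) if i == 1 else Fraction(1, i + j - 1)
             for j in range(1, n + 1)] for i in range(1, n + 1)]


def block_lu(A, K):
    """In-place LU without pivoting; unit lower L and U share the storage of A."""
    n = len(A)
    for s in range(0, n, K):
        e = min(s + K, n)
        # step (2): factor A11 = L11 U11 and transform A21 to L21
        for k in range(s, e):
            for i in range(k + 1, n):
                A[i][k] /= A[k][k]
                for j in range(k + 1, e):
                    A[i][j] -= A[i][k] * A[k][j]
        # step (2): transform A12 to U12 by forward substitution with L11
        for k in range(s, e):
            for i in range(k + 1, e):
                for j in range(e, n):
                    A[i][j] -= A[i][k] * A[k][j]
        # step (3): A22 := A22 - L21 U12 (the matrix multiplication)
        for i in range(e, n):
            for j in range(e, n):
                A[i][j] -= sum(A[i][k] * A[k][j] for k in range(s, e))
    return A


def lu_product(LU):
    """Rebuild L U from the packed factors for checking."""
    n = len(LU)
    L = [[LU[i][j] if j < i else Fraction(int(i == j)) for j in range(n)]
         for i in range(n)]
    U = [[LU[i][j] if j >= i else Fraction(0) for j in range(n)]
         for i in range(n)]
    return [[sum(L[i][k] * U[k][j] for k in range(n)) for j in range(n)]
            for i in range(n)]


assert lu_product(block_lu(lotkin(6), K=2)) == lotkin(6)
```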

The true solution is $x = [0, 1, \ldots, n - 1]^T$, and we set $b := Ax$. The condition numbers $\|A\|_1 \|A^{-1}\|_1$ of the random matrix and the Lotkin matrix for n = 1024 are $4.4 \times 10^{6}$ and $4.3 \times 10^{1576}$, respectively. For the Lotkin matrix, we must use more than 8192 bits (about 2466 decimal digits) for n = 1024.

The block sizes K are set as K = α nmin (α = 1, 2, ..., 10) with nmin = 32. Furthermore, we investigated the computation time (seconds) and the maximum relative error of the numerical solutions x at each α. Fig. 1 (random matrix) and Fig. 2 (Lotkin matrix) show the results. For comparison, the computation time and the maximum relative errors obtained using normal LU decomposition (column-wise LU) are shown in these figures.

We observe that we can reduce computation time by 21 to 26% for a random matrix (n = 1024). For larger values of α, the maximum relative errors grow by approximately two to four decimal digits. The computation times of Strassen’s algorithm and Winograd’s variant are within two seconds of each other.

We only show the results of using Winograd’s variant on the Lotkin matrix. In this case, the relative error increased by 138 decimal digits (n = 1024). We therefore also ran Winograd’s variant with 8650 bits of precision in order to recover this increase in the relative error. Consequently, we can reduce the computation time by 32%.
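The relation between the bit lengths and the decimal digits quoted above can be checked with a short worked calculation, using $\log_{10} 2 \approx 0.30103$:

\[ 8192 \log_{10} 2 \approx 2466.0, \qquad (8650 - 8192) \log_{10} 2 = 458 \times 0.30103 \approx 137.9. \]

That is, the 458 additional bits correspond to the roughly 138 decimal digits lost on the Lotkin matrix.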

5. Conclusion and future work

We obtained the following results through our benchmark tests involving Simple, Block, Strassen’s algorithm, and Winograd’s variant.

• The Block algorithm was more efficient than the Simple algorithm when precision was relatively low, even in a multiple precision arithmetic environment.

• Winograd’s variant is always faster than Strassen’s algorithm.

• LU decomposition with Strassen’s algorithm and Winograd’s variant is faster than column-wise LU, but causes the loss of significant digits when the relevant coefficient matrix is ill-conditioned, such as the Lotkin matrix.

[Figure 1, upper panel: Strassen vs. Winograd, random matrix, 1024 × 1024, 256 bits. Horizontal axis: α (1 to 10); vertical axes: computation time (s) and log10 of the maximum relative error. Plotted series: Strassen and Winograd computation time, Strassen and Winograd maximum relative error. Annotations: min: 32.7 s; Normal LU: 41.2 s; Normal LU: 8.4E-68.]

[Figure 1, lower panel: Strassen vs. Winograd, random matrix, 1024 × 1024, 1024 bits. Same axes and series. Annotations: min: 94.6 s; Normal LU: 124.4 s; Normal LU: 1.7E-298.]

Fig. 1. Computation time and relative error of 1024 × 1024 random matrix (Upper: 256 bits, Lower: 1024 bits).

[Figure 2: Winograd 8192 bits vs. 8650 bits, Lotkin matrix, 1024 × 1024. Horizontal axis: α (1 to 10); vertical axes: computation time (s) and log10 of the maximum relative error. Plotted series: Winograd (8192 bits) and Winograd (8650 bits) computation time and maximum relative error. Annotations: min: 1617.5 s; Normal LU: 2376.9 s; Normal LU: 3.3E-903.]

Fig. 2. Computation time and relative error of 1024 × 1024 Lotkin matrices (8192 and 8650 bits).

In future research, we will modify the block and recursive algorithms by using tuning and parallelization techniques.

References

[1] S. Huss-Lederman, E. M. Jacobson, J. R. Johnson, A. Tsao and T. Turnbull, Implementation of Strassen’s algorithm for matrix multiplication, in: Proc. of Supercomputing ’96, pp. 6–9, 1996.

[2] J. Li, S. Ranka and S. Sahni, Strassen’s matrix multiplication on GPUs, in: Proc. of the 2011 IEEE 17th International Conference on Parallel and Distributed Systems, ICPADS ’11, pp. 157–164, Washington, DC, 2011.

[3] V. Strassen, Gaussian elimination is not optimal, Numer. Math., 13 (1969), 354–356.

[4] D. Coppersmith and S. Winograd, Matrix multiplication via arithmetic progressions, J. Symbolic Comput., 9 (1990), 251–280.

[5] N. J. Higham, Accuracy and Stability of Numerical Algorithms, 2nd ed., SIAM, Philadelphia, 2002.

[6] G. H. Golub and C. F. van Loan, Matrix Computations, 4th ed., Johns Hopkins University Press, Baltimore, 2013.


JSIAM Letters Vol.6 (2014) pp.85–88 ©2014 Japan Society for Industrial and Applied Mathematics

A key exchange protocol based on Diophantine equations and S-integers

Attila Berczes1, Lajos Hajdu1, Noriko Hirata-Kohno2, Tunde Kovacs1,2 and Attila Petho1

1 University of Debrecen, 4032 Debrecen, Egyetem ter 1, Debrecen, Hungary
2 Nihon University, 4-8-24 Kudan-Minami, Tokyo 102-8275, Japan

E-mail hirata math.cst.nihon-u.ac.jp

Received June 9, 2014, Accepted September 22, 2014

Abstract

The aim of this article is to present a cryptosystem with a new key exchange protocol based on Diophantine equations of polynomial type. Our protocol is inspired by that of H. Yosh whose security comes from a translation of Diophantine equations. We suggest here a key exchange protocol relying on the hardness of solving Diophantine equations in the ring of S-integers.

Keywords cryptography, key exchange protocol, S-integers, Diophantine equation

Research Activity Group Algorithmic Number Theory and Its Applications

1. Introduction

The starting point of public key cryptography is considered to be the article of Diffie and Hellman [1], where the authors describe a new kind of cryptography, including the need for a key distribution system, known as the Diffie-Hellman key exchange protocol. The theory of public key cryptography has gone through a vast development since the introduction of their protocol. Some protocols turned out to be insecure, and others were considered to be safe. However, a breakthrough resulting from continuous efforts may break the security of any protocol, so creating new key exchange protocols remains one of the primary tasks in the theory of cryptography. Indeed, key exchange protocols are mainly based on mathematical problems which are sufficiently difficult.

In 2011, H. Yosh [2] suggested the use of a key exchange protocol, the security of which is based on the hardness of solving Diophantine equations (indeed, the meaning of key exchange protocol here is slightly different from the usual terminology, but we follow the way proposed in [2]). In [3], N. Hirata-Kohno and A. Petho analyzed the protocol due to Yosh, revealing several weaknesses of the protocol, and suggested a modification of it. They partially removed the weaknesses and suggested a choice of the parameters which is secure against ciphertext-only attack.

We give here a new key exchange protocol based on S-integer solutions to Diophantine equations with an example, relying again on the idea by Yosh, but additionally combined with the complexity of S-integers. In our new protocol, the public key size is much smaller than in the previous versions, while providing at least the same level of security.

2. The key exchange protocol of H. Yosh

Let R be a ring. The protocol of Yosh is defined in the case $R = \mathbb{Z}$, but the idea works in the same way for different rings, therefore we shall present the protocol in a general case. In [3] Hirata-Kohno and Petho simplified the protocol of Yosh according to their needs; however, it is essentially the protocol of Yosh. We shall describe it now in detail.

Alice and Bob wish to agree on a secret key using only unsecured channels for their communications. In order to do this they perform the following steps:

(i) Alice chooses elements $r_1, \ldots, r_m \in R$ and constructs a polynomial Diophantine equation with coefficients in R:

\[ f(X_1, \ldots, X_m) = 0, \quad \text{in } X_1, \ldots, X_m \in R \tag{1} \]

such that the m-tuple $(r_1, \ldots, r_m) \in R^m$ is a solution to the equation (1).

(ii) Alice keeps the m-tuple $(r_1, \ldots, r_m) \in R^m$ secret, and sends the polynomial $f(X_1, \ldots, X_m)$ to Bob via the unsecured channel. Consequently, the polynomial $f(X_1, \ldots, X_m)$ has to be considered public.

(iii) Bob randomly chooses a polynomial $g(X_1, \ldots, X_m) \in R[X_1, \ldots, X_m]$ and random elements $a_i \in R$ and $0 \le b_i \in \mathbb{Z}$ ($1 \le i \le n$) with $b_1, \ldots, b_n$ odd, and defines the functions

\[ T_{a_i, b_i}(X) := (X + a_i)^{b_i} \quad (1 \le i \le n) \]

so as to be invertible. Then Bob computes the polynomial

\[ H(X_1, \ldots, X_m) := T_{a_n, b_n}(\ldots T_{a_1, b_1}(g(X_1, \ldots, X_m)) \ldots) \]

and takes a random element

\[ h(X_1, \ldots, X_m) \in H(X_1, \ldots, X_m) + f(X_1, \ldots, X_m) \cdot R[X_1, \ldots, X_m]. \]

(iv) Bob then sends g and h to Alice through the unsecured channel, but he keeps the elements $a_i \in R$ and $b_i \in \mathbb{Z}_{\ge 0}$ ($1 \le i \le n$) secret.

(v) Alice, in the possession of g and h, computes the values $s = g(r_1, \ldots, r_m)$ and $u = h(r_1, \ldots, r_m)$, and sends the element u to Bob through the unsecured channel.

(vi) For $1 \le i \le n$, Bob computes the inverse functions $T_{a_i, b_i}^{-1}$ of the bijective polynomial functions $T_{a_i, b_i}$, and obtains the value

\[ s = T_{a_1, b_1}^{-1}\left(\ldots T_{a_n, b_n}^{-1}(u) \ldots\right), \]

which should be the shared secret of Alice and Bob.
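The role of the maps $T_{a_i, b_i}$ in steps (iii) and (vi) can be illustrated with a small Python sketch over $R = \mathbb{Z}$. It only shows the composition and its inversion on a value s, not the polynomial h or the full protocol. The function names and the toy parameters are assumptions made for illustration.

```python
def iroot(x, b):
    """Exact integer b-th root for odd b; raises if x is not a perfect power."""
    sign = -1 if x < 0 else 1
    x = abs(x)
    lo, hi = 0, 1
    while hi**b < x:
        hi *= 2
    while lo < hi:                      # smallest t with t**b >= x
        mid = (lo + hi) // 2
        if mid**b < x:
            lo = mid + 1
        else:
            hi = mid
    if lo**b != x:
        raise ValueError("not a perfect power")
    return sign * lo


def t_chain(x, params):
    """Apply T_{a_1,b_1}, ..., T_{a_n,b_n} in turn: x -> (x + a)**b."""
    for a, b in params:
        x = (x + a) ** b
    return x


def t_chain_inverse(u, params):
    """Undo the chain in reverse order: u -> u**(1/b) - a."""
    for a, b in reversed(params):
        u = iroot(u, b) - a
    return u


params = [(3, 3), (-7, 5), (11, 3)]     # toy (a_i, b_i) with odd b_i
s = 42
u = t_chain(s, params)
assert t_chain_inverse(u, params) == s
```

The exponents must be odd so that each map is a bijection on the integers it is applied to; with an even exponent the sign of the argument would be lost.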

3. Previous results concerning the security of the protocol of Yosh

In [3] the authors proved the correctness of the protocol of Yosh, simplified it and gave a careful analysis of the security of the simplified protocol.

They also suggested a finite field version. For the choice of the polynomial f the most important requirement is that it has to be extremely hard to solve the equation $f(X_1, \ldots, X_m) = 0$ in $R^m$. More precisely the authors proved in [3]:

Proposition 1 ([3, Proposition 3]) If the adversary can compute more than 2m solutions to (1), not necessarily $(r_1, \ldots, r_m)$, then he/she can compute the element s and break the protocol.

In view of Proposition 1, there is an important drawback to the finite field version of the protocol. Indeed, if we choose random values $r_2, \ldots, r_m$ from the finite field for $X_2, \ldots, X_m$, then the question of whether the equation

\[ f(X_1, r_2, \ldots, r_m) = 0 \]

has a solution in $X_1$ from the finite field can be answered in probabilistic polynomial time [4], and if it does, a solution can also be found in probabilistic polynomial time. This might enable the attacker to find many solutions to the equation (1), which in view of Proposition 1 undermines the security of the protocol. So we cannot consider the finite field version of the protocol safe.

Thus in the present paper we suggest a variant of the protocol of Yosh, which works over the rational integers and the ring of S-integers, in the case when considerably more solutions to (1) are needed to break the protocol.

4. Our new key exchange protocol on the ring of S-integers

In many cases, even if it is not possible to completely solve a Diophantine equation, it may be feasible to find several “small” solutions by chance. This makes the protocol of Yosh insecure, both in its original form and in the modified form analyzed in [3]. This weakness might be compensated by choosing the parameter n large, but as pointed out in [3], this is ruled out by practical considerations. Further, for the same reason the positive integers $b_i$ ($1 \le i \le n$) must also be very small. Thus the number of the free parameters $a_i \in R$ and $b_i \in \mathbb{Z}_{\ge 0}$ ($1 \le i \le n$) cannot be sufficiently increased in the protocol of Yosh.

We mention that the polynomial functions $T_{a_i, b_i}$ in the protocol of Yosh are of a special form only because this form guarantees that their composite function is invertible. So, in our new key exchange protocol, we first suggest choosing a general polynomial function T which is invertible, instead of the function $T_{a_n, b_n}(\ldots T_{a_1, b_1}(X) \ldots)$.

Second, let $S = \{p_1, \ldots, p_k\}$ be a finite set of distinct rational primes with a suitable k. Consider a rational number a/b with $a, b \in \mathbb{Z}$ and gcd(a, b) = 1, such that the (possibly empty) set of prime divisors of b is contained in S. This rational number is a so-called S-integer (corresponding to the specific set S). Denote by $\mathbb{Z}_S$ the set of S-integers. Clearly, this set $\mathbb{Z}_S$ is a subring of $\mathbb{Q} \subset \mathbb{R}$ and $\mathbb{Z}_S$ contains $\mathbb{Z}$. The elements of $\mathbb{Z}_S$ have the property that in their denominators, the exponents of the primes lying in S can be arbitrarily large.

In this article we choose $R = \mathbb{Z}_S$ and we present the following modification of the protocol of Yosh. The main idea is that Alice considers $r_1, \ldots, r_m \in \mathbb{Z}_S$ and Bob chooses $T \in \mathbb{Z}_S[X]$ in the step of the construction of the polynomial T, which makes the key exchange protocol possibly more secure, relying on the difficulty of finding solutions in $\mathbb{Z}_S$ by random search.

Having chosen a solution in $\mathbb{Z}_S$, we note that it is an easy task to find a Diophantine equation which is satisfied by this selected solution, but it is not at all easy to find a solution to a given Diophantine equation in S-integers. This is a typical one-way function on which to build a key exchange protocol.

Our new key exchange protocol is as follows. Alice and Bob choose a finite set of distinct rational primes $S = \{p_1, \ldots, p_k\}$ with a suitably large k. They fix this set S and proceed as follows.

(i) Alice chooses elements $r_1, \ldots, r_m \in \mathbb{Z}_S$ and constructs a polynomial Diophantine equation in m unknowns with coefficients in $\mathbb{Z}$:

\[ f(X_1, \ldots, X_m) = 0, \quad \text{in } X_1, \ldots, X_m \in \mathbb{Z}_S \tag{2} \]

such that the m-tuple $(r_1, \ldots, r_m) \in \mathbb{Z}_S^m$ is a solution to the equation (2) (note that the coefficients of $f(X_1, \ldots, X_m)$ are in $\mathbb{Z}$).

(ii) Alice keeps the m-tuple $(r_1, \ldots, r_m) \in \mathbb{Z}_S^m$ secret, and sends the polynomial $f(X_1, \ldots, X_m)$ to Bob via the unsecured channel. Consequently, the polynomial $f(X_1, \ldots, X_m)$ has to be considered public knowledge.

(iii) Bob randomly chooses a polynomial in m variables $g(X_1, \ldots, X_m) \in \mathbb{Z}[X_1, \ldots, X_m]$ and another random polynomial function $T(X) \in \mathbb{Z}_S[X]$ such that $T : \mathbb{R} \to \mathbb{R}$ is strictly monotonically increasing, and hence invertible. Bob then computes the polynomial

\[ H(X_1, \ldots, X_m) = T(g(X_1, \ldots, X_m)), \]

and takes a random element

\[ h(X_1, \ldots, X_m) \in H(X_1, \ldots, X_m) + f(X_1, \ldots, X_m) \cdot \mathbb{Z}_S[X_1, \ldots, X_m]. \]

(iv) Bob sends g and h to Alice through the unsecured channel, but he keeps the polynomials $T(X)$ and $H(X_1, \ldots, X_m)$ secret.

(v) Alice, in the possession of g and h, computes the values $s = g(r_1, \ldots, r_m)$ and $u = h(r_1, \ldots, r_m)$, and sends the element u to Bob through the unsecured channel, while she keeps the value s secret.

(vi) Knowing that the polynomial function $T : \mathbb{R} \to \mathbb{R}$ is a strictly monotonically increasing continuous function, and hence bijective, Bob computes the value

\[ s = T^{-1}(u), \]

which should be the shared secret of Alice and Bob.

To ensure that $T : \mathbb{R} \to \mathbb{R}$ is strictly increasing we need that dT/dX is positive on $\mathbb{R}$. This is fulfilled if the degree of T is odd and the coefficients are well chosen. Here we have to mention that in the last step of the protocol, to compute s one can use the secant method for the polynomial T(X) − u, since we know that s is the only real root of T(X) − u = 0.

Proposition 2 The protocol described in Section 4 is correct.

Proof Alice can compute s because she knows g and $r_1, \ldots, r_m$. As $f(r_1, \ldots, r_m) = 0$, we have

\[ u = h(r_1, \ldots, r_m) = H(r_1, \ldots, r_m). \]

Since $H(r_1, \ldots, r_m) = T(g(r_1, \ldots, r_m)) = T(s)$ and T(X) is invertible, we have

\[ s = T^{-1}(u). \]

Bob can compute s by the secant method (this proof is essentially the same as in [3, Proposition 1]).

(QED)
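The correctness argument can be traced on a toy instance in Python. The sketch below is illustrative only and makes several assumptions: the prime set S = {2, 3}, the polynomials f, g, T and the multiplier q are small hypothetical choices, h is kept as a Python function instead of an expanded polynomial (so nothing is actually hidden), and bisection followed by a rational reconstruction is used in place of the secant method.

```python
from fractions import Fraction

S = (2, 3)                                   # toy prime set (assumption)
r = (Fraction(1, 2), Fraction(5, 3))         # Alice's secret S-integers


def is_s_integer(x, primes):
    """True if every prime factor of the denominator of x lies in primes."""
    d = x.denominator
    for p in primes:
        while d % p == 0:
            d //= p
    return d == 1


def f(x1, x2):                               # public, integer coefficients
    return 4 * x1**2 + 3 * x2 - 6


def g(x1, x2):                               # Bob's public polynomial
    return 2 * x1 * x2 + x2 + 1


def T(x):                                    # Bob's secret, dT/dX = 3x^2 + 2 > 0
    return x**3 + 2 * x + 5


def q(x1, x2):                               # Bob's random multiplier of f
    return x1 - 3 * x2 + 7


def h(x1, x2):                               # h = T(g) + f * q
    return T(g(x1, x2)) + f(x1, x2) * q(x1, x2)


def invert_T(u, max_den=10**6):
    """Find the rational s with T(s) = u: float bisection, then exact check."""
    target = float(u)
    lo, hi = -1.0, 1.0
    while T(lo) > target:
        lo *= 2
    while T(hi) < target:
        hi *= 2
    for _ in range(200):
        mid = (lo + hi) / 2
        if T(mid) < target:
            lo = mid
        else:
            hi = mid
    s = Fraction((lo + hi) / 2).limit_denominator(max_den)
    if T(s) != u:
        raise ValueError("could not recover an exact rational preimage")
    return s


assert all(is_s_integer(ri, S) for ri in r) and f(*r) == 0
s_alice = g(*r)          # Alice's copy of the shared secret
u = h(*r)                # value sent to Bob
s_bob = invert_T(u)      # Bob's copy
assert s_bob == s_alice
```

Because f vanishes at Alice’s secret point, the multiplier q has no influence on u, which is exactly the step u = H(r_1, ..., r_m) in the proof.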

5. Security aspects

We analyze our protocol from a mathematical and cryptographical point of view. It was proved in 1971 by Y. Matijasevic (see [5]) that the solvability of polynomial Diophantine equations in integers, thus in S-integers too, is algorithmically not decidable. Nevertheless, there are also large classes of Diophantine equations which can be solved by algorithms (see e.g. [6, 7]). However, as in our protocol, if a polynomial is constructed with a prescribed solution, then this solution can be computed in at most exponential time in the size of the solution.

In order to have our protocol efficient enough, we have to choose the form of f such that its parameters are easy to compute when a solution vector is given. On the other hand, by Proposition 1, we have to choose f such that it is hard to find solutions $(r_1, \ldots, r_m) \in \mathbb{Z}_S^m$ to the equation

\[ f(X_1, \ldots, X_m) = 0. \]

These requirements are obviously contradictory. We argued in [3] that diagonal polynomials may satisfy both requirements.

The parameters g, T and r can be chosen randomly, thus h is a random element of $T(g) + f \cdot \mathbb{Z}_S[X_1, \ldots, X_m]$. Besides f, also g, h and u are public objects, and the relation $h(r_1, \ldots, r_m) = u$ is public as well. Thus already a passive adversary knows that $(r_1, \ldots, r_m)$ satisfies the “system” of equations

\[ f(r_1, \ldots, r_m) = 0, \quad h(r_1, \ldots, r_m) = u. \tag{3} \]

When m > 4, if h is chosen as a random polynomial and f as a diagonal one such that this system defines a non-singular algebraic variety in $R^m$ of codimension 2, we may expect that it is at least similarly hard to find an S-integer solution to (3) as to (1).

We have to mention a weak point of our protocol. The key pairs of public key cryptosystems are stable objects; they can be used several times. This property is used in a multiple-user setting such as a client-server model. However, the public keys in the protocol of Yosh do not have this property, and the polynomial f and its roots are only for a single use. If Alice created common keys with k partners using always the same f and $r_1, \ldots, r_m$, then, denoting by $h_1, \ldots, h_k$ the corresponding polynomials computed in Step (iii) and setting $u_i = h_i(r_1, \ldots, r_m)$, $i = 1, \ldots, k$, the passive adversary would get k + 1 independent equations

\[ f(r_1, \ldots, r_m) = 0, \quad h_i(r_1, \ldots, r_m) = u_i, \quad i = 1, \ldots, k \]

for $r_1, \ldots, r_m$. If k + 1 ≥ m then these determine $r_1, \ldots, r_m$ uniquely. In this respect the protocol of Yosh behaves as a one-time pad; consequently, in the present form, it cannot be applied in a multiple-user setting. We should concentrate on this problem of the multiple-user setting in our future work.

We also point out that there might exist a way to obtain the value $g(r_1, \ldots, r_m) = s$ without precisely knowing $r_1, \ldots, r_m$, but only f, h, g and u being given. An investigation of such a possibility is an important and essential problem, which is to be considered in our situation.

6. Example

Finally we present an example as follows. Let $S := \{167, 359, 379\}$. We perform the following steps.

(i) Alice chooses the polynomial f with the following coefficients.

\begin{align*}
f &= c_1 X_1^2 + c_2 X_2^5 + c_3 X_3^3 + c_4 X_4^7 + c_5 X_5^4 + c_6, \\
c_1 &= 4806529705, \\
c_2 &= -6205175372, \\
c_3 &= 925478963, \\
c_4 &= -768530557342240919, \\
c_5 &= 1746745227, \\
c_6 &= 4946407506070084575251776766468057476355317931641.
\end{align*}

Alice keeps secret an S-integer solution $(r_1, r_2, r_3, r_4, r_5)$ to f = 0, which is

\begin{align*}
r_1 &= \frac{4747053250}{167}, \\
r_2 &= 17914675, \\
r_3 &= \frac{1640439652}{379}, \\
r_4 &= \frac{9078809}{359}, \\
r_5 &= 3039073006.
\end{align*}

Note that for $(r_1, r_2, r_3, r_4, r_5)$ we have at least one index i such that $r_i \in \mathbb{Z}_S \setminus \mathbb{Z}$. Actually, Alice first randomly generates the solution $(r_1, \ldots, r_5)$ and then computes the coefficients $c_1, \ldots, c_6$ which ensure f = 0.

In this example, we simply construct f such that the terms $c_1 r_1^2$, $c_2 r_2^5$, $c_3 r_3^3$, $c_4 r_4^7$, $c_5 r_5^4$ are all integers. However, we can construct f such that these terms are not simultaneously all integers but we have $f(r_1, r_2, r_3, r_4, r_5) = 0$. This step of constructing f amounts to solving a linear Diophantine equation, which can be done by the generalized Euclidean algorithm (see [8, p. 31]).

(ii) Bob sets

g = 234578− 29879731X2 + 26864732X5

− 48958473X1X2 + 7145266643X23

+ 5537433896X2X4

T = 476538X5 + 703764X4 + 893596X2

+ 31980091X + 43626626,

h ≡ T (g) mod f.

(iii) Alice computes $s$, $u$ and gets the following result.

\[
s = \frac{959693338498943929735558007182951}{8611708873}, \qquad u = \frac{u_1}{u_2},
\]

where

\[
\begin{aligned}
u_1 ={}& 387935870986922673356671859528440825\\
& 048718428727629837317519014456871355\\
& 554563200995045873743343613861794426\\
& 388290216600683762310234492035907605\\
& 503795045038985186057561141,\\
u_2 ={}& 47363817420019443203384067091714440967619738975593.
\end{aligned}
\]

(iv) Using the secant method, Bob computes

\[
T^{-1}(u) = s.
\]
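The arithmetic of steps (i), (iii) and (iv) can be replayed with exact rational numbers. The following Python sketch is only a check under stated assumptions: the digits are transcribed from the example above, the helper names (`f_terms`, `secant` and so on) are hypothetical, and the initial guess for the secant iteration is our own choice, since the text does not state how Bob starts the iteration. The secant step runs in floating point, so it recovers $s$ only approximately, whereas the protocol needs the exact rational value.

```python
from fractions import Fraction

# Public coefficients c1..c6 of f and the exponents of X1..X5 (from step (i)).
c = [4806529705, -6205175372, 925478963, -768530557342240919, 1746745227,
     4946407506070084575251776766468057476355317931641]
exps = [2, 5, 3, 7, 4]

# Alice's secret S-integer solution for S = {167, 359, 379}.
r = [Fraction(4747053250, 167), Fraction(17914675), Fraction(1640439652, 379),
     Fraction(9078809, 359), Fraction(3039073006)]


def f_terms(x):
    """The five terms c_i * x_i**e_i of f (the constant c6 is excluded)."""
    return [ci * xi**e for ci, xi, e in zip(c, x, exps)]


def f(x):
    return sum(f_terms(x)) + c[5]


def g(x):
    x1, x2, x3, x4, x5 = x
    return (234578 - 29879731 * x2 + 26864732 * x5 - 48958473 * x1 * x2
            + 7145266643 * x3**2 + 5537433896 * x2 * x4)


def T(x):
    return (476538 * x**5 + 703764 * x**4 + 893596 * x**2
            + 31980091 * x + 43626626)


def secant(func, x0, x1, rel_tol=1e-13, max_iter=100):
    """Plain secant iteration for func(x) = 0 in floating point."""
    f0, f1 = func(x0), func(x1)
    for _ in range(max_iter):
        if f1 == f0:  # flat secant line: no further improvement possible
            break
        x2 = x1 - f1 * (x1 - x0) / (f1 - f0)
        x0, f0, x1, f1 = x1, f1, x2, func(x2)
        if abs(x1 - x0) <= rel_tol * abs(x1):
            break
    return x1


s = g(r)  # exact shared value, step (iii)
u = T(s)  # exact value passed to Bob

# Step (iv) in floating point; the leading term of T gives the first guess
# (an assumption of this sketch, not taken from the paper).
u_float = float(u)
guess = (u_float / 476538) ** 0.2
s_rec = secant(lambda x: T(x) - u_float, guess, guess * (1 + 1e-6))

print("f(r) == 0:", f(r) == 0)
print("terms of f are integers:", all(t.denominator == 1 for t in f_terms(r)))
print("s =", s)  # compare with the value printed in step (iii)
print("relative error of recovered s:", abs(s_rec - float(s)) / float(s))
```

Exact `Fraction` arithmetic is used for Alice's side because the $S$-integers have denominators built from 167, 359 and 379, which floating point cannot represent exactly.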

The example shows clearly that the new protocol is superior to the protocol of [3]. In the present example $f$ has one more variable than in [3]. In both examples $T$ has five parameters, but in [3] only three parameters were free, because the other two parameters could assume only very small values, while in the present case all five parameters are essentially free. They can be arbitrarily large, subject only to the mild assumption that $dT/dX$ is positive on $\mathbb{R}$. Thus by Section 5, the present example is at least as secure as the example in [3].

On the other hand, the size of the public key of this example is much smaller than that of [3]. Indeed, in both cases we presented explicitly the secret keys $r_1, \ldots, r_m$, $T(H)$, $s$, and the public keys $f$, $g$, $u$. The only missing datum is $h$: this polynomial has a long form, and it would be a waste of paper to print it here in full, so we describe its size instead. We computed both examples with MAPLE 13, which gave us the size of the internal representation of $h$, which is a multivariate polynomial. As such objects do not have a canonical representation, the most honest way to compare the size of two such polynomials is to give the size of their internal representation in the same computer algebra system. The polynomial $h$ of [3] has 2107 terms of the form $a X_1^{n_1} X_2^{n_2} X_3^{n_3} X_4^{n_4}$, where $a$ denotes an integer and $n_1, \ldots, n_4$ non-negative integers, and its internal representation in MAPLE 13 has length 800327. In contrast, the polynomial $h$ of the present example has only 269 terms, and its internal representation in MAPLE 13 has length 18240.

Acknowledgments

The research was mainly supported by the Funding Program for Next Generation World-Leading Researchers (NEXT Program, JSPS), whose grant number is GR 087. It was also supported by JSPS P12806 for the fourth author, and partially supported by the University of Debrecen, by grants K100339 and NK104208 of the Hungarian National Foundation for Scientific Research, and by the European Union and the European Social Fund through project Supercomputer, the national virtual lab (grant no.: TAMOP-4.2.2.C-11/1/KONV-2012-0010). We sincerely thank the referee for his/her excellent work and essential advice for our investigation.

References

[1] W. Diffie and M. Hellman, New directions in cryptography, IEEE Trans. Inform. Theory, 22 (1976), 644–654.

[2] H. Yosh, The key exchange cryptosystem used with higher order Diophantine equations, IJNSA Journal, 3 (2011), 43–50.

[3] N. Hirata-Kohno and A. Petho, On a key exchange protocol based on Diophantine equations, Infocommunications Journal, 5 (2013), 17–21.

[4] M. Mignotte, Mathematics for Computer Algebra, Springer-Verlag, New York, 1992.

[5] M. Davis, Y. Matijasevic and J. Robinson, Hilbert's tenth problem, Diophantine equations: positive aspects of a negative solution, in: Proc. of Mathematical Developments Arising from Hilbert Problems, F. E. Browder ed., Proc. Sympos. Pure Math., Vol. 28, pp. 323–378, AMS, Providence, R.I., 1976.

[6] A. Baker, Transcendental Number Theory, Cambridge Univ. Press, Cambridge, 1975.

[7] J. H. Silverman, The Arithmetic of Elliptic Curves, 2nd ed., Graduate Texts in Mathematics, Vol. 106, Springer-Verlag, New York, 2009.

[8] L. J. Mordell, Diophantine Equations, Pure and Applied Mathematics, Vol. 30, Academic Press, New York, 1969.


JSIAM Letters Vol.6 (2014) pp.89–92 ©2014 Japan Society for Industrial and Applied Mathematics

Shape optimization of a rubber bushing

Kouhei Shintani1 and Hideyuki Azegami1

1 Nagoya University, A4-2 (780) Furo-cho, Chikusa-ku, Nagoya 464-8601, Japan

E-mail shintani az.cs.is.nagoya-u.ac.jp

Received March 18, 2014, Accepted October 22, 2014

Abstract

The present paper describes a solution to a non-parametric shape optimization problem of a rubber bushing in order to adjust a function of the reaction force with respect to static displacement to a desired function. The main problem is defined as a static hyperelastic problem considering a large deformation and a non-linear constitutive equation. The squared error norm of the work done by compulsory displacement and the volume are chosen as cost functions. The shape derivatives of the cost functions are derived theoretically. An iterative algorithm based on the $H^1$ gradient method is used to solve the shape optimization problem.

Keywords hyperelastic problem, shape optimization, $H^1$ gradient method

Research Activity Group Mathematical Design

1. Introduction

The rubber bushing is used as a vibration isolator in vehicle suspension systems in order to prevent the vibration of an engine or the tires from being transferred into the passenger cabin. The rubber bushing has been modeled as a largely deforming hyperelastic continuum that follows a non-linear constitutive equation. Many equations have been proposed for the constitutive equation using non-linear elastic potentials [1]. Numerical analyses of the rubber bushing using the finite element method have been reported [2, 3].

Moreover, numerical solutions to parametric shape optimization problems of the rubber bushing have been presented [4, 5]. In these studies, in order to adjust a function of the reaction force with respect to static displacement to a desired function, a squared error norm of the reaction force function has been chosen as a cost function.

In the present paper, we present the solution to the non-parametric shape optimization problem of a rubber bushing. Domain variation from an initial domain is chosen as a design variable. The main problem, by which we mean a boundary value problem of a partial differential equation in which the domain is defined as a design variable, is formulated as a hyperelastic problem considering large deformation and a non-linear constitutive equation. We choose a squared error norm of the work done by compulsory displacement as an objective function and the volume as a constraint function. The shape derivatives of the cost functions are derived theoretically following the standard procedure using the $H^1$ gradient method [6], but the geometrical and material non-linearities are considered in the present paper.

2. Admissible set of design variables

First, let us define the admissible set of design variables for the shape optimization problem. Let $\Omega_0 \subset \mathbb{R}^d$ be a $d \in \{2, 3\}$-dimensional domain with a Lipschitz boundary, which is denoted by $\partial\Omega_0$. On $\partial\Omega_0$, $\Gamma_{D0} \subset \partial\Omega_0$ and $\Gamma_{N0} = \partial\Omega_0 \setminus \bar{\Gamma}_{D0}$ ($\bar{\Gamma}_{D0} = \Gamma_{D0} \cup \partial\Gamma_{D0}$) denote the Dirichlet boundary and the homogeneous Neumann boundary, respectively.

We assume that $\Omega_0$ is fixed and that the domain is created by a continuous one-to-one mapping $i + \phi : \Omega_0 \to \mathbb{R}^d$ as $\Omega(\phi) = \{ (i + \phi)(x) \mid x \in \Omega_0 \}$, where $i$ is used as the identity mapping. In the same manner, the notation $(\cdot)(\phi)$ is used as $\{ (i + \phi)(x) \mid x \in (\cdot)_0 \}$ in the present paper. In order to define the Fréchet derivatives with respect to domain variation, we use

\[
X = \left\{ \phi \in H^1(\mathbb{R}^d; \mathbb{R}^d) \;\middle|\; \phi = 0_{\mathbb{R}^d} \text{ on } \Gamma_{D0} \right\} \tag{1}
\]

as the Banach space for $\phi$. In (1), the domain of $\phi$ is extended to $\mathbb{R}^d$ by Calderón's extension theorem. Moreover, in order to maintain the continuous one-to-one mapping property, we define the admissible set of $\phi$ as

\[
\mathcal{D} = \left\{ \phi \in X \cap Y \;\middle|\; \|\phi\|_Y < \sigma \right\}, \tag{2}
\]

where $Y$ is defined by $W^{1,\infty}(\mathbb{R}^d; \mathbb{R}^d)$, and $\sigma > 0$ is chosen such that $i + \phi$ is a bijection.

3. Main problem

For $\phi \in \mathcal{D}$, let us define the main problem. Let $(0, t_T) \subset \mathbb{R}$ be a time domain with a positive constant $t_T$, and let $u_D : (0, t_T) \times \mathbb{R}^d \to \mathbb{R}^d$ be a given function denoting a quasi-static compulsory displacement, the magnitude of which increases monotonically with respect to $t \in (0, t_T)$ at all $x \in \mathbb{R}^d$.

Let $u : (0, t_T) \times \mathbb{R}^d \to \mathbb{R}^d$ be a displacement obtained as a solution to a hyperelastic problem shown later in Problem 1 (refer to, for example, [7]). In order to construct this problem, we need to define the constitutive equation of the hyperelastic continuum. Let $y = i + u : (0, t_T) \times \mathbb{R}^d \to \mathbb{R}^d$ be the mapping for the large deformation, and

\[
F(u) = \left( \frac{\partial y_i}{\partial x_j} \right)_{ij} = I + \left( \frac{\partial u_i}{\partial x_j} \right)_{ij} \tag{3}
\]

be the deformation gradient tensor, where $I$ denotes the unit matrix of $d$-th order. Using the definition, the Green-Lagrange strain is defined as

\[
E(u) = \frac{1}{2} \left( F^T(u) F(u) - I \right) = E_L(u) + \frac{1}{2} E_{BL}(u, u), \tag{4}
\]

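where

\[
E_L(u) = \frac{1}{2} \left( \frac{\partial u_i}{\partial x_j} + \frac{\partial u_j}{\partial x_i} \right)_{ij},
\qquad
E_{BL}(u, v) = \frac{1}{2} \left( \sum_{k \in \{1, \ldots, d\}} \frac{\partial u_k}{\partial x_i} \frac{\partial v_k}{\partial x_j} \right)_{ij}.
\]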

The constitutive equation for hyperelastic material is defined by assuming the existence of a nonlinear elastic potential $\pi : \mathbb{R}^{d \times d} \to \mathbb{R}$ that gives the second Piola-Kirchhoff stress tensor as

\[
S(u) = \frac{\partial \pi(E(u))}{\partial E(u)} = D(E(u)) E(u)
= \left( \sum_{(k,l) \in \{1, \ldots, d\}^2} d_{ijkl}(E(u))\, e_{kl}(u) \right)_{ij}. \tag{5}
\]

Here, $D(E(u))$ is the stiffness. For $\pi$, in the present study, we use the Yeoh model given as

\[
\begin{aligned}
\pi(E(u)) ={}& e_1 (i_1(u) - 3) + e_2 (i_1(u) - 3)^2 + e_3 (i_1(u) - 3)^3\\
&+ \frac{1}{d_1} (i_3(u) - 1)^2 + \frac{1}{d_2} (i_3(u) - 1)^4 + \frac{1}{d_3} (i_3(u) - 1)^6,
\end{aligned}
\]

where $e_1$, $e_2$, $e_3$, $d_1$, $d_2$ and $d_3$ denote material parameters, and $i_1(u)$ and $i_3(u)$ denote the first and third invariants defined by

\[
i_1(u) = i_3^{-2/3}(u) \left( c_1^2(u) + c_2^2(u) + c_3^2(u) \right),
\qquad
i_3(u) = \det F(u),
\]

and $c_1(u)$, $c_2(u)$ and $c_3(u)$ are the principal values of the right Cauchy-Green deformation tensor $C(u) = F^T(u) F(u) = 2E(u) + I$.

Using (5) as the constitutive equation, the hyperelastic problem can be defined using the first Piola-Kirchhoff stress tensor defined by

\[
\Pi(u) = S(u) F^T(u).
\]

In the present study, $\nu$ denotes the outer unit normal on the boundary.
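The kinematic quantities and the Yeoh potential above can be evaluated pointwise from a displacement gradient. The following Python sketch is an illustration under stated assumptions, not the finite element implementation used in the paper: it fixes $d = 3$, reads the sum in $i_1$ as the trace of $C$ (the usual reduced first invariant, which corresponds to treating the $c_k$ as principal stretches), and uses hypothetical material parameters and function names.

```python
IDENTITY = [[1.0 if i == j else 0.0 for j in range(3)] for i in range(3)]


def matmul(a, b):
    return [[sum(a[i][k] * b[k][j] for k in range(3)) for j in range(3)]
            for i in range(3)]


def transpose(a):
    return [[a[j][i] for j in range(3)] for i in range(3)]


def det3(a):
    return (a[0][0] * (a[1][1] * a[2][2] - a[1][2] * a[2][1])
            - a[0][1] * (a[1][0] * a[2][2] - a[1][2] * a[2][0])
            + a[0][2] * (a[1][0] * a[2][1] - a[1][1] * a[2][0]))


def deformation_gradient(grad_u):
    """F = I + du/dx, cf. Eq. (3)."""
    return [[IDENTITY[i][j] + grad_u[i][j] for j in range(3)] for i in range(3)]


def right_cauchy_green(grad_u):
    """C = F^T F."""
    F = deformation_gradient(grad_u)
    return matmul(transpose(F), F)


def green_lagrange(grad_u):
    """E = (F^T F - I) / 2, cf. Eq. (4)."""
    C = right_cauchy_green(grad_u)
    return [[0.5 * (C[i][j] - IDENTITY[i][j]) for j in range(3)] for i in range(3)]


def yeoh_energy(grad_u, e, d):
    """Yeoh potential with i3 = det F and i1 = i3**(-2/3) * trace(C).

    Using trace(C) for the sum in i1 is an assumption of this sketch.
    """
    C = right_cauchy_green(grad_u)
    i3 = det3(deformation_gradient(grad_u))
    i1 = i3 ** (-2.0 / 3.0) * (C[0][0] + C[1][1] + C[2][2])
    return (e[0] * (i1 - 3) + e[1] * (i1 - 3) ** 2 + e[2] * (i1 - 3) ** 3
            + (i3 - 1) ** 2 / d[0] + (i3 - 1) ** 4 / d[1] + (i3 - 1) ** 6 / d[2])


# Hypothetical material parameters (not taken from the paper).
E_PARAMS = (0.5, 0.05, 0.005)
D_PARAMS = (0.01, 0.01, 0.01)

zero_grad = [[0.0] * 3 for _ in range(3)]
stretch_grad = [[0.1, 0.0, 0.0], [0.0, 0.0, 0.0], [0.0, 0.0, 0.0]]

print("E_11 for a 10% uniaxial stretch:", green_lagrange(stretch_grad)[0][0])
print("energy at rest:", yeoh_energy(zero_grad, E_PARAMS, D_PARAMS))
print("energy under stretch:", yeoh_energy(stretch_grad, E_PARAMS, D_PARAMS))
```

For the uniaxial case the strain component is $0.1 + 0.1^2/2$, which makes the contribution of the quadratic term in (4) visible.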

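Problem 1 (Hyperelastic problem) For $\phi \in \mathcal{D}$ and $t \in (0, t_T)$, let $u_D(t) : \mathbb{R}^d \to \mathbb{R}^d$ be a given function. Find $u(t) : \Omega(\phi) \to \mathbb{R}^d$ such that

\[
\begin{aligned}
-\nabla^T \Pi(u(t)) &= 0_{\mathbb{R}^d}^T && \text{in } \Omega(\phi),\\
\Pi^T(u(t))\, \nu &= 0_{\mathbb{R}^d} && \text{on } \Gamma_N(\phi),\\
u(t) &= u_D(t) && \text{on } \Gamma_{D0}.
\end{aligned}
\]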

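If $u_D(t)$ is given appropriately, for the weak solution $u(t)$ to Problem 1, $\tilde{u}(t) = u(t) - u_D(t)$ lies within

\[
U = \left\{ u \in H^1(\mathbb{R}^d; \mathbb{R}^d) \;\middle|\; u = 0_{\mathbb{R}^d} \text{ on } \Gamma_{D0} \right\}, \tag{6}
\]

since the domain of $\tilde{u}(t)$ can be extended to $\mathbb{R}^d$ by Calderón's extension theorem. Moreover, in the present paper, we define the admissible set of $\tilde{u}(t)$ by

\[
\mathcal{S} = U \cap W^{2,4q}(\mathbb{R}^d; \mathbb{R}^d) \tag{7}
\]

for $q > d$, in order to obtain the domain variation in $Y$ without singular points by the $H^1$ gradient method [6]. For simplicity, $\tilde{u}(t)$ is denoted by $u(t)$ or $u$, and $u_D(t)$ is denoted by $u_D$ from here.

For later use, we define the Lagrange function for Problem 1 as

\[
\mathcal{L}_M(\phi, u, v) = \int_{\Omega(\phi)} \nabla^T \Pi(u)\, v \, dx
+ \int_{\Gamma_{D0}} \left[ (u - u_D) \cdot \left( \Pi^T(v) \nu \right) + v \cdot \left( \Pi^T(u) \nu \right) \right] d\gamma, \tag{8}
\]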

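where $v \in U$ is introduced as the Lagrange multiplier. Here, the second term on the right-hand side of (8), which is assumed to be zero based on the Dirichlet conditions, was added for later use [6]. The first term on the right-hand side of (8) can be rewritten as

\[
\begin{aligned}
\int_{\Omega(\phi)} \nabla^T \Pi(u)\, v \, dx
&= \int_{\Omega(\phi)} \left[ \nabla \cdot (\Pi(u) v) - \Pi(u) \cdot \left( \nabla v^T \right) \right] dx\\
&= \int_{\partial\Omega(\phi)} (\Pi(u) v) \cdot \nu \, d\gamma - \int_{\Omega(\phi)} \Pi(u) \cdot F^{\prime T}(u)[v] \, dx,
\end{aligned}
\]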

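where $F'(u)[v] = \partial v / \partial x^T$, and $g \cdot h$ denotes the scalar product. Moreover, considering $S(u) = S^T(u)$,

\[
\begin{aligned}
-\int_{\Omega(\phi)} \Pi(u) \cdot F^{\prime T}(u)[v] \, dx
&= -\int_{\Omega(\phi)} \left( S(u) F^T(u) \right) \cdot F^{\prime T}(u)[v] \, dx\\
&= -\int_{\Omega(\phi)} S(u) \cdot E'(u)[v] \, dx
\end{aligned} \tag{9}
\]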

holds, where

\[
E'(u)[v] = \frac{1}{2} \left[ F^{\prime T}(u)[v]\, F(u) + F^T(u)\, F'(u)[v] \right] = E_L(v) + E_{BL}(u, v).
\]

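Then, using (9), (8) can be rewritten as

\[
\mathcal{L}_M(\phi, u, v) = -\int_{\Omega(\phi)} S(u) \cdot E'(u)[v] \, dx
+ \int_{\Gamma_{D0}} \left[ (u - u_D) \cdot \left( \Pi^T(v) \nu \right) + v \cdot \left( \Pi^T(u) \nu \right) \right] d\gamma. \tag{10}
\]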


If $u$ is the solution to Problem 1,

\[
\mathcal{L}_M(\phi, u, v) = 0 \tag{11}
\]

holds for all $v \in U$. Then, (11) agrees with the weak form of Problem 1.

4. Shape optimization problem

Using $u$, we define the shape optimization problem as follows. Let $\alpha_1, \ldots, \alpha_m$ be the constants denoting the desired values of $u_D \cdot \left( \Pi^T(u) \nu \right)$ at $t_1, \ldots, t_m \in (0, t_T]$, respectively. In the present paper, we define

\[
f_0(\phi, u) = \sum_{i \in \{1, \ldots, m\}} f_{0i}(\phi, u(t_i)) \tag{12}
\]

as the objective cost function, where

\[
f_{0i}(\phi, u(t_i)) = \int_{\Gamma_{D0}} \left| u_D(t_i) \cdot \left( \Pi^T(u(t_i)) \nu \right) - \alpha_i \right|^2 d\gamma.
\]

Moreover, we define

\[
f_1(\phi) = \int_{\Omega(\phi)} dx - c_1 \tag{13}
\]

as a constraint cost function, where $c_1$ is a positive constant for which there exists $\phi \in \mathcal{D}$ such that $f_1(\phi) \le 0$.

Using these cost functions, we construct the following shape optimization problem.

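Problem 2 (Squared error norm minimization) Let $f_0(\phi, u)$ and $f_1(\phi)$ be defined as in (12) and (13), respectively. Find $\phi$ such that

\[
\min_{\phi \in \mathcal{D}} \left\{ f_0(\phi, u) \;\middle|\; f_1(\phi) \le 0,\ u(t) \in \mathcal{S},\ t \in (0, t_T),\ \text{Problem 1} \right\}.
\]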

5. Shape derivative of the cost functions

In order to solve Problem 2 by the gradient method, the Fréchet derivatives of $f_0$ and $f_1$ with respect to domain variation, which we refer to as the shape derivatives, are required. Let $\varphi \in X$ be the domain variation from $\phi$. If there exist $g_0$ and $g_1$ such that $f_0'(\phi, u)[\varphi] = \langle g_0, \varphi \rangle$ and $f_1'(\phi)[\varphi] = \langle g_1, \varphi \rangle$ for all $\varphi \in X$, we refer to $g_0$ and $g_1$ as the shape derivatives of $f_0$ and $f_1$, respectively. Here, $\langle \cdot, \cdot \rangle$ denotes the dual product.

Since $f_0$ is a functional of $u$, $g_0$ is obtained as follows using the Lagrange multiplier method. We define

\[
\begin{aligned}
\mathcal{L}_0(\phi, u, v_{01}, \ldots, v_{0m})
&= \sum_{i \in \{1, \ldots, m\}} \mathcal{L}_{0i}(\phi, u(t_i), v_{0i})\\
&= \sum_{i \in \{1, \ldots, m\}} \left( f_{0i}(\phi, u(t_i)) + \mathcal{L}_M(\phi, u(t_i), v_{0i}) \right)
\end{aligned}
\]

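as the Lagrangian for $f_0$, where $v_{0i}$ is introduced as the Lagrange multiplier for Problem 1 at $t = t_i$ such that $\tilde{v}_{0i} = v_{0i} + 2 \left[ u_D(t_i) \cdot \left( \Pi^T(u(t_i)) \nu \right) - \alpha_i \right] u_D(t_i) \in U$. The shape derivative of $\mathcal{L}_{0i}$ can be written as

\[
\begin{aligned}
\mathcal{L}_{0i}'(\phi, u(t_i), v_{0i})[\varphi, u^*(t_i), v_{0i}^*]
={}& \mathcal{L}_{0i\phi}(\phi, u(t_i), v_{0i})[\varphi]\\
&+ \mathcal{L}_{0iu(t_i)}(\phi, u(t_i), v_{0i})[u^*(t_i)]\\
&+ \mathcal{L}_{0iv_{0i}}(\phi, u(t_i), v_{0i})[v_{0i}^*],
\end{aligned} \tag{14}
\]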

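where $u^*(t_i) \in U$ and $v_{0i}^* \in U$ are the partial shape derivatives of $u(t_i)$ and $v_{0i}$, respectively [6].

Here, if $u$ is the solution of Problem 1, the third term on the right-hand side of (14) becomes 0. The second term on the right-hand side of (14) becomes

\[
\begin{aligned}
&\mathcal{L}_{0iu(t_i)}(\phi, u(t_i), v_{0i})[u^*(t_i)]\\
&\quad= -\int_{\Omega(\phi)} \left( S'(u(t_i))[u^*(t_i)] \cdot E'(u(t_i))[v_{0i}] + S(u(t_i)) \cdot E''(u(t_i))[v_{0i}, u^*(t_i)] \right) dx\\
&\qquad+ \int_{\Gamma_{D0}} \Big\{ u^*(t_i) \cdot \left( \Pi^T(v_{0i}) \nu \right)\\
&\qquad\quad+ \left[ v_{0i} + 2 \left( u_D(t_i) \cdot \left( \Pi^T(u(t_i)) \nu \right) - \alpha_i \right) u_D(t_i) \right] \cdot \left( \Pi^{\prime T}(u(t_i))[u^*(t_i)]\, \nu \right) \Big\} \, d\gamma,
\end{aligned} \tag{15}
\]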

where

\[
\begin{aligned}
S'(u)[v] &= D(E(u))\, E'(u)[v],\\
E''(u)[v, w] &= E_{BL}(v, w),\\
\Pi'(u)[v] &= S'(u)[v]\, F^T(u) + S(u)\, F^{\prime T}(u)[v].
\end{aligned}
\]

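If we use the same relation used in (9), the domain integral in (15) becomes

\[
\begin{aligned}
&-\int_{\Omega(\phi)} \left( S'(u(t_i))[u^*(t_i)] \cdot E'(u(t_i))[v_{0i}] + S(u(t_i)) \cdot E''(u(t_i))[v_{0i}, u^*(t_i)] \right) dx\\
&\quad= -\int_{\Omega(\phi)} \Pi'(u(t_i))[v_{0i}] \cdot F^{\prime T}(u(t_i))[u^*(t_i)] \, dx.
\end{aligned}
\]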

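Moreover, assuming the relations $u^*(t_i) = 0_{\mathbb{R}^d}$ on $\Gamma_{D0}$ and $\Pi^{\prime T}(u(t_i))[v_{0i}]\, \nu = 0_{\mathbb{R}^d}$ on $\Gamma_N(\phi)$, we have

\[
\begin{aligned}
&\int_{\partial\Omega(\phi)} \left( \Pi'(u(t_i))[v_{0i}]\, u^*(t_i) \right) \cdot \nu \, d\gamma
- \int_{\Omega(\phi)} \Pi'(u(t_i))[v_{0i}] \cdot F^{\prime T}(u(t_i))[u^*(t_i)] \, dx\\
&\quad= \int_{\Omega(\phi)} \nabla^T \Pi'(u(t_i))[v_{0i}]\, u^*(t_i) \, dx.
\end{aligned}
\]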

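From the above relations, (15) can be rewritten as

\[
\begin{aligned}
&\mathcal{L}_{0iu(t_i)}(\phi, u(t_i), v_{0i})[u^*(t_i)]\\
&\quad= \int_{\Omega(\phi)} \nabla^T \Pi'(u(t_i))[v_{0i}]\, u^*(t_i) \, dx\\
&\qquad+ \int_{\Gamma_{D0}} \Big\{ u^*(t_i) \cdot \left( \Pi^T(v_{0i}) \nu \right)\\
&\qquad\quad+ \left[ v_{0i} + 2 \left( u_D(t_i) \cdot \left( \Pi^T(u(t_i)) \nu \right) - \alpha_i \right) u_D(t_i) \right] \cdot \left( \Pi^{\prime T}(u(t_i))[u^*(t_i)]\, \nu \right) \Big\} \, d\gamma
\end{aligned}
\]

for all $u^*(t_i)$ such that $u^*(t_i) = 0_{\mathbb{R}^d}$ on $\Gamma_{D0}$. Then, (15) becomes 0 if $v_{0i}$ is the solution of the following adjoint problem.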


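Problem 3 (Adjoint problem for $f_0$) Let $u(t_i)$ be the solution of Problem 1. Find $v_{0i} : \Omega(\phi) \to \mathbb{R}^d$ such that

\[
\begin{aligned}
-\nabla^T \Pi'(u(t_i))[v_{0i}] &= 0_{\mathbb{R}^d}^T && \text{in } \Omega(\phi),\\
\Pi^{\prime T}(u(t_i))[v_{0i}]\, \nu &= 0_{\mathbb{R}^d} && \text{on } \Gamma_N(\phi),\\
v_{0i} &= -2 \left( u_D(t_i) \cdot \left( \Pi^T(u(t_i)) \nu \right) - \alpha_i \right) u_D(t_i) && \text{on } \Gamma_{D0}.
\end{aligned}
\]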

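In order to obtain the domain variation in $Y$ without singular points by the $H^1$ gradient method, $v_{0i} \in \mathcal{S}$ is required [6].

Let $u(t_i)$ and $v_{0i}$ be solutions of Problem 1 and Problem 3, respectively. Then, (14) becomes

\[
\mathcal{L}_{0i\phi}(\phi, u(t_i), v_{0i})[\varphi] = f_{0i}'(\phi, u)[\varphi]
= \int_{\Gamma_N(\phi)} g_{0iN} \cdot \varphi \, d\gamma = \langle g_{0i}, \varphi \rangle, \tag{16}
\]

where

\[
g_{0iN} = -S(u(t_i)) \cdot E'(u(t_i))[v_{0i}]\, \nu.
\]

For $f_0$, we have

\[
f_0'(\phi, u)[\varphi] = \sum_{i \in \{1, \ldots, m\}} \langle g_{0i}, \varphi \rangle = \langle g_0, \varphi \rangle. \tag{17}
\]

Moreover, for the shape derivative of $f_1$, we have

\[
f_1'(\phi)[\varphi] = \int_{\Gamma_N(\phi)} \nu \cdot \varphi \, d\gamma = \langle g_1, \varphi \rangle. \tag{18}
\]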
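The volume derivative (18) can be checked numerically on a simple shape. The following Python sketch is an illustration under our own assumptions, not part of the paper's program: the domain is a disk of radius 25 mm approximated by a polygon, no part of the boundary is fixed (so the whole circle plays the role of $\Gamma_N(\phi)$), and the domain variation is the unit outward normal field. In that case the boundary integral of $\nu \cdot \varphi$ is the perimeter, which should agree with a finite-difference derivative of the area. All function names are hypothetical.

```python
import math


def circle_points(radius, n):
    """Vertices of a regular n-gon inscribed in a circle, counter-clockwise."""
    return [(radius * math.cos(2 * math.pi * k / n),
             radius * math.sin(2 * math.pi * k / n)) for k in range(n)]


def polygon_area(pts):
    """Shoelace formula for a simple polygon with counter-clockwise vertices."""
    n = len(pts)
    return 0.5 * sum(pts[k][0] * pts[(k + 1) % n][1]
                     - pts[(k + 1) % n][0] * pts[k][1] for k in range(n))


def boundary_integral(radius, n, normal_speed):
    """Approximate the integral of nu . varphi for varphi = normal_speed * nu."""
    pts = circle_points(radius, n)
    total = 0.0
    for k in range(n):
        (x0, y0), (x1, y1) = pts[k], pts[(k + 1) % n]
        # nu . varphi equals normal_speed on every boundary segment
        total += normal_speed * math.hypot(x1 - x0, y1 - y0)
    return total


radius, n, eps = 25.0, 2000, 1e-6

# Finite-difference derivative of the area when the boundary moves outward.
finite_difference = (polygon_area(circle_points(radius + eps, n))
                     - polygon_area(circle_points(radius, n))) / eps

# Right-hand side of (18) with varphi equal to the unit outward normal.
shape_derivative = boundary_integral(radius, n, 1.0)

print("finite difference :", finite_difference)
print("boundary integral :", shape_derivative)
print("2 * pi * radius   :", 2 * math.pi * radius)
```

The three printed numbers should agree up to the polygonal approximation error, which shrinks as `n` grows.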

6. Solution

The algorithm for solving Problem 2 can be constructed based on sequential quadratic programming [6]. In this algorithm, the $H^1$ gradient method is used for reshaping with the shape derivatives $g_0$ and $g_1$ in (17) and (18), respectively.
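The paper does not spell out the $H^1$ gradient method and refers to [6] for it. As a rough illustration only, one common reading is that the domain variation is obtained by solving a boundary value problem with a coercive bilinear form on $H^1$, with the negative shape gradient as the load. The following one-dimensional finite-difference sketch in Python assumes the bilinear form $\int (\varphi' \psi' + c\, \varphi \psi)\, dx$, homogeneous Dirichlet conditions at both ends (standing in for $\Gamma_{D0}$), and hypothetical names and data. It is not the OPTISHAPE-TS implementation.

```python
def h1_gradient_1d(g, length=1.0, c=1.0):
    """Solve -phi'' + c * phi = -g with phi = 0 at both ends.

    g holds nodal values of the shape gradient on a uniform grid.
    Central differences give a tridiagonal system, solved by the
    Thomas algorithm.
    """
    n = len(g) - 1                 # number of grid intervals
    h = length / n
    m = n - 1                      # number of interior unknowns
    off = -1.0 / h**2              # constant off-diagonal entry
    diag = [2.0 / h**2 + c] * m
    rhs = [-g[k] for k in range(1, n)]

    # Forward elimination.
    for k in range(1, m):
        w = off / diag[k - 1]
        diag[k] -= w * off
        rhs[k] -= w * rhs[k - 1]

    # Back substitution.
    phi = [0.0] * m
    phi[-1] = rhs[-1] / diag[-1]
    for k in range(m - 2, -1, -1):
        phi[k] = (rhs[k] - off * phi[k + 1]) / diag[k]

    return [0.0] + phi + [0.0]


# Hypothetical oscillating nodal gradient on 51 nodes.
g_nodes = [1.0 + 0.5 * (-1) ** k for k in range(51)]
phi = h1_gradient_1d(g_nodes)

# <g, phi> is negative, so phi is a descent direction for the cost.
pairing = sum(gk * pk for gk, pk in zip(g_nodes, phi)) / 50
print("discrete <g, phi> =", pairing)
print("largest jump between neighbouring nodes of g  :",
      max(abs(g_nodes[k + 1] - g_nodes[k]) for k in range(50)))
print("largest jump between neighbouring nodes of phi:",
      max(abs(phi[k + 1] - phi[k]) for k in range(50)))
```

The point of solving this auxiliary problem, rather than moving the boundary by $-g$ directly, is that the resulting variation is smoother than the gradient itself and still decreases the cost to first order.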

7. Numerical example

We developed a computer program to solve Problem 2. In the program, a commercial finite element program, Abaqus 6.9 (Dassault Systemes), is used to solve Problem 1 and Problem 3. Moreover, OPTISHAPE-TS 2011 (Quint Corporation) is used to solve the boundary value problem in the $H^1$ gradient method.

Fig. 1(a) shows a finite element model of the rubber bushing used as an example. The diameter of the outer cylinder is 50.0 [mm]. The outer and inner cylinders are assumed to be the homogeneous and non-homogeneous Dirichlet boundaries, respectively. The nodes on the inner cylinder are connected with a rigid element. The arrow of $u_D$ shows the compulsory displacement of the rigid element, the magnitude of which is 5.0 [mm]. For $f_0$, we assume that $m = 3$ and $(\|u_D(t_1)\|, \|u_D(t_2)\|, \|u_D(t_3)\|) = (2.5, 3.75, 5.0)$ [mm]. For $\alpha_1$, $\alpha_2$ and $\alpha_3$, we use a 10% decrease, no change, and a 10% increase relative to the values of $u_D \cdot \left( \Pi^T(u) \nu \right)$ at $t = t_1$, $t_2$ and $t_3$, respectively.

[Fig. 1. Finite element models of simple rubber bushings: (a) initial, (b) optimized.]

Fig. 1(b) shows the optimum shape obtained by the developed program. The reaction force of the rigid element defined by $\left\| \int_{\Gamma_{D0}} \Pi^T(u(t))\, \nu \, d\gamma \right\|$ with respect to compulsory displacement $\|u_D(t)\|$ is shown in Fig. 2(a). Fig. 2(b) shows the iteration histories of the cost functions with respect to the number of reshapings, where $f_{0\mathrm{init}}$ and $c_1$ denote the values of $f_0$ and the volume, respectively, for the initial shape.

[Fig. 2. Graphs for shape optimization analysis: (a) reaction force functions (reaction force [N] versus displacement [mm] for the initial and optimal shapes), (b) iteration histories (cost functions normalized by $f_{0\mathrm{init}}$ and $c_1$ versus iteration number of reshaping).]

Based on these results, $f_0$ decreases monotonically under the constraint of $f_1$ (Fig. 2(b)), and the desired reaction force function is obtained (Fig. 2(a)).

In addition, Fig. 3 shows the distributions of the von Mises stress of the initial and optimum shapes at $t = t_3$. The results confirm that, as a result of increasing the reaction force at $t = t_3$, the von Mises stress in the optimum shape increases.

[Fig. 3. Initial and optimized von Mises stresses: (a) initial, (b) optimal; color scale from 0.0 MPa to 4.0 MPa.]

References

[1] G. Marckmann and E. Verron, Comparison of hyperelastic models for rubber-like materials, Rubber Chem. Technol., 79 (2006), 835–858.

[2] S. Wenbin, L. H. Zhen and S. Jianjun, Finite element analysis of static elastic characteristics of the rubber isolators in automotive dynamic systems, Technical Report 2003-01-0240, 2003.

[3] L. R. Wang, Z. H. Lu and I. Hagiwara, Finite element simulation of the static characteristics of a vehicle rubber mount, Journal of Automobile Engineering, 216 (2002), 965–973.

[4] J. J. Kim and H. Y. Kim, Shape design of an engine mount by a method of parameter optimization, Comput. Struct., 65 (1997), 725–731.

[5] Q. Li, J. Zhao, B. Zhao and X. Zhu, Parameter optimization of rubber mounts based on finite element analysis and genetic neural network, J. Macromol. Sci., Pure Appl. Chem., 46 (2008), 186–192.

[6] H. Azegami, Regularized solution to shape optimization problem (in Japanese), Trans. JSIAM, 23 (2014), 83–138.

[7] J. Bonet and R. D. Wood, Nonlinear Continuum Mechanics for Finite Element Analysis, Cambridge University Press, Cambridge, 1997.


JSIAM Letters Vol.6 (2014)

ISBN : 978-4-9905076-5-7

ISSN : 1883-0609

©2014 The Japan Society for Industrial and Applied Mathematics

Publisher :

The Japan Society for Industrial and Applied Mathematics

4F, Nihon Gakkai Center Building

2-4-16, Yayoi, Bunkyo-ku, Tokyo, 113-0032 Japan

tel. +81-3-5684-8649 / fax. +81-3-5684-8663
