1SE-TRd-90-86
On the Efficiency of an SOR-like Method Suited to Vector Processor
by
Masaaki Sugihara, Yoshio Oyanagi, Masatake Mori and Seiji Fujino
October 4, 1990
Qn the Efiiciency of an SOR-like Method Suited to
Vector Processor *
Masaaki Sugihara
Depαrtment of EconomiC8, Hitotsiめashi Univers吻
Ifunitachi, Tokyo, 186 Japan
Yoshio Oyanagi
Institzeteげlnformαtion Sciences, Univers吻。!TSttkubα
Tsukuba,乃a rα ki,305」’αPαn
Masatake Mori
Fac吻・!恥卿εε7吻, Unive7’吻・ノT・ky・
Bu吻。一紘Tokyo, 113 」αpan
Seij i Fuj ino
Institute of Computational 171uid Dynamz’cs
Haramachi 2-1-4, Meguro-ku, Tokyo, 152 Japan
October 199. 0
Abstract
A simple implementation on vector processors of the SOR method for solving linear
system of equations ari’sing’ from’the discretization of partial differential equations makes
the SOR method into another, which, though, looks like the SOR met・hod, lii this
paper, the efficiency of this SOR-like method (pseudo-SOR method) is investiga,tcd,
“lnvited talk presented by Oyanagi at lnternatio7}al Congress en Computational an(1 A;)p17ied A・fathematics
held at Katholieke Universiteit Leuven, Belgium, on July 23-28, 1990.
1
For the Poisson equation in a rectangular region, in both five-point and nine-point
discretization, we plove a皿alytica皿y that the optimal acceleratio皿1)arameter is smaller
and the optimal convergence rate is lower in the pseudo-SOR method. Comparison
on several vector computers was made between the SOR method vectorized with the
hyperplane technique and the pseudoSOR method. lt turned out that the hyperplane
SOR is superior to the pseudo-SOR method although the latter is easily vectorizable on
vector processors. We also tested an example of Poisson equation descretized in a curved
coordinates in a region bounded by two eccentric circles and found numerically that
the optimal acceleration parameter becomes small and the convergence becomes slow in
the pseudo-SOR method, while the optimal acceleration parameter in the SOR method
does not change appreciably. The genuin SOR method is superior to the pseudo-SOR
method.
1 Introduction
Various vectorization techniques have been proposed to improve the computational speed
on vector processors. The essential ingredients are to remove recursion in loops and to make
the vector length as long ag possible. We will consider the vectorization (>f the SOR(Successive
OverRelaxation) method for two-dimentional partial differential equations.
The following discussion holds for general linear second-order elliptic partial differential
equations. For simplicity, however, we will start with the Poisson equation in a rect angular
re 9i・nの={@,y)10<x<1,0<y<1},
△一1舞+1舞一・・nD,姻一ψ(ゆn∂P・ (1)
Discretization of the second derivatives of (1) by five-point or nine-point finite differences
on an (N 十 1) × (N 十 1)一grid with grid spacing h = 1/N leads to a set of linear equations:
σ臼,ゴ+σi+、,ゴ+σらゴー、+σらゴ+一4Ui,ゴ=’ O (2)
or
4(Ui-i,」 + Ui+i,」 + Ui,」一i + Ui,」+i) + Ui+i,」一i + Ui+i,」+i + Ui-i,」一i + Ui-i,」+i 一一 20Uif一一・一一 O (3)
for i,2’ 一一“= 1,2,...,N 一 1, which form a system of linear equations for t,he solution vector U.
2
Eqs.(2) and (3) can be solved with the polular SOR method. We vvill show here a program
for the five-point case:
DO 10 J=1, N-1
DO 20 1=1, N-1
U(1,J)=U(1,J)+OMEGA*(O.25*(U(1,J-i)+U(1,J+1)+U(1-1,J)+U(1+1,J))一U(1,J))
20 CONTINUE10 CONTINUE
Here, OMEGA(w) is the aceeleration parameter. Running with i and 2’ from 1 to IV 一一 1, two
neighboring grid points of (i,1’), (i 一 1,」’) and (i,2’ 一一 1) have already been updated, so that
Ui,」 i computed with these new values and old values of Ui+i,」 and Ui,」t+i. The SOR method,
therefore, is recursive in both i一 and 2Ldirections. This data dependency makes this loop
hard to be vectorized.
There are some well-known techniques to vectorize the SOR method [1]. We will consider
here a so-ca皿ed hyperplane method[2], which essentially reorders the numbering of the lattice
points without changing the algorithm. The program will be as follows:
DO 10 K=2, 2*(N-1)
*VOPTION NODEP(U)
DO 20 J=MAX(1,K+1・一N), MIN(N-1,K-1)
工嵩K-J
U(1,J)=U(1,J)+ONEGA*(O.25*(U(1,J-1)+U(Z,J+1)+U(r-1,J)+U(1+1,J))rU(1,J))
20 CONTIN’UE10 CONT工]EfUE
This program is methematica皿y equivalent to the SOR method. The inner loop has no
recursion and the access to the array U is with only constant strides, Howe’ 魔?s the hyperplane
SOR method involves short average vector length (N 一 1)2/(21V 一一 3) f」 N/2 and ordinary
vectorizing compilers cannot recognize that there are no data dePendencies, so that we have
to insert a directive, *VOPTION in this example, to the compiler. For some vector computers,
access with stride more than one may cause a decrease in the data transfer rate from memory
to vector registers and shorter vector length may increase relative start-up time. Nine-point
discretization can also be processed by a hyperplane method, but with average vector length
fu N/3.
A more naive implementation of this loop on a vector processor is to modify the program
as
3
DO 10 J=1, N-1
DO 20 1=1, N-1
V(1)=U(工,J) + OMEGA*(0.25*(U(1,J-1)+U(1,J+1)+U(1-1,J)+U(工+1,J))一U(1,J))
20 CONTIN’UE DO 30 工=1, N-1
30 U(1,J)=V(1)
tO CONTINUE
In this program七he data dependency in the SOR method is removed so that the loops
can be successfully vectorized. The program involves only stride one access to the memory.
Obviously this is a different algorithm, since Ui-i.・ on the right hand side comes from the old
i七eration Ievel in contrast to the SOR method. We will call it the pseu(1(”SOR method. The
pseudo-SOR method is sometimes used as a simple vectori zation technique in computational
fluid dynamics.
In the next section, the optimal acceleration parameter w and the rate of converngence of
the two methods for five point discretization is discussed. Similar arguments for nine-point
discretization are given in section 3, Numerical results on various vector processors are shown
in section 4, inc}uding one for curvilinear coordinates. ln section 5, we will conclude that the
hyperplane SOR is superior to the pseudo-SOR method.
2 Comparison of SOR and pseudo-SOR Methods for
Five-Point Discretization
2.1 Matrix Formulation
Equation(2) forms a system of liner equations with matrix As for the solution vector u. If
we order the elements in the solution vector lexicographically as
(Ul,1)Ul,2;…7Ul,N一一1)112,1)U2,2)…)U2,N一一1)..・)UN-1,N-1))
the coefficient matrix A.r is,
4
with
Bs =
1
-1 4
o
1-m一@4
1
As :
1
4
一 一 一
Bs
Cs
o
o
一一一 獅P
oBs Cs
一 一 一 “ 一 一
一 一 一 一 一 一
Cs Bs Cs
Cs Bs
一1
4
,Cs =
o
’
1
4
1
4
o
1
4
The iteration in the SOR method is written as usual,
u(m+i) = lcgORu(m) + b’ = (1 一一 cvL,SOR)一’((1 一 c,7)1 + cvU,SOR)u(M) + b’
where
and
SORL,Dv. =
一.Bま
一〇、一瑳
o
o
一C, 一B,L
,
B,L =
σ『o丑=(五ぎOE)T.
o
一一 1
4
o
o
o
一1 0 4
On the other hand, the iteration in七he pseudo-SOR method is forlnulated
u(m+i) = ,ce,一SORu(m) + bt ,., (1 . cdLe,一SOR)一i((1 一 co)1 + cou,p’SOR),t,(m) + btt
where
o O
-C, O
.Lg-SOR,=1 …,.. 1,
0 一C,0
5
’
(4)
(5)
(6)
(7)
(8)
(9)
(10)
and
(一B,L-B,U) 一Cs O (一B,L-B,U) 一Cs
u,p-SOR=1 ... ... 1, (11) (一瑠一瑠) 一〇5
0 (一 B,L-B,U)
with B,U 一一一: (B,L)T.
2.2 Acceleration Parameter w and the Spectral Radius
The order of convergence of Eq.(6) and Eq.(9) is determined by the spectral radius of LsSOR
and Lg’一SOR, respectively, As an illustration, p(LgOR) and p(Lg-SOR) are numerically esti-
mated for IV = 6 and shown in Fig. 1. The result shows:
1.The optimal accelelation parameterω。pt f()I the pseudo。SOR method is substantia皿y
smaller than that for the SOR method.
2. The rate of convergence of the pseudo-SOR method using the optimal acceleration
parameter is worse than that of the SOR method.
3. The region of the acceleration parameter which ensures the convergence of the iteration
is narrower for the pseud()一SOR method.
In the next subsection, we will generalize the results to arbitrary N.
2.3 Analytic Determination of the Optimal Acceleration Param-
eter of the SOR and the pseudo-SOR Methods
The optimal acceleration parameter of the SOR method for our case and the correspon(ling
spectra} radius is given by Varga [3]:
cvopt = ir一 Fgtllk’lki;ii’in .z1.),Aopt =p(LgOR(cvopt)) = cc一,.pt 一一 i. (i2)
Asymptotically, c-o.,t =2 一T 3i’li + o( 1N2), A.t =i一 3i’; + o( ?」}i,T)・ (i3)
6
On the other hand, we will later show that the optimal acceleration parameter for the
pseudo-SOR method w.pt and the corresponding spectral radius A.pt are given as the solution
of the following equation with respect to w and A:
憶鰍潜醐 (14)
The asymptotic estimates are:
,..,, = g一 g(.z[一)2+ o(lieT, ), A.,, =1 一一一 (ft)2 + o(7(!:,r). (ls)
We note that the optimal spectral radius A.pt approaches unity in O(N-i) in the SOR
method, while in 0(IV-2) in the pseudo-SOR method. The observation for IV == 6 given in
the previous subsection also applies to large N.
In order to derive Eq.(14), we will find all the eigenvalues of Cg-SOR by solving t he secular
equation. We note t he following equality holds:
det(cg-SOR(c,7) 一一 Ai)
= det((1 一 cvLg-SOR)一i.((1 一 co)1 十 cvU,PtaSOR) 一一 AJ)
=(det(1-cv L9曙SOR))一1 det((レω一λ)1+λcvLg-SOR+ωσぎ一SOR),
(16)
We have only to solve the equation
det((1 一 c,7 一一 A)1 + AcvLg-SOR + co u,p-SOR) = o. (17)
The matrix in the parenthesis is a block triangular matrix:
E F
A.F E lii
- e 一 一 一
o
o
AF E
N 一一 1 blocks
,
7
with
E=
1一一 cL2 一A
里 4
o
里 4
1一 cv 一A
一 一 一
里4
一 一 一
!
4
o
1一ω一λ 景
¥ 1一一cv一一一A 4
N 一一 1 elements
wand F= =1. 4
We will show the following lemma.
エemma 1ゐθ田αnd・C be squαre mαtrices・∫・rder・n,抗θノ・〃・ω吻iden鞠ん・lds・
det
B C
AC B
一 一 一
o
C
一 一 一
AC
o
B C
AC B
: det
Proof Since the first equality is self-evident
matrix of order m with
A=
B V‘]iCC
>仮O B vCXc
一 一 一 { 一 一
viXc
o
o
B vi;Slc
vi51C B
m blocks
一壷d・t(B卿),一一2…(m律1)・(18)
, we will prove the second one. The following
null dia,gonal elements is diagonalized by an orthogonal matrix P.
i O 1 l l 62 ... .” 1 -pl .” Ip-i. (lg)
1 0 1 1 1 6m-i
Replacing ea・h element aiゴwith a bl・・k曜rσand adding B t・the diag・nal b1・cks, we
8
have the folowing relation between blocked matrices of ordel nm.
β〉「AC O B+δ1>『o ’ 0 ~/XO jB ~/ヲ(C 1ヲ十δ2 v!天0
.., ... =戸 ... 戸一1
,
● . ● o ● ● ・ ● 0
0 ~バoB O B+δm>ro (20)
ガwhere」P is a matrix of order nm extended from、P, where p‘ゴis replaced with a block, jpiゴ1,、.
Estimating the determinants on both sides, we obtain Eq.(18). Q.E,D.
We applyもhis lemma to Eq.(17)and丘nd that the equation is equivalent to the system
of equations:
d・t(E+δゴ〉伐∬)一・,δゴー2…(影)
forゴ=1,_,1V-1. Si皿ce」醒十6ゴ〉「A∬is again a七ridiagonal matrix, applying this lemma
again, we丘nd that Eq.(17)is equivalent to the sysmtem of equations forλ:
1一(1+生 4)ω一λ一害孤ω(ゴー1,…,N-1;k・・= 1,...㌍)(21)
The following proposition holds:
Proposition l Theαろsolute vαlues o/th e solutions o/th e egzzαtion(21?is smα〃eT thαn or
eguαl to
・The 1αrgest abs・lute vαlue・f the s・lutions for 1一(1一圭。・s(k))ω一λ=一去。・s(持)V伐ω
07’
・1+圭。・s(T))ω一1・
If this proposition is proved, it is straiglltforward to show thatω。pt is given as the solution
of Eq.(15)、
Pr・gf Eq・(21)・an b・ ・xpressgd a・ th・fa血1y・f quad・atic equati・・曲rλ,
1一αω一λ =士β〉「A,withα〉β≧0. (22)
ω
We will first discuss the case forβ.>0. Figure 2 sllows that Eq.(22)has two real solutiolls
for O<ω≦ω1 and two complex solutions forω1<ωwith the absohlte valueαω一1, where
9
ω1=2/(α十〉両).For real solu七ions, the larger one monotonicaHy decreases asω
increases and the absolute value of complex solutions increases as w increases.
It c an easily be understood that the absolute value of the complex solution of Eq.(22) is
smaller than or equal t o (1 十 t2 cos(ft))w 一 1, where 2 cos(ft) is the maximum value of 6k.
Let us consider the re al solutions. Figure 3 shows the curves for two sets of equations
with the same a and different fi, fi2 〉 fii. lf the equation with fii has real solutions, the
equation with fi2 has also real solutiohs whose larger value is greater than that for 6i. Figure・
4 shows the curves for two sets of equations with the same fi and different a, or2 〉 ori. It
can easily be seen that, if the equation with a2 has real solutions, the equation with ai has
also real solutions whose larger value is greater than that for a2.
From these observations the absolute value of the real solution for Eq.(22) is always
smaller than or equal to the larger solution for Eq.(22) with the smallest or and the largest
6. The real solution of Eq.(21) is no greater than the larger solution of
1 一一 (1 一 icos(ft))cL, 一一 A = 一icos(ft-v/]Xcv).
This completes the proof for fi 〉 O.
If 6 = O, Eq.(22) is a linear equation. The same arguments hold if real and complex
solutions are understood as positive and negative solutions, respectively. Q.E.D.
The numerical results of w.pt and A.pt are shown in Table. 1.
3 Comparison of SOR and pseudo-SOR Methods for
Nine-Point Discretization
In this section we wi皿analyse the SOR and pseudo-SOR method for the nine-point
discretization of Poisson equation, Eq,(3). In the SOR method, Ui,,・ i s computed with four
new values Ui-i,」’, Ui-i,」’一i, Ui,」’.i and Ui,」+i, while in the pseudQSOR method three new
values Ui-i“・一i, Ui“’一i and Ui,」+i are used, so that we expect the degradation in using pseude
SOR method may not be so large as in the five-point discretization.
10
3.1 Matrix Formulation
For the nine-point discretization, the linear equation is described in the matrix form,
B, Cg’ O
Cg Bg Cg
Agu=b, Ag=1 ...... 1, (23)
Cg Bg Cg
O C,B,
wiもh
1-9 0 -t一売 0
一一去1一去 . 一売一一去一売
Bg=1 ”.”. 1,C,一1 .” ”, 1. (24) 1 1 1 一 一 一 一 一 一 20 5 20
0 一gi7 kO 一“,一g
We皿ote that Og is not a diago皿al matrix in contrast to O5. The iteration in the SOR
method is,
.(m+i)=,C80Ru(M)+bt==(1一一ctiL,SOR)一i((1-cu)1+coU,SOR)u(M)+bt (2s)
where
一瑠 o o O
-C, 一B,L 1 1一一gOL,SOR eI .,. ., 1,B,L一一.p 1 ...... 1, (26)
一C,一B,L 1 1 .一一・
0 一σ9一増 0 一去・
and
U,SOR ., ,(L,SOR)T. (27)
On the other hand, the iteration of the pseudo-SOR method for nine-point discretization
is formulated in the same way as for five-point discretization,
u(m+i):,cB-SORu(m)+bt=(1-cvLgTSOR)tsi((1-cv)1+cou,P-SOR)u(m)+b” (2s)
11
where
O o
“Cg 〇
五召一50R= ...,.. , (29)
一一C, 0
0 一一 C, O
and
(一Bポ殉 一〇・ 0 (一B,L-B,U) 一Cg
u,p-SOR,.1 ... ... 1. (30) (一B,L-B,U) 一Cg
O (一B,L-B,U)
3.2 Acceleration Parameter w and the Spectral Radius
First we estimated numerically the spectral radia of LgOR(w) and Lg-SOR(w) for IV = g. The
results are shown in Fig. 5. The same statements holds as in Section 2.2. Although three
new values are used to update Ui“・, the pseudeSOR method is considerably inefficient, We
will discuss the convergence of these two methods for general N.
3.3 Analytic Determination of the Optimal Acceleration Param-
eter of SOR and pseudo-SOR Methods
In a similar manner to the five-point case, we have to solve det((1 一一 cv 一 A)1 十 Acx7LgSOR 十
wUgSOR) == O, to obtain the eigenvalues of LgSOR. Using Lemma 1 twice, the problem is
reduced to a set of equations,
レω一λ一6ゴ(ω/5)vCX 一 6,(ω/5)(λ一(δゴ/4)孤)(1一(δ、/4)一N/‘iY)=0, (31)
for 2’ 一一一 1,...,IV 一一 1,k = 1,...,N一 1. Based on our numerical experience, we conjecture that
the optimal acceleration parameter w.pt is given by the w for which the largest real solution
coincides wi七h the absolute value of the complex solutio皿of the quartic equaもion wi七h Iespect
to A,
(1 一 ct7 一一一 A)4 一一 262(8 + 62)(!ll)2A(1 一一一一 cv 一 A)2 + 68(llll)‘ +
12
166‘(w5)3A(A+ 1)(1 一 cv 一 A) 一一一一 466({ilA(A+ 1)2 = O. (32)
Assuming this conjecture, we calculated w.pt and A.pt numerically for IV S 1000 and found
an approximate formula: 5.61 6.63
cuopt f;2一 :[i’i(1:VV) Aopt f s 1一” E’iiiir:V““・ (33)
On the other hand, in a similar manner to the伽e-point case, the opもimal acceleration
parameter and the corresponding spectral radius for the pseudo-SOR method are determined
in terms of the se七〇f equation:
1 一 (1 一 g ,os(ft,)),, 一 A = 一!ill ,.,(li(i,)(2 + ,.,(ft,))vlx
2 .r
A=(1-Eicos(:ii’7・))co-1. (34)
Asymptotically,
,,,.,=IUti一一:51(k)2+o(liel,T), A.,,=1-g(ft)2+o(lier,). ’ (3s)
Table 2 shows the numerical values for w.pt and A.pt. VSie can see that the optimal accelera-
tion parameter for the pseudo-SOR method is substantially smaller than that for the SOR
method and the rate of convergence of the pseudo-SOR method using the optimal acceleration
parameter is worse than that of the SOR method,
4 Numerical Results on Vector Processors
We have seen that the pseudo-SOR method is inferior to the original SOR method in
terms of the convergence ratio for both five-point and ninepoint discretization, so that the
former method requires more number of i七erations than the latter. On the other hand, the
pseudeSOR method is very suited to vector supercomputers, since it involves only stride
one access and has longer vector length than the hyperplane SOR. We will implement those
method on various supercomputers and compare the CPU time for solving the problem.
13
4.1 Dirichlet Problem on a Rectangular Mesh
As an example, we will consider a Dirichlet problem on the unit square with the following
boundary conditions:
U(O,g)=O, U(1,y)=O (O〈y〈1)
u(x,o) = o, U(x,1) = O.5 一 lx 一一一 O.51. (O 〈 x 〈 1)
4.1.1 Numerical Verification of w.pt for IV =6
As a numerical test, we discretized the square by six in one direction (N = 6) and solved the
equation. We show in Figs.6-9 the behavior of convergence for SOR method and the pseudo-
SOR method for both five-point and nine-point discretizations in ternrs of the Li-norm of
the residual vector. The initial vector u(O) is a zero vector. The calculations were performed
in the double precision. Figures (a) are for cv 〈 wopt and (b) for w 〉 co.pt. The theoretical
optimal acceleration par ameters w.pt for N = 6. are given in Table 1. Those figures show that
the predicted w.pt is actually optimal.
4.1.2 Performance on Vector Computers
In order to make a realistic evaluation, we have tested the same problem with more grip points
and measured the performance on various vector supercomputers. We only applyed the five
poin七discre七ization, We tested七en cases with/V=45,63,77,89,99,109,119,ユ27,135,141.
The acceleration parameter is fixed on the thgoretical optimal value. The initial vector is
null and the computations were done in the double precision. The convergence criterion is
that the L2-norm of the residual is less than 10-6. The computers used are,
Vendor Model Max. Speed
a) Fujitsu
b) Fujitsu・
c) Hitachi
d) NEC
e)C・皿vex
f) Ardent
g) Stellar
VP-200
VP-400E
S-820/80
SX-2
C-1 XP
TITAN
GS 1000
570 MFLOPS
ユ700MFLOPS
2000 MFLOPS
1300 MFLOPS
20 MFLOPS
ユ6MF:LOPS
40 MFLOPS
14
The CPU time spent till the convergence is shown in Fig.10 as a function of the number
of grid points, The hyperplane vectorized SOR is superior to the pseudo-SOR for all the
supercomputers tested here.
4.2 Dirichret Problem on a Curved Mesh
In general curved coordinate (〈, ny), the ’Laplacian contains a cross term a2u/O〈Oo, which
is discretized in terms of (Ui+i“・+i 十 Ui.一i,」’.i 一一 Ui+i,」一i 一一 Ui-i,」’+i), so that the discretized
equation is a nine-point difference formula in general [4]. We test here two cases, a) where
the numbers of grid points in two dimensions are the same, and b) where the number of grid
points in the horizontal direction is larger than the one in the vertical directibn.
a) The domain is shown in Fig.11. The boundary values are 20 on the top, 50 on
the bottom and have interpolated values on the right and left sides. The grid points are
462,622,782,902,1022,1102,1182,1262,1342,1422. The initial value is all O and the conVer-
gence condition is that the L2-norm of the residual vector be less than 10-6. Since the optimal
acceleration parameter is not known theoretically, we searched the optimal one in step O.02.
The CPU time for the SOR and pseudo-SOR methods is shown in Fig.12. The hyperplane
vectorized SOR method is much faster than the pseudfrSOR method even in the nine-point
discretization.
b) In this case a horizontalIy long domain with curved boundaries is treated(Fig.13). We
tested three types of grids, 40×82, 40 × 162,40 ×322. Other conditions are same as in a). Table
3 shows the CPU time for VP-200 and S-820/80. Although a long vector length is favorable
for the pseudo-SOR method, the result in this table indicates, the larger the number of grid
points, the faster the hyperplane SOR method as compared to the pseud(>SOR method.
4.3 Mixed Boundary Problem on a Circular Mesh
In this subsection we wi11 discuss Poisson equation
O2u 02u 52tilii + 3ii71/ = 一一2sin(x)sin(y)
in regions bounded by two eccentric circles, as shown i.n Fig,14. VkTe tested six kinds of
grid with different eccentricity and concentricity [5]. Such grids are used in calculating
15
the stream around a circular cylinder. Two kinds of boundary conditions are assumed: a)
Dirichlet contition on all boundaries, and b) Dirichlet on the inner boundary and Neumann
on the outer boundary. The size of the grid is 16(r-direction) × 50(e-direction). The analytic
solution is u = sin(x)sin(y).
The optimal acceleration parameter and the corresponding spectral radius are searched
numerically in step O.02. The results are shown in Table 4 for the six kinds of grid in Fig.14.
The optimal acceleration parameters in the hyperplane SOR method are stable for both
Dirichet and mixed conditions with respect to the eccentricity, while those the pseudo-SOR
method changes appreciably. ln the case of hyperplane SOR method, the optimal acceleration
parameter for the mixed condition is larger by O.12-O.18 than that for Dirichlet condition,
while in the case of the pseudQSOR method, there is no difference.
Moreover, the spectral radius for the optimal acceleration parameter for the mixed con-
dition is larger than that for Dirichlet condition. Especially, the spectral radius for the
pseudo-SOR method is nearly 1.0, the convergence limit. Fig.15 shows the numbers of iter-
ations for different acceleration parameters. It is to be noted that the pseudo-SOR method
is very critical, in the sense that an acceleration parameter slightly larger than the optimal
value may cause divergence of the iteration. On the other hand, the converge’nce region for
the hyperplane SOR method is large.
5 C o nclusio n
We have aI1記ysed both theoretically and numeIica皿y the performance of the genuine
SOR and pseudo-SOR. Our observation is as follows.
1. For the rectangular region, in both five-point and nine-point discretiz ation, the asymp-
totic convergence rate is 1 一 C/IV for the SOR method, while 1 一 C’/N2 for the pseudQ
SOR method. Although the latter is easily vectorizable, the overa11 efliciency of the
former is higher.
2. Even when the aspect ratio of the grid is large, which case is favorable for the pseudo-
SOR method, this method is inferior to the hyperplane SOR method.
3. ln the curved mesh discussed here,
16
(a) As the eccentricity increases, the optimal acceleration parameter becomes small
and the spectral radius approches unity in the pseudo-SOR method. On the other
hand, the optimal acceleration parameter in the hyperplane SOR does not change
appreciably.
(b) ln the pseudcHSOR method, acceleration parameter slightly larger than the opti-
mal value may cause divergence.
(c) When the outer boundary is with Neumann condition, the spectral radius for the
pseudo-SOR method is nearly unity.
We conclude the pseudo-SOR method is less than the genuin SOR method in efficiency
on vector supercomputers, although the former is easily vectoriz able with unit stride. The
rate of convergence should be most respected in the vectorization of an algorithm.
AcknowledgementThis work is partly supported by the Grand-in-Aid for General Scientific Research of The
Ministry of Education, Science and Culture of the Japanese Government (No.01540169).
References
[1] J. M. Ortega, ”lntroduction to Parallel and Vector solution of Linear Systems”, Plenum
Press, New York and London, 1988,
[2] L. Lamport, ”The Parallel Execution of DO Loops”, Comm, of ACM, 17, pp.83-93, 1974.
[3] R. S. Varga, ”Matrix lterative Analysis”, Prentice-Hall, New York, 1962.
[4] J. F. Thompson, Z. U. A. Warsi, ”Numerical Grid Generation”, North Holland, 1974.
[5] T, Kawamura, H. Takami, K. Kuwahara, ”Computaion of High Reynolds Number Flow
around a Circular Cylinder with Surface Roughness”, Fluid Dynamics Research, 1,
pp.145-162, 1986.
17
Table 1 co,pt and A,pt for five-point discretization
SOR method pseud(トSOR method
ω。μ λ、ゆ ω。μ λ。μ
1V=6 1.33333 0.33333 1.23431 0.76878
ノV=10 1.52786 0.52786 1.29285 0.90764
1V=20 1.72945 0.72945 1.32259 0.97584
1V=50 1.88183 0.88183 1.33158 0.99606
2V=100 1.93909 0.93909 1.33289 0.99901
1V=1000 1.99373 0.99373 1.33332 0.99999
Table 2 wopt and Aopt for nine-point discretization
SOR method pseud(トSOR method
ω。Pε λ。P・ ω。P・ λ。μ
1V’=6 1.31393 0.37071 1.26184 0.69896
1V=10 1.50902 0.56335 1.35459 0.86991
ノV=20 1.71627 0.75377 1.40799 0.96425
!V=50 1.87542 0。89351 1.42517 0.99411
N=100 1.93567 0.94529 1.42772 0.99852
ノV=1000 1.99337 0.99439 1.42856 0.99999
Table 3 CPU time in sec for the optimal acceleration
grid 40×82 40×162 40×322
pseudo-SOR
?凾垂?窒垂撃≠獅?@SOR
0.49 2.55 17.2
O.27 0.72 3.29
(VP-200)
grid 40×82 40×162 40×322
pseud(トSOR
?凾垂?窒垂撃≠獅?@SOR
0.17 0.82 9.08
O.08 0.27 1.69
(S-820/80)
18
Table 4 tu,pt and popt for an eccentric double circle
mesll pseud(》SOR hyperplane SOR
ω・μ ρ・μ ω(μ ρqp舌
Fig.14(a) 1.22 0.971 1.68 0.720
Fig.14(b) 1.12 0.971 1.70 0.717
Fig。14(c) 1.00 0.977 1.70 0.782
Fig.14(d) 1.20 0.970 1.68 0.720
Fig.14(e) 1.14 0.975 1.72 0.766
Fig.14(f) 1.02 0.975 1.72 0.766
a) Dirichlet condition on both boundaries
mesh pseud(トSOR hyperplane SOR
ω・Pε ρ・P診 ωqP舌 ρqp舌
Fig.14(a) 1.26 0.991 1.84 0.840
Fig.14(b) 1.16 0.993 1.88 0.856
Fig.14(c) 1.02 0.994 1.82 0.953
Fig.14(d) 1.26 0.998 1.88 0.900
Fig.14(e) 1.16 0.998 1.88 0.913
Fig.14(f) 1.00 0.997 1.80 0.924
b) mixed boundary conditions
19
Figure Captions
1.
2.
3.
4.
5.
6.
7.
8.
9.
10.
Spectral radii of L,SOR(cx7) and Lg一一SOR(cv).
Parabola士βへ/:X and lines(1一αω一λ)/ω)withω〈ω1,ω=・ω1 andω〉ω1・
Line (1 一 cvco 一一 A)/cv) and parabolas ±x3v’ZX for (6i 〈 P2).
Par ab ol as ±60 V‘jX and Iines (1 一一 cMcv 一 A)/co) for (cvi 〈 cu2).
Spectral radii of C,SOR(cv) and 1 g-SOR(c,?).
Convergence behavior of the SOR method with five-point discretization.
Convergence behavior of the pseudo-SOR method with five-point discretization,
Convergence behavior of the SOR method with nine-point discretization.
Convergence behavior of the pseudo-SOR method with nine-point discretization.
Comparison of the CPU time on several vector computers (five-point discretization).
The horizontal a・xis is the number of grid points and the vertical axis is the CPU time
in seconds.
11. The domain for analysis.
12. Comparison of the CPU time on several vector computers (nine-point discretization in
curved mesh).
13. The domain for analysis.
14. The grid systems of eccentric double circles.
15. The number of iterations needed for convergence as a function of the acceleration pa一
rameter for (1) pseudoSOR and (II) hyperplane SOR method, The boundary condition
is Dirichlet and the grid systems are (a), (b) and (c) in Fig. 14.
20
・,Ji i)
le5
雪
空
t6 i.o
誤
O.5t(cos(g))=v一? 一ii =o.437t一..
11
0.0/
1eO
1
/
/
1
P(瑠凸))
ノ
1
1
111
ロ i
ア(劇ω 1))
i
i
:
1.5 2.0 u,)re laxat ion factor
F i g . 1
一 ct
1
p /x
j)t一’
(W = U・)1)
F i g. 2
22
一輪
1
ρ2灰
Bi 1X
: i i’
/
F i g. 3
x
一戸i!叉
一P 2M
23
1
噺吻
2@ i@ i●圏幽 」■● 鯛■● ■嗣) ●■画 一 1…フ、凶ω一λ
危1天
A
・巫
C‘Ci)
Fi g. 4
24
Jl’
g(cos(g))一fg=o.34i6...
1.5
も0 P(縄澱ω))
冨
L一鋤。 ■9■■DO■顧齢●■馳■一一■ ■一■■爾 ロー ●願■・騨。■膠口■.陶
8紆 {
@ 3
P/ P(趣))0.5 1 1
!1 1
@ 62
46…33
()0
@ 1.0 1.5 2Drelaxation facto
F i g . 5
ほ邸
慶
℃リリ
ga
o』
ロ〇一一s’
翼,gi IT,
鍵
静碓1.3ω箋i∵
の6;
℃ の
の①』
コ〇一
“
fo一
章鶏
llω議
9i 、鍋慣ul =1.7
to =1.9
,
to =1.8
爲
o No. of iterations
(a)
,
50 o No. of iterations
(b)
,
so
F i g. 6
一
5℃ の
mの』
の〇一
”i
翼
鱒?3
担hヨω=ωopt 園口。」
覇補
to =1.0
/
争ω=1・1to tl.2
;ゴ=1●鶴1・6
93
o No. of iterations 50
e“;, .F
蓉8塁
鉾ヨ2 ヨ
?3
鷺蜀績
岱
CO = (D OPt
w =1. 5
to =1.4
ul =1.3
鎚
o No. of iterations so
(a) (b)
F i g. 7
コ
〇一 9
り㊦=
で ロ
ca
o』
:, g
9ヨ
醤
難9耳
翼。”,...
;
9
/」ωoptヂ
(D =1.0
to =L2to =1. 3
邸,
での ロ
co
o』
一コ
補
雪当
日鷺電もヨ
ー{ωopt
顎a
/
’lx
to tl.4
so
to =1.9
w =1i8(D =1e6 to =1.7
圏
oNo. of iterations o No.’ of iterations se
(a)(b)
F i g. 8
一,
℃ のり
の。』
o.,..,
.,1
導
葺
翼12 i’
昏
〇一
/
to = to opt
(D =1.0
亦
(D ’= 1. 2
の¢コ
℃ ロ
の①
』
o肖
一ロ
0の甲
0り門ほ
〇一〒
oの頃1
の㌣
oの氏8
oI
3.ミ
g・ {
tl g
襲ちヨ
(D =1.9
Nto =L8 a) =1.7
to = w opt
to =1.6
to =L4
叡 w =1.3
“3一一一一’N一一一一一一一一一一一一一一一一一一一’r一一一’一一
R
,
o No. ef iterations
(a)
50 o No. of iterations
(b)
so
F i g一 9
8「一一一「6い_。R/l
8[
6,i
41
2 1
mo
1 04
(a) vp-200
pseudo-SOR
Hyperplane
1 04
(b) VP-400E
8, L,
4 [i
2[
h
2Xl 04
20 Oi
2Xl 04
200
pseudo-SOR
Hyperplane
8
6F
41 1
oL∠一===========コ 104 2XIO4
(c) S-820/80
r
ト
2[
o
pseudo-SOR
Hyperplane
200
r
r pseudo-soR t
L. /Hyperpianeo1
1 04
(d) SX-2
2Xl 04
L
L
L
L
l
loor 江
t’
o
pseudo-SOR
Hyperplane
l
ili oo
ll
l
L
lOOY L
l
r
い
o
pseudo-SOR
ffyperpianel
l
1 04
(e)C-1 XP
2Xl 04 1 04
(f) TITAN
F i g
2Xl 04
. 1 0
104 2xlO4
(g) Gslooo
fiii]
Y
X 扁
40u2。[
セ
0
{103
「
r
亡
1 03/!
1
「
「
の l
P
O
,se、d。.s。R 40i
Hyperplane
lO
104 2×104
(a)VP-200
「
1 「
[
に t
・seudo-SOR @ 「 ロ し
} 0
104 2×104
(e)C-1 XP
2 Ot
/
2×103「一一一「2×103し ii
pseudo’SOR / 1
20
Hyperplane i .
一t, 2xl o//
(b) VP-400E
l
pseudo-SOR/ 1 l o3
kyperpian6
i
o
104 2xlO4
(f) TITAN
F i g . 1 2
i40
pseudo-soR /20
Hyperplane
104 2XIO4 0
(c) S“820/80
pseudo“SOR
Hyperp lane
104 2XIO4
(g) Gslooo
pseud6-soR /zl
1 04
(d) SX-2
Hyperplane ;
2Xl 04
朗
Y
ヨ嵩
El:ti#. il”一:i,}:ILtr’.?.J.].1
螺こに園
海気雌遡i雌
耳隅司塾㌃.一一
鋼製総懸
一L・“U.1.“b
墾 X E3
F i g一 1 3
Y Y
X
N
X
(a)Y
(d)Y
X X
(b) (e)
X
(c) (f)
Fig一 1434
No. of iterations
300:’ic・=r.m
1
200
100 0e9
雛 、へし
@ ’⊆L 、
5 5 .
@ ∫ 5
、口、 、
@ 、口、、 、口、 、《國L-r■」 一
一:
1(b) 「
P
(a)
D
1eO lel le2
aceeleration parameter
Fi g一 15 (1)
le3
霧
No. of iterations
50●1、
q 、 、
、、、 \●
、、 11口 、 !●
I
、自 ノ@ノ
獣\
40、、b 、 ’
@ノ@’
ヒ●、
(c)、、 、 、
1@ 1@ !
@1鴻m
mヒ 、
Q(b)
、、 、 、
’
’
@ノS
b”
3
(a)
2望601.70
acceleration parameter
F i g. 1 5 ( II)
36
INSTITUTE OF INFORMATION SCIENCES AND ELECTRONICS
UNIVERSITY OF TSUKVBA
TSUKUBA-SHI, IBARAKI 305 JAPAN
REPORTA
DOCUMENTATION PAGE
REPORT NUMBER
工SE-TR-90-h86
TITLEOn the E伍ciency of a:n SOR-like Method Suited to Vector
Processor “
AUTHOR (S)
Masaaki Sugihara, Department of Economics, Hitotsubashi University
Yoshio Oyanagi, lnstitute of lnformation Sciences, UniveTsity of Tsukuba
Masatake Mori,Fαculty o/EngineeTing, Univers吻of Tokyo
Seiji Fujino, lnstitute of Computational Fluid Dynamics
REPORT DATEOctober 1990
MAIN CATEGORY
numerica1 analysユS
NUMBER OF PAGES
36
CR CATEGORIES
KEY WORDS
SOR, vector processor, acceleration parameter, hyperplane method
ABSTRACT A・impl・impl・m・nt・ti…nv・ct・・p・・ce…rs・fth・SOR m・th・d f・・s・1・i・g li・…ly・tem・f・quati・n・
arising from the discretization of partial differential equations makes the SOR method into another, which,
though, looks like the SOR method・In this paper, the.e伍ciency of this SOR-like method(pseudo-SOR
コ method)is investigated. For the Poisson equation in a rectangular region, in both五ve-point and nine.point
discretization, we prove analytically that the optimal acceleration parameter is smalIer and the optimal
convergence rate is lower in the pseudo-SOR method. Comparison on several vector computers was
made between the SOR method vectorized with the hyperplane technique and the pseudo-SOR method.
It turned out that the hyperplane SOR is superior to the pseudo-SOR method although the latter is
easily vectorizable on vector processors. We also tested an example of Poisson equation descretized in a
curved coordinates in a region bounded by two eccentric circles and found numerically that the optimal
acceleration parameter becomes small and the convergence becomes slow in the pseudo-SOR method,
while the optimal acceleration parameter in the SOR method does not change appreciably. The genuill
SOR method is superior to the pseudo-SOR method.
SUPPLEMENTARY NeTES .lnvited talk presented by Oyanagi at lnternational Congress on Computational and Applied Mathematics held ,at Katholieke
Universiteit Leuven, Belgium, on July 23-28, 1990.