NUMERICAL LINEAR ALGEBRA FOR RESERVOIR SIMULATION
by
Alda Behie
INDEX
INTRODUCTION
1 Basic Black Oil Model (β-Model)
1.1 Model Assumptions
1.2 Black Oil Equations
1.3 Darcy's Law
1.4 Flow Equations
1.5 PVT Assumptions
1.6 Boundary Conditions
1.7 Relative Permeabilities
2 Difference Methods
2.1 Control Volume Discretization
2.2 Upstream Weighting
2.3 Fully Implicit and IMPES Formulations of Flow Equations
2.4 Dynamic Implicit Formulation of Flow Equations
3 Direct And Iterative Solution Methods
3.1 Structure of the Matrix
3.2 Conditioning of the System
3.3 Direct Solution Methods
3.4 Ordering for Direct Solution Methods
3.5 Iterative Solution Methods
3.6 Classification of Iterative Methods
3.7 Incomplete Factorization Methods
3.8 Treatment of Error Terms
M. B. Allen III et al., Multiphase Flow in Porous Media, © Springer-Verlag New York Inc. 1988
3.9 Ordering for Incomplete Factorization Methods
3.10 Acceleration Techniques
3.11 Convergence Properties
3.12 Block Incomplete Factorization
3.13 Nested Factorization
3.14 Multigrid Methods
4 Associated Topics
4.1 Treatment of Source Terms
4.2 Vectorization of Algorithms
4.3 Comparison of Methods
BIBLIOGRAPHY
INTRODUCTION
The focus in Chapter 3 will be on the solution of large, sparse
sets of linear equations. This will be discussed in the context of the
black oil model, but this is not the only application. The solution
methods discussed here apply equally well to other models, such as
compositional and thermal models, that are commonly used in reservoir
simulation.
Historically, the development of reservoir simulation models has
been along the lines presented in this chapter. That is, first a
description of the physical processes involved, or physical model, is
developed, whether it be black oil, compositional or thermal. Then a
mathematical model is developed. Usually this consists of writing down
the governing partial differential equations, based on mass or component
balance considerations, together with algebraic equations governing the
transfer of mass between phases and/or chemical reactions between components. The
third step is the development of a numerical model to solve the
governing equations. The most common approach is to discretize the mass
(or component) conservation equations to give a set of nonlinear
algebraic equations. Various linearization techniques have been used for
these equations. The most robust approach has been to simply use Newton
iteration. The use of Newton iteration in turn requires the solution of
sets of linear algebraic equations. The solution of these equations
becomes the most computationally intensive portion of the problem. In
some sense then, the "difficulty" of the problem has been transferred to
the solution of the linear equations. Indeed for some time, their
solution was one of the most problematical areas in the development of
computer models. For a number of years most simulation models had
several solution options available to the user, and if one algorithm
failed the user could try another. If all else failed, the user could
resort to the time consuming but reliable Gaussian elimination
algorithm. It is now known how to derive iterative solution algorithms
for these systems and state-of-the-art simulators all use solution
methods based on accelerated incomplete factorization methods.
SECTION 1: BASIC BLACK OIL MODEL (β-MODEL)
1.1 Model Assumptions
The basic black oil model assumes multiphase, isothermal flow of
three phases: two hydrocarbon phases (oil and gas) and water. The
hydrocarbon system is approximated by two components:
(1) a non-volatile (black) oil, and
(2) a volatile gas which is soluble in the oil phase.
There is also a water component.
Figure 1.1.1: components and phases. The oil component resides in the oil phase; the gas component resides in the gas phase and, as dissolved gas, in the oil phase; the water component resides in the water phase.
Figure 1.1.1 illustrates the relationships between components and
phases in the model. Water and oil are immiscible: they do not exchange
mass or change phase. The gas component is soluble in the oil phase but
not the water phase. Water is usually assumed to be the wetting phase,
with oil having intermediate wettability and gas being non-wetting.
This is the most basic version of the black oil model. Other
thermodynamic effects can be included. These are described in some
detail in Chapter 2 of this volume.
1.2 Black Oil Equations
The model equations are derived by combining the mass conservation
equations for the three components and Darcy's Law. The mass conservation
equations are given below (see Aziz and Settari, 1979, or Peaceman,
1977, for derivation of these equations):

oil: $-\nabla\cdot(\rho_o u_o) = \frac{\partial}{\partial t}(\phi S_o \rho_o) + q_o$   (1.2.1)

gas: $-\nabla\cdot(\rho_g u_g + \rho_{dg} u_o) = \frac{\partial}{\partial t}(\phi S_g \rho_g + \phi S_o \rho_{dg}) + q_g + q_{dg}$   (1.2.2)

water: $-\nabla\cdot(\rho_w u_w) = \frac{\partial}{\partial t}(\phi S_w \rho_w) + q_w$   (1.2.3)

where $q_o$ is the production rate of oil, $q_g$ the production rate of free
gas, $q_{dg}$ the production rate of dissolved gas (from the oil phase) and
$q_w$ the production rate of water, all at reservoir conditions.
1.3 Darcy's Law
In addition to the equations of mass conservation, a relationship
between the flow rate and pressure gradient in each phase is required.
In hydrodynamic flow this is given by the momentum equation. For
laminar, single-phase flow through a porous medium it is given by an
empirical or phenomenological relationship which was discovered by Darcy
in 1856.
$u = -\frac{k}{\mu}\left(\nabla p - \gamma \nabla D\right)$   (1.3.1)

where $\gamma = \rho g$, $D$ is depth, $\mu$ is the viscosity of the fluid and $k$ is
permeability. The constant $k$ depends only on the nature of the porous
medium and not on the fluid. It is determined experimentally. Darcy's
Law has the same form as the Poiseuille law for laminar flow in a
cylindrical tube. It can be derived from the Navier-Stokes equation (see,
for example, Scheidegger, 1960; Whitaker, 1970; Fulks et al., 1971). For
multiphase flow, Darcy's Law is extended as follows:

$u_\ell = -\frac{k\,k_{r\ell}}{\mu_\ell}\left(\nabla p_\ell - \gamma_\ell \nabla D\right)$   (1.3.2)

where the subscript $\ell$ refers to the oil, gas or water phase, and $k_{r\ell}$ is
the relative permeability of phase $\ell$. The relative permeability is an
empirical function of one or more saturations. It is also determined
experimentally and will be discussed in more detail below.
1.4 Flow Equations
The mass conservation equations and Darcy's Law can be
combined to get the flow equations:

oil: $\nabla\cdot\left(\frac{\rho_o k\,k_{ro}}{\mu_o}\left(\nabla p_o - \gamma_o \nabla D\right)\right) = \frac{\partial}{\partial t}(\phi S_o \rho_o) + q_o$   (1.4.1)

with similar equations for gas and water. There are three other
algebraic constraint equations:

$S_o + S_g + S_w = 1$   (1.4.2)

$p_{cow} = p_o - p_w = f(S_w)$   (1.4.3)

$p_{cog} = p_g - p_o = f(S_g)$   (1.4.4)

The capillary pressure terms $p_{cow}$ and $p_{cog}$ are empirical. This set gives
six equations in six unknowns.
1.5 PVT Assumptions
The PVT behaviour in black oil models is expressed by
formation volume factors

$B_o = \frac{(V_o + V_{dg})_{HC}}{(V_o)_{STC}}, \qquad B_g = \frac{(V_g)_{HC}}{(V_g)_{STC}}$   (1.5.1)

where $V_{HC}$ is the volume of a fixed mass at reservoir conditions and $V_{STC}$
is the volume of a fixed mass at stock tank conditions. The mass transfer
between the oil and gas phases is described by the solution gas-oil
ratio

$R_s = \left[\frac{V_{dg}}{V_o}\right]_{STC}$   (1.5.2)

The solution gas-oil ratio is the ratio of the gas component in the oil
phase to the amount of oil component in the oil phase as a function of
oil phase pressure. Finally the three phase densities are written in
terms of the component densities ($\rho_\ell$) which were used in the mass
conservation equations:

$\rho_o = \frac{1}{B_o}\left(\rho_{o,STC} + R_s\,\rho_{g,STC}\right)$   (1.5.3)

$\rho_g = \frac{1}{B_g}\,\rho_{g,STC}$   (1.5.4)

$\rho_w = \frac{1}{B_w}\,\rho_{w,STC}$   (1.5.5)

These phase densities are substituted into the flow equations, which are
then divided by $\rho_{STC}$ to get the standard model equations:

oil: $\nabla\cdot\left(T_o\left\{\nabla p_o - \gamma_o \nabla D\right\}\right) = \frac{\partial}{\partial t}\left(\frac{\phi S_o}{B_o}\right) + q_o$   (1.5.6)

where

$T_o = \frac{k\,k_{ro}}{\mu_o B_o}$

is called the transmissibility. There are similar equations for the gas
and water components.
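As a small illustration, the transmissibility defined above can be computed directly; the property values below are hypothetical, chosen only to show the arithmetic:

```python
# Sketch of the transmissibility T_o = k*k_ro/(mu_o*B_o) from equation (1.5.6).
# The input values are illustrative placeholders, not data from the text.

def transmissibility(k, kr, mu, B):
    """Phase transmissibility T = k * kr / (mu * B)."""
    return k * kr / (mu * B)

# Hypothetical oil-phase values: k = 100 mD, k_ro = 0.8, mu_o = 2 cp, B_o = 1.2
T_o = transmissibility(100.0, 0.8, 2.0, 1.2)
print(T_o)
```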
1.6 Boundary Conditions
The mathematical model is not complete without specification of the
necessary boundary and initial conditions. Since the exact extent of the
reservoir is almost never precisely known, the standard model assumes no
flow conditions at the boundaries. Any other conditions, e.g. constant
pressure boundaries or constant water influx at the boundary, can be
handled by adding appropriate wells. This has the effect of shifting any
complications to a proper description of injection and production wells.
The initial conditions usually take the form of specified pressures and
saturations. These can be calculated a priori given knowledge of the
water-oil and gas-oil contacts and the assumption of gravity and
capillary pressure equilibrium.
1.7 Relative Permeability
Most experimental work on relative permeability has been done for
two-phase systems. Figure 1.7.1 below shows the structure of typical
two-phase relative permeability curves (for a water-wet system). These
curves are usually not straight lines. For a full description of
relative permeability see section 2.2 of Chapter 1 of this volume.
Figure 1.7.1: typical two-phase relative permeability curves, $k_{rw}$ and $k_{ro}$ versus water saturation $S_w$, for a water-wet system.

Figure 1.7.1 represents a water-oil system with water displacing the
oil. $S_{wc}$ is the critical water saturation (the saturation at which water
would no longer flow) and $S_{or}$ is the residual oil saturation (the oil
that cannot be removed from the reservoir).
In actual fact, porosity, capillary pressure and relative
permeability are related. In a reservoir with strongly varying
properties (different lithologies), different relative permeability
curves and residual saturations should be used in different parts of the
reservoir. In terms of the simulation model, this situation is referred
to as using different rock types.
For three phase systems, it is hypothesized that the following
holds:

$k_{rw} = f(S_w), \qquad k_{rg} = f(S_g), \qquad \text{and} \qquad k_{ro} = f(S_w, S_g).$   (1.7.1)
The functional dependence of $k_{ro}$ on $S_w$ and $S_g$ is not usually known in
practice, so that three-phase relative permeabilities are normally
derived from two sets of two-phase data. These are a water-oil system
with water displacing oil and a liquid-gas system with the oil (in the
presence of critical water) displacing the gas. Stone's Model II is the
one most commonly used. It defines the relative permeability of oil as
follows:

$k_{ro} = k_{rocw}\left[\left(\frac{k_{row}}{k_{rocw}} + k_{rw}\right)\times\left(\frac{k_{rog}}{k_{rocw}} + k_{rg}\right) - \left(k_{rw} + k_{rg}\right)\right]$   (1.7.2)

where $k_{ro} \ge 0$. The values $k_{rw}$, $k_{rg}$, $k_{row}$, and $k_{rog}$ are determined
from the two-phase data. The other parameter is

$k_{rocw} = k_{row}(S_w = S_{wc})$   (1.7.3)

The expression requires that

$k_{rocw} = k_{rog}(S_L = 1)$   (1.7.4)

in order that it reduce to the proper two-phase data in the absence of
gas or water. For further discussion of three-phase relative
permeability data see the 1970 and 1973 papers by Stone.
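The Stone II formula translates directly into code; the following is a minimal sketch in which the two-phase values $k_{rw}$, $k_{rg}$, $k_{row}$, $k_{rog}$ and the endpoint $k_{rocw}$ are supplied as hypothetical numbers rather than taken from measured curves:

```python
# Sketch of Stone's Model II, equation (1.7.2), with the truncation k_ro >= 0.
# All input values are placeholders; in practice k_row(S_w) and k_rog(S_g)
# come from the two sets of two-phase data described in the text.

def stone2_kro(krw, krg, krow, krog, krocw):
    """Three-phase oil relative permeability, truncated at zero."""
    kro = krocw * ((krow / krocw + krw) * (krog / krocw + krg) - (krw + krg))
    return max(kro, 0.0)

print(stone2_kro(krw=0.1, krg=0.05, krow=0.4, krog=0.5, krocw=0.8))
```

Near the residual saturations the bracketed expression can go negative, which is why the truncation at zero is part of the model.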
SECTION 2: DIFFERENCE METHODS
2.1 Control Volume Discretization
The control volume discretization is one of the simplest ways to
approach the discretization of the flow equations. It has the advantage
of guaranteeing conservation of mass in the discretized equations and is
equivalent to other methods (e.g. Taylor series). Consider a one-dimensional,
single-component system. It can be described as follows:
Figure 2.1.1: a one-dimensional grid; block i has width $\Delta x_i$, the interface spacing is $\Delta x_{i+1/2}$, and flow is in the x-direction.
The change in accumulation in block $i$ over the time interval $\Delta t$
equals (flow in $-$ flow out) over $\Delta t$. Accumulation consists of

(1) accumulation due to compressibility
(2) accumulation due to sources and sinks

The flow in minus the flow out in time interval $\Delta t$ is

$\rho_{i-1/2}\,u_{i-1/2}\,A\,\Delta t - \rho_{i+1/2}\,u_{i+1/2}\,A\,\Delta t$   (2.1.1)

where $A$ is the area of the grid block face, $\rho$ is the density, and $u$ is the
superficial velocity of the fluid. The accumulation due to
compressibility is given by

$V_i\left[(\phi\rho)_i^{n+1} - (\phi\rho)_i^{n}\right]$,   (2.1.2)

where $\phi$ is the porosity, and $V_i$ is the volume of the $i$th block and is
given by

$V_i = \Delta x_i\,\Delta y_i\,\Delta z_i = \Delta x_i\,A$.   (2.1.3)

The second contribution to the accumulation is simply

$q\,V_i\,\Delta t$,   (2.1.4)

where $q$ is the source strength in mass per unit volume per unit time.
Equation (2.1.1) is divided by

$\rho_{STC} = \rho B$,

multiplied by $\Delta x_i/(\Delta x_i\,\Delta t)$, and the discretized version of Darcy's
Law (with no gravity)

$u_{i-1/2} = -\left(\frac{k}{\mu}\right)_{i-1/2}\frac{p_i - p_{i-1}}{\Delta x_{i-1/2}}$   (2.1.5)

is used to give the discretized form of the flow equation for a single
component flowing in one dimension:

$\frac{1}{\Delta x_i}\left[\left(\frac{k}{\mu B}\right)_{i+1/2}^{m}\frac{p_{i+1}^{m} - p_i^{m}}{\Delta x_{i+1/2}} - \left(\frac{k}{\mu B}\right)_{i-1/2}^{m}\frac{p_i^{m} - p_{i-1}^{m}}{\Delta x_{i-1/2}}\right] = \frac{1}{\Delta t}\left[\left(\frac{\phi}{B}\right)_i^{n+1} - \left(\frac{\phi}{B}\right)_i^{n}\right] + \frac{q_i^{m}}{\rho_{STC}}$   (2.1.6)
The superscripts $m, n$ denote the time level at which the designated terms
are evaluated ($n$ = old time level, $n+1$ = new time level, and $m$ = an
intermediate time level), and the subscripts $i, i\pm 1/2$ denote the spatial
point at which they are evaluated. The above equation is generalized to
the multiphase case by using the multiphase version of Darcy's Law,
equation (1.3.2). Note that the control volume approach automatically
generates a discretization of the accumulation terms of the flow
equation, i.e. the terms

$\frac{\partial}{\partial t}(\phi S_o \rho_o), \quad \frac{\partial}{\partial t}(\phi S_g \rho_g), \quad \ldots \ \text{etc.}$

which is mass conservative. Other discretizations of these terms (with
the same truncation error) are possible, but they can lead to non-mass-conserving
schemes (see Aziz and Settari, 1979) which can cause material
balance errors and/or instabilities.
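The conservative structure of equation (2.1.6) can be seen in a short sketch: each interior interface flux appears with opposite signs in the two adjacent blocks, so the fluxes cancel when the residuals are summed with their block volumes. The function below assumes no-flow boundaries (zero interface mobility at the two ends) and hypothetical input arrays:

```python
# Sketch of the control-volume residual for single-phase 1-D flow, following
# the form of equation (2.1.6). Interface values of k/(mu*B) are taken as
# given; the averaging rules discussed in the text would supply them.

def residual(p, acc_old, acc_new, lam, dx, dxh, dt, q):
    """Residual (flux - accumulation change - source) for each block.

    p        : pressures per block
    acc_old, acc_new : (phi/B) per block at the old and new time levels
    lam      : k/(mu*B) at the n+1 interfaces (0.0 at the ends => no flow)
    dx, dxh  : block widths and the n+1 interface spacings
    """
    n = len(p)
    r = []
    for i in range(n):
        flux_w = lam[i] * (p[i - 1] - p[i]) / dxh[i] if i > 0 else 0.0
        flux_e = lam[i + 1] * (p[i + 1] - p[i]) / dxh[i + 1] if i < n - 1 else 0.0
        r.append((flux_w + flux_e) / dx[i] - (acc_new[i] - acc_old[i]) / dt - q[i])
    return r
```

With zero sources and no accumulation change, the volume-weighted sum of the residuals is exactly zero, which is the discrete statement of mass conservation.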
There are several issues to be resolved at this point concerning
the discretized equation (2.1.6). The first is the evaluation of the
terms with the subscript $i\pm 1/2$. The geometrical term $\Delta x_{i+1/2}$ is given
by the arithmetic average of the adjacent block lengths

$\Delta x_{i+1/2} = \frac{\Delta x_i + \Delta x_{i+1}}{2}$

If there were gravity terms in (2.1.5) they would involve the interface
density $\gamma_{i+1/2}$. Since the density is a smoothly varying function of pressure, it is
approximated by an arithmetic average. The permeability $k_{i\pm 1/2}$ is
usually evaluated using a harmonic average of the adjacent
permeabilities

$k_{i+1/2} = \frac{k_i\,k_{i+1}\,(\Delta x_i + \Delta x_{i+1})}{k_i\,\Delta x_{i+1} + k_{i+1}\,\Delta x_i}$

The justification for this is that it gives the exact answer for
incompressible, single-phase, steady-state flow when there is a
discontinuity in the permeability between block $i$ and block $i+1$.
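The two interface rules just described can be sketched as follows, with the harmonic average written in its distance-weighted form:

```python
# Sketch of the interface rules described above: arithmetic average for the
# interface spacing, distance-weighted harmonic average for permeability.

def dx_interface(dx_i, dx_ip1):
    """Arithmetic average of adjacent block lengths."""
    return 0.5 * (dx_i + dx_ip1)

def k_interface(k_i, k_ip1, dx_i, dx_ip1):
    """Distance-weighted harmonic average of adjacent permeabilities; exact
    for incompressible, single-phase, steady-state flow across a
    permeability discontinuity."""
    return k_i * k_ip1 * (dx_i + dx_ip1) / (k_i * dx_ip1 + k_ip1 * dx_i)

# The interface value is dominated by the low-permeability block:
print(k_interface(100.0, 1.0, 1.0, 1.0))
```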
2.2 Upstream Weighting
A second issue is the way in which the time-dependent terms
involving fluid properties are evaluated. These are the terms of the
form

$\left(\frac{1}{\mu B}\right)_{i+1/2}$ or $\left(\frac{k_{r\ell}}{\mu_\ell B_\ell}\right)_{i+1/2}$ (in the multiphase case).

They could be evaluated as an average of the values in the adjacent
blocks (midpoint weighting) or as the value in the adjacent block which
is upstream for the flow of that particular phase (upstream weighting).
The former has the advantage of being $O(\Delta x^2)$ while the latter is only
$O(\Delta x)$. However the midpoint weighting scheme is clearly incorrect for
certain physical situations. The one-dimensional example above, with
water flowing from the left and displacing oil could have a sharp water
saturation front with only residual oil behind the front. A midpoint
weighting for blocks at the front could indicate that flow of water was
possible from block $i$ to block $i+1$ when block $i+1$ was still ahead of
the front. This error manifests itself as oscillations in the saturation
solutions around the front. Upstream weighting always gives the correct
relative permeability to the flow. Upstream weighting has the
disadvantage of causing numerical dispersion in the solution, i.e. sharp
fronts will be smeared out. The smearing can be reduced by choosing
smaller grid block sizes, which is however computationally expensive.
Another solution is to refine the grid only in the area of the front.
Two-point upstream weighting (Todd et al., 1972) is also used to reduce
the spatial truncation error. The upstream direction for each phase is
usually chosen by determining the sign of the right-hand side of
(1.3.2), the multiphase Darcy's Law. That is, the sign of the function

$F_\ell = p_{\ell,i+1} - p_{\ell,i} - \rho_{\ell,i+1/2}\,g\,(Z_{i+1} - Z_i)$   (2.1.7)

with $\rho_{\ell,i+1/2}$ approximated as described above, is evaluated. If
$F_\ell > 0$, $i+1$ is the upstream block for phase $\ell$; if
$F_\ell < 0$, $i$ is the upstream block.
The use of equation (2.1.7) to determine the upstream direction is an
approximation in that the expressions $F_\ell$ for each phase are not really
decoupled but are nonlinear functions of the phase pressures and
saturations. The expression for the upstream direction is the correct
one for the solutions determined at the previous iteration. The upstream
direction should therefore be checked at the end of each nonlinear
iteration to ensure that it is consistent with the one chosen at the
beginning of the iteration.
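The test (2.1.7) amounts to a sign check per phase and interface; a minimal sketch, with illustrative arguments:

```python
# Sketch of the upstream test (2.1.7) for one phase at the interface between
# blocks i and i+1. rho_face stands for the averaged interface density.

def upstream_block(p_i, p_ip1, rho_face, g, z_i, z_ip1):
    """Return 1 if block i+1 is upstream (F > 0), else 0 (block i upstream)."""
    F = p_ip1 - p_i - rho_face * g * (z_ip1 - z_i)
    return 1 if F > 0.0 else 0

# Higher potential in block i drives flow toward i+1, so block i is upstream:
print(upstream_block(p_i=210.0, p_ip1=200.0, rho_face=1000.0, g=9.81,
                     z_i=0.0, z_ip1=0.0))
```

As the text notes, this check should be repeated at the end of each nonlinear iteration, since the phase potentials change as the iteration proceeds.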
2.3 Fully Implicit and IMPES Formulations of Flow Equations
The final issue to be resolved is the time level at which the terms
on the left-hand side of equation (2.1.6) are to be evaluated. If this
is the new time level, i.e. $m = n+1$, the resulting formulation is termed
the fully implicit formulation. The fully implicit formulation has
become widely used in reservoir simulation. For modelling complex
physical processes it produces a stable, robust numerical model.
The fully implicit method produces a set of $n_c \times N$ nonlinear
algebraic equations (where $n_c$ is the number of coupled equations per
grid node and $N$ is the number of grid nodes). This system is solved
using Newton iteration (see Au et al., 1980). The solution of the
associated Jacobian system of $n_c \times N$ linear equations becomes the most
computationally intensive portion of the simulator.
The first reservoir simulation models developed were not fully
implicit. Lack of computers powerful enough to solve the large sets of
linear equations resulted in less implicit approximations being made to
the terms on the left-hand side of equation (2.1.6). A widely-used
method is the one known as the IMPES (an acronym for implicit pressure,
explicit saturation) formulation. The formulation is developed by
decoupling the discretized flow equations using certain approximations.
Consider the discretized flow equations for an oil-water system with
some terms on the left-hand side written at the old time level:
oil:

$\frac{V_i}{\Delta x_i}\left\{\left(\frac{k\,k_{ro}}{\mu_o B_o}\right)^{n}_{i+1/2}\frac{(p_{o,i+1} - p_{o,i})^{n+1}}{\Delta x_{i+1/2}} - \left(\frac{k\,k_{ro}}{\mu_o B_o}\right)^{n}_{i-1/2}\frac{(p_{o,i} - p_{o,i-1})^{n+1}}{\Delta x_{i-1/2}}\right\} + q^{n}_{o,i} = \frac{V_i}{\Delta t}\left\{\left(\frac{\phi S_o}{B_o}\right)^{n+1}_{i} - \left(\frac{\phi S_o}{B_o}\right)^{n}_{i}\right\}$   (2.3.1)

water:

$\frac{V_i}{\Delta x_i}\left\{\left(\frac{k\,k_{rw}}{\mu_w B_w}\right)^{n}_{i+1/2}\left[\frac{(p_{o,i+1} - p_{o,i})^{n+1}}{\Delta x_{i+1/2}} - \frac{(p_{cow,i+1} - p_{cow,i})^{n}}{\Delta x_{i+1/2}}\right] - \left(\frac{k\,k_{rw}}{\mu_w B_w}\right)^{n}_{i-1/2}\left[\frac{(p_{o,i} - p_{o,i-1})^{n+1}}{\Delta x_{i-1/2}} - \frac{(p_{cow,i} - p_{cow,i-1})^{n}}{\Delta x_{i-1/2}}\right]\right\} + q^{n}_{w,i} = \frac{V_i}{\Delta t}\left\{\left(\frac{\phi S_w}{B_w}\right)^{n+1}_{i} - \left(\frac{\phi S_w}{B_w}\right)^{n}_{i}\right\}$   (2.3.2)
Note that all terms on the left that are functions of saturations (i.e.
relative permeabilities and capillary pressure terms) have been written
at the old time level. Now the oil equation is multiplied by $B_o^{n+1}$ and
the water equation is multiplied by $B_w^{n+1}$ and the two equations are
added. Since the saturations must sum to one, the saturation terms on the
right, which are at the new time level, drop out and the resulting
equation is a parabolic pressure equation. The pressure equation is
solved to give the pressures at the new time level and these are
substituted in one of the equations (2.3.1) or (2.3.2) to give the
resulting saturation explicitly. Note that the pressure equation still
contains some (pressure-dependent) terms at the new time level, so that
one or two cycles of simple iteration should be used to converge these.
For more details of the implementation see Aziz and Settari (1979).
The IMPES method suffers from a fairly severe timestep limitation
due to the explicit treatment of the terms

$\left(\frac{k_{r\ell}}{\mu_\ell B_\ell}\right)$

This timestep limitation is given by (see Aziz and Settari, 1979, or
Peaceman, 1977)

$\Delta t \le \frac{\Delta x}{u_{xa}}$ (for 1-D) or $\Delta t \le \frac{\Delta x}{u_{xa}} + \frac{\Delta y}{u_{ya}}$ (for 2-D),

where $u_{xa}$ and $u_{ya}$ are the velocities of advance of constant
saturation fronts. This condition implies the timestep is limited by the
fact that the throughput of every block in the system must be smaller
than the pore volume of that block. For simulations that attempt to
model near-wellbore effects (i.e. coning) relatively small blocks must be
used near the well. In addition flow rates are very high near the well.
Moreover, the presence of free gas (low viscosity) can also lead to high
flow rates. All of these factors lead to unacceptably small timesteps in
an IMPES model for many simulation problems.
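The throughput form of the limitation is easy to sketch: the pore volume of a block divided by the volumetric flow rate through it bounds the stable timestep. The numbers below are hypothetical:

```python
# Sketch of the IMPES throughput limit described above: the flow through a
# block in one timestep must not exceed its pore volume. Values illustrative.

def impes_dt_limit(pore_volume, volumetric_rate):
    """Largest stable timestep for one block under the throughput condition."""
    return pore_volume / volumetric_rate

print(impes_dt_limit(pore_volume=10.0, volumetric_rate=50.0))   # small near-well block
print(impes_dt_limit(pore_volume=1e4, volumetric_rate=50.0))    # large far-field block
```

The global timestep is the minimum of this limit over all blocks, which is why one small, high-rate well block can throttle the whole simulation.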
2.4 Adaptive Implicit Formulation
More recently, formulations which combine the best aspects of both
of these methods (i.e. the low computational cost of the IMPES method and
the large timestep capability of the fully implicit method) have been
developed. Thomas and Thurnau (1983) and Forsyth and Sammon (1984) have
described black oil models based on such formulations. The former uses
Gaussian elimination to solve the linear system and the latter uses an
iterative method (more details of which will be discussed later).
The method begins with the same discretized equations as the fully
implicit method. It is assumed that there are only two types of blocks,
and these are designated as IMPES blocks or fully implicit blocks. In
the IMPES blocks only the pressure is solved implicitly; in the fully
implicit blocks a pressure and 2 saturations (or 2 pressures and a
saturation) are solved implicitly.
The criterion for selection of implicit cells described by Thomas
and Thurnau and confirmed by Forsyth and Sammon is based on a specified
saturation or pressure change threshold from a previous iteration. Such
a criterion can only be used to switch the designation of a particular
block from IMPES to fully implicit. The reverse switch is not possible.
This is because a fully implicit cell can have a large throughput and
yet the saturation changes can be small (typically seen at the end of a
waterflood, for example). Such a block could easily violate the IMPES
stability criterion. Normally the progression of timestep size in a
black oil simulation goes from small, after a well opening or change,
where transients must be resolved, to larger and larger timesteps,
until another well change is encountered. The above strategy fits in
well with this sequence. At a well change only a few blocks are set
implicit (the well blocks and its neighbours). Once a block is switched
to fully implicit it is not reset until the next well change. Well
blocks remain implicit always.
It is also necessary to detect slowly growing instabilities, which
would not be detected by the above criterion. To do this, saturation
change thresholds must be restricted to significantly smaller levels
than the changes which control timestep selection.
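The one-way switching rule described above can be sketched as a simple pass over the per-block solution changes; the threshold values are illustrative (cf. Table 2.4.1), not prescribed by the method:

```python
# Sketch of the adaptive implicit switching criterion: a block becomes fully
# implicit when its pressure or saturation change exceeds a threshold, and
# the flag is never cleared (the reverse switch is not possible).

def update_implicit_flags(implicit, dp, ds, dp_max=250.0, ds_max=0.05):
    """Return updated per-block flags (True = fully implicit)."""
    return [
        flag or abs(dpi) > dp_max or abs(dsi) > ds_max
        for flag, dpi, dsi in zip(implicit, dp, ds)
    ]

print(update_implicit_flags([False, True, False],
                            dp=[300.0, 0.0, 10.0], ds=[0.0, 0.0, 0.01]))
```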
Table 2.4.1 below illustrates some of the savings attainable by
such a method (taken from Forsyth and Sammon) for the first SPE
comparative solution project (see Odeh, 1981). The problem was one of gas
injection on a 10 x 10 x 3 grid. Material balance errors were found to
be small and cumulative production totals differed by less than 4% for
all cases. A 40% reduction in CPU time was seen. By the end of the
simulation 2/3 of the blocks had switched to implicit, but the
time-weighted average of the number of implicit blocks was less than
40%, even though the top layers of the reservoir contained mostly mobile
gas and were made up of mostly implicit cells at the end of the
simulation.
Table 2.4.1: Comparison of Adaptive Implicit and Fully Implicit
Solution to First SPE Comparative Solution Project
(timestep selection norms: 1000.0 psi for pressure, 0.20 for saturation)

Case   Pressure threshold (psi)    Saturation threshold    CPU time (sec, Honeywell DPS-8)
1      125.0                       0.025                   1285
2      250.0                       0.050                   1239
3      600.0                       0.150                   1334
4      Fully implicit throughout                           2178
SECTION 3: DIRECT AND ITERATIVE SOLUTION METHODS
3.1 Structure of the Matrix
The linear systems generated by Newton iteration of the fully
implicit nonlinear algebraic set of equations, discussed in the previous
section, are large, sparse and banded in structure. A five-point
discretization (or seven-point for three-dimensional systems) leads to a
five-banded (or seven-banded) matrix. A nine-point discretization
molecule is also used in reservoir simulation (see Yanosik and
McCracken, 1979; and Shah, 1983). This leads to a nine-banded (or
eleven-banded) system. Figure 3.1.1 shows the incidence matrix for a
3 x 3 grid with a five-point discretization molecule.
Figure 3.1.1: grid, incidence matrix, and computational molecule. The 3 x 3 grid is numbered

7 8 9
4 5 6
1 2 3

(with $n_x$ the horizontal and $n_y$ the vertical direction). The corresponding incidence matrix (x marks a nonzero entry) is

x x . x . . . . .
x x x . x . . . .
. x x . . x . . .
x . . x x . x . .
. x . x x x . x .
. . x . x x . . x
. . . x . . x x .
. . . . x . x x x
. . . . . x . x x

and the five-point computational molecule connects node $(i,j)$ to $(i-1,j)$, $(i+1,j)$, $(i,j-1)$ and $(i,j+1)$.
For a fully implicit formulation, each x represents a dense block matrix
of size $n_c \times n_c$, where $n_c$ is typically 3 for a three-component black oil
model. These systems will be called block-banded. For an IMPES
formulation, or any other formulation that solves only a pressure
equation, each x represents a single numerical entry. For an adaptive
implicit formulation the diagonal blocks are of size $n_c \times n_c$, but the
off-diagonal blocks can be $1 \times n_c$, $n_c \times 1$, or $1 \times 1$.
3.2 Conditioning of the System
The system of linear equations can be written as

$A x = b$   (3.2.1)
where A has the structure described above and is non-symmetric. The
concept of diagonal dominance is important when considering the
conditioning of this system. In the case where each entry x above
represents a block submatrix the concept of block diagonal dominance is
appropriate.
Definition (Feingold and Varga, 1962): Suppose that an $N \times N$ matrix has
been partitioned so that

$A = \begin{bmatrix} A_{11} & A_{12} & \cdots & A_{1k} \\ \vdots & & & \vdots \\ A_{k1} & \cdots & \cdots & A_{kk} \end{bmatrix}$   (3.2.2)

where $A_{ii}$ is of order $n_i$ and $\sum_{i=1}^{k} n_i = N$. The submatrices $A_{ii}$ can be single
elements, or dense submatrices (as described above), or larger matrices
representing all the unknowns along a particular grid line or even
plane. Then $A$ is strictly (block) diagonally dominant if $A_{ii}$ is
non-singular and

$\sum_{j \ne i} \|A_{ii}^{-1}\|\,\|A_{ij}\| < 1$   (3.2.3)

for $1 \le i \le k$ and $\|\cdot\|$ any suitable matrix norm.
It can be shown that the system in (3.2.1) is non-singular if:

(1) $A$ is strictly (block)-diagonally dominant, or
(2) $A$ is irreducibly (block)-diagonally dominant (i.e. the
inequality in (3.2.3) holds for at least one $i$, the rest are
required only to be equal, and $A$ is (block) irreducible).
In addition, (block) diagonal dominance implies that pivoting is not
necessary during the direct elimination process (see Wilkinson,1961, or
Varah, 1972).
To determine whether the systems generated from reservoir
simulation models are in fact (block) diagonally dominant, discretized
equations such as (2.3.1) and (2.3.2) must be examined. It is
intuitively clear that blocks of the order of the number of coupled
equations per grid node should be considered. Varah (1972) and Feingold
and Varga (1962) give examples of block decompositions which succeed
when the corresponding point decomposition fails.
The entries in the linear Jacobian system are derivatives of the
discretized equations with respect to the unknowns (saturations and
pressures). It can be seen that the discretized accumulation terms play a
major role in determining the "amount" of diagonal dominance, since the flux
terms contribute similar entries to the diagonal and off-diagonal blocks. If
the system is "reasonably" compressible, it will be "reasonably" diagonally
dominant. Note also that the magnitude of the diagonal contribution of the
accumulation terms is proportional to the volume of the block and inversely
proportional to the timestep size. Thus small volumes and large timestep sizes
adversely affect the diagonal dominance of the system. Also, the presence of
constant bottom-hole pressure wells affects the diagonal dominance positively,
and the presence of constant rate wells affects it negatively. This is
because the former adds a contribution to the diagonal without an equal
contribution to an off-diagonal element. These considerations about "amount"
of diagonal dominance also have relevance when discussing iterative solution
methods. In particular, iterative methods such as incomplete factorization
methods converge more quickly for systems with a reasonable amount of diagonal
dominance.
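The strict block criterion (3.2.3) can be checked numerically; the sketch below partitions the matrix into dense $n_c \times n_c$ blocks and uses the matrix 2-norm, one of several suitable norm choices:

```python
# Sketch of the strict block diagonal dominance test (3.2.3):
# sum over j != i of ||inv(A_ii)|| * ||A_ij|| < 1 for every block row i.
import numpy as np

def block_diagonally_dominant(A, nc):
    """Check strict block diagonal dominance of a (k*nc) x (k*nc) matrix."""
    k = A.shape[0] // nc
    for i in range(k):
        Aii = A[i * nc:(i + 1) * nc, i * nc:(i + 1) * nc]
        inv_norm = np.linalg.norm(np.linalg.inv(Aii), 2)
        off = sum(
            np.linalg.norm(A[i * nc:(i + 1) * nc, j * nc:(j + 1) * nc], 2)
            for j in range(k) if j != i
        )
        if inv_norm * off >= 1.0:
            return False
    return True

# A strongly diagonal example with 2x2 blocks:
A = np.array([[4.0, 0.0, 1.0, 0.0],
              [0.0, 4.0, 0.0, 1.0],
              [1.0, 0.0, 4.0, 0.0],
              [0.0, 1.0, 0.0, 4.0]])
print(block_diagonally_dominant(A, nc=2))
```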
3.3 Direct Solution Methods
Direct elimination of the system in (3.2.1) is done using Gaussian
elimination without pivoting. Gaussian elimination gives a very accurate
solution to the linear system (sometimes more accurate than is necessary
if an outer iteration is present) but is costly in terms of computing
time and storage. Both work and storage increase substantially as the
size of the problem increases. The work and storage depend on the system
parameters in the following way:

$\text{WORK} \approx N_B^2 \times n_c^3 \times N$   (3.3.1)

$\text{STORAGE} \approx (N_B + 1) \times n_c^2 \times N$   (3.3.2)

where

$N_B$ is the half-bandwidth of the matrix in terms of
block-bands and is equal to $n_x$ ($n_x n_y$) for 2D (3D) systems,
$n_c$ is the number of coupled equations at each grid node, and
$N$ is the number of grid nodes ($= n_x n_y n_z$).
It is important for the block-banded systems to treat each small
block matrix as a single unit for the purposes of elimination. That is,
the operation of dividing each row by the magnitude of the diagonal
element becomes the operation of multiplying each row by the inverse of
the diagonal block, and so on (see Section 3.2).
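The estimates (3.3.1)-(3.3.2) can be sketched as a small helper; the grid dimensions below are illustrative:

```python
# Sketch of the work and storage estimates (3.3.1)-(3.3.2) for banded
# Gaussian elimination with natural ordering.

def band_elimination_cost(nx, ny, nz=1, nc=3):
    """Return (work, storage): N_B = nx (2-D) or nx*ny (3-D), N = nx*ny*nz."""
    NB = nx if nz == 1 else nx * ny
    N = nx * ny * nz
    work = NB ** 2 * nc ** 3 * N
    storage = (NB + 1) * nc ** 2 * N
    return work, storage

# The 10 x 10 x 3 grid of the first SPE comparative solution project:
print(band_elimination_cost(10, 10, 3))
```

Even on this small 3-D grid the quadratic dependence of the work on the half-bandwidth dominates, which motivates the ordering techniques of Section 3.4.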
Gaussian elimination is equivalent to the formation of the factors

$A = L\,U$   (3.3.3)

of the matrix system. During the elimination process the area between
the bands of the original system becomes full. This is illustrated
below:

Figure 3.3.1: the banded matrix $A$ factored into a lower triangular factor $L$ and an upper triangular factor $U$, each with full bands.
Figure 3.3.1 makes it clear why the work depends on the half-bandwidth.
The half-bandwidth, in turn, depends on the geometry of the underlying
grid system and on how the grid nodes are ordered. Clearly it is
advantageous to choose an ordering for which the half-bandwidth is
minimized. Note that the entries of U are not actually formed unless
there are multiple right-hand-sides to be solved (not usually the case
in reservoir simulation).
3.4 Ordering for Direct Solution Methods
The following discussion on ordering techniques will be given in
the context of Gaussian elimination, but ordering algorithms are also
important for iterative solution methods. Note that these orderings are
usually applied to the grid nodes, not the equations, when there are
several coupled equations per grid node.
In the work estimate (3.3.1), the number of bands, $N_B$, in the 2D
case is equal to the number of grid nodes in the first ordering
direction. If $n_x \gg n_y$ then ordering in the y-direction first gives a
smaller bandwidth. Similarly, if $n_y \gg n_x$ the ordering should be in the
x-direction first. For example, consider

Figure 3.4.1: a 7 x 3 grid ordered by rows,

15 16 17 18 19 20 21
 8  9 10 11 12 13 14
 1  2  3  4  5  6  7

and by columns,

 3  6  9 12 15 18 21
 2  5  8 11 14 17 20
 1  4  7 10 13 16 19

The ordering used in the second grid gives a much smaller bandwidth than
that used in the first.
Price and Coats (1974) introduced (to the petroleum literature) the
idea of
(1) minimizing the bandwidth by using a diagonal ordering
instead of ordering by rows or columns (D2 ordering)
(2) using a red-black ordering to form a matrix that allows
half of the unknowns to be decoupled
Consider the above grid ordered in a diagonal fashion:

 4  7 10 13 16 19 21
 2  5  8 11 14 17 20
 1  3  6  9 12 15 18

Figure 3.4.2
The bandwidth of the resulting matrix is at most 3 and for the first and
last few rows it is even less. This effect is more pronounced on a
square grid. The red-black ordering (Figure 3.4.3) puts the matrix in a
form which allows N/2 unknowns to be decoupled and the resulting reduced
system involves the other N/2 unknowns. The reduction process is
described below.
Figure 3.4.3: red-black ordering of the grid and the resulting matrix. Nodes of one colour are numbered first, so the matrix partitions into four quadrants with a purely diagonal top-left quadrant; eliminating the first N/2 unknowns zeros the lower-left quadrant and produces fill (marked o) only in the lower-right quadrant.
Note that the matrix is partitioned into four quadrants. The top left
contains only diagonal entries. Therefore, elimination of the first N/2
unknowns leaves no fill in the top half of the matrix. It produces zeros
in the lower left. The fill in the lower right is indicated by o's. The
bandwidth of the reduced system is no larger than the bandwidth of the
original system. The work and storage requirements are (for a 2D problem)

$\text{WORK} \approx \frac{n_x^3\,n_y\,n_c^3}{2}, \qquad \text{STORAGE} \approx \frac{n_x^2\,n_y\,n_c^2}{2}$
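The reduction step can be sketched with dense NumPy blocks: because the red-red block is diagonal, its elimination is a cheap Schur complement. The small 1-D example below stands in for the five-point case:

```python
# Sketch of red-black reduction: eliminate the red unknowns (whose diagonal
# block is diagonal under a red-black ordering) to obtain a reduced system in
# the black unknowns. Illustrated on a small 1-D Laplacian.
import numpy as np

def red_black_reduce(A, b, red):
    """Return (A_red, b_red), the Schur complement system in the black unknowns."""
    black = [i for i in range(A.shape[0]) if i not in set(red)]
    Arr = A[np.ix_(red, red)]            # diagonal under a red-black ordering
    Arb = A[np.ix_(red, black)]
    Abr = A[np.ix_(black, red)]
    Abb = A[np.ix_(black, black)]
    Dinv = np.diag(1.0 / np.diag(Arr))
    A_red = Abb - Abr @ Dinv @ Arb
    b_red = np.asarray(b)[black] - Abr @ Dinv @ np.asarray(b)[red]
    return A_red, b_red

A = np.array([[ 2.0, -1.0,  0.0,  0.0],
              [-1.0,  2.0, -1.0,  0.0],
              [ 0.0, -1.0,  2.0, -1.0],
              [ 0.0,  0.0, -1.0,  2.0]])
A_red, b_red = red_black_reduce(A, [1.0, 0.0, 0.0, 1.0], red=[0, 2])
print(A_red)
```

Solving the reduced system and back-substituting for the red unknowns reproduces the solution of the full system.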
Price and Coats combined the ideas of red-black ordering and D2 ordering
to produce a new ordering they called D4 ordering. Again a set of N/2
unknowns can be decoupled. The reduced system matrix in this case gives
a bandwidth that is on average smaller than that of the original system.
The D4 ordering is illustrated below.
 5 20 10 24 13
16  6 21 11 25
 2 17  7 22 12
14  3 18  8 23
 1 15  4 19  9

Figure 3.4.4
The work and storage requirements for this ordering are smaller still for
large n_x, n_y (2D problems). Note that for n_x = n_y the work is 1/4 of
that for natural ordering (3.3.1). The scheme can be easily extended to 3D
grids. The analysis here is more dependent on geometry, but for physically
reasonable grids (i.e. n_x, n_y ≥ n_z) it gives similar results.
For a nine-point discretization molecule, zebra ordering (see
McDonald and Trimble, 1977) can be used. Alternatively, zebra ordering
can be applied to the reduced system, which has the connections of a
nine-point operator. McDonald and Trimble show zebra ordering on the
reduced system to be faster than D4 for larger problems (n_y ≥ 18).
George (1973) introduced an ordering algorithm which is
theoretically O(n^3). Furthermore, he showed that O(n^3) is a lower bound
for the elimination work on any n x n grid. This algorithm is known as
nested dissection. It performs well for large square grids
(n_x = n_y ≥ 33) when compared to zebra ordering, for example, on a
five-point discretization molecule. Unfortunately, performance is
significantly degraded for non-square grids.
Another pseudo-optimal ordering scheme is the minimum degree
ordering, which has been discussed in various contexts (see Price
and Coats for references). The ordering is based on choosing pivots
along the diagonal so that the fill at any stage is a minimum. This
strategy is of course dependent on how far the algorithm is prepared to
"look ahead". It has been used successfully in, for example, the Yale
Sparse Matrix Package (Eisenstat et al., 1983). This ordering does not
necessarily have to be applied to the grid nodes (i.e. the coupled
equations at each node do not have to be treated as a unit).
3.5 Iterative Solution Methods

The solution of large 2D and 3D problems often requires too much
computational work and storage to use direct elimination. An alternative
is the use of an iterative method, where the work and storage depend on:

WORK ≈ N × n_c^3 × (number of iterations)     (3.5.1)

STORAGE ≈ N × n_c     (3.5.2)
All iterative methods require an initial solution, x^(0), which is
used to start the algorithm. The performance of the method depends on
how close the initial solution is to the true solution. For
time-dependent problems, an initial solution which is the solution of the
previous timestep is convenient. In reservoir simulation, since the
system being solved is the Jacobian system:

F'(x) δx = -F(x)     (3.5.3)

and at convergence δx → 0, an initial solution of δx^(0) = 0 is usually
chosen. The number of iterations required for convergence can vary
widely for different problems and different methods. Also, for most
methods, the number of iterations depends on problem size, i.e. most
methods are not O(N) but O(N^m) where m > 1. Iteration parameters are
often used (in Gauss-Seidel, SOR, etc.) to accelerate convergence.
The physics of the problem being solved in reservoir simulation can
produce matrices A which are in some sense "difficult" to solve
iteratively. For example, anisotropic permeabilities often occur, with
k_x, k_y >> k_z. Also, discontinuities in k_x, k_y, k_z are encountered
(shale barriers). As mentioned earlier, small V and/or Δt give a matrix
which is "less" diagonally dominant and therefore harder to solve for most
methods. Neumann boundary conditions are generally used. The presence of
pressure-controlled wells adds terms to the diagonal, but rate-controlled
wells are equivalent to Neumann boundary conditions. Furthermore, the
matrix A is generally non-symmetric.
3.6 Classification of Iterative Methods
A general iterative method for the solution of the linear system
(3.2.1) can be written as follows. First a splitting of A,

A = C - R     (3.6.1)

where C is nonsingular, is defined. It is now possible to define a basic
iterative method (Varga, 1962) to be

C x^(n+1) = R x^(n) + b     (3.6.2)

or, in residual notation,

x^(n+1) = x^(n) + C^-1 r^(n),  where  r^(n) = b - A x^(n)     (3.6.3)

Most common methods can be formed by an appropriate choice of C. For
example,

(1) C = A, and R = 0: the algorithm becomes the
Gaussian elimination algorithm

(2) C = I gives the Richardson method.

Between these two extremes there is a whole spectrum of methods:

(3) C = D gives the Jacobi method (where D is the
diagonal part of A)

(4) C = (1/ω) D - G_L gives SOR (where A = D - G_L - G_U)

(5) C = LU gives an incomplete factorization method.

It is this last class of methods that will be discussed in detail.
It is known from the study of symmetric systems (Kershaw, 1978;
Meijerink and van der Vorst, 1977) that these methods have the best
potential for the types of matrices found in reservoir simulation.
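The basic iteration (3.6.3) can be illustrated with a short sketch (illustrative, not from the text): taking C = D gives the Jacobi method, and for a diagonally dominant test matrix the residual is driven toward zero.

```python
def jacobi(A, b, iters):
    """Basic iterative method x(n+1) = x(n) + C^{-1} r(n) with C = D,
    the diagonal part of A (the Jacobi splitting)."""
    n = len(b)
    x = [0.0] * n
    for _ in range(iters):
        r = [b[i] - sum(A[i][j] * x[j] for j in range(n)) for i in range(n)]
        x = [x[i] + r[i] / A[i][i] for i in range(n)]   # C^{-1} r with C = D
    return x

A = [[4.0, -1.0, 0.0], [-1.0, 4.0, -1.0], [0.0, -1.0, 4.0]]
b = [1.0, 1.0, 1.0]
x = jacobi(A, b, 50)
r = [b[i] - sum(A[i][j] * x[j] for j in range(3)) for i in range(3)]
print(max(abs(v) for v in r))   # tiny residual: diagonal dominance ensures convergence
```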
3.7 Incomplete Factorization Methods

Given a sparse banded matrix A, an incomplete factorization, LDU,
of A is defined to be:

LDU = A + E     (3.7.1)

where E is known as the error matrix, and L, D, and U are lower
triangular, diagonal, and upper triangular matrices, respectively. In order to
minimize the work per iteration, L + D + U should have a sparse banded
structure close to the structure of A. However convergence will be more
rapid if the elements of E are made as small as possible. This is
generally achieved at the expense of extra bands in L and U.
If L and U retain the same sparsity structure as A, i.e.

[Figure 3.7.1: the bands of A and the corresponding bands retained in
L, D, and U]
the factorization reduces to the SIP method (Stone, 1968) or the DKR
method (Dupont et al., 1968). Both of these methods have been widely used
in reservoir simulation. Adding more bands to the incomplete
factorization increases the work of forming the approximation and the
work per iteration. The strategy used for deciding which extra bands to
add varies among different authors. Behie and Forsyth (1984) use the
"degree" concept. In this, the incomplete factorization is viewed as
carrying out a few steps of Gaussian elimination on A. If the bands of
the original matrix A are labelled as first degree, then higher degree
bands are formed by fill-in resulting from elimination. The degree of a
fill band is equal to the degree of the band being eliminated plus the
degree of the band inducing it. This use of degree is equivalent to
Watts' (1981) concept of "order", but not the same as Gustafsson's
(1978) use of the word "order". Gustafsson's strategy for adding extra
bands is also slightly different. Some examples of the structure of
different degree incomplete factorizations are given in Figures 3.7.2
and 3.7.3.
[Figure 3.7.2: Second Degree ILU (natural ordering) — band structure of
A, L, D, and U]

[Figure 3.7.3: Third Degree ILU (natural ordering) — band structure of
A, L, D, and U]
As the degree increases, extra bands are added. This increases the
computational work, but also increases the convergence rate of the
algorithm. The trade-off between extra work and rate of convergence
depends somewhat on the problem being solved, but general guidelines can
be established by numerical experimentation (see Section 4.3).
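A minimal sketch of an incomplete factorization that keeps the factors on the sparsity pattern of A, as in Figure 3.7.1; the dense storage and explicit pattern test are for illustration only:

```python
def ilu0(A):
    """Incomplete LU with L and U restricted to the sparsity pattern of A
    (unit lower L and U packed into one array); fill that would appear
    outside the pattern is simply dropped."""
    n = len(A)
    LU = [row[:] for row in A]
    nz = {(i, j) for i in range(n) for j in range(n) if A[i][j] != 0.0}
    for i in range(1, n):
        for k in range(i):
            if (i, k) not in nz:
                continue
            LU[i][k] /= LU[k][k]            # multiplier l_ik
            for j in range(k + 1, n):
                if (i, j) in nz:            # update only within the pattern
                    LU[i][j] -= LU[i][k] * LU[k][j]
    return LU

# a tridiagonal matrix produces no fill, so here ILU(0) is the exact LU
A = [[4.0, -1.0, 0.0], [-1.0, 4.0, -1.0], [0.0, -1.0, 4.0]]
LU = ilu0(A)
print(LU[1][0], LU[2][2])   # -0.25  3.7333...
```

Adding extra (higher degree) bands to the retained pattern amounts to enlarging the set `nz`, which increases both the setup work and the work per iteration, as described above.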
3.8 Treatment of Error Terms
In equation (3.7.1) above, E was defined to be the matrix containing
the "error" made in the incomplete factorization. The elements of E are
outside the structure of L + D + U. If it is assumed that the solution
is continuous, then clearly an element, e, of E which falls outside the
computational molecule can be approximated by points inside the molecule
so that:

e - e_a = O(h^m)     (3.8.1)

where e_a is an approximation to e and h is a measure of the mesh size.
If

{LDU}_a = A - E_a     (3.8.2)

where the subscript a refers to the bands coinciding with the band
structure of L + D + U, then consequently,

LDU = A - E_a + E

    = A + E'     (3.8.3)

where E' = E - E_a. This results in an m'th order factorization of A,
assuming equation (3.8.1) is true. Consequently, as the mesh size h is
decreased, and the number of unknowns N is increased, the error E'
decreases. Intuitively, it is clear that as m increases, the convergence
rate degrades less with increasing N. This is shown more rigorously in
Gustafsson (1977, 1978) for symmetric problems. In particular, if the
error terms are not accounted for at all, the factorization is zeroth
order. The first order factorization of Gustafsson simply approximates
the error term by using the diagonal point. This is the modified
factorization (MILU) which will be referred to later. It is possible
to obtain a higher order factorization by using more points within the
molecule. SIP (Stone, 1968; Saylor, 1974) is an example of a second order
factorization.
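The error treatment can be sketched as follows. This is a row-sum-preserving variant (dropped fill is folded into the diagonal as it is generated), analogous to the column-sum treatment described in the text; the dense storage is for illustration only:

```python
def milu0(A):
    """ILU(0) with diagonal compensation: fill falling outside the
    sparsity pattern is not discarded but subtracted from the diagonal,
    so that (L U) e = A e for e = (1, ..., 1)^T (row sums preserved)."""
    n = len(A)
    LU = [row[:] for row in A]
    nz = {(i, j) for i in range(n) for j in range(n) if A[i][j] != 0.0}
    for i in range(1, n):
        for k in range(i):
            if (i, k) not in nz:
                continue
            LU[i][k] /= LU[k][k]
            for j in range(k + 1, n):
                upd = LU[i][k] * LU[k][j]
                if (i, j) in nz:
                    LU[i][j] -= upd
                else:
                    LU[i][i] -= upd        # compensate on the diagonal
    return LU

def lu_product(LU):
    """Dense product of the unit-lower and upper factors packed in LU."""
    n = len(LU)
    L = [[1.0 if i == j else (LU[i][j] if j < i else 0.0) for j in range(n)]
         for i in range(n)]
    U = [[LU[i][j] if j >= i else 0.0 for j in range(n)] for i in range(n)]
    return [[sum(L[i][k] * U[k][j] for k in range(n)) for j in range(n)]
            for i in range(n)]

# 2x2-grid five-point Laplacian: eliminating node 0 creates fill at (1,2), (2,1)
A = [[4.0, -1.0, -1.0, 0.0],
     [-1.0, 4.0, 0.0, -1.0],
     [-1.0, 0.0, 4.0, -1.0],
     [0.0, -1.0, -1.0, 4.0]]
M = lu_product(milu0(A))
print([abs(sum(M[i]) - sum(A[i])) < 1e-12 for i in range(4)])   # [True, True, True, True]
```

The error matrix E = M - A then has zero row sums, the discrete analogue of the compensation (3.8.1)-(3.8.3).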
3.9 Ordering for Incomplete Factorization Methods
It is well known that the ordering of the grid nodes can affect the
convergence properties of iterative methods (see Varga, 1962, for
example). Watts (1981) suggested using the D2 ordering mentioned earlier
(Section 3.4) and found that this ordering in combination with an ILU
method produced improved rates of convergence for typical reservoir
simulation problems. Physically, D2 ordering removes the directional
bias which results from such things as anisotropies in permeability
(which are usually aligned with the grid lines) and/or varying block
sizes.

Axelsson and Gustafsson (1979) suggested using red-black ordering
to form a reduced system which is then solved iteratively. The system in
(3.2.1), if ordered with red-black ordering, can be written as (see
Figure 3.4.3)
[ D_R  A_R ] [ x_R ]   [ b_R ]
[ A_B  D_B ] [ x_B ] = [ b_B ]     (3.9.1)

where D_R and D_B are block diagonal matrices and A_R and A_B are block
banded. This can be scaled as

[ I    A_R' ] [ x_R ]   [ b_R' ]
[ A_B  D_B  ] [ x_B ] = [ b_B  ]     (3.9.2)

where A_R' = D_R^-1 A_R and b_R' = D_R^-1 b_R. The system (3.9.2) is
equivalent to

( D_B - A_B A_R' ) x_B = b_B - A_B b_R'     (3.9.3)

Let R = D_B - A_B A_R' and c = b_B - A_B b_R', and (3.9.3) can be
written:

R x_B = c     (3.9.4)

with

x_R = b_R' - A_R' x_B     (3.9.5)
This reduced system can be diagonally ordered (Behie and Forsyth, 1984)
to produce an algorithm which has the directional-bias-removing
properties of Watts' D2 algorithm. The system in (3.9.4) can now
be factored in an approximate fashion to any desired degree of accuracy.
The resulting system is solved iteratively only for the black points,
and once convergence is reached the red points are retrieved via (3.9.5).
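The reduction (3.9.4)-(3.9.5) can be illustrated on the five-point Laplacian for a 2x2 grid; the node numbering and right-hand side here are illustrative, not from the text. Because red nodes couple only to black nodes, D_R is diagonal and the reduced system is formed explicitly:

```python
# red unknowns first: D_R is diagonal, A_R / A_B couple red and black nodes
D_R = [4.0, 4.0]
A_R = [[-1.0, -1.0], [-1.0, -1.0]]
A_B = [[-1.0, -1.0], [-1.0, -1.0]]
D_B = [[4.0, 0.0], [0.0, 4.0]]
b_R, b_B = [1.0, 1.0], [1.0, 1.0]

# R = D_B - A_B D_R^{-1} A_R   and   c = b_B - A_B D_R^{-1} b_R   (3.9.3)
R = [[D_B[i][j] - sum(A_B[i][k] * A_R[k][j] / D_R[k] for k in range(2))
     for j in range(2)] for i in range(2)]
c = [b_B[i] - sum(A_B[i][k] * b_R[k] / D_R[k] for k in range(2))
     for i in range(2)]

# solve the 2x2 reduced system R x_B = c (3.9.4) directly ...
det = R[0][0] * R[1][1] - R[0][1] * R[1][0]
x_B = [(R[1][1] * c[0] - R[0][1] * c[1]) / det,
       (R[0][0] * c[1] - R[1][0] * c[0]) / det]
# ... then retrieve the red points by back-substitution (3.9.5)
x_R = [(b_R[i] - sum(A_R[i][k] * x_B[k] for k in range(2))) / D_R[i]
       for i in range(2)]
print(x_R, x_B)   # all unknowns equal 0.5
```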
3.10 Acceleration Techniques

Again, by analogy with work done for symmetric systems, it is
postulated that such incomplete factorizations probably work best when
used in conjunction with an acceleration technique. For symmetric
systems, techniques such as conjugate gradient and Chebyshev
acceleration (see Meijerink and van der Vorst, 1977; Manteuffel, 1977;
Kershaw, 1978) have been used. Nonsymmetric analogues of the conjugate
gradient acceleration method have been developed (Vinsome, 1976; Young
and Jea, 1980; Axelsson, 1980; Elman, 1981; Saad and Schultz, 1986). The
ORTHOMIN algorithm developed by Vinsome has proved to be one of the most
useful of these acceleration techniques. It provides a computationally
simple, robust acceleration method. It does not require estimation of
eigenvalues as does Chebyshev acceleration. The iteration parameters are
generated automatically by the algorithm. When combined with an ILU
method, it gives rise to the following computational algorithm:
For k = 0, 1, ...,

q^(k) = δx^(k) + Σ_{i=m}^{k-1} a_i^(k) q^(i)     (3.10.1)

a_i^(k) = - ( A δx^(k), A q^(i) ) / ( A q^(i), A q^(i) )   for all i in m ≤ i ≤ k-1
        = 0   otherwise     (3.10.2)

ω^(k) = ( A q^(k), r^(k) ) / ( A q^(k), A q^(k) )     (3.10.3)

x^(k+1) = x^(k) + ω^(k) q^(k)     (3.10.4)

r^(k+1) = r^(k) - ω^(k) A q^(k)     (3.10.5)

where

δx^(k) = (LDU)^-1 r^(k),   m = int( k/(N_orth+1) ) * (N_orth+1)     (3.10.6)

and

q^(0) = δx^(0) = (LDU)^-1 r^(0)     (3.10.7)
Note that this is called the restarted version of the ORTHOMIN
algorithm. This version of the ORTHOMIN algorithm is also often referred
to as the generalized conjugate residual algorithm (GCR) (see
Elman, 1981).

The ORTHOMIN algorithm is analogous to the conjugate gradient
algorithm in the following way. At each step of the algorithm a search
direction A q^(k) is constructed so that it is orthogonal to some
previous set of A q^(i). This set cannot be large as it would involve
too much work to perform the orthogonalizations. Practical choices of
the parameter N_orth lie between 1 and 10. Once the (N_orth+1) search
vectors have been constructed, the algorithm is restarted, and a new set
of (N_orth+1) vectors is constructed. In the conjugate gradient algorithm
the new search direction at each iteration is automatically conjugate
to all previous search directions. The orthogonalization procedure in
(3.10.2) is essentially a Gram-Schmidt procedure. The parameter ω^(k) is
chosen to minimize the residual in the l_2 norm. GMRES (Saad and
Schultz, 1986) is a similar procedure, but the search vectors q and Aq
are not saved; rather, they are reconstructed by an Arnoldi process
constrained to minimize the residual. This algorithm can have
significantly lower storage costs (for large N_orth) but the convergence
properties are similar (it is mathematically equivalent to ORTHOMIN).

Note that when the ORTHOMIN acceleration algorithm is used with
the D4 ILU approximate factorization, the matrix-vector multiply can be
calculated in a computationally efficient way. Put

y = R δx = D_B δx - A_B ( A_R' δx )     (3.10.8)

In 3D this costs 13 N_B multiplies versus 19 N_B multiplies for the
conventional way. In 2D the work is the same.
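The restarted iteration (3.10.1)-(3.10.7) can be sketched as follows. For simplicity the (LDU)^-1 preconditioner is replaced by the identity, and the nonsymmetric test matrix is illustrative, not from the text; `n_orth` plays the role of N_orth:

```python
import numpy as np

def orthomin(A, b, n_orth=5, tol=1e-10, maxit=500):
    """Restarted ORTHOMIN (GCR): each step minimizes ||r|| over the
    stored search directions; preconditioning is omitted for clarity."""
    n = len(b)
    x = np.zeros(n)
    r = b - A @ x
    qs, Aqs = [], []
    for k in range(maxit):
        if np.linalg.norm(r) < tol:
            break
        if k % (n_orth + 1) == 0:          # restart: drop stored directions
            qs, Aqs = [], []
        dx = r                             # stands in for (LDU)^{-1} r
        q, Aq = dx, A @ dx
        for qi, Aqi in zip(qs, Aqs):       # Gram-Schmidt as in (3.10.2)
            a = -(Aq @ Aqi) / (Aqi @ Aqi)
            q = q + a * qi
            Aq = Aq + a * Aqi
        w = (Aq @ r) / (Aq @ Aq)           # minimizes the l_2 residual
        x = x + w * q
        r = r - w * Aq
        qs.append(q)
        Aqs.append(Aq)
    return x

# a nonsymmetric, diagonally dominant test matrix
n = 10
A = (np.diag(4.0 * np.ones(n)) + np.diag(-1.2 * np.ones(n - 1), -1)
     + np.diag(-0.8 * np.ones(n - 1), 1))
b = np.ones(n)
x = orthomin(A, b)
print(np.linalg.norm(b - A @ x) < 1e-8)   # True
```

Since the symmetric part of this A is positive definite, the residual is nonincreasing and the iteration cannot diverge, consistent with the result of Elman quoted in Section 3.11.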
3.11 Convergence Properties
For a symmetric, positive definite incomplete factorization
accelerated by conjugate gradient (or preconditioned conjugate
gradient as it is also called), the rate of convergence is proportional
to a power of κ, where κ is the ratio of the maximum to minimum
eigenvalues of the iteration matrix (Concus et al., 1975) and is known
as the condition number. It is therefore desirable to choose a splitting
for which κ is as small as possible.

The success of the conjugate gradient accelerated incomplete
factorization is not only due to the reduction of the condition number
but also to the fact that the modified system has eigenvalues which are
nearly one except for a few extreme eigenvalues that are quickly
eliminated by the conjugate gradient acceleration (Kershaw, 1978).

There is little in the way of analysis of the convergence
properties for these iterative methods applied to non-symmetric systems.
Elman (1981) has shown that if the symmetric part of A, here denoted C,
is positive definite, then ORTHOMIN generates a sequence which satisfies

|| r_i || ≤ [ 1 - λ_min(C) / ( λ_max(C) + ρ(R^2)/λ_min(C) ) ]^{i/2} || r_0 ||     (3.11.1)

where R is the skew-symmetric part of A, and thus the process cannot
diverge.

3.12 Block Incomplete Factorization

It is well known, for classical methods such as Jacobi and
successive overrelaxation (SOR), that block methods can be
asymptotically faster than the corresponding point methods (Varga,
p. 199). Moreover, in the case of block or line SOR, the method can be
normalized to require exactly the same computational work per grid node
as the point method does. For these reasons there has been considerable
interest recently in applying these ideas to ILU methods.
For a block incomplete factorization, the matrix A is partitioned
as follows:

[Figure 3.12.1: block tridiagonal partitioning of A, with tridiagonal
diagonal blocks B_i and diagonal off-diagonal blocks L_i and U_i]

where the B's are tridiagonal matrices and the L's and the U's are
diagonal matrices. As in Section 3.1, each x can represent a single
entry in the matrix or a dense submatrix of order n_c x n_c. Note that
the term "block" is used in a more general sense here than in Section
3.2, where it referred only to the dense submatrix represented by the
x's. The concept of block diagonal dominance applies here too, with the
blocks defined by the above partitioning.
There is usually, but not always, a physical basis to the
partitioning chosen. In the above example, the B matrices represent the
coupling between unknowns on the same line of the grid (see Figure
3.1.1). This partitioning views the matrix A as a block tridiagonal
system where

A = B + L + U     (3.12.1)

An exact factorization of this system is

A = ( G + L ) G^-1 ( G + U )     (3.12.2)

where

G_1 = B_1

and

G_i = B_i - L_i G_{i-1}^-1 U_{i-1},   i = 2, ..., N

The G_i (except for the first) are full matrices. For a block incomplete
factorization a sparse approximation to G_{i-1}^-1 is used. The matrix A
is approximated by

M = ( H + L ) H^-1 ( H + U )     (3.12.3)

  = A + ( H - B + L H^-1 U )

  = A + E
Many different approximations to H_{i-1}^-1 are of course possible.
The aim is to find an approximation that is sparse (diagonal or
tridiagonal in structure) yet retains enough information to
significantly increase the convergence rate. The error terms, E, in
(3.12.3) can be accounted for in an analogous fashion to that discussed
in Section 3.8. The column sums (or row sums) of E are subtracted from
the corresponding diagonal element.

A simple approximation is to write

H_{i-1}^-1 = diag ( G_{i-1}^-1 )     (3.12.4)

where only the diagonal of the exact inverse is retained. Note
that for 2D problems this gives the method known as Nested Factorization
(for further discussion of this method see Section 3.13). A more
accurate treatment is to put

H_{i-1}^-1 = band ( G_{i-1}^-1 , p )     (3.12.5)

in which p bands on either side of the diagonal, as well as the
diagonal, are retained. Methods based on this approximation are known as
INV(p) and MINV(p). For further discussion of block incomplete
factorization methods see Underwood (1976), Meijerink (1983), Axelsson
et al. (1984), and Concus et al. (1985).
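The diagonal approximation (3.12.4) can be sketched for a hypothetical two-line grid (three nodes per line); the blocks below are illustrative, not from the text. The sketch also checks the property that the error of the block factorization is confined to the diagonal blocks:

```python
import numpy as np

# B_i couples nodes on a line (tridiagonal); L, U couple neighbouring
# lines (diagonal), as in the partitioning of Figure 3.12.1.
B0 = np.array([[4., -1., 0.], [-1., 4., -1.], [0., -1., 4.]])
B1 = B0.copy()
L1 = -np.eye(3)
U0 = -np.eye(3)

# The exact Schur block G1 = B1 - L1 G0^{-1} U0 (with G0 = B0) is full;
# the approximation (3.12.4) keeps only the diagonal of the exact inverse.
H0 = B0
H1 = B1 - L1 @ np.diag(np.diag(np.linalg.inv(H0))) @ U0

# M = (H + L) H^{-1} (H + U): the off-diagonal blocks of M equal those of
# A, so the error matrix E = M - A lives only in the diagonal blocks.
Z = np.zeros((3, 3))
H = np.block([[H0, Z], [Z, H1]])
Lb = np.block([[Z, Z], [L1, Z]])
Ub = np.block([[Z, U0], [Z, Z]])
M = (H + Lb) @ np.linalg.inv(H) @ (H + Ub)
A = np.block([[B0, U0], [L1, B1]])
E = M - A
print(np.allclose(E[3:, :3], 0), np.allclose(E[:3, 3:], 0))   # True True
```

Keeping more bands of the exact inverse, as in (3.12.5), shrinks the remaining diagonal-block error at the cost of denser H_i.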
3.13 Nested Factorization
A type of block incomplete factorization that is used in reservoir
simulation is the algorithm known as Nested Factorization (Cheshire,
1983). For 2D systems it is equivalent to block incomplete factorization
with the approximation (3.12.4) for the inverse of the diagonal block.
For 3D systems and a seven-point discretization molecule, the matrix A
is partitioned as follows (for a 3x2x2 grid):

[Figure 3.13.1: nested partitioning of A for a 3x2x2 grid; the innermost
blocks couple nodes on a line, the next level (bands L_2, U_2) couples
lines within a plane, and the outermost level (bands L_3, U_3) couples
planes]
Symbolically the matrix A is written as

A = D + L_1 + U_1 + L_2 + U_2 + L_3 + U_3     (3.13.1)

where again each x may represent either a single element or an n_c x n_c
submatrix. The matrices represented by D, the L's, and the U's are
diagonal. The matrix A is then partitioned into a block tridiagonal form
by the coarsest partitioning in Figure (3.13.1) and a block incomplete
factorization is written as

M = ( P + L_3 ) P^-1 ( P + U_3 )     (3.13.2)

where P^-1 is a sparse approximation to the true inverse. This sparse
approximation is obtained from a second partitioning of the system into
a block tridiagonal system (formed by the finer partitioning in Figure
(3.13.1)) and a second block incomplete factorization, and is written as

P = ( T + L_2 ) T^-1 ( T + U_2 )     (3.13.3)

where T^-1 is again a sparse approximation to the true inverse and is
formed by factoring each matrix T (which represents the coupling between
nodes on a single line of the grid). At this level the factorization can
be done exactly:

T = ( S + L_1 ) S^-1 ( S + U_1 )     (3.13.4)

where S is a diagonal matrix defined by

S = D - L_1 S^-1 U_1 - colsum( L_2 T^-1 U_2 ) - colsum( L_3 P^-1 U_3 )     (3.13.5)

The colsum( ) terms are added to account for the error in the incomplete
factorizations represented by (3.13.2) and (3.13.3). They ensure that
the colsums of the error matrix (M - A) are zero and thus that residuals
sum to zero (independently for each unknown in the system) if the
condition (3.13.6) is enforced. The nested factorization error matrix is
itself block diagonal and thus the residuals will also sum to zero
within planes (or lines in a 2D system). This fact can be used to check
the implementation of the algorithm.
Note that only the coupling between nodes on a single line of a
grid is accounted for exactly in this factorization. The coupling in the
other two directions is accounted for only in the error terms. For this
reason, the algorithm is very sensitive to the ordering of the grid
nodes (for problems with anisotropies and/or discontinuities in
coefficients) and care must be taken to order the grid nodes properly.
3.14 Multigrid Methods
The multigrid method is an iterative technique developed for the
solution of elliptic partial differential equations which, for smooth
problems (i.e. no discontinuities in coefficients), can be shown to
converge at a rate which is O(N), where N is the number of unknowns
(Brandt, 1977; Brandt & Dinar, 1979; Brandt, 1986). This convergence
rate makes the method potentially a very attractive one, since the
incomplete factorization methods discussed above are O(N^{3/2}) or
O(N^{5/4}) (see Section 3.11). For a method which is O(N), the number of
iterations required for convergence is independent of the number of
unknowns, so that the size of the problem could be doubled and not
increase the number of iterations required for convergence.

The first step of the multigrid method is to discretize the problem
on a number of grids of varying fineness (a coarser grid usually being
a subset of the finer one). The second step is to take advantage of a
known property of relaxation methods (i.e. Gauss-Seidel and related
methods), which is that they are efficient at eliminating local or high
frequency errors which have a wavelength on the order of the grid
spacing. Relaxation methods are, on the other hand, very inefficient at
eliminating longer wavelength error. Each successive grid, therefore,
can be used to reduce error components which are of the order of that
particular grid's spacing. The problem is then passed on to another
(coarser) grid to eliminate the longer wavelength error components.

[Figure 3.14.1: a sequence of grids of varying fineness]
If the differential equation to be solved is written as

L u = f     (3.14.1)

it can be represented on the finest grid, G^K, as

L^K u^K = f^K     (3.14.2)

If u^K is an approximation to the solution of (3.14.2), then the
residual on G^K is

r^K = f^K - L^K u^K     (3.14.3)

Equation (3.14.2) written in residual form is

L^K v^K = r^K     (3.14.4)

where v^K is the correction to u^K.
To reiterate, a relaxation method used on equation (3.14.4)
eliminates the high frequency components of error. The solution is then
smooth in the sense that it does not have fluctuations on the scale of
G^K. This means that an approximation to v^K can be found using a coarser
grid. The essential idea behind multigrid is then that the problem is
prepared in such a way that it can be represented and solved on a
coarser grid. The problem on the coarser grid is written

L^{K-1} v^{K-1} = r^{K-1}     (3.14.5)

where

r^{K-1} = I_K^{K-1} r^K,

I_K^{K-1} is an interpolation operator from G^K to G^{K-1}, and L^{K-1}
represents the differential operator on G^{K-1}. Again, on this grid,
relaxation is efficient at eliminating errors that have a wavelength of
the order of the mesh size. This operation is referred to as smoothing
and is independent of the mesh size (Brandt, 1977).
Equation (3.14.5) can be used to improve u^K with

(u^K)_new = (u^K)_old + I_{K-1}^K v^{K-1}     (3.14.6)

That is, the coarse grid provides a correction to the fine grid
solution. This correction (I_{K-1}^K v^{K-1}) contains information about
low frequency components of the solution and hence speeds convergence.

After a few sweeps of relaxation on G^{K-1}, convergence will
deteriorate, as it did on the finer grid. However, it is now observed
that (3.14.5) can be treated in exactly the same way as (3.14.2) was.
That is, a correction can be obtained from a still coarser grid,
G^{K-2}. On the coarsest grid, G^1, the problem is usually solved
exactly. The algorithm can be represented diagrammatically as follows:

G^K:      smooth (perform relaxation sweeps)
            |  transfer residuals ( r^{K-1} = I_K^{K-1} r^K )
G^{K-1}:  smooth (perform relaxation sweeps)
            |
           ...

Figure (3.14.2)

This represents one multigrid cycle. In particular, it is called a
V-cycle. Other types of cycles are possible, e.g. W-cycles. A common
configuration is to have three levels or grids in the cycle.
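The cycle above can be sketched for a 1D model problem, -u'' = f with homogeneous Dirichlet conditions; the full-weighting restriction, linear interpolation, and exact coarse-grid solve are standard choices assumed here, not taken from the text:

```python
import numpy as np

def relax(u, f, h, sweeps):
    # Gauss-Seidel smoothing for -u'' = f, u[0] = u[-1] = 0
    for _ in range(sweeps):
        for i in range(1, len(u) - 1):
            u[i] = 0.5 * (u[i - 1] + u[i + 1] + h * h * f[i])
    return u

def two_grid(u, f, h):
    """One two-grid cycle: pre-smooth, restrict the residual, solve the
    coarse problem exactly, interpolate the correction, post-smooth."""
    n = len(u) - 1
    u = relax(u, f, h, 2)
    r = np.zeros(n + 1)
    r[1:-1] = f[1:-1] + (u[:-2] - 2 * u[1:-1] + u[2:]) / (h * h)
    nc = n // 2
    rc = np.zeros(nc + 1)                      # full-weighting restriction
    rc[1:-1] = 0.25 * r[1:-2:2] + 0.5 * r[2:-1:2] + 0.25 * r[3::2]
    # coarse operator: same three-point stencil on spacing 2h, solved exactly
    Ac = (np.diag(2.0 * np.ones(nc - 1)) + np.diag(-np.ones(nc - 2), 1)
          + np.diag(-np.ones(nc - 2), -1)) / (2 * h) ** 2
    vc = np.zeros(nc + 1)
    vc[1:-1] = np.linalg.solve(Ac, rc[1:-1])
    v = np.zeros(n + 1)                        # linear interpolation to fine
    v[::2] = vc
    v[1::2] = 0.5 * (vc[:-1] + vc[1:])
    return relax(u + v, f, h, 2)

n = 32
h = 1.0 / n
xg = np.linspace(0.0, 1.0, n + 1)
f = np.pi ** 2 * np.sin(np.pi * xg)            # exact solution sin(pi x)
u = np.zeros(n + 1)
for _ in range(5):
    u = two_grid(u, f, h)
err = np.max(np.abs(u - np.sin(np.pi * xg)))
print(err < 1e-2)   # True: down to the discretization-error level
```

A full V-cycle simply applies the same idea recursively instead of solving the coarse problem exactly.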
To complete the specification of the algorithm, the following must
be defined:

(1) the interpolation operators I_{K-1}^K (the residual transfer
operator I_K^{K-1} is generally taken to be the inverse operation)

(2) the approximation of the differential operator on the coarse
grid, e.g. L^{K-1}

For smooth problems this definition is easy. The operator L^{K-1} is
defined in the obvious manner, i.e. the fine grid nodes are simply
removed. For the interpolation operator, linear or quadratic
interpolation is used. For problems with discontinuities or anisotropies
in the coefficients, the definition of these operators is not so
obvious. In fact this definition is the crux of defining a workable
multigrid algorithm for such problems.
For discontinuous and/or anisotropic problems it is difficult to
define physically meaningful interpolation operators. Early attempts at
defining interpolation operators were often thwarted by counterexamples
where the operator failed. Working on the premise that the interpolation
operator should mimic the properties of the differential operator, it
was suggested that the differential operator itself be used as an
interpolation operator wherever possible (Alcouffe et al., 1981). This
cannot be done everywhere since the interpolation operator on the
coarser grids would grow undesirably large. When the differential
operator cannot be used in its entirety, a collapsed or averaged version
of the differential operator is used to define the interpolation
operator. The process can be described in the following way:

(1) for fine grid points corresponding to coarse grid points, use
the identity operator;

(2) for fine grid points on a line between two coarse points, the
differential operator L^K is averaged in the two directions
perpendicular to the coarse grid lines, i.e. the components of
L^K are added to produce a collapsed 1D operator (L^K)^c1, and
the interpolation operator is obtained from (L^K)^c1 u^K = 0;

(3) for fine points not on coarse grid lines, the full differential
operator is used, i.e. the interpolation operator is obtained
from L^K u^K = 0.

The differential operator on the coarse grid can now be defined
recursively as:

L^{K-1} = I_K^{K-1} L^K I_{K-1}^K = ( I_{K-1}^K )^T L^K I_{K-1}^K

This is called the automatic prescription for the coarse grid operator.
If the fine grid operator is the standard five-point (or seven-point,
in 3D) discretization operator, the coarse grid operator will
be a nine-point (or twenty-seven point) operator, even with the
averaging described above.
The third consideration in defining the algorithm is to choose a
smoothing method. For smooth problems, point Gauss-Seidel is used. For
anisotropic problems, line or alternating line Gauss-Seidel is used.
Other iterative methods have been tried as smoothing methods with
varying degrees of success (Hemker, 1982; Kettler, 1982). In three
dimensions this issue is more serious, since the three-dimensional
analogue of line Gauss-Seidel, plane or alternating plane Gauss-Seidel,
is too costly. It requires the solution of a 2D problem on each plane
(Behie and Forsyth, 1983). One solution is to use an iterative, rather
than an exact, method to solve the 2D problem. Several methods have been
used, including ILU factorization and 2D multigrid (Dendy, 1987).
Extending the algorithm to solve 3D problems poses other
difficulties as well. Behie and Forsyth (1983) found that the
straightforward application worked well for most classes of problems (of
the type encountered in reservoir simulation) but failed on anisotropic
problems with small compressibility. Dendy (1987) devised an improved
interpolation and differential operator prescription which does not fail
on this class of problems. Another approach which has been discussed for
3D problems is to refine only in the x and y directions. This seems a
particularly reasonable approach for reservoir simulation since the z
direction typically contains fewer grid nodes.
Multigrid algorithms have been developed for non-symmetric problems
(Dendy, 1983) and for problems involving systems of equations, i.e. more
than one equation being solved at each grid node (Dendy). Both of these
are typical of reservoir simulation problems. However, these algorithms
have not yet been actively used in reservoir simulation applications.
Multigrid algorithms have been used for reservoir simulation mainly in
the context of solving a pressure equation which is usually symmetric or
nearly symmetric.

Multigrid algorithms have potential for application to reservoir
simulation but are still an area of current research and are not yet
state-of-the-art methods. A drawback to the use of multigrid algorithms
is the large setup cost to calculate the coarse grid operators. Even for
a 2D problem, this work is on the order of 36N operations. The storage
involved in saving the coefficients is also substantial. The algorithm
is practical only for large problems. The performance of the multigrid
algorithm on some standard test problems is discussed in Behie and
Forsyth, 1983.
Another approach to developing a multigrid algorithm is the
Algebraic Multigrid Method (AMG) (Brandt, 1986). The AMG method is based
solely on algebraic information contained in the matrix A, i.e. strong
and weak connections. AMG requires no knowledge of an underlying grid
structure. AMG constructs a sequence of "grids", "coarse grid operators"
and "intergrid transfer operators" which are then combined in the usual
multigrid way. It is fully automatic. AMG can be applied to many
problems where standard multigrid methods are not applicable. It has
been used successfully on the standard "difficult" problems, i.e. those
with anisotropic and/or discontinuous coefficients. It does not require
special smoothing. The usual requirement is that A be symmetric and of
positive type, i.e.

a_ii > 0,   a_ij ≤ 0 (j ≠ i),   and   Σ_{j=1}^N a_ij ≥ 0   (i = 1, 2, ..., N)     (3.14.7)

These conditions can be relaxed somewhat. AMG works on non-symmetric
matrices. Also, the matrix need only be of essentially positive type,
which means that the condition (3.14.7) can be violated on some grid
G^{M-k} coarser than G^M. AMG for systems is an area of current
research. The drawbacks to the AMG method include the complexity of the
algorithm. It is generally more computationally expensive than the
standard multigrid algorithm.
SECTION 4: ASSOCIATED TOPICS
4.1 Treatment of Source Terms
Most reservoir simulators include the modeling of multiblock wells
(and/or fractures). For the best convergence rates in the Newton
iteration, this should be done implicitly, i.e. the source terms in
equation (2.1.6) should be written at the latest iteration level. Each
conservation equation in cell i will have a source term of the form (if
cell i contains a well):

q_k = λ_k ( p_wf^J - p_i )     (4.1.1)

where

q_k = mass influx of component k

λ_k = mobility of phase k

p_i = pressure in cell i

p_wf^J = wellbore pressure in well J

To specify the wellbore pressure, an additional equation is required.
This takes the form of a constraint on the total flow:

q_T^J = Σ_{k=1}^{n_c} Σ_{i ∈ φ_J} λ_{k,i} ( p_wf^J - p_i )     (4.1.2)

where

q_T^J = total specified fluid flow into well J

n_c = total number of fluid components

φ_J = the set of cell numbers penetrated by well J
There is now an extra degree of coupling between cells because of these
terms. The resulting matrix contains terms outside the regular band
structure. Figure (4.1.1) shows the grid structure and resulting
incidence matrix for the implicit solution of the reservoir and well
unknowns for a 3 x 3 grid with a multiblock well completed in three of
the grid blocks
t '7~ 8 9
x x x x x x x x
t--! , C 5 B x x x
~~ 2 3
x x x x x x x x x x
x x x x x x x x
x x x x
........................................... ~ ............ ~ ... ~ .... ~ ...... . x x x i x
Figure (4.1.1)
A standard treatment is to order the unknowns connected to the reservoir
flow first and the well unknowns second. In Figure (4.1.1) the reservoir
unknowns are ordered first, resulting in what is known as a bordered
matrix.
   Figure (4.1.2): partitioning of the bordered matrix:

      | derivatives of flow equations  |  derivatives of (4.1.1)  |
      | with respect to flow unknowns  |  with respect to p_wf    |
      |--------------------------------+--------------------------|
      | derivatives of (4.1.2)         |  derivatives of (4.1.2)  |
      | with respect to flow unknowns  |  with respect to p_wf    |
Figure (4.1.2) illustrates the partitioning in the bordered matrix. The
number of extra rows and columns is n_w, the number of fully coupled
wells. The matrix problem is now written as

   [ A    W_2 ] [ x_R ]   [ b_R ]
   [ W_1  W_3 ] [ x_W ] = [ b_W ]        (4.1.3)

where x_R and x_W denote the reservoir and well unknowns. The total
number of unknowns is N = n x n_c + n_w. Typically n >> n_w, so that one
approach to solving (4.1.3) is to perform a block elimination on it.
This is an exact factorization of the block system:

   [ A    W_2 ]   [ LU    0                       ] [ I   (LU)^{-1} W_2 ]
   [ W_1  W_3 ] = [ W_1   W_3 - W_1 (LU)^{-1} W_2 ] [ 0   I             ]        (4.1.4)
Since L and U are factors of A, the factorization of the flow portion of
the Jacobian matrix can be done with the usual solution algorithm (e.g.
direct elimination, ILU factorization, etc.). The solution algorithm
for the whole system involves two additional computationally intensive
portions. These are:

(1) the computation of (LU)^{-1} W_2, which involves a forward and
    backward solve for each well;

(2) the computation of the n_w x n_w (generally full) matrix
    represented by W_3 - W_1 (LU)^{-1} W_2, and its subsequent
    factorization.
If the LU factorization of A comes from a direct elimination, the
algorithm is the same as Gaussian elimination on the whole system. If
the factorization of A comes from an ILU method, the total algorithm is
also an iterative one, with the reservoir-well coupling handled exactly.
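The block elimination in (4.1.4) can be sketched in a few lines of NumPy. This is an illustrative sketch only: dense `numpy.linalg.solve` calls stand in for the forward/backward solves with the LU (or ILU) factors of A, and the block names follow the bordered partitioning described in the text:

```python
import numpy as np

def solve_bordered(A, W1, W2, W3, b_r, b_w):
    """Block elimination for [[A, W2], [W1, W3]] [x_r; x_w] = [b_r; b_w].

    A  : n x n reservoir block (its factors are reused for every solve)
    W1 : nw x n, W2 : n x nw, W3 : nw x nw well-coupling borders
    """
    # (1) one forward/backward solve per well column, plus one for b_r
    AinvW2 = np.linalg.solve(A, W2)
    Ainvb = np.linalg.solve(A, b_r)
    # (2) the nw x nw (generally full) Schur complement and its factorization
    S = W3 - W1 @ AinvW2
    x_w = np.linalg.solve(S, b_w - W1 @ Ainvb)
    # back-substitute for the reservoir unknowns
    x_r = Ainvb - AinvW2 @ x_w
    return x_r, x_w
```

Because the factors of A are reused, the extra cost is exactly the two items enumerated above: n_w additional solves against A and the factorization of the small dense well block.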
When the number of wells is large the above algorithm can be
prohibitively costly in terms of computation time. An alternative
algorithm can be used to factor the system in the following way:

   [ A    W_2 ]   [ LU   0               ] [ I   D W_2 ]
   [ W_1  W_3 ] ≈ [ W_1  W_3 - W_1 D W_2 ] [ 0   I     ]        (4.1.5)

where D is an easily computed approximation to A^{-1}.
This is not an exact factorization of the reservoir-well coupling. It
is known as a sparsely coupled factorization. The reservoir-well
coupling is treated in a DKR fashion. This algorithm works well for many
problems but does encounter difficulty if there are many constant rate,
constant injectivity wells. The number of iterations required for
convergence increases and the well constraints are not well satisfied
(an undesirable trait from the user's point of view).
To gain some insight into why the sparsely coupled algorithm
performs poorly for constant rate injection wells, consider the
constraint equation for these wells:

   q_s = Σ_i λ_i ( p_w - p_i )        (4.1.6)

where  q_s = the specified rate,
       λ_i = the constant injectivity into block i,
       p_i = the pressure in block i,
       p_w = the bottom hole pressure in the well.
In Figure (4.1.1) the rows of the matrix corresponding to derivatives of
(4.1.6) will have zero row sums. Also the rows corresponding to
derivatives of the flow equations for grid nodes containing constant
rate injection wells will have zero row sums. This latter result is a
consequence of mass conservative differencing. The sparsely coupled
algorithm does not preserve the zero row sums. The incomplete
factorization in (4.1.5) has a nonzero row sum. This means that there
is a material balance error which results in rates not being preserved
during the iterations.
Instead of the block LU factorization of the system described in
(4.1.4), consider a block UL factorization:

   [ A    W_2 ]   [ I   W_2 W_3^{-1} ] [ A - W_2 W_3^{-1} W_1   0   ]
   [ W_1  W_3 ] = [ 0   I            ] [ W_1                    W_3 ]        (4.1.7)
Note that this approach is completely equivalent to making an LU
factorization of the system with the well unknowns ordered first and the
reservoir unknowns ordered second (as described by Meijerink and van
der Vorst, 1981). The factorization described in (4.1.7) is still exact,
but a decision must be made on how to handle the term
( A - W_2 W_3^{-1} W_1 ), since W_2 W_3^{-1} W_1 has terms outside the
band structure of A. One approach is to use the following rule. The term
( A - W_2 W_3^{-1} W_1 ) is replaced by { A - W_2 W_3^{-1} W_1 },
where { } is defined as:

(1) if an element of W_2 W_3^{-1} W_1 falls on an existing band, it is
    subtracted from the appropriate element of A, but

(2) if it is outside the band structure of A, it is subtracted
    from the corresponding diagonal element.
Note that with this rule single layer wells and double layer, nearest
neighbour wells do not introduce any new connections and can therefore
be eliminated exactly. This rule also has the desirable feature of
preserving row sums at each iteration. The matrix represented by { } can
be factored by one of the usual ILU methods to derive an iterative
method for the whole system. The reservoir-well coupling is not handled
exactly (as it was in (4.1.4)) but some of the important features of this
coupling are preserved. This algorithm has been found to be most
effective when the number of fully coupled wells is more than ten.
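The rule defining { } can be illustrated concretely. The sketch below is hypothetical code, not the simulator's: a dense array stands in for banded storage, `F` plays the role of W_2 W_3^{-1} W_1, and `band_mask` marks the existing bands of A:

```python
import numpy as np

def apply_fill_rule(A, F, band_mask):
    """Form { A - F }: subtract F where it lands on an existing band of A,
    and fold any entry of F outside the band structure into the diagonal
    of its row, so the row sums of A - F are preserved exactly.
    """
    B = A.astype(float).copy()
    n = A.shape[0]
    for i in range(n):
        for j in range(n):
            if F[i, j] != 0.0:
                if band_mask[i, j]:
                    B[i, j] -= F[i, j]   # fill lands on an existing band
                else:
                    B[i, i] -= F[i, j]   # out-of-band fill goes to the diagonal
    return B
```

Folding the out-of-band fill into the diagonal is what keeps the row sums of { A - F } equal to those of A - F, the material-balance property that the sparsely coupled factorization (4.1.5) loses.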
4.2 Programming Considerations For Vector Machines
Reservoir simulation models are often used to simulate large fields
for tens of years of operation. For this reason, large vector computers
such as the CRAY I and CRAY II series, or the CYBER 205 are used to run
such simulations. To make optimal use of these machines, care must be
taken in programming the computer model. For black oil models of the
type described above, most of the computational time is consumed by the
linear solution part of the model so that this is really the only part
of the code that needs to be especially designed for the vector machine.
This process is referred to as "vectorizing" the code. For other models,
such as thermal or compositional models, coefficient generation,
equation-of-state calculations, or table look-ups can require large
amounts of computational effort, so that the vectorization process must
extend to other parts of the code as well.
For large simulations the solution algorithm of choice would be an
accelerated ILU method. The method can be divided up into the following
steps:
(1) calculate the incomplete factors L and U,
(2) calculate the residual by forward and backward solution,
(3) correct this residual by an acceleration technique such
    as ORTHOMIN.
Step (1) is done only once per nonlinear iteration (Newton cycle) and is
an innately recursive process and as such cannot be vectorized. For
reduced system orderings the reduction portion of step (1) is
vectorizable, however. Steps (2) and (3) are carried out between 5 and 15
times per nonlinear iteration. It is most beneficial, therefore, to
concentrate on vectorizing these portions of the algorithm.
It is fairly straightforward to vectorize the ORTHOMIN portion of
the algorithm (step (3)). It consists of a matrix-vector multiply and
several inner products. The inner products are trivially vectorizable.
The matrix-vector multiply can be vectorized as well, if the bands of
the matrix are straight. This requirement results from a vector machine
limitation which requires that two vectors being multiplied have a
constant "stride" (i.e. elements of the vector must be separated by a
constant increment in memory). The bands of the matrix are already
straight if the ordering is natural. But in reservoir simulation
diagonal orderings are often used. Therefore, the solution vector must
be reordered before and after the matrix-vector multiply. For red-black
ordering, the bands are straight only if n_x is odd (in 2D) or n_x and n_y
are odd (in 3D). To ensure that this is true an extra row of null
blocks is added to the grid if necessary.
In this section, some simple modifications to standard computer
algorithms, which improve their performance on vector machines, will be
discussed. Much more sophisticated modifications can be made, but these
generally result in restrictions on the code's portability.
Modifications for parallel architectures will not be discussed.
Given that the bands of the matrix are straight, the matrix-vector
multiply is performed by multiplying not in the usual way (i.e. 1st row
of the matrix times the multiplying vector, then the 2nd row of the
matrix times the multiplying vector, and so on) but by multiplying along
the bands of the matrix (Karush et al, 1975). This can be illustrated
diagrammatically:

   Figure 4.2.1 (diagram not reproduced)
The diagonals of the matrix are extended until each is full (in this
case has 6 elements). Each diagonal in turn is multiplied by the vector
of x's. If the "a" diagonal, for instance, is multiplied by the vector
of x's, the only non-zero results will be in the 4th, 5th, and 6th
places, i.e. a_4 x_4, a_5 x_5, and a_6 x_6. The whole vector of results
is calculated at once for the same cost as one scalar multiply, so the
cost of multiplying zero entries is not a consideration. The vector of
results is then transferred to the appropriate sum (one for each row of
the matrix). For example a_4 x_4 goes to the sum for the first row, and
so on. Note that if each entry in Figure 4.2.1 represents a 3x3
submatrix, three multiplications will be done for each diagonal.
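A band-wise multiply of this kind can be sketched in NumPy. The storage convention used here (element i of the band at offset k multiplies x[i + k]) is one plausible layout chosen for the sketch, not necessarily the book's:

```python
import numpy as np

def diag_matvec(diags, offsets, x):
    """y = A @ x with A stored band-by-band.

    Each band is a length-n array d with the convention d[i] = A[i, i + off];
    positions that run off the matrix are ignored.  Each band contributes
    one constant-stride vector multiply-add -- the operation a vector
    machine pipelines well -- instead of n short row-by-row dot products.
    """
    n = len(x)
    y = np.zeros(n)
    for d, off in zip(diags, offsets):
        if off >= 0:
            y[:n - off] += d[:n - off] * x[off:]
        else:
            y[-off:] += d[-off:] * x[:n + off]
    return y
```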
The forward and backward substitution portion of the algorithm is
not easy to vectorize since it is also innately recursive (i.e. the
result at any point depends on the result of the previous step, or
steps). Consider the forward solve procedure:

   Figure 4.2.2: the elements of the matrix and the vector must be
   gathered into contiguous vectors (diagram not reproduced)
The procedure can be vectorized by first gathering the appropriate
elements of the factor L into a contiguous vector (in the example above,
this would be the first two elements of the fifth row), remembering that
each element is usually a 3x3 submatrix. The appropriate elements of the
solution vector are also gathered into a vector (in the above example,
the second and fourth elements, which are at this point known values),
remembering that these too each have three components. The first, second
and third rows of the gathered matrix elements are each multiplied by
the solution vector. The result vectors are then summed in the
appropriate fashion and subtracted off the fifth element of the
right-hand-side vector. The vector lengths in this case are of the
order of the number of bands in the ILU (between 3 and 17 depending on
the factorization used) times the number of equations per grid node
(which is 3 for black oil). The reduction in time is on the order of 30
to 40% on the CRAY 1.
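One step of the gathered forward solve can be sketched as follows; `gathered_forward_step` is an illustrative name, and a dense array stands in for the banded ILU factor (unit diagonal assumed, as in the forward solve):

```python
import numpy as np

def gathered_forward_step(L, y, b, row):
    """One row of the forward solve L y = b, arranged as gather + multiply.

    The known solution values referenced by this row are gathered into a
    contiguous vector, multiplied against the gathered coefficients in one
    vector operation, and subtracted from the right-hand side.
    """
    cols = np.nonzero(L[row, :row])[0]   # columns of the off-diagonal entries
    coeffs = L[row, cols]                # gathered matrix elements
    knowns = y[cols]                     # gathered, already-computed values
    return b[row] - coeffs @ knowns
```

The gather is exactly what makes the multiply a constant-stride vector operation; the recursion survives only between rows, not within one.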
For the simplest ILU (i.e. DKR, where only the diagonal band
is altered by the factorization) the forward and backward solve can be
partially vectorized by using diagonal ordering (Towler and Killough,
1982).

   Figures 4.2.3 and 4.2.4: diagonal ordering of the grid nodes and the
   resulting matrix structure (diagrams not reproduced)
The diagonal ordering and the use of the five point discretization
molecule lead to the situation illustrated in Figure 4.2.3, where the
unknowns along any given diagonal are independent of each other (i.e.
non-recursive). For example, the unknowns at the fourth, fifth and sixth
grid nodes can all be computed simultaneously (i.e. in a vector
operation), in the example above. This is because they depend only on
the results of the previous diagonal, the unknowns at the second and
third grid nodes. The algorithm can therefore be vectorized to the
extent of the longest diagonal. A time reduction of 70 to 80% has been
reported in vector mode on the CRAY for an IMPES model (Towler and
Killough, 1982). The DKR algorithm is not always optimal in terms of
convergence rate (see the examples in Section 4.3) so that this
application has limited use.
Another approach to vectorizing ILU methods is to expand the
inverse factors L^{-1} (and U^{-1}) in a series (van der Vorst, 1982).
This is done as follows:

   L^{-1} = ( D'(I + L') )^{-1} ≈ ( I - L' + ... ) (D')^{-1}        (4.2.1)

The forward solve then becomes a series of matrix-vector multiplies
which can be vectorized. The truncation in (4.2.1) is not exact and
will affect the convergence properties of the algorithm.
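The truncated expansion (4.2.1) can be sketched with dense NumPy arrays standing in for the banded factor; the recursive forward substitution is replaced by a few matrix-vector multiplies:

```python
import numpy as np

def neumann_forward_solve(L, b, terms=3):
    """Approximate x = L^{-1} b via the expansion in (4.2.1).

    Write L = D'(I + L') with D' = diag(L) and L' strictly lower
    triangular; then L^{-1} = (I + L')^{-1} (D')^{-1}
                            ~ (I - L' + L'^2 - ...) (D')^{-1}.
    Each term is a matrix-vector multiply, which vectorizes, unlike
    the recursive forward substitution it replaces.
    """
    d = np.diag(L)
    Lp = np.tril(L / d[:, None], k=-1)   # L' = strictly lower part of D'^{-1} L
    x = b / d                            # (D')^{-1} b  (k = 0 term)
    term = x.copy()
    for _ in range(1, terms):
        term = -(Lp @ term)              # next term (-L')^k (D')^{-1} b
        x += term
    return x
```

Since the strictly lower triangular part of an n x n matrix is nilpotent, the series terminates exactly after n terms; truncating earlier trades accuracy for vectorizable work, which is why the truncation affects convergence.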
4.3 Comparison of Methods

To compare the performance of iterative methods used in reservoir
simulation, some standard model problems have been developed (Stone,
1968; Kershaw, 1978; Elman, 1981; Watts, 1981; Appleyard and Cheshire,
1983; Sherman, 1985). These incorporate some of the properties of
reservoir systems that make the resulting linear equation sets "hard" to
solve. Stone's model problems solve the following differential
equation:

   ∂/∂x ( KX ∂p/∂x ) + ∂/∂y ( KY ∂p/∂y ) = -q        (4.3.1)

which is discretized on the unit square with uniform mesh size h:

   [ KX_{i+1/2,j} (p_{i+1,j} - p_{i,j}) - KX_{i-1/2,j} (p_{i,j} - p_{i-1,j})
     + KY_{i,j+1/2} (p_{i,j+1} - p_{i,j}) - KY_{i,j-1/2} (p_{i,j} - p_{i,j-1}) ] / h² = -q_{i,j}        (4.3.2)
where KX and KY are given by the harmonic mean, e.g.

   KX_{i+1/2,j} = 2 KX_{i,j} KX_{i+1,j} / ( KX_{i,j} + KX_{i+1,j} )

A second model problem involves the three dimensional analogue of
(4.3.1),

   ∂/∂x ( KX ∂p/∂x ) + ∂/∂y ( KY ∂p/∂y ) + ∂/∂z ( KZ ∂p/∂z ) = -q        (4.3.3)

This problem is discretized as in (4.3.2).
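The interface coefficients for (4.3.2) follow directly from the harmonic-mean formula; `harmonic_transmissibilities` is an illustrative name for this sketch:

```python
import numpy as np

def harmonic_transmissibilities(KX):
    """Interface coefficients KX_{i+1/2,j} from cell-centred values:
    the harmonic mean 2*K_i*K_{i+1} / (K_i + K_{i+1}) along x.
    KX is an (nx, ny) array of cell permeabilities.
    """
    num = 2.0 * KX[:-1, :] * KX[1:, :]
    den = KX[:-1, :] + KX[1:, :]
    out = np.zeros_like(num)
    np.divide(num, den, out=out, where=den > 0)  # guard the 0/0 case
    return out
```

The harmonic mean gives a zero interface coefficient whenever either neighbouring cell has zero permeability, which is the behaviour needed for a sealed region such as region D in test problem 4 below.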
Both of these problems yield sets of linear equations which are
symmetric and are therefore only useful (as regards applications to
reservoir simulation problems) in testing various types of incomplete
factorizations. To evaluate the performance of the full nonsymmetric
algorithm, the convection-diffusion equation can be used:

   ∂²p/∂x² + ∂²p/∂y² = β₁ ∂p/∂x + β₂ ∂p/∂y        (4.3.4)

with

   p(x,0) = 0,  p(0,y) = 1,  p(x,1) = 1,  and  p_x(1,y) = 0.
This is discretized on the unit square as

   4 p_{i,j} - (1 - β₁h/2) p_{i+1,j} - (1 + β₁h/2) p_{i-1,j}
             - (1 - β₂h/2) p_{i,j+1} - (1 + β₂h/2) p_{i,j-1} = 0        (4.3.5)

where central differencing is used on the convection terms. The
derivative boundary condition is discretized as

   (3 + β₁h/2) p_{I,j} - (1 + β₁h/2) p_{I-1,j}
             - (1 - β₂h/2) p_{I,j+1} - (1 + β₂h/2) p_{I,j-1} = 0        (4.3.6)

Note that for β₁h/2 or β₂h/2 greater than 1, the discretization is no
longer diagonally dominant. This test problem therefore provides a
stringent test for determining which algorithms might be most useful for
reservoir simulation problems.
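The loss of diagonal dominance is easy to check from the stencil coefficients in (4.3.5); this small helper simply compares the diagonal against the absolute off-diagonal sum for an interior row:

```python
def interior_row_is_diagonally_dominant(beta1, beta2, h):
    """Weak diagonal dominance of an interior row of (4.3.5):
    diagonal 4 versus |1 - b*h/2| + |1 + b*h/2| summed over both
    directions.  For b*h/2 <= 1 each pair sums to 2 and the row is
    (weakly) dominant; once b*h/2 > 1 the pair sums to b*h and
    dominance is lost.
    """
    off = (abs(1 - beta1 * h / 2) + abs(1 + beta1 * h / 2)
           + abs(1 - beta2 * h / 2) + abs(1 + beta2 * h / 2))
    return 4.0 >= off
```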
Finally, several simulator-generated linear systems are used to
test the efficiency of the algorithms in the full three-dimensional,
multiphase environment of the reservoir simulator.
Comparisons involving these model problems with different
geometrical configurations are presented below. First, the model problem
in (4.3.2) is solved with the geometry shown in Figure 4.3.1.
   Figure 4.3.1: unit square divided into regions A, B, C and D (diagram not reproduced)
Test Problem 1: KX=1, KY=1, (x,y) ∈ A∪B∪C∪D

This is a symmetric, homogeneous, isotropic problem and should be "easy"
for almost any incomplete factorization.

Test Problem 2: KX=1.0, KY=0.01, (x,y) ∈ A∪B∪C∪D

This is a symmetric, homogeneous but anisotropic problem and therefore
provides a more difficult test for the iterative method.

Test Problem 3: KX=0.1, KY=1.0, (x,y) ∈ A∪B∪C∪D

This is similar to test problem 2 but with the anisotropy in the
opposite direction.

Test Problem 4: KX=KY=1, (x,y) ∈ A
                KX=1, KY=100, (x,y) ∈ B
                KX=100, KY=1, (x,y) ∈ C
                KX=0, KY=0, (x,y) ∈ D

This is a symmetric problem but has anisotropic and discontinuous
coefficients. It provides a severe test of the incomplete factorization.
Traditional methods such as LSOR, which perform well on problems 1-3,
will fail here (Aziz and Settari, 1979).
The model problem in (4.3.2) is also solved with the geometry shown
in Figure 4.3.2. This is called the "staircase" problem and is adapted
from one by Alcouffe et al (1979).

   Figure 4.3.2: the "staircase" geometry, with source strengths
   q_1=0.5, q_2=0.6, q_3=-1.83, q_4=-0.27, q_5=1.0 (diagram not reproduced)
Test Problem 5: KX=KY=1000, (x,y) ∈ B
                KX=KY=1 elsewhere

This is a symmetric problem with a strong permeability contrast running
in a staircase fashion through the grid. It is a problem originally
derived from reactor physics.
The results for these first five test problems are shown in
Figures 4.3.3 to 4.3.7. The theoretical work is defined in terms
of a work unit, WU, where

   WU = number of operations (multiplications and divisions) / N

These work units of course reflect the algorithm's performance on a
scalar machine. The work counts include the set-up work (factorization
work and calculation of the initial solution and residual, if
applicable), work for forward and backward solve, acceleration work, and
in the case of the D4 ordered algorithms, the reduction work and the
cost of the recovery of the eliminated points. Theoretical work counts
are used instead of CPU time to avoid the issue of coding efficiency,
but in general there is good correlation between work counts and CPU
times. The initial solution for the pointwise ILU methods is the zero
solution. For nested factorization the initial solution is given by
(3.13.6).
The algorithms tested are several pointwise ILU methods, including
DKR, a third degree naturally ordered ILU (Figure 3.7.3), a fifth degree
D2 ordered ILU, and a third degree D4 ordered ILU. Nested factorization
results are included as well. Since the discretized equation (4.3.2) is
symmetric, conjugate gradient acceleration is used. These test problems
are essentially a test of the various factorization methods.
   Figure 4.3.3: Test problem 1 (residual vs. theoretical work for DKR,
   NAT 3, D2 5, D4 3 and NESTED; plot not reproduced)
   Figure 4.3.4: Test problem 2 (plot not reproduced)
   Figure 4.3.5: Test problem 3 (plot not reproduced)
   Figure 4.3.6: Test problem 4 (plot not reproduced)
   Figure 4.3.7: Test problem 5 (plot not reproduced)
The fastest rate of convergence in all cases is produced by the
third degree D4 algorithm (labelled D4 3 in the figures). For problem 2,
nested factorization has a slightly lower total work count due to its
lower set-up cost. Note that for problem 3 the nested factorization is
very slow to converge. This is due to the method's sensitivity to grid
node ordering. Problem 3 is essentially the same as problem 2 with the
grid nodes ordered in the y-direction first. The third degree D4
algorithm, in contrast, shows little sensitivity to direction.
The next set of problems solve (4.3.3) on a 21x21x21 grid with
homogeneous Neumann boundary conditions.

Test Problem 6: KX=KY=KZ=1.

Test Problem 7: KX=KY=1, KZ=100.

Test Problem 8: KX=KY=1.0, KZ=0.01.

The results of test problems 6, 7 and 8 are shown in Figures 4.3.8,
4.3.9 and 4.3.10. The work units and initial solution are as described
above. Again the discretized equation is symmetric so that the problems
test the different incomplete factorization methods. The results
plotted include the DKR algorithm, a third degree naturally ordered ILU,
first and second degree D4 ordered ILUs and the nested factorization.
   Figure 4.3.8: Test problem 6 (plot not reproduced)
   Figure 4.3.9: Test problem 7 (plot not reproduced)
Note that for the isotropic problem (problem 6) all the pointwise
ILU methods perform better than the nested factorization. For the
anisotropic problems (7 and 8) nested factorization is about the same as
first degree D4 on problem 7 and slightly better on problem 8. In both
cases the optimal direction for nested factorization has been chosen.

   Figure 4.3.10: Test problem 8 (plot not reproduced)
The discretized equation (4.3.5) leads to a non-symmetric matrix
problem and is therefore a useful test of algorithm performance.

Test Problem 9: Uses (4.3.5) with β₁ = β₂ = 100 and is discretized
on a 31x31 grid (note that β₁h/2 is greater than 1).

The results for test problem 9 are shown in Figure 4.3.11. ORTHOMIN
acceleration with 8 orthogonalizations is used for all cases.
   Figure 4.3.11: Test problem 9 (residual vs. theoretical work for DKR,
   NAT 3, D4 2 and NESTED; plot not reproduced)
The results for test problem 9 show the best rate of convergence to
be for the second degree D4 algorithm. The nested factorization behaves
very much like the third degree natural factorization. There is very
little variation with direction of grid ordering since there is no
intrinsic preferred direction in the physics of the problem. Note that
all the ORTHOMIN accelerated approximate factorizations converge on
this problem, which is not diagonally dominant.
The last series of problems are simulator-generated problems and
will be described briefly, with some of the special features which make
them "difficult" to solve.

Test Problem 10: This problem was produced by a fully implicit black
oil simulator. The size of the grid is 10x10x3 and there are
permeability contrasts in the layers. The time step at the time the
matrix was generated was 100 days, making this a fairly difficult
problem.

Test Problem 11: This problem was produced by an IMPES black oil
simulator with 12,960 unknowns. There were transmissibility variations
from 0 to 20.
Test Problem 12: This problem was produced by a steam simulator
solving for 3 unknowns per grid node. The dimensions of the problem are
11x11x5. There is a low permeability layer with KX=KY=0, KZ=4x10-'
darcies separating the reservoir vertically into two halves, with
injection wells in one half and a production well in the other half.
The results for problems 10, 11 and 12 are given in Tables 4.3.1,
4.3.2 and 4.3.3. ORTHOMIN acceleration with 10 orthogonalizations is
used for problems 10 and 12, which are strongly nonsymmetric. Problem 11
involves only a pressure equation from an IMPES black oil simulator and
has a better rate of convergence when conjugate gradient acceleration is
used (the pressure equation is nearly symmetric). The tables show
theoretical work units, number of iterations and CPU times for a
normalized residual reduction of 10^-6, where the normalized residual is
defined as

   ‖r‖_∞ / ‖r₀‖_∞

and r is the residual at any iteration, and r₀ the initial residual.
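The stopping test follows directly from this definition; with the zero initial solution used for the pointwise ILU methods, r₀ is simply b:

```python
import numpy as np

def normalized_residual(A, x, b, r0):
    """Convergence measure used in the tables:
    ||b - A x||_inf / ||r0||_inf, with r0 the initial residual."""
    return np.linalg.norm(b - A @ x, np.inf) / np.linalg.norm(r0, np.inf)
```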
Method               Theoretical Work   Number of Iterations   CPU sec. on Honeywell DPS8
Nested               3 432              10                     15.5
DKR                  11 034+            60+                    65.6
3rd degree natural   9 063              21                     37.3
3rd degree D2        5 466              10                     23.3
1st degree D4        2 547              6                      12.7
2nd degree D4        6 641              5                      21.3

Table 4.3.1: Test problem 10 (fully implicit black oil)
The results for test problem 10 show that the best rate of
convergence is given by the 1st degree D4 algorithm. The grid node
ordering used is the one that is optimal for the nested factorization.
Note that although the second degree D4 factorization required fewer
iterations, it did not pay off in terms of computational cost. Note also
that the DKR algorithm failed to converge in 60 iterations. Even though
this algorithm will run faster on a vector machine, its poor convergence
properties would make it of little use.
Method               Theoretical Work   Number of Iterations   CPU sec. on Honeywell DPS8
Nested               641                20                     271
DKR                  1 095              51                     332
3rd degree natural   1 319              35                     325
3rd degree D2        934                24                     249
1st degree D4        469                22                     151
2nd degree D4        105                16                     115

Table 4.3.2: Test problem 11 (IMPES black oil problem)
The results for problem 11 again show the best overall algorithm to
be first degree D4. The optimal grid ordering for nested factorization
is used. This is a considerably larger problem than problem 10 and this
is reflected in the increased number of iterations required for
convergence. The problem also has a fair degree of anisotropy and the
DKR and naturally ordered third degree ILU do not perform well.
Method               Theoretical Work   Number of Iterations   CPU sec. on Honeywell DPS8
Nested               3 432              10                     61
DKR                  11 034+            60+                    184+
3rd degree natural   10 008             24                     133
3rd degree D2        9 063              21                     123
1st degree D4        5 231              22                     16
2nd degree D4        1 812              10                     93

Table 4.3.3: Test problem 12 (fully implicit steam problem)
Test problem 12 has a strong directional bias and nested
factorization (with optimal ordering) performs very well on this
problem. The second degree D4 algorithm is second best. Note that the
DKR algorithm again fails to converge on this difficult steam
problem. If the ordering used is changed, the number of iterations
required for convergence of the nested factorization can vary from 10 to
28. The variation in the second degree D4 is from 10 to 11.
The performance of various algorithms has been tested on the model
problems and on simulator generated problems. Guidelines for the use of
incomplete factorization iterative methods in reservoir simulation can
be outlined as follows:

(1) Use of diagonal ordering is important when dealing with
    problems with inherent anisotropies and/or discontinuities.
    Both D2 ordered and D4 ordered pointwise ILU's perform better.
    The D4 ordered ILU's are generally the best due to the reduced
    system property of this ordering (where only half the unknowns
    need be solved for).

(2) Block incomplete factorization methods (such as nested
    factorization) also have potential. They are always better than
    their naturally ordered pointwise counterparts. They do not
    beat the performance of the D4 ordered pointwise methods,
    except in cases with very strong anisotropies and when optimal
    ordering is chosen.
BIBLIOGRAPHY
Alcouffe,R.E., Brandt,A., Dendy,J.E.Jr., and Painter,J.W., The
    Multi-Grid Method for the Diffusion Equation with Strongly
    Discontinuous Coefficients, SIAM J.Sci.Stat.Comput. 2 (1981)
    430-454.
Au,A.D.K., Behie,A., Rubin,B., and Vinsome,P.K.W., Techniques For Fully
    Implicit Reservoir Simulation, Paper SPE 9302, presented at the
    Fall Meeting of SPE (Dallas, 1980).
Axelsson, O. and Gustafsson, I. , On the Use of Preconditioned Conjugate
Gradient Methods for Red-Black Ordered Five Point Difference
Schemes, J.Comp.Phys. 35 (1980) 284-289.
Axelsson,O., Conjugate Gradient Type Methods For Unsymmetric and
Inconsistent Systems of Linear Equations, Lin. Alg.Appl. 29 (1980)
1-16.
Aziz,K. and Settari,A., Petroleum Reservoir Simulation, Applied Science,
London, 1979.
Behie,G.A., and Forsyth,P.A., Comparison of Fast Iterative Methods For
Symmetric Systems, IMA J. Num. Anal 3 (1983) 41-63.
Behie,A., and Forsyth,P.A., Practical Considerations For Incomplete
    Factorization Methods in Reservoir Simulation, SPE 12263, presented
    at the Seventh SPE Symposium on Reservoir Simulation, San
    Francisco, 1983.
Behie,G.A., and Forsyth,P.A., Multi-Grid Solution of Three-Dimensional
Problems With Discontinuous Coefficients, Appl.Math.Comp. 13 (1983)
229-240.
Behie,G.A., and Forsyth,P.A., Incomplete Factorization Methods for Fully
Implicit Simulation of Enhanced Oil Recovery, SIAM
J.Sci.Stat.Comput. 5 (1984) 543-561.
Behie,A., Comparison of Nested Factorization, Constrained Pressure
    Residual and Incomplete Factorization Preconditionings, SPE 13531,
    presented at the Eighth SPE Symposium on Reservoir Simulation,
    Dallas, 1985.
Behie,A., Collins,D., Forsyth,P., and Sammon,P., Fully Coupled
Multi-Block Wells In Reservoir Simulation, SPEJ (August, 1985).
Bell,J., Trangenstein,J.A., and Shubin,G., Conservation Laws of Mixed
    Type Describing Three-Phase Flow in Porous Media, submitted to SIAM
    J.Appl.Math.
Brandt,A., Multi-Level Adaptive Solutions to Boundary-Value Problems,
    Math.Comp. 31 (1977) 333-390.
Brandt,A., and Dinar,N., Multi-Grid Solutions to Elliptic Flow
    Problems, ICASE Report No. 79-15 (1979).
Brandt,A., Algebraic Multigrid Theory: The Symmetric Case,
Appl.Math.Comp. 19 (1986).
Brandt,A., Multi-Level Approaches to Large Scale Problems, Survey
    Lecture at ICM-86 (Berkeley, August 1986).
Concus,P., Golub,G.H., and O'Leary,D.P., A Generalized Conjugate
Gradient Method for the Numerical Solution of Elliptic Partial
Differential Equations, Lawrence Berkeley Laboratory Pub. LBL-4604,
Berkeley,CA (1975).
Dendy,J.E.Jr., Black Box Multigrid For Nonsymmetric Problems,
Appl.Math.Comp. 13 (1983) 261-283.
Dendy,J.E.Jr., Black Box Multigrid for Systems, Appl.Math.Comp., to
appear.
Dendy,J.E.Jr., Two Multigrid Methods for Three-Dimensional Problems with
Discontinuous and Anisotropic Coefficients, SIAM J.Sci.Stat.Comput.
8 (1987) 673-685.
Dupont,T., Kendall,R.P., and Rachford,H.H., An Approximate Factorization
    Procedure For Solving Self-Adjoint Elliptic Difference Equations,
    SIAM J.Numer.Anal. 5 (1968) 559-573.
Eisenstat,S.C., Elman,H.C., Schultz,M.H., and Sherman,A., The (New) Yale
    Sparse Matrix Package, Yale Univ. Rep. 265, Yale Univ., New Haven,
    CT, 1983.
Elman,H.C., Iterative Methods for Large, Sparse Nonsymmetric Systems of
Linear Equations, Ph. D. Thesis, Yale Univ. Rep. 229, Yale Univ.,
New Haven, CT, 1981.
Fayers,F.J., and Matthews,J.D., Evaluation of Normalized Stone's Methods
for Estimating Three-Phase Relative Permeabilities, SPEJ 24 (1984)
225-232.
Feingold,D.G. and Varga,R.S., Block Diagonally Dominant Matrices and
Generalizations of the Gerschgorin Circle Theorem, Pacific J. Math
12 (1962) 1241-1250.
Forsyth,P.A., and Sammon,P.H., Practical Considerations for Adaptive
    Implicit Methods in Reservoir Simulation, J.Comp.Phys. 62 (No.2)
    (1986).
Forsyth,P.A., and Sammon,P.H., Quadratic Convergence for Cell Centered
Grids, App. Num. Math., to appear.
Fulks,W.B., Guenther,R.B., Roetman,E.L., Acta Mech. 12 (1971) 121.
George, A., Nested Dissection of A Regular Finite Element Mesh, SIAM J.
Num.Anal 10 (No.2) (1973) 345-363.
Gustafsson, I. , On First Order Factorization Methods for the Solution of
Problems With Discontinuous Material Coefficients, Technical
Report, Computer Sciences 77.13 R, Chalmers University of
Technology, Goteborg, Sweden (1977).
Gustafsson, I., A Class of First Order Factorization Methods, BIT 18
(1978) 142-156.
Hemker,P.W., On the Comparison of Line-Gauss-Seidel and ILU Relaxation
    in Multigrid Algorithms, Preprint NW 129/82, Dept. of Numerical
    Mathematics, Mathematical Centre, Amsterdam, 1982.
Kershaw,D.S., The Incomplete Cholesky Conjugate Gradient Method For The
Iterative Solution of Systems of Linear Equations, J.Comp.Phys. 26
(1978) 43-65.
Kettler,R., Analysis and Comparison of Relaxation Schemes in Robust
Multi-Grid and Preconditioned Conjugate Gradient Methods, Lecture
Notes in Mathematics, Springer-Verlag, Berlin 1982.
Manteuffel,T.A., The Tchebychev Iteration for Nonsymmetric Linear
Systems, Numer.Math. 28 (1977) 307-327.
Manteuffel,T.A. and White,A.B., The Numerical Solution of Second-Order
Boundary Value Problems On Nonuniform Meshes, Los Alamos National
Laboratory preprint LA-UR-84-196, submitted to Mathematics of
Computation.
McDonald, A. E., and Trimble, R. H., Efficient Use of Mass Storage During
Elimination for Sparse Sets of Simultaneous Equations, SPEJ
(August,1977) 300-316.
Meijerink,J.A., and van der Vorst,H.A., An Iterative Solution Method For
    Linear Systems In Which The Coefficient Matrix Is A Symmetric
    M-Matrix, Math.Comp. 31 (1977) 148-162.

Meijerink,J.A., and van der Vorst,H.A., Guidelines for the Usage of
    Incomplete Decompositions in Solving Sets of Linear Equations as
    Occur in Practical Problems, J.Comp.Phys. 44 (1981) 134-155.
Odeh,A., A Comparison of Solutions to A Three Dimensional Black Oil
    Reservoir Simulation Problem, J.Pet.Tech. 33 (1981) 13-25.
Peaceman,D.W., Fundamentals of Numerical Reservoir Simulation, Elsevier,
Amsterdam (1977)
Price, H.S., and Coats, K., Direct Methods In Reservoir Simulation,
Trans. SPE of AlME 257 (1974) 295-308.
Saad,Y., and Schultz,M.H., GMRES: A Generalized Minimal Residual
    Algorithm For Solving Nonsymmetric Linear Systems, SIAM
    J.Sci.Stat.Comput. 7 (1986) 856-870.
Saylor,P.E., Second Order Strongly Implicit Symmetric Factorization
Methods For The Solution of Elliptic Difference Equations,
SIAM J.Numer.Anal. 11 (1974) 894-908.
Scheidegger,A.E., The Physics of Flow Through Porous Media, Univ.
Toronto Press, Toronto, 1950.
Shah, P.C., A Nine-Point Finite Difference Operator for Reduction of the
Grid Orientation Effect, SPE 12251, presented at the Seventh SPE
Symposium On Reservoir Simulation, San Francisco, 1983.
Spivak,A., and Dixon, T.N., Simulation of Gas Condensate Reservoirs,
Third SPE Symposium on Numerical Simulation of Reservoir
Performance (Houston, 1973).
Stone,H.L., Probability Model For Estimating Three-Phase Relative
Permeability, JPT (1970) 214-218.
Stone, H.L., Iterative Solution of Implicit Approximations of
Multi-Dimensional Partial Differential Equations, SIAM
J.Num.Analysis 5 (1968) 530-558.
Trangenstein, J.A., and Bell, J.B., The Mathematical Structure of
Black-Oil Reservoir Simulation, submitted to SIAM J.Appl.Math.
Thomas, G. W., and Thurnau, D. H., Reservoir Simulation Using An Adaptive
Implicit Method, Soc. Pet. Eng.J. 23 (1983)
Towler, B.F., and Killough, J.E., Comparison of Preconditioners For The
Conjugate Gradient Method in Reservoir Simulation, SPE 10490,
presented at the Sixth SPE Symposium on Reservoir Simulation, New
Orleans, 1982.
Varah, J.M., On the Solution of Block Tridiagonal Systems Arising From
Certain Finite Difference Equations, Mathematics of Computation
26 (No. 120) (1972) 859-869.
Varga, R. S., Matrix Iterative Analysis, Prentice-Hall Inc., Englewood
Cliffs, NJ, 1962.
Vinsome,P.K.W., ORTHOMIN, An Iterative Method For Solving Sparse Sets of
Simultaneous Linear Equations, paper SPE 5729, Fourth SPE Symposium
On Numerical Simulation of Reservoir Performance, Los Angeles,
1976.
Watts, J.W. III, A Conjugate Gradient-Truncated Direct Method for the
Iterative Solution of the Reservoir Simulation Pressure Equation,
SPEJ 21 (1981) 345-353.
Weiser, A. and Wheeler, M. F., On Convergence of Block-Centered Finite
Differences for Elliptic Problems, Exxon Production Research
Company Report TR-SR-84-14.
Whitaker,S., Ind. Eng. Chem., 62 (10) (1970) 54.
Wilkinson,J.H., Error Analysis of Direct Methods of Matrix Inversion, J.
Assoc. Comput. Mach. 8 (1961) 281-330.
Woo, P.T., Eisenstat, S.C., Schultz, M.H., and Sherman, A.H., Application
of Sparse Matrix Techniques to Reservoir Simulation, Sparse Matrix
Computations, Academic Press, New York (1976).
Yanosik, J.L., and McCracken, T.A., A Nine-Point Finite Difference
Reservoir Simulator For Relative Prediction of Adverse Mobility
Ratio Displacements, SPEJ (August, 1979) 253-262.
Young,D.M. and Jea,K.C., Generalized Conjugate Gradient Acceleration of
Nonsymmetric Iterative Methods, Linear Algebra and Appl. 34 (1980)
159-194.