NUMERICAL LINEAR ALGEBRA FOR RESERVOIR SIMULATION
by
Alda Behie
INDEX
INTRODUCTION
1 Basic Black Oil Model (β-Model)
1.1 Model Assumptions
1.2 Black Oil Equations
1.3 Darcy's Law
1.4 Flow Equations
1.5 PVT Assumptions
1.6 Boundary Conditions
1.7 Relative Permeabilities
2 Difference Methods
2.1 Control Volume Discretization
2.2 Upstream Weighting
2.3 Fully Implicit and IMPES Formulations of Flow Equations
2.4 Dynamic Implicit Formulation of Flow Equations
3 Direct And Iterative Solution Methods
3.1 Structure of the Matrix
3.2 Conditioning of the System
3.3 Direct Solution Methods
3.4 Ordering for Direct Solution Methods
3.5 Iterative Solution Methods
3.6 Classification of Iterative Methods
3.7 Incomplete Factorization Methods
3.8 Treatment of Error Terms
M. B. Allen III et al., Multiphase Flow in Porous Media, © Springer-Verlag New York Inc. 1988
3.9 Ordering for Incomplete Factorization Methods
3.10 Acceleration Techniques
3.11 Convergence Properties
3.12 Block Incomplete Factorization
3.13 Nested Factorization
3.14 Multigrid Methods
4 Associated Topics
4.1 Treatment of Source Terms
4.2 Vectorization of Algorithms
4.3 Comparison of Methods
BIBLIOGRAPHY
INTRODUCTION
The focus in Chapter 3 will be on the solution of large, sparse
sets of linear equations. This will be discussed in the context of the
black oil model, but this is not the only application. The solution
methods discussed here apply equally well to other models, such as
compositional and thermal models, that are commonly used in reservoir
simulation.
Historically, the development of reservoir simulation models has
been along the lines presented in this chapter. That is, first a
description of the physical processes involved, or physical model, is
developed, whether it be black oil, compositional or thermal. Then a
mathematical model is developed. Usually this consists of writing down
the governing partial differential equations, based on mass or component
balance considerations, together with algebraic equations governing the
transfer of mass between phases and/or chemical reactions between components. The
third step is the development of a numerical model to solve the
governing equations. The most common approach is to discretize the mass
(or component) conservation equations to give a set of nonlinear
algebraic equations. Various linearization techniques have been used for
these equations. The most robust approach has been to simply use Newton
iteration. The use of Newton iteration in turn requires the solution of
sets of linear algebraic equations. The solution of these equations
becomes the most computationally intensive portion of the problem. In
some sense then, the "difficulty" of the problem has been transferred to
the solution of the linear equations. Indeed for some time, their
solution was one of the most problematical areas in the development of
computer models. For a number of years most simulation models had
several solution options available to the user, and if one algorithm
failed the user could try another. If all else failed, the user could
resort to the time consuming but reliable Gaussian elimination
algorithm. It is now known how to derive iterative solution algorithms
for these systems and state-of-the-art simulators all use solution
methods based on accelerated incomplete factorization methods.
SECTION 1: BASIC BLACK OIL MODEL (β-MODEL)
1.1 Model Assumptions
The basic black oil model assumes multiphase, isothermal flow of
three phases: two hydrocarbon phases (oil and gas) and water. The
hydrocarbon system is approximated by two components:
(1) a non-volatile (black) oil, and
(2) a volatile gas which is soluble in the oil phase.
There is also a water component.
Figure 1.1.1: components and phases. The oil component resides in the oil phase; the gas component resides in the gas phase and, as dissolved gas, in the oil phase; the water component resides in the water phase.
Figure 1.1.1 illustrates the relationships between components and
phases in the model. Water and oil are immiscible: they do not exchange
mass or change phase. The gas component is soluble in the oil phase but
not the water phase. Water is usually assumed to be the wetting phase,
with oil having intermediate wettability and gas being non-wetting.
This is the most basic version of the black oil model. Other
thermodynamic effects can be included. These are described in some
detail in Chapter 2 of this volume.
1.2 Black Oil Equations
The model equations are derived by combining the mass conservation
equations for the three components and Darcy's Law. The mass conservation
equations are given below (see Aziz and Settari, 1979, or Peaceman,
1977, for derivation of these equations):

oil: $-\nabla\cdot(\rho_o u_o) = \frac{\partial}{\partial t}(\phi S_o \rho_o) + q_o$   (1.2.1)

gas: $-\nabla\cdot(\rho_g u_g + \rho_{dg} u_o) = \frac{\partial}{\partial t}(\phi S_g \rho_g + \phi S_o \rho_{dg}) + q_g + q_{dg}$   (1.2.2)

water: $-\nabla\cdot(\rho_w u_w) = \frac{\partial}{\partial t}(\phi S_w \rho_w) + q_w$   (1.2.3)

where $q_o$ is the production rate of oil, $q_g$ the production rate of free
gas, $q_{dg}$ the production rate of dissolved gas (from the oil phase) and
$q_w$ the production rate of water, all at reservoir conditions.
1.3 Darcy's Law
In addition to the equations of mass conservation, a relationship
between the flow rate and pressure gradient in each phase is required.
In hydrodynamic flow this is given by the momentum equation. For
laminar, single-phase flow through a porous medium it is given by an
empirical or phenomenological relationship which was discovered by Darcy
in 1856.
$u = -\frac{k}{\mu}\left(\nabla p - \gamma \nabla D\right)$   (1.3.1)

where $\gamma = \rho g$, $D$ is depth, $\mu$ is the viscosity of the fluid and $k$ is
permeability. The constant $k$ depends only on the nature of the porous
medium and not on the fluid. It is determined experimentally. Darcy's
Law has the same form as the Poiseuille law for laminar flow in a
cylindrical tube. It can be derived from the Navier-Stokes equation (see,
for example, Scheidegger, 1960; Whitaker, 1970; Fulks et al., 1971). For
multiphase flow, Darcy's Law is extended as follows:

$u_\ell = -\frac{k\,k_{r\ell}}{\mu_\ell}\left(\nabla p_\ell - \gamma_\ell \nabla D\right)$   (1.3.2)

where the subscript $\ell$ refers to the oil, gas or water phase, and $k_{r\ell}$ is
the relative permeability of phase $\ell$. The relative permeability is an
empirical function of one or more saturations. It is also determined
experimentally and will be discussed in more detail below.
1.4 Flow Equations
The mass conservation equations and Darcy's Law can be
combined to get the flow equations:

oil: $\nabla\cdot\left(\frac{\rho_o k\,k_{ro}}{\mu_o}\left(\nabla p_o - \gamma_o \nabla D\right)\right) = \frac{\partial}{\partial t}(\phi S_o \rho_o) + q_o$   (1.4.1)

with similar equations for gas and water. There are three other
algebraic constraint equations:

$S_o + S_g + S_w = 1$   (1.4.2)

$p_{cow} = p_o - p_w = f(S_w)$   (1.4.3)

$p_{cog} = p_g - p_o = f(S_g)$   (1.4.4)

The capillary pressure terms $p_{cow}$ and $p_{cog}$ are empirical. This set gives
six equations in six unknowns.
1.5 PVT Assumptions
The PVT behaviour in black oil models is expressed by
formation volume factors

$B_o = \frac{(V_o + V_{dg})_{HC}}{(V_o)_{STC}}, \qquad B_g = \frac{(V_g)_{HC}}{(V_g)_{STC}}$   (1.5.1)

where $V_{HC}$ is the volume of a fixed mass at reservoir conditions and $V_{STC}$
is the volume of a fixed mass at stock tank conditions. The mass transfer
between the oil and gas phases is described by the solution gas-oil
ratio

$R_s = \left[\frac{V_{dg}}{V_o}\right]_{STC}$   (1.5.2)

The solution gas-oil ratio is the ratio of the gas component in the oil
phase to the amount of oil component in the oil phase as a function of
oil phase pressure. Finally the three phase densities are written in
terms of the component densities ($\rho_\ell$) which were used in the mass
conservation equations:

$\rho_o = \frac{1}{B_o}\left(\rho_{o,STC} + R_s\,\rho_{g,STC}\right)$   (1.5.3)

$\rho_g = \frac{1}{B_g}\,\rho_{g,STC}$   (1.5.4)

$\rho_w = \frac{1}{B_w}\,\rho_{w,STC}$   (1.5.5)

These phase densities are substituted into the flow equations, which are
then divided by $\rho_{STC}$ to get the standard model equations:

oil: $\nabla\cdot\left(T_o\left\{\nabla p_o - \gamma_o \nabla D\right\}\right) = \frac{\partial}{\partial t}\left(\frac{\phi S_o}{B_o}\right) + q_o$   (1.5.6)

where

$T_o = \frac{k\,k_{ro}}{\mu_o B_o}$

is called the transmissibility. There are similar equations for the gas
and water components.
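As a small illustration, the transmissibility defined above can be computed directly; the property values below are hypothetical, chosen only to show the arithmetic:

```python
# Sketch of the transmissibility T_o = k*k_ro/(mu_o*B_o) from equation (1.5.6).
# The input values are illustrative placeholders, not data from the text.

def transmissibility(k, kr, mu, B):
    """Phase transmissibility T = k * kr / (mu * B)."""
    return k * kr / (mu * B)

# Hypothetical oil-phase values: k = 100 mD, k_ro = 0.8, mu_o = 2 cp, B_o = 1.2
T_o = transmissibility(100.0, 0.8, 2.0, 1.2)
print(T_o)
```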
1.6 Boundary Conditions
The mathematical model is not complete without specification of the
necessary boundary and initial conditions. Since the exact extent of the
reservoir is almost never precisely known, the standard model assumes no
flow conditions at the boundaries. Any other conditions, e.g. constant
pressure boundaries or constant water influx at the boundary, can be
handled by adding appropriate wells. This has the effect of shifting any
complications to a proper description of injection and production wells.
The initial conditions usually take the form of specified pressures and
saturations. These can be calculated a priori given knowledge of the
water-oil and gas-oil contacts and the assumption of gravity and
capillary pressure equilibrium.
1.7 Relative Permeability
Most experimental work on relative permeability has been done for
two-phase systems. Figure 1.7.1 below shows the structure of typical
two-phase relative permeability curves (for a water-wet system). These
curves are usually not straight lines. For a full description of
relative permeability see section 2.2 of Chapter 1 of this volume.
Figure 1.7.1: typical two-phase relative permeability curves, $k_{rw}$ and $k_{ro}$ versus water saturation $S_w$, for a water-wet system.

Figure 1.7.1 represents a water-oil system with water displacing the
oil. $S_{wc}$ is the critical water saturation (the saturation at which water
would no longer flow) and $S_{or}$ is the residual oil saturation (the oil
that cannot be removed from the reservoir).
In actual fact, porosity, capillary pressure and relative
permeability are related. In a reservoir with strongly varying
properties (different lithologies), different relative permeability
curves and residual saturations should be used in different parts of the
reservoir. In terms of the simulation model, this situation is referred
to as using different rock types.
For three phase systems, it is hypothesized that the following
holds:

$k_{rw} = f(S_w), \qquad k_{rg} = f(S_g), \qquad \text{and} \qquad k_{ro} = f(S_w, S_g).$   (1.7.1)
The functional dependence of $k_{ro}$ on $S_w$ and $S_g$ is not usually known in
practice, so that three-phase relative permeabilities are normally
derived from two sets of two-phase data. These are a water-oil system
with water displacing oil and a liquid-gas system with the oil (in the
presence of critical water) displacing the gas. Stone's Model II is the
one most commonly used. It defines the relative permeability of oil as
follows:

$k_{ro} = k_{rocw}\left[\left(\frac{k_{row}}{k_{rocw}} + k_{rw}\right)\times\left(\frac{k_{rog}}{k_{rocw}} + k_{rg}\right) - \left(k_{rw} + k_{rg}\right)\right]$   (1.7.2)

where $k_{ro} \ge 0$. The values $k_{rw}$, $k_{rg}$, $k_{row}$, and $k_{rog}$ are determined
from the two-phase data. The other parameter is

$k_{rocw} = k_{row}(S_w = S_{wc})$   (1.7.3)

The expression requires that

$k_{rocw} = k_{rog}(S_L = 1)$   (1.7.4)

in order that it reduce to the proper two-phase data in the absence of
gas or water. For further discussion of three-phase relative
permeability data see the 1970 and 1973 papers by Stone.
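The Stone II formula translates directly into code; the following is a minimal sketch in which the two-phase values $k_{rw}$, $k_{rg}$, $k_{row}$, $k_{rog}$ and the endpoint $k_{rocw}$ are supplied as hypothetical numbers rather than taken from measured curves:

```python
# Sketch of Stone's Model II, equation (1.7.2), with the truncation k_ro >= 0.
# All input values are placeholders; in practice k_row(S_w) and k_rog(S_g)
# come from the two sets of two-phase data described in the text.

def stone2_kro(krw, krg, krow, krog, krocw):
    """Three-phase oil relative permeability, truncated at zero."""
    kro = krocw * ((krow / krocw + krw) * (krog / krocw + krg) - (krw + krg))
    return max(kro, 0.0)

print(stone2_kro(krw=0.1, krg=0.05, krow=0.4, krog=0.5, krocw=0.8))
```

Near the residual saturations the bracketed expression can go negative, which is why the truncation at zero is part of the model.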
SECTION 2: DIFFERENCE METHODS
2.1 Control Volume Discretization
The control volume discretization is one of the simplest ways to
approach the discretization of the flow equations. It has the advantage
of guaranteeing conservation of mass in the discretized equations and is
equivalent to other methods (e.g. Taylor series). Consider a one-dimensional,
single-component system. It can be described as follows:
Figure 2.1.1: a one-dimensional grid; block i has width $\Delta x_i$, the interface spacing is $\Delta x_{i+1/2}$, and flow is in the x-direction.
The change in accumulation in block $i$ over the time interval $\Delta t$
equals (flow in $-$ flow out) over $\Delta t$. Accumulation consists of

(1) accumulation due to compressibility
(2) accumulation due to sources and sinks

The flow in minus the flow out in time interval $\Delta t$ is

$\rho_{i-1/2}\,u_{i-1/2}\,A\,\Delta t - \rho_{i+1/2}\,u_{i+1/2}\,A\,\Delta t$   (2.1.1)

where $A$ is the area of the grid block face, $\rho$ is the density, and $u$ is the
superficial velocity of the fluid. The accumulation due to
compressibility is given by

$V_i\left[(\phi\rho)_i^{n+1} - (\phi\rho)_i^{n}\right]$,   (2.1.2)

where $\phi$ is the porosity, and $V_i$ is the volume of the $i$th block and is
given by

$V_i = \Delta x_i\,\Delta y_i\,\Delta z_i = \Delta x_i\,A$.   (2.1.3)

The second contribution to the accumulation is simply

$q\,V_i\,\Delta t$,   (2.1.4)

where $q$ is the source strength in mass per unit volume per unit time.
Equation (2.1.1) is divided by

$\rho_{STC} = \rho B$,

multiplied by $\Delta x_i/(\Delta x_i\,\Delta t)$, and the discretized version of Darcy's
Law (with no gravity)

$u_{i-1/2} = -\left(\frac{k}{\mu}\right)_{i-1/2}\frac{p_i - p_{i-1}}{\Delta x_{i-1/2}}$   (2.1.5)

is used to give the discretized form of the flow equation for a single
component flowing in one dimension:

$\frac{1}{\Delta x_i}\left[\left(\frac{k}{\mu B}\right)_{i+1/2}^{m}\frac{p_{i+1}^{m} - p_i^{m}}{\Delta x_{i+1/2}} - \left(\frac{k}{\mu B}\right)_{i-1/2}^{m}\frac{p_i^{m} - p_{i-1}^{m}}{\Delta x_{i-1/2}}\right] = \frac{1}{\Delta t}\left[\left(\frac{\phi}{B}\right)_i^{n+1} - \left(\frac{\phi}{B}\right)_i^{n}\right] + \frac{q_i^{m}}{\rho_{STC}}$   (2.1.6)
The superscripts $m, n$ denote the time level at which the designated terms
are evaluated ($n$ = old time level, $n+1$ = new time level, and $m$ = an
intermediate time level), and the subscripts $i, i\pm 1/2$ denote the spatial
point at which they are evaluated. The above equation is generalized to
the multiphase case by using the multiphase version of Darcy's Law,
equation (1.3.2). Note that the control volume approach automatically
generates a discretization of the accumulation terms of the flow
equation, i.e. the terms

$\frac{\partial}{\partial t}(\phi S_o \rho_o), \quad \frac{\partial}{\partial t}(\phi S_g \rho_g), \quad \ldots \ \text{etc.}$

which is mass conservative. Other discretizations of these terms (with
the same truncation error) are possible, but they can lead to non-mass-conserving
schemes (see Aziz and Settari, 1979) which can cause material
balance errors and/or instabilities.
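The conservative structure of equation (2.1.6) can be seen in a short sketch: each interior interface flux appears with opposite signs in the two adjacent blocks, so the fluxes cancel when the residuals are summed with their block volumes. The function below assumes no-flow boundaries (zero interface mobility at the two ends) and hypothetical input arrays:

```python
# Sketch of the control-volume residual for single-phase 1-D flow, following
# the form of equation (2.1.6). Interface values of k/(mu*B) are taken as
# given; the averaging rules discussed in the text would supply them.

def residual(p, acc_old, acc_new, lam, dx, dxh, dt, q):
    """Residual (flux - accumulation change - source) for each block.

    p        : pressures per block
    acc_old, acc_new : (phi/B) per block at the old and new time levels
    lam      : k/(mu*B) at the n+1 interfaces (0.0 at the ends => no flow)
    dx, dxh  : block widths and the n+1 interface spacings
    """
    n = len(p)
    r = []
    for i in range(n):
        flux_w = lam[i] * (p[i - 1] - p[i]) / dxh[i] if i > 0 else 0.0
        flux_e = lam[i + 1] * (p[i + 1] - p[i]) / dxh[i + 1] if i < n - 1 else 0.0
        r.append((flux_w + flux_e) / dx[i] - (acc_new[i] - acc_old[i]) / dt - q[i])
    return r
```

With zero sources and no accumulation change, the volume-weighted sum of the residuals is exactly zero, which is the discrete statement of mass conservation.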
There are several issues to be resolved at this point concerning
the discretized equation (2.1.6). The first is the evaluation of the
terms with the subscript $i\pm 1/2$. The geometrical term $\Delta x_{i+1/2}$ is given
by the arithmetic average of the adjacent block lengths

$\Delta x_{i+1/2} = \frac{\Delta x_i + \Delta x_{i+1}}{2}$

If there were gravity terms in (2.1.5) they would involve the interface
density $\gamma_{i+1/2}$. Since the density is a smoothly varying function of pressure, it is
approximated by an arithmetic average. The permeability $k_{i\pm 1/2}$ is
usually evaluated using a harmonic average of the adjacent
permeabilities

$k_{i+1/2} = \frac{k_i\,k_{i+1}\,(\Delta x_i + \Delta x_{i+1})}{k_i\,\Delta x_{i+1} + k_{i+1}\,\Delta x_i}$

The justification for this is that it gives the exact answer for
incompressible, single-phase, steady-state flow when there is a
discontinuity in the permeability between block $i$ and block $i+1$.
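The two interface rules just described can be sketched as follows, with the harmonic average written in its distance-weighted form:

```python
# Sketch of the interface rules described above: arithmetic average for the
# interface spacing, distance-weighted harmonic average for permeability.

def dx_interface(dx_i, dx_ip1):
    """Arithmetic average of adjacent block lengths."""
    return 0.5 * (dx_i + dx_ip1)

def k_interface(k_i, k_ip1, dx_i, dx_ip1):
    """Distance-weighted harmonic average of adjacent permeabilities; exact
    for incompressible, single-phase, steady-state flow across a
    permeability discontinuity."""
    return k_i * k_ip1 * (dx_i + dx_ip1) / (k_i * dx_ip1 + k_ip1 * dx_i)

# The interface value is dominated by the low-permeability block:
print(k_interface(100.0, 1.0, 1.0, 1.0))
```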
2.2 Upstream Weighting
A second issue is the way in which the time-dependent terms
involving fluid properties are evaluated. These are the terms of the
form

$\left(\frac{1}{\mu B}\right)_{i+1/2}$ or $\left(\frac{k_{r\ell}}{\mu_\ell B_\ell}\right)_{i+1/2}$ (in the multiphase case).

They could be evaluated as an average of the values in the adjacent
blocks (midpoint weighting) or as the value in the adjacent block which
is upstream for the flow of that particular phase (upstream weighting).
The former has the advantage of being $O(\Delta x^2)$ while the latter is only
$O(\Delta x)$. However the midpoint weighting scheme is clearly incorrect for
certain physical situations. The one-dimensional example above, with
water flowing from the left and displacing oil could have a sharp water
saturation front with only residual oil behind the front. A midpoint
weighting for blocks at the front could indicate that flow of water was
possible from block $i$ to block $i+1$ when block $i+1$ was still ahead of
the front. This error manifests itself as oscillations in the saturation
solutions around the front. Upstream weighting always gives the correct
relative permeability to the flow. Upstream weighting has the
disadvantage of causing numerical dispersion in the solution, i.e. sharp
fronts will be smeared out. The smearing can be reduced by choosing
smaller grid block sizes, which is however computationally expensive.
Another solution is to refine the grid only in the area of the front.
Two-point upstream weighting (Todd et al., 1972) is also used to reduce
the spatial truncation error. The upstream direction for each phase is
usually chosen by determining the sign of the right-hand side of
(1.3.2), the multiphase Darcy's Law. That is, the sign of the function

$F_\ell = p_{\ell,i+1} - p_{\ell,i} - \rho_{\ell,i+1/2}\,g\,(Z_{i+1} - Z_i)$   (2.1.7)

with $\rho_{\ell,i+1/2}$ approximated as described above, is evaluated. If
$F_\ell > 0$, $i+1$ is the upstream block for phase $\ell$; if
$F_\ell < 0$, $i$ is the upstream block.
The use of equation (2.1.7) to determine the upstream direction is an
approximation in that the expressions $F_\ell$ for each phase are not really
decoupled but are nonlinear functions of the phase pressures and
saturations. The expression for the upstream direction is the correct
one for the solutions determined at the previous iteration. The upstream
direction should therefore be checked at the end of each nonlinear
iteration to ensure that it is consistent with the one chosen at the
beginning of the iteration.
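The test (2.1.7) amounts to a sign check per phase and interface; a minimal sketch, with illustrative arguments:

```python
# Sketch of the upstream test (2.1.7) for one phase at the interface between
# blocks i and i+1. rho_face stands for the averaged interface density.

def upstream_block(p_i, p_ip1, rho_face, g, z_i, z_ip1):
    """Return 1 if block i+1 is upstream (F > 0), else 0 (block i upstream)."""
    F = p_ip1 - p_i - rho_face * g * (z_ip1 - z_i)
    return 1 if F > 0.0 else 0

# Higher potential in block i drives flow toward i+1, so block i is upstream:
print(upstream_block(p_i=210.0, p_ip1=200.0, rho_face=1000.0, g=9.81,
                     z_i=0.0, z_ip1=0.0))
```

As the text notes, this check should be repeated at the end of each nonlinear iteration, since the phase potentials change as the iteration proceeds.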
2.3 Fully Implicit and IMPES Formulations of Flow Equations
The final issue to be resolved is the time level at which the terms
on the left-hand side of equation (2.1.6) are to be evaluated. If this
is the new time level, i.e. $m = n+1$, the resulting formulation is termed
the fully implicit formulation. The fully implicit formulation has
become widely used in reservoir simulation. For modelling complex
physical processes it produces a stable, robust numerical model.
The fully implicit method produces a set of $n_c \times N$ nonlinear
algebraic equations (where $n_c$ is the number of coupled equations per
grid node and $N$ is the number of grid nodes). This system is solved
using Newton iteration (see Au et al., 1980). The solution of the
associated Jacobian system of $n_c \times N$ linear equations becomes the most
computationally intensive portion of the simulator.
The first reservoir simulation models developed were not fully
implicit. Lack of computers powerful enough to solve the large sets of
linear equations resulted in less implicit approximations being made to
the terms on the left-hand side of equation (2.1.6). A widely-used
method is the one known as the IMPES (an acronym for implicit pressure,
explicit saturation) formulation. The formulation is developed by
decoupling the discretized flow equations using certain approximations.
Consider the discretized flow equations for an oil-water system with
some terms on the left-hand side written at the old time level:
oil:

$\frac{V_i}{\Delta x_i}\left\{\left(\frac{k\,k_{ro}}{\mu_o B_o}\right)^{n}_{i+1/2}\frac{(p_{o,i+1} - p_{o,i})^{n+1}}{\Delta x_{i+1/2}} - \left(\frac{k\,k_{ro}}{\mu_o B_o}\right)^{n}_{i-1/2}\frac{(p_{o,i} - p_{o,i-1})^{n+1}}{\Delta x_{i-1/2}}\right\} + q^{n}_{o,i} = \frac{V_i}{\Delta t}\left\{\left(\frac{\phi S_o}{B_o}\right)^{n+1}_{i} - \left(\frac{\phi S_o}{B_o}\right)^{n}_{i}\right\}$   (2.3.1)

water:

$\frac{V_i}{\Delta x_i}\left\{\left(\frac{k\,k_{rw}}{\mu_w B_w}\right)^{n}_{i+1/2}\left[\frac{(p_{o,i+1} - p_{o,i})^{n+1}}{\Delta x_{i+1/2}} - \frac{(p_{cow,i+1} - p_{cow,i})^{n}}{\Delta x_{i+1/2}}\right] - \left(\frac{k\,k_{rw}}{\mu_w B_w}\right)^{n}_{i-1/2}\left[\frac{(p_{o,i} - p_{o,i-1})^{n+1}}{\Delta x_{i-1/2}} - \frac{(p_{cow,i} - p_{cow,i-1})^{n}}{\Delta x_{i-1/2}}\right]\right\} + q^{n}_{w,i} = \frac{V_i}{\Delta t}\left\{\left(\frac{\phi S_w}{B_w}\right)^{n+1}_{i} - \left(\frac{\phi S_w}{B_w}\right)^{n}_{i}\right\}$   (2.3.2)
Note that all terms on the left that are functions of saturations (i.e.
relative permeabilities and capillary pressure terms) have been written
at the old time level. Now the oil equation is multiplied by $B_o^{n+1}$ and
the water equation is multiplied by $B_w^{n+1}$ and the two equations are
added. Since the saturations must sum to one, the saturation terms on the
right, which are at the new time level, drop out and the resulting
equation is a parabolic pressure equation. The pressure equation is
solved to give the pressures at the new time level and these are
substituted in one of the equations (2.3.1) or (2.3.2) to give the
resulting saturation explicitly. Note that the pressure equation still
contains some (pressure-dependent) terms at the new time level, so that
one or two cycles of simple iteration should be used to converge these.
For more details of the implementation see Aziz and Settari (1979).
The IMPES method suffers from a fairly severe timestep limitation
due to the explicit treatment of the terms

$\left(\frac{k_{r\ell}}{\mu_\ell B_\ell}\right)$

This timestep limitation is given by (see Aziz and Settari, 1979, or
Peaceman, 1977)

$\Delta t \le \frac{\Delta x}{u_{xa}}$ (for 1-D) or $\Delta t \le \frac{\Delta x}{u_{xa}} + \frac{\Delta y}{u_{ya}}$ (for 2-D),

where $u_{xa}$ and $u_{ya}$ are the velocities of advance of constant
saturation fronts. This condition implies the timestep is limited by the
fact that the throughput of every block in the system must be smaller
than the pore volume of that block. For simulations that attempt to
model near-wellbore effects (i.e. coning) relatively small blocks must be
used near the well. In addition flow rates are very high near the well.
Moreover, the presence of free gas (low viscosity) can also lead to high
flow rates. All of these factors lead to unacceptably small timesteps in
an IMPES model for many simulation problems.
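The throughput form of the limitation is easy to sketch: the pore volume of a block divided by the volumetric flow rate through it bounds the stable timestep. The numbers below are hypothetical:

```python
# Sketch of the IMPES throughput limit described above: the flow through a
# block in one timestep must not exceed its pore volume. Values illustrative.

def impes_dt_limit(pore_volume, volumetric_rate):
    """Largest stable timestep for one block under the throughput condition."""
    return pore_volume / volumetric_rate

print(impes_dt_limit(pore_volume=10.0, volumetric_rate=50.0))   # small near-well block
print(impes_dt_limit(pore_volume=1e4, volumetric_rate=50.0))    # large far-field block
```

The global timestep is the minimum of this limit over all blocks, which is why one small, high-rate well block can throttle the whole simulation.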
2.4 Adaptive Implicit Formulation
More recently, formulations which combine the best aspects of both
of these methods (i.e. the low computational cost of the IMPES method and
the large timestep capability of the fully implicit method) have been
developed. Thomas and Thurnau (1983) and Forsyth and Sammon (1984) have
described black oil models based on such formulations. The former uses
Gaussian elimination to solve the linear system and the latter uses an
iterative method (more details of which will be discussed later).
The method begins with the same discretized equations as the fully
implicit method. It is assumed that there are only two types of blocks,
and these are designated as IMPES blocks or fully implicit blocks. In
the IMPES blocks only the pressure is solved implicitly; in the fully
implicit blocks a pressure and 2 saturations (or 2 pressures and a
saturation) are solved implicitly.
The criterion for selection of implicit cells described by Thomas
and Thurnau and confirmed by Forsyth and Sammon is based on a specified
saturation or pressure change threshold from a previous iteration. Such
a criterion can only be used to switch the designation of a particular
block from IMPES to fully implicit. The reverse switch is not possible.
This is because a fully implicit cell can have a large throughput and
yet the saturation changes can be small (typically seen at the end of a
waterflood, for example). Such a block could easily violate the IMPES
stability criterion. Normally the progression of timestep size in a
black oil simulation goes from small, after a well opening or change,
where transients must be resolved, to larger and larger timesteps,
until another well change is encountered. The above strategy fits in
well with this sequence. At a well change only a few blocks are set
implicit (the well blocks and its neighbours). Once a block is switched
to fully implicit it is not reset until the next well change. Well
blocks remain implicit always.
It is also necessary to detect slowly growing instabilities, which
would not be detected by the above criterion. To do this, saturation
change thresholds must be restricted to significantly smaller levels
than the changes which control timestep selection.
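The one-way switching rule described above can be sketched as a simple pass over the per-block solution changes; the threshold values are illustrative (cf. Table 2.4.1), not prescribed by the method:

```python
# Sketch of the adaptive implicit switching criterion: a block becomes fully
# implicit when its pressure or saturation change exceeds a threshold, and
# the flag is never cleared (the reverse switch is not possible).

def update_implicit_flags(implicit, dp, ds, dp_max=250.0, ds_max=0.05):
    """Return updated per-block flags (True = fully implicit)."""
    return [
        flag or abs(dpi) > dp_max or abs(dsi) > ds_max
        for flag, dpi, dsi in zip(implicit, dp, ds)
    ]

print(update_implicit_flags([False, True, False],
                            dp=[300.0, 0.0, 10.0], ds=[0.0, 0.0, 0.01]))
```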
Table 2.4.1 below illustrates some of the savings attainable by
such a method (taken from Forsyth and Sammon) for the first SPE
comparative solution project (see Odeh, 1981). The problem was one of gas
injection on a 10 x 10 x 3 grid. Material balance errors were found to
be small and cumulative production totals differed by less than 4% for
all cases. A 40% reduction in CPU time was seen. By the end of the
simulation 2/3 of the blocks had switched to implicit, but the
time-weighted average of the number of implicit blocks was less than
40%, even though the top layers of the reservoir contained mostly mobile
gas and were made up of mostly implicit cells at the end of the
simulation.
Table 2.4.1: Comparison of Adaptive Implicit and Fully Implicit
Solution to First SPE Comparative Solution Project
(timestep selection norms: 1000.0 psi for pressure, 0.20 for saturation)

Case   Pressure threshold (psi)    Saturation threshold    CPU time (sec, Honeywell DPS-8)
1      125.0                       0.025                   1285
2      250.0                       0.050                   1239
3      600.0                       0.150                   1334
4      Fully implicit throughout                           2178
SECTION 3: DIRECT AND ITERATIVE SOLUTION METHODS
3.1 Structure of the Matrix
The linear systems generated by Newton iteration of the fully
implicit nonlinear algebraic set of equations, discussed in the previous
section, are large, sparse and banded in structure. A five-point
discretization (or seven-point for three-dimensional systems) leads to a
five-banded (or seven-banded) matrix. A nine-point discretization
molecule is also used in reservoir simulation (see Yanosik and
McCracken, 1979; and Shah, 1983). This leads to a nine-banded (or
eleven-banded) system. Figure 3.1.1 shows the incidence matrix for a
3 x 3 grid with a five-point discretization molecule.
Figure 3.1.1: grid, incidence matrix, and computational molecule. The 3 x 3 grid is numbered

7 8 9
4 5 6
1 2 3

(with $n_x$ the horizontal and $n_y$ the vertical direction). The corresponding incidence matrix (x marks a nonzero entry) is

x x . x . . . . .
x x x . x . . . .
. x x . . x . . .
x . . x x . x . .
. x . x x x . x .
. . x . x x . . x
. . . x . . x x .
. . . . x . x x x
. . . . . x . x x

and the five-point computational molecule connects node $(i,j)$ to $(i-1,j)$, $(i+1,j)$, $(i,j-1)$ and $(i,j+1)$.
For a fully implicit formulation, each x represents a dense block matrix
of size $n_c \times n_c$, where $n_c$ is typically 3 for a three-component black oil
model. These systems will be called block-banded. For an IMPES
formulation, or any other formulation that solves only a pressure
equation, each x represents a single numerical entry. For an adaptive
implicit formulation the diagonal blocks are of size $n_c \times n_c$, but the
off-diagonal blocks can be $1 \times n_c$, $n_c \times 1$, or $1 \times 1$.
3.2 Conditioning of the System
The system of linear equations can be written as

$A x = b$   (3.2.1)
where A has the structure described above and is non-symmetric. The
concept of diagonal dominance is important when considering the
conditioning of this system. In the case where each entry x above
represents a block submatrix the concept of block diagonal dominance is
appropriate.
Definition (Feingold and Varga, 1962): Suppose that an $N \times N$ matrix has
been partitioned so that

$A = \begin{bmatrix} A_{11} & A_{12} & \cdots & A_{1k} \\ \vdots & & & \vdots \\ A_{k1} & \cdots & \cdots & A_{kk} \end{bmatrix}$   (3.2.2)

where $A_{ii}$ is of order $n_i$ and $\sum_{i=1}^{k} n_i = N$. The submatrices $A_{ii}$ can be single
elements, or dense submatrices (as described above), or larger matrices
representing all the unknowns along a particular grid line or even
plane. Then $A$ is strictly (block) diagonally dominant if $A_{ii}$ is
non-singular and

$\sum_{j \ne i} \|A_{ii}^{-1}\|\,\|A_{ij}\| < 1$   (3.2.3)

for $1 \le i \le k$ and $\|\cdot\|$ any suitable matrix norm.
It can be shown that the system in (3.2.1) is non-singular if:

(1) $A$ is strictly (block)-diagonally dominant, or
(2) $A$ is irreducibly (block)-diagonally dominant (i.e. the
inequality in (3.2.3) holds for at least one $i$, the rest are
required only to be equal, and $A$ is (block) irreducible).
In addition, (block) diagonal dominance implies that pivoting is not
necessary during the direct elimination process (see Wilkinson,1961, or
Varah, 1972).
To determine whether the systems generated from reservoir
simulation models are in fact (block) diagonally dominant, discretized
equations such as (2.3.1) and (2.3.2) must be examined. It is
intuitively clear that blocks of the order of the number of coupled
equations per grid node should be considered. Varah (1972) and Feingold
and Varga (1962) give examples of block decompositions which succeed
when the corresponding point decomposition fails.
The entries in the linear Jacobian system are derivatives of the
discretized equations with respect to the unknowns (saturations and
pressures). It can be seen that the discretized accumulation terms play a
major role in determining the "amount" of diagonal dominance, since the flux
terms contribute similar entries to the diagonal and off-diagonal blocks. If
the system is "reasonably" compressible, it will be "reasonably" diagonally
dominant. Note also that the magnitude of the diagonal contribution of the
accumulation terms is proportional to the volume of the block and inversely
proportional to the timestep size. Thus small volumes and large timestep sizes
adversely affect the diagonal dominance of the system. Also, the presence of
constant bottom-hole pressure wells affects the diagonal dominance positively,
and the presence of constant rate wells affects it negatively. This is
because the former adds a contribution to the diagonal without an equal
contribution to an off-diagonal element. These considerations about "amount"
of diagonal dominance also have relevance when discussing iterative solution
methods. In particular, iterative methods such as incomplete factorization
methods converge more quickly for systems with a reasonable amount of diagonal
dominance.
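The strict block criterion (3.2.3) can be checked numerically; the sketch below partitions the matrix into dense $n_c \times n_c$ blocks and uses the matrix 2-norm, one of several suitable norm choices:

```python
# Sketch of the strict block diagonal dominance test (3.2.3):
# sum over j != i of ||inv(A_ii)|| * ||A_ij|| < 1 for every block row i.
import numpy as np

def block_diagonally_dominant(A, nc):
    """Check strict block diagonal dominance of a (k*nc) x (k*nc) matrix."""
    k = A.shape[0] // nc
    for i in range(k):
        Aii = A[i * nc:(i + 1) * nc, i * nc:(i + 1) * nc]
        inv_norm = np.linalg.norm(np.linalg.inv(Aii), 2)
        off = sum(
            np.linalg.norm(A[i * nc:(i + 1) * nc, j * nc:(j + 1) * nc], 2)
            for j in range(k) if j != i
        )
        if inv_norm * off >= 1.0:
            return False
    return True

# A strongly diagonal example with 2x2 blocks:
A = np.array([[4.0, 0.0, 1.0, 0.0],
              [0.0, 4.0, 0.0, 1.0],
              [1.0, 0.0, 4.0, 0.0],
              [0.0, 1.0, 0.0, 4.0]])
print(block_diagonally_dominant(A, nc=2))
```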
3.3 Direct Solution Methods
Direct elimination of the system in (3.2.1) is done using Gaussian
elimination without pivoting. Gaussian elimination gives a very accurate
solution to the linear system (sometimes more accurate than is necessary
if an outer iteration is present) but is costly in terms of computing
time and storage. Both work and storage increase substantially as the
size of the problem increases. The work and storage depend on the system
parameters in the following way:

$\text{WORK} \approx N_B^2 \times n_c^3 \times N$   (3.3.1)

$\text{STORAGE} \approx (N_B + 1) \times n_c^2 \times N$   (3.3.2)

where

$N_B$ is the half-bandwidth of the matrix in terms of
block-bands and is equal to $n_x$ ($n_x n_y$) for 2D (3D) systems,
$n_c$ is the number of coupled equations at each grid node, and
$N$ is the number of grid nodes ($= n_x n_y n_z$).
It is important for the block-banded systems to treat each small
block matrix as a single unit for the purposes of elimination. That is,
the operation of dividing each row by the magnitude of the diagonal
element becomes the operation of multiplying each row by the inverse of
the diagonal block, and so on (see Section 3.2).
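The estimates (3.3.1)-(3.3.2) can be sketched as a small helper; the grid dimensions below are illustrative:

```python
# Sketch of the work and storage estimates (3.3.1)-(3.3.2) for banded
# Gaussian elimination with natural ordering.

def band_elimination_cost(nx, ny, nz=1, nc=3):
    """Return (work, storage): N_B = nx (2-D) or nx*ny (3-D), N = nx*ny*nz."""
    NB = nx if nz == 1 else nx * ny
    N = nx * ny * nz
    work = NB ** 2 * nc ** 3 * N
    storage = (NB + 1) * nc ** 2 * N
    return work, storage

# The 10 x 10 x 3 grid of the first SPE comparative solution project:
print(band_elimination_cost(10, 10, 3))
```

Even on this small 3-D grid the quadratic dependence of the work on the half-bandwidth dominates, which motivates the ordering techniques of Section 3.4.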
Gaussian elimination is equivalent to the formation of the factors

$A = L\,U$   (3.3.3)

of the matrix system. During the elimination process the area between
the bands of the original system becomes full. This is illustrated
below:

Figure 3.3.1: the banded matrix $A$ factored into a lower triangular factor $L$ and an upper triangular factor $U$, each with full bands.
Figure 3.3.1 makes it clear why the work depends on the half-bandwidth.
The half-bandwidth, in turn, depends on the geometry of the underlying
grid system and on how the grid nodes are ordered. Clearly it is
advantageous to choose an ordering for which the half-bandwidth is
minimized. Note that the entries of U are not actually formed unless
there are multiple right-hand-sides to be solved (not usually the case
in reservoir simulation).
3.4 Ordering for Direct Solution Methods
The following discussion on ordering techniques will be given in
the context of Gaussian elimination, but ordering algorithms are also
important for iterative solution methods. Note that these orderings are
usually applied to the grid nodes, not the equations, when there are
several coupled equations per grid node.
In the work estimate (3.3.1), the number of bands, $N_B$, in the 2D
case is equal to the number of grid nodes in the first ordering
direction. If $n_x \gg n_y$ then ordering in the y-direction first gives a
smaller bandwidth. Similarly, if $n_y \gg n_x$ the ordering should be in the
x-direction first. For example, consider

Figure 3.4.1: a 7 x 3 grid ordered by rows,

15 16 17 18 19 20 21
 8  9 10 11 12 13 14
 1  2  3  4  5  6  7

and by columns,

 3  6  9 12 15 18 21
 2  5  8 11 14 17 20
 1  4  7 10 13 16 19

The ordering used in the second grid gives a much smaller bandwidth than
that used in the first.
Price and Coats (1974) introduced (to the petroleum literature) the
idea of
(1) minimizing the bandwidth by using a diagonal ordering
instead of ordering by rows or columns (D2 ordering)
(2) using a red-black ordering to form a matrix that allows
half of the unknowns to be decoupled
Consider the above grid ordered in a diagonal fashion:

 4  7 10 13 16 19 21
 2  5  8 11 14 17 20
 1  3  6  9 12 15 18

Figure 3.4.2
The bandwidth of the resulting matrix is at most 3 and for the first and
last few rows it is even less. This effect is more pronounced on a
square grid. The red-black ordering (Figure 3.4.3) puts the matrix in a
form which allows N/2 unknowns to be decoupled and the resulting reduced
system involves the other N/2 unknowns. The reduction process is
described below.
Figure 3.4.3: red-black ordering of the grid and the resulting matrix. Nodes of one colour are numbered first, so the matrix partitions into four quadrants with a purely diagonal top-left quadrant; eliminating the first N/2 unknowns zeros the lower-left quadrant and produces fill (marked o) only in the lower-right quadrant.
Note that the matrix is partitioned into four quadrants. The top left
contains only diagonal entries. Therefore, elimination of the first N/2
unknowns leaves no fill in the top half of the matrix. It produces zeros
in the lower left. The fill in the lower right is indicated by o's. The
bandwidth of the reduced system is no larger than the bandwidth of the
original system. The work and storage requirements are (for a 2D problem)

$\text{WORK} \approx \frac{n_x^3\,n_y\,n_c^3}{2}, \qquad \text{STORAGE} \approx \frac{n_x^2\,n_y\,n_c^2}{2}$
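The reduction step can be sketched with dense NumPy blocks: because the red-red block is diagonal, its elimination is a cheap Schur complement. The small 1-D example below stands in for the five-point case:

```python
# Sketch of red-black reduction: eliminate the red unknowns (whose diagonal
# block is diagonal under a red-black ordering) to obtain a reduced system in
# the black unknowns. Illustrated on a small 1-D Laplacian.
import numpy as np

def red_black_reduce(A, b, red):
    """Return (A_red, b_red), the Schur complement system in the black unknowns."""
    black = [i for i in range(A.shape[0]) if i not in set(red)]
    Arr = A[np.ix_(red, red)]            # diagonal under a red-black ordering
    Arb = A[np.ix_(red, black)]
    Abr = A[np.ix_(black, red)]
    Abb = A[np.ix_(black, black)]
    Dinv = np.diag(1.0 / np.diag(Arr))
    A_red = Abb - Abr @ Dinv @ Arb
    b_red = np.asarray(b)[black] - Abr @ Dinv @ np.asarray(b)[red]
    return A_red, b_red

A = np.array([[ 2.0, -1.0,  0.0,  0.0],
              [-1.0,  2.0, -1.0,  0.0],
              [ 0.0, -1.0,  2.0, -1.0],
              [ 0.0,  0.0, -1.0,  2.0]])
A_red, b_red = red_black_reduce(A, [1.0, 0.0, 0.0, 1.0], red=[0, 2])
print(A_red)
```

Solving the reduced system and back-substituting for the red unknowns reproduces the solution of the full system.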
Price and Coats combined the ideas of red-black ordering and D2 ordering
to produce a new ordering they called D4 ordering. Again a set of N/2
unknowns can be decoupled. The reduced system matrix in this case gives
a bandwidth that is on average smaller than that of the original system.
The D4 ordering is illustrated below.
 5 20 10 24 13
16  6 21 11 25
 2 17  7 22 12
14  3 18  8 23
 1 15  4 19  9

Figure 3.4.4
The work and storage requirements for this ordering are smaller still for
large n_x, n_y (2D problems). Note that for n_x = n_y the work is 1/4 of
that for natural ordering (3.3.1). The scheme can be easily extended to 3D
grids. The analysis here is more dependent on geometry, but for physically
reasonable grids (i.e. n_x, n_y ≥ n_z) it gives similar results.
For a nine-point discretization molecule, zebra ordering (see
McDonald and Trimble, 1977) can be used. Alternatively, zebra ordering
can be applied to the reduced system, which has the connections of a
nine-point operator. McDonald and Trimble show zebra ordering on the
reduced system to be faster than D4 for larger problems (n_y ≥ 18).
George (1973) introduced an ordering algorithm which is
theoretically O(n^3). Furthermore, he showed that O(n^3) is a lower bound
for the elimination work on any n x n grid. This algorithm is known as
nested dissection. It performs well for large square grids
(n_x = n_y ≥ 33) when compared to zebra ordering, for example, on a
five-point discretization molecule. Unfortunately, performance is
significantly degraded for non-square grids.
Another pseudo-optimal ordering scheme is the minimum degree
ordering, which has been discussed in various contexts (see Price
and Coats for references). The ordering is based on choosing pivots
along the diagonal so that the fill at any stage is a minimum. This
strategy is of course dependent on how far the algorithm is prepared to
"look ahead". It has been used successfully in, for example, the Yale
Sparse Matrix Package (Eisenstat et al., 1983). This ordering does not
necessarily have to be applied to the grid nodes (i.e. the coupled
equations at each node do not have to be treated as a unit).
3.5 Iterative Solution Methods

The solution of large 2D and 3D problems often requires too much
computational work and storage to use direct elimination. An alternative
is the use of an iterative method, where the work and storage depend on:

WORK ≈ N × n_c^3 × (number of iterations)     (3.5.1)

STORAGE ≈ N × n_c     (3.5.2)
All iterative methods require an initial solution, x^(0), which is
used to start the algorithm. The performance of the method depends on
how close the initial solution is to the true solution. For
time-dependent problems, an initial solution which is the solution of the
previous timestep is convenient. In reservoir simulation, since the
system being solved is the Jacobian system:

F'(x) δx = -F(x)     (3.5.3)

and at convergence δx → 0, an initial solution of δx^(0) = 0 is usually
chosen. The number of iterations required for convergence can vary
widely for different problems and different methods. Also, for most
methods, the number of iterations depends on problem size, i.e. most
methods are not O(N) but O(N^m) where m > 1. Iteration parameters are
often used (in Gauss-Seidel, SOR, etc.) to accelerate convergence.
The physics of the problem being solved in reservoir simulation can
produce matrices A which are in some sense "difficult" to solve
iteratively. For example, anisotropic permeabilities often occur, with
k_x, k_y >> k_z. Also, discontinuities in k_x, k_y, k_z are encountered
(shale barriers). As mentioned earlier, small V and/or Δt give a matrix
which is "less" diagonally dominant and therefore harder to solve for most
methods. Neumann boundary conditions are generally used. The presence of
pressure-controlled wells adds terms to the diagonal, but rate-controlled
wells are equivalent to Neumann boundary conditions. Furthermore, the
matrix A is generally non-symmetric.
3.6 Classification of Iterative Methods
A general iterative method for the solution of the linear system
(3.2.1) can be written as follows. First a splitting of A,

A = C - R     (3.6.1)

where C is nonsingular, is defined. It is now possible to define a basic
iterative method (Varga, 1962) to be

C x^(n+1) = R x^(n) + b     (3.6.2)

or, in residual notation,

x^(n+1) = x^(n) + C^-1 r^(n),  where  r^(n) = b - A x^(n)     (3.6.3)

Most common methods can be formed by an appropriate choice of C. For
example,

(1) C = A, and R = 0: the algorithm becomes the
Gaussian elimination algorithm

(2) C = I gives the Richardson method.

Between these two extremes there is a whole spectrum of methods:

(3) C = D gives the Jacobi method (where D is the
diagonal part of A)

(4) C = (1/ω) D - G_L gives SOR (where A = D - G_L - G_U)

(5) C = LU gives an incomplete factorization method.

It is this last class of methods that will be discussed in detail.
It is known from the study of symmetric systems (Kershaw, 1978;
Meijerink and van der Vorst, 1977) that these methods have the best
potential for the types of matrices found in reservoir simulation.
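The basic iteration (3.6.3) can be illustrated with a short sketch (illustrative, not from the text): taking C = D gives the Jacobi method, and for a diagonally dominant test matrix the residual is driven toward zero.

```python
def jacobi(A, b, iters):
    """Basic iterative method x(n+1) = x(n) + C^{-1} r(n) with C = D,
    the diagonal part of A (the Jacobi splitting)."""
    n = len(b)
    x = [0.0] * n
    for _ in range(iters):
        r = [b[i] - sum(A[i][j] * x[j] for j in range(n)) for i in range(n)]
        x = [x[i] + r[i] / A[i][i] for i in range(n)]   # C^{-1} r with C = D
    return x

A = [[4.0, -1.0, 0.0], [-1.0, 4.0, -1.0], [0.0, -1.0, 4.0]]
b = [1.0, 1.0, 1.0]
x = jacobi(A, b, 50)
r = [b[i] - sum(A[i][j] * x[j] for j in range(3)) for i in range(3)]
print(max(abs(v) for v in r))   # tiny residual: diagonal dominance ensures convergence
```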
3.7 Incomplete Factorization Methods

Given a sparse banded matrix A, an incomplete factorization, LDU,
of A is defined to be:

LDU = A + E     (3.7.1)

where E is known as the error matrix, and L, D, and U are lower
triangular, diagonal, and upper triangular matrices, respectively. In order to
minimize the work per iteration, L + D + U should have a sparse banded
structure close to the structure of A. However convergence will be more
rapid if the elements of E are made as small as possible. This is
generally achieved at the expense of extra bands in L and U.
If L and U retain the same sparsity structure as A, i.e.

[Figure 3.7.1: the bands of A and the corresponding bands retained in
L, D, and U]
the factorization reduces to the SIP method (Stone, 1968) or the DKR
method (Dupont et al., 1968). Both of these methods have been widely used
in reservoir simulation. Adding more bands to the incomplete
factorization increases the work of forming the approximation and the
work per iteration. The strategy used for deciding which extra bands to
add varies among different authors. Behie and Forsyth (1984) use the
"degree" concept. In this, the incomplete factorization is viewed as
carrying out a few steps of Gaussian elimination on A. If the bands of
the original matrix A are labelled as first degree, then higher degree
bands are formed by fill-in resulting from elimination. The degree of a
fill band is equal to the degree of the band being eliminated plus the
degree of the band inducing it. This use of degree is equivalent to
Watts' (1981) concept of "order", but not the same as Gustafsson's
(1978) use of the word "order". Gustafsson's strategy for adding extra
bands is also slightly different. Some examples of the structure of
different degree incomplete factorizations are given in Figures 3.7.2
and 3.7.3.
[Figure 3.7.2: Second Degree ILU (natural ordering) — band structure of
A, L, D, and U]

[Figure 3.7.3: Third Degree ILU (natural ordering) — band structure of
A, L, D, and U]
As the degree increases, extra bands are added. This increases the
computational work, but also increases the convergence rate of the
algorithm. The trade-off between extra work and rate of convergence
depends somewhat on the problem being solved, but general guidelines can
be established by numerical experimentation (see Section 4.3).
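A minimal sketch of an incomplete factorization that keeps the factors on the sparsity pattern of A, as in Figure 3.7.1; the dense storage and explicit pattern test are for illustration only:

```python
def ilu0(A):
    """Incomplete LU with L and U restricted to the sparsity pattern of A
    (unit lower L and U packed into one array); fill that would appear
    outside the pattern is simply dropped."""
    n = len(A)
    LU = [row[:] for row in A]
    nz = {(i, j) for i in range(n) for j in range(n) if A[i][j] != 0.0}
    for i in range(1, n):
        for k in range(i):
            if (i, k) not in nz:
                continue
            LU[i][k] /= LU[k][k]            # multiplier l_ik
            for j in range(k + 1, n):
                if (i, j) in nz:            # update only within the pattern
                    LU[i][j] -= LU[i][k] * LU[k][j]
    return LU

# a tridiagonal matrix produces no fill, so here ILU(0) is the exact LU
A = [[4.0, -1.0, 0.0], [-1.0, 4.0, -1.0], [0.0, -1.0, 4.0]]
LU = ilu0(A)
print(LU[1][0], LU[2][2])   # -0.25  3.7333...
```

Adding extra (higher degree) bands to the retained pattern amounts to enlarging the set `nz`, which increases both the setup work and the work per iteration, as described above.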
3.8 Treatment of Error Terms
In equation (3.7.1) above, E was defined to be the matrix containing
the "error" made in the incomplete factorization. The elements of E are
outside the structure of L + D + U. If it is assumed that the solution
is continuous, then clearly an element, e, of E which falls outside the
computational molecule can be approximated by points inside the molecule
so that:

e - e_a = O(h^m)     (3.8.1)

where e_a is an approximation to e and h is a measure of the mesh size.
If

{LDU}_a = A - E_a     (3.8.2)

where the subscript a refers to the bands coinciding with the band
structure of L + D + U, then consequently,

LDU = A - E_a + E

    = A + E'     (3.8.3)

where E' = E - E_a. This results in an m'th order factorization of A,
assuming equation (3.8.1) is true. Consequently, as the mesh size h is
decreased, and the number of unknowns N is increased, the error E'
decreases. Intuitively, it is clear that as m increases, the convergence
rate degrades less with increasing N. This is shown more rigorously in
Gustafsson (1977, 1978) for symmetric problems. In particular, if the
error terms are not accounted for at all, the factorization is zeroth
order. The first order factorization of Gustafsson simply approximates
the error term by using the diagonal point. This is the modified
factorization (MILU) which will be referred to later. It is possible
to obtain a higher order factorization by using more points within the
molecule. SIP (Stone, 1968; Saylor, 1974) is an example of a second order
factorization.
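The error treatment can be sketched as follows. This is a row-sum-preserving variant (dropped fill is folded into the diagonal as it is generated), analogous to the column-sum treatment described in the text; the dense storage is for illustration only:

```python
def milu0(A):
    """ILU(0) with diagonal compensation: fill falling outside the
    sparsity pattern is not discarded but subtracted from the diagonal,
    so that (L U) e = A e for e = (1, ..., 1)^T (row sums preserved)."""
    n = len(A)
    LU = [row[:] for row in A]
    nz = {(i, j) for i in range(n) for j in range(n) if A[i][j] != 0.0}
    for i in range(1, n):
        for k in range(i):
            if (i, k) not in nz:
                continue
            LU[i][k] /= LU[k][k]
            for j in range(k + 1, n):
                upd = LU[i][k] * LU[k][j]
                if (i, j) in nz:
                    LU[i][j] -= upd
                else:
                    LU[i][i] -= upd        # compensate on the diagonal
    return LU

def lu_product(LU):
    """Dense product of the unit-lower and upper factors packed in LU."""
    n = len(LU)
    L = [[1.0 if i == j else (LU[i][j] if j < i else 0.0) for j in range(n)]
         for i in range(n)]
    U = [[LU[i][j] if j >= i else 0.0 for j in range(n)] for i in range(n)]
    return [[sum(L[i][k] * U[k][j] for k in range(n)) for j in range(n)]
            for i in range(n)]

# 2x2-grid five-point Laplacian: eliminating node 0 creates fill at (1,2), (2,1)
A = [[4.0, -1.0, -1.0, 0.0],
     [-1.0, 4.0, 0.0, -1.0],
     [-1.0, 0.0, 4.0, -1.0],
     [0.0, -1.0, -1.0, 4.0]]
M = lu_product(milu0(A))
print([abs(sum(M[i]) - sum(A[i])) < 1e-12 for i in range(4)])   # [True, True, True, True]
```

The error matrix E = M - A then has zero row sums, the discrete analogue of the compensation (3.8.1)-(3.8.3).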
3.9 Ordering for Incomplete Factorization Methods
It is well known that the ordering of the grid nodes can affect the
convergence properties of iterative methods (see Varga, 1962, for
example). Watts (1981) suggested using the D2 ordering mentioned earlier
(Section 3.4) and found that this ordering in combination with an ILU
method produced improved rates of convergence for typical reservoir
simulation problems. Physically, D2 ordering removes the directional
bias which results from such things as anisotropies in permeability
(which are usually aligned with the grid lines) and/or varying block
sizes.

Axelsson and Gustafsson (1979) suggested using red-black ordering
to form a reduced system which is then solved iteratively. The system in
(3.2.1), if ordered with red-black ordering, can be written as (see
Figure 3.4.3)
[ D_R  A_R ] [ x_R ]   [ b_R ]
[ A_B  D_B ] [ x_B ] = [ b_B ]     (3.9.1)

where D_R and D_B are block diagonal matrices and A_R and A_B are block
banded. This can be scaled as

[ I    A_R' ] [ x_R ]   [ b_R' ]
[ A_B  D_B  ] [ x_B ] = [ b_B  ]     (3.9.2)

where A_R' = D_R^-1 A_R and b_R' = D_R^-1 b_R. The system (3.9.2) is
equivalent to

( D_B - A_B A_R' ) x_B = b_B - A_B b_R'     (3.9.3)

Let R = D_B - A_B A_R' and c = b_B - A_B b_R', and (3.9.3) can be
written:

R x_B = c     (3.9.4)

with

x_R = b_R' - A_R' x_B     (3.9.5)
This reduced system can be diagonally ordered (Behie and Forsyth, 1984)
to produce an algorithm which has the directional-bias-removing
properties of Watts' D2 algorithm. The system in (3.9.4) can now
be factored in an approximate fashion to any desired degree of accuracy.
The resulting system is solved iteratively only for the black points,
and once convergence is reached the red points are retrieved via (3.9.5).
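The reduction (3.9.4)-(3.9.5) can be illustrated on the five-point Laplacian for a 2x2 grid; the node numbering and right-hand side here are illustrative, not from the text. Because red nodes couple only to black nodes, D_R is diagonal and the reduced system is formed explicitly:

```python
# red unknowns first: D_R is diagonal, A_R / A_B couple red and black nodes
D_R = [4.0, 4.0]
A_R = [[-1.0, -1.0], [-1.0, -1.0]]
A_B = [[-1.0, -1.0], [-1.0, -1.0]]
D_B = [[4.0, 0.0], [0.0, 4.0]]
b_R, b_B = [1.0, 1.0], [1.0, 1.0]

# R = D_B - A_B D_R^{-1} A_R   and   c = b_B - A_B D_R^{-1} b_R   (3.9.3)
R = [[D_B[i][j] - sum(A_B[i][k] * A_R[k][j] / D_R[k] for k in range(2))
     for j in range(2)] for i in range(2)]
c = [b_B[i] - sum(A_B[i][k] * b_R[k] / D_R[k] for k in range(2))
     for i in range(2)]

# solve the 2x2 reduced system R x_B = c (3.9.4) directly ...
det = R[0][0] * R[1][1] - R[0][1] * R[1][0]
x_B = [(R[1][1] * c[0] - R[0][1] * c[1]) / det,
       (R[0][0] * c[1] - R[1][0] * c[0]) / det]
# ... then retrieve the red points by back-substitution (3.9.5)
x_R = [(b_R[i] - sum(A_R[i][k] * x_B[k] for k in range(2))) / D_R[i]
       for i in range(2)]
print(x_R, x_B)   # all unknowns equal 0.5
```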
3.10 Acceleration Techniques

Again, by analogy with work done for symmetric systems, it is
postulated that such incomplete factorizations probably work best when
used in conjunction with an acceleration technique. For symmetric
systems, techniques such as conjugate gradient and Chebyshev
acceleration (see Meijerink and van der Vorst, 1977; Manteuffel, 1977;
Kershaw, 1978) have been used. Nonsymmetric analogues of the conjugate
gradient acceleration method have been developed (Vinsome, 1976; Young
and Jea, 1980; Axelsson, 1980; Elman, 1981; Saad and Schultz, 1986). The
ORTHOMIN algorithm developed by Vinsome has proved to be one of the most
useful of these acceleration techniques. It provides a computationally
simple, robust acceleration method. It does not require estimation of
eigenvalues as does Chebyshev acceleration. The iteration parameters are
generated automatically by the algorithm. When combined with an ILU
method, it gives rise to the following computational algorithm:
For k = 0, 1, ...,

q^(k) = δx^(k) + Σ_{i=m}^{k-1} a_i^(k) q^(i)     (3.10.1)

a_i^(k) = - ( A δx^(k), A q^(i) ) / ( A q^(i), A q^(i) )   for all i in m ≤ i ≤ k-1
        = 0   otherwise     (3.10.2)

ω^(k) = ( A q^(k), r^(k) ) / ( A q^(k), A q^(k) )     (3.10.3)

x^(k+1) = x^(k) + ω^(k) q^(k)     (3.10.4)

r^(k+1) = r^(k) - ω^(k) A q^(k)     (3.10.5)

where

δx^(k) = (LDU)^-1 r^(k),   m = int( k/(N_orth+1) ) * (N_orth+1)     (3.10.6)

and

q^(0) = δx^(0) = (LDU)^-1 r^(0)     (3.10.7)
Note that this is called the restarted version of the ORTHOMIN
algorithm. This version of the ORTHOMIN algorithm is also often referred
to as the generalized conjugate residual algorithm (GCR) (see
Elman, 1981).

The ORTHOMIN algorithm is analogous to the conjugate gradient
algorithm in the following way. At each step of the algorithm a search
direction A q^(k) is constructed so that it is orthogonal to some
previous set of A q^(i). This set cannot be large as it would involve
too much work to perform the orthogonalizations. Practical choices of
the parameter N_orth lie between 1 and 10. Once the (N_orth+1) search
vectors have been constructed, the algorithm is restarted, and a new set
of (N_orth+1) vectors is constructed. In the conjugate gradient algorithm
the new search direction at each iteration is automatically conjugate
to all previous search directions. The orthogonalization procedure in
(3.10.2) is essentially a Gram-Schmidt procedure. The parameter ω^(k) is
chosen to minimize the residual in the l_2 norm. GMRES (Saad and
Schultz, 1986) is a similar procedure, but the search vectors q and Aq
are not saved; rather, they are reconstructed by an Arnoldi process
constrained to minimize the residual. This algorithm can have
significantly lower storage costs (for large N_orth) but the convergence
properties are similar (it is mathematically equivalent to ORTHOMIN).

Note that when the ORTHOMIN acceleration algorithm is used with
the D4 ILU approximate factorization, the matrix-vector multiply can be
calculated in a computationally efficient way. Put

y = R δx = D_B δx - A_B ( A_R' δx )     (3.10.8)

In 3D this costs 13 N_B multiplies versus 19 N_B multiplies for the
conventional way. In 2D the work is the same.
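The restarted iteration (3.10.1)-(3.10.7) can be sketched as follows. For simplicity the (LDU)^-1 preconditioner is replaced by the identity, and the nonsymmetric test matrix is illustrative, not from the text; `n_orth` plays the role of N_orth:

```python
import numpy as np

def orthomin(A, b, n_orth=5, tol=1e-10, maxit=500):
    """Restarted ORTHOMIN (GCR): each step minimizes ||r|| over the
    stored search directions; preconditioning is omitted for clarity."""
    n = len(b)
    x = np.zeros(n)
    r = b - A @ x
    qs, Aqs = [], []
    for k in range(maxit):
        if np.linalg.norm(r) < tol:
            break
        if k % (n_orth + 1) == 0:          # restart: drop stored directions
            qs, Aqs = [], []
        dx = r                             # stands in for (LDU)^{-1} r
        q, Aq = dx, A @ dx
        for qi, Aqi in zip(qs, Aqs):       # Gram-Schmidt as in (3.10.2)
            a = -(Aq @ Aqi) / (Aqi @ Aqi)
            q = q + a * qi
            Aq = Aq + a * Aqi
        w = (Aq @ r) / (Aq @ Aq)           # minimizes the l_2 residual
        x = x + w * q
        r = r - w * Aq
        qs.append(q)
        Aqs.append(Aq)
    return x

# a nonsymmetric, diagonally dominant test matrix
n = 10
A = (np.diag(4.0 * np.ones(n)) + np.diag(-1.2 * np.ones(n - 1), -1)
     + np.diag(-0.8 * np.ones(n - 1), 1))
b = np.ones(n)
x = orthomin(A, b)
print(np.linalg.norm(b - A @ x) < 1e-8)   # True
```

Since the symmetric part of this A is positive definite, the residual is nonincreasing and the iteration cannot diverge, consistent with the result of Elman quoted in Section 3.11.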
3.11 Convergence Properties
For a symmetric, positive definite incomplete factorization
accelerated by conjugate gradient (or preconditioned conjugate
gradient as it is also called), the rate of convergence is proportional
to a power of κ, where κ is the ratio of the maximum to minimum
eigenvalues of the iteration matrix (Concus et al., 1975) and is known
as the condition number. It is therefore desirable to choose a splitting
for which κ is as small as possible.

The success of the conjugate gradient accelerated incomplete
factorization is not only due to the reduction of the condition number
but also to the fact that the modified system has eigenvalues which are
nearly one except for a few extreme eigenvalues that are quickly
eliminated by the conjugate gradient acceleration (Kershaw, 1978).

There is little in the way of analysis of the convergence
properties for these iterative methods applied to non-symmetric systems.
Elman (1981) has shown that if the symmetric part of A, here denoted C,
is positive definite, then ORTHOMIN generates a sequence which satisfies

|| r_i || ≤ [ 1 - λ_min(C) / ( λ_max(C) + ρ(R^2)/λ_min(C) ) ]^{i/2} || r_0 ||     (3.11.1)

where R is the skew-symmetric part of A, and thus the process cannot
diverge.

3.12 Block Incomplete Factorization

It is well known, for classical methods such as Jacobi and
successive overrelaxation (SOR), that block methods can be
asymptotically faster than the corresponding point methods (Varga,
p. 199). Moreover, in the case of block or line SOR, the method can be
normalized to require exactly the same computational work per grid node
as the point method does. For these reasons there has been considerable
interest recently in applying these ideas to ILU methods.
For a block incomplete factorization, the matrix A is partitioned
as follows:

[Figure 3.12.1: block tridiagonal partitioning of A, with tridiagonal
diagonal blocks B_i and diagonal off-diagonal blocks L_i and U_i]

where the B's are tridiagonal matrices and the L's and the U's are
diagonal matrices. As in Section 3.1, each x can represent a single
entry in the matrix or a dense submatrix of order n_c x n_c. Note that
the term "block" is used in a more general sense here than in Section
3.2, where it referred only to the dense submatrix represented by the
x's. The concept of block diagonal dominance applies here too, with the
blocks defined by the above partitioning.
There is usually, but not always, a physical basis to the
partitioning chosen. In the above example, the B matrices represent the
coupling between unknowns on the same line of the grid (see Figure
3.1.1). This partitioning views the matrix A as a block tridiagonal
system where

A = B + L + U     (3.12.1)

An exact factorization of this system is

A = ( G + L ) G^-1 ( G + U )     (3.12.2)

where

G_1 = B_1

and

G_i = B_i - L_i G_{i-1}^-1 U_{i-1},   i = 2, ..., N

The G_i (except for the first) are full matrices. For a block incomplete
factorization a sparse approximation to G_{i-1}^-1 is used. The matrix A
is approximated by

M = ( H + L ) H^-1 ( H + U )     (3.12.3)

  = A + ( H - B + L H^-1 U )

  = A + E
Many different approximations to H_{i-1}^-1 are of course possible.
The aim is to find an approximation that is sparse (diagonal or
tridiagonal in structure) yet retains enough information to
significantly increase the convergence rate. The error terms, E, in
(3.12.3) can be accounted for in an analogous fashion to that discussed
in Section 3.8. The column sums (or row sums) of E are subtracted from
the corresponding diagonal element.

A simple approximation is to write

H_{i-1}^-1 = diag ( G_{i-1}^-1 )     (3.12.4)

where only the diagonal of the exact inverse is retained. Note
that for 2D problems this gives the method known as Nested Factorization
(for further discussion of this method see Section 3.13). A more
accurate treatment is to put

H_{i-1}^-1 = band ( G_{i-1}^-1 , p )     (3.12.5)

in which p bands on either side of the diagonal, as well as the
diagonal, are retained. Methods based on this approximation are known as
INV(p) and MINV(p). For further discussion of block incomplete
factorization methods see Underwood (1976), Meijerink (1983), Axelsson
et al. (1984), and Concus et al. (1985).
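The diagonal approximation (3.12.4) can be sketched for a hypothetical two-line grid (three nodes per line); the blocks below are illustrative, not from the text. The sketch also checks the property that the error of the block factorization is confined to the diagonal blocks:

```python
import numpy as np

# B_i couples nodes on a line (tridiagonal); L, U couple neighbouring
# lines (diagonal), as in the partitioning of Figure 3.12.1.
B0 = np.array([[4., -1., 0.], [-1., 4., -1.], [0., -1., 4.]])
B1 = B0.copy()
L1 = -np.eye(3)
U0 = -np.eye(3)

# The exact Schur block G1 = B1 - L1 G0^{-1} U0 (with G0 = B0) is full;
# the approximation (3.12.4) keeps only the diagonal of the exact inverse.
H0 = B0
H1 = B1 - L1 @ np.diag(np.diag(np.linalg.inv(H0))) @ U0

# M = (H + L) H^{-1} (H + U): the off-diagonal blocks of M equal those of
# A, so the error matrix E = M - A lives only in the diagonal blocks.
Z = np.zeros((3, 3))
H = np.block([[H0, Z], [Z, H1]])
Lb = np.block([[Z, Z], [L1, Z]])
Ub = np.block([[Z, U0], [Z, Z]])
M = (H + Lb) @ np.linalg.inv(H) @ (H + Ub)
A = np.block([[B0, U0], [L1, B1]])
E = M - A
print(np.allclose(E[3:, :3], 0), np.allclose(E[:3, 3:], 0))   # True True
```

Keeping more bands of the exact inverse, as in (3.12.5), shrinks the remaining diagonal-block error at the cost of denser H_i.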
3.13 Nested Factorization
A type of block incomplete factorization that is used in reservoir
simulation is the algorithm known as Nested Factorization (Cheshire,
1983). For 2D systems it is equivalent to block incomplete factorization
with the approximation (3.12.4) for the inverse of the diagonal block.
For 3D systems and a seven-point discretization molecule, the matrix A
is partitioned as follows (for a 3x2x2 grid):

[Figure 3.13.1: nested partitioning of A for a 3x2x2 grid; the innermost
blocks couple nodes on a line, the next level (bands L_2, U_2) couples
lines within a plane, and the outermost level (bands L_3, U_3) couples
planes]
Symbolically the matrix A is written as

A = D + L_1 + U_1 + L_2 + U_2 + L_3 + U_3     (3.13.1)

where again each x may represent either a single element or an n_c x n_c
submatrix. The matrices represented by D, the L's, and the U's are
diagonal. The matrix A is then partitioned into a block tridiagonal form
by the coarsest partitioning in Figure (3.13.1) and a block incomplete
factorization is written as

M = ( P + L_3 ) P^-1 ( P + U_3 )     (3.13.2)

where P^-1 is a sparse approximation to the true inverse. This sparse
approximation is obtained from a second partitioning of the system into
a block tridiagonal system (formed by the finer partitioning in Figure
(3.13.1)) and a second block incomplete factorization, and is written as

P = ( T + L_2 ) T^-1 ( T + U_2 )     (3.13.3)

where T^-1 is again a sparse approximation to the true inverse and is
formed by factoring each matrix T (which represents the coupling between
nodes on a single line of the grid). At this level the factorization can
be done exactly:

T = ( S + L_1 ) S^-1 ( S + U_1 )     (3.13.4)

where S is a diagonal matrix defined by

S = D - L_1 S^-1 U_1 - colsum( L_2 T^-1 U_2 ) - colsum( L_3 P^-1 U_3 )     (3.13.5)

The colsum( ) terms are added to account for the error in the incomplete
factorizations represented by (3.13.2) and (3.13.3). They ensure that
the colsums of the error matrix (M - A) are zero and thus that residuals
sum to zero (independently for each unknown in the system) if the
condition (3.13.6) is enforced. The nested factorization error matrix is
itself block diagonal and thus the residuals will also sum to zero
within planes (or lines in a 2D system). This fact can be used to check
the implementation of the algorithm.
Note that only the coupling between nodes on a single line of a
grid is accounted for exactly in this factorization. The coupling in the
other two directions is accounted for only in the error terms. For this
reason, the algorithm is very sensitive to the ordering of the grid
nodes (for problems with anisotropies and/or discontinuities in
coefficients) and care must be taken to order the grid nodes properly.
3.14 Multigrid Methods
The multigrid method is an iterative technique developed for the
solution of elliptic partial differential equations which, for smooth
problems (i.e. no discontinuities in coefficients), can be shown to
converge at a rate which is O(N), where N is the number of unknowns
(Brandt, 1977; Brandt & Dinar, 1979; Brandt, 1986). This convergence
rate makes the method potentially a very attractive one, since the
incomplete factorization methods discussed above are O(N^{3/2}) or
O(N^{5/4}) (see Section 3.11). For a method which is O(N), the number of
iterations required for convergence is independent of the number of
unknowns, so that the size of the problem could be doubled and not
increase the number of iterations required for convergence.

The first step of the multigrid method is to discretize the problem
on a number of grids of varying fineness (a coarser grid usually being
a subset of the finer one). The second step is to take advantage of a
known property of relaxation methods (i.e. Gauss-Seidel and related
methods), which is that they are efficient at eliminating local or high
frequency errors which have a wavelength on the order of the grid
spacing. Relaxation methods are, on the other hand, very inefficient at
eliminating longer wavelength error. Each successive grid, therefore,
can be used to reduce error components which are of the order of that
particular grid's spacing. The problem is then passed on to another
(coarser) grid to eliminate the longer wavelength error components.

[Figure 3.14.1: a sequence of grids of varying fineness]
If the differential equation to be solved is written as

L u = f     (3.14.1)

it can be represented on the finest grid, G^K, as

L^K u^K = f^K     (3.14.2)

If u^K is an approximation to the solution of (3.14.2), then the
residual on G^K is

r^K = f^K - L^K u^K     (3.14.3)

Equation (3.14.2) written in residual form is

L^K v^K = r^K     (3.14.4)

where v^K is the correction to u^K.
To reiterate, a relaxation method used on equation (3.14.4)
eliminates the high frequency components of error. The solution is then
smooth in the sense that it does not have fluctuations on the scale of
G^K. This means that an approximation to v^K can be found using a coarser
grid. The essential idea behind multigrid is then that the problem is
prepared in such a way that it can be represented and solved on a
coarser grid. The problem on the coarser grid is written

L^{K-1} v^{K-1} = r^{K-1}     (3.14.5)

where

r^{K-1} = I_K^{K-1} r^K,

I_K^{K-1} is an interpolation operator from G^K to G^{K-1}, and L^{K-1}
represents the differential operator on G^{K-1}. Again, on this grid,
relaxation is efficient at eliminating errors that have a wavelength of
the order of the mesh size. This operation is referred to as smoothing
and is independent of the mesh size (Brandt, 1977).
Equation (3.14.5) can be used to improve u^K with

(u^K)_new = (u^K)_old + I_{K-1}^K v^{K-1}     (3.14.6)

That is, the coarse grid provides a correction to the fine grid
solution. This correction (I_{K-1}^K v^{K-1}) contains information about
low frequency components of the solution and hence speeds convergence.

After a few sweeps of relaxation on G^{K-1}, convergence will
deteriorate, as it did on the finer grid. However, it is now observed
that (3.14.5) can be treated in exactly the same way as (3.14.2) was.
That is, a correction can be obtained from a still coarser grid,
G^{K-2}. On the coarsest grid, G^1, the problem is usually solved
exactly. The algorithm can be represented diagrammatically as follows:

G^K:      smooth (perform relaxation sweeps)
            |  transfer residuals ( r^{K-1} = I_K^{K-1} r^K )
G^{K-1}:  smooth (perform relaxation sweeps)
            |
           ...

Figure (3.14.2)

This represents one multigrid cycle. In particular, it is called a
V-cycle. Other types of cycles are possible, e.g. W-cycles. A common
configuration is to have three levels or grids in the cycle.
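The cycle above can be sketched for a 1D model problem, -u'' = f with homogeneous Dirichlet conditions; the full-weighting restriction, linear interpolation, and exact coarse-grid solve are standard choices assumed here, not taken from the text:

```python
import numpy as np

def relax(u, f, h, sweeps):
    # Gauss-Seidel smoothing for -u'' = f, u[0] = u[-1] = 0
    for _ in range(sweeps):
        for i in range(1, len(u) - 1):
            u[i] = 0.5 * (u[i - 1] + u[i + 1] + h * h * f[i])
    return u

def two_grid(u, f, h):
    """One two-grid cycle: pre-smooth, restrict the residual, solve the
    coarse problem exactly, interpolate the correction, post-smooth."""
    n = len(u) - 1
    u = relax(u, f, h, 2)
    r = np.zeros(n + 1)
    r[1:-1] = f[1:-1] + (u[:-2] - 2 * u[1:-1] + u[2:]) / (h * h)
    nc = n // 2
    rc = np.zeros(nc + 1)                      # full-weighting restriction
    rc[1:-1] = 0.25 * r[1:-2:2] + 0.5 * r[2:-1:2] + 0.25 * r[3::2]
    # coarse operator: same three-point stencil on spacing 2h, solved exactly
    Ac = (np.diag(2.0 * np.ones(nc - 1)) + np.diag(-np.ones(nc - 2), 1)
          + np.diag(-np.ones(nc - 2), -1)) / (2 * h) ** 2
    vc = np.zeros(nc + 1)
    vc[1:-1] = np.linalg.solve(Ac, rc[1:-1])
    v = np.zeros(n + 1)                        # linear interpolation to fine
    v[::2] = vc
    v[1::2] = 0.5 * (vc[:-1] + vc[1:])
    return relax(u + v, f, h, 2)

n = 32
h = 1.0 / n
xg = np.linspace(0.0, 1.0, n + 1)
f = np.pi ** 2 * np.sin(np.pi * xg)            # exact solution sin(pi x)
u = np.zeros(n + 1)
for _ in range(5):
    u = two_grid(u, f, h)
err = np.max(np.abs(u - np.sin(np.pi * xg)))
print(err < 1e-2)   # True: down to the discretization-error level
```

A full V-cycle simply applies the same idea recursively instead of solving the coarse problem exactly.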
To complete the specification of the algorithm, the following must
be defined:

(1) the interpolation operators I_{K-1}^K (the residual transfer
operator I_K^{K-1} is generally taken to be the inverse operation)

(2) the approximation of the differential operator on the coarse
grid, e.g. L^{K-1}

For smooth problems this definition is easy. The operator L^{K-1} is
defined in the obvious manner, i.e. the fine grid nodes are simply
removed. For the interpolation operator, linear or quadratic
interpolation is used. For problems with discontinuities or anisotropies
in the coefficients, the definition of these operators is not so
obvious. In fact this definition is the crux of defining a workable
multigrid algorithm for such problems.
For discontinuous and/or anisotropic problems it is difficult to
define physically meaningful interpolation operators. Early attempts at
defining interpolation operators were often thwarted by counterexamples
where the operator failed. Working on the premise that the interpolation
operator should mimic the properties of the differential operator, it
was suggested that the differential operator itself be used as an
interpolation operator wherever possible (Alcouffe et al., 1981). This
cannot be done everywhere since the interpolation operator on the
coarser grids would grow undesirably large. When the differential
operator cannot be used in its entirety, a collapsed or averaged version
of the differential operator is used to define the interpolation
operator. The process can be described in the following way:

(1) for fine grid points corresponding to coarse grid points, use
the identity operator;

(2) for fine grid points on a line between two coarse points, the
differential operator L^K is averaged in the two directions
perpendicular to the coarse grid lines, i.e. the components of
L^K are added to produce a collapsed 1D operator (L^K)^c1, and
the interpolation operator is obtained from (L^K)^c1 u^K = 0;

(3) for fine points not on coarse grid lines, the full differential
operator is used, i.e. the interpolation operator is obtained
from L^K u^K = 0.

The differential operator on the coarse grid can now be defined
recursively as:

L^{K-1} = I_K^{K-1} L^K I_{K-1}^K = ( I_{K-1}^K )^T L^K I_{K-1}^K

This is called the automatic prescription for the coarse grid operator.
If the fine grid operator is the standard five-point (or seven-point,
in 3D) discretization operator, the coarse grid operator will
be a nine-point (or twenty-seven point) operator, even with the
averaging described above.
The third consideration in defining the algorithm is to choose a
smoothing method. For smooth problems, point Gauss-Seidel is used. For
anisotropic problems, line or alternating line Gauss-Seidel is used.
Other iterative methods have been tried as smoothing methods with
varying degrees of success (Hemker, 1982; Kettler, 1982). In three
dimensions this issue is more serious, since the three-dimensional
analogue of line Gauss-Seidel, plane or alternating plane Gauss-Seidel,
is too costly. It requires the solution of a 2D problem on each plane
(Behie and Forsyth, 1983). One solution is to use an iterative, rather
than an exact, method to solve the 2D problem. Several methods have been
used, including ILU factorization and 2D multigrid (Dendy, 1987).
Extending the algorithm to solve 3D problems poses other
difficulties as well. Behie and Forsyth (1983) found that the
straightforward application worked well for most classes of problems (of
the type encountered in reservoir simulation) but failed on anisotropic
problems with small compressibility. Dendy (1987) devised an improved
interpolation and differential operator prescription which does not fail
on this class of problems. Another approach which has been discussed for
3D problems is to refine only in the x and y directions. This seems a
particularly reasonable approach for reservoir simulation since the z
direction typically contains fewer grid nodes.
Multigrid algorithms have been developed for non-symmetric problems
(Dendy, 1983) and for problems involving systems of equations, i.e. more
than one equation being solved at each grid node (Dendy). Both of these
are typical of reservoir simulation problems. However, these algorithms
have not yet been actively used in reservoir simulation applications.
Multigrid algorithms have been used for reservoir simulation mainly in
the context of solving a pressure equation which is usually symmetric or
nearly symmetric.

Multigrid algorithms have potential for application to reservoir
simulation but are still an area of current research and are not yet
state-of-the-art methods. A drawback to the use of multigrid algorithms
is the large setup cost to calculate the coarse grid operators. Even for
a 2D problem, this work is on the order of 36N operations. The storage
involved in saving the coefficients is also substantial. The algorithm
is practical only for large problems. The performance of the multigrid
algorithm on some standard test problems is discussed in Behie and
Forsyth, 1983.
Another approach to developing a multigrid algorithm is the
Algebraic Multigrid Method (AMG) (Brandt, 1986). The AMG method is based
solely on algebraic information contained in the matrix A, i.e. strong
and weak connections. AMG requires no knowledge of an underlying grid
structure. AMG constructs a sequence of "grids", "coarse grid operators"
and "intergrid transfer operators" which are then combined in the usual
multigrid way. It is fully automatic. AMG can be applied to many
problems where standard multigrid methods are not applicable. It has
been used successfully on the standard "difficult" problems, i.e. those
with anisotropic and/or discontinuous coefficients. It does not require
special smoothing. The usual requirement is that A be symmetric and of
positive type, i.e.

a_ii > 0,   a_ij ≤ 0 (j ≠ i),   and   Σ_{j=1}^N a_ij ≥ 0   (i = 1, 2, ..., N)     (3.14.7)

These conditions can be relaxed somewhat. AMG works on non-symmetric
matrices. Also, the matrix need only be of essentially positive type,
which means that the condition (3.14.7) can be violated on some grid
G^{M-k} coarser than G^M. AMG for systems is an area of current
research. The drawbacks to the AMG method include the complexity of the
algorithm. It is generally more computationally expensive than the
standard multigrid algorithm.
SECTION 4: ASSOCIATED TOPICS
4.1 Treatment of Source Terms
Most reservoir simulators include the modeling of multiblock wells
(and/or fractures). For the best convergence rates in the Newton
iteration, this should be done implicitly, i.e. the source terms in
equation (2.1.6) should be written at the latest iteration level. Each
conservation equation in cell i will have a source term of the form (if
cell i contains a well):

q_k = λ_k ( p_wf^J - p_i )     (4.1.1)

where

q_k = mass influx of component k

λ_k = mobility of phase k

p_i = pressure in cell i

p_wf^J = wellbore pressure in well J

To specify the wellbore pressure, an additional equation is required.
This takes the form of a constraint on the total flow:

q_T^J = Σ_{k=1}^{n_c} Σ_{i ∈ φ_J} λ_{k,i} ( p_wf^J - p_i )     (4.1.2)

where

q_T^J = total specified fluid flow into well J

n_c = total number of fluid components

φ_J = the set of cell numbers penetrated by well J
There is now an extra degree of coupling between cells because of these
terms. The resulting matrix contains terms outside the regular band
structure. Figure (4.1.1) shows the grid structure and resulting
incidence matrix for the implicit solution of the reservoir and well
unknowns for a 3 x 3 grid with a multiblock well completed in three of
the grid blocks
t '7~ 8 9
x x x x x x x x
t--! , C 5 B x x x
~~ 2 3
x x x x x x x x x x
x x x x x x x x
x x x x
........................................... ~ ............ ~ ... ~ .... ~ ...... . x x x i x
Figure (4.1.1)
A standard treatment is to order the unknowns connected to the reservoir
flow first and the well unknowns second. In Figure (4.1.1) the reservoir
unknowns are ordered first, resulting in what is known as a bordered
matrix.
   Figure (4.1.2): partitioning of the bordered matrix:

      | derivatives of flow equations  |  derivatives of (4.1.1)  |
      | with respect to flow unknowns  |  with respect to p_wf    |
      |--------------------------------+--------------------------|
      | derivatives of (4.1.2)         |  derivatives of (4.1.2)  |
      | with respect to flow unknowns  |  with respect to p_wf    |
Figure (4.1.2) illustrates the partitioning in the bordered matrix. The
number of extra rows and columns is n_w, the number of fully coupled
wells. The matrix problem is now written as

   [ A    W_2 ] [ x_R ]   [ b_R ]
   [ W_1  W_3 ] [ x_W ] = [ b_W ]        (4.1.3)

where x_R and x_W denote the reservoir and well unknowns. The total
number of unknowns is N = n x n_c + n_w. Typically n >> n_w, so that one
approach to solving (4.1.3) is to perform a block elimination on it.
This is an exact factorization of the block system:

   [ A    W_2 ]   [ LU    0                       ] [ I   (LU)^{-1} W_2 ]
   [ W_1  W_3 ] = [ W_1   W_3 - W_1 (LU)^{-1} W_2 ] [ 0   I             ]        (4.1.4)
Since L and U are factors of A, the factorization of the flow portion of
the Jacobian matrix can be done with the usual solution algorithm (e.g.
direct elimination, ILU factorization, etc.). The solution algorithm
for the whole system involves two additional computationally intensive
portions. These are:

(1) the computation of (LU)^{-1} W_2, which involves a forward and
    backward solve for each well;

(2) the computation of the n_w x n_w (generally full) matrix
    represented by W_3 - W_1 (LU)^{-1} W_2, and its subsequent
    factorization.
If the LU factorization of A comes from a direct elimination, the
algorithm is the same as Gaussian elimination on the whole system. If
the factorization of A comes from an ILU method, the total algorithm is
also an iterative one, with the reservoir-well coupling handled exactly.
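The block elimination in (4.1.4) can be sketched in a few lines of NumPy. This is an illustrative sketch only: dense `numpy.linalg.solve` calls stand in for the forward/backward solves with the LU (or ILU) factors of A, and the block names follow the bordered partitioning described in the text:

```python
import numpy as np

def solve_bordered(A, W1, W2, W3, b_r, b_w):
    """Block elimination for [[A, W2], [W1, W3]] [x_r; x_w] = [b_r; b_w].

    A  : n x n reservoir block (its factors are reused for every solve)
    W1 : nw x n, W2 : n x nw, W3 : nw x nw well-coupling borders
    """
    # (1) one forward/backward solve per well column, plus one for b_r
    AinvW2 = np.linalg.solve(A, W2)
    Ainvb = np.linalg.solve(A, b_r)
    # (2) the nw x nw (generally full) Schur complement and its factorization
    S = W3 - W1 @ AinvW2
    x_w = np.linalg.solve(S, b_w - W1 @ Ainvb)
    # back-substitute for the reservoir unknowns
    x_r = Ainvb - AinvW2 @ x_w
    return x_r, x_w
```

Because the factors of A are reused, the extra cost is exactly the two items enumerated above: n_w additional solves against A and the factorization of the small dense well block.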
When the number of wells is large the above algorithm can be
prohibitively costly in terms of computation time. An alternative
algorithm can be used to factor the system in the following way:

   [ A    W_2 ]   [ LU   0               ] [ I   D W_2 ]
   [ W_1  W_3 ] ≈ [ W_1  W_3 - W_1 D W_2 ] [ 0   I     ]        (4.1.5)

where D is an easily computed approximation to A^{-1}.
This is not an exact factorization of the reservoir-well coupling. It
is known as a sparsely coupled factorization. The reservoir-well
coupling is treated in a DKR fashion. This algorithm works well for many
problems but does encounter difficulty if there are many constant rate,
constant injectivity wells. The number of iterations required for
convergence increases and the well constraints are not well satisfied
(an undesirable trait from the user's point of view).
To gain some insight into why the sparsely coupled algorithm
performs poorly for constant rate injection wells, consider the
constraint equation for these wells:

   q_s = Σ_i λ_i ( p_w - p_i )        (4.1.6)

where  q_s = the specified rate,
       λ_i = the constant injectivity into block i,
       p_i = the pressure in block i,
       p_w = the bottom hole pressure in the well.
In Figure (4.1.1) the rows of the matrix corresponding to derivatives of
(4.1.6) will have zero row sums. Also the rows corresponding to
derivatives of the flow equations for grid nodes containing constant
rate injection wells will have zero row sums. This latter result is a
consequence of mass conservative differencing. The sparsely coupled
algorithm does not preserve the zero row sums. The incomplete
factorization in (4.1.5) has a nonzero row sum. This means that there
is a material balance error which results in rates not being preserved
during the iterations.
Instead of the block LU factorization of the system described in
(4.1.4), consider a block UL factorization:

   [ A    W_2 ]   [ I   W_2 W_3^{-1} ] [ A - W_2 W_3^{-1} W_1   0   ]
   [ W_1  W_3 ] = [ 0   I            ] [ W_1                    W_3 ]        (4.1.7)
Note that this approach is completely equivalent to making an LU
factorization of the system with the well unknowns ordered first and the
reservoir unknowns ordered second (as described by Meijerink and van
der Vorst, 1981). The factorization described in (4.1.7) is still exact,
but a decision must be made on how to handle the term
( A - W_2 W_3^{-1} W_1 ), since W_2 W_3^{-1} W_1 has terms outside the
band structure of A. One approach is to use the following rule. The term
( A - W_2 W_3^{-1} W_1 ) is replaced by { A - W_2 W_3^{-1} W_1 },
where { } is defined as:

(1) if an element of W_2 W_3^{-1} W_1 falls on an existing band, it is
    subtracted from the appropriate element of A, but

(2) if it is outside the band structure of A, it is subtracted
    from the corresponding diagonal element.
Note that with this rule single layer wells and double layer, nearest
neighbour wells do not introduce any new connections and can therefore
be eliminated exactly. This rule also has the desirable feature of
preserving row sums at each iteration. The matrix represented by { } can
be factored by one of the usual ILU methods to derive an iterative
method for the whole system. The reservoir-well coupling is not handled
exactly (as it was in (4.1.4)) but some of the important features of this
coupling are preserved. This algorithm has been found to be most
effective when the number of fully coupled wells is more than ten.
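The rule defining { } can be illustrated concretely. The sketch below is hypothetical code, not the simulator's: a dense array stands in for banded storage, `F` plays the role of W_2 W_3^{-1} W_1, and `band_mask` marks the existing bands of A:

```python
import numpy as np

def apply_fill_rule(A, F, band_mask):
    """Form { A - F }: subtract F where it lands on an existing band of A,
    and fold any entry of F outside the band structure into the diagonal
    of its row, so the row sums of A - F are preserved exactly.
    """
    B = A.astype(float).copy()
    n = A.shape[0]
    for i in range(n):
        for j in range(n):
            if F[i, j] != 0.0:
                if band_mask[i, j]:
                    B[i, j] -= F[i, j]   # fill lands on an existing band
                else:
                    B[i, i] -= F[i, j]   # out-of-band fill goes to the diagonal
    return B
```

Folding the out-of-band fill into the diagonal is what keeps the row sums of { A - F } equal to those of A - F, the material-balance property that the sparsely coupled factorization (4.1.5) loses.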
4.2 Programming Considerations For Vector Machines
Reservoir simulation models are often used to simulate large fields
for tens of years of operation. For this reason, large vector computers
such as the CRAY I and CRAY II series, or the CYBER 205 are used to run
such simulations. To make optimal use of these machines, care must be
taken in programming the computer model. For black oil models of the
type described above, most of the computational time is consumed by the
linear solution part of the model so that this is really the only part
of the code that needs to be especially designed for the vector machine.
This process is referred to as "vectorizing" the code. For other models,
such as thermal or compositional models, coefficient generation,
equation-of-state calculations, or table look-ups can require large
amounts of computational effort, so that the vectorization process must
extend to other parts of the code as well.
For large simulations the solution algorithm of choice would be an
accelerated ILU method. The method can be divided up into the following
steps:
(1) calculate the incomplete factors L and U,
(2) calculate the residual by forward and backward solution,
(3) correct this residual by an acceleration technique such
    as ORTHOMIN.
Step (1) is done only once per nonlinear iteration (Newton cycle) and is
an innately recursive process and as such cannot be vectorized. For
reduced system orderings the reduction portion of step (1) is
vectorizable, however. Steps (2) and (3) are carried out between 5 and 15
times per nonlinear iteration. It is most beneficial, therefore, to
concentrate on vectorizing these portions of the algorithm.
It is fairly straightforward to vectorize the ORTHOMIN portion of
the algorithm (step (3)). It consists of a matrix-vector multiply and
several inner products. The inner products are trivially vectorizable.
The matrix-vector multiply can be vectorized as well, if the bands of
the matrix are straight. This requirement results from a vector machine
limitation which requires that two vectors being multiplied have a
constant "stride" (i.e. elements of the vector must be separated by a
constant increment in memory). The bands of the matrix are already
straight if the ordering is natural. But in reservoir simulation
diagonal orderings are often used. Therefore, the solution vector must
be reordered before and after the matrix-vector multiply. For red-black
ordering, the bands are straight only if n_x is odd (in 2D) or n_x and n_y
are odd (in 3D). To ensure that this is true an extra row of null
blocks is added to the grid if necessary.
In this section, some simple modifications to standard computer
algorithms, which improve their performance on vector machines, will be
discussed. Much more sophisticated modifications can be made, but these
generally result in restrictions on the code's portability.
Modifications for parallel architectures will not be discussed.
Given that the bands of the matrix are straight, the matrix-vector
multiply is performed by multiplying not in the usual way (i.e. 1st row
of the matrix times the multiplying vector, then the 2nd row of the
matrix times the multiplying vector, and so on) but by multiplying along
the bands of the matrix (Karush et al, 1975). This can be illustrated
diagrammatically:

   Figure 4.2.1 (diagram not reproduced)
The diagonals of the matrix are extended until each is full (in this
case has 6 elements). Each diagonal in turn is multiplied by the vector
of x's. If the "a" diagonal, for instance, is multiplied by the vector
of x's, the only non-zero results will be in the 4th, 5th, and 6th
places, i.e. a_4 x_4, a_5 x_5, and a_6 x_6. The whole vector of results
is calculated at once for the same cost as one scalar multiply, so the
cost of multiplying zero entries is not a consideration. The vector of
results is then transferred to the appropriate sum (one for each row of
the matrix). For example a_4 x_4 goes to the sum for the first row, and
so on. Note that if each entry in Figure 4.2.1 represents a 3x3
submatrix, three multiplications will be done for each diagonal.
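A band-wise multiply of this kind can be sketched in NumPy. The storage convention used here (element i of the band at offset k multiplies x[i + k]) is one plausible layout chosen for the sketch, not necessarily the book's:

```python
import numpy as np

def diag_matvec(diags, offsets, x):
    """y = A @ x with A stored band-by-band.

    Each band is a length-n array d with the convention d[i] = A[i, i + off];
    positions that run off the matrix are ignored.  Each band contributes
    one constant-stride vector multiply-add -- the operation a vector
    machine pipelines well -- instead of n short row-by-row dot products.
    """
    n = len(x)
    y = np.zeros(n)
    for d, off in zip(diags, offsets):
        if off >= 0:
            y[:n - off] += d[:n - off] * x[off:]
        else:
            y[-off:] += d[-off:] * x[:n + off]
    return y
```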
The forward and backward substitution portion of the algorithm is
not easy to vectorize since it is also innately recursive (i.e. the
result at any point depends on the result of the previous step, or
steps). Consider the forward solve procedure:

   Figure 4.2.2: the elements of the matrix and the vector must be
   gathered into contiguous vectors (diagram not reproduced)
The procedure can be vectorized by first gathering the appropriate
elements of the factor L into a contiguous vector (in the example above,
this would be the first two elements of the fifth row), remembering that
each element is usually a 3x3 submatrix. The appropriate elements of the
solution vector are also gathered into a vector (in the above example,
the second and fourth elements, which are at this point known values),
remembering that these too each have three components. The first, second
and third rows of the gathered matrix elements are each multiplied by
the solution vector. The result vectors are then summed in the
appropriate fashion and subtracted off the fifth element of the
right-hand-side vector. The vector lengths in this case are of the
order of the number of bands in the ILU (between 3 and 17 depending on
the factorization used) times the number of equations per grid node
(which is 3 for black oil). The reduction in time is on the order of 30
to 40% on the CRAY 1.
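One step of the gathered forward solve can be sketched as follows; `gathered_forward_step` is an illustrative name, and a dense array stands in for the banded ILU factor (unit diagonal assumed, as in the forward solve):

```python
import numpy as np

def gathered_forward_step(L, y, b, row):
    """One row of the forward solve L y = b, arranged as gather + multiply.

    The known solution values referenced by this row are gathered into a
    contiguous vector, multiplied against the gathered coefficients in one
    vector operation, and subtracted from the right-hand side.
    """
    cols = np.nonzero(L[row, :row])[0]   # columns of the off-diagonal entries
    coeffs = L[row, cols]                # gathered matrix elements
    knowns = y[cols]                     # gathered, already-computed values
    return b[row] - coeffs @ knowns
```

The gather is exactly what makes the multiply a constant-stride vector operation; the recursion survives only between rows, not within one.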
For the simplest ILU (i.e. DKR, where only the diagonal band
is altered by the factorization) the forward and backward solve can be
partially vectorized by using diagonal ordering (Towler and Killough,
1982).

   Figures 4.2.3 and 4.2.4: diagonal ordering of the grid nodes and the
   resulting matrix structure (diagrams not reproduced)
The diagonal ordering and the use of the five point discretization
molecule lead to the situation illustrated in Figure 4.2.3, where the
unknowns along any given diagonal are independent of each other (i.e.
non-recursive). For example, the unknowns at the fourth, fifth and sixth
grid nodes can all be computed simultaneously (i.e. in a vector
operation), in the example above. This is because they depend only on
the results of the previous diagonal, the unknowns at the second and
third grid nodes. The algorithm can therefore be vectorized to the
extent of the longest diagonal. A time reduction of 70 to 80% has been
reported in vector mode on the CRAY for an IMPES model (Towler and
Killough, 1982). The DKR algorithm is not always optimal in terms of
convergence rate (see the examples in Section 4.3) so that this
application has limited use.
Another approach to vectorizing ILU methods is to expand the
inverse factors L^{-1} (and U^{-1}) in a series (van der Vorst, 1982).
This is done as follows:

   L^{-1} = ( D'(I + L') )^{-1} ≈ ( I - L' + ... ) (D')^{-1}        (4.2.1)

The forward solve then becomes a series of matrix-vector multiplies
which can be vectorized. The truncation in (4.2.1) is not exact and
will affect the convergence properties of the algorithm.
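The truncated expansion (4.2.1) can be sketched with dense NumPy arrays standing in for the banded factor; the recursive forward substitution is replaced by a few matrix-vector multiplies:

```python
import numpy as np

def neumann_forward_solve(L, b, terms=3):
    """Approximate x = L^{-1} b via the expansion in (4.2.1).

    Write L = D'(I + L') with D' = diag(L) and L' strictly lower
    triangular; then L^{-1} = (I + L')^{-1} (D')^{-1}
                            ~ (I - L' + L'^2 - ...) (D')^{-1}.
    Each term is a matrix-vector multiply, which vectorizes, unlike
    the recursive forward substitution it replaces.
    """
    d = np.diag(L)
    Lp = np.tril(L / d[:, None], k=-1)   # L' = strictly lower part of D'^{-1} L
    x = b / d                            # (D')^{-1} b  (k = 0 term)
    term = x.copy()
    for _ in range(1, terms):
        term = -(Lp @ term)              # next term (-L')^k (D')^{-1} b
        x += term
    return x
```

Since the strictly lower triangular part of an n x n matrix is nilpotent, the series terminates exactly after n terms; truncating earlier trades accuracy for vectorizable work, which is why the truncation affects convergence.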
4.3 Comparison of Methods

To compare the performance of iterative methods used in reservoir
simulation, some standard model problems have been developed (Stone,
1968; Kershaw, 1978; Elman, 1981; Watts, 1981; Appleyard and Cheshire,
1983; Sherman, 1985). These incorporate some of the properties of
reservoir systems that make the resulting linear equation sets "hard" to
solve. Stone's model problems solve the following differential
equation:

   ∂/∂x ( KX ∂p/∂x ) + ∂/∂y ( KY ∂p/∂y ) = -q        (4.3.1)

which is discretized on the unit square with uniform mesh size h:

   [ KX_{i+1/2,j} (p_{i+1,j} - p_{i,j}) - KX_{i-1/2,j} (p_{i,j} - p_{i-1,j})
     + KY_{i,j+1/2} (p_{i,j+1} - p_{i,j}) - KY_{i,j-1/2} (p_{i,j} - p_{i,j-1}) ] / h² = -q_{i,j}        (4.3.2)
where KX and KY are given by the harmonic mean, e.g.

   KX_{i+1/2,j} = 2 KX_{i,j} KX_{i+1,j} / ( KX_{i,j} + KX_{i+1,j} )

A second model problem involves the three dimensional analogue of
(4.3.1),

   ∂/∂x ( KX ∂p/∂x ) + ∂/∂y ( KY ∂p/∂y ) + ∂/∂z ( KZ ∂p/∂z ) = -q        (4.3.3)

This problem is discretized as in (4.3.2).
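The interface coefficients for (4.3.2) follow directly from the harmonic-mean formula; `harmonic_transmissibilities` is an illustrative name for this sketch:

```python
import numpy as np

def harmonic_transmissibilities(KX):
    """Interface coefficients KX_{i+1/2,j} from cell-centred values:
    the harmonic mean 2*K_i*K_{i+1} / (K_i + K_{i+1}) along x.
    KX is an (nx, ny) array of cell permeabilities.
    """
    num = 2.0 * KX[:-1, :] * KX[1:, :]
    den = KX[:-1, :] + KX[1:, :]
    out = np.zeros_like(num)
    np.divide(num, den, out=out, where=den > 0)  # guard the 0/0 case
    return out
```

The harmonic mean gives a zero interface coefficient whenever either neighbouring cell has zero permeability, which is the behaviour needed for a sealed region such as region D in test problem 4 below.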
Both of these problems yield sets of linear equations which are
symmetric and are therefore only useful (as regards applications to
reservoir simulation problems) in testing various types of incomplete
factorizations. To evaluate the performance of the full nonsymmetric
algorithm, the convection-diffusion equation can be used:

   ∂²p/∂x² + ∂²p/∂y² = β₁ ∂p/∂x + β₂ ∂p/∂y        (4.3.4)

with

   p(x,0) = 0,  p(0,y) = 1,  p(x,1) = 1,  and  p_x(1,y) = 0.
This is discretized on the unit square as

   4 p_{i,j} - (1 - β₁h/2) p_{i+1,j} - (1 + β₁h/2) p_{i-1,j}
             - (1 - β₂h/2) p_{i,j+1} - (1 + β₂h/2) p_{i,j-1} = 0        (4.3.5)

where central differencing is used on the convection terms. The
derivative boundary condition is discretized as

   (3 + β₁h/2) p_{I,j} - (1 + β₁h/2) p_{I-1,j}
             - (1 - β₂h/2) p_{I,j+1} - (1 + β₂h/2) p_{I,j-1} = 0        (4.3.6)

Note that for β₁h/2 or β₂h/2 greater than 1, the discretization is no
longer diagonally dominant. This test problem therefore provides a
stringent test for determining which algorithms might be most useful for
reservoir simulation problems.
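The loss of diagonal dominance is easy to check from the stencil coefficients in (4.3.5); this small helper simply compares the diagonal against the absolute off-diagonal sum for an interior row:

```python
def interior_row_is_diagonally_dominant(beta1, beta2, h):
    """Weak diagonal dominance of an interior row of (4.3.5):
    diagonal 4 versus |1 - b*h/2| + |1 + b*h/2| summed over both
    directions.  For b*h/2 <= 1 each pair sums to 2 and the row is
    (weakly) dominant; once b*h/2 > 1 the pair sums to b*h and
    dominance is lost.
    """
    off = (abs(1 - beta1 * h / 2) + abs(1 + beta1 * h / 2)
           + abs(1 - beta2 * h / 2) + abs(1 + beta2 * h / 2))
    return 4.0 >= off
```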
Finally, several simulator-generated linear systems are used to
test the efficiency of the algorithms in the full three-dimensional,
multiphase environment of the reservoir simulator.
Comparisons involving these model problems with different
geometrical configurations are presented below. First, the model problem
in (4.3.2) is solved with the geometry shown in Figure 4.3.1.
   Figure 4.3.1: unit square divided into regions A, B, C and D (diagram not reproduced)
Test Problem 1: KX=1, KY=1, (x,y) ∈ A∪B∪C∪D

This is a symmetric, homogeneous, isotropic problem and should be "easy"
for almost any incomplete factorization.

Test Problem 2: KX=1.0, KY=0.01, (x,y) ∈ A∪B∪C∪D

This is a symmetric, homogeneous but anisotropic problem and therefore
provides a more difficult test for the iterative method.

Test Problem 3: KX=0.1, KY=1.0, (x,y) ∈ A∪B∪C∪D

This is similar to test problem 2 but with the anisotropy in the
opposite direction.

Test Problem 4: KX=KY=1, (x,y) ∈ A
                KX=1, KY=100, (x,y) ∈ B
                KX=100, KY=1, (x,y) ∈ C
                KX=0, KY=0, (x,y) ∈ D

This is a symmetric problem but has anisotropic and discontinuous
coefficients. It provides a severe test of the incomplete factorization.
Traditional methods such as LSOR, which perform well on problems 1-3,
will fail here (Aziz and Settari, 1979).
The model problem in (4.3.2) is also solved with the geometry shown
in Figure 4.3.2. This is called the "staircase" problem and is adapted
from one by Alcouffe et al (1979).

   Figure 4.3.2: the "staircase" geometry, with source strengths
   q_1=0.5, q_2=0.6, q_3=-1.83, q_4=-0.27, q_5=1.0 (diagram not reproduced)
Test Problem 5: KX=KY=1000, (x,y) ∈ B
                KX=KY=1 elsewhere

This is a symmetric problem with a strong permeability contrast running
in a staircase fashion through the grid. It is a problem originally
derived from reactor physics.
The results for these first five test problems are shown in
Figures 4.3.3 to 4.3.7. The theoretical work is defined in terms
of a work unit, WU, where

   WU = number of operations (multiplications and divisions) / N

These work units of course reflect the algorithm's performance on a
scalar machine. The work counts include the set-up work (factorization
work and calculation of the initial solution and residual, if
applicable), work for forward and backward solve, acceleration work, and
in the case of the D4 ordered algorithms, the reduction work and the
cost of the recovery of the eliminated points. Theoretical work counts
are used instead of CPU time to avoid the issue of coding efficiency,
but in general there is good correlation between work counts and CPU
times. The initial solution for the pointwise ILU methods is the zero
solution. For nested factorization the initial solution is given by
(3.13.6).
The algorithms tested are several pointwise ILU methods, including
DKR, a third degree naturally ordered ILU (Figure 3.7.3), a fifth degree
D2 ordered ILU, and a third degree D4 ordered ILU. Nested factorization
results are included as well. Since the discretized equation (4.3.2) is
symmetric, conjugate gradient acceleration is used. These test problems
are essentially a test of the various factorization methods.
   Figure 4.3.3: Test problem 1 (residual vs. theoretical work for DKR,
   NAT 3, D2 5, D4 3 and NESTED; plot not reproduced)
   Figure 4.3.4: Test problem 2 (plot not reproduced)
   Figure 4.3.5: Test problem 3 (plot not reproduced)
   Figure 4.3.6: Test problem 4 (plot not reproduced)
   Figure 4.3.7: Test problem 5 (plot not reproduced)
The fastest rate of convergence in all cases is produced by the
third degree D4 algorithm (labelled D4 3 in the figures). For problem 2,
nested factorization has a slightly lower total work count due to its
lower set-up cost. Note that for problem 3 the nested factorization is
very slow to converge. This is due to the method's sensitivity to grid
node ordering. Problem 3 is essentially the same as problem 2 with the
grid nodes ordered in the y-direction first. The third degree D4
algorithm, in contrast, shows little sensitivity to direction.
The next set of problems solve (4.3.3) on a 21x21x21 grid with
homogeneous Neumann boundary conditions.

Test Problem 6: KX=KY=KZ=1.

Test Problem 7: KX=KY=1, KZ=100.

Test Problem 8: KX=KY=1.0, KZ=0.01.

The results of test problems 6, 7 and 8 are shown in Figures 4.3.8,
4.3.9 and 4.3.10. The work units and initial solution are as described
above. Again the discretized equation is symmetric so that the problems
test the different incomplete factorization methods. The results
plotted include the DKR algorithm, a third degree naturally ordered ILU,
first and second degree D4 ordered ILUs and the nested factorization.
   Figure 4.3.8: Test problem 6 (plot not reproduced)
   Figure 4.3.9: Test problem 7 (plot not reproduced)
Note that for the isotropic problem (problem 6) all the pointwise
ILU methods perform better than the nested factorization. For the
anisotropic problems (7 and 8) nested factorization is about the same as
first degree D4 on problem 7 and slightly better on problem 8. In both
cases the optimal direction for nested factorization has been chosen.

   Figure 4.3.10: Test problem 8 (plot not reproduced)
The discretized equation (4.3.5) leads to a non-symmetric matrix
problem and is therefore a useful test of algorithm performance.

Test Problem 9: Uses (4.3.5) with β₁ = β₂ = 100 and is discretized
on a 31x31 grid (note that β₁h/2 is greater than 1).

The results for test problem 9 are shown in Figure 4.3.11. ORTHOMIN
acceleration with 8 orthogonalizations is used for all cases.
   Figure 4.3.11: Test problem 9 (residual vs. theoretical work for DKR,
   NAT 3, D4 2 and NESTED; plot not reproduced)
The results for test problem 9 show the best rate of convergence to
be for the second degree D4 algorithm. The nested factorization behaves
very much like the third degree natural factorization. There is very
little variation with direction of grid ordering since there is no
intrinsic preferred direction in the physics of the problem. Note that
all the ORTHOMIN accelerated approximate factorizations converge on
this problem, which is not diagonally dominant.
The last series of problems are simulator-generated problems and
will be described briefly, with some of the special features which make
them "difficult" to solve.

Test Problem 10: This problem was produced by a fully implicit black
oil simulator. The size of the grid is 10x10x3 and there are
permeability contrasts in the layers. The time step at the time the
matrix was generated was 100 days, making this a fairly difficult
problem.

Test Problem 11: This problem was produced by an IMPES black oil
simulator with 12,960 unknowns. There were transmissibility variations
from 0 to 20.
Test Problem 12: This problem was produced by a steam simulator
solving for 3 unknowns per grid node. The dimensions of the problem are
11x11x5. There is a low permeability layer with KX=KY=0, KZ=4x10-'
darcies separating the reservoir vertically into two halves, with
injection wells in one half and a production well in the other half.
The results for problems 10, 11 and 12 are given in Tables 4.3.1,
4.3.2 and 4.3.3. ORTHOMIN acceleration with 10 orthogonalizations is
used for problems 10 and 12, which are strongly nonsymmetric. Problem 11
involves only a pressure equation from an IMPES black oil simulator and
has a better rate of convergence when conjugate gradient acceleration is
used (the pressure equation is nearly symmetric). The tables show
theoretical work units, number of iterations and CPU times for a
normalized residual reduction of 10^-6, where the normalized residual is
defined as

   ‖r‖_∞ / ‖r₀‖_∞

and r is the residual at any iteration, and r₀ the initial residual.
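The stopping test follows directly from this definition; with the zero initial solution used for the pointwise ILU methods, r₀ is simply b:

```python
import numpy as np

def normalized_residual(A, x, b, r0):
    """Convergence measure used in the tables:
    ||b - A x||_inf / ||r0||_inf, with r0 the initial residual."""
    return np.linalg.norm(b - A @ x, np.inf) / np.linalg.norm(r0, np.inf)
```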
Method               Theoretical Work   Number of Iterations   CPU sec. on Honeywell DPS8
Nested               3 432              10                     15.5
DKR                  11 034+            60+                    65.6
3rd degree natural   9 063              21                     37.3
3rd degree D2        5 466              10                     23.3
1st degree D4        2 547              6                      12.7
2nd degree D4        6 641              5                      21.3

Table 4.3.1: Test problem 10 (fully implicit black oil)
The results for test problem 10 show that the best rate of
convergence is given by the 1st degree D4 algorithm. The grid node
ordering used is the one that is optimal for the nested factorization.
Note that although the second degree D4 factorization required fewer
iterations, it did not pay off in terms of computational cost. Note also
that the DKR algorithm failed to converge in 60 iterations. Even though
this algorithm will run faster on a vector machine, its poor convergence
properties would make it of little use.
Method               Theoretical Work   Number of Iterations   CPU sec. on Honeywell DPS8
Nested               641                20                     271
DKR                  1 095              51                     332
3rd degree natural   1 319              35                     325
3rd degree D2        934                24                     249
1st degree D4        469                22                     151
2nd degree D4        105                16                     115

Table 4.3.2: Test problem 11 (IMPES black oil problem)
The results for problem 11 again show the best overall algorithm to
be first degree D4. The optimal grid ordering for nested factorization
is used. This is a considerably larger problem than problem 10 and this
is reflected in the increased number of iterations required for
convergence. The problem also has a fair degree of anisotropy and the
DKR and naturally ordered third degree ILU do not perform well.
Method               Theoretical Work   Number of Iterations   CPU sec. on Honeywell DPS8
Nested               3 432              10                     61
DKR                  11 034+            60+                    184+
3rd degree natural   10 008             24                     133
3rd degree D2        9 063              21                     123
1st degree D4        5 231              22                     16
2nd degree D4        1 812              10                     93

Table 4.3.3: Test problem 12 (fully implicit steam problem)
Test problem 12 has a strong directional bias and nested
factorization (with optimal ordering) performs very well on this
problem. The second degree D4 algorithm is second best. Note that the
DKR algorithm again fails to converge on this difficult steam
problem. If the ordering used is changed, the number of iterations
required for convergence of the nested factorization can vary from 10 to
28. The variation in the second degree D4 is from 10 to 11.
The performance of various algorithms has been tested on the model
problems and on simulator generated problems. Guidelines for the use of
incomplete factorization iterative methods in reservoir simulation can
be outlined as follows:

(1) Use of diagonal ordering is important when dealing with
    problems with inherent anisotropies and/or discontinuities.
    Both D2 ordered and D4 ordered pointwise ILU's perform better.
    The D4 ordered ILU's are generally the best due to the reduced
    system property of this ordering (where only half the unknowns
    need be solved for).

(2) Block incomplete factorization methods (such as nested
    factorization) also have potential. They are always better than
    their naturally ordered pointwise counterparts. They do not
    beat the performance of the D4 ordered pointwise methods,
    except in cases with very strong anisotropies and when optimal
    ordering is chosen.
BIBLIOGRAPHY
Alcouffe,R.E., Brandt,A., Dendy,J.E.Jr., and Painter,J.W., The
    Multi-Grid Method for the Diffusion Equation with Strongly
    Discontinuous Coefficients, SIAM J.Sci.Stat.Comput. 2 (1981)
    430-454.
Au,A.D.K., Behie,A., Rubin,B., and Vinsome,P.K.W., Techniques For Fully
    Implicit Reservoir Simulation, Paper SPE 9302, presented at the
    Fall Meeting of SPE (Dallas, 1980).
Axelsson, O. and Gustafsson, I. , On the Use of Preconditioned Conjugate
Gradient Methods for Red-Black Ordered Five Point Difference
Schemes, J.Comp.Phys. 35 (1980) 284-289.
Axelsson,O., Conjugate Gradient Type Methods For Unsymmetric and
Inconsistent Systems of Linear Equations, Lin. Alg.Appl. 29 (1980)
1-16.
Aziz,K. and Settari,A., Petroleum Reservoir Simulation, Applied Science,
London, 1979.
Behie,G.A., and Forsyth,P.A., Comparison of Fast Iterative Methods For
Symmetric Systems, IMA J. Num. Anal 3 (1983) 41-63.
Behie,A., and Forsyth,P.A., Practical Considerations For Incomplete
    Factorization Methods in Reservoir Simulation, SPE 12263, presented
    at the Seventh SPE Symposium on Reservoir Simulation, San
    Francisco, 1983.
Behie,G.A., and Forsyth,P.A., Multi-Grid Solution of Three-Dimensional
Problems With Discontinuous Coefficients, Appl.Math.Comp. 13 (1983)
229-240.
Behie,G.A., and Forsyth,P.A., Incomplete Factorization Methods for Fully
Implicit Simulation of Enhanced Oil Recovery, SIAM
J.Sci.Stat.Comput. 5 (1984) 543-561.
Behie,A., Comparison of Nested Factorization, Constrained Pressure
    Residual and Incomplete Factorization Preconditionings, SPE 13531,
    presented at the Eighth SPE Symposium on Reservoir Simulation,
    Dallas, 1985.
Behie,A., Collins,D., Forsyth,P., and Sammon,P., Fully Coupled
Multi-Block Wells In Reservoir Simulation, SPEJ (August, 1985).
Bell,J., Trangenstein,J.A., and Shubin,G., Conservation Laws of Mixed
    Type Describing Three-Phase Flow in Porous Media, submitted to SIAM
    J.Appl.Math.
Brandt,A., Multi-Level Adaptive Solutions to Boundary-Value Problems,
    Math.Comp. 31 (1977) 333-390.
Brandt,A., and Dinar,N., Multi-Grid Solutions to Elliptic Flow
    Problems, ICASE Report No. 79-15 (1979).
Brandt,A., Algebraic Multigrid Theory: The Symmetric Case,
Appl.Math.Comp. 19 (1986).
Brandt,A., Multi-Level Approaches to Large Scale Problems, Survey
    Lecture at ICM-86 (Berkeley, August 1986).
Concus,P., Golub,G.H., and O'Leary,D.P., A Generalized Conjugate
Gradient Method for the Numerical Solution of Elliptic Partial
Differential Equations, Lawrence Berkeley Laboratory Pub. LBL-4604,
Berkeley,CA (1975).
Dendy,J.E.Jr., Black Box Multigrid For Nonsymmetric Problems,
Appl.Math.Comp. 13 (1983) 261-283.
Dendy,J.E.Jr., Black Box Multigrid for Systems, Appl.Math.Comp., to
appear.
Dendy,J.E.Jr., Two Multigrid Methods for Three-Dimensional Problems with
Discontinuous and Anisotropic Coefficients, SIAM J.Sci.Stat.Comput.
8 (1987) 673-685.
Dupont,T., Kendall,R.P., and Rachford,H.H., An Approximate Factorization
    Procedure For Solving Self-Adjoint Elliptic Difference Equations,
    SIAM J.Numer.Anal. 5 (1968) 559-573.
Eisenstat,S.C., Elman,H.C., Schultz,M.H., and Sherman,A., The (New) Yale
    Sparse Matrix Package, Yale Univ. Rep. 265, Yale Univ., New Haven,
    CT, 1983.
Elman,H.C., Iterative Methods for Large, Sparse Nonsymmetric Systems of
Linear Equations, Ph. D. Thesis, Yale Univ. Rep. 229, Yale Univ.,
New Haven, CT, 1981.
Fayers,F.J., and Matthews,J.D., Evaluation of Normalized Stone's Methods
for Estimating Three-Phase Relative Permeabilities, SPEJ 24 (1984)
225-232.
Feingold,D.G. and Varga,R.S., Block Diagonally Dominant Matrices and
Generalizations of the Gerschgorin Circle Theorem, Pacific J. Math
12 (1962) 1241-1250.
Forsyth,P.A., and Sammon,P.H., Practical Considerations for Adaptive
    Implicit Methods in Reservoir Simulation, J.Comp.Phys. 62 (No.2)
    (1986).
Forsyth,P.A., and Sammon,P.H., Quadratic Convergence for Cell Centered
Grids, App. Num. Math., to appear.
Fulks,W.B., Guenther,R.B., Roetman,E.L., Acta Mech. 12 (1971) 121.
George, A., Nested Dissection of A Regular Finite Element Mesh, SIAM J.
Num.Anal 10 (No.2) (1973) 345-363.
Gustafsson, I. , On First Order Factorization Methods for the Solution of
Problems With Discontinuous Material Coefficients, Technical
Report, Computer Sciences 77.13 R, Chalmers University of
Technology, Goteborg, Sweden (1977).
Gustafsson, I., A Class of First Order Factorization Methods, BIT 18
(1978) 142-156.
Hemker,P.W., On the Comparison of Line-Gauss-Seidel and ILU Relaxation
    in Multigrid Algorithms, Preprint NW 129/82, Dept. of Numerical
    Mathematics, Mathematical Centre, Amsterdam, 1982.
Kershaw,D.S., The Incomplete Cholesky Conjugate Gradient Method For The
Iterative Solution of Systems of Linear Equations, J.Comp.Phys. 26
(1978) 43-65.
Kettler,R., Analysis and Comparison of Relaxation Schemes in Robust
Multi-Grid and Preconditioned Conjugate Gradient Methods, Lecture
Notes in Mathematics, Springer-Verlag, Berlin 1982.
Manteuffel,T.A., The Tchebychev Iteration for Nonsymmetric Linear
Systems, Numer.Math. 28 (1977) 307-327.
Manteuffel,T.A. and White,A.B., The Numerical Solution of Second-Order
Boundary Value Problems On Nonuniform Meshes, Los Alamos National
Laboratory preprint LA-UR-84-196, submitted to Mathematics of
Computation.
McDonald, A. E., and Trimble, R. H., Efficient Use of Mass Storage During
Elimination for Sparse Sets of Simultaneous Equations, SPEJ
(August,1977) 300-316.
Meijerink,J.A., and van der Vorst,H.A., An Iterative Solution Method For
    Linear Systems In Which The Coefficient Matrix Is A Symmetric
    M-Matrix, Math.Comp. 31 (1977) 148-162.

Meijerink,J.A., and van der Vorst,H.A., Guidelines for the Usage of
    Incomplete Decompositions in Solving Sets of Linear Equations as
    Occur in Practical Problems, J.Comp.Phys. 44 (1981) 134-155.
Odeh,A., A Comparison of Solutions to A Three Dimensional Black Oil
    Reservoir Simulation Problem, J.Pet.Tech. 33 (1981) 13-25.
Peaceman,D.W., Fundamentals of Numerical Reservoir Simulation, Elsevier,
Amsterdam (1977)
Price, H.S., and Coats, K., Direct Methods In Reservoir Simulation,
Trans. SPE of AlME 257 (1974) 295-308.
Saad,Y., and Schultz,M.H., GMRES: A Generalized Minimal Residual
    Algorithm For Solving Nonsymmetric Linear Systems, SIAM
    J.Sci.Stat.Comput. 7 (1986) 856-870.
Saylor,P.E., Second Order Strongly Implicit Symmetric Factorization
Methods For The Solution of Elliptic Difference Equations,
SIAM J.Numer.Anal. 11 (1974) 894-908.
Scheidegger,A.E., The Physics of Flow Through Porous Media, Univ.
Toronto Press, Toronto, 1950.
Shah, P.C., A Nine-Point Finite Difference Operator for Reduction of the
Grid Orientation Effect, SPE 12251, presented at the Seventh SPE
Symposium On Reservoir Simulation, San Francisco, 1983.
Spivak,A., and Dixon, T.N., Simulation of Gas Condensate Reservoirs,
Third SPE Symposium on Numerical Simulation of Reservoir
Performance (Houston, 1973).
Stone,H.L., Probability Model For Estimating Three-Phase Relative
Permeability, JPT (1970) 214-218.
Stone, H.L., Iterative Solution of Implicit Approximations of
Multi-Dimensional Partial Differential Equations, SIAM
J.Num.Analysis 5 (1968) 530-558.
Trangenstein, J.A., and Bell, J.B., The Mathematical Structure of
Black-Oil Reservoir Simulation, submitted to SIAM J.Appl.Math.
Thomas, G. W., and Thurnau, D. H., Reservoir Simulation Using An Adaptive
Implicit Method, Soc. Pet. Eng.J. 23 (1983)
Towler, B.F., and Killough, J.E., Comparison of Preconditioners For The
Conjugate Gradient Method in Reservoir Simulation, SPE 10490,
presented at the Sixth SPE Symposium on Reservoir Simulation, New
Orleans, 1982.
Varah, J.M., On the Solution of Block Tridiagonal Systems Arising From
Certain Finite Difference Equations, Mathematics of Computation
26 (No. 120) (1972) 859-869.
Varga, R. S., Matrix Iterative Analysis, Prentice-Hall Inc., Englewood
Cliffs, NJ, 1962.
Vinsome,P.K.W., ORTHOMIN, An Iterative Method For Solving Sparse Sets of
Simultaneous Linear Equations, paper SPE 5729, Fourth SPE Symposium
On Numerical Simulation of Reservoir Performance, Los Angeles,
1976.
Watts, J.W. III, A Conjugate Gradient-Truncated Direct Method for the
Iterative Solution of the Reservoir Simulation Pressure Equation,
SPEJ 21 (1981) 345-353.
Weiser, A. and Wheeler, M. F., On Convergence of Block-Centered Finite
Differences for Elliptic Problems, Exxon Production Research
Company Report TR-SR-84-14.
Whitaker,S., Ind. Eng. Chem., 62 (10) (1970) 54.
Wilkinson,J.H., Error Analysis of Direct Methods of Matrix Inversion, J.
Assoc. Comput. Mach. 8 (1961) 281-330.
Woo, P.T., Eisenstat, S.C., Schultz, M.H., and Sherman, A.H., Application
of Sparse Matrix Techniques to Reservoir Simulation, Sparse Matrix
Computations, Academic Press, New York (1976).
Yanosik, J.L., and McCracken, T.A., A Nine-Point Finite Difference
Reservoir Simulator For Relative Prediction of Adverse Mobility
Ratio Displacements, SPEJ (August, 1979) 253-262.
Young,D.M. and Jea,K.C., Generalized Conjugate Gradient Acceleration of
Nonsymmetric Iterative Methods, Linear Algebra and Appl. 34 (1980)
159-194.