
Copyright

by

Donovan Kilmartin

2009

Development of Reservoir Models using Economic Loss Functions

by

Donovan James Kilmartin, B.Sc.

Thesis

Presented to the Faculty of the Graduate School of

The University of Texas at Austin

in Partial Fulfillment

of the Requirements

for the Degree of

Master of Science in Engineering

The University of Texas at Austin

May 2009

Development of Reservoir Models using Economic Loss Functions

Approved by Supervising Committee:

Sanjay Srinivasan, Supervisor
Larry W. Lake

Dedication

This thesis is dedicated to my loving fiancée Jill and my parents Carol and Jim. Without their support through the years, none of this would have been possible.


Acknowledgements

I am extremely grateful to my supervisor Dr. Srinivasan. His valuable feedback and support have been priceless and have helped this project to continually move forward. It has been a pleasure working for Dr. Srinivasan, and I am thankful he has shared his knowledge and experience with me.

May 8, 2009


Abstract

Development of Reservoir Models using Economic Loss Functions

Donovan James Kilmartin, M.S.E.

The University of Texas at Austin, 2009

Supervisor: Sanjay Srinivasan

As oil and gas supplies decrease, it becomes more important to quantify the uncertainty associated with reservoir models and with the implementation of field development decisions. Various geostatistical methods have assisted in the development of field-scale models of reservoir heterogeneity. Sequential simulation algorithms in geostatistics require an assessment of local uncertainty in an attribute value at a location, followed by random sampling from the uncertainty distribution to retrieve the simulated value. Instead of randomly sampling an outcome from the uncertainty distribution, the retrieval of an optimal simulated value at each location by considering an economic loss function is demonstrated in this thesis.

By applying a loss function that depicts the economic impact of an over- or underestimation at a location and retrieving the optimal simulated value that minimizes the expected loss, a map of simulated values can be generated that accounts for the impact of permeability as it relates to economic loss. Both asymmetric linear and parabolic loss function models are investigated. The end result of this procedure is a reservoir realization that exhibits the correct spatial characteristics (i.e., variogram reproduction) while, at the same time, exhibiting the minimum expected loss in terms of the parameters used to construct the loss function.

The process detailed in this thesis provides an effective alternative whereby realizations in the middle of the uncertainty distribution can be directly retrieved by application of suitable loss functions. By altering the loss function (so as to emphasize either under- or overestimation), other realizations at the extremes of the global uncertainty distribution can also be retrieved, thereby eliminating the necessity of generating a large suite of realizations to locate the global extremes of the uncertainty distribution.


Table of Contents

List of Tables

List of Figures

Chapter 1: Introduction

Chapter 2: Literature Review
2.1 Sequential Simulation Conditioned to Data
2.2.1 Parametric Approach to Sequential Simulation
2.2.2 Non-Parametric Approach to Sequential Simulation
2.3 Loss Function in the Sequential Simulation Framework
2.3.3 Applications of Loss Functions in Various Engineering Fields

Chapter 3: Problem Setup
3.1 Conditioning Data
3.2 Sub-Domain Delineation

Chapter 4: Loss Function Development
4.1 Loss Function Development for Delineated Sub-Domains
4.2 Loss Function Optimization
4.2.1 Analytical Solution
4.2.2 Numerical Solution

Chapter 5: Implementation of Optimized Loss Function within a Sequential Simulation Framework
5.1 Implementation within SISIM Algorithm
5.2 Loss Function Implementation for Individual Sub-Domains
5.2.1 Parabolic Loss Function
5.3.1 Sampling realizations within specific NPV ranges

Chapter 6: Conclusion

Appendix A: PCA Example
Appendix B: Analytical Solution for Optimized Loss Function
Appendix C: Modification to SISIM Code
Appendix D: Code for Implementation of Analytical Solution for Optimized Loss Function
Appendix E: Code for Implementation of Numerical Solution for Optimized Loss Function

References

Vita

List of Tables

Table 1: Loss function model summary.
Table 2: Asymmetric linear loss function models for the four sub-domains.
Table 3: Parabolic loss function models.
Table 4: Alterations made to the loss function to sample specific parts of the global NPV distributions.
Table 5: Global extreme estimations.
Table 6: Original data and adjusted data set for PCA example.

List of Figures

Figure 1: Methodology of thesis.
Figure 2: Asymmetric linear loss function.
Figure 3: Histogram of permeability used for the unconditional indicator simulation.
Figure 4: Location of the 100 conditioning data.
Figure 5: Flow chart of PCA for sub-domain delineation.
Figure 6: Histogram of conditioning data used for generating the suite of realizations for domain delineation.
Figure 7: Sample realizations obtained using SISIM conditioned to the available data.
Figure 8: Upscaled versions of the sample models in Figure 7.
Figure 9: Sub-domain identification by PCA.
Figure 10: Locations selected for computation of loss functions (left) accompanied by sub-domain identification for reference (right).
Figure 11: Sample nodes from each region showing the relation between economic loss and permeability estimation error (region 1, upper left; region 2, upper right; region 3, lower left; region 4, lower right). Green dashed lines indicate a window containing 70% of the data; plus signs and circles represent under- and overestimation, respectively.
Figure 12: Same representative region nodes from Figure 11 fitted with asymmetric linear (black lines) and parabolic (black curves) loss functions.
Figure 13: Linear and quadratic loss function models (average of best fits).
Figure 14: Flow chart for numerical solution.
Figure 15: SISIM algorithm incorporating loss function for optimal estimation.
Figure 16: The average of 50 realizations obtained using SISIM without implementing the loss functions (left), and the sub-domain identification plot (right).
Figure 17: A single permeability realization obtained after implementing the loss functions (left), and the average of 10 realizations obtained after implementing the loss functions (right).
Figure 18: Single and averaged optimized parabolic loss function realizations.
Figure 19: Reiteration of sub-domain identification by PCA.
Figure 20: Histogram of NPVs corresponding to i) traditional SISIM realizations (blue); ii) asymmetric linear loss functions (brown); and iii) parabolic loss functions (green).
Figure 21: Base case loss function model (left). For comparison, the permeability model for a typical SISIM model is shown (right).
Figure 22: Permeability model corresponding to a loss function that penalizes over-estimation more (left). For comparison, the permeability model for the base case loss function is also shown (right).
Figure 23: Permeability model corresponding to a loss function that penalizes under-estimation more (left). For comparison, the permeability model for the base case is also shown (right).
Figure 24: Histogram of case study including global extremes.

Chapter 1: Introduction

As the demand for oil continues to increase, so does the demand for new technology and innovation that can help increase the understanding of petroleum reservoirs. There are many areas of study that help to quantify and understand the fluid dynamics and rock-fluid interactions within the reservoir. This understanding is key to developing field-scale models of reservoir heterogeneity that can be used to assess future reservoir performance. This research focuses on the development of optimized reservoir models that take into account the effect of the loss associated with wrong estimation of permeability in different regions of the reservoir. The regions are defined using a domain delineation algorithm based on principal component analysis (PCA).

Sequential simulation for reservoir model development encompasses a variety of approaches to generate multiple equi-probable models of the reservoir.

Sequential Gaussian simulation (SGSIM) and sequential indicator simulation (SISIM) are

examples of sequential simulation methods. In sequential Gaussian simulation, the

Kriged estimate and variance are identified with the mean and variance of a Gaussian

distribution and the simulated value at a node is obtained by randomly sampling from that

distribution. In other words, the simulated models are realizations of the Gaussian

random function model (Deutsch and Journel, 1998; Alabert et al., 1992; Tran et al.,

2001; Caers, 2001; Jensen, 2000). In comparison, SISIM uses indicator Kriging to

directly model local conditional distributions that can be non-Gaussian (Journel, 1988)

and simulated values are obtained again by randomly sampling the Kriged distribution.

The realizations generated using both SGSIM and SISIM can be used for a number of

different applications including reservoir flow simulation for assessing flow performance.


There are many commercial software programs that are used for reservoir flow simulation. Flow simulation software is complex, and the accuracy of the results is significantly affected by the numerical procedures used to solve the flow equations, as well as by other modeling decisions such as the spatial discretization scheme employed and the selection of the time step size used for modeling the dynamical system (Fanchi, 2001).

Sub-domains specify which regions are the most important in terms of impacting the well response (Yadav et al., 2005; Yadav, 2006; Smith, 2002). One approach for sub-domain delineation uses principal component analysis (PCA) of the Hessian matrix that reflects the variation in flow response due to changes in reservoir properties such as permeability and porosity. PCA is a procedure that allows grouping of grid nodes into regions based on the similarity of their influence on the flow response (Smith, 2002).

The optimized spatial distribution of permeability within a delineated domain is

obtained by applying a loss function. Instead of randomly sampling from a local

conditional probability distribution (as in SGSIM or SISIM), an optimal local estimate of

permeability at a location is obtained by minimizing the expected loss. A loss function is

a mathematical function that describes the relationship between estimation error and loss

associated with that error (Journel, 1988; Srivastava, 1990; Ma and Zhao, 2005; Rukhin,

1988; Suresh, 2008). Loss, in this case, is defined as the economic impact of wrongly

estimating the permeability value at a location. The key conjecture is that error in the estimation of permeability at a location translates to errors in the prediction of well response(s), which in turn translate to an economic loss. Since the sensitivity of the flow response to permeability can differ across regions of the reservoir, domain delineation is performed as a precursory step. The end result of this procedure will be a reservoir realization that exhibits the correct spatial characteristics (i.e., variogram reproduction) while at the same time exhibiting the minimum expected loss in terms of the parameters used to construct the loss function. For example, if the loss function considers the NPV due to fluid (oil + water + gas) production as well as facilities cost, then the simulated realization would exhibit the spatial permeability field that yields the minimum expected loss, or fluid rates that are neither too optimistic nor too pessimistic.

As reservoir simulators become more advanced, they are able to give insight into

future reservoir operations. The integration of past production data to adjust the pressure

response has become a very useful tool (Sener and Bakiler, 1989; Culham et al., 1969; Faroug Ali and Neison, 1970; Yadav, 2006; Yadav et al., 2005). This amalgamation of

production data with the simulation is generally referred to as history matching. History

matching makes the simulation more accurate for understanding both current and future

production. The ultimate goal of reservoir modeling and history matching is to develop an accurate

representation of reservoir heterogeneity that can be subsequently used to develop an

optimal strategy for reservoir production. An extension of the proposed methodology

would be to define the loss function in terms of deviation from the true NPV (obtained on

the basis of the available production data). The loss function would then account for the

sensitivity of the model NPV to changes in permeability at different locations in the

reservoir. The reservoir realization corresponding to the minimum expected loss would

be a model that comes the closest to the observed production data. While this application to history matching is not discussed elaborately in this thesis, several other researchers have developed methodologies that utilize the sensitivity matrix (Yadav, 2005; Yadav et al., 2007; Yadav, 2006) or the Hessian to guide the history matching process. It can be argued that the Hessian matrix (reflecting the sensitivity of the well response to changes in model parameters) is a special case of a loss function (one that does not consider any economic factors).


Traditional sequential simulation methodology is based upon the construction of

the local conditional probability distribution function (lcpdf) at each location conditioned

to the surrounding data. The simulated value is obtained by randomly sampling from the

lcpdf. In lieu of such random sampling of an outcome from the lcpdf, the retrieval of an

optimal simulated value at each location by considering an economic loss function is

demonstrated in this thesis. It is conjectured that by applying a loss function that depicts

the economic impact of an over or underestimation at a location, and retrieving the

optimal simulated value that minimizes the expected loss, a map of simulated values can

be generated that is risk neutral. Such a map when processed through a flow simulator

should yield profiles for oil rate and water production rate that are neither overly

optimistic nor overly pessimistic. In the traditional workflow for assessing uncertainty in

reservoir performance, a large suite of realizations of the reservoir model needs to be

generated and then processed through the flow simulator. Realizations in the middle of

the uncertainty distribution (signifying risk-neutrality) can then be retrieved. The process

detailed in this thesis provides an effective alternative whereby realizations in the middle

of the uncertainty distribution can be directly retrieved by application of suitable loss

functions. By altering the loss function (so as to emphasize either under or over

estimation), other realizations at the extremes of the global uncertainty distribution can

also be retrieved, thereby eliminating the necessity for the generation of a large suite of

realizations.


Figure 1: Methodology of thesis.

The thesis organization follows a linear path, so the information in each section flows into the next section or chapter's discussion. First, the basic input parameters and the methodology for delineating the sub-domains are introduced. Next, there is a detailed discussion on the development of a loss function for the delineated sub-domains. The following chapter details the implementation of both asymmetric linear and parabolic loss functions within the sequential simulation framework. The final topic is a case study showing how varying the loss functions can give an indication of the endpoints of the NPV uncertainty distribution.


Chapter 2: Literature Review

Many of the topics involved in this thesis deal with the sequential paradigm. Therefore, this chapter includes the theory involved in the sequential framework for both parametric and non-parametric estimators. In addition, some of the strengths and shortcomings of Gaussian and indicator kriging are shown. Loss functions are introduced to help exploit the benefits of sequential modeling while avoiding kriging pitfalls. A few simple optimal estimators for loss functions are explored. These examples should help with the understanding of how optimal solutions can be formulated from loss functions. The last part of the chapter discusses alternative loss function applications. Loss functions are not a new concept and have been used in a number of different engineering applications including, but not limited to, electrical, quality, mechanical, and petroleum engineering. Some basic problems and various forms of loss functions from these different engineering fields are discussed. This chapter's main focus is to provide the theory necessary for comprehension of the material presented throughout this thesis.

2.1 SEQUENTIAL SIMULATION CONDITIONED TO DATA

There are a variety of different geostatistical methods that can be implemented for generating realizations of the reservoir model. Much of the work related to this thesis focuses on the framework of sequential simulation. Sequential simulation is a useful tool to generate multiple realizations that honor a set of conditioning data while representing the spatial connectivity of the attribute being modeled, as captured by the spatial covariance or semivariogram. Sequential simulation is a sequential application of Bayes' rule for the synthesis of a multivariate distribution using conditional and marginal


distributions (Casteel, 1997; Journel, 1989). Consider a realization with N nodes to be simulated, initially conditioned on a set of n known values. The process of constructing a model for attribute values at the N nodes is tantamount to sampling a realization from the N-variate joint distribution shown on the left-hand side of Equation (2.1). In Equation (2.1), u denotes the locations and z indicates the outcome of the RV. The joint distribution characterizes the joint variability of the N random variables.

By the definition of conditional distributions, a joint distribution of N RVs Ai can be

expressed as:

$$\begin{aligned} \mathrm{Prob}(A_1, A_2, \ldots, A_N) &= \mathrm{Prob}(A_N \mid A_1, A_2, \ldots, A_{N-1}) \cdot \mathrm{Prob}(A_1, A_2, \ldots, A_{N-1}) \\ &= \mathrm{Prob}(A_N \mid A_1, A_2, \ldots, A_{N-1}) \cdot \mathrm{Prob}(A_{N-1} \mid A_1, A_2, \ldots, A_{N-2}) \cdots \mathrm{Prob}(A_1) \end{aligned} \qquad (2.1)$$

Applying this to the previous joint probability distribution of N RVs at spatial locations,

we get:

$$\begin{aligned} F(u_1, \ldots, u_N;\, z_1, \ldots, z_N \mid n) &= F(u_N;\, z_N \mid n, z_1, \ldots, z_{N-1}) \cdot F(u_1, \ldots, u_{N-1};\, z_1, \ldots, z_{N-1} \mid n) \\ &= F(u_N;\, z_N \mid (n+N-1)) \cdot F(u_{N-1};\, z_{N-1} \mid (n+N-2)) \cdots F(u_1;\, z_1 \mid n) \end{aligned} \qquad (2.2)$$

In the implementation of Equation (2.2), the RHS proceeds from the last term to

the first. The conditional distribution at location u1 conditioned to the n data is


constructed first. An outcome z1 is sampled from that distribution. That simulated value

along with the n original data is used to construct the conditional distribution at the

location u2. The process of simulating and updating the conditioning data set is continued

along a random path through all the N nodes of the model. The various sequential

methods are different from each other in the approach used to determine the local

conditional distribution lcpdf (Journel, 1988). The lcpdf’s F(u1;z1| n) can be determined

using parametric and non-parametric methods. Below is a list of the steps for the general

sequential algorithm.

Steps in Sequential Simulation

1. Model the lcdf at the first location u1 using the prior conditioning data n
2. Sample z1 from the lcdf
3. Add the sampled value to the conditioning data set, which is now of size n+1
4. Model a new lcdf using the updated conditioning data
5. Draw a sample z2
6. Repeat the process until the realization is completely populated

Since the nodes are visited along a random path and the simulated value at a node is obtained by randomly sampling from the lcpdf at that node, the process yields several equi-probable realizations of the RF represented by the LHS of Expression (2.2).
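The generic loop can be expressed compactly in code. The sketch below is illustrative only: build_lcdf and sample_value are hypothetical placeholders for the kriging-based construction of the local distribution and for the sampling rule (a Monte Carlo draw here; a loss-optimal estimate later in this thesis).

```python
import random

def sequential_simulation(grid_nodes, cond_data, build_lcdf, sample_value, seed=7):
    """Generic sequential simulation loop (a sketch, not the SISIM code).
    cond_data: dict of the n hard data, keyed by node."""
    rng = random.Random(seed)
    data = dict(cond_data)                 # honor the conditioning data
    path = [u for u in grid_nodes if u not in data]
    rng.shuffle(path)                      # random path through unsampled nodes
    for u in path:
        lcdf = build_lcdf(u, data)         # steps 1 and 4: model the lcdf
        data[u] = sample_value(lcdf)       # steps 2 and 5: draw a value
        # step 3: the simulated value joins the conditioning set (data)
    return data
```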

2.2.1 Parametric Approach to Sequential Simulation

Sequential simulation is easily rendered feasible with the assumption of multivariate Gaussianity (Deutsch and Journel, 1998; Alabert et al., 1992; Tran et al., 2001; Caers, 2001; Jensen, 2000). Multivariate Gaussianity implies Gaussian local conditional distributions. This in turn implies that the lcpdf's, and indeed the entire multivariate distribution, can be fully determined if the local conditional means and variances can be accurately calculated. Furthermore, it is known (Deutsch and Journel, 1998) that the conditional mean of a Gaussian conditional distribution can be expressed as a linear weighted combination of the conditioning values, and the corresponding conditional variance is homoscedastic, i.e., independent of the conditioning data (only a function of the correlations between the data and the unknown). The equations for the kriged estimator and kriged variance are presented below in Equations (2.3) and (2.4) respectively, where z(u) is the true value, z*(u) is the estimator, λ is the weight associated with each known value, and C(h) is the covariance model as a function of lag distance. Consistent with the requirements of the conditional mean of a Gaussian lcpdf, the kriging estimate is expressed as a linear combination of the available data, and the estimation variance is independent of the available data. The weights are determined so that the error variance $E\{[z^*(u) - z(u)]^2\}$ is minimized (Bohling, 2005).

$$z^*(u) = \sum_{\alpha=1}^{n+i-1} \lambda_\alpha\, z(u_\alpha) \qquad (2.3)$$

$$\sigma_K^2 = C(0) - \sum_{\alpha} \lambda_\alpha\, C(h_{o\alpha}) \qquad (2.4)$$

In simple kriging, all of the same parametric assumptions are made, and Equations (2.3) and (2.4) are still valid. However, the estimator can be rearranged into Equation (2.5), where $m_o$ and $m_\alpha$ represent the population mean and the mean of the previously simulated nodes, respectively. Equation (2.6) provides the system of equations used to determine the weights for the kriged estimate and variance.

$$z^* - m_o = \sum_{\alpha} \lambda_\alpha\, (z_\alpha - m_\alpha) \qquad (2.5)$$

$$\sum_{\beta} \lambda_\beta\, C(h_{\alpha\beta}) = C(h_{\alpha o}) \quad \text{for all } \alpha \qquad (2.6)$$
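As a concrete illustration of Equations (2.3) through (2.6), the sketch below solves the simple kriging system with numpy. The exponential covariance model and its parameters are assumptions made purely for the example.

```python
import numpy as np

def exp_cov(h, sill=1.0, a=10.0):
    """Isotropic exponential covariance model C(h) (an assumed model)."""
    return sill * np.exp(-3.0 * h / a)

def simple_kriging(x0, xs, zs, mean, cov=exp_cov):
    """Simple kriging at x0 from data (xs, zs): solve system (2.6),
    return the estimate (2.5) and the kriging variance (2.4)."""
    xs, zs = np.asarray(xs, float), np.asarray(zs, float)
    H = np.linalg.norm(xs[:, None, :] - xs[None, :, :], axis=-1)
    K = cov(H)                                   # data-to-data C(h_ab)
    k0 = cov(np.linalg.norm(xs - np.asarray(x0, float), axis=-1))
    lam = np.linalg.solve(K, k0)                 # kriging weights
    z_star = mean + lam @ (zs - mean)            # eq. (2.5)
    var = cov(0.0) - lam @ k0                    # eq. (2.4)
    return z_star, var
```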

Using simple kriging, a local conditional cumulative distribution function (lcdf) can be described completely for sampling the simulated value at the current grid location. For the first grid node, the lcdf is conditioned to the prior data. From there, the conditioning data set is continually updated with both the hard initial data and the newly generated simulated values, and the process proceeds until an entire realization has been populated. Since a Gaussian assumption is implicit in the above discussion, the resultant algorithm is referred to as sequential Gaussian simulation (SGSIM).

There are two main problems with SGSIM. First, the assumption of multivariate Gaussianity may not hold for the particular application. Many natural processes are known to be non-Gaussian (Caers, 2001). Second, kriging is a variance-minimizing procedure, meaning a quadratic form of the error is assumed. However, the assumption of a quadratic loss (error) function may not be valid in all cases of practical application. For instance, an asymmetric linear loss function, or a combination of quadratic and linear error (loss) functions, might be necessary to accurately represent the impact of under- or over-estimation at a particular location. Such asymmetric loss functions are impossible to integrate into a simple or ordinary kriging framework.

2.2.2 Non-Parametric Approach to Sequential Simulation

The local conditional probability distribution within sequential simulation can

also be estimated following a non-parametric approach. The biggest advantage of the non-parametric approach is that it allows sampling from a non-Gaussian distribution (Deutsch and Journel, 1998; Journel, 1988). Define an indicator RV as:

$$I(u; z_k) = \begin{cases} 1, & \text{if } z(u) \le z_k \\ 0, & \text{otherwise} \end{cases} \qquad (2.7)$$

where u is the location and $z_k$ represents a particular threshold. A unique feature is that the expected value of the indicator variable at a particular threshold is equal to the probability that the true value is less than the threshold:

$$E\{I(u; z_k)\} = \mathrm{Prob}\{z(u) \le z_k\} = F(u; z_k)$$

or $F(u; z_k \mid n) = E\{I(u; z_k) \mid n\}$.

Denoting the conditional expectation of the indicator variable as $I^*(u \mid n)$, the projection theorem (Luenberger, 1968) states that the minimum $L_2$-norm estimator of $I^*$ based on the n indicator data $I_\alpha$ is the linear combination:

$$I^*(u) = \sum_{\alpha} \lambda_\alpha I_\alpha$$

In simple indicator kriging, the kriged estimate is expressed as an update over the

prior probability distribution F(z):

$$I^*(u; z_k \mid n) = F^*(u; z_k \mid n) = \Big[1 - \sum_{\alpha} \lambda_\alpha(u; z_k)\Big] F(z_k) + \sum_{\alpha} \lambda_\alpha(u; z_k)\, I(u_\alpha; z_k) \qquad (2.8)$$

The weights are the solution to the system:

$$\sum_{\beta} \lambda_\beta(u; z_k)\, C_I(h_{\alpha\beta}; z_k) = C_I(h_{\alpha o}; z_k)$$

Equation (2.8) can be used to describe the lcdf completely once the soft or hard conditioning data have been indicator coded. The most important difference between the previous Gaussian-based approach and the indicator approach is that the task of constructing the lcpdf at a location has been dissociated from the task of retrieving a sample from it. In the Gaussian approach, the mean of the lcpdf (which is a sample from that distribution) is retrieved first, and subsequently the distribution is constructed. That sample (the mean) was obtained by minimizing a particular type of loss (error) function: a symmetric quadratic function (the error variance). In the indicator kriging approach, the distribution is constructed directly. Subsequent retrieval of any sample can be based on a loss function formalism. It is this important feature of indicator-based approaches that is exploited in this thesis.
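The two steps, indicator coding per Equation (2.7) and the cdf update per Equation (2.8), can be sketched as follows. The kriging weights are assumed to have been solved already from the indicator covariance system, and the final order-relations fix is a crude simplification of what production codes do.

```python
import numpy as np

def indicator_code(z_data, thresholds):
    """Indicator coding per eq. (2.7): i(u_a; z_k) = 1 if z_a <= z_k."""
    z = np.asarray(z_data, float)[:, None]
    return (z <= np.asarray(thresholds, float)[None, :]).astype(float)

def ik_lcdf(indicators, weights, prior_cdf):
    """Local cdf by simple indicator kriging, eq. (2.8).
    indicators, weights: (n_data x K) arrays; prior_cdf: global cdf
    F(z_k) at the K thresholds (weights assumed pre-solved)."""
    lam_sum = weights.sum(axis=0)                       # sum_a lambda_a(z_k)
    F = (1.0 - lam_sum) * prior_cdf + (weights * indicators).sum(axis=0)
    return np.clip(np.maximum.accumulate(F), 0.0, 1.0)  # crude order relations
```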

The methodology for implementing indicator kriging in sequential simulation (SISIM) is the same as discussed previously under SGSIM. Similar to SGSIM, Monte Carlo sampling from the kriged distribution provides the simulated values at grid nodes. The simulated values are assimilated within the conditioning data set for the next node visited along a random path. There are some advantages to SISIM compared to SGSIM. The first advantage is the elimination of the multivariate Gaussianity assumption. Second, at each threshold an independent indicator covariance $C_I(h; z_k)$ can be used to model heterogeneity or the underlying geological structure, whereas SGSIM is restricted to one variogram model. The use of multiple variogram models allows for more sophistication and better representation of the underlying geological model; however, it can be computationally expensive to develop unique variograms, particularly as the number of thresholds increases (Journel, 1988). In contrast, too few thresholds can cause the lcdf to be segmented (Jensen et al., 2000).


2.3 LOSS FUNCTION IN THE SEQUENTIAL SIMULATION FRAMEWORK

As mentioned previously, kriging-based sequential Gaussian simulation is predicated on retrieving an estimate that minimizes a particular (quadratic) form of loss function. The kriging estimate is obtained such that the estimation variance is minimized. In order to dissociate the process of constructing the lcpdf from the task of retrieving an estimate from that distribution, indicator simulation may be a more suitable alternative. By indicator coding the available data and kriging using the indicator data, an lcdf representing a general (non-Gaussian) distribution can be synthesized. Subsequently, an optimal value can be retained from the distribution by minimizing a suitable loss (error) function.

Loss functions tie loss, either economic or physical, to an error in a control variable. The following definition is taken from Journel (1988). Given an estimated value u*(x) = u* of a certain phenomenon, an estimation error can occur, defined as (u* − u(x)), where u(x) is the true value at x. L(u* − u(x)) is the loss associated with the estimation error. Assuming that the distribution model of u(x), denoted as F(u | (n)), is available, the expected value of the loss can be determined as:

$$L^* = E\{L(u^* - U) \mid (n)\} = \int_{-\infty}^{+\infty} L(u^* - u)\; dF(u \mid (n)) \qquad (2.9)$$

The optimal estimate is the value $u_L^*(x)$ that minimizes the expected loss in Expression (2.9). The integral in Expression (2.9) is sometimes very difficult to solve, and so a discrete approximation may be employed:

$$E\{L(u^* - U) \mid (n)\} = \int_{-\infty}^{+\infty} L(u^* - u)\; dF(u \mid (n)) \approx \sum_{k=1}^{K} L(u^* - u_k')\,\big[F(u_{k+1} \mid (n)) - F(u_k \mid (n))\big]$$

For example, in simple or ordinary kriging, the estimate is obtained as that which minimizes $E\{[z(u) - z^*(u)]^2\}$, the error variance; i.e., the estimate is obtained as that which minimizes a quadratic loss function: $E\{L_2(e)\} = \int_{-\infty}^{\infty} L_2(e)\; dF(e)$.
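The discrete approximation introduced above can be implemented directly once the lcdf has been discretized into classes. In the sketch below, the class values and probability masses are illustrative (the masses are the increments of the threshold cdf values used later in Chapter 3), and the asymmetric linear weights are arbitrary.

```python
import numpy as np

def expected_loss(u_star, z_vals, probs, loss):
    """Discretized eq. (2.9): E{L(u* - U)} over class values z_k with mass p_k."""
    return float(np.sum(probs * loss(u_star - z_vals)))

def optimal_estimate(z_vals, probs, loss, candidates):
    """Retain the candidate u* that minimizes the discretized expected loss."""
    losses = [expected_loss(c, z_vals, probs, loss) for c in candidates]
    return candidates[int(np.argmin(losses))]

# asymmetric linear loss: slope w1 for underestimation (e < 0), w2 otherwise
w1, w2 = 3.0, 1.0
asym = lambda e: np.where(e < 0.0, w1 * (-e), w2 * e)

# an illustrative lcdf: permeability class values (md) and probability masses
z = np.array([80.0, 125.0, 170.0, 240.0, 330.0, 480.0])
p = np.array([0.110, 0.134, 0.255, 0.245, 0.158, 0.098])
u_opt = optimal_estimate(z, p, asym, np.linspace(50.0, 550.0, 501))
```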

There are a few common cases where loss functions have been optimized and give interesting and relatively simple results that help us understand how loss functions operate. The first case assumes the loss follows a quadratic form, or L(e) = e². In this case, the best estimate for minimizing the expected loss is just the expected value

of the distribution (Journel, 1988). This comes about because of the characteristic

property of the expectation (or conditional expectation) that it corresponds to the

minimum error variance. The corresponding optimal estimate is thus referred to as the E-

type estimate. Linear least squares based estimation and kriging yield E-type estimates

based on the available data (Srivastava, 1999).

A second case to consider is when the loss function is of the form L(e) = |e|. In this case, the optimal estimate is the median of the distribution. Stated mathematically, this is $q_{0.5}(x) = F^{-1}(0.5; x \mid (n))$ such that $F(q_{0.5}(x); x \mid (n)) = 0.5$ (Journel, 1988). A third case arises when the loss function is assumed to be a constant. For this example, the best estimate is the mode (Srivastava, 1999). A fourth case arises when the loss function is assumed to be an asymmetric linear loss function, with $w_1$ for the slope of the underestimation and $w_2$ for the slope of the overestimation. In this case, the optimal estimate is the quantile

$$p = \frac{w_1}{w_1 + w_2}$$

(Journel, 1988). Figure 2 is the graphical representation of an asymmetric linear loss function; it shows that the slope of the loss function for an underestimation is different from the slope for an overestimation. Although these cases provide interesting theoretical solutions, in most practical cases the solutions cannot be obtained analytically and must be solved numerically.

[Figure: loss L(e) plotted against error e = u* − u(x), with slope w1 on the underestimation side and slope w2 on the overestimation side.]

Figure 2: Asymmetric linear loss function.

For an asymmetric linear loss function the expected loss is:

$$E\{L(e)\} = w_1 \int_{X^*}^{\infty} (X - X^*)\; dF_X(X) + w_2 \int_{-\infty}^{X^*} (X^* - X)\; dF_X(X)$$

Each term is separated for integration purposes:

$$E\{L(e)\} = w_1 \int_{X^*}^{\infty} X\; dF_X(X) - w_1 \int_{X^*}^{\infty} X^*\; dF_X(X) + w_2 \int_{-\infty}^{X^*} X^*\; dF_X(X) - w_2 \int_{-\infty}^{X^*} X\; dF_X(X)$$

Rearranging the terms and using the second fundamental theorem of calculus reduces to:

$$E\{L(e)\} = w_1 \int_{X^*}^{\infty} X\; dF_X(X) - w_1 X^* \big[1 - F_X(X^*)\big] + w_2 X^* F_X(X^*) - w_2 \int_{-\infty}^{X^*} X\; dF_X(X)$$

Taking the derivative of the expected loss gives:

$$\frac{dE\{L(e)\}}{dX^*} = -w_1 X^* f_X(X^*) - w_1\big[1 - F_X(X^*)\big] + w_1 X^* f_X(X^*) + w_2 F_X(X^*) + w_2 X^* f_X(X^*) - w_2 X^* f_X(X^*)$$

Simplifying and setting the derivative to zero to solve for the optimal estimate gives:

$$\frac{dE\{L(e)\}}{dX^*} = (w_1 + w_2)\, F_X(X^*) - w_1 = 0$$

$$X^* = F_X^{-1}\!\left(\frac{w_1}{w_1 + w_2}\right) \qquad (2.10)$$

If $w_1 = w_2$, the loss function is symmetric. Using Equation (2.10), the optimal value is determined to be the median:

$$X^* = F^{-1}\!\left(\frac{w_1}{w_1 + w_1}\right) = F^{-1}\!\left(\frac{1}{2}\right) = \mathrm{Median}$$
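As a concrete check of Equation (2.10), take illustrative weights $w_1 = 3$ and $w_2 = 1$, i.e., underestimation penalized three times as heavily as overestimation:

$$F_X(X^*) = \frac{3}{3+1} = 0.75$$

so the optimal estimate is the 0.75 quantile of the lcdf; for a standard normal lcdf this would be $X^* = \Phi^{-1}(0.75) \approx 0.674$. Penalizing underestimation more strongly thus pushes the retained estimate toward the upper tail of the distribution.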

2.3.3 Applications of Loss Functions in Various Engineering Fields

This section is not intended as an in-depth discussion of the application of loss functions to non-petroleum fields, but instead gives the reader an exposure to the various forms of loss functions in actual applications and the general types of problems to which they are applied.

One application area is pattern classification in electrical engineering. In this field, multi-classification problems are often solved with loss functions (Suresh et al., 2008). A multi-classification problem deals with accurately identifying the correct classification for an observed pattern (Suresh et al., 2008). The authors demonstrate that the most robust classification results when the loss function is modeled as $R_n \ge l_\beta + \varepsilon_1 + \varepsilon_2$, where $l_\beta$ is the loss function, $\varepsilon_1$ is the approximation error (due to the fact that the estimation is done using a finite sample set), and $\varepsilon_2$ is the estimation error (i.e., the deviation of the estimates from the "true" values). The loss function is written as:

$$l_\beta = E_X\!\left[\sum_{j=1,\; j \ne j^*}^{C} m_j\, P(c_j \mid X)\right] \qquad (2.11)$$

The mj term is the risk factor term for a particular classifier, and P(cj|X) is the

posterior probability of class j given X. The approximation and estimation errors are

given as:

$$\varepsilon_1 = \frac{1}{N} \sum_{i=1}^{N} L_f\big(N_f(X_i^s, W),\, Y_i\big)$$

$$\varepsilon_2 = E\big[N_f(X^{s*}, W^*) - N_f(X^*, W^*)\big]$$

$X_i^s$ = given sample
$N_f$ = classifier based on X and weights W
$L_f$ = deviation between the predicted class and the actual class
$W$ = weight parameter

Risk is introduced in Equation (2.11) as $m_j$; if the class label k is introduced, the risk factor matrix $m_{kj}$ can be calculated as:

$$m_{kj} = \beta_j \sum_{i=1}^{N_j} \frac{\hat{P}(c_k \mid X_i^s)}{\hat{P}(c_j \mid X_i^s) + \hat{P}(c_k \mid X_i^s) + \varepsilon}, \qquad j = 1, 2, \ldots, C \qquad (2.12)$$

$\hat{P}(c_k \mid X_i^s)$ = posterior probability of class k given $X_i^s$
$\beta_j$ = cost of misclassification
$N_j$ = number of training samples
$\varepsilon$ = error term
$m_{kj}$ = risk factor matrix

The numerator in Equation (2.12) is an example of cross-entropy, i.e., the entropy of class j compared to class label k. In order to determine the optimal classes, the expected value of $R_n$ needs to be minimized.

A second example of the application of loss functions is from manufacturing engineering, where loss functions are commonly used to solve multi-variable control optimization problems. This class of problem deals with quality control. Quality control problems deal with setting design variables to achieve an optimal compromise of the response variables (Ma and Zhao, 2004). Often, quality control deals with N-type, L-type, and S-type tolerance criteria (Suhr and Batson, 2001; Ma and Zhao, 2004). N-type stands for "nominal is best"; an example would be the design of a car door, which cannot be too large or too small. If it is too large, the door will not shut, and if it is too small, it will not latch. S-type stands for "the smaller the better." An example of this type of problem would be vehicle wear: the smaller the wear, the better the design. The third type of quality control criterion is the L-type, where the larger the response variable, the better. An example would be fuel efficiency; the more miles per gallon a vehicle can get, the better.

Most quality characteristics are of N-type (Ma and Zhao, 2004). In

manufacturing, loss functions are connected to economic loss, and that has been found to

be proportional to the square of the error (Ma and Zhao, 2004; Artiles-Leon 1996). This

leads to a quadratic form of the loss functions proposed by Taguchi (Taguchi, 1990) and

confirmed in Artiles-Leon (1996), similar to the E-type equation described earlier:

$$\mathrm{Loss}(Y) = k\,(Y - T)^2$$

$$k = \left(\frac{2}{USL - LSL}\right)^2 = \frac{4}{(USL - LSL)^2}$$

In the above equations, k is the quality loss coefficient, T is the target value for the design, and Y is the quality characteristic; USL and LSL are the upper and lower specification limits, respectively.

One advantage of the Artiles-Leon equation is that it has a normalized form, which allows a wide range of quality control problems to be compared on the same scale. The next step is the combination of all of the quality characteristics, or control variables. This is simply done by extending the loss function definition:

$$L_n\big(Y(X), X\big) = \sum_{i \in N} 4\left(\frac{Y_i(X) - t_i}{USL_i - LSL_i}\right)^2 + \sum_{i \in L} 4\left(\frac{Y_i(X) - t_i}{USL_i - LSL_i}\right)^2 + \sum_{i \in S} 4\left(\frac{Y_i(X) - t_i}{USL_i - LSL_i}\right)^2$$

The above equation encompasses all of the control variables and the three different types of quality control problems. However, this equation can only be used if the response variables are independent of each other (Artiles-Leon, 1996). There are

often a variety of constraints that can add to the complexity of the problem. Different

types of constraints are structural, distribution, mechanical, cost, tolerance, and

specification (Suhr and Batson, 2001). Structural constraints relate to how the quality

characteristics interact with the control factors. Distribution constraints relate to how the

quality characteristics are limited by mean and standard deviation of their respective

distributions. When a control problem is limited by physical and chemical limitations,

then a mechanical constraint is present. Economic limitations often take the form of cost

constraints. Tolerance and specification constraints are normally the limitations due to

the customer's specifications; lastly, capability constraints are related to the acceptable process capability indices.

Depending on the situation and the facets of the loss function, different

constraints may or may not apply. Often, different constraints overlap. For instance,

there may be physical constraints, but before the physical constraints can be met, a cost

constraint is discovered. Therefore, both constraints exist, but only one constraint can be

the limiting factor. An example of overlapping constraints is the possibility of a new hybrid vehicle that can get 1000 mpg. Although a car can be developed with this type of fuel efficiency, it does not really matter if the car costs a million dollars to manufacture. In that case, the cost constraint is reached before the physical limitation.

The third application of loss functions is in the field of petroleum engineering. In

Srivastava (1990), geostatistical methods were used to model injected pore volumes of

solvent. Indicator simulation was used to make 500 initial realizations that were

conditioned using existing well information (Srivastava, 1990). From these 500

realizations, random walkers were used to map the connected pore volume between well

pairs. A random walker chooses a random path to get from point A to point B. The


trajectories of the random walkers and the length distribution of their paths were

recorded. The walkers were also required to have a certain net to gross ratio. The

random walkers’ paths were used to create pore volume distribution, and the number of

random walk trajectories at a location also gave a probabilistic estimate of that location

being a part of a connected pore volume (Srivastava, 1990). The pore volume

distribution could subsequently be used in conjunction with a loss function.

The distribution was used to build an economic model. The mean of the

distribution, which would be the arithmetic average of the pore volume realizations, was

used as a base case for determining the net present value (NPV). The full probability

distribution of connected pore volumes provides the probability that the solvent volume will deviate from the mean, effectively allowing the user to assign probabilities to the NPV values. The error in pore volume (expressed as deviation from the mean) was then

plotted against the corresponding change in NPV. This represents the loss function. The

loss function assumed for this example was a linear asymmetric loss function and is

shown in Figure 2. The w1, for this example, was determined to be the cost of

underestimation, which was equal to the cost of the lost production minus the cost of the

solvent saved from not injecting (Srivastava, 1990). The slope, or w2, for the

overestimation was just the cost of the additional solvent, because the maximum amount

of oil would be produced but the extra solvent would be a waste (Srivastava, 1990).

As discussed before for linear loss functions, the minimum loss occurs at the quantile $p = \frac{w_1}{w_1 + w_2}$. This equation can be rewritten, with r = (price of oil)/(price of solvent), as:

$$p = \frac{w_1}{w_1 + w_2} = \frac{\$\,\mathrm{oil} - \$\,\mathrm{solvent}}{(\$\,\mathrm{oil} - \$\,\mathrm{solvent}) + \$\,\mathrm{solvent}} = \frac{r \cdot \$\,\mathrm{solvent} - \$\,\mathrm{solvent}}{r \cdot \$\,\mathrm{solvent}} = \frac{r - 1}{r} \qquad (2.13)$$
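As a quick illustration of Equation (2.13) with hypothetical prices, an oil price of $80/bbl and a solvent price of $20/bbl give

$$r = \frac{80}{20} = 4, \qquad p = \frac{4-1}{4} = 0.75$$

so the 75th percentile of the connected pore volume distribution minimizes the expected loss in that case.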

Using Equation (2.13), different ratios of oil to solvent price r give insight into the

optimal pore volume estimation. If the solvent price equals the oil price, the minimum

loss would be at the zero quantile, and the smallest pore volume would be used.

However, if the oil price increased and solvent stayed constant, the minimum loss would

occur when the largest pore volume was used.

In this thesis, the concept of loss function is applied to synthesize reservoir

models. In a manner similar to the initial discussion on multi-classification, an economic

loss function is used to simulate (classify) an outcome of permeability at a location in the

reservoir. By visiting the simulation domain on a random path and repeating the process

of drawing an outcome of permeability from the local conditional distribution (lcpdf) at

that location using the loss function, it is demonstrated that a suite of reservoir models

can be developed that are risk-neutral towards the particular economic objective of the

modeling process.


Chapter 3: Problem Setup

A suite of equi-probable realizations is used to identify regions within the reservoir exhibiting similar well responses late in the field's production life. A pressure covariance matrix is developed to represent the similarity each grid node has with every other grid node. By solving for and retaining the important eigenvalues, a map of the retained pressure vectors can be recreated, and the vectors effectively identify delineated sub-domains sharing common heterogeneity.

Once the significant grid nodes are identified using PCA, they are designated a domain index based on their regional relationship. The index is used to identify which loss function is appropriate for generating the estimate, as sketched below. If the domain index indicates the current grid node is within a delineated sub-domain, the estimation will be performed by minimizing the expected loss; however, if the domain index indicates the current node is outside of the delineated sub-domains, the node will be sampled from the kriged distribution. Since the estimation is performed on a grid scale, it is important that the sub-domain delineation is likewise conducted on a grid scale.
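A minimal sketch of this per-node decision follows. The container names, the quantile shortcut for the asymmetric linear case (Equation (2.10)), and the handling of nodes outside all domains are illustrative assumptions, not the thesis code.

```python
import random

_rng = random.Random(7)

def simulate_node(node, domain_index, lcdf_quantile, domain_weights):
    """Per-node sampling rule (a sketch).
    domain_index: node -> sub-domain id, or None if outside all domains;
    lcdf_quantile: p -> F^{-1}(p) for this node's kriged lcdf;
    domain_weights: id -> (w1, w2) of the domain's asymmetric linear loss."""
    d = domain_index.get(node)
    if d is not None:                          # inside a sub-domain: retain
        w1, w2 = domain_weights[d]             # the loss-optimal quantile
        return lcdf_quantile(w1 / (w1 + w2))   # per eq. (2.10)
    return lcdf_quantile(_rng.random())        # outside: Monte Carlo draw
```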

The steps for calculating and implementing an optimized loss function for reservoir modeling are demonstrated first through a synthetic example. The example setup, including the steps for creating conditioning data, generating multiple realizations, and delineating individual regions using PCA, is discussed first.


3.1 CONDITIONING DATA

The conditioning data provides a common underlying structure to each individual realization. Sequential indicator simulation (SISIM) is used for generating multiple equi-probable realizations conditioned to the available data. In the following example, permeability data was assumed as the conditioning information, since it is expected that the flow response (and hence the economic NPV) will be significantly affected by the spatial variation in permeability in various regions of the reservoir.

The conditioning data for this project is synthesized. An unconditional SISIM realization was created on a 100 by 100 grid. The histogram used for the unconditional SISIM is shown in Figure 3. As can be seen, the data is truncated at 0 and 600 md. The thresholds used to model this histogram were 108.5, 141, 199.5, 277.5, and 381.5, with associated cdf probabilities of 0.11, 0.244, 0.499, 0.744, and 0.902. The variogram is modeled as isotropic and replicated for each threshold. The permeability distribution is chosen to be modeled as log-normal because permeability is often log-normal (Jansen et al., 2002). Once the initial realization was generated, 100 locations were chosen at random. These locations and associated permeability values become the conditioning data. Figure 4 is the visual representation of the conditioning data used throughout this thesis.

[Figure: conditioning data histogram — frequency and cumulative probability versus permeability bins (md).]

Figure 3: Histogram of permeability used for the unconditional indicator simulation.


Figure 4: Location of the 100 conditioning data.

As a check, all of the subsequent realizations should have these matching permeability values at the conditioning data locations. In addition, the synthetic geological features should be preserved in each realization, meaning the upper regions of the reservoir should contain predominantly higher permeability values than the lower regions.

3.2 SUB-DOMAIN DELINEATION

The basic premise of this work is that realizations of permeability that are better suited to address specific economic objectives can be constructed by using an economic loss function to sample from local probability distributions (rather than the Monte Carlo sampling that is currently done). It is to be recognized, though, that permeability in different regions of the reservoir may influence the well responses (and hence the economic NPV) differently. Recognizing this important issue, specific reservoir regions with common response characteristics were modeled separately. Loss functions were computed for these regions and later used for indicator simulation of specific reservoir regions. Principal component analysis (PCA) can help with region identification within the reservoir model.

The procedure implemented in Yadav (2005) was applied for domain delineation. A suite

of indicator simulation models were generated using the available conditioning data.

These models were processed through the simulator in order to compute the spatial

distribution of grid node pressures. The covariance matrix of grid node pressures was

computed.

The form of the covariance matrix of a model with three nodes can be found in

Equation (3.1) below. The diagonal of the covariance matrix represents the variance of

the pressure at a particular grid node computed over the suite of models. The covariance

matrix is a square matrix with n rows and n columns (Jensen et al., 2000; Smith, 2002),

where n represents the number of grid nodes in the model. Arguably, the most important feature of the covariance matrix is that it is semi-positive definite (Wallace and Hussain, 1969). This means the eigenvalues are real and non-negative.

$$C = \begin{pmatrix} \mathrm{cov}(x,x) & \mathrm{cov}(x,y) & \mathrm{cov}(x,z) \\ \mathrm{cov}(y,x) & \mathrm{cov}(y,y) & \mathrm{cov}(y,z) \\ \mathrm{cov}(z,x) & \mathrm{cov}(z,y) & \mathrm{cov}(z,z) \end{pmatrix} \qquad (3.1)$$

Eigenvalues are scale determinants of the eigenvectors and are the solution to the characteristic equation of the matrix being solved (Smith, 2002). The eigenvalues are sorted in order of decreasing magnitude. The sum of the eigenvalues is related to the variance of the data, and so a limited number of eigenvalues and the corresponding eigenvectors are retained based on a variance cut-off. The eigenvectors with the largest eigenvalues correspond to the dimensions that have the strongest correlation in the data set. In other words, the procedure yields a grouping of nodes where pressure exhibits a strong degree of correlation.

Starting from n nodes of pressure P, the covariance matrix is obtained as the product $P^T P$, which is of order $n \times n$. PCA yields a reduced $n \times k$ set of eigenvectors, i.e., a set of k eigenvector loadings at each pressure node. The k principal components of pressure summarizing the variability of pressures over the entire reservoir are obtained by multiplying the transpose of the eigenvector matrix with the original data vector P. Thus each principal component can be construed as a weighted linear combination of the original data, with the weights (loadings) being the eigenvector. Going back to each node, there are k eigenvector loadings at each node. The maximum value of the k loadings at each node is identified, and the rank of the eigenvalue corresponding to that maximum value is marked as the domain index at that node. All pressure nodes that exhibit the same domain index constitute a grouping that exhibits similar pressure characteristics within the reservoir (Yadav, 2006). In order to restrict the number of identified domains, a volume cut-off is applied. The volume cut-off simply stipulates that only a certain volume fraction of the reservoir will be covered by the identified domains. The remaining nodes, whose domain indices are low, remain ungrouped, implying that these nodes do not exhibit any specific pressure signature that reflects a systematic response due to the underlying heterogeneity. An outline of the procedure for PCA is provided in Figure 5.

[Figure: flow chart — Step 1: determine the covariance matrix of order n×n by forming PᵀP, where P is the pressure matrix; Step 2: obtain the k principal components of pressure by multiplying the eigenvector matrix by P; Step 3: identify domain indices using the maximum of the k eigenvalues; Step 4: apply a volume cut-off to limit the number of identified domains.]

Figure 5: Flow chart of PCA for sub-domain delineation.
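The delineation procedure can be sketched in a few lines of numpy. The centering of the pressure matrix, the use of absolute loadings, and the labeling rule are assumptions that fill in details the text leaves open.

```python
import numpy as np

def delineate_domains(P, var_cutoff=0.6):
    """Sub-domain delineation sketch. P: (n_realizations x n_nodes) matrix
    of grid-node pressures from the upscaled flow simulations. Returns a
    domain index per node (rank of its dominant retained eigenvector)."""
    Pc = P - P.mean(axis=0)               # center each node's pressure
    C = Pc.T @ Pc                         # n x n covariance matrix, eq. (3.1)
    vals, vecs = np.linalg.eigh(C)        # symmetric PSD: real eigenpairs
    order = np.argsort(vals)[::-1]        # sort by decreasing eigenvalue
    vals, vecs = vals[order], vecs[:, order]
    k = int(np.searchsorted(np.cumsum(vals) / vals.sum(), var_cutoff)) + 1
    loadings = np.abs(vecs[:, :k])        # k retained loadings per node
    return loadings.argmax(axis=1)        # dominant loading -> domain index
```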

Since the procedure described above requires the pressure fields corresponding to a large suite of realizations, significant savings in CPU cost result if upscaled flow simulations are performed. In addition, the volume threshold used for delineating domains regulates the number of grid nodes covered by the identified sub-domains. The threshold determines the interaction between regions (Yadav et al., 2005). Thus, if the threshold is too large, there is too much interaction, and if the threshold is too small, the reservoir volume within sub-domains is very small, implying that the estimates retained from the local conditional distributions for a large number of nodes will utilize the same loss function. This could render the simulated realizations sub-optimal. Some trial and error is needed to choose a specific threshold.

In order to develop the covariance matrix that is central to PCA, an ensemble of equi-probable realizations is generated. Indicator simulation (SISIM) was again used to develop multiple realizations, except that the histogram used in this new set of simulations is based only on the conditioning data in Figure 4. The corresponding histogram is shown in Figure 6. Fifty realizations are generated for development of the covariance matrix, and the first five are shown in Figure 7. The scale is from 600 md (red) to 100 md (blue).

Figure 6: Histogram of conditioning data used for generating the suite of realizations for domain delineation.


Figure 7: Sample realizations obtained using SISIM conditioned to the available data.

Notice that the conditioning data and underlying geological features are preserved in all five models. The covariance matrix is generated using the flow response, but prior to that the 50 models were up-scaled to reduce computational expense. The grid is scaled from a 100 by 100 grid to a 10 by 10 grid. In this up-scaling process, some of the resolution is inevitably lost. Figure 8 shows the up-scaled versions of the same five realizations presented in Figure 7.


Figure 8: Upscaled versions of the sample models in Figure 7.

The five upscaled models exhibit the same spatial trend of permeability as the fine

scale models. The flow simulation model used for obtaining the flow response assumes a

basic black oil reservoir. As shown in Figure 9, three production wells are located at (30,

30), (50, 50), and (70, 70) while one injector is located at (90, 90). Once the flow

simulations are completed, a covariance matrix is calculated using the pressures at the upscaled grid nodes at 1,260 days (a mature stage during the water injection process).

Since the covariance matrix of pressure is assured to be a positive semi-definite matrix, real, non-negative eigenvalues result (Wallace and Hussain, 1969). The top 200 eigenvalues (corresponding to a variance cut-off of 60%) are retained for sub-domain delineation. The number of domains identified was further reduced by applying a 40% volume cut-off. The corresponding 4 domains identified are shown in Figure 9. The


nodes in the un-shaded part of the reservoir constitute a fifth region, comprising nodes that do not have a systematic impact on the response at the wells. The final step is to down-scale the identified regions onto the original 100 by 100 grid; all fine-scale nodes falling within a coarse-scale domain are assigned to that domain.

Figure 9: Sub-domain identification by PCA (producers P3 (30, 30), P2 (50, 50), and P1 (70, 70); injector I1 (90, 90)).

There are a few important features to note about Figure 9. First, most of the regions are located near the well bores. Since a large portion of the pressure drop occurs near the well bore, it is reasonable for the nodes relatively close to the well bore to be considered significant. Second, regions 3 and 4 mostly map the high-permeability areas located in the upper portion of the models. These two regions are likely in pressure communication due to their high permeability values. The domains identified in Figure 9 will be used for subsequent discussion throughout this thesis. Now that the sub-domains have been identified, the loss function for each region can be established; that is discussed in the next chapter.


Chapter 4: Loss Function Development

The loss function must map inaccurate permeability estimates to corresponding

economic loss when the inaccurate model is used to plan facilities and operations for a

water flood. In addition, the loss function must be applied on an individual grid node

basis since the SISIM simulations are performed on a grid. In order to accomplish that

objective, the calculation of the loss function will be performed within sub-domains, with

all the grid nodes comprising the sub-domain sharing the same loss function.

Before discussing the development of the loss function, the base case economic

model is described. Since a water flood scenario is considered, the cost of purchasing and processing water is factored into the NPV. The available produced water is assumed to be re-injected, and only a processing cost is associated with that water volume. However, when there is not enough water for a specified injection rate, additional water is assumed to be purchased, and that cost is added. Therefore, NPV considers both water and oil handling

cost and the revenue from the sale of oil.
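The base case economics can be summarized as a simple discounted cash flow. The sketch below is only an illustration of the accounting described above; the prices, rates, and discount rate are assumed values, not those used in this study:

      program npv_model
        implicit none
        integer, parameter :: nyr = 10
        real, parameter :: po    = 60.0   ! oil price, $/bbl (assumed)
        real, parameter :: cproc = 1.0    ! produced-water processing, $/bbl (assumed)
        real, parameter :: cbuy  = 0.5    ! make-up water purchase, $/bbl (assumed)
        real, parameter :: r     = 0.10   ! annual discount rate (assumed)
        real    :: qo(nyr), qwp(nyr), qwi(nyr), cash, npv
        integer :: t
        qo = 1.0e5; qwp = 5.0e4; qwi = 8.0e4    ! stand-ins for flow simulator output
        npv = 0.0
        do t = 1, nyr
          cash = po*qo(t) - cproc*qwp(t)              ! oil revenue less produced-water handling
          cash = cash - cbuy*max(qwi(t)-qwp(t), 0.0)  ! buy water only when re-injection falls short
          npv  = npv + cash/(1.0 + r)**t              ! discount to present value
        end do
        print '(a,f14.2)', ' NPV ($): ', npv
      end program npv_model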

This chapter has two main purposes: (1) to show how permeability estimation error is mapped to NPV loss, and (2) to propose a numerical method for retrieving the loss-optimal estimate, along with an analytical fit to the loss function.

4.1 LOSS FUNCTION DEVELOPMENT FOR DELINEATED SUB-DOMAINS

Consider X* as an estimate of the RV X, and the error E defined as E = X* − X. Since the true variable X is inaccessible, E is a RV. If the error is positive, X* is an overestimation of X, and if the error is negative, X* is an underestimation of X. The impact of an incorrect estimate X* can be related to a loss L(e). For this thesis the attribute mapped by X is permeability, and L(e) is in terms of NPV.

For each grid node, a loss function can be determined based on the ensemble of 50 realizations generated previously. A permeability value can be sampled from the lcpdf constructed using indicator kriging at a particular grid node using the loss function for a particular sub-domain. Denote the simulated permeability value at grid node n by $X_n^{(l)*}$, where l identifies the realization from which that value is retained. In the previous expression for E, X denotes the “true” value at node n, which is unknown. It can be assumed that the “true” value could be the permeability value at location n in any of the remaining L−1 realizations. Keeping the grid node estimate $X_n^{(l)*}$ constant and subtracting the possible “true” values from the other 49 realizations produces a set of 49 error values e.

The 50 permeability realizations can be processed through a flow simulator in order to obtain the well responses. Those in turn are input to the economic module in order to obtain the corresponding NPVs. A set of losses can therefore be defined as:

$$L(e) = NPV\!\left(X_n^{(l)*}\right) - NPV\!\left(X_n^{(l_c)}\right)$$

where $l_c$ denotes any realization other than l.
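As a sketch of this bookkeeping at a single node (variable names are illustrative: perm(l) holds the simulated value at the node in realization l and npv(l) the NPV of realization l, both of which would come from SISIM and the flow/economics modules):

      program loss_cloud
        implicit none
        integer, parameter :: nreal = 50
        real    :: perm(nreal), npv(nreal)
        real    :: e(nreal-1), loss(nreal-1)
        integer :: l, lc, j
        call random_number(perm); call random_number(npv)   ! stand-in data
        l = 1                            ! realization supplying the estimate X*
        j = 0
        do lc = 1, nreal
          if (lc == l) cycle
          j = j + 1
          e(j)    = perm(l) - perm(lc)   ! error X* - X, with lc playing the "true" value
          loss(j) = npv(l)  - npv(lc)    ! corresponding loss in NPV
        end do
        print *, j, ' (error, loss) pairs built'
      end program loss_cloud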

Once the error and the corresponding loss are determined, a function can be fit to

best describe the relationship between L(e) and estimation error (e). However, this is

only one node from the n possible grid nodes within a particular region. In order to test the

representativeness of the constructed loss function, seven additional grid nodes are

sampled randomly within each region. The locations are chosen to best represent each

sub-domain and are shown in Figure 10 accompanied by a sub-domain map for reference.

Figure 10: Locations selected for computation of loss functions (left) accompanied by the sub-domain identification map for reference (right); wells P3 (30, 30), P2 (50, 50), P1 (70, 70), and I1 (90, 90) are annotated.

Two factors determine the characteristics of the loss function: the node sampling location (affecting the outcomes e of the RV E) and the realization chosen for $X_n^{(l)*}$. Selecting a node close to the edge of a sub-domain would not be representative, because the values at grid nodes outside the region influence the kriged distribution. It is more effective to sample nodes well within the borders of each region, spatially separated from one another as much as possible.

The first factor influencing the development of the loss function is the set of outcomes e of the RV E used for constructing the loss function. The “true” values are unknown and are hence identified with the values in the simulated realizations. Second, the retained estimate $X_n^{(l)*}$ also controls the error values. If $X_n^{(l)*}$ is close to the mean permeability expected in that reservoir region, the resultant loss function can be deemed risk neutral, i.e. the resultant permeability model has values that balance the risk of underestimation with that of over-estimation. If instead the loss function is calculated using a low value of $X_n^{(l)*}$, then implicitly the resultant loss function is such that the cost associated with over-estimation of permeability is deemed less than the cost associated with under-estimation. The opposite is true if a high value is retained for $X_n^{(l)*}$ in the calculation of the error e. It is also to be emphasized that this selection of $X_n^{(l)*}$ is critical only for the calculation of the loss function. After the loss function is available, the optimal estimate X* of permeability at a location in the simulation grid is obtained by using the loss function in conjunction with the lcpdf at that location. Figure 11 shows the relationship between estimation error and economic loss for representative nodes within each sub-domain.

Figure 11: Sample nodes from each region showing the relation between economic loss and permeability estimation error (region 1, upper left; region 2, upper right; region 3, lower left; region 4, lower right). Green dashed lines indicate a window containing 70% of the data; plus signs and circles represent under- and overestimation respectively.


There are a number of reasons for the large amount of spread in the plots shown in Figure 11. First, the sub-domains are identified according to pressure correlation, whereas these plots link permeability to NPV. Second, NPV is a non-linear response, and permeability estimation error alone might not be sufficient to explain it. Third, the sub-domains are identified from the upscaled pressure relationships, but the plots in Figure 11 are developed at the fine grid scale. Nevertheless, the relationship between economic loss and permeability estimation error is sufficient for the scope of this thesis.

Once the estimation error is plotted against the associated NPV loss, two different

types of loss function were fitted to each plot (Figure 12). The first was an asymmetric

linear loss function, and the second was a parabolic loss function. Since kriging is based

on the minimization of error variance, that implies a quadratic loss function. The

parabolic equation differs from kriging because it contains the linear term in addition to

the quadratic term.


Figure 12: Same representative region nodes from Figure 11 fitted with asymmetric linear (black lines) and parabolic loss functions (black curves).

An average of the eight nodes within each region was used to define the loss

function models used for developing the risk neutral realizations. Figure 13 shows the

four sub-domain loss functions (average of the best fits at the eight nodes) for both the

asymmetric linear and parabolic models. Regions 1 and 2 have the steepest slopes, indicating significant loss associated with incorrect estimations. Regions 3 and 4 exhibit similar loss in both the asymmetric linear and parabolic models; the slopes of their loss functions are shallower, indicating that NPV is not as heavily affected by permeability changes as in regions 1 and 2. Looking at the delineated domains, it can be observed that there are no producing wells located in regions 3 or 4. Since the absolute values of the slopes of the linear loss function for region 1 are approximately the same, the optimal solution will be approximately the median. A summary of the various regional loss function models is shown in Table 1.

Figure 13: Linear and quadratic loss function models (average of best fits).

Table 1: Loss function model summary.

4.2 LOSS FUNCTION OPTIMIZATION

Once the loss functions are established, the next step is to find the optimal

estimate of permeability. The optimal estimate, as discussed earlier in Chapter 2, is one that minimizes the expected loss. In the cases presented below, since the loss functions have been developed with the error of estimation e corresponding to an $X_n^{(l)*}$ in the middle of the distribution, the corresponding optimal estimates using such loss functions will also be somewhere in the middle of the lcpdf. There are two ways to arrive at the optimal estimate, and these are described next.

4.2.1 Analytical Solution

Since a linear loss function is a reduced form of a parabolic function, finding a generalized solution for the polynomial model will also suffice for the asymmetric linear loss function previously determined. The first step is to take the expected value of the loss function. The general form of the expected loss is:

$$E\{L(e)\} = \int_{-\infty}^{\infty} L(e)\, dF_E(e)$$

Since $e = X^* - X$ and $X$ is a RV, the above reduces to:

$$E\{L(X^* - X)\} = E\{L(e)\} = \int_{-\infty}^{\infty} L(X^* - x)\, dF_X(x)$$

Inserting a general parabolic expression for the loss function yields:

$$E\{L(e)\} = \int_{-\infty}^{X^*}\left[a(X^*-x)^2 + b(X^*-x) + e\right] dF_X(x) + \int_{X^*}^{\infty}\left[a'(x-X^*)^2 + b'(x-X^*) + e'\right] dF_X(x) \qquad (4.1)$$

In Equation (4.1), a, b, and e are the coefficients of the quadratic, linear, and constant terms applying where the true value lies below X* (overestimation), while a', b', and e' denote the corresponding coefficients where the true value lies above X* (underestimation). For simplicity, each of the six integrals above is evaluated separately and the results recombined at the end. The expected loss contribution of each integral

is defined as:

$$\begin{aligned}
E\{L(e_1)\} &= a\int_{-\infty}^{X^*}(X^*-x)^2\, dF_X(x), \qquad & E\{L(e_4)\} &= a'\int_{X^*}^{\infty}(x-X^*)^2\, dF_X(x)\\
E\{L(e_2)\} &= b\int_{-\infty}^{X^*}(X^*-x)\, dF_X(x), \qquad & E\{L(e_5)\} &= b'\int_{X^*}^{\infty}(x-X^*)\, dF_X(x)\\
E\{L(e_3)\} &= e\int_{-\infty}^{X^*} dF_X(x), \qquad & E\{L(e_6)\} &= e'\int_{X^*}^{\infty} dF_X(x)
\end{aligned}$$

Expanding the square and pulling $X^*$ out of the integrals, $E\{L(e_1)\}$ can be rearranged to:

$$E\{L(e_1)\} = a(X^*)^2 F_X(X^*) - 2aX^*\int_{-\infty}^{X^*} x\, dF_X(x) + a\int_{-\infty}^{X^*} x^2\, dF_X(x)$$

44

Taking the derivative with respect to $X^*$ and simplifying gives:

$$\frac{d}{dX^*} E\{L(e_1)\} = 2aX^* F_X(X^*) - 2a\int_{-\infty}^{X^*} x\, dF_X(x)$$

The same process is repeated for the quadratic term on the other interval, $E\{L(e_4)\}$. In the second expression below, the bounds have been handled using the second fundamental theorem of calculus and $X^*$ has been removed from the integrals:

$$E\{L(e_4)\} = a'\int_{X^*}^{\infty}(x-X^*)^2\, dF_X(x) = a'\int_{X^*}^{\infty} x^2\, dF_X(x) - 2a'X^*\int_{X^*}^{\infty} x\, dF_X(x) + a'(X^*)^2\left[1 - F_X(X^*)\right]$$

Taking the derivative of the above expression and simplifying yields:

$$\frac{d}{dX^*} E\{L(e_4)\} = 2a'X^*\left[1 - F_X(X^*)\right] - 2a'\int_{X^*}^{\infty} x\, dF_X(x)$$

The same process is applied to the linear and constant terms on both intervals. The simplified derivatives are:

$$\frac{d}{dX^*} E\{L(e_2)\} = b\, F_X(X^*), \qquad \frac{d}{dX^*} E\{L(e_5)\} = -b'\left[1 - F_X(X^*)\right]$$

$$\frac{d}{dX^*} E\{L(e_3)\} = e\, f_X(X^*), \qquad \frac{d}{dX^*} E\{L(e_6)\} = -e'\, f_X(X^*)$$

When the derivatives are combined and simplified, the expression is:

$$\frac{d}{dX^*} E\{L(e)\} = 2X^* F_X(X^*)\left[a-a'\right] + 2a'X^* - 2a\,\mu_{X<} - 2a'\,\mu_{X>} + F_X(X^*)\left[b+b'\right] - b' + f_X(X^*)\left[e-e'\right]$$

By inserting the optimal estimate $X^*_{optimal}$ into the above equation, the final expression is:

$$0 = 2X^*_{optimal} F_X(X^*_{optimal})\left[a-a'\right] + 2a'X^*_{optimal} - 2a\,\mu_{X<} - 2a'\,\mu_{X>} + F_X(X^*_{optimal})\left[b+b'\right] - b' + f_X(X^*_{optimal})\left[e-e'\right] \qquad (4.2)$$

In this expression, $\mu_{X<}$ and $\mu_{X>}$ are the partial means of the distribution below and above $X^*$, i.e. $\mu_{X<} = \int_{-\infty}^{X^*} x\, dF_X(x)$ and $\mu_{X>} = \int_{X^*}^{\infty} x\, dF_X(x)$. The value of $X^*$ is determined so that Equation (4.2) is equal to zero; this $X^*$ gives the minimum expected loss. A complete derivation is presented in the Appendix.

The expression for $X^*_{optimal}$ cannot be solved analytically unless simple distribution types are presupposed for the lcpdf. Three checks can be performed on the expression for $X^*_{optimal}$:

1. The solution must return the global mean as the optimal estimate if the linear and constant coefficients are assumed to be zero and the coefficients of the quadratic terms are equal; the characteristic property of the expectation is that the global mean minimizes the error variance (i.e. a quadratic loss function). For a quadratic loss function, $b = b' = e = e' = 0$ and $a = a'$, so Equation (4.2) reduces to:

$$0 = 2aX^* - 2a\,\mu_{X<} - 2a\,\mu_{X>} \quad\Rightarrow\quad X^* = \mu_{X<} + \mu_{X>} = \mu_X$$

that is, the optimal estimate is the global mean.

2. If only the linear coefficients remain ($a = a' = e = e' = 0$), the optimal estimate should be a quantile of the lcpdf, and the median when the loss is symmetric, as seen before. Equation (4.2) reduces to:

$$0 = F_X(X^*)\left[b+b'\right] - b' \quad\Rightarrow\quad X^* = F_X^{-1}\!\left(\frac{b'}{b+b'}\right)$$

If $b = b'$, then $X^* = F_X^{-1}(0.5)$, i.e. the median.

3. The third check is that if only the constant terms are considered in the loss function ($a = a' = b = b' = 0$ and $e = e'$), the optimal estimate should be the mode. The loss function is such that for zero error the loss is zero, but for any finite-valued error the loss quickly climbs to a constant value. The surviving term of the derivative is:

$$\frac{d}{dX^*} E\{L(e)\} = f_X(X^*)\left[e-e'\right]$$

and minimizing the expected loss then amounts to placing the estimate where the probability density is greatest,

$$\lim_{x\to X^{*-}} f_X(x) = \lim_{x\to X^{*+}} f_X(x) = \max_x f_X(x),$$

which implies the mode is the optimal estimate.

The analytical formula for the optimal estimate passes all three checks. However,

in a number of instances, the loss function may not have the congenial form assumed

here. In those cases, it may be necessary to calculate the optimal estimate numerically. A

numerical solution procedure is discussed next. Although the same parabolic form for the

loss function is assumed in the following, the procedure can be generalized to any other

loss function.

4.2.2 Numerical Solution

A numerically optimized solution for the expected loss is found using the same general methodology as the analytical solution. A large ensemble of candidate optimal values $X^*$ is sampled from the lcpdf at a grid location. For each sampled value of $X^*$, the expected loss is calculated as:

$$E\{L(e)\} = \frac{1}{N}\sum_{j=1}^{N} L\!\left(X^* - X_j\right)$$

Here the error is defined as the deviation from an unknown “true” value that is

represented as a random variable with N possible outcomes. The calculation can be

repeated for M sampled values X*. The X* yielding the minimal expected loss is the

optimal estimate. Figure 14 depicts a flow chart of the methodology previously

described.


Figure 14: Flow chart for numerical solution.

Note that the form of the loss function and the probability distribution at the node

can be completely general. The procedure is based on identifying the minimum expected

loss among M calculated values of expected loss. Therefore, it is important to calculate

the expected loss corresponding to a wide range of guesses for X*.
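A minimal sketch of this numerical search follows; the array names and the samples standing in for the lcpdf are illustrative, and the asymmetric linear loss uses, for example, the region 2 coefficients listed in Table 2:

      program numerical_optimum
        implicit none
        integer, parameter :: m = 200, n = 50
        real, parameter :: bunder = 0.54, bover = 0.34   ! region 2, Table 2
        real    :: xs(m), xt(n), el(m), e
        integer :: i, j, ibest
        call random_number(xs); call random_number(xt)   ! stand-ins for lcpdf samples
        do i = 1, m
          el(i) = 0.0
          do j = 1, n
            e = xs(i) - xt(j)                  ! estimation error X* - X
            if (e >= 0.0) then                 ! overestimation branch of the loss
              el(i) = el(i) + bover*e
            else                               ! underestimation branch of the loss
              el(i) = el(i) - bunder*e
            end if
          end do
          el(i) = el(i)/real(n)                ! expected loss for this candidate
        end do
        ibest = minloc(el, dim=1)              ! candidate with minimum expected loss
        print *, 'optimal estimate:', xs(ibest)
      end program numerical_optimum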

The procedure for calculating the optimal estimate using a loss function hinges on

the representation of the local uncertainty F(x) at a grid location. The construction of that


lcpdf can be easily accomplished using indicator kriging. Furthermore, since the

simulated values should exhibit the correct spatial correlation, it would be appropriate to

construct the lcpdf within a sequential simulation framework. The implementation of the

loss function methodology within the SISIM framework is described in the next chapter.


Chapter 5: Implementation of Optimized Loss Function within a Sequential Simulation framework

The loss function previously derived needs to be combined with the SISIM

algorithm to generate permeability realizations that are loss optimal. As discussed in the

previous chapter, the loss function can be tailored to reflect preferences for risk-

neutrality, over-estimation or under-estimation. This chapter explains how the lcpdf is

used in conjunction with the loss functions within sub-domains to estimate permeability

values that minimize the expected loss. As discussed in the previous chapter, the loss can

be represented using both asymmetric linear and parabolic functions and the

corresponding permeability realizations are interpreted both physically and economically.

To conclude the chapter, a second set of assumed loss functions that favor under- or over-estimation is implemented, and the corresponding realizations are shown to sample the endpoints of the NPV uncertainty distribution.

5.1 IMPLEMENTATION WITHIN SISIM ALGORITHM

Determining an optimal estimate for permeability at a location using a loss

function implies minimizing the expected loss. The loss function relates the estimation

error (X*-X) to the economic loss. The economic loss has to be computed for all possible

values of the “true” value X and that requires a description of the distribution describing

the RV X at location u.

The lcpdf describing the local uncertainty of the RV X can be established within

the SISIM framework. During SISIM, the previously simulated grid nodes along with the prior conditioning data are transformed into indicator variables using the prior histogram thresholds. For each threshold, the grid node values within the variogram range and search radius are assigned a value of one or zero based on Equation (2.7). The kriged estimate I*(u;zk|(n)), corresponding to the threshold zk and conditioned to the n data in the vicinity of the estimation node, is exactly the local uncertainty distribution F*(u;zk|(n)). Knowledge of F*(u;zk|(n)) and the corresponding pdf allows Equation (4.2) to be implemented.
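A minimal sketch of the indicator coding step follows; the data values and thresholds are hypothetical, and the convention assumed here (data at or below a threshold coding to one) is the standard one for continuous variables:

      program indicator_transform
        implicit none
        integer, parameter :: nd = 6, ncut = 3
        real    :: z(nd)       = [120., 310., 95., 480., 260., 150.]  ! perm data, md
        real    :: thres(ncut) = [150., 300., 450.]                   ! thresholds
        integer :: ind(nd,ncut), i, k
        do k = 1, ncut
          ind(:,k) = merge(1, 0, z <= thres(k))   ! kriging these indicators at a
        end do                                    ! node yields F*(u;z_k|(n))
        do i = 1, nd
          print '(3i3)', ind(i,:)
        end do
      end program indicator_transform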

The advantage of developing an lcpdf within the sequential framework is that the

distribution is conditioned to all original data plus previously simulated values. This

ensures that the spatial correlation between simulated values reproduces the target

covariance model. Each time the optimal estimate at a grid node is computed, it is added

to the conditioning data set. It is the fact that indicator kriging directly yields conditional

probabilities that renders this methodology for implementing loss functions viable

without resorting to any multi-Gaussian assumptions.

In order to develop an optimized permeability model, SISIM is adapted to allow

nodes within the sub-domains to be estimated using the loss function (Equation (4.2))

determined for that sub-domain. Nodes outside the specified sub-domains are simulated

using regular (unaltered) indicator simulation (by Monte Carlo sampling the lcpdf).

Since at each step, the simulated grid value is used to update the lcpdf for the next grid

nodes, discontinuities at the border of sub-domains are minimized. The simulation

process is continued until all grid nodes have been estimated for a given realization.

Figure 15 shows the adaptation to the SISIM code.


Figure 15: SISIM algorithm incorporating loss function for optimal estimation.
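A minimal sketch of the per-node decision in the adapted algorithm follows. All names are illustrative, the lcpdf is a toy discrete distribution, and the loss-optimal quantile b'/(b + b') follows from check 2 of Chapter 4, using for illustration the region 2 linear coefficients listed in Table 2 below:

      program adapted_step
        implicit none
        integer, parameter :: ncut = 5
        real :: thres(ncut) = [100., 200., 300., 450., 600.]   ! thresholds, md
        real :: ccdf(ncut)  = [0.15, 0.40, 0.70, 0.90, 1.00]   ! kriged, order-corrected lcpdf
        real, parameter :: bunder = 0.54, bover = 0.34         ! region 2 coefficients
        integer :: ireg
        real    :: p, zval
        ireg = 2                         ! domain index at this node (0 = ungrouped)
        if (ireg > 0) then
          p = bunder/(bunder + bover)    ! loss-optimal quantile of the lcpdf
        else
          call random_number(p)          ! ordinary Monte Carlo draw from the lcpdf
        end if
        zval = quantile(p)
        print *, 'simulated value (md):', zval
      contains
        real function quantile(pr)       ! inverse of the discrete ccdf, linear in class
          real, intent(in) :: pr
          integer :: k
          real :: plo, zlo
          plo = 0.0; zlo = 0.0           ! lower tail taken from 0 md for brevity
          do k = 1, ncut
            if (pr <= ccdf(k)) then
              quantile = zlo + (thres(k)-zlo)*(pr-plo)/max(ccdf(k)-plo, 1.0e-6)
              return
            end if
            plo = ccdf(k); zlo = thres(k)
          end do
          quantile = thres(ncut)
        end function quantile
      end program adapted_step

Either way, the simulated value is added to the conditioning set, so nodes inside and outside sub-domains condition one another and the target covariance is preserved.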

5.2 LOSS FUNCTION IMPLEMENTATION FOR INDIVIDUAL SUB-DOMAINS

Two different models are used to represent the relationship between estimation

error and loss. These models and the development of the loss functions are described in

Chapter 4. The first function is linear and asymmetric, where the losses corresponding to under- and overestimation are not equal. Table 2 shows the loss function associated with each region. To recall, the loss is in terms of loss in NPV from the optimal value (in millions of dollars) and the error is represented in millidarcies.

Table 2: Asymmetric linear loss function models for the four sub-domains.

              Underestimation          Overestimation
  Region 1    L(e) = 0.41 (X - X*)     L(e) = 0.43 (X* - X)
  Region 2    L(e) = 0.54 (X - X*)     L(e) = 0.34 (X* - X)
  Region 3    L(e) = 0.18 (X - X*)     L(e) = 0.17 (X* - X)
  Region 4    L(e) = 0.08 (X - X*)     L(e) = 0.15 (X* - X)

The coefficients of these loss functions are input into the modified SISIM program. Since Equation (4.2) is derived for a parabolic expression, a = a' = e = e' = 0; here b' is the coefficient for underestimation while b is the coefficient for overestimation. For sub-domains 1 and 3, the coefficients for under- and overestimation are relatively similar; therefore the optimal risk-neutral estimates will be approximately the median of the lcpdf. The averaged realization from the SISIM ensemble without

implementing the loss function accompanied by the sub-domain identification map is

shown in Figure 16. In comparison Figure 17 shows one realization of the permeability

model obtained by implementing the loss functions and the corresponding average over

several realizations. In contrast to the smooth result in Figure 16, the implementation of

the loss function yields several realizations that consistently reflect highs in some areas

and lows in some others and these persist after averaging. The averaged model therefore

retains the texture of the individual realizations.

Figure 16: The average of 50 realizations obtained using SISIM without implementing the loss functions (left), and the sub-domain identification plot (right); wells P3 (30, 30), P2 (50, 50), P1 (70, 70), and I1 (90, 90) are annotated.

Figure 17: A single permeability realization obtained after implementing the loss functions (left), and the average of 10 realizations obtained after implementing the loss functions (right).

The implementation of the loss function causes the simulation algorithm to estimate permeability values that balance water and oil production. However, many of the features in the optimized risk-neutral realization are preserved from the initial SISIM realizations because the final models also have to honor the conditioning data and the prescribed spatial covariance model.

Region 1, in the realizations obtained using the linear loss function as well as in the basic SISIM realizations without any loss function, has low permeability values. Region 2 conserves the characteristic that well P2 (shown in Figures 22 and 23) has lower permeability than P3. On a global scale, the permeability within the upper portion of the models is significantly higher than in the lower regions. This is an indication that the conditioning data is still affecting the lcpdf and preserving the essential geological features.

Although there are a number of similarities between the linearly optimized realizations and the original SISIM averaged model, there are also a number of important differences. The permeability in region 4 has significantly increased, indicating that, given the characteristics of the loss function, it is better to have higher permeability values and risk over-estimation in those regions so as to balance the under-estimation that might occur in other regions of the reservoir. The presence of nearby higher-valued conditioning data facilitates the estimation of such high values. A second notable distinction, as noted before, is the increase in permeability in regions 2, 3, and 4. As mentioned earlier, the imposition of a loss function introduces some discipline in selecting values from the lcpdf that is reflected in all the realizations of the ensemble, as opposed to the random sampling of values from the lcpdf that happens in the basic SISIM algorithm, which causes those realizations to look different from one another and the averaged model to look smooth.


5.2.1 Parabolic Loss Function

In addition to the asymmetric linear loss function, a parabolic loss function is used to create a second ensemble of optimal permeability models. The polynomial equations used for the four regions are presented in Table 3. The equations for under- and overestimation are the same: in each region a = a', b = b', and e = e' = 0, so one parabolic equation fits the data for both signs of the estimation error.

Table 3: Parabolic loss function models.

  Region 1    L(e) = 0.0071 |X - X*|^2 - 0.0367 |X - X*|
  Region 2    L(e) = 0.0212 |X - X*|^2 - 0.0039 |X - X*|
  Region 3    L(e) = 0.0019 |X - X*|^2 + 0.0147 |X - X*|
  Region 4    L(e) = 0.0016 |X - X*|^2 + 0.0226 |X - X*|

A single realization obtained after applying the parabolic loss function models and the average of the ensemble are shown in Figure 18. These realizations should be compared to the average of the models obtained using the linear loss function and the average of the original SISIM realizations, shown in Figure 17 and Figure 16 respectively. Many of the features present in the average model in Figure 17 are also present in the ensemble average model in Figure 18: region 1 has lower average permeability whereas region 4 has higher average permeability. Just as in the models obtained using a linear loss function, application of the parabolic loss function causes the reservoir regions to reflect permeability values that are consistent from one model to the other. This causes the average model to exhibit more variability (instead of the smoothing that is observed in the ensemble average of the original SISIM models).


Although there are some common structures and trends between the models obtained using a linear loss function and the ones obtained using a second-order loss function, there are some subtle differences. Most notably, the model in Figure 17 tends to have a larger increase in permeability in regions 3 and 4 and in the areas surrounding wells P1 and I1. The model in Figure 18 indicates a lowering of permeability in region 1 and around wells P3 and P2. In general, a second-order dependency of loss on the estimation error implies that the loss is relatively insensitive to small errors. This causes the optimal estimates to cluster around the mean (green color). In regions 1 and 2, where the second-order term in the loss function has a higher weight, the penalty assigned to under- or over-estimation is relatively steep and hence the tendency is for the optimal estimates to tend towards the local mean. Since the conditioning data in region 1 are low, the local mean is low and that is reflected by the permeability values in that region. In region 2, the conditioning values are higher and the corresponding optimal values are higher.

Figure 18: Single and averaged optimized parabolic loss function realizations.

Figure 19: Reiteration of sub-domain identification by PCA (wells P3 (30, 30), P2 (50, 50), P1 (70, 70), and I1 (90, 90)).

Both the linear and parabolic loss function models show increased and decreased permeability values within different regions, but how do the actual NPVs corresponding to a water injection scenario vary from one set of models to the next? The initial SISIM ensemble of realizations has a mean NPV of 84.74 MM$ and ranges from 67.8 to 105.0 MM$, a spread of 37.2 MM$. The models obtained using the linear loss function have a mean NPV of 82.4 MM$ and range from 66.8 to 97.7 MM$, a spread of 30.9 MM$, whereas the models obtained using the parabolic loss function have a mean of 87.6 MM$ and range from 72.9 to 95.2 MM$, a spread of 22.3 MM$. Figure 20 shows the histograms of NPV obtained using each set of models.


Figure 20: Histogram of NPVs corresponding to i) traditional SISIM realizations (blue); ii) asymmetric linear loss functions (brown); and iii) parabolic loss functions (green).

The mean NPVs of the three sets of models are close to each other. However, the spread of NPV values is largest for the traditional SISIM realizations and lowest for the models using the parabolic loss functions. This is understandable since the estimates retained using the parabolic loss function tend to cluster around the mean. The variability observed in the original SISIM models is directly related to the width of the lcpdfs at the un-simulated locations; given the sparse conditioning data, the variability from one realization to the next is considerable.

Despite the lower uncertainty in NPV using the loss function models, there is still considerable spread in the NPV values. There are several reasons for that spread. First, the sub-domains were identified using grid-block pressure values, while the development of the loss functions was based on fluid production rates. It can be argued that grid-block pressure, being a smooth reservoir response, might not be truly representative of fluid displacement. Besides, the domains were identified on the basis of upscaled realizations. A second (and related) reason for the uncertainty in NPV is the large spread observed in the loss function plots (Figure 11). Since there is a large amount of spread within these plots, it is difficult to determine a function that accurately describes the relationship between estimation error and loss.

5.3.1 Sampling realizations within specific NPV ranges

In the traditional reservoir modeling workflow, several realizations are generated and then the range of uncertainty in NPV is computed. Statements about the extremes of the NPV distribution can only be made after the entire distribution has been computed, which is usually very time consuming. However, using appropriate loss functions, specifically by altering the impact of over- and underestimation, extremes of the global uncertainty distribution can be sampled directly, as shown below. The advantage of this method is that only three realizations need to be produced: a risk-neutral case, a risk-limiting (conservative) case, and a risk-seeking (aggressive) case.

To generate the risk-seeking and risk-averse cases, the weights associated with under- and overestimation have to be altered. If the loss associated with over-estimation is increased, the result will be optimal estimates of permeability in different regions of the reservoir that generally tend to be lower. This will lower water production and also slow down oil production; this is the risk-averse case. In contrast, if the loss associated with underestimation is increased, the resultant map will have higher permeability values. That will increase the oil production rate but also correspondingly increase the water production, giving a risk-seeking (aggressive) scenario. The realizations will still be optimal, but only with respect to the newly weighted loss functions. Table 4 shows the base loss functions assumed in the different regions (different from the earlier case) and the alterations made to sample the extremes of the NPV uncertainty distribution.

Table 4: Alterations made to the loss function to sample specific parts of the global NPV distribution.

  Region 1 loss function:
    Underestimation:  L(e) = 10 (x* - x)^2 + 158.606 (x* - x) + 114820
    Overestimation:   L(e) = 10 (x - x*)^2 + 208.849 (x - x*) + 114820
    Weighted under:   L(e) = 50 (x* - x)^2 + 158.606 (x* - x) + 114820
    Weighted over:    L(e) = 50 (x - x*)^2 + 208.849 (x - x*) + 114820

  Region 2 loss function:
    Underestimation:  L(e) = -0.00073 (x* - x)^2 + 0.7756 (x* - x) + 33.51
    Overestimation:   L(e) = 0.0022 (x - x*)^2 + 1.7048 (x - x*) + 33.51
    Weighted under:   L(e) = -0.00073 (x* - x)^2 + 10 (x* - x) + 33.51
    Weighted over:    L(e) = 0.0022 (x - x*)^2 + 10 (x - x*) + 33.51

  Region 3 loss function:
    Underestimation:  L(e) = -0.00830 (x* - x)^2 + 18.437 (x* - x) + 127209
    Overestimation:   L(e) = -0.01400 (x - x*)^2 + 25.508 (x - x*) + 127209
    Weighted under:   L(e) = -0.00830 (x* - x)^2 + 100 (x* - x) + 127209
    Weighted over:    L(e) = -0.01400 (x - x*)^2 + 100 (x - x*) + 127209

  Region 4 loss function:
    Underestimation:  L(e) = 0.00480 (x* - x)^2 + 0.08990 (x* - x) + 133157.5
    Overestimation:   L(e) = -0.00350 (x - x*)^2 + 6.18490 (x - x*) + 133157.5
    Weighted under:   L(e) = 0.00480 (x* - x)^2 + 10 (x* - x) + 133157.5
    Weighted over:    L(e) = -0.00350 (x - x*)^2 + 10 (x - x*) + 133157.5

Figure 21 shows a comparison of a typical conditioned SISIM realization versus a base case risk-neutral realization generated by implementing the unaltered loss functions in Table 4. There are a number of important features to notice about the optimized permeability. First, the permeability in region 4 has been significantly decreased: in order to create a risk-neutral realization for region 4, the permeability is decreased to limit water production. In contrast, the permeability in region 1 is increased to allow more fluid production. Next, the permeability in localized areas around wells P1 and I1 is increased, allowing more fluid production due to the increase in near-wellbore permeability. These assumed loss functions have some characteristics that are similar to the linear and parabolic models, but as can be seen, variations in the loss functions can greatly affect the permeability distribution within the delineated regions.

Figure 21: Base case loss function model (left). For comparison the permeability model for a typical SISIM model is shown (right).

As the under- and overestimation weights are altered, the regional permeability values adjust to the new loss functions. In the case where the loss function is adjusted to favor overestimation, the permeability in region 1 increases compared to the permeability in region 1 of the base case risk-neutral realization. This alteration increases water production and accelerates oil production. In contrast, the permeability in regions 2 and 4 decreases, causing the total field oil production to decrease. In region 3, the permeability around the water injector increases, but the permeability around well P1 decreases. The net result is a loss in field oil production and an increase in field water production. In the case where the loss function is such that under-estimation is preferred, the result is flipped: the permeability in region 1 is decreased, while the permeability in regions 2, 3, and 4 is increased. The result is an increase in NPV as a result of the increased oil production. Table 5 shows the risk-neutral NPV estimate along with those corresponding to the risk-averse and risk-seeking scenarios.

Figure 22: Permeability model corresponding to the case where the loss function penalizes over-estimation more (left). For comparison, the permeability model for the base case loss function is also shown (right).

Figure 23: Permeability model corresponding to the case where the loss function penalizes under-estimation more (left). For comparison, the permeability model for the base case is also shown (right).

Table 5: Global extreme estimations.

  Scenario                                           NPV ($MM)
  Heavily weighted overestimation (risk-averse)         70.31
  Base case example                                     95.40
  Heavily weighted underestimation (risk-seeking)      105.11

The results above are for individual realizations. The NPV values for the risk-averse and risk-seeking models can be compared to the distribution determined from the original 50-realization SISIM ensemble, which has a mean of 84.75 MM$ and a range from 67.8 to 105.0 MM$, a spread of 37.2 MM$. Figure 24 shows a histogram of the NPV for the original SISIM ensemble. Notice that the risk-seeking NPV estimate of 105.11 MM$ falls in the highest portion of the histogram, while the risk-averse estimate of 70.31 MM$ falls in the lowest portion of Figure 24.


Figure 24: Histogram of case study including global extremes.

The alterations of the loss functions thus do allow us to sample the extremes of

the NPV distribution without going through the cumbersome process of generating

several realizations, performing flow simulations and evaluating the entire range of NPV

values for a suite of models.

By implementing loss functions within a sequential framework, a risk-neutral realization is generated that honors the spatial correlation. Comparing the ensemble average created using loss functions with the normal SISIM ensemble average, it can be concluded that many of the features are preserved because the prior conditioning data are honored. However, there is more variability within the loss function ensemble average because there is some regulation in the estimation method, in contrast to Monte Carlo sampling from the kriged distribution. Finally, by manipulating the weights associated with the loss for over- and underestimation error, realizations with various risk attitudes can be generated in an effort to locate the extremes of the NPV uncertainty distribution.


Chapter 6: Conclusion

Sequential Gaussian and Indicator simulation (SGSIM and SISIM) have been

used in a number of applications to represent the spatial variability of natural phenomena

accurately and to assess uncertainty in global response corresponding to a transfer

function (flow simulation). SGSIM makes the assumption of working under multivariate

Gaussianity, while SISIM is a non-parametric approach to modeling the lcpdf. However,

in both these sequential approaches, simulated values are sampled at random from the

kriged lcpdf in order to populate a realization. The research presented in this thesis

modifies the sequential simulation approach by replacing Monte Carlo sampling of the

kriged distribution with a strategy to retain an optimal estimate by minimizing the

expected loss.

By delineating the reservoir into a set of unique sub-domains, individual loss functions can be developed for each identified sub-domain. This delineation through principal component analysis allows for modeling flexibility and better representation of reservoir heterogeneity. In this thesis, each loss function represents the economic loss

associated with permeability estimation error within a particular region.

The estimation error is with respect to an unknown “true” value and for that

reason is a random variable that shares the probability distribution of the attribute being

modeled. The objective of optimal estimation is therefore to retrieve a suitable value from

the lcpdf that takes into account the penalty or loss associated with under or over-

estimation. In other words, the optimal estimate corresponds to one that minimizes

expected loss.

In this thesis, both an asymmetric linear loss function model and a parabolic loss

function model are used to relate estimation error to economic loss. After generating two sets of realizations using both models, common characteristics were found in both sets of realizations. Both models estimated similar regions of high and low permeability values

that were consistent with the reference model used to sample the conditioning data.

However, some deviation does occur because of the different loss function models.

Some specific observations based on the results obtained are:

• Permeability in different regions of the reservoir influences the well responses (and hence economic NPV) differently; hence the need to incorporate domain decomposition.

• By applying a loss function, the optimized spatial distribution of permeability within a delineated domain is obtained. Instead of randomly sampling from a local conditional probability distribution (as in SGSIM or SISIM), an optimal local estimate of permeability at a location is obtained by minimizing the expected loss.

• The possible reasons for variability of loss functions within domains include: (1) the sub-domains were identified using grid-block pressure values, while the development of the loss functions was based on fluid production rates; (2) it can be argued that grid-block pressure, being a smooth reservoir response, might not be truly representative of production; (3) the domains were identified on the basis of upscaled realizations.

• The implementation of both the asymmetric linear and parabolic models causes the reservoir regions to reflect permeability values that are consistent, indicating that both models capture similar relationships between the estimation error and NPV loss.


• Many of the features in the optimized risk neutral realization are preserved

from the initial SISIM realizations because the final model must also honor

the conditioning data and the prescribed spatial covariance model.

• The imposition of a loss function introduces some discipline in selecting

values from the lcpdf that is reflected in all the realizations of the ensemble.

Therefore, there is more variability in the loss function ensemble average

compared to the original SISIM models.

• There is a lower range of NPV values observed because of the ordered sampling from the lcpdf. In the case of the parabolic loss function, the second-order dependency of loss on the estimation error implies that the loss is relatively insensitive to small errors. This causes the optimal estimates to cluster around the mean.

• If the loss associated with over-estimation is increased, the resulting optimal estimates of permeability in different regions of the reservoir will generally tend to be lower, creating a risk-averse reservoir model. In contrast, if the loss associated with underestimation is increased, the resultant map will have higher permeability values, creating a risk-seeking (aggressive) scenario.

The characteristics of the model obtained are a function of the economic model used for assessing NPV. If the economic model has a low water processing and purchasing price, optimal estimates of permeability in regions of large water production will be large, because water-handling cost is not a major contribution to the overall economic value of the project. In contrast, if water-handling costs are significant, then the optimal permeability in that region will be lower, thereby decreasing the water production. The variables that are significant to the overall project economics determine the regional permeability variations. It is to be emphasized that the permeability models are still data-

conditioned and reflect the correct spatial variability. It is only that the modified

simulation approach samples a subset of reservoir models from the global uncertainty

distribution that are more relevant from the standpoint of detailed economic analysis.

Not only can risk-neutral realizations (in terms of using realistic values for revenue and for handling produced water) be represented using the loss function; loss functions can also be modified to sample permeability models at the extremes of the NPV uncertainty distribution. By altering the loss function so as to increase the loss associated with over- or underestimation, realizations that cluster at high or low NPV values can be sampled. For example, if the loss function favors under-estimation (i.e. the loss associated with over-estimation is high), then lower permeability values will be simulated, which tends to delay water breakthrough and lower water handling cost. Oil production rates are also lower, but over time the ultimate recovery factor is about the same as in a case with higher permeability. This is confirmed by comparing the distribution endpoints generated from a base case SISIM ensemble to the values determined by altering the loss function to emphasize over- and underestimation (Figure 24).

The work in this thesis successfully shows that reservoir realizations exhibiting

correct spatial characteristics can be generated with the implementation of loss functions

within the sequential framework. The method yields reservoir models that sample a sub-

space of the full uncertainty space and can be used to probe development decisions

further.

Future work could involve the development of reservoir models using different functional forms of the loss function. A parabolic model was chosen to represent the loss function here; however, the loss might possibly be better represented by a third-order polynomial, a logarithmic function, or some alternative. In addition, this loss function optimization scheme could be translated to Monte Carlo simulation, in which the estimation error in the control variable (i.e. the Monte Carlo simulated values) would be mapped to the loss in the response variable. There are many applications where loss function optimization could be advantageous.


Appendix A: PCA Example

Table 6: Original data and adjusted data set for PCA example.

  X      Y      Xadj    Yadj
  2.5    2.4     0.7     0.5
  0.5    0.7    -1.3    -1.2
  2.2    2.9     0.4     1.0
  1.9    2.2     0.1     0.3
  3.1    3.0     1.3     1.1
  2.3    2.7     0.5     0.8
  2.0    1.6     0.2    -0.3
  1.0    1.1    -0.8    -0.8
  1.5    1.6    -0.3    -0.3
  1.1    0.9    -0.7    -1.0

Step 1: The original data X and Y are obtained and adjusted by subtracting their

respective means

Step 2: The covariances are calculated:

$$\operatorname{cov}(X,Y) = \operatorname{cov}(X_{adj},Y_{adj}) = \frac{\sum_{i=1}^{n} X_{adj,i}\, Y_{adj,i}}{n-1} = 0.615$$

$$\operatorname{cov}(X,X) = \frac{\sum_{i=1}^{n} X_{adj,i}^2}{n-1} = \operatorname{var}(X_{adj}) = 0.617; \quad \text{similarly,} \quad \operatorname{cov}(Y,Y) = \frac{\sum_{i=1}^{n} Y_{adj,i}^2}{n-1} = \operatorname{var}(Y_{adj}) = 0.717$$

Step 3: Build the covariance matrix

$$C_{n \times n} = \left(c_{i,j} = \operatorname{cov}(Dim_i, Dim_j)\right) = \begin{pmatrix} \operatorname{cov}(X_{adj},X_{adj}) & \operatorname{cov}(X_{adj},Y_{adj}) \\ \operatorname{cov}(Y_{adj},X_{adj}) & \operatorname{cov}(Y_{adj},Y_{adj}) \end{pmatrix} = \begin{pmatrix} 0.617 & 0.615 \\ 0.615 & 0.717 \end{pmatrix}$$

Step 4: Calculate the eigenvalues

$$Ax = \lambda x \;\Rightarrow\; (A - \lambda I)x = 0, \qquad \det(A - \lambda I) = 0$$

$$\det\begin{pmatrix} 0.617-\lambda & 0.615 \\ 0.615 & 0.717-\lambda \end{pmatrix} = (0.617-\lambda)(0.717-\lambda) - 0.615^2 = 0$$

$$\text{Eigenvalues:}\quad \lambda_1 = 0.049, \qquad \lambda_2 = 1.284$$

Substituting each eigenvalue back into $(A - \lambda I)x = 0$ and normalizing gives the eigenvectors (one per column, in the order $\lambda_1$, $\lambda_2$):

$$\text{Eigenvectors} = \begin{pmatrix} -0.735 & -0.678 \\ 0.678 & -0.735 \end{pmatrix}$$

Step 5 and Step 6: Rank the eigenvalues based on magnitude: λ2 > λ1 because 1.284 > 0.049. Only the larger eigenvalue will be considered for this exercise.

Step 7: Recreate the data set using the eigenvector attached to the chosen eigenvalue.

$$\text{FeatureVector} = \left(eig_1 \;\; eig_2 \;\; \ldots \;\; eig_n\right) = \begin{pmatrix} -0.678 \\ -0.735 \end{pmatrix}$$

$$\text{FinalData} = \text{FeatureVector}^T \times \text{DataAdjust}^T$$

$$\text{DataAdjust}^T = \begin{pmatrix} 0.69 & -1.31 & 0.39 & 0.09 & 1.29 & 0.49 & 0.19 & -0.81 & -0.31 & -0.71 \\ 0.49 & -1.21 & 0.99 & 0.29 & 1.09 & 0.79 & -0.31 & -0.81 & -0.31 & -1.01 \end{pmatrix}$$

$$\text{FinalData} = \left(-0.83 \;\; 1.78 \;\; -0.99 \;\; -0.27 \;\; -1.68 \;\; -0.91 \;\; 0.10 \;\; 1.14 \;\; 0.44 \;\; 1.22\right)$$
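The example above can be verified with a short program. This sketch computes the covariance matrix, its eigen-decomposition in closed form (a 2×2 symmetric matrix needs no library), and the final transformed data; the eigenvector sign is arbitrary, so the scores appear with the opposite sign to the values above:

      program pca_example
        implicit none
        real :: x(10) = [2.5,0.5,2.2,1.9,3.1,2.3,2.0,1.0,1.5,1.1]
        real :: y(10) = [2.4,0.7,2.9,2.2,3.0,2.7,1.6,1.1,1.6,0.9]
        real :: xa(10), ya(10), cxx, cyy, cxy, tr, dt, l1, l2, v1, v2, nrm
        xa = x - sum(x)/10.0;  ya = y - sum(y)/10.0    ! Step 1: subtract the means
        cxx = sum(xa*xa)/9.0;  cyy = sum(ya*ya)/9.0    ! Steps 2-3: covariance matrix
        cxy = sum(xa*ya)/9.0
        tr = cxx + cyy;  dt = cxx*cyy - cxy*cxy        ! Step 4: eigenvalues of the 2x2
        l1 = 0.5*(tr - sqrt(tr*tr - 4.0*dt))           ! 0.049
        l2 = 0.5*(tr + sqrt(tr*tr - 4.0*dt))           ! 1.284 (retained, Steps 5-6)
        v1 = cxy;  v2 = l2 - cxx                       ! eigenvector of the larger eigenvalue
        nrm = sqrt(v1*v1 + v2*v2);  v1 = v1/nrm;  v2 = v2/nrm
        print '(a,2f8.3)',  ' eigenvalues: ', l1, l2
        print '(a,10f7.2)', ' PC1 scores : ', v1*xa + v2*ya    ! Step 7: final data
      end program pca_example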


Appendix B: Analytical Solution for Optimized Loss Function

Derivation for a loss function containing quadratic, linear, and constant error terms.

Loss function generalized form:

$$E\{L(e)\} = \int_{-\infty}^{\infty} L(X^* - x)\, f_X(x)\, dx$$

For under- and overestimation, including the quadratic, linear, and constant error terms, the expected loss becomes:

$$E\{L(e)\} = \int_{-\infty}^{X^*}\left[a(X^*-x)^2 + b(X^*-x) + e\right] dF_X(x) + \int_{X^*}^{\infty}\left[a'(x-X^*)^2 + b'(x-X^*) + e'\right] dF_X(x)$$

For convenience, each of the six integrals is defined below:

$$\begin{aligned}
E\{L(e_1)\} &= a\int_{-\infty}^{X^*}(X^*-x)^2\, dF_X(x), \qquad & E\{L(e_4)\} &= a'\int_{X^*}^{\infty}(x-X^*)^2\, dF_X(x)\\
E\{L(e_2)\} &= b\int_{-\infty}^{X^*}(X^*-x)\, dF_X(x), \qquad & E\{L(e_5)\} &= b'\int_{X^*}^{\infty}(x-X^*)\, dF_X(x)\\
E\{L(e_3)\} &= e\int_{-\infty}^{X^*} dF_X(x), \qquad & E\{L(e_6)\} &= e'\int_{X^*}^{\infty} dF_X(x)
\end{aligned}$$

Evaluating the $e_1$ term:

$$E\{L(e_1)\} = a\int_{-\infty}^{X^*}(X^*-x)^2\, dF_X(x) = a(X^*)^2 F_X(X^*) - 2aX^*\int_{-\infty}^{X^*} x\, dF_X(x) + a\int_{-\infty}^{X^*} x^2\, dF_X(x)$$

Now taking the derivative with respect to $X^*$ (the boundary terms produced by the variable limit cancel) and simplifying:

$$\frac{d}{dX^*} E\{L(e_1)\} = 2aX^* F_X(X^*) - 2a\int_{-\infty}^{X^*} x\, dF_X(x)$$

Now evaluating the $e_4$ term and simplifying:

$$E\{L(e_4)\} = a'\int_{X^*}^{\infty}(x-X^*)^2\, dF_X(x) = a'\int_{X^*}^{\infty} x^2\, dF_X(x) - 2a'X^*\int_{X^*}^{\infty} x\, dF_X(x) + a'(X^*)^2\left[1 - F_X(X^*)\right]$$

Taking the derivative and simplifying:

$$\frac{d}{dX^*} E\{L(e_4)\} = 2a'X^*\left[1 - F_X(X^*)\right] - 2a'\int_{X^*}^{\infty} x\, dF_X(x)$$

Now combining $\frac{d}{dX^*}E\{L(e_1)\}$ and $\frac{d}{dX^*}E\{L(e_4)\}$, and writing $\mu_{X<} = \int_{-\infty}^{X^*} x\, dF_X(x)$ and $\mu_{X>} = \int_{X^*}^{\infty} x\, dF_X(x)$ for the partial means below and above $X^*$:

$$\frac{d}{dX^*} E\{L(e_1+e_4)\} = 2X^* F_X(X^*)\left[a-a'\right] + 2a'X^* - 2a\,\mu_{X<} - 2a'\,\mu_{X>}$$

Setting this derivative to zero would give the optimal estimate for the purely quadratic part of the loss.

The same process is used for the linear terms. Evaluating the $e_2$ term:

$$E\{L(e_2)\} = b\int_{-\infty}^{X^*}(X^*-x)\, dF_X(x) = bX^* F_X(X^*) - b\int_{-\infty}^{X^*} x\, dF_X(x)$$

Now taking the derivative and simplifying:

$$\frac{d}{dX^*} E\{L(e_2)\} = b\, F_X(X^*)$$

Evaluating $e_5$:

$$E\{L(e_5)\} = b'\int_{X^*}^{\infty}(x-X^*)\, dF_X(x) = b'\int_{X^*}^{\infty} x\, dF_X(x) - b'X^*\left[1 - F_X(X^*)\right]$$

Now taking the derivative with respect to $X^*$ and simplifying:

$$\frac{d}{dX^*} E\{L(e_5)\} = -b'\left[1 - F_X(X^*)\right]$$

Combining $\frac{d}{dX^*}E\{L(e_2)\}$ and $\frac{d}{dX^*}E\{L(e_5)\}$ and simplifying:

$$\frac{d}{dX^*} E\{L(e_2+e_5)\} = F_X(X^*)\left[b+b'\right] - b'$$

Evaluating $e_3$:

$$E\{L(e_3)\} = e\int_{-\infty}^{X^*} dF_X(x) = e\, F_X(X^*), \qquad \frac{d}{dX^*} E\{L(e_3)\} = e\, f_X(X^*)$$

Evaluating $e_6$:

$$E\{L(e_6)\} = e'\int_{X^*}^{\infty} dF_X(x) = e'\left[1 - F_X(X^*)\right], \qquad \frac{d}{dX^*} E\{L(e_6)\} = -e'\, f_X(X^*)$$

Combining the constant terms:

$$\frac{d}{dX^*} E\{L(e_3+e_6)\} = f_X(X^*)\left[e-e'\right]$$

For the final solution, all of the simplified derivative terms are combined:

$$\frac{d}{dX^*} E\{L(e)\} = 2X^* F_X(X^*)\left[a-a'\right] + 2a'X^* - 2a\,\mu_{X<} - 2a'\,\mu_{X>} + F_X(X^*)\left[b+b'\right] - b' + f_X(X^*)\left[e-e'\right]$$

The value of $X^*$ that makes this expression equal to zero is the optimal estimate.


Appendix C: Modification to SISIM Code

Modification to reading input parameters:

      read(lin,*,err=98) ivtype
      write(*,*) ' variable type (1=continuous, 0=categorical)= ',ivtype
      read(lin,*,err=98) ncut
      write(*,*) ' number of thresholds / categories = ',ncut
      if(ncut.gt.MAXCUT) stop 'ncut is too big - modify .inc file'
      read(lin,*,err=98) (thres(i),i=1,ncut)
      write(*,*) ' thresholds / categories = ',(thres(i),i=1,ncut)
      read(lin,*,err=98) (cdf(i),i=1,ncut)
      write(*,*) ' global cdf / pdf = ',(cdf(i),i=1,ncut)
      read(lin,'(a40)',err=98) datafl
      call chknam(datafl,40)
      write(*,*) ' data file = ',datafl
      read(lin,*,err=98) ixl,iyl,izl,ivrl
      write(*,*) ' input columns = ',ixl,iyl,izl,ivrl
      read(lin,'(a40)',err=98) softfl
      call chknam(softfl,40)
      write(*,*) ' soft data file = ',softfl
      inquire(file=softfl,exist=testfl)
      if(testfl) then
            read(lin,*,err=98) ixs,iys,izs,(ivrs(i),i=1,ncut)
            write(*,*) ' columns = ',ixs,iys,izs,(ivrs(i),i=1,ncut)
            read(lin,*,err=98) imbsim
            write(*,*) ' Markov-Bayes simulation = ',imbsim
            if(imbsim.eq.1) then
                  read(lin,*,err=98) (beez(i),i=1,ncut)
            else
                  read(lin,*,err=98)
            end if
      else
            read(lin,*,err=98)
            read(lin,*,err=98)
            read(lin,*,err=98)
      end if
      read(lin,*,err=98) tmin,tmax
      write(*,*) ' trimming limits ',tmin,tmax
      read(lin,*,err=98) zmin,zmax
      write(*,*) ' data limits (tails) ',zmin,zmax
      read(lin,*,err=98) ltail,ltpar
      write(*,*) ' lower tail = ',ltail,ltpar
      read(lin,*,err=98) middle,mpar
      write(*,*) ' middle = ',middle,mpar
      read(lin,*,err=98) utail,utpar
      write(*,*) ' upper tail = ',utail,utpar
      read(lin,'(a40)',err=98) tabfl
      call chknam(tabfl,40)
      write(*,*) ' file for tab. quant. ',tabfl
      read(lin,*,err=98) itabvr,itabwt
      write(*,*) ' columns for vr wt = ',itabvr,itabwt
      read(lin,*,err=98) idbg
      write(*,*) ' debugging level = ',idbg
      read(lin,'(a40)',err=98) dbgfl
      call chknam(dbgfl,40)
      write(*,*) ' debugging file = ',dbgfl
      read(lin,'(a40)',err=98) outfl
      call chknam(outfl,40)
      write(*,*) ' output file = ',outfl
      read(lin,*,err=98) nsim
      write(*,*) ' number of simulations = ',nsim
      read(lin,*,err=98) nx,xmn,xsiz
      write(*,*) ' X grid specification = ',nx,xmn,xsiz
      read(lin,*,err=98) ny,ymn,ysiz
      write(*,*) ' Y grid specification = ',ny,ymn,ysiz
      read(lin,*,err=98) nz,zmn,zsiz
      write(*,*) ' Z grid specification = ',nz,zmn,zsiz
      nxy  = nx*ny
      nxyz = nx*ny*nz
      read(lin,*,err=98) ixv(1)
      write(*,*) ' random number seed = ',ixv(1)
      do i=1,1000
            p = acorni(idum)
      end do
      read(lin,*,err=98) ndmax
      write(*,*) ' ndmax = ',ndmax
      read(lin,*,err=98) nodmax
      write(*,*) ' max prev sim nodes = ',nodmax
      read(lin,*,err=98) maxsec
      write(*,*) ' max soft indicator data = ',maxsec
      read(lin,*,err=98) sstrat
      write(*,*) ' search strategy = ',sstrat
      read(lin,*,err=98) mults,nmult
      write(*,*) ' multiple grid search flag = ',mults,nmult
      read(lin,*,err=98) noct
      write(*,*) ' max per octant = ',noct
      read(lin,*,err=98) radius,radius1,radius2
      write(*,*) ' search radii = ',radius,radius1,radius2
      if(radius.lt.EPSLON) stop 'radius must be greater than zero'
      radsqd = radius * radius
      sanis1 = radius1 / radius
      sanis2 = radius2 / radius
      read(lin,*,err=98) sang1,sang2,sang3
      write(*,*) ' search anisotropy angles = ',sang1,sang2,sang3
      read(lin,*,err=98) mik,cutmik
      write(*,*) ' median IK switch = ',mik,cutmik
      read(lin,*,err=98) ktype
      write(*,*) ' kriging type switch = ',ktype
c
c Output now goes to debugging file:
c
      open(ldbg,file=dbgfl,status='UNKNOWN')
      do i=1,ncut
            read(lin,*,err=98) nst(i),c0(i)
            if(ivtype.eq.0)
     +      write(ldbg,100) i,thres(i),cdf(i),nst(i),c0(i)
            if(ivtype.eq.1)
     +      write(ldbg,101) i,thres(i),cdf(i),nst(i),c0(i)
            if(nst(i).gt.MAXNST) stop 'nst is too big'
            istart = 1 + (i-1)*MAXNST
            do j=1,nst(i)
                  index = istart + j - 1
                  read(lin,*,err=98) it(index),cc(index),ang1(index),
     +                 ang2(index),ang3(index)
                  if(it(index).eq.3) STOP 'Gaussian Model Not Allowed!'
                  read(lin,*,err=98) aa(index),aa1,aa2
                  write(ldbg,102) j,it(index),aa(index),cc(index)
                  anis1(index) = aa1 / max(EPSLON,aa(index))
                  anis2(index) = aa2 / max(EPSLON,aa(index))
                  write(ldbg,103) ang1(index),ang2(index),ang3(index),
     +                 anis1(index),anis2(index)
            end do
      end do
c

• • •

Here the code is modified to read in the number of regions, the region (domain) indices, and the loss function parameters for each region; an example of the corresponding parameter-file block follows the listing.

c
c Modified 11/21/2008 - Reading in Loss Functions Parameters
c
      read(lin,*) nreg
      do ireg = 1, nreg
            read(lin,*) iid(ireg)
            read(lin,*) (iformunder(il,ireg),il=1,3)
            read(lin,*) (iformover(il,ireg),il=1,3)
            read(lin,*) (coeffover(il,ireg),il=1,3)
            read(lin,*) (coeffunder(il,ireg),il=1,3)
      end do
      read(lin,'(a40)') fnamereg
      close(lin)

 100  format(/,' Category  number ',i2,' = ',f12.3,/,
     +         ' global prob value = ',f8.4,/,
     +         ' number of structures = ',i3,/,
     +         ' nugget effect = ',f8.4)
 101  format(/,' Threshold number ',i2,' = ',f12.3,/,
     +         ' global prob value = ',f8.4,/,
     +         ' number of structures = ',i3,/,
     +         ' nugget effect = ',f8.4)
 102  format(  ' type of structure ',i3,' = ',i3,/,
     +         ' aa parameter = ',f12.4,/,
     +         ' cc parameter = ',f12.4)
 103  format(  ' ang1, ang2, ang3 = ',3f6.2,/,
     +         ' anis1, anis2 = ',2f12.4)
c
c Perform some quick error checking:
c
      if(nx.gt.MAXX) stop 'nx is too big - modify .inc file'
      if(ny.gt.MAXY) stop 'ny is too big - modify .inc file'
      if(nz.gt.MAXZ) stop 'nz is too big - modify .inc file'

• • •
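To make the read sequence concrete, the block appended to the sisim parameter file for a single region might look as follows (all values and the file name regions.dat are hypothetical; the layout simply mirrors the read statements above):

    1                        \ nreg: number of loss-function regions
    1                        \ iid: index of region 1
    1 1 0                    \ iformunder: quadratic/linear/constant flags
    1 1 0                    \ iformover: quadratic/linear/constant flags
    0.5   2.0   0.0          \ coeffover: overestimation coefficients
    1.0   4.0   0.0          \ coeffunder: underestimation coefficients
    regions.dat              \ fnamereg: region index for each grid node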

Before the main kriging loop, the coefficients of the loss functions are written to an intermediate file and the region index for each grid node is read in; an example of the resulting file contents follows the listing.

c
c CALLED FunctionInputs.txt
c
      open(23,file='FunctionInputs.txt',status='unknown')
c
c Trying to write a ccdf debug file for a particular region
c
      open(33,file='TempDebug.txt',status='unknown')
      do ireg=1,nreg
            write(23,*)   iid(ireg)
            write(23,*)   (iformover(il,ireg), il=1,3)
            write(23,*)   (iformunder(il,ireg), il=1,3)
            write(23,131) (coeffover(il,ireg), il=1,3)
            write(23,131) (coeffunder(il,ireg), il=1,3)
      end do
 131  format(3(f12.5,1x))
      close(23)
      open(23,file=fnamereg,status='old')
      do i=1,3
            read(23,*)
      end do
      do ixyz=1,nxyz
            read(23,*) iregind(ixyz)
      end do

• • •
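With the hypothetical single-region parameters shown earlier, FunctionInputs.txt would contain one five-line block per region, written in the order region index, over-form flags, under-form flags, overestimation coefficients, underestimation coefficients (the last two in the 3(f12.5,1x) format):

    1
    1 1 0
    1 1 0
         0.50000      2.00000      0.00000
         1.00000      4.00000      0.00000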

This part of the code is modified so that at nodes outside the designated regions the simulated value is drawn randomly from the indicator-kriged local distribution in the usual way, while at nodes within the sub-domains the simulated value is retrieved by minimizing the prescribed loss function; a sketch of the file-based exchange with the external optimizer follows the listing.

c
c Use the global distribution?
c
      if((nclose+ncnode).le.0) then
            call beyond(ivtype,ncut,thres,cdf,ng,gcut,gcdf,
     +                  zmin,zmax,ltail,ltpar,middle,mpar,
     +                  utail,utpar,zval,cdfval,ierr)
      else
c
c Estimate the local distribution by indicator kriging:
c
            do ic=1,ncut
                  call krige(ix,iy,iz,xx,yy,zz,ic,cdf(ic),
     +                       ccdf(ic))
            end do
c
c Correct order relations:
c
            call ordrel(ivtype,ncut,ccdf,ccdfo,nviol,aviol,
     +                  xviol)
c           call lossfunc(ivtype,ncut,ccdf,3,lundertype,coeffunder,
c    +                    lovertype,coeffover,zval)
c
c Draw from the local distribution:
c
            if(iregind(index).le.UNEST) then
                  call beyond(ivtype,ncut,thres,ccdfo,ng,gcut,
     +                        gcdf,zmin,zmax,ltail,ltpar,middle,
     +                        mpar,utail,utpar,zval,cdfval,ierr)
c
c Write some debugging information:
c
c Changes to accommodate loss function calculations.  First write
c the current cdf information to a file - Loss Inputs.txt
c
            else
                  do ic=1,ncut-1
                        if((ccdfo(ic+1)-ccdfo(ic)).eq.1.0) then
                              call beyond(ivtype,ncut,thres,ccdfo,
     +                             ng,gcut,gcdf,zmin,zmax,ltail,
     +                             ltpar,middle,mpar,utail,utpar,
     +                             zval,cdfval,ierr)
                              go to 323
                        endif
                  end do
                  open(23,file='Loss Inputs.txt',status='unknown')
                  write(23,*)   ncut
                  write(23,*)   iregind(index)
                  write(23,232) zmin, zmax
                  write(23,231) (thres(ic),ic=1,ncut)
                  write(23,231) (ccdfo(ic),ic=1,ncut)
                  close(23)
 232              format(2(f8.4,1x))
 231              format(5(f12.5))
                  JJ = system("LossOpt12")

• • •
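The link between the modified sisim and the external optimizer LossOpt12 is a simple file-based handshake: the Fortran writes the node's local ccdf, shells out, and reads a single value back. A minimal Python sketch of the equivalent logic (file names as in the listing above; the function itself is illustrative, not part of the thesis programs):

    import subprocess

    def loss_optimal_value(ncut, region, zmin, zmax, thres, ccdfo):
        # Write the node's corrected ccdf for the optimizer to read
        with open("Loss Inputs.txt", "w") as f:
            f.write(f"{ncut}\n{region}\n{zmin:8.4f} {zmax:8.4f}\n")
            f.write("".join(f"{t:12.5f}" for t in thres) + "\n")
            f.write("".join(f"{p:12.5f}" for p in ccdfo) + "\n")
        subprocess.run("LossOpt12")            # external optimizer (Appendices D and E)
        with open("OptOut.txt") as f:          # one optimal value comes back
            return float(f.read())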

Here the optimal value computed by the loss function program is read back from an intermediate file and combined with the randomly drawn values to generate the realization.

c
c Code reads the optimal value back from an intermediate file - OptOut.txt
c
                  open(23,file='OptOut.txt',status='old')
                  read(23,*) zval
                  close(23)
                  if(iregind(index).eq.1) then
                        write(33,*) (ccdfo(ic),ic=1,ncut)
                  endif
            endif
 323        if(idbg.ge.3) then
                  do ic=1,ncut
                        write(ldbg,202) ccdf(ic),ccdfo(ic)
 202                    format(' CDF (original and fixed)',2f7.4)
                  end do
            endif
      endif
      sim(index) = zval
c
c END MAIN LOOP OVER NODES:
c


Appendix D: Code Implementing the Analytical Solution for the Optimized Loss Function

Private Sub Main()
'These are the variables that sisim has calculated in the original program
'These are read in from a text file
ReDim ccdfvalue(1, 20) As Variant
ReDim ccdfprob(1, 20) As Variant
Dim sisimregindex, ncuts As Integer
Dim ccdfvalueat1, ccdfvalue0, seed As Double

'These are loss function parameters
'Region 1
Dim coeffover1reg1, coeffover2reg1, coeffover3reg1 As Single
Dim coeffunder1reg1, coeffunder2reg1, coeffunder3reg1 As Single
Dim quadoverreg1, lineoverreg1, constoverreg1 As Integer
Dim quadunderreg1, lineunderreg1, constunderreg1 As Integer
'Region 2
Dim coeffover1reg2, coeffover2reg2, coeffover3reg2 As Single
Dim coeffunder1reg2, coeffunder2reg2, coeffunder3reg2 As Single
Dim quadoverreg2, lineoverreg2, constoverreg2 As Integer
Dim quadunderreg2, lineunderreg2, constunderreg2 As Integer
'Region 3
Dim coeffover1reg3, coeffover2reg3, coeffover3reg3 As Single
Dim coeffunder1reg3, coeffunder2reg3, coeffunder3reg3 As Single
Dim quadoverreg3, lineoverreg3, constoverreg3 As Integer
Dim quadunderreg3, lineunderreg3, constunderreg3 As Integer
'Region 4
Dim coeffover1reg4, coeffover2reg4, coeffover3reg4 As Single
Dim coeffunder1reg4, coeffunder2reg4, coeffunder3reg4 As Single
Dim quadoverreg4, lineoverreg4, constoverreg4 As Integer
Dim quadunderreg4, lineunderreg4, constunderreg4 As Integer
Dim indexreg1, indexreg2, indexreg3, indexreg4 As Integer
Dim coeffover1gen, coeffover2gen, coeffover3gen, coeffover4gen As Double
Dim coeffunder1gen, coeffunder2gen, coeffunder3gen, coeffunder4gen As Double
Dim muylower, muyhigher, muy As Single

'These are the variables needed for the internal program
Dim pxy0, pxytail As Single
ReDim Fxyslope(20) As Variant
'These are temporary variables used in computing
ReDim pxyarray(20), product(20), pxytemp(20), ccdfvaluetemp(20), guess(20) As Variant
Dim Fxy, pxy, sumproduct As Double
'This is going to be zero because of how probability is defined as a horizontal line.
Dim pxyprime As Single
Dim Yinitial, epsilon, Nmax, sum, area, upperthresh As Single
Dim Jprime, Ynew, Yoptimal As Double
Dim Jfunc(10000), Y(10000) As Variant
Dim i, J, n, g, Iterations, index As Single 'These are counters
'These are variables used for I/O of data
Dim LossFile, FunctionInputs, OptOut, JFuncOut As String
'Another attempt at finding the solution has new variables
Dim TrueProb, TruePerm, JfuncMin As Single

'------------------------------------------------------------------------------------------
'Reading in the data from the two files. I also declare the name of the output file
LossFile = "Loss Inputs.txt"
OptOut = "OptOut.txt"
FunctionInputs = "FunctionInputs.txt"
JFuncOut = "JFuncOut.txt"

Open LossFile For Input As #1 ' Open file for input.
' Opens output file outside of main code, so it doesn't have to keep reopening
'The data being read in is the ccdf values, associated probabilities, and regional index
i = 0
Do While Not EOF(1) ' Loop until end of file.
    i = i + 1
    If i = 1 Then
        Input #1, ncuts
    ElseIf i = 2 Then
        Input #1, sisimregindex
    ElseIf i = 3 Then
        Input #1, ccdfvalue0, ccdfvalueat1
    ElseIf i = 4 Then
        Do While J < ncuts
            Input #1, ccdfvalue(1, J + 1)
            J = J + 1
        Loop
    ElseIf i = 5 Then
        Do While n < ncuts
            Input #1, ccdfprob(1, n + 1)
            n = n + 1
        Loop
    ElseIf i = 6 Then
        Input #1, seed
    End If
Loop
Close #1 ' Close file.

Open FunctionInputs For Input As #2
'Reading in constants and indicators for the loss function for the regions
i = 0
Do While Not EOF(2)
    i = i + 1
    If i = 1 Then
        Input #2, indexreg1
    ElseIf i = 2 Then
        Input #2, quadoverreg1, lineoverreg1, constoverreg1
    ElseIf i = 3 Then
        Input #2, quadunderreg1, lineunderreg1, constunderreg1
    ElseIf i = 4 Then
        Input #2, coeffover1reg1, coeffover2reg1, coeffover3reg1
    ElseIf i = 5 Then
        Input #2, coeffunder1reg1, coeffunder2reg1, coeffunder3reg1
    ElseIf i = 6 Then
        Input #2, indexreg2
    ElseIf i = 7 Then
        Input #2, quadoverreg2, lineoverreg2, constoverreg2
    ElseIf i = 8 Then
        Input #2, quadunderreg2, lineunderreg2, constunderreg2
    ElseIf i = 9 Then
        Input #2, coeffover1reg2, coeffover2reg2, coeffover3reg2
    ElseIf i = 10 Then
        Input #2, coeffunder1reg2, coeffunder2reg2, coeffunder3reg2
    ElseIf i = 11 Then
        Input #2, indexreg3
    ElseIf i = 12 Then
        Input #2, quadoverreg3, lineoverreg3, constoverreg3
    ElseIf i = 13 Then
        Input #2, quadunderreg3, lineunderreg3, constunderreg3
    ElseIf i = 14 Then
        Input #2, coeffover1reg3, coeffover2reg3, coeffover3reg3
    ElseIf i = 15 Then
        Input #2, coeffunder1reg3, coeffunder2reg3, coeffunder3reg3
    ElseIf i = 16 Then
        Input #2, indexreg4
    ElseIf i = 17 Then
        Input #2, quadoverreg4, lineoverreg4, constoverreg4
    ElseIf i = 18 Then
        Input #2, quadunderreg4, lineunderreg4, constunderreg4
    ElseIf i = 19 Then
        Input #2, coeffover1reg4, coeffover2reg4, coeffover3reg4
    ElseIf i = 20 Then
        Input #2, coeffunder1reg4, coeffunder2reg4, coeffunder3reg4
    End If
Loop
Close #2

'-----------------------------------------------------------------------
'Assigning coefficients based on the region index read from sisim
'This part of the code also checks to see whether each coefficient is considered
'Check for region 1 coefficients
If sisimregindex = indexreg1 Then
    If quadoverreg1 = 0 Then
        coeffover1gen = 0
    Else
        coeffover1gen = coeffover1reg1
    End If
    If lineoverreg1 = 0 Then
        coeffover2gen = 0
    Else
        coeffover2gen = coeffover2reg1
    End If
    If constoverreg1 = 0 Then
        coeffover3gen = 0
    Else
        coeffover3gen = coeffover3reg1
    End If
    If quadunderreg1 = 0 Then
        coeffunder1gen = 0
    Else
        coeffunder1gen = coeffunder1reg1
    End If
    If lineunderreg1 = 0 Then
        coeffunder2gen = 0
    Else
        coeffunder2gen = coeffunder2reg1
    End If
    If constunderreg1 = 0 Then
        coeffunder3gen = 0
    Else
        coeffunder3gen = coeffunder3reg1
    End If
End If
'Check for region 2 coefficients
If sisimregindex = indexreg2 Then
    If quadoverreg2 = 0 Then
        coeffover1gen = 0
    Else
        coeffover1gen = coeffover1reg2
    End If
    If lineoverreg2 = 0 Then
        coeffover2gen = 0
    Else
        coeffover2gen = coeffover2reg2
    End If
    If constoverreg2 = 0 Then
        coeffover3gen = 0
    Else
        coeffover3gen = coeffover3reg2
    End If
    If quadunderreg2 = 0 Then
        coeffunder1gen = 0
    Else
        coeffunder1gen = coeffunder1reg2
    End If
    If lineunderreg2 = 0 Then
        coeffunder2gen = 0
    Else
        coeffunder2gen = coeffunder2reg2
    End If
    If constunderreg2 = 0 Then
        coeffunder3gen = 0
    Else
        coeffunder3gen = coeffunder3reg2
    End If
End If
'Check for region 3 coefficients
If sisimregindex = indexreg3 Then
    If quadoverreg3 = 0 Then
        coeffover1gen = 0
    Else
        coeffover1gen = coeffover1reg3
    End If
    If lineoverreg3 = 0 Then
        coeffover2gen = 0
    Else
        coeffover2gen = coeffover2reg3
    End If
    If constoverreg3 = 0 Then
        coeffover3gen = 0
    Else
        coeffover3gen = coeffover3reg3
    End If
    If quadunderreg3 = 0 Then
        coeffunder1gen = 0
    Else
        coeffunder1gen = coeffunder1reg3
    End If
    If lineunderreg3 = 0 Then
        coeffunder2gen = 0
    Else
        coeffunder2gen = coeffunder2reg3
    End If
    If constunderreg3 = 0 Then
        coeffunder3gen = 0
    Else
        coeffunder3gen = coeffunder3reg3
    End If
End If
'Check for region 4 coefficients
If sisimregindex = indexreg4 Then
    If quadoverreg4 = 0 Then
        coeffover1gen = 0
    Else
        coeffover1gen = coeffover1reg4
    End If
    If lineoverreg4 = 0 Then
        coeffover2gen = 0
    Else
        coeffover2gen = coeffover2reg4
    End If
    If constoverreg4 = 0 Then
        coeffover3gen = 0
    Else
        coeffover3gen = coeffover3reg4
    End If
    If quadunderreg4 = 0 Then
        coeffunder1gen = 0
    Else
        coeffunder1gen = coeffunder1reg4
    End If
    If lineunderreg4 = 0 Then
        coeffunder2gen = 0
    Else
        coeffunder2gen = coeffunder2reg4
    End If
    If constunderreg4 = 0 Then
        coeffunder3gen = 0
    Else
        coeffunder3gen = coeffunder3reg4
    End If
End If

'-----------------------------------------------------------------------------------
Nmax = 32000
'-----------------------------------------------------------------------------------
'Making sure the lcdf monotonically increases
i = 1
Line1:
If ccdfprob(1, 1) = 0 Then
    For i = 1 To ncuts
        ccdfprob(1, i) = ccdfprob(1, i) + 0.001
    Next i
    GoTo Line1
Else
    i = 1
    For i = 1 To ncuts
        If ccdfprob(1, i) = ccdfprob(1, i + 1) Then
            J = i
            For J = i To ncuts
                ccdfprob(1, J + 1) = ccdfprob(1, J + 1) + 0.001
            Next J
        End If
    Next i
End If
ccdfprob(1, ncuts + 1) = ccdfprob(1, 20)
i = 1
For i = 1 To ncuts
    Debug.Print ccdfprob(1, i)
Next i

muy = 0
sum = 0
For i = 1 To ncuts - 1
    muy = (ccdfvalue(1, i) + ccdfvalue(1, i + 1)) / 2 * (ccdfprob(1, i + 1) - ccdfprob(1, i))
    sum = sum + muy
Next i
muy = sum + (ccdfvalue0 + ccdfvalue(1, 1)) / 2 * ccdfprob(1, 1) _
    + (ccdfvalueat1 + ccdfvalue(1, ncuts)) / 2 * (1 - ccdfprob(1, ncuts))

'-------------------------------------------------------------------------------------------------
'This just checks to see if probabilities are being calculated correctly
J = 1
sum = 0
Do While J < ncuts
    pxyarray(J) = ccdfprob(1, J + 1) - ccdfprob(1, J)
    sum = sum + pxyarray(J)
    J = J + 1
Loop
sum = sum + ccdfprob(1, 1) + (1 - ccdfprob(1, ncuts))
'Debug.Print (sum)
'Sum = 1, so this means all probabilities are accounted for
'--------------------------------------------------------------------------------------------------
i = 1
Iterations = 10000
For g = 1 To Iterations
    'Randomize
    TrueProb = Rnd(seed)
    n = 1
    For n = 1 To ncuts
        If TrueProb < ccdfprob(1, 1) Then
            TruePerm = (ccdfvalue(1, 1) - ccdfvalue0) / ccdfprob(1, 1) * _
                (TrueProb - ccdfprob(1, 1)) + ccdfvalue(1, 1)
            Exit For
        ElseIf TrueProb > ccdfprob(1, n) And TrueProb < ccdfprob(1, n + 1) Then
            TruePerm = (ccdfvalue(1, n + 1) - ccdfvalue(1, n)) / (ccdfprob(1, n + 1) - ccdfprob(1, n)) * _
                (TrueProb - ccdfprob(1, n)) + ccdfvalue(1, n)
            Exit For
        ElseIf TrueProb > ccdfprob(1, ncuts) Then
            TruePerm = (ccdfvalueat1 - ccdfvalue(1, ncuts)) / (1 - ccdfprob(1, ncuts)) * _
                (TrueProb - ccdfprob(1, ncuts)) + ccdfvalue(1, ncuts)
            Exit For
        End If
    Next n
    'Debug.Print TruePerm(i)
    Y(g) = TruePerm

    'This is the main loop
    J = 1
    Do While J < ncuts + 1
        If Y(g) > ccdfvalue0 And Y(g) < ccdfvalue(1, 1) Then
            pxy = ccdfprob(1, 1)
            Fxyslopetemp = (ccdfprob(1, 1) - 0) / (ccdfvalue(1, 1) - ccdfvalue0)
            Fxy = Fxyslopetemp * (Y(g) - ccdfvalue(1, 1)) + ccdfprob(1, 1)
            J = 1
            muyuppertemp = 0
            Do While J < ncuts + 1
                If J < ncuts Then
                    pxytemp(J) = ccdfprob(1, J + 1) - ccdfprob(1, J)
                    product(J) = pxytemp(J) * (ccdfvalue(1, J + 1) + ccdfvalue(1, J)) / 2
                Else
                    pxytemp(J) = 1 - ccdfprob(1, ncuts)
                    product(J) = pxytemp(J) * (ccdfvalueat1 + ccdfvalue(1, ncuts)) / 2
                End If
                muyuppertemp = muyuppertemp + product(J)
                J = J + 1
            Loop
            probbelongingtolower = pxy * (Y(g) - ccdfvalue0) / (ccdfvalue(1, 1) - ccdfvalue0)
            probbelongingtoupper = pxy * (ccdfvalue(1, 1) - Y(g)) / (ccdfvalue(1, 1) - ccdfvalue0)
            'Debug.Print (probbelongingtolower + probbelongingtoupper)
            muylower = (probbelongingtolower * (Y(g) + ccdfvalue0) / 2)
            Muyupper = muyuppertemp + probbelongingtoupper * (ccdfvalue(1, 1) + Y(g)) / 2
            'Debug.Print (Muyupper)
            'Debug.Print (muylower)
            'Debug.Print (Muyupper + muylower)
            Exit Do
        ElseIf Y(g) > ccdfvalue(1, J) And Y(g) < ccdfvalue(1, J + 1) Then
            pxy = ccdfprob(1, J + 1) - ccdfprob(1, J)
            Fxyslopetemp = (ccdfprob(1, J + 1) - ccdfprob(1, J)) _
                / (ccdfvalue(1, J + 1) - ccdfvalue(1, J))
            Fxy = Fxyslopetemp * (Y(g) - ccdfvalue(1, J + 1)) + ccdfprob(1, J + 1)
            i = 1
            muylower = 0
            Do While i < J
                pxytemp(i) = ccdfprob(1, i + 1) - ccdfprob(1, i)
                product(i) = pxytemp(i) * (ccdfvalue(1, i + 1) + ccdfvalue(1, i)) / 2
                muylower = muylower + product(i)
                i = i + 1
            Loop
            i = ncuts
            Muyupper = 0
            Do While i > J
                If i < ncuts Then
                    pxytemp(i) = ccdfprob(1, i + 1) - ccdfprob(1, i)
                    product(i) = pxytemp(i) * (ccdfvalue(1, i + 1) + ccdfvalue(1, i)) / 2
                Else
                    pxytemp(i) = 1 - ccdfprob(1, ncuts)
                    product(i) = pxytemp(i) * (ccdfvalueat1 + ccdfvalue(1, ncuts)) / 2
                End If
                Muyupper = Muyupper + product(i)
                i = i - 1
            Loop
            'This calculates the restandardizing area for each part
            probbelongingtolower = pxy * (Y(g) - ccdfvalue(1, J)) / (ccdfvalue(1, J + 1) - ccdfvalue(1, J))
            probbelongingtoupper = pxy * (ccdfvalue(1, J + 1) - Y(g)) / (ccdfvalue(1, J + 1) - ccdfvalue(1, J))
            'Debug.Print (probbelongingtolower + probbelongingtoupper)
            'This is adjusting the muy's for the part of the interval Y falls within
            muylower = muylower + (Y(g) + ccdfvalue(1, J)) / 2 * probbelongingtolower _
                + (ccdfvalue(1, 1) + ccdfvalue0) / 2 * ccdfprob(1, 1)
            Muyupper = Muyupper + (Y(g) + ccdfvalue(1, J + 1)) / 2 * probbelongingtoupper
            'Debug.Print muylower
            'Debug.Print Muyupper
            'Debug.Print (muylower + Muyupper)
            Exit Do
        ElseIf Y(g) > ccdfvalue(1, ncuts) And Y(g) < ccdfvalueat1 Then
            pxy = 1 - ccdfprob(1, ncuts)
            Fxyslopetemp = (1 - ccdfprob(1, ncuts)) / (ccdfvalueat1 - ccdfvalue(1, ncuts))
            Fxy = Fxyslopetemp * (Y(g) - ccdfvalue(1, ncuts)) + ccdfprob(1, ncuts)
            i = 1
            muylower = 0 'Initializing the lower mean
            Do While i < ncuts
                pxytemp(i) = ccdfprob(1, i + 1) - ccdfprob(1, i)
                product(i) = pxytemp(i) * (ccdfvalue(1, i + 1) + ccdfvalue(1, i)) / 2
                muylower = muylower + product(i)
                i = i + 1
            Loop
            probbelongingtolower = pxy * (Y(g) - ccdfvalue(1, ncuts)) / (ccdfvalueat1 - ccdfvalue(1, ncuts))
            probbelongingtoupper = pxy * (ccdfvalueat1 - Y(g)) / (ccdfvalueat1 - ccdfvalue(1, ncuts))
            'Debug.Print (probbelongingtolower + probbelongingtoupper)
            muylower = (ccdfvalue(1, 1) + ccdfvalue0) / 2 * ccdfprob(1, 1) + muylower _
                + (Y(g) + ccdfvalue(1, ncuts)) / 2 * probbelongingtolower
            Muyupper = (Y(g) + ccdfvalueat1) / 2 * probbelongingtoupper
            'Debug.Print (muylower)
            'Debug.Print (Muyupper)
            'Debug.Print (muylower + Muyupper)
            Exit Do
        End If
        J = J + 1
    Loop
    'Debug.Print "Current Y estimation is:"; Y(g)
    'Debug.Print "Pxy is:", pxy, "Fxy is:", Fxy
    'Debug.Print "MuyLower is:", muylower
    'Debug.Print "MuyUpper is:", Muyupper
    'Debug.Print "Muy is:", muy
    'Debug.Print "The difference between Muy and Muylower+Muyupper is:", (muy - (muylower + Muyupper))

    '----------------------------------------------------------------------------------------------
    'This is the analytical part
    'Jfunc(g) = Y(g) * ((coeffunder1gen - coeffover1gen) * Fxy + coeffover1gen) - coeffunder1gen * muylower - coeffover1gen * Muyupper -
    '    coeffover2gen + Fxy * (coeffunder2gen + coeffover2gen) + (coeffunder3gen - coeffover3gen) * pxy
    Jfunc(g) = Y(g) * ((coeffover1gen - coeffunder1gen) * Fxy + coeffunder1gen) _
        - coeffover1gen * muylower - coeffunder1gen * Muyupper _
        - coeffunder2gen + Fxy * (coeffover2gen + coeffunder2gen) _
        + (coeffunder3gen - coeffover3gen) * pxy
    'Debug.Print "Jfunc is", Jfunc(g)
Next g

'Output the Y and Jfunc values for debugging--------------
Open JFuncOut For Output As #4
i = 1
For i = 1 To Iterations
    Print #4, Y(i); Jfunc(i)
Next i
Close #4
'-------------------------------------------
'Have added For loop to find the minimum
g = 1
For g = 1 To Iterations
    Jfunc(g) = Abs(Jfunc(g))
Next g
g = 1
JfuncMin = Jfunc(1)
For g = 1 To Iterations
    If Jfunc(g) < JfuncMin Then
        JfuncMin = Jfunc(g)
        index = g
    End If
Next g
'Output answer---------------------------
Yoptimal = Y(index)
Debug.Print (Yoptimal)
Debug.Print (Jfunc(index))
Debug.Print sisimregindex
i = 0
Open OptOut For Output As #3
Print #3, (Yoptimal)
Debug.Print (Yoptimal)
Close #3

End Sub


Appendix E: Code Implementing the Numerical Solution for the Optimized Loss Function

Private Sub Main()
'These are the variables that sisim has calculated in the original program
'These are read in from a text file
ReDim ccdfvalue(1, 20) As Variant
ReDim ccdfprob(1, 20) As Variant
Dim sisimregindex, ncuts As Integer
Dim ccdfvalueat1, ccdfvalue0, seed As Double

'These are loss function parameters
'Region 1
Dim coeffover1reg1, coeffover2reg1, coeffover3reg1 As Single
Dim coeffunder1reg1, coeffunder2reg1, coeffunder3reg1 As Single
Dim quadoverreg1, lineoverreg1, constoverreg1 As Integer
Dim quadunderreg1, lineunderreg1, constunderreg1 As Integer
'Region 2
Dim coeffover1reg2, coeffover2reg2, coeffover3reg2 As Single
Dim coeffunder1reg2, coeffunder2reg2, coeffunder3reg2 As Single
Dim quadoverreg2, lineoverreg2, constoverreg2 As Integer
Dim quadunderreg2, lineunderreg2, constunderreg2 As Integer
'Region 3
Dim coeffover1reg3, coeffover2reg3, coeffover3reg3 As Single
Dim coeffunder1reg3, coeffunder2reg3, coeffunder3reg3 As Single
Dim quadoverreg3, lineoverreg3, constoverreg3 As Integer
Dim quadunderreg3, lineunderreg3, constunderreg3 As Integer
'Region 4
Dim coeffover1reg4, coeffover2reg4, coeffover3reg4 As Single
Dim coeffunder1reg4, coeffunder2reg4, coeffunder3reg4 As Single
Dim quadoverreg4, lineoverreg4, constoverreg4 As Integer
Dim quadunderreg4, lineunderreg4, constunderreg4 As Integer
Dim indexreg1, indexreg2, indexreg3, indexreg4 As Integer
Dim coeffover1gen, coeffover2gen, coeffover3gen, coeffover4gen As Double
Dim coeffunder1gen, coeffunder2gen, coeffunder3gen, coeffunder4gen As Double
Dim muylower, muyhigher As Single

'These are the variables needed for the internal program
Dim pxy0, pxytail As Single
ReDim Fxyslope(20) As Variant
'These are temporary variables used in computing
ReDim pxyarray(20), product(20), pxytemp(20), ccdfvaluetemp(20), guess(20) As Variant
Dim Fxy, pxy, sumproduct As Double
'This is going to be zero because of how probability is defined as a horizontal line.
Dim pxyprime As Single
Dim Yinitial, epsilon, Nmax, sum, area, upperthresh As Single
Dim Jfunc, Jprime, Ynew, Y, Yoptimal As Double
Dim i, j, n, g, Iterations As Integer 'These are counters
'These are variables used for I/O of data
Dim LossFile, FunctionInputs, OptOut As String
'-----------------------------------------------------------
'Variables from the second attempt
Dim LossEv, MinLoss As Single
Dim Loss(1000), TrueProb(1000), TruePerm(1000), BaseProb(1000) As Variant
Dim BasePerm(1000), Err(1000), LossArray(1000) As Variant

'Reading in Variables-----------------------------------------------------------------
'Reading in the data from the two files. I also declare the name of the output file
LossFile = "Loss Inputs.txt"
OptOut = "OptOut.txt"
FunctionInputs = "FunctionInputs.txt"

Open LossFile For Input As #1 ' Open file for input.
' Opens output file outside of main code, so it doesn't have to keep reopening
'The data being read in is the ccdf values, associated probabilities, and regional index
i = 0
Do While Not EOF(1) ' Loop until end of file.
    i = i + 1
    If i = 1 Then
        Input #1, ncuts
    ElseIf i = 2 Then
        Input #1, sisimregindex
    ElseIf i = 3 Then
        Input #1, ccdfvalue0, ccdfvalueat1
    ElseIf i = 4 Then
        Do While j < ncuts
            Input #1, ccdfvalue(1, j + 1)
            j = j + 1
        Loop
    ElseIf i = 5 Then
        Do While n < ncuts
            Input #1, ccdfprob(1, n + 1)
            n = n + 1
        Loop
    ElseIf i = 6 Then
        Input #1, seed
    End If
Loop
Close #1 ' Close file.

Open FunctionInputs For Input As #2
'Reading in constants and indicators for the loss function for the regions
i = 0
Do While Not EOF(2)
    i = i + 1
    If i = 1 Then
        Input #2, indexreg1
    ElseIf i = 2 Then
        Input #2, quadoverreg1, lineoverreg1, constoverreg1
    ElseIf i = 3 Then
        Input #2, quadunderreg1, lineunderreg1, constunderreg1
    ElseIf i = 4 Then
        Input #2, coeffover1reg1, coeffover2reg1, coeffover3reg1
    ElseIf i = 5 Then
        Input #2, coeffunder1reg1, coeffunder2reg1, coeffunder3reg1
    ElseIf i = 6 Then
        Input #2, indexreg2
    ElseIf i = 7 Then
        Input #2, quadoverreg2, lineoverreg2, constoverreg2
    ElseIf i = 8 Then
        Input #2, quadunderreg2, lineunderreg2, constunderreg2
    ElseIf i = 9 Then
        Input #2, coeffover1reg2, coeffover2reg2, coeffover3reg2
    ElseIf i = 10 Then
        Input #2, coeffunder1reg2, coeffunder2reg2, coeffunder3reg2
    ElseIf i = 11 Then
        Input #2, indexreg3
    ElseIf i = 12 Then
        Input #2, quadoverreg3, lineoverreg3, constoverreg3
    ElseIf i = 13 Then
        Input #2, quadunderreg3, lineunderreg3, constunderreg3
    ElseIf i = 14 Then
        Input #2, coeffover1reg3, coeffover2reg3, coeffover3reg3
    ElseIf i = 15 Then
        Input #2, coeffunder1reg3, coeffunder2reg3, coeffunder3reg3
    ElseIf i = 16 Then
        Input #2, indexreg4
    ElseIf i = 17 Then
        Input #2, quadoverreg4, lineoverreg4, constoverreg4
    ElseIf i = 18 Then
        Input #2, quadunderreg4, lineunderreg4, constunderreg4
    ElseIf i = 19 Then
        Input #2, coeffover1reg4, coeffover2reg4, coeffover3reg4
    ElseIf i = 20 Then
        Input #2, coeffunder1reg4, coeffunder2reg4, coeffunder3reg4
    End If
Loop
Close #2

'-----------------------------------------------------------------------
'Assigning coefficients based on the region index read from sisim
'This part of the code also checks to see whether each coefficient is considered
'Check for region 1 coefficients
If sisimregindex = indexreg1 Then
    If quadoverreg1 = 0 Then
        coeffover1gen = 0
    Else
        coeffover1gen = coeffover1reg1
    End If
    If lineoverreg1 = 0 Then
        coeffover2gen = 0
    Else
        coeffover2gen = coeffover2reg1
    End If
    If constoverreg1 = 0 Then
        coeffover3gen = 0
    Else
        coeffover3gen = coeffover3reg1
    End If
    If quadunderreg1 = 0 Then
        coeffunder1gen = 0
    Else
        coeffunder1gen = coeffunder1reg1
    End If
    If lineunderreg1 = 0 Then
        coeffunder2gen = 0
    Else
        coeffunder2gen = coeffunder2reg1
    End If
    If constunderreg1 = 0 Then
        coeffunder3gen = 0
    Else
        coeffunder3gen = coeffunder3reg1
    End If
End If
'Check for region 2 coefficients
If sisimregindex = indexreg2 Then
    If quadoverreg2 = 0 Then
        coeffover1gen = 0
    Else
        coeffover1gen = coeffover1reg2
    End If
    If lineoverreg2 = 0 Then
        coeffover2gen = 0
    Else
        coeffover2gen = coeffover2reg2
    End If
    If constoverreg2 = 0 Then
        coeffover3gen = 0
    Else
        coeffover3gen = coeffover3reg2
    End If
    If quadunderreg2 = 0 Then
        coeffunder1gen = 0
    Else
        coeffunder1gen = coeffunder1reg2
    End If
    If lineunderreg2 = 0 Then
        coeffunder2gen = 0
    Else
        coeffunder2gen = coeffunder2reg2
    End If
    If constunderreg2 = 0 Then
        coeffunder3gen = 0
    Else
        coeffunder3gen = coeffunder3reg2
    End If
End If
'Check for region 3 coefficients
If sisimregindex = indexreg3 Then
    If quadoverreg3 = 0 Then
        coeffover1gen = 0
    Else
        coeffover1gen = coeffover1reg3
    End If
    If lineoverreg3 = 0 Then
        coeffover2gen = 0
    Else
        coeffover2gen = coeffover2reg3
    End If
    If constoverreg3 = 0 Then
        coeffover3gen = 0
    Else
        coeffover3gen = coeffover3reg3
    End If
    If quadunderreg3 = 0 Then
        coeffunder1gen = 0
    Else
        coeffunder1gen = coeffunder1reg3
    End If
    If lineunderreg3 = 0 Then
        coeffunder2gen = 0
    Else
        coeffunder2gen = coeffunder2reg3
    End If
    If constunderreg3 = 0 Then
        coeffunder3gen = 0
    Else
        coeffunder3gen = coeffunder3reg3
    End If
End If
'Check for region 4 coefficients
If sisimregindex = indexreg4 Then
    If quadoverreg4 = 0 Then
        coeffover1gen = 0
    Else
        coeffover1gen = coeffover1reg4
    End If
    If lineoverreg4 = 0 Then
        coeffover2gen = 0
    Else
        coeffover2gen = coeffover2reg4
    End If
    If constoverreg4 = 0 Then
        coeffover3gen = 0
    Else
        coeffover3gen = coeffover3reg4
    End If
    If quadunderreg4 = 0 Then
        coeffunder1gen = 0
    Else
        coeffunder1gen = coeffunder1reg4
    End If
    If lineunderreg4 = 0 Then
        coeffunder2gen = 0
    Else
        coeffunder2gen = coeffunder2reg4
    End If
    If constunderreg4 = 0 Then
        coeffunder3gen = 0
    Else
        coeffunder3gen = coeffunder3reg4
    End If
End If

'---------------------------------------------------------------------------------
'Check to see if the lcdf is monotonically increasing
Iterations = 1000
i = 1
Line1:
If ccdfprob(1, 1) = 0 Then
    For i = 1 To ncuts
        ccdfprob(1, i) = ccdfprob(1, i) + 0.001
    Next i
    GoTo Line1
Else
    i = 1
    For i = 1 To ncuts
        If ccdfprob(1, i) = ccdfprob(1, i + 1) Then
            j = i
            For j = i To ncuts
                ccdfprob(1, j + 1) = ccdfprob(1, j + 1) + 0.001
            Next j
        End If
    Next i
End If
ccdfprob(1, ncuts + 1) = ccdfprob(1, 20)

'Sample the true Perm-----------------------------------
i = 1
For i = 1 To Iterations
    'Randomize
    TrueProb(i) = Rnd(seed)
    n = 1
    For n = 1 To ncuts
        If TrueProb(i) < ccdfprob(1, 1) Then
            TruePerm(i) = (ccdfvalue(1, 1) - ccdfvalue0) / ccdfprob(1, 1) * _
                (TrueProb(i) - ccdfprob(1, 1)) + ccdfvalue(1, 1)
            Exit For
        ElseIf TrueProb(i) > ccdfprob(1, n) And TrueProb(i) < ccdfprob(1, n + 1) Then
            TruePerm(i) = (ccdfvalue(1, n + 1) - ccdfvalue(1, n)) / (ccdfprob(1, n + 1) - ccdfprob(1, n)) * _
                (TrueProb(i) - ccdfprob(1, n)) + ccdfvalue(1, n)
            Exit For
        ElseIf TrueProb(i) > ccdfprob(1, ncuts) Then
            TruePerm(i) = (ccdfvalueat1 - ccdfvalue(1, ncuts)) / (1 - ccdfprob(1, ncuts)) * _
                (TrueProb(i) - ccdfprob(1, ncuts)) + ccdfvalue(1, ncuts)
            Exit For
        End If
    Next n
    'Debug.Print TruePerm(i)
Next i

'Sample the Base Perm------------------------------------
i = 1
For i = 1 To Iterations
    'Randomize
    BaseProb(i) = Rnd(seed)
    n = 1
    For n = 1 To ncuts
        If BaseProb(i) < ccdfprob(1, 1) Then
            BasePerm(i) = (ccdfvalue(1, 1) - ccdfvalue0) / ccdfprob(1, 1) * _
                (BaseProb(i) - ccdfprob(1, 1)) + ccdfvalue(1, 1)
            Exit For
        ElseIf BaseProb(i) > ccdfprob(1, n) And BaseProb(i) < ccdfprob(1, n + 1) Then
            BasePerm(i) = (ccdfvalue(1, n + 1) - ccdfvalue(1, n)) / (ccdfprob(1, n + 1) - ccdfprob(1, n)) * _
                (BaseProb(i) - ccdfprob(1, n)) + ccdfvalue(1, n)
            Exit For
        ElseIf BaseProb(i) > ccdfprob(1, ncuts) Then
            BasePerm(i) = (ccdfvalueat1 - ccdfvalue(1, ncuts)) / (1 - ccdfprob(1, ncuts)) * _
                (BaseProb(i) - ccdfprob(1, ncuts)) + ccdfvalue(1, ncuts)
            Exit For
        End If
    Next n
    'Debug.Print BasePerm(i)
Next i

'Determining loss and expected loss----------------------------
i = 1
For i = 1 To Iterations 'Outer loop
    j = 1
    LossEv = 0
    For j = 1 To Iterations
        Err(j) = BasePerm(i) - TruePerm(j)
        If Err(j) > 0 Then
            Loss(j) = coeffover1gen * Err(j) ^ 2 + coeffover2gen * Err(j) + coeffover3gen
        Else
            Loss(j) = coeffunder1gen * Err(j) ^ 2 + coeffunder2gen * Abs(Err(j)) + coeffunder3gen
        End If
        LossEv = LossEv + Loss(j)
    Next j
    LossEv = LossEv / Iterations
    LossArray(i) = LossEv
Next i

i = 1
MinLoss = LossArray(1)
For i = 1 To Iterations
    If MinLoss > LossArray(i) Then
        MinLoss = LossArray(i)
        Index = i
    End If
Next i

OptPerm = BasePerm(Index)
Debug.Print (OptPerm)
Debug.Print sisimregindex
Open OptOut For Output As #3
Print #3, (Round(OptPerm, 3))
Close #3

End Sub
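The double loop above is an O(n^2) Monte Carlo search: every candidate estimate is scored against every plausible true value, and the candidate with the smallest average loss is retained. For comparison, a compact sketch of the same logic (illustrative only, not the program actually used; draw() stands in for the piecewise-linear ccdf inversion above, and a and b are the over- and underestimation coefficient triplets):

    import numpy as np

    def numerical_optimum(draw, a, b, n=1000):
        true = np.array([draw() for _ in range(n)])   # plausible true values
        base = np.array([draw() for _ in range(n)])   # candidate estimates
        err = base[:, None] - true[None, :]           # error e = X* - X for all pairs
        loss = np.where(err > 0,
                        a[0] * err**2 + a[1] * err + a[2],            # overestimation
                        b[0] * err**2 + b[1] * np.abs(err) + b[2])    # underestimation
        return base[np.argmin(loss.mean(axis=1))]     # minimum expected loss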


References

Alabert, F.G. and Modot, V. (1992). Stochastic models of reservoir heterogeneity: impact on connectivity and average permeabilities. Paper SPE 24893 presented at the 67th Annual Technical Conference and Exhibition of the Society of Petroleum Engineers. Washington, D.C., U.S.A., Oct. 4-7, 1992.

Andres-Ferrer, J., Ortiz-Martinez, D., Garcia-Varea, I., and Casacuberta, F. (2007). On the use of different loss functions in statistical pattern recognition applied to machine translation. ScienceDirect.

Artiles-Leon, N. (1996-97). A pragmatic approach to multi-response problems using loss functions. Quality Engineering, 9 (2), 213-220.

Bohling, G. (2005). Kriging. Kansas Geological Survey, C&PE 940.

Caers, J. (2001). Direct sequential indicator simulation. Department of Petroleum Engineering, Stanford University, Stanford, USA.

Casteel, J. (1997). Increased oil production and reserves from improved completion techniques in the Bluebell Field, Uinta Basin, Utah. Quarterly Technical Progress Report. United States DOE.

Culham, W.E., Farouq Ali, S.M., and Stahl, C.D. (1968). Experimental and numerical simulation of two-phase flow with interphase mass transfer in one and two dimensions. Paper SPE 2187 presented at the 43rd Annual Fall Meeting of the Society of Petroleum Engineers of AIME. Houston, Texas, U.S.A., Sept. 29-Oct. 2, 1968.

Deutsch, C.V. and Journel, A.G. (1998). Geostatistical Software Library and User’s Guide. 2nd Edition. New York: Oxford University Press.

Farouq Ali, S.M. and Nielsen, R.F. (1970). The material balance approach vs. reservoir simulation as an aid to understanding reservoir mechanics. Paper SPE 3080 presented at the 45th Annual Fall Meeting of the Society of Petroleum Engineers of AIME. Houston, Texas, U.S.A., Oct. 4-7, 1970.

Gonzalez, R., Schepers, K., and Reeves, S.R. (2008). Integrated clustering/geostatistical/evolutionary strategies approach for 3D reservoir characterization and assisted history-matching in a complex carbonate reservoir, SACROC unit, Permian Basin. Paper SPE 113978 presented at the 2008 SPE/DOE Improved Oil Recovery Symposium. Tulsa, Oklahoma, U.S.A., April 19-23, 2008.


Harris, T.J. (Aug., 1992). Optimal controllers for nonsymmetric and nonquadratic loss functions. Technometrics, American Statistical Association and American Society for Quality, 34(3), 298-306.

Hohn, M.E. (1999). Geostatistics and Petroleum Geology. Boston: Kluwer Academic Publishers.

Jensen, J.L., Lake, L.W., Corbett, P.W.M., and Goggin, D.J. (2000). Statistics for Petroleum Engineers and Geoscientists. 2nd Edition. Boston: Elsevier.

Journel, A.G. (1988). Non-parametric geostatistics for risk and additional sampling assessment. Principles of Environmental Sampling, ed. Larry Keith, American Chemical Society, 45-72.

Journel, A.G. (1989). Fundamentals of geostatistics in five lessons. American Geophysical Union. Washington, D.C.

Kim, Y. (2007). Probabilistic framework-based history matching algorithm utilizing sub-domain delineation and software 'Pro-HMS'. Austin: The University of Texas at Austin.

London, D. and Minc, H. (1972). Eigenvalues of matrices with prescribed entries. Proceedings of the American Mathematical Society, 34 (1), 8-14.

Ma, Y. and Zhao, F. (2004). An improved multivariate loss function. Journal of Systems Sciences and Systems Engineering, 13(3).

Mattax, C.C. and Dalton, R.L. (1990). Reservoir simulation. Journal of Petroleum Technology, 692-695.

Murray, C.J. (1992) Stochastic simulation of hydrocarbon pore volume for risk assessment and economic planning. Paper SPE 25527. Stanford, California, U.S.A., Sept. 8, 1992.

Naimi-Tajdar, R., Han, C., Sepehrnoori, K., Arbogast, T.J., and Miller, M.A. (2006). A fully implicit, compositional, parallel simulator for IOR processes in fractured reservoirs. Paper SPE 100079 presented at the 2006 SPE/DOE Symposium on Improved Oil Recovery. Tulsa, Oklahoma, U.S.A., April 22-26, 2006.

Rukhin, A.L. (1985). Estimated loss and admissible loss estimators. Technical Report #85-26. Department of Statistics, Purdue University.

Rukhin, A.L. (Sep., 1988). Loss functions for loss estimation. The Annals of Statistics, Institute of Mathematical Statistics, 16 (3), 1262-1269.


Schiozer, D.J. and Aziz, K. (1994). Use of domain decomposition for simultaneous simulation of reservoir and surface facilities. Paper SPE 27876 presented at the Western Regional Meeting. Long Beach, California, U.S.A., March 23-25, 1994.

Sener, I. and Bakiler, C.S. (1989). Basic reservoir engineering and history-match study on the fractured Raman reservoir, Turkey. Paper SPE 17955 presented at the SPE Middle East Oil Technical Conference and Exhibition. Manama, Bahrain, March 11-14, 1989.

Seth, M.S. (1974). A semi-implicit method for simulating reservoir behavior. Paper SPE 4979 presented at the 49th Annual Fall Meeting of the Society of Petroleum Engineers of AIME. Houston, Texas, U.S.A., Oct. 6-9, 1974.

Smith, L.I. (2002). A tutorial on principal components analysis. Cornell University, Ithaca, USA.

Srinivasan, S. and Bryant, S. (2004). Integrating dynamic data in reservoir models using a parallel computational approach. Paper SPE 89444 presented at the 2004 SPE/DOE Thirteenth Symposium on Improved Oil Recovery. Tulsa, Oklahoma, U.S.A., April 17-21, 2004.

Srivastava, R.M. (1990). An application of geostatistical methods for risk analysis in reservoir management. Paper SPE 20608 presented at the SPE Annual Technical Conference and Exhibition. New Orleans, Louisiana, U.S.A., September 23-26, 1990.

Srivastava, R.M. (1992). Reservoir characterization with probability field simulation. SPE paper presented at the 67th Annual Technical Conference and Exhibition of the Society of Petroleum Engineers. Washington, D.C., U.S.A., Oct. 4-7, 1992.

Suhr, R. and Batson, R.G. (2001). Constrained multivariate loss function minimization. Quality Engineering, 13 (3), 475-483.

Suresh, S., Sundararajan, N., and Saratchandran, P. (2008). Risk-sensitive loss functions for sparse multi-category classification problems. Information Sciences, 178 (12). New York: Elsevier Science Inc.

Tran, T.T., Deutsch, C.V., and Xie, Y. (2001). Direct geostatistical simulation with multiscale well, seismic, and production data. Paper SPE 71323 presented at the 2001 SPE Annual Technical Conference and Exhibition. New Orleans, Louisiana, U.S.A., Sept. 30-Oct. 3, 2001.

Wallace, T.D. and Hussain, A. (1969). The use of error components models in combining cross section with time series data. Econometrica, 37 (1), 55-72.


Yadav, S. (2006). History matching using face-recognition technique based on principal component analysis. Paper SPE 102148 presented at the 2006 SPE Annual Technical Conference and Exhibition. San Antonio, Texas, U.S.A., Sept. 24-27, 2006.

Yadav, S., Heim, R., Bryant, S., Sinha, R., and May, E. (2007). Optimal region delineation in a reservoir for efficient history matching. Paper SPE 108994 presented at the 2007 SPE Annual Technical Conference and Exhibition. Anaheim, California, U.S.A., Nov. 11-14, 2007.

Yadav, S., Srinivasan, S., Bryant, S.L., and Barrera, A. (2005). History matching using probabilistic approach in a distributed computing environment. Paper SPE 93399 presented at the 2005 SPE Reservoir Simulation Symposium. Houston, Texas, U.S.A., Jan. 31-Feb. 2, 2005.


Vita

Donovan Kilmartin, son of Jim and Carol Kilmartin, was born June 11, 1984 in

Silver City, New Mexico. In May 2003, he graduated from Eagle High School in Idaho.

From August 2003 until May 2007, he ran track for the University of Texas while earning his

Bachelor of Science in Petroleum Engineering. After graduation, Donovan transitioned

directly into the Petroleum Engineering graduate program. Upon obtaining his Master of

Science, Donovan will join ExxonMobil in its development planning group.

Permanent address: 2232 Sunny Hills Drive

Austin, TX, 78744

This thesis was typed by the author.