Development of Reservoir Models using Economic Loss Functions
by
Donovan James Kilmartin, B.Sc
Thesis
Presented to the Faculty of the Graduate School of
The University of Texas at Austin
in Partial Fulfillment
of the Requirements
for the Degree of
Master of Science in Engineering
The University of Texas at Austin
May 2009
Development of Reservoir Models using Economic Loss Functions
Approved by Supervising Committee: Sanjay Srinivasan, Supervisor Larry W. Lake
Dedication
This thesis is dedicated to my loving fiancée Jill and my parents Carol and Jim. Without
their support through the years, none of this would be possible.
Acknowledgements
I am extremely grateful to my supervisor Dr. Srinivasan. His valuable feedback
and support have been priceless and have helped this project to continually move
forward. It has been a pleasure working for Dr. Srinivasan, and I am thankful he has
shared his knowledge and experience with me.
May 8, 2009
Abstract
Development of Reservoir Models using Economic Loss Functions
Donovan James Kilmartin, M.S.E
The University of Texas at Austin, 2009
Supervisor: Sanjay Srinivasan
As oil and gas supplies decrease, it becomes increasingly important to quantify the
uncertainty associated with reservoir models and implementation of field development
decisions. Various geostatistical methods have assisted in the development of field scale
models of reservoir heterogeneity. Sequential simulation algorithms in geostatistics
require an assessment of local uncertainty in an attribute value at a location followed by
random sampling from the uncertainty distribution to retrieve the simulation value.
Instead of random sampling of an outcome from the uncertainty distribution, the retrieval
of an optimal simulated value at each location by considering an economic loss function
is demonstrated in this thesis.
By applying a loss function that depicts the economic impact of an over or
underestimation at a location and retrieving the optimal simulated value that minimizes
the expected loss, a map of simulated values can be generated that accounts for the
impact of permeability as it relates to economic loss. Both an asymmetric linear loss function model and a parabolic loss function model are investigated. The end result of this
procedure will be a reservoir realization that exhibits the correct spatial characteristics
(i.e. variogram reproduction) while, at the same time, exhibiting the minimum expected
loss in terms of the parameters used to construct the loss function.
The process detailed in this thesis provides an effective alternative whereby
realizations in the middle of the uncertainty distribution can be directly retrieved by
application of suitable loss functions. By altering the loss function (so as to emphasize either under- or over-estimation), other realizations at the extremes of the global uncertainty distribution can also be retrieved, thereby eliminating the need to generate a large suite of realizations to locate the global extremes of the uncertainty distribution.
Table of Contents
Table of Contents
List of Tables
List of Figures
Chapter 1: Introduction
Chapter 2: Literature Review
2.1 Sequential Simulation Conditioned to Data
2.2.1 Parametric Approach to Sequential Simulation
2.2.2 Non-Parametric Approach to Sequential Simulation
2.3 Loss Function in the Sequential Simulation Framework
2.3.3 Applications of Loss Functions in Various Engineering Fields
Chapter 3: Problem Setup
3.1 Conditioning Data
3.2 Sub-Domain Delineation
Chapter 4: Loss Function Development
4.1 Loss Function Development for Delineated Sub-Domains
4.2 Loss Function Optimization
4.2.1 Analytical Solution
4.2.2 Numerical Solution
Chapter 5: Implementation of Optimized Loss Function within a Sequential Simulation Framework
5.1 Implementation within SISIM Algorithm
5.2 Loss Function Implementation for Individual Sub-Domains
5.2.1 Parabolic Loss Function
5.3.1 Sampling realizations within specific NPV ranges
Chapter 6: Conclusion
Appendix A: PCA Example
Appendix B: Analytical Solution for Optimized Loss Function
Appendix C: Modification to SISIM Code
Appendix D: Code for Implementation of Analytical Solution for Optimized Loss Function
Appendix E: Code for Implementation of Numerical Solution for Optimized Loss Function
References
Vita
List of Tables

Table 1: Loss function model summary.
Table 2: Asymmetric linear loss function models for the four sub-domains.
Table 3: Parabolic loss function models.
Table 4: Alterations made to the loss function to sample specific parts of the global NPV distributions.
Table 5: Global extreme estimations.
Table 6: Original data and adjusted data set for PCA example.
List of Figures

Figure 1: Methodology of thesis.
Figure 2: Asymmetric linear loss function.
Figure 3: Histogram of permeability used for the unconditional indicator simulation.
Figure 4: Location of the 100 conditioning data.
Figure 5: Flow chart of PCA for sub-domain delineation.
Figure 6: Histogram of conditioning data used for generating the suite of realizations for domain delineation.
Figure 7: Sample realizations obtained using SISIM conditioned to the available data.
Figure 8: Upscaled versions of the sample models in Figure 7.
Figure 9: Sub-domain identification by PCA.
Figure 10: Locations selected for computation of loss functions (left) accompanied by sub-domain identification for reference (right).
Figure 11: Sample nodes from each region showing the relation between economic loss and permeability estimation error (region 1, upper left; region 2, upper right; region 3, lower left; region 4, lower right). Green dashed lines indicate a window containing 70% of the data; plus and circle symbols represent under- and overestimation, respectively.
Figure 12: Same representative region nodes from Figure 11 fitted with asymmetric linear (black lines) and parabolic loss functions (black curves).
Figure 13: Linear and quadratic loss function models (average of best fits).
Figure 14: Flow chart for numerical solution.
Figure 15: SISIM algorithm incorporating loss function for optimal estimation.
Figure 16: The average of 50 realizations obtained using SISIM without implementing the loss functions (left), and the sub-domain identification plot (right).
Figure 17: A single permeability realization obtained after implementing the loss functions (left), and the average of 10 realizations obtained after implementing the loss functions.
Figure 18: Single and averaged optimized parabolic loss function realizations.
Figure 19: Reiteration of sub-domain identification by PCA.
Figure 20: Histogram of NPVs corresponding to i) traditional SISIM realizations (blue); ii) asymmetric linear loss functions (brown); and iii) parabolic loss functions (green).
Figure 21: Base case loss function model (left). For comparison, the permeability model for a typical SISIM model is shown (right).
Figure 22: Permeability model corresponding to the case where the loss function penalizes over-estimation more (left). For comparison, the permeability model for the base case loss function is also shown (right).
Figure 23: Permeability model corresponding to the case where the loss function penalizes under-estimation more (left). For comparison, the permeability model for the base case is also shown (right).
Figure 24: Histogram of case study including global extremes.
Chapter 1: Introduction
As the demand for oil continues to increase, so does the demand for new
technology and innovation that can help increase the understanding of petroleum
reservoirs. There are many areas of study that help to quantify and understand the fluid
dynamics, and rock-fluid interactions within the reservoir. This understanding is key to
developing field scale models of reservoir heterogeneity that can be used to assess future
reservoir performance. This research focuses on the development of optimized reservoir
models taking into account the effect of the loss associated with wrong estimation of
permeability in different regions of the reservoir. The regions are defined by a domain delineation algorithm based on principal component analysis (PCA).
Sequential simulation methods for reservoir model development encompass a variety of approaches for generating multiple equi-probable models of the reservoir.
Sequential Gaussian simulation (SGSIM) and sequential indicator simulation (SISIM) are
examples of sequential simulation methods. In sequential Gaussian simulation, the
Kriged estimate and variance are identified with the mean and variance of a Gaussian
distribution and the simulated value at a node is obtained by randomly sampling from that
distribution. In other words, the simulated models are realizations of the Gaussian
random function model (Deutsch and Journel, 1998; Alabert et al., 1992; Tran et al.,
2001; Caers, 2001; Jensen, 2000). In comparison, SISIM uses indicator Kriging to
directly model local conditional distributions that can be non-Gaussian (Journel, 1988)
and simulated values are obtained again by randomly sampling the Kriged distribution.
The realizations generated using both SGSIM and SISIM can be used for a number of
different applications including reservoir flow simulation for assessing flow performance.
There are many commercial software programs that are used for reservoir flow
simulation. Flow simulation software is complex, and the accuracy of the results is significantly affected by the numerical procedures used to solve the flow equations, as
well as other modeling decisions such as the spatial discretization scheme employed and
the selection of time step size used for modeling the dynamical system (Fanchi 2001).
Sub-domains specify which regions are the most important in terms of impacting
the well response (Yadav et al., 2005; Yadav, 2006; Smith, 2002). One approach for sub-domain delineation uses principal component analysis (PCA) of the Hessian matrix that
reflects the variation in flow response due to changes in reservoir properties such as
permeability and porosity. PCA is a procedure that allows grouping of grid nodes into
regions based on the similarity of their influence on the flow response (Smith, 2002).
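As an illustration of this grouping idea, the sketch below applies PCA to a small synthetic sensitivity matrix. The matrix, its dimensions, and the loading threshold are hypothetical choices for illustration only, not the procedure used by Yadav (2006) or Smith (2002):

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical sensitivity matrix: rows are flow responses (e.g. well rates),
# columns are grid nodes; entry (i, j) approximates the change in response i
# per unit change in permeability at node j.
n_responses, n_nodes = 20, 12
base = rng.normal(size=(n_responses, 1))
strong = base @ rng.uniform(0.8, 1.2, size=(1, 6))   # nodes 0-5: high influence
weak = 0.05 * rng.normal(size=(n_responses, 6))      # nodes 6-11: low influence
S = np.hstack([strong, weak])

# PCA of the node-to-node covariance of the sensitivities.
C = np.cov(S, rowvar=False)
eigvals, eigvecs = np.linalg.eigh(C)   # eigenvalues in ascending order
pc1 = eigvecs[:, -1]                   # loading of each node on the leading PC

# Group nodes into sub-domains by the magnitude of their PC1 loading: nodes
# that load heavily on PC1 share a dominant influence on the flow response.
loading = np.abs(pc1)
groups = np.where(loading > 0.5 * loading.max(), 1, 2)
```

With this construction, the six high-influence nodes end up in one group and the six low-influence nodes in the other; a real application would work with sensitivities computed from a flow simulator rather than a synthetic matrix.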
The optimized spatial distribution of permeability within a delineated domain is
obtained by applying a loss function. Instead of randomly sampling from a local
conditional probability distribution (as in SGSIM or SISIM), an optimal local estimate of
permeability at a location is obtained by minimizing the expected loss. A loss function is
a mathematical function that describes the relationship between estimation error and loss
associated with that error (Journel, 1988; Srivastava, 1990; Ma and Zhao, 2005; Rukhin,
1988; Suresh, 2008). Loss, in this case, is defined as the economic impact of wrongly
estimating the permeability value at a location. The key conjecture is that error in
estimation of permeability at a location translates to errors in prediction of well
response(s) and that translates to an economic loss. Because the sensitivity of the flow response to permeability can differ across regions of the reservoir, a precursory domain delineation step is performed. The end result of this procedure will be a reservoir
realization that exhibits the correct spatial characteristics (i.e. variogram reproduction)
while at the same time exhibiting the minimum expected loss in terms of the parameters
used to construct the loss function. For example, if the loss function considers the NPV due to fluid (oil + water + gas) production as well as facilities cost, then the simulated realization would exhibit the spatial permeability field that yields the minimum expected loss, i.e., fluid rates that are neither too optimistic nor too pessimistic.
As reservoir simulators become more advanced, they are able to give insight into
future reservoir operations. The integration of past production data to adjust the pressure
response has become a very useful tool (Sener and Bakiler 1989; Culham et al., 1969;
Faroug Ali and Neison, 1970; Yadav 2006; Yadav et al., 2005). This amalgamation of
production data with the simulation is generally referred to as history matching. History
matching makes the simulation more accurate for understanding both current and future
production. The ultimate goal of reservoir modeling and history matching is to develop an accurate
representation of reservoir heterogeneity that can be subsequently used to develop an
optimal strategy for reservoir production. An extension of the proposed methodology
would be to define the loss function in terms of deviation from the true NPV (obtained on
the basis of the available production data). The loss function would then account for the
sensitivity of the model NPV to changes in permeability at different locations in the
reservoir. The reservoir realization corresponding to the minimum expected loss would
be a model that comes closest to the observed production data. While this application to history matching is not discussed in detail in this thesis, several other researchers have developed methodologies that utilize the sensitivity matrix (Yadav, 2005; Yadav et al., 2007; Yadav, 2006) or Hessian to guide the history matching process. It can be
argued that the Hessian matrix (reflecting the sensitivity of the well response to changes
in model parameters) is a special case of a loss function (one that does not consider any
economic factors).
Traditional sequential simulation methodology is based upon the construction of
the local conditional probability distribution function (lcpdf) at each location conditioned
to the surrounding data. The simulated value is obtained by randomly sampling from the
lcpdf. In lieu of such random sampling of an outcome from the lcpdf, the retrieval of an
optimal simulated value at each location by considering an economic loss function is
demonstrated in this thesis. It is conjectured that by applying a loss function that depicts
the economic impact of an over- or underestimation at a location, and retrieving the
optimal simulated value that minimizes the expected loss, a map of simulated values can
be generated that is risk neutral. Such a map when processed through a flow simulator
should yield profiles for oil rate and water production rate that are neither overly
optimistic nor overly pessimistic. In the traditional workflow for assessing uncertainty in
reservoir performance, a large suite of realizations of the reservoir model need to be
generated and then processed through the flow simulator. Realizations in the middle of
the uncertainty distribution (signifying risk-neutrality) can then be retrieved. The process
detailed in this thesis provides an effective alternative whereby realizations in the middle
of the uncertainty distribution can be directly retrieved by application of suitable loss
functions. By altering the loss function (so as to emphasize either under- or over-estimation), other realizations at the extremes of the global uncertainty distribution can
also be retrieved, thereby eliminating the necessity for the generation of a large suite of
realizations.
Figure 1: Methodology of thesis.
The thesis organization follows a linear path, so the information in each section
flows into the next section or chapter’s discussion. First, the basic input parameters and
methodology for delineating the sub-domains will be introduced. Next there is a detailed
discussion on the development of a loss function for the delineated sub-domains. The
chapter details the implementation of both asymmetric and parabolic loss functions
within the sequential simulation framework. The final topic is a case study showing how varying the loss functions can indicate the endpoints of the NPV uncertainty distribution.
Chapter 2: Literature Review
Many of the topics involved with this thesis deal with the sequential paradigm.
Therefore, this chapter includes the theory involved in the sequential framework for both
parametric and non-parametric estimators. In addition, some of the strengths and
shortcomings of Gaussian and indicator kriging are shown. Loss functions are introduced
to help exploit the benefits of sequential modeling while avoiding kriging pitfalls. A few
simple optimal estimators for loss function are explores. These examples should help
with the understanding of the theory of how optimal solution can be formulated from loss
functions. The last part of the chapter discusses alternative loss function applications.
Loss functions are not a new concept and have been used in a number of different
engineering applications including but not limited to electrical, quality, mechanical, and
petroleum engineering. Some basic problems and various forms of loss functions from within these different engineering fields are discussed. This chapter's main focus is to provide
the theory necessary for comprehension of the material presented throughout this thesis.
2.1 SEQUENTIAL SIMULATION CONDITIONED TO DATA
There are a variety of different geostatistical methods that can be implemented for
generating realizations of the reservoir model. Much of the work related to this thesis
focuses on the framework of sequential simulation. Sequential simulation is a useful tool
to generate multiple realizations that honor a set of conditioning data while representing
the spatial connectivity of the attribute being modeled as captured by the spatial
covariance or semivariogram. Sequential simulation is a sequential application of Bayes
rule for the synthesis of a multivariate distribution using conditional and marginal
distributions (Casteel, 1997; Journel, 1989). Consider a realization with N values to be simulated, initially conditioned on a set of n known values. The process of constructing
a model for attribute values at the N nodes is tantamount to sampling a realization from
the N-variate joint distribution shown on the left hand side of Equation (2.1). In
Equation (2.1), u denotes the locations and z indicates the outcome of the RV. The joint
distribution characterizes the joint variability of the N random variables.
By the definition of conditional distributions, a joint distribution of N RVs Ai can be
expressed as:
$$\mathrm{Prob}(A_1, A_2, \ldots, A_N) = \mathrm{Prob}(A_N \mid A_1, A_2, \ldots, A_{N-1}) \cdot \mathrm{Prob}(A_1, A_2, \ldots, A_{N-1})$$
$$= \mathrm{Prob}(A_N \mid A_1, \ldots, A_{N-1}) \cdot \mathrm{Prob}(A_{N-1} \mid A_1, \ldots, A_{N-2}) \cdots \mathrm{Prob}(A_1) \qquad (2.1)$$
Applying this to the previous joint probability distribution of N RVs at spatial locations,
we get:
$$F(u_1, u_2, \ldots, u_N; z_1, \ldots, z_N \mid n) = F(u_N; z_N \mid n, z_1, \ldots, z_{N-1}) \cdot F(u_1, \ldots, u_{N-1}; z_1, \ldots, z_{N-1} \mid n)$$
$$= F(u_N; z_N \mid (n + N - 1)) \cdot F(u_{N-1}; z_{N-1} \mid (n + N - 2)) \cdots F(u_1; z_1 \mid n) \qquad (2.2)$$
In the implementation of Equation (2.2), the RHS proceeds from the last term to
the first. The conditional distribution at location u1 conditioned to the n data is
constructed first. An outcome z1 is sampled from that distribution. That simulated value
along with the n original data is used to construct the conditional distribution at the
location u2. The process of simulating and updating the conditioning data set is continued
along a random path through all the N nodes of the model. The various sequential
methods are different from each other in the approach used to determine the local
conditional distribution lcpdf (Journel, 1988). The lcpdf’s F(u1;z1| n) can be determined
using parametric and non-parametric methods. Below is a list of the steps for the general
sequential algorithm.
Steps in Sequential Simulation
1. Model the lcdf at the first location $u_1$ using the prior conditioning data $n$.
2. Sample $z_1$ from the lcdf.
3. Add the sampled value to the conditioning data set, which is now of size $n+1$.
4. Model a new lcdf using the updated conditioning data.
5. Draw a sample $z_2$.
6. Repeat the process until the realization is completely populated.
Since the nodes are visited along a random path and the simulated value at a node is
obtained by randomly sampling from the lcpdf at that node, the process yields several
equi-probable realizations of the RF represented by the LHS of Equation (2.2).
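The six steps above can be sketched as a toy one-dimensional sequential Gaussian loop. The exponential covariance model, grid, and conditioning values below are hypothetical, and every previously simulated node is retained in the kriging system for simplicity (a practical implementation would use a search neighborhood):

```python
import numpy as np

def cov(h, sill=1.0, a=10.0):
    """Exponential covariance model C(h) as a function of lag distance h."""
    return sill * np.exp(-3.0 * np.abs(h) / a)

def sgsim_1d(grid, data_x, data_z, mean=0.0, seed=1):
    """Toy 1-D sequential Gaussian simulation using simple kriging.

    At each node of a random path, a Gaussian lcdf is built by simple
    kriging on all data seen so far, a value is sampled from it, and
    the sample joins the conditioning set (steps 1-6 above).
    """
    rng = np.random.default_rng(seed)
    xs, zs = list(data_x), list(data_z)
    sim = {}
    for u in rng.permutation(grid):            # random path through the nodes
        x, z = np.array(xs), np.array(zs)
        K = cov(x[:, None] - x[None, :])       # data-to-data covariances
        k = cov(x - u)                         # data-to-node covariances
        lam = np.linalg.solve(K, k)            # simple kriging weights
        m = mean + lam @ (z - mean)            # kriged mean of the lcdf
        v = cov(0.0) - lam @ k                 # kriged (homoscedastic) variance
        sim[u] = rng.normal(m, np.sqrt(max(v, 0.0)))
        xs.append(u)                           # grow the conditioning set
        zs.append(sim[u])
    return np.array([sim[u] for u in grid])

grid = np.arange(20.0)
realization = sgsim_1d(grid, data_x=[2.5, 16.5], data_z=[1.2, -0.8])
```

Each call with a different seed (random path and random samples) yields another equi-probable realization.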
2.2.1 Parametric Approach to Sequential Simulation
Sequential simulation is easily rendered feasible with the assumption of
multivariate Gaussianity (Deutsch and Journel, 1998; Alabert et al., 1992; Tran et al.,
2001; Caers, 2001; Jensen, 2000). Multivariate Gaussianity implies Gaussian local
conditional distributions. This in turn implies that the lcpdf’s and indeed the entire
multivariate distribution can be fully determined if the local conditional means and
variances can be accurately calculated. Furthermore, it is known (Deutsch and Journel, 1998) that the conditional mean of a Gaussian conditional distribution can be expressed as a linear weighted combination of the conditioning values, and the corresponding conditional variance is homoscedastic, i.e. independent of the conditioning
data (only a function of the correlations between the data and the unknown). The
equation for a Kriged estimator and Kriged variance are presented below in Equation
(2.3) and (2.4) respectively, where z(u) is the true value, z*(u) is the estimator, λ is the
weight associated with each known value, and C(h) is the covariance model as a function
of lag distance. Consistent with the requirements of the conditional mean of a Gaussian
lcpdf, the Kriging estimate is expressed as a linear combination of the available data and
the estimation variance is independent of the available data. The weights are determined
so that the error variance $E\{[z^*(u) - z(u)]^2\}$ is minimized (Bohling, 2005).
$$z^*(u) = \sum_{\alpha=1}^{n+i-1} \lambda_\alpha \, z(u_\alpha) \qquad (2.3)$$

$$\sigma_K^2 = C(0) - \sum_{\alpha} \lambda_\alpha \, C(h_{o\alpha}) \qquad (2.4)$$
In simple Kriging, all of the same parametric assumptions are made and Equations (2.3) and (2.4) are still valid. However, the estimator can be rearranged into Equation (2.5), where $m_o$ and $m_\alpha$ represent the population mean and the mean of the previously simulated nodes, respectively. Equation (2.6) provides the system of equations used to determine the weights for the Kriged estimate and variance.
$$z^* - m_o = \sum_{\alpha} \lambda_\alpha \, (z_\alpha - m_\alpha) \qquad (2.5)$$

$$\sum_{\beta} \lambda_\beta \, C(h_{\alpha\beta}) = C(h_{o\alpha}) \qquad (2.6)$$
Using simple Kriging, a localized cumulative density function (lcdf) can be
described completely for sampling the simulated value at a current grid location. For the
first grid node, the lcdf is conditioned to the prior data. From there, the lcdf at each subsequent node is conditioned to both the hard initial data and the newly generated simulated values, and the process continues until an entire realization has been populated.
Since a Gaussian assumption is implicit in the above discussions, the resultant algorithm
is referred to as sequential Gaussian simulation (SGSIM).
There are two main problems with SGSIM. First, the assumption of multivariate
Gaussianity may not be true for the particular application. Many natural processes are
known to be not Gaussian (Caers, 2001). Second, Kriging is a variance minimizing
procedure meaning a quadratic form of the error is assumed. However, the assumption of
a quadratic loss (error) function may not be valid for all cases of practical application.
For instance, an asymmetric linear loss function or a combination of quadratic and linear loss functions might be necessary to accurately represent the impact of under- or over-estimating the value at a particular location. Such asymmetric loss
functions are impossible to integrate into a simple or ordinary Kriging framework.
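To see why this matters, consider a small numerical check (the Gaussian distribution and the penalty slopes below are hypothetical): for an asymmetric linear loss with slope $a$ on overestimation and slope $b$ on underestimation, the minimizer of the expected loss is the $b/(a+b)$ quantile of the distribution, not its mean, so no variance-minimizing estimator can produce it:

```python
import numpy as np

rng = np.random.default_rng(3)
samples = rng.normal(loc=5.0, scale=2.0, size=20000)  # stand-in for an lcpdf

def expected_loss(est, a=1.0, b=3.0):
    """Asymmetric linear loss: slope a for over-, b for under-estimation."""
    e = est - samples
    return np.mean(np.where(e >= 0.0, a * e, -b * e))

# Brute-force search for the estimate minimizing the expected loss.
candidates = np.linspace(0.0, 10.0, 1001)
best = candidates[int(np.argmin([expected_loss(c) for c in candidates]))]

# Theory: the minimizer is the b/(a+b) = 0.75 quantile, not the mean (5.0).
q75 = np.quantile(samples, 0.75)
```

With underestimation penalized three times as heavily as overestimation, the optimal estimate shifts well above the mean, toward the 0.75 quantile.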
2.2.2 Non-Parametric Approach to Sequential Simulation
The local conditional probability distribution within sequential simulation can
also be estimated following a non-parametric approach. The biggest advantage of a non-parametric approach is that it allows sampling from a non-Gaussian distribution
(Deutsch and Journel, 1998; Journel, 1988). Defining an indicator RV as:
$$I(u, z_k) = \begin{cases} 1, & \text{if } z(u) \le z_k \\ 0, & \text{otherwise} \end{cases} \qquad (2.7)$$
where $u$ is the location and $z_k$ represents a particular threshold. A unique feature is that the expected value of the indicator variable at a particular threshold is equal to the probability that the value is less than or equal to the threshold:
$$E\{I(u, z_k)\} = \mathrm{Prob}\{z(u) \le z_k\} = F(u, z_k)$$

or

$$F(u; z_k \mid n) = E\{I(u; z_k) \mid n\}$$
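This property is straightforward to verify numerically; the lognormal stand-in for permeability data below is a hypothetical choice for illustration:

```python
import numpy as np

rng = np.random.default_rng(2)
z = rng.lognormal(mean=0.0, sigma=1.0, size=5000)  # stand-in permeability data

# Thresholds z_k taken at the quartiles of the data.
thresholds = np.quantile(z, [0.25, 0.50, 0.75])

# Indicator coding, Equation (2.7): I = 1 where z(u) <= z_k, else 0.
I = (z[:, None] <= thresholds[None, :]).astype(float)

# The mean (expected value) of each indicator column estimates F(z_k).
F_hat = I.mean(axis=0)
```

By construction, the indicator means recover the cdf values 0.25, 0.50, and 0.75 at the three thresholds.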
Denoting the conditional expectation of the indicator variable as $I^*(u \mid n)$, the projection theorem (Luenberger, 1968) states that the minimum $L_2$ norm estimator of $I^*$ based on the $n$ indicator data $I_\alpha$ is the linear combination:
$$I^*(u) = \sum_{\alpha} \lambda_\alpha I_\alpha$$
In simple indicator kriging, the kriged estimate is expressed as an update over the
prior probability distribution F(z):
$$I^*(u; z_k \mid n) = F^*(u; z_k \mid n) = \left[ 1 - \sum_{\alpha} \lambda_\alpha(u; z_k) \right] F(z_k) + \sum_{\alpha} \lambda_\alpha(u; z_k) \, I(u_\alpha; z_k) \qquad (2.8)$$
The weights are the solution to the system:
$$\sum_{\beta} \lambda_\beta(u; z_k) \, C_I(h_{\alpha\beta}; z_k) = C_I(h_{\alpha o}; z_k)$$
Equation (2.8) can be used to describe the lcdf completely once the soft or hard
conditioning data has been indicator coded. The most important difference between the
previous Gaussian based approach and the indicator approach is that the task of
constructing the lcpdf at a location has been dissociated from the task of retrieving a sample from it. In the Gaussian approach, the mean of the lcpdf (which is a sample from
that distribution) is retrieved first and subsequently the distribution is constructed. That
sample (the mean) was obtained by minimizing a particular type of loss (error) function –
a symmetric quadratic function (error variance). In the indicator kriging approach, the
distribution is constructed directly. Subsequent retrieval of any sample can be based on
a loss function formalism. It is this important feature of indicator-based approaches that
is exploited in this thesis.
The methodology for implementing indicator Kriging in sequential simulation
(SISIM) is the same as discussed previously under SGSIM. As in SGSIM, Monte Carlo sampling from the Kriged distribution provides the simulated values at grid nodes.
The simulated values are assimilated within the conditioning data set for the next node
visited along a random path. There are some advantages to SISIM compared to SGSIM.
The first advantage is the elimination of the multivariate Gaussianity assumption.
Second, at each threshold an independent indicator covariance CI(h;zk) can be used to
model heterogeneity or the underlying geological structure, whereas SGSIM is restricted
to one variogram model. The use of multiple variogram models allows for more
sophistication and better representation of the underlying geological model; however, it can be computationally expensive to develop unique variograms, particularly as the number of thresholds increases (Journel, 1988). In contrast, too few thresholds can cause the lcdf to be segmented (Jensen et al., 2000).
2.3 LOSS FUNCTION IN THE SEQUENTIAL SIMULATION FRAMEWORK
As mentioned previously, kriging based sequential Gaussian simulation is
predicated on retrieving an estimate that minimizes a particular (quadratic) form of
loss function. The kriging estimate is obtained such that the estimation variance is
minimized. In order to dissociate the process of constructing the lcpdf from the task of
retrieving an estimate from that distribution, indicator simulation may be a more suitable
alternative. By indicator coding the available data and kriging using the indicator data, the lcdf representing a general (non-Gaussian) distribution can be synthesized. Subsequently, an optimal value can be retrieved from the distribution by minimizing a suitable loss
(error) function.
Loss functions tie loss, either economical or physical, to an error in a control
variable. The following definition is taken from Journel (1988). Given an estimated value
$u^*(x) = u^*$ of a certain phenomenon, an estimation error defined as $(u^* - u(x))$ can occur, where $u(x)$ is the true value at $x$. $L(u^* - u(x))$ is the loss associated with the estimation error. Assuming that the distribution model of $u(x)$, denoted $F(u \mid (n))$, is available, the expected value of the loss can be determined as:
$$L^{*} = E\{L(u^{*} - U) \mid (n)\} = \int_{-\infty}^{+\infty} L(u^{*} - u)\, dF(u \mid (n)) \qquad (2.9)$$
The optimal estimate is the value $u_{L}^{*}(x)$ that minimizes the expected loss in Expression (2.9). The integral in Expression (2.9) is often difficult to evaluate analytically, so a discrete approximation may be employed:
$$E\{L(u^{*} - U) \mid (n)\} = \int_{-\infty}^{+\infty} L(u^{*} - u)\, dF(u \mid (n)) \approx \sum_{k=1}^{K} L(u^{*} - u_{k}')\left[F(u_{k+1} \mid (n)) - F(u_{k} \mid (n))\right]$$

where $u_{k}'$ is a value representative of the class $(u_{k}, u_{k+1}]$.
For example, in simple or ordinary kriging, the estimate is obtained as that which minimizes $E\{[z(u) - z^{*}(u)]^{2}\}$, the error variance; i.e., the estimate minimizes a quadratic loss function:

$$E\{L(e)\} = \int_{-\infty}^{\infty} L(e)\, dF(e), \qquad L(e) = e^{2}$$
There are a few common cases where loss functions have been optimized and give interesting, relatively simple results that help illustrate how loss functions operate. The first case assumes the loss follows a quadratic form, L(e)=e². In this case, the estimate minimizing the expected loss is simply the expected value of the distribution (Journel, 1988). This follows from the characteristic property of the expectation (or conditional expectation) that it corresponds to the minimum error variance. The corresponding optimal estimate is thus referred to as the E-type estimate. Linear least-squares estimation and kriging yield E-type estimates based on the available data (Srivastava, 1999).
A second case to consider is when the loss function is of the form L(e)=|e|. In this
case, the optimal estimate is the median of the distribution. Stated mathematically this is
q0.5(x)=F-1(0.5;x|(n)) such that F(q0.5(x);x|(n))=0.5 (Journel, 1988). A third case arises
when the loss function is assumed to be a constant. For this example, the best estimate is
the mode (Srivastava, 1999). A fourth case arises when the loss function is an asymmetric linear function, with slope $w_1$ for underestimation and slope $w_2$ for overestimation. In this case, the optimal estimate is the $p$-quantile of the distribution, with

$$p = \frac{w_{1}}{w_{1} + w_{2}}$$

(Journel, 1988). Figure 2 is the graphical representation of an asymmetric linear loss function; the slope of the loss function for underestimation differs from the slope for overestimation. Although these cases provide interesting theoretical solutions, in most practical cases the solution cannot be obtained analytically and must be computed numerically.
Figure 2: Asymmetric linear loss function.
For an asymmetric linear loss function the expected loss is:
$$E\{L(e)\} = w_{1} \int_{X^{*}}^{\infty} (X - X^{*})\, dF_{X}(X) + w_{2} \int_{-\infty}^{X^{*}} (X^{*} - X)\, dF_{X}(X)$$
Each term is separated for integration purposes:
$$E\{L(e)\} = w_{1} \int_{X^{*}}^{\infty} X\, dF_{X}(X) - w_{1} X^{*} \int_{X^{*}}^{\infty} dF_{X}(X) + w_{2} X^{*} \int_{-\infty}^{X^{*}} dF_{X}(X) - w_{2} \int_{-\infty}^{X^{*}} X\, dF_{X}(X)$$
Rearranging the terms and evaluating the cumulative probability integrals gives:

$$E\{L(e)\} = w_{1} \int_{X^{*}}^{\infty} X\, dF_{X}(X) - w_{1} X^{*}\left[1 - F_{X}(X^{*})\right] + w_{2} X^{*} F_{X}(X^{*}) - w_{2} \int_{-\infty}^{X^{*}} X\, dF_{X}(X)$$
Taking the derivative of the expected loss gives:
$$\frac{dE\{L(e)\}}{dX^{*}} = -w_{1} X^{*} f_{X}(X^{*}) - w_{1}\left[1 - F_{X}(X^{*})\right] + w_{1} X^{*} f_{X}(X^{*}) + w_{2} F_{X}(X^{*}) + w_{2} X^{*} f_{X}(X^{*}) - w_{2} X^{*} f_{X}(X^{*})$$
The density terms cancel; setting the derivative to zero and solving for the optimal estimate gives:

$$\frac{dE\{L(e)\}}{dX^{*}} = (w_{1} + w_{2})\, F_{X}(X^{*}) - w_{1} = 0$$

$$X^{*} = F_{X}^{-1}\!\left(\frac{w_{1}}{w_{1} + w_{2}}\right) \qquad (2.10)$$
If $w_{1} = w_{2} = w$, the loss function is symmetric, and using Equation (2.10) the optimal value is determined to be the median:

$$X^{*} = F^{-1}\!\left(\frac{w}{w + w}\right) = F^{-1}\!\left(\frac{1}{2}\right) = \mathrm{Median}$$
2.3.3 Applications of Loss Functions in Various Engineering Fields
This section is not intended as an in-depth discussion of the application of loss functions to fields outside petroleum engineering, but rather to give the reader exposure to the various forms of loss functions in actual applications and the general types of problems to which they are applied.
One application area is pattern classification in electrical engineering. In this
field, multi-classification problems are often solved with loss functions (Suresh et al.,
2008). A multi-classification problem deals with accurately identifying the correct
classification for an observed pattern (Suresh et al., 2008). The authors demonstrate that the most robust classification results when the risk is modeled as $R_{n} \geq l_{\beta} + \varepsilon_{1} + \varepsilon_{2}$, where $l_{\beta}$ is the loss function, $\varepsilon_{1}$ is the approximation error (due to the fact that the estimation is done using a finite sample set), and $\varepsilon_{2}$ is the estimation error (i.e., the deviation of the estimates from the "true" values). The loss function is written as:
$$l_{\beta} = E_{X}\!\left[\sum_{j=1,\, j \neq j^{*}}^{C} m_{j}\, P(c_{j} \mid X)\right] \qquad (2.11)$$
The mj term is the risk factor term for a particular classifier, and P(cj|X) is the
posterior probability of class j given X. The approximation and estimation errors are
given as:
$$\varepsilon_{1} = \frac{1}{N}\sum_{i=1}^{N} L_{f}\big(N_{f}(X_{i}^{s}, W),\, Y_{i}\big)$$

$$\varepsilon_{2} = E\big[N_{f}(X^{s}, W^{*}) - N_{f}^{*}(X^{*}, W^{*})\big]$$

$X_{i}^{s}$ = given sample
$N_{f}$ = classifier based on X and the weights W
$L_{f}$ = deviation between the predicted class and the actual class
$W$ = weight parameter
Risk is introduced in Equation (2.11) as $m_{j}$; if the class label k is introduced, the risk factor matrix $m_{kj}$ can be calculated as:
$$m_{kj} = \frac{\beta_{j}}{N_{j}} \sum_{i=1}^{N_{j}} \frac{\hat{P}(c_{k} \mid X_{i}^{s})}{\hat{P}(c_{j} \mid X_{i}^{s}) + \hat{P}(c_{k} \mid X_{i}^{s}) + \varepsilon}, \qquad j = 1, 2, \ldots, C \qquad (2.12)$$

$\hat{P}(c_{k} \mid X_{i}^{s})$ = posterior probability of class k given $X_{i}^{s}$
$\beta_{j}$ = cost of misclassification
$N_{j}$ = number of training samples
$\varepsilon$ = error term
$m_{kj}$ = risk factor matrix
The numerator in Equation (2.12) is an example of cross-entropy, i.e., the entropy of class j compared to class label k. In order to determine the optimal classes, the expected value of $R_{n}$ needs to be minimized.
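As a rough illustration of the risk factor matrix in Equation (2.12): the posterior probabilities and cost vector below are invented for illustration (not taken from Suresh et al., 2008), and the sum is simplified to run over all samples rather than the $N_j$ samples of class j.

```python
import numpy as np

# Illustrative posterior probabilities P̂(c_j | X_i) for 4 samples, 3 classes
# (assumed values, for demonstration only).
posteriors = np.array([[0.7, 0.2, 0.1],
                       [0.1, 0.8, 0.1],
                       [0.3, 0.3, 0.4],
                       [0.5, 0.4, 0.1]])
beta = np.array([1.0, 2.0, 1.5])   # assumed misclassification costs per class

def risk_matrix(P, beta, eps=1e-6):
    """Risk factors m_kj in the spirit of Eq. (2.12), averaged over all samples."""
    n, C = P.shape
    m = np.zeros((C, C))
    for k in range(C):
        for j in range(C):
            m[k, j] = beta[j] / n * np.sum(P[:, k] / (P[:, j] + P[:, k] + eps))
    return m

m = risk_matrix(posteriors, beta)
assert m.shape == (3, 3)
```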
A second example of the application of loss functions comes from manufacturing engineering, where loss functions are commonly used to solve multi-variable control optimization problems arising in quality control. Quality control problems deal with setting design variables to achieve an optimal compromise among the response variables (Ma and Zhao, 2004). Often, quality control deals with N-type, L-type, and S-type tolerance criteria (Suhr and Batson, 2001; Ma and Zhao, 2004). N-type stands for "normal is best"; an example would be the design of a car door, which can be neither too large nor too small: if it is too large, the door will not shut, and if it is too small, it will not latch. S-type stands for "the smaller the better"; an example would be vehicle wear - the smaller the wear, the better the design. The third type of quality control criterion is the L-type, where the larger the response variable, the better; an example would be fuel efficiency - the more miles per gallon a vehicle gets, the better.
Most quality characteristics are of N-type (Ma and Zhao, 2004). In
manufacturing, loss functions are connected to economic loss, and that has been found to
be proportional to the square of the error (Ma and Zhao, 2004; Artiles-Leon, 1996).
leads to a quadratic form of the loss functions proposed by Taguchi (Taguchi, 1990) and
confirmed in Artiles-Leon (1996), similar to the E-type equation described earlier:
$$\mathrm{Loss}(Y) = k\,(Y - T)^{2}, \qquad k = \left(\frac{2}{USL - LSL}\right)^{2}$$
In the above equations, k is the quality loss coefficient, T is the target value for the design, Y is the quality characteristic, and USL and LSL are the upper and lower specification limits, respectively.
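This normalized quadratic loss is straightforward to implement. A minimal sketch follows, using illustrative car-door numbers (the target and specification limits are assumptions, not from the cited papers); by construction the loss is 0 at the target and 1 at either specification limit.

```python
def taguchi_loss(y, target, usl, lsl):
    """Normalized N-type quality loss: L = [2(y - T)/(USL - LSL)]^2."""
    k = (2.0 / (usl - lsl)) ** 2   # quality loss coefficient
    return k * (y - target) ** 2

# Hypothetical car-door width: target 100, specification limits 98 and 102.
assert taguchi_loss(100.0, 100.0, 102.0, 98.0) == 0.0   # on target: no loss
assert taguchi_loss(102.0, 100.0, 102.0, 98.0) == 1.0   # at a limit: unit loss
```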
One advantage of the Artiles-Leon equation is its normalized form, which allows a wide range of quality control problems to be compared on the same scale. The next step is the combination of all of the quality characteristics, or control variables. This is done by simply extending the loss function definition:
$$L\big(Y(X), X\big) = \sum_{i \in N} 4\!\left(\frac{Y_{i}(X) - t_{i}}{USL_{i} - LSL_{i}}\right)^{2} + \sum_{i \in L} 4\!\left(\frac{Y_{i}(X) - t_{i}}{USL_{i} - LSL_{i}}\right)^{2} + \sum_{i \in S} 4\!\left(\frac{Y_{i}(X) - t_{i}}{USL_{i} - LSL_{i}}\right)^{2}$$
The above equation encompasses all of the control variables and the three
different types of quality control problems. However, this equation can only be used if
the response variables are independent of each other (Artiles-Leon, 1996). There are
often a variety of constraints that can add to the complexity of the problem. Different
types of constraints are structural, distribution, mechanical, cost, tolerance, and
specification (Suhr and Batson, 2001). Structural constraints relate to how the quality
characteristics interact with the control factors. Distribution constraints relate to how the
quality characteristics are limited by mean and standard deviation of their respective
distributions. When a control problem is limited by physical and chemical limitations,
then a mechanical constraint is present. Economic limitations often take the form of cost
constraints. Tolerance and specification constraints are normally limitations due to the customer's specifications, and lastly, capability constraints are related to the acceptable process capability indices.
Depending on the situation and the facets of the loss function, different constraints may or may not apply, and constraints often overlap. For instance, there may be a physical constraint, but before it is reached a cost constraint may be encountered; both constraints exist, but only one is the limiting factor. As an example of overlapping constraints, consider a hypothetical hybrid vehicle that achieves 1000 mpg. Although a car with this fuel efficiency could be developed, it hardly matters if the car costs a million dollars to manufacture. The cost constraint is reached before the physical limitation.
The third application of loss functions is in the field of petroleum engineering. In
Srivastava (1990), geostatistical methods were used to model injected pore volumes of
solvent. Indicator simulation was used to make 500 initial realizations that were
conditioned using existing well information (Srivastava, 1990). From these 500
realizations, random walkers were used to map the connected pore volume between well
pairs. A random walker chooses a random path to get from point A to point B. The
trajectories of the random walkers and the length distribution of their paths were
recorded. The walkers were also required to have a certain net to gross ratio. The
random walkers' paths were used to create a pore volume distribution, and the number of
random walk trajectories at a location also gave a probabilistic estimate of that location
being a part of a connected pore volume (Srivastava, 1990). The pore volume
distribution could subsequently be used in conjunction with a loss function.
The distribution was used to build an economic model. The mean of the
distribution, which would be the arithmetic average of the pore volume realizations, was
used as a base case for determining the net present value (NPV). The full probability
distribution of connected pore volumes provides the probability that the solvent volume will deviate from the mean, effectively allowing the user to assign probabilities to the NPV values. The error in pore volume (expressed as deviation from the mean) was then
plotted against the corresponding change in NPV. This represents the loss function. The
loss function assumed for this example was a linear asymmetric loss function and is
shown in Figure 2. The w1, for this example, was determined to be the cost of
underestimation, which was equal to the cost of the lost production minus the cost of the
solvent saved by not injecting (Srivastava, 1990). The slope w2 for the overestimation was just the cost of the additional solvent, because the maximum amount of oil would still be produced but the extra solvent would be wasted (Srivastava, 1990).
As discussed before, for linear loss functions the minimum loss occurs at the quantile $p = w_{1}/(w_{1} + w_{2})$. This equation can be rewritten, with $r$ = price of oil / price of solvent, as:

$$p = \frac{w_{1}}{w_{1} + w_{2}} = \frac{\$\text{oil} - \$\text{solvent}}{(\$\text{oil} - \$\text{solvent}) + \$\text{solvent}} = \frac{r \cdot \$\text{solvent} - \$\text{solvent}}{r \cdot \$\text{solvent}} = \frac{r - 1}{r} \qquad (2.13)$$
Using Equation (2.13), different ratios of oil to solvent price r give insight into the
optimal pore volume estimation. If the solvent price equals the oil price, the minimum
loss would be at the zero quantile, and the smallest pore volume would be used.
However, if the oil price increased and solvent stayed constant, the minimum loss would
occur when the largest pore volume was used.
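Equation (2.13) can be wrapped in a small helper to explore these limiting cases (the prices below are illustrative, not from Srivastava, 1990):

```python
def optimal_quantile(oil_price, solvent_price):
    """Optimal quantile p = (r - 1)/r from Eq. (2.13), r = oil/solvent price."""
    r = oil_price / solvent_price
    return (r - 1.0) / r

# Equal prices: p = 0, so the smallest pore volume is optimal.
assert optimal_quantile(50.0, 50.0) == 0.0
# Oil price far above solvent price: p approaches 1 (largest pore volumes).
assert optimal_quantile(5000.0, 50.0) > 0.98
```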
In this thesis, the concept of a loss function is applied to synthesize reservoir models. In a manner similar to the initial discussion on multi-classification, an economic loss function is used to simulate (classify) an outcome of permeability at a location in the reservoir. By visiting the simulation domain on a random path and repeating the process of drawing an outcome of permeability from the local conditional distribution (lcpdf) at each location using the loss function, it is demonstrated that a suite of reservoir models can be developed that are risk-neutral towards the particular economic objective of the modeling process.
Chapter 3: Problem Setup
A suite of equi-probable realizations is used to identify regions within the reservoir exhibiting similar well responses late in the field's production life. A pressure covariance matrix is developed to represent the similarity each grid node has with every other grid node. By solving for and retaining the important Eigenvalues, a map of the retained pressure vectors can be recreated, and these vectors effectively delineate sub-domains sharing common heterogeneity.
Once the significant grid nodes are identified using PCA, they are designated a domain index based on their regional relationship. The index is used to identify which loss function is appropriate for generating the estimate. If the domain index indicates the current grid node is within a delineated sub-domain, the estimation is performed by minimizing the expected loss; however, if the domain index indicates the current node is outside the delineated sub-domains, the node value is sampled from the kriged distribution. Since the estimation is performed on a grid scale, it is important that the sub-domain delineation is likewise conducted on a grid scale.
The steps for calculating and implementing an optimized loss function for reservoir modeling are demonstrated through a synthetic example. The example setup, including the steps for creating conditioning data, generating multiple realizations, and delineating individual regions using PCA, is discussed first.
3.1 CONDITIONING DATA
The conditioning data provides a common underlying structure to each individual
realization. Sequential indicator simulation (SISIM) is used for generating multiple equi-probable realizations conditioned to the available data. In the following example,
permeability data was assumed as the conditioning information, since it is expected that
the flow response (and hence economic NPV) will be significantly affected by the spatial
variation in permeability in various regions of the reservoir.
The conditioning data for this project are synthesized. An unconditional SISIM realization was created on a 100 by 100 grid. The histogram used for the unconditional SISIM is shown in Figure 3. As can be seen, the data are truncated at 0 and 600 md. The thresholds used to model this histogram were 108.5, 141, 199.5, 277.5, and 381.5 md, with associated cdf probabilities of 0.11, 0.244, 0.499, 0.744, and 0.902. The variogram is modeled as isotropic and replicated for each threshold. The permeability distribution is chosen to be modeled as log-normal because permeability is often log-normally distributed (Jansen et al., 2002). Once the initial realization was generated, 100 locations were chosen at random. These locations and the associated permeability values become the conditioning data. Figure 4 is the visual representation of the conditioning data used throughout this thesis.
Figure 3: Histogram of permeability used for the unconditional indicator simulation.
Figure 4: Location of the 100 conditioning data.
As a check, all subsequent realizations should match these permeability values at the conditioning data locations. In addition, the synthetic geological features should be preserved in each realization, meaning the upper regions of the reservoir should contain predominantly higher permeability values than the lower regions.
3.2 SUB-DOMAIN DELINEATION
The basic premise of this work is that realizations of permeability better suited to address specific economic objectives can be constructed by using an economic loss function to sample from the local probability distributions (rather than the Monte Carlo sampling that is currently done). It is to be recognized, though, that permeability in different regions of the reservoir may influence the well responses (and hence economic NPV) differently. Recognizing this important issue, specific reservoir regions with common response characteristics were modeled separately. Loss functions were computed for these regions and later used for indicator simulation of the corresponding reservoir regions. Principal component analysis (PCA) can help with region identification within the reservoir model. The procedure implemented in Yadav (2005) was applied for domain delineation. A suite of indicator simulation models was generated using the available conditioning data. These models were processed through the flow simulator in order to compute the spatial distribution of grid node pressures, and the covariance matrix of grid node pressures was computed.
The form of the covariance matrix for a model with three nodes is shown in Equation (3.1) below. The diagonal of the covariance matrix represents the variance of the pressure at a particular grid node computed over the suite of models. The covariance matrix is a square matrix with n rows and n columns (Jensen et al., 2000; Smith, 2002), where n represents the number of grid nodes in the model. Arguably, the most important feature of the covariance matrix is that it is semi-positive definite (Wallace and Hussain, 1969), which means its Eigenvalues are real and non-negative.
$$C = \begin{pmatrix} \operatorname{cov}(x,x) & \operatorname{cov}(x,y) & \operatorname{cov}(x,z) \\ \operatorname{cov}(y,x) & \operatorname{cov}(y,y) & \operatorname{cov}(y,z) \\ \operatorname{cov}(z,x) & \operatorname{cov}(z,y) & \operatorname{cov}(z,z) \end{pmatrix} \qquad (3.1)$$
Eigenvalues are scaling factors for the corresponding Eigenvectors and are the solutions to the characteristic equation of the matrix being solved (Smith, 2002). The Eigenvalues are sorted in order of decreasing magnitude. The sum of the Eigenvalues is related to the variance of the data, and so a limited number of Eigenvalues and the corresponding Eigenvectors are retained based on a variance cut-off. The Eigenvectors with the largest Eigenvalues correspond to the dimensions that have the strongest correlation in the data set. In other words, the procedure yields a grouping of nodes where pressure exhibits a strong degree of correlation.
Starting from n nodes of pressure P, the covariance matrix is obtained as the product $P^{T}P$, which is of order $n \times n$. PCA yields a reduced $n \times k$ set of Eigenvectors, i.e., a set of k Eigenvector loadings at each pressure node. The k principal components of pressure summarizing the variability of pressures over the entire reservoir are obtained by multiplying the transpose of the Eigenvector matrix with the original data vector P. Thus each principal component can be construed as a weighted linear combination of the original data, with the weights (loadings) being the Eigenvector entries. Going back to each node, there are k Eigenvector loadings at each node. The maximum of the k loadings at each node is identified, and the rank of the Eigenvalue corresponding to that maximum value is marked as the domain index at that node. All pressure nodes that exhibit the same domain index constitute a grouping that exhibits similar pressure characteristics within the reservoir (Yadav, 2006). In order to restrict the number of identified domains, a volume cut-off is applied. The volume cut-off simply stipulates that only a certain volume fraction of the reservoir will be covered by the identified domains. The remaining nodes, with low domain indices, remain ungrouped, implying that these nodes do not exhibit any specific pressure signature reflecting a systematic response due to the underlying heterogeneity. An outline of the PCA procedure is provided in Figure 5.
Step 1: Determine the covariance matrix of order n x n by performing PᵀP, where P is the pressure matrix.
Step 2: Obtain the k principal components of pressure by multiplying the Eigenvector matrix by P.
Step 3: Identify domain indices using the maximum of the k Eigenvector loadings at each node.
Step 4: Apply a volume cut-off to limit the number of identified domains.

Figure 5: Flow chart of PCA for sub-domain delineation.
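The steps in Figure 5 can be sketched in a few lines of Python. The pressure ensemble here is random stand-in data (in the thesis it comes from flow simulation of the upscaled models), the grid size and number of retained components are assumptions, and the volume cut-off step is only indicated.

```python
import numpy as np

rng = np.random.default_rng(1)
n_real, n_nodes, k = 50, 100, 4      # realizations, coarse grid nodes, retained PCs

# Stand-in pressure matrix (rows = realizations, columns = grid nodes).
P = rng.normal(size=(n_real, n_nodes))

# Step 1: covariance matrix of node pressures (n_nodes x n_nodes, up to 1/n).
Pc = P - P.mean(axis=0)
C = Pc.T @ Pc

# Step 2: eigendecomposition; keep the k largest eigenvalue/eigenvector pairs.
w, V = np.linalg.eigh(C)             # eigh returns ascending eigenvalues
order = np.argsort(w)[::-1][:k]
V_k = V[:, order]                    # loadings of the k leading components
pcs = Pc @ V_k                       # k principal components per realization

# Step 3: domain index = rank of the component with the largest loading at a node.
domain = np.argmax(np.abs(V_k), axis=1)

# Step 4: a volume cut-off would ungroup nodes with weak loadings (not shown).
assert domain.shape == (n_nodes,) and int(domain.max()) < k
```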
Since the procedure described above requires the pressure fields corresponding to a large suite of realizations, significant savings in CPU cost result if upscaled flow simulations are performed. In addition, the volume threshold used for delineating domains regulates the number of grid nodes covered by the identified sub-domains. The threshold determines the interaction between regions (Yadav et al., 2005). If the threshold is too large, there is too much interaction; if the threshold is too small, the reservoir volume within sub-domains is very small, implying that the estimates retained from the local conditional distributions for a large number of nodes will utilize the same loss function. This could render the simulated realizations sub-optimal. Some trial and error is needed to choose a specific threshold.
In order to develop the covariance matrix that is central to PCA, an ensemble of equi-probable realizations is generated. Indicator simulation (SISIM) was again used to develop multiple realizations, except that the histogram used in this new set of simulations is based only on the conditioning data in Figure 4. The corresponding histogram is shown in Figure 6. Fifty realizations are generated for development of the covariance matrix, and the first five are shown in Figure 7. The color scale ranges from 600 md (red) to 100 md (blue).
Figure 6: Histogram of conditioning data used for generating the suite of realizations for domain delineation.
Figure 7: Sample realizations obtained using SISIM conditioned to the available data.
Notice that the conditioning data and underlying geological features are preserved in all five models. The covariance matrix is generated using the flow response, but prior to that the 50 models were upscaled to reduce computational expense. The grid is scaled from 100 by 100 to 10 by 10, and in this upscaling process some resolution is inevitably lost. Figure 8 shows the upscaled versions of the same five realizations presented in Figure 7.
Figure 8: Upscaled versions of the sample models in Figure 7.
The five upscaled models exhibit the same spatial trend of permeability as the fine
scale models. The flow simulation model used for obtaining the flow response assumes a
basic black oil reservoir. As shown in Figure 9, three production wells are located at (30,
30), (50, 50), and (70, 70) while one injector is located at (90, 90). Once the flow
simulations are completed, a covariance matrix is calculated using the pressure at nodes
in the upscaled grid calculated at 1,260 days (at a mature stage during the water injection
process).
Since the covariance matrix of pressure is assured to be semi-positive definite, real, non-negative Eigenvalues result (Wallace and Hussain, 1969). The top 200 Eigenvalues (corresponding to a variance cut-off of 60%) are retained for sub-domain delineation. The number of domains identified was further reduced by applying a 40% volume cut-off. The corresponding 4 domains identified are shown in Figure 9. The
nodes in the un-shaded part of the reservoir constitute a fifth region, comprising nodes that do not have a systematic impact on the response at the wells. The final step is to down-scale the identified regions onto the original 100 by 100 grid. All fine scale nodes falling within a coarse scale domain are assigned to that domain.
Figure 9: Sub-domain identification by PCA.
There are a few important features to note in Figure 9. First, most of the regions are located near the well bores. Since a large portion of the pressure drop occurs near the well bore, it is reasonable for the nodes relatively close to the well bore to be considered significant. Second, regions 3 and 4 mostly map the high-permeability areas located in the upper portion of the models. These two regions are likely in pressure communication due to their high permeability values. The domains identified in Figure 9 are used in the subsequent discussion throughout this thesis. Now that the sub-domains have been identified, the loss function for each region can be established; that is discussed in the next chapter.
Chapter 4: Loss Function Development
The loss function must map inaccurate permeability estimates to corresponding
economic loss when the inaccurate model is used to plan facilities and operations for a
water flood. In addition, the loss function must be applied on an individual grid node
basis since the SISIM simulations are performed on a grid. In order to accomplish that
objective, the calculation of the loss function will be performed within sub-domains, with
all the grid nodes comprising the sub-domain sharing the same loss function.
Before discussing the development of the loss function, the base case economic
model is described. Since a water flood scenario is considered, the cost of purchasing and processing water is factored into the NPV. The available produced water is assumed to be re-injected, and only a processing cost is associated with that water volume. However, when there is not enough water for a specified injection rate, additional water is assumed to be purchased, and that cost is added. Therefore, the NPV considers both water and oil handling costs and the revenue from the sale of oil.
This chapter has two main objectives: (1) to show how permeability estimation error is mapped to NPV loss, and (2) to propose a numerical method for obtaining the estimate that minimizes the expected loss, along with an analytical fit to the loss function.
4.1 LOSS FUNCTION DEVELOPMENT FOR DELINEATED SUB-DOMAINS
Consider X* as an estimation of the RV X, and the error (E) defined as E=X*-X.
Since the true variable X is inaccessible, E is a RV. If the error is positive, X* is an
overestimation of X, and if the error is negative, X* is an underestimation of X. The
impact of an incorrect estimate X* can be related to a loss L(e). For this thesis, the attribute mapped by X is permeability, and L(e) is in terms of NPV.
For each grid node, a loss function can be determined based on the ensemble of 50 realizations generated previously. A permeability value can be sampled from the lcpdf constructed using indicator kriging at a particular grid node using the loss function for the corresponding sub-domain. Denote the simulated permeability value at grid node n as $X_{n}^{(l)*}$, where l is the unique permeability realization from which that value is retained. In the previous expression for E, X denotes the "true" value at node n, which is unknown. It can be assumed that the "true" value could be the permeability value in any of the remaining L-1 realizations at grid location n. Keeping the grid node estimate $X_{n}^{(l)*}$ constant and subtracting the possible "true" values from the other 49 realizations produces a set of 49 error values (e).
The 50 permeability realizations can be processed through a flow simulator in order to obtain the well responses. Those in turn are input to the economic module in order to obtain the corresponding NPVs. A set of losses can therefore be defined as $L(e) = NPV(X_{n}^{(l)*}) - NPV(X_{n}^{(l_{c})})$, where $l_{c}$ indexes all realizations other than l.
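The bookkeeping for these error/loss pairs at a node can be sketched as below. The permeability and NPV values are synthetic stand-ins (in the thesis, NPVs come from the flow simulator and economic module), so only the construction of the 49 (e, L(e)) pairs is illustrated.

```python
import numpy as np

rng = np.random.default_rng(2)
n_real = 50
# Stand-in node permeabilities across the ensemble (illustrative log-normal).
perm = rng.lognormal(5.0, 0.5, size=n_real)
# Stand-in NPVs; a real workflow would obtain these from flow simulation.
npv = 1.0e6 + 2.0e3 * perm + rng.normal(0.0, 5.0e4, n_real)

l = 0                                  # realization supplying the estimate X_n^(l)*
others = np.arange(n_real) != l
errors = perm[l] - perm[others]        # e = X* - "true" value
losses = npv[l] - npv[others]          # L(e) = NPV(X*) - NPV(X^(lc))
assert errors.size == n_real - 1 and losses.size == n_real - 1
```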
Once the error and the corresponding loss are determined, a function can be fit to best describe the relationship between L(e) and the estimation error e. However, this is only one node out of the n possible grid nodes within a particular region. In order to test the representativeness of the constructed loss function, seven additional grid nodes are sampled randomly within each region. The locations are chosen to best represent each sub-domain and are shown in Figure 10, accompanied by a sub-domain map for reference.
Figure 10: Locations selected for computation of loss functions (left) accompanied by sub-domain identification for a reference (right).
Two factors determine the characteristics of the loss function: the node sampling location (affecting the outcomes e of the RV E) and the realization chosen for $X_{n}^{(l)*}$. Selecting a node close to the edge of a sub-domain would not be representative, because values at grid nodes outside the region influence the kriged distribution. It is more efficient to sample nodes well within the borders of each region that are spatially separated as much as possible.
The first factor influencing the development of the loss function is the set of outcomes e, of the RV E, used for constructing the loss function. The "true" values are unknown and are hence identified with the values in the simulated realizations. Second, the retained estimate $X_{n}^{(l)*}$ also controls the error values. If $X_{n}^{(l)*}$ is close to the mean permeability expected in that reservoir region, the resultant loss function can be deemed risk neutral, i.e., the resultant permeability model has values that balance the risk of underestimation with that of overestimation. If instead the loss function is calculated using a low value of $X_{n}^{(l)*}$, then implicitly the resultant loss function is such that the cost associated with overestimation of permeability is deemed less than the cost associated with underestimation. The opposite is true if a high value is retained for $X_{n}^{(l)*}$ in the calculation of the error e. It is also to be emphasized that this selection of $X_{n}^{(l)*}$ is critical only for the calculation of the loss function. After the loss function is available, the optimal estimate X* of permeability at a location in the simulation grid is obtained by using the loss function in conjunction with the lcpdf at that location. Figure 11 shows the relationship between the estimation error and economic loss for representative nodes from within each sub-domain.
Figure 11: Sample nodes from each region showing the relation between economic loss and permeability estimation error (region 1, upper left; region 2, upper right; region 3, lower left; region 4, lower right). Green dashed lines indicate a window containing 70% of the data; plus signs and circles represent under- and overestimation, respectively.
There are a number of reasons for the large spread in the plots shown in Figure 11. First, the sub-domains are identified according to pressure correlation, whereas these plots link permeability to NPV. Second, NPV is non-linear, so using permeability estimation error alone to correlate to NPV might not be sufficient. Third, the sub-domains are identified from the upscaled pressure relationships, but the plots in Figure 11 are developed at the grid scale. Nevertheless, the relationship between economic loss and permeability estimation error is sufficient for the scope of this thesis.
Once the estimation error is plotted against the associated NPV loss, two different types of loss function were fitted to each plot (Figure 12). The first was an asymmetric linear loss function, and the second was a parabolic loss function. Since kriging is based on the minimization of error variance, it implies a quadratic loss function. The parabolic model differs from kriging in that it contains a linear term in addition to the quadratic term.
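The fitting step can be sketched numerically. The snippet below is illustrative only: it fits the two loss-model forms to a synthetic (error, loss) scatter by least squares; the generating coefficients 0.5 and 0.3 are made up for the example and are not the thesis values.

```python
import numpy as np

rng = np.random.default_rng(0)

# Synthetic (error, loss) scatter standing in for one sub-domain's data.
err = rng.uniform(-20.0, 20.0, 200)                  # estimation error e = X* - X (md)
loss = np.where(err < 0, 0.5 * -err, 0.3 * err) + rng.normal(0.0, 1.0, 200)

# Asymmetric linear model: L = b*(X - X*) for underestimation (e < 0),
# L = b'*(X* - X) for overestimation (e > 0), each fitted by least squares.
under, over = err < 0, err >= 0
b = np.linalg.lstsq((-err[under])[:, None], loss[under], rcond=None)[0][0]
b_p = np.linalg.lstsq((err[over])[:, None], loss[over], rcond=None)[0][0]

# Parabolic model: L = a*|e|^2 + c*|e|, one curve fitted to both sides at once.
A = np.column_stack([err**2, np.abs(err)])
a_q, c_q = np.linalg.lstsq(A, loss, rcond=None)[0]

print(f"linear: under {b:.3f}, over {b_p:.3f}; parabolic: a {a_q:.4f}, c {c_q:.4f}")
```

With enough data the two recovered linear slopes approach the generating values, while the parabolic fit blends the two branches into one curve, which is exactly the simplification made in Table 3.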
Figure 12: Same representative region nodes from Figure 11 fitted with asymmetric linear (black lines) and parabolic loss functions (black curves).
An average of the eight nodes within each region was used to define the loss function models used for developing the risk-neutral realizations. Figure 13 shows the four sub-domain loss functions (averages of the best fits at the eight nodes) for both the asymmetric linear and parabolic models. Regions 1 and 2 have the steepest slopes, indicating significant loss associated with incorrect estimation. Regions 3 and 4 exhibit similar loss in both the asymmetric linear and parabolic models; the slopes of their loss functions are shallower, indicating that NPV is not as heavily affected by permeability changes as in regions 1 and 2. Looking at the delineated domains, it can be observed that there are no producing wells located in regions 3 or 4. Since the absolute values of the slopes of the linear loss function for region 1 are approximately the same, the optimal estimate will be approximately the median. A summary of the various regional loss function models is shown in Table 1.
Figure 13: Linear and quadratic loss function models (average of best fits).
Table 1: Loss function model summary.
4.2 LOSS FUNCTION OPTIMIZATION
Once the loss functions are established, the next step is to find the optimal
estimate of permeability. The optimal estimate, as has been discussed earlier in Chapter
2, is one that minimizes the expected loss. In the cases presented below, since the loss functions have been developed with the error of estimation e computed with X_n^(l*) in the middle of the distribution, the corresponding optimal estimates using such loss functions will also lie somewhere in the middle of the lcpdf. There are two ways to arrive at the optimal estimate, and these are described next.
4.2.1 Analytical Solution
Since a linear loss function is a reduced form of a parabolic function, finding a generalized solution for the parabolic model will also suffice for the asymmetric linear loss function previously determined. The first step is to take the expected value of the loss function. The general form of the expected loss is:

\[ E\{L(e)\} = \int_{-\infty}^{\infty} L(e)\, dF_E(e) \]

Since e = X* - X and X is a RV, the above reduces to:

\[ E\{L(X^* - X)\} = E\{L(e)\} = \int_{-\infty}^{\infty} L(X^* - x)\, dF_X(x) \]

Inserting a general parabolic expression for the loss function yields:
43
*
* ** 2 *
** 2
*
*
* *
{ ( )} ( ) ( ) ( ) ( )
( ) ' ( ) ( )
' ( ) ( ) ' ( )
X X
x x
X
x xX
x xX X
E L e a X X dF x b X X dF x
e dF x a X X dF x
b X X dF x e dF x
−∞ −∞
∞
−∞
∞ ∞
= − + − +
+ − +
− +
∫ ∫
∫ ∫
∫ ∫
(4.1)
In Equation (4.1), a, b, and e are the coefficients of the quadratic, linear, and constant terms associated with underestimation (x > X*), respectively; similarly, a', b', and e' denote the coefficients associated with overestimation (x < X*). (The constant coefficients e and e' should not be confused with the estimation error e.) For simplicity, each of the six integrals above is evaluated separately and the results are recombined for the final result. The expected loss for each integral is defined as:
\[
\begin{aligned}
E\{L(e_1)\} &= \int_{X^*}^{\infty} a (x - X^*)^2\, dF_X(x) \\
E\{L(e_2)\} &= \int_{X^*}^{\infty} b (x - X^*)\, dF_X(x) \\
E\{L(e_3)\} &= \int_{X^*}^{\infty} e\, dF_X(x) \\
E\{L(e_4)\} &= \int_{-\infty}^{X^*} a' (X^* - x)^2\, dF_X(x) \\
E\{L(e_5)\} &= \int_{-\infty}^{X^*} b' (X^* - x)\, dF_X(x) \\
E\{L(e_6)\} &= \int_{-\infty}^{X^*} e'\, dF_X(x)
\end{aligned}
\]
Expanding the square and pulling X* out of the integrals, E{L(e_1)} can be rearranged to:

\[ E\{L(e_1)\} = a (X^*)^2 \left[ 1 - F_X(X^*) \right] - 2 a X^* \int_{X^*}^{\infty} x\, dF_X(x) + a \int_{X^*}^{\infty} x^2\, dF_X(x) \]
Taking the derivative with respect to X* and simplifying gives:

\[ \frac{d}{dX^*} E\{L(e_1)\} = 2 a X^* \left[ 1 - F_X(X^*) \right] - 2 a \int_{X^*}^{\infty} x\, dF_X(x) \]
The same process is repeated for the quadratic term of the overestimation, E{L(e_4)}. In the second expression below, X* has been pulled out of the integrals; the derivative of the variable bound follows from the second fundamental theorem of calculus:

\[ E\{L(e_4)\} = \int_{-\infty}^{X^*} a' (X^* - x)^2\, dF_X(x) \]

\[ E\{L(e_4)\} = a' (X^*)^2 F_X(X^*) - 2 a' X^* \int_{-\infty}^{X^*} x\, dF_X(x) + a' \int_{-\infty}^{X^*} x^2\, dF_X(x) \]
Taking the derivative of the above expression and simplifying yields:

\[ \frac{d}{dX^*} E\{L(e_4)\} = 2 a' X^* F_X(X^*) - 2 a' \int_{-\infty}^{X^*} x\, dF_X(x) \]
This same process is applied to the linear and constant terms for under- and overestimation. The simplified derivatives are:

\[ \frac{d}{dX^*} E\{L(e_2)\} = -b \left[ 1 - F_X(X^*) \right] \qquad \frac{d}{dX^*} E\{L(e_5)\} = b'\, F_X(X^*) \]

\[ \frac{d}{dX^*} E\{L(e_3)\} = -e\, f_X(X^*) \qquad \frac{d}{dX^*} E\{L(e_6)\} = e'\, f_X(X^*) \]
When the derivatives are combined and the truncated integrals are written in terms of the truncated means μ_{X&lt;} and μ_{X&gt;}, the expression simplifies to:

\[
\frac{d}{dX^*} E\{L(e)\} = 2 X^* \left[ a + (a' - a) F_X(X^*) \right] - 2 \left[ a\, \mu_{X>} \left( 1 - F_X(X^*) \right) + a'\, \mu_{X<}\, F_X(X^*) \right] + (b + b') F_X(X^*) - b + (e' - e) f_X(X^*)
\]
Setting this derivative to zero at the optimal estimate X*_optimal gives the final expression:

\[
0 = 2 X^*_{opt} \left[ a + (a' - a) F_X(X^*_{opt}) \right] - 2 \left[ a\, \mu_{X>} \left( 1 - F_X(X^*_{opt}) \right) + a'\, \mu_{X<}\, F_X(X^*_{opt}) \right] + (b + b') F_X(X^*_{opt}) - b + (e' - e) f_X(X^*_{opt})
\tag{4.2}
\]
In this expression, μ_{X&lt;} and μ_{X&gt;} are the means of the distribution truncated below and above X*, respectively. The value of X* is determined so that Equation (4.2) equals zero; this X* gives the minimum expected loss. A complete derivation is presented in the Appendix.

The expression for X*_optimal cannot be solved analytically unless simple distribution types are pre-supposed for the lcpdf. There are three checks that can be performed on the expression for X*_optimal:

1. The solution must return the global mean as the optimal estimate if the linear and constant coefficients are zero and the coefficients of the quadratic terms are equal, since the characteristic property of the expectation is that the global mean minimizes the error variance, i.e. the expected value of a symmetric quadratic loss:
For a quadratic loss function, b = b' = 0, e = e' = 0, and a = a'; then

\[ \frac{d}{dX^*} E\{L(e)\} = 2 a X^* - 2 a \left[ \mu_{X<}\, F_X(X^*) + \mu_{X>} \left( 1 - F_X(X^*) \right) \right] = 0 \]

so that

\[ X^* = \mu_{X<}\, F_X(X^*) + \mu_{X>} \left[ 1 - F_X(X^*) \right] = \mu_X \]

by the law of total expectation.
2. If only the linear coefficients remain, the optimal estimate should be the median, as seen before. For a symmetric linear loss function, a = a' = 0 and e = e' = 0; then

\[ \frac{d}{dX^*} E\{L(e)\} = (b + b') F_X(X^*) - b = 0 \quad \Rightarrow \quad F_X(X^*) = \frac{b}{b + b'} \]

If b = b', then F_X(X*) = 0.5, i.e. X* is the median.
3. The third check is that if only the constant terms are considered in the loss function, the optimal estimate should be the mode. The loss function is such that for zero error the loss is zero, but for any finite-valued error the loss quickly climbs to a constant value. With a = a' = 0, b = b' = 0, and e = e', treating the loss as zero only within a small neighborhood ±ε of zero error,

\[ E\{L(e)\} = e \left[ 1 - \int_{X^* - \varepsilon}^{X^* + \varepsilon} dF_X(x) \right] \approx e \left[ 1 - 2 \varepsilon\, f_X(X^*) \right] \]

which is minimized where f_X(X*) is largest; this implies that the mode is the optimal estimate.
The analytical formula for the optimal estimate passes all three checks. However,
in a number of instances, the loss function may not have the congenial form assumed
here. In those cases, it may be necessary to calculate the optimal estimate numerically. A
numerical solution procedure is discussed next. Although the same parabolic form for the
loss function is assumed in the following, the procedure can be generalized to any other
loss function.
4.2.2 Numerical Solution
A numerically optimized solution for the expected loss is found using the same general methodology as the analytical solution. A large ensemble of possible optimal values X* is sampled from the lcpdf at a grid location. For each sampled value of X*, the expected loss is calculated as:

\[ E\{L(e)\} = \frac{1}{N} \sum_{j=1}^{N} L\left( X^* - X_j \right) \]
Here the error is defined as the deviation from an unknown “true” value that is
represented as a random variable with N possible outcomes. The calculation can be
repeated for M sampled values X*. The X* yielding the minimal expected loss is the
optimal estimate. Figure 14 depicts a flow chart of the methodology previously
described.
Figure 14: Flow chart for numerical solution.
Note that the form of the loss function and the probability distribution at the node
can be completely general. The procedure is based on identifying the minimum expected
loss among M calculated values of expected loss. Therefore, it is important to calculate
the expected loss corresponding to a wide range of guesses for X*.
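The search described above can be sketched as follows. This is a simplified stand-in, not the thesis code: the lognormal outcomes play the role of samples from a local uncertainty distribution, and the Region 2 coefficients from Table 2 are reused for the asymmetric case.

```python
import numpy as np

rng = np.random.default_rng(1)

def expected_loss(x_star, outcomes, loss):
    """Average loss of the guess x_star over N possible 'true' outcomes."""
    return float(np.mean(loss(x_star - outcomes)))

def optimal_estimate(outcomes, loss, n_guesses=400):
    """Scan candidate estimates spanning the outcome range and return the
    one with minimum expected loss (the search of Figure 14)."""
    guesses = np.linspace(outcomes.min(), outcomes.max(), n_guesses)
    losses = [expected_loss(g, outcomes, loss) for g in guesses]
    return float(guesses[int(np.argmin(losses))])

# Stand-in lcpdf: lognormal permeability outcomes (illustrative values only).
perm = rng.lognormal(mean=3.0, sigma=0.5, size=5000)

symmetric = lambda e: np.abs(e)                            # optimum ~ median
region2 = lambda e: np.where(e < 0, 0.54 * -e, 0.34 * e)   # Table 2, Region 2

print(optimal_estimate(perm, symmetric), np.median(perm))
print(optimal_estimate(perm, region2), np.quantile(perm, 0.54 / (0.54 + 0.34)))
```

The symmetric case recovers the median and the asymmetric case recovers the b/(b+b') quantile, consistent with the analytical checks of Section 4.2.1.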
The procedure for calculating the optimal estimate using a loss function hinges on
the representation of the local uncertainty F(x) at a grid location. The construction of that
lcpdf can be easily accomplished using indicator kriging. Furthermore, since the
simulated values should exhibit the correct spatial correlation, it would be appropriate to
construct the lcpdf within a sequential simulation framework. The implementation of the
loss function methodology within the SISIM framework is described in the next chapter.
Chapter 5: Implementation of Optimized Loss Function within a Sequential Simulation framework
The loss function previously derived needs to be combined with the SISIM
algorithm to generate permeability realizations that are loss optimal. As discussed in the
previous chapter, the loss function can be tailored to reflect preferences for risk-
neutrality, over-estimation or under-estimation. This chapter explains how the lcpdf is
used in conjunction with the loss functions within sub-domains to estimate permeability
values that minimize the expected loss. As discussed in the previous chapter, the loss can
be represented using both asymmetric linear and parabolic functions and the
corresponding permeability realizations are interpreted both physically and economically.
To conclude the chapter, a second set of assumed loss functions that favor under or over
estimation are implemented and the corresponding realizations are shown to sample the
endpoints of the NPV uncertainty distribution.
5.1 IMPLEMENTATION WITHIN SISIM ALGORITHM
Determining an optimal estimate for permeability at a location using a loss
function implies minimizing the expected loss. The loss function relates the estimation
error (X*-X) to the economic loss. The economic loss has to be computed for all possible
values of the “true” value X and that requires a description of the distribution describing
the RV X at location u.
The lcpdf describing the local uncertainty of the RV X can be established within
the SISIM framework. During SISIM, the previously simulated grid nodes along with
the prior conditional data are transformed into indicator variables using the prior
histogram thresholds. For each threshold, the grid node values within the variogram range and search radius are assigned a value of one or zero based on Equation (2.7). The kriged estimate I*(u; zk|(n)) corresponding to the threshold zk and conditioned to the n data in the vicinity of the estimation node is exactly the local uncertainty distribution F*(u; zk|(n)). Knowledge of F*(u; zk|(n)) and the corresponding pdf allows Equation (4.2) to be implemented.
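Turning the kriged indicator estimates into a usable local CDF requires an order-relation correction, since the raw estimates at the K thresholds need not lie in [0, 1] or be monotone. A minimal sketch, assuming the common correction of averaging an upward and a downward pass (the threshold and probability values below are made up for illustration):

```python
import numpy as np

def lcpdf_from_indicator_kriging(thresholds, kriged_probs):
    """Turn indicator-kriging estimates I*(u; z_k | (n)) at K thresholds into a
    usable local CDF: clip to [0, 1], enforce monotonicity (order-relation
    correction), and return a function F(z) interpolated between thresholds."""
    p = np.clip(np.asarray(kriged_probs, dtype=float), 0.0, 1.0)
    up = np.maximum.accumulate(p)            # forward running maximum
    down = np.minimum.accumulate(p[::-1])[::-1]  # backward running minimum
    cdf = 0.5 * (up + down)                  # average of the two corrections
    return lambda z: float(np.interp(z, thresholds, cdf, left=0.0, right=1.0))

# Illustrative values only: raw kriged probabilities with an order-relation
# violation (0.35 followed by 0.30) and a value above 1.
z = np.array([1.0, 5.0, 20.0, 80.0, 300.0])   # permeability thresholds (md)
p = np.array([0.10, 0.35, 0.30, 0.85, 1.02])  # raw kriged indicator estimates
F = lcpdf_from_indicator_kriging(z, p)
print(F(20.0))   # -> 0.325, the corrected monotone CDF at the third threshold
```

The returned F(z) is exactly the non-parametric lcpdf needed by the loss-function optimization, built without any multi-Gaussian assumption.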
The advantage of developing an lcpdf within the sequential framework is that the
distribution is conditioned to all original data plus previously simulated values. This
ensures that the spatial correlation between simulated values reproduces the target
covariance model. Each time the optimal estimate at a grid node is computed, it is added
to the conditioning data set. It is the fact that indicator kriging directly yields conditional
probabilities that renders this methodology for implementing loss functions viable
without resorting to any multi-Gaussian assumptions.
In order to develop an optimized permeability model, SISIM is adapted to allow nodes within the sub-domains to be estimated using the loss function (Equation (4.2)) determined for that sub-domain. Nodes outside the specified sub-domains are simulated using regular (unaltered) indicator simulation, i.e. by Monte Carlo sampling of the lcpdf. Since at each step the simulated grid value is used to update the lcpdf of subsequent grid nodes, discontinuities at the borders of sub-domains are minimized. The simulation process continues until all grid nodes have been estimated for a given realization. Figure 15 shows the adaptation to the SISIM code.
Figure 15: SISIM algorithm incorporating loss function for optimal estimation.
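The adapted loop can be sketched as follows. This is a simplified stand-in, not the actual SISIM code: `build_lcpdf`, `optimal_estimate`, `region_of`, and `loss_fns` are hypothetical placeholders for the indicator-kriging step, the expected-loss minimizer of Section 4.2.2, the sub-domain lookup, and the per-domain loss functions.

```python
import numpy as np

rng = np.random.default_rng(2)

def simulate_grid(nodes, region_of, build_lcpdf, optimal_estimate, loss_fns):
    """Sequential simulation along a random path. Nodes inside a sub-domain
    take the loss-optimal estimate for that domain; nodes outside are drawn
    by Monte Carlo from the lcpdf, as in unaltered SISIM."""
    simulated = {}                            # grows into the conditioning set
    for node in rng.permutation(nodes):       # random simulation path
        node = int(node)
        outcomes = build_lcpdf(node, simulated)   # samples of the local CDF
        region = region_of(node)
        if region is not None:                # inside a delineated sub-domain
            value = optimal_estimate(outcomes, loss_fns[region])
        else:                                 # outside: plain Monte Carlo draw
            value = float(rng.choice(outcomes))
        simulated[node] = value               # conditions later nodes' lcpdfs
    return simulated
```

Because each optimal estimate is immediately added to the conditioning set, later lcpdfs see it, which is what keeps the target spatial covariance and limits discontinuities at sub-domain borders.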
5.2 LOSS FUNCTION IMPLEMENTATION FOR INDIVIDUAL SUB-DOMAINS
Two different models are used to represent the relationship between estimation error and loss. These models and the development of the loss functions are described in Chapter 4. The first function is linear and asymmetric, where the losses corresponding to under- and overestimation are not equal. Table 2 shows the loss function associated with each region. To recall, the loss is the loss in NPV relative to the optimal value (in millions of dollars), and the error is represented in millidarcies.
Table 2: Asymmetric linear loss function models for the four sub-domains.
Asymmetric Linear Loss Function Model
           Underestimation        Overestimation
Region 1   L(e) = 0.41(X - X*)    L(e) = 0.43(X* - X)
Region 2   L(e) = 0.54(X - X*)    L(e) = 0.34(X* - X)
Region 3   L(e) = 0.18(X - X*)    L(e) = 0.17(X* - X)
Region 4   L(e) = 0.08(X - X*)    L(e) = 0.15(X* - X)
The coefficients of these loss functions are input into the modified SISIM program. Since Equation (4.2) is derived for a parabolic expression, a = a' = e = e' = 0, and b is the coefficient for underestimation while b' is the coefficient for overestimation. For sub-domains 1 and 3, the coefficients for under- and overestimation are relatively similar; therefore the optimal risk-neutral estimates will be approximately the median of the lcpdf. The average realization from the SISIM ensemble obtained without implementing the loss function, together with the sub-domain identification map, is shown in Figure 16. In comparison, Figure 17 shows one realization of the permeability model obtained by implementing the loss functions, along with the corresponding average over several realizations. In contrast to the smooth result in Figure 16, the implementation of the loss function yields realizations that consistently reflect highs in some areas and lows in others, and these features persist after averaging. The averaged model therefore retains the texture of the individual realizations.
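Under the asymmetric linear model, the optimal estimate reduces to the F_X(X*) = b/(b+b') quantile of the lcpdf (check 2 of Section 4.2.1). Evaluating that ratio for the Table 2 coefficients makes the regional behavior explicit:

```python
# Optimal quantile F_X(X*) = b / (b + b') for an asymmetric linear loss,
# with b the underestimation slope and b' the overestimation slope (Table 2).
coeffs = {1: (0.41, 0.43), 2: (0.54, 0.34), 3: (0.18, 0.17), 4: (0.08, 0.15)}
quantiles = {r: b / (b + bp) for r, (b, bp) in coeffs.items()}
for region, q in quantiles.items():
    print(f"Region {region}: optimal estimate is the {q:.3f} quantile of the lcpdf")
```

Regions 1 and 3 sit essentially at the median (0.488 and 0.514), region 2 lies above it (0.614, underestimation is costlier), and region 4 lies below it (0.348).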
Well locations: P3 (30, 30), P2 (50, 50), P1 (70, 70), I1 (90, 90).
Figure 16: The average of 50 realizations obtained using SISIM without implementing the loss functions (left), and the sub-domain identification plot (right).
Figure 17: A single permeability realization obtained after implementing the loss functions (left), and the average of 10 realizations obtained after implementing the loss functions (right).
The implementation of the loss function causes the simulation algorithm to estimate permeability values that balance water and oil production. However, many of the features in the optimized risk-neutral realizations are preserved from the initial SISIM realizations because the final models also have to honor the conditioning data and the prescribed spatial covariance model.
Region 1 in the realizations obtained using the linear loss function, as well as in the basic SISIM realizations without any loss function, has low permeability values. Region 2 preserves the characteristic that well P2 (shown in Figures 22 and 23) has lower permeability than P3. On a global scale, the permeability within the upper portion of the models is significantly higher than in the lower regions. This is an indication that the conditioning data is still affecting the lcpdf and preserving the essential geological features.
Although there are a number of similarities between the linearly optimized realizations and the original SISIM averaged model, there are also several important differences. The permeability in region 4 has significantly increased, indicating that, given the characteristics of the loss function, it is better to have higher permeability values and risk overestimation in those regions so as to balance the underestimation that might occur in other regions of the reservoir. The presence of nearby higher-valued conditioning data facilitates the estimation of such high values. A second notable distinction, as noted before, is that the permeability increases in regions 2, 3, and 4. As mentioned earlier, the imposition of a loss function introduces some discipline in selecting values from the lcpdf, and that discipline is reflected in all the realizations of the ensemble. This is in contrast to the random sampling from the lcpdf in the basic SISIM algorithm, which causes the realizations to look different from one another and the averaged model to look smooth.
5.2.1 Parabolic Loss Function
In addition to the asymmetric linear loss function, a parabolic loss function is used to create a second ensemble of optimal permeability models. The polynomial equations used for the four regions are given in Table 3. The equations for under- and overestimation are the same: in each region a = a', b = b', and e = e' = 0, so one parabolic equation fits the data on both sides of zero estimation error.
Table 3: Parabolic loss function models.
Parabolic Loss Function Model
Region 1   L(e) = 0.0071|X - X*|^2 - 0.0367|X - X*|
Region 2   L(e) = 0.0212|X - X*|^2 - 0.0039|X - X*|
Region 3   L(e) = 0.0019|X - X*|^2 + 0.0147|X - X*|
Region 4   L(e) = 0.0016|X - X*|^2 + 0.0226|X - X*|
A single realization obtained after applying the parabolic loss function models and the average of the ensemble are shown in Figure 18. These realizations should be compared to the average of the models obtained using the linear loss function and to the average of the original SISIM realizations, shown in Figure 17 and Figure 16 respectively. Many of the features present in the average model in Figure 17 are also present in the ensemble average model in Figure 18. Region 1 has lower average permeability, whereas region 4 has higher average permeability. Just as in the models obtained using the linear loss function, application of the parabolic loss function causes the reservoir regions to reflect permeability values that are consistent from one model to the next. This causes the average model to exhibit more variability (instead of the smoothing observed in the ensemble average of the original SISIM models).
Although there are some common structures and trends between the models obtained using the linear loss function and the ones obtained using the second-order loss function, there are some subtle differences. Most notably, the model in Figure 17 tends to have a larger increase in permeability in regions 3 and 4 and in the areas surrounding wells P1 and I1, while the model in Figure 18 indicates a lowering of permeability in region 1 and around wells P3 and P2. In general, a second-order dependency of loss on the estimation error implies that the loss is relatively insensitive to small errors, which causes the optimal estimate to cluster around the mean (green color). In regions 1 and 2, where the second-order term in the loss function has a higher weight, the penalty assigned to under- or overestimation is relatively steep, and hence the optimal estimates tend towards the local mean. Since the conditioning data in region 1 are low, the local mean is low, and that is reflected by the permeability values in that region. In region 2, the conditioning values are higher, and the corresponding optimal values are higher.
Figure 18: Single and averaged optimized parabolic loss function realizations.
Figure 19: Reiteration of sub-domain identification by PCA.
Both the linear and parabolic loss function models show increased and decreased permeability values within different regions, but how do the actual NPVs corresponding to a water injection scenario vary from one set of models to the next? The initial SISIM ensemble of realizations has a mean NPV of 84.74 MM$ and a range from 67.8 to 105.0 MM$, a spread of 37.2 MM$. The models obtained using the linear loss function have a mean NPV of 82.4 MM$ and a range from 66.8 to 97.7 MM$, a spread of 30.9 MM$, whereas the models obtained using the parabolic loss function have a mean of 87.6 MM$ and a range from 72.9 to 95.2 MM$, a spread of 22.3 MM$. Figure 20 shows the histogram depicting the distribution of NPV obtained using each set of models.
Figure 20: Histogram of NPVs corresponding to i) traditional SISIM realizations (blue); ii) asymmetric linear loss functions (brown); and iii) parabolic loss functions (green).
The mean NPVs of the three sets of models are close to each other. However, the spread of NPV values is largest for the traditional SISIM realizations and lowest for the models using the parabolic loss functions. This is understandable since the estimates from the parabolic loss function tend to cluster around the mean. The variability observed in the original SISIM models is directly related to the width of the lcpdfs at the un-simulated locations; given the sparse conditioning data, the variability from one realization to the next is considerable.
Despite the lower uncertainty in NPV using the loss function models, there is still considerable spread in the NPV values. There are several reasons for that spread. First, the sub-domains were identified using grid-block pressure values, while the loss functions were developed from fluid production rates; it can be argued that grid-block pressure, being a smooth reservoir response, might not be truly representative of fluid displacement. Besides, the domains were identified on the basis of upscaled realizations. A second (and related) reason for the uncertainty in NPV is the large spread observed in the loss function plots (Figure 11); with that much scatter it is difficult to determine a function that accurately describes the relationship between estimation error and loss.
5.3.1 Sampling realizations within specific NPV ranges
In the traditional reservoir modeling workflow, several realizations are generated and then the range of uncertainty in NPV is computed. Statements about the extremes of the NPV distribution can only be made after the entire distribution has been computed, which is usually very time consuming. However, using appropriate loss functions, specifically by altering the impact of over- and underestimation, the extremes of the global uncertainty distribution can be sampled directly, as shown below. The advantage of this method is that only three realizations have to be produced: a risk-neutral case, a risk-limiting (conservative) case, and a risk-seeking (aggressive) case.
To generate the risk-seeking and risk-averse cases, the weights associated with under- and overestimation have to be altered. If the loss associated with overestimation is increased, the optimal estimates of permeability in different regions of the reservoir will generally tend to be lower. This lowers water production but also slows down oil production: a risk-averse case. In contrast, if the loss associated with underestimation is increased, the resultant map will have higher permeability values. That increases the oil production rate but also correspondingly increases water production: a risk-seeking (aggressive) scenario. The realizations are still optimal, but only with respect to the newly weighted loss functions. Table 4 shows the base loss functions assumed in the different regions (different from the earlier case) and the alterations made to sample the extremes of the NPV uncertainty distribution.
Table 4: Alterations made to the loss function to sample specific parts of the global NPV distributions.
Region 1 Loss Function
  Underestimation   L(e) = 10(x*-x)^2 + 158.606(x*-x) + 114820
  Overestimation    L(e) = 10(x-x*)^2 + 208.849(x-x*) + 114820
  Weighted Under    L(e) = 50(x*-x)^2 + 158.606(x*-x) + 114820
  Weighted Over     L(e) = 50(x-x*)^2 + 208.849(x-x*) + 114820

Region 2 Loss Function
  Underestimation   L(e) = -0.00073(x*-x)^2 + 0.7756(x*-x) + 33.51
  Overestimation    L(e) = 0.0022(x-x*)^2 + 1.7048(x-x*) + 33.51
  Weighted Under    L(e) = -0.00073(x*-x)^2 + 10(x*-x) + 33.51
  Weighted Over     L(e) = 0.0022(x-x*)^2 + 10(x-x*) + 33.51

Region 3 Loss Function
  Underestimation   L(e) = -0.00830(x*-x)^2 + 18.437(x*-x) + 127209
  Overestimation    L(e) = -0.01400(x-x*)^2 + 25.508(x-x*) + 127209
  Weighted Under    L(e) = -0.00830(x*-x)^2 + 100(x*-x) + 127209
  Weighted Over     L(e) = -0.01400(x-x*)^2 + 100(x-x*) + 127209

Region 4 Loss Function
  Underestimation   L(e) = 0.00480(x*-x)^2 + 0.08990(x*-x) + 133157.5
  Overestimation    L(e) = -0.00350(x-x*)^2 + 6.18490(x-x*) + 133157.5
  Weighted Under    L(e) = 0.00480(x*-x)^2 + 10(x*-x) + 133157.5
  Weighted Over     L(e) = -0.00350(x-x*)^2 + 10(x-x*) + 133157.5
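The effect of re-weighting can be reproduced numerically: increasing the quadratic weight on overestimation pulls the expected-loss minimizer below the risk-neutral estimate, and increasing the underestimation weight pushes it above. A sketch with illustrative weights (not the Table 4 coefficients) and a stand-in lcpdf:

```python
import numpy as np

rng = np.random.default_rng(3)
perm = rng.lognormal(3.0, 0.5, 5000)   # stand-in lcpdf outcomes (illustrative)

def optimum(w_under, w_over, outcomes, n_guesses=400):
    """Estimate minimizing expected loss for a quadratic loss whose weights
    differ between under- and overestimation (weights are illustrative)."""
    def loss(e):                        # e = X* - X; e < 0 is underestimation
        return np.where(e < 0, w_under * e**2, w_over * e**2)
    guesses = np.linspace(outcomes.min(), outcomes.max(), n_guesses)
    return float(guesses[np.argmin([np.mean(loss(g - outcomes)) for g in guesses])])

neutral = optimum(1.0, 1.0, perm)   # symmetric weights -> near the mean
averse = optimum(1.0, 5.0, perm)    # overestimation penalized -> lower estimate
seeking = optimum(5.0, 1.0, perm)   # underestimation penalized -> higher estimate
print(averse, neutral, seeking)
```

The ordering averse &lt; neutral &lt; seeking mirrors how the weighted loss functions in Table 4 drive the risk-averse and risk-seeking realizations toward the low and high ends of the NPV distribution.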
Figure 21 shows a comparison of a typical conditioned SISIM realization with a base-case risk-neutral realization generated by implementing the unaltered loss functions in Table 4. There are a number of important features to notice about the optimized permeability. First, the permeability in region 4 has been significantly decreased: to create a risk-neutral realization for region 4, the permeability is decreased to impede the migration of water. In contrast, the permeability in region 1 is increased to allow more fluid production. Next, the permeability in localized areas around wells P1 and I1 is increased, allowing more fluid production due to the higher near-wellbore permeability. These assumed loss functions share some characteristics with the linear and parabolic models, but as can be seen, variations in the loss functions can greatly affect the permeability distribution within the delineated regions.
Figure 21: Base case loss function model (left). For comparison the permeability model for a typical SISIM model is shown (right).
As the under- and overestimation weights are altered, the regional permeability values adjust to the new loss functions. In the case where the loss function is adjusted to favor overestimation, the permeability in region 1 increases compared to that of the base-case risk-neutral realization. This alteration increases water production and speeds up oil production. In contrast, the permeability in regions 2 and 4 decreases, causing the total field oil production to decrease. In region 3, the permeability around the water injector increases, but the permeability around well P1 decreases. The net result is a loss in field oil production and an increase in field water production. In the case where the loss function is such that underestimation is preferred, the result is flipped: the permeability in region 1 is decreased, while the permeability in regions 2, 3, and 4 is increased. The result is an increase in NPV due to the increased oil production. Table 5 shows the risk-neutral NPV estimate along with those corresponding to the risk-averse and risk-seeking scenarios.
Figure 22: Permeability model corresponding to where the loss function penalizes over-estimation more. For comparison the permeability model for the base case loss function is also shown (right).
Figure 23: Permeability model corresponding to where the loss function penalizes under-estimation more. For comparison the permeability model for the base case is also shown (right).
Table 5: Global extreme estimations.
                                                   NPV ($MM)
Heavily Weighted Overestimation (risk-averse)      70.31
Base Case Example                                  95.40
Heavily Weighted Underestimation (risk-seeking)    105.11
The results above are for individual realizations. The NPV values for the risk-averse and risk-seeking models can be compared to the distribution determined from the original ensemble of 50 SISIM realizations, which has a mean of 84.75 MM$ and a range from 67.8 to 105.0 MM$, a spread of 37.2 MM$. Figure 24 shows a histogram of the NPV for the original SISIM ensemble. Notice that the risk-seeking NPV estimate of 105.11 MM$ falls in the highest portion of the histogram, while the risk-averse estimate of 70.31 MM$ falls in the lowest portion of Figure 24.
Figure 24: Histogram of case study including global extremes.
The alterations of the loss functions thus allow us to sample the extremes of the NPV distribution without going through the cumbersome process of generating several realizations, performing flow simulations, and evaluating the entire range of NPV values for a suite of models.

By implementing the loss function within a sequential framework, a risk-neutral realization is generated that honors the spatial correlation. By comparing the average ensemble created using the loss functions with the normal SISIM ensemble, it can be concluded that many of the features are preserved because the prior conditioning data are honored. However, there is more variability within the loss function average ensemble because the estimation method imposes some regulation, in contrast to Monte Carlo sampling from the kriged distribution. Finally, by manipulating the weights associated with the loss for over- and underestimation errors, realizations with various risk attitudes can be generated in an effort to locate the extremes of the NPV uncertainty distribution.
Chapter 6: Conclusion
Sequential Gaussian and indicator simulation (SGSIM and SISIM) have been used in a number of applications to represent the spatial variability of natural phenomena accurately and to assess uncertainty in a global response corresponding to a transfer function (flow simulation). SGSIM works under the assumption of multivariate Gaussianity, while SISIM is a non-parametric approach to modeling the lcpdf. However, in both sequential approaches, simulated values are sampled at random from the kriged lcpdf in order to populate a realization. The research presented in this thesis modifies the sequential simulation approach by replacing Monte Carlo sampling of the kriged distribution with a strategy to retain an optimal estimate by minimizing the expected loss.
By delineating the reservoir into a set of unique sub-domains, an individual loss function can be developed for each identified sub-domain. This delineation through principal component analysis allows for modeling flexibility and a better representation of reservoir heterogeneity. In this thesis, each loss function represents the economic loss associated with permeability estimation error within a particular region.
The estimation error is with respect to an unknown “true” value and for that
reason is a random variable that shares the probability distribution of the attribute being
modeled. The objective of optimal estimation is therefore to retrieve a suitable value from
the lcpdf that takes into account the penalty or loss associated with under or over-
estimation. In other words, the optimal estimate corresponds to one that minimizes
expected loss.
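The idea of retrieving the estimate that minimizes expected loss can be sketched with a short numeric example. This is only an illustration, not the thesis implementation (the actual workflow modifies SISIM, as shown in Appendix C); the discrete lcpdf and the loss weights below are hypothetical.

```python
import numpy as np

# Hypothetical discrete lcpdf of permeability at one location (mD);
# the values and probabilities are illustrative, not thesis data.
values = np.array([10.0, 50.0, 120.0, 300.0, 650.0])
probs  = np.array([0.10, 0.25, 0.30, 0.25, 0.10])

def expected_loss(estimate, w_over=1.0, w_under=1.0):
    """Expected asymmetric linear loss of an estimate against the lcpdf.
    error = estimate - true value, so a positive error is overestimation."""
    err = estimate - values
    loss = np.where(err > 0.0, w_over * err, -w_under * err)
    return float(np.sum(probs * loss))

def optimal_estimate(w_over=1.0, w_under=1.0):
    # Brute-force search for the minimizer over the support of the lcpdf
    grid = np.linspace(values.min(), values.max(), 2001)
    losses = [expected_loss(g, w_over, w_under) for g in grid]
    return float(grid[int(np.argmin(losses))])

print(optimal_estimate())
```

For a symmetric linear loss the minimizer is the median of the lcpdf; asymmetric weights shift the optimum toward a different quantile of the distribution.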
In this thesis, both an asymmetric linear loss function model and a parabolic loss
function model are used to relate estimation error to economic loss. After generating two
sets of realizations using both models, common characteristics were found in both sets of
realizations. Both models estimated similar regions of high and low permeability values
that were consistent with the reference model used to sample the conditioning data.
However, some deviation does occur because of the different loss function models.
Some specific observations based on the results obtained are:
• Permeability values in different regions of the reservoir influence the well responses
(and hence the economic NPV) differently; hence the need to incorporate domain
decomposition.
• By applying a loss function, the optimized spatial distribution of permeability
within a delineated domain is obtained. Instead of randomly sampling
from a local conditional probability distribution (as in SGSIM or SISIM), an
optimal local estimate of permeability at a location can be obtained by
minimizing the expected loss.
• The possible reasons for variability of loss functions within domains include:
(1) the sub-domains were identified using grid-block pressure values, while
the development of the loss functions was based on fluid production rates; (2) it
can be argued that grid-block pressure, being a smooth reservoir response,
might not be truly representative of production; (3) the domains were
identified on the basis of upscaled realizations.
• The implementation of both the asymmetric linear and parabolic models
causes the reservoir regions to reflect permeability values that are consistent,
indicating that both models capture similar relationships between the estimation
error and NPV loss.
• Many of the features in the optimized risk neutral realization are preserved
from the initial SISIM realizations because the final model must also honor
the conditioning data and the prescribed spatial covariance model.
• The imposition of a loss function introduces some discipline in selecting
values from the lcpdf that is reflected in all the realizations of the ensemble.
Therefore, there is more variability in the loss function ensemble average
compared to the original SISIM models.
• There is a lower range of NPV values observed because of the ordered
sampling from the lcpdf. In the case of the parabolic loss function, the second-
order dependency of the loss on the estimation error implies that the loss is
relatively insensitive to small errors. This causes the optimal estimates to cluster
around the mean.
• If the weight (or loss) associated with overestimation error is increased, the
resulting optimal estimates of permeability in the different regions of the
reservoir will generally tend to be lower, creating a risk-averse reservoir
model. In contrast, if the weight (or loss) associated with underestimation is
increased, the resultant map will have higher permeability values, creating a
risk-seeking (aggressive) scenario.
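The effect described in the last bullet can be demonstrated with a small sketch. For an asymmetric linear loss, the expected loss is minimized at the quantile p* = w_under / (w_over + w_under) of the lcpdf, so shifting weight between over- and underestimation moves the optimal estimate down or up. The lcpdf below is hypothetical, not thesis data.

```python
import numpy as np

# Hypothetical discrete lcpdf of permeability (mD) at one location
values = np.array([10.0, 50.0, 120.0, 300.0, 650.0])
probs  = np.array([0.10, 0.25, 0.30, 0.25, 0.10])
cdf = np.cumsum(probs)

def optimal_linear_estimate(w_over, w_under):
    """For asymmetric linear loss, the expected loss is minimized at the
    quantile p* = w_under / (w_over + w_under) of the lcpdf."""
    p_star = w_under / (w_over + w_under)
    return float(values[np.searchsorted(cdf, p_star)])

neutral      = optimal_linear_estimate(1.0, 1.0)  # symmetric weights
risk_averse  = optimal_linear_estimate(5.0, 1.0)  # penalize overestimation
risk_seeking = optimal_linear_estimate(1.0, 5.0)  # penalize underestimation
print(risk_averse, neutral, risk_seeking)
```

Penalizing overestimation five times as heavily pulls the optimum to a lower quantile of the lcpdf (risk-averse), while penalizing underestimation pushes it higher (risk-seeking).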
The characteristics of the model obtained are a function of the economic model for
assessing NPV. If the economic model has a low water processing and purchasing price,
optimal estimates of permeability in regions of large water production will be large
because water-handling cost is not a major contribution to the overall economic value of
the project. In contrast, if water-handling costs are significant, then the optimal
permeability in those regions will be lower, thereby decreasing the water production. The
variables that are significant to the overall project economics determine regional
permeability variations. It is to be emphasized that the permeability models are still data-
conditioned and reflect the correct spatial variability. It is only that the modified
simulation approach samples a subset of reservoir models from the global uncertainty
distribution that are more relevant from the standpoint of detailed economic analysis.
Not only can risk-neutral realizations (in terms of using realistic values for
revenue and for handling produced water) be represented using the loss function; loss
functions can also be modified to sample permeability models at the extremes of the NPV
uncertainty distribution. By altering the loss function so as to increase the loss associated
with over- or underestimation, realizations that cluster at high or low NPV values can be
sampled. For example, if the loss function favors underestimation (i.e., the loss
associated with overestimation is high), then lower permeability values will be simulated,
which tends to delay water breakthrough and lower the water-handling cost. Oil
production rates are also lower, but over time the ultimate recovery factor is about the
same as in a case with higher permeability. This is confirmed by comparing the distribution
endpoints generated from a base-case SISIM ensemble to the values determined by altering
the loss function to emphasize over- and underestimation (Figure 24).
The work in this thesis successfully shows that reservoir realizations exhibiting
correct spatial characteristics can be generated with the implementation of loss functions
within the sequential framework. The method yields reservoir models that sample a sub-
space of the full uncertainty space and can be used to probe development decisions
further.
Future work could include the development of reservoir models using different
functional forms of the loss function. A parabolic model was chosen to represent the loss
function; however, the loss could possibly be better represented by a third-order
polynomial, a logarithmic function, or some alternative. In addition, this loss function
optimization scheme could be translated to Monte Carlo simulation. In this application, the estimation
error in the control variable (i.e. the Monte Carlo simulated values) would be mapped to
the loss in the response variable. There are an unlimited number of applications where
loss function optimization could be advantageous.
Appendix A: PCA Example
Table 6: Original data and adjusted data set for PCA example.
 X      Y      Xadj    Yadj
 2.5    2.4     0.7     0.5
 0.5    0.7    -1.3    -1.2
 2.2    2.9     0.4     1.0
 1.9    2.2     0.1     0.3
 3.1    3.0     1.3     1.1
 2.3    2.7     0.5     0.8
 2.0    1.6     0.2    -0.3
 1.0    1.1    -0.8    -0.8
 1.5    1.6    -0.3    -0.3
 1.1    0.9    -0.7    -1.0
Step 1: The original data X and Y are obtained and adjusted by subtracting their
respective means (X̄ = 1.81, Ȳ = 1.91).
Step 2: The covariances are calculated:

$$\operatorname{cov}(X_{adj},Y_{adj}) = \frac{\sum_{i=1}^{n}(X_{adj,i}-\bar{X}_{adj})(Y_{adj,i}-\bar{Y}_{adj})}{n-1} = 0.615$$

$$\operatorname{cov}(X_{adj},X_{adj}) = \frac{\sum_{i=1}^{n}(X_{adj,i}-\bar{X}_{adj})^{2}}{n-1} = \operatorname{var}(X_{adj}) = 0.617$$

and similarly

$$\operatorname{cov}(Y_{adj},Y_{adj}) = \frac{\sum_{i=1}^{n}(Y_{adj,i}-\bar{Y}_{adj})^{2}}{n-1} = \operatorname{var}(Y_{adj}) = 0.717$$
Step 3: Build the covariance matrix
$$C = \begin{pmatrix} \operatorname{cov}(X_{adj},X_{adj}) & \operatorname{cov}(X_{adj},Y_{adj}) \\ \operatorname{cov}(Y_{adj},X_{adj}) & \operatorname{cov}(Y_{adj},Y_{adj}) \end{pmatrix} = \begin{pmatrix} 0.617 & 0.615 \\ 0.615 & 0.717 \end{pmatrix}$$
Step 4: Calculate the eigenvalues
The eigenvalues $\lambda$ of $C$ satisfy $\det(C - \lambda I) = 0$:

$$\det\begin{pmatrix} 0.617-\lambda & 0.615 \\ 0.615 & 0.717-\lambda \end{pmatrix} = (0.617-\lambda)(0.717-\lambda) - 0.615^{2} = 0$$

which gives

$$\lambda_{1} = 0.049, \qquad \lambda_{2} = 1.284$$

Substituting each eigenvalue back into $(C-\lambda I)x = 0$, for example for $\lambda_{1} = 0.049$:

$$\begin{pmatrix} 0.617-0.049 & 0.615 \\ 0.615 & 0.717-0.049 \end{pmatrix}\begin{pmatrix} x_{1} \\ x_{2} \end{pmatrix} = \begin{pmatrix} 0 \\ 0 \end{pmatrix}$$

and normalizing gives the unit eigenvectors (one per column, ordered $\lambda_{1}$, $\lambda_{2}$):

$$\text{Eigenvectors} = \begin{pmatrix} -0.735 & -0.678 \\ 0.678 & -0.735 \end{pmatrix}$$
Step 5 and Step 6: Rank the eigenvalues based on magnitude
λ2>λ1 because 1.284>0.049
Only the eigenvector associated with the larger eigenvalue will be considered for this exercise.
Step 7: Recreate the data set using the eigenvector(s) attached to the chosen eigenvalue(s):

$$\text{FeatureVector} = (eig_{1}\ eig_{2}\ \cdots\ eig_{n})$$

Retaining only the eigenvector associated with the largest eigenvalue, $\lambda_{2} = 1.284$:

$$\text{FeatureVector} = \begin{pmatrix} -0.678 \\ -0.735 \end{pmatrix}$$

$$\text{FinalData} = \text{FeatureVector}^{T} \times \text{DataAdjust}^{T}$$

where

$$\text{DataAdjust}^{T} = \begin{pmatrix} 0.69 & -1.31 & 0.39 & 0.09 & 1.29 & 0.49 & 0.19 & -0.81 & -0.31 & -0.71 \\ 0.49 & -1.21 & 0.99 & 0.29 & 1.09 & 0.79 & -0.31 & -0.81 & -0.31 & -1.01 \end{pmatrix}$$

so that

$$\text{FinalData} = (-0.83\quad 1.78\quad -0.99\quad -0.27\quad -1.68\quad -0.91\quad 0.10\quad 1.14\quad 0.44\quad 1.22)$$
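The hand calculation above can be cross-checked with a short script. Note that `np.linalg.eigh` may return eigenvectors with opposite signs, which only flips the sign of the projected scores.

```python
import numpy as np

# The ten (X, Y) observations from Table 6
data = np.array([
    [2.5, 2.4], [0.5, 0.7], [2.2, 2.9], [1.9, 2.2], [3.1, 3.0],
    [2.3, 2.7], [2.0, 1.6], [1.0, 1.1], [1.5, 1.6], [1.1, 0.9],
])

# Step 1: subtract the mean of each column
adj = data - data.mean(axis=0)

# Steps 2-3: sample covariance matrix (n - 1 in the denominator)
C = np.cov(adj, rowvar=False)

# Step 4: eigen-decomposition (eigh is appropriate since C is symmetric)
eigvals, eigvecs = np.linalg.eigh(C)

# Steps 5-6: keep the eigenvector of the largest eigenvalue
principal = eigvecs[:, np.argmax(eigvals)]

# Step 7: project the adjusted data onto the principal component
final = adj @ principal

print(np.round(np.sort(eigvals), 3))  # approximately [0.049, 1.284]
print(np.round(final, 2))
```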
74
Appendix B: Analytical Solution for Optimized Loss Function
Derivation for a Loss Function Containing Quadratic, Linear, and Constant Error Terms

Generalized form of the expected loss:

$$E\{L(e)\} = \int_{-\infty}^{\infty} L(X^{*} - X)\, f_X(x)\, dx$$

Including quadratic, linear, and constant error terms for both overestimation (coefficients $a$, $b$, $e$) and underestimation (coefficients $a'$, $b'$, $e'$), the expected loss becomes

$$E\{L(e)\} = \int_{-\infty}^{X^{*}} \left[ a(X^{*}-X)^{2} + b(X^{*}-X) + e \right] dF_X(x) + \int_{X^{*}}^{\infty} \left[ a'(X-X^{*})^{2} + b'(X-X^{*}) + e' \right] dF_X(x)$$

For convenience, the six integrals are labeled $e_1$ through $e_6$:

$$E\{L(e_1)\} = a\int_{-\infty}^{X^{*}} (X^{*}-X)^{2}\, dF_X(x) \qquad E\{L(e_4)\} = a'\int_{X^{*}}^{\infty} (X-X^{*})^{2}\, dF_X(x)$$

$$E\{L(e_2)\} = b\int_{-\infty}^{X^{*}} (X^{*}-X)\, dF_X(x) \qquad E\{L(e_5)\} = b'\int_{X^{*}}^{\infty} (X-X^{*})\, dF_X(x)$$

$$E\{L(e_3)\} = e\int_{-\infty}^{X^{*}} dF_X(x) \qquad E\{L(e_6)\} = e'\int_{X^{*}}^{\infty} dF_X(x)$$

Evaluating the $e_1$ term:

$$E\{L(e_1)\} = a(X^{*})^{2}F_X(X^{*}) - 2aX^{*}\int_{-\infty}^{X^{*}} X\, dF_X(x) + a\int_{-\infty}^{X^{*}} X^{2}\, dF_X(x)$$

Taking the derivative with respect to $X^{*}$ (the density terms produced by the variable limit cancel) and simplifying:

$$\frac{d}{dX^{*}}E\{L(e_1)\} = 2aX^{*}F_X(X^{*}) - 2a\int_{-\infty}^{X^{*}} X\, dF_X(x)$$

Evaluating the $e_4$ term in the same way:

$$E\{L(e_4)\} = a'\int_{X^{*}}^{\infty} X^{2}\, dF_X(x) - 2a'X^{*}\int_{X^{*}}^{\infty} X\, dF_X(x) + a'(X^{*})^{2}\left[1 - F_X(X^{*})\right]$$

$$\frac{d}{dX^{*}}E\{L(e_4)\} = 2a'X^{*}\left[1 - F_X(X^{*})\right] - 2a'\int_{X^{*}}^{\infty} X\, dF_X(x)$$

Writing $\mu_{<X^{*}} = \int_{-\infty}^{X^{*}} X\, dF_X(x)$ and $\mu_{>X^{*}} = \int_{X^{*}}^{\infty} X\, dF_X(x)$ for the partial means of the distribution, and combining the derivatives of $e_1$ and $e_4$:

$$\frac{d}{dX^{*}}E\{L(e_1+e_4)\} = 2X^{*}\left[F_X(X^{*})(a-a') + a'\right] - 2a\,\mu_{<X^{*}} - 2a'\,\mu_{>X^{*}}$$

The same process is used for the linear terms. Evaluating the $e_2$ term:

$$E\{L(e_2)\} = bX^{*}F_X(X^{*}) - b\int_{-\infty}^{X^{*}} X\, dF_X(x) \quad\Rightarrow\quad \frac{d}{dX^{*}}E\{L(e_2)\} = b\,F_X(X^{*})$$

Evaluating the $e_5$ term:

$$E\{L(e_5)\} = b'\int_{X^{*}}^{\infty} X\, dF_X(x) - b'X^{*}\left[1 - F_X(X^{*})\right] \quad\Rightarrow\quad \frac{d}{dX^{*}}E\{L(e_5)\} = -b'\left[1 - F_X(X^{*})\right]$$

Combining the derivatives of $e_2$ and $e_5$:

$$\frac{d}{dX^{*}}E\{L(e_2+e_5)\} = F_X(X^{*})(b+b') - b'$$

Evaluating the constant terms $e_3$ and $e_6$:

$$E\{L(e_3)\} = e\,F_X(X^{*}) \quad\Rightarrow\quad \frac{d}{dX^{*}}E\{L(e_3)\} = e\,f_X(X^{*})$$

$$E\{L(e_6)\} = e'\left[1 - F_X(X^{*})\right] \quad\Rightarrow\quad \frac{d}{dX^{*}}E\{L(e_6)\} = -e'\,f_X(X^{*})$$

$$\frac{d}{dX^{*}}E\{L(e_3+e_6)\} = f_X(X^{*})\,(e - e')$$

For the final solution, all of the simplified derivative terms are combined:

$$\frac{d}{dX^{*}}E\{L(e)\} = 2X^{*}\left[F_X(X^{*})(a-a') + a'\right] - 2a\,\mu_{<X^{*}} - 2a'\,\mu_{>X^{*}} + F_X(X^{*})(b+b') - b' + f_X(X^{*})(e - e')$$

The value of $X^{*}$ that makes this expression equal to zero is the optimal estimate.
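As a numeric sanity check: setting $b = b' = e = e' = 0$ and $a = a'$, the derivative reduces to $2a(X^{*} - \mu)$, so the optimal estimate for a symmetric quadratic loss is the mean of the lcpdf, and a heavier overestimation weight pulls the optimum below the mean. The sketch below uses an illustrative discretized lcpdf; it is not the Visual Basic implementation of Appendix D.

```python
import numpy as np

# Illustrative discretized lcpdf on a regular grid (not thesis data)
t = np.linspace(0.0, 10.0, 2001)
dt = t[1] - t[0]
f = np.exp(-0.5 * ((t - 4.0) / 1.5) ** 2)
f /= f.sum() * dt                       # normalize to a density

def expected_loss(x, a_over, a_under):
    """E{L} for a purely quadratic loss with separate weights for
    overestimation (x > true value) and underestimation (x < true value)."""
    err = x - t
    w = np.where(err > 0.0, a_over, a_under)
    return float(np.sum(w * err**2 * f) * dt)

grid = np.linspace(1.0, 9.0, 801)

# Symmetric quadratic loss: the minimizer is the mean of the lcpdf
sym = float(grid[np.argmin([expected_loss(x, 1.0, 1.0) for x in grid])])
mean = float(np.sum(t * f) * dt)
print(round(sym, 2), round(mean, 2))

# A heavier overestimation weight pulls the optimum below the mean
averse = float(grid[np.argmin([expected_loss(x, 5.0, 1.0) for x in grid])])
print(round(averse, 2))
```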
Appendix C: Modification to SISIM Code
Modification to Reading Input Parameters:

      read(lin,*,err=98) ivtype
      write(*,*) ' variable type (1=continuous, 0=categorical)= ',ivtype
      read(lin,*,err=98) ncut
      write(*,*) ' number of thresholds / categories = ',ncut
      if(ncut.gt.MAXCUT) stop 'ncut is too big - modify .inc file'
      read(lin,*,err=98) (thres(i),i=1,ncut)
      write(*,*) ' thresholds / categories = ',(thres(i),i=1,ncut)
      read(lin,*,err=98) (cdf(i),i=1,ncut)
      write(*,*) ' global cdf / pdf = ',(cdf(i),i=1,ncut)
      read(lin,'(a40)',err=98) datafl
      call chknam(datafl,40)
      write(*,*) ' data file = ',datafl
      read(lin,*,err=98) ixl,iyl,izl,ivrl
      write(*,*) ' input columns = ',ixl,iyl,izl,ivrl
      read(lin,'(a40)',err=98) softfl
      call chknam(softfl,40)
      write(*,*) ' soft data file = ',softfl
      inquire(file=softfl,exist=testfl)
      if(testfl) then
            read(lin,*,err=98) ixs,iys,izs,(ivrs(i),i=1,ncut)
            write(*,*) ' columns = ',ixs,iys,izs,(ivrs(i),i=1,ncut)
            read(lin,*,err=98) imbsim
            write(*,*) ' Markov-Bayes simulation = ',imbsim
            if(imbsim.eq.1) then
                  read(lin,*,err=98) (beez(i),i=1,ncut)
            else
                  read(lin,*,err=98)
            end if
      else
            read(lin,*,err=98)
            read(lin,*,err=98)
            read(lin,*,err=98)
      end if
      read(lin,*,err=98) tmin,tmax
      write(*,*) ' trimming limits ',tmin,tmax
      read(lin,*,err=98) zmin,zmax
      write(*,*) ' data limits (tails) ',zmin,zmax
      read(lin,*,err=98) ltail,ltpar
      write(*,*) ' lower tail = ',ltail,ltpar
      read(lin,*,err=98) middle,mpar
      write(*,*) ' middle = ',middle,mpar
      read(lin,*,err=98) utail,utpar
      write(*,*) ' upper tail = ',utail,utpar
      read(lin,'(a40)',err=98) tabfl
      call chknam(tabfl,40)
      write(*,*) ' file for tab. quant. ',tabfl
      read(lin,*,err=98) itabvr,itabwt
      write(*,*) ' columns for vr wt = ',itabvr,itabwt
      read(lin,*,err=98) idbg
      write(*,*) ' debugging level = ',idbg
      read(lin,'(a40)',err=98) dbgfl
      call chknam(dbgfl,40)
      write(*,*) ' debugging file = ',dbgfl
      read(lin,'(a40)',err=98) outfl
      call chknam(outfl,40)
      write(*,*) ' output file = ',outfl
      read(lin,*,err=98) nsim
      write(*,*) ' number of simulations = ',nsim
      read(lin,*,err=98) nx,xmn,xsiz
      write(*,*) ' X grid specification = ',nx,xmn,xsiz
      read(lin,*,err=98) ny,ymn,ysiz
      write(*,*) ' Y grid specification = ',ny,ymn,ysiz
      read(lin,*,err=98) nz,zmn,zsiz
      write(*,*) ' Z grid specification = ',nz,zmn,zsiz
      nxy  = nx*ny
      nxyz = nx*ny*nz
      read(lin,*,err=98) ixv(1)
      write(*,*) ' random number seed = ',ixv(1)
      do i=1,1000
            p = acorni(idum)
      end do
      read(lin,*,err=98) ndmax
      write(*,*) ' ndmax = ',ndmax
      read(lin,*,err=98) nodmax
      write(*,*) ' max prev sim nodes = ',nodmax
      read(lin,*,err=98) maxsec
      write(*,*) ' max soft indicator data = ',maxsec
      read(lin,*,err=98) sstrat
      write(*,*) ' search strategy = ',sstrat
      read(lin,*,err=98) mults,nmult
      write(*,*) ' multiple grid search flag = ',mults,nmult
      read(lin,*,err=98) noct
      write(*,*) ' max per octant = ',noct
      read(lin,*,err=98) radius,radius1,radius2
      write(*,*) ' search radii = ',radius,radius1,radius2
      if(radius.lt.EPSLON) stop 'radius must be greater than zero'
      radsqd = radius * radius
      sanis1 = radius1 / radius
      sanis2 = radius2 / radius
      read(lin,*,err=98) sang1,sang2,sang3
      write(*,*) ' search anisotropy angles = ',sang1,sang2,sang3
      read(lin,*,err=98) mik,cutmik
      write(*,*) ' median IK switch = ',mik,cutmik
      read(lin,*,err=98) ktype
      write(*,*) ' kriging type switch = ',ktype
c
c Output now goes to debugging file:
c
      open(ldbg,file=dbgfl,status='UNKNOWN')
      do i=1,ncut
            read(lin,*,err=98) nst(i),c0(i)
            if(ivtype.eq.0)
     +            write(ldbg,100) i,thres(i),cdf(i),nst(i),c0(i)
            if(ivtype.eq.1)
     +            write(ldbg,101) i,thres(i),cdf(i),nst(i),c0(i)
            if(nst(i).gt.MAXNST) stop 'nst is too big'
            istart = 1 + (i-1)*MAXNST
            do j=1,nst(i)
                  index = istart + j - 1
                  read(lin,*,err=98) it(index),cc(index),ang1(index),
     +                  ang2(index),ang3(index)
                  if(it(index).eq.3) STOP 'Gaussian Model Not Allowed!'
                  read(lin,*,err=98) aa(index),aa1,aa2
                  write(ldbg,102) j,it(index),aa(index),cc(index)
                  anis1(index) = aa1 / max(EPSLON,aa(index))
                  anis2(index) = aa2 / max(EPSLON,aa(index))
                  write(ldbg,103) ang1(index),ang2(index),ang3(index),
     +                  anis1(index),anis2(index)
            end do
      end do
c
• • •
Here the code is modified to read in the number of regions, the domain indices, and the loss function parameters.
cModified 11/21/2008 - Reading in Loss Functions Parameters
c
      read(lin,*) nreg
      do ireg = 1, nreg
            read(lin,*) iid(ireg)
            read(lin,*) (iformunder(il,ireg),il=1,3)
            read(lin,*) (iformover(il,ireg),il=1,3)
            read(lin,*) (coeffover(il,ireg),il=1,3)
            read(lin,*) (coeffunder(il,ireg),il=1,3)
      end do
      read(lin,'(a40)') fnamereg
      close(lin)
 100  format(/,' Category number ',i2,' = ',f12.3,/,
     +       ' global prob value = ',f8.4,/,
     +       ' number of structures = ',i3,/,
     +       ' nugget effect = ',f8.4)
 101  format(/,' Threshold number ',i2,' = ',f12.3,/,
     +       ' global prob value = ',f8.4,/,
     +       ' number of structures = ',i3,/,
     +       ' nugget effect = ',f8.4)
 102  format( ' type of structure ',i3,' = ',i3,/,
     +       ' aa parameter = ',f12.4,/,
     +       ' cc parameter = ',f12.4)
 103  format( ' ang1, ang2, ang3 = ',3f6.2,/,
     +       ' anis1, anis2 = ',2f12.4)
c
c Perform some quick error checking:
c
      if(nx.gt.MAXX) stop 'nx is too big - modify .inc file'
      if(ny.gt.MAXY) stop 'ny is too big - modify .inc file'
      if(nz.gt.MAXZ) stop 'nz is too big - modify .inc file'
• • •
Before the main kriging loop, the coefficients of the loss functions are written to intermediate files.
c CALLED FunctionInputs.txt
      open(23,file='FunctionInputs.txt',status='unknown')
c Trying to write a ccdf debug file for a particular region
      open(33,file='TempDebug.txt',status='unknown')
      do ireg=1,nreg
            write(23,*) iid(ireg)
            write(23,*) (iformover(il,ireg), il=1,3)
            write(23,*) (iformunder(il,ireg), il=1,3)
            write(23,131) (coeffover(il,ireg), il=1,3)
            write(23,131) (coeffunder(il,ireg), il=1,3)
      end do
 131  format(3(f12.5,1x))
      close(23)
      open(23,file=fnamereg,status='old')
      do i=1,3
            read(23,*)
      end do
      do ixyz=1,nxyz
            read(23,*) iregind(ixyz)
      end do
• • •
This part of the code is modified so that nodes outside of the designated regions are estimated using kriging, but nodes within the sub-domains are estimated using the prescribed loss functions.
c
c Use the global distribution?
c
      if((nclose+ncnode).le.0) then
            call beyond(ivtype,ncut,thres,cdf,ng,gcut,gcdf,
     +                  zmin,zmax,ltail,ltpar,middle,mpar,
     +                  utail,utpar,zval,cdfval,ierr)
      else
c
c Estimate the local distribution by indicator kriging:
c
            do ic=1,ncut
                  call krige(ix,iy,iz,xx,yy,zz,ic,cdf(ic),
     +                       ccdf(ic))
            end do
c
c Correct order relations:
c
            call ordrel(ivtype,ncut,ccdf,ccdfo,nviol,aviol,
     +                  xviol)
c           call lossfunc(ivtype,ncut,ccdf,3,lundertype,coeffunder,
c    +                    lovertype,coeffover,zval)
c
c Draw from the local distribution:
c
            if(iregind(index).le.UNEST) then
                  call beyond(ivtype,ncut,thres,ccdfo,ng,gcut,
     +                        gcdf,zmin,zmax,ltail,ltpar,middle,
     +                        mpar,utail,utpar,zval,cdfval,ierr)
c
c Write some debugging information:
c
c Changes to accommodate loss function calculations. First write
c the current cdf information to a file - Loss Inputs.txt
c
            else
                  do ic=1,ncut-1
                        if((ccdfo(ic+1)-ccdfo(ic)).eq.1.0) then
                              call beyond(ivtype,ncut,thres,ccdfo,ng,gcut,
     +                              gcdf,zmin,zmax,ltail,ltpar,middle,
     +                              mpar,utail,utpar,zval,cdfval,ierr)
                              go to 323
                        endif
                  end do
                  open(23,file='Loss Inputs.txt',status='unknown')
                  write(23,*) ncut
                  write(23,*) iregind(index)
                  write(23,232) zmin, zmax
                  write(23,231) (thres(ic),ic=1,ncut)
                  write(23,231) (ccdfo(ic),ic=1,ncut)
                  close(23)
 232              format(2(f8.4,1x))
 231              format(5(f12.5))
                  JJ = system("LossOpt12")
• • •
Here the estimated values from the loss function optimization are output to an intermediate file and combined with the kriged estimates to generate the realization.
c Code writes values to an intermediate file - OptOut.txt
                  open(23,file='Optout.txt',status='old')
                  read(23,*) zval
                  close(23)
                  if(iregind(index).eq.1) then
                        write(33,*) (ccdfo(ic),ic=1,ncut)
                  endif
            endif
 323        if(idbg.ge.3) then
                  do ic=1,ncut
                        write(ldbg,202) ccdf(ic),ccdfo(ic)
 202                    format(' CDF (original and fixed)',2f7.4)
                  end do
            endif
      endif
      sim(index) = zval
c
c END MAIN LOOP OVER NODES:
Appendix D: Code for Implementation Analytical Solution for Optimized Loss Function
Private Sub Main() 'These are the variable that sisim has calculated in the original program 'These are read in from a text file ReDim ccdfvalue(1, 20) As Variant ReDim ccdfprob(1, 20) As Variant Dim sisimregindex, ncuts As Integer Dim ccdfvalueat1, ccdfvalue0, seed As Double 'These are loss function parameters 'Region 1 Dim coeffover1reg1, coeffover2reg1, coeffover3reg1 As Single Dim coeffunder1reg1, coeffunder2reg1, coeffunder3reg1 As Single Dim quadoverreg1, lineoverreg1, constoverreg1 As Integer Dim quadunderreg1, lineunderreg1, constunderreg1 As Integer 'Region 2 Dim coeffover1reg2, coeffover2reg2, coeffover3reg2 As Single Dim coeffunder1reg2, coeffunder2reg2, coeffunder3reg2 As Single Dim quadoverreg2, lineoverreg2, constoverreg2 As Integer Dim quadunderreg2, lineunderreg2, constunderreg2 As Integer 'Region 3 Dim coeffover1reg3, coeffover2reg3, coeffover3reg3 As Single Dim coeffunder1reg3, coeffunder2reg3, coeffunder3reg3 As Single Dim quadoverreg3, lineoverreg3, constoverreg3 As Integer Dim quadunderreg3, lineunderreg3, constunderreg3 As Integer 'Region 4 Dim coeffover1reg4, coeffover2reg4, coeffover3reg4 As Single Dim coeffunder1reg4, coeffunder2reg4, coeffunder3reg4 As Single Dim quadoverreg4, lineoverreg4, constoverreg4 As Integer Dim quadunderreg4, lineunderreg4, constunderreg4 As Integer Dim indexreg1, indexreg2, indexreg3, indexreg4 As Integer Dim coeffover1gen, coeffover2gen, coeffover3gen, coeffover4gen As Double Dim coeffunder1gen, coeffunder2gen, coeffunder3gen, coeffunder4gen As Double Dim muylower, muyhigher, muy As Single 'These are the variable need for the internal program Dim pxy0, pxytail As Single
ReDim Fxyslope(20) As Variant 'These are temporary variables used in computing ReDim pxyarray(20), product(20), pxytemp(20), ccdfvaluetemp(20), guess(20) As
Variant Dim Fxy, pxy, sumproduct As Double 'This is going to be zero becuase of how probability is defined as a horizontal line. Dim pxyprime As Single Dim Yinitial, epsilon, Nmax, sum, area, upperthresh As Single Dim Jprime, Ynew, Yoptimal As Double Dim Jfunc(10000), Y(10000) As Variant Dim i, J, n, g, Iterations, index As Single 'These are counters 'These are variable used for I/O of Data Dim LossFile, FunctionInputs, OptOut, JFuncOut As String 'Another attemp at finding the solution has new variables Dim TrueProb, TruePerm, JfuncMin As Single '------------------------------------------------------------------------------------------ 'Reading in the data from the two files. I also declare the name of the output file LossFile = "Loss Inputs.txt" OptOut = "OptOut.txt" FunctionInputs = "FunctionInputs.txt" JFuncOut = "JFuncOut.txt" Open LossFile For Input As #1 ' Open file for input. ' Opens Output file outside of main code, so it doesn't have to keep reopening 'This data being read in is the ccdfvalues, associated probabilities, and regional indix i = 0 Do While Not EOF(1) ' Loop until end of file. i = i + 1 If i = 1 Then Input #1, ncuts ElseIf i = 2 Then Input #1, sisimregindex ElseIf i = 3 Then Input #1, ccdfvalue0, ccdfvalueat1 ElseIf i = 4 Then Do While J < ncuts Input #1, ccdfvalue(1, J + 1) J = J + 1
Loop ElseIf i = 5 Then Do While n < ncuts Input #1, ccdfprob(1, n + 1) n = n + 1 Loop ElseIf i = 6 Then Input #1, seed End If Loop Close #1 ' Close file. Open FunctionInputs For Input As #2 'Reading in constants and indicators for the loss function for the regions i = 0 Do While Not EOF(2) i = i + 1 If i = 1 Then Input #2, indexreg1 ElseIf i = 2 Then Input #2, quadoverreg1, lineoverreg1, constoverreg1 ElseIf i = 3 Then Input #2, quadunderreg1, lineunderreg1, constunderreg2 ElseIf i = 4 Then Input #2, coeffover1reg1, coeffover2reg1, coeffover3reg1 ElseIf i = 5 Then Input #2, coeffunder1reg1, coeffunder2reg1, coeffunder3reg1 ElseIf i = 6 Then Input #2, indexreg2 ElseIf i = 7 Then Input #2, quadoverreg2, lineoverreg2, constoverreg2 ElseIf i = 8 Then Input #2, quadunderreg2, lineunderreg2, constunderreg2 ElseIf i = 9 Then Input #2, coeffover1reg2, coeffover2reg2, coeffover3reg2 ElseIf i = 10 Then Input #2, coeffunder1reg2, coeffunder2reg2, coeffunder3reg2 ElseIf i = 11 Then Input #2, indexreg3 ElseIf i = 12 Then Input #2, quadoverreg3, lineoverreg3, constoverreg3 ElseIf i = 13 Then Input #2, quadunderreg3, lineunderreg3, constunderreg3 ElseIf i = 14 Then Input #2, coeffover1reg3, coeffover2reg3, coeffover3reg3
ElseIf i = 15 Then Input #2, coeffunder1reg3, coeffunder2reg3, coeffunder3reg3 ElseIf i = 16 Then Input #2, indexreg4 ElseIf i = 17 Then Input #2, quadoverreg4, lineoverreg4, constoverreg4 ElseIf i = 18 Then Input #2, quadunderreg4, lineunderreg4, constunderreg4 ElseIf i = 19 Then Input #2, coeffover1reg4, coeffover2reg4, coeffover3reg4 ElseIf i = 20 Then Input #2, coeffunder1reg4, coeffunder2reg4, coeffunder3reg4 End If Loop Close #2 '----------------------------------------------------------------------- 'Assigning coefficient based of region index read from sisim 'This part of the code also checks to see if coefficient is considered 'Check for region 1 coefficients If sisimregindex = indexreg1 Then If quadoverreg1 = 0 Then coeffover1gen = 0 Else coeffover1gen = coeffover1reg1 End If If lineoverreg1 = 0 Then coeffover2gen = 0 Else coeffover2gen = coeffover2reg1 End If If constoverreg1 = 0 Then coeffover3gen = 0 Else coeffover3gen = coeffover3reg1 End If If quadunderreg1 = 0 Then coeffunder1gen = 0 Else coeffunder1gen = coeffunder1reg1 End If If lineunderreg1 = 0 Then coeffunder2gen = 0 Else coeffunder2gen = coeffunder2reg1
End If If constunderreg1 = 0 Then coeffunder3gen = 0 Else coeffunder3gen = coeffunder3reg1 End If End If 'Check for region 2 coefficients If sisimregindex = indexreg2 Then If quadoverreg2 = 0 Then coeffover1gen = 0 Else coeffover1gen = coeffover1reg2 End If If lineoverreg2 = 0 Then coeffover2gen = 0 Else coeffover2gen = coeffover2reg2 End If If constoverreg2 = 0 Then coeffover3gen = 0 Else coeffover3gen = coeffover3reg2 End If If quadunderreg2 = 0 Then coeffunder1gen = 0 Else coeffunder1gen = coeffunder1reg2 End If If lineunderreg2 = 0 Then coeffunder2gen = 0 Else coeffunder2gen = coeffunder2reg2 End If If constunderreg2 = 0 Then coeffunder3gen = 0 Else coeffunder3gen = coeffunder3reg2 End If End If 'Check for region 3 coefficients If sisimregindex = indexreg3 Then
If quadoverreg3 = 0 Then coeffover1gen = 0 Else coeffover1gen = coeffover1reg3 End If If lineoverreg3 = 0 Then coeffover2gen = 0 Else coeffover2gen = coeffover2reg3 End If If constoverreg3 = 0 Then coeffover3gen = 0 Else coeffover3gen = coeffover3reg3 End If If quadunderreg3 = 0 Then coeffunder1gen = 0 Else coeffunder1gen = coeffunder1reg3 End If If lineunderreg3 = 0 Then coeffunder2gen = 0 Else coeffunder2gen = coeffunder2reg3 End If If constunderreg3 = 0 Then coeffunder3gen = 0 Else coeffunder3gen = coeffunder3reg3 End If End If 'Check for region 4 coefficients If sisimregindex = indexreg4 Then If quadoverreg4 = 0 Then coeffover1gen = 0 Else coeffover1gen = coeffover1reg4 End If If lineoverreg4 = 0 Then coeffover2gen = 0 Else coeffover2gen = coeffover2reg4 End If If constoverreg4 = 0 Then
coeffover3gen = 0 Else coeffover3gen = coeffover3reg4 End If If quadunderreg4 = 0 Then coeffunder1gen = 0 Else coeffunder1gen = coeffunder1reg4 End If If lineunderreg4 = 0 Then coeffunder2gen = 0 Else coeffunder2gen = coeffunder2reg4 End If If constunderreg4 = 0 Then coeffunder3gen = 0 Else coeffunder3gen = coeffunder3reg4 End If End If '----------------------------------------------------------------------------------- Nmax = 32000 '----------------------------------------------------------------------------------- 'Making sure the lcdf monotonically increases i = 1 Line1: If ccdfprob(1, 1) = 0 Then For i = 1 To ncuts ccdfprob(1, i) = ccdfprob(1, i) + 0.001 Next i GoTo Line1 Else i = 1 For i = 1 To ncuts If ccdfprob(1, i) = ccdfprob(1, i + 1) Then J = i For J = i To ncuts ccdfprob(1, J + 1) = ccdfprob(1, J + 1) + 0.001 Next J End If Next i End If
ccdfprob(1, ncuts + 1) = ccdfprob(1, 20) i = 1 For i = 1 To ncuts Debug.Print ccdfprob(1, i) Next i muy = 0 sum = 0 For i = 1 To ncuts - 1 muy = (ccdfvalue(1, i) + ccdfvalue(1, i + 1)) / 2 * (ccdfprob(1, i + 1) - ccdfprob(1, i)) sum = sum + muy Next i muy = sum + (ccdfvalue0 + ccdfvalue(1, 1)) / 2 * ccdfprob(1, 1) + (ccdfvalueat1 + ccdfvalue(1, ncuts)) / 2 * (1 - ccdfprob(1, ncuts)) '------------------------------------------------------------------------------------------------- 'This just checks to see if probabilities are being calculated correctly J = 1 sum = 0 Do While J < ncuts pxyarray(J) = ccdfprob(1, J + 1) - ccdfprob(1, J) sum = sum + pxyarray(J) J = J + 1 Loop sum = sum + ccdfprob(1, 1) + (1 - ccdfprob(1, ncuts)) 'debug.print (sum) 'Sum =1 so this means all probabilities are accounted for '-------------------------------------------------------------------------------------------------- i = 1 Iterations = 10000 For g = 1 To Iterations 'Randomize TrueProb = Rnd(seed) n = 1 For n = 1 To ncuts If TrueProb < ccdfprob(1, 1) Then TruePerm = (ccdfvalue(1, 1) - ccdfvalue0) / ccdfprob(1, 1) * _ (TrueProb - ccdfprob(1, 1)) + ccdfvalue(1, 1) Exit For ElseIf TrueProb > ccdfprob(1, n) And TrueProb < ccdfprob(1, n + 1) Then TruePerm = (ccdfvalue(1, n + 1) - ccdfvalue(1, n)) / (ccdfprob(1, n + 1) - ccdfprob(1, n)) * _ (TrueProb - ccdfprob(1, n)) + ccdfvalue(1, n)
Exit For ElseIf TrueProb > ccdfprob(1, ncuts) Then TruePerm = (ccdfvalueat1 - ccdfvalue(1, ncuts)) / (1 - ccdfprob(1, ncuts)) * _ (TrueProb - ccdfprob(1, ncuts)) + ccdfvalue(1, ncuts) Exit For End If Next n 'Debug.Print TruePerm(i) Y(g) = TruePerm 'This is main loop J = 1 Do While J < ncuts + 1 If Y(g) > ccdfvalue0 And Y(g) < ccdfvalue(1, 1) Then pxy = ccdfprob(1, 1) Fxyslopetemp = (ccdfprob(1, 1) - 0) / (ccdfvalue(1, 1) - ccdfvalue0) Fxy = Fxyslopetemp * (Y(g) - ccdfvalue(1, 1)) + ccdfprob(1, 1) J = 1 muyuppertemp = 0 Do While J < ncuts + 1 If J < ncuts Then pxytemp(J) = ccdfprob(1, J + 1) - ccdfprob(1, J) product(J) = pxytemp(J) * (ccdfvalue(1, J + 1) + ccdfvalue(1, J)) / 2 Else pxytemp(J) = 1 - ccdfprob(1, ncuts) product(J) = pxytemp(J) * (ccdfvalueat1 + ccdfvalue(1, ncuts)) / 2 End If muyuppertemp = muyuppertemp + product(J) J = J + 1 Loop probbelongingtolower = pxy * (Y(g) - ccdfvalue0) / (ccdfvalue(1, 1) –
ccdfvalue0) probbelongingtoupper = pxy * (ccdfvalue(1, 1) - Y(g)) / (ccdfvalue(1, 1) –
ccdfvalue0) 'debug.Print (probbelongingtolower + probbelongingtoupper) muylower = (probbelongingtolower * (Y(g) + ccdfvalue0) / 2) Muyupper = muyuppertemp + probbelongingtoupper * (ccdfvalue(1, 1) + Y(g))
/ 2 'debug.Print (muyupper) 'debug.Print (muylower) 'debug.Print (muyupper + muylower) Exit Do
ElseIf Y(g) > ccdfvalue(1, J) And Y(g) < ccdfvalue(1, J + 1) Then pxy = ccdfprob(1, J + 1) - ccdfprob(1, J) Fxyslopetemp = (ccdfprob(1, J + 1) - ccdfprob(1, J)) _ / (ccdfvalue(1, J + 1) - ccdfvalue(1, J)) Fxy = Fxyslopetemp * (Y(g) - ccdfvalue(1, J + 1)) + ccdfprob(1, J + 1) i = 1 muylower = 0 Do While i < J pxytemp(i) = ccdfprob(1, i + 1) - ccdfprob(1, i) product(i) = pxytemp(i) * (ccdfvalue(1, i + 1) + ccdfvalue(1, i)) / 2 muylower = muylower + product(i) i = i + 1 Loop i = ncuts Muyupper = 0 Do While i > J If i < ncuts Then pxytemp(i) = ccdfprob(1, i + 1) - ccdfprob(1, i) product(i) = pxytemp(i) * (ccdfvalue(1, i + 1) + ccdfvalue(1, i)) / 2 Else pxytemp(i) = 1 - ccdfprob(1, ncuts) product(i) = pxytemp(i) * (ccdfvalueat1 + ccdfvalue(1, ncuts)) / 2 End If Muyupper = Muyupper + product(i) i = i - 1 Loop 'This calculates the restandardizing area for each part probbelongingtolower = pxy * (Y(g) - ccdfvalue(1, J)) / (ccdfvalue(1, J + 1) –
ccdfvalue(1, J)) probbelongingtoupper = pxy * (ccdfvalue(1, J + 1) - Y(g)) / (ccdfvalue(1, J + 1)
- ccdfvalue(1, J)) 'debug.Print (probbelongingtolower + probbelongingtoupper) 'This is adjusting the muy's for the part of the interval Y falls within muylower = muylower + (Y(g) + ccdfvalue(1, J)) / 2 * probbelongingtolower
+ (ccdfvalue(1, 1) + ccdfvalue0) / 2 * ccdfprob(1, 1) Muyupper = Muyupper + (Y(g) + ccdfvalue(1, J + 1)) / 2 * probbelongingtoupper
'debug.Print muylower 'debug.Print muyupper 'debug.Print (muylower + muyupper)
96
Exit Do ElseIf Y(g) > ccdfvalue(1, ncuts) And Y(g) < ccdfvalueat1 Then pxy = 1 - ccdfprob(1, ncuts) Fxyslopetemp = (1 - ccdfprob(1, ncuts)) / (ccdfvalueat1 - ccdfvalue(1,
ncuts)) Fxy = Fxyslopetemp * (Y(g) - ccdfvalue(1, ncuts)) + ccdfprob(1, ncuts) i = 1 muylower = 0 'Initializing the lower mean Do While i < ncuts pxytemp(i) = ccdfprob(1, i + 1) - ccdfprob(1, i) product(i) = pxytemp(i) * (ccdfvalue(1, i + 1) + ccdfvalue(1, i)) / 2 muylower = muylower + product(i) i = i + 1 Loop probbelongingtolower = pxy * (Y(g) - ccdfvalue(1, ncuts)) / (ccdfvalueat1 –
ccdfvalue(1, ncuts)) probbelongingtoupper = pxy * (ccdfvalueat1 - Y(g)) / (ccdfvalueat1 –
ccdfvalue(1, ncuts)) ' Debug.Print (probbelongingtolower + probbelongingtoupper) muylower = (ccdfvalue(1, 1) + ccdfvalue0) / 2 * ccdfprob(1, 1) + muylower
+ (Y(g) + ccdfvalue(1, ncuts)) / 2 * probbelongingtolower Muyupper = (Y(g) + ccdfvalueat1) / 2 * probbelongingtoupper 'debug.Print (muylower) 'debug.Print (muyupper) 'debug.Print (muylower + muyupper) Exit Do End If J = J + 1 Loop 'Debug.Print "Current Y estimation is:"; Y(g) 'Debug.Print "Pxy is:", pxy, "Fxy is:", Fxy 'Debug.Print "MuyLower is:", muylower 'Debug.Print "MuyUpper is:", Muyupper 'Debug.Print "Muy is:", muy 'Debug.Print "The Difference between Mu and Mulower+Muyupper is:", (muy - (muylower + Muyupper)) '---------------------------------------------------------------------------------------------- 'This is the analytical part 'Jfunc(g) = Y(g) * ((coeffunder1gen - coeffover1gen) * Fxy + coeffover1gen) - coeffunder1gen * muylower - coeffover1gen * Muyupper - _ coeffover2gen + Fxy * (coeffunder2gen + coeffover2gen) + (coeffunder3gen - coeffover3gen) * pxy
97
Jfunc(g) = Y(g) * ((coeffover1gen - coeffunder1gen) * Fxy + coeffunder1gen) - coeffover1gen * muylower - coeffunder1gen * Muyupper - _
coeffunder2gen + Fxy * (coeffover2gen + coeffunder2gen) + (coeffunder3gen - coeffover3gen) * pxy
'Debug.Print "Jfunc is", Jfunc(g) Next g 'Output the Y and Jfunc Values for debuggin-------------- Open JFuncOut For Output As #4 i = 1 For i = 1 To Iterations Print #4, Y(i); Jfunc(i) Next i Close #4 '------------------------------------------- 'Have added For loop to find the minimum g = 1 For g = 1 To Iterations Jfunc(g) = Abs(Jfunc(g)) Next g g = 1 JfuncMin = Jfunc(1) For g = 1 To Iterations If Jfunc(g) < JfuncMin Then JfuncMin = Jfunc(g) index = g End If Next g 'Output Answer--------------------------- Yoptimal = Y(index) Debug.Print (Yoptimal) Debug.Print (Jfunc(index)) Debug.Print sisimregindex i = 0 Open OptOut For Output As #3 Print #3, (Yoptimal) Debug.Print (Yoptimal) Close #3
End Sub
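The grid-search step at the end of the routine above — evaluate the analytical objective J at many candidate values drawn from the ccdf and keep the candidate with the smallest |J| — inverts the piecewise-linear ccdf to generate each candidate: find the probability interval containing a uniform random draw and interpolate between the bracketing threshold values. A minimal Python sketch of that inverse-cdf sampling step (the function name, argument layout, and example numbers are illustrative, not from the thesis code):

```python
import random

def sample_from_ccdf(cuts, probs, v0, v1, u=None):
    """Inverse-CDF sample from a piecewise-linear conditional cdf.

    cuts  : threshold values in ascending order
    probs : cdf value at each threshold
    v0/v1 : attribute values where the cdf reaches 0 and 1
    u     : optional uniform(0,1) draw, for reproducibility
    """
    if u is None:
        u = random.random()
    xs = [v0] + list(cuts) + [v1]   # extended value grid
    ps = [0.0] + list(probs) + [1.0]  # extended probability grid
    # Locate the bracketing interval and interpolate linearly.
    for lo in range(len(ps) - 1):
        if ps[lo] <= u <= ps[lo + 1]:
            frac = (u - ps[lo]) / (ps[lo + 1] - ps[lo])
            return xs[lo] + frac * (xs[lo + 1] - xs[lo])
    return v1

# e.g. sample_from_ccdf([10, 50], [0.25, 0.75], 1, 100, u=0.5)  # → 30.0
```

Passing `u` explicitly makes the interpolation deterministic for testing; in the VBA routines the draw comes from Rnd(seed).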
Appendix E: Code Implementing the Numerical Solution for the Optimized Loss Function
Private Sub Main()
   'These are the variables that sisim has calculated in the original program
   'These are read in from a text file
   ReDim ccdfvalue(1, 20) As Variant
   ReDim ccdfprob(1, 20) As Variant
   Dim sisimregindex, ncuts As Integer
   Dim ccdfvalueat1, ccdfvalue0, seed As Double
   'These are loss function parameters
   'Region 1
   Dim coeffover1reg1, coeffover2reg1, coeffover3reg1 As Single
   Dim coeffunder1reg1, coeffunder2reg1, coeffunder3reg1 As Single
   Dim quadoverreg1, lineoverreg1, constoverreg1 As Integer
   Dim quadunderreg1, lineunderreg1, constunderreg1 As Integer
   'Region 2
   Dim coeffover1reg2, coeffover2reg2, coeffover3reg2 As Single
   Dim coeffunder1reg2, coeffunder2reg2, coeffunder3reg2 As Single
   Dim quadoverreg2, lineoverreg2, constoverreg2 As Integer
   Dim quadunderreg2, lineunderreg2, constunderreg2 As Integer
   'Region 3
   Dim coeffover1reg3, coeffover2reg3, coeffover3reg3 As Single
   Dim coeffunder1reg3, coeffunder2reg3, coeffunder3reg3 As Single
   Dim quadoverreg3, lineoverreg3, constoverreg3 As Integer
   Dim quadunderreg3, lineunderreg3, constunderreg3 As Integer
   'Region 4
   Dim coeffover1reg4, coeffover2reg4, coeffover3reg4 As Single
   Dim coeffunder1reg4, coeffunder2reg4, coeffunder3reg4 As Single
   Dim quadoverreg4, lineoverreg4, constoverreg4 As Integer
   Dim quadunderreg4, lineunderreg4, constunderreg4 As Integer
   Dim indexreg1, indexreg2, indexreg3, indexreg4 As Integer
   Dim coeffover1gen, coeffover2gen, coeffover3gen, coeffover4gen As Double
   Dim coeffunder1gen, coeffunder2gen, coeffunder3gen, coeffunder4gen As Double
   Dim muylower, muyhigher As Single
   'These are the variables needed for the internal program
   Dim pxy0, pxytail As Single
   ReDim Fxyslope(20) As Variant
   'These are temporary variables used in computing
   ReDim pxyarray(20), product(20), pxytemp(20), ccdfvaluetemp(20), guess(20) As Variant
   Dim Fxy, pxy, sumproduct As Double
   'This is going to be zero because of how probability is defined as a horizontal line.
   Dim pxyprime As Single
   Dim Yinitial, epsilon, Nmax, sum, area, upperthresh As Single
   Dim Jfunc, Jprime, Ynew, Y, Yoptimal As Double
   Dim i, j, n, g, Iterations As Integer 'These are counters
   'These are variables used for I/O of data
   Dim LossFile, FunctionInputs, OptOut As String
   '-----------------------------------------------------------
   'Variables from the second attempt
   Dim LossEv, MinLoss As Single
   Dim Loss(1000), TrueProb(1000), TruePerm(1000), BaseProb(1000) As Variant
   Dim BasePerm(1000), Err(1000), LossArray(1000) As Variant

   'Reading in variables-----------------------------------------------------------------
   'Reading in the data from the two files. The name of the output file is also declared here.
   LossFile = "Loss Inputs.txt"
   OptOut = "OptOut.txt"
   FunctionInputs = "FunctionInputs.txt"
   Open LossFile For Input As #1 'Open file for input.
   'Opens the output file outside of the main code, so it doesn't have to keep reopening
   'The data being read in are the ccdf values, associated probabilities, and regional index
   i = 0
   Do While Not EOF(1) 'Loop until end of file.
      i = i + 1
      If i = 1 Then
         Input #1, ncuts
      ElseIf i = 2 Then
         Input #1, sisimregindex
      ElseIf i = 3 Then
         Input #1, ccdfvalue0, ccdfvalueat1
      ElseIf i = 4 Then
         Do While j < ncuts
            Input #1, ccdfvalue(1, j + 1)
            j = j + 1
         Loop
      ElseIf i = 5 Then
         Do While n < ncuts
            Input #1, ccdfprob(1, n + 1)
            n = n + 1
         Loop
      ElseIf i = 6 Then
         Input #1, seed
      End If
   Loop
   Close #1 'Close file.

   Open FunctionInputs For Input As #2
   'Reading in constants and indicators for the loss function for the regions
   i = 0
   Do While Not EOF(2)
      i = i + 1
      If i = 1 Then
         Input #2, indexreg1
      ElseIf i = 2 Then
         Input #2, quadoverreg1, lineoverreg1, constoverreg1
      ElseIf i = 3 Then
         Input #2, quadunderreg1, lineunderreg1, constunderreg1
      ElseIf i = 4 Then
         Input #2, coeffover1reg1, coeffover2reg1, coeffover3reg1
      ElseIf i = 5 Then
         Input #2, coeffunder1reg1, coeffunder2reg1, coeffunder3reg1
      ElseIf i = 6 Then
         Input #2, indexreg2
      ElseIf i = 7 Then
         Input #2, quadoverreg2, lineoverreg2, constoverreg2
      ElseIf i = 8 Then
         Input #2, quadunderreg2, lineunderreg2, constunderreg2
      ElseIf i = 9 Then
         Input #2, coeffover1reg2, coeffover2reg2, coeffover3reg2
      ElseIf i = 10 Then
         Input #2, coeffunder1reg2, coeffunder2reg2, coeffunder3reg2
      ElseIf i = 11 Then
         Input #2, indexreg3
      ElseIf i = 12 Then
         Input #2, quadoverreg3, lineoverreg3, constoverreg3
      ElseIf i = 13 Then
         Input #2, quadunderreg3, lineunderreg3, constunderreg3
      ElseIf i = 14 Then
         Input #2, coeffover1reg3, coeffover2reg3, coeffover3reg3
      ElseIf i = 15 Then
         Input #2, coeffunder1reg3, coeffunder2reg3, coeffunder3reg3
      ElseIf i = 16 Then
         Input #2, indexreg4
      ElseIf i = 17 Then
         Input #2, quadoverreg4, lineoverreg4, constoverreg4
      ElseIf i = 18 Then
         Input #2, quadunderreg4, lineunderreg4, constunderreg4
      ElseIf i = 19 Then
         Input #2, coeffover1reg4, coeffover2reg4, coeffover3reg4
      ElseIf i = 20 Then
         Input #2, coeffunder1reg4, coeffunder2reg4, coeffunder3reg4
      End If
   Loop
   Close #2

   '-----------------------------------------------------------------------
   'Assigning coefficients based on the region index read from sisim.
   'This part of the code also checks whether each coefficient is considered.
   'Check for region 1 coefficients
   If sisimregindex = indexreg1 Then
      If quadoverreg1 = 0 Then
         coeffover1gen = 0
      Else
         coeffover1gen = coeffover1reg1
      End If
      If lineoverreg1 = 0 Then
         coeffover2gen = 0
      Else
         coeffover2gen = coeffover2reg1
      End If
      If constoverreg1 = 0 Then
         coeffover3gen = 0
      Else
         coeffover3gen = coeffover3reg1
      End If
      If quadunderreg1 = 0 Then
         coeffunder1gen = 0
      Else
         coeffunder1gen = coeffunder1reg1
      End If
      If lineunderreg1 = 0 Then
         coeffunder2gen = 0
      Else
         coeffunder2gen = coeffunder2reg1
      End If
      If constunderreg1 = 0 Then
         coeffunder3gen = 0
      Else
         coeffunder3gen = coeffunder3reg1
      End If
   End If
   'Check for region 2 coefficients
   If sisimregindex = indexreg2 Then
      If quadoverreg2 = 0 Then
         coeffover1gen = 0
      Else
         coeffover1gen = coeffover1reg2
      End If
      If lineoverreg2 = 0 Then
         coeffover2gen = 0
      Else
         coeffover2gen = coeffover2reg2
      End If
      If constoverreg2 = 0 Then
         coeffover3gen = 0
      Else
         coeffover3gen = coeffover3reg2
      End If
      If quadunderreg2 = 0 Then
         coeffunder1gen = 0
      Else
         coeffunder1gen = coeffunder1reg2
      End If
      If lineunderreg2 = 0 Then
         coeffunder2gen = 0
      Else
         coeffunder2gen = coeffunder2reg2
      End If
      If constunderreg2 = 0 Then
         coeffunder3gen = 0
      Else
         coeffunder3gen = coeffunder3reg2
      End If
   End If
   'Check for region 3 coefficients
   If sisimregindex = indexreg3 Then
      If quadoverreg3 = 0 Then
         coeffover1gen = 0
      Else
         coeffover1gen = coeffover1reg3
      End If
      If lineoverreg3 = 0 Then
         coeffover2gen = 0
      Else
         coeffover2gen = coeffover2reg3
      End If
      If constoverreg3 = 0 Then
         coeffover3gen = 0
      Else
         coeffover3gen = coeffover3reg3
      End If
      If quadunderreg3 = 0 Then
         coeffunder1gen = 0
      Else
         coeffunder1gen = coeffunder1reg3
      End If
      If lineunderreg3 = 0 Then
         coeffunder2gen = 0
      Else
         coeffunder2gen = coeffunder2reg3
      End If
      If constunderreg3 = 0 Then
         coeffunder3gen = 0
      Else
         coeffunder3gen = coeffunder3reg3
      End If
   End If
   'Check for region 4 coefficients
   If sisimregindex = indexreg4 Then
      If quadoverreg4 = 0 Then
         coeffover1gen = 0
      Else
         coeffover1gen = coeffover1reg4
      End If
      If lineoverreg4 = 0 Then
         coeffover2gen = 0
      Else
         coeffover2gen = coeffover2reg4
      End If
      If constoverreg4 = 0 Then
         coeffover3gen = 0
      Else
         coeffover3gen = coeffover3reg4
      End If
      If quadunderreg4 = 0 Then
         coeffunder1gen = 0
      Else
         coeffunder1gen = coeffunder1reg4
      End If
      If lineunderreg4 = 0 Then
         coeffunder2gen = 0
      Else
         coeffunder2gen = coeffunder2reg4
      End If
      If constunderreg4 = 0 Then
         coeffunder3gen = 0
      Else
         coeffunder3gen = coeffunder3reg4
      End If
   End If

   '---------------------------------------------------------------------------------
   'Check to see if the lcdf is monotonically increasing
   Iterations = 1000
   i = 1
Line1:
   If ccdfprob(1, 1) = 0 Then
      For i = 1 To ncuts
         ccdfprob(1, i) = ccdfprob(1, i) + 0.001
      Next i
      GoTo Line1
   Else
      i = 1
      For i = 1 To ncuts
         If ccdfprob(1, i) = ccdfprob(1, i + 1) Then
            j = i
            For j = i To ncuts
               ccdfprob(1, j + 1) = ccdfprob(1, j + 1) + 0.001
            Next j
         End If
      Next i
   End If
   ccdfprob(1, ncuts + 1) = ccdfprob(1, 20)

   'Sample the true perm-----------------------------------
   i = 1
   For i = 1 To Iterations
      'Randomize
      TrueProb(i) = Rnd(seed)
      n = 1
      For n = 1 To ncuts
         If TrueProb(i) < ccdfprob(1, 1) Then
            TruePerm(i) = (ccdfvalue(1, 1) - ccdfvalue0) / ccdfprob(1, 1) * _
               (TrueProb(i) - ccdfprob(1, 1)) + ccdfvalue(1, 1)
            Exit For
         ElseIf TrueProb(i) > ccdfprob(1, n) And TrueProb(i) < ccdfprob(1, n + 1) Then
            TruePerm(i) = (ccdfvalue(1, n + 1) - ccdfvalue(1, n)) / (ccdfprob(1, n + 1) - ccdfprob(1, n)) * _
               (TrueProb(i) - ccdfprob(1, n)) + ccdfvalue(1, n)
            Exit For
         ElseIf TrueProb(i) > ccdfprob(1, ncuts) Then
            TruePerm(i) = (ccdfvalueat1 - ccdfvalue(1, ncuts)) / (1 - ccdfprob(1, ncuts)) * _
               (TrueProb(i) - ccdfprob(1, ncuts)) + ccdfvalue(1, ncuts)
            Exit For
         End If
      Next n
      'Debug.Print TruePerm(i)
   Next i

   'Sample the base perm------------------------------------
   i = 1
   For i = 1 To Iterations
      'Randomize
      BaseProb(i) = Rnd(seed)
      n = 1
      For n = 1 To ncuts
         If BaseProb(i) < ccdfprob(1, 1) Then
            BasePerm(i) = (ccdfvalue(1, 1) - ccdfvalue0) / ccdfprob(1, 1) * _
               (BaseProb(i) - ccdfprob(1, 1)) + ccdfvalue(1, 1)
            Exit For
         ElseIf BaseProb(i) > ccdfprob(1, n) And BaseProb(i) < ccdfprob(1, n + 1) Then
            BasePerm(i) = (ccdfvalue(1, n + 1) - ccdfvalue(1, n)) / (ccdfprob(1, n + 1) - ccdfprob(1, n)) * _
               (BaseProb(i) - ccdfprob(1, n)) + ccdfvalue(1, n)
            Exit For
         ElseIf BaseProb(i) > ccdfprob(1, ncuts) Then
            BasePerm(i) = (ccdfvalueat1 - ccdfvalue(1, ncuts)) / (1 - ccdfprob(1, ncuts)) * _
               (BaseProb(i) - ccdfprob(1, ncuts)) + ccdfvalue(1, ncuts)
            Exit For
         End If
      Next n
      'Debug.Print BasePerm(i)
   Next i

   'Determining loss and expected loss----------------------------
   i = 1
   For i = 1 To Iterations 'Outer loop
      j = 1
      LossEv = 0
      For j = 1 To Iterations
         Err(j) = BasePerm(i) - TruePerm(j)
         If Err(j) > 0 Then
            Loss(j) = coeffover1gen * Err(j) ^ 2 + coeffover2gen * Err(j) + coeffover3gen
         Else
            Loss(j) = coeffunder1gen * Err(j) ^ 2 + coeffunder2gen * Abs(Err(j)) + coeffunder3gen
         End If
         LossEv = LossEv + Loss(j)
      Next j
      LossEv = LossEv / Iterations
      LossArray(i) = LossEv
   Next i

   i = 1
   MinLoss = LossArray(1)
   Index = 1
   For i = 1 To Iterations
      If MinLoss > LossArray(i) Then
         MinLoss = LossArray(i)
         Index = i
      End If
   Next i

   'Output answer---------------------------
   OptPerm = BasePerm(Index)
   Debug.Print (OptPerm)
   Debug.Print sisimregindex
   Open OptOut For Output As #3
   Print #3, (Round(OptPerm, 3))
   Close #3
End Sub
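The Monte Carlo procedure above — draw "true" and "base" realizations from the ccdf, average the asymmetric loss of each base value over all true values, and keep the base value with the smallest expected loss — can be condensed as follows. This is a hypothetical Python sketch under simplified assumptions (candidates taken directly from the samples; each loss branch given as a (quadratic, linear, constant) coefficient triple), not the thesis implementation:

```python
import random

def optimal_estimate(samples, over, under, n_candidates=None):
    """Monte Carlo search for the loss-minimizing estimate.

    samples    : draws from the local uncertainty distribution ("true" values)
    over/under : (quad, lin, const) coefficients of the asymmetric loss for
                 overestimation (error > 0) and underestimation (error <= 0)
    Candidates default to the samples themselves, mirroring Appendix E.
    """
    def loss(err):
        a, b, c = over if err > 0 else under
        return a * err ** 2 + b * abs(err) + c

    candidates = samples if n_candidates is None else random.sample(samples, n_candidates)
    # Expected loss of each candidate over all sampled "true" values.
    return min(candidates,
               key=lambda y: sum(loss(y - t) for t in samples) / len(samples))
```

With a symmetric quadratic loss the minimizer sits near the sample mean; penalizing overestimation more heavily than underestimation pulls the optimal estimate lower, which is exactly the asymmetry the loss-function approach is designed to capture.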
References
Alabert, F.G., Aquitaine, E., and Modot, V. (1992). Stochastic models of reservoir heterogeneity: impact on connectivity and average permeabilities. Paper SPE 24893 presented at the 67th Annual Technical Conference and Exhibition of the Society of Petroleum Engineers. Washington, D.C., U.S.A., Oct. 4-7, 1992.
Andres-Ferrer, J., Ortiz-Martinez, D., Garcia-Varea, I., and Casacuberta, F. (2007). On the use of different loss functions in statistical pattern recognition applied to machine translation. ScienceDirect.
Artiles-Leon, N. (1996-97). A pragmatic approach to multi-response problems using loss function. Quality Engineering, 9 (2), 213-220.
Bohling, Geoff (2005). Kriging. Kansas Geological Survey. C&PE 940.
Caers, J. (2001). Direct sequential indicator simulation. Department of Petroleum Engineering, Stanford University, Stanford, USA.
Casteel, Jerry (1997). Increased oil production and reserves from improved completion techniques in the Bluebell Field, Uinta Basin, Utah. Quarterly Technical Progress Report. United States DOE.
Culham, W.E., Farouq Ali, S.M., and Stahl, C.D. (1968). Experimental and numerical simulation of two-phase flow with interphase mass transfer in one and two dimensions. Paper SPE 2187 presented at the 43rd Annual Fall Meeting of the Society of Petroleum Engineers of AIME. Houston, Texas, U.S.A., Sept. 29-Oct. 2, 1968.
Deutsch, C.V. and Journel, A.G. (1998). Geostatistical Software Library and User’s Guide. 2nd Edition. New York: Oxford University Press.
Farouq Ali, S.M. and Nielsen, R.F. (1970). The material balance approach vs. reservoir simulation as an aid to understanding reservoir mechanics. Paper SPE 3080 presented at the 45th Annual Fall Meeting of the Society of Petroleum Engineers of AIME. Houston, Texas, U.S.A., Oct. 4-7, 1970.
Gonzalez, R., Schepers, K., and Reeves, S.R. (2008). Integrated clustering/geostatistical/evolutionary strategies approach for 3D reservoir characterization and assisted history-matching in a complex carbonate reservoir, SACROC unit, Permian Basin. Paper SPE 113978 presented at the 2008 SPE/DOE Improved Oil Recovery Symposium. Tulsa, Oklahoma, U.S.A., April 19-23, 2008.
Harris, T.J. (Aug., 1992). Optimal controllers for nonsymmetric and nonquadratic loss functions. Technometrics, American Statistical Association and American Society for Quality, 34(3), 298-306.
Hohn, M.E. (1999). Geostatistics and Petroleum Geology. Boston: Kluwer Academic Publishers.
Jensen, J.L., Lake, L.W., Corbett, P.W.M., and Goggin, D.J. (2000). Statistics for Petroleum Engineers and Geoscientists. 2nd Edition. Boston: Elsevier.
Journel, A.G. (1988). Non-parametric geostatistics for risk and additional sampling assessment. Principles of Environmental Sampling, ed. Larry Keith, American Chemical Society, 45-72.
Journel, A.G. (1989). Fundamentals of geostatistics in five lessons. American Geophysical Union. Washington, D.C.
Kim, Y. (2007). Probabilistic framework-based history matching algorithm utilizing sub-domain delineation and software 'Pro-HMS'. Austin: The University of Texas at Austin.
London, D. and Minc, H. (1972). Eigenvalues of matrices with prescribed entries. Proceedings of the American Mathematical Society, 34 (1), 8-14.
Ma, Y. and Zhao, F. (2004). An improved multivariate loss function. Journal of Systems Sciences and Systems Engineering, 13(3).
Mattax, C.C. and Dalton, R.L. (1990). Reservoir simulation. Journal of Petroleum Technology, 692-695.
Murray, C.J. (1992). Stochastic simulation of hydrocarbon pore volume for risk assessment and economic planning. Paper SPE 25527. Stanford, California, U.S.A., Sept. 8, 1992.
Naimi-Tajdar, R., Han, C., Sepehrnoori, K., Arbogast, T.J., and Miller, M.A. (2006). A fully implicit, compositional, parallel simulator for IOR processes in fractured reservoirs. Paper SPE 100079 presented at the 2006 SPE/DOE Symposium on Improved Oil Recovery. Tulsa, Oklahoma, U.S.A., April 22-26, 2006.
Rukhin, A.L. Estimate loss and admissible loss estimators. Technical Report #85-26. Department of Statistics, Purdue University.
Rukhin, A.L. (Sep., 1988). Loss functions for loss estimation. The Annals of Statistics, Institute of Mathematical Statistics, 16 (3), 1262-1269.
Schiozer, D.J. and Aziz, K. (1994). Use of domain decomposition for simultaneous simulation of reservoir and surface facilities. Paper SPE 27876 presented at the Western Regional Meeting. Long Beach, California, U.S.A., March 23-25, 1994.
Sener, I. and Bakiler, C.S. (1989). Basic reservoir engineering and history-match study on the fractured Raman reservoir, Turkey. Paper SPE 17955 presented at the SPE Middle East Oil Technical Conference and Exhibition. Manama, Bahrain, March 11-14, 1989.
Seth, M.S. (1974). A semi-implicit method for simulating reservoir behavior. Paper SPE 4979 presented at the 49th Annual Fall Meeting of the Society of Petroleum Engineers of AIME. Houston, Texas, U.S.A., Oct. 6-9, 1974.
Smith, L.I. (2002). A tutorial on principal components analysis. Cornell University, Ithaca, USA
Srinivasan, S. and Bryant, S. (2004). Integrating dynamic data in reservoir models using a parallel computational approach. Paper SPE 89444 presented at the 2004 SPE/DOE Thirteenth Symposium on Improved Oil Recovery. Tulsa, Oklahoma, U.S.A., April 17-21, 2004.
Srivastava, R.M. (1990). An application of geostatistical methods for risk analysis in reservoir management. Paper SPE 20608 presented at the SPE Annual Technical Conference and Exhibition. New Orleans, Louisiana, U.S.A., September 23-26, 1990.
Srivastava, R.M. (1992). Reservoir characterization with probability field simulation. Paper SPE presented at the 67th Annual Technical Conference and Exhibition of the Society of Petroleum Engineers. Washington, D.C., U.S.A., Oct. 4-7, 1992.
Suhr, R. and Batson, R.G. (2001). Constrained multivariate loss function minimization. Quality Engineering, 13 (3), 475-483.
Suresh, S., Sundararajan, N., and Saratchandran, P. (2008). Risk-sensitive loss functions for sparse multi-category classification problems. Information Sciences: an International Journal, 178 (12). New York: Elsevier Science Inc.
Tran, T.T., Deutsch, C.V., and Xie, Y. (2001). Direct geostatistical simulation with multiscale well, seismic, and production data. Paper SPE 71323 presented at the 2001 SPE Annual Technical Conference and Exhibition. New Orleans, Louisiana, U.S.A., Sept. 30-Oct. 3, 2001.
Wallace, T.D. and Hussain, A. (1969). The use of error components models in combining cross section with time series data. Econometrica, 37 (1), 55-72.
Yadav, S. (2006). History matching using face-recognition technique based on principal component analysis. Paper SPE 102148 presented at the 2006 SPE Annual Technical Conference and Exhibition. San Antonio, Texas, U.S.A., Sept. 24-27, 2006.
Yadav, S., Heim, R., Bryant, S., Sinha, R., and May, E. (2007). Optimal region delineation in a reservoir for efficient history matching. Paper SPE 108994 presented at the 2007 SPE Annual Technical Conference and Exhibition. Anaheim, California, U.S.A., Nov. 11-14, 2007.
Yadav, S., Srinivasan, S., Bryant, S.L., and Barrera, A. (2005). History matching using probabilistic approach in a distributed computing environment. Paper SPE 93399 presented at the 2005 SPE Reservoir Simulation Symposium. Houston, Texas, U.S.A., Jan. 31-Feb. 2, 2005.
Vita
Donovan Kilmartin, son of Jim and Carol Kilmartin, was born June 11, 1984 in
Silver City, New Mexico. In May 2003, he graduated from Eagle High School in Idaho.
From August 2003 until May 2007, he ran track for The University of Texas while earning his
Bachelor of Science in Petroleum Engineering. After graduation, Donovan transitioned
directly into the Petroleum Engineering graduate program. Upon obtaining his Master of
Science, Donovan will join ExxonMobil in their development planning group.
Permanent address: 2232 Sunny Hills Drive
Austin, TX, 78744
This thesis was typed by the author.