Artificial neural networks and conditional stochastic simulations for characterization of aquifer heterogeneity
Item Type text; Dissertation-Reproduction (electronic)
Authors Balkhair, Khaled Saeed
Publisher The University of Arizona.
Rights Copyright © is held by the author. Digital access to this material is made possible by the University Libraries, University of Arizona. Further transmission, reproduction or presentation (such as public display or performance) of protected items is prohibited except with permission of the author.
Download date 29/12/2020 20:05:14
Link to Item http://hdl.handle.net/10150/284451
INFORMATION TO USERS
This manuscript has been reproduced from the microfilm master. UMI films the
text directly from the original or copy submitted. Thus, some thesis and
dissertation copies are in typewriter face, while others may be from any type of
computer printer.
The quality of this reproduction is dependent upon the quality of the copy
submitted. Broken or indistinct print, colored or poor quality illustrations and
photographs, print bleedthrough, substandard margins, and improper alignment
can adversely affect reproduction.
In the unlikely event that the author did not send UMI a complete manuscript and
there are missing pages, these will be noted. Also, if unauthorized copyright
material had to be removed, a note will indicate the deletion.
Oversize materials (e.g., maps, drawings, charts) are reproduced by sectioning
the original, beginning at the upper left-hand corner and continuing from left to
right in equal sections with small overlaps. Each original is also photographed in
one exposure and is included in reduced form at the back of the book.
Photographs included in the original manuscript have been reproduced
xerographically in this copy. Higher quality 6" x 9" black and white photographic
prints are available for any photographs or illustrations appearing in this copy for
an additional charge. Contact UMI directly to order.
Bell & Howell Information and Learning, 300 North Zeeb Road, Ann Arbor, MI 48106-1346 USA
800-521-0600
ARTIFICIAL NEURAL NETWORKS AND CONDITIONAL STOCHASTIC SIMULATIONS
FOR CHARACTERIZATION OF AQUIFER HETEROGENEITY
By
Khaled Saeed Balkhair
A Dissertation Submitted to the Faculty of the
DEPARTMENT OF HYDROLOGY AND WATER RESOURCES
In Partial Fulfillment of the Requirements For the Degree of
DOCTOR OF PHILOSOPHY WITH A MAJOR IN HYDROLOGY
In the Graduate College
THE UNIVERSITY OF ARIZONA
1999
UMI Number: 9934843
UMI Microform 9934843 Copyright 1999, by UMI Company. All rights reserved.
This microform edition is protected against unauthorized copying under Title 17, United States Code.
UMI 300 North Zeeb Road Ann Arbor, MI 48103
THE UNIVERSITY OF ARIZONA GRADUATE COLLEGE
As members of the Final Examination Committee, we certify that we have
read the dissertation prepared by Khaled Saeed Balkhair
entitled Artificial Neural Networks and Conditional Stochastic
Simulations for Characterization of Aquifer Heterogeneity
and recommend that it be accepted as fulfilling the dissertation
requirement for the Degree of Doctor of Philosophy
Thomas Maddock III    Date
Robert MacNish    Date
Peter Wierenga    Date
David M. Hendricks    Date
Final approval and acceptance of this dissertation is contingent upon the candidate's submission of the final copy of the dissertation to the Graduate College.
I hereby certify that I have read this dissertation prepared under my direction and recommend that it be accepted as fulfilling the dissertation requirement.
Dissertation Director: Lucien Duckstein    Date
STATEMENT BY AUTHOR
This dissertation has been submitted in partial fulfillment of requirements for an advanced degree at The University of Arizona and is deposited in the University Library to be made available to borrowers under rules of the Library.
Brief quotations from this dissertation are allowable without special permission, provided that accurate acknowledgment of the source is made. Requests for permission for extended quotation from or reproduction of this manuscript in whole or in part may be granted by the head of the major department or the Dean of the Graduate College. In all other instances, however, permission must be obtained from the author.
SIGNED:
ACKNOWLEDGEMENTS
I have had the pleasure of knowing my advisor Professor Lucien Duckstein since the fall of 1996 when I took his course in Fuzzy Logic. His enthusiasm, encouragement, and advice were instrumental in helping me complete my dissertation and degree requirements. I would also like to thank my other committee members, Dr. Thomas Maddock III, Dr. Robert MacNish, Dr. Peter Wierenga, and Dr. David Hendricks, for offering ideas and suggestions pertinent to my research both during and after my oral comprehensive exam.
Gratitude is extended to Professor Allan Gutjahr of New Mexico Tech; our discourses on the stochastic approach were highly beneficial. I also extend this gratitude to his former student, Dr. Debra Hughson, for sharing with me her research experience, which helped facilitate implementation of the stochastic approach.
I am greatly indebted to the financial sponsorship provided by my government through the Saudi Culture Mission. This assistance enabled me to focus primarily on my coursework and research.
Special thanks to my friend and classmate Emery Coppola Jr. for helping me review parts of the dissertation manuscript. His suggestions were very helpful.
I feel particularly fortunate to be a graduate of this great school and outstanding Department. Many people deserve special thanks. They include Dr. Donald Davis, for his continual support in providing useful references (his office door was always open to me); Teresa Handloser, Academic Advisor, for providing useful administrative advice; and Department secretaries Frances Janssen, Chris Wenger, and Mary Nett, who lent me all sorts of support and help when needed.
I would like to express my deepest gratitude to my parents for their continual support and encouragement; my wife for her patience, care and understanding throughout the tenure of my graduate study; and the rest of my family for their kind words of encouragement.
DEDICATION
To
My parents
My wife
My children
And
My Family
TABLE OF CONTENTS
LIST OF FIGURES
LIST OF TABLES 1
ABSTRACT 1
1. INTRODUCTION 1
1.1 Background 1
1.2 Objectives 2
1.3 Contents of the Dissertation 2
2. SPATIAL HETEROGENEITY AND RANDOM FIELDS 2
2.1 Heterogeneity and Stochastic Process 2
2.2 Geostatistical Representation of Spatial Variability 3
2.3 Random Processes and Random Fields 3
2.4 Stationarity and Ergodicity of Random Processes 3
2.5 Random Field Generators (RFG) 3
2.5.1 Spectral Representation of Random Variables 4
2.5.1.1 Unconditional two-dimensional Random Field Generation 4
2.5.1.2 Conditional two-dimensional Random Fields 4
3. STOCHASTIC MODEL DEVELOPMENT 5
3.1 Introduction 5
3.2 Mathematical Development of the Stochastic Model 5
3.2.1 Spectral Representation of the Flow Equation 5
3.2.2 Ln[T] Spectra and Covariance Function 5
3.2.3 Head Variance and Covariance Functions 6
3.2.4 Cross-Covariance Functions 6
3.3 Conditioning 6
3.3.1 Cokriging 64
3.3.2 Iterative Conditioning Procedure 68
4. NEURAL NETWORKS 72
4.1 Introduction 72
4.2 Basis of Neural Networks 76
4.3 Artificial Neurons 79
4.4 Artificial Neural Networks 83
4.5 Example of Artificial Neural Network 85
4.6 Learning and Recall 88
5. BACK-PROPAGATION NEURAL NETWORK 91
5.1 General 91
5.2 Widrow-Hoff Delta Learning Rule 92
5.3 Multi-layer Back-propagation Training 98
5.3.1 Calculation of Weights for the Output-Layer Neurons 102
5.3.2 Calculation of Weights for the Hidden Layer Neurons 106
5.4 Momentum 111
6. STOCHASTIC AND NEURAL NETWORK MODEL APPLICATIONS 113
6.1 Methodology 113
6.2 Iterative Conditional Simulation (ICS) 117
6.3 Neural Network Simulation 121
6.3.1 Development of training and testing patterns 122
7. RESULTS AND DISCUSSION 128
8. CONCLUSIONS AND RECOMMENDATIONS 175
8.1 Conclusions 175
8.2 Recommendations for future work 179
REFERENCES
LIST OF FIGURES
Figure 3.1 Illustration of conditioning of a random field on data 69
Figure 4.1 Sketch of Biological Neuron 78
Figure 4.2 Sketch of Artificial Neuron 80
Figure 4.3 Transfer Functions for Neurons 82
Figure 4.4 Architecture of Neural Network 84
Figure 4.5 Feed-Forward Neural Network 86
Figure 5.1 Sketch of an artificial neuron 93
Figure 5.2 Derivatives of Logistic Function 95
Figure 5.3 Sketch of Neuron without Activation Function 97
Figure 5.4 Sketch of multilayer back propagation neural network 101
Figure 5.5 Layers of neurons for calculating weights at output
during training 103
Figure 5.6 Representation of neurons for calculating the change
of weight in the hidden layer 108
Figure 6.1 Hypothetical Aquifer Layout 116
Figure 6.2 Uniform sampling scheme of transmissivities and head data 118
Figure 6.3 Non-uniform sampling scheme of transmissivities
and head data 119
Figure 6.4 Architecture of f-ANN 124
Figure 6.5 Architecture of fh-ANN 125
Figure 7.1 True and estimated f fields for σf² = 1.0, (RF#1) 130
Figure 7.2 True and estimated f fields for σf² = 1.0, (RF#2) 131
Figure 7.3 True and estimated f fields for σf² = 1.0, (RF#3) 132
Figure 7.4 True vs. estimated ICS, fh-ANN, and f-ANN f fields for σf² = 1.0, (RF#1, 2, and 3) 134
Figure 7.5 True and estimated f fields for σf² = 2.0, (RF#1) 135
Figure 7.6 True and estimated f fields for σf² = 2.0, (RF#2) 136
Figure 7.7 True and estimated f fields for σf² = 2.0, (RF#3) 137
Figure 7.8 True vs. estimated ICS, fh-ANN, and f-ANN f fields for σf² = 2.0, (RF#1, 2, and 3) 138
Figure 7.9 True and estimated f fields for σf² = 5.0, (RF#1) 139
Figure 7.10 True and estimated f fields for σf² = 5.0, (RF#2) 140
Figure 7.11 True and estimated f fields for σf² = 5.0, (RF#3) 141
Figure 7.12 True vs. estimated ICS, fh-ANN, and f-ANN f fields for σf² = 5.0, (RF#1, 2, and 3) 143
Figure 7.13 True and simulated h fields for σf² = 1.0, (RF#1) 144
Figure 7.14 True and simulated h fields for σf² = 1.0, (RF#2) 145
Figure 7.15 True and simulated h fields for σf² = 1.0, (RF#3) 146
Figure 7.16 True vs. simulated ICS, fh-ANN, and f-ANN h fields for σf² = 1.0, (RF#1, 2, and 3) 147
Figure 7.17 True and simulated h fields for σf² = 2.0, (RF#1) 148
Figure 7.18 True and simulated h fields for σf² = 2.0, (RF#2) 149
Figure 7.19 True and simulated h fields for σf² = 2.0, (RF#3) 150
Figure 7.20 True vs. simulated ICS, fh-ANN, and f-ANN h fields for σf² = 2.0, (RF#1, 2, and 3) 151
Figure 7.21 True and simulated h fields for σf² = 5.0, (RF#1) 152
Figure 7.22 True and simulated h fields for σf² = 5.0, (RF#2) 153
Figure 7.23 True and simulated h fields for σf² = 5.0, (RF#3) 154
Figure 7.24 True vs. simulated ICS, fh-ANN, and f-ANN h fields for σf² = 5.0, (RF#1, 2, and 3) 155
Figure 7.25 True vs. simulated ICS, fh-ANN, and f-ANN head at sampled locations for σf² = 1.0, (RF#1, 2, and 3) 156
Figure 7.26 True vs. simulated ICS, fh-ANN, and f-ANN head at sampled locations for σf² = 2.0, (RF#1, 2, and 3) 157
Figure 7.27 True vs. simulated ICS, fh-ANN, and f-ANN head at sampled locations for σf² = 5.0, (RF#1, 2, and 3) 158
Figure 7.28 Mean Square Error (MSE) of 100 f fields for σf² = 1.0 160
Figure 7.29 Mean Square Error (MSE) of 100 f fields for σf² = 2.0 161
Figure 7.30 Mean Square Error (MSE) of 100 f fields for σf² = 5.0 162
Figure 7.31 Mean Square Error (MSE) of 100 h fields for σf² = 1.0 163
Figure 7.32 Mean Square Error (MSE) of 100 h fields for σf² = 2.0 164
Figure 7.33 Mean Square Error (MSE) of 100 h fields for σf² = 5.0 165
Figure 7.34 Mean Square Error (MSE) fluctuations during training 168
Figure 7.35 True vs. estimated ICS, fh-ANN, and f-ANN mean f fields 171
Figure 7.36 True vs. simulated ICS, fh-ANN, and f-ANN mean h fields 172
Figure 7.37 True vs. estimated ICS, fh-ANN, and f-ANN variance of f fields 173
Figure 7.38 True vs. simulated ICS, fh-ANN, and f-ANN variance of h fields 174
LIST OF TABLES
Table 7.1 Scenarios considered in the model applications 129
Table 7.2 MSE of ANNs at the end of 15,000 iterations for
different variances 169
Abstract
Although it is one of the most difficult tasks in hydrology, delineation of aquifer
heterogeneity is essential for accurate simulation of groundwater flow and transport.
There are various approaches used to delineate aquifer heterogeneity from a limited data
set, and each has its own difficulties and drawbacks. The inverse problem is usually used
for estimating different hydraulic properties (e.g. transmissivity) from scattered
measurements of these properties, as well as hydraulic head. Difficulties associated with
this approach are issues of identifiability, uniqueness, and stability. The Iterative
Conditional Simulation (ICS) approach uses kriging (or cokriging) to provide estimates
of the property at unsampled locations while retaining the measured values at the
sampled locations. Although the relation between transmissivity (T) and head (h) in the
governing flow equation is nonlinear, the cross covariance function and the covariance of
h are derived from a first-order-linearized version of the equation. Even if the log
transformation of T is adopted, the nonlinear nature between f (mean-removed Ln[T]) and
h still remains. The linearized relations, based on small perturbation theory, are valid
only if the unconditional variance of f is less than 1.0. Inconsistent transmissivity and
head fields may occur as a result of using a linear relation between T and h.
In this dissertation, an Artificial Neural Network (ANN) is investigated as a means
for delineating aquifer heterogeneity. Unlike ICS, this new computational tool does not
rely on a prescribed relation, but seeks its own. Neural Networks are able to learn
arbitrary non-linear input-output mapping directly from training data and have the very
advantageous property of generalization.
For this study, a random field generator was used to generate transmissivity fields
from known geostatistical parameters. The corresponding head fields were obtained using
the governing flow equation. Both T and h at sampled locations were used as input
vectors for two different back-propagation neural networks designed for this research.
The corresponding values of transmissivity at unsampled locations (unknown),
constituting the output vector, were estimated by the neural networks. Results from the
ANN were compared to those obtained from the ICS approach for different degrees of
heterogeneity. The degree of heterogeneity was quantified using the variance of the
transmissivity field, where values of 1.0, 2.0, and 5.0 were used. It was found that ANN
overcomes the limitations of ICS at high variances. Thus, ANN was better able to
accurately map the highly heterogeneous fields using limited sample points.
1. INTRODUCTION
1.1 Background
The movement of water through natural earth materials is governed not only by the
distribution of inflow in time and space, but also by the nature of the material through
which the water flows. Atmospheric variations and geologic processes, over the longer term,
create earth materials that have highly variable hydraulic properties. Such variations in
subsurface conditions are reflected in typical subsurface hydrologic observations.
The variation in hydraulic conductivity is often quantified in terms of the standard
deviation of the logarithm of the hydraulic conductivity. Data compilations for many
different types of aquifers and soils [Freeze (1975); Delhomme (1979); Harr (1987)]
indicate that the standard deviation of the natural log of hydraulic conductivity may range
from 0.4 to 4. Similar variation is observed for the depth-averaged transmissivity
[Delhomme (1979)]. These ranges of variations indicate that the hydraulic properties of
many aquifers and soils will vary over several orders of magnitude, even in a given
aquifer or soil type. This suggests that any scheme used to quantify the variability in time
and/or space in a subsurface-flow system must consider the scale of variations in the
earth materials. The scale of variability of hydraulic conductivity may be as low as a
fraction of a meter, and time variations over a period of hours or days may be significant in
many aquifers and soils. Most hydrologic investigations deal with applications that
require predictions of the quantity or quality of water over scales that are much larger
than the scale of variability discussed.
In the last two decades, research efforts have focused on developing general
quantitative concepts to describe the spatial variability (heterogeneity) of subsurface
porous media. Much of this research applied a geostatistical approach in a stochastic
framework (modeling). The overall goal of stochastic modeling in subsurface hydrology
is to develop methods that can be used to quantify large-scale flow and transport in
complex, naturally variable subsurface flow systems. In this sense, attention is
devoted to the dominant large-scale effects that could be reflected in a solution for the
mean behavior. In many cases, it is found that the mean behavior is very similar to the
classical deterministic descriptions.
Yeh (1992) and Yeh and McCord (1994) provided an overview of several
stochastic approaches developed in the last few years for modeling water flow and solute
transport in heterogeneous aquifers; they classified them into two main categories:
homogeneous or effective parameters and heterogeneous approaches. Most of these
models are known to be valid only if the spatial heterogeneity of the soil is moderate and
are limited to relatively simplified analytical models, as discussed below.
The effective parameter approach assumes that the heterogeneous geologic
formation can be homogenized to obtain effective parameters with which one can predict
the ensemble behavior of the flow and transport processes. Ensemble is defined as a
collection of all possible values of the variation of a random variable in a stochastic
process. Examples of such studies include those by [Gelhar and Axness (1983), and
Dagan (1988)] for saturated porous media, and those by [Yeh et al. (1985a,b,c);
Mantoglou and Gelhar (1987a,b,c); Russo and Dagan (1991a); and Russo (1993 a,b)] for
unsaturated media. Although these studies have contributed to a better understanding of
flow and transport in heterogeneous aquifers, the major drawback of these approaches is
that they predict only the ensemble behavior of the aquifer, which can be quite different
from that of a particular realization (single possible random field) encountered in reality.
Theoretically, the ensemble behavior approaches the behavior of a single realization after
flow encounters heterogeneity of different scales over a large portion of the aquifer.
Such a theory appears valid for mildly heterogeneous aquifers at certain scales
[Sudicky (1986); and Garabedian et al. (1991)]. As the process grows and encounters
heterogeneities of different scales, the validity of this theory is questionable. Therefore,
for field applications, the predictive ability of the equivalent homogeneous approach is
considered limited and the prediction based on the equivalent homogeneous approach
involves large uncertainties.
The heterogeneous approach is designed to consider the nature of spatial
variability of hydrologic properties of the aquifer with a limited amount of data. Methods
in this approach generally consist of geostatistics, Monte Carlo simulation, and
conditional simulation. Geostatistics is a mathematical interpolation and extrapolation
tool, which uses the spatial statistics of the data set to estimate the property at unsampled
locations. The cokriging approach is computationally economic and is often considered a
more practical and realistic approach than the effective parameter approach. Although
hydraulic head and transmissivity fields derived from cokriging have been found to be
reasonable, there is no guarantee these estimates satisfy the principle of conservation of
mass [Harter and Yeh (1993); and Yeh et al. (1995)]. The fact that a kriged/cokriged map
is inevitably smoother than the true map cannot be ignored. Conditional simulation,
which is a generalization of kriging or cokriging, provides a solution to this problem
[Matheron (1973); and Delhomme (1979)].
Monte Carlo simulation is the most intuitive approach for dealing with spatial
variability in a stochastic sense. Although it belongs to the heterogeneous approach since
hydraulic property at every point in the aquifer is specified, it is, in principle, equivalent
to the effective parameter approach. Both Monte Carlo simulation and the effective
parameter approach derive the mean and variance of the hydraulic head, but Monte Carlo
simulation requires fewer assumptions, and it can predict the shape of the frequency distribution
of the output variables. The principle of Monte Carlo simulation is straightforward. First
it assumes that the probability distribution of the parameter and its covariance function
are either known or can be derived from available field data. Using some mathematical
techniques, it generates many possible realizations of the hydraulic conductivity field that
conform to the assumed probability distribution and the covariance function. Each
generated realization is input into the flow model, from which the corresponding
hydraulic head distribution is determined. Finally, the head distributions resulting from all
realizations are used to determine the expected value, variance, covariance, and
distribution. Typical examples of studies using this approach can be found in Freeze
(1975), and Smith and Freeze (1979).
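The Monte Carlo procedure described above can be sketched in a few lines of code. The following is only an illustration of the idea, not the models used in this dissertation: it assumes a hypothetical 1-D steady flow domain, an exponential covariance for ln K, and fixed heads at both ends.

```python
import numpy as np

rng = np.random.default_rng(0)

def generate_lnK(n, sigma=1.0, corr_len=5.0):
    """One correlated ln(K) realization on a 1-D grid (assumed
    exponential covariance), generated by Cholesky factorization."""
    x = np.arange(n)
    C = sigma**2 * np.exp(-np.abs(x[:, None] - x[None, :]) / corr_len)
    return np.linalg.cholesky(C) @ rng.standard_normal(n)

def solve_head(lnK, h_left=10.0, h_right=0.0):
    """Finite-difference solution of steady 1-D flow d/dx(K dh/dx) = 0
    with fixed-head boundaries (illustrative flow model)."""
    K = np.exp(lnK)
    n = len(K)
    Kh = 0.5 * (K[:-1] + K[1:])              # interblock conductivities
    A = np.zeros((n, n))
    b = np.zeros(n)
    A[0, 0] = A[-1, -1] = 1.0                # Dirichlet boundary rows
    b[0], b[-1] = h_left, h_right
    for i in range(1, n - 1):
        A[i, i - 1], A[i, i + 1] = Kh[i - 1], Kh[i]
        A[i, i] = -(Kh[i - 1] + Kh[i])
    return np.linalg.solve(A, b)

# Monte Carlo loop: many realizations -> ensemble head statistics
heads = np.array([solve_head(generate_lnK(50)) for _ in range(200)])
h_mean, h_var = heads.mean(axis=0), heads.var(axis=0)
```

Each realization conforms to the assumed distribution and covariance; the ensemble of solved head fields then yields the expected value and variance of head, exactly as in the procedure above.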
Conditional simulation is an approach that combines geostatistics and Monte
Carlo simulation. Unlike Monte Carlo simulation, it provides only a subset of all possible
realizations of the hydrologic property, which consists of the values of the properties at
sample locations and conforms to the predefined spatial statistics of the hydrologic
property. In this context, realizations that do not agree with measured values at the
sampled locations are discarded. Because the conditional simulation includes the data
values at the sampled location and all possible values at the unsampled locations, the
conditional simulation is considered the most rational approach for dealing with
uncertainties in heterogeneous geologic formations, [Yeh (1992) and Yeh and McCord
(1994)]. The complete theory of conditional simulation is given by Matheron (1973) and
Journel and Huijbregts (1978).
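A conditional realization can be built from an unconditional one by the classical kriging-residual recipe: krige the measured values, krige the simulated values at the same sample points, and add the difference to the unconditional field. The sketch below is a hypothetical 1-D illustration with a zero-mean field, an assumed exponential covariance, and invented sample data; it is not the conditioning scheme developed later in this dissertation.

```python
import numpy as np

rng = np.random.default_rng(1)

def cov(d, sigma2=1.0, lam=5.0):
    """Exponential covariance model (assumed known)."""
    return sigma2 * np.exp(-np.abs(d) / lam)

x = np.arange(40.0)                          # estimation grid
C = cov(x[:, None] - x[None, :])

# unconditional zero-mean realization via Cholesky factorization
Z_s = np.linalg.cholesky(C) @ rng.standard_normal(len(x))

idx = np.array([5, 18, 33])                  # hypothetical sample points
z_obs = np.array([0.8, -0.4, 1.2])           # hypothetical measurements

# simple-kriging weights for every grid node: solve C_dd W = C_dg
C_dd = cov(x[idx][:, None] - x[idx][None, :])
C_dg = cov(x[idx][:, None] - x[None, :])
W = np.linalg.solve(C_dd, C_dg)              # shape (n_data, n_grid)

# conditioning: unconditional field + kriged (observed - simulated) residual
Z_c = Z_s + W.T @ (z_obs - Z_s[idx])
```

At the sample points the kriged correction is exact, so Z_c reproduces z_obs there, while away from the data Z_c retains the spatial variability of the unconditional realization — precisely the property that distinguishes conditional simulation from plain Monte Carlo.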
A great challenge facing hydrologists today is posed by the inverse problem of
estimating aquifer hydraulic properties (e.g., transmissivity (T)) from scattered
measurements of these properties and hydraulic head (φ). The difficulties with the inverse
problem are associated with issues of identifiability, uniqueness, and stability, as
discussed by W. Yeh (1986); and Carrera and Neuman (1986b). The problem is further
complicated by the consensus recognition of the inherent heterogeneities of an aquifer's
hydraulic properties, the nonlinear relationships between T and φ, and the fact that
observations of T and φ are usually limited.
Various methods have been developed to solve the inverse problem given scattered
head and conductivity or transmissivity measurements. One popular method among these
is the minimum-output-error based approaches [Yeh and Tauxe (1971); Gavalas et al.
(1976); Willis and Yeh (1987); Cooley (1982); Neuman and Yakowitz (1979); Neuman
(1980); Clifton and Neuman (1982); and Carrera and Neuman (1986a,b)]. A shortcoming
of this approach is that the identity of the estimate is often undefined. In other words, it is
unclear what the transmissivity and head fields derived from these methods represent in
the case where only scattered head and transmissivity measurements are given. Being
unable to ascertain their identities, this approach suffers from the same difficulty as any
manual model calibration approaches, e.g. Yeh and Mock (1996), because the uncertainty
associated with the output can not be addressed.
The geostatistical approaches, [Kitanidis and Vomvoris (1983); Hoeksema and
Kitanidis (1984); Dagan (1985a); Rubin and Dagan (1987); and Gutjahr and Wilson
(1989)], have received increasing attention recently. The geostatistical approaches to the
inverse problem rely on the use of kriging/cokriging estimation techniques. For example,
kriging is defined as the best linear unbiased predictor, which provides estimates of
the property at unsampled locations but retains the sample values at locations where the
values are known, i.e., kriging is a special type of conditional expectation. In the case of
simulation of flow in large-scale aquifers where only limited amounts of hydraulic
conductivity data are collected, geostatistics is generally found to be useful [Gutjahr
(1981) and Clifton and Neuman (1982)].
Similarly, in groundwater hydrology, cokriging, using head and transmissivity
values at sample locations, has been used to estimate the unknown transmissivity and/or
hydraulic head values at other locations, [Ahmed and de Marsily (1993); Kitanidis and
Vomvoris (1983); Hoeksema and Kitanidis (1984); Gutjahr and Wilson (1989); and
Hoeksema and Kitanidis (1989)]. The resultant hydraulic head and transmissivity fields
are then used to simulate the solute transport. Cokriging is based explicitly on the
statistical characterization of the spatial variability of aquifer log transmissivity, Ln[T].
The idea is to take advantage of the spatial continuity of the Ln[T] field implied by a
covariance function or variogram, and to make use of the linearized relationship between
Ln[T] and φ implied by the stochastic flow equation. In cokriging, the unknown f (mean-
removed Ln[T]) value at a point of interest is estimated by a weighted linear combination
of the observed f and h (mean-removed φ). The weights are determined by requiring that
the estimator be unbiased and have minimum variance. By casting the problem in a
probabilistic framework, Dagan [1982, 1985] and Rubin and Dagan (1987) showed that
when the random transmissivity f and head h fields are jointly Gaussian (or multivariate
normal), with known mean and covariance, the cokriging estimate and cokriging covariance
are equivalent to the conditional mean and conditional covariance of the new joint
probability distribution function conditioned on the measurements.
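In matrix form, the cokriging estimate described above is a weighted linear combination of the stacked observations [f_obs; h_obs], with weights obtained from a linear system built from the auto- and cross-covariances. The sketch below uses small invented covariance models and data purely to show the structure of that system; the covariance and cross-covariance functions actually used in this work are derived in Chapter 3.

```python
import numpy as np

# illustrative (assumed) covariance models for f, h, and their cross term
def Cff(d): return np.exp(-np.abs(d) / 5.0)
def Chh(d): return 0.5 * np.exp(-np.abs(d) / 8.0)
def Cfh(d): return -0.3 * np.sign(d) * np.exp(-np.abs(d) / 6.0)

xf = np.array([2.0, 10.0])                   # f (mean-removed Ln[T]) samples
xh = np.array([6.0, 14.0])                   # h (mean-removed head) samples
f_obs = np.array([0.5, -0.2])
h_obs = np.array([0.1, -0.3])
x0 = 8.0                                     # estimation point

# block covariance matrix of all observations [f_obs, h_obs]
dff = xf[:, None] - xf[None, :]
dhh = xh[:, None] - xh[None, :]
dfh = xf[:, None] - xh[None, :]
C = np.block([[Cff(dff), Cfh(dfh)],
              [Cfh(dfh).T, Chh(dhh)]])       # symmetric by construction

# covariances between the unknown f(x0) and each observation
c0 = np.concatenate([Cff(x0 - xf), Cfh(x0 - xh)])

# simple cokriging (zero means assumed): solve C w = c0
w = np.linalg.solve(C, c0)
f_hat = float(w @ np.concatenate([f_obs, h_obs]))
```

Requiring the estimator to be unbiased with minimum variance leads exactly to this system; if x0 coincides with an f sample location, c0 becomes a column of C and the estimate reproduces the measured value.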
The classical cokriging is a linear predictor. In addition, the cross-covariance
function between f and h and the covariance of h required in cokriging are derived from a
first-order linearized version of the governing flow equation [Mizell et al. (1982);
Kitanidis and Vomvoris (1983); Hoeksema and Kitanidis (1984, 1989)], while the
relation between T and φ is nonlinear. Even if the log transformation of T is adopted, the
nonlinear nature between f and h still remains. The linearized relations, based on small
perturbation theory, are valid only if the unconditional variance of f is less than 1.0. The
nonlinearity in the flow equation implies that in general h will not be normal, and f and h
will not be jointly normal, even if f is normal. As a result, the use of classical
geostatistical techniques is not justified.
A major problem with the classical geostatistical approaches is that they often
produce inconsistent T and φ estimates [Yeh et al. (1993b), (1995a)]. If cokriging/co-
conditional simulation is used to obtain both Ln[T] and φ without solving the flow
equation, the resulting velocity field is prone to serious mass balance errors, especially
for highly heterogeneous aquifers or under highly non-uniform flow conditions. This is
clearly undesirable in transport modeling. On the other hand, if cokriging/co-conditional
simulation is employed to obtain Ln[T], and the flow equation is solved for the head and
velocity fields, the resulting numerical solution of φ is not guaranteed to be, at least
approximately, consistent with measured values of φ at measurement locations [Harter
and Yeh (1994)].
Carrera and Glorioso (1991) made a comparison between the classical cokriging
approach and an iterative statistical inverse approach. They concluded that the basic
hypotheses are similar for the two formulations and that the main differences stem from the
linearization performed about the estimated mean in cokriging methods and around the
estimated Ln[T] in the iterative statistical approach. As a result, the latter is less constrained
by linearity than the former and leads to better estimates and more consistent estimation
covariance matrices. However, the identity of the estimate remains unknown.
Yeh et al. (1995a) developed an iterative co-kriging-like approach that combines
cokriging and a numerical flow model to estimate transmissivity based on observed
transmissivities and hydraulic heads in the saturated aquifer. The method suffers from the
drawback that the iterative process often diverges for large domains or when the number
of head measurements is small. An alternative iterative co-conditional simulation
approach was suggested by Gutjahr et al. [1993, 1995]. This iterative approach is more
computationally efficient than the one proposed by Yeh et al. [1993, 1995]. However, it
does not guarantee that the iterative conditional head values are going to converge to the
measured head values at sampling locations.
In light of the above discussion on geostatistical and stochastic approaches, one can
say that in many real world problems, precise solutions do not exist. In these cases,
acquiring knowledge by example may be the best alternative. In other words, if it is not
possible to describe the logic behind a problem or predict behavior with analytical or
numerical solutions to governing equations, traditional predictive analysis will be
difficult.
As an alternative solution, an Artificial Neural Network (ANN) will be investigated in
this dissertation. This new tool does not rely on a prescribed relation, but seeks its own,
and may be superior to the traditional predictive tools. Previous research has shown that
the iterative conditional simulation (ICS) outperforms cokriging [Yeh et al. (1996) and
Hanna S. (1995)]. In this research, the predictive capabilities of the neural network will
be compared to the ICS approach.
Early work in the ANN technology was done by Rosenblatt [1958, 1962] on the
perceptron and by Widrow and Hoff (1960) on the ADALINE. Rosenblatt demonstrated
that a McCulloch and Pitts (1943) neuron could be trained to solve any linearly separable
problem, i.e., separated by a hyper-plane [Rumelhart et al., 1986], in a finite number of
steps. He called this trained device a perceptron. Widrow developed the device
ADALINE (Adaptive Linear Element, originally the Adaptive Linear Neuron), which can
be used as a signal-processing filter. This device has a long history of successful
applications including echo elimination, automated control, and antenna array
adjustment. See Widrow and Stearns (1985) for a discussion.
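Rosenblatt's finite-convergence result is easy to demonstrate on a toy problem. The sketch below (invented data: the logical AND function with ±1 labels) applies the classic perceptron update until no pattern is misclassified; it illustrates the historical algorithm only, not the networks built in this study.

```python
import numpy as np

# Toy linearly separable problem: logical AND with ±1 labels
X = np.array([[0., 0.], [0., 1.], [1., 0.], [1., 1.]])
y = np.array([-1., -1., -1., 1.])

w = np.zeros(2)                      # weights of the single neuron
b = 0.0                              # bias (threshold)

converged = False
for epoch in range(100):             # convergence theorem: finitely many updates
    errors = 0
    for xi, yi in zip(X, y):
        if yi * (w @ xi + b) <= 0:   # misclassified (or on the boundary)
            w += yi * xi             # Rosenblatt's update rule
            b += yi
            errors += 1
    if errors == 0:
        converged = True
        break

preds = np.sign(X @ w + b)
```

Because the two classes are separable by a hyperplane, the loop is guaranteed to terminate with every pattern correctly classified; for a problem like XOR, which is not linearly separable, it would cycle forever, which is exactly the limitation Minsky and Papert emphasized.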
Minsky and Papert (1969) are often credited with (or accused of!) fostering a
pessimistic outlook on the one- or two-layered perceptron networks, which precipitated a
dark age for ANN that lasted until the early 1980s. Rumelhart et al. (1986) and
McClelland et al. (1986) are often credited with leading the modern renaissance in ANN
technology. The criticisms that Minsky and Papert directed at the perceptron neural
networks were correct. However, the addition of more complexity into the networks,
specifically in adding a middle (hidden) layer to a multi-layer perceptron network,
together with a clear explanation of the back propagation learning algorithm, overcame
many of the limitations of the one- or two-layered perceptron neural networks.
Since 1986, the variety of ANNs has rapidly expanded. In their now classic work,
Rumelhart et al. (1986) and McClelland et al. (1986) described four types of ANNs. One
year later, Lippmann (1987) reviewed six types of ANNs at the First International
Conference on Neural Networks. A few months later, Hecht-Nielsen (1988) described 13
ANNs. In 1988, Simpson circulated a review of 26 types of ANNs that was later
published as a book [Simpson, (1990)]. Maren et al. (1990) described about 24 ANNs, of
which about 12 differ from those described by Simpson (1990). Maren (1991) suggested
that the rapid increase is continuing, with the number of ANNs now approaching 48.
There is currently a vast array of ANN applications in the cognitive sciences, the
neurosciences, engineering, computer science, and the physical sciences.
It may be helpful to think of an ANN as a nonparametric, nonlinear regression
technique. In traditional regression, one must decide a priori on a model to which the data
will be fitted. The ANN approach is not as restrictive, because the data are fitted with
the best combination of nonlinear or linear functions as necessary, without the researcher
rigidly pre-selecting the form of these functions. Neural networks can organize data into
the vital aspects or features that enable one pattern to be distinguished from another. This
quality of a neural network stems from its adaptability in learning by example and leads
to its ability to generalize. Generalization may be thought of as the ability to abstract, or to
respond appropriately to input patterns distinct from those involved in the training of the network.
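The regression analogy can be made concrete with a small numerical sketch: a one-hidden-layer feed-forward network trained by gradient descent, fitting data without a pre-selected functional form. The target function sin(3x), the network size, the learning rate, and the iteration count below are all illustrative choices, not values used in this study.

```python
# Minimal sketch of an ANN as nonparametric nonlinear regression: a
# one-hidden-layer feed-forward network fitted by full-batch gradient
# descent. Target function and all hyperparameters are illustrative.
import numpy as np

rng = np.random.default_rng(0)

# Training data: the network is not told the functional form of y(x).
x = np.linspace(-1.0, 1.0, 50).reshape(-1, 1)
y = np.sin(3.0 * x)                        # "unknown" relation to be learned

n_hidden = 10
W1 = rng.normal(0.0, 0.5, (1, n_hidden))   # input-to-hidden weights
b1 = np.zeros(n_hidden)
W2 = rng.normal(0.0, 0.5, (n_hidden, 1))   # hidden-to-output weights
b2 = np.zeros(1)

lr = 0.1
for _ in range(10000):
    # Forward pass: tanh hidden layer, linear output layer.
    h = np.tanh(x @ W1 + b1)
    y_hat = h @ W2 + b2
    err = y_hat - y
    # Backward pass: gradients of the mean-squared error (back-propagation).
    dW2 = h.T @ err / len(x)
    db2 = err.mean(axis=0)
    dh = (err @ W2.T) * (1.0 - h ** 2)     # tanh' = 1 - tanh^2
    dW1 = x.T @ dh / len(x)
    db1 = dh.mean(axis=0)
    W1 -= lr * dW1; b1 -= lr * db1
    W2 -= lr * dW2; b2 -= lr * db2

mse = float(np.mean((np.tanh(x @ W1 + b1) @ W2 + b2 - y) ** 2))
print(f"training MSE: {mse:.4f}")
```

The network discovers its own combination of tanh functions to approximate the data; no model form was imposed, which is the sense in which the approach is "nonparametric".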
1.2 Objectives
The main objective of this study is to introduce an Artificial Neural Network
(ANN) and explore its feasibility as a new tool in computational hydrology. This tool
attempts to learn, capture, and then map conditionally generated random fields of aquifer
parameter values.
The second objective is to build ANN models capable of learning effectively, by
example, the large-scale natural variability in subsurface earth materials, and to
efficiently reproduce as output the randomly generated aquifer parameter fields.
The third objective is to identify the strengths and weaknesses of the ANN method
for characterizing variability in aquifer parameter fields, and compare it to the Iterative
Conditional Simulation approach.
1.3 Contents of the Dissertation
Chapter 2 discusses spatial heterogeneity and its geostatistical representation, along
with spectral representations of random variables. In addition, the concepts of random
processes, random fields, and two-dimensional conditional and unconditional random
field generators are introduced. It presents definitions of probability terms,
including random variables, random fields and their generation, and stochastic-process
terms used in the stochastic approach. Chapter 3 presents the stochastic approach
methodology used in this study, along with a mathematical development of the stochastic
model. Chapter 4 introduces artificial neural networks, their basis, mechanism, and
necessary elements. Chapter 5 provides a mathematical development of the feed-forward
back-propagation neural network and its algorithm. Chapter 6 presents applications of
both a stochastic model and a back-propagation neural network to a hypothetical aquifer.
Chapter 7 presents and discusses results of the applications. Finally, Chapter 8 presents
conclusions derived from this study, as well as recommendations for future studies.
2. SPATIAL HETEROGENEITY AND RANDOM FIELDS
2.1 Heterogeneity and Stochastic Process
Spatial heterogeneity refers to the variation of a physical property in two- or three-
dimensional space. This physical variation is encountered in many earth science
applications; it is of particular interest when studying flow and transport processes in the
saturated and unsaturated zone. When examining soil media, spatial heterogeneity is
observed on many different scales such as the micro-scale of a single pore, the
intermediate scale of laboratory experiments, the scale of field experiments, and the
mega-scale, which encompasses entire regions. This work is not concerned with the
spatial heterogeneity on the micro-scale or pore scale because the governing physical
laws for porous media flow are only valid on a scale larger than the micro-scale.
Bear (1972) defined the "Representative Elementary Volume" (REV) as the smallest volume
over which there is a constant "effective" proportionality factor between the flux and the
total pressure gradient or total head gradient. This proportionality factor is called the
hydraulic conductivity of the REV. By definition of the REV, the hydraulic conductivity
does not change rapidly as the volume to which it applies is increased to sizes larger than
the REV. This is based on the conceptual notion that either no heterogeneity is
encountered at a scale larger than the REV or that heterogeneity occurs on distinctly
separate scales, the smallest of which is the REV [de Marsily, (1986)]. The latter model assumes
that within each scale relatively homogeneous regions exist. Within these homogeneous
units, heterogeneities can only be defined on a significantly smaller scale. Geologists refer
to these different scales as facies [Anderson, (1991)], while hydrologists commonly speak
in terms of hydrologic units [Neuman, (1991)]. Analysis of a large number of hydrologic
and geologic data from different sites associated with different scales has shown that the
existence of discrete hierarchical scales for any particular geologic or hydrologic system
vanishes in the global view as the multitude of different geologic or hydrologic units
allows for a continuous spectrum of scales [Neuman, (1990)].
For the scale of the REV, mathematical models based on the physics of flow and
transport in homogeneous porous media have been well-established in the literature and
their accuracy has been verified in many laboratory experiments [Hillel, (1980)]. The
physical meaning of the underlying model parameters is already well understood [Jury,
(1991)].
Depending on the problem formulation, spatially heterogeneous properties can
be classified as either measurable or predictable porous-media properties. Measurable
properties are those seen as the cause of flow and transport behavior in soils, such as pore
geometry, the saturated permeability of the soil, and the soil textural properties. Predictable
properties are usually based on physical laws or functions. This dissertation deals with
spatial heterogeneity that is predictable given some knowledge of the heterogeneity of
measurable properties.
Stochastic analysis is closely associated with the theory of random processes,
which is a branch of mathematics called probability theory. Probability theory itself is a
branch of mathematics called measure theory. "Probability theory and measure theory
both concentrate on functions that assign real numbers to certain sets in an abstract
space according to certain rules." [Gray and Davisson, (1986), p. 27]. The treatment of
spatial heterogeneity in terms of random processes is a highly abstract procedure, the
appropriateness of which has been questioned. However, this treatment is justified by
Athanasios Papoulis [(1984), p. xi]:
"Scientific theories deal with concepts, not with reality. All theoretical results are
derived from certain axioms by deductive logic. In physical sciences the theories
are so formulated as to correspond in some useful sense to the real world,
whatever that may mean. However, this correspondence is approximate, and the
physical justification of all theoretical conclusions must be based on some form of
inductive reasoning."
For a complete derivation of the concepts of random variables, random processes,
and stochastic differential equations there is a vast amount of literature that has been
published in this area for many different applications [see e.g. Gray and Davisson,
(1986); Papoulis, (1984); Priestley, (1981)].
2.2 Geostatistical Representation of Spatial Variability
Aquifers are inherently heterogeneous at various observation scales.
Characterizing the heterogeneity at a scale of our interest generally requires information
of hydrologic properties at every point in the aquifer. Such a detailed hydraulic property
distribution in aquifers requires numerous measurements, considerable time, and great
expense, and is generally considered impractical and infeasible. The alternative is to
utilize a small number of samples to estimate the variability of parameters in a statistical
framework. That is, the spatial variation of a hydraulic property is characterized by its
probability distribution estimated from samples. Law (1944) and Bennion and Griffiths
(1969) reported that the distribution of porosity data in an aquifer is normal. Hoeksema
and Kitanidis (1985) suggested that the spatial distribution of the storage coefficient might be
log-normal. Hydraulic conductivity distributions are usually reported to be log-normal
[Law (1944); Bulnes (1946); Bakr (1976); de Marsily (1986); Sudicky (1986); and
Jensen et al. (1987)].
Based on such a statistical approach, Freeze (1975) treated hydraulic conductivity
as a random variable and analyzed the uncertainty in groundwater flow modeling.
However, recent analyses of hydraulic conductivity data have shown that, although the
hydraulic conductivity values vary significantly in space, the variation is not entirely
random, but correlated in space [Bakr (1976); Byers and Stephens (1983); Hoeksema and
Kitanidis (1985); and Russo and Bouton (1992)]. Such a correlated nature implies that
the parameter values are not statistically independent in space and must be treated as
a stochastic process, instead of a single random variable.
To illustrate the stochastic conceptualization of the spatial variability of
hydrologic parameters, hydraulic conductivity data measured along a vertical
borehole are used as an example. The value of hydraulic conductivity at a point x_i, i =
1, 2, 3, ..., n, along the borehole can be conceptualized as one of many possible geological
materials that may have been deposited at that given point. Thus, the hydraulic
conductivity at that point is a random variable, K(x_i, ω). The ω indicates that there are
many possible values of K at x_i. As a result, the hydraulic conductivity values over the entire
depth of the borehole may be considered a collection of many random variables in
space. Each K(x_i, ω) has a probability distribution, and these distributions may be interrelated. The
probability of finding a particular sequence of hydraulic conductivity values along the
borehole, K(x_i, ω_j), depends not only on the probability distribution of the hydraulic
conductivity at one location, but also on those at other locations. This implies that the actual
hydraulic conductivity values along the borehole are one possible sequence,
K(x_i, ω_j), out of all the possible sequences, K(x_i, ω). In the vocabulary of
stochastic processes, the probability of finding that sequence is then defined as the joint
probability distribution, or joint distribution. All the possible sequences are called an
ensemble, and a realization refers to any one of these possible sequences.
The covariance C_ij of two random variables x_i and x_j is a measure of the
physical correlation between these variables and is a second moment defined as:

C_ij = E[(x_i − m_i)(x_j − m_j)]                                  (2.1)

where m_i and m_j are the means of x_i and x_j, respectively.
If i = j, the covariance C_ij = Var(x), or σ² (the variance).

The autocorrelation function ρ(ξ), where ξ represents a separation distance
between variables, represents the persistence of the value of a property in space. The
autocorrelation function is simply defined as the ratio of the covariance function to the
variance, i.e.:

ρ(ξ) = C(ξ) / σ²                                                  (2.2)
Generally, the value of ρ(ξ) for hydraulic conductivity data tends to drop
rapidly as ξ increases. Different autocorrelation models can represent the decline of the
correlation. The one commonly used in three dimensions is an exponential decay model
[Bakr et al. (1978); Gelhar and Axness (1983); and Yeh et al., (1985a,b,c)]:

ρ(ξ) = exp[ −( |ξ₁|/λ₁ + |ξ₂|/λ₂ + |ξ₃|/λ₃ ) ]                    (2.3)

where ξ is the separation vector (ξ₁, ξ₂, ξ₃), and λ₁, λ₂, and λ₃ are the integral scales (or
correlation scales) in the x, y, and z directions, respectively. The integral scale is defined
as the area under an autocorrelation function if the area is a positive and non-zero value
[Lumley and Panofsky (1964)]. For the exponential model, the integral scale is the
separation distance at which the correlation drops to the e⁻¹ level. At this level, the
correlation between data points is considered insignificant. Furthermore, if the correlation
scales of a random field are the same in all directions, the random field is said to be
statistically isotropic. Thus, the autocorrelation function is considered a statistical
measure of the spatial structure of hydrogeologic parameters.
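As a small illustration, the exponential model (2.3) can be evaluated directly; the integral scales below are illustrative values, not those of any aquifer in this study.

```python
# Sketch of the three-dimensional exponential autocorrelation model of
# Eq. (2.3). The integral scales lam = (lam1, lam2, lam3) are illustrative.
import numpy as np

def rho_exp(xi, lam):
    """Exponential autocorrelation for a separation vector xi = (xi1, xi2, xi3)
    and integral scales lam = (lam1, lam2, lam3), per Eq. (2.3)."""
    xi = np.asarray(xi, dtype=float)
    lam = np.asarray(lam, dtype=float)
    return np.exp(-np.sum(np.abs(xi) / lam))

lam = (10.0, 10.0, 1.0)    # anisotropic: shorter correlation in the vertical
print(rho_exp((0.0, 0.0, 0.0), lam))    # 1.0 at zero separation
print(rho_exp((10.0, 0.0, 0.0), lam))   # drops to e**-1 at one integral scale
```

At a horizontal separation of one integral scale the correlation is e⁻¹ ≈ 0.368, the level below which the text regards data points as effectively uncorrelated.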
2.3 Random Processes and Random Fields
The term "random process" or "stochastic process" is mostly used if the index set
is the time variable, while the term "random field" is commonly applied for index sets of
spatial locations. A random process is an infinite collection of random variables, where
the random variables are indexed on a discrete or continuous "index set" I, which
corresponds to spatial location x in this research applications.
The randomness lies in the lack of knowledge, and inability to acquire it fully,
about what these porous medium properties exactly are. Soil physical or chemical
properties are commonly determined by either an actual measurement of soil properties
or by the intuitive, graphical, or mathematical estimation of soil properties from related
data (inverse distance interpolation, kriging, etc.). Both measurement and estimation are
associated with errors. The (physically deterministic) errors occurring during the
measurement and/or estimation process have the properties of random variables and thus
allow a rigorous analysis with statistical tools. This is the key to stochastic analysis and
the bridge between reality and conceptual model. Stochastic analysis in subsurface
hydrology is about modeling the limitations of our knowledge! How limited our
knowledge is will in turn depend on the porous medium heterogeneity [Harter (1994)].
The probability distributions encountered in stochastic modeling are essentially a
reflection of the fuzziness or uncertainty of our knowledge about the soil properties.
Hence, the justification for treating porous media as random fields lies NOT in the
physical nature of the porous medium (which is deterministic) but in the limitation of our
knowledge ABOUT the porous medium. This is not to say however that heterogeneity is
unrelated to the statistical analysis. Indeed, the estimation error is a direct function of the
soil heterogeneity. If the porous medium is relatively homogeneous, the properties of the
soil at unmeasured locations are estimated with great certainty given a few sample data.
On the other hand, if the porous medium is very heterogeneous and soil properties are
correlated over only short distances, an estimation of the exact soil properties at
unmeasured locations is associated with large errors. Hence, the heterogeneity of the soil
is a measure of the estimation error or prediction uncertainty.
2.4 Stationarity and Ergodicity of Random Processes
The sample statistics give a quantitative estimate of the degree of heterogeneity in
the porous medium, which also is an estimate of the expected estimation error. Then two
problems need to be addressed:
1. The sample taken from measuring MANY random variables ONCE must be related to
the MANY possible outcomes of any particular ONE random variable X(x) at
location x.
2. The sample statistics must be related to the ensemble statistics of the random field.
These two points are crucial to the stochastic analysis and in particular the first
one must not be underestimated [Harter (1994)]. Recall that a random field consists of an
infinite number of random variables, each of which has its own marginal pdf. The
random variables in a random field need not have identical probability distributions.
Estimates of soil properties that are conditioned on field data are indeed always
random fields with random variables whose probability distribution function is a function
of the location in space. This is because the uncertainty about field properties may vary
from location to location depending on closeness to a measurement point. In order to
determine the probability of occurrence of a particular sequence of random variables, a
joint distribution of these random variables, e.g., of the hydraulic conductivity K(x_i, ω_j), must
be known. Obviously, the joint distribution is not available in real-life situations, because the
K(x_i, ω_j) values sampled along a borehole represent only one realization out of the
ensemble K(x_i, ω). Therefore, one must resort to simplifying assumptions,
namely, stationarity and ergodicity.
Translating field measurements, a "sample", into the statistical parameters defining
random variables is a practical problem. It leads to the problem of deriving
"ensemble" statistical parameters of random fields from a small sample that gives one
measurement of each of an infinite number of random variables. "Sample" statistical
parameters and a "sample probability distribution", or histogram, of the measured random
field parameters can be computed; these give a quantitative estimate of the degree of
heterogeneity in the porous medium, which is also an estimate of the expected estimation
error.
Only a single realization of the random field is available, since all regional and
sub-regional geologic and other environmental phenomena are unique and do not repeat
themselves elsewhere. Realizations (samples) of random fields are a basic element of the
numerical stochastic analysis. The realizations are often referred to as random fields. In
numerical applications, random fields are always discretized in a finite domain.
To simplify the problem, it is assumed that the marginal probability distribution
function of each random variable is identical at every location in the random field. This
implies that the mean, the variance, and the other moments of the probability distribution
are identical for every location in the random field. This property is called "stationarity"
or "strict stationarity". In this study, a weaker form of stationarity is assumed: "second
order stationarity" or "weak stationarity", which requires that only the mean and
covariance are identical everywhere in the random field:
μ_X(x) = μ    for all x                                           (2.4)

Cov_X[X(x), X(x + ξ)] = Cov_X(ξ)                                  (2.5)
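Under these hypotheses, the constant mean and lag-dependent covariance of Eqs. (2.4) and (2.5) can be estimated directly from a single realization. A minimal sketch, using synthetic white noise as an illustrative stand-in for, e.g., log-conductivity data along a borehole:

```python
# Sketch: sample estimates of the weak-stationarity moments, Eqs. (2.4)
# and (2.5), computed from one synthetic 1-D realization. The white-noise
# "data" are illustrative, not measurements from this study.
import numpy as np

rng = np.random.default_rng(1)
x = rng.standard_normal(10000)     # one realization: zero mean, unit variance

def sample_cov(x, lag):
    """Sample covariance between X(s) and X(s + lag) along the realization."""
    m = x.mean()                   # single mean estimate, per Eq. (2.4)
    if lag == 0:
        return np.mean((x - m) ** 2)
    return np.mean((x[:-lag] - m) * (x[lag:] - m))

print(round(float(x.mean()), 3))          # estimate of the constant mean
print(round(float(sample_cov(x, 0)), 3))  # lag-0 covariance = sample variance
```

For this uncorrelated series the covariance is near zero at any nonzero lag; for correlated field data, sample_cov(x, lag) traced over lag would estimate the covariance function of Eq. (2.5).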
The existence of stationarity in porous medium properties cannot be proven
rigorously at any single field site. Data are often sparsely distributed. In the best of cases
a linear or higher order trend can reasonably be removed from the data. For all practical
purposes, it is therefore convenient to hypothesize that the field site is a realization of a
weakly stationary random field (after removing an obvious trend). This is a reasonable
assumption in many field applications. Once this working hypothesis is postulated, the
sample of measurements at different locations is treated as if it were a sample of several
realizations of the same random variable (i.e. at the same location).
In order to solve the problem of relating sample statistics to ensemble parameters,
it is necessary that the sample statistics taken from a single realization indeed converge to
the ensemble statistics of the random variables as the number of samples increases. A
random field or random process that satisfies this condition is called "mean ergodic".
Ergodicity means that by observing the spatial variation of a single realization of a
stochastic process, it is possible to determine the statistical property of the process for all
realizations.
Like stationarity, mean ergodicity cannot be verified at a single field site, i.e., from a
single realization of the hypothetical random field. Hence, mean ergodicity is taken as a
working hypothesis, i.e. it is assumed a priori that the measured sample statistics
converge in the mean square to the true ensemble parameters as the number of samples
increases. Ergodic processes need not be stationary, and similarly stationary random
fields need not be ergodic. The difference between the sample statistical parameters of X
and its ensemble moments is generally referred to as parameter estimation error and will
subsequently be neglected.
2.5 Random Field Generators (RFG)
To carry out the conditional simulations, one needs to be able to generate random
fields with prescribed covariance and cross-covariance functions. There are several
techniques to generate these fields.
Harter [1992, 1994] did a comprehensive review of the different techniques to
generate unconditional and conditional random fields. The generation of spatially
correlated samples of random fields plays a fundamental role in the numerical analysis of
stochastic functions. The purpose of random field simulation is to create numerical
samples, or "realizations", of random fields of stochastic processes with well-defined
properties. The simplest and most commonly available form of simulation is the random
number generator on a calculator or computer. These readily accessible simulators
generate independent, uniformly distributed random numbers [i.e., samples of a single
random variable with a uniform, univariate distribution; Press et al. (1992)].
A particular challenge arises when the random variables are dependent, i.e.,
spatially correlated and defined through a joint or multivariate distribution. Not only do
the generated random fields have to converge (in the mean square) to the desired
ensemble mean and variance, and any higher-order moments if appropriate, but they also
have to converge (in the mean square) to the desired correlation structure as the number
of samples increases. The purpose of a random field generator is to transform an
orthogonal realization, consisting of independently generated random numbers, with a
prescribed univariate distribution, into a correlated random field with the desired joint
probability distribution. If the distribution is Gaussian, the joint pdf is expressed by its
first two moments, the mean and the covariance.
In practice, the joint probability distribution function is often inferred from field
data obtained at the site of interest. The joint probability distribution is commonly
described by invoking the ergodicity and stationarity hypotheses discussed before, and by
taking the sample mean and the sample covariance function as the moments of the
underlying multivariate pdf. The simulations must be conditioned on the information
known at the measurement points. This amounts to the generation of random variables
with a conditional joint probability distribution function. Tompson et al. (1989) and
Harter (1994) have assessed the numerical efficiency of several popular RFG's.
The most commonly used methods for generating spatially correlated random fields
are:
1. Spectral representation and the fast Fourier transform (FFT);
2. The turning bands method (TB);
3. The matrix LU decomposition method; and
4. The sequential Gaussian simulation method (SGS).
The first three RFG methods use finite linear combinations of uncorrelated random
variables; thus, the common distribution of these random variables should be such that it
is preserved under finite linear combinations [Myers (1989)]. The Gaussian distribution is
frequently used, since it is the only distribution with this property; it is also the simplest
distribution, requiring only the first two moments. The fourth RFG method features a rich
family of spatial structures not limited to the Gaussian distribution. Each of the above
RFG methods can serve as a basis for generating both unconditional and conditional
simulations.
In this study the spectral approach utilizing Fast Fourier Transform (FFT) is used
due primarily to its computational speed advantage and ease of incorporating an existing
subroutine into the algorithm.
2.5.1 Spectral Representation of Random Variables
In the analysis of random processes (time series), "spectral analysis" has been an
important tool for many different tasks, and is a well-established field of probability
theory [cf. Priestley (1981)]. More recently, spectral analysis has also become important for
the study of spatially variable processes (random fields). Gelhar et al. (1974) introduced
spectral analysis to the study of groundwater systems, and it has been applied to a great variety
of subsurface hydrologic problems [Bakr et al. (1978); Gutjahr et al. (1978); Gelhar and
Axness (1983); Yeh et al. (1985a,b); and Li and McLaughlin (1991)].
Spectral analysis represents a single realization of a random process or of a
random field as a superposition of many (even infinitely many) different (n-dimensional) sine-
cosine waves, each of which has a different amplitude and frequency. Any particular
realization can then be expressed either in terms of a spatial function or in terms of the
frequencies and amplitudes of the sine-cosine waves (called a 'Fourier
series' for a discrete process and a 'Fourier transform' for a continuous process). The latter
are collectively called the "spectral representation" of the random field. The spectral
representation of a single random field realization can intuitively be understood as a field
of amplitudes, where the coordinates are the frequencies of the sine-cosine waves. In
other words, instead of an actual value for each location in space, the spectral
representation gives an amplitude for each possible frequency (wavelength). Note that in n-
dimensional space, n ≤ 3, sine-cosine waves are defined by n-dimensional frequencies
(with one component for each spatial direction), and therefore the spectral representation
of an n-dimensional random field is also n-dimensional.
Each realization of a random field has its own spectral representation. But
obviously the amplitudes of the sine-cosine waves can be defined as random variables
with the frequency domain as the index field. The problem then turns to the
probability space of the spectral representation, which in turn is also a random field, but
defined in the frequency domain (i.e., the probability space of the spatial random field is
mapped onto the probability space of the spectral random field).
There are many advantages to representing a random field in terms of its
underlying spectral properties (i.e., in terms of the probability distribution of amplitudes
and frequencies of "waves"). Among these, two are of particular importance. The
first is that the spectral representation of a spatially correlated random field is a random
field with uncorrelated random variables, making the analysis much easier than that of
correlated random variables, which require a multivariate joint distribution function. The
second is that the spectral transformation of a partial differential equation is a polynomial,
whose solution is found much more easily than the solution of the partial differential equation
in the spatial domain.
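The second advantage can be illustrated numerically: on a periodic grid, applying d/dx amounts to multiplying each Fourier coefficient by ik, so a linear differential operator becomes an algebraic (polynomial) operation in k. The grid and test function below are illustrative.

```python
# Small numeric illustration: in the spectral domain, differentiation
# becomes multiplication by ik. Grid size and test function are arbitrary
# illustrative choices.
import numpy as np

n, L = 256, 2.0 * np.pi
x = np.arange(n) * (L / n)                     # periodic grid on [0, 2*pi)
f = np.sin(x)

k = 2.0 * np.pi * np.fft.fftfreq(n, d=L / n)   # angular frequencies
df = np.fft.ifft(1j * k * np.fft.fft(f)).real  # derivative via the spectrum

print(np.max(np.abs(df - np.cos(x))))  # agrees with d/dx sin(x) to round-off
```

The same multiply-by-ik device is what turns a linear PDE, once transformed, into a polynomial relation among the spectral coefficients.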
The spectral representation of a single realization X(x) of a random field with mean
zero is formally defined in terms of the Fourier-Stieltjes integral [Wiener (1930)]:

X(x) = ∫_{−∞}^{∞} e^{ikx} dZ(k)                                   (2.6)
where the integral is n-dimensional, and Z(k) is a complex-valued function,
called the Fourier-Stieltjes transform of X(x). The Fourier-Stieltjes integral must be
chosen over the more common Fourier-Riemann integral:

f(x) = ∫_{−∞}^{∞} e^{ikx} g(k) dk                                 (2.7)
where g(k) is the Fourier transform of f(x), since the Fourier-Stieltjes transform Z(k) of
the random field X(x) is generally not differentiable such that dZ(k) = z(k) dk. Z(k) can
be understood as an integrated measure of the amplitudes of the frequencies between
(−∞, k) contributing to the realization X(x). The new probability space of the random
variables dZ(k) in (2.6) has several important properties:

1. E[dZ(k)] = (1/2π) ∫ e^{−ikx} E[X(x)] dx

2. E[|dZ(k)|²] = S(k) dk                                          (2.8)

3. E[dZ(k₁) dZ*(k₂)] = 0    for all k₁ ≠ k₂
The first property states that the mean E[dZ(k)] of the random variables dZ(k) is
equal to the Fourier transform of the mean of the random variables X(x). In this study,
only zero-mean random processes are considered; hence the spectral representations
also have zero means. The second property defines the variance of the random variable
dZ(k) as S(k) dk. The term S(k) dk is a measure of the average 'energy per unit area',
or 'power', contributed by the amplitude of a frequency to the random field X(x). S(k) is
called the "spectral density" or "spectrum" of the random field X(x), and depends purely
on the probabilistic properties of the random field X(x). The third property states that the
increments dZ(k₁) and dZ(k₂) at two different frequencies k₁ and k₂ are uncorrelated;
dZ(k) is called an "orthogonal" random field. Through (2.8), the first two moments of the
random field dZ(k) are defined solely in terms of the first two moments of the stationary
random field X(x). Hence, if the first two moments of the random field X(x) are known,
then the first two moments of the spectral representation dZ(k) are known. Note that the
spectral representation dZ(k) of a weakly stationary random field X(x) is only stationary
to first order: the mean E[dZ(k)] is constant (first property), but the variance S(k) of the
random field dZ(k) is a function of the location k in the frequency domain (second
property).
In summary, a new probability space, called the spectral representation of a random
field, was defined on the known probability space of a random field. The mapping of a
stationary correlated random field X into its spectral representation dZ provides the
important advantage of creating an equivalent of the random field X that consists of
orthogonal, or uncorrelated, random variables.
2.5.1.1 Unconditional two-dimensional Random Field Generation
The Spectral representation dZ of a correlated random field variable (RFV) X, is
itself a RFV of independent random variables with a variance defined by the spectral
density function of X, S(k) dk. It can be shown that the spectral density S(k) is simply
the Fourier transform of the covariance C(Q of X where ^ is the separation distance.
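This relation can be checked numerically in one dimension, where the exponential covariance C(ξ) = σ² exp(−|ξ|/λ) has the closed-form spectrum S(k) = σ²λ / [π(1 + λ²k²)] under the 1/(2π) transform convention used here. All parameter values below are illustrative.

```python
# Numeric check, for a 1-D exponential covariance, that the spectral
# density is the Fourier transform of the covariance function. The closed
# form S(k) = sigma2*lam / (pi*(1 + lam**2 * k**2)) and the parameter
# values are illustrative (1-D, unit variance).
import numpy as np

sigma2, lam = 1.0, 5.0
xi = np.linspace(-200.0, 200.0, 40001)        # separation distances
dxi = xi[1] - xi[0]
C = sigma2 * np.exp(-np.abs(xi) / lam)        # exponential covariance

def S_numeric(k):
    # S(k) = (1/2*pi) * integral of C(xi)*exp(-i*k*xi) d(xi); C(xi) is
    # even, so only the cosine part of the transform survives.
    return np.sum(C * np.cos(k * xi)) * dxi / (2.0 * np.pi)

def S_exact(k):
    return sigma2 * lam / (np.pi * (1.0 + (lam * k) ** 2))

for k in (0.0, 0.2, 1.0):
    print(round(float(S_numeric(k)), 3), round(float(S_exact(k)), 3))
```

The discretized transform of the covariance reproduces the analytical spectrum at each frequency, which is the property the FFT generator below relies on.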
Hence, if random, zero-mean dZ(k) are generated with variance S(k) dk, then their
inverse Fourier transform yields a correlated random field X(x) with zero mean
and the desired covariance function, by virtue of (2.8). RFGs based on Fourier transforms
were first introduced by Shinozuka [1972, 1991]. Gutjahr (1989) describes a two-
dimensional random field generator based on a fast Fourier transform algorithm, which
has been adopted in this study.
In this study, as applied before by Harter (1994), realizations are generated on a
rectangular domain defined over a regular grid centered around the origin, with grid
points Δx = (Δx₁, Δx₂)ᵀ apart. The size of the domain is defined by MΔx such that
the rectangle spans the area between −MΔx and (M−1)Δx, and the number of grid points
in the random field is 2M by 2M. Since the spectral representation of a stationary random
field is only defined for an infinite domain, it is further assumed that the random field is
periodic with period 2M in both dimensions. This is a necessary assumption for the
formal derivation of its spectral representation, so that the analysis of an infinite process
can be used for the generation of a finite random field. Also, periodic functions are
known to have a discrete rather than a continuous spectrum. Hence, dZ(k) exists only for
discrete k, for which it can be generated such that E[dZ(k)] = 0 and E[|dZ(k)|²] = S(k) dk.
The discretization of X(x) limits the wavelengths "seen" by the discrete random
field to all those that are at least of length 2Δx, i.e., to all (angular) frequencies k ≤
2π/(2Δx). Higher frequencies cannot be distinguished from frequencies within this limit,
an effect referred to as "aliasing". In other words, heterogeneities on a scale smaller than
the discretization Δx are not resolved by the random field. Similarly, the longest possible
wavelength "seen" by a finite random field is less than or equal to 2MΔx, i.e., the lowest
(angular) frequency is Δk = 2π/(2MΔx), and all other frequencies k must be multiples of
Δk. Hence, the spectral representation dZ(k) of a finite, discrete random field X(x) with
(2M)² grid points in 2-D space is also a finite, discrete random field defined on a (2M)²
grid in the 2-D frequency domain. For discrete dZ(k), the Fourier-Stieltjes integral (2.6)
becomes a Fourier series such that:
X(x) = Σ_{m_1=−M}^{M−1} Σ_{m_2=−M}^{M−1} exp(i k_m · x) z(k_m)    (2.9)
where z(k) are (complex-valued) random Fourier coefficients with the same
properties as dZ(k) in (2.7), namely zero mean, a variance S(k) Δk, and all z(k)
independent for k ≠ k′. To ensure that X(x) is a real-valued random field, the z(k)
field must be constructed such that:

z(−k) = z*(k)    (2.10)

(i.e., random numbers z(k) need only be generated for one half of the
rectangle). The asterisk * stands for the complex conjugate. Complex-valued, Gaussian-
distributed z(k) for discrete k_j, j = 1, ..., (2M)^2/2, are obtained by generating two independent
Gaussian random numbers α_j and β_j for each k_j, each an independent, Gaussian-
distributed random number with zero mean and variance 1/2. For one half of the
random field:
z(k_j) = [S(k_j) Δk]^{1/2} (α_j + i β_j)    (2.11)
The other half is obtained through the symmetry relation (2.10). Gutjahr et al.
(1989) showed that the above construction of z(k) satisfies the required properties. The
correlated random field X(x) is obtained by performing the Fourier summation (2.9). The
double summation in (2.9) is most efficiently done by a numerical Fourier transform
technique called the fast Fourier transform, or simply FFT [Brigham (1988)]. FFT
algorithms can be found in many computer libraries, e.g., IBM (1993) and Press et al.
(1992). Most available FFT algorithms are written using the frequency u as argument
instead of the angular frequency k, where k = 2πu. Fourier transform pairs can be defined
as follows:
C(ξ) = ∫∫ e^{i k·ξ} S(k) dk

S(k) = (1/(2π)^2) ∫∫ e^{−i k·ξ} C(ξ) dξ    (2.12)

X(x) = ∫∫ e^{i k·x} dZ(k)

with all integrals taken from −∞ to ∞. Changing the variables of integration from k to u, we can get similar pairs as in
Equations (2.12) [Harter (1994)]. Typical FFT algorithms require the summation in (2.9)
to be over the interval [0, 2M−1] rather than over the interval [−M, M−1]. Using the
periodicity assumption, z(mΔk), m > M−1, are obtained from:
z(mΔk) = z[(m − 2M)Δk],  for all m > M−1
Recalling that Δk = 2π/(2MΔx), this leads to the following construction of the
correlated random field X(x), with entries X(n_1Δx_1, n_2Δx_2), 0 ≤ n_1, n_2 ≤ 2M−1:

X(n_1Δx_1, n_2Δx_2) = Σ_{m_1=0}^{2M−1} Σ_{m_2=0}^{2M−1} exp[2πi (m_1 n_1 + m_2 n_2)/(2M)] z(m_1Δk_1, m_2Δk_2)    (2.13)

where:

z(m_1Δk_1, m_2Δk_2) = [S(2πm_1/(2MΔx_1), 2πm_2/(2MΔx_2)) Δk_1 Δk_2]^{1/2} (α + i β)    (2.14)
with α and β being zero-mean, independent, Gaussian-distributed random
numbers of known variance. In this study, random fields are generated using equation
(2.13), with the FOURN subroutine to perform the FFT and with the GAUSDEV and RAN2
subroutines to generate the random numbers α and β. The original implementation of this
RFG was provided by Gutjahr (1989) and has been applied by Harter (1994).
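The construction (2.13)-(2.14) can be sketched in modern NumPy as follows. This is an illustrative sketch, not the FORTRAN implementation (FOURN, GAUSDEV, RAN2) used in this study: the Whittle spectrum (3.11) is used as the example spectral density, and instead of generating half the complex coefficients and mirroring them through (2.10), the sketch filters a real white-noise field, which enforces the conjugate symmetry automatically. All function and parameter names are the sketch's own.

```python
import numpy as np

def whittle_spectrum(k1, k2, variance, lam):
    """Whittle spectral density (3.11) with a = pi/(2*lambda)."""
    alpha = np.pi / (2.0 * lam)
    return variance * alpha**2 / (np.pi * (k1**2 + k2**2 + alpha**2) ** 2)

def unconditional_field(M, dx, variance, lam, seed=0):
    """One 2M-by-2M correlated Gaussian field by the spectral (FFT) method."""
    rng = np.random.default_rng(seed)
    n = 2 * M                                   # 2M-by-2M grid
    dk = 2.0 * np.pi / (n * dx)                 # Delta-k = 2*pi/(2M*Delta-x)
    k = 2.0 * np.pi * np.fft.fftfreq(n, d=dx)   # discrete angular frequencies k = 2*pi*u
    k1, k2 = np.meshgrid(k, k, indexing="ij")
    S = whittle_spectrum(k1, k2, variance, lam)
    w = rng.standard_normal((n, n))             # real white noise => z(-k) = z*(k) holds
    z = np.sqrt(S) * dk * np.fft.fft2(w) / n    # coefficients with E|z|^2 = S(k) dk^2, cf. (2.11)
    X = (n * n) * np.fft.ifft2(z)               # Fourier summation (2.9)/(2.13)
    return X.real
```

A generated realization has (approximately) zero mean and a sample variance near the prescribed variance, provided the grid resolves the correlation scale and the domain spans several correlation lengths.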
2.5.1.2 Conditional two-dimensional Random Fields
Assume an array of measurements X_1 = {x_1, ..., x_n} is available, and a two-
dimensional conditional random field is generated such that at the locations {x_1, ..., x_n} the
measured values of the random variables X_1 are reproduced, and such that at all other
locations {x_{n+1}, ..., x_N} the generated random numbers {X_{n+1}, ..., X_N} have a sample
mean and sample covariance that converge to the conditional mean ⟨X_2⟩^c and conditional
covariance Σ_22^c in the limit as the number of random fields generated becomes infinite
[Harter (1992, 1994)]:

⟨X_i⟩^c = E[X_i | x_1, x_2, ..., x_n] = ∫ x_i f(x_i | x_1, ..., x_n) dx_i

Σ_ij^c = E[X_i X_j | x_1, ..., x_n] = ∫∫ x_i x_j f(x_i, x_j | x_1, ..., x_n) dx_i dx_j,   i, j = 1, ..., n

with the integrals taken from −∞ to ∞.
To implement the conditional random field generation, Delhomme (1979) used the
following approach, based on work by Matheron (1973) and Journel (1974, 1978):
Initially, the measured data x_i are used to infer the moments, mean and covariance, of the
unconditional joint pdf of the random field. Then an estimate of the conditional mean
is obtained by simple kriging. The kriging weights and the estimated conditional
mean [X]^k are retained for the subsequent generation of conditional random fields X_c,
which are constructed as:

X_c = [X]^k + (X_u − [X_u]^k) = [X]^k + e_u    (2.15)

where [X]^k is the random field kriged from the measured data. X_u is an unconditionally
generated random field with a joint probability distribution defined
by the measured moments. [X_u]^k is the simulated equivalent of [X]^k: it preserves the data
of the unconditionally generated random field at, and only at, the locations {x_1, ..., x_n}
where measurements are available at the real field site, and consists of the kriged
estimates at all other locations given the unconditionally simulated
data. The difference (X_u − [X_u]^k) is a realization e_u of a possible estimation error
incurred by estimating the data through the kriged values [X_u]^k. The simulated error is
added to the originally estimated conditional mean [X]^k to obtain a possible conditional
random field X_c.
The simulated estimation error e_u has the same conditional moments as the real
estimation error e = (X − [X]^k), since the unconditional probability density functions of the
real and the simulated fields are identical (neglecting measurement and moment-
estimation errors), and because the conditioning occurs at the same locations both at the
field site and in the simulations [Journel (1974) and Delhomme (1979)]. The
unconditional random field generation and the kriging of the generated random field, from
the simulated measurement data, are repeated for each realization. Each simulation yields
a random field of estimation error e_u, which can be added to the kriging estimate of the
real data to obtain a conditional random field.

For a large number of samples, the sample variance of X_2s(x_2) will converge in the
mean square to the true conditional variance, or kriging variance, of X_2(x_2) [Delhomme
(1979)]. This conditioning technique is independent of the method used to generate the
unconditional random field and is not related to the spectral random field generator.
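The conditioning step (2.15) can be illustrated with a toy one-dimensional sketch. Simple kriging with a hypothetical exponential covariance stands in for the kriging estimator; the covariance model, parameter values, and function names are all illustrative assumptions, and the data locations are assumed to coincide with grid nodes.

```python
import numpy as np

def cov(h, variance=1.0, lam=5.0):
    """Assumed exponential covariance model (illustrative only)."""
    return variance * np.exp(-np.abs(h) / lam)

def simple_krige(x_data, values, x_grid):
    """Zero-mean simple kriging: weights solve C_dd w = C_dg per grid node."""
    C_dd = cov(x_data[:, None] - x_data[None, :])
    C_dg = cov(x_data[:, None] - x_grid[None, :])
    w = np.linalg.solve(C_dd, C_dg)
    return w.T @ values

def condition(x_data, data, x_grid, x_u):
    """Eq. (2.15): X_c = [X]^k + (X_u - [X_u]^k); x_u is unconditional on x_grid."""
    idx = np.searchsorted(x_grid, x_data)             # data locations assumed on grid
    krig_real = simple_krige(x_data, data, x_grid)    # [X]^k from the real data
    krig_sim = simple_krige(x_data, x_u[idx], x_grid) # [X_u]^k from the simulated data
    return krig_real + (x_u - krig_sim)
```

Because simple kriging is an exact interpolator, the conditioned realization reproduces the data exactly at the data locations, while following the unconditional realization elsewhere.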
3. STOCHASTIC MODEL DEVELOPMENT
3.1 Introduction
To achieve the above objectives and for a comparison study, a stochastic
conditional simulation approach will be introduced first. Random field generators, as
discussed earlier, are often used in the stochastic approach to generate realizations of the
transmissivity field under some geostatistical assumptions and known statistical
parameters. These realizations are considered to represent real transmissivity fields
(reality). Since an NN learns by example, the above-mentioned realizations will serve as
learning sets during the design of the NN.
An assumption of statistical second-order stationarity is made here in describing the
spatial heterogeneity of the transmissivity. Essentially, this means that the statistical
properties are not a function of location but only of the separation distance between locations
in the field. In an attempt to circumvent some problems of scale, the method described
here discretizes a block-centered finite-difference grid so that the block size
approximates the scale of the sample volume of pumping tests. Alternatively, the point-
scale process could be integrated over the block domain, but there remains the problem of
initially determining that point-scale covariance. Concerning spatial discretization, there
need to be at least 10 blocks of the finite-difference grid per correlation scale to
adequately model the spatial variability [Van Lent and Kitanidis, 1996], and several
correlation lengths per scale of the bounded domain.
This chapter develops the theory and algorithm for the stochastic approach of
conditioning random simulations of perturbations in log-transmissivity on measurements
of head. Equations for covariances are approximated by a small-perturbation linearization
of the two-dimensional, vertically integrated groundwater flow equation. The covariance
equations are then solved, along with equations for head and mean head, by a multigrid
partial differential equation solver developed by Schaffer (1995). Conditioning
on data is accomplished by cokriging using the covariances obtained from the numerical
solution, and head is solved directly from the cokriged transmissivities and the
groundwater flow model. An improvement to the match between simulated heads and
actual head data is accomplished by iteratively cokriging on the differences between
model heads and the data and solving the flow equation.
3.2 Mathematical Development of the Stochastic Model
3.2.1 Spectral Representation of the Flow Equation
The classical, depth-averaged equation for two-dimensional, steady groundwater
flow with heterogeneous transmissivity is written as:

∂/∂x (T ∂φ/∂x) + ∂/∂y (T ∂φ/∂y) = 0    (3.1)

where T is transmissivity, φ is hydraulic head, and x and y are the coordinates of
spatial location.
spatial location. Transmissivity is a spatially heterogeneous parameter that is unknown,
except at data locations (pumping tests), and is modeled statistically as a second-order
stationary random field. Uncertainty in the transmissivity parameter then propagates
through the model and results in uncertainty in the hydraulic head. Head also is measured
only at well locations in space. This model can be applied at the basin scale to steady flow
in confined aquifers or in unconfined aquifers where the change in saturated thickness is
negligible. This model is not valid where there is a significant component of vertical
flow, such as may be the case near point sources, sinks, or boundaries, or where there is a
significant change in saturated thickness.
Boundary and initial conditions are required in order to complete the model. The general
boundary condition can be formulated as

a ∇φ·n + b φ = c

in which n is a unit vector outward normal to the boundary and a, b, and c are constants
whose values determine the boundary conditions. The types of boundaries which will be
considered here are where a = 0 and b = 1 (type 1), and where a = 1 and b = 0 (type 2).
The value of c will be treated as deterministic along the boundary. The spatial location of the
model boundaries is an extremely important aspect of designing a model to represent
some real aquifer system. For this research, however, boundaries are considered to be of
known location, type, and magnitude.

Randomness is introduced and the model is linearized by a first-order small-
perturbation expansion. The log-transformed transmissivity is expanded into the sum of a
constant mean, F = E{Ln(T(x))}, and a small zero-mean perturbation, f(x). Likewise,
hydraulic head is the sum of the mean head, H(x) = E{φ(x)}, and a small zero-mean
perturbation in head, h(x).
Equation (3.1) can be written in terms of Ln[T] as:

∂²φ/∂x² + ∂²φ/∂y² + (∂LnT/∂x)(∂φ/∂x) + (∂LnT/∂y)(∂φ/∂y) = 0    (3.2)
Assuming head and log-transmissivity to be spatial stochastic processes, they may
be decomposed as the sum of mean and perturbation parts about the mean:

φ(x, y) = H(x) + h(x, y)
                                    (3.3)
Ln T(x, y) = F + f(x, y)
In (3.3), H is only a function of the x-direction, implying a unidirectional mean-flow
assumption. Unidirectional mean flow is not excessively restrictive; the coordinate
system can be oriented so that the x-axis and mean-flow directions are parallel. The
Ln[T] process is assumed to have a constant mean, F.
Substituting (3.3) into (3.2) leads to:

∂²H/∂x² + ∂²h/∂x² + ∂²h/∂y² + (∂f/∂x)(∂H/∂x) + (∂f/∂x)(∂h/∂x) + (∂f/∂y)(∂h/∂y) = 0    (3.4)
If the variations in the transmissivity are small, then the second-order terms (products of
perturbations) may be neglected [see Mizell et al., 1982, p. 1054; Dagan, 1982, p. 819].
The equation for mean flow is then found by taking the expectation of (3.4):

∂²H/∂x² + E[(∂f/∂x)(∂h/∂x)] + E[(∂f/∂y)(∂h/∂y)] = 0    (3.5)

and subtracting (3.5) from (3.4) results in the governing equation of the perturbation in flow:

∂²h/∂x² + ∂²h/∂y² = J_1 ∂f/∂x    (3.6)

where J_1 = −∂H/∂x is the mean gradient. Note that the second-order perturbation
products are neglected.
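The perturbation equation (3.6) can be checked numerically. The sketch below solves it by finite differences with Jacobi iteration and zero-perturbation boundaries; the grid size, the random f field, the value of J_1, and the iterative solver are all illustrative stand-ins (the study itself uses a multigrid solver).

```python
import numpy as np

def solve_perturbation(f, J1, dx, iters=8000):
    """Solve d2h/dx2 + d2h/dy2 = J1*df/dx with h = 0 on the boundary (Jacobi)."""
    rhs = np.zeros_like(f)
    rhs[1:-1, :] = J1 * (f[2:, :] - f[:-2, :]) / (2.0 * dx)   # centered J1*df/dx
    h = np.zeros_like(f)
    for _ in range(iters):
        # five-point update; the right-hand side is read before assignment (Jacobi)
        h[1:-1, 1:-1] = 0.25 * (h[2:, 1:-1] + h[:-2, 1:-1]
                                + h[1:-1, 2:] + h[1:-1, :-2]
                                - dx * dx * rhs[1:-1, 1:-1])
    return h

rng = np.random.default_rng(0)
f = 0.1 * rng.standard_normal((20, 20))   # small log-transmissivity perturbations
h = solve_perturbation(f, J1=0.01, dx=1.0)
```

After convergence the discrete Laplacian of h matches the right-hand side in the interior, so the head perturbations respond linearly to the f field, as the first-order theory assumes.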
Dagan (1985) wrote: "In a recent paper, Gutjahr (1984) has shown that if the log-
conductivity and head fields are jointly normal, the first-order approximation becomes
exact and is valid for arbitrary σ_f². The present study indicates that this is not the case,
in the sense that the first-order approximation has to be supplemented by additional
terms. Nevertheless, for small to moderate values of σ_f², it is seen that these additional
terms may be neglected." More recently, Loaiciga and Marino (1990) analyzed the error in
dropping the perturbation terms in (3.6) for the steady-state flow case. They
reinforced those statements by Dagan (1985) and presented a necessary and sufficient
condition for the smallness of these two terms.
An assumption of statistical homogeneity (stationarity) is made for both the T and h
fluctuations. Statistically homogeneous processes are described by statistical
properties (covariance and related functions) which depend only on the lag vector
separating observations, and not on the actual locations of the observations. Assuming statistical
homogeneity allows representation of the input f and output h by Fourier-Stieltjes integrals
[Lumley and Panofsky (1964), p. 16]:

f(x) = ∫∫ e^{i k·x} dZ_f(k),    h(x) = ∫∫ e^{i k·x} dZ_h(k)    (3.7)

where dZ_f and dZ_h represent complex Fourier amplitudes of the respective fluctuations
over (k_1, k_2) wave-number space. Substituting (3.7) into (3.6) gives, after
differentiating:

∫∫ e^{i k·x} [(k_1² + k_2²) dZ_h + i k_1 J_1 dZ_f] = 0    (3.8)
where (3.8) is true if the bracketed term is zero. Thus, an equation relating the
Fourier amplitudes of f and h follows by uniqueness of the representation:

dZ_h = −[i k_1 J_1 / (k_1² + k_2²)] dZ_f    (3.9)
From Lumley and Panofsky (1964, p. 16), the expectation of the product of the
Fourier amplitude with its complex conjugate is the spectrum of the random process. By
this procedure a relationship between the spectra of f and h can be derived from (3.9) as

S_h(k_1, k_2) dk = E(dZ_h dZ_h*) = [k_1² J_1² / (k_1² + k_2²)²] S_f(k_1, k_2) dk    (3.10)

where S_h is the head spectrum and S_f is the Ln[T] spectrum (the asterisk designates the
complex conjugate).

The head spectrum (3.10) is directly proportional to the Ln[T] spectrum, and the
proportionality factor is a function of the mean gradient and the wave-number space location.
Mathematically, knowledge of either the h or f spectrum would allow evaluation of the
other. In the following section, the Ln[T] spectrum will be identified for use in (3.10).
3.2.2 Ln[T] Spectra and Covariance Function
A spectrum-covariance pair for two-dimensional spatial processes was proposed by
Whittle (1954), based on analysis of agricultural yields. The Whittle spectrum has the
form:

S_f(k_1, k_2) = σ_f² α² / [π (k_1² + k_2² + α²)²],    α = π/(2λ)    (3.11)

where λ is the integral scale and σ_f² the variance of the Whittle process. The associated
covariance function, obtained from the Fourier transform of S_f (Lumley and Panofsky,
1964), is formulated in terms of a modified Bessel function:

C_f(ξ) = σ_f² (αξ) K_1(αξ)    (3.12)

where ξ is the magnitude of the separation, or lag, vector between observation points,
and K_1 is the modified Bessel function of the second kind, first order. Whittle [(1954), p.
448] describes (3.12), when normalized by the variance, as "the elementary correlation
function in two dimensions, similar to the exponential e^{−ξ/λ} in one dimension." Gelhar
(1976) uses the Whittle spectrum and covariance functions to describe the log-
conductivity process when analyzing two-dimensional phreatic flow.
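The behavior of the Whittle covariance (3.12) is easy to verify numerically. The sketch below evaluates K_1 from its integral representation K_1(z) = ∫_0^∞ e^{−z cosh t} cosh t dt so that only NumPy is required; the quadrature settings and parameter values are this sketch's own choices.

```python
import numpy as np

def bessel_k1(z, tmax=20.0, n=20001):
    """Modified Bessel function K1 via its integral representation (trapezoid rule)."""
    t = np.linspace(0.0, tmax, n)
    integrand = np.exp(-np.outer(z, np.cosh(t))) * np.cosh(t)
    return np.sum(0.5 * (integrand[:, 1:] + integrand[:, :-1]) * np.diff(t), axis=1)

def whittle_cov(xi, sigma2, lam):
    """Whittle covariance (3.12): C(xi) = sigma^2 * (alpha*xi) * K1(alpha*xi)."""
    alpha = np.pi / (2.0 * lam)                 # a = pi/(2*lambda), from (3.11)
    z = alpha * np.asarray(xi, dtype=float)
    return sigma2 * z * bessel_k1(z)

xi = np.array([1e-6, 10.0, 50.0])               # lags: near zero, one and five scales
c = whittle_cov(xi, sigma2=2.0, lam=10.0)
```

As the lag tends to zero, zK_1(z) tends to one, so the covariance approaches the variance, and it decays monotonically with increasing separation.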
The head variance is obtained by using the assumed Ln[T] spectrum, (3.11), in
(3.10) and integrating over wave-number space. Note that a singularity at k_1² + k_2² = 0
remains in the head spectrum, and as a result the integral is divergent. Thus, the Whittle
spectrum will not yield a finite head variance for two-dimensional confined groundwater flow
under the assumption of statistically homogeneous h. As an alternative, Mizell (1982)
considered two spectra, A and B, appropriately modified to eliminate the singularity in
the head spectrum. In this study only modified spectrum A will be adopted. The modified
spectrum A for a two-dimensional spatial process is

S_f(k_1, k_2) = 2σ_f² α² (k_1² + k_2²) / [π (k_1² + k_2² + α²)³]    (3.13)
The associated covariance function and integral scale are (see Mizell, 1982 for detail):

C_f(ξ) = σ_f² [ (πξ/4λ) K_1(πξ/4λ) − (1/2)(πξ/4λ)² K_0(πξ/4λ) ]    (3.14)

where

λ = π/(4α)
3.2.3 Head Variance and Covariance Functions
The head covariance function is determined by the Fourier transform of the head
spectrum. The variance of the head process may be found by evaluating the Fourier
transform of the spectrum at zero lag:

R_hh(ξ_1, ξ_2) = ∫∫ e^{i k·ξ} S_h(k_1, k_2) dk_1 dk_2    (3.15)

σ_h² = R_hh(0, 0) = ∫∫ S_h(k_1, k_2) dk_1 dk_2    (3.16)

where the h subscript denotes statistical properties of the head fluctuations and ξ_1 and ξ_2
are the components of the lag vector separating observation points of the process.
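A quick numerical check shows why a modified spectrum yields a finite head variance. The sketch assumes a spectrum-A density of the form S_f = 2σ_f²α²k²/[π(k²+α²)³] (the normalization constant here is the sketch's own assumption): in polar coordinates the k_1²/k² factor in (3.10) becomes cos²χ, whose angular integral is π, leaving a convergent radial quadrature that also evaluates in closed form to J_1²σ_f²/(2α²).

```python
import numpy as np

def head_variance(J1, sigma2, alpha, kmax=200.0, n=400000):
    """Integrate S_h over wave-number space in polar coordinates (midpoint rule)."""
    dk = kmax / n
    k = (np.arange(n) + 0.5) * dk                       # radial midpoints
    radial = np.sum(k / (k**2 + alpha**2) ** 3) * dk    # int_0^inf k/(k^2+a^2)^3 dk
    angular = np.pi                                     # int_0^{2pi} cos^2(chi) dchi
    return angular * (2.0 * J1**2 * sigma2 * alpha**2 / np.pi) * radial

J1, sigma2, alpha = 0.01, 1.0, np.pi / 20.0             # illustrative values
numeric = head_variance(J1, sigma2, alpha)
exact = J1**2 * sigma2 / (2.0 * alpha**2)               # closed-form value
```

With the Whittle spectrum (3.11) in place of the assumed spectrum A, the corresponding radial integrand behaves like 1/k near the origin and the same quadrature diverges as the grid is refined, which is the singularity discussed above.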
Substituting (3.10) into (3.15), assuming (3.13) for the Ln[T] spectrum, and
performing the integration (see Mizell, 1982 for details) leads to the head covariance
function associated with input spectrum A. Written in polar coordinates, this head
covariance function has the form

R_hh(ξ, χ) = [a lengthy combination of cos²χ, powers of πξ/(4λ), and the modified Bessel functions K_0(πξ/4λ) and K_1(πξ/4λ); see Mizell (1982) for the full expression]    (3.17)
where χ is the angle between the lag vector and the mean-flow direction, and
λ is the integral scale associated with f. Equation (3.16) gives the head variance.
3.2.4 Cross-Covariance Functions
Evaluation of the cross-covariance function between head and Ln[T], a measure of
dependence between the two processes at various lag distances, first requires evaluation
of the cross-spectrum. Formulation of the cross-spectrum parallels that of the head
spectrum previously discussed. The cross-spectrum is given by the expectation of the
Fourier amplitude of the head fluctuations, (3.9), multiplied by the complex conjugate of
the Ln[T] Fourier amplitude (Lumley and Panofsky, 1964, p. 21):

S_hf(k_1, k_2) dk = E(dZ_h dZ_f*) = −[i k_1 J_1 / (k_1² + k_2²)] S_f(k_1, k_2) dk    (3.18)

The cross-covariance function is the Fourier transform of the cross-spectrum:

R_hf(ξ_1, ξ_2) = ∫∫ e^{i k·ξ} S_hf(k_1, k_2) dk_1 dk_2    (3.19)

Substituting spectrum A, (3.13), into (3.19) and completing the transform (see Mizell,
1982) gives the associated cross-covariance function for spectrum A, which, in polar
coordinates, is

R_hf(ξ, χ) = [an expression proportional to J_1 cos χ, involving powers of πξ/(4λ) and the modified Bessel functions K_0 and K_1; see Mizell (1982) for the full form]    (3.20)
Note that (3.20) is zero valued when ξ = 0 or when χ = π/2. Thus, head and Ln[T]
fluctuations are uncorrelated at zero lag; E[hf] = 0.
Covariance and correlation functions (correlation functions are the covariance
function divided by the variance) measure the degree of dependence of information
obtained at various lag distances. These functions are constructed by taking the expected
value of the product of two observations made at various lag distances (Lumley and
Panofsky, 1964, p. 14):

ρ(ξ) = E[X(x) X(x+ξ)] / σ² = C(ξ) / σ²    (3.21)

where ρ is the correlation function and X has zero mean. Positive correlation values
imply correlation between observations of similar magnitude (high values with high
values, or low with low), and negative values of the correlation function imply
correlation of observations of opposite magnitude (high values with low). Larger
absolute values of correlation indicate stronger dependence between observations at that
lag distance. Maximum correlation occurs at zero lag distance because observations are
compared with themselves.
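The sample analogue of (3.21) averages the product of zero-mean observations a lag apart and divides by the variance. The AR(1) series used below to generate correlated data is purely illustrative, not a process from this study.

```python
import numpy as np

def sample_correlation(x, lag):
    """Estimate rho(lag) = E[X(t)X(t+lag)]/sigma^2 from one series."""
    x = x - x.mean()
    return float(np.mean(x[:len(x) - lag] * x[lag:]) / x.var())

# illustrative correlated data: AR(1) with coefficient 0.9
rng = np.random.default_rng(3)
x = np.zeros(50000)
for i in range(1, x.size):
    x[i] = 0.9 * x[i - 1] + rng.standard_normal()

rho = [sample_correlation(x, lag) for lag in (0, 1, 5, 20)]
```

The estimates reproduce the expected behavior: the correlation is exactly one at zero lag (an observation compared with itself) and decays toward zero as the lag grows.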
3.3 Conditioning
Consider the natural log of transmissivity, Ln[T(x)], of an aquifer to be a
stationary stochastic process with a constant unconditional mean E[Ln(T)] = F and
unconditional perturbation f. The corresponding hydraulic head is given by
φ(x) = H(x) + h(x), where H(x) = E[φ(x)] and h is the unconditional head perturbation.
Suppose that a limited number of transmissivity and head measurements, numbering n_f
and n_h, respectively, are available for the aquifer, where f_i* = Ln[T_i*] − F and h_j*
represent the observed transmissivity and head values, respectively, with i =
1, 2, ..., n_f and j = n_f + 1, ..., n_f + n_h.
One possible solution of conditional simulation is to preserve observed head and
transmissivity values at sample locations, while satisfying both the underlying statistical
properties (i.e. mean and covariance, etc.) of the field and the governing flow equation.
Cokriging, as discussed in the next section, is considered the best linear unbiased
estimator and is usually used for conditional simulations. In the conditional probability
concept, a head or transmissivity field represents a conditioned realization of φ or Ln[T]
in the ensemble. Infinitely many possible realizations of the conditioned fields φ or Ln[T]
exist.
3.3.1 Cokriging
Cokriging is an estimation technique useful when two (or more) correlated
variables are measured in the field and can be estimated together [de Marsily (1986);
Kitanidis and Vomvoris (1983); Neuman (1984); and Hoeksema and Kitanidis (1984)].
For example, transmissivity for an aquifer, which is generally correlated with
hydraulic head, can be estimated not only from measurements of transmissivity itself, but
also simultaneously from hydraulic head measurements.
In cokriging, the unknown mean-removed value of Ln[T] is estimated at any point
in the aquifer by a weighted linear combination of the observed Ln[T] and h. The weights
are determined by requiring the estimator to be unbiased with minimum variance. In this
sense, cokriging is referred to as a best linear unbiased estimator. Cokriging is also an
exact interpolator; that is, at any measurement point, the cokriged estimate is
equal to the measured value, and the cokriging variance is zero.
By casting the problem in a probability framework, Dagan [(1982a,b), (1985a,b)]
and Rubin and Dagan (1987) show that when the random transmissivity T and head h
fields are jointly Gaussian, with known mean and covariance, the cokriging estimate and
its covariance are equivalent to the conditional mean and covariance of the new joint pdf
that is conditioned on the measurements. Even though cokriging is an optimal and exact
estimator, it cannot reproduce the details of the transmissivity field that have not
been measured, regardless of the quantity and quality of head measurements, because it
reflects a conditional expectation. As a result, the cokriged map is inevitably smoother than
the actual one. This would be true even if head information were available everywhere.
In this section, cokriging will be described briefly; detailed discussions of the
theory are found in many literature sources, e.g., de Marsily (1986) and Vauclin et al.
(1983).
Assume that Z_1(x) and Z_2(x) are two correlated second-order stationary random
fields with known means and covariances. The estimation of Z_1 (and Z_2, if necessary)
using cokriging is based on the best linear unbiased estimator technique, which is:

Z_1*(x_0) = Σ_{j=1}^{n} λ_j Z_1(x_j) + Σ_{k=1}^{m} μ_k Z_2(x_{n+k})    (3.22)

where:
Z_1*(x_0) = the estimate at location x_0,
Z_1(x_j) = the measured values of Z_1 at locations x_1, x_2, ..., x_n,
Z_2(x_{n+k}) = the measured values of Z_2 at locations x_{n+1}, ..., x_{n+m},
and λ_j and μ_k are the cokriging weights corresponding to Z_1 and Z_2, respectively.

Note that Z_1 and Z_2 need not be measured at the same locations. The minimum variance
criterion requires that:

Var[Z_1*(x_0) − Z_1(x_0)] = minimal    (3.23)
If Z_1 and Z_2 are second-order stationary processes, the cokriging system of equations can
be written as:

Σ_{j=1}^{n} λ_j C_11(x_i − x_j) + Σ_{k=1}^{m} μ_k C_12(x_i − x_{n+k}) = C_11(x_0 − x_i),  for i = 1, ..., n
                                                                         (3.24)
Σ_{j=1}^{n} λ_j C_21(x_{n+l} − x_j) + Σ_{k=1}^{m} μ_k C_22(x_{n+l} − x_{n+k}) = C_12(x_0 − x_{n+l}),  for l = 1, ..., m

The auto-covariances of Z_1 and Z_2 and the cross-covariances between Z_1 and Z_2
are denoted by C_11, C_22, and C_12, respectively. By solving the system of equations (3.24),
the cokriging weights can be obtained. Following this, equation (3.22) can be used to
obtain the estimates of either or both variables at any location. Prior to the solution of the
normal system of equations, the auto-covariances and cross-covariances as functions of
the separation distance must be known. This information can be obtained from either
field data or some theoretical analyses.
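The system (3.24) can be assembled and solved as a single linear system whose blocks are auto- and cross-covariances evaluated at the data separations. In the sketch below the exponential covariance models (and their 0.5/0.4 sills, which form a valid coregionalization) are illustrative assumptions, and no unbiasedness constraints are included, i.e. this is the zero-mean (simple) form of cokriging.

```python
import numpy as np

def c11(h): return np.exp(-np.abs(h) / 4.0)          # auto-covariance of Z1 (assumed)
def c22(h): return 0.5 * np.exp(-np.abs(h) / 4.0)    # auto-covariance of Z2 (assumed)
def c12(h): return 0.4 * np.exp(-np.abs(h) / 4.0)    # cross-covariance (assumed)

def cokriging_weights(x1, x2, x0):
    """Solve (3.24): weights for estimating Z1 at x0 from Z1 data (x1) and Z2 data (x2)."""
    n, m = len(x1), len(x2)
    A = np.empty((n + m, n + m))
    A[:n, :n] = c11(x1[:, None] - x1[None, :])
    A[:n, n:] = c12(x1[:, None] - x2[None, :])
    A[n:, :n] = c12(x2[:, None] - x1[None, :])
    A[n:, n:] = c22(x2[:, None] - x2[None, :])
    b = np.concatenate([c11(x1 - x0), c12(x2 - x0)])
    w = np.linalg.solve(A, b)
    return w[:n], w[n:]                               # lambda_j, mu_k

x1 = np.array([0.0, 3.0, 7.0])                        # Z1 data locations
x2 = np.array([1.0, 5.0])                             # Z2 data locations
lam, mu = cokriging_weights(x1, x2, 3.0)              # estimate Z1 at x0 = 3
```

Because x_0 here coincides with a Z_1 datum, the solution places all weight on that datum, illustrating the exact-interpolation property mentioned in section 3.3.1.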
One application of the covariance and cross-covariance approximations
developed in section 3.2 is that they can be used for conditioning randomly simulated
fields jointly on head and transmissivity data by linear cokriging estimation. Given a data
vector of transmissivity and hydraulic head measured at spatial locations, the cokriging
estimator of the perturbation in Ln[T(x)] at some location x_0 is the sum of the products of
the data and the weight applied to each datum. That is:

f_ck(x_0) = Σ_{i=1}^{n_f} λ_{i0} f_i* + Σ_{j=n_f+1}^{n_f+n_h} μ_{j0} h_j*    (3.25)
where f_ck(x_0) is the cokriged f value at location x_0. Then the transmissivity becomes
T_ck(x_0) = exp[F + f_ck(x_0)]. The cokriging weights λ_{i0} and μ_{j0} associated with point x_0 can be
evaluated as:

Σ_{i=1}^{n_f} λ_{i0} R_ff(x_l, x_i) + Σ_{j=n_f+1}^{n_f+n_h} μ_{j0} R_fh(x_l, x_j) = R_ff(x_0, x_l),  l = 1, 2, ..., n_f
                                                                     (3.26)
Σ_{i=1}^{n_f} λ_{i0} R_hf(x_l, x_i) + Σ_{j=n_f+1}^{n_f+n_h} μ_{j0} R_hh(x_l, x_j) = R_fh(x_0, x_l),  l = n_f+1, n_f+2, ..., n_f+n_h

where R_ff and R_hh are the covariances of f and h, respectively, and R_fh is the cross-
covariance of f and h. Note that F is assumed to be known and H(x) is derived by solving
the governing flow equation with the transmissivity equal to e^F. Similarly, one can use
classical cokriging based on the observed f_i* and h_j* to construct a cokriged head field.
That is:

h_ck(x_0) = Σ_{i=1}^{n_f} β_{i0} f_i* + Σ_{j=n_f+1}^{n_f+n_h} γ_{j0} h_j*    (3.27)

where h_ck(x_0) is the cokriged h value at location x_0, and β_{i0} and γ_{j0} are the cokriging
weights, which can be evaluated as follows:

Σ_{i=1}^{n_f} β_{i0} R_ff(x_l, x_i) + Σ_{j=n_f+1}^{n_f+n_h} γ_{j0} R_fh(x_l, x_j) = R_hf(x_0, x_l),  l = 1, 2, ..., n_f
                                                                     (3.28)
Σ_{i=1}^{n_f} β_{i0} R_hf(x_l, x_i) + Σ_{j=n_f+1}^{n_f+n_h} γ_{j0} R_hh(x_l, x_j) = R_hh(x_0, x_l),  l = n_f+1, n_f+2, ..., n_f+n_h

The covariances and cross-covariance in (3.26) and (3.28) are derived from
the first-order approximations in equations (3.14), (3.17), and (3.20).
3.3.2 Iterative Conditioning Procedure
Conditioning on data using the cokriging estimator is accomplished by a standard
geostatistical procedure, illustrated in Figure (3.1). First, classical
cokriging is performed based on the observed f_i* and h_j* to construct f_ck(x), a cokriged
f field (mean-removed Ln[T]), as shown in equation (3.25), where ck stands for cokriging.
The cokriging coefficients are calculated using equation (3.26). Similarly, classical
cokriging is performed based on the observed f_i* and h_j* to construct h_ck(x), a cokriged
h field (mean-removed φ), as shown in equation (3.27), and the cokriging coefficients are
calculated using equation (3.28).

Second, an unconditional realization of the random perturbation of the Ln[T(x)]
field, f_u(x), which maintains the prescribed covariance function, is generated, where the
subscript u denotes the unconditional simulation. Cokriging can then be performed using
the simulated values at the actual sample locations to derive the kriging estimate, f_u*(x),
which matches the random field samples at the data locations, as shown schematically in
Figure (3.1). The cokriging error of the unconditional simulation can be calculated as the
difference between the transmissivity value of the given unconditional simulation and its
cokriging estimate, represented by [f_u(x) − f_u*(x)], and is always zero at sample
locations. The corresponding unconditional head field, φ_u(x) = H(x) + h_u(x), is
generated by solving the governing flow equation with T_u(x) = exp[F + f_u(x)].

Figure (3.1) Illustration of conditioning a random field on data
The head perturbations obtained from the flow model parameterized with the
conditioned f field do not exactly match the head perturbation data. This is due, in part, to
the first-order linearization approximation [Dagan (1985)]. As the input variance of the f
field is reduced, the discrepancy between the data and the model results also diminishes.
A departure from a multivariate Gaussian distribution of the correlated f and h variables
may also affect this discrepancy [Gutjahr (1984)]. The match between the head perturbation
data and the head perturbations resulting from the flow model parameterized with the
conditioned f field can be improved by iteratively conditioning on head data [Gutjahr et
al. (1994)]. The strategy is to take the difference between the head data and the model
results and add this difference to the unconditioned head field h_u.

First, the unconditioned random field f_u(x) is substituted into the flow model
equation, and the resulting perturbations around the mean head H(x) are sampled at the
locations corresponding to the observation points in the true field (call these samples
h_u^0(x_j)). These samples and the samples of the unconditioned f field are used in the
cokriging equation (3.25) to obtain a cokriged f_ck(x) field, and the cokriging weights
λ_{i0} and μ_{j0} are determined from (3.26) for each location. Then, f_ck is substituted into
the flow model, and the resulting discrepancy between the head perturbation data and the
model results is added to the unconditioned f field. At iteration m, the conditioned f
field f_c^m(x_i) at location x_i is

f_c^m(x_i) = f_c^{m−1}(x_i) + Σ_{j=n_f+1}^{n_f+n_h} μ_{ji} [h_j* − h^{m−1}(x_j)]    (3.29)

where h^{m−1} is a vector of samples of head perturbations resulting from the flow model at
iteration m−1, and μ_{ji} is the cokriging weight associated with location i.

These iterations are repeated until a prescribed convergence criterion is reached.
As a result, T_c(x) = exp[F + f_c(x)] and φ_c(x) = H(x) + h_c(x) agree with the measured values in
the true fields, and the conditioned T_c(x) and the true transmissivity field have the
same covariance function. The above procedure can then be repeated using different seed
numbers to generate different realizations for conditioning.
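The shape of the iterative conditioning loop can be sketched schematically. In the toy below, a fixed linear operator G mapping the f field to head perturbations at the observation points stands in for the linearized flow model, and a damped G-transpose correction stands in for the cokriging weights μ_{ji}; both are illustrative substitutes, not the dissertation's actual operators, but they show the essential mechanism of repeatedly feeding the data-versus-model head discrepancy back into the f field.

```python
import numpy as np

rng = np.random.default_rng(7)
nf, nobs = 30, 4
G = rng.standard_normal((nobs, nf)) / np.sqrt(nf)   # toy linearized "flow model"
f_true = rng.standard_normal(nf)
h_data = G @ f_true                                  # "measured" head perturbations

f = rng.standard_normal(nf)                          # start from an unconditional f
for m in range(2000):                                # iterate until heads match
    resid = h_data - G @ f                           # data-vs-model discrepancy
    f = f + 0.5 * G.T @ resid                        # corrective update, cf. (3.29)

mismatch = float(np.max(np.abs(h_data - G @ f)))
```

After convergence the simulated heads reproduce the head data at the observation points, while the f field away from the data retains the variability of its unconditional starting realization.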
4. NEURAL NETWORKS
4.1 Introduction
Fundamentally, there are two major approaches in the field of artificial
intelligence (AI) for realizing human intelligence in machines [Munakata (1994)]. One is
symbolic AI, which is characterized by a high level of abstraction and a macroscopic
view. Classical psychology operates at a similar level, and knowledge engineering
systems and logic programming fall in this category. The second approach is based on
low-level, microscopic biological models, similar in emphasis to physiology or
genetics. Artificial neural networks and genetic algorithms are the prime examples of this
latter approach; they originated from modeling of the brain and of evolution. However,
these biological models do not necessarily resemble their original biological counterparts.
Neural networks are a new generation of information processing systems that are
deliberately constructed to make use of some of the organizational principles that
characterize the human brain. The main theme of neural network research is
modeling of the brain as a parallel computational device for various computational tasks
that are performed poorly by traditional serial computers. Neural networks have a large
number of interconnected processing elements (nodes) that usually operate in parallel
and are configured in regular architectures. The collective behavior of an NN, like that of a
human brain, demonstrates the ability to learn, recall, and generalize from training
patterns or data. Since the first applications of NNs to consumer products appeared at the
end of 1990, scores of industrial and commercial applications have come into use. As with
fuzzy systems, the applications where neural networks have the most promise are
those with a real-world flavor, such as speech recognition, speech-to-text conversion,
image processing and visual perception, medical applications, loan applications and
counterfeit checks, and investing and trading.
Models of neural networks are specified by three basic entities: models of the
processing elements themselves, models of the interconnections and structures (network
topology), and the learning rules (the ways information is stored in the network).
Each node in a neural network collects the values from all its input connections,
performs a predefined mathematical operation, and produces a single output value. The
information processing of a node can be viewed as consisting of two parts; input and
output. Associated with the input of a node is an integration function (typically a dot
product) which serves to combine information or activation from an external source or
other nodes into a net input to the node. A second action of each node is to output an
activation value as a function of its net input through an activation function, which is
usually nonlinear. Each connection has an associated weight that determines the effect of
the incoming input on the activation level of the node. The weights may be positive
(excitatory) or negative (inhibitory). The connection weights store the information, and a
neural network learning procedure often determines the value of the connection weights.
It is through adjustment of the connection weights that the neural network is able to learn.
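The two-part node computation described above (dot-product integration followed by a nonlinear activation) can be sketched in a few lines of Python; the particular inputs, weights, and the logistic activation used here are illustrative assumptions, not values from this study:

```python
import math

def node_output(inputs, weights, a=1.0):
    """Integration function: dot product of inputs and connection weights."""
    net = sum(x * w for x, w in zip(inputs, weights))
    # Activation function: logistic squashing of the net input into (0, 1).
    return 1.0 / (1.0 + math.exp(-a * net))

# Positive (excitatory) and negative (inhibitory) weights on three connections.
y = node_output([0.5, 1.0, -0.2], [0.8, -0.3, 0.5])
```

Here the net input happens to be zero, so the logistic activation returns 0.5, the midpoint between its two asymptotes.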
In a neural network, each node output is connected through weights to other nodes
or to itself. Hence, the structure that organizes these nodes and the connection geometry
among them should be specified for a neural network. We can first take a node and
combine it with other nodes to make a layer of nodes. Inputs can be connected to many
nodes with various weights, resulting in a series of outputs, one per node. This results in a
single-layer feed-forward network. We can further interconnect several layers to form a
multi-layer feed-forward network. The layer that receives inputs is called the input
layer and typically performs no computation other than buffering of the input signal. The
outputs of the network are generated from the output layer. Any layer between the input
and the output layers is called a hidden layer because it is internal to the network and has
no direct contact with the external environment. There may be from zero to several
hidden layers in a neural network. The two types of networks mentioned above are
feedforward networks since no node output is an input to a node in the same layer or
preceding layer. When outputs are directed back as inputs to same- or preceding-layer
nodes, the network is a feedback network. Feedback networks that have closed loops are
called recurrent networks.
The third important element in specifying a neural network is the learning scheme.
Broadly speaking, there are two kinds of learning in neural networks: parameter
learning, which concerns the updating of the connection weights in a neural network,
and structure learning, which focuses on the change in the network structure, including
the number of nodes and their connection types. These two kinds of learning can be
performed simultaneously or separately. Each kind of learning can be further classified
into three categories: supervised learning, reinforcement learning, and unsupervised
learning. Among these, supervised learning is of most concern in this study, as will be
discussed later.
Among the existing neural network models, two of the important ones that started
the modern era of neural networks are the Hopfield network and the back-propagation
network. The Hopfield network is a single-layer feedback network with symmetric
weights proposed by John Hopfield in 1982. The network has a point attractor dynamics,
making its behavior relatively simple to understand and analyze. This feature provides
the mathematical foundation for understanding the dynamics of an important class of
networks. The back-propagation network is a multi-layer feed-forward network with a
gradient-descent-type learning algorithm, called the back-propagation learning rule
(Rumelhart, Hinton, and Williams, 1986b). In this learning scheme, the error at the
output layer is propagated backward to adjust the connection weights of preceding layers
and to minimize output errors. The back-propagation learning rule is an extension of the
work of Rosenblatt, Widrow, and Hoff to deal with learning in complex multi-layer
networks, providing an answer to one of the most severe criticisms of the neural network
field. With this learning algorithm, multi-layer networks can be reliably trained. The
back-propagation network has had a major impact on the field of neural networks and is
the primary method employed in most of the applications.
Neural networks offer the following characteristics and properties:

Nonlinear input-output mapping: Neural networks are able to learn arbitrary
nonlinear input-output mappings directly from training data.
Generalization: Neural networks can sensibly interpolate input patterns that are
new to the network. From a statistical point of view, neural networks can fit the desired
function in such a way that they have the ability to generalize to situations that are
different from the collected training data.
Adaptivity: Neural networks can automatically adjust their connection weights, or
even network structures (number of nodes or connection types), to optimize their
behavior as controllers, predictors, pattern recognizers, decision makers, and so on.
Fault tolerance: The performance of a neural network is degraded gracefully under
faulty conditions such as damaged nodes or connections. The inherent fault-tolerance
capability of a neural network stems from the fact that the large number of connections
provides much redundancy, each node acts independently of all the others, and each node
relies only on local information.
4.2 Basis of Neural Networks
The human brain is a very complex system capable of thinking, remembering, and
problem solving. There have been many attempts to emulate brain functions with
computer models, and although rather spectacular achievements have stemmed from
from these efforts, all of the models developed to date pale in comparison to the complex
functioning of the human brain.
A neuron is the fundamental cellular unit of the brain's nervous system. It is a
simple processing element that receives and combines signals from other neurons through
input paths called dendrites. If the combined input signal is strong enough, the neuron
"fires," producing an output signal along the axon that connects to the dendrites of many
other neurons. Figure (4.1) is a sketch of a neuron showing the various components. Each
signal coming into a neuron along a dendrite passes through a synapse or synaptic
junction. This junction is an infinitesimal gap in the dendrite that is filled with
neurotransmitter fluid that either accelerates or retards the flow of electrical charges. The
fundamental actions of the neuron are chemical in nature, and this neurotransmitter fluid
produces electrical signals that go to the nucleus or soma of the neuron. The adjustment
of the impedance or conductance of the synaptic gap is a critically important process.
Indeed, these adjustments lead to memory and learning. As the synaptic strengths of the
neurons are adjusted, the brain "learns" and stores information.
When a person is born, the cerebral cortex portion of his or her brain contains
approximately 100 billion neurons. The outputs of each of these neurons are connected
through their respective axons (output paths) to about 1000 other neurons. Each of these
1000 paths contains a synaptic junction by which the flow of electrical charges can be
controlled by a neurochemical process. Hence, there are about 100 trillion synaptic
junctions that are capable of having influence on the behavior of the brain. It is readily
apparent that in our attempts to emulate the processes of the human brain, we cannot
think of billions of neurons and trillions of synaptic junctions. Indeed, the largest of our
neural networks typically contain a few thousand artificial neurons and less than a million
artificial synaptic junctions.
Figure (4.1) Sketch of biological neuron
The one area in which artificial neural networks may have an advantage is speed.
When a person walks into a room, it typically takes another person about half a second to
recognize them. This recognition process involves about 200-250 separate
operations within the brain. As a benchmark for speed, this means that the human brain
operates at about 400-500 hertz (Hz). Modern digital computers typically operate at clock
speeds between 100 and 500 megahertz (MHz), which means that they have a very large
speed advantage over the brain. However, this advantage is dramatically reduced because
digital computers operate in a serial mode whereas the brain operates in a parallel mode.
However, neural network chips have been developed in recent years that enable neural
computers to operate in a parallel mode.
4.3 Artificial Neurons
An artificial neuron is a model whose components have direct analogs to
components of an actual neuron. Figure (4.2) shows the schematic representation of an
artificial neuron. The input signals are represented by x_0, x_1, x_2, ..., x_n. These signals are
continuous variables, not the discrete electrical pulses that occur in the brain. Each of
these inputs is modified by a weight (sometimes called the synaptic weight) whose
function is analogous to that of the synaptic junction in a biological neuron. These
weights can be either positive or negative, corresponding to acceleration or inhibition of
the flow of electrical signals. This processing element consists of two parts. The first part
simply aggregates (sums) the weighted inputs, resulting in a quantity I; the second part is
effectively a nonlinear filter, usually called the activation function or transfer function,
through which the combined signal flows.

Figure (4.2) Sketch of artificial neuron
Figure (4.3) shows several possible activation functions. The threshold function as
shown in Figure (4.3 a) passes information only when the output I of the first part of the
artificial neuron exceeds the threshold T. The signum function (shown in Figure (4.3b),
sometimes called a quantizer function) passes negative information when the output is
less than the threshold T, and positive information when the output is greater than the
threshold T. More commonly, the activation function is a continuous function that varies
gradually between two asymptotic values, typically 0 and 1, or - 1 and + 1 (called the
sigmoidal function). The most widely used activation function is the logistic function, a
member of the sigmoidal activation functions (shown in Figure (4.3c)), and is
represented by the equation

    Φ(I) = 1 / (1 + e^(-aI))    (4.1)
where a is a coefficient that adjusts the abruptness of this function as it changes between
the two asymptotic values.
A more descriptive term for the activation function is "squashing function," which
indicates that this function squashes or limits the values of the output of an artificial
neuron to values between the two asymptotes. This limitation is very useful in keeping
the output of the processing elements within a reasonable dynamic range. However, there
are certain situations in which a linear relation, sometimes only in the right half-plane, is
used for the activation function. It should be noted, however, that the use of a linear
activation function removes the nonlinearity from the artificial neuron. Without
nonlinearities, a neural network cannot model nonlinear phenomena.

Figure (4.3) Transfer functions for neurons
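The squashing behavior of the logistic function in equation (4.1), and the role of the coefficient a in setting the abruptness of the transition, can be illustrated numerically (the sample values of I and a below are arbitrary):

```python
import math

def logistic(I, a=1.0):
    """Equation (4.1): squashes any net input I into the range (0, 1)."""
    return 1.0 / (1.0 + math.exp(-a * I))

# Larger a makes the transition between the two asymptotes more abrupt.
for a in (1.0, 5.0):
    print([round(logistic(I, a), 3) for I in (-2.0, 0.0, 2.0)])
```

Regardless of a, the output is confined between the asymptotes 0 and 1 and passes through 0.5 at I = 0.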
4.4 Artificial Neural Networks
Lefteri H. et al. (1997) defined an artificial neural network as

"A data processing system consisting of a large number of simple, highly
interconnected processing elements (artificial neurons) in an architecture inspired by the
structure of the cerebral cortex of the brain."
These processing elements are usually organized into a sequence of layers or slabs
with full or random connections between the layers. This arrangement as shown in Figure
(4.4) uses the input layer as a buffer that presents data to the network. This input layer is
not a neural computing layer because the nodes have no input weights and no activation
functions. The top layer is the output layer, which presents the output response to a given
input. The other layer (or layers) is (are) called the intermediate or hidden layer(s)
because it (they) usually, has (have) no connections to the outside world. Typically the
input, hidden, and output layers are designated the i''', and k^ layers, respectively.
Two general kinds of neural networks are in use: the hetero-associative neural
network, in which the output vector is different from the input vector, and the auto-
associative neural network in which the output is identical to the input.
Figure (4.4) Architecture of neural network
4.5 Example of Artificial Neural Network
A typical neural network is "fully connected" meaning there is a connection
between each of the neurons in any given layer with each of the neurons in the next layer
as shown in Figure (4.5). When there are no lateral connections between neurons in a
given layer or previous layers, the network is said to be a feed-forward network.
Networks with connections from one layer back to a previous layer are called feedback
networks. Lateral connections between neurons in the same layer are also called feedback
connections. In certain cases, a neuron has feedback from its output to its own input. In
all cases, these connections have weights that must be trained.
Each of the connections between neurons has an adjustable weight (shown in
Figure (4.5)). This simple neural network is a fully connected, feed-forward network with
four neurons in the input layer, three in the middle or hidden layer, and two in the output
layer. The individual weights are shown on the connections and are designated by symbols
such as w_ij. For instance, the symbol w_26 indicates a weight on the connection between
neurons 2 and 6.
Let us consider the neural network of Figure (4.5), which has an input vector X
consisting of components x_1, x_2, x_3, and x_4, and an output vector Y having components
y_8 and y_9. When a signal x_1 is applied to neuron 1 in the input layer, the output of x_1
goes to each of the artificial neurons in the middle or hidden layer, passing through
weights w_15, w_16, and w_17. The input signals x_2, x_3, and x_4 behave in a similar manner,
Figure (4.5) Feedforward Neural Network
sending signals to neurons 5, 6, and 7 with the appropriate weights (shown in Figure
(4.5)).
Now consider the behavior of neuron 6. It has four inputs from the four neurons in
the input layer that have been modified by the connection weights w_16, w_26, w_36, and w_46.
The first part of this neuron simply sums these four weighted inputs. This summation
is then passed to the second part of the neuron, which is typically a nonlinear function
(e.g., a logistic curve between 0 and 1 as shown in Figure (4.3c)). The output of this
activation or squashing function is then sent to neurons 8 and 9 through weights
w_68 and w_69. Neurons 5 and 7 behave in a similar manner. Neurons 8 and 9 collect the
weighted inputs from neurons 5, 6, and 7, sum them, and pass the sums through the
activation functions to produce y_8 and y_9, the components of the output vector Y.
It is convenient to use vector and matrix notation in writing the input-output
relationship between layers. For simplicity, assuming the activation functions are linear,
the input-output relationship can be written in vector-matrix form as:

    V_j = W_ij · X_i    (4.1)

where V_j is the output vector of the neurons in the jth layer, W_ij is the matrix of weights
connecting layers i and j, and X_i is the input vector. For example, in Figure (4.5), v_5,
which is the output of neuron 5 in the jth layer, can be explicitly written as:

    v_5 = w_15 x_1 + w_25 x_2 + w_35 x_3 + w_45 x_4    (4.2)
Similarly, the output vector Y_k of layer k can be written as the dot product of the weight
matrix W_jk and the input vector V_j, that is,

    Y_k = W_jk · V_j    (4.3)

Combining (4.1) and (4.3), Y_k can be written as:

    Y_k = W_jk · W_ij · X_i    (4.4)
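The layer relations in equations (4.1)-(4.4) can be checked numerically. The sketch below assumes the four-input, three-hidden, two-output layout of Figure (4.5), linear activations, and arbitrary random weights:

```python
import numpy as np

# Hypothetical dimensions: 4 inputs, 3 hidden neurons, 2 outputs.
rng = np.random.default_rng(0)
W_ij = rng.normal(size=(3, 4))   # input -> hidden weight matrix
W_jk = rng.normal(size=(2, 3))   # hidden -> output weight matrix
X = np.array([1.0, 0.5, -0.5, 2.0])

# Layer-by-layer evaluation, equations (4.1) and (4.3).
V_j = W_ij @ X
Y_k = W_jk @ V_j

# Collapsed form, equation (4.4): the two linear layers compose into one matrix.
Y_direct = (W_jk @ W_ij) @ X
```

The fact that Y_k equals Y_direct illustrates why, without nonlinear activation functions, a multi-layer network is no more expressive than a single layer.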
4.6 Learning and Recall
Neural networks perform two major functions: learning and recall. Learning is the
process of adapting the connection weights in an artificial neural network to produce the
desired output vector in response to a stimulus vector presented to the input buffer. Recall
is the process of accepting an input stimulus and producing an output response in
accordance with the network weight structure. Recall occurs when a neural network
globally processes the stimulus presented at its input buffer and creates a response at the
output buffer. Recall is an integral part of the learning process since a desired response to
the network must be compared to the actual output to create an error function.
The learning rules of neural computation indicate how connection weights are
adjusted in response to a learning example. In supervised learning, the artificial neural
network is trained to give the desired response to a specific input stimulus. In graded
learning, the output is "graded" as good or bad on a numerical scale, and the connection
weights are adjusted in accordance with the grade.
In unsupervised learning, no specific response is sought, but rather, the response
is based on the network's ability to organize itself. Only the input stimuli are applied to
the input buffers of the network. The network then organizes itself internally so that each
hidden neuron responds strongly to a different set of input stimuli. These sets of input
stimuli represent clusters in the input space (which often represent distinct real-world
concepts or features).
The vast majority of learning in engineering applications involves supervised
learning. In this case, a stimulus buffer representing the input vector is presented at the
input, and another stimulus representing the desired response to the given input is
presented at the output buffer. This desired response must be provided by a
knowledgeable teacher. The difference between actual output and desired response
constitutes an error, which is used to adjust the connection weights. In other cases, the
weights are adjusted in accordance with criteria that are prescribed by the nature of the
learning process, as in competitive learning or in Hebbian learning.
There are a number of common supervised learning algorithms utilized in neural
networks. Perhaps the oldest is Hebbian learning, named after Donald Hebb, who
proposed a model for biological learning (Hebb, 1949), where a connection weight is
incremented if both the input and the desired output are large. This type of learning
comes from the biological world, where a neural pathway is strengthened each time it is
used. "Delta rule" learning takes place when the error (i.e., the difference between the
desired output response and the actual output response) is minimized, usually by a least
squares process. Competitive learning, on the other hand, occurs when the artificial
neurons compete among themselves, and only the one that yields the largest response to a
given input modifies its weight to become more like the input. There is also random
learning in which random incremental changes are introduced into the weights, and then
either retained or dropped, depending upon whether the output is improved or not (based
on whatever criteria the user specifies).
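Of the rules above, the Hebbian increment is simple enough to state directly in code; the learning rate and activity values below are arbitrary illustrations, not from the source:

```python
# Hebbian-style update (a simplified sketch): strengthen a connection
# when the input and the desired output are simultaneously large.
def hebbian_update(w, x, y, lr=0.1):
    """Increment the weight in proportion to the product of input and output activity."""
    return w + lr * x * y

w = 0.0
for _ in range(5):
    w = hebbian_update(w, x=1.0, y=0.8)   # repeated use strengthens the pathway
```

After five co-active presentations the weight has grown from 0 to about 0.4, mirroring the biological idea that a pathway is strengthened each time it is used.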
In the recall process, a neural network accepts the signal presented at the input
buffer and then produces a response at the output buffer that is determined by the training
of the network. The simplest form of recall occurs when there are no feedback
connections from one layer to another or within a layer (i.e., the signals flow from the
input buffer to the output buffer in a "feed-forward" manner). In a feed-forward network,
the response is produced in one cycle of calculations by the computer.
5. BACK-PROPAGATION NEURAL NETWORK
5.1 General
Backpropagation is a systematic method for training multiple (three or more)-layer
artificial neural networks. The elucidation of this training algorithm in 1986 by
Rumelhart, Hinton, and Williams (1986) was the key step in making neural networks
practical in many real-world situations. However, Rumelhart, Hinton, and Williams were
not the first to develop the backpropagation algorithm. It was developed independently
by Parker (1987) in 1987 and earlier by Werbos (1974) in 1974 as part of his Ph.D.
dissertation at Harvard University. Nevertheless, the backpropagation algorithm was
critical to the advances in neural networks because of the limitations of the one- and two-
layer networks discussed previously. Indeed, backpropagation played a critically
important role in the resurgence of the neural network field in the mid-1980s. Today, it is
estimated that 80% of all applications utilize this backpropagation algorithm in one form
or another. In spite of its limitations, backpropagation has dramatically expanded the
range of problems to which neural networks can be applied, perhaps because it has a
strong mathematical foundation.
A backpropagation neural network is typically initialized with random weights on its
synapses. During training when the network is exposed to a data set of input and output
vectors, its weights are incrementally adjusted to reduce the difference between estimated
and actual output vectors. When a sufficiently small difference (mean square error)
between the two output vectors has been achieved, training of the network is terminated.
At this point, the network is ready to categorize previously unknown inputs in the
application it was trained for.
Backpropagation belongs to the class of algorithms that perform gradient descent.
As such, it is a descendant of the Widrow-Hoff or delta rule which will be discussed in
the following section.
Among the advantages of backpropagation is its ability to store numbers of patterns
far in excess of its built-in vector dimensionality. On the other hand, it requires very large
training sets in the learning phase, and the minima it converges to may or may not be
global minima.
5.2 Widrow-Hoff Delta Learning Rule

Consider a typical neuron, shown in Figure (5.1), with inputs x_i, weights w_i, a
summation function in the left half of the neuron, and a nonlinear activation function in
the right half. The summation of the weighted inputs, designated by I, is given by:

    I = Σ_{i=1}^{n} w_i x_i    (5.1)

The nonlinear activation function used is the typical sigmoidal function and is given by:

    Φ(I) = 1 / (1 + e^(-aI))    (5.2)

This function is, in fact, the logistic function, one of several sigmoidal functions, which
monotonically increases from a lower limit (0 or -1) to an upper limit (+1) as I increases.

Figure (5.1) Sketch of an artificial neuron
A plot of a logistic function is shown in Figure (4.3c), in which values vary between 0
and 1, with a value of 0.5 when I is zero. An examination of this figure shows that the
derivative (slope) of the curve asymptotically approaches zero as the input I approaches
minus infinity and plus infinity, and it reaches a maximum value of a/4 when I equals
zero, as shown in Figure (5.2). Since this derivative function will be utilized in
backpropagation, it can be reduced to its most simple form. If we take the derivative of
equation (5.2), we get

    dΦ(I)/dI = (-1)(1 + e^(-aI))^(-2) (e^(-aI))(-a) = a e^(-aI) Φ²(I)    (5.3)

If we solve equation (5.2) for e^(-aI), substitute it into equation (5.3), and simplify,
we get

    dΦ(I)/dI = a[1 - Φ(I)]Φ(I) = a(1 - Φ)Φ    (5.4)

where Φ(I) has been simplified to Φ by dropping the argument (I). It is important to note
that multi-layer networks have greater representational power than single-layer networks
only if nonlinearities are introduced. The logistic function (also called the "squashing"
function) provides the needed nonlinearity. However, in the use of the backpropagation
algorithm, any nonlinear function can be used if it is everywhere differentiable and
monotonically increasing with I. Sigmoid functions, including the logistic, hyperbolic
tangent, and arctangent functions, meet these requirements.
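Equation (5.4) and the a/4 maximum slope can be verified numerically; this is a small self-check added for illustration (the test points and the value of a are arbitrary), not part of the original derivation:

```python
import math

def logistic(I, a):
    return 1.0 / (1.0 + math.exp(-a * I))

def d_logistic(I, a):
    """Equation (5.4): the derivative expressed through the function value itself."""
    phi = logistic(I, a)
    return a * (1.0 - phi) * phi

# Central finite-difference check of equation (5.4) at a few points.
a, h = 2.0, 1e-6
for I in (-1.0, 0.0, 1.5):
    numeric = (logistic(I + h, a) - logistic(I - h, a)) / (2 * h)
    assert abs(numeric - d_logistic(I, a)) < 1e-6

# The maximum slope a/4 occurs at I = 0.
assert abs(d_logistic(0.0, a) - a / 4) < 1e-12
```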
The use of a sigmoid (squashing) function provides a form of "automatic gain
control"; that is, for small values of I near zero, the slope of the input-output curve is
steep, producing a high gain. All sigmoid activation functions have derivatives with bell
shapes of the type shown in Figure (5.2). The derivative of the sigmoid has its largest
value (slope) in the transition region. Once the unit has been saturated, either positively
or negatively, the derivative of the sigmoid is almost zero. Hence, the derivative is used
to control when units are allowed to make relatively large changes to their input
connections, because it governs the magnitude of the error signal at each output unit.
Output units that are saturated are very slow to change, while output units whose inputs
have not yet saturated them can be strongly influenced by an error signal.

Figure (5.2) Derivative of Logistic Function
The Widrow-Hoff delta learning rule can be derived by considering the node of
Figure (5.3), where T is the target or desired value and I is defined by equation (5.1) as
the dot product of the weight and input vectors:

    I = Σ_{i=1}^{n} w_i x_i    (5.5)

For this derivation, no quantizer or other nonlinear activation function is included,
but the result presented here is equally valid when such nonlinear elements are included.
From Figure (5.3), the squared error ε² as a function of all the weights w_i can be
written as:

    ε² = (T - I)²    (5.6)

The gradient of the squared error is the vector of partial derivatives with respect to each
of these i weights:
Figure (5.3) Sketch of a neuron without activation function
    ∂ε²/∂w_i = -2(T - I) ∂I/∂w_i = -2(T - I) x_i    (5.7)

Since this gradient involves only the ith weight component, the summation of
equation (5.5) disappears.

The Widrow-Hoff delta-training rule ensures that the change in each weight vector
component is proportional to the negative of its gradient:

    Δw_i = -K ∂ε²/∂w_i = 2K(T - I) x_i = 2Kε x_i    (5.8)

where K is a constant of proportionality. The negative sign is introduced because a
minimization process is involved. It is common to normalize the input vector component
x_i by dividing by |X|². Equation (5.8) now becomes

    Δw_i = [2K / |X|²] ε x_i    (5.9)

where the learning constant η is defined to be equal to the term in the brackets:

    η = 2K / |X|²    (5.10)

so that Δw_i = η ε x_i.
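The delta rule update can be sketched for a single linear node; the target weights, learning constant, and training data below are illustrative assumptions chosen so the example is self-contained:

```python
# Widrow-Hoff (delta rule) training of a single linear node, per equations (5.5)-(5.10).
import random

random.seed(1)
w = [0.0, 0.0]              # weights to be learned
true_w = [0.7, -0.3]        # hypothetical target weights generating the desired outputs

eta = 0.1                   # learning constant
for _ in range(2000):
    x = [random.uniform(-1, 1) for _ in range(2)]
    T = sum(wt * xi for wt, xi in zip(true_w, x))   # desired output
    I = sum(wi * xi for wi, xi in zip(w, x))        # actual node output (linear)
    err = T - I
    # Delta rule: w_i <- w_i + eta * err * x_i
    w = [wi + eta * err * xi for wi, xi in zip(w, x)]
```

With noiseless targets, repeated application of the update drives the weights toward the values that generated the data.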
5.3 Multi-layer Back-propagation Training
Before discussing the details of the backpropagation process, let us consider the
benefits of the middle layer(s) in an artificial neural network. A network with only two
layers (input and output) can only represent the input with whatever representation
already exists in the input data. Hence, if the data are discontinuous or nonlinearly
separable, the innate representation is inconsistent, and the mapping cannot be learned.
Adding a third (middle) layer to the artificial neural network allows it to develop its own
internal representation of this mapping. Having this rich and complex internal
representation capability allows the hierarchical network to learn any mapping, not just
linearly separable ones.
Some guidance to the number of neurons in the hidden layer is given by
Kolmogorov's theorem as it is applied to artificial neural networks. In any artificial neural
network, the goal is to map any real vector of dimension m into any other real vector of
dimension n. Let us assume that the input vectors are scaled to lie in the region from 0 to
1, but there are no constraints on the output vector. Then, Kolmogorov's theorem tells us
that a three-layer neural network exists that can perform this mapping exactly (not an
approximation) and that the input layer will have m neurons, the output layer will have n
neurons, and the middle layer will have 2m + 1 neurons. Hence, Kolmogorov's theorem
guarantees that a three-layer artificial neural network will solve all nonlinearly separable
problems. What it does not say is that (1) this network is the most efficient one for this
mapping, (2) a smaller network cannot also perform this mapping, or (3) a simpler
network cannot perform the mapping just as well. Unfortunately, it does not provide
enough detail to find and build a network that efficiently performs the mapping we want.
It does, however, guarantee that a method of mapping does exist in the form of an
artificial neural network [Poggio and Girosi, (1990)].
Let us consider the three-layer network shown in Figure (5.4), where all activation
functions are logistic functions. It is important to note that back-propagation can be
applied to an artificial neural network with any number of hidden layers [Werbos,
(1994)]. The training objective is to adjust the weights so that the application of a set of
inputs produces the desired outputs. To accomplish this the network is usually trained
with a large number of input-output pairs, which we also call examples.
The training procedure is as follows:
1. Randomize the weights to small random values (both positive and negative) to
ensure that the network is not saturated by large values of weights. (If all
weights start at equal values, and the desired performance requires unequal
weights, the network would not train at all.)
2. Select a training pair from the training set.
3. Apply the input vector to network input.
4. Calculate the network output.
5. Calculate the error (the difference between the network output and the desired
output).
6. Adjust the weights of the network in a way that minimizes this error. (This
adjustment process is discussed later.)
7. Repeat steps 2-6 for each pair of input-output vectors in the training set until
the error for the entire system is acceptably low.
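The seven steps above can be sketched end-to-end for a toy one-input network; the data set, layer sizes, learning constant, and the linear output unit are all assumptions for illustration, not the networks used in this study:

```python
# Minimal sketch of the training procedure (steps 1-7) for a 1-3-1 network
# with logistic hidden units and a linear output unit.
import math, random

random.seed(0)

def logistic(I):
    return 1.0 / (1.0 + math.exp(-I))

# Step 1: randomize the weights to small values.
w1 = [[random.uniform(-0.5, 0.5) for _ in range(2)] for _ in range(3)]  # input+bias -> hidden
w2 = [random.uniform(-0.5, 0.5) for _ in range(4)]                      # hidden+bias -> output

data = [(x / 10.0, (x / 10.0) ** 2) for x in range(11)]  # toy input-output pairs

def forward(x):
    h = [logistic(w[0] * x + w[1]) for w in w1]
    y = sum(wi * hi for wi, hi in zip(w2[:3], h)) + w2[3]
    return h, y

def mse():
    return sum((T - forward(x)[1]) ** 2 for x, T in data) / len(data)

err0 = mse()
eta = 0.5
for _ in range(500):                 # step 7: repeat over the training set
    for x, T in data:                # step 2: select a training pair
        h, y = forward(x)            # steps 3-4: apply the input, compute the output
        err = T - y                  # step 5: error at the output
        # Step 6: adjust the weights (gradient descent on the squared error).
        for j in range(3):
            delta_h = err * w2[j] * h[j] * (1 - h[j])   # backpropagated error
            w1[j][0] += eta * delta_h * x
            w1[j][1] += eta * delta_h
        for j in range(3):
            w2[j] += eta * err * h[j]
        w2[3] += eta * err

assert mse() < err0   # training reduces the error over the whole set
```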
Figure (5.4) Sketch of multilayer backpropagation neural network
Training of an artificial neural network involves two passes. In the forward pass the
input signals propagate from the network input to the output. In the reverse pass, the
calculated error signals propagate backward through the network, where they are used to
adjust the weights. The calculation of the output is carried out, layer by layer, in the
forward direction. The output of one layer is the input to the next layer. In the reverse
pass, the weights of the output neuron layer are adjusted first since the target value of
each output neuron is available to guide the adjustment of the associated weights, using
the delta rule. Next, middle layer weights are adjusted. The problem is that the middle-
layer neurons have no target values. Hence, the training is more complicated, because the
error must be propagated back through the network, including the nonlinear functions,
layer by layer.
5.3.1 Calculation of Weights for the Output-Layer Neurons
Let us consider the details of the backpropagation learning process for the weights
of the output layer. Figure (5.5) is a representation of a train of neurons leading to the
hidden and output layers, with neurons 5 and 8, input weights w_25 and w_58, and outputs
Φ_25(I) and Φ_58(I), respectively. For illustration purposes, the path with a target value of
T_8 will be considered here in the calculation of weights. The notation (I) in Φ will be
dropped for convenience. The output of the neuron in layer k is subtracted from its target
value and squared to produce the squared error signal, which for neuron 8 at layer k is

    ε_8² = [T_8 - Φ_58]² = [T_8 - y_8]²    (5.11)

Figure (5.5) Layers of neurons for calculating weights at output during training

Since only one output error is involved,

    ε² = ε_8² = [T_8 - y_8]²    (5.12)
The delta rule indicates that the change in a weight is proportional to the rate of change of
the square error with respect to that weight, that is,
ds; — (5.13)
5W58
where ^jgis constant of proportionality called learning rate. To evaluate this partial
derivative, the chain rule of differentiation is used:
^ ggg- d y ^
Swjg 5^8 ^^58 ^^58
Each of these terms is evaluated in turn. The partial derivative of equation (5.12) with
respect to $y_8$ gives

$\frac{\partial \varepsilon_8^2}{\partial y_8} = -2[T_8 - y_8]$    (5.15)

From equation (5.4), we get

$\frac{\partial y_8}{\partial I_{58}} = \alpha\,y_8[1 - y_8]$    (5.16)

From Figure (5.4), $I_{58}$ is the sum of the weighted inputs from the middle layer, that is,

$I_{58} = \sum_{j=4}^{6} w_{j8}\,O_{mj}, \qquad m = j - 3$    (5.17)

Taking the partial derivative with respect to $w_{58}$ yields

$\frac{\partial I_{58}}{\partial w_{58}} = O_{25}$    (5.18)
Since we are dealing with one weight, only one term of the summation in equation (5.17)
remains. Substituting equations (5.15), (5.16), and (5.18) into equation (5.14) gives

$\frac{\partial \varepsilon_8^2}{\partial w_{58}} = -2\alpha[T_8 - y_8]\,y_8[1 - y_8]\,O_{25} = -\delta_{58}\,O_{25}$    (5.19)

where $\delta_{58}$ is defined as

$\delta_{58} = 2\alpha[T_8 - y_8]\,y_8[1 - y_8] = 2\varepsilon_8\,\frac{\partial y_8}{\partial I_{58}}$    (5.20)

Substituting equation (5.19) into equation (5.13) gives

$\Delta w_{58} = -\eta_{58}\,\frac{\partial \varepsilon_8^2}{\partial w_{58}} = \eta_{58}\,\delta_{58}\,O_{25}$    (5.21)

or

$w_{58}(N+1) = w_{58}(N) + \eta_{58}\,\delta_{58}\,O_{25}$    (5.22)

Equation (5.22) can be written for any neuron in layers j and k as

$w_{jk}(N+1) = w_{jk}(N) + \eta_{jk}\,\delta_{jk}\,O_{ij}$    (5.23)

where N is the number of iterations. An identical process is performed for each weight of
the output layer to give the adjusted values of the weights. The error term $\delta_{58}$ from
equation (5.20) is used to adjust the weights of the output-layer neurons using equations
(5.21) and (5.22). It is useful to discuss why the derivative of the activation function is
involved in this process. In equation (5.10), we calculated an error which must be
propagated back through the network. This error exists because the output neurons
generate the wrong outputs, for two reasons: (1) their own incorrect weights, and (2) the
middle-layer neurons generated the wrong outputs. To assign this blame, we
backpropagate the errors for each output-layer neuron, using the same interconnections
and weights as the middle layer used to transmit its outputs to the output layer.
When the weight between a middle-layer neuron and an output-layer neuron is large,
and the output-layer neuron has a very large error, the middle-layer neuron may be
assigned a very large error, even if that neuron had a very small output and could not
have contributed significantly to the output error. By applying the derivative of the
squashing function, this error is moderated, and only small to moderate changes are made
to the middle-layer weights, because of the bell-shaped curve of the derivative function
(shown in Figure (5.2)).
5.3.2 Calculation of Weights for the Hidden Layer Neurons
Backpropagation trains the hidden layers by propagating the adjusted error back
through the network, layer by layer, adjusting the weights of each layer as it goes. The
equations for the hidden layer are the same as for the output layer, except that the error
term $\delta_{25}$ must be generated without a target vector. We must therefore compute $\delta$ for each
neuron in the middle layer in a way that includes contributions from the errors of each
neuron in the output layer to which it is connected.
Consider a single neuron, number 5, in the hidden layer just before the output layer
(see Figure (5.5)). In the forward pass, this neuron propagates its output values to neurons
7, 8, and 9 in the output layer through the interconnecting weights $w_{57}$, $w_{58}$, and $w_{59}$.
During training, these weights operate in reverse order, passing the value of the error
$\delta$ from the output layer back to the hidden layer. For example, the error term $\delta_{58}$ is
produced when the signal flows back from neuron 8 to neuron 5. Each of the above
connection weights is multiplied by the $\delta$ value of the output-layer neuron to which it
connects. The value of $\delta$ needed for the hidden-layer neuron is produced by summing all
such products.
The arrangement in Figure (5.6) shows the errors that are backpropagated to
produce the change in $w_{25}$. Since all error terms of the output layer are involved, the
partial derivative involves a summation over the outputs of the neurons in the k-th layer.
The procedure for calculating $\delta_{25}$ is substantially the same as for $\delta_{58}$: start with
the derivative of the square error with respect to the weight for the middle layer that is to
be adjusted. Then, in a manner analogous to equation (5.13), delta rule training gives

$\Delta w_{25} = -\eta_{25}\,\frac{\partial \varepsilon^2}{\partial w_{25}}$    (5.24)
Figure (5.6) Representation of neurons for calculating the change of weight in the hidden layer
where the total mean square error is now defined as

$\varepsilon^2 = \sum_{k=7}^{9} \varepsilon_k^2 = \sum_{k=7}^{9} [T_k - y_k]^2$    (5.25)

Since several output errors may be involved, the learning constant $\eta_{25}$ is usually,
but not necessarily, equal to $\eta_{58}$.
Using the chain rule of differentiation, the last term of equation (5.24) can be written as

$\frac{\partial \varepsilon^2}{\partial w_{25}} = \sum_{k=7}^{9} \frac{\partial \varepsilon_k^2}{\partial y_k}\,\frac{\partial y_k}{\partial I_{5k}}\,\frac{\partial I_{5k}}{\partial O_{25}}\,\frac{\partial O_{25}}{\partial I_{25}}\,\frac{\partial I_{25}}{\partial w_{25}}$    (5.26)
Let us consider neurons 5 and 8. Each term in equation (5.26) is evaluated in a manner
similar to equation (5.14).
The first two terms are already given by equations (5.15) and (5.16):

$\frac{\partial \varepsilon_8^2}{\partial y_8} = -2[T_8 - y_8] = -2\varepsilon_8$    (5.27)

$\frac{\partial y_8}{\partial I_{58}} = \alpha\,y_8[1 - y_8]$    (5.28)

Taking the derivative of equation (5.17) with respect to $O_{25}$ gives

$\frac{\partial I_{58}}{\partial O_{25}} = w_{58}$    (5.29)

Note that only one connection is involved.
Changing equation (5.16) to account for the middle layer gives

$\frac{\partial O_{25}}{\partial I_{25}} = \alpha\,O_{25}[1 - O_{25}]$    (5.30)

Similarly, equation (5.17) can be rewritten in terms of the middle layer by substituting the i-th
layer input $x_2$ for the j-th layer input; taking the derivative gives

$\frac{\partial I_{25}}{\partial w_{25}} = x_2$    (5.31)

Substituting equations (5.27) through (5.31) into equation (5.26) yields

$\frac{\partial \varepsilon^2}{\partial w_{25}} = -\sum_{k=7}^{9} 2\alpha[T_k - y_k]\,y_k[1 - y_k]\,w_{5k}\;\alpha\,O_{25}[1 - O_{25}]\,x_2$    (5.32)

Using the definition of $\delta_{5k}$ in equation (5.20) yields

$\frac{\partial \varepsilon^2}{\partial w_{2n}} = -\sum_{k=7}^{9} \delta_{nk}\,w_{nk}\,\frac{\partial O_{2n}}{\partial I_{2n}}\,x_2, \qquad n = 4, 5, 6$    (5.33)

Defining $\delta_{2n}$ as

$\delta_{2n} = \frac{\partial O_{2n}}{\partial I_{2n}} \sum_{k=7}^{9} \delta_{nk}\,w_{nk}$    (5.34)

equation (5.33) becomes

$\frac{\partial \varepsilon^2}{\partial w_{2n}} = -\delta_{2n}\,x_2$    (5.35)

Substituting equations (5.33) and (5.34) into equation (5.24) gives

$\Delta w_{2n}(N+1) = \eta_{2n}\,\delta_{2n}\,x_2$    (5.36)

and hence

$w_{2n}(N+1) = w_{2n}(N) + \eta_{2n}(N)\,x_2\,\delta_{2n}(N)$    (5.37)
If there is more than one middle layer of neurons, this process moves through the
network, layer by layer, toward the input, adjusting the weights as it goes. After completion, a
new training input is applied and the process repeats until an acceptably low error is
reached. At this point the network is trained.
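The hidden-layer result of equations (5.34) and (5.37) can likewise be sketched for one hypothetical hidden neuron. The input, weights, output-layer deltas, and previous weight value below are all assumed for illustration:

```python
import numpy as np

alpha, eta = 1.0, 0.5                       # sigmoid slope and learning rate (assumed)
x_2 = 0.8                                   # input feeding hidden neuron 5 (assumed)
O_25 = 0.6                                  # forward-pass output of hidden neuron 5
w_5k = np.array([0.2, 0.3, -0.1])           # weights to output neurons 7, 8, 9
delta_5k = np.array([0.05, 0.18, -0.02])    # output-layer deltas from Eq. (5.20)

dO_dI = alpha * O_25 * (1 - O_25)           # squashing-function derivative, Eq. (5.30)
delta_25 = dO_dI * np.sum(delta_5k * w_5k)  # backpropagated hidden delta, Eq. (5.34)
w_25_new = 0.4 + eta * x_2 * delta_25       # weight update, Eq. (5.37); old weight assumed 0.4
```

Note how the hidden delta is just the weighted sum of the output deltas it feeds, scaled by the derivative of the squashing function, exactly as the text describes.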
5.4 Momentum

Gradient descent can be very slow if the learning rate $\eta$ is small, and it can
oscillate widely if $\eta$ is too large. This problem essentially results from error-surface
valleys with steep sides but a shallow slope along the valley floor. One efficient and
commonly used method that allows a large learning rate without divergent oscillations
is the addition of a momentum term to the normal gradient descent method
[Plaut et al., 1986]. The idea is to give the weight some inertia or momentum, so that it
tends to change in the direction of the average downhill force that it feels. Hence,
equation (5.23) becomes

$\Delta w_{jk}(N+1) = \eta_{jk}\,\delta_{jk}\,O_{ij} + \mu\,\Delta w_{jk}(N)$    (5.38)
112
where // is the momentum coejficient (typically about 0.9). The new value of the weight
then becomes equal to the previous value of the weight plus the weight change of
equation (5.21), which includes the momentum term. Equation (5.23) now becomes
= Wj^{N) + Ah'^^.(A^+1) (5.39)
This process works well in many problems, but not so well in others. Another way
of viewing the purpose of momentum is that it helps overcome the effects of local minima.
The momentum term will often carry a weight-change process through one or more local
minima, so that the network converges toward the global minimum. This is perhaps its most
important function.
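A minimal sketch of the momentum update of equations (5.38) and (5.39), with all numeric values assumed for illustration:

```python
eta, mu = 0.7, 0.9          # learning rate and momentum coefficient (assumed)
w, dw_prev = 0.3, 0.02      # current weight and previous weight change (assumed)
grad_term = 0.05            # eta * delta * O for the current pass (assumed)

dw = grad_term + mu * dw_prev   # Eq. (5.38): the new change carries inertia
w = w + dw                      # Eq. (5.39): apply the combined change
```

Because the previous change is folded in, a string of same-signed gradients accelerates the weight along the valley floor, while alternating gradients partially cancel.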
6. STOCHASTIC AND NEURAL NETWORK MODEL APPLICATIONS
6.1 Methodology
Assessment of the neural network approach or the stochastic approach under field
conditions requires a large amount of transmissivity and head data free of measurement
errors, which rarely exists. Model validation under field conditions is another issue
fraught with difficulty and uncertainty. A typical example can be seen by
contrasting two approaches to estimating the transmissivity distribution in Avra Valley,
Arizona: a composite objective function method used by Clifton and Neuman
(1982), and a geostatistical cokriging method developed for this site by Rubin and Dagan
(1987). There was a dramatic difference between the transmissivity
distributions predicted by these two methods. These differences are particularly
disturbing when one recognizes that these two methods are among the more advanced
and highly regarded techniques, and that this field involves an atypically large amount of
transmissivity data. In any case, the results from this field application are essentially
ambiguous; see Gelhar (1993) for more details.
In this study, two-dimensional steady-state saturated flow will be considered in the
stochastic simulations and the neural network design. Boundary conditions will be treated
deterministically so that the only random parameter is spatially heterogeneous
transmissivity. Transmissivity represents the permeability of the porous aquifer matrix,
with respect to the fluid properties of water, integrated vertically over the thickness of the
saturated zone of the aquifer. Data on transmissivity is commonly obtained from
pumping tests where a well is pumped for some period of time and water levels in the
well or nearby observation wells are monitored. Measurements thus obtained often show
less variability than measurements made in the laboratory on "undisturbed" aquifer
matrix samples or measurements obtained by slug injection or withdrawal, because
pumping tests average a greater volume of aquifer material. While the true sample
volume of a pumping test is unknown, treating transmissivity data as point measurements
makes sense only if the scale of the model is much greater than the scale encompassed by
the pumping test. A heterogeneous property averaged over larger volumes of sample
shows less variability and a longer correlation scale [Journel and Huijbregts, 1978] than
does the theoretical underlying point-scale random field. Using a finite element or finite
difference discretization for a numerical solution can introduce an artificial correlation
between transmissivity values unless the element size is much smaller than the
correlation scale of the transmissivity field [Dagan, 1985].
Because of the problems described above, for this study, neural networks and
stochastic approaches were assessed and compared using hypothetical heterogeneous
aquifers. It was assumed that all necessary statistical parameters characterizing the spatial
variability of Ln[T] are known exactly and measurements are considered error-free.
To demonstrate the performance of the neural network simulation, several
realizations of transmissivity fields consisting of different degrees of heterogeneity for a
hypothetical two-dimensional aquifer were generated, and these fields are considered as
the real world or "true" fields. In this analysis, the perturbed f-field (transmissivity) was
generated using a spectral random field generator (Gutjahr, 1989, 1992, 1994), assuming
an exponential covariance function. Using the generated transmissivity fields, head fields
for these hypothetical aquifers were obtained from the governing flow equation (1) using
a multigrid finite difference method (Schaffer, 1995) with known boundary
conditions of types 1 and 2. The resultant head field is considered as the real-world or
"true" hydraulic head corresponding to the "true" transmissivity field.
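As an illustrative stand-in for the spectral generator used in the study (which is not reproduced here), a small Gaussian Ln[T] field with an exponential covariance can be drawn by Cholesky factorization of the covariance matrix. The grid size, seed, and jitter are assumed; the mean, variance, and integral scale follow the hypothetical aquifer of Figure (6.1):

```python
import numpy as np

n, lam, var, mean_lnT = 8, 0.5, 1.0, -3.0   # grid side, integral scale, sigma_f^2, mean Ln[T]
rng = np.random.default_rng(0)

# block-center coordinates of an n-by-n grid of unit blocks
xy = np.array([(i, j) for i in range(n) for j in range(n)], dtype=float)
d = np.linalg.norm(xy[:, None, :] - xy[None, :, :], axis=2)   # pairwise distances
C = var * np.exp(-d / lam)                                     # exponential covariance
L = np.linalg.cholesky(C + 1e-10 * np.eye(n * n))              # jitter for stability
f = mean_lnT + L @ rng.standard_normal(n * n)                  # one unconditional realization
field = f.reshape(n, n)
```

Cholesky generation is exact but scales as O(N^3) in the number of grid blocks, which is why spectral (FFT-based) generators are preferred for large grids such as the 32 x 32 domain used here.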
The layout of the hypothetical aquifer is shown in Figure (6.1). For the purpose of
random field simulation and numerical solution of the governing flow equation, the
domain was discretized into a grid of 32 x 32 uniform blocks, each measuring 1 x 1. Each
block was assigned a constant mean transmissivity value. The upper and lower sides of
the aquifer were defined as no-flux boundaries, and the left and right-hand sides were
assigned as prescribed head boundaries (to maintain a uniform gradient of 0.1). Once the
transmissivity field of the hypothetical aquifer had been generated and the corresponding
head field simulated, either a random or a specified sampling scheme was then
employed to determine the sample locations for transmissivity and head data in the
hypothetical aquifer. Note that the aquifer dimensions could be scaled by any factor,
provided the conductivity and constant head boundaries are scaled by the same factor
(Harvey and Gorelick, 1995).
Figure (6.1) Hypothetical aquifer layout (32 x 32 grid of uniform 1 x 1 blocks; mean Ln[T] = -3.0; integral scale = 0.5; σ_f² = 1, 2, and 5; no-flow upper and lower boundaries)
Both the stochastic (ICS) and the neural network approaches are examined and
compared using the mean square error (MSE) criterion:

$\mathrm{MSE} = \frac{1}{N}\sum_{i=1}^{N} (O_i - E_i)^2$    (6.1)

where $O_i$ and $E_i$ are the observed and estimated head or transmissivity values at location
i, respectively, and N is the total number of nodes compared.
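The comparison criterion of equation (6.1) amounts to the following computation; the observed and estimated values below are assumed for illustration:

```python
import numpy as np

# observed and estimated values at N compared nodes (assumed example data)
observed = np.array([0.9, -1.2, 0.3, 2.0])
estimated = np.array([1.0, -1.0, 0.2, 1.7])

mse = np.mean((observed - estimated) ** 2)   # Eq. (6.1)
```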
6.2 Iterative Conditional Simulation (ICS)
Realizations of f-fields for three different cases, with an exponential correlation
scale of 0.5 (approximately one sixth of the domain) in both the x and y directions, were
generated for the stochastic model application. All cases used a mean transmissivity
value of Ln[T] = -3.0, and each case used a different variance of transmissivity, namely
1, 2, and 5.
The next step is to obtain the h-field that satisfies the full nonlinear flow equation.
Both the generated f-field and the boundary conditions given by the gradient of h are used
to solve for the h-field. Both the f and h fields were sampled at specified locations using
both uniform and non-uniform sampling schemes. For the uniform sampling scheme,
thirty evenly spaced data locations of both the f and h fields, as shown in Figure (6.2),
were sampled. Figure (6.3) shows the ten transmissivity and thirty head sample locations
for the non-uniform sampling scheme. Note that for both sampling schemes, f and h data
are not necessarily sampled at the same locations.
Figure (6.2) Uniform sampling scheme of transmissivity and head data

Figure (6.3) Non-uniform sampling scheme of transmissivity and head data
The final step is to compute covariances and cross-covariances between the actual
conditioning locations, and to compute covariances between all grid locations and the
conditioning locations, to obtain the kriging matrices for conditioning. At this point, the
model is ready for actual conditioning.

To perform an actual conditioning, a new random f-field (realization) has to be
generated and conditioned on the previously selected f data values at the specified locations.
The corresponding h-field is obtained in the same manner mentioned above. As
conditioning proceeds, these two conditioned fields should match, within measurement error,
the values of the sampled data. Note that the conditioned f and h fields do not actually
satisfy the flow equation; they satisfy only the linearized form of the flow equation, and
the discrepancy grows when high variances of transmissivity are used. To overcome this
limitation, an alternative method, which involves iterative conditioning, is used.
In the iterative approach, the transmissivity field is conditioned on both the head
and transmissivity measurements. This transmissivity field is then used, along with the
boundary conditions, to solve for a new head field. The resulting fields satisfy the
continuity conditions, but the iterated heads may not agree with the measured heads.
Consequently, the new head field at the previous data locations is used again to condition
the transmissivity field, and the process is repeated until the iterated heads are "close" to
the sampled heads. Unfortunately, the solution to the actual flow equation is very
different from that of the linearized equation for high variances of transmissivity. Thus,
iteration can only reduce the head differences; it can never completely remove
them. However, this procedure seems to improve the head differences significantly after
four to six iterations.
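The iterative loop above can be sketched schematically. The `condition_f` and `solve_head` functions below are toy stand-ins (hypothetical names with deliberately simplified linear behavior), not the actual kriging and multigrid codes used in the study; the numeric values are assumed:

```python
def condition_f(f, f_obs, h_err):
    # toy "conditioning" step: nudge the field against the head misfit
    return f - 0.5 * h_err

def solve_head(f):
    # toy "flow solver": head responds linearly to the field here
    return 0.9 * f

f, f_obs, h_obs = 0.0, 0.0, -1.8       # initial field and observed data (assumed)
for iteration in range(6):             # four to six iterations usually sufficed
    h = solve_head(f)                  # solve for a new head field
    misfit = h - h_obs
    if abs(misfit) < 1e-3:             # iterated heads "close" to the samples
        break
    f = condition_f(f, f_obs, misfit)  # recondition on the head misfit
```

Even in this toy setting the loop only shrinks the head misfit geometrically rather than eliminating it, mirroring the observation that iteration improves but never fully removes the head differences.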
6.3 Neural Network Simulation
The successful performance of a neural network is sensitive to its physical structure
or architecture (i.e., number of input nodes, hidden layer nodes, and output nodes). The
appropriate architecture for a neural network is highly problem dependent. In order to
map inputs of the real system to the system's outputs, the structure of the neural network
must first be defined. The number of neurons in the input and output layers coincides
with the number of the system inputs and outputs, respectively. The number of neurons
in the hidden layer H must be found empirically. The notation I-H-O is often used to
denote a network with I neurons in the input layer, H neurons in the hidden layer, and O
neurons in the output layer. For example, 4-5-1 ANN has four, five, and one neuron in
the input, hidden, and output layers, respectively.
To reproduce relationships between the input and output data sets of a system, an ANN
must be trained. Training consists of (i) calculating output sets from input sets, (ii)
comparing measured and calculated outputs, and (iii) adjusting the weights and the bias
in the transfer function for each neuron to decrease the difference between measured and
calculated values. Each input-output data set, often called a pattern, has to be
presented to the ANN many times until the changes in errors become insignificant. The
mean square sum of differences between measured and calculated values serves as a
measure of the goodness-of-fit.
In this study, a feed-forward backpropagation neural network was used to map
input of the real system to its output. Knowledge acquisition during training is
accomplished by supervised learning, in which the neural network is provided with the
desired response to the training input patterns. This learning approach allows the neural
network to develop generalization or abstraction capabilities, unlike unsupervised
learning (self-organization) or reinforcement (grading output).
6.3.1 Development of training and testing patterns
Generated transmissivity fields (realizations), as explained in Section (2.5), served as
input training patterns for the ANN. Each training pattern consists of an input and an output
set of data values. Transmissivity values at sampled locations are used as input to the
ANN (the total number of neurons in the input layer equals the total number of sampled
locations). The corresponding outputs are the values of transmissivity at unsampled
locations (the total number of neurons in the output layer equals the total number of
unsampled locations). The number of neurons in the hidden layer H is usually found
empirically; one suggestion is the geometric mean of the number of neurons in the input and
output layers, as recommended by Maren et al. (1990); another, based on Kolmogorov's
theorem (1957), suggests that the number of neurons in the hidden layer is H = 2I + 1 (twice
the number of input neurons plus one). Both suggestions were tried in this study, and no
significant difference was found in the performance measure of the ANN. In order to keep
the ANN as simple as possible, and without compromising performance, the second
suggestion was used, guaranteeing a smaller number of connection weights.
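The two sizing rules can be compared for the uniform-sampling f-ANN described below (30 inputs, 994 outputs); the geometric-mean value is rounded here for illustration, while the 2I+1 rule gives the 61 hidden neurons actually used in the study:

```python
import math

I, O = 30, 994                          # input and output layer sizes of the f-ANN
H_geometric = round(math.sqrt(I * O))   # Maren et al. (1990) geometric-mean rule
H_kolmogorov = 2 * I + 1                # 2I+1 rule, attributed to Kolmogorov's theorem
```

The second rule yields far fewer hidden neurons, and hence far fewer connection weights, which is the stated reason for its selection.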
In this study, two different artificial neural networks (ANNs) were built to simulate
the transmissivity field of the hypothetical aquifer shown in Figure (6.1). Both ANNs have
three feed-forward layers (input, hidden, and output), and they both share the same output
target (transmissivities at unsampled locations). The first ANN, named f-ANN and shown
in Figure (6.4), uses only the transmissivity values at sampled locations (f_1 ... f_30) in its
input layer. The corresponding output layer has neurons representing all
unknown (to be estimated) transmissivity values at unsampled locations, as shown in
Figure (6.4); this design corresponds to the aquifer with the uniform sampling scheme
shown in Figure (6.2). With this input scheme, the f-ANN is written as 30-61-994 (input-
hidden-output). The second ANN, named fh-ANN, is the same as f-ANN, except that it
utilizes both the sampled transmissivity and head fields in its input layer. The fh-ANN is
designated as 60-121-994. A sketch of this ANN is shown in Figure (6.5); this design
corresponds to the aquifer with the non-uniform sampling scheme shown in Figure (6.3).
The success of the training strongly depends on the normalization of the data and
on the training parameters, namely the learning rate and momentum coefficients.
Unfortunately, there are no general rules for selecting the optimal values of these
parameters, and one must resort to experience. In this study, both networks were trained
the same way, using a total of 2000 generated training patterns and a learning rate and
momentum of 0.7 and 0.8, respectively.
Figure (6.5) Architecture of fh-ANN (US: 60-121-994; NUS: 40-81-994)
The training procedure is as follows:

Step 0: Connection weights are first randomly initialized within the interval
(-0.3, 0.3). This is done using a standard random generator algorithm to produce
uniformly distributed random numbers in a specified range. A small range was
selected to ensure that the network was not saturated by large values of the weights.

Step 1: Pairs of training patterns are selected randomly from the training sets and
presented to the ANN. Note that the data were normalized to fall within the interval
(0.2, 0.8).

Step 2: The input vector is applied to the network input.

Step 3: The network output is calculated.

Step 4: The mean square error (MSE) is calculated, which is the difference between
the network output and the desired output.

Step 5: Connection weights are adjusted to minimize the MSE.

Step 6: Steps 1-5 are repeated for each pair of input-output vectors in the training
set, until no significant change in the MSE is detected for the system, or until a
desired value of the MSE is reached.
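Steps 0 through 6 can be sketched as a minimal training loop. The 2-3-1 network, toy patterns, and stopping tolerance below are assumed for illustration and are far smaller than the study's networks; the backpropagated deltas fold the constant factors of equation (5.20) into the learning rate:

```python
import numpy as np

rng = np.random.default_rng(1)

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

# Step 0: weights initialized uniformly within (-0.3, 0.3)
W1 = rng.uniform(-0.3, 0.3, size=(2, 3))
W2 = rng.uniform(-0.3, 0.3, size=(3, 1))

# Step 1: toy patterns, already normalized into (0.2, 0.8)
X = np.array([[0.2, 0.7], [0.4, 0.3], [0.6, 0.5], [0.8, 0.2]])
T = np.array([[0.2], [0.4], [0.6], [0.8]])   # target tracks the first input

eta, mu = 0.7, 0.8                  # learning rate and momentum, as in the study
dW1, dW2 = np.zeros_like(W1), np.zeros_like(W2)
history = []
for epoch in range(5000):
    O1 = sigmoid(X @ W1)            # Steps 2-3: forward pass through both layers
    Y = sigmoid(O1 @ W2)
    mse = np.mean((T - Y) ** 2)     # Step 4: mean square error
    history.append(mse)
    if mse < 1e-4:                  # Step 6: stop at a desired MSE
        break
    d2 = (T - Y) * Y * (1 - Y)      # Step 5: output deltas (constants folded into eta)
    d1 = (d2 @ W2.T) * O1 * (1 - O1)    # hidden deltas, Eq. (5.34) form
    dW2 = eta * O1.T @ d2 + mu * dW2    # momentum update, Eq. (5.38)
    dW1 = eta * X.T @ d1 + mu * dW1
    W2 += dW2
    W1 += dW1
```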
After training is completed, the final connection weights are kept fixed, and new
input patterns are presented to the network to produce the corresponding output,
consistent with the internal representation of the input/output mapping. Notice that the
nonlinearity of the sigmoidal function in the processing elements allows the neural
network to learn arbitrary nonlinear mappings. Moreover, each node acts independently
of all the others, and its functioning relies only on the local information provided through
the adjoining connections. In other words, the functioning of one node does not depend
on the states of other nodes to which it is not connected. This allows for efficient
distributed representation and parallel processing, and for intrinsic fault tolerance and
generalization capability.
Two common strategies for measuring how successfully an ANN has been trained
are to test the capability of the network to correctly predict the output for (1) the input
sets that were originally used to train the network, and (2) input sets that were not in the
training set. The first strategy is often referred to as accuracy performance, and the
second as generalization performance. Both f-ANN and fh-ANN showed a high average
accuracy performance of 99.9%; the average generalization performance over 100
new patterns was 95.4% and 96.3% for f-ANN and fh-ANN, respectively. Note that these
high performance scores do not mean that a perfect match is found between the
network output and the desired output; rather, they indicate that the networks are
healthy, well trained, and capable of generalizing, regardless of their output.
7. RESULTS AND DISCUSSION
To examine both the stochastic and neural approaches, 100 f-fields for each of three f
variances (1.0, 2.0, and 5.0) were generated, and their corresponding h-fields simulated.
Both the f and h fields represent the real or "true" transmissivity and head distributions over the
entire domain. Both uniform and non-uniform sampling schemes were utilized for both
the stochastic and neural network applications. These applications were used to estimate
the true fields for all realizations. Although similar results were achieved for all fields,
only three randomly selected fields for each variance are discussed in detail below.
Combining the three variances with the two sampling schemes produced six
scenarios, shown in Table (7.1). The table also shows the number of f and h data used for
each scenario. The results of the estimation methods are shown in figures, each having
seven contour maps. The contour maps show true and simulated f or h fields generated
by the ICS and the two neural networks (fh-ANN, f-ANN), utilizing both the uniform
and non-uniform sampling schemes. Corresponding to these contour maps are scatter-
plots depicting real versus simulated f or h values at unsampled locations.
Table (7.1) Scenarios considered in the model applications

Scenario | Sampling Scheme   | # of f data | # of h data | σ_f²
---------|-------------------|-------------|-------------|-----
1        | Uniform (US)      | 30          | 30          | 1
2        | Non-Uniform (NUS) | 10          | 30          | 1
3        | Uniform (US)      | 30          | 30          | 2
4        | Non-Uniform (NUS) | 10          | 30          | 2
5        | Uniform (US)      | 30          | 30          | 5
6        | Non-Uniform (NUS) | 10          | 30          | 5
For visual comparisons, all contour maps of the same figure are scaled with the
same range of colors as shown in the bar-scale. As the legends show, color shading
indicates the magnitudes of the transmissivity values over the domain.
Contour maps are shown in Figures (7.1) through (7.3) for Scenarios 1 and 2. Each
figure represents one of the three randomly selected transmissivity fields of variance 1.0.
For both scenarios, all three methods performed well in estimating the overall spatial
variability of the "true" transmissivity fields. The results of Scenario 1 (uniform
sampling) were always superior to those of Scenario 2 (non-uniform sampling),
especially for the conditional iterative simulation.
Figure (7.1) True and estimated f fields for σ_f² = 1.0, (RF#1)
Figure (7.2) True and estimated f fields for σ_f² = 1.0, (RF#2)
Figure (7.3) True and estimated f fields for σ_f² = 1.0, (RF#3)
The contour maps show that in general, the neural networks tend to smooth the
transmissivity fields. That is, gradual transitions from lower to higher regions of
transmissivity exist. This contrasts with the ICS, which produces abrupt changes in the
transmissivity fields.
For both Scenarios 1 and 2, the (fh-ANN) neural network outperformed both the (f-
ANN) neural network and the ICS. This is verified by the computed mean square errors
as shown by the bar charts in Figure (7.4). The scatter-plots also show how uniform
sampling (Scenario 1) resulted in better estimates of the true transmissivity values than
non-uniform sampling (Scenario 2). This is shown by the greater concentration of points
along the 45-degree line for Scenario 1.
The results of Scenarios 3 and 4, shown in Figures (7.5) through (7.7), had similar
behavior to Scenarios 1 and 2. However, since the variance is twice that of Scenarios 1
and 2, the computed mean square errors for Scenarios 3 and 4 were larger, as shown in
Figure (7.8). Also, the discrepancy between the uniform (Scenario 3) and non-uniform
(Scenario 4) sampling estimates increased, particularly for the ICS and the (f-ANN)
neural network.
The results of Scenarios 5 and 6 are shown in Figures (7.9) through (7.11). There
is a significant difference in performance between the ICS and the neural networks. As
shown by the contour maps, regions of shading not seen in the true field appear in the
ICS fields. This indicates poorer performance in estimating the true transmissivity fields.
As with earlier Scenarios, the neural networks were better able to estimate the true fields.
This is shown by the close agreement in color shading of the contour maps. Both the
Figure (7.4) True vs. estimated ICS, fh-ANN, and f-ANN f fields for σ_f² = 1.0, (RF# 1, 2, and 3)
Figure (7.5) True and estimated f fields for σ_f² = 2.0, (RF#1)
Figure (7.6) True and estimated f fields for σ_f² = 2.0, (RF#2)
Figure (7.7) True and estimated f fields for σ_f² = 2.0, (RF#3)
Figure (7.8) True vs. estimated ICS, fh-ANN, and f-ANN f fields for σ_f² = 2.0, (RF# 1, 2, and 3)
Figure (7.9) True and estimated f fields for σ_f² = 5.0, (RF#1)
Figure (7.10) True and estimated f fields for σ_f² = 5.0, (RF#2)
Figure (7.11) True and estimated f fields for σ_f² = 5.0, (RF#3)
scatter-plots and computed mean square errors shown in Figure (7.12) further support
these observations.
For all scenarios, the corresponding head contour maps and scatter-plots are shown
in Figures (7.13) through (7.24). Because of the non-linearity between f and h, the results
of the scenarios for head are not necessarily consistent with those of transmissivity.
The (f-ANN) neural network generally performed poorly in estimating the true
head field for almost all scenarios. As in the transmissivity case, all estimation methods
performed better with the uniform sampling scheme than with the non-uniform one.
In all scenarios utilizing non-uniform sampling, the (fh-ANN) neural network
outperformed the other two estimation methods. The ICS achieved better results in
estimating the head fields than the transmissivity fields. This is particularly true for the
uniform sampling scheme.
Figures (7.25) through (7.27) are scatter-plots depicting real versus simulated head
at all sampled locations for variances 1.0, 2.0, and 5.0, respectively. Each figure shows
both uniform and non-uniform sampling schemes for each of the three randomly selected
realizations.
For all variances of f, the conditional iterative simulation reproduces the exact
values of transmissivity at the sampled locations. For f variances of 1.0 and 2.0, the ICS
approach preserves the true values of h at the sampled locations. However, at a variance
of 5.0, the true h values were not preserved at all sampled locations, especially for the
Figure (7.12) True vs. estimated f fields for σ_f² = 5.0, (RF# 1, 2, and 3)
Figure (7.13) True and simulated h fields for σ_f² = 1.0, (RF#1)
[Figure (7.14) True and simulated h fields for σ_f² = 1.0, (RF#2). Panels as in Figure (7.13).]
[Figure (7.15) True and simulated h fields for σ_f² = 1.0, (RF#3). Panels as in Figure (7.13).]
[Figure (7.16) True vs. simulated ICS, fh-ANN, and f-ANN h fields for σ_f² = 1.0, (RF# 1, 2, and 3). Scatter-plots of true h against simulated h under uniform and non-uniform sampling.]
[Figure (7.17) True and simulated h fields for σ_f² = 2.0, (RF#1). Panels: (a) true h field; (b) ICS (US); (c) ICS (NUS); (d) fh-ANN (US); (e) fh-ANN (NUS); (f) f-ANN (US); (g) f-ANN (NUS); Scenarios 3 and 4.]
[Figure (7.18) True and simulated h fields for σ_f² = 2.0, (RF#2). Panels as in Figure (7.17).]
[Figure (7.19) True and simulated h fields for σ_f² = 2.0, (RF#3). Panels as in Figure (7.17).]
[Figure (7.21) True and simulated h fields for σ_f² = 5.0, (RF#1). Panels: (a) true h field; (b) ICS (US); (c) ICS (NUS); (d) fh-ANN (US); (e) fh-ANN (NUS); (f) f-ANN (US); (g) f-ANN (NUS); Scenarios 5 and 6.]
[Figure (7.22) True and simulated h fields for σ_f² = 5.0, (RF#2). Panels as in Figure (7.21).]
[Figure (7.23) True and simulated h fields for σ_f² = 5.0, (RF#3). Panels as in Figure (7.21).]
[Figure (7.25) True vs. simulated ICS (o), fh-ANN (Δ), and f-ANN (+) head at sampled locations for σ_f² = 1.0, (RF# 1, 2, and 3). Uniform and non-uniform sampling shown side by side.]
[Figure (7.26) True vs. simulated ICS (o), fh-ANN (Δ), and f-ANN (+) head at sampled locations for σ_f² = 2.0, (RF# 1, 2, and 3). Uniform and non-uniform sampling shown side by side.]
[Figure (7.27) True vs. simulated ICS (o), fh-ANN (Δ), and f-ANN (+) head at sampled locations for σ_f² = 5.0, (RF# 1, 2, and 3). Uniform and non-uniform sampling shown side by side.]
non-uniform sampling scheme. This failure to preserve all true values may be due to the
high variance of f magnifying the highly non-linear relationship between f and h.
Unlike the ICS, the neural network does not preserve the true values of h at the
sampled locations. For the neural network to do so, its estimated transmissivity fields
would have to be conditioned, which is not possible. This is because the neural network,
after sufficient training, maps input to output with a fixed weight vector. This weight
vector does not condition the mapping to reproduce values at the sampled locations, but
seeks to minimize the global error.
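This distinction can be sketched numerically: a conditioned field honors the measured values exactly, while a fixed-weight network estimate only minimizes a global error. The grid size, sample count, and noise level below are hypothetical illustrations:

```python
import numpy as np

def preserves_samples(field, sample_idx, sample_vals, tol=1e-8):
    """True if a field reproduces the measured values at the sampled cells."""
    return bool(np.allclose(np.asarray(field).ravel()[sample_idx],
                            sample_vals, atol=tol))

rng = np.random.default_rng(1)
true_f = rng.normal(size=(10, 10))                     # hypothetical f field
idx = rng.choice(true_f.size, size=12, replace=False)  # 12 sampled cells
vals = true_f.flat[idx]

# A conditioned field is forced to honor the data exactly ...
conditioned = rng.normal(size=(10, 10))
conditioned.flat[idx] = vals

# ... while a fixed-weight network estimate carries a global residual error.
network_est = true_f + rng.normal(0.0, 0.1, size=true_f.shape)

print(preserves_samples(conditioned, idx, vals))  # True
print(preserves_samples(network_est, idx, vals))  # False: the residual does
                                                  # not vanish at the samples
```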
Figures (7.28) through (7.33) show the mean square error of f or h for each of the 100
realizations for the three variances and two sampling schemes. The legend in each figure
indicates the number of realizations for which the neural networks outperformed the ICS.
For a variance of 1.0 with a uniform sampling scheme, both neural networks
outperformed the ICS for all f realizations, as shown in Figure (7.28a). For the
non-uniform sampling scheme, only the fh-ANN always outperformed the ICS. However,
as shown in Figure (7.28b), the f-ANN outperformed the ICS in only 59 of the 100 f
realizations.
Similar results were obtained for a variance of 2.0 for the f fields. For the uniform
sampling scheme, both neural networks always outperformed the ICS, as shown in Figure
(7.29a). For the non-uniform sampling scheme, the fh-ANN and f-ANN neural networks
outperformed the ICS 94 and 60 times, respectively, as shown in Figure (7.29b).
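These legend counts amount to counting, realization by realization, where the network's MSE falls below that of the ICS. A minimal sketch, using hypothetical error arrays in place of the actual per-realization results:

```python
import numpy as np

def times_outperformed(mse_ann, mse_ics):
    """Number of realizations for which the network's MSE beats the ICS's."""
    return int(np.sum(np.asarray(mse_ann) < np.asarray(mse_ics)))

# Hypothetical per-realization errors over 100 realizations.
rng = np.random.default_rng(2)
mse_ics = rng.uniform(0.4, 1.2, size=100)
mse_fh = rng.uniform(0.1, 0.5, size=100)

print(times_outperformed(mse_fh, mse_ics), "of 100 realizations")
```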
[Figure (7.28) Mean Square Error (MSE) of 100 f fields for σ_f² = 1.0. Panels: (a) fu1 (US), legend ICS, fh-ANN (100), f-ANN (100); (b) fn1 (NUS), legend ICS, fh-ANN (100), f-ANN (59).]
[Figure (7.29) Mean Square Error (MSE) of 100 f fields for σ_f² = 2.0. Panels: (a) fu2 (US), legend ICS, fh-ANN (100), f-ANN (100); (b) fn2 (NUS), legend ICS, fh-ANN (94), f-ANN (60).]
[Figure (7.30) Mean Square Error (MSE) of 100 f fields for σ_f² = 5.0. Panels: (a) fu5 (US), legend ICS, fh-ANN (100), f-ANN (100); (b) fn5 (NUS), legend ICS, fh-ANN (100), f-ANN (100).]
[Figure (7.31) Mean Square Error (MSE) of 100 h fields for σ_f² = 1.0. Panels: (a) hu1 (US), legend ICS, fh-ANN (41), f-ANN (7); (b) hn1 (NUS), legend ICS, fh-ANN (95), f-ANN (8).]
[Figure (7.32) Mean Square Error (MSE) of 100 h fields for σ_f² = 2.0. Panels: (a) hu2 (US), legend ICS, fh-ANN (36), f-ANN (12); (b) hn2 (NUS), legend ICS, fh-ANN (58), f-ANN (2).]
[Figure (7.33) Mean Square Error (MSE) of 100 h fields for σ_f² = 5.0. Panels: (a) hu5 (US), legend ICS, fh-ANN (59), f-ANN (39); (b) hn5 (NUS), legend ICS, fh-ANN (93), f-ANN (36).]
For a variance of 5.0, both neural networks always outperformed the ICS for both
the uniform and non-uniform sampling schemes, as shown in Figures (7.30a) and (7.30b),
respectively. Note that both figures show that about 10 percent of the estimated ICS
fields have very high mean square errors.
The corresponding results for the head fields are shown in Figures (7.31) through (7.33).
As previously mentioned, the relationship between f and h is non-linear. Consequently, the
results for h are not consistent with those obtained for f. For example, as Figure (7.31a)
shows, for a variance of 1.0 with a uniform sampling scheme, the fh-ANN outperformed
the ICS only 41 times, as compared to 100 times obtained for the f realizations. Better
performance was obtained for the same variance with the non-uniform sampling scheme,
as shown in Figure (7.31b). In this case, the fh-ANN outperformed the ICS 95 times.
This supports the earlier observation that the ICS did not estimate the h field as
accurately for the non-uniform sampling scheme as it did for the uniform sampling
scheme. The f-ANN outperformed the ICS only 7 and 8 times for the uniform and
non-uniform sampling schemes, respectively.
For the variance of 2.0, the fh-ANN outperformed the ICS 36 and 58 times for the
uniform and non-uniform sampling schemes, respectively, as shown in Figures (7.32a)
and (7.32b). For the f-ANN, performance under both sampling schemes did not differ
significantly from that obtained at a variance of 1.0.
For the variance of 5.0, both neural networks achieved their best overall
performance against the ICS, as shown in Figures (7.33a) and (7.33b). This is largely a
consequence of significantly poorer ICS performance with higher variance, rather than
improvement in neural network performance. This is underscored by the fact that the
mean square error for all approaches increases as the variance increases.
The level of neural network performance can be attributed to how successfully the
network is trained. The mean square errors for all scenarios as a function of iteration
during training are shown in Figure (7.34). A lower mean square error for the fh-ANN
was always achieved for both uniform and non-uniform sampling schemes, indicating
greater success in training. This is because this neural network, utilizing more processing
elements in its input layer, uses more information to estimate the fields. This behavior is
true for all three f variances, indicating network stability. Table (7.2) tabulates the final
mean square error achieved at the end of the 15,000th iteration for all trained neural
networks.
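The training curves of Figure (7.34) come from logging the global MSE at each backpropagation iteration. A toy version of such a run is sketched below with a one-hidden-layer tanh network; the data, network size, learning rate, and iteration count are hypothetical and far smaller than the 15,000-iteration runs reported here:

```python
import numpy as np

rng = np.random.default_rng(3)

# Hypothetical training patterns standing in for the sampled f (and h) inputs
# and the target output fields.
X = rng.normal(size=(200, 4))
y = np.tanh(X @ rng.normal(size=(4, 1)))

W1 = rng.normal(scale=0.1, size=(4, 8))   # input -> hidden weights
W2 = rng.normal(scale=0.1, size=(8, 1))   # hidden -> output weights
lr, history = 0.05, []

for _ in range(2000):
    hidden = np.tanh(X @ W1)              # forward pass
    err = hidden @ W2 - y
    history.append(float(np.mean(err ** 2)))
    grad_out = 2.0 * err / len(X)         # backpropagate the MSE gradient
    gW2 = hidden.T @ grad_out
    gW1 = X.T @ (grad_out @ W2.T * (1.0 - hidden ** 2))
    W1 -= lr * gW1
    W2 -= lr * gW2

print(f"MSE fell from {history[0]:.4f} to {history[-1]:.4f}")
```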
[Figure (7.34) Mean Square Error (MSE) fluctuations during training. Three panels (σ_f² = 1.0, 2.0, and 5.0), each showing training MSE versus iterations (0 to 16,000) for fh-ANN (US), fh-ANN (NUS), f-ANN (US), and f-ANN (NUS).]
Table (7.2) MSE of ANNs at the end of 15,000 iterations for different variances

Variance   ANN      Sampling   Training MSE   Testing MSE
1.0        fh-ANN   US         0.0445         0.0481
1.0        fh-ANN   NUS        0.0508         0.0559
1.0        f-ANN    US         0.0625         0.0664
1.0        f-ANN    NUS        0.0680         0.0742
2.0        fh-ANN   US         0.0502         0.0480
2.0        fh-ANN   NUS        0.0505         0.0557
2.0        f-ANN    US         0.0642         0.0668
2.0        f-ANN    NUS        0.0690         0.0756
5.0        fh-ANN   US         0.0473         0.0493
5.0        fh-ANN   NUS        0.0507         0.0568
5.0        f-ANN    US         0.0674         0.0710
5.0        f-ANN    NUS        0.0693         0.0798
The goal of the conditional simulation was not only to preserve the true values at
the sampled locations but also to conform to the pre-established statistics of the true
fields. For all scenarios, the mean and variance were computed for the f and h fields
estimated by the three approaches. Figures (7.35) and (7.36) are scatter-plots depicting
true versus computed means for the f and h fields, respectively. Each represents a
different scenario, and exhibits the results for the 100 realizations.
It is clear from both figures that the estimated means of the f and h fields for all three
approaches are in good agreement with the true field for the uniform sampling scheme.
For the non-uniform sampling scheme, the points are scattered about the 45-degree line
with greater dispersion. The ICS always outperformed both neural networks in
preserving the mean f and h fields. This is because the ICS conditions the field and
preserves its statistics. In contrast, the neural networks are not constrained to preserve
the pre-established statistics.
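Checking conformance to the pre-established statistics reduces to computing the mean and variance of each simulated field and comparing them with those of the true field. A minimal sketch, where the field size, ensemble size, and perturbation level are hypothetical:

```python
import numpy as np

def ensemble_stats(realizations):
    """Per-realization mean and variance of an ensemble of simulated fields."""
    flat = np.asarray(realizations).reshape(len(realizations), -1)
    return flat.mean(axis=1), flat.var(axis=1)

rng = np.random.default_rng(4)
true_field = rng.normal(0.0, 1.0, size=(33, 33))    # hypothetical true f field
ensemble = [true_field + rng.normal(0.0, 0.2, size=true_field.shape)
            for _ in range(100)]                     # 100 simulated realizations

means, variances = ensemble_stats(ensemble)
print(f"true mean {true_field.mean():+.3f}  "
      f"ensemble means {means.mean():+.3f} +/- {means.std():.3f}")
print(f"true var  {true_field.var():.3f}  "
      f"ensemble vars  {variances.mean():.3f}")
```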
Similarly, Figures (7.37) and (7.38) show scatter-plots depicting true versus
computed variances for the f and h fields, respectively. For the f field, the ICS always
outperformed the neural networks for variances 1.0 and 2.0. Poor f field results for the
ICS were generally obtained for both sampling schemes with a variance of 5.0. This is
consistent with the magnification of the non-linear behavior between f and h with high
variances in f.
[Figure (7.35) True vs. estimated ICS (o), fh-ANN (Δ), and f-ANN (+) mean f fields. Six panels: variances 1.0, 2.0, and 5.0 under uniform and non-uniform sampling; simulated mean versus real mean for the 100 realizations.]
[Figure (7.36) True vs. simulated ICS (o), fh-ANN (Δ), and f-ANN (+) mean h fields. Six panels: variances 1.0, 2.0, and 5.0 under uniform and non-uniform sampling.]
[Figure (7.37) True vs. estimated ICS (o), fh-ANN (Δ), and f-ANN (+) variance of f fields. Six panels: variances 1.0, 2.0, and 5.0 under uniform and non-uniform sampling.]
[Figure (7.38) True vs. simulated ICS (o), fh-ANN (Δ), and f-ANN (+) variance of h fields. Six panels: variances 1.0, 2.0, and 5.0 under uniform and non-uniform sampling.]
8. CONCLUSIONS AND RECOMMENDATIONS
8.1 Conclusions
In this research, the feasibility of using ANN as a tool for estimating
transmissivity and the corresponding head fields from limited sample data was explored.
Two different ANN architectures were used: one utilizing f data only, and the other
utilizing both f and h data in the input vector. The ability of these ANNs to estimate the
true fields under different scenarios was compared with ICS. The scenarios considered
different sampling schemes (uniform versus non-uniform) and different f field variances
(1.0, 2.0, and 5.0). Based upon the results of this research, the following observations
can be made:
1. Although the ANN approach is not physically based like flow or transport simulation
models (stochastic approaches), it shows capability and superiority in learning highly
heterogeneous random fields of transmissivity from a limited amount of scattered f
and h data. This is supported by the high accuracy performance of 99.9%, with an
average generalization performance over 219 patterns of 94.2% and 96% for the
f-ANN and fh-ANN, respectively.
2. As with any physically based model, the ANN performs better when more information
is used. It has been shown that the ANN performed better when both f and h data
(fh-ANN) were used, as compared with the f-ANN, where only f data are utilized.
3. It was found that the ICS is very sensitive to the variance of the f field. That is, its
estimates of the f and h fields become markedly worse as the variance increases.
This sensitivity is magnified when non-uniform sampling is used. This indicates that
ICS is a poor estimator in reproducing the true fields for the high variance cases. This
may be due to attempting to condition a relatively large number of f and/or h data that
have high variances.
4. Regardless of the variance of the training set, ANN achieves a similar minimum
global error. Thus, the ANN is able to learn and generalize the heterogeneous nature
of the fields regardless of their spatial variability.
5. ANN overcomes the limitations of ICS at high variances since it produces much
smaller MSE than the ICS.
6. The ANN cannot reproduce the exact h values at the sampled locations. This is because
after sufficient training, the ANN maps input to output with a fixed weight vector.
This weight vector does not condition mapping to reproduce values at the sampled
locations, but seeks to minimize the global error.
7. ANNs are not constrained to preserve the pre-established statistics as in the case of
ICS. This problem cannot be solved internally by the ANN, but may be externally
"solved" by accepting only those outputs that preserve the statistics within some
acceptable tolerance.
8. One clear drawback of ANN is the 'smoothing effect' that appears in the estimated f
fields. In other words, there was a gradual transition from regions of low to high
values, unlike the sharp boundary interfaces produced by ICS. This smoothing effect
could be due to the high number of neurons in the output layer relative to the input
layer. The ratios of input/output neurons for uniform sampling were 1:33 and 1:16.5
for f-ANN and fh-ANN, respectively. The ratios were even smaller for non-uniform
sampling: 1:203.8 and 1:29 for f-ANN and fh-ANN, respectively. Such small
input/output ratios were not cited anywhere in the literature, meaning that the number
of input neurons is always much higher than the number of output neurons. Improved
performance would be obtained with a larger input/output ratio.
9. In conditioning with ICS, values at unsampled locations (majority) are estimated by
co-kriging, which is a linear estimator. ANN relates input to output through a transfer
or activation function, which can take many different forms, from simple linear to
highly nonlinear. This characteristic helps ANN make better estimates of the output
vector (e.g. transmissivity values at unsampled locations).
10. In addition, larger input vectors (f and/or h) improve ANN learning. This is
underscored by the fact that the fh-ANN significantly outperformed the f-ANN in
estimating the true fields.
11. In the derivation of the stochastic flow equation, an assumption of statistical
homogeneity for both f and h was necessary for the spectral representation of input f and
output h by Fourier-Stieltjes integrals. For development of the ANN, such
assumptions are not necessary.
12. ANN requires a large number of training sets to effectively train the network. This
means thousands of f and/or h data sets are required for a particular field size; this
number increases as the field size is increased or refined. In real world cases, such
data sets rarely, if ever, exist. This means that the real-world case must be
transformed into an equivalent hypothetical case, where random fields are generated
to represent the real field.
13. Once the ANN is trained and the final connection weights obtained, producing
estimates of the true field is faster and simpler with the neural network approach than
with the stochastic simulation or classical co-kriging approach. Faster implies less
computational time, and simpler implies less complex mathematical operations. For example,
it takes only two seconds to produce a single estimate using ANN, as compared to an
average time of 75 seconds for ICS. This CPU time increases exponentially for ICS
when high variances are used. These runs were made on an IBM-compatible PC
with a 350 MHz AMD K6-II CPU.
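The external acceptance test suggested in conclusion 7, keeping only those outputs whose statistics fall within an acceptable tolerance, can be sketched as follows; the field size, tolerances, and target statistics are hypothetical illustrations:

```python
import numpy as np

def accept(field, target_mean, target_var, mean_tol=0.05, var_tol=0.2):
    """Accept an output field only if its mean and variance are within an
    absolute tolerance of the pre-established statistics of the true field."""
    return bool(abs(field.mean() - target_mean) <= mean_tol
                and abs(field.var() - target_var) <= var_tol)

rng = np.random.default_rng(5)
outputs = [rng.normal(0.0, 1.0, size=(33, 33)) for _ in range(50)]  # stand-ins
kept = [f for f in outputs if accept(f, target_mean=0.0, target_var=1.0)]
print(f"accepted {len(kept)} of {len(outputs)} candidate outputs")
```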
In conclusion, the results of this research agree with previous findings where ICS
performed well only at variances less than or equal to 2.0. Above this value, conditioning
head data becomes increasingly problematic. In contrast, the ANN that used
transmissivity and head data performed well, both at low and high variances. However,
when only transmissivity data was used, the ANN did not accurately estimate the head
field. The former case is consistent with real world situations, where both transmissivity
and head data are measured in the field. The non-uniform sampling scheme used in this
study is particularly applicable to many field sites, where head measurements outnumber
transmissivity measurements.
8.2 Recommendations for future work
• In this study, it is assumed that measurements of transmissivity and head are error
free. One may consider errors and/or uncertainty and develop a kind of neuro-fuzzy
system to incorporate these errors and uncertainties.
• The ANN approach can also be applied to contaminant transport problems. In this
case, the ANN can be built with the velocity field and solute concentration in its input
and/or output vectors.
• Since ANN learns by example, this approach can be used in cases of transient flow
with source and sink terms. In this case the ANN can be trained on a time basis.
• Unsaturated flow systems can also be investigated by ANN.
• Since there is no restriction in the design of ANN, different architectures can be tried
to solve the same problem. For example, geostatistical parameters and the
coordinates of sampled and unsampled data could be incorporated into the design of
the ANN.
• Besides the backpropagation NN, other ANN types can also be tried, for example
learning vector quantization, self-organizing maps, general regression neural
networks, modular neural networks, radial basis function NNs, and probabilistic NNs.
9. REFERENCES
Ahmed, S.; and G. de Marsily, Cokriged estimation of aquifer transmissivity as an indirect solution of the inverse problem: A practical approach, Water Resour. Res., 29(2), pp.512-53, 1993.
Alabert, F., The practice of fast conditional simulations through the LU decomposition of the covariance matrix, Mathematical Geology, 19, pp.369-386, 1987.
Anderson, M. P., Comment on "Universal scaling of hydraulic conductivities and dispersivities in geologic media", Water Resour. Res., 27, pp.1381-1382, 1991.
Bakr, A.A., Stochastic analysis of the effect of spatial variations in hydraulic conductivity on groundwater flow, Ph.D. dissertation. New Mexico Institute of Mining and Technology, Socorro, 1976.
Bakr, A.A; L.W. Gelhar; A. L. Gutjahr; and J. R. McMillan, Stochastic analysis of the effect of spatial variability in subsurface flows, 1: Comparison of one- and three-dimensional flows. Water Resour. Res., 14(2), pp.263-271, 1978.
Bear, J., Dynamics of fluids in porous media, Elsevier, New York, 1972.
Bennion, D.W.; and J.C. Griffiths, A stochastic model for predicting variations in reservoir rock properties. Trans. AIME, 237, Part 2, pp.9-16, 1969
Bras, R.L.; and 1. Rodriguez-Iturbe, Random functions and hydrology, Addison-Wesley, Reading, Massachusetts, 1985.
Brigham, E.O., The fast Fourier transform and its applications, Englewood Cliffs, N.J., 1988.
Brooker, P. I., Two-dimensional simulation by turning bands. Mathematical Geology, 17, pp.81-90, 1985.
Bulness, A.C., An application of statistical methods to core analysis data of dolomitic limestone, Trans. AIME, 165, pp.223-240, 1946.
Byers, E.; and D.B. Stephens, Statistical and stochastic analysis of hydraulic conductivity and particle size in a fluvial sand. Soil Science Sac. Am. J., 47, pp. 1072-1080, 1983.
Carrera, J.; and L. Glorioso, On geostatistical formulation of the groundwater flow inverse problem, Adv. Water Resour., 14(5), pp.273-283, 1991.
Carrera, J.; and S. P. Neuman, Estimation of aquifer parameters under transient and steady state conditions 1. Maximum likelihood method incorporating prior information. Water Resour. Res., 22(2), pp.199-210, 1986a.
Carrera, J.; and S. P. Neuman, Estimation of aquifer parameters under transient and steady state conditions 2. Uniqueness, stability, and solution algorithms. Water Resour. Res., 22(2), pp.211-227, 1986b.
Carrera, J.; and S. P. Neuman, Estimation of aquifer parameters under transient and steady state conditions 3. Application to synthetic and field. Water Resour. Res., 22(2), pp. 228-242, 1986c.
Clifton, P. M.; and S. P. Neuman, Effects of kriging and inverse modeling on conditional simulation of the Avra Valley aquifer in southern Arizona, Water Resour. Res., 18(4), pp.1215-1234, 1982.
Cooley, R.L., Incorporation of prior information of parameters into nonlinear regression groundwater models, 1. Theory, Water Resour. Res., 18(4), pp.965-976, 1982.
Dagan, G., A note on higher-order corrections of the head covariances in steady aquifer flow. Water Resour. Res., 21, pp.573-578, 1985b.
Dagan, G., Stochastic modeling of groundwater flow by unconditional and conditional probabilities, 1. Conditional simulation and the direct problem. Water Resour. Res., 18(4), pp.813-833, 1982a.
Dagan, G., Stochastic modeling of groundwater flow by unconditional and conditional probabilities, 2. The solute transport. Water Resour. Res., 18(4), pp.835-848, 1982b.
Dagan, G., Stochastic modeling of groundwater flow by unconditional and conditional probabilities: The inverse problem. Water Resour. Res., 21(1), pp.65-72, 1985a.
Dagan, G., Time-dependent macro dispersion for solute transport in anisotropic heterogeneous aquifers. Water Resour. Res., 24(9), pp. 1491-1500, 1988.
Davis, M.W., Production of conditional simulations via the LU decomposition of the covariance matrix. Mathematical Geology, 19, pp.91-98, 1987.
de Marsily, G., Quantitative hydrogeology, groundwater hydrology for engineers. Academic Press, San Diego, 1986.
Freeze, R. A., A stochastic-conceptual analysis of one-dimensional groundwater flow in non-uniform, homogeneous media. Water Resour. Res., 11(9), pp.725-741, 1975.
Garabedian, S. P.; D. R. LeBlanc; L. W. Gelhar; and M. A. Celia, Large scale natural gradient tracer test in sand and gravel. Cape Cod, Massachusetts, 2. Analysis of spatial moments for a nonreactive tracer. Water Resour. Res., 27(5), pp.911-924, 1991.
Gavalas, G. R.; P. C. Shah; and J. H. Seinfeld, Reservoir history matching by Bayesian estimation, Tran. AIME 261, pp.337-350, 1976.
Gelhar, L. W., Effects of hydraulic conductivity variations on groundwater flow, in Proceedings, Second International lAHR Symposium on Stochastic Hydraulics, International Association of Hydraulic Research, Lund, Sweden, 1976.
Gelhar, L. W.; P.Y. Ko; H.H. Kwai; J. L. Wilson, Stochastic modeling of groundwater systems, R.M. Parsons Laboratory for Water Resources and Hydrodynamics Report 189, Cambridge, Mass.: MIT, 1974.
Gelhar, L.W., Stochastic subsurface hydrology, Prentice Hall, 1993.
Gelhar, L.W; and C. L. Axness, Three-dimensional stochastic analysis of macro dispersion in aquifers. Water Resour. Res., 19(1), pp.161-180, 1983.
Gomez-Hernandez, J. J., A stochastic approach to the simulation of block conductivity fields conditioned upon data measured at a smaller scale, Ph.D. dissertation. Department of Applied Earth Sciences, Stanford University, 351 pp., 1991.
Gray, R. M., L. D. Davisson, Random processes: a mathematical approach for engineers, 305pp., Prentice-Hall, Englewood-Cliffs, New Jersey, 1986.
Gutjahr, A. L., Kriging in stochastic hydrology: assessing the worth of data, AGU Chapman conference on spatial variability in hydrologic modeling. Fort Collins, Colorado., 1981.
Gutjahr, A. L., Q. Bai, S. Hatch, Conditional simulation applied to contaminant flow modeling, Technical Completion Report, Department of Mathematics, New Mexico Institute of Mining and Technology, Socorro, New Mexico, 1992.
Gutjahr, A. L.; and J.L. Wilson, Co-kriging for stochastic models, Transport in Porous Media, 4(6), pp.585-598, 1989.
Gutjahr, A. L.; L.W. Gelhar; A. A. Bakr; and J. R. McMillan, Stochastic analysis of spatial variability in subsurface flows, Part II: Evaluation and application, Water Resources Research, 14(5), pp.953-960, 1978.
Gutjahr, A. L.; S. J. Colarullo; and F. M. Phillips, Validation of a two-scale aquifer characterization model using Las Cruces Trench experimental data, EOS Transaction AGU, V. /7/7.250, Fall's meeting abstract, 1993.
Gutjahr, A.L., Fast Fourier transforms for random field generation, Project Report for Los Alamos Grant to New Mexico Tech, Contract number 4-R58-2690R, Department of Mathematics, New Mexico Tech, Socorro, New Mexico, 1989.

Gutjahr, A.L.; S. J. Colarullo; S. Hatch; and L. Hughson, Joint conditional simulations and the spectral approach for flow modeling, J. of Stochastic Hydrology, pp.80-108, 1995.
Hanna, S., An iterative Monte Carlo technique for estimating conditional mean and variances of transmissivity and hydraulic head fields, Ph.D. dissertation. Department of Hydrology and Water Resources, University of Arizona, Tucson, Arizona, 1995.
Harr, M.E., Reliability-based design in civil engineering, McGraw-Hill, New York, 1987.
Harter, Th., Conditional simulation: a comprehensive review of some important numerical techniques, preliminary examination report, personal communication, 1992.
Harter, Th., Unconditional and conditional simulation of flow and transport in heterogeneous, variably saturated porous media, Ph.D. dissertation. Department of Hydrology and Water Resources, University of Arizona, Tucson, Arizona, 1994.
Harter, Th.; and T.-C.J. Yeh, An efficient method for simulating steady unsaturated flow in random porous media; using an analytical perturbation solution as initial guess to a numerical model. Water Resour. Res., 29(12), pp.4139-4149, 1993.
Harvey, F.H.; and S.M. Gorelick, Mapping hydraulic conductivity: sequential conditioning with measurements of solute arrival time, hydraulic head, and local conductivity. Water Resour. Res., 31(7), pp.1615-1626, 1995.
Hebb, D., Organization of behavior, John Wiley, New York, 1949.
Hillel D., Fundamentals of soil physics, 413 pp.. Academic Press, New York, 1980.
Hoeksema, R. J.; and P.K. Kitanidis, Comparison of Gaussian conditional mean and kriging estimation in the geostatistical solution of the inverse problem, Water Resour. Res., 21(6), pp.825-836, 1985.
Hoeksema, R. J.; and P.K. Kitanidis, An application of the geostatistical approach to the inverse problem in two-dimensional groundwater modeling. Water Resour. Res., 20(7), pp. 1003-1020, 1984.
Hoeksema, R. J.; and P.K. Kitanidis, Prediction of transmissivities, heads, and seepage velocities using mathematical modeling and geostatistics. Adv. Water Resour., 12, pp.90-101, 1989.
Hopfield, J. J., Neural Network and physical systems with emergent collective computational abilities, Proc. Nat. Acad. Sci. USA 79:2554-2558, 1982.
IBM, Engineering and scientific subroutine library, guide and reference, IBM publication.

Journel, A. G., Geostatistics for conditional simulations of ore bodies, Econ. Geol., 69, pp.673-687, 1974.
Jensen, J.L.; D.V. Hinkley; and L.W. Lake, A statistical study of reservoir permeability: distribution, correlation and averages, Soc. Petr. Eng. Formation Evaluation, pp.461-468, 1987.
Journel, A. G.; and Ch. J. Huijbregts, Mining Geostatistics, 600 pp., Academic Press, San Diego, 1978.
Journel, A.G.; and J.J. Gomez-Hernandez, Stochastic imaging of the Wilmington clastic sequence, 64th Annual Technical Conference and Exhibition of the Society of Petroleum Engineers, San Antonio, Texas, October 8-11, 1989, SPE 19857, pp.591-606, 1989.
Jury, W. A.; W. R. Gardner; and W. H. Gardner, Soil Physics, John Wiley, New York, 1991.
Kitanidis, P.K.; and E.G. Vomvoris, A geostatistical approach to the inverse problem in groundwater modeling (steady state) and one-dimensional simulations, Water Resour. Res., 19(3), pp. 677-690,1983.
Kolmogorov, A. N., On the representation of continuous functions of many variables by superposition of continuous functions of one variable and addition, Dokl. Akad. Nauk SSSR, 114:953-956, 1957. In Russian.
Law, J., A statistical approach to the interstitial heterogeneity of sand reservoirs, Trans. AIME, 155, pp.202-222, 1944.
Lent, T. V.; and P. K. Kitanidis, Effects of first-order approximations on head and specific discharge covariances in high-contrast log conductivity, Water Resour. Res., 32(5), pp. 1197-1207, 1996.
Tsoukalas, L. H.; and R. E. Uhrig, Fuzzy and Neural Approaches in Engineering, John Wiley & Sons, Inc., 1997.
Li, S.-G.; D. McLaughlin, A nonstationary spectral method for solving stochastic groundwater problems: unconditional analysis. Water Resour. Res., 27, pp.1589-1605, 1991.
Lumley, J.L.; and H.A. Panofsky, The structure of atmospheric turbulence, 239 pp., John Wiley, New York, 1964.
Mantoglou, A.; and J.L. Wilson, The turning bands method for simulation of random fields using line generation by a spectral method, Water Resour. Res., 18(5), pp. 1379-1394, 1982.
Mantoglou, A.; and L.W. Gelhar, Stochastic modeling of large-scale transient unsaturated flow systems, Water Resour. Res., 23(1), pp.31-46, 1987a.
Mantoglou, A.; and L.W. Gelhar, Capillary tension head variance, mean soil moisture content, and effective specific soil moisture capacity of transient unsaturated flow in stratified soils, Water Resour. Res., 23(1), pp.47-56, 1987b.
Mantoglou, A.; and L.W. Gelhar, Effective hydraulic conductivities of transient unsaturated flow in stratified soils, Water Resour. Res., 23(1), pp.57-67, 1987c.
Mantoglou, A., Digital simulation of multi-variate two- and three-dimensional stochastic processes with a spectral turning bands method. Mathematical Geology, 19, 129-149, 1987.
Maren, A. J., C. T. Harston, and R. M. Pap, Handbook of neural computing applications, Academic Press, San Diego, 1990.
Matheron, G., The intrinsic random functions and their applications, Adv. Appl. Probab., 5, pp.438-468, 1973.
Mizell, S.A.; A. L. Gutjahr; and L.W. Gelhar, Stochastic analysis of spatial variability in two-dimensional steady groundwater flow assuming stationary and nonstationary heads. Water Resour. Res., 18(4), pp. 1053-1067, 1982.
Munakata, T., Commercial and industrial AI, Commun. ACM 37(3):23-26, 1994.
Myers, D. E., Vector conditional simulation, Vol.1, edited by A. Armstrong, pp.283-293, 1989.
Neuman, S. P., A statistical approach to the inverse problem of aquifer hydrology, 3. Improved solution method and added perspective. Water Resour. Res., 16(2), pp.331-346, 1980.
Neuman, S. P.; and S. Yakowitz, A statistical approach to the inverse problem of aquifer hydrology, 1. Theory, Water Resour. Res., 15, pp.845-860, 1979.
Neuman, S. P., Reply, Water Resour. Res., 27, pp.1383-1384, 1991.
Papoulis, A., Probability, random variables, and stochastic processes, 2nd edition, 576 pp., McGraw-Hill, New York, 1984.
Parker, D., Optimal algorithms for adaptive networks: second order back-propagation, second order direct propagation, and second order Hebbian learning, Proceedings of the IEEE First International Conference on Neural Networks, Vol. II, San Diego, CA, pp.593-600, 1987.
Plaut, D.; S. Nowlan; and G. Hinton, Experiments on learning by backpropagation, Technical report CMU-CS-86-126, Department of Computer Science, Carnegie Mellon University, Pittsburgh, PA, 1986.
Poggio, T.; and F. Girosi, Regularization algorithms for learning that are equivalent to multilayer networks, Science, Vol. 247, pp.978-982, 1990.
Press, W. H.; W.T. Vetterling; S. A. Teukolsky; and B. P. Flannery, Numerical recipes in FORTRAN, 2nd edition, Cambridge University Press, Cambridge, 1992.
Priestley, M. B., Spectral analysis and time series, 890 pp., Academic Press, San Diego, 1981.
Rubin, Y.; and G. Dagan, Stochastic identification of transmissivity and effective recharge in steady groundwater flow, 1. Theory, Water Resour. Res., 23(7), pp.1185-1192, 1987.
Rumelhart, D. E.; G. E. Hinton; and R. J. Williams, Learning representations by back-propagating errors, Nature 323:533-536, 1986b.
Rumelhart, D. E.; G. E. Hinton; and R. J. Williams, Learning internal representations by error propagation, in Parallel Distributed Processing, Vol. 1, D. E. Rumelhart and J. L. McClelland, eds., MIT Press, Cambridge, MA, 1986.
Russo, D.; and M. Bouton, Statistical analysis of spatial variability in unsaturated flow parameters. Water Resour. Res., 28(7), pp.1911-1925, 1992.
Russo, D., Stochastic modeling of macro dispersion for solute transport in a heterogeneous unsaturated porous formation. Water Resour. Res., 29(2), pp.383-397, 1993a.
Russo, D., Stochastic modeling of solute flux in a heterogeneous partially saturated porous formation. Water Resour. Res., 29(6), pp.1731-1744, 1993b.
Russo, D.; and G. Dagan, On solute transport in a heterogeneous porous formation under saturated and unsaturated water flows. Water Resour. Res., 27(2), pp.285-292, 1991.
Schaffer, S., A semi-coarsening multigrid method for elliptic partial differential equations with highly discontinuous and anisotropic coefficients, SIAM Journal on Scientific Computing, 1995.
Shinozuka, M., Monte Carlo simulation of structural dynamics. Computers and Structures, 2, pp.855-875, 1972.
Shinozuka, M.; and G. Deodatis, Simulation of stochastic processes by spectral representation, Appl. Mech. Rev., 44, pp.191-204, 1991.
Smith, L.; and R.A. Freeze, Stochastic analysis of groundwater flow in a bounded domain, 2. Two-dimensional simulations, Water Resour. Res., 15(6), pp.1543-1559, 1979.
Sudicky, E. A., A natural gradient experiment on solute transport in a sand aquifer: spatial variability of hydraulic conductivity and its role in the dispersion process. Water Resour. Res., 22(2), pp.2069-2082, 1986.
Tompson, A. F.; R. Ababou; and L.W. Gelhar, Implementation of the three-dimensional turning bands random field generator, Water Resour. Res., 25(10), pp.2227-2243, 1989.
Vauclin, M.; D.R. Vieira; G. Vachaud; and D.R. Nielsen, The use of cokriging with limited field soil observations, Soil Science Soc. Am. J., 47, pp.175-184, 1983.
Werbos, P., Beyond regression: new tools for prediction and analysis in the behavioral sciences, Ph.D. Dissertation, Harvard University, Cambridge, MA, 1974.
Wiener, N., Generalized harmonic analysis, Acta Math., 55, pp.117-258, 1930.
Willis, R.; and W.G. Yeh, Groundwater Systems Planning and Management, Prentice Hall Englewood Cliffs, NJ, 1987.
Yeh, T.-C. J.; A. L. Gutjahr; and M. Jin, An iterative co-conditional simulation method for solute transport in heterogeneous aquifers, EOS, Transactions, American Geophysical Union, 74, Supplement, p.251, October 26, 1993b.
Yeh, T.-C. J.; and P.A. Mock, A structured approach for calibrating steady-state ground water flow models, to appear Ground Water, May-June, 1996.
Yeh, T.-C. J.; L.W. Gelhar; and A. L. Gutjahr, Stochastic analysis of unsaturated flow in heterogeneous soils, 1: Statistically isotropic media, Water Resour. Res., 21(4), pp.447-456, 1985a.
Yeh, T.-C. J.; R. Srivastava; A. Guzman; and Th. Harter, A numerical model for two-dimensional water flow and chemical transport, Ground Water, 32, pp.2-11, 1993a.
Yeh, T.-C.J.; L.W. Gelhar; and A. L. Gutjahr, Stochastic analysis of unsaturated flow in heterogeneous soils, 2: Statistically anisotropic media with variable CL, Water Resour. Res., 21(4), pp.457-464, 1985b.
Yeh, T.-C.J.; L.W. Gelhar; and A. L. Gutjahr, Stochastic analysis of unsaturated flow in heterogeneous soils: 3, Observations and Applications, Water Resour. Res., 21(4), pp. 465-471, 1985c.
Yeh, T-C. J., Stochastic modeling of groundwater flow and solute transport in aquifers, HydroL Processes, 5, pp.369-395, 1992.
Yeh, T-C. J.; A. L. Gutjahr; and M. Jin, An iterative cokriging-like technique for groundwater flow modeling, Ground Water, Jan.-Feb., 1995a.
Yeh, T-C. J.; and J. T. McCord, Review of modeling of water flow and solute transport in the vadose zone: Stochastic Approaches, Technical Report HWR94-040, Hydrology and Water Resources Dept., November 10, 1994.
Yeh, T.-C.J.; M. Jin; and S. Hanna, An iterative inverse method: Conditional effective transmissivity and hydraulic head fields. Water Resour. Res., 32(1), pp. 85-92, 1996.
Yeh, W.W-G., Review of parameter identification procedure in groundwater hydrology: The inverse problem. Water Resour. Res., 22(2), pp.95-108, 1986.
Yeh, W.W-G., System analysis in groundwater planning and management, J. of Water Resour. Planning and Manag., 118(3), pp. 224-231, 1992.
Yeh, W.W-G.; and G.W. Tauxe, Optimal identification of aquifer diffusivity using quasi-linearization, Water Resour. Res., 7(4), pp. 955-962, 1971.
Zimmermann, D. A.; and J.L. Wilson, TUBA: A computer code for generating two-dimensional random fields via the turning bands method. A: User Guide, Seasoft, Albuquerque, New Mexico, 1990.