Statistics in WR: Session 20
Introduction to Spatial Statistics
Ernest To
Outline
1. Basics of spatial statistics
2. Kriging
3. Application of spatial-temporal statistics (Gravity currents in CCBay)
Ernest To 20090408
Basics
Consider the following scenario
• Two river stations, A and B, measure dissolved oxygen (DO).
• At station A
– mean DO = µA = 5 mg/L
– std dev at Station A = σA = 2 mg/L
• At station B
– mean DO = µB = 5 mg/L
– std dev at Station B = σB = 2 mg/L
• Correlation between measurements at stations A and B = ρAB = 0.5.
[Figure: stations A and B along the river]
New data!
• We collected a DO measurement of 2 mg/L at Station A.
• What are the updated mean (µB|XA) and standard deviation (σB|XA) at Station B?
– (assume that the DO distributions are normal)
[Figure: Station A with µA = 5 mg/L, σA = 2 mg/L and new sample xA = 2 mg/L; Station B with µB = 5 mg/L, σB = 2 mg/L, µB|XA = ?, σB|XA = ?]
Let’s sketch out the distributions
• Distributions at A and B (assume normal): µA = 5 mg/L, σA = 2 mg/L; µB = 5 mg/L, σB = 2 mg/L
• Joint distribution at A and B
[Figure: marginal pdfs f(xA) and f(xB), and the joint pdf f(xA,xB) over the XA–XB plane]
Marginal and joint distributions
[Figure: joint pdf f(xA,xB) over the XA–XB plane]
How does ρAB affect the shape of the joint distribution?
[Figure: scatter plots of XA vs XB and joint distributions f(xA,xB) of XB and XA, for ρAB = 0.5, ρAB = 0.99, ρAB = 0 and ρAB = -0.99]
Bayesian conditioning
PRIOR STAGE
Prior pdf (joint distribution): $f_{X_A X_B}(x_A, x_B)$
CONDITIONALIZATION STAGE
Observed data ($x_A = 2$ mg/L) is used to update the distribution.
POSTERIOR STAGE
A conditional pdf for XB is generated:
$$f_{X_B|X_A}(x_B \mid x_A = 2) = \frac{f_{X_A X_B}(2, x_B)}{f_{X_A}(2)}$$
[Figure: prior pdf over the XA–XB plane, sliced at xA = 2 mg/L to give the conditional pdf]
Conditional pdf
If the prior pdf is binormal, the conditional pdf is also normal with:
Mean = $\mu_{X_B|X_A} = \mu_{X_B} + \rho_{AB} \frac{\sigma_{X_B}}{\sigma_{X_A}} (x_A - \mu_{X_A})$
Variance = $\sigma^2_{X_B|X_A} = \sigma^2_{X_B} (1 - \rho_{AB}^2)$
• Expected value of conditional pdf is a linear function of the conditioning data.
• The variance is independent of XA or XB (homoscedasticity).
[Figure: prior pdf sliced at xA = 2 mg/L, and the conditional pdf $f_{X_B|X_A}$ of XB|XA]
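Back to the problem
Updated mean and std. dev at Station B
Mean:
$$\mu_{X_B|X_A} = \mu_{X_B} + \rho_{AB} \frac{\sigma_{X_B}}{\sigma_{X_A}} (x_A - \mu_{X_A}) = 5\ \text{mg/L} + 0.5 \cdot \frac{2\ \text{mg/L}}{2\ \text{mg/L}} (2\ \text{mg/L} - 5\ \text{mg/L}) = 3.5\ \text{mg/L}$$
Std. dev:
$$\sigma_{X_B|X_A} = \sqrt{\sigma^2_{X_B} (1 - \rho_{AB}^2)} = \sqrt{2^2 (1 - 0.5^2)} \approx 1.7\ \text{mg/L}$$
[Figure: Station A with µA = 5 mg/L, σA = 2 mg/L and new sample xA = 2 mg/L; Station B with µB = 5 mg/L, σB = 2 mg/L, updated to µB|XA = 3.5 mg/L, σB|XA = 1.7 mg/L]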
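The two-station update can be checked numerically. The following is a minimal sketch, assuming the binormal formulas for the conditional mean and variance; the function name `condition_bivariate_normal` is illustrative only.

```python
import math

def condition_bivariate_normal(mu_a, sigma_a, mu_b, sigma_b, rho_ab, x_a):
    """Return (mean, std dev) of X_B given X_A = x_a for a binormal prior."""
    # Conditional mean is linear in the observed value x_a.
    mean = mu_b + rho_ab * (sigma_b / sigma_a) * (x_a - mu_a)
    # Conditional variance does not depend on x_a (homoscedasticity).
    std = math.sqrt(sigma_b ** 2 * (1.0 - rho_ab ** 2))
    return mean, std

# Values from the river example: both stations have mean 5 and std dev 2 mg/L,
# correlation 0.5, and a new sample of 2 mg/L at Station A.
mean_b, std_b = condition_bivariate_normal(5.0, 2.0, 5.0, 2.0, 0.5, 2.0)
print(round(mean_b, 2), round(std_b, 2))  # 3.5 1.73
```

Setting `rho_ab` to 0 returns the prior mean and standard deviation unchanged, which is a quick sanity check on the formulas.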
Can we do the same for any two points on the river?
Yes we can, but under the following conditions:
1. Normality
2. 2nd order stationarity:
– Mean does not change with location
– Variance does not change with location
3. Know the mean and variance.
4. Have a function that determines the correlation between two locations
[Figure: stations A and B on the river, with µ = 5 mg/L and σ = 2 mg/L]
Modeling correlation
In spatial statistics, correlation is modeled as a function of the separation distance between two points:
$\rho_{AB} = f(h)$
where h = separation distance (aka lag).
Most of the time, correlation decreases with distance.
(Things that are closer together tend to be more correlated with each other.)
Estimating correlation model from data
Imagine the case where we have a smattering of data along an axis.
Any given pair of data points, i and j, will have two properties:
1. The semivariance = $\gamma = 0.5 (Z_i - Z_j)^2$
2. The separation distance = $h_{ij}$
[Figure: data point i (measured value = Zi) and data point j (measured value = Zj), separated by the distance hij]
Estimating correlation model from data
We can plot the semivariance, γ, of all possible pairs against the lag, h. This gives us a variogram.
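The pairing step can be sketched in a few lines. This is a minimal illustration, assuming one-dimensional positions; the data values, the bin width and the helper names `semivariogram_cloud` and `binned_semivariogram` are made up for the example. Averaging the cloud within lag bins is a common way to get a curve that a model can be fitted through.

```python
from itertools import combinations

def semivariogram_cloud(positions, values):
    """Return (lag, semivariance) for every pair of data points."""
    cloud = []
    for i, j in combinations(range(len(values)), 2):
        h = abs(positions[i] - positions[j])           # separation distance h_ij
        gamma = 0.5 * (values[i] - values[j]) ** 2     # semivariance of the pair
        cloud.append((h, gamma))
    return cloud

def binned_semivariogram(cloud, bin_width):
    """Average the semivariance of all pairs whose lag falls in each bin."""
    sums, counts = {}, {}
    for h, gamma in cloud:
        k = int(h // bin_width)
        sums[k] = sums.get(k, 0.0) + gamma
        counts[k] = counts.get(k, 0) + 1
    # Key each bin by its center lag.
    return {(k + 0.5) * bin_width: sums[k] / counts[k] for k in sorted(sums)}

# Hypothetical data along one axis (positions and values are invented).
positions = [0.0, 1.0, 2.0, 4.0]
values = [5.0, 4.0, 6.0, 3.0]
cloud = semivariogram_cloud(positions, values)
experimental = binned_semivariogram(cloud, bin_width=2.0)
for lag, gamma in experimental.items():
    print(lag, round(gamma, 3))
```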
Estimating correlation model from data
We can fit a curve through the semivariogram to model the semivariance as a function of the lag. This is the variogram model:
$\gamma = f(h)$
[Figure: fitted variogram curve with the sill and the range marked; the sill is the level at which the curve flattens, and the range is the lag at which it reaches that level]
Estimating correlation model from data
Assuming that mean and variance do not change with location (assumption of stationarity), the variogram model is related to the covariance model, C(h), by the equation:
$C(h) = \sigma^2 - \gamma(h)$
where σ² is the variance.
Estimating correlation model from data
Assuming that variance does not change with location (assumption of stationarity), the correlation model, ρ(h), is related to the covariance model by the equation:
$\rho(h) = C(h) / \sigma^2$
[Figure: correlation model ρ(h) plotted against the lag, h, with the ρ axis running up to 1]
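The chain from variogram to covariance to correlation is short enough to sketch directly. The example below assumes an exponential variogram with no nugget, which is a stand-in chosen for illustration and not a model taken from these slides; the range of 8 is also an assumed value.

```python
import math

def exponential_variogram(h, sill, a):
    """Assumed variogram model: gamma(h) = sill * (1 - exp(-3 h / a))."""
    return sill * (1.0 - math.exp(-3.0 * h / a))

def covariance(h, variance, variogram):
    """C(h) = sigma^2 - gamma(h), valid under second-order stationarity."""
    return variance - variogram(h)

def correlation(h, variance, variogram):
    """rho(h) = C(h) / sigma^2."""
    return covariance(h, variance, variogram) / variance

variance = 4.0  # sigma = 2 mg/L, as in the river example

def gamma(h):
    # The sill equals the variance; the range a = 8 is an assumption.
    return exponential_variogram(h, sill=variance, a=8.0)

for h in (0.0, 4.0, 8.0):
    print(h, round(gamma(h), 3), round(correlation(h, variance, gamma), 3))
```

With the sill equal to the variance, the correlation starts at 1 for zero lag and decays toward 0 as the variogram approaches the sill.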
How does the correlation model affect the estimation?
[Figure: scatter plots of XA vs XB, joint distributions f(xA,xB) of XA and XB, and conditional distributions $f_{X_B|X_A}$ of XB|XA, for ρAB = 0.99, ρAB = 0.5 and ρAB = 0; an arrow marks the direction of increasing h]
Kriging
Multivariable case
What if we have more than one location that provides conditioning data?
(Assume distributions are STILL normal at all locations.)
• At stations A1, A2, A3, A4
– µA1 = µA2 = µA3 = µA4 = 5 mg/L
– σA1 = σA2 = σA3 = σA4 = 2 mg/L
• At station B
– mean DO = µB = 5 mg/L
– std dev at Station B = σB = 2 mg/L
• ρ = f(h) = 0.0125h² - 0.225h + 1
[Figure: stations A1, A2, A3, A4 and B along the river]
Modeling correlation
ρ = f(h) = 0.0125h² - 0.225h + 1
[Figure: ρ = f(h) plotted against separation distance, h, from 0 to 10, with ρ from 0 to 1]
[Figure: stations A1, A2, A3, A4 and B, each 2 apart; distance along river in hundred meters]

Distance (s) matrix
      A1    A2    A3    A4    B
A1    0     2     4     6     8
A2    2     0     2     4     6
A3    4     2     0     2     4
A4    6     4     2     0     2
B     8     6     4     2     0

From correlation model:
ρA1B = 0.0, ρA2B = 0.1, ρA3B = 0.3, ρA4B = 0.6;
ρA1A2 = 0.6, ρA1A3 = 0.3, ρA1A4 = 0.1, ρA2A3 = 0.6, ρA2A4 = 0.3, ρA3A4 = 0.6

Correlation matrix
      A1    A2    A3    A4    B
A1    1     0.6   0.3   0.1   0
A2    0.6   1     0.6   0.3   0.1
A3    0.3   0.6   1     0.6   0.3
A4    0.1   0.3   0.6   1     0.6
B     0     0.1   0.3   0.6   1
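Both matrices follow mechanically from the station positions and the correlation model. A minimal sketch, assuming station A1 sits at the origin and the others follow at a spacing of 2 (hundred meters); the dictionary `stations` is an illustrative layout.

```python
def rho(h):
    """Correlation model from the example: rho = 0.0125 h^2 - 0.225 h + 1."""
    return 0.0125 * h ** 2 - 0.225 * h + 1.0

# Positions along the river in hundreds of meters (A1 taken as the origin).
stations = {"A1": 0.0, "A2": 2.0, "A3": 4.0, "A4": 6.0, "B": 8.0}
names = list(stations)

# Pairwise separation distances, then the correlation model applied to each.
distance = [[abs(stations[p] - stations[q]) for q in names] for p in names]
corr = [[round(rho(h), 1) for h in row] for row in distance]

for name, row in zip(names, corr):
    print(name, row)
```

The polynomial is only used here for lags between 0 and 8, the largest separation in the example.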
Dealing with multiple variables
Divide locations into two groups:
1. The vector, $\mathbf{X}_A$, representing the set of random variables at the locations contributing the conditioning data.
2. The variable, $X_B$, representing the random variable at the point of estimation.
[Figure: stations A1, A2, A3 and A4 grouped as $\mathbf{X}_A$; station B as $X_B$]
Concept
1. If individual distributions are normal, joint pdf is multi-normal.
Prior pdf: $f_{X_{A1} X_{A2} X_{A3} X_{A4} X_B}(x_{A1}, x_{A2}, x_{A3}, x_{A4}, x_B)$
2. Group variables into two: one for points with data, one for the point of estimation.
Prior pdf: $f_{\mathbf{X}_A X_B}(\mathbf{x}_A, x_B)$
3. Intersect pdf with conditioning data ($\mathbf{X}_A = \mathbf{x}_A$) to get conditional pdf.
[Figure: prior pdf over the $\mathbf{X}_A$–$X_B$ plane, intersected at $\mathbf{X}_A = \mathbf{x}_A$ to give the conditional pdf]
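Dealing with multiple variables
The updated mean and variance of the distribution at Station B are given by:
Mean:
$$\mu_{X_B|\mathbf{X}_A} = \mu_{X_B} + \mathbf{C}_{X_B \mathbf{X}_A} \mathbf{C}_{\mathbf{X}_A \mathbf{X}_A}^{-1} (\mathbf{x}_A - \boldsymbol{\mu}_{\mathbf{X}_A})$$
Variance:
$$\sigma^2_{X_B|\mathbf{X}_A} = \sigma^2_{X_B} - \mathbf{C}_{X_B \mathbf{X}_A} \mathbf{C}_{\mathbf{X}_A \mathbf{X}_A}^{-1} \mathbf{C}_{\mathbf{X}_A X_B}$$
Where:
$$\mathbf{C}_{\mathbf{X}_A \mathbf{X}_A} = \begin{bmatrix} C_{X_{A1} X_{A1}} & \cdots & C_{X_{A1} X_{A4}} \\ \vdots & \ddots & \vdots \\ C_{X_{A4} X_{A1}} & \cdots & C_{X_{A4} X_{A4}} \end{bmatrix}, \quad \mathbf{C}_{\mathbf{X}_A X_B} = \begin{bmatrix} C_{X_{A1} X_B} \\ \vdots \\ C_{X_{A4} X_B} \end{bmatrix}, \quad \mathbf{C}_{X_B \mathbf{X}_A} = \mathbf{C}_{\mathbf{X}_A X_B}^{T}$$
[Figure: stations A1, A2, A3, A4 and B along the river]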
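Equations in multivariable case are more generalized
Recall two variable case:
$\mu_{X_B|X_A} = \mu_{X_B} + \rho_{AB} \frac{\sigma_{X_B}}{\sigma_{X_A}} (x_A - \mu_{X_A})$
$\sigma^2_{X_B|X_A} = \sigma^2_{X_B} (1 - \rho_{AB}^2)$
Multivariable case:
$\mu_{X_B|\mathbf{X}_A} = \mu_{X_B} + \mathbf{C}_{X_B \mathbf{X}_A} \mathbf{C}_{\mathbf{X}_A \mathbf{X}_A}^{-1} (\mathbf{x}_A - \boldsymbol{\mu}_{\mathbf{X}_A})$
$\sigma^2_{X_B|\mathbf{X}_A} = \sigma^2_{X_B} - \mathbf{C}_{X_B \mathbf{X}_A} \mathbf{C}_{\mathbf{X}_A \mathbf{X}_A}^{-1} \mathbf{C}_{\mathbf{X}_A X_B}$
Multivariable case takes into account
1. Correlation between data locations and estimated location ($\mathbf{C}_{\mathbf{X}_A X_B}$).
2. Correlation among data locations ($\mathbf{C}_{\mathbf{X}_A \mathbf{X}_A}$).
This is the most fundamental form of kriging, i.e. Simple Kriging.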
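Plug and Chug
• Recall that Cov(A,B) = ρAB σA σB
• Compute data to data covariance:
$$\mathbf{C}_{\mathbf{X}_A \mathbf{X}_A} = \begin{bmatrix} C_{X_{A1} X_{A1}} & \cdots & C_{X_{A1} X_{A4}} \\ \vdots & \ddots & \vdots \\ C_{X_{A4} X_{A1}} & \cdots & C_{X_{A4} X_{A4}} \end{bmatrix} = \begin{bmatrix} \rho_{A1A1} \sigma_{A1} \sigma_{A1} & \cdots & \rho_{A1A4} \sigma_{A1} \sigma_{A4} \\ \vdots & \ddots & \vdots \\ \rho_{A4A1} \sigma_{A4} \sigma_{A1} & \cdots & \rho_{A4A4} \sigma_{A4} \sigma_{A4} \end{bmatrix}$$
$$= \begin{bmatrix} 1 \cdot 2 \cdot 2 & 0.6 \cdot 2 \cdot 2 & 0.3 \cdot 2 \cdot 2 & 0.1 \cdot 2 \cdot 2 \\ 0.6 \cdot 2 \cdot 2 & 1 \cdot 2 \cdot 2 & 0.6 \cdot 2 \cdot 2 & 0.3 \cdot 2 \cdot 2 \\ 0.3 \cdot 2 \cdot 2 & 0.6 \cdot 2 \cdot 2 & 1 \cdot 2 \cdot 2 & 0.6 \cdot 2 \cdot 2 \\ 0.1 \cdot 2 \cdot 2 & 0.3 \cdot 2 \cdot 2 & 0.6 \cdot 2 \cdot 2 & 1 \cdot 2 \cdot 2 \end{bmatrix} = \begin{bmatrix} 4 & 2.4 & 1.2 & 0.4 \\ 2.4 & 4 & 2.4 & 1.2 \\ 1.2 & 2.4 & 4 & 2.4 \\ 0.4 & 1.2 & 2.4 & 4 \end{bmatrix}$$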
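Plug and Chug
• Compute data to estimation point covariance:
$$\mathbf{C}_{\mathbf{X}_A X_B} = \begin{bmatrix} C_{X_{A1} X_B} \\ \vdots \\ C_{X_{A4} X_B} \end{bmatrix} = \begin{bmatrix} 0 \cdot 2 \cdot 2 \\ 0.1 \cdot 2 \cdot 2 \\ 0.3 \cdot 2 \cdot 2 \\ 0.6 \cdot 2 \cdot 2 \end{bmatrix} = \begin{bmatrix} 0 \\ 0.4 \\ 1.2 \\ 2.4 \end{bmatrix}$$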
Plug and Chug
$$\mu_{X_B|\mathbf{X}_A} = \mu_{X_B} + \mathbf{C}_{X_B \mathbf{X}_A} \mathbf{C}_{\mathbf{X}_A \mathbf{X}_A}^{-1} (\mathbf{x}_A - \boldsymbol{\mu}_{\mathbf{X}_A})$$
$$= 5 + \begin{bmatrix} 0 & 0.4 & 1.2 & 2.4 \end{bmatrix} \begin{bmatrix} 4 & 2.4 & 1.2 & 0.4 \\ 2.4 & 4 & 2.4 & 1.2 \\ 1.2 & 2.4 & 4 & 2.4 \\ 0.4 & 1.2 & 2.4 & 4 \end{bmatrix}^{-1} \left( \begin{bmatrix} x_{A1} \\ x_{A2} \\ x_{A3} \\ x_{A4} \end{bmatrix} - \begin{bmatrix} 5 \\ 5 \\ 5 \\ 5 \end{bmatrix} \right)$$
$$= 5 + \begin{bmatrix} -0.018 & -0.053 & -0.053 & 0.65 \end{bmatrix} \left( \begin{bmatrix} x_{A1} \\ x_{A2} \\ x_{A3} \\ x_{A4} \end{bmatrix} - \begin{bmatrix} 5 \\ 5 \\ 5 \\ 5 \end{bmatrix} \right)$$
The row vector in the last line holds the weights: Weights = [λ1, λ2, λ3, … λn]
Note: The weights attributed to each station are determined by the prior (joint distribution) among them.
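Plug and Chug
Given the following conditioning data: $x_{A1} = 2$, $x_{A2} = 3$, $x_{A3} = 5$, $x_{A4} = 2$
$$\mu_{X_B|\mathbf{X}_A} = 5 + \begin{bmatrix} -0.018 & -0.053 & -0.053 & 0.65 \end{bmatrix} \begin{bmatrix} 2 - 5 \\ 3 - 5 \\ 5 - 5 \\ 2 - 5 \end{bmatrix} = 3.2\ \text{mg/L}$$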
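Plug and Chug
$$\sigma^2_{X_B|\mathbf{X}_A} = \sigma^2_{X_B} - \mathbf{C}_{X_B \mathbf{X}_A} \mathbf{C}_{\mathbf{X}_A \mathbf{X}_A}^{-1} \mathbf{C}_{\mathbf{X}_A X_B}$$
$$= 2^2 - \begin{bmatrix} 0 & 0.4 & 1.2 & 2.4 \end{bmatrix} \begin{bmatrix} 4 & 2.4 & 1.2 & 0.4 \\ 2.4 & 4 & 2.4 & 1.2 \\ 1.2 & 2.4 & 4 & 2.4 \\ 0.4 & 1.2 & 2.4 & 4 \end{bmatrix}^{-1} \begin{bmatrix} 0 \\ 0.4 \\ 1.2 \\ 2.4 \end{bmatrix} = 4 - 1.47 = 2.53\ (\text{mg/L})^2$$
$$\sigma_{X_B|\mathbf{X}_A} = \sqrt{2.53} \approx 1.6\ \text{mg/L}$$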
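The plug-and-chug steps can be reproduced in code. This is a minimal sketch, assuming the correlation values and conditioning data of the worked example; the helpers `solve` and `simple_kriging` are illustrative names, not functions from any geostatistics library, and the linear solve is a plain Gaussian elimination so that the block needs only the standard library.

```python
import math

def solve(a, b):
    """Solve the linear system a x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [b[i]] for i, row in enumerate(a)]  # augmented matrix
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        s = sum(m[r][c] * x[c] for c in range(r + 1, n))
        x[r] = (m[r][n] - s) / m[r][r]
    return x

def simple_kriging(c_aa, c_ab, mu_a, mu_b, var_b, x_a):
    """Return (weights, conditional mean, conditional std dev) at the estimation point."""
    weights = solve(c_aa, c_ab)  # lambda = C_AA^{-1} C_AB (C_AA is symmetric)
    mean = mu_b + sum(w * (x - m) for w, x, m in zip(weights, x_a, mu_a))
    variance = var_b - sum(w * c for w, c in zip(weights, c_ab))
    return weights, mean, math.sqrt(variance)

sigma = 2.0  # std dev at every station (mg/L)
rho_aa = [[1.0, 0.6, 0.3, 0.1],
          [0.6, 1.0, 0.6, 0.3],
          [0.3, 0.6, 1.0, 0.6],
          [0.1, 0.3, 0.6, 1.0]]
rho_ab = [0.0, 0.1, 0.3, 0.6]
# Cov(A, B) = rho_AB * sigma_A * sigma_B
c_aa = [[r * sigma * sigma for r in row] for row in rho_aa]
c_ab = [r * sigma * sigma for r in rho_ab]

weights, mean_b, std_b = simple_kriging(
    c_aa, c_ab, mu_a=[5.0] * 4, mu_b=5.0, var_b=sigma ** 2,
    x_a=[2.0, 3.0, 5.0, 2.0])
print([round(w, 3) for w in weights], round(mean_b, 2), round(std_b, 2))
```

The nearest station, A4, receives by far the largest weight, and the conditional standard deviation comes out below the prior value of 2 mg/L, as it must when conditioning on data.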
Results from Simple Kriging
The updated mean and standard deviation of the distribution at Station B are:
Mean: $\mu_{X_B|\mathbf{X}_A} = 3.2\ \text{mg/L}$
Standard deviation: $\sigma_{X_B|\mathbf{X}_A} = 1.6\ \text{mg/L}$
[Figure: stations A1, A2, A3, A4 and B along the river]
Other forms of kriging
• Ordinary kriging (OK)
– Does not require mean to be known
– Assumes that mean is constant and is somewhere in the range of the conditioning data
• Universal kriging (UK)
– Does not require mean to be known nor require it to be constant
– User specifies a model for the trend in mean. UK will then fit the model to the data.
• Indicator kriging (IK)
– handles binary variables (0 or 1)
– has ability to take care of non-normality in data through iterative application.
• Co-kriging (CK)
– takes into account a related secondary variable to help estimate the primary variable.
Extension to 2D, 3D
• The lag can be represented by the Euclidean distance between 2 points:
$h = \sqrt{(x_2 - x_1)^2 + (y_2 - y_1)^2 + (z_2 - z_1)^2}$
• So the covariance model of the form, C = f(h), can still be used
• Variables may be more correlated in one direction than the other (anisotropy)
– linear transformation can be performed to transform the distances so the correlation distance is the same in all directions (isotropy):
$h' = \sqrt{\left(\frac{x_2 - x_1}{a}\right)^2 + \left(\frac{y_2 - y_1}{b}\right)^2 + \left(\frac{z_2 - z_1}{c}\right)^2}$
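The two lag definitions translate directly into code. A minimal sketch, assuming the scale factors a, b and c are already known; the coordinates and scale values below are invented for illustration.

```python
import math

def isotropic_lag(p1, p2):
    """Euclidean separation distance between two points (any dimension)."""
    return math.sqrt(sum((q - p) ** 2 for p, q in zip(p1, p2)))

def anisotropic_lag(p1, p2, scales):
    """Lag after dividing each coordinate difference by its scale factor (a, b, c)."""
    return math.sqrt(sum(((q - p) / s) ** 2 for p, q, s in zip(p1, p2, scales)))

# Hypothetical points: 300 m and 400 m apart horizontally, 2 m apart vertically.
p1, p2 = (0.0, 0.0, 0.0), (300.0, 400.0, 2.0)
h = isotropic_lag(p1, p2)
# Assumed scales: horizontal distances are shrunk by 100 relative to vertical ones.
h_prime = anisotropic_lag(p1, p2, scales=(100.0, 100.0, 1.0))
print(round(h, 3), round(h_prime, 3))
```

After rescaling, the 2 m vertical offset contributes on the same order as the horizontal offsets, which is the purpose of the transformation when correlation decays much faster with depth than along the horizontal.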
Extension to space-time
• For space and time, there is no standard space-time metric.
• The form:
$h = \sqrt{(x_2 - x_1)^2 + (y_2 - y_1)^2 + (z_2 - z_1)^2 + (t_2 - t_1)^2}$
– is not always correct because the temporal and spatial axes are not always orthogonal to each other.
– Processes that happen in time usually have some dependency on processes that happen in space.
– (They are not independent).
• A separate temporal lag term is usually used:
$\tau = t_2 - t_1$
• The covariance function takes on the form:
$C = f(h, \tau)$
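Keeping the temporal lag separate from the spatial lag can be sketched as follows. The function `space_time_lags` follows the definitions on this slide; `example_covariance` is a purely hypothetical separable model (a product of exponential decays) with assumed parameter values, included only to show a function of both lags, and is not the model used in the application.

```python
import math

def space_time_lags(p1, p2):
    """Split the separation between two (x, y, z, t) points into spatial and temporal lags."""
    (x1, y1, z1, t1), (x2, y2, z2, t2) = p1, p2
    h = math.sqrt((x2 - x1) ** 2 + (y2 - y1) ** 2 + (z2 - z1) ** 2)
    tau = t2 - t1
    return h, tau

def example_covariance(h, tau, sill=3.6, range_h=6000.0, range_t=1.0):
    """Hypothetical covariance C = f(h, tau): exponential decay in space times decay in time."""
    return sill * math.exp(-3.0 * h / range_h) * math.exp(-3.0 * abs(tau) / range_t)

h, tau = space_time_lags((0.0, 0.0, 0.0, 0.0), (3.0, 4.0, 0.0, 0.5))
print(h, tau, round(example_covariance(h, tau), 3))
```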
Application (Gravity currents in Corpus Christi Bay)
Sensors in Corpus Christi Bay
[Figure: aerial photo from Google Earth showing Corpus Christi Bay, Laguna Madre, Oso Bay and the Gulf of Mexico, with HRI stations, TCOON stations, USGS gages, TCEQ stations and SERF stations marked]
[Figure: section from Laguna Madre to Corpus Christi Bay showing the wind and dissolved oxygen (DO) along the path of the gravity current. Credit: Ernest Sin Chit To, CRWR]
1. Occurrence: Gravity currents emerge when wind and tide conditions are conducive.
2. Path: Gravity dominates the movement of the current. Current travels down-slope along bay bottom.
3. Wind conditions:
• Mixing energy from the wind is transmitted down water column.
• The fluid at the top of the gravity current is entrained into the ambient fluid.
• Thickness of the bottom layer is reduced.
4. Oxygen consumption: Dissolved oxygen in the pulse is depleted by benthic demand, sometimes to hypoxic levels.
[Flowchart: stages that lead to hypoxia at a point of interest]
1. Occurrence: Gravity current emerges from Laguna Madre / Gravity current does not emerge
2. Travel: Point of interest is located within path of gravity current / Point of interest is outside path of gravity current
3. Wind conditions: Gravity current reaches point of interest / Gravity current is broken up by wind before it reaches point of interest
4. O2 consumption: Dissolved oxygen has been depleted below 2 mg/L / Dissolved oxygen is NOT depleted below 2 mg/L
Result: Hypoxia / No Hypoxia
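One way to read the flowchart is as a chain of conditions that must all hold for hypoxia to occur. The sketch below encodes that reading; the function and argument names are hypothetical, and treating the result as a strict logical AND of the four stages is an interpretation of the chart rather than something stated on the slide.

```python
def hypoxia_expected(current_emerges, point_in_path, survives_wind_mixing,
                     do_depleted_below_2):
    """Return True only if every stage of the chain is satisfied (assumed reading)."""
    return (current_emerges and point_in_path
            and survives_wind_mixing and do_depleted_below_2)

# Gravity current emerges, reaches the point of interest, and oxygen is depleted.
print(hypoxia_expected(True, True, True, True))    # True
# Same case, but wind breaks up the current before it arrives.
print(hypoxia_expected(True, True, False, True))   # False
```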
Selecting a study area
[Figure: bathymetry map showing Oso Bay, East Laguna Madre and West Laguna Madre, with the channel, depressions and ridges labeled and candidate locations marked "?". Legend: - 5.0 m, - 4.5 m, - 4.0 m, - 3.5 m, - 2.5 m, - 2.0 m, - 1.5 m and - 1.0 m above Mean High Water Level]
Downstream of East Laguna Madre
• Plume tracking survey
– July 14 to 17, 2006 (while gravity current was on the move)
– Ben Hodges, University of Texas at Austin
• Water quality data
– July 12 and 18, 2006 (at birth and demise of gravity current)
– Paul Montagna, Texas A&M University, Corpus Christi
Synthesis of data
• Salinity profiles collected at various locations and time
• Synthesis
• Time history of gravity current along direction of flow
[Figure: salinity-versus-depth profiles arranged along the direction of flow at t = 0, t = 1, t = 2 and t = 3]
Data Preparation
1. Salinity data from HRI are acquired using HydroGet (a GIS web service client) and combined with plume tracking data.
2. Data locations are projected onto a reference line following the general direction of flow.
3. Space-time kriging is performed in 3 dimensions:
– X = Longitudinal measure (meters from origin point)
– Y = Time (days since 7/12/2006)
– Z = Elevation (meters from water surface)
[Figure: HRI stations; HydroGet interface; acquired data in ArcHydro II Time Series Table; reference line with origin at x = 0 m]
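The projection step can be illustrated with a little vector arithmetic. This is a minimal sketch that assumes a straight reference line defined by an origin and one further point, in planar coordinates measured in meters; the actual reference line and coordinate system are not given here, so all values are hypothetical.

```python
import math

def longitudinal_measure(point, origin, direction_point):
    """Distance from `origin` to the projection of `point` onto the reference line."""
    dx = direction_point[0] - origin[0]
    dy = direction_point[1] - origin[1]
    length = math.hypot(dx, dy)
    ux, uy = dx / length, dy / length  # unit vector along the reference line
    # Dot product of the offset from the origin with the unit vector.
    return (point[0] - origin[0]) * ux + (point[1] - origin[1]) * uy

origin = (0.0, 0.0)
direction_point = (3000.0, 4000.0)   # hypothetical second point defining the line
on_line = longitudinal_measure((1500.0, 2000.0), origin, direction_point)
off_line = longitudinal_measure((1500.0 + 400.0, 2000.0 - 300.0), origin, direction_point)
print(round(on_line, 1), round(off_line, 1))
```

The second point is offset perpendicular to the line, so it receives the same longitudinal measure as the first, which is how stations beside the flow path get collapsed onto one axis.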
Variogram along direction of flow
$f(h) = C_0 + (C_1 - C_0) \left(1 - e^{-3h^2/a^2}\right)$
where
h = lag distance along direction of flow
C0 = nugget = 2 psu²
C1 = sill = 3.6 psu²
a = range = 6000 m
(Gaussian variogram model)
[Figure: variogram along direction of flow with the fitted model; the nugget, sill and range are marked]
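The fitted model is easy to evaluate at any lag. A minimal sketch, assuming the Gaussian form and the parameter values listed for the direction of flow; the function name is illustrative.

```python
import math

def gaussian_variogram(h, nugget, sill, a):
    """f(h) = C0 + (C1 - C0) * (1 - exp(-3 h^2 / a^2))."""
    return nugget + (sill - nugget) * (1.0 - math.exp(-3.0 * h ** 2 / a ** 2))

# Parameters for the direction of flow: nugget 2 psu^2, sill 3.6 psu^2, range 6000 m.
for h in (0.0, 3000.0, 6000.0, 12000.0):
    print(h, round(gaussian_variogram(h, nugget=2.0, sill=3.6, a=6000.0), 3))
```

With the factor of 3 in the exponent, the curve has climbed about 95% of the way from the nugget to the sill at h = a, so a acts as a practical range.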
Variogram along depth
$f(h) = C_0 + (C_1 - C_0) \left(1 - e^{-3h^2/a^2}\right)$
where
h = lag distance along depth
C0 = nugget = 0 psu²
C1 = sill = 3.6 psu²
a = range = 1.7 m
(Gaussian variogram model)
Variogram along time axis
$f(h) = C_0 + (C_1 - C_0) \left(\frac{3h}{2a} - \frac{h^3}{2a^3}\right)$
where
h = lag along the time axis
C0 = nugget = 0 psu²
C1 = sill = 3 psu²
a = range = 1 day
(Spherical variogram model)
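A short function makes the behavior of the spherical model concrete. This sketch assumes the usual convention that the spherical formula applies for lags up to the range and that the variogram stays at the sill beyond it; that cap is standard for this model but is not written on the slide.

```python
def spherical_variogram(h, nugget, sill, a):
    """f(h) = C0 + (C1 - C0) * (3h/(2a) - h^3/(2a^3)) for h < a, and the sill for h >= a."""
    if h >= a:
        return sill
    return nugget + (sill - nugget) * (1.5 * h / a - 0.5 * (h / a) ** 3)

# Parameters for the time axis: nugget 0 psu^2, sill 3 psu^2, range 1 day.
for lag_days in (0.0, 0.25, 0.5, 1.0, 2.0):
    print(lag_days, round(spherical_variogram(lag_days, nugget=0.0, sill=3.0, a=1.0), 4))
```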
Interpolation results
[Figure: interpolated salinity in a 3D block with axes x = distance to origin point, y = time, z = elevation, alongside a map with north arrows; longitudinal profiles on 7/12/2006 18:00 and 7/13/2006 18:00. Legend: 37 – 40 psu, 40 – 42 psu, 42 – 43 psu, 42 – 44 psu, 44 – 46 psu]
Longitudinal Profiles
Bottom salinities
Cross validation
• A common method to evaluate variogram models.
• Also known as the “fictitious point” method (Delhomme, 1978).
• Remove one data point at a time from the data set, then use the remaining n-1 points to estimate the removed point.
• The estimated and actual values are then compared with each other.
[Figure: scatter plot of estimated salinity (psu) against measured salinity (psu), both axes from 35 to 47; R² = 0.85]
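The leave-one-out loop itself is independent of the estimator. The sketch below uses an inverse-distance average as a stand-in estimator purely to keep the example short; it is not kriging, and the locations and salinities are invented. R² is computed here as the squared Pearson correlation between measured and estimated values, which is an assumption about how the reported figure was obtained.

```python
def leave_one_out(points, values, estimate):
    """Estimate each point from the remaining n-1 points; return the list of estimates."""
    estimates = []
    for i in range(len(values)):
        other_points = points[:i] + points[i + 1:]
        other_values = values[:i] + values[i + 1:]
        estimates.append(estimate(other_points, other_values, points[i]))
    return estimates

def r_squared(measured, estimated):
    """Squared Pearson correlation between measured and estimated values."""
    n = len(measured)
    mx, my = sum(measured) / n, sum(estimated) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(measured, estimated))
    sxx = sum((x - mx) ** 2 for x in measured)
    syy = sum((y - my) ** 2 for y in estimated)
    return sxy ** 2 / (sxx * syy)

def inverse_distance(points, values, target):
    """Stand-in estimator (NOT kriging): inverse-distance-weighted average in 1D."""
    weights = [1.0 / abs(p - target) for p in points]
    return sum(w * v for w, v in zip(weights, values)) / sum(weights)

points = [0.0, 1.0, 2.0, 3.0, 4.0]        # hypothetical 1D locations
values = [38.0, 40.0, 41.0, 43.0, 45.0]   # hypothetical salinities (psu)
estimates = leave_one_out(points, values, inverse_distance)
print([round(e, 2) for e in estimates], round(r_squared(values, estimates), 2))
```

To cross-validate a variogram model, the `estimate` argument would be replaced by a kriging routine built on that model.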
Conclusions
We’ve covered:
1. Basics of spatial statistics
2. Kriging
3. Application of spatial-temporal statistics (Gravity currents in CCBay)
Spatial statistics is fun!
Geostatistical tools
• ArcGIS Geostatistical Analyst
– Easiest to use
• GSLIB
– Library of Fortran programs
• DeCesare’s version of GSLIB
– Modification of GSLIB to do space-time kriging
• BMELIB
– Library of MATLAB programs