Lecture #3: Modeling spatial autocorrelation in normal, binomial/logistic, and Poisson variables: autoregressive and spatial filter specifications
Spatial statistics in practice
Center for Tropical Ecology and Biodiversity, Tunghai University & Fushan Botanical Garden
Topics for today’s lecture
• Autoregressive specifications and normal curve theory (PROC NLIN).
• Auto-binomial and auto-Poisson models: the need for MCMC.
• Relationships between spatial autoregressive and geostatistical models
• Spatial filtering specifications and linear and generalized linear models (PROC GENMOD).
• Autoregressive specifications and linear mixed models (PROC MIXED).
• Implications for space-time datasets (PROC NLMIXED)
What is an auto- model?
Y is on both sides of the = sign
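The auto-normal (auto-Gaussian) model
Popular autoregressive equations for the normal probability model
pure autocorrelation: $\mathbf{Y} = f(\mu, \boldsymbol{\varepsilon})$
AR: $\mathbf{Y} = \rho\mathbf{W}\mathbf{Y} + \mathbf{X}\boldsymbol{\beta} + \boldsymbol{\varepsilon}$
SAR: $\mathbf{Y} = \rho\mathbf{W}\mathbf{Y} + (\mathbf{I} - \rho\mathbf{W})\mathbf{X}\boldsymbol{\beta} + \boldsymbol{\varepsilon}$
CAR: (equation not recoverable from the extraction; its terms involve $\mathbf{Y}$, $\mathbf{X}\boldsymbol{\beta}$, $\boldsymbol{\varepsilon}$, $\mathbf{D}$, $\mathbf{D}^{T}$, $(\mathbf{I} - \rho\mathbf{C})$, and $\mathbf{M}$)
A normality assumption usually is added to the error term.
M is diagonal, and often is I.
Slide annotations: 1st-order model; 2nd-order models.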
Spatial autoregression
The workhorse of classical statistics is linear regression; the workhorse of spatial statistics is nonlinear regression.
The simultaneous autoregressive (SAR) model
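$\mathbf{Y} = \rho\mathbf{W}\mathbf{Y} + (\mathbf{I} - \rho\mathbf{W})\mathbf{X}\boldsymbol{\beta} + \boldsymbol{\varepsilon}$
where $\rho$ denotes the spatial autocorrelation parameter
$e^{J}\mathbf{Y} = e^{J}[\rho\mathbf{W}\mathbf{Y} + (\mathbf{I} - \rho\mathbf{W})\mathbf{X}\boldsymbol{\beta}] + e^{J}\boldsymbol{\varepsilon}$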
Georeference data preparation
• Concern #1: the normalizing factor
  – Rule: probabilities must integrate/sum to 1
  – Both a spatially autocorrelated and an unautocorrelated mathematical space must satisfy this rule
• Jacobian term for Gaussian RVs
  – a function of the eigenvalues of matrix W (or C)
[Figure panels: symmetric set of eigenvalues; non-symmetric set of eigenvalues]
Calculation of the Jacobian term
Step 1
extract the eigenvalues from n-by-n matrix W (or C)
- eigenvalues are the n solutions λ to the equation det(W – λI) = 0
- eigenvectors are the n solutions E to the equation (W – λI)E = 0.
Step 2 (from matrix determinant)
compute $J = -\frac{1}{n}\sum_{i=1}^{n}\mathrm{LN}(1 - \rho\lambda_i)$; J2 is $-\frac{2}{n}\sum_{i=1}^{n}\mathrm{LN}(1 - \rho\lambda_i)$
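The two steps can be sketched numerically as follows. This is a minimal illustration, not the lecture's PROC NLIN code: the 3-by-3 rook-adjacency grid, the value rho = 0.5, and the helper names `rook_adjacency` and `jacobian_term` are all assumptions made for the example. It also assumes the Jacobian term takes the form J = -(1/n) Σ LN(1 - ρλ_i), and cross-checks the eigenvalue sum against the log-determinant of (I - ρW).

```python
import numpy as np

def rook_adjacency(nrows, ncols):
    """Binary rook-adjacency matrix C for a regular nrows-by-ncols grid."""
    n = nrows * ncols
    C = np.zeros((n, n))
    for r in range(nrows):
        for c in range(ncols):
            i = r * ncols + c
            if c + 1 < ncols:                      # neighbour to the east
                C[i, i + 1] = C[i + 1, i] = 1.0
            if r + 1 < nrows:                      # neighbour to the south
                C[i, i + ncols] = C[i + ncols, i] = 1.0
    return C

def jacobian_term(eigenvalues, rho):
    """J = -(1/n) * sum_i LN(1 - rho * lambda_i)."""
    lam = np.asarray(eigenvalues, dtype=float)
    return -np.mean(np.log(1.0 - rho * lam))

# Step 1: eigenvalues of the row-standardized matrix W (hypothetical 3-by-3 grid)
C = rook_adjacency(3, 3)
W = C / C.sum(axis=1, keepdims=True)
lam = np.linalg.eigvals(W).real       # real in theory; .real drops rounding noise

# Step 2: Jacobian term for an illustrative rho
rho = 0.5
J = jacobian_term(lam, rho)

# Cross-check: sum_i LN(1 - rho*lambda_i) equals LN det(I - rho*W)
sign, logdet = np.linalg.slogdet(np.eye(W.shape[0]) - rho * W)
print(J, -logdet / W.shape[0])
```

Because the eigenvalues depend only on W, they can be extracted once and reused for every trial value of ρ during the nonlinear minimization.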
Minimizing SSE
[Figure: minimizing SSE. Values shown on the plot: 0.9795; 1.0542; MIN OLS: 1.1486; MIN with Jacobian, which is a weight: 1.8959]
Relative plots (in z scores)
worst case scenario
Gaussian approximations allow an evaluation of redundant information

                            Houston (n=690)               Syracuse (n=208)
attribute                   % redundant information  n*   % redundant information  n*
population density          61                       66   72                       15
% male                      32                       156  18                       78
black/white ratio           63                       62   57                       27
% widowed                   52                       91   16                       84
% with university degree    70                       49   43                       37
% Chinese                   51                       93   34                       48

n* = effective sample size
The auto-binomial/logistic model
NOTE: a data transformation does not exist that enables binary 0-1 responses to conform closely to a bell-shaped curve
Primary sources of overdispersion: binomial extra variation [Var(Y) = φnp(1-p), and φ > 1]
• misspecification of the mean function
• nonlinear relationships & covariate interactions
• presence of outliers
• heterogeneity or intra-unit correlation in group data
• inter-unit spatial autocorrelation
• choosing an inappropriate probability model to represent the variation in data
• excessive counts (especially 0s)
The auto-binomial/logistic model
• By definition, a percentage/binary response variable is on the left-hand side of the equation, and some spatially lagged version of this response variable also is on the right-hand side of the equation.
• Unlike the auto-Gaussian model, whose normalizing constant (i.e., its Jacobian term) is numerically tractable, here the normalizing constant is intractable.
• A specific relationship tends to hold between the logistic model’s intercept and autoregressive parameters.
Pseudo-likelihood estimation
Maximum pseudo-likelihood treats areal unit values as though they are conditionally independent, and is equivalent to maximum likelihood estimation when they are independent.
Each areal unit value is regressed on a function of its surrounding areal unit values.
Statistical efficiency is lost when dependent values are assumed to be independent.
$p_i = \dfrac{\exp(\alpha + \mathbf{X}_i\boldsymbol{\beta} + \rho\sum_{j=1}^{n} c_{ij}y_j)}{1 + \exp(\alpha + \mathbf{X}_i\boldsymbol{\beta} + \rho\sum_{j=1}^{n} c_{ij}y_j)}$
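A minimal sketch of maximum pseudo-likelihood for the intercept-only auto-logistic case: each y_i is regressed on the sum of its neighbours' values, Σ c_ij y_j, with an ordinary logistic regression. The 6-by-6 binary map is invented for illustration, rook adjacency is an assumption, and the Newton-Raphson loop stands in for whatever logistic regression routine would be used in practice.

```python
import numpy as np

# Hypothetical 6-by-6 binary map (1 = presence); invented for illustration only.
y_map = np.array([
    [0, 0, 0, 1, 1, 1],
    [0, 0, 1, 1, 1, 1],
    [0, 1, 0, 1, 1, 1],
    [0, 0, 0, 0, 1, 1],
    [0, 0, 1, 0, 1, 0],
    [0, 0, 0, 0, 1, 1],
])
nrows, ncols = y_map.shape
y = y_map.ravel().astype(float)
n = y.size

# Binary rook-adjacency matrix C (an assumption; any C could be used)
C = np.zeros((n, n))
for r in range(nrows):
    for c in range(ncols):
        i = r * ncols + c
        if c + 1 < ncols:
            C[i, i + 1] = C[i + 1, i] = 1.0
        if r + 1 < nrows:
            C[i, i + ncols] = C[i + ncols, i] = 1.0

lag = C @ y                                   # sum_j c_ij * y_j for every unit i
X = np.column_stack([np.ones(n), lag])        # columns: intercept, spatial lag

# Newton-Raphson for the logistic pseudo-likelihood
theta = np.zeros(2)
for _ in range(25):
    p = 1.0 / (1.0 + np.exp(-X @ theta))
    score = X.T @ (y - p)
    info = X.T @ (X * (p * (1.0 - p))[:, None])
    theta = theta + np.linalg.solve(info, score)

alpha_hat, rho_hat = theta
p_hat = 1.0 / (1.0 + np.exp(-X @ theta))      # fitted conditional probabilities
print(alpha_hat, rho_hat)
```

The standard errors from such a fit are not trustworthy, because the pseudo-likelihood treats dependent values as independent; the point estimates serve mainly as starting values.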
Quasi-likelihood estimation
Maximum quasi-likelihood treats the variation of Y values as though it is inflated, and estimates a dispersion parameter φ that multiplies the variance term np(1-p), for the purpose of rescaling when testing hypotheses.
This approach is equivalent to maximum likelihood estimation when φ = 1, and most log-likelihood function asymptotic theory transfers to the results.
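Spatially lagged covariates: $\mathbf{C}\mathbf{Y}$, with elements $\sum_{j=1}^{n} c_{ij}(F/P)_j$; and $\mathbf{C}\mathbf{I}_{25\%}$, with elements $\sum_{j=1}^{n} c_{ij}I_{25\%,j}$
pseudo-$R^2_{pct}$ = 0.61; pseudo-$R^2_{I}$ = 0.41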
Preliminary estimation (pseudo- and quasi-likelihood) results: F/P (%)

model                  intercept   SA     seSA    dispersion   Deviance
binomial               -1.10       0      *****   *****        945.96
auto-binomial          -1.11       0.89   0.001   0            384.51
quasi-auto-binomial    -1.10       0.89   0.015   19.74        0.99
auto-logistic          -2.03       0.80   0.032   *****        0.93
What is the alternative to pseudo-likelihood?
MCMC maximum likelihood estimation!
• exploits the sufficient statistics
• based upon Markov chain transition matrices converging to an equilibrium
• exploits marginal probabilities, and hence can begin with pseudo-likelihood results
• based upon simulation theory
Properties of estimators: a review
• Unbiasedness
• Efficiency
• Consistency
• Robustness
• BLUE
• BLUP
• Sufficiency
MCMC maximum likelihood estimation
• MCMC denotes Markov chain Monte Carlo
• Pseudo-likelihood works with the conditional marginal models
• MCMC is needed to compute the simultaneous likelihood result
• MCMC exploits the conditional models
The theory of Markov chains was developed by Andrei Markov at the beginning of the 20th century. A Markov chain is a process consisting of a finite number of states and known probabilities, $p_{ij}$, of moving from state i to state j.
Markov chain theory is based on the Ergodicity Thm: irreducible, recurrent non-null, and aperiodic.
If a Markov chain is ergodic, then a unique steady state distribution exists, independent of the initial state: for transition matrix M, $\mathbf{M}^{*} = \lim_{k \to \infty}\mathbf{M}^{k}$;
$P(X_{t+1} = j \mid X_0 = i_0, \ldots, X_t = i_t) = P(X_{t+1} = j \mid X_t = i_t) = {}_{t}p_{ij}$
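Example transition matrix convergence:
Map of six areal units (states ordered A, B, C, D, E, F):
A B C
D E F
$$
\begin{bmatrix}
1/3 & 1/3 & 0 & 1/3 & 0 & 0\\
1/4 & 1/4 & 1/4 & 0 & 1/4 & 0\\
0 & 1/3 & 1/3 & 0 & 0 & 1/3\\
1/3 & 0 & 0 & 1/3 & 1/3 & 0\\
0 & 1/4 & 0 & 1/4 & 1/4 & 1/4\\
0 & 0 & 1/3 & 0 & 1/3 & 1/3
\end{bmatrix}^{34}
=
\begin{bmatrix}
0.15 & 0.20 & 0.15 & 0.15 & 0.20 & 0.15\\
0.15 & 0.20 & 0.15 & 0.15 & 0.20 & 0.15\\
0.15 & 0.20 & 0.15 & 0.15 & 0.20 & 0.15\\
0.15 & 0.20 & 0.15 & 0.15 & 0.20 & 0.15\\
0.15 & 0.20 & 0.15 & 0.15 & 0.20 & 0.15\\
0.15 & 0.20 & 0.15 & 0.15 & 0.20 & 0.15
\end{bmatrix}
$$
Steady state probabilities on the map:
0.15 0.20 0.15
0.15 0.20 0.15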
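The convergence can be checked numerically with a short sketch. It assumes the transition matrix is the rook-adjacency matrix of the 2-by-3 map with a self-loop added to every unit and then row-standardized (this reading reproduces the 1/3 and 1/4 entries), and it uses the power 34 read from the slide.

```python
import numpy as np

# Rook adjacency for the 2-by-3 map  A B C / D E F  (states ordered A..F)
C = np.array([
    [0, 1, 0, 1, 0, 0],   # A: B, D
    [1, 0, 1, 0, 1, 0],   # B: A, C, E
    [0, 1, 0, 0, 0, 1],   # C: B, F
    [1, 0, 0, 0, 1, 0],   # D: A, E
    [0, 1, 0, 1, 0, 1],   # E: B, D, F
    [0, 0, 1, 0, 1, 0],   # F: C, E
], dtype=float)

A = C + np.eye(6)                          # each unit may also stay where it is
M = A / A.sum(axis=1, keepdims=True)       # row-standardize: rows sum to 1

M_limit = np.linalg.matrix_power(M, 34)    # M raised to the 34th power
stationary = A.sum(axis=1) / A.sum()       # row totals / grand total

print(np.round(M_limit, 2))
print(stationary)
```

Every row of the powered matrix matches the steady state vector, regardless of the starting state; the self-loops are what make the chain aperiodic on this map.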
Monte Carlo simulation is named after the city in the principality of Monaco,
because a roulette wheel is a simple random number generator. The name and the
systematic development of Monte Carlo methods date from about 1944.
The Monte Carlo method provides approximate solutions to a variety of
mathematical problems by performing statistical sampling experiments with a
computer using pseudo-random numbers.
MCMC provides a mechanism for taking dependent samples in situations where
regular sampling is difficult, if not completely impossible. The standard
situation is where the normalizing constant for a joint or a posterior
probability distribution is either too difficult to calculate or analytically
intractable.
MCMC has been around for about 50 years.
What is MCMC? A definition
MCMC is used to simulate from some distribution p known only up to a constant
factor, C:
pi = Cqi
where qi is known but C is unknown and too horrible to calculate.
MCMC begins with conditional (marginal) distributions, and MCMC sampling outputs a sample of parameters drawn from their joint
(posterior) distribution.
Starting with any Markov chain having transition matrix M over the set of states i on
which p is defined, and given Xt = i, the idea is to simulate a random variable X* with
distribution qi: qij = P(X* = j| Xt = i).
The distribution qi is called the proposal distribution.
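A minimal sketch of sampling from a distribution known only up to a constant. It uses the standard Metropolis acceptance rule with a uniform (symmetric) proposal over the states; that rule and the four-state weights are assumptions for illustration and are not stated on the slide. Only ratios of the unnormalized weights are used, so the constant C never has to be computed.

```python
import numpy as np

def metropolis_discrete(q, n_iter, rng):
    """Metropolis sampler on states 0..k-1 with target proportional to q."""
    q = np.asarray(q, dtype=float)
    k = q.size
    state = 0
    draws = np.empty(n_iter, dtype=int)
    for t in range(n_iter):
        proposal = rng.integers(k)                 # uniform, hence symmetric, proposal
        # accept with probability min(1, q_proposal / q_state); C cancels in the ratio
        if rng.random() < min(1.0, q[proposal] / q[state]):
            state = proposal
        draws[t] = state
    return draws

rng = np.random.default_rng(2024)
q = np.array([1.0, 2.0, 3.0, 4.0])                 # hypothetical unnormalized weights
draws = metropolis_discrete(q, 60_000, rng)
kept = draws[10_000:]                              # discard a burn-in set
freq = np.bincount(kept, minlength=q.size) / kept.size
print(freq, q / q.sum())
```

The empirical frequencies approach q / Σq even though the normalizing constant was never used inside the sampler.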
After a burn-in set of simulations, a chain converges to an equilibrium
[Figure labels: p0 = 0.5; p = 0.2]
• a stochastic process that returns a different result with each execution; a method for generating a joint empirical distribution of several variables from a set of modelled conditional distributions for each variable when the structure of data is too complex to implement mathematical formulae or directly simulate.
• a recipe for producing a Markov chain that yields simulated data that have the correct unconditional model properties, given the conditional distributions of those variables under study.
• its principal idea is to convert a multivariate problem into a sequence of univariate problems, which then are iteratively solved to produce a Markov chain.
A Gibbs sampling algorithm
Gibbs sampling is an MCMC scheme for simulation from p where a transition kernel is formed by the full conditional distributions of p.
(1) t = 0; set initial values ${}^{0}\mathbf{x} = ({}^{0}x_1, \ldots, {}^{0}x_n)'$
(2) obtain new values ${}^{t}\mathbf{x} = ({}^{t}x_1, \ldots, {}^{t}x_n)'$ from ${}^{t-1}\mathbf{x}$:
    ${}^{t}x_1 \sim p(x_1 \mid {}^{t-1}x_2, \ldots, {}^{t-1}x_n)$
    ${}^{t}x_2 \sim p(x_2 \mid {}^{t}x_1, {}^{t-1}x_3, \ldots, {}^{t-1}x_n)$
    …
    ${}^{t}x_n \sim p(x_n \mid {}^{t}x_1, \ldots, {}^{t}x_{n-1})$
(3) t = t+1; repeat step (2) until convergence.
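The three steps can be illustrated with the simplest case where both full conditionals are known in closed form. The target here, a standard bivariate normal with correlation 0.8, is an assumption chosen for the sketch; its conditionals are x1 | x2 ~ N(ρx2, 1 - ρ²) and x2 | x1 ~ N(ρx1, 1 - ρ²).

```python
import numpy as np

def gibbs_bivariate_normal(rho, n_iter, rng, x0=(0.0, 0.0)):
    """Gibbs sampler for a standard bivariate normal with correlation rho."""
    x1, x2 = x0                                   # step (1): initial values
    sd = np.sqrt(1.0 - rho ** 2)
    out = np.empty((n_iter, 2))
    for t in range(n_iter):                       # steps (2)-(3): sweep and repeat
        x1 = rng.normal(rho * x2, sd)             # draw x1 given the previous x2
        x2 = rng.normal(rho * x1, sd)             # draw x2 given the updated x1
        out[t] = (x1, x2)
    return out

rng = np.random.default_rng(7)
chain = gibbs_bivariate_normal(rho=0.8, n_iter=20_000, rng=rng, x0=(5.0, -5.0))
kept = chain[2_000:]                              # drop burn-in iterations
sample_corr = np.corrcoef(kept.T)[0, 1]
print(kept.mean(axis=0), sample_corr)
```

Only univariate draws are ever made, yet the retained pairs reproduce the joint correlation; the deliberately poor starting point (5, -5) is forgotten during burn-in.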
Monitoring convergence
MCMC exploits the sufficient statistics, which should be monitored with a time-series plot for randomness.
After removing burn-in iteration results, a chain should be weeded (i.e., only every kth output is retained). These weeded values
should be independent; this property can be checked by constructing a correlogram.
Convergence of m chains can be assessed using ANOVA: within-chain variance pooling is legitimate when chains have converged.
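A small sketch of burn-in removal, weeding, and the correlogram check. The chain is a simulated first-order autoregressive series standing in for MCMC output, and the burn-in length, weeding interval k = 50, and `autocorr` helper are illustrative assumptions.

```python
import numpy as np

def autocorr(x, lag):
    """Sample autocorrelation of series x at the given positive lag."""
    x = np.asarray(x, dtype=float) - np.mean(x)
    return float(np.dot(x[:-lag], x[lag:]) / np.dot(x, x))

rng = np.random.default_rng(11)
n_iter, burn_in, k = 30_000, 5_000, 50

# Stand-in for MCMC output: a strongly autocorrelated chain with a poor start
phi = 0.9
chain = np.empty(n_iter)
chain[0] = 10.0
for t in range(1, n_iter):
    chain[t] = phi * chain[t - 1] + rng.normal()

kept = chain[burn_in:]          # remove burn-in iterations
weeded = kept[::k]              # retain only every k-th output

r_raw = autocorr(kept, 1)       # lag-1 autocorrelation before weeding
r_weeded = autocorr(weeded, 1)  # lag-1 autocorrelation after weeding
print(weeded.size, round(r_raw, 3), round(r_weeded, 3))
```

A full correlogram repeats the `autocorr` call over several lags; values near zero at every lag support treating the weeded draws as independent.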
Sufficient statistics for normal, binomial, and Poisson models
A sufficient statistic (established with the Fisher-Neyman factorization theorem) is a statistic that captures all of the information contained in a sample that is relevant to the estimation of a population parameter.
$\sum_{i=1}^{n} y_i \times 1$
$\sum_{i=1}^{n} y_i x_{ij},\ j = 1, 2, \ldots, p$
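Implementation of MCMC for the autologistic model
Y1    Y2    …  Y20
Y21   Y22   …  Y40
⋮
Y381  Y382  …  Y400
$\tau = 0$: $p_i = \frac{1}{2}$ (or $\sum_{i=1}^{n} y_i/n$)
$Y_{i,\tau} \sim \mathrm{binomial}(n = 1,\ p_{i,\tau})$
$p_{i,\tau} = \dfrac{\exp(\hat{\alpha} + \hat{\rho}\sum_{j=1}^{n} c_{ij}y_{j,\tau^*})}{1 + \exp(\hat{\alpha} + \hat{\rho}\sum_{j=1}^{n} c_{ij}y_{j,\tau^*})}$, where $\tau^*$ is a mixture of $\tau$ and $\tau - 1$
drawing from the binomial distribution is the Monte Carlo part
$(\tau - 1)$ is the Markov chain part
sufficient statistics: $T_{1,\tau} = \sum_{i=1}^{n} y_{i,\tau} \times 1$; $T_{2,\tau} = \frac{1}{2}\sum_{i=1}^{n} y_{i,\tau}\sum_{j=1}^{n} c_{ij}y_{j,\tau}$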
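The scheme can be sketched as a Gibbs sampler on a 20-by-20 lattice that records the two sufficient statistics after every sweep. The rook adjacency, the parameter values alpha = -1.0 and rho = 0.5, the number of sweeps, and the function names are assumptions for illustration; they are not the estimates from the lecture.

```python
import numpy as np

def rook_neighbors(nrows, ncols):
    """For each cell of a regular grid, the indices of its rook neighbours."""
    nbrs = []
    for r in range(nrows):
        for c in range(ncols):
            cur = []
            if r > 0:
                cur.append((r - 1) * ncols + c)
            if r + 1 < nrows:
                cur.append((r + 1) * ncols + c)
            if c > 0:
                cur.append(r * ncols + c - 1)
            if c + 1 < ncols:
                cur.append(r * ncols + c + 1)
            nbrs.append(np.array(cur))
    return nbrs

def autologistic_gibbs(alpha, rho, nbrs, n_sweeps, rng):
    """Gibbs sweeps for an autologistic model; returns last map and T1, T2 chains."""
    n = len(nbrs)
    y = (rng.random(n) < 0.5).astype(float)       # tau = 0: p_i = 1/2
    T1 = np.empty(n_sweeps)
    T2 = np.empty(n_sweeps)
    for tau in range(n_sweeps):
        for i in range(n):
            # neighbours hold a mixture of sweep tau and sweep tau-1 values
            eta = alpha + rho * y[nbrs[i]].sum()
            p = 1.0 / (1.0 + np.exp(-eta))
            y[i] = float(rng.random() < p)        # Monte Carlo part: binomial(1, p) draw
        T1[tau] = y.sum()                                               # sum of y_i
        T2[tau] = 0.5 * sum(y[i] * y[nbrs[i]].sum() for i in range(n))  # joined 1-1 pairs
    return y, T1, T2

rng = np.random.default_rng(42)
nbrs = rook_neighbors(20, 20)
y, T1, T2 = autologistic_gibbs(alpha=-1.0, rho=0.5, nbrs=nbrs, n_sweeps=200, rng=rng)
print(T1[-5:], T2[-5:])
```

In an MCMC maximum likelihood run, the chains of T1 and T2 (after burn-in and weeding) would be compared with the observed sufficient statistics to adjust the parameter estimates.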
MCMC-MLEs are extracted from the generated chains
MCMC results: 25,000 (burn-in) + 225,000/100 (weeded)

                     alpha           rho
source         df    F      prob     F      prob
iteration      44    1.0    0.52     1.0    0.47
chain          2     0.1    0.91     0.1    0.92
interaction    88    1.0    0.56     1.0    0.54
error          6615
Some prediction comparisons
The (modified) auto-Poisson model
NOTE: the auto-Poisson model can only capture negative spatial autocorrelation
NOTE: excessive zeroes are a serious problem with empirical Poisson RVs
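Spatial autoregression: the auto-Poisson model
The workhorse of spatial statistical generalized linear models is MCMC
For counts, y, in the set of integers {0, 1, 2, 3, … }
$P(Y = y) = \dfrac{e^{-\mu}\mu^{y}}{y!}$
$E(Y) = \mu$
$VAR(Y) = \mu$
$\mathrm{LN}(\boldsymbol{\mu}) = \mathbf{X}\boldsymbol{\beta} + \rho\mathbf{C}\mathbf{Y}$
$P_i = c^{-1}\,\dfrac{e^{-\mu_i}\mu_i^{y_i}}{y_i!}$, with $\mu_i = \exp(\alpha + \mathbf{X}_i\boldsymbol{\beta} + \rho\sum_{j=1}^{n} c_{ij}y_j)$
$c^{-1}$ is an intractable normalizing factor
but $\rho < 0$: negative spatial autocorrelation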
MCMC is initiated with pseudo-likelihood estimates
positive spatial autocorrelation can be handled with Winsorizing, or binomial approximation
When VAR(Y) > μ, overdispersion (extra Poisson variation) is encountered
• Detected when deviance/df > 1
• Often described as VAR(Y) = μ(1 + ημ)
• Leads to the Negative Binomial model
Conceptualized as the number of times some phenomenon occurs before a fixed number of times (r) that it does not occur.
$P(Y = y) = \dbinom{y + r - 1}{r - 1}\, p^{r}(1 - p)^{y}$
$E(Y) = r(1 - p)/p$
$VAR(Y) = r(1 - p)/p^{2}$
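A short derivation, offered as a worked check rather than as part of the original slide, links the negative binomial moments to the overdispersion form VAR(Y) = μ(1 + ημ). It takes E(Y) = r(1 - p)/p and VAR(Y) = r(1 - p)/p² as given and eliminates p:

```latex
\begin{align*}
\mu &= E(Y) = \frac{r(1-p)}{p}
  \quad\Longrightarrow\quad p = \frac{r}{r+\mu},\\
VAR(Y) &= \frac{r(1-p)}{p^{2}} = \frac{\mu}{p}
  = \frac{\mu(r+\mu)}{r} = \mu\left(1 + \frac{\mu}{r}\right).
\end{align*}
```

Under this parameterization the dispersion parameter is η = 1/r, so the Poisson case VAR(Y) = μ is approached as r grows large.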
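$\hat{B}_i = D_i\, e^{\mathrm{LN}(\hat{\mu})}\, e^{\hat{\rho}\sum_{j=1}^{n} c_{ij} b_j/d_j}$
$\mathbf{C}\mathbf{Y}$: $\sum_{j=1}^{n} c_{ij}(b_j/d_j)$
pseudo-$R^2_{counts}$ = 0.86; pseudo-$R^2_{ratio}$ = 0.17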
Preliminary estimation (pseudo- and quasi-likelihood) results: B/D

model                     SA     seSA     dispersion   Deviance
Poisson                   0      *****    *****        1230.20
auto-Poisson              0.02   <0.001   0            822.23
quasi-auto-Poisson        0.02   0.006    29.2553      0.96
auto-negative binomial    0.02   0.007    0.0626       1.01
MCMC results: 25,000 (burn-in) + 500,000/100 (weeded)
[Figure: typical correlogram]
Some prediction comparisons
Geographic covariation:n-by-n matrix V
IV
εV1Y
then ation,autocorrel spatial no if
,μ 1/2
RVV
CSIWIV
k )()( :icsgeostatist
)ρ - ( )ρ - ( :SAR1/2-T1/2-
-12/1
autoregression works with the inverse covariance matrix & geostatistics works with the covariance matrix itself
Relationships between the range parameter and rho for an ideal
infinite surface
modified Bessel function for CAR
0.00003RESS
e10.2298
rB10.25
ρ̂
1.8403 r 7229.62.1825
1
CAR
Bessel function for SAR
0.00041RESS
e10.2896
rB10.25
ρ̂
0.8879 r 6842.02.0130
1
SAR
b
ij
r
d
1
ij
r
d
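Constructing eigenfunctions for filtering spatial autocorrelation out of georeferenced variables:
$MC = \dfrac{n}{\mathbf{1}^{T}\mathbf{C}\mathbf{1}} \times \dfrac{\mathbf{Y}^{T}(\mathbf{I} - \mathbf{1}\mathbf{1}^{T}/n)\,\mathbf{C}\,(\mathbf{I} - \mathbf{1}\mathbf{1}^{T}/n)\mathbf{Y}}{\mathbf{Y}^{T}(\mathbf{I} - \mathbf{1}\mathbf{1}^{T}/n)\mathbf{Y}}$
the eigenfunctions come from $(\mathbf{I} - \mathbf{1}\mathbf{1}^{T}/n)\,\mathbf{C}\,(\mathbf{I} - \mathbf{1}\mathbf{1}^{T}/n)$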
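A minimal sketch of the construction, assuming a hypothetical 4-by-5 regular grid with rook adjacency. It extracts the eigenvectors of (I - 11ᵀ/n)C(I - 11ᵀ/n) and confirms that the Moran Coefficient of an eigenvector equals its eigenvalue multiplied by n/(1ᵀC1). The helper names are illustrative.

```python
import numpy as np

def rook_adjacency(nrows, ncols):
    """Binary rook-adjacency matrix C for a regular nrows-by-ncols grid."""
    n = nrows * ncols
    C = np.zeros((n, n))
    for r in range(nrows):
        for c in range(ncols):
            i = r * ncols + c
            if c + 1 < ncols:
                C[i, i + 1] = C[i + 1, i] = 1.0
            if r + 1 < nrows:
                C[i, i + ncols] = C[i + ncols, i] = 1.0
    return C

def moran_coefficient(y, C):
    """MC = (n / 1'C1) * z'Cz / z'z, with z the mean-centered y."""
    n = y.size
    z = y - y.mean()
    return (n / C.sum()) * (z @ C @ z) / (z @ z)

C = rook_adjacency(4, 5)
n = C.shape[0]
M = np.eye(n) - np.ones((n, n)) / n          # projection matrix I - 11'/n
MCM = M @ C @ M
MCM = (MCM + MCM.T) / 2.0                    # remove rounding asymmetry

lam, E = np.linalg.eigh(MCM)                 # eigh: symmetric eigenproblem
order = np.argsort(lam)[::-1]                # sort from largest to smallest
lam, E = lam[order], E[:, order]

mc_max = moran_coefficient(E[:, 0], C)       # most positively autocorrelated pattern
mc_min = moran_coefficient(E[:, -1], C)      # most negatively autocorrelated pattern
print(mc_max, n / C.sum() * lam[0])
print(mc_min, n / C.sum() * lam[-1])
```

The first and last eigenvectors therefore bound the MC values attainable on this surface partitioning.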
C versus (I – 11T/n)C(I – 11T/n) = MCM: eigenvalues (* = no counterpart)

C      MCM     C      MCM     C      MCM     C      MCM     C      MCM
               2.06   2.07    *      0.00    -1.10  -1.09   -1.98  -1.98
5.51   *       1.92   1.92    -0.10  -0.10   -1.21  -1.21   -2.02  -2.02
5.19   5.21    1.78   1.79    -0.13  -0.12   -1.24  -1.23   -2.08  -2.08
4.91   4.99    1.57   1.59    -0.15  -0.15   -1.33  -1.33   -2.12  -2.12
4.35   4.35    1.32   1.35    -0.29  -0.28   -1.38  -1.38   -2.15  -2.15
4.01   4.06    1.23   1.26    -0.33  -0.33   -1.44  -1.44   -2.23  -2.23
3.96   3.96    1.05   1.06    -0.49  -0.46   -1.54  -1.52   -2.24  -2.24
3.84   3.88    0.93   0.94    -0.53  -0.53   -1.56  -1.55   -2.33  -2.32
3.42   3.43    0.80   0.80    -0.59  -0.59   -1.60  -1.59   -2.40  -2.39
3.35   3.35    0.78   0.79    -0.63  -0.63   -1.64  -1.64   -2.41  -2.41
2.90   2.91    0.58   0.61    -0.80  -0.80   -1.74  -1.74   -2.43  -2.42
2.65   2.72    0.38   0.38    -0.89  -0.88   -1.84  -1.84   -2.54  -2.54
2.53   2.59    0.27   0.27    -0.92  -0.92   -1.87  -1.87   -2.62  -2.62
2.35   2.40    0.17   0.19    -0.95  -0.95   -1.90  -1.90   -2.67  -2.67
2.20   2.24    0.12   0.12    -1.07  -1.06   -1.96  -1.96   -2.70  -2.70
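$\hat{\lambda}_{MCM} = 0.011 + 1.005\,\lambda_{C}$
upper limit of $\rho$: $1/\lambda_{1,C}$
$MC_{1} = \dfrac{n}{\mathbf{1}^{T}\mathbf{C}\mathbf{1}}\,\lambda_{1,MCM}$; $MC_{n} = \dfrac{n}{\mathbf{1}^{T}\mathbf{C}\mathbf{1}}\,\lambda_{n,MCM}$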
Eigenvectors of MCM
• (I – 11T/n) = M ensures that the eigenvector means are 0
• symmetry ensures that the eigenvectors are orthogonal
• M ensures that the eigenvectors are uncorrelated
• replacing the 1st eigenvalue with 0 inserts the intercept vector 1 into the set of eigenvectors
• thus, the eigenvectors represent all possible distinct (i.e., orthogonal and uncorrelated) spatial autocorrelation map patterns for a given surface partitioning
• Legendre and his colleagues are developing analogous eigenfunction spatial filters based upon the truncated distance matrix used in geostatistics
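Expectations for the Moran Coefficient for linear regression with normal residuals
$E(MC) = -\dfrac{n\,\mathrm{TR}[(\mathbf{X}^{T}\mathbf{X})^{-1}\mathbf{X}^{T}\mathbf{C}\mathbf{X}]}{(n - p - 1)\,\mathbf{1}^{T}\mathbf{C}\mathbf{1}}$
$\sigma^{2}_{MC}$: (expression not recoverable from the extraction; it involves $\mathbf{1}^{T}\mathbf{C}\mathbf{1}$)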
A spatial filtering counterpart to the auto-normal model specification.
• $\mathbf{Y} = \mathbf{E}_k\boldsymbol{\beta} + \boldsymbol{\varepsilon}$
• $\mathbf{b} = \mathbf{E}_k^{T}\mathbf{Y}$
• Only a single regression is needed to implement the stepwise procedure.
MAX: R2; eigenvectors selected in order of their bivariate correlations
residual spatial autocorrelation =
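A sketch of why a single regression suffices: the eigenvectors are orthonormal and have zero means, so the coefficient of each one is simply its inner product with Y, and it does not change when other eigenvectors enter or leave the model. The 5-by-5 grid, the choice of five candidate eigenvectors, and the simulated Y are assumptions made for illustration.

```python
import numpy as np

# Rook adjacency for a hypothetical 5-by-5 grid
nrows = ncols = 5
n = nrows * ncols
C = np.zeros((n, n))
for r in range(nrows):
    for c in range(ncols):
        i = r * ncols + c
        if c + 1 < ncols:
            C[i, i + 1] = C[i + 1, i] = 1.0
        if r + 1 < nrows:
            C[i, i + ncols] = C[i + ncols, i] = 1.0

# Eigenvectors of MCM, sorted from largest to smallest eigenvalue
M = np.eye(n) - np.ones((n, n)) / n
MCM = M @ C @ M
lam, E = np.linalg.eigh((MCM + MCM.T) / 2.0)
E = E[:, np.argsort(lam)[::-1]]
Ek = E[:, :5]                                  # candidate set (illustrative choice)

# Simulated response: intercept + spatial filter + small noise
rng = np.random.default_rng(3)
beta_true = np.array([2.0, -1.0, 0.5, 0.0, 0.0])
Y = 10.0 + Ek @ beta_true + rng.normal(scale=0.05, size=n)

b = Ek.T @ Y                                   # b = Ek' Y, no matrix inversion needed
X = np.column_stack([np.ones(n), Ek])
b_ols = np.linalg.lstsq(X, Y, rcond=None)[0]   # full OLS fit for comparison

order = np.argsort(-np.abs(b))                 # entry order by |bivariate correlation|
print(b, b_ols[1:], order)
```

Ranking by |b| matches ranking by bivariate correlation here, because every eigenvector has the same unit length and zero mean.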
Selected demographic attributes of China

attribute                                   # common to       # not truly redundant info   # spatially structured
                                            MAX-R2, MIN-MC    (~MAX-R2, MIN-MC)            (MAX-R2, ~MIN-MC)
population density (|zres| = 7.5 → 6.3)     149               151                          71
crude fertility rate (|zres| = 4.4 → 2.7)   229               105                          0
% 100+ years old (|zres| = 0.4 → 0.0)       145               8                            20
births/deaths ratio (|zres| = 2.7 → 0.6)    233               119                          0
Overdispersion: binomial extra variation
• E(Y) = np and Var(Y) = φnp(1-p), and φ > 1
• tends to have little impact on regression parameter point estimates (maximum likelihood estimator typically is consistent, although small sample bias might occur); but, regression parameter standard error estimates (variances/covariances) are underestimated
• may be reflected in the size of the deviance statistic
• difficult to detect in binary 0-1 data
Spatial structure and generalized linear modeling: “Poisson” regression
CBR: the spatial filter is constructed with 199 of 561 candidate eigenvectors.
                                            Poisson    Negative binomial   SF negative binomial
deviance                                    1377.31    1.02                1.10
mean                                        0.1241     0.1351              0.1308
dispersion                                  0          0.0933              0.0302
Pseudo-R2 (observed vs predicted births)    0.762      0.762               0.903
[Figure: SF results in green]
Spatial structure and generalized linear modeling: “binomial” regression
% population 100+ years old: the spatial filter is constructed with 92 of 561 candidate eigenvectors.
                                            binomial             SF binomial
deviance                                    4.76                 1.00
Intercept                                   -12.0706 (0.0124)    -12.5000 (0.0276)
scale                                       1                    1.47
Pseudo-R2 (observed vs predicted births)    0                    0.283
[Figure label: SF]
Advantages of spatial filtering
• Do not need MCMC for GLM parameter estimation – conventional statistical theory applies
• Uncover distinct map pattern components of spatial autocorrelation that relate directly to the MC
• The eigenvectors are orthogonal and uncorrelated
• Can always calculate the necessary eigenvectors as long as the number of areal units does not exceed n ≈ 10,000
Interpretation of MIN-MC selections
Matrix Ek contains three disjoint eigenvector subsets: Er, for those representing redundant locational information; Es, for those representing spatially structured random effects; and, Emisc, for those being unrelated to Y. Accordingly, the pure spatial autocorrelation model becomes
Y = µ1 + Erβr + (Esβs + e) ,
where βr and βs respectively are regression coefficients defining relationships between Y and the sets of eigenvectors Er and Es, and the term (Esβs + e) behaves like a spatially structured random effect.
Random effects model
$\mathbf{Y} = f(\mathbf{X}\boldsymbol{\beta},\ \boldsymbol{\xi} + \boldsymbol{\varepsilon})$
ξ is a random observation effect (differences among individual observational units)
ε is a time-varying residual error (links to change over time)
The composite error term is the sum of the two.
Random effects model: normally distributed intercept term
• $\xi \sim N(0, \sigma^{2}_{\xi})$ and uncorrelated with covariates
• supports inference beyond the nonrandom sample analyzed
• simplest is where intercept is allowed to vary across areal units (repeated observations are individual time series)
• The random effect variable is integrated out (with numerical methods) of the likelihood function
• accounts for missing variables & within unit correlation (commonality across time periods)
Random effects: mixed models
• Moving closer to a Bayesian perspective, spatial autocorrelation can be accounted for by introducing a (spatially structured) random effect into a model specification.
• SAS PROC MIXED supports this approach for linear modeling in which a map is treated as a multivariate sample of size 1.
• SAS PROC NLMIXED supports this approach for generalized linear modeling.
SAS PROC MIXED and random effects: Y = XB + Zu
• The spatially correlated errors model is fitted with PROC MIXED through the REPEATED statement.
• The SUBJECT=INTERCEPT option treats the entire data set as a single subject, so that the correlation being modeled is among the different observations within the data set.
• The LOCAL option in the REPEATED statement tells PROC MIXED to include a nugget effect.
EXAMPLE: density of workers across Germany’s 439 Kreise
LN(density – 23.53) ~ N
A spatial covariance structure coupled with a random slope coefficient model
192,721 distance pairs; dmax = 9.32478
PROC MIXED output: intercept

intercept estimate (se)   correlation   -2log(L)   nugget   (partial) sill   range
5.28 (0.06)               none          1445.1     0        1.5782           0
5.01 (0.12)               spherical     1348.4     0.9139   0.5542           1.3801
5.01 (0.18)               exponential   1349.8     0.9154   0.5873           0.7824
5.01 (0.13)               Gaussian      1344.7     0.9858   0.5194           0.7260
5.01 (0.18)               power         1349.8     0.9154   0.5873           0.2786
Random intercept term

measure                   No covariates     Spatial filter    Spherical semivariogram
-2log(L)                  1445.1            1179.3            1348.4
Intercept variance        0.9631            0.2538            0.5542
Residual variance         0.6116            0.6011            0.9139
Intercept estimate (se)   5.2827 (0.0599)   5.2827 (0.0443)   5.0142 (0.1210)

The spatial filter contains 27 (of 98) eigenvectors, with R2 = 0.4542, P(S-W residuals) < 0.0001.
Generalized linear mixed models
• One drawback of spatial filtering is that as the number of areal units increases, the number of eigenvectors needed to construct a spatial filter tends to increase, resulting in asymptotics being difficult or impossible to achieve.
• This situation can be remedied by resorting to a space-time data set, with time being repeated measures whose correlation can be captured by a random effects intercept term.
Unemployment in Germany: 1996-2002

Common eigenvectors
  global: E2 - E5
  regional: E6 - E8, E11, E18, E24, E28, E30, E39, E60
  local: E74

Year-specific eigenvectors
  1996: regional E9, E16, E21, E25, E41, E52, E53, E64; local E89
  1997: global E1; regional E15, E19, E21, E34, E38, E64; local E93
  1998: regional E13, E15, E16, E19, E21, E34, E38, E42, E52, E66; local E68, E93
  1999: regional E9, E13, E15, E16, E19, E21, E34, E38, E42, E52, E66; local E93
  2000: regional E9, E13, E15, E16, E19, E21, E25, E34, E38, E42, E51, E52, E66; local E93, E97
  2001: regional E9, E12, E13, E15, E16, E19, E34, E42, E52, E56, E65, E66; local E68, E93, E97
  2002: global E1; regional E9, E12, E13, E15, E16, E19, E20, E25, E38, E42, E52, E65, E66
Unemployment in Germany: annual spatial filters

year   # of eigenvectors   scale   adjusted pseudo-R2   SSPE/SSE
1996   24                  21.98   0.5929               1.0232
1997   23                  24.38   0.6425               1.0412
1998   27                  23.52   0.6846               1.0438
1999   27                  23.25   0.7068               1.0364
2000   30                  23.83   0.7483               1.0507
2001   30                  25.18   0.7683               1.0489
2002   29                  26.08   0.7549               1.0459
The composite spatial filter constructed with common vectors

        SF residuals
year    MC             GR
1996    0.67 → 0.21    0.62
1997    0.73 → 0.20    0.66
1998    0.76 → 0.20    0.64
1999    0.79 → 0.21    0.61
2000    0.83 → 0.25    0.59
2001    0.85 → 0.27    0.57
2002    0.85 → 0.27    0.56
SF      1.14           0.15
[Map legend] Dark red: very high; Light red: high; Gray: medium; Light green: low; Dark green: very low
former east-west divide
Generated space-time predictions
the lack of serial correlation information in 1996 is conspicuous
the best fit is in the center of the space-time series
% urban in Puerto Rico: SF-logistic with a spatial structured random effect