HW2 solution
Yushan Gu
2020/3/1
Most parts are worth 2 points each. Indicated parts are worth 4 points. Total is 100 points.

library(maptools)
library(sp)
library(gstat)
library(ggplot2)
setwd(".")  # Change this to the appropriate working directory containing the data files, if necessary
Problem 1a)
juraZn
[Figure: observed locations (juraZn$Xloc against juraZn$Yloc) labeled with their kriging weights; all weights are 0 except three: −0.115, 0.248 and 0.315]
# plot(juraZn$Xloc, juraZn$Yloc, type = "n", xlim = c(1, 5), ylim = c(0.5, 5))
# points(1.7, 1.5, pch = 19, col = "red")
# text(juraZn$Xloc, juraZn$Yloc, as.character(round(okwt[1,], 3)))
These weights are reasonable because they are non-zero only for the three points closest to the prediction location.
d)
muhat + okwt[2,] %*% (juraZn$Zn - rep(muhat, 20))
##          [,1]
## [1,] 71.73419
Note: The prediction is the overall mean (estimated by GLS) because all the observed values have weights of 0 (or very close to 0).
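The prediction in d) has the form muhat + w'(Z - muhat), where muhat is the GLS mean and w = Sigma^{-1} c0. The base R sketch below illustrates why the weights vanish when the prediction location is far from every observation, so that the prediction collapses to muhat. It assumes five made-up sites and values and an exponential covariance with no nugget (psill = 600, a = 0.2); none of these are the juraZn data or the fitted model.

```r
# Hypothetical sites, values and covariance parameters (not the juraZn data):
# exponential covariance C(h) = psill * exp(-h / a), no nugget.
exp_cov <- function(h, psill = 600, a = 0.2) psill * exp(-h / a)

obs <- cbind(x = c(1.2, 1.5, 2.0, 4.0, 4.5),
             y = c(1.0, 1.8, 1.3, 4.0, 1.0))
z   <- c(40, 55, 60, 90, 75)

# Covariance among the observations and the GLS estimate of the mean
Sigma  <- exp_cov(as.matrix(dist(obs)))
Si     <- solve(Sigma)
one    <- rep(1, nrow(obs))
mu_gls <- as.numeric(t(one) %*% Si %*% z) / as.numeric(t(one) %*% Si %*% one)

# Prediction at s0 written as mu_gls + w' (z - mu_gls), with w = Sigma^{-1} c0
krige_at <- function(s0) {
  d0 <- sqrt((obs[, "x"] - s0[1])^2 + (obs[, "y"] - s0[2])^2)
  w  <- as.numeric(Si %*% exp_cov(d0))
  list(weights = w, pred = mu_gls + sum(w * (z - mu_gls)))
}

near <- krige_at(c(1.6, 1.5))       # close to the first three sites
far  <- krige_at(c(3.0, 3.0))       # far from every site
round(near$weights, 3)              # clearly non-zero only for the nearby sites
round(far$weights, 3)               # all essentially 0
c(far = far$pred, mu_gls = mu_gls)  # the far prediction is essentially mu_gls
```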
e)
plot(juraZn$Xloc, juraZn$Yloc, pch = 19, col = 4, xlim = c(1, 5), ylim = c(0.5, 5))
points(3.5, 2.5, pch = 19, col = "red")
pointLabel(juraZn$Xloc, juraZn$Yloc, as.character(round(okwt[2,], 3)))
[Figure: observed locations labeled with the kriging weights for the prediction location (3.5, 2.5), shown in red; all weights are 0]
This makes sense because the prediction location is not close to any observed point, so the O.K. weights for all observed points are 0.
f)
plot(juraZn$Xloc, juraZn$Yloc, pch = 19, col = 4, xlim = c(1, 5), ylim = c(0.5, 5))
pointLabel(juraZn$Xloc, juraZn$Yloc, as.character(round(okwt[3,], 3)))
[Figure: observed locations labeled with the kriging weights for the third prediction location; the weight is 1 at one observed location and 0 everywhere else]
Location: x = 4.383; y = 1.081
Note: Why? When the kriging prediction location is one of the observed locations, the prediction is the observed value at that location, i.e. the weight is 1 for that observed location and 0 for all the others.
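A short base R check of this exactness property, under assumptions: the sites, values and exponential covariance (psill = 600, a = 0.2, no nugget) are made up. At an observed site the covariance vector c0 equals the matching column of Sigma, so w = Sigma^{-1} c0 is a unit vector and the prediction reproduces the observed value.

```r
# Hypothetical sites, values and exponential covariance (no nugget);
# none of these come from the juraZn data.
exp_cov <- function(h, psill = 600, a = 0.2) psill * exp(-h / a)
obs <- cbind(c(1.2, 1.5, 2.0, 4.0, 4.5), c(1.0, 1.8, 1.3, 4.0, 1.0))
z   <- c(40, 55, 60, 90, 75)

Sigma  <- exp_cov(as.matrix(dist(obs)))
one    <- rep(1, nrow(obs))
mu_gls <- as.numeric(crossprod(one, solve(Sigma, z))) /
          as.numeric(crossprod(one, solve(Sigma, one)))

# Predict exactly at the 4th observed site
s0   <- obs[4, ]
c0   <- exp_cov(sqrt(colSums((t(obs) - s0)^2)))  # equals Sigma[, 4]
w    <- as.numeric(solve(Sigma, c0))             # the 4th unit vector
pred <- mu_gls + sum(w * (z - mu_gls))           # the observed value, 90
round(w, 6)
pred
```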
g) 4 points
610.65631 - t(jura20s0) %*% solve(juraZn20) %*% as.matrix(jura20s0)
##          P1       P2           P3
## P1 523.6746 610.6563 6.106563e+02
## P2 610.6563 610.6563 6.106563e+02
## P3 610.6563 610.6563 2.868103e-06
Prediction variances (the diagonal): P1 = 523.6746; P2 = 610.6563; P3 = 0.000002868
h) The prediction variance for P2 is larger than for P1 because P2 is farther from the observed points than P1.
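The expression in g) has the form sill - c0' Sigma^{-1} c0. The sketch below evaluates that same form in base R at three hypothetical locations to show the pattern behind h) and i): the variance is smallest at an observed site, larger between sites, and close to the sill far from all data. Sites and the exponential covariance (psill = 600, a = 0.2, no nugget) are assumptions, not the juraZn model.

```r
# Hypothetical sites and exponential covariance (not the juraZn data/model)
exp_cov <- function(h, psill = 600, a = 0.2) psill * exp(-h / a)
obs   <- cbind(c(1.2, 1.5, 2.0, 4.0, 4.5), c(1.0, 1.8, 1.3, 4.0, 1.0))
Sigma <- exp_cov(as.matrix(dist(obs)))

# Same form as the expression used in part g): sill - c0' Sigma^{-1} c0
pred_var <- function(s0) {
  c0 <- exp_cov(sqrt(colSums((t(obs) - s0)^2)))
  exp_cov(0) - as.numeric(crossprod(c0, solve(Sigma, c0)))
}

v <- c(near     = pred_var(c(1.6, 1.5)),  # between three observed sites
       far      = pred_var(c(3.0, 3.0)),  # far from every site
       observed = pred_var(obs[4, ]))     # exactly at an observed site
round(v, 3)
```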
i) The prediction variance is essentially 0 (2.868e-06) because one of the observed points is located exactly at that location.
Problem 3
juraZnfull
a) 4 points
spplot(juraZnfull.sp, 'Zn')
[Figure: map of Zn; legend classes [25.2,64.02], (64.02,102.8], (102.8,141.7], (141.7,180.5], (180.5,219.3]]
b) The design is good for describing how Zn concentrations vary across the study area because many of the samples follow a gridded design. This is a kind of systematic sampling design, and it spreads observations across the study area.
c) The design is good for estimating the empirical variogram because the "extra" points shorten the distance between some pairs of points, which makes it possible to estimate semivariances at short distances.
Problem 4a) 4 points
juraZnfull.vc
[Figure: juraZnfull.vc, semivariance against distance]
b) 4 points
Yes, this is evidence for non-zero correlation. The semivariance grows as distance increases, which implies positive correlation at short distances.
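For a second-order stationary process, gamma(h) = C(0) - C(h), so a semivariance that is small at short lags and grows with distance means the covariance is positive at short lags. As a reminder of what the empirical curve is, here is a minimal base R version of the classical (Matheron) binned estimator; the function name and the tiny 1-D example are illustrative assumptions, not the gstat implementation.

```r
# Classical estimator: gamma(h) = 1 / (2 N(h)) * sum over pairs in the bin of
# (Z(si) - Z(sj))^2, with each pair counted once.
emp_variogram <- function(coords, z, breaks) {
  d   <- as.matrix(dist(coords))
  dz2 <- outer(z, z, "-")^2
  up  <- upper.tri(d)                    # each pair once
  bin <- cut(d[up], breaks)
  data.frame(bin   = levels(bin),
             np    = as.vector(table(bin)),
             gamma = as.vector(tapply(dz2[up], bin, mean)) / 2)
}

# Hypothetical transect: 4 equally spaced sites with a smooth trend in z
vg <- emp_variogram(cbind(0:3, 0), c(1, 2, 4, 7), breaks = c(0, 1.5, 2.5, 3.5))
vg   # gamma increases with lag: 7/3, 8.5, 18
```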
c)
sqrt(2*625)
## [1] 35.35534
since the semivariance is $\gamma = \frac{1}{2}\left(Z(s_i) - Z(s_j)\right)^2$, which gives $|Z(s_i) - Z(s_j)| = \sqrt{2\gamma}$.
Note: We accepted $25 = \sqrt{625}$ for full credit because the important concept is the relationship between semivariance and the difference in values.
d) 4 points
There are two ways to get this:
• numerically
• or using digitize=T to interactively query the semivariogram cloud:
unusual
[Figure: semivariogram cloud, semivariance against distance]
The plot in part e is more useful because it has more bins at short distances and therefore more information there. Since we are only interested in the pattern at short distances, we don't have to include distances from 2 to 5.5 km.
g) 4 points
juraZnfull.cressie
[Figure: juraZnfull.cressie, Cressie-Hawkins semivariance against distance]
As discussed in lecture, the classical estimator is very sensitive to outliers, while the Cressie-Hawkins estimator is more robust to them. The cloud plot in part a shows many outliers, so I prefer the C-H estimator.
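To see the robustness difference numerically, here is a base R sketch of both estimators for a single distance bin, written from the standard Cressie-Hawkins formula 2*gamma(h) = [mean(|dz|^(1/2))]^4 / (0.457 + 0.494 / N(h)). The differences `dz` are made up (four small ones and one outlier). In gstat this estimator is requested with `cressie = TRUE` in `variogram()`.

```r
# Classical (Matheron) estimate for one bin, from pairwise differences dz
matheron <- function(dz) mean(dz^2) / 2

# Cressie-Hawkins robust estimate for one bin:
#   2 * gamma(h) = mean(|dz|^(1/2))^4 / (0.457 + 0.494 / N(h))
cressie_hawkins <- function(dz) {
  n <- length(dz)
  mean(sqrt(abs(dz)))^4 / (0.457 + 0.494 / n) / 2
}

# Hypothetical bin: four small differences and one outlying pair
dz <- c(1, -1, 1, -1, 100)
c(classical = matheron(dz), cressie_hawkins = cressie_hawkins(dz))
```

The single outlier dominates the squared differences (classical estimate 1000.4), while the square-root transform keeps the robust estimate far smaller.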
h) 4 points
plot(variogram(Zn~1, juraZnfull.sp, map=T, cutoff = 1.5, width = 1.5/4))
[Figure: variogram map, dx and dy from −1.5 to 1.5; color scale (var1) from about 400 to 1400]
I used a variogram map to check for anisotropy. There are more pink squares in the NW and SE directions and more blue squares in the NE and SW directions. Since I didn't find a bull's-eye, I assume anisotropy for this data.
Note: The decision is subjective. Full credit if you drew the plot, looked at it, saw something close to a bull's-eye and concluded isotropy.
Note: Also full credit if you used 2 (or 3 or 4) directional variograms. My memory is that those look more anisotropic.
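Directional variograms restrict the classical estimator to pairs whose separation vector points in a given direction. gstat does this through the `alpha` argument of `variogram()`; the base R sketch below only illustrates the idea. Angles are in degrees clockwise from north (the convention gstat uses for `alpha`), and the grid with z varying only east-west is a hypothetical example, not the jura data.

```r
# Classical semivariogram using only pairs whose direction lies within
# `tol` degrees of `alpha` (degrees clockwise from north).
dir_variogram <- function(coords, z, alpha, tol = 22.5, breaks) {
  i   <- combn(length(z), 2)                        # all pairs, each once
  dx  <- coords[i[2, ], 1] - coords[i[1, ], 1]
  dy  <- coords[i[2, ], 2] - coords[i[1, ], 2]
  h   <- sqrt(dx^2 + dy^2)
  ang <- (atan2(dx, dy) * 180 / pi) %% 180          # direction folded to [0, 180)
  dev <- abs(ang - alpha) %% 180
  keep <- pmin(dev, 180 - dev) <= tol               # angular distance to alpha
  bin <- cut(h[keep], breaks)
  sq  <- (z[i[2, keep]] - z[i[1, keep]])^2
  as.vector(tapply(sq, bin, mean)) / 2
}

# Hypothetical 4 x 4 grid where z changes only in the east-west direction
g <- as.matrix(expand.grid(x = 0:3, y = 0:3))
z <- g[, "x"]
c(north_south = dir_variogram(g, z, alpha = 0,  breaks = c(0.5, 1.5)),
  east_west   = dir_variogram(g, z, alpha = 90, breaks = c(0.5, 1.5)))
```

Different semivariances at the same lag in different directions are the signature of anisotropy that the variogram map shows as a non-circular pattern.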
Problem 5a)
juraZnfull.vm
[Figure: juraZnfull.vm fitted to the Cressie's semivariance points, plotted against distance]
Output parameters:
juraZnfull.vm
##   model    psill range
## 1   Nug 201.7491     0
## 2   Exp   0.0000     2
Yes, I have concerns about the fit. The fitted line is flat and does not fit the data points. I should try other starting parameters.
b) From the last plot, by eye: the semivariance points level off at around 800, the curve intersects the y-axis at around 100, and the distance from the nugget to the sill is about 0.7. Therefore, I use sill = 800, nugget = 100 and range = 0.7 * 0.95 as my starting values.
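When reading starting values off a plot, it helps to know how gstat's exponential ("Exp") model is parameterized: gamma(h) = nugget + psill * (1 - exp(-h / a)) for h > 0, so the curve never reaches the sill exactly and gets about 95% of the way at h = 3a. The base R sketch below evaluates that form at the estimates later reported for juraZnfull.vm2 in part c); the function name `exp_vgm` is only illustrative.

```r
# Exponential variogram in the form used by gstat's "Exp" model, for h > 0:
#   gamma(h) = nugget + psill * (1 - exp(-h / a))
# (the semivariance at h = 0 is 0 by definition)
exp_vgm <- function(h, nugget, psill, a) {
  ifelse(h > 0, nugget + psill * (1 - exp(-h / a)), 0)
}

# Estimates reported for juraZnfull.vm2 in part c)
h <- c(0.1, 0.25, 0.5, 1, 2)
round(exp_vgm(h, nugget = 72.54005, psill = 826.82672, a = 0.4639301), 1)

# Share of the partial sill reached at h = 3a, and that distance for vm2
1 - exp(-3)
3 * 0.4639301
```

So a visual "distance to the sill" corresponds to roughly three times the Exp range parameter, which is worth keeping in mind when choosing the starting range.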
c)
juraZnfull.vm2
[Figure: juraZnfull.vm2 fitted to the Cressie's semivariance points, plotted against distance]
Estimated parameters:
juraZnfull.vm2
##   model     psill     range
## 1   Nug  72.54005 0.0000000
## 2   Exp 826.82672 0.4639301
The fit seems reasonable now.
d)
juraZnfull.vm3
[Figure: gamma against dist with fitted Exponential, Spherical and Matern (k = 1) models]
Visually, the Matern model with k = 1 is better than the other two.
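The Matern family adds a smoothness parameter kappa (the k above). A base R sketch of the standard Matern semivariogram (no nugget) is below, using `besselK()`; check the `vgm()` documentation for gstat's exact scaling of the range, which this sketch does not claim to reproduce. Two properties are worth seeing: kappa = 0.5 gives the exponential model exactly, and kappa = 1 rises more slowly near the origin. The parameter values are arbitrary.

```r
# Standard Matern semivariogram for h > 0 (nugget omitted):
#   gamma(h) = psill * (1 - (h/a)^kappa * K_kappa(h/a) / (2^(kappa - 1) * Gamma(kappa)))
matern_vgm <- function(h, psill, a, kappa) {
  u <- h / a
  psill * (1 - u^kappa * besselK(u, kappa) / (2^(kappa - 1) * gamma(kappa)))
}
exp_vgm <- function(h, psill, a) psill * (1 - exp(-h / a))

h <- c(0.1, 0.5, 1, 2)   # arbitrary lags, all > 0

# kappa = 0.5 reproduces the exponential model
all.equal(matern_vgm(h, 1, 0.4, 0.5), exp_vgm(h, 1, 0.4))

# kappa = 1 is smoother near the origin
rbind(exponential = exp_vgm(h, 1, 0.4), matern_k1 = matern_vgm(h, 1, 0.4, 1))
```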
e)
show.vgms(par.strip.text=list(cex=0.5))
[Figure: show.vgms() panels of semivariance against distance for vgm(1,"Nug",0), vgm(1,"Exp",1), vgm(1,"Sph",1), vgm(1,"Gau",1), vgm(1,"Exc",1), vgm(1,"Mat",1), vgm(1,"Ste",1), vgm(1,"Cir",1), vgm(1,"Lin",0), vgm(1,"Bes",1), vgm(1,"Pen",1), vgm(1,"Per",1), vgm(1,"Wav",1), vgm(1,"Hol",1), vgm(1,"Log",1), vgm(1,"Pow",1), vgm(1,"Spl",1)]
# "Pen" seems reasonable
juraZnfull.vm5
g)
juraZnfull.vm6
Problem 6a) 4 points
jura.locsk
Problem 7a)
# histogram
hist(juraZnfull$Zn)
[Figure: Histogram of juraZnfull$Zn]
hist(log(juraZnfull$Zn))
[Figure: Histogram of log(juraZnfull$Zn)]
# qq plot
qqnorm(juraZnfull$Zn); qqline(juraZnfull$Zn)
[Figure: Normal Q-Q plot of juraZnfull$Zn, sample quantiles against theoretical quantiles]
qqnorm(log(juraZnfull$Zn));qqline(log(juraZnfull$Zn))
[Figure: Normal Q-Q plot of log(juraZnfull$Zn), sample quantiles against theoretical quantiles]
It is more reasonable to assume a normal distribution for log Zn, since the points for log Zn in the Q-Q plot are closer to the straight line than the points for Zn.
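A numeric complement to the histograms and Q-Q plots is to compare sample skewness before and after the log transform, and optionally run `shapiro.test()`. The sketch below uses a simulated right-skewed sample as a stand-in for Zn; it is not the jura data, so its numbers say nothing about the actual results.

```r
# Simulated right-skewed stand-in for Zn (NOT the jura data)
set.seed(1)
zn <- rlnorm(200, meanlog = 4, sdlog = 0.4)

# Sample skewness: third central moment divided by sd^3
skew <- function(x) mean((x - mean(x))^3) / mean((x - mean(x))^2)^1.5

c(raw = skew(zn), log = skew(log(zn)))   # log transform removes most of the skew

# Shapiro-Wilk p-values for the raw and log-transformed values
c(raw = shapiro.test(zn)$p.value, log = shapiro.test(log(zn))$p.value)
```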
b)
juraLogZnfull.cressie
[Figure: juraLogZnfull.cressie, Cressie's semivariance against distance]
c)
juraLogZnfull.vm
juraLogZnfull.vm
##   model      psill     range
## 1   Nug 0.01967165 0.0000000
## 2   Sph 0.14362114 0.9094144
d)
# Compare with
juraZnfull.vm3
##   model    psill     range
## 1   Nug 102.4267 0.0000000
## 2   Sph 725.1663 0.9195141
# log Zn
0.14362114/(0.01967165+0.14362114)
## [1] 0.8795314
# Zn
725.1663/(102.4267+725.1663)
## [1] 0.8762354
Log Zn has more spatially-related variability, but only by a very small amount.
Note: Full credit if you looked at those two numbers and said they have the same amount because the numbers are so similar.
e) 4 points
juraLogZnfull.tgk
[Figure: trans-Gaussian kriging predictions (juraLogZnfull.tgk); color scale from 40 to 160]
f) 4 points
ggplot(data=data.frame(Zn = jura.k$var1.pred,LogZn = juraLogZnfull.tgk$var1.pred),
aes(x = Zn, y = LogZn))+geom_point()
[Figure: scatter plot of the LogZn (trans-Gaussian) predictions against the Zn predictions; both axes from 40 to 160]
From this plot, I didn’t find much difference between these two methods.
Note: Where there is a difference between the two predictions, the trans-Gaussian prediction is smaller. You can see this clearly when you overlay a straight line, where X=Y, on the plot. The base graphics command to do this is abline(0,1), i.e. draw a line with intercept=0 and slope=1.
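A base graphics sketch of that overlay, plus a one-line count of how many locations fall below the X = Y line. The six prediction pairs are made up purely to make the code runnable; with the real objects you would use jura.k$var1.pred and juraLogZnfull.tgk$var1.pred instead.

```r
# Made-up prediction pairs (illustration only, not the jura results)
pred <- data.frame(Zn    = c(45, 60, 80, 100, 130, 155),
                   LogZn = c(44, 61, 79,  97, 131, 148))

plot(pred$Zn, pred$LogZn, xlab = "Zn", ylab = "LogZn")
abline(0, 1)   # X = Y reference line; points below it have a smaller LogZn prediction

# Proportion of locations where the trans-Gaussian prediction is smaller
mean(pred$LogZn < pred$Zn)
```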
Problem 1: a), b), c) 4 points, d), e), f), g) 4 points, h), i)
Problem 3: a) 4 points, b), c)
Problem 4: a) 4 points, b) 4 points, c), d) 4 points, e) 4 points, f), g) 4 points, h) 4 points
Problem 5: a), b), c), d), e), f) 4 points, g), h)
Problem 6: a) 4 points, b), c)
Problem 7: a), b), c), d), e) 4 points, f) 4 points