HW2 solution - Iowa State University

HW2 solution
Yushan Gu
2020/3/1

Most parts are worth 2 points each. Indicated parts are worth 4 points. Total is 100 points.

library(maptools)
library(sp)
library(gstat)
library(ggplot2)

setwd(".")  # Change this to the appropriate working directory containing the data files, if necessary

Problem 1

a)

juraZn

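[Figure: juraZn$Yloc versus juraZn$Xloc (both axes 1 to 5), with each observed location labeled by its kriging weight for the first prediction location. Three weights are non-zero (-0.115, 0.248, 0.315); all others are 0.]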

# plot(juraZn$Xloc, juraZn$Yloc, type = "n", xlim = c(1, 5), ylim = c(0.5, 5))
# points(1.7, 1.5, pch = 19, col = "red")
# text(juraZn$Xloc, juraZn$Yloc, as.character(round(okwt[1,], 3)))

These weights are reasonable because they are non-zero for the three points closest to the prediction location.

d)

muhat + okwt[2,] %*% (juraZn$Zn - rep(muhat, 20))

##          [,1]
## [1,] 71.73419

Note: The prediction is the overall mean (estimated by GLS) because all the observed values have weights of 0 (or very close to 0).
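In symbols, the line of code in part d evaluates the expression below. The notation is introduced here only for illustration: $\hat{\mu}$ stands for muhat, $\lambda$ for one row of okwt, $Z$ for the vector of 20 observed Zn values, and $\mathbf{1}$ for a vector of ones.

```latex
\hat{Z}(s_0) = \hat{\mu} + \lambda^{\top} \left( Z - \hat{\mu}\,\mathbf{1} \right)
```

When every element of $\lambda$ is 0, the second term vanishes and the prediction reduces to $\hat{\mu}$, which is what the note above describes.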

e)

plot(juraZn$Xloc, juraZn$Yloc, pch = 19, col = 4, xlim = c(1, 5), ylim = c(0.5, 5))
points(3.5, 2.5, pch = 19, col = "red")
pointLabel(juraZn$Xloc, juraZn$Yloc, as.character(round(okwt[2,], 3)))

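[Figure: juraZn$Yloc versus juraZn$Xloc (both axes 1 to 5), with each observed location labeled by its kriging weight for the second prediction location (3.5, 2.5). All weights round to 0.]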

This makes sense because the prediction location is not close to any observed point, so the kriging weights for all observed points are 0.

f)

plot(juraZn$Xloc, juraZn$Yloc, pch = 19, col = 4, xlim = c(1, 5), ylim = c(0.5, 5))
pointLabel(juraZn$Xloc, juraZn$Yloc, as.character(round(okwt[3,], 3)))

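[Figure: juraZn$Yloc versus juraZn$Xloc (both axes 1 to 5), with each observed location labeled by its kriging weight for the third prediction location. One weight is 1; all others are 0.]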

Location: x = 4.383; y = 1.081

Note: Why? When the kriging prediction location is one of the observed locations, the prediction is the observed value at that location, i.e. weight = 1 for the observed location.

g) 4 points

610.65631 - t(jura20s0) %*% solve(juraZn20) %*% as.matrix(jura20s0)

##          P1       P2           P3
## P1 523.6746 610.6563 6.106563e+02
## P2 610.6563 610.6563 6.106563e+02
## P3 610.6563 610.6563 2.868103e-06

Prediction variances: P1 = 523.6746; P2 = 610.6563; P3 = 0.000002868
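Written out, the diagonal entries of the matrix printed in part g have the form below. The symbols are assumptions made for illustration: $\Sigma$ stands for juraZn20, $c_0$ for the column of jura20s0 belonging to one prediction location, and $C(0) = 610.65631$ for the constant in the code.

```latex
\sigma^2(s_0) = C(0) - c_0^{\top} \Sigma^{-1} c_0
```

Only the diagonal of the printed matrix is read off as prediction variances; the off-diagonal entries mix the $c_0$ vectors of two different prediction locations.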

h)
The prediction variance for P2 is larger than that for P1 because P2 is farther from the observed points than P1 is.

i)
The prediction variance is 0 because an observed point is located exactly at the prediction location.
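The behavior described in parts d through i can be reproduced on a small made-up example. The sketch below uses base R only; the four locations, the values in z, and the exponential covariance with sill 1 and range 0.5 are all hypothetical and are not the jura data or the model used in the homework. It uses the same form as the code above: weights applied to deviations from a GLS mean, and a variance of the form C(0) minus a quadratic form.

```r
# Toy data (hypothetical, not the jura data): four locations and values
xy <- cbind(x = c(0, 1, 0, 1), y = c(0, 0, 1, 1))
z <- c(10, 12, 9, 15)

# Assumed exponential covariance with no nugget (sill and range are made up)
expcov <- function(d, sill = 1, range = 0.5) sill * exp(-d / range)

Sigma <- expcov(as.matrix(dist(xy)))   # covariances among the observations
one <- rep(1, nrow(xy))
muhat <- sum(solve(Sigma, z)) / sum(solve(Sigma, one))  # GLS estimate of the mean

krige_at <- function(s0) {
  # distances and covariances between the observations and the location s0
  d0 <- sqrt((xy[, "x"] - s0[1])^2 + (xy[, "y"] - s0[2])^2)
  c0 <- expcov(d0)
  w <- unname(solve(Sigma, c0))        # weights on the deviations z - muhat
  list(weights = w,
       prediction = muhat + sum(w * (z - muhat)),
       variance = expcov(0) - sum(w * c0))  # C(0) - c0' Sigma^{-1} c0
}

at_obs <- krige_at(c(0, 0))    # prediction location equals the first observation
far <- krige_at(c(50, 50))     # prediction location far from every observation

print(round(at_obs$weights, 3))
print(round(far$weights, 3))
```

At the observed location the weight vector is 1 for that observation and 0 elsewhere, the prediction equals the observed value, and the variance is 0 up to rounding. Far from all observations the weights shrink to 0, the prediction falls back to muhat, and the variance approaches the sill.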

Problem 3

juraZnfull

a) 4 points

spplot(juraZnfull.sp, 'Zn')

[Figure: map of Zn at the sample locations; legend classes [25.2,64.02], (64.02,102.8], (102.8,141.7], (141.7,180.5], (180.5,219.3].]

b)
The design is good for describing how Zn concentrations vary across the study area, because many of the samples lie on a grid. This is a kind of systematic sampling design, and it spreads observations across the study area.

c)
The design is also good for estimating the empirical variogram, because the “extra” points shorten the distances between some pairs of points, which makes it possible to estimate semivariances at short distances.

Problem 4

a) 4 points

juraZnfull.vc

[Figure: semivariogram cloud for Zn; x-axis: distance (0.5 to 2.0), y-axis: semivariance (5000 to 15000).]

b) 4 points
Yes, this is evidence for non-zero correlation. Semivariance increases with distance, which implies a positive correlation between values at nearby locations.

c)

sqrt(2*625)

## [1] 35.35534

since semivariance = $\frac{1}{2}\left(Z(s_i) - Z(s_j)\right)^2$.

Note: We accepted $25 = \sqrt{625}$ for full credit because the important concept is the relationship between semivariance and the difference in values.
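The relationship used in part c can be checked on a small made-up data set. The sketch below, in base R, builds the semivariogram cloud for every pair of points, picks the pair with the largest cloud value, and converts that value back to an absolute difference. The data frame toy and its values are hypothetical and only illustrate the arithmetic.

```r
# Hypothetical toy data (not the jura data): coordinates and values
toy <- data.frame(x = c(0, 1, 2, 3), y = c(0, 0, 1, 1), z = c(10, 12, 11, 60))

# Every pair (i, j) with i < j
pairs <- t(combn(nrow(toy), 2))
i <- pairs[, 1]
j <- pairs[, 2]

# One row per pair: separation distance and cloud semivariance
cloud <- data.frame(
  i = i,
  j = j,
  dist = sqrt((toy$x[i] - toy$x[j])^2 + (toy$y[i] - toy$y[j])^2),
  gamma = 0.5 * (toy$z[i] - toy$z[j])^2
)

# Pair with the largest cloud value, and the implied absolute difference
worst <- cloud[which.max(cloud$gamma), ]
abs_diff <- sqrt(2 * worst$gamma)
print(worst)
print(abs_diff)
```

Sorting such a table by gamma is one numerical way to see which observations appear repeatedly in the largest cloud values.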

d) 4 points
There are two ways to get this:

• numericallyTab

• or using digitize=T to interactively query the semivariogram cloud:

unusual

[Figure: empirical semivariogram; x-axis: distance (1 to 5), y-axis: semivariance (500 to 1500).]

The plot in part e is more useful because it has more bins at short distances and therefore carries more information there. Since we are only interested in the pattern at short distances, we don’t have to include distances from 2 to 5.5 km.

g) 4 points

juraZnfull.cressie

[Figure: plot of juraZnfull.cressie; horizontal ticks 0.0 to 2.0, vertical ticks 200 to 800; extracted axis labels: sx, sy.]

As discussed in lecture, the classical estimator is very sensitive to outliers, while the Cressie-Hawkins estimator is more robust to them. The cloud plot in part a shows many outliers, so I prefer the Cressie-Hawkins estimator.
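The difference between the two estimators can be seen on a single made-up distance bin. The sketch below compares the classical estimate with the commonly cited form of the Cressie-Hawkins estimate, with constants 0.457 and 0.494. The vector diffs is hypothetical, and whether gstat applies exactly this form is an assumption to check against its documentation.

```r
# Hypothetical pairwise differences Z(s_i) - Z(s_j) within one distance bin;
# the last value plays the role of an outlier
diffs <- c(1, 2, 1.5, 0.5, 2.5, 1, 40)
n <- length(diffs)

# Classical (method-of-moments) estimate: half the mean squared difference
classical <- mean(diffs^2) / 2

# Cressie-Hawkins estimate: fourth power of the mean square-root absolute
# difference, with a bias correction in the denominator
robust <- mean(sqrt(abs(diffs)))^4 / (2 * (0.457 + 0.494 / n))

# The same two estimates with the outlier removed
clean <- diffs[-n]
classical_clean <- mean(clean^2) / 2
robust_clean <- mean(sqrt(abs(clean)))^4 / (2 * (0.457 + 0.494 / length(clean)))

print(c(classical = classical, robust = robust))
print(c(classical_clean = classical_clean, robust_clean = robust_clean))
```

Adding the single outlier inflates the classical estimate by a much larger factor than the robust one, which is the sensitivity referred to above.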

h) 4 points

plot(variogram(Zn~1, juraZnfull.sp, map=T, cutoff = 1.5, width = 1.5/4))

[Figure: variogram map; x-axis: dx (-1.5 to 1.5), y-axis: dy (-1.5 to 1.5); color scale var1 from 400 to 1400.]

I use a variogram map to check for anisotropy. There are more pink squares in the NW and SE directions and more blue squares in the NE and SW directions. Since I didn’t find a bullseye, I assume anisotropy for these data.

Note: The decision is subjective. Full credit if you drew the plot, looked at it, saw something close to a bullseye and concluded isotropy.

Note: Also full credit if you used 2 (or 3 or 4) directional variograms. My memory is that those look more anisotropic.

Problem 5

a)

juraZnfull.vm

[Figure: empirical semivariogram with the fitted model juraZnfull.vm; x-axis: distance (0.5 to 2.0), y-axis: Cressie's semivariance (200 to 800).]

Output parameters are:
juraZnfull.vm

##   model    psill range
## 1   Nug 201.7491     0
## 2   Exp   0.0000     2

Yes, I do have concerns about the fit. The fitted line is flat and does not follow the data points. I should try other starting values.

b)
Judging by eye from the last plot, the semivariance levels off at around 800, the curve meets the y-axis at around 100, and the distance over which it rises from the nugget to the sill is about 0.7. Therefore, I use sill = 800, nugget = 100, and range = 0.7 * 0.95 as my starting values.

c)

juraZnfull.vm2

[Figure: empirical semivariogram with the fitted model juraZnfull.vm2; x-axis: distance (0.5 to 2.0), y-axis: Cressie's semivariance (200 to 800).]

Estimated parameters are:
juraZnfull.vm2

##   model     psill     range
## 1   Nug  72.54005 0.0000000
## 2   Exp 826.82672 0.4639301

The fit looks reasonable now.
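To relate the fitted range parameter to the distance at which the curve visually levels off, the fitted values from part c can be plugged into the exponential model. The sketch below assumes the parameterization nugget + psill * (1 - exp(-h / range)) for h > 0, which is my understanding of the Exp model in gstat. The helper exp_vgm is written here for illustration and is not a gstat function.

```r
# Exponential semivariogram for h > 0, under the assumed parameterization
exp_vgm <- function(h, nugget, psill, range) {
  nugget + psill * (1 - exp(-h / range))
}

# Fitted values reported for juraZnfull.vm2
nugget <- 72.54005
psill <- 826.82672
rng <- 0.4639301

# Under this parameterization the curve reaches about 95% of the partial
# sill at three times the range parameter (the "practical range")
practical_range <- 3 * rng
frac <- (exp_vgm(practical_range, nugget, psill, rng) - nugget) / psill

print(practical_range)
print(frac)
```

Under that assumption, the range parameter of an exponential model is roughly one third of the distance at which the empirical semivariogram appears to flatten.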

d)

juraZnfull.vm3

[Figure: fitted semivariogram models; x-axis: dist (0.0 to 2.5), y-axis: gamma (0 to 800); legend: Exponential, Spherical, Matern (k=1).]

Visually, the Matern model with k = 1 is better than the other two.

e)

show.vgms(par.strip.text=list(cex=0.5))

[Figure: output of show.vgms(), one panel of semivariance (0 to 3) versus distance (0.0 to 3.0) per model: vgm(1,"Nug",0), vgm(1,"Exp",1), vgm(1,"Sph",1), vgm(1,"Gau",1), vgm(1,"Exc",1), vgm(1,"Mat",1), vgm(1,"Ste",1), vgm(1,"Cir",1), vgm(1,"Lin",0), vgm(1,"Bes",1), vgm(1,"Pen",1), vgm(1,"Per",1), vgm(1,"Wav",1), vgm(1,"Hol",1), vgm(1,"Log",1), vgm(1,"Pow",1), vgm(1,"Spl",1).]

# "Pen" seems reasonable
juraZnfull.vm5

g)

juraZnfull.vm6

Problem 6

a) 4 points

jura.locsk

Problem 7

a)

# histogram
hist(juraZnfull$Zn)

[Figure: Histogram of juraZnfull$Zn; x-axis: juraZnfull$Zn (50 to 200), y-axis: Frequency (0 to 60).]

hist(log(juraZnfull$Zn))

[Figure: Histogram of log(juraZnfull$Zn); x-axis: log(juraZnfull$Zn) (3.5 to 5.0), y-axis: Frequency (0 to 50).]

# qq plot
qqnorm(juraZnfull$Zn); qqline(juraZnfull$Zn)

[Figure: Normal Q-Q Plot of juraZnfull$Zn; x-axis: Theoretical Quantiles (-3 to 3), y-axis: Sample Quantiles (50 to 200).]

qqnorm(log(juraZnfull$Zn));qqline(log(juraZnfull$Zn))

[Figure: Normal Q-Q Plot of log(juraZnfull$Zn); x-axis: Theoretical Quantiles (-3 to 3), y-axis: Sample Quantiles (3.5 to 5.0).]

It is more reasonable to assume a normal distribution for log Zn, because the points for log Zn in the Q-Q plot lie closer to the straight line than the points for Zn.

b)

juraLogZnfull.cressie

[Figure: empirical semivariogram of log Zn; x-axis: distance (0.5 to 2.0), y-axis: Cressie's semivariance (0.05 to 0.15).]

c)

juraLogZnfull.vm

juraLogZnfull.vm

##   model      psill     range
## 1   Nug 0.01967165 0.0000000
## 2   Sph 0.14362114 0.9094144

d)

# Compare with
juraZnfull.vm3

##   model    psill     range
## 1   Nug 102.4267 0.0000000
## 2   Sph 725.1663 0.9195141

# log Zn
0.14362114/(0.01967165+0.14362114)

## [1] 0.8795314

# Zn
725.1663/(102.4267+725.1663)

## [1] 0.8762354

Log Zn has more spatially-related variability, but only by a very small amount.

Note: Full credit if you looked at those two numbers and said they have the same amount because the numbers are so similar.

e) 4 points

juraLogZnfull.tgk

[Figure: plot of the juraLogZnfull.tgk predictions; scale ticks from 40 to 160.]

f) 4 points

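ggplot(data = data.frame(Zn = jura.k$var1.pred, LogZn = juraLogZnfull.tgk$var1.pred),
       aes(x = Zn, y = LogZn)) +
  geom_point()

[Figure: scatterplot of LogZn (trans-Gaussian kriging predictions) versus Zn (kriging predictions on the original scale); both axes run from 40 to 160.]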

From this plot, I didn’t find much difference between these two methods.

Note: Where there is a difference between the two predictions, the trans-Gaussian prediction is smaller. You can see this clearly when you overlay a straight line, where X=Y, on the plot. The base graphics command to do this is abline(0,1), i.e. draw a line with intercept=0 and slope=1.
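The overlay described in the note can be sketched with base graphics as follows. The two vectors are made-up stand-ins for jura.k$var1.pred and juraLogZnfull.tgk$var1.pred, not the actual predictions, and pdf(NULL) is used only so that the sketch runs without writing a plot file.

```r
# Hypothetical predictions (stand-ins for the two sets of kriging predictions)
ok_pred <- c(45, 60, 75, 90, 120, 150)
tg_pred <- c(45, 59, 74, 88, 115, 141)

pdf(NULL)                        # open a graphics device that writes no file
plot(ok_pred, tg_pred, xlab = "Zn", ylab = "LogZn")
abline(0, 1)                     # line with intercept 0 and slope 1, i.e. X = Y
invisible(dev.off())

# Number of points that fall below the X = Y line
below <- sum(tg_pred < ok_pred)
print(below)
```

Points below the line are locations where the second prediction is smaller than the first, which is the pattern the note points out.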

Problem 1: a), b), c) 4 points, d), e), f), g) 4 points, h), i)

Problem 3: a) 4 points, b), c)

Problem 4: a) 4 points, b) 4 points, c), d) 4 points, e) 4 points, f), g) 4 points, h) 4 points

Problem 5: a), b), c), d), e), f) 4 points, g), h)

Problem 6: a) 4 points, b), c)

Problem 7: a), b), c), d), e) 4 points, f) 4 points
