    HW2 solution - Iowa State University
    Yushan Gu
    2020/3/1

    Most parts are worth 2 points each. Indicated parts are worth 4 points. Total is 100 points.

    library(maptools)
    library(sp)
    library(gstat)
    library(ggplot2)

    setwd(".")
    # Change this to the appropriate working directory containing the data files, if necessary

    Problem 1
    a)

    juraZn <- read.csv("juraZn.csv")
    juraZn20 <- read.csv("juraZn20.csv")
    juraZn20 <- as.matrix(juraZn20)
    X <- matrix(rep(1, 20))
    muhat <- solve(t(X) %*% solve(juraZn20) %*% X) %*% t(X) %*% solve(juraZn20) %*% juraZn$Zn
    muhat

    ##          [,1]
    ## [1,] 71.73419

    The GLS estimate of the mean is 71.73419.

    b)
    Ordinary Kriging

    c) 4 points.

    jura20s0 <- read.csv("jura20s0.csv")
    okwt <- t(jura20s0) %*% solve(juraZn20)
    plot(juraZn$Xloc, juraZn$Yloc, pch=19, col=4, xlim = c(1, 5), ylim = c(0.5, 5))
    points(1.7, 1.5, pch=19, col = "red")
    pointLabel(juraZn$Xloc, juraZn$Yloc, as.character(round(okwt[1,],3)))

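    [Figure: sample locations (x-axis: juraZn$Xloc, y-axis: juraZn$Yloc) labeled with the kriging weights for the prediction location (1.7, 1.5), shown in red. Three points have non-zero weights (−0.115, 0.248, 0.315); all other weights are 0.]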

    # plot(juraZn$Xloc, juraZn$Yloc, type = "n", xlim = c(1, 5), ylim = c(0.5, 5))
    # points(1.7, 1.5, pch=19, col = "red")
    # text(juraZn$Xloc, juraZn$Yloc, as.character(round(okwt[1,],3)))

    These weights are reasonable because they are non-zero for the three points closest to the prediction location.
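    The weight, prediction, and variance calculations used in this problem can be sketched on synthetic data. Everything below is an assumption for illustration: the locations, values, and the exponential covariance with `sill` and `a` are made up and are not the Jura data or the covariance matrix in juraZn20.csv. Only the formulas mirror the code above.

```r
# Synthetic illustration only: locations, values and covariance parameters are made up.
set.seed(1)
n    <- 6
locs <- cbind(x = runif(n, 1, 5), y = runif(n, 0.5, 5))  # observed locations
z    <- rnorm(n, mean = 70, sd = 25)                     # observed values
s0   <- c(1.7, 1.5)                                      # prediction location

# Assumed exponential covariance C(h) = sill * exp(-h / a)
sill <- 600
a    <- 0.3
cov_fun <- function(h) sill * exp(-h / a)

Sigma <- cov_fun(as.matrix(dist(locs)))                  # covariances among observations
c0    <- cov_fun(sqrt(colSums((t(locs) - s0)^2)))        # covariances between s0 and observations

X     <- matrix(1, n, 1)
Sinv  <- solve(Sigma)
muhat <- drop(solve(t(X) %*% Sinv %*% X) %*% t(X) %*% Sinv %*% z)  # GLS estimate of the mean
w     <- drop(t(c0) %*% Sinv)                            # kriging weights, as in okwt
pred  <- muhat + sum(w * (z - muhat))                    # prediction, as in part d
pvar  <- sill - drop(t(c0) %*% Sinv %*% c0)              # prediction variance, as in part g

round(w, 3)
c(muhat = muhat, pred = pred, pvar = pvar)
```

    With a short range such as the assumed `a = 0.3`, observations far from `s0` get weights near 0, the prediction falls back toward `muhat`, and `pvar` approaches the sill.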

    d)

    muhat + okwt[2,] %*% (juraZn$Zn - rep(muhat, 20))

    ##          [,1]
    ## [1,] 71.73419

    Note: The prediction is the overall mean (estimated by GLS) because all the observed values have weights of 0 (or very close to 0).

    e)

    plot(juraZn$Xloc, juraZn$Yloc, pch=19, col=4, xlim = c(1, 5), ylim = c(0.5, 5))
    points(3.5, 2.5, pch=19, col = "red")
    pointLabel(juraZn$Xloc, juraZn$Yloc, as.character(round(okwt[2,],3)))

    [Figure: sample locations (x-axis: juraZn$Xloc, y-axis: juraZn$Yloc) labeled with the kriging weights for the prediction location (3.5, 2.5), shown in red. All weights are 0.]

    This makes sense because the prediction location is not close to any observed point, so the kriging weights for all observed points are 0.

    f)

    plot(juraZn$Xloc, juraZn$Yloc, pch=19, col=4, xlim = c(1, 5), ylim = c(0.5, 5))
    pointLabel(juraZn$Xloc, juraZn$Yloc, as.character(round(okwt[3,],3)))

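    [Figure: sample locations (x-axis: juraZn$Xloc, y-axis: juraZn$Yloc) labeled with the kriging weights for the third prediction location. One point has weight 1; all other weights are 0.]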

    Location: x = 4.383, y = 1.081

    Note: Why? When the kriging prediction location is one of the observed locations, the prediction is the observed value at that location, i.e. weight = 1 for the observed location.

    g) 4 points

    610.65631 - t(jura20s0) %*% solve(juraZn20) %*% as.matrix(jura20s0)

    ##          P1       P2           P3
    ## P1 523.6746 610.6563 6.106563e+02
    ## P2 610.6563 610.6563 6.106563e+02
    ## P3 610.6563 610.6563 2.868103e-06

    Prediction variances (diagonal entries): P1 = 523.6746; P2 = 610.6563; P3 = 0.000002868

    h)
    The prediction variance for P2 is larger than for P1 because P2 is farther from the observed points than P1 is.

    i)
    The prediction variance is 0 because an observed point lies exactly at that location.
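    Parts f and i can be checked numerically on synthetic data. The sketch below assumes a made-up set of locations and an exponential covariance (`sill`, `a` are hypothetical values, not taken from the Jura files). It shows that when the prediction location coincides with observation `k`, the weight vector is 1 at `k` and 0 elsewhere, and the prediction variance is 0 up to rounding error.

```r
# Synthetic illustration only: locations and covariance parameters are made up.
set.seed(2)
n     <- 5
locs  <- cbind(runif(n), runif(n))
sill  <- 600
a     <- 0.3
Sigma <- sill * exp(-as.matrix(dist(locs)) / a)  # assumed exponential covariance

k    <- 3                                  # predict at the k-th observed location
c0   <- Sigma[, k]                         # covariances between that location and all observations
w    <- unname(drop(solve(Sigma, c0)))     # kriging weights (Sigma is symmetric)
pvar <- sill - sum(c0 * w)                 # prediction variance

round(w, 6)
pvar
```

    This is the same reason the third diagonal entry in part g is 2.868103e-06 rather than exactly 0: the difference is floating-point rounding.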

    Problem 3

    juraZnfull

  • a) 4 points

    spplot(juraZnfull.sp, 'Zn')

    [Figure: spplot of Zn at the sample locations; legend classes: [25.2,64.02], (64.02,102.8], (102.8,141.7], (141.7,180.5], (180.5,219.3]]

    b)
    The design is good for describing how Zn concentrations vary across the study area because many of the samples lie on a grid. This is a form of systematic sampling, which spreads observations across the study area.

    c)
    The design is also good for estimating the empirical variogram because the “extra” points shorten the distances between points, which allows semivariances to be estimated at short distances.

    Problem 4
    a) 4 points

    juraZnfull.vc

    [Figure: semivariogram cloud; x-axis: distance (0.5 to 2.0), y-axis: semivariance (5000 to 15000)]

    b) 4 points
    Yes, this is evidence for non-zero correlation. Semivariance increases with distance, which implies positive correlation between nearby observations.

    c)

    sqrt(2*625)

    ## [1] 35.35534

    since semivariance $= \frac{1}{2}\,(Z(s_i) - Z(s_j))^2$.

    Note: We accepted $25 = \sqrt{625}$ for full credit because the important concept is the relationship between semivariance and the difference in values.
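    The step from the semivariance of one pair to the difference in values can be written out as follows. The symbol $\gamma_{ij}$ for the semivariance of the pair $(s_i, s_j)$ is introduced here only for this worked step.

```latex
\gamma_{ij} = \tfrac{1}{2}\,\bigl(Z(s_i) - Z(s_j)\bigr)^2
\;\Longrightarrow\;
\bigl|Z(s_i) - Z(s_j)\bigr| = \sqrt{2\,\gamma_{ij}}
= \sqrt{2 \times 625} = \sqrt{1250} \approx 35.36
```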

    d) 4 points
    There are two ways to get this:

    • numerically

    Tab

    • or using digitize=T to interactively query the semivariogram cloud:

    unusual

    [Figure: empirical semivariogram; x-axis: distance (1 to 5), y-axis: semivariance (500 to 1500)]

    The plot in part e is more useful because it has more bins at short distances and so carries more information there. Since we are only interested in the pattern at short distances, we do not need to include distances from 2 to 5.5 km.

    g) 4 points

    juraZnfull.cressie

    [Figure for part g; x-axis: sx (0.0 to 2.0), y-axis: sy (200 to 800)]

    As discussed in lecture, the classical estimator is very sensitive to outliers, whereas the Cressie-Hawkins estimator is more robust to them. The cloud plot in part a shows many outliers, so I prefer the C-H estimator.
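    The robustness claim can be illustrated for a single distance bin with base R. This is a sketch under stated assumptions: the pairwise differences are synthetic, the helper names `classical_sv` and `cressie_sv` are hypothetical, and the bias correction `0.457 + 0.494 / n` is the commonly quoted form of the Cressie-Hawkins estimator (implementations such as gstat may include a further small-sample term).

```r
# Semivariance estimators for one distance bin, given the pairwise differences d.
classical_sv <- function(d) mean(d^2) / 2
cressie_sv <- function(d) {
  n <- length(d)
  mean(sqrt(abs(d)))^4 / (2 * (0.457 + 0.494 / n))
}

set.seed(3)
d_clean <- rnorm(200, sd = 10)   # synthetic differences Z(s_i) - Z(s_j) in one bin
d_out   <- c(d_clean, 300)       # the same bin with one gross outlier added

est_clean <- c(classical = classical_sv(d_clean), cressie = cressie_sv(d_clean))
est_out   <- c(classical = classical_sv(d_out),   cressie = cressie_sv(d_out))

est_clean
est_out
est_out / est_clean              # inflation caused by the single outlier
```

    The classical estimate is inflated several-fold by the one outlier, while the Cressie-Hawkins estimate changes only modestly, because the square-root transform damps large differences before averaging.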

    h) 4 points

    plot(variogram(Zn~1, juraZnfull.sp, map=T, cutoff = 1.5, width = 1.5/4))

    [Figure: variogram map; x-axis: dx (−1.5 to 1.5), y-axis: dy (−1.5 to 1.5); color legend var1 from 400 to 1400]

    I used a variogram map to check for anisotropy. There are more pink squares in the NW and SE directions and more blue squares in the NE and SW directions. Because I did not see a bullseye pattern, I assume the data are anisotropic.

    Note: The decision is subjective. Full credit if you drew the plot, looked at it, saw something close to a bullseye, and concluded isotropy.

    Note: Also full credit if you used 2 (or 3 or 4) directional variograms. My memory is that those look more anisotropic.

    Problem 5
    a)

    juraZnfull.vm

    [Figure: fitted model juraZnfull.vm over the empirical semivariogram; x-axis: distance (0.5 to 2.0), y-axis: Cressie's semivariance (200 to 800)]

    The output parameters are

    juraZnfull.vm

    ##   model    psill range
    ## 1   Nug 201.7491     0
    ## 2   Exp   0.0000     2

    Yes, I have concerns about the fit. The fitted line is flat and does not follow the data points. I should try other starting parameters.

    b)
    From the last plot, judging by eye, the semivariance levels off at around 800 and the curve meets the y-axis at around 100; the distance over which it rises from the nugget to the sill is about 0.7. Therefore, I use sill = 800, nugget = 100, and range = 0.7 * 0.95 as my starting values.

    c)

    juraZnfull.vm2

    [Figure: fitted model juraZnfull.vm2 over the empirical semivariogram; x-axis: distance (0.5 to 2.0), y-axis: Cressie's semivariance (200 to 800)]

    The estimated parameters are

    juraZnfull.vm2

    ##   model     psill     range
    ## 1   Nug  72.54005 0.0000000
    ## 2   Exp 826.82672 0.4639301

    The fit now looks reasonable.
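    To make the fitted parameters concrete, the sketch below evaluates an exponential semivariogram with the estimates printed above. It assumes the usual parameterization nugget + psill * (1 - exp(-h / range)) for the "Exp" model; the function name `exp_vgm` and the derived quantities are introduced here for illustration and are not part of the original solution.

```r
# Fitted values copied from the juraZnfull.vm2 output above.
nugget <- 72.54005
psill  <- 826.82672
a      <- 0.4639301   # range parameter of the exponential model

# Assumed parameterization: gamma(h) = nugget + psill * (1 - exp(-h / a)) for h > 0
exp_vgm <- function(h) ifelse(h > 0, nugget + psill * (1 - exp(-h / a)), 0)

total_sill      <- nugget + psill   # value the curve approaches at large distances
practical_range <- 3 * a            # distance at which the exponential part reaches about 95% of psill

exp_vgm(c(0.1, 0.5, 1, 2))
c(total_sill = total_sill, practical_range = practical_range)
exp_vgm(practical_range) / total_sill
```

    Under this parameterization the curve has essentially levelled off by a distance of about 1.4, which is consistent with the plateau seen in the empirical semivariogram.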

    d)

    juraZnfull.vm3

    [Figure: fitted variogram models (Exponential, Spherical, Matern with k = 1); x-axis: dist (0.0 to 2.5), y-axis: gamma (0 to 800)]

    Visually, the Matern model with k = 1 is better than the other two.

    e)

    show.vgms(par.strip.text=list(cex=0.5))

    [Figure: show.vgms() panels; x-axis: distance (0.0 to 3.0), y-axis: semivariance (0 to 3). Panels: vgm(1,"Nug",0), vgm(1,"Exp",1), vgm(1,"Sph",1), vgm(1,"Gau",1), vgm(1,"Exc",1), vgm(1,"Mat",1), vgm(1,"Ste",1), vgm(1,"Cir",1), vgm(1,"Lin",0), vgm(1,"Bes",1), vgm(1,"Pen",1), vgm(1,"Per",1), vgm(1,"Wav",1), vgm(1,"Hol",1), vgm(1,"Log",1), vgm(1,"Pow",1), vgm(1,"Spl",1)]

    # "Pen" seems reasonable
    juraZnfull.vm5

  • g)

    juraZnfull.vm6

  • a) 4 points

    jura.locsk

    Problem 7
    a)

    # histogram
    hist(juraZnfull$Zn)

    [Figure: Histogram of juraZnfull$Zn; x-axis: juraZnfull$Zn (50 to 200), y-axis: Frequency (0 to 60)]

    hist(log(juraZnfull$Zn))

    [Figure: Histogram of log(juraZnfull$Zn); x-axis: log(juraZnfull$Zn) (3.5 to 5.0), y-axis: Frequency (0 to 50)]

    # qq plot
    qqnorm(juraZnfull$Zn); qqline(juraZnfull$Zn)

    [Figure: Normal Q-Q Plot of juraZnfull$Zn; x-axis: Theoretical Quantiles (−3 to 3), y-axis: Sample Quantiles (50 to 200)]

    qqnorm(log(juraZnfull$Zn));qqline(log(juraZnfull$Zn))

    [Figure: Normal Q-Q Plot of log(juraZnfull$Zn); x-axis: Theoretical Quantiles (−3 to 3), y-axis: Sample Quantiles (3.5 to 5.0)]

    It is more reasonable to assume a normal distribution for log Zn, since the points for log Zn in the Q-Q plot lie closer to the straight line than the points for Zn.

    b)

    juraLogZnfull.cressie

    [Figure: empirical semivariogram of log Zn; x-axis: distance (0.5 to 2.0), y-axis: Cressie's semivariance (0.05 to 0.15)]

    c)

    juraLogZnfull.vm

    juraLogZnfull.vm

    ##   model      psill     range
    ## 1   Nug 0.01967165 0.0000000
    ## 2   Sph 0.14362114 0.9094144

    d)

    # Compare with
    juraZnfull.vm3

    ##   model    psill     range
    ## 1   Nug 102.4267 0.0000000
    ## 2   Sph 725.1663 0.9195141

    # log Zn
    0.14362114/(0.01967165+0.14362114)

    ## [1] 0.8795314

    # Zn
    725.1663/(102.4267+725.1663)

    ## [1] 0.8762354

    Log Zn has more spatially-related variability, but only by a very small amount.

    Note: Full credit if you looked at those two numbers and said they have the same amount because the numbers are so similar.

    e) 4 points

    juraLogZnfull.tgk

    [Figure: plot of the juraLogZnfull.tgk predictions; legend scale from 40 to 160]

    f) 4 points

    ggplot(data = data.frame(Zn = jura.k$var1.pred, LogZn = juraLogZnfull.tgk$var1.pred),
           aes(x = Zn, y = LogZn)) +
      geom_point()

    [Figure: scatterplot of the two sets of predictions; x-axis: Zn (40 to 160), y-axis: LogZn (40 to 160)]

    From this plot, I didn’t find much difference between these two methods.

    Note: Where there is a difference between the two predictions, the trans-Gaussian prediction is smaller. You can see this clearly when you overlay a straight line, where X=Y, on the plot. The base graphics command to do this is abline(0,1), i.e. draw a line with intercept=0 and slope=1.
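    The overlay described in the note can be sketched in base graphics. The data here are synthetic stand-ins (`zn_pred` and `tgk_pred` are hypothetical names, constructed so that the second set of predictions is never larger than the first); they are not the kriging results from this assignment.

```r
# Synthetic stand-ins for the two sets of predictions (illustration only).
set.seed(4)
zn_pred  <- runif(50, 40, 160)                 # stand-in for jura.k$var1.pred
tgk_pred <- zn_pred - abs(rnorm(50, sd = 4))   # stand-in for juraLogZnfull.tgk$var1.pred

pdf(NULL)                                      # draw off-screen; drop this line for interactive use
plot(zn_pred, tgk_pred, xlab = "Zn", ylab = "LogZn")
abline(0, 1)                                   # intercept = 0, slope = 1, i.e. the line Y = X
invisible(dev.off())

below <- mean(tgk_pred < zn_pred)              # share of points lying under the Y = X line
below
```

    Points under the line are locations where the second prediction is smaller than the first, which is the pattern the note describes.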

    Problem 1: a), b), c) 4 points, d), e), f), g) 4 points, h), i)

    Problem 3: a) 4 points, b), c)

    Problem 4: a) 4 points, b) 4 points, c), d) 4 points, e) 4 points, f), g) 4 points, h) 4 points

    Problem 5: a), b), c), d), e), f) 4 points, g), h)

    Problem 6: a) 4 points, b), c)

    Problem 7: a), b), c), d), e) 4 points, f) 4 points