Title

Spatial Statistics for Point Processes and Lattice Data (Part I)

Tonglin Zhang

Tonglin Zhang, Department of Statistics, Purdue University Spatial Statistics for Point and Lattice Data

Outline

- Introduction
- Examples
- Point Processes
- Lattice Data
- Popular Models
- Connection

Introduction

Spatial statistics has three areas:

- Geostatistics: observations at fixed stations (kriging, correlation functions, Gaussian processes, etc.).
- Point processes: the locations are random (point patterns, marked point patterns, K-functions, etc.).
- Lattice data: data aggregated at the unit level (cluster and clustering detection, spatial autocorrelation).


Examples

Point Processes: Tree Locations

Ambrosia dumosa is a drought-deciduous shrub, 20-60 cm in height, which is abundant on well-drained soils below 1000 m elevation. The data were collected within a one-hectare (100 m × 100 m) area in the Colorado Desert in 1984. The data contain

- the locations of 4358 Ambrosia dumosa plants;
- the height of the plant canopy;
- the length of the major axis of the plant canopy;
- the length of the minor axis of the plant canopy;
- the volume of the plant canopy.

Figure: Tree locations in the Ambrosia dumosa plant data. Both axes run from 0 to 100.

Point Processes: Forest Wildfires

Figure: The Alberta (Canada) forest wildfire data contain wildfire locations and area burned from 1996 to 2010. Horizontal axis: longitude (−120 to −110); vertical axis: latitude (50 to 60).

Point Processes: Earthquakes

Figure: The Japan earthquake data contain earthquake locations and magnitudes from 2002 to 2011.

Lattice Data: Infant Mortality

Figure: County-level infant mortality rate per 100,000 in Guangxi, China, in 2000. Legend (rate): 249-1188, 1188-1860, 1860-2237, 2237-3162, 3162-3939, 3939-7260. Scale bar in miles.

Lattice Data: Crimes

Figure: County-level homicide rate (per 100,000) in the St. Louis area from 1978 to 1984. Legend (1979-84 rate): 0-1.41, 1.41-2.46, 2.46-3.12, 3.12-3.79, 3.79-4.51, 4.51-6.15, 6.15-46.57. Scale bar in miles.

Lattice Data: Cancers

Figure: County-level male colorectal cancer rate (per 100,000) between 2003 and 2007. Legend: 56-91, 95-103, 104-114, 115-129, 130-151, 154-258. The map labels Marion; scale bar in miles.

Point Processes

Definition

Let $S \subseteq \mathbb{R}^d$ be the domain, and let $N(A)$ be the number of points in $A$ for any $A \subseteq S$. Then the distribution of $N$ is given by

$$P_k[N(A_1) = n_1, \cdots, N(A_k) = n_k]$$

for any $A_1, \cdots, A_k \subseteq S$ and $k \in \mathbb{N}^+$.

Intensity Functions

The $k$-th order intensity function of $N$ (if it exists) is defined as

$$\lambda_k(s_1, \cdots, s_k) = \lim_{|ds_i| \to 0,\; i = 1, \cdots, k} \left\{ \frac{E[N(ds_1) \cdots N(ds_k)]}{|ds_1| \cdots |ds_k|} \right\},$$

where the $s_i$ are distinct points in $S$, $ds_i$ is an infinitesimal region containing $s_i \in S$, and $|ds_i|$ is the Lebesgue measure of $ds_i$.

- Both $P_k$ and $\lambda_k$ can be used to describe the distribution of $N$.
- In practice, $\lambda_k$ receives more attention than $P_k$.
- For a Poisson point process, if $A_1, \cdots, A_k$ are disjoint, then

  $$P_k[N(A_1) = n_1, \cdots, N(A_k) = n_k] = \prod_{i=1}^{k} \frac{\mu^{n_i}(A_i)}{n_i!} e^{-\mu(A_i)},$$

  where $\mu$ is the mean measure.
- For the same process, if $s_1, \cdots, s_k$ are distinct, then

  $$\lambda_k(s_1, \cdots, s_k) = \prod_{i=1}^{k} \lambda(s_i).$$

- This shows that if $P_k$ and $\lambda_k$ exist for every $k$, then $P_k$ and $\lambda_k$ are equivalent.
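The product form for disjoint regions can be checked by simulation. The following is a minimal sketch, assuming a homogeneous Poisson process on the unit square with constant intensity; the function names, the intensity value, and the choice of the left and right halves as the disjoint regions are all illustrative assumptions, not part of the slides. If the product form holds, the two counts should each have mean close to half the intensity and a sample covariance close to zero.

```python
import math
import random


def sample_poisson(mean, rng):
    """Draw one Poisson(mean) variate by Knuth's multiplication method.

    Suitable only for small means, since it relies on exp(-mean).
    """
    threshold = math.exp(-mean)
    count, product = 0, rng.random()
    while product > threshold:
        count += 1
        product *= rng.random()
    return count


def simulate_homogeneous_poisson(intensity, rng):
    """Simulate a homogeneous Poisson process on the unit square [0, 1)^2."""
    # The unit square has area 1, so its mean measure equals the intensity.
    n_points = sample_poisson(intensity, rng)
    return [(rng.random(), rng.random()) for _ in range(n_points)]


def count_in_strip(points, x_min, x_max):
    """Number of points whose x-coordinate lies in [x_min, x_max)."""
    return sum(1 for x, _ in points if x_min <= x < x_max)


def summarize_halves(intensity=20.0, n_rep=2000, seed=1):
    """Return the sample means of N(left half), N(right half) and their covariance."""
    rng = random.Random(seed)
    left, right = [], []
    for _ in range(n_rep):
        points = simulate_homogeneous_poisson(intensity, rng)
        left.append(count_in_strip(points, 0.0, 0.5))
        right.append(count_in_strip(points, 0.5, 1.0))
    mean_left = sum(left) / n_rep
    mean_right = sum(right) / n_rep
    cov = sum((a - mean_left) * (b - mean_right)
              for a, b in zip(left, right)) / (n_rep - 1)
    return mean_left, mean_right, cov


if __name__ == "__main__":
    print(summarize_halves())
```

A non-Poisson pattern (for example a clustered one) would typically show a clearly non-zero covariance between the two halves in the same experiment.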

Mean and Variance Functions

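The mean measure of $N$ is

$$\mu(A) = \int_A \lambda(s)\,ds.$$

The covariance structure of $N$ is

$$\mathrm{Cov}[N(A_1), N(A_2)] = \int_{A_1} \int_{A_2} [\lambda_2(s_1, s_2) - \lambda(s_1)\lambda(s_2)]\,ds_2\,ds_1 + \int_{A_1 \cap A_2} \lambda(s)\,ds.$$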

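Let

$$g(s_1, s_2) = \frac{\lambda_2(s_1, s_2)}{\lambda(s_1)\lambda(s_2)}.$$

Then $g$ is called the pair correlation function, and the covariance structure is

$$\mathrm{Cov}[N(A_1), N(A_2)] = \int_{A_1} \int_{A_2} [g(s_1, s_2) - 1]\,\lambda(s_1)\lambda(s_2)\,ds_2\,ds_1 + \int_{A_1 \cap A_2} \lambda(s)\,ds.$$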
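As a worked special case, consider the Poisson point process, for which the second-order intensity factorizes as $\lambda_2(s_1, s_2) = \lambda(s_1)\lambda(s_2)$ for distinct $s_1, s_2$. The short derivation below only substitutes this into the covariance formula; it uses no symbols beyond those already defined.

```latex
\begin{align*}
g(s_1, s_2) &= \frac{\lambda(s_1)\lambda(s_2)}{\lambda(s_1)\lambda(s_2)} = 1, \\
\mathrm{Cov}[N(A_1), N(A_2)]
  &= \int_{A_1}\int_{A_2} [1 - 1]\,\lambda(s_1)\lambda(s_2)\,ds_2\,ds_1
     + \int_{A_1 \cap A_2} \lambda(s)\,ds
   = \mu(A_1 \cap A_2), \\
\mathrm{Var}[N(A)] &= \mathrm{Cov}[N(A), N(A)] = \mu(A).
\end{align*}
```

So counts in disjoint regions are uncorrelated and the variance of a count equals its mean, as expected for Poisson counts. Values of $g$ above 1 inflate the covariance (clustering) and values below 1 deflate it (inhibition).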

Strong Stationarity

A spatial point process $N$ is said to be strongly stationary if, for any measurable $A_1, \cdots, A_k \in \mathcal{B}(\mathbb{R}^d)$, the joint distribution of

$$N(A_1 + s), \cdots, N(A_k + s)$$

does not depend on $s$, where

$$A_i + s = \{s' + s : s' \in A_i\}.$$

Second-Order Stationarity

If $\lambda(s)$ is a constant and

$$\lambda_2(s_1, s_2) = \lambda_2(s_1 - s_2),$$

then $N$ is called second-order stationary. In addition, if

$$\lambda_2(s_1, s_2) = \lambda_2(\|s_1 - s_2\|),$$

then $N$ is called isotropic. If $N$ is isotropic, then

$$g(s_1, s_2) = g(\|s_1 - s_2\|).$$

K-functions

Suppose $N$ is stationary. Let $\lambda$ be the first-order intensity function. Then the $K$-function is defined by

$$K(t) = \frac{1}{\lambda} E[\text{number of extra events within distance } t \text{ of a randomly chosen event}].$$

The $L$-function is

$$L(t) = \sqrt{\frac{K(t)}{\pi}} - t.$$

In real applications, $K(t)$ is more often used.
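To make the definition concrete, here is a minimal sketch of a naive empirical $K$-function for a planar point pattern. It is an illustration under stated assumptions, not the estimator used in the slides: it applies no edge correction, it estimates $\lambda^2$ by $n(n-1)/|A|^2$, and the names `ripley_k` and `l_function` are hypothetical. The $L$-function uses the planar form given above.

```python
import math


def ripley_k(points, t, area):
    """Naive estimate of K(t) for a planar pattern observed in a window of given area.

    Counts ordered pairs (i, j), i != j, with distance <= t and scales by
    area / (n * (n - 1)). No edge correction is applied, so the estimate is
    biased downward for t that is large relative to the window.
    """
    n = len(points)
    if n < 2:
        raise ValueError("need at least two points")
    close_pairs = 0
    for i, (xi, yi) in enumerate(points):
        for j, (xj, yj) in enumerate(points):
            if i != j and math.hypot(xi - xj, yi - yj) <= t:
                close_pairs += 1
    return area * close_pairs / (n * (n - 1))


def l_function(points, t, area):
    """Planar L-function: sqrt(K(t) / pi) - t."""
    return math.sqrt(ripley_k(points, t, area) / math.pi) - t


if __name__ == "__main__":
    corners = [(0.0, 0.0), (0.0, 1.0), (1.0, 0.0), (1.0, 1.0)]
    for t in (0.5, 1.0, 1.5):
        print(t, ripley_k(corners, t, area=1.0), l_function(corners, t, area=1.0))
```

For a homogeneous Poisson process in the plane, $K(t) = \pi t^2$, so $L(t)$ is zero; estimates of $L(t)$ well above zero suggest clustering at scale $t$ and estimates well below zero suggest regularity, subject to the edge effects ignored here.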

Lattice Data

Formulation

A study area $S$ is partitioned into $m$ units $A_1, \cdots, A_m$. For each unit $i$, the data contain at least:

- event counts: $y_i$;
- at-risk population sizes: $n_i$;
- explanatory variables: $x_i$.

Cluster Detection

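Suppose

$$y_i \sim \mathrm{Poisson}(n_i \theta_i),$$

where $\theta_i$ is called the incidence rate. If there is a cluster $C$ (a subset of $S$) in the study area, then it is assumed that $\theta_i = \theta_c$ if $i \in C$ and $\theta_i = \theta_0$ if $i \notin C$. One tests

$$H_0: \theta_c = \theta_0 \leftrightarrow H_a: \theta_c > \theta_0.$$

Sometimes, one uses $H_a: \theta_c \neq \theta_0$.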

Spatial Scan Test

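Given $C$, the likelihood ratio statistic is

$$\Lambda_C = \frac{(y_C/n_C)^{y_C}\,(y_{\bar C}/n_{\bar C})^{y_{\bar C}}}{(y/n)^{y}},$$

where $y_C = \sum_{i \in C} y_i$, $y_{\bar C} = \sum_{i \notin C} y_i$, $y = \sum_{i=1}^{m} y_i$, $n_C = \sum_{i \in C} n_i$, $n_{\bar C} = \sum_{i \notin C} n_i$, and $n = \sum_{i=1}^{m} n_i$.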

Since $C$ is unknown, it is often assumed that $C \in \mathcal{C}$, where $\mathcal{C}$ is a collection of candidate clusters. Then the spatial scan statistic is

$$\Lambda = \sup_{C \in \mathcal{C}} \Lambda_C.$$

It can be seen that

- no explanatory variables are involved;
- the data are Poisson; and
- disease rates are equal within clusters and equal outside of clusters, respectively.
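The likelihood ratio and its supremum over candidate clusters can be sketched in a few lines. The code below is an illustration only: it works on the log scale, uses the convention $0 \log 0 = 0$, takes the candidate collection as an explicit list of index tuples (in practice it is usually built from circular windows, which is not shown), and evaluates the ratio exactly as written above without restricting to $y_C/n_C > y_{\bar C}/n_{\bar C}$. The function names and the toy data are assumptions.

```python
import math


def _count_log_rate(count, size):
    """Return count * log(count / size), with the convention 0 * log(0) = 0."""
    return 0.0 if count == 0 else count * math.log(count / size)


def log_lambda(y, n, cluster):
    """Log of the likelihood ratio statistic for one candidate cluster.

    y and n are per-unit event counts and at-risk population sizes;
    cluster is an iterable of unit indices.
    """
    cluster = set(cluster)
    y_in = sum(y[i] for i in cluster)
    n_in = sum(n[i] for i in cluster)
    y_all, n_all = sum(y), sum(n)
    y_out, n_out = y_all - y_in, n_all - n_in
    return (_count_log_rate(y_in, n_in)
            + _count_log_rate(y_out, n_out)
            - _count_log_rate(y_all, n_all))


def scan_statistic(y, n, candidates):
    """Return (log of the scan statistic, maximizing candidate cluster)."""
    best = max(candidates, key=lambda c: log_lambda(y, n, c))
    return log_lambda(y, n, best), best


if __name__ == "__main__":
    y = [10, 2, 3]          # toy event counts (hypothetical)
    n = [100, 100, 100]     # toy population sizes (hypothetical)
    candidates = [(0,), (1,), (2,), (0, 1)]
    print(scan_statistic(y, n, candidates))
```

Because the maximum is taken over many candidates, the null distribution of the statistic is not a standard one; significance is commonly assessed by Monte Carlo simulation under $H_0$, which this sketch does not include.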

There are a few modifications:

- loglinear models: Zhang, T. and Lin, G. (2009). Spatial scan statistics in loglinear models. Computational Statistics and Data Analysis, 53, 2851-2858.
- overdispersion: Zhang, T., Zhang, Z. and Lin, G. (2012). Spatial scan statistics with over dispersion. Statistics in Medicine, 31, 762-774.
- zero inflation: Cancado, A.L.F., de-Silva, C.Q., and da Silva, M.F. (2014). A spatial scan statistic for zero-inflated Poisson process. Environmental and Ecological Statistics, 21, 627-650.
- zero inflation and overdispersion: de Lima, M.S., Duczmal, L.H., Neto, J.C., and Pinto, L.P. (2014). Spatial scan statistics for models with overdispersion and inflated zeros. Statistica Sinica, preprint (doi:10.5705/ss.2013.220w).

Disease Mapping

One can model the counts as

$$y_i \sim \mathrm{Poisson}(n_i \theta_i),$$

where

$$\theta_i = x_i^T \beta + U_i$$

and $U_i$ is a spatial random effect.

SAR (Spatial Autoregressive) Model

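Let $\mathbf{u} = (U_1, \cdots, U_m)^T$. Then the model assumes

$$\mathbf{u} = \rho \mathbf{W} \mathbf{u} + \boldsymbol{\epsilon},$$

where $w_{ij} = 1/|\partial i|$ if $j \in \partial i$ (the neighbors of unit $i$), $w_{ij} = 0$ otherwise, and $\boldsymbol{\epsilon} \sim N(\mathbf{0}, \sigma_u^2 \mathbf{I})$.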
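The SAR equation can be illustrated by building the row-standardized weight matrix from neighbor lists and solving $\mathbf{u} = \rho \mathbf{W}\mathbf{u} + \boldsymbol{\epsilon}$ numerically. The sketch below is an assumption-laden illustration: it solves the equation by fixed-point iteration, which converges when $|\rho| < 1$ because every row of $\mathbf{W}$ sums to one; the neighbor structure (four units on a line), the parameter values, and the function names are hypothetical.

```python
import random


def row_standardized_weights(neighbors):
    """Build W with w_ij = 1/|neighbors of i| for each neighbor j, and 0 otherwise."""
    m = len(neighbors)
    weights = [[0.0] * m for _ in range(m)]
    for i, nbrs in enumerate(neighbors):
        if not nbrs:
            raise ValueError("unit %d has no neighbors" % i)
        for j in nbrs:
            weights[i][j] = 1.0 / len(nbrs)
    return weights


def simulate_sar(weights, rho, sigma_u, rng, n_iter=500):
    """Draw eps ~ N(0, sigma_u^2 I) and solve u = rho * W u + eps by iteration.

    Assumes |rho| < 1 so that the iteration is a contraction.
    """
    if abs(rho) >= 1.0:
        raise ValueError("this sketch requires |rho| < 1")
    m = len(weights)
    eps = [rng.gauss(0.0, sigma_u) for _ in range(m)]
    u = eps[:]
    for _ in range(n_iter):
        u = [rho * sum(weights[i][j] * u[j] for j in range(m)) + eps[i]
             for i in range(m)]
    return u, eps


if __name__ == "__main__":
    # Four units arranged on a line: 0 - 1 - 2 - 3 (hypothetical layout).
    neighbors = [[1], [0, 2], [1, 3], [2]]
    W = row_standardized_weights(neighbors)
    u, eps = simulate_sar(W, rho=0.5, sigma_u=1.0, rng=random.Random(0))
    print(u)
```

A direct linear solve of $(\mathbf{I} - \rho\mathbf{W})\mathbf{u} = \boldsymbol{\epsilon}$ would give the same result; the iteration is used here only to keep the example free of external libraries.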

CAR (Conditional Autoregressive) Model

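Let $E_i$ be the expected value under the model without $U_i$. Then the model assumes

$$\mathbf{u} \sim N(\mathbf{0}, \sigma_u^2 (\mathbf{I} - \mathbf{W})^{-1} \mathbf{D}),$$

where $\sigma_u^2 > 0$, $\mathbf{D} = \mathrm{diag}(d_i)$ with $d_i = E_i^{-1}$, and $w_{ij} = \rho\sqrt{E_i/E_j}$ for $j \in \partial i$.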

Connection

- Any point process can be aggregated to lattice data without $x_i$.
- Any marked point process can be aggregated to lattice data with $x_i$.
- If there are point processes for the events, for the at-risk populations, and for the explanatory variables, we can also obtain lattice data with $x_i$.
- Geostatistical data can also be used as $x_i$.
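The first point above, aggregating a point pattern into lattice counts, can be sketched as follows. This is a minimal illustration assuming the pattern lies in the unit square and the lattice is a regular grid of rectangular cells; real lattice units such as counties are irregular polygons and would need a point-in-polygon step that is not shown. The function name and the sample coordinates are hypothetical.

```python
def aggregate_to_grid(points, n_rows, n_cols):
    """Count the points of a pattern on the unit square in each grid cell.

    Returns counts[row][col], where rows index the y-direction and columns
    the x-direction. Points on the upper boundary go to the last cell.
    """
    counts = [[0] * n_cols for _ in range(n_rows)]
    for x, y in points:
        if not (0.0 <= x <= 1.0 and 0.0 <= y <= 1.0):
            raise ValueError("point outside the unit square: (%r, %r)" % (x, y))
        col = min(int(x * n_cols), n_cols - 1)
        row = min(int(y * n_rows), n_rows - 1)
        counts[row][col] += 1
    return counts


if __name__ == "__main__":
    points = [(0.1, 0.1), (0.2, 0.3), (0.9, 0.1), (0.6, 0.8)]
    for row in aggregate_to_grid(points, n_rows=2, n_cols=2):
        print(row)
```

The resulting cell counts play the role of the event counts $y_i$; applying the same aggregation to a pattern of at-risk individuals would give the $n_i$, and cell-level summaries of marks would give the $x_i$.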
