2018 SISMID Module 14 Lecture 1: Introduction and Overviewfaculty.washington.edu/jonno/SISMIDmaterial/2018-SISMID-Lec1.pdf · 2018 SISMID Module 14 Lecture 1: Introduction and Overview

2018 SISMID Module 14Lecture 1: Introduction and Overview

Jon Wakefield and Lance Waller

Departments of Statistics and BiostatisticsUniversity of Washington

1 / 113

Outline

Course Logistics and Motivation

Standardization

Geographical Information Systems

Map Projections and Coordinate Reference Systems

Data Quality

Motivation for Time Series Tangent

Stationarity and Measures of Temporal Dependence

Models for Temporal Dependence

An Example

Point Process Models

2 / 113

Course Logistics and Motivation

3 / 113

Course Outline

DAY 1:I Mon 8.30–10.00: Lecture 1 (Wakefield) Introduction: Overview of course; Motivation;

Standardization; Data issues; GIS in R; Time Series Methods.I Mon 10.30–12.00: Lecture 2 (Waller) Initial examinations of spatial data; Questions that can

be asked. More on GIS.I Mon 1.30–3.00: Lecture 3 (Waller) Point processes; K functions.I Mon 3.30–5.00 Lecture 4 (Wakefield) Space and space-time disease mapping; INLA for

implementation.

DAY 2:I Tue 8.30–10.00: Lecture 5 (Waller) Spatial regression including geostatistics.I Tue 10.30–12.00: Lecture 6 (Wakefield) Clustering and cluster detection with aggregate data;

Scan statistics.I Tue 1.30–3.00: Lecture 7 (Waller) Slippery slopes: spatially varying coefficientsI Tue 3.30–5.00: Lecture 8 (Wakefield) Disease dynamics/infectious diseases; illustrated with

measles and flu examples.I Tue 5.00–6.00: R session: Exercises.

DAY 3I Wed 8.30–10.00: Lecture 9 (Waller) Disease ecology.I Wed 10.30–12.00: Lecture 10 (Wakefield) Small-area estimation.

4 / 113

Spatial books

I Baddeley, A., Rubak, E. and Turner, R. (2015). Spatial Point Patterns: Methodology andApplications with R, CRC Press. Technical.

I Banerjee, S., Gelfand, A.E. and Carlin, B.P. (2014). Hierarchical Modeling and Analysis forSpatial Data, Second Edition, CRC Press. Technical.

I Blangiardo, M. and Cameletti, M. (2015). Spatial and Spatio-Temporal Bayesian Models withR-INLA, John Wiley and Sons. Technical, and focussed on models that can be with theintegrated nested Laplace approximation (INLA) method.

I Bivand, R.S., Pebesma, E.J. and Gomez-Rubio, V. (2013). Applied Spatial Data Analysis withR, Second Edition, Springer. Available on-line at UW libraries. The definitive guide to GIS inR but not an easy book to learn from, unless already proficient in R.

I Brundson, C. and Comber, L. (2015). An Introduction to R for Spatial Analysis and Mapping.Sage. Good introductory level, but not much on analysis. Moredescriptive/exploratory/vizualization.

I Darmofal, D. (2015). Spatial Analysis for the Social Sciences. Cambridge. As the titlesuggests, specific to the social sciences.

I Diggle, P.J. (2013). Statistical Analysis of Spatial and Spatio-Temporal Point Patterns. CRCPress.

I Diggle, P.J. and P.J. Ribeiro (2007). Model-Based Geostatistics, Springer.I Elliott, P., Wakefield, J., Best, N. and Briggs, D. (2000). Spatial Epidemiology: Methods and

Applications, Oxford University Press.I Gelfand, A.E., Diggle, P.J., Fuentes, M. and Guttorp, P. (2010). Handbook of Spatial

Statistics, CRC Press.I Haining, R. Spatial Data Analysis: Theory and Methods, Cambridge.

5 / 113

Spatial books

I Lawson, A.B. (2006). Statistical Methods in Spatial Epidemiology, 2nd Edition, John Wileyand Sons.

I Lawson, A.B., Browne, W.J. and Rodeiro, C.L.V. (2003). Disease Mapping with WinBUGSand MLwiN, John Wiley and Sons.

I Schabenberger, O. and Gotway, C.A. (2004). Statistical Methods for Spatial Data Analysis,CRC Press. More on the theory side.

I Shaddick, G. and Zidek, J. (2015). Spatio-Temporal Methods in Environmental Epidemiology,CRC Press.

I Stein, M.L. (1999). Interpolation of Spatial Data: Some Theory for Kriging, Springer.Theoretical and concentrates on geostatistical models.

I Waller, L.A. and Gotway, C.A. (2004). Applied Spatial Statistics for Public Health Data, Wiley,New York. A very good book, at an intermediate level.

I Ward, M.D. and Gleditsch, K. S. (2008). Spatial Regression Models. Sage. Webpage,containing data and R code: http://privatewww.essex.ac.uk/~ksg/srm_book.html

6 / 113

http://privatewww.essex.ac.uk/~ksg/srm_book.html

Background reading

R Computing Environment:I Hills, M., Plummer, M. and Carstensen, B. (2006). Statistical

Practice in Epidemiology with R. Available on class web site.I Krause, A. and Olson, M. (2005). The Basics of S-Plus, Fourth

Edition, Springer-Verlag. Available online from UW libraries.There are many others...

Bayesian Statistics:I Hoff, P.D. (2009). A First Course in Bayesian Statistical Methods.

Springer. Available online from UW libraries.

7 / 113

Logistics

Demonstrations of methods via R implementations will be carried outin class. Students are encouraged to follow along.

Code and other materials (course notes, papers) are available at thecourse website:

http://faculty.washington.edu/jonno/SISMIDspatial.html

R command files containing R code are on the website.

8 / 113

Motivation: Spatial Epidemiology

Epidemiology: The study of the distribution, causes and control ofdiseases in human populations.

Disease risk depends on the classic epidemiological triad of person(genetics/behavior), place and time – spatial epidemiology focuses onthe second of these.

Place is a surrogate for exposures present at that location,e.g. environmental exposures in water/air/soil, or the lifestylecharacteristics of those living in particular areas.

Time, which may be measured on different scales(age/period/cohort), is also a surrogate for aging processes andexposures/experiences accrued.

In a perfect world we would have data on residence history, so thatwe could examine space-time interactions in detail.

9 / 113

Three Types of Analyses

Key Point: Units are not uniformly distributed in space, therefore weneed information on the background spatial distribution of the units inorder to infer whether the spatial distribution of units of interest(e.g. cases) differs.

I Geostatistical data in which “exact” residential locations exist forthe points, and spatial regression and/or prediction is of interest.

I Area data in which aggregation (typically over administrativeunits) has been carried out. These data are ecological in nature,in that they are collected across groups, in spatial studies thegroups are geographical areas.

I Point data in which we are interested in the actual configurationof a set of point, or points.

10 / 113

Need for spatial methods

All investigations are spatial!

But often the study area is small and/or there is abundantindividual-level information and so spatial location is not acting as asurrogate for risk factors.

When do we consider the spatial component?I When we are explicitly interested in the spatial pattern of disease

incidence? e.g. mapping, cluster detection.I When we want to leverage spatial dependence in rates to

improve estimation, e.g. small area estimation.I The clustering may be a nuisance quantity that we wish to

acknowledge, but are not explicitly interested in?I In spatial regression we want to get appropriate standard errors,

but we may wish to model the association as a function of space,as in geographically weighted regression.

11 / 113

The selection mechanism

If we are interested in the spatial pattern then, if the data are not acomplete enumeration, we clearly we would prefer the data to be“randomly sampled in space”, i.e., not subject to selection bias withthe extent of bias depending on the spatial location of the individual.

For example, in a matched case-control study, we may match controlson the geographical region of the cases, which will clear distort thegeographical distribution of controls (so that they will not berepresentative of the population at risk).

In small-area estimation this is a serious consideration because theavailable data are often gathered via complex survey designs inwhich, for example, cluster sampling, stratified sampling andover-sampling of certain populations is carried out for reasons oflogistics, or because of power considerations.

12 / 113

Complex survey data

In complex survey design settings, design weights are typicallyreported, these reflect the design, and can often be interpretated asthe number of individuals represented by that surveyed individual.

In this case, careful thought is required to determine whether theweights should be incorporated in the analysis.

Example: Suppose we are interested in the prevalence of diabetesacross areas with samples being taken and diabetes statusdetermined. Suppose we oversample African Americans, if oneignores the design we will obtain biased estimates of prevalencebecause diabetes is associated with race.

13 / 113

MotivationGrowing interest in spatial epidemiology due to:

I Public interest in effects of environmental “pollution”,e.g., Sellafield, UK (Gardner, 1992), Three-Mile Island, US.

I Acknowledgment that many environmental/man-made riskfactors may be detrimental to human health.

I Development of statistical/epidemiological methods forinvestigating disease “clusters”.

I Epidemiological interest in the existence of large/medium spreadin chronic disease rates across different areas.

I Data availability: collection of health, population and exposuredata at different geographical scales.

I Evidence-based decision making, regarding interventions, forexample, requires point and interval estimates for relevantquantities. Prevalence mapping and small area estimation areboth endeavors that provide estimates with associated measuresof uncertainty.

I Increase in computing power and tools such as GeographicalInformations Systems (GIS).

14 / 113

Standardization

15 / 113

Indirect StandardizationThe method of indirect standardization produces the standardizedmortality/morbidity ratio (SMR):

SMRi =Yi∑J

j=1 Nijqj=

Yi

Ei

where Yi is the number of cases in area i , Nij is the population inconfounder strata j and area i and qj is a reference risk.

The indirectly standardized rate compares the total number of casesin an area to those that would result if the risks in a referencepopulation were applied to the population of area i .

Which reference rates should be used?

In a regression setting dangerous to use internal standardization inwhich qj = Yj/Nj since we might distort the effect of the exposure ofinterest.

External standardization uses risks/rates from another region oranother time period. 16 / 113

Expected Numbers: Derivation

We have E[Yij ] = Nij rij where rij is the risk in area i and stratum j .

In a general model we would try to estimate all parameters rij which,for small areas in particular, is not possible.

As an alternative we can assume the proportionality model

rij = θi × qj

so that θi is the relative risk associated with area i ,since

θi =rij

qj,

for all j , i.e., a ratio of risks.

17 / 113

Expected NumbersWe then have

E[Yij ] = Nijθi × qj .

Summing over stratum:

E[Yi ] = E

J∑j=1

Yij

= θi

J∑j=1

Nijqj = θiEi

where the expected numbers are defined as

Ei =J∑

j=1

Nijqj .

This assumption removes the need to estimate J risks in each areaand we have the mean model

E [Yi ] = Eiθi ,

and the aim is often to model θi as a function of space andspatially-referenced covariates.

18 / 113

Expected Numbers

A starting model for Yi is often a Poisson model with mean Eiθi .

In some cases the outcome of interest will not be available by stratumfor each area (i.e., we don’t have Yij only Yi ).

But we may have access to rates by stratum (nationally or regionally,for example).

Hence, if we have population counts by stratum (which are oftenavailable) then we can still fit the model

Yi |θi ∼ Poisson(Eiθi ).

19 / 113

SMRs

The SMR in area i isSMRi =

Yi

Ei

and is an estimate of the relative risk θi .

If incidence is measured then also known as the StandardizedIncidence Ratio (SIR).

Control for confounding may also be carried out using regressionmodeling.

20 / 113

600

800

1000

1200

1.1

2140

6079

600

800

1000

1200

01.

42.

94.

35.

7

Figure 1: Expected numbers (left) and SIRs (right) for Scottish male lipcancer incidence. Age is adjusted for in the expected numbers.

21 / 113

GIS

22 / 113

Geographical Information Systems (GIS)

A GIS is a computer-based set of tools for collecting, editing, storing,integrating, displaying and analyzing spatially referenced data.

A GIS allows linkage and querying of geographically indexedinformation.

So for example, for a set of geographical residential locations a GIScan be used to retrieve characteristics of the neighborhood within thelocations lies (e.g. census-based measures such as populationcharacteristics and distributions of income/education), and theproximity to point (e.g. incinerator) and line (e.g. road) sources ofpollution.

Buffering – a specific type of spatial query in which an area is definedwithin a specific distance of a particular point, line or area.

A GIS can store data as vectors (points, lines, areas) or as rasters(pixels).

23 / 113

Geographical Information Systems (GIS)

Time activity modeling of exposures – we may trace the pathway ofan individual, or simulate the movements of a population groupthrough a particular space-time concentration field, in order to obtainan integrated exposure.

In this course, we will not use any exclusive GIS products, but usecapabilities within R to combine datasets and display maps.

Examples, created in R (see accompanying notes):I Figure 2: Washington state counties with 2 highlighted.I Figure 3 Washington state county populations.I Figure 4 John Snow pumps and cases.

24 / 113

King Spokane

Figure 2: Washington State counties with two counties highlighted.

25 / 113

under 8720.28720.2 − 17110.517110.5 − 27780.827780.8 − 3877538775 − 58634.558634.5 − 97339.597339.5 − 201811.5over 201811.5

Figure 3: Washington State counties with two counties highlighted.

26 / 113

Oxford MarketCastle St E

Oxford St #1Oxford St #2

Gt Marlborough

Crown Chapel

Broad St

WarwickBriddle St

So SohoDean St

Coventry StVigo St

Figure 4: John Snow pumps, etc...

27 / 113

Map Projections and Coordinate ReferenceSystems

28 / 113

Overview, see Waller and Gotway (2004, Chapter 3)and Bivand et al. (2013, Chapter 4)

It is clearly fundamentally important to have a mechanism tonumerically represent spatial location.

Two key ingredients are:I A coordinate reference system (CRS).I A map projection.

The CRS allows, amongst other things, datasets to be combined (wemay need to transform between projections, which may be achievedusing the spTransform function in R).

The objective is to represent attributes within some region on the faceof the Earth.

Most countries have multiple CRS.

29 / 113

Projections

If we have a set of points on the surface of the Earth they may berepresented by their associated latitude and longitude; this anangular system.

Lines of longitude pass through the north and south poles. The originis the line set to 0◦ and is the line of longitude passing through theGreenwich Observatory in England.

Longitude can be measured in degrees (0◦ to 180◦) east or west fromthe 0◦ meridian.

Latitude and longitude can be approximated based on a model for theshape of the Earth – known as a datum:

I A simple datum is a spheroid (a sphere that is flattened at thepoles and bulges at the equator.

I The most commonly used datum is called WGS84 (WorldGeodesic System 1984).

30 / 113

Figure 5: The latitude (φ) of a point is the angle between the equatorial planeand the line that passes through a point and the center of the Earth.Longitude (λ) is the angle from a reference meridian (lines of constantlongitude) to a meridian that passes through the point.

From http://www.rspatial.org/spatial/rst/6-crs.html

31 / 113

http://www.rspatial.org/spatial/rst/6-crs.html

Projections

Due to the curvature of the Earth the distance between two meridians(line of longitude) depends on where we are.

Latitude references North-South position and lines of latitude (calledparallels) are perpendicular to lines of longitude.

The equator is defined as 0◦ latitude.

Once a coordinate system is decided upon once must decide onwhich projection is to be used for display, i.e., the map projection.

32 / 113

Projections

The process of creating map projections can be understood byconsidering the positioning of a light source inside a transparentglobe on which opaque Earth features are placed.

Then project the feature outlines onto a two-dimensional flat piece ofpaper.

Different ways of projecting can be produced by surrounding theglobe in a

I cylindrical fashion,I as a cone, orI as a flat surface.

Each of these methods produces what is called a map projectionfamily.

33 / 113

Figure 6: The three projections: a) cylindrical projections, b) conicalprojections, c) planar projections.

From: https://docs.qgis.org/testing/en/docs/gentle_gis_introduction/coordinate_reference_systems.html

34 / 113

https://docs.qgis.org/testing/en/docs/gentle_gis_introduction/coordinate_reference_systems.html

https://docs.qgis.org/testing/en/docs/gentle_gis_introduction/coordinate_reference_systems.html

Projections

Different map projections distort areas, shapes, distances anddirections in different ways.

When move from 3-dimensions to 2-dimensions, it is intuitivelyobvious that we will lose some information.

Over the years, many weird and wonderful projections have beeninvented.

Conformal (e.g. Mercator) projections preserve local shape.

Equal-area (e.g. Albers) projections preserve local area.

Once projection has taken place a grid system must be established.

See https://en.wikipedia.org/wiki/List_of_map_projections

for many examples.

35 / 113

https://en.wikipedia.org/wiki/List_of_map_projections

Figure 7: The Mercator projection is a cylindrical projection that is conformal.

Source: By Strebe - Own work, CC BY-SA 3.0,https://commons.wikimedia.org/w/index.php?curid=17700069

36 / 113

https://commons.wikimedia.org/w/index.php?curid=17700069

Figure 8: Robinson pseudocylindrical projection is neither equal-area norconformal, but goes for a compromise.


37 / 113


Figure 9: The UN logo uses the Azimuthal Equidistant projection.

38 / 113

Data Quality

39 / 113

Data Quality Issues

In routinely carried out investigations the constituent data are oftensubject to errors; local knowledge is invaluable forunderstanding/correcting these errors.

Wakefield and Elliott (1999) contains more discussion of theseaspects.

Population dataI Population registers are the gold standard but counts from the

census are those that are typically routinely-available.I Census counts should be treated as estimates, however, since

inaccuracies, in particular underenumeration, are common.I For inter-censual years, as well as births and deaths, migration

must also be considered.I The geography, that is, the geographical areas of the study

variables, may also change across censuses which causescomplications.

40 / 113

Data Quality Issues

Health dataI For any health event there is always the possibility of diagnostic

error or misclassification.I For other events such as cancers, case registrations may be

subject to double counting and under registration.Exposure data

I Exposure misclassification is always a problem inepidemiological studies.

I Often the exposure variable is measured at distinct locationswithin the study region, and some value is imputed for all of theindividuals/areas in the study.

I A measure of uncertainty in the exposure variable for eachindividual/area is invaluable as an aid to examine the sensitivityto observed relative risks.

Combining the population, health and exposure data is easiest if suchdata are nested, that is, the geographical units are non-overlapping.

41 / 113

Data Quality Issues

GIS dataI Digitized boundaries from different sources can be wildly

different.I These may have an impact on population estimates within areas.I Point locations may not map correctly to geographical areas.I Residential address locations may be very inaccurate,

particularly in rural areas.

42 / 113

Conclusions

Key questions to ask:I Why are we interested in space?I What is the aim of the study: description, exploration, specific

risk factors of interest, estimation of prevalence/incidence/totals?I What is the interpretation of the parameters of the model?I How does the sampling scheme impact modeling/interpretation?I What are the important confounders and how can we control for

these variables?I What is the best map projection and CRS for the problem we are

working on? If the study region is geographically small, thenprobably won’t make much difference.

43 / 113

Motivation for Time Series Tangent

44 / 113

Motivation for examining time series methods

It is useful to briefly review time series methods since:I There are strong analogies between time series analysis, and

spatial analysis.I Space-time models use time series models.I Time is simpler than space, because it is one-dimensional, and

often time series data are collected on a regular schedule (unlikespatial data in a social science/health setting). Hence,understanding time series models is more straightforward.

45 / 113

Stationarity and Measures of TemporalDependence

46 / 113

A simple time series model

Suppose first we have a continuous response y(t) and a covariatex(t) measured on an equally spaced schedule (which forconcreteness we shall call days) and we will index responses andcovariates by t = 1, . . . ,n.

A simple regression model is:

Y (t) = β0 + x(t)β1 + e(t),

for t = 1, . . . ,n, where the unobserved error term e(t) can bedecomposed as

e(t) = ε(t) + S(t),

whereI the ε(t) are unobserved independent error terms andI S(t) are unobserved errors with temporal structure.

Usually x(t) will have temporal structure as well.

47 / 113

What are error terms?

In general, the stochastic error terms e(t) reflect variability in Y (t) not‘explained’ by the deterministic part of the model β0 + x(t)β1,including:

I Missing covariates associated with the response.I Data errors

If these elements have temporal structure they will be absorbed intoS(t), otherwise they manifest themselves in ε(t).

48 / 113

The autocorrelation function

We might start by fitting a simple regression model,

Y (t) = β0 + x(t)β1 + e(t),

where the e(t)’s are independent error terms with constant variance.

To investigate whether there is temporal dependence, we would likediagnostic methods.

We now discuss:I stationary processes,I autocorrelation function as the correlation between residuals for

equally spaced data, andI the semi-variogram as a measure for dependence that may be

used for unequally spaced data.

49 / 113

Stationary ProcessesA temporal point process is a random mechanism that results in a setof one-dimensional points.

Without further assumptions, a single time series of length n, can(unfortunately) be viewed as a single realization, due to the potentialdependence in time.

The problem is how to assess temporal dependence between allpairs of the n responses.

With a single series, we cannot estimate the n × n elements of thevariance-covariance matrix (n/(n − 1)/2 + n of which are distinct),with

I the diagonal terms representing the variances, andI the off-diagonal terms the covariances between all pairs of

responses.If our series ‘looks the same’ whenever we observe it, then we canbegin to make inference, because we have some notion of replication.

50 / 113

Stationary Processes

This motivates stationarity.

Strict stationarity says that the joint probability distribution of thecollection of m random variables

{Y (ti + s), i = 1, . . . ,m}

is the same for all s.

Simplest non-trivial example: If m = 2 and our two observations aremeasured at t1 = 1, t2 = 5 then we are saying that the bivariatedistribution of Y (2),Y (5) is the same as Y (2 + s),Y (5 + s); that is,the distribution does not change as we slide the pair along the timeaxis (2 means, 2 variance, and covariance are the same, as is thecomplete distribution).

This is very strong, and uncheckable, and so we we go for somethingweaker, but we begin with some definitions.

51 / 113


For the stochastic process Y (t), we write

Y (t) = µ(t) + e(t),

where µ(t) is the deterministic trend component.

Let

γ(t , t ′) = cov{Y (t),Y (t ′)}= E[{Y (t)− µ(t)}{Y (t ′)− µ(t ′)}]= E[e(t)e(t ′)],

denote the autocovariance function of Y (t).

The covariance is measuring the linear dependence betweenresponses at times t and t ′.

If the covariance between responses at t and t ′ is positive, it means ifY (t) is large (small), Y (t ′) tends to be large (small) also.

52 / 113


Reminder: γ(t , t ′) = cov{e(t),e(t ′)}.

Definition: A process e(t) is second-order stationary if E[e(t)] isconstant, for all t , and the autocovariance γ(t , t ′) depends only on|t − t ′|. We write C(t − t ′) = C(s) to define the autocovariance in thiscase.

Notes:I Second-order stationary is the stationarity we refer to, from now

on.I For a residual process e(t) any non-zero constant has been

absorbed into µ(t).I It doesn’t matter where we look at the time series, the

dependence only depends on the relative time between t and t ′.So γ(3,7) = C(4) is the same as γ(12,16) = C(4).

53 / 113

The Autocorrelation FunctionFor a second-order stationary random process, e(t), theautocovariance function is

cov{e(t),e(t + s)} = C(s),

so that C(0) is the variance of e(t) for all t .

The autocorrelation function is defined as

ρ(s) =C(s)

C(0).

The autocorrelation function ρ(s) measures the correlation betweenerror terms s units apart, with

−1 ≤ ρ(s) ≤ +1.

Since the error terms have zero mean:

ρ(s) =E[e(t)e(t + s)]

E[e(t)2](1)

for s = 1,2, . . . .

54 / 113


Back to our discretely observed data {yt , xt}, and suppose we fit themodel

Yt = β0 + xtβ1 + εt ,

assuming var(εt ) = σ2 and cov(εt , εt′) = 0 for all t , t ′.

We then form the residuals

rt = yt − yt

= yt − β0 − xt β1,

for t = 1, . . . ,n.

55 / 113


From (1), the obvious estimates are the empirical autocorrelations:

ρs =1n

∑nt=1 rt rt+s

1n

∑nt=1 r2

t

We then plot ρs versus s to give a correlogram.

For a white noise process1, the ρs are approximately N(0,1/n)distributed, for large n.

Therefore, approximate 95% confidence intervals under no temporaldependence are ±2/

√n.

1zero mean, independent random variables with constant (finite) variance56 / 113

The Autocorrelation Function

Consider the modelY (t) = µ(t) + e(t)

where µ(t) is the deterministic trend component and e(t) thestochastic component.

There is a fundamental difficulty with trying to decompose Y (t) intothe trend and the stochastic component in a single series becausethe two are unidentifiable without further assumptions.

Bartlett (1964) showed that a a process with constant intensity andclusters of points is not distinguishable from a process withindependent points from a randomly varying intensity.2

Is it serial dependence in the residuals, or a high-order polynomialtrend, for example?

2an example of a Cox process57 / 113

The semi-variogram

The semi-variogram can deal with unequally-spaced data.

The semi-variogram is defined, for a process e(t) and s ≥ 0 by

ν(s) =12

var [e(t)− e(t + s)] =12

E[{e(t)− e(t + s)}2

].

If the process displays dependence then if t and t + s are ‘close’ thevariables measured at these points are likely to be positivelycorrelated, and so will tend to be similar, so that the variance of thedifference is relatively small.

58 / 113

The semi-variogramRecall that for a second-order stationary process, E[e(t)] = 0 for all tand cov{e(t),e(t + s)} only depends on s (which implies constantvariance).

For a second-order stationary smooth process

ν(s) =12{

E[e(t)2] + E[e(t + s)2]− 2E[e(t)e(t + s)]}

= σ2e{1− ρ(s)},

where var(e) = σ2e .

So the semi-variogram is a flipped over and scaled version of theautocorrelation.

As s increases then for observations far apart in space

ν(s)→ var [e(t)] = σ2e ,

which (recall) is assumed constant.59 / 113

The semi-variogram

Recall, the semi-variogram is given by

ν(s) =12

var [ e(t)− e(t + s) ] .

If there is temporal dependence then points close together will tend tobe similar and so the variance of the pairwise difference will be small,but will increase as the distance increases.

At some distance the points will become independent, and thesemi-variogram will take on the variance of the process, this is calledthe sill, and the distance at which this occurs, the range.

60 / 113

VariogramsOften we would like to include an additional variance for“measurement error”.

We decompose the residual error into

e(t) = ε(t) + S(t)

withI ε(t) independent error terms with

var[ε(t)] = σ2ε

andcov[ε(t), ε(t + s)] = 0,

I S(t) temporally dependent error terms with

var[S(t)] = σ2τ

andcov[S(t),S(t + s)] = σ2

τρ(s).

61 / 113

Variograms

In this case the form of the variogram is

ν(s) =12{

E[e(t)2] + E[e(t + s)2]− 2E[e(t)e(t + s)]}

= σ2ε + σ2

τ{1− ρ(s)},

where var[e(t)] = σ2ε + σ2

τ .

The value of the semi-variogram close to the origin is of particularinterest, since it determines the degree of smoothness of the process.

In spatial statistics, the discontinuity at the origin is called the nuggeteffect, and is often due to measurement error.

Figure 10 displays these concepts.

62 / 113

0.0 0.5 1.0 1.5 2.0 2.5 3.0

0.0

0.1

0.2

0.3

0.4

0.5

0.6

Distance

Sem

i−Va

rianc

e

NuggetEffective Range

Sill

Partial Sill

Total Variance

Figure 10: Theoretical semi-variogram ν(s) versus ‘distance’.

63 / 113

The semi-variogram for equally-spaced data

The semi-variogram is given by

ν(s) =12

var [ e(t)− e(t + s) ]

=12

E{

[e(t)− e(t + s)]2}.

With equally-spaced data we could calculate the empiricalsemi-variogram as

ν(s) =1

n − s

s∑t=1

(rt − rt+s)2

2,

and plot against s.

64 / 113

The semi-variogram for unequally-spaced data

For unequally-spaced data, it’s a little more tricky, because thenumber of pairs of residuals s time units apart is likely to be small (1in the limit).

For all pairs of residuals (ri , rj ), corresponding to times (ti , tj ) we cancalculate the contributions

(ri − rj )2

2.

These quantities may be plotted against

|ti − tj |,

to give a variogram cloud.

65 / 113

The semi-variogram for unequally-spaced data

Often it is useful to look at the binned variogram in which distancesare binned, with the contributions within each bin averaged.

Specifically, if s is the midpoint of a bin:

υ(s) =1

2|Ns|∑

i,j∈Ns

(ri − rj )2,

where Ns is the collection of all pairs of residuals that give distancesin the bin with mid-point s, and |Ns| is the number of these pairs.

Note: we could form a binned version of the autocorrelation also.

66 / 113

Extent of temporal dependence

We might envisage different scales of temporal dependence:I Large scale trends may be removed using a linear time effect

(i.e., S(t) = β × t).I Medium scale trends may be removed using spline models often

embedded within a generalized additive model (GAM)framework, see Dominici et al. (2004).

I Small scale trends – we will describe both discrete andcontinuous time models.

67 / 113

Spline models

In air pollution context, we would like to adjust for weather, season,and influenza epidemics (when we don’t have explicit observations onthese variables).

Many different flavors of splines (smoothing, penalized,...).

Wakefield (2013, Chapter 11) gives a summary.

68 / 113

Spline models

The spline contribution at time t is of the form

S(t) =K∑

k=1

bk (t , tk )δk ,

whereI K is the number of knots,I tk are the knot locations,I bk is a basis (building blocks for the spline regression),I δk is the coefficient associated with the k -th basis. A penalization

of the form λ∑K

k=1 δ2k is sometimes used, with λ > 0 fixed via

some criteria (AIC, cross-validation, estimation in a mixedmodel).

The key decision to obtain the right level of smoothing: either throughchoice of K or λ; too much smoothing and the association isremoved, too little smoothing and the association is over-estimated.

69 / 113

Models for Temporal Dependence

70 / 113

Small scale dependenceConsider the autoregressive (AR) process of order 1:

St = ρSt−1 + τt , τt ∼iid N(0, σ2τ ) (2)

with |ρ| < 1.

This model carries out smoothing of the time series on a short timescale.

Recall the error term is decomposed as

e(t) = ε(t) + S(t),

with ε(t) zero mean and independent with variance σ2ε .

The scale of the dependent term is determined by σ2τ and so the

relative contribution of the temporal dependence to the total is

σ2τ

σ2ε + σ2

τ

.

The usual case of positive temporal dependence corresponds toρ > 0, with ρ = 0 giving independence.

71 / 113

Small scale dependenceAlternatively we may write this model in the form

St |St−1, . . . ,T1 ∼ N(ρSt−1, σ2τ ) (3)

for t = 2, . . . ,n.

The model is an example of a first-order Markov model, since theobservation at t only depends on the previous observation at t − 1.

We add a convenient choice for the first observation:

S1 ∼ N(0, σ2τ [1− ρ2]−1).

Then S = [S1, . . . ,Sn]T ∼ N(0,Σ) where

Σ = Σ(σ2τ , ρ) =

σ2τ

1 − ρ2

1 ρ ρ2 . . . ρn−1

ρ 1 ρ . . . ρn−2

......

.... . .

...ρn−1 ρn−2 ρn−3 . . . 1

72 / 113

Small scale dependenceWhen we start thinking about spatial models the directed conditionaldistributions as in (3) are not so useful.

An equivalent (undirected) representation of the AR(1) model is

St |S−t ∼

N(ρSt+1, σ

2τ ) t = 1

N(

ρ1+ρ2 (St−1 + St+1),

σ2τ

1+ρ2

)1 < t < n

N(ρSt−1, σ2τ ) t = n

(4)

where S−t = [S1, . . . ,St−1,St+1, . . . ,Sn] is the vector oftime-dependent errors with the t-th error, St , removed.

For the points not at the end, the mean is the weighted average of thetwo adjacent points.

Important Point: A first-order Markov model does not correspond to(marginal) correlation that only extends for one time unit.

This representation indicates a way forward in the spatial setting.73 / 113

Small scale dependenceTo write down a likelihood we need the probability distributionp(S|σ2

δ , ρ) which is multivariate normal, i.e., S|σ2δ , ρ ∼ N{0,Σ(σ2

δ , ρ)}.

The density is

p(S|σ2δ , ρ) = (2π)−n/2|Σ|−1/2 exp

(− 1

2σ2δ

STΣ−1S), (5)

where the precision matrix is defined as Q = Σ−1 and because Σ is asymmetric Toeplitz matrix the inverse is available as (Schott, 1997):

Q =

1 −ρ 0 . . . 0−ρ 1 + ρ2 −ρ . . . 0...

......

. . . . . .0 . . . −ρ 1 + ρ2 −ρ0 . . . . . . −ρ 1

This is a sparse matrix which has huge implications for computation,since the determinant is straightforward to calculate, and theexponent in (5) is easy also.

74 / 113

Small scale dependence

This model may be extended to unequally spaced points if we thinkabout the covariance model

cov[S(t),S(t + s)] = σ2τρ

s,

for s > 0.

If we write ρ = exp(−φ) we obtain the exponential covariance model:

cov[S(t),S(t + s)] = σ2τ exp(−φs),

for s > 0 and φ > 0.

This model is sometimes extended to

cov[S(t),S(t + s)] = σ2τ exp(−φsψ),

for s > 0, φ > 0 and 0 < ψ ≤ 2, though the Matern class we describenext has better theoretical and applied properties.

75 / 113

Small scale dependence

Many forms of covariance models have been suggested; in spatialstatistics, a popular choice is the Matern covariance function (Stein,1999):

cov[S(t),S(t + s)] =σ2τ

Γ(υ + 1/2)(4π)1/2κ2υ2υ−1 (κs)υKυ(κs)

where Kυ(·) is a modified Bessel function of the second kind, κ > 0 isa scale parameter and υ > 0 is a smoothness parameter.

In general, difficult to estimate many parameters in a spatial modeland often υ is fixed.

The form looks complex, but it has good properties.

76 / 113

RW1 Model

We describe a particular limiting autoregressive model that is apopular tool for nonparametric smoothing.

The model takes the limit of the AR(ρ) model as ρ→ 1 and takes theform

I Stage 1: Yt = µt + St + εt , εt ∼iid N(0, σ2ε ).

I Stage 2: St = St−1 + τt , τt ∼iid N(0, σ2τ ).

This is known as a random walk model of order one, which we writeas RW1.

Note: depends on a single parameter σ2τ .

77 / 113

RW1 Model

The undirectional version is

St |St−1,St+1 ∼ N(

12

(St−1 + St+1),σ2τ

2

),

for 1 < t < n.

For prediction, future values have the conditional distribution:

Sn+s|S1, . . . ,Sn, σ2τ ∼ N

Sn︸︷︷︸Predictive Mean

, s × σ2τ︸︷︷︸

Predictive Variance

,

for s > 0.

Hence, predictions into the future have the same level, and thevariance is linear in s.

78 / 113

RW2 Model

For the RW1 model, a least squares fit to the two adjacent pointsSt−1 and St+1 gives a fitted mean of 1

2 (St−1 + St+1).

The RW2 model gives more smoothing by smoothing over 4neighbors.

The undirectional version is

St |St−1,St−2,St+1 ∼ N{

46

(St+1 + St−1)− 16

(St+2 + St−2),σ2τ

6

}1 < t < n.

A least squares fit to the four adjacent points St−2,St−1,St+1,St+2gives a fitted mean of

46

(St+1 + St−1)− 16

(St+2 + St−2).

79 / 113

RW2 Model

For prediction, future values have the conditional distribution:

Sn+s|S1, . . . ,Sn, σ2τ ∼ N

(1 + s)Sn − sSn−1︸︷︷︸Predictive Mean

, (1 + 22 + · · ·+ s2)× σ2τ︸︷︷︸

Predictive Variance

,

for s > 0.

Hence, the temporal trend is determined by the last two points, andwe have a linear trend.

So the trend is more flexible, but the variance is larger, whencompared to the RW1 model.

80 / 113

RW1 and RW2 Models

Figure 11 shows simulated data from a sine curve and then fit usingRW1 and RW2 models.

The resultant fits are in indicated.

Note that the RW2 fit is smoother.

The RW1 and RW2 models are usually fitted using a Bayesianapproach; the prior on σ2

τ can be used to control the amount ofsmoothing:

I Giving greater weight to smaller (larger) values of σ2τ gives more

(less) smoothing.I In the limit as σ2

τ → 0, the RW1 model tends to a horizontal line,and the RW2 model tends to a linear trend in time.

81 / 113

RW1 and RW2 Models

●

●

●

●

●●

●

●●

●●

●●●

●●

●

●

●

●

●

●●●●

●

●

●

●

●

●●

●

●●●●

●

●

●

●

●●

●●

●

●

●

●

●

●

●●●●

●

●

●

●

●

●

●

●

●●

●

●●

●●

●●

●

●

●

●●

●

●●●

●

●

●

●●

●●

●

●●

●●●●

●●●

●●

0 20 40 60 80 100

−1.0

−0.5

0.00.5

1.0

time

y

RW1RW2

Figure 11: Simulated data and RW1 and RW2 fits.

82 / 113

Non-normal outcomes

For generalized linear models beyond the normal (e.g., binomial andPoisson), the temporal random effects are added in to the linearpredictor.

For example, for a temporal Poisson regression mode, with counts Ytarising from a population of size N, a possible model is:

Yt |θt ∼ind Poisson(Nθt ),

log θt = β0 + β1xt + εt + St

εt ∼ Independent ErrorsSt ∼ Temporal Errors

83 / 113

An Example

84 / 113

An illustrative Example

Cobb (1978) analyzes data collected between 1871 and 1970 on theannual volume of discharge from the Nile River at Aswan.

These data (with the response variable scaled) are plotted in Figure12.

It has been suggested by a number of authors, including Cobb (1978)and Balke (1993) that the mean volume of the River Nile experienceda significant decline in 1898. These data have been put to varioususes.

One question of interest is, which concerns regression, is what wasthe size of the decline in 1898?

Another objective is smoothing in which we wish to, over the 100-yearperiod, obtain a smoothed estimate of the mean of the process.

Finally, we may be interested in prediction of the level into future timeperiods.

85 / 113

●

●

●

●

●●

●

●

●

●

●

●

●

●●

●

●

●

●

●

●

●

●

●●

●

●

●

●

●

●

●

●

●

●

●

●

●

●

●

●

●

●

●

●

●●

●

●

●

●

●●●

●

●

●

●

●

●●

●●

●

●

●

●

●

●

●●

●

●

●

●

●

●●●

●

●●

●

●

●

●

●

●

●

●

●

●●

●

●

●

●

●●●

0 20 40 60 80 100

0.00.2

0.40.6

0.81.0

time

y

Figure 12: Nile data (volume scaled).

86 / 113


We calculate the autocorrelations for the raw – these are plotted inthe left panel of Figure 13; the autocorrelations are significant for lagsup to 8–10.

We define an indicator variable that is 0 before 1898, and 1 after, andthen fit a linear model in this variable.

The autocorrelations of the residuals from this model are plotted inthe right hand panel and are now much smaller.

87 / 113

0 5 10 15

−0.2

0.00.2

0.40.6

0.81.0

Lag

ACF

0 5 10 15 20

−0.2

0.00.2

0.40.6

0.81.0

LagAC

F

Figure 13: Autocorrelation functions of raw data (left) and residuals fromlinear model with indicator in 1898 (right).

88 / 113


Figure 14 shows fits from three models:I A linear model in the indicator.I RW1 model (no covariate).I RW2 model (no covariate).

Both the RW1 and RW2 models show a decline around 1898.

There is greater smoothing under the RW2 models.

Figure 16 shows a smoothing spline fit to the data (again with nocovariate).

89 / 113

●

●

●

●

●●

●

●

●

●

●

●

●

●●

●

●

●

●

●

●

●

●

●●

●

●

●

●

●

●

●

●

●

●

●

●

●

●

●

●

●

●

●

●

●●

●

●

●

●

●●●

●

●

●

●

●

●●

●●

●

●

●

●

●

●

●●

●

●

●

●

●

●●●

●

●●

●

●

●

●

●

●

●

●

●

●●

●

●

●

●

●●●

0 20 40 60 80 100

0.0

0.2

0.4

0.6

0.8

1.0

Time (years)

Nile

Volum

e (S

caled

))

RW1RW2x

Figure 14: Fitted values from Bayesian analyses.

90 / 113

Prior specification

Under the Bayesian model we need to put a prior on the smoothingvariance σ2

τ .

The response was scaled here to lie between 0 and 1 (the originalrange was huge), and we emphasize that the size of στ is relative tothe range of the response.

91 / 113

Prior specification

We used three different inverse gamma priors:I Non-Smooth: IGa(1,10000000) which leads to 2.5%, 50%,

97.5% quantiles for στ of 1646, 3798, 19874. So the variance isencouraged to be large. The posterior medians of στ are 403and 393, for the RW1 and RW2 models. The two non-smoothcurves were jittered so that both could be seen – they lie on topof each other, and go through the data.

I Smooth: IGa(1,0.0001) which leads to 2.5%, 50%, 97.5%quantiles for στ of 0.0052, 0.012, 0.063. So the variance isencouraged to be quite small. The posterior medians of στ are0.03 and 0.007, for the RW1 and RW2 models.

I Tiny Variance: IGa(100,0.000001) which leads to 2.5%, 50%,97.5% quantiles for στ of 9.1× 10−5,1.0× 10−4,1.1× 10−4. Sothe variance is encouraged to be very small. The posteriormedians of στ are 0.0001 and 0.0001, for the RW1 and RW2models. The RW1 fit is flat, and and the RW2 fit is linear in time.

92 / 113

●

●

●

●

●●

●

●

●

●

●

●

●

●●

●

●

●

●

●

●

●

●

●●

●

●

●

●

●

●

●

●

●

●

●

●

●

●

●

●

●

●

●

●

●●

●

●

●

●

●●●

●

●

●

●

●

●●

●●

●

●

●

●

●

●

●●

●

●

●

●

●

●●●

●

●●

●

●

●

●

●

●

●

●

●

●●

●

●

●

●

●●●

0 20 40 60 80 100

0.00.2

0.40.6

0.81.0

Time (years)

Nile

Volum

e (Sc

aled)

)

RW1 Non−SmoothRW1 SmoothRW1 Tiny VarianceRW2 Non−SmoothRW2 SmoothRW2 Tiny Variance

Figure 15: Nile data with RW1 and RW2 fits under different priors.

93 / 113

●

●

●

●

●●

●

●

●

●

●

●

●

●●

●

●

●

●

●

●

●

●

●●

●

●

●

●

●

●

●

●

●

●

●

●

●

●

●

●

●

●

●

●

●●

●

●

●

●

●●●

●

●

●

●

●

●●

●●

●

●

●

●

●

●

●●

●

●

●

●

●

●●●

●

●●

●

●

●

●

●

●

●

●

●

●●

●

●

●

●

●●●

0 20 40 60 80 100

0.0

0.2

0.4

0.6

0.8

1.0

Time (years)

Nile

Volum

e (S

caled

))

Figure 16: Smoothing spline fit.

94 / 113

Point Process Models

95 / 113

Point Process Models for Temporal Data

Consider a hypothetical infinite population, and suppose we observethis population over some period [0,A].

A binary event is of interest, and let ti , i = 1, . . . ,n, be the times atwhich we observe this event happening, so n is random.

Let N(t), t ≥ 0 be the number of observed events up to (andincluding) t .

We describe some simple models to describe this scenario, andsome sampling plans.

96 / 113


A (homogenous) Poisson process with intensity λ is a processN(t), t ≥ 0, taking values in 0,1,2, . . . . , such that

1. N(0) = 0 and if s < t , N(s) ≤ N(t).2. For a small time interval δt ,

Pr(N(t + δt) = y + z|N(t) = y) =

λδt + o(δt)3 if z = 11− λδt + o(δt) if z = 0o(δt) if z > 1

3. If s < t , then the number of events N(t)− N(s) in the interval(s, t ] is independent of the number of events in [0, s].

An inhomogenous Poisson process replaces λ by λ(t).

3the notation o(δt) means that the error (or remainder) is of lower order than δt ,i.e., limδt→0

o(δt)δt = 0.

97 / 113


Some observations on the Poisson process:I For any interval (s, t ], with 0 ≤ s < t ≤ A, the number of events

N(t)− N(s) is Poisson with mean∫ t

s λ(u)du.I There is no dependence on past events.I To simulate from a Poisson process on [0,A] we can

1. simulate the total number of events n, from Poisson(∫ A

0 λ(t)dt)

,2. simulate ti ∼iid f (t), for i = 1, . . . , n, where

f (t) =λ(t)∫ A

0 λ(u)du, 0 < t < A.

98 / 113


I The homogenous Poisson process has

f (t) = A−1, 0 < t < A,

i.e., uniform, and could be referred to as complete temporalrandomness.

I In this case, the times between events are independentexponentials with means 1/λ.

I This model is stationary.I For the inhomogenous Poisson process, there will (in

expectation) be ‘clumpiness’ of the points where λ(t) is high, butthe generative model has no interaction (or direct dependence)between points.

I This model is non-stationary.I Hence, Poisson process models are not (biologically) close to

any true model for infectious disease events. Or sociologicallyclose to any true model for interacting events.

I See Diggle (2014) for more details on point process models.99 / 113


Three types of data:1. Point process data in which the complete set of exact event times

are observed. We include in this category, thinned versions inwhich each point is observed with some constant probability (forexample, the collection system picks up 70% of events). In amarked point process the locations have an attribute attached tothem, which may be discrete or continuous. The simplestexample is a binary mark, such as case/non-case or crime/nocrime.

2. Temporally aggregate counts, rather than exact times.3. Point (geostatistical) data: collect data at a set of times,

tj , j = 1, . . . ,m. Of crucial importance (for some endeavors), ishow the sampling times are chosen. The difference betweenpoint process and geostatistical data is that locations forsampling are chosen in the latter (albeit, perhaps via somerandom mechanism), but in the former we simply get to seewhere events occur (or a thinned version).

100 / 113


Endeavors:I Mapping.I Spatial regression.I Assessment of clustering.I Cluster Detection.

Under geostatistical sampling, we must take care, as mapping mayde deceptive if the sampling depends on the likelihood of the events,e.g., if we oversample in areas where the intensity is high, but we actas if we have a representative sample, then an estimate will be adistortion of the true intensity.

101 / 113

A Simulation

Data were simulated from a Poisson process with rate shown inFigure 17.

Imagine an A = 60 month observation period with a constantpopulation size: the simulation we show below is consistent with apopulation of size 518,412 with probability of the event being 0.0005for each individual in the population.

The realized points (jittered) are shown in red.

Aggregate counts are shown in Figure 18.

A reduced sample, with thinning by 0.5, is shown in Figure 19.

Kernel density estimates from the full and reduced samples areshown in Figures 20 and 21.

102 / 113

0 10 20 30 40 50 60

02

46

8

Time

λ(t)

Figure 17: Event intensity and (jittered) realized temporal pattern.

103 / 113

Time

Freq

uenc

y

0 10 20 30 40 50 60

02

46

810

12

Figure 18: Aggregated counts.

104 / 113

0 10 20 30 40 50 60

02

46

8

Time

λ(t)

Figure 19: Event intensity and (jittered) realized thinned temporal pattern.

105 / 113

0 10 20 30 40 50 60

02

46

8

Time

Kern

el De

nsity

Esti

mat

e

Figure 20: Estimated event intensity and (jittered) realized temporal patternfrom which estimate was made.

106 / 113

0 10 20 30 40 50 60

02

46

8

Time

Kern

el De

nsity

Esti

mat

e

Figure 21: Estimated event intensity and (jittered) thinned realized temporalpattern from which estimate was made.

107 / 113


In Figure 22 we perform geostatistical type sampling4 in which weobserve the aggregate counts for 1 month, every 4 months.

A second version is performed in which we take samplingprobabilities that are strongly related to the underlying intensity, theprobabilities are shows in Figure 23.

The realized sample is shown in Figure 24; notice the tendency fortheReconstructing the underlying intensity from these data, withoutadjusting for the sampling, would lead to bias in the estimate (wewould overestimate).

4geostatistics usually refers to sampling at discrete times, but here we assumeobserving the number of events in a small interval

108 / 113


The issue in which is often referred to as preferential sampling, Diggleet al. (2010).

Let t = [t1, . . . , tm] represent the sampling times, y = [y1, . . . , ym] theresponses, and S = [S1, . . . ,Sm], the latent (temporal) random field,and assume the model:

yi = β0 + S(ti ) + εi ,

with εi being zero mean independent residuals with variance σ2ε ,

i = 1, . . . ,m.

Viewing the sampling times as random, non-preferential samplingrefers to the situation in which corresponds to

p(t ,S) = p(t)× p(S),

inequality corresponds to preferential sampling.

109 / 113

Time

Freq

uenc

y

0 10 20 30 40 50 60

02

46

810

12 SampledNot Sampled

Figure 22: Sampling every fourth time period.

110 / 113

0 10 20 30 40 50 60

0.0

0.1

0.2

0.3

0.4

Time

Sam

pling

Pro

babil

ities

Figure 23: Sampling probabilities for preferential sampling.

111 / 113

Time

Freq

uenc

y

0 10 20 30 40 50 60

02

46

810

12 SampledNot Sampled

Figure 24: Unequal sampling weights.

112 / 113

Conclusions

Take-home messages:I Ignoring dependence in a regression context can lead to

inappropriate standard errors (usually too small).I Dependence models can be based on discrete or continuous

time.I Confounding by time is a difficult problem; semi-parametric

spline models are used to control for time, but degrees offreedom choice is crucial to give appropriate adjustment.

I Need to worry about the sampling scheme.

113 / 113

References

Balke, N. (1993). Detecting level shifts in time series. Journal ofBusiness & Economic Statistics, 11(1), 81–92.

Bartlett, M. (1964). The spectral analysis of two-dimensional pointprocesses. Biometrika, 51, 299–311.

Bivand, R., Pebesma, E., and Gomez-Rubio, V. (2013). AppliedSpatial Data Analysis with R, 2nd Edition. Springer, New York.

Carstairs, V. (2000). Socio-economic factors at areal level and theirrelationship with health. In P. Elliott, J. C. Wakefield, N. G. Best,and D. J. Briggs, editors, Spatial Epidemiology: Methods andApplication, pages 51–67. Oxford University Press, Oxford.

Carstensen, B. (2007). Age–period–cohort models for the lexisdiagram. Statistics in Medicine, 26, 3018–3045.

Clayton, D. and Schifflers, E. (1987a). Models for temporal variationin cancer rates. I: age–period and age–cohort models. Statistics inmedicine, 6, 449–467.

Clayton, D. and Schifflers, E. (1987b). Models for temporal variationin cancer rates. II: age–period–cohort models. Statistics inmedicine, 6, 469–481.

113 / 113

Cobb, G. W. (1978). The problem of the Nile: conditional solution to achangepoint problem. Biometrika, 65, 243–251.

Diggle, P. J. (2014). Statistical Analysis of Spatial andSpatio-Temporal Point Patterns, Third Edition. CRC Press.

Diggle, P. J., Menezes, R., and Su, T.-L. (2010). Geostatisticalinference under preferential sampling. Journal of the RoyalStatistical Society: Series C (Applied Statistics), 59, 191–232.

Dominici, F., McDermott, and Hastie, T. (2004). Improvedsemi-parametric time series models of air pollution and mortality.Journal of the American Statistical Association, 468, 938–948.

Gardner, M. (1992). Childhood leukaemia around the sellafieldnuclear plant. In P. Elliott, J. Cuzick, D. English, and R. Stern,editors, Geographical and Environmental Epidemiology: Methodsfor Small-area Studies, pages 291–309, Oxford. Oxford UniversityPress.

Rothman, K. J. and Greenland, S. (1998). Modern Epidemiology,Second Edition. Lippincott-Raven.

Schott, J. (1997). Matrix Analysis for Statistics. John Wiley, New York.

Stein, M. (1999). Interpolation of Spatial Data: Some Theory forKriging. Springer.

113 / 113

Thomas, D. C. (2014). Statistical Methods in EnvironmentalEpidemiology . Oxford University Press, Oxford.

Wakefield, J. (2013). Bayesian and Frequentist Regression Methods.Springer, New York.

Wakefield, J. and Elliott, P. (1999). Issues in the statistical analysis ofsmall area health data. Statistics in Medicine, 18, 2377–2399.

Waller, L. and Gotway, C. (2004). Applied Spatial Statistics for PublicHealth Data. John Wiley and Sons.

113 / 113

Documents

2018 SISMID Module 14 Lecture 1: Introduction and Overviewfaculty.washington.edu/jonno/SISMIDmaterial/2018-SISMID-Lec1.pdf · 2018 SISMID Module 14 Lecture 1: Introduction and Overview