Spatial Statistics Toolbox 2.0

R. Kelley Pace¹
LREC Endowed Chair of Real Estate
Department of Finance
E.J. Ourso College of Business Administration
Louisiana State University
Baton Rouge, LA 70803-6308
OFF: (225)-578-6256, FAX: (225)-578-6095
[email protected], www.spatial-statistics.com

February 15, 2003

¹ I have many individuals and organizations to thank for supporting the toolbox. First, I would like to gratefully acknowledge the research support received from the National Science Foundation (BCS-0136193 and BCS-0136229). Note, any opinions, findings, and conclusions or recommendations expressed in this material are mine and do not necessarily reflect the views of the National Science Foundation (NSF). Second, I would like to thank James LeSage (www.spatial-econometrics.com) for his remarks and advice and Ron Barry at the University of Alaska for help with the previous version. Third, I would like to gratefully acknowledge the research support received from Louisiana State University. Finally, I would like to thank Ming-Long Lee, Darren Hayunga, and Baris Kazar for their help.


Contents

1 Why the Toolbox Exists

2 Using the Toolbox
  2.1 Hardware and Software Requirements
  2.2 Installation
  2.3 Help and documentation
  2.4 Known Limitations
  2.5 Tips on Using the Toolbox
  2.6 Included Examples
  2.7 Included Datasets
  2.8 Included Manuscripts

3 A Brief Selected Tour of the Toolbox

4 References

List of Tables

3.1 SAR Estimation Results using Chebyshev lndet approximation and likelihood dominance inference
3.2 SAR Estimation results using exact lndet
3.3 Estimates on Election Data
3.4 Signed Root Deviances using Election Data
3.5 Timing for operations to Election data (n=3107)
3.6 Distribution of local estimates
3.7 Times for Different Methods for 57647 Observations
3.8 Likelihoods across ρ for Doubly Stochastic Scaling
3.9 Times for Optimizing the Likelihood over ρ for Both Scalings
3.10 Likelihoods Across Doubly Stochastic and Regular Scalings
3.11 OLS versus MESS Results Using Optimal ρ for Doubly Stochastic Scaling

List of Figures

3.1 Connections among Counties via Delaunay
3.2 Connections among Counties via Eight Nearest Neighbors
3.3 Plot of Non-zeros of Delaunay Weight Matrix
3.4 Plot of Non-zeros of Permuted Delaunay Weight Matrix
3.5 Plot of Exact log-determinant with Chebyshev Approximation and Taylor Bounds
3.6 Plot of Exact log-determinant with Monte Carlo Approximation and Limits
3.7 SAR Profile Likelihoods by Model
3.8 Plot of Fringe prediction error versus subsample size
3.9 Plot of prediction error on center of area versus subsample size
3.10 Spatial dependence parameter estimate versus subsample size
3.11 Map of influence of homeownership on voting

Chapter 1

Why the Toolbox Exists

Individual data arise at a time and a place. Randomization can destroy, and aggregation can obscure, spatial and temporal information, but the original data points potentially exhibit spatiotemporal dependence. Often, simple models fitted to these data will produce spatially, temporally, or spatiotemporally correlated errors, provided reality is more complex than the model. Ignoring the spatial, temporal, or spatiotemporal dependence among errors results in inefficient parameter estimation and biased inference, and it discards information that can greatly improve prediction accuracy. In addition, if the data generating process is an autoregressive one, ignoring the dependence will lead to biased estimates and inference.

Historically, spatial statistics software floundered with problems involving even thousands of observations. For example, Li (1995) required 8515 seconds to compute a 2500 observation spatial autoregression using an IBM RS6000 Model 550 workstation. The culprit for the difficulty lies in the maximum likelihood estimator's need for the determinant of the n by n matrix of the covariances among the spatially scattered observations.

The conventional computational approach relies upon eigenvalues (Ord (1975)). Even with faster computers, the calculation of eigenvalues requires substantial time and memory. For example, on a 1700 Athlon it requires 24.75 minutes to compute the eigenvalues of the spatial weight matrix based on 30 nearest neighbors. The resulting matrix takes 77 megabytes of storage as well.

However, many problems of practical importance generate large spatial data sets. Obvious examples include census data (over 200,000 block groups for the US) and housing sales (many millions sold per year). The Spatial Statistics Toolbox addresses the need to quickly estimate large problems. In contrast to the eigenvalue approach, by focusing upon direct computation of determinants and using sparsity, the same operation takes 3.38 seconds in the toolbox. Using a Chebyshev approximation reduces the time to under 0.2 seconds. As the eigenvalue computations rise at the cube of n while the log-determinant functions in the toolbox rise with n or n ln(n), large problems further increase the performance of the toolbox relative to conventional approaches.
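To make the contrast concrete, here is a minimal sketch (not the toolbox's own code) that computes ln|I − αD| two ways: from all eigenvalues of a dense matrix, and from a sparse LU factorization. The matrix D, its size n, and the value of α are made-up illustrative assumptions; only base Matlab functions are used.

```matlab
% Illustrative comparison only; D, n, and alpha are made-up values.
rng(0);
n = 500;
alpha = 0.7;

% Random symmetric 0/1 neighbor pattern with an empty diagonal.
W = spones(sprandsym(n, 0.01));
W = W - spdiags(full(diag(W)), 0, n, n);

% Symmetric scaling keeps the eigenvalues of D inside [-1, 1].
deg = full(sum(W, 2));
deg(deg == 0) = 1;                       % guard against isolated points
S = spdiags(1 ./ sqrt(deg), 0, n, n);
D = S * W * S;

% (a) Eigenvalue route: needs the dense n-by-n matrix.
lambda = eig(full(D));
lndet_eig = sum(log(1 - alpha * lambda));

% (b) Sparse LU route: ln|det| is the sum of log|u_ii|.
[~, U] = lu(speye(n) - alpha * D);
lndet_lu = sum(log(abs(full(diag(U)))));

fprintf('eigenvalues: %.6f   sparse LU: %.6f\n', lndet_eig, lndet_lu);
```

Both routes should agree to rounding error; the sparse route never forms the dense n by n matrix, which is where the memory savings come from.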

The speed of Spatial Statistics Toolbox 2.0 permits users to explore alternative specifications, spatial and aspatial, in a timely fashion. For example, one can estimate local spatial autoregressions (spatial autoregressive local estimation or SALE) as in Pace and LeSage (forthcoming). In the SALE example provided, the toolbox was able to estimate a sequence of over 400 spatial autoregressions around each of 3107 points in under four minutes.

The Spatial Statistics Toolbox 2.0 conserves on memory as well. It is not difficult to estimate a spatial autoregression with over one million observations. In fact, the toolbox provides an example under the dataset directory whereby a one million observation spatial autoregression is estimated in just under 20 seconds. It took 130.63 seconds to find the weight matrix, 60.24 seconds to simulate the dependent variable, and 19.42 seconds to estimate the autoregression.

Relative to the previous incarnation, the Spatial Statistics Toolbox 2.0 introduces multidimensional and spatiotemporal weight matrices, new scalings of the weight matrices (doubly stochastic), faster computation of exact log-determinants (actually interpolated from exact computations of a smooth function), a couple of approximations to log-determinants, and some new computationally and theoretically interesting models such as the aforementioned SALE and the matrix exponential spatial specification (MESS).

Chapter 2

Using the Toolbox

2.1 Hardware and Software Requirements

The toolbox has been developed under Matlab 6.5 and tested under Matlab 6.5 and 6.1 across W2K and Windows ME. However, some routines run faster under 6.5 than under 6.1. The total installation takes around 15 megabytes. The routines have been tested on PC compatibles. The routines should run on other platforms, but have not been tested on non-PC compatibles.

2.2 Installation

For users who can extract files from zip archives, follow the instructions for your product (e.g., Winzip) and extract the files into the drive in which you wish to install the toolbox. The installation program will create the following directory structure in whichever drive you choose:

drive_letter:\space

+---articles

+---datasets

+---big_one

+---election


+---housing

+---space_time

+---documentation

+---examples

+---CAR

+---CAR_SIM

+---CHEBYSHEV

+---CHEBYSHEV_SEQUENCE

+---CLOSEST_NEIGHBOR

+---DELAUNAY2

+---DOUBLY

+---LNDET_INTERP

+---LNDET_INTERP_SEQUENCE

+---LNDET_MONTECARLO

+---MESS_AR

+---MESS_CAR

+---MESS_SIM

+---MIXED

+---MULTIVARIATE

+---NEAREST_NEIGHBORS

+---OLS

+---SALE

+---SAR

+---SAR_SIM

+---SPACE_TIME

+---fmex_code

+---functions

+---fmex_code

To see whether the installation has succeeded, change the directory in Matlab to one of the supplied examples and type run x_*.m. For example, go to the examples\car subdirectory and type run x_car2_ga1. The use of * in this context serves as a wildcard or placeholder for the intervening characters. This should cause the example script x_car2_ga1.m to run.

The multidimensional weight matrix routines require the installation of TSTOOL. Go to www.physik3.gwdg.de/tstool to find this useful package and follow the instructions to install it in Matlab. If you have Matlab 6.5, you can easily add the relevant paths to the mex functions by going to the File menu, selecting Set Path, selecting Add Folder under the applet, and adding the paths so Matlab can find the functions. On my machine I added:

opentstool\tstoolbox\mex\dll

opentstool\tstoolbox\mex

opentstool\tstoolbox\utils

tstool\opentstool\tstoolbox\gui

tstool\opentstool\tstoolbox

2.3 Help and documentation

All the example scripts should follow the form x_*.m (e.g., x_car2_ga1.m, x_sar2_ga1.m). Functions follow the form f*.m (e.g., fsar2.m, fols2.m). Matlab matrix files (which may include multiple matrices) have the form *.mat.

If you wish to access the functions in other directories, you can define a path to *:\space and *:\space\functions using the Set Path command under the File menu in Matlab 6.5, where * refers to the drive where the Spatial Statistics Toolbox resides and provided space is the name of the directory you chose. This is the same procedure outlined in installing TSTOOL. On my machine I added:

space

space\functions

As in previous versions of Matlab, users can employ the addpath command to add these directories to the search path.
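The addpath route can be sketched as follows. The install location C:\space is a hypothetical assumption; substitute whatever drive and directory you actually chose.

```matlab
% Hypothetical install location; replace with the directory you chose.
toolbox_root = 'C:\space';

if exist(toolbox_root, 'dir')
    addpath(toolbox_root);                          % top-level directory
    addpath(fullfile(toolbox_root, 'functions'));   % toolbox functions
else
    disp(['Directory not found: ' toolbox_root]);
end
```

Placing these lines in a startup.m file is one way to make the change persist across sessions.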

Both the functions and examples are well-documented internally. One can use the help functions of Matlab in the usual fashion when the toolbox is on the search path or the functions are in the current directory. If the toolbox is on the search path, a user can type help space and see the functions contained in the Spatial Statistics Toolbox 2 (provided space is the name of the directory you chose). If the toolbox is on the search path or the function lies in the current directory, a user can obtain help on the function by typing help function_name or doc function_name.

The examples provide the best means of understanding the toolbox functions. Each subdirectory under the examples subdirectory contains all the files needed for that particular example (except the multivariate subdirectory, which requires installation of the TSTOOL package). Users can substitute their data for the example data and see how the functions perform for their problem. Note, a few of these functions take several minutes to run (notably the SALE examples). Most, however, are quite fast.

2.4 Known Limitations

None. All are unknown.

2.5 Tips on Using the Toolbox

Typical sessions with the toolbox proceed in four steps. First, import the data into Matlab. If the file is fixed-format or tab-delimited ASCII, the command load name.extension (whatever that name.extension may be) will load the file contents into memory as a Matlab variable called name. Saving this will convert it into a Matlab file (e.g., save name name will save variable name into matrix name stored in name.mat). Failure to specify both names will result in saving all defined variables into one file. The data would include the dependent variable, the independent variables, and the locational coordinates. For example, suppose the user has the dependent variable in the text file y.txt. Issuing the command load y.txt results in a Matlab variable y. Issuing the command save y y results in saving y.mat in the current directory.
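The import step can be tried end to end with a small sketch. The three data values are made up, and the file y.txt is written here only so the example is self-contained.

```matlab
% Write a tiny illustrative ASCII file (made-up values).
y_values = [1.5; 2.0; 3.25];
fid = fopen('y.txt', 'w');
fprintf(fid, '%g\n', y_values);
fclose(fid);

load y.txt      % ASCII file y.txt becomes the Matlab variable y
save y y        % writes only the variable y into y.mat

clear y
load y          % reloads y from y.mat
disp(size(y))
```

Naming the variable explicitly in the save command is what keeps other workspace variables out of y.mat.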

Second, create a spatial weight matrix. Users can choose weight matrices based upon nearest neighbors (symmetric or asymmetric), multidimensional symmetric neighbors, spatiotemporal neighbors (asymmetric), and Delaunay triangles (symmetric). In almost all cases, one must make sure each location is unique. One may need to add slight amounts of random noise to the locational coordinates to meet this restriction (some of the latest versions of Matlab do this automatically; do not dither the coordinates in this case). Note, some estimators only use symmetric matrices. You can specify the number of neighbors used and their relative weightings.
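As a rough illustration of this step, the sketch below dithers duplicated coordinates and builds a symmetric contiguity matrix from Delaunay triangles using only base Matlab. It is not the toolbox's weight matrix routine; the coordinates, the dither size of 1e-6, and the row scaling are assumptions made for the example.

```matlab
% Illustrative only: made-up coordinates with one forced duplicate.
rng(1);
n = 200;
xy = rand(n, 2);
xy(2, :) = xy(1, :);

% Dither only the repeated locations so every point is unique.
[~, keep_idx] = unique(xy, 'rows');
dup = true(n, 1);
dup(keep_idx) = false;
xy(dup, :) = xy(dup, :) + 1e-6 * randn(nnz(dup), 2);

% Each row of tri lists the three vertices of one Delaunay triangle.
tri = delaunay(xy(:, 1), xy(:, 2));
edges = [tri(:, [1 2]); tri(:, [2 3]); tri(:, [1 3])];

% Symmetric 0/1 contiguity matrix stored sparsely.
W = sparse(edges(:, 1), edges(:, 2), 1, n, n);
W = spones(W + W');

% One common scaling (assumed here): make each row sum to one.
D = spdiags(1 ./ full(sum(W, 2)), 0, n, n) * W;
fprintf('average number of neighbors: %.2f\n', full(mean(sum(W, 2))));
```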

Note, the Delaunay spatial weight matrix leads to a concentration matrix or a variance-covariance matrix that depends upon only one parameter (α, the autoregressive parameter). In contrast, the nearest neighbor concentration matrices or variance-covariance matrices depend upon three parameters (α, the autoregressive parameter; m, the number of neighbors; and ρ, which governs the rate at which weights decline with the order of the neighbors, with the closest neighbor given the highest weighting, the second closest given a lower weighting, and so forth). Three parameters should make this specification sufficiently flexible for many purposes.
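The role of m and ρ can be illustrated with a small sketch. It assumes geometrically declining weights, ρ^(k−1) for the k-th nearest neighbor, normalized to sum to one in each row; the toolbox's exact weighting and scaling may differ. The coordinates and the brute-force neighbor search are for illustration only.

```matlab
% Illustrative only: made-up coordinates, brute-force neighbor search.
rng(2);
n = 50;
m = 4;                          % number of neighbors
rho = 0.5;                      % rate of decline across neighbor order
xy = rand(n, 2);

w = rho .^ (0:m-1);             % 1, rho, rho^2, ... for neighbors 1..m
w = w / sum(w);                 % assumed normalization: weights sum to one

rows = zeros(n * m, 1);
cols = zeros(n * m, 1);
vals = zeros(n * m, 1);
for i = 1:n
    d2 = sum((xy - repmat(xy(i, :), n, 1)).^2, 2);
    d2(i) = Inf;                % a point is not its own neighbor
    [~, order] = sort(d2);
    idx = (i - 1) * m + (1:m);
    rows(idx) = i;
    cols(idx) = order(1:m);
    vals(idx) = w;
end
D = sparse(rows, cols, vals, n, n);   % asymmetric nearest neighbor matrix
```

With ρ = 1 all m neighbors receive equal weight; smaller values concentrate the weight on the closest neighbors without changing m.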

Third, one computes the log-determinants for a grid of autoregressive parameters (prespecified by the routine as a default or specified by the user as an option). Determinant computations proceed faster for symmetric matrices. You must choose the appropriate log-determinant routines for the type of spatial weight matrix you have specified. Computing the log-determinants is slower than estimation, but only needs to be done when changing the spatial weight matrix. For example, one can use the same weight matrix and log-determinant files when exploring transformations or specifications of the dependent and independent variables (for the same observations).

Fourth, pick a statistical routine to run given the data matrices, the spatial weight matrix, and the log-determinant vector. One can choose among conditional autoregressions (CAR), simultaneous autoregressions (SAR), matrix exponential spatial specifications (MESS), mixed regressive spatially autoregressive estimators (which include pure autoregressive models and spatially lagged independent variable models as special cases), and OLS. In addition, one can explore multivariate spatiotemporal and multivariate estimation. These routines require little time to run. One can change models, weightings, and transformations and reestimate in the vast majority of cases without rerunning the spatial weight matrix or log-determinant routines (you may need to add another simple Jacobian term when performing weighting or transformations of the dependent variables). This aids interactive data exploration.
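Steps three and four can be sketched together on simulated data. This is a generic illustration for the model y = αDy + Xβ + ε, not the toolbox's routines: the chain-shaped weight matrix, the simulated data, and the grid for α are all assumptions. The log-determinants are computed once on the grid and then reused to evaluate the profile log-likelihood ln|I − αD| − (n/2) ln SSE(α).

```matlab
% Illustrative only: simulated data on a simple chain of neighbors.
rng(3);
n = 300;
W = spdiags(ones(n, 2), [-1 1], n, n);            % left/right neighbors
D = spdiags(1 ./ full(sum(W, 2)), 0, n, n) * W;   % rows sum to one

X = [ones(n, 1) randn(n, 1)];
alpha_true = 0.6;
beta_true = [1; 2];
y = (speye(n) - alpha_true * D) \ (X * beta_true + 0.5 * randn(n, 1));

% Step three: log-determinants over a grid (done once per weight matrix).
alphas = (0:0.01:0.99)';
lndet = zeros(size(alphas));
for k = 1:numel(alphas)
    [~, U] = lu(speye(n) - alphas(k) * D);
    lndet(k) = sum(log(abs(full(diag(U)))));
end

% Step four: profile log-likelihood from two sets of OLS residuals.
e0 = y - X * (X \ y);            % residuals of y on X
Dy = D * y;
ed = Dy - X * (X \ Dy);          % residuals of D*y on X
sse = (e0' * e0) - 2 * alphas * (e0' * ed) + alphas.^2 * (ed' * ed);
proflik = lndet - (n / 2) * log(sse);

[~, kmax] = max(proflik);
alpha_hat = alphas(kmax);
fprintf('alpha_hat = %.2f (simulated with alpha = %.2f)\n', alpha_hat, alpha_true);
```

Changing X or transforming y only requires redoing the cheap residual computations; the lndet vector stays valid as long as D is unchanged.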

Finally, these procedures provide a wealth of information. Many of these routines yield the profile likelihood in the autoregressive parameter for each submodel (corresponding to the deletion of individual variables or the spatial term). All of the inference, even for the OLS routine, uses likelihood ratio statistics in the form of signed root deviances. This is just the square root of twice the difference in log-likelihoods, given the sign of the parameter estimate. It has a t-like interpretation (Chen and Jennrich (1996)). The use of signed root deviances (SRDs) facilitates comparisons among different models.
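In symbols, the signed root deviance for variable j can be written as below. The notation is my own shorthand rather than the toolbox's: L(θ̂) denotes the maximized log-likelihood of the full model and L(θ̂₍ⱼ₎) that of the submodel with variable j deleted. As a check on the arithmetic, the second line uses the doubly stochastic log-likelihoods at ρ = 0.9 and ρ = 1 reported in Table 3.8, which reproduces the SRD for ρ reported in Table 3.11.

```latex
\[
\mathrm{SRD}_j \;=\; \operatorname{sign}\bigl(\hat{\beta}_j\bigr)\,
\sqrt{\,2\,\bigl[\,L(\hat{\theta}) - L(\hat{\theta}_{(j)})\,\bigr]}
\]
\[
\sqrt{\,2\,\bigl[-230828.6413 - (-233434.5371)\bigr]}
\;=\; \sqrt{5211.7916} \;\approx\; 72.1927
\]
```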

2.6 Included Examples

The Spatial Statistics Toolbox comes with many examples. These are found in the subdirectories under spatial_toolbox_2\examples. To run the examples, change the directory in Matlab into one of the many subdirectories that illustrate individual routines. Look at the documentation in each example directory for more detail. Almost all of the specific models have examples. In addition, the simulation routine examples serve as minor Monte Carlo studies, which also help verify the functioning of the estimators. The examples use the 3107 observation dataset from the Pace and Barry (1997) Geographical Analysis article.

2.7 Included Datasets

The spatial_toolbox_2\datasets subdirectory contains subdirectories with individual data sets in Matlab file formats as well as their documentation. The data sets include example programs and output. Note, due to the many improvements incorporated into the Spatial Statistics Toolbox over time, the running times have greatly improved over those in the articles.

Hopefully, these data sets should provide a good starting point for exploring applications of spatial statistics.

2.8 Included Manuscripts

In the manuscript subdirectory, we provide pdf versions of the Geographical Analysis 1997 and 2000 articles. I would like to thank the publishers (Ohio State Press and Elsevier) for having given us copyright permission to distribute these works. One can also go to www.spatial-statistics.com to access some other articles (e.g., the Linear Algebra and its Applications article, which proposed the Monte Carlo log-determinant estimator).

Chapter 3

A Brief Selected Tour of the Toolbox

The weight matrix specifies the dependence among observations. One form of weight matrix (Delaunay) uses the notion of contiguity to specify dependence, as depicted in Figure 3.1.

Figure 3.1: Connections among Counties via Delaunay (Delaunay nearest neighbor graph; longitude versus latitude in decimal degrees)

Note the somewhat strange behavior of connections to outlying observations in Figure 3.1. This arises from the geometric nature of contiguity. Using nearest neighbors based upon some metric can avoid this, as shown in Figure 3.2.

Figure 3.2: Connections among Counties via Eight Nearest Neighbors (8 nearest neighbor graph; longitude versus latitude in decimal degrees)

Using only nearby observations implies that the weight matrix has mainly zeros, or is sparse, as shown in Figure 3.3.

Figure 3.3: Plot of Non-zeros of Delaunay Weight Matrix (original non-zero weight pattern for Delaunay; observation number versus observation number)

This becomes even more apparent when reordering the observations, as in Figure 3.4.

Figure 3.4: Plot of Non-zeros of Permuted Delaunay Weight Matrix (permuted non-zero weight pattern for Delaunay; observation number versus observation number)

Sparsity, as well as finding an appropriate ordering, are key in quickly computing the log-determinants used in maximum likelihood. The toolbox has functions for exact computation of the log-determinants (actually interpolation of exact computations at various points). However, users can select approximations as well, which depend only on sparsity and not upon orderings. The quadratic Chebyshev is the fastest and most approximate technique (Figure 3.5).

Figure 3.5: Plot of Exact log-determinant with Chebyshev Approximation and Taylor Bounds (log-determinants versus α; series: Taylor lower bound, Chebyshev, Taylor upper bound, exact)

The Chebyshev approximation appears quite good for positive, moderate values of the dependence parameter, but could use improvement for materially negative values of the spatial dependence parameter. Fortunately, such negative values seem rare in practice.
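The idea behind a quadratic Chebyshev approximation can be sketched as follows. This is a generic Chebyshev interpolation of ln(1 − αx) on [−1, 1] applied to a symmetric D with eigenvalues in that interval; it needs only tr(D) and tr(D²), and it may differ in detail from the toolbox's routine. The matrix D, n, and α are made-up assumptions.

```matlab
% Illustrative only: quadratic Chebyshev approximation to ln|I - alpha*D|.
rng(4);
n = 400;
alpha = 0.5;

% Symmetric D with zero diagonal and eigenvalues in [-1, 1].
W = spones(sprandsym(n, 0.02));
W = W - spdiags(full(diag(W)), 0, n, n);
deg = full(sum(W, 2));
deg(deg == 0) = 1;
S = spdiags(1 ./ sqrt(deg), 0, n, n);
D = S * W * S;

% Chebyshev coefficients of f(x) = ln(1 - alpha*x) from q+1 nodes.
q = 2;
k = (1:q+1)';
x = cos(pi * (k - 0.5) / (q + 1));
f = log(1 - alpha * x);
c = zeros(q + 1, 1);
for j = 0:q
    c(j + 1) = (2 / (q + 1)) * sum(f .* cos(pi * j * (k - 0.5) / (q + 1)));
end

% Traces of T0(D) = I, T1(D) = D, T2(D) = 2*D^2 - I.
trD2 = full(sum(sum(D.^2)));             % equals tr(D^2) for symmetric D
trT = [n; 0; 2 * trD2 - n];
lndet_cheb = c' * trT - n * c(1) / 2;

% Exact value for comparison.
[~, U] = lu(speye(n) - alpha * D);
lndet_exact = sum(log(abs(full(diag(U)))));
fprintf('Chebyshev: %.4f   exact: %.4f\n', lndet_cheb, lndet_exact);
```

Because only two traces are required, the cost is governed by the number of non-zeros in D rather than by its ordering.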

The Monte Carlo log-determinant estimator is also quite fast, but more accurate (Figure 3.6).

Figure 3.6: Plot of Exact log-determinant with Monte Carlo Approximation and Limits (log-determinants versus α; series: lower confidence limit, Monte Carlo, upper confidence limit, exact)

To see the effects of exact versus approximate log-determinant computations, consider Tables 3.1 and 3.2 using the 3107 county election data. The estimated autoregressive parameter is only off by 0.01 from using the approximation. The approximate method also uses likelihood dominance inference, which results in a lower bound to the signed root deviances. As shown by the tables, the likelihood dominance SRDs are smaller in magnitude than the exact SRDs. However, they can still document statistical significance for many variables and thus can prove useful in many circumstances.

Variables         Beta Estimates   Signed Root Deviances   PR of Higher SRDs
Voting Pop               -0.7784                -31.1227              0.0000
Education                 0.2696                  9.3438              0.0000
Home Ownership            0.4530                 26.2661              0.0000
Income                    0.0071                  0.0035              0.9972
Intercept                 0.5443                  7.8773              0.0000
Alpha                     0.7250                 32.7719              0.0000

Table 3.1: SAR Estimation Results using Chebyshev lndet approximation and likelihood dominance inference

Variables         Beta Estimates   Signed Root Deviances   PR of Higher SRDs
Voting Pop               -0.7806                -32.1748              0.0000
Education                 0.2746                 11.9438              0.0000
Home Ownership            0.4525                 27.0073              0.0000
Income                    0.0047                  0.2187              0.8269
Intercept                 0.5528                  9.4908              0.0000
Alpha                     0.7150                 34.8648              0.0000

Table 3.2: SAR Estimation results using exact lndet

Some of the routines not only yield the maximum of the likelihood function, but also profile likelihoods in the dependence parameter α by model, as shown in Figure 3.7.

Figure 3.7: SAR Profile Likelihoods by Model (profile log-likelihood versus the dependence parameter α for the global model and delete-1 submodels; series: global likelihood, Voting Pop, Education, Home Ownership, Income, Intercept)

The toolbox includes SAR, CAR, and MESS error models as well as MESS, closest neighbor, and MIX autoregressive models, as shown in Table 3.3 and Table 3.4.

Variables               b OLS   b closest    b MESS     b Mix
Voting Pop            -0.8464     -0.7489   -0.7693   -0.7298
Education              0.5167      0.2899    0.1941    0.1818
Home Ownership         0.4291      0.4457    0.4832    0.4580
Income                -0.1439     -0.0332    0.0423    0.0427
Lag Voting Pop         0.0000      0.1878    0.4186    0.4616
Lag Education          0.0000      0.0975    0.1205    0.0450
Lag Home Ownership     0.0000     -0.1569   -0.3249   -0.3299
Lag Income             0.0000     -0.1079   -0.1802   -0.1410
Intercept              0.9814      0.7495    0.5636    0.4205
α                      0.0000      0.3352    1.4628    0.6550

Table 3.3: Estimates on Election Data

The closest neighbor approach is intermediate to a non-spatial approach (OLS) and a full spatial approach (MESS) or the approximate mixed routine. Note the close agreement between MESS and the mixed routine. Note, OLS in this case uses the spatial averages of the basic independent variables as additional independent variables.

None of these operations take long for the election data.

Variables                b OLS   b closest     b MESS      b Mix
Voting Pop            -34.7211    -30.5020   -28.8899   -29.4018
Education              30.8928     13.4922     7.6654     7.7169
Home Ownership         23.0066     25.4083    26.8836    27.3515
Income                 -7.2373     -1.5742     1.7478     1.8952
Lag Voting Pop          0.0000      7.7600    10.7656    12.5507
Lag Education           0.0000      4.5090     3.9785     1.5705
Lag Home Ownership      0.0000     -9.4245   -11.1259   -12.1200
Lag Income              0.0000     -5.1399    -5.5551    -4.6603
Intercept              20.2782     16.0533    10.7480     8.4626
α                       0.0000     24.2762    31.5948    33.7097

Table 3.4: Signed Root Deviances using Election Data

Operation                    Timings in seconds
OLS                                      0.0150
Closest AR                               0.0470
MESS                                     0.0940
Exact Log-det                            0.8750
Mix                                      0.0470
Doubly Stochastic Scaling                0.2660

Table 3.5: Timing for operations to Election data (n=3107)

In addition to global models, the toolbox has spatial autoregressive local estimation (SALE). The user chooses the bandwidth (subsample size) by examining cross-validation error at the fringe observations, as in Figure 3.8, or at each observation, as in Figure 3.9.

Figure 3.8: Plot of Fringe prediction error versus subsample size (smoothed SALE recursive residuals of fringe observations; absolute error versus number of local observations)

Figure 3.9: Plot of prediction error on center of area versus subsample size (smoothed SALE initial holdout residuals; absolute error versus number of local observations)

Usually there is spatial dependence even in small subsamples, as shown in Figure 3.10.

Figure 3.10: Spatial dependence parameter estimate versus subsample size (SALE autoregressive parameter estimates; median autoregressive parameter estimates versus number of local observations)

Local estimation leads to spatially varying parameter estimates, such as those shown in Figure 3.11.

Figure 3.11: Map of influence of homeownership on voting (β3 plotted by longitude and latitude)

In addition, a user can obtain an idea of the sensitivity of parameter estimates to spatial variation, such as summarized in Table 3.6.

Percentiles                 0        10        25        50        75        90       100
α                      0.3100    0.4300    0.5500    0.6300    0.6700    0.7100    0.7800
Voting Pop            -1.0079   -0.8875   -0.8075   -0.6962   -0.5967   -0.5299   -0.4066
Education             -0.1871    0.0517    0.1142    0.1788    0.3135    0.4923    0.7118
Home Ownership         0.1530    0.2964    0.3419    0.4253    0.5161    0.7660    0.8461
Income                -0.2576   -0.1644   -0.1072   -0.0473    0.0628    0.1448    0.2325
Lag Voting Pop        -0.1525    0.0655    0.2868    0.4527    0.5490    0.6523    0.8806
Lag Education         -0.5734   -0.3666   -0.1812   -0.0299    0.0677    0.1322    0.3318
Lag Home Ownership    -0.7010   -0.4499   -0.3278   -0.2337   -0.1100   -0.0382    0.1880
Lag Income            -0.4764   -0.2374   -0.1733   -0.0897    0.0216    0.0862    0.2786
Intercept             -0.1605    0.0473    0.1463    0.2847    0.6612    1.0772    1.8902

Table 3.6: Distribution of local estimates

Method                       Time in seconds
OLS                                   0.1560
Closest AR                            0.1090
MESS                                  0.6090
Approximate Mix                       0.3280
Doubly Stochastic Scaling             7.1880
Delaunay Weight Matrix                5.4220

Table 3.7: Times for Different Methods for 57647 Observations

To provide an idea about the performance of the techniques for a larger problem, we estimated a simple hedonic regression over US census tracts. This resulted in 57647 observations. Table 3.7 shows the timings for some of the various operations. All of these seem quite fast. A user can find a Delaunay weight matrix and estimate a spatial autoregression in under 10 seconds on desktop machines.

Just selecting a particular weight matrix seems arbitrary. Here we take 30 nearest neighbors and weight these geometrically. A ρ of 1 indicates no decline in the weight given to further neighbors relative to closer ones, while a ρ of 0.5 would give half the weight to the second nearest neighbor as it would to the first nearest neighbor. Thus, ρ allows changes in the effective number of neighbors used without actually varying the number of neighbors. It often makes sense in this approach to set the number of neighbors to a fairly high level (such as 30). Table 3.8 shows the effect of varying ρ on the profile log-likelihood. Small changes in ρ make large changes in the profile log-likelihood, evidence of the importance of this parameter.

It did not take overly long to find the nearest neighbors or the optimal ρ, even with doubly stochastic scalings of the weight matrix, as shown by Table 3.9.

ρ          log-likelihood
0.8000       -231895.0426
0.8500       -230968.8487
0.9000       -230828.6413
0.9500       -231699.1089
1.0000       -233434.5371

Table 3.8: Likelihoods across ρ for Doubly Stochastic Scaling

Operation                       Time in seconds
NN computation                          31.4060
RS Time to find optimum ρ               56.7810
DS Time to find optimum ρ               94.8750

Table 3.9: Times for Optimizing the Likelihood over ρ for Both Scalings

These operations lead to a table of profile log-likelihoods (Table 3.10) across weight matrices. Examining the MESS log-likelihoods over ρ and Delaunay, and contrasting them with the log-likelihood from applying OLS to the basic non-spatial independent variables, demonstrates that even a suboptimal choice of ρ or Delaunay still dominates the use of an aspatial model in this case, and that optimizing over ρ dominates an arbitrary choice of weight matrices (Table 3.10). Moreover, the doubly stochastic (DS) scaling helped greatly for this example over the regular scaling (RS). In addition, inspection of aspatial OLS versus MESS with an optimal selection of ρ in Table 3.11 shows clear differences among the approaches. Note, the land area variable became insignificant after modeling space.

It is not difficult to estimate a spatial autoregression with over one million observations. In fact, the toolbox provides an example (big_one subdirectory) under the dataset directory whereby a one million observation spatial autoregression is estimated in just under 20 seconds. It took 130.63 seconds to find the weight matrix, 60.24 seconds to simulate the dependent variable, and 19.42 seconds to estimate the autoregression.

Variable                                     Value
Aspatial likelihood (OLS)             -266505.1663
Closest Neighbor                      -243986.1676
RS Delaunay maximum likelihood        -244509.8257
RS Maximum likelihood across ρ        -254216.3172
RS Optimum ρ                                0.9000
DS Delaunay maximum likelihood        -235626.6213
DS Maximum likelihood across ρ        -230828.6413
DS Optimum ρ                                0.9000

Table 3.10: Likelihoods Across Doubly Stochastic and Regular Scalings

Variables                   b OLS    SRD OLS    b MESS    SRD MESS
Land area                 -0.0850   -96.8455   -0.0008     -0.7159
Pop                        0.1146    36.8592    0.0239     12.7630
Per cap Income             1.0837   208.2192    0.6786    152.7645
Age                       -0.1269   -34.2809   -0.1384    -45.0489
Lag Land area                                  -0.0178    -14.5650
Lag Pop                                         0.0165      5.2437
Lag Per cap Income                             -0.3702    -65.2683
Lag Age                                         0.1088     27.5807
Intercept                  1.2236    22.9837   -0.5988    -15.2122
α                                               3.0986    253.2835
ρ (relative to ρ = 1)                           0.90       72.1927
Nearest Neighbors                              30
Parameters                 5                   11

Table 3.11: OLS versus MESS Results Using Optimal ρ for Doubly Stochastic Scaling

Chapter 4

References

If you need to know more about spatial statistics or about some of the specific routines, you may wish to examine:

Anselin, Luc. (1988) Spatial Econometrics: Methods and Models. Dordrecht: Kluwer Academic Publishers.

Barry, Ronald, and R. Kelley Pace, "A Monte Carlo Estimator of the Log Determinant of Large Sparse Matrices," Linear Algebra and its Applications, Volume 289, Number 1-3, 1999, p. 41-54.

Chen, Jian-Shen, and Robert Jennrich. (1996) "The Signed Root Deviance Profile and Confidence Intervals in Maximum Likelihood Analysis," Journal of the American Statistical Association, Volume 91, Number 435, p. 993-998.

Christensen, Ronald. (1991) Linear Models for Multivariate, Time Series, and Spatial Data. New York: Springer-Verlag.

Cressie, Noel A.C. (1993) Statistics for Spatial Data, Revised ed. New York: John Wiley.

Dubin, Robin A. (1988) "Estimation of Regression Coefficients in the Presence of Spatially Autocorrelated Error Terms," Review of Economics and Statistics, 70, 466-474.

Haining, Robert. (1990) Spatial Data Analysis in the Social and Environmental Sciences. Cambridge.

LeSage, James, and R. Kelley Pace, "Spatial Dependence in Data Mining," Data Mining for Scientific and Engineering Applications, Edited by Robert L. Grossman, Chandrika Kamath, Philip Kegelmeyer, Vipin Kumar, and Raju R. Namburu, Kluwer Academic Publishing, 2001.

LeSage, James, and R. Kelley Pace, "Spatial Probit and Tobit," Spatial Statistics and Spatial Econometrics, Edited by Art Getis, Palgrave, 2003.

Li, Bin. (1995) "Implementing Spatial Statistics on Parallel Computers," in: Arlinghaus, S., ed. Practical Handbook of Spatial Statistics (CRC Press, Boca Raton), pp. 107-148.

Ord, J.K. (1975) "Estimation Methods for Models of Spatial Interaction," Journal of the American Statistical Association, 70, 120-126.

Pace, R. Kelley, and Ronald Barry. (1997) "Fast CARs," Journal of Statistical Computation and Simulation, 59, p. 123-147.

Pace, R. Kelley, and Ronald Barry. (1997) "Quick Computation of Regressions with a Spatially Autoregressive Dependent Variable," Geographical Analysis, 29, 232-247.

Pace, R. Kelley, and Dongya Zou, "Closed-Form Maximum Likelihood Estimates of Nearest Neighbor Spatial Dependence," Geographical Analysis, Volume 32, Number 2, April 2000, p. 154-172.

Pace, R. Kelley, Ronald Barry, O.W. Gilley, and C.F. Sirmans, "A Method for Spatial-temporal Forecasting with an Application to Real Estate Prices," International Journal of Forecasting, Volume 16, Number 2, April-June 2000, p. 229-246.

Pace, R. Kelley, and James P. LeSage, "Semiparametric Maximum Likelihood Estimates of Spatial Dependence," Geographical Analysis, Volume 34, Number 1, January 2002, p. 75-90.

Pace, R. Kelley, and James LeSage, "Likelihood Dominance Spatial Inference," forthcoming, Geographical Analysis, in January 2003.

Pace, R. Kelley, and James LeSage, "Spatial Autoregressive Local Estimation," Spatial Statistics and Spatial Econometrics, Edited by Art Getis, Palgrave, 2003.

Pace, R. Kelley, and James LeSage, "Chebyshev Approximation of Log-determinants of Spatial Weight Matrices," forthcoming in Computational Statistics and Data Analysis.

Ripley, Brian D. (1981) Spatial Statistics. New York: John Wiley.

wwwspatial-statisticscom

  • Why the Toolbox Exists
  • Using the Toolbox
    • Hardware and Software Requirements
    • Installation
    • Help and documentation
    • Known Limitations
    • Tips on Using the Toolbox
    • Included Examples
    • Included Datasets
    • Included Manuscripts
      • A Brief Selected Tour of the Toolbox
      • References
Page 2: Spatial Statistics Toolbox for Matlab 2.0 - Documentation Only (pdf)

Contents

1 Why the Toolbox Exists

2 Using the Toolbox
  2.1 Hardware and Software Requirements
  2.2 Installation
  2.3 Help and documentation
  2.4 Known Limitations
  2.5 Tips on Using the Toolbox
  2.6 Included Examples
  2.7 Included Datasets
  2.8 Included Manuscripts

3 A Brief Selected Tour of the Toolbox

4 References

List of Tables

3.1 SAR Estimation Results using Chebyshev lndet approximation and likelihood dominance inference
3.2 SAR Estimation results using exact lndet
3.3 Estimates on Election Data
3.4 Signed Root Deviances using Election Data
3.5 Timing for operations to Election data (n=3107)
3.6 Distribution of local estimates
3.7 Times for Different Methods for 57647 Observations
3.8 Likelihoods across ρ for Doubly Stochastic Scaling
3.9 Times for Optimizing the Likelihood over ρ for Both Scalings
3.10 Likelihoods Across Doubly Stochastic and Regular Scalings
3.11 OLS versus MESS Results Using Optimal ρ for Doubly Stochastic Scaling

List of Figures

3.1 Connections among Counties via Delaunay
3.2 Connections among Counties via Eight Nearest Neighbors
3.3 Plot of Non-zeros of Delaunay Weight Matrix
3.4 Plot of Non-zeros of Permuted Delaunay Weight Matrix
3.5 Plot of Exact log-determinant with Chebyshev Approximation and Taylor Bounds
3.6 Plot of Exact log-determinant with Monte Carlo Approximation and Limits
3.7 SAR Profile Likelihoods by Model
3.8 Plot of Fringe prediction error versus subsample size
3.9 Plot of prediction error on center of area versus subsample size
3.10 Spatial dependence parameter estimate versus subsample size
3.11 Map of influence of homeownership on voting

Chapter 1

Why the Toolbox Exists

Individual data arise at a time and a place. Randomization can destroy, and aggregation can obscure, spatial and temporal information, but the original data points potentially exhibit spatiotemporal dependence. Often, simple models fitted to these data will produce spatially, temporally, or spatiotemporally correlated errors, provided reality is more complex than the model. Ignoring the spatial, temporal, or spatiotemporal dependence among errors results in inefficient parameter estimation and biased inference, and ignores information that can greatly improve prediction accuracy. In addition, if the data generating process is an autoregressive one, ignoring the dependence will lead to biased estimates and inference.

Historically, spatial statistics software floundered with problems involving even thousands of observations. For example, Li (1995) required 8515 seconds to compute a 2500 observation spatial autoregression using an IBM RS/6000 Model 550 workstation. The culprit for the difficulty lies in the maximum likelihood estimator's need for the determinant of the n by n matrix of the covariances among the spatially scattered observations.
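To make the role of the determinant concrete, here is a sketch for one common specification, the mixed regressive spatially autoregressive model y = αDy + Xβ + ε with independent normal errors. The symbols D (the n by n spatial weight matrix), X, β, and M are notation assumed for this illustration rather than taken from the text above. Up to an additive constant, the log-likelihood concentrated in α is:

```latex
L(\alpha) \;=\; \ln\lvert I - \alpha D\rvert \;-\; \frac{n}{2}\,\ln\!\big(e(\alpha)'e(\alpha)\big),
\qquad
e(\alpha) = M\,(y - \alpha D y),
\qquad
M = I - X(X'X)^{-1}X' .
```

Only the first term involves an n by n determinant, and it has to be available for every value of α the optimizer visits, which is why its cost dominates for large n.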

The conventional computational approach relies upon eigenvalues (Ord (1975)). Even with faster computers, the calculation of eigenvalues requires substantial time and memory. For example, on a 1700 Athlon it requires 24.75 minutes to compute the eigenvalues of the spatial weight matrix based on 30 nearest neighbors. The resulting matrix takes 77 megabytes of storage as well.

However, many problems of practical importance generate large spatial data sets. Obvious examples include census data (over 200,000 block groups for the US) and housing sales (many millions sold per year). The Spatial Statistics Toolbox addresses the need to quickly estimate large problems. In contrast to the eigenvalue approach, by focusing upon direct computation of determinants and using sparsity, the same operation takes 3.38 seconds in the toolbox. Using a Chebyshev approximation reduces the time to under 0.2 seconds. As the eigenvalue computations rise at the cube of n while the log-determinant functions in the toolbox rise with n or n ln(n), large problems further increase the performance of the toolbox relative to conventional approaches.
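The contrast between the two routes can be sketched with base Matlab functions only. The ring-shaped weight matrix, the value of alpha, and all variable names below are assumptions made for illustration; this is not one of the toolbox's log-determinant routines.

```matlab
% Toy weight matrix (an assumption, not a toolbox dataset): n points on a
% ring, each with two neighbors weighted 0.5, so every row sums to one.
n = 200;
idx = (1:n)';
left = mod(idx - 2, n) + 1;     % index of the neighbor on the left
right = mod(idx, n) + 1;        % index of the neighbor on the right
W = sparse([idx; idx], [left; right], 0.5 * ones(2 * n, 1), n, n);

alpha = 0.7;                    % autoregressive parameter, chosen arbitrarily

% Eigenvalue route: ln|I - alpha*W| = sum_i ln(1 - alpha*lambda_i).
% This needs the dense matrix and work that grows with n^3.
lambda = eig(full(W));
lndet_eig = real(sum(log(1 - alpha * lambda)));

% Sparse route: factor I - alpha*W once and sum the logs of the pivots.
% P is a permutation and L has a unit diagonal, so |det| = prod(|diag(U)|).
[L, U, P] = lu(speye(n) - alpha * W);
lndet_lu = sum(log(abs(full(diag(U)))));

fprintf('eigenvalue route: %.6f   sparse LU route: %.6f\n', lndet_eig, lndet_lu);
```

Both numbers should agree to rounding error; only the second route keeps the matrix sparse throughout.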

The speed of Spatial Statistics Toolbox 2.0 permits users to explore alternative specifications, spatial and aspatial, in a timely fashion. For example, one can estimate local spatial autoregressions (spatial autoregressive local estimation, or SALE) as in Pace and LeSage (forthcoming). In the SALE example provided, the toolbox was able to estimate a sequence of over 400 spatial autoregressions around each of 3107 points in under four minutes.

The Spatial Statistics Toolbox 2.0 conserves on memory as well. It is not difficult to estimate a spatial autoregression with over one million observations. In fact, the toolbox provides an example under the dataset directory whereby a one million observation spatial autoregression is estimated in just under 20 seconds. It took 130.63 seconds to find the weight matrix, 60.24 seconds to simulate the dependent variable, and 19.42 seconds to estimate the autoregression.

Relative to the previous incarnation, the Spatial Statistics Toolbox 2.0 introduces multidimensional and spatiotemporal weight matrices, new scalings of the weight matrices (doubly stochastic), faster computation of exact log-determinants (actually interpolated from exact computations of a smooth function), a couple of approximations to log-determinants, and some new computationally and theoretically interesting models such as the aforementioned SALE and the matrix exponential spatial specification (MESS).

Chapter 2

Using the Toolbox

2.1 Hardware and Software Requirements

The toolbox has been developed under Matlab 6.5 and tested under 6.5 and 6.1 across W2K and Windows ME. However, some routines run faster under 6.5 than under 6.1. The total installation takes around 15 megabytes. The routines have been tested on PC compatibles. The routines should run on other platforms, but have not been tested on non-PC compatibles.

2.2 Installation

For users who can extract files from zip archives, follow the instructions for your product (e.g., Winzip) and extract the files into the drive in which you wish to install the toolbox. The installation program will create the following directory structure in whichever drive you choose:

drive_letter:\space
+---articles
+---datasets
|   +---big_one
|   +---election
|   +---housing
|   +---space_time
+---documentation
+---examples
|   +---CAR
|   +---CAR_SIM
|   +---CHEBYSHEV
|   +---CHEBYSHEV_SEQUENCE
|   +---CLOSEST_NEIGHBOR
|   +---DELAUNAY2
|   +---DOUBLY
|   +---LNDET_INTERP
|   +---LNDET_INTERP_SEQUENCE
|   +---LNDET_MONTECARLO
|   +---MESS_AR
|   +---MESS_CAR
|   +---MESS_SIM
|   +---MIXED
|   +---MULTIVARIATE
|   +---NEAREST_NEIGHBORS
|   +---OLS
|   +---SALE
|   +---SAR
|   +---SAR_SIM
|   +---SPACE_TIME
+---fmex_code
+---functions
    +---fmex_code

To see whether the installation has succeeded, change the directory in Matlab to one of the supplied examples and type run x_*.m. For example, go to the examples\car subdirectory and type run x_car2_ga1. The use of * in this context serves as a wildcard or placeholder for the intervening characters. This should cause the example script x_car2_ga1.m to run.

The multidimensional weight matrix routines require the installation of TSTOOL. Go to www.physik3.gwdg.de/tstool to find this useful package, and follow the instructions to install it in Matlab. If you have Matlab 6.5, you can easily add the relevant paths to the mex functions by going to the File menu, selecting Set Path under the applet, selecting Add Folder, and adding the paths so Matlab can find the functions. On my machine I added:

opentstool\tstoolbox\mex\dll
opentstool\tstoolbox\mex
opentstool\tstoolbox\utils
tstool\opentstool\tstoolbox\gui
tstool\opentstool\tstoolbox

2.3 Help and documentation

All the example scripts should follow the form x_*.m (e.g., x_car2_ga1.m, x_sar2_ga1.m). Functions follow the form f*.m (e.g., fsar2.m, fols2.m). Matlab matrix files (which may include multiple matrices) have the form *.mat.

If you wish to access the functions in other directories, you can define a path to *:\space and *:\space\functions using the set path command under the file menu in Matlab 6.5, where * refers to the drive where the spatial statistics toolbox resides, and provided space is the name of the directory you chose. This is the same procedure outlined in installing TSTOOL. On my machine I added:

space
space\functions

As in previous versions of Matlab, users can employ the addpath command to add these directories to the search path.
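For the addpath route, a minimal sketch follows. The location C:\space is hypothetical and should be replaced by wherever the toolbox was actually extracted; the variable names are likewise only illustrative.

```matlab
% Hypothetical install location; adjust to the drive and directory you chose.
toolboxRoot = 'C:\space';
functionsDir = fullfile(toolboxRoot, 'functions');

if exist(functionsDir, 'dir') == 7
    addpath(toolboxRoot);       % puts the toolbox directory on the search path
    addpath(functionsDir);      % makes the f*.m functions callable from anywhere
    pathAdded = true;
else
    warning('Folder %s not found; nothing was added to the path.', functionsDir);
    pathAdded = false;
end
```

Checking that the folder exists first avoids silently adding a mistyped directory to the search path.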

Both the functions and examples are well-documented internally. One can use the help functions of Matlab in the usual fashion when the toolbox is on the search path or the functions are in the current directory. If the toolbox is on the search path, a user can type help space and see the functions contained in the Spatial Statistics Toolbox 2 (provided space is the name of the directory you chose). If the toolbox is on the search path or the function lies in the current directory, a user can obtain help on the function by typing help function_name or doc function_name.

The examples provide the best means of understanding the toolbox functions. Each subdirectory under the examples subdirectory contains all the files needed for that particular example (except the multivariate subdirectory, which requires installation of the TSTOOL package). Users can substitute their data for the example data and see how the functions perform for their problem. Note, a few of these functions take several minutes to run (notably the SALE examples). Most, however, are quite fast.

2.4 Known Limitations

None. All are unknown.

2.5 Tips on Using the Toolbox

Typical sessions with the toolbox proceed in four steps. First, import the data into Matlab. If the file is fixed-format or tab-delimited ASCII, the command load name.extension (whatever that name.extension may be) will load the file contents into memory into a Matlab variable name. Saving this will convert it into a Matlab file (e.g., save name name will save variable name into matrix name stored in name.mat). Failure to specify both names will result in saving all defined variables into one file. The data would include the dependent variable, the independent variables, and the locational coordinates. For example, suppose the user has the dependent variable in the text file y.txt. Issuing the command load y.txt results in a Matlab variable y. Issuing the command save y y results in saving y.mat in the directory.
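The import step can be tried end to end with a throwaway file. In the sketch below the three data values are made up, and the first two lines exist only to create a stand-in for the user's own y.txt.

```matlab
% Create a small ASCII file to stand in for the user's y.txt
% (the three values are made up for illustration).
y = [1.5; 2.0; 3.25];
save('y.txt', 'y', '-ascii');
clear y

load y.txt          % ASCII load: creates a variable named after the file, y
save y y            % first y names the file (y.mat), second y the variable

S = load('y.mat');  % read the file back into a struct to inspect it
stored = fieldnames(S);
disp(stored);       % lists the variables held in y.mat
```

Because both names were given to save, y.mat holds only y rather than every variable in the workspace.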

Second, create a spatial weight matrix. Users can choose weight matrices based upon nearest neighbors (symmetric or asymmetric), multidimensional symmetric neighbors, spatiotemporal neighbors (asymmetric), and Delaunay triangles (symmetric). In almost all cases, one must make sure each location is unique. One may need to add slight amounts of random noise to the locational coordinates to meet this restriction (some of the latest versions of Matlab do this automatically; do not dither the coordinates in this case). Note, some estimators only use symmetric matrices. You can specify the number of neighbors used and their relative weightings.
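As an illustration of the second step, the sketch below builds a symmetric contiguity matrix from Delaunay triangles using only base Matlab functions. The coordinates are made up, the variable names are hypothetical, and the row scaling at the end is just one common choice; the toolbox's own weight matrix routines may differ in all of these respects.

```matlab
% Made-up coordinates: a 6-by-5 grid, nudged deterministically so that
% no two locations coincide.
[gx, gy] = meshgrid(1:6, 1:5);
n = numel(gx);
k = (1:n)';
xcoord = gx(:) + 0.05 * sin(k);
ycoord = gy(:) + 0.05 * cos(1.7 * k);

% Guard against duplicate locations before triangulating.
assert(size(unique([xcoord ycoord], 'rows'), 1) == n, 'locations must be unique');

tri = delaunay(xcoord, ycoord);             % one row of 3 vertex indices per triangle
edges = [tri(:, [1 2]); tri(:, [2 3]); tri(:, [1 3])];

% Binary contiguity: 1 if two observations share a triangle edge.
A = sparse(edges(:, 1), edges(:, 2), ones(size(edges, 1), 1), n, n);
A = spones(A + A');                         % symmetric, repeated edges collapsed to 1

% One common scaling (an assumption here): divide each row by its sum.
W = spdiags(1 ./ full(sum(A, 2)), 0, n, n) * A;
```

Note that A is symmetric while the row-scaled W generally is not, which matters for estimators that only accept symmetric matrices.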

Note, the Delaunay spatial weight matrix leads to a concentration matrix or a variance-covariance matrix that depends upon only one parameter (α, the autoregressive parameter). In contrast, the nearest neighbor concentration matrices or variance-covariance matrices depend upon three parameters (α, the autoregressive parameter; m, the number of neighbors; and ρ, which governs the rate at which weights decline with the order of the neighbors, with the closest neighbor given the highest weighting, the second closest given a lower weighting, and so forth). Three parameters should make this specification sufficiently flexible for many purposes.

Third, one computes the log-determinants for a grid of autoregressive parameters (prespecified by the routine as a default or specified by the user as an option). Determinant computations proceed faster for symmetric matrices. You must choose the appropriate log-determinant routines for the type of spatial weight matrix you have specified. Computing the log-determinants is slower than estimation, but only needs to be done when changing the spatial weight matrix. For example, one can use the same weight matrix and log-determinant files when exploring transformations or specifications of the dependent and independent variables (for the same observations).

Fourth, pick a statistical routine to run given the data matrices, the spatial weight matrix, and the log-determinant vector. One can choose among conditional autoregressions (CAR), simultaneous autoregressions (SAR), matrix exponential spatial specifications (MESS), mixed regressive spatially autoregressive estimators (which include pure autoregressive models and spatially lagged independent variable models as special cases), and OLS. In addition, one can explore multivariate spatiotemporal and multivariate estimation. These routines require little time to run. One can change models, weightings, and transformations and reestimate in the vast majority of cases without rerunning the spatial weight matrix or log-determinant routines (you may need to add another simple Jacobian term when performing weighting or transformations of the dependent variables). This aids interactive data exploration.
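Steps three and four can be sketched together for a mixed regressive spatially autoregressive model y = αWy + Xβ + ε. Everything below is an assumption made for illustration: the ring-shaped W, the simulated data, the grid, and the variable names. It does not call, or reproduce the interface of, any toolbox function.

```matlab
% Toy setup: ring weight matrix and simulated data (illustrative only).
n = 400;
idx = (1:n)';
W = sparse([idx; idx], [mod(idx - 2, n) + 1; mod(idx, n) + 1], ...
           0.5 * ones(2 * n, 1), n, n);
X = [ones(n, 1) sin(idx / 7) cos(idx / 11)];
beta = [1; 2; -1];
alphaTrue = 0.6;
y = (speye(n) - alphaTrue * W) \ (X * beta + 0.5 * randn(n, 1));

% Step three: log-determinants on a grid of alpha (depends on W only).
alphaGrid = (0:0.01:0.99)';
lndet = zeros(size(alphaGrid));
for k = 1:numel(alphaGrid)
    [L, U, P] = lu(speye(n) - alphaGrid(k) * W);
    lndet(k) = sum(log(abs(full(diag(U)))));
end

% Step four: profile likelihood from two sets of OLS residuals.
Wy = W * y;
eo = y - X * (X \ y);           % residuals of y on X
ed = Wy - X * (X \ Wy);         % residuals of Wy on X
sse = (eo' * eo) - 2 * alphaGrid * (eo' * ed) + alphaGrid.^2 * (ed' * ed);
proflik = lndet - (n / 2) * log(sse);

[maxlik, kmax] = max(proflik);
alphaHat = alphaGrid(kmax);
betaHat = X \ (y - alphaHat * Wy);
```

Changing y or X only requires repeating the last block; the lndet vector can be reused as long as W and the observations stay the same.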

Finally, these procedures provide a wealth of information. Many of these routines yield the profile likelihood in the autoregressive parameter for each submodel (corresponding to the deletion of individual variables or the spatial term). All of the inference, even for the OLS routine, uses likelihood ratio statistics in the form of signed root deviances. This is just the square root of twice the difference in log-likelihoods, given the sign of the parameter estimate. It has a t-like interpretation (Chen and Jennrich (1996)). The use of signed root deviances (SRDs) facilitates comparisons among different models.
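The signed root deviance is easy to compute by hand for an aspatial regression, which gives a feel for the statistic. The data and names below are made up for illustration, and the code is not the toolbox's OLS routine.

```matlab
% Made-up data for illustration only.
n = 60;
t = (1:n)';
x1 = sin(t / 5);
x2 = cos(t / 3);
y = 1 + 2 * x1 - 0.5 * x2 + 0.3 * sin(17 * t);   % last term plays the role of noise
X = [ones(n, 1) x1 x2];

% Concentrated Gaussian log-likelihood of an OLS fit, up to a constant.
loglik = @(Z) -(n / 2) * log(sum((y - Z * (Z \ y)).^2) / n);

bhat = X \ y;
srd = zeros(size(bhat));
for j = 1:numel(bhat)
    Xr = X;
    Xr(:, j) = [];                               % submodel without variable j
    srd(j) = sign(bhat(j)) * sqrt(2 * (loglik(X) - loglik(Xr)));
end
disp([bhat srd]);                                % estimates next to their SRDs
```

Each SRD carries the sign of its coefficient, and its magnitude grows with the loss in fit from deleting that variable.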

2.6 Included Examples

The Spatial Statistics Toolbox comes with many examples. These are found in the subdirectories under spatial_toolbox_2\examples. To run the examples, change the directory in Matlab into the many subdirectories that illustrate individual routines. Look at the documentation in each example directory for more detail. Almost all of the specific models have examples. In addition, the simulation routine examples serve as minor Monte Carlo studies, which also help verify the functioning of the estimators. The examples use the 3107 observation dataset from the Pace and Barry (1997) Geographical Analysis article.

2.7 Included Datasets

The spatial_toolbox_2\datasets subdirectory contains subdirectories with individual data sets in Matlab file formats as well as their documentation. The data sets include example programs and output. Note, due to the many improvements incorporated into the Spatial Statistics Toolbox over time, the running times have greatly improved over those in the articles.

Hopefully, these data sets should provide a good starting point for exploring applications of spatial statistics.

2.8 Included Manuscripts

In the manuscript subdirectory, we provide pdf versions of the Geographical Analysis 1997 and 2000 articles. I would like to thank the publishers (Ohio State Press and Elsevier) for having given us copyright permission to distribute these works. One can also go to www.spatial-statistics.com to access some other articles (e.g., the Linear Algebra and its Applications article, which proposed the Monte Carlo log-determinant estimator).

Chapter 3

A Brief Selected Tour of the Toolbox

The weight matrix specifies the dependence among observations. One form of weight matrix (Delaunay) uses the notion of contiguity to specify dependence, as depicted in Figure 3.1.

[Plot: "Delaunay Nearest Neighbor Graph"; x-axis: Longitude in Decimal Degrees; y-axis: Latitude in Decimal Degrees]

Figure 3.1 Connections among Counties via Delaunay

Note the somewhat strange behavior of connections to outlying observations in Figure 3.1. This arises from the geometric nature of contiguity. Using nearest neighbors based upon some metric can avoid this, as shown in Figure 3.2.

[Plot: "8 Nearest Neighbor Graph"; x-axis: Longitude in Decimal Degrees; y-axis: Latitude in Decimal Degrees]

Figure 3.2 Connections among Counties via Eight Nearest Neighbors

Using only nearby observations implies that the weight matrix has mainly zeros, or is sparse, as shown in Figure 3.3.

[Plot: "Original Non-zero Weight Pattern for Delaunay"; both axes: Observation Number]

Figure 3.3 Plot of Non-zeros of Delaunay Weight Matrix

This becomes even more apparent when reordering the observations as in Figure 3.4.

[Plot: "Permuted Non-zero Weight Pattern for Delaunay"; both axes: Observation Number]

Figure 3.4 Plot of Non-zeros of Permuted Delaunay Weight Matrix

Sparsity, as well as finding an appropriate ordering, is key in quickly computing the log-determinants used in maximum likelihood. The toolbox has functions for exact computation of the log-determinants (actually interpolation of exact computations at various points). However, users can select approximations as well, which depend only on sparsity and not upon orderings. The quadratic Chebyshev is the fastest and most approximate technique (Figure 3.5).
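The effect of ordering can be seen on a toy sparse pattern. The sketch below uses the reverse Cuthill-McKee ordering from base Matlab (symrcm) purely as an example of a bandwidth-reducing permutation; the text does not say which ordering the toolbox uses, and the lattice, the scrambling, and the names are assumptions.

```matlab
% Toy sparse symmetric pattern: a 12-by-12 lattice where each cell is
% linked to its horizontal and vertical neighbors.
m = 12;
n = m * m;
T = spdiags(ones(m, 2), [-1 1], m, m);       % path graph on m nodes
A = kron(speye(m), T) + kron(T, speye(m));    % lattice adjacency

% Scramble the observation order to mimic data that arrive unsorted.
scramble = mod((0:n-1)' * 37, n) + 1;         % a permutation: 37 and 144 are coprime
A = A(scramble, scramble);

[i, j] = find(A);
bwBefore = max(abs(i - j));                   % bandwidth in the scrambled order

p = symrcm(A);                                % reverse Cuthill-McKee permutation
[i, j] = find(A(p, p));
bwAfter = max(abs(i - j));                    % bandwidth after reordering

fprintf('bandwidth before: %d, after: %d\n', bwBefore, bwAfter);
% spy(A) and spy(A(p, p)) give pictures comparable to Figures 3.3 and 3.4.
```

A narrower band generally means less fill-in when the matrix is factored, which is where the time savings come from.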

[Plot: "Exact log-determinant, Chebyshev approximation, Taylor bounds"; x-axis: α; y-axis: log-determinants; series: Taylor Lower, Chebyshev, Taylor Upper, Exact]

Figure 3.5 Plot of Exact log-determinant with Chebyshev Approximation and Taylor Bounds

The Chebyshev approximation appears quite good for positive moderate values of the dependence parameter, but could use improvement for materially negative values of the spatial dependence parameter. Fortunately, such negative values seem rare in practice.

The Monte Carlo log-determinant estimator is also quite fast, but more accurate (Figure 3.6).
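Series-based approximations of this kind build on the expansion ln|I − αW| = −Σ α^k tr(W^k)/k, which holds when the spectral radius of αW is below one. The sketch below only checks a truncated version of that series against an exact value on a toy ring matrix; it is not the toolbox's Chebyshev or Monte Carlo routine, and the matrix, α, and truncation point q are assumptions.

```matlab
% Toy ring weight matrix with rows summing to one (illustrative only).
n = 100;
idx = (1:n)';
W = sparse([idx; idx], [mod(idx - 2, n) + 1; mod(idx, n) + 1], ...
           0.5 * ones(2 * n, 1), n, n);
alpha = 0.5;
q = 30;                                   % number of series terms kept

Wk = speye(n);
lndet_series = 0;
for k = 1:q
    Wk = Wk * W;                          % W^k
    lndet_series = lndet_series - alpha^k * full(sum(diag(Wk))) / k;   % uses tr(W^k)
end

% Exact value from a sparse LU factorization, for comparison.
[L, U, P] = lu(speye(n) - alpha * W);
lndet_exact = sum(log(abs(full(diag(U)))));

fprintf('series: %.8f   exact: %.8f\n', lndet_series, lndet_exact);
```

The truncation error shrinks like α^q, which fits the observation that such approximations do best for moderate values of the dependence parameter.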

[Plot: "Exact log-determinant, Monte Carlo approximation, and confidence limits"; x-axis: α; y-axis: log-determinants; series: Lower Confidence, Monte Carlo, Upper Confidence, Exact]

Figure 3.6 Plot of Exact log-determinant with Monte Carlo Approximation and Limits

To see the effects of exact versus approximate log-determinant computations, consider Tables 3.1 and 3.2 using the 3107 county election data. The estimated autoregressive parameter is only off by 0.01 from using the approximation. The approximate method also uses likelihood dominance inference, which results in a lower bound to the signed root deviances. As shown by the tables, the likelihood dominance SRDs are smaller in magnitude than the exact SRDs. However, they can still document statistical significance for many variables and thus can prove useful in many circumstances.

Variables         Beta Estimates   Signed Root Deviances   Pr of Higher SRDs
Voting Pop               -0.7784                -31.1227              0.0000
Education                 0.2696                  9.3438              0.0000
Home Ownership            0.4530                 26.2661              0.0000
Income                    0.0071                  0.0035              0.9972
Intercept                 0.5443                  7.8773              0.0000
Alpha                     0.7250                 32.7719              0.0000

Table 3.1 SAR Estimation Results using Chebyshev lndet approximation and likelihood dominance inference

Variables         Beta Estimates   Signed Root Deviances   Pr of Higher SRDs
Voting Pop               -0.7806                -32.1748              0.0000
Education                 0.2746                 11.9438              0.0000
Home Ownership            0.4525                 27.0073              0.0000
Income                    0.0047                  0.2187              0.8269
Intercept                 0.5528                  9.4908              0.0000
Alpha                     0.7150                 34.8648              0.0000

Table 3.2 SAR Estimation results using exact lndet

Some of the routines not only yield the maximum of the likelihood function but also profile likelihoods in the dependence parameter α by model, as shown in Figure 3.7.

[Plot: "Profile likelihoods vs α for global model and delete-1 submodels"; x-axis: Dependence parameter (α); y-axis: Profile log-likelihood; series: Global likelihood, Voting Pop, Education, Home Ownership, Income, Intercept]

Figure 3.7 SAR Profile Likelihoods by Model

The toolbox includes SAR, CAR, and MESS error models as well as MESS, closest neighbor, and MIX autoregressive models, as shown in Table 3.3 and Table 3.4.

Variables              b OLS   b closest    b MESS     b Mix
Voting Pop           -0.8464     -0.7489   -0.7693   -0.7298
Education             0.5167      0.2899    0.1941    0.1818
Home Ownership        0.4291      0.4457    0.4832    0.4580
Income               -0.1439     -0.0332    0.0423    0.0427
Lag Voting Pop        0.0000      0.1878    0.4186    0.4616
Lag Education         0.0000      0.0975    0.1205    0.0450
Lag Home Ownership    0.0000     -0.1569   -0.3249   -0.3299
Lag Income            0.0000     -0.1079   -0.1802   -0.1410
Intercept             0.9814      0.7495    0.5636    0.4205
α                     0.0000      0.3352    1.4628    0.6550

Table 3.3 Estimates on Election Data

The closest neighbor approach is intermediate to a non-spatial approach (OLS) and a full spatial approach (MESS) or the approximate mixed routine. Note the close agreement between MESS and the mixed routine. Note, OLS in this case uses the spatial averages of the basic independent variables as additional independent variables.

None of these operations take long for the election data.

Variables               b OLS   b closest     b MESS      b Mix
Voting Pop           -34.7211    -30.5020   -28.8899   -29.4018
Education             30.8928     13.4922     7.6654     7.7169
Home Ownership        23.0066     25.4083    26.8836    27.3515
Income                -7.2373     -1.5742     1.7478     1.8952
Lag Voting Pop         0.0000      7.7600    10.7656    12.5507
Lag Education          0.0000      4.5090     3.9785     1.5705
Lag Home Ownership     0.0000     -9.4245   -11.1259   -12.1200
Lag Income             0.0000     -5.1399    -5.5551    -4.6603
Intercept             20.2782     16.0533    10.7480     8.4626
α                      0.0000     24.2762    31.5948    33.7097

Table 3.4 Signed Root Deviances using Election Data

Operation                   Timings in seconds
OLS                                     0.0150
Closest AR                              0.0470
MESS                                    0.0940
Exact Log-det                           0.8750
Mix                                     0.0470
Doubly Stochastic Scaling               0.2660

Table 3.5 Timing for operations to Election data (n=3107)

In addition to global models, the toolbox has spatial autoregressive local estimation (SALE). The user chooses the bandwidth (subsample size) by examining cross-validation error at the fringe observations, as in Figure 3.8, or at the center of each area, as in Figure 3.9.

[Plot: "Smoothed SALE Recursive Residuals of Fringe Observations"; x-axis: Number of Local Observations; y-axis: Absolute Error]

Figure 3.8 Plot of Fringe prediction error versus subsample size

[Plot: "Smoothed SALE Initial Holdout Residuals"; x-axis: Number of Local Observations; y-axis: Absolute Error]

Figure 3.9 Plot of prediction error on center of area versus subsample size

Usually there is spatial dependence even in small subsamples, as shown in Figure 3.10.

[Plot: "SALE Autoregressive Parameter Estimates"; x-axis: Number of Local Observations; y-axis: Median Autoregressive Parameter Estimates]

Figure 3.10 Spatial dependence parameter estimate versus subsample size

Local estimation leads to spatially varying parameter estimates such as those shown in Figure 3.11.

[Map: β3 by location; x-axis: Longitude; y-axis: Latitude]

Figure 3.11 Map of influence of homeownership on voting

In addition, a user can obtain an idea of the sensitivity of parameter estimates to spatial variation, such as summarized in Table 3.6.

Percentiles                0        10        25        50        75        90       100
α                     0.3100    0.4300    0.5500    0.6300    0.6700    0.7100    0.7800
Voting Pop           -1.0079   -0.8875   -0.8075   -0.6962   -0.5967   -0.5299   -0.4066
Education            -0.1871    0.0517    0.1142    0.1788    0.3135    0.4923    0.7118
Home Ownership        0.1530    0.2964    0.3419    0.4253    0.5161    0.7660    0.8461
Income               -0.2576   -0.1644   -0.1072   -0.0473    0.0628    0.1448    0.2325
Lag Voting Pop       -0.1525    0.0655    0.2868    0.4527    0.5490    0.6523    0.8806
Lag Education        -0.5734   -0.3666   -0.1812   -0.0299    0.0677    0.1322    0.3318
Lag Home Ownership   -0.7010   -0.4499   -0.3278   -0.2337   -0.1100   -0.0382    0.1880
Lag Income           -0.4764   -0.2374   -0.1733   -0.0897    0.0216    0.0862    0.2786
Intercept            -0.1605    0.0473    0.1463    0.2847    0.6612    1.0772    1.8902

Table 3.6 Distribution of local estimates

Method                      Time in seconds
OLS                                  0.1560
Closest AR                           0.1090
MESS                                 0.6090
Approximate Mix                      0.3280
Doubly Stochastic Scaling            7.1880
Delaunay Weight Matrix               5.4220

Table 3.7 Times for Different Methods for 57647 Observations

To provide an idea about the performance of the techniques for a larger problem, we estimated a simple hedonic regression over US census tracts. This resulted in 57647 observations. Table 3.7 shows the timings for some of the various operations. All of these seem quite fast. A user can find a Delaunay weight matrix and estimate a spatial autoregression in under 10 seconds on desktop machines.

Just selecting a particular weight matrix seems arbitrary. Here we take 30 nearest neighbors and weight these geometrically. A ρ of 1 indicates no decline in the weight given to further neighbors relative to closer ones, while a ρ of 0.5 would give half the weight to the second nearest neighbor as it would to the first nearest neighbor. Thus, ρ allows changes in the effective number of neighbors used without actually varying the number of neighbors. It often makes sense in this approach to set the number of neighbors to a fairly high level (such as 30). Table 3.8 shows the effect of varying ρ on the profile log-likelihood. Small changes in ρ make large changes in the profile log-likelihood, evidence of the importance of this parameter.
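The geometric weighting can be written out in a couple of lines. In the sketch below, the helper geomWeights is a hypothetical name, and scaling the weights to sum to one is an assumption made for illustration rather than a description of the toolbox's scaling.

```matlab
% Weight on the k-th nearest neighbor is proportional to rho^k,
% scaled here so the m weights sum to one (an assumption).
geomWeights = @(rho, m) (rho .^ (1:m)') / sum(rho .^ (1:m)');

m = 30;                          % number of nearest neighbors, as in the text
w_half = geomWeights(0.5, m);    % second neighbor gets half the first's weight
w_one = geomWeights(1, m);       % all 30 neighbors weighted equally

ratio = w_half(2) / w_half(1);   % equals rho
share5 = sum(w_half(1:5));       % share of total weight on the 5 closest neighbors

fprintf('ratio: %.2f, weight on 5 closest neighbors: %.4f\n', ratio, share5);
```

With ρ = 0.5 nearly all of the weight sits on a handful of neighbors even though 30 are nominally used, which is the sense in which ρ changes the effective number of neighbors.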

It did not take overly long to find the nearest neighbors or the optimal ρ, even with doubly stochastic scalings of the weight matrix, as shown by Table 3.9.

ρ         log-likelihood
0.8000      -231895.0426
0.8500      -230968.8487
0.9000      -230828.6413
0.9500      -231699.1089
1.0000      -233434.5371

Table 3.8 Likelihoods across ρ for Doubly Stochastic Scaling

Operation                    Time in seconds
NN computation                       31.4060
RS Time to find optimum ρ            56.7810
DS Time to find optimum ρ            94.8750

Table 3.9 Times for Optimizing the Likelihood over ρ for Both Scalings

These operations lead to a table of profile log-likelihoods (Table 3.10) across weight matrices. Examining the MESS log-likelihoods over ρ and Delaunay, and contrasting them with the log-likelihood from applying OLS to the basic non-spatial independent variables, demonstrates that even a suboptimal choice of ρ or Delaunay still dominates the use of an aspatial model in this case, and that optimizing over ρ dominates an arbitrary choice of weight matrices (Table 3.10). Moreover, the doubly stochastic (DS) scaling helped greatly for this example over the regular scaling (RS). In addition, inspection of aspatial OLS versus MESS with an optimal selection of ρ in Table 3.11 shows clear differences among the approaches. Note, the land area variable became insignificant after modeling space.

It is not difficult to estimate a spatial autoregression with over one million observations. In fact, the toolbox provides an example (big_one subdirectory) under the dataset directory whereby a one million observation spatial autoregression is estimated in just under 20 seconds. It took 130.63 seconds to find the weight matrix, 60.24 seconds to simulate the dependent variable, and 19.42 seconds to estimate the autoregression.

Variable                                    Value
Aspatial likelihood (OLS)            -266505.1663
Closest Neighbor                     -243986.1676
RS Delaunay maximum likelihood       -244509.8257
RS Maximum likelihood across ρ       -254216.3172
RS Optimum ρ                               0.9000
DS Delaunay maximum likelihood       -235626.6213
DS Maximum likelihood across ρ       -230828.6413
DS Optimum ρ                               0.9000

Table 3.10 Likelihoods Across Doubly Stochastic and Regular Scalings

Variables                   b OLS    SRD OLS    b MESS   SRD MESS
Land area                 -0.0850   -96.8455   -0.0008    -0.7159
Pop                        0.1146    36.8592    0.0239    12.7630
Per cap Income             1.0837   208.2192    0.6786   152.7645
Age                       -0.1269   -34.2809   -0.1384   -45.0489
Lag Land area                                  -0.0178   -14.5650
Lag Pop                                         0.0165     5.2437
Lag Per cap Income                             -0.3702   -65.2683
Lag Age                                         0.1088    27.5807
Intercept                  1.2236    22.9837   -0.5988   -15.2122
α                                               3.0986   253.2835
ρ (relative to ρ = 1)                             0.90    72.1927
Nearest Neighbors                                   30
parameters                      5                   11

Table 3.11 OLS versus MESS Results Using Optimal ρ for Doubly Stochastic Scaling
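As a worked check of how the signed root deviances relate to the likelihoods, the Table 3.11 entry for ρ (relative to ρ = 1) can be reproduced from the definition in Section 2.5. The calculation below takes the doubly stochastic log-likelihoods at ρ = 0.9 and ρ = 1 from Table 3.8, reading both with four decimal places:

```latex
\mathrm{SRD}_{\rho}
  = \sqrt{2\,\big[(-230828.6413) - (-233434.5371)\big]}
  = \sqrt{2 \times 2605.8958}
  = \sqrt{5211.7916}
  \approx 72.19
```

This agrees with the tabulated value of 72.1927.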

Chapter 4

References

If you need to know more about spatial statistics or about some of the specific routines, you may wish to examine:

Anselin, Luc (1988), Spatial Econometrics: Methods and Models, Dordrecht: Kluwer Academic Publishers.

Barry, Ronald, and R. Kelley Pace, "A Monte Carlo Estimator of the Log Determinant of Large Sparse Matrices," Linear Algebra and its Applications, Volume 289, Number 1-3, 1999, p. 41-54.

Chen, Jian-Shen, and Robert Jennrich (1996), "The Signed Root Deviance Profile and Confidence Intervals in Maximum Likelihood Analysis," Journal of the American Statistical Association, Volume 91, Number 435, p. 993-998.

Christensen, Ronald (1991), Linear Models for Multivariate, Time Series, and Spatial Data, New York: Springer-Verlag.

Cressie, Noel A.C. (1993), Statistics for Spatial Data, Revised ed., New York: John Wiley.

Dubin, Robin A. (1988), "Estimation of Regression Coefficients in the Presence of Spatially Autocorrelated Error Terms," Review of Economics and Statistics, 70, 466-474.

Haining, Robert (1990), Spatial Data Analysis in the Social and Environmental Sciences, Cambridge.

LeSage, James, and R. Kelley Pace, "Spatial Dependence in Data Mining," Data Mining for Scientific and Engineering Applications, Edited by Robert L. Grossman, Chandrika Kamath, Philip Kegelmeyer, Vipin Kumar, and Raju R. Namburu, Kluwer Academic Publishing, 2001.

LeSage, James, and R. Kelley Pace, "Spatial Probit and Tobit," Spatial Statistics and Spatial Econometrics, Edited by Art Getis, Palgrave, 2003.

Li, Bin (1995), "Implementing Spatial Statistics on Parallel Computers," in: Arlinghaus, S., ed., Practical Handbook of Spatial Statistics (CRC Press, Boca Raton), pp. 107-148.

Ord, J.K. (1975), "Estimation Methods for Models of Spatial Interaction," Journal of the American Statistical Association, 70, 120-126.

Pace, R. Kelley, and Ronald Barry (1997), "Fast CARs," Journal of Statistical Computation and Simulation, 59, p. 123-147.

Pace, R. Kelley, and Ronald Barry (1997), "Quick Computation of Regressions with a Spatially Autoregressive Dependent Variable," Geographical Analysis, 29, 232-247.

Pace, R. Kelley, and Dongya Zou, "Closed-Form Maximum Likelihood Estimates of Nearest Neighbor Spatial Dependence," Geographical Analysis, Volume 32, Number 2, April 2000, p. 154-172.

Pace, R. Kelley, Ronald Barry, O.W. Gilley, and C.F. Sirmans, "A Method for Spatial-temporal Forecasting with an Application to Real Estate Prices," International Journal of Forecasting, Volume 16, Number 2, April-June 2000, p. 229-246.

Pace, R. Kelley, and James P. LeSage, "Semiparametric Maximum Likelihood Estimates of Spatial Dependence," Geographical Analysis, Volume 34, Number 1, January 2002, p. 75-90.

Pace, R. Kelley, and James LeSage, "Likelihood Dominance Spatial Inference," forthcoming, Geographical Analysis, in January 2003.

Pace, R. Kelley, and James LeSage, "Spatial Autoregressive Local Estimation," Spatial Statistics and Spatial Econometrics, Edited by Art Getis, Palgrave, 2003.

Pace, R. Kelley, and James LeSage, "Chebyshev Approximation of Log-determinants of Spatial Weight Matrices," forthcoming in Computational Statistics and Data Analysis.

Ripley, Brian D. (1981), Spatial Statistics, New York: John Wiley.

www.spatial-statistics.com

  • Why the Toolbox Exists
  • Using the Toolbox
    • Hardware and Software Requirements
    • Installation
    • Help and documentation
    • Known Limitations
    • Tips on Using the Toolbox
    • Included Examples
    • Included Datasets
    • Included Manuscripts
      • A Brief Selected Tour of the Toolbox
      • References
Page 3: Spatial Statistics Toolbox for Matlab 2.0 - Documentation Only (pdf)

List of Tables

3.1 SAR Estimation Results using Chebyshev lndet approximation and likelihood dominance inference
3.2 SAR Estimation results using exact lndet
3.3 Estimates on Election Data
3.4 Signed Root Deviances using Election Data
3.5 Timing for operations to Election data (n=3107)
3.6 Distribution of local estimates
3.7 Times for Different Methods for 57647 Observations
3.8 Likelihoods across ρ for Doubly Stochastic Scaling
3.9 Times for Optimizing the Likelihood over ρ for Both Scalings
3.10 Likelihoods Across Doubly Stochastic and Regular Scalings
3.11 OLS versus MESS Results Using Optimal ρ for Doubly Stochastic Scaling

List of Figures

3.1 Connections among Counties via Delaunay
3.2 Connections among Counties via Eight Nearest Neighbors
3.3 Plot of Non-zeros of Delaunay Weight Matrix
3.4 Plot of Non-zeros of Permuted Delaunay Weight Matrix
3.5 Plot of Exact log-determinant with Chebyshev Approximation and Taylor Bounds
3.6 Plot of Exact log-determinant with Monte Carlo Approximation and Limits
3.7 SAR Profile Likelihoods by Model
3.8 Plot of Fringe prediction error versus subsample size
3.9 Plot of prediction error on center of area versus subsample size
3.10 Spatial dependence parameter estimate versus subsample size
3.11 Map of influence of homeownership on voting

Chapter 1

Why the Toolbox Exists

Individual data arise at a time and a place. Randomization can destroy, and aggregation can obscure, spatial and temporal information, but the original data points potentially exhibit spatiotemporal dependence. Often, simple models fitted to these data will produce spatially, temporally, or spatiotemporally correlated errors, provided reality is more complex than the model. Ignoring the spatial, temporal, or spatiotemporal dependence among errors results in inefficient parameter estimation and biased inference, and discards information that can greatly improve prediction accuracy. In addition, if the data generating process is an autoregressive one, ignoring the dependence will lead to biased estimates and inference.

Historically, spatial statistics software floundered with problems involving even thousands of observations. For example, Li (1995) required 8515 seconds to compute a 2,500 observation spatial autoregression using an IBM RS6000 Model 550 workstation. The culprit for the difficulty lies in the maximum likelihood estimator's need for the determinant of the n by n matrix of the covariances among the spatially scattered observations.

The conventional computational approach relies upon eigenvalues (Ord (1975)). Even with faster computers, the calculation of eigenvalues requires substantial time and memory. For example, on a 1700 Athlon it requires 2475 minutes to compute the eigenvalues of the spatial weight matrix based on 30 nearest neighbors. The resulting matrix takes 77 megabytes of storage as well.

However, many problems of practical importance generate large spatial data sets. Obvious examples include census data (over 200,000 block groups for the US) and housing sales (many millions sold per year). The Spatial Statistics Toolbox addresses the need to quickly estimate large problems. In contrast to the eigenvalue approach, by focusing upon direct computation of determinants and using sparsity, the same operation takes 338 seconds in the toolbox. Using a Chebyshev approximation reduces the time to under 0.2 seconds. As the eigenvalue computations rise at the cube of n while the log-determinant functions in the toolbox rise with n or n ln(n), large problems further increase the performance of the toolbox relative to conventional approaches.
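The contrast between the two routes to the log-determinant can be illustrated on a small problem. The sketch below is not toolbox code. It assumes a hypothetical chain-neighbor matrix D (chosen only to keep the example self-contained) and computes ln|I - αD| once from the eigenvalues of D, as in the conventional approach, and once from a sparse LU factorization, using only built-in Matlab functions.

```matlab
% Minimal sketch (not toolbox code): two ways to obtain ln|I - alpha*D|.
n = 200;                                  % illustrative sample size
A = spdiags(ones(n,2), [-1 1], n, n);     % chain: each point neighbors the previous and next
s = full(sum(A,2));                       % number of neighbors of each point
S = spdiags(1./sqrt(s), 0, n, n);
D = S*A*S;                                % symmetric scaling, eigenvalues lie in [-1,1]
D = (D + D')/2;                           % guard against round-off asymmetry
alpha = 0.7;                              % hypothetical dependence parameter

% Conventional route: dense eigenvalues, O(n^3) time and O(n^2) memory.
lambda = eig(full(D));
lndet_eig = sum(log(1 - alpha*lambda));

% Direct sparse route: LU factorization of the sparse matrix I - alpha*D.
[L, U, P] = lu(speye(n) - alpha*D);
lndet_lu = sum(log(abs(diag(U))));        % L has a unit diagonal, so only U contributes

fprintf('eigenvalue route: %.6f, sparse LU route: %.6f\n', lndet_eig, lndet_lu);
```

Both routes return the same number up to round-off. The difference lies in cost: the dense eigenvalue call needs the full n by n matrix, while the factorization works on the sparse matrix directly.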

The speed of Spatial Statistics Toolbox 2.0 permits users to explore alternative specifications, spatial and aspatial, in a timely fashion. For example, one can estimate local spatial autoregressions (spatial autoregressive local estimation, or SALE) as in Pace and LeSage (forthcoming). In the SALE example provided, the toolbox was able to estimate a sequence of over 400 spatial autoregressions around each of 3107 points in under four minutes.

The Spatial Statistics Toolbox 2.0 conserves on memory as well. It is not difficult to estimate a spatial autoregression with over one million observations. In fact, the toolbox provides an example under the dataset directory whereby a one million observation spatial autoregression is estimated in just under 20 seconds. It took 13063 seconds to find the weight matrix, 6024 seconds to simulate the dependent variable, and 19.42 seconds to estimate the autoregression.

Relative to the previous incarnation, the Spatial Statistics Toolbox 2.0 introduces multidimensional and spatiotemporal weight matrices, new scalings of the weight matrices (doubly stochastic), faster computation of exact log-determinants (actually interpolated from exact computations of a smooth function), a couple of approximations to log-determinants, and some new computationally and theoretically interesting models such as the aforementioned SALE and the matrix exponential spatial specification (MESS).

Chapter 2

Using the Toolbox

2.1 Hardware and Software Requirements

The toolbox has been developed under Matlab 6.5 and tested under Matlab 6.5 and 6.1 across W2K and Windows ME. However, some routines run faster under 6.5 than under 6.1. The total installation takes around 15 megabytes. The routines have been tested on PC compatibles. The routines should run on other platforms but have not been tested on non-PC compatibles.

2.2 Installation

For users who can extract files from zip archives, follow the instructions for your product (e.g., Winzip) and extract the files into the drive in which you wish to install the toolbox. The installation program will create the following directory structure in whichever drive you choose:

drive_letter:\space
+---articles
+---datasets
    +---big_one
    +---election
    +---housing
    +---space_time
+---documentation
+---examples
    +---CAR
    +---CAR_SIM
    +---CHEBYSHEV
    +---CHEBYSHEV_SEQUENCE
    +---CLOSEST_NEIGHBOR
    +---DELAUNAY2
    +---DOUBLY
    +---LNDET_INTERP
    +---LNDET_INTERP_SEQUENCE
    +---LNDET_MONTECARLO
    +---MESS_AR
    +---MESS_CAR
    +---MESS_SIM
    +---MIXED
    +---MULTIVARIATE
    +---NEAREST_NEIGHBORS
    +---OLS
    +---SALE
    +---SAR
    +---SAR_SIM
    +---SPACE_TIME
+---fmex_code
+---functions
    +---fmex_code

To see whether the installation has succeeded, change the directory in Matlab to one of the supplied examples and type run x_*.m. For example, go to the examples\car subdirectory and type run x_car2_ga1. The use of * in this context serves as a wildcard or placeholder for the intervening characters. This should cause the example script x_car2_ga1.m to run.

The multidimensional weight matrix routines require the installation of TSTOOL. Go to www.physik3.gwdg.de/tstool to find this useful package and follow the instructions to install it in Matlab. If you have Matlab 6.5, you can easily add the relevant paths to the mex functions by going to the File menu, selecting Set Path, selecting Add Folder, and adding the paths so Matlab can find the functions. On my machine I added:

opentstool\tstoolbox\mex\dll
opentstool\tstoolbox\mex
opentstool\tstoolbox\utils
tstool\opentstool\tstoolbox\gui
tstool\opentstool\tstoolbox

2.3 Help and documentation

All the example scripts should follow the form x_*.m (e.g., x_car2_ga1.m, x_sar2_ga1.m). Functions follow the form f*.m (e.g., fsar2.m, fols2.m). Matlab matrix files (which may include multiple matrices) have the form *.mat.

If you wish to access the functions in other directories, you can define a path to drive_letter:\space and drive_letter:\space\functions using the Set Path command under the File menu in Matlab 6.5, where drive_letter refers to the drive where the Spatial Statistics Toolbox resides, and provided space is the name of the directory you chose. This is the same procedure outlined in installing TSTOOL. On my machine I added:

\space
\space\functions

As in previous versions of Matlab, users can employ the addpath command to add these directories to the search path.

Both the functions and examples are well-documented internally. One can use the help functions of Matlab in the usual fashion when the toolbox is on the search path or the functions are in the current directory. If the toolbox is on the search path, a user can type help space and see the functions contained in the Spatial Statistics Toolbox 2 (provided space is the name of the directory you chose). If the toolbox is on the search path or the function lies in the current directory, a user can obtain help on the function by typing help function_name or doc function_name.
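For users who prefer the command line to the Set Path dialog, the steps above can be sketched as follows. The install location c:\space is a hypothetical example, not a path taken from the toolbox. Only the built-in functions exist, fullfile, addpath, and fprintf are used.

```matlab
% Minimal sketch: put the toolbox on the search path for the current session.
% 'c:\space' is a hypothetical install location; substitute your own.
toolbox_root = 'c:\space';
toolbox_dirs = {toolbox_root, fullfile(toolbox_root, 'functions')};
for k = 1:numel(toolbox_dirs)
    if exist(toolbox_dirs{k}, 'dir')
        addpath(toolbox_dirs{k});          % same effect as File > Set Path > Add Folder
    else
        fprintf('Directory not found, skipped: %s\n', toolbox_dirs{k});
    end
end
% With the path set, "help fsar2" (a function name mentioned above) would
% display that function's internal documentation.
```

Paths added with addpath last only for the current session unless they are saved, so the snippet is a natural candidate for a startup script.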

The examples provide the best means of understanding the toolbox functions. Each subdirectory under the examples subdirectory contains all the files needed for that particular example (except the multivariate subdirectory, which requires installation of the TSTOOL package). Users can substitute their data for the example data and see how the functions perform for their problem. Note, a few of these functions take several minutes to run (notably the SALE examples). Most, however, are quite fast.

2.4 Known Limitations

None. All are unknown.

2.5 Tips on Using the Toolbox

Typical sessions with the toolbox proceed in five steps. First, import the data into Matlab. If the file is fixed-format or tab-delimited ASCII, the command load name.extension (whatever that name.extension may be) will load the file contents into memory into a Matlab variable name. Saving this will convert it into a Matlab file (e.g., save name name will save variable name into matrix name stored in name.mat). Failure to specify both names will result in saving all defined variables into one file. The data would include the dependent variable, the independent variables, and the locational coordinates. For example, suppose the user has the dependent variable in the text file y.txt. Issuing the command load y.txt results in a Matlab variable y. Issuing the command save y y results in saving y.mat in the current directory.
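The import step can be tried end to end without any real data. The sketch below writes a small ASCII file into a scratch directory (the directory and the three values are made up for illustration) and then issues the two commands described above.

```matlab
% Minimal sketch of the import step, run in a scratch directory.
work_dir = tempname;                      % hypothetical scratch location
mkdir(work_dir);
old_dir = pwd;
cd(work_dir);

y_demo = [1.5; 2.25; 3.0];                % stand-in for a dependent variable
save('y.txt', 'y_demo', '-ascii');        % write a plain ASCII file to import
clear y_demo

load y.txt                                % creates the Matlab variable y
save y y                                  % stores variable y in y.mat
disp(size(y));

cd(old_dir);
```

The variable created by load takes its name from the file name, which is why the text file y.txt produces a variable called y.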

Second, create a spatial weight matrix. Users can choose weight matrices based upon nearest neighbors (symmetric or asymmetric), multidimensional symmetric neighbors, spatiotemporal neighbors (asymmetric), and Delaunay triangles (symmetric). In almost all cases, one must make sure each location is unique. One may need to add slight amounts of random noise to the locational coordinates to meet this restriction (some of the latest versions of Matlab do this automatically; do not dither the coordinates in this case). Note, some estimators only use symmetric matrices. You can specify the number of neighbors used and their relative weightings.

Note, the Delaunay spatial weight matrix leads to a concentration matrix or a variance-covariance matrix that depends upon only one parameter (α, the autoregressive parameter). In contrast, the nearest neighbor concentration matrices or variance-covariance matrices depend upon three parameters (α, the autoregressive parameter; m, the number of neighbors; and ρ, which governs the rate at which weights decline with the order of the neighbors, with the closest neighbor given the highest weighting, the second closest given a lower weighting, and so forth). Three parameters should make this specification sufficiently flexible for many purposes.
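To make the second step concrete, here is a brute-force sketch of an asymmetric nearest-neighbor weight matrix with equal weights, including the uniqueness check on the locations. It is not one of the toolbox routines. The coordinates, the choice m = 4, and all variable names are assumptions for illustration, and the all-pairs distance computation is only sensible for small n.

```matlab
% Minimal sketch (not the toolbox routines): asymmetric m-nearest-neighbor
% weight matrix with equal weights, built by brute force.
[gx, gy] = meshgrid(1:6, 1:5);            % 30 illustrative locations
k = (1:numel(gx))';
xcoord = gx(:) + 0.1*sin(k);              % deterministic perturbation, standing in
ycoord = gy(:) + 0.1*cos(2*k);            % for the small random noise mentioned above
n = numel(xcoord);
m = 4;                                    % number of neighbors

% Every location must be unique before neighbors are computed.
n_unique = size(unique([xcoord ycoord], 'rows'), 1);
if n_unique < n
    error('Duplicate locations found; dither the coordinates first.');
end

% Squared distances between all pairs (fine for small n only).
dx = xcoord*ones(1,n) - ones(n,1)*xcoord';
dy = ycoord*ones(1,n) - ones(n,1)*ycoord';
d2 = dx.^2 + dy.^2;
d2(1:n+1:end) = Inf;                      % an observation is not its own neighbor

[sorted_d2, order] = sort(d2, 2);         % neighbors of each row, nearest first
rows = repmat((1:n)', 1, m);
cols = order(:, 1:m);
W = sparse(rows(:), cols(:), 1/m, n, n);  % each row sums to one
```

The result is generally asymmetric: observation i can have j among its m nearest neighbors without the reverse holding, which matters for the estimators that accept only symmetric matrices.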

Third, one computes the log-determinants for a grid of autoregressive parameters (prespecified by the routine as a default or specified by the user as an option). Determinant computations proceed faster for symmetric matrices. You must choose the appropriate log-determinant routines for the type of spatial weight matrix you have specified. Computing the log-determinants is slower than estimation but only needs to be done when changing the spatial weight matrix. For example, one can use the same weight matrix and log-determinant files when exploring transformations or specifications of the dependent and independent variables (for the same observations).

Fourth, pick a statistical routine to run given the data matrices, the spatial weight matrix, and the log-determinant vector. One can choose among conditional autoregressions (CAR), simultaneous autoregressions (SAR), matrix exponential spatial specifications (MESS), mixed regressive spatially autoregressive estimators (which include pure autoregressive models and spatially lagged independent variable models as special cases), and OLS. In addition, one can explore multivariate spatiotemporal and multivariate estimation. These routines require little time to run. One can change models, weightings, and transformations and reestimate in the vast majority of cases without rerunning the spatial weight matrix or log-determinant routines (you may need to add another simple Jacobian term when performing weighting or transformations of the dependent variables). This aids interactive data exploration.
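The division of labor between the third and fourth steps can be sketched with built-in functions only. The example below is an illustration under stated assumptions, not the toolbox's implementation. The weight matrix is a hypothetical chain of neighbors, the data are simulated, and the likelihood is the standard concentrated log-likelihood of a model with a spatially lagged dependent variable, ln|I - αD| - (n/2) ln(SSE(α)/n), up to a constant.

```matlab
% Minimal sketch (not toolbox code): compute the log-determinant grid once,
% then reuse it to profile the likelihood for different dependent variables.
n = 100;
A = spdiags(ones(n,2), [-1 1], n, n);             % hypothetical chain of neighbors
D = spdiags(1./full(sum(A,2)), 0, n, n) * A;      % row-stochastic weight matrix

% Step 3: log-determinants over a grid of autoregressive parameters.
alphas = (0:0.01:0.99)';
lndet = zeros(size(alphas));
for k = 1:numel(alphas)
    [L, U, P] = lu(speye(n) - alphas(k)*D);
    lndet(k) = sum(log(abs(diag(U))));
end

% Illustrative data generated with a known alpha of 0.6 (deterministic "noise").
t = (1:n)';
X = [ones(n,1) sin(t/3)];
y1 = (speye(n) - 0.6*D) \ (X*[1; 2] + 0.5*cos(7*t));
y2 = y1.^2;                  % a transformed dependent variable (Jacobian term ignored)

% Step 4: concentrated log-likelihood (up to a constant) on the same grid.
% The residuals of y and D*y on X give the SSE for every alpha at once.
resid = @(v) v - X*(X\v);
sse = @(y) sum((resid(y)*ones(1,numel(alphas)) - resid(D*y)*alphas').^2, 1)';
proflik = @(y) lndet - (n/2)*log(sse(y)/n);

[best1, i1] = max(proflik(y1));
[best2, i2] = max(proflik(y2));                   % no new log-determinants needed
fprintf('alpha estimates: %.2f and %.2f\n', alphas(i1), alphas(i2));
```

The loop over the grid is the expensive part and depends only on the weight matrix. Each call to proflik afterwards costs two small regressions, which is why respecifying the variables is cheap.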

Fifth, these procedures provide a wealth of information. Many of these routines yield the profile likelihood in the autoregressive parameter for each submodel (corresponding to the deletion of individual variables or the spatial term). All of the inference, even for the OLS routine, uses likelihood ratio statistics in the form of signed root deviances. This is just the square root of twice the difference in log-likelihoods, given the sign of the parameter estimate. It has a t-like interpretation (Chen and Jennrich (1996)). The use of signed root deviances (SRDs) facilitates comparisons among different models.
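As a worked illustration of the definition, the sketch below computes a signed root deviance from two log-likelihoods. The function handle srd_fun is a hypothetical helper, not a toolbox function. The two values are the doubly stochastic profile log-likelihoods reported later in this document for ρ = 0.9 and ρ = 1, read with four decimal places.

```matlab
% Minimal sketch of a signed root deviance (SRD).
% loglik_full, loglik_restricted: maximized log-likelihoods with and without
% the restriction; estimate: the parameter estimate that supplies the sign.
srd_fun = @(loglik_full, loglik_restricted, estimate) ...
    sign(estimate) * sqrt(2*(loglik_full - loglik_restricted));

% Worked arithmetic with the doubly stochastic profile log-likelihoods
% reported later in this document for rho = 0.9 and rho = 1.
loglik_rho_090 = -230828.6413;
loglik_rho_100 = -233434.5371;
srd_rho = srd_fun(loglik_rho_090, loglik_rho_100, 0.90);
fprintf('SRD for rho relative to rho = 1: %.4f\n', srd_rho);
```

The result, about 72.19, agrees with the signed root deviance listed for ρ (relative to ρ = 1) in the OLS versus MESS table of Chapter 3.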

2.6 Included Examples

The Spatial Statistics Toolbox comes with many examples. These are found in the subdirectories under spatial_toolbox_2\examples. To run the examples, change the directory in Matlab into the many subdirectories that illustrate individual routines. Look at the documentation in each example directory for more detail. Almost all of the specific models have examples. In addition, the simulation routine examples serve as minor Monte Carlo studies, which also help verify the functioning of the estimators. The examples use the 3107 observation dataset from the Pace and Barry (1997) Geographical Analysis article.

2.7 Included Datasets

The spatial_toolbox_2\datasets subdirectory contains subdirectories with individual data sets in Matlab file formats as well as their documentation. The data sets include example programs and output. Note, due to the many improvements incorporated into the Spatial Statistics Toolbox over time, the running times have greatly improved over those in the articles.

These data sets should provide a good starting point for exploring applications of spatial statistics.

2.8 Included Manuscripts

In the manuscript subdirectory, we provide pdf versions of the Geographical Analysis 1997 and 2000 articles. I would like to thank the publishers (Ohio State Press and Elsevier) for having given us copyright permission to distribute these works. One can also go to www.spatial-statistics.com to access some other articles (e.g., the Linear Algebra and its Applications article, which proposed the Monte Carlo log-determinant estimator).

Chapter 3

A Brief Selected Tour of the Toolbox

The weight matrix specifies the dependence among observations. One form of weight matrix (Delaunay) uses the notion of contiguity to specify dependence, as depicted in Figure 3.1.

[Figure: "Delaunay Nearest Neighbor Graph". Horizontal axis: Longitude in Decimal Degrees. Vertical axis: Latitude in Decimal Degrees.]

Figure 3.1: Connections among Counties via Delaunay

Note the somewhat strange behavior of connections to outlying observations in Figure 3.1. This arises from the geometric nature of contiguity. Using nearest neighbors based upon some metric can avoid this, as shown in Figure 3.2.

[Figure: "8 Nearest Neighbor Graph". Horizontal axis: Longitude in Decimal Degrees. Vertical axis: Latitude in Decimal Degrees.]

Figure 3.2: Connections among Counties via Eight Nearest Neighbors

Using only nearby observations implies that the weight matrix has mainly zeros, or is sparse, as shown in Figure 3.3.

[Figure: "Original Non-zero Weight Pattern for Delaunay". Both axes: Observation Number.]

Figure 3.3: Plot of Non-zeros of Delaunay Weight Matrix

This becomes even more apparent when reordering the observations, as in Figure 3.4.

[Figure: "Permuted Non-zero Weight Pattern for Delaunay". Both axes: Observation Number.]

Figure 3.4: Plot of Non-zeros of Permuted Delaunay Weight Matrix

Sparsity, as well as finding an appropriate ordering, is key in quickly computing the log-determinants used in maximum likelihood. The toolbox has functions for exact computation of the log-determinants (actually interpolation of exact computations at various points). However, users can select approximations as well, which depend only on sparsity and not upon orderings. The quadratic Chebyshev is the fastest and most approximate technique (Figure 3.5).
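The roles of sparsity and ordering can be explored with built-in functions alone. The sketch below is not toolbox code. It assumes a made-up scatter of 300 points, builds a symmetric contiguity pattern from a Delaunay triangulation, and reorders the observations with reverse Cuthill-McKee (symrcm). That ordering is chosen here for illustration, and the toolbox may use a different one.

```matlab
% Minimal sketch (not toolbox code): sparsity and ordering of a Delaunay-based
% neighbor matrix, using only built-in functions.
n = 300;
k = (1:n)';
xcoord = mod(k*0.6180339887, 1) + 0.001*sin(13*k);   % illustrative, deterministic scatter
ycoord = mod(k*0.4142135624, 1) + 0.001*cos(17*k);

tri = delaunay(xcoord, ycoord);           % each row holds the 3 vertices of a triangle
edges = [tri(:,[1 2]); tri(:,[2 3]); tri(:,[3 1])];
A = sparse(edges(:,1), edges(:,2), 1, n, n);
A = spones(A + A');                       % symmetric 0/1 contiguity pattern

p = symrcm(A);                            % reverse Cuthill-McKee ordering
Ap = A(p, p);                             % same matrix, observations reordered

[i0, j0] = find(A);
[i1, j1] = find(Ap);
fprintf('share of non-zeros: %.4f\n', nnz(A)/n^2);
fprintf('bandwidth before %d, after %d\n', max(abs(i0-j0)), max(abs(i1-j1)));
% spy(A) and spy(Ap) would display patterns like those in Figures 3.3 and 3.4.
```

Reordering leaves the number of non-zeros unchanged. Its effect is to gather them near the diagonal, which tends to limit fill-in when the matrix is factored.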

[Figure: "Exact log-determinant, Chebyshev approximation, Taylor bounds". Horizontal axis: α. Vertical axis: log-determinants. Legend: Taylor Lower, Chebyshev, Taylor Upper, Exact.]

Figure 3.5: Plot of Exact log-determinant with Chebyshev Approximation and Taylor Bounds

The Chebyshev approximation appears quite good for positive, moderate values of the dependence parameter but could use improvement for materially negative values of the spatial dependence parameter. Fortunately, such negative values seem rare in practice.

The Monte Carlo log-determinant estimator is still quite fast and is more accurate (Figure 3.6).
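The idea behind series-based estimators of this kind can be sketched as follows. The example relies on the identity ln|I - αD| = -Σ_k α^k tr(D^k)/k, valid when |α| < 1 and the eigenvalues of D do not exceed one in magnitude. A Monte Carlo variant replaces each trace by n(x'D^k x)/(x'x) for random normal vectors x. This is an illustration under stated assumptions, not the toolbox implementation. The matrix D, the number of terms q, and the number of draws are all made up.

```matlab
% Minimal sketch of the idea behind series-based log-determinant estimators.
% Not the toolbox implementation; D is a small hypothetical weight matrix.
n = 200;
A = spdiags(ones(n,2), [-1 1], n, n);
D = spdiags(1./full(sum(A,2)), 0, n, n) * A;      % row-stochastic chain neighbors
alpha = 0.5;
q = 40;                                           % number of series terms kept

% Exact value for comparison.
[L, U, P] = lu(speye(n) - alpha*D);
lndet_exact = sum(log(abs(diag(U))));

% Truncated series with exact traces: -sum_k alpha^k * tr(D^k) / k.
lndet_series = 0;
Dk = speye(n);
for k = 1:q
    Dk = Dk*D;                                    % D^k, still sparse for small k
    lndet_series = lndet_series - alpha^k * full(sum(diag(Dk))) / k;
end

% Monte Carlo version: replace tr(D^k) by n*(x'*D^k*x)/(x'*x) for random x,
% which needs only sparse matrix-vector products, and average over draws.
ndraws = 30;
draws = zeros(ndraws, 1);
for r = 1:ndraws
    x = randn(n, 1);
    v = x;
    for k = 1:q
        v = D*v;                                  % v = D^k * x
        draws(r) = draws(r) - n * alpha^k * (x'*v) / (k * (x'*x));
    end
end
fprintf('exact %.4f, series %.4f, Monte Carlo %.4f (std. err. %.4f)\n', ...
    lndet_exact, lndet_series, mean(draws), std(draws)/sqrt(ndraws));
```

The spread of the individual draws gives a rough sense of the estimator's sampling error, in the spirit of the limits plotted in Figure 3.6.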

[Figure: "Exact log-determinant, Monte Carlo approximation and confidence limits". Horizontal axis: α. Vertical axis: log-determinants. Legend: Lower Confidence, Monte Carlo, Upper Confidence, Exact.]

Figure 3.6: Plot of Exact log-determinant with Monte Carlo Approximation and Limits

To see the effects of exact versus approximate log-determinant computations, consider Tables 3.1 and 3.2, using the 3107 county election data. The estimated autoregressive parameter is only off by 0.01 from using the approximation. The approximate method also uses likelihood dominance inference, which results in a lower bound to the signed root deviances. As shown by the tables, the likelihood dominance SRDs are smaller in magnitude than the exact SRDs. However, they can still document statistical significance for many variables and thus can prove useful in many circumstances.

Variables         Beta Estimates   Signed Root Deviances   PR of Higher SRDS

Voting Pop           -0.7784            -31.1227                0.0000
Education             0.2696              9.3438                0.0000
Home Ownership        0.4530             26.2661                0.0000
Income                0.0071              0.0035                0.9972
Intercept             0.5443              7.8773                0.0000
Alpha                 0.7250             32.7719                0.0000

Table 3.1: SAR Estimation Results using Chebyshev lndet approximation and likelihood dominance inference

Variables         Beta Estimates   Signed Root Deviances   PR of Higher SRDS

Voting Pop           -0.7806            -32.1748                0.0000
Education             0.2746             11.9438                0.0000
Home Ownership        0.4525             27.0073                0.0000
Income                0.0047              0.2187                0.8269
Intercept             0.5528              9.4908                0.0000
Alpha                 0.7150             34.8648                0.0000

Table 3.2: SAR Estimation results using exact lndet

Some of the routines not only yield the maximum of the likelihood function but also profile likelihoods in the dependence parameter α by model, as shown in Figure 3.7.

[Figure: "Profile likelihoods vs α for global model and delete-1 submodels". Horizontal axis: Dependence parameter (α). Vertical axis: Profile log-likelihood. Legend: Global likelihood, Voting Pop, Education, Home Ownership, Income, Intercept.]

Figure 3.7: SAR Profile Likelihoods by Model

The toolbox includes SAR, CAR, and MESS error models, as well as MESS, closest neighbor, and MIX autoregressive models, as shown in Table 3.3 and Table 3.4.

Variables              b OLS    b closest    b MESS     b Mix

Voting Pop            -0.8464    -0.7489    -0.7693    -0.7298
Education              0.5167     0.2899     0.1941     0.1818
Home Ownership         0.4291     0.4457     0.4832     0.4580
Income                -0.1439    -0.0332     0.0423     0.0427
Lag Voting Pop         0.0000     0.1878     0.4186     0.4616
Lag Education          0.0000     0.0975     0.1205     0.0450
Lag Home Ownership     0.0000    -0.1569    -0.3249    -0.3299
Lag Income             0.0000    -0.1079    -0.1802    -0.1410
Intercept              0.9814     0.7495     0.5636     0.4205
α                      0.0000     0.3352     1.4628     0.6550

Table 3.3: Estimates on Election Data

The closest neighbor approach is intermediate to a non-spatial approach (OLS) and a full spatial approach (MESS) or the approximate mixed routine. Note the close agreement between MESS and the mixed routine. Note, OLS in this case uses the spatial averages of the basic independent variables as additional independent variables.

None of these operations takes long for the election data (Table 3.5).

Variables              b OLS    b closest    b MESS     b Mix

Voting Pop           -34.7211   -30.5020   -28.8899   -29.4018
Education             30.8928    13.4922     7.6654     7.7169
Home Ownership        23.0066    25.4083    26.8836    27.3515
Income                -7.2373    -1.5742     1.7478     1.8952
Lag Voting Pop         0.0000     7.7600    10.7656    12.5507
Lag Education          0.0000     4.5090     3.9785     1.5705
Lag Home Ownership     0.0000    -9.4245   -11.1259   -12.1200
Lag Income             0.0000    -5.1399    -5.5551    -4.6603
Intercept             20.2782    16.0533    10.7480     8.4626
α                      0.0000    24.2762    31.5948    33.7097

Table 3.4: Signed Root Deviances using Election Data

Operation                    Timings in seconds

OLS                               0.0150
Closest AR                        0.0470
MESS                              0.0940
Exact Log-det                     0.8750
Mix                               0.0470
Doubly Stochastic Scaling         0.2660

Table 3.5: Timing for operations to Election data (n=3107)

In addition to global models, the toolbox has spatial autoregressive local estimation (SALE). The user chooses the bandwidth (subsample size) by examining cross-validation error at the fringe observations, as in Figure 3.8, or at each observation, as in Figure 3.9.

[Figure: "Smoothed SALE Recursive Residuals of Fringe Observations". Horizontal axis: Number of Local Observations. Vertical axis: Absolute Error.]

Figure 3.8: Plot of Fringe prediction error versus subsample size

[Figure: "Smoothed SALE Initial Holdout Residuals". Horizontal axis: Number of Local Observations. Vertical axis: Absolute Error.]

Figure 3.9: Plot of prediction error on center of area versus subsample size

Usually there is spatial dependence even in small subsamples, as shown in Figure 3.10.

[Figure: "SALE Autoregressive Parameter Estimates". Horizontal axis: Number of Local Observations. Vertical axis: Median Autoregressive Parameter Estimates.]

Figure 3.10: Spatial dependence parameter estimate versus subsample size

Local estimation leads to spatially varying parameter estimates, such as those shown in Figure 3.11.

[Figure: map of β3. Horizontal axis: Longitude. Vertical axis: Latitude.]

Figure 3.11: Map of influence of homeownership on voting

In addition, a user can obtain an idea of the sensitivity of parameter estimates to spatial variation, such as summarized in Table 3.6.

Percentiles               0        10        25        50        75        90       100

α                      0.3100    0.4300    0.5500    0.6300    0.6700    0.7100    0.7800
Voting Pop            -1.0079   -0.8875   -0.8075   -0.6962   -0.5967   -0.5299   -0.4066
Education             -0.1871    0.0517    0.1142    0.1788    0.3135    0.4923    0.7118
Home Ownership         0.1530    0.2964    0.3419    0.4253    0.5161    0.7660    0.8461
Income                -0.2576   -0.1644   -0.1072   -0.0473    0.0628    0.1448    0.2325
Lag Voting Pop        -0.1525    0.0655    0.2868    0.4527    0.5490    0.6523    0.8806
Lag Education         -0.5734   -0.3666   -0.1812   -0.0299    0.0677    0.1322    0.3318
Lag Home Ownership    -0.7010   -0.4499   -0.3278   -0.2337   -0.1100   -0.0382    0.1880
Lag Income            -0.4764   -0.2374   -0.1733   -0.0897    0.0216    0.0862    0.2786
Intercept             -0.1605    0.0473    0.1463    0.2847    0.6612    1.0772    1.8902

Table 3.6: Distribution of local estimates

Method                       Time in seconds

OLS                               0.1560
Closest AR                        0.1090
MESS                              0.6090
Approximate Mix                   0.3280
Doubly Stochastic Scaling         7.1880
Delaunay Weight Matrix            5.4220

Table 3.7: Times for Different Methods for 57647 Observations

To provide an idea about the performance of the techniques for a larger problem, we estimated a simple hedonic regression over US census tracts. This resulted in 57647 observations. Table 3.7 shows the timings for some of the various operations. All of these seem quite fast. A user can find a Delaunay weight matrix and estimate a spatial autoregression in under 10 seconds on desktop machines.

Just selecting a particular weight matrix seems arbitrary. Here we take 30 nearest neighbors and weight these geometrically. A ρ of 1 indicates no decline in the weight given to further neighbors relative to closer ones, while a ρ of 0.5 would give half the weight to the second nearest neighbor as it would to the first nearest neighbor. Thus, ρ allows changes in the effective number of neighbors used without actually varying the number of neighbors. It often makes sense in this approach to set the number of neighbors to a fairly high level (such as 30). Table 3.8 shows the effect of varying ρ on the profile log-likelihood. Small changes in ρ make large changes in the profile log-likelihood, evidence of the importance of this parameter.
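The geometric weighting can be written out in a few lines. In the sketch below, the first neighbor receives a raw weight of 1 and each further neighbor ρ times the weight of the one before. The normalization to unit sum and the inverse sum of squared weights as an informal "effective number of neighbors" are assumptions made for illustration, not definitions taken from the toolbox.

```matlab
% Minimal sketch: geometric decline of neighbor weights governed by rho.
% The exact normalization used by the toolbox is an assumption here.
m = 30;                                   % number of nearest neighbors
rhos = [1 0.9 0.5];
for r = 1:numel(rhos)
    w = rhos(r).^(0:m-1);                 % 1st neighbor gets 1, 2nd rho, 3rd rho^2, ...
    w = w / sum(w);                       % scale so the weights sum to one
    n_eff = 1 / sum(w.^2);                % one informal measure of "effective" neighbors
    fprintf('rho = %.1f: w2/w1 = %.2f, effective neighbors = %.1f\n', ...
        rhos(r), w(2)/w(1), n_eff);
end
```

With ρ = 1 all 30 neighbors count equally. As ρ falls, the weight concentrates on the closest few, even though m itself never changes.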

It did not take overly long to find the nearest neighbors or the optimal ρ, even with doubly stochastic scalings of the weight matrix, as shown by Table 3.9.

ρ           log-likelihood

0.8000      -231895.0426
0.8500      -230968.8487
0.9000      -230828.6413
0.9500      -231699.1089
1.0000      -233434.5371

Table 3.8: Likelihoods across ρ for Doubly Stochastic Scaling

Operation                       Time in seconds

NN computation                      31.4060
RS Time to find optimum ρ           56.7810
DS Time to find optimum ρ           94.8750

Table 3.9: Times for Optimizing the Likelihood over ρ for Both Scalings

These operations lead to a table of profile log-likelihoods (Table 3.10) across weight matrices. Examining the MESS log-likelihoods over ρ and Delaunay, and contrasting them with the log-likelihood from applying OLS to the basic non-spatial independent variables, demonstrates that even a suboptimal choice of ρ or Delaunay still dominates the use of an aspatial model in this case, and that optimizing over ρ dominates an arbitrary choice of weight matrices (Table 3.10). Moreover, the doubly stochastic (DS) scaling helped greatly for this example over the regular scaling (RS). In addition, inspection of aspatial OLS versus MESS with an optimal selection of ρ in Table 3.11 shows clear differences among the approaches. Note, the land area variable became insignificant after modeling space.

It is not difficult to estimate a spatial autoregression with over one million observations. In fact, the toolbox provides an example (big_one subdirectory) under the dataset directory whereby a one million observation spatial autoregression is estimated in just under 20 seconds. It took 13063 seconds to find the weight matrix, 6024 seconds to simulate the dependent variable, and 19.42 seconds to estimate the autoregression.

Variable                               Value

Aspatial likelihood (OLS)           -266505.1663
Closest Neighbor                    -243986.1676
RS Delaunay maximum likelihood      -244509.8257
RS Maximum likelihood across ρ      -254216.3172
RS Optimum ρ                              0.9000
DS Delaunay maximum likelihood      -235626.6213
DS Maximum likelihood across ρ      -230828.6413
DS Optimum ρ                              0.9000

Table 3.10: Likelihoods Across Doubly Stochastic and Regular Scalings

Variables                  b OLS     SRD OLS     b MESS     SRD MESS

Land area                 -0.0850    -96.8455    -0.0008     -0.7159
Pop                        0.1146     36.8592     0.0239     12.7630
Per cap Income             1.0837    208.2192     0.6786    152.7645
Age                       -0.1269    -34.2809    -0.1384    -45.0489
Lag Land area                                    -0.0178    -14.5650
Lag Pop                                           0.0165      5.2437
Lag Per cap Income                               -0.3702    -65.2683
Lag Age                                           0.1088     27.5807
Intercept                  1.2236     22.9837    -0.5988    -15.2122
α                                                 3.0986    253.2835
ρ (relative to ρ = 1)                             0.90       72.1927
Nearest Neighbors                                30
parameters                 5                     11

Table 3.11: OLS versus MESS Results Using Optimal ρ for Doubly Stochastic Scaling

Chapter 4

References

If you need to know more about spatial statistics or about some of the specific routines, you may wish to examine:

Anselin, Luc (1988). Spatial Econometrics: Methods and Models. Dordrecht: Kluwer Academic Publishers.

Barry, Ronald, and R. Kelley Pace. "A Monte Carlo Estimator of the Log Determinant of Large Sparse Matrices," Linear Algebra and its Applications, Volume 289, Number 1-3, 1999, p. 41-54.

Chen, Jian-Shen, and Robert Jennrich (1996). "The Signed Root Deviance Profile and Confidence Intervals in Maximum Likelihood Analysis," Journal of the American Statistical Association, Volume 91, Number 435, p. 993-998.

Christensen, Ronald (1991). Linear Models for Multivariate, Time Series, and Spatial Data. New York: Springer-Verlag.

Cressie, Noel A.C. (1993). Statistics for Spatial Data, Revised ed. New York: John Wiley.

Dubin, Robin A. (1988). "Estimation of Regression Coefficients in the Presence of Spatially Autocorrelated Error Terms," Review of Economics and Statistics, 70, 466-474.

Haining, Robert (1990). Spatial Data Analysis in the Social and Environmental Sciences. Cambridge.

LeSage, James, and R. Kelley Pace. "Spatial Dependence in Data Mining," Data Mining for Scientific and Engineering Applications, Edited by Robert L. Grossman, Chandrika Kamath, Philip Kegelmeyer, Vipin Kumar, and Raju R. Namburu, Kluwer Academic Publishing, 2001.

LeSage, James, and R. Kelley Pace. "Spatial Probit and Tobit," Spatial Statistics and Spatial Econometrics, Edited by Art Getis, Palgrave, 2003.

Li, Bin (1995). "Implementing Spatial Statistics on Parallel Computers," in Arlinghaus, S., ed., Practical Handbook of Spatial Statistics (CRC Press, Boca Raton), pp. 107-148.

Ord, J.K. (1975). "Estimation Methods for Models of Spatial Interaction," Journal of the American Statistical Association, 70, 120-126.

Pace, R. Kelley, and Ronald Barry (1997). "Fast CARs," Journal of Statistical Computation and Simulation, 59, p. 123-147.

Pace, R. Kelley, and Ronald Barry (1997). "Quick Computation of Regressions with a Spatially Autoregressive Dependent Variable," Geographical Analysis, 29, 232-247.

Pace, R. Kelley, and Dongya Zou. "Closed-Form Maximum Likelihood Estimates of Nearest Neighbor Spatial Dependence," Geographical Analysis, Volume 32, Number 2, April 2000, p. 154-172.

Pace, R. Kelley, Ronald Barry, O.W. Gilley, and C.F. Sirmans. "A Method for Spatial-temporal Forecasting with an Application to Real Estate Prices," International Journal of Forecasting, Volume 16, Number 2, April-June 2000, p. 229-246.

Pace, R. Kelley, and James P. LeSage. "Semiparametric Maximum Likelihood Estimates of Spatial Dependence," Geographical Analysis, Volume 34, Number 1, January 2002, p. 75-90.

Pace, R. Kelley, and James LeSage. "Likelihood Dominance Spatial Inference," forthcoming, Geographical Analysis, in January 2003.

Pace, R. Kelley, and James LeSage. "Spatial Autoregressive Local Estimation," Spatial Statistics and Spatial Econometrics, Edited by Art Getis, Palgrave, 2003.

Pace, R. Kelley, and James LeSage. "Chebyshev Approximation of Log-determinants of Spatial Weight Matrices," forthcoming in Computational Statistics and Data Analysis.

Ripley, Brian D. (1981). Spatial Statistics. New York: John Wiley.

www.spatial-statistics.com


List of Figures

3.1 Connections among Counties via Delaunay
3.2 Connections among Counties via Eight Nearest Neighbors
3.3 Plot of Non-zeros of Delaunay Weight Matrix
3.4 Plot of Non-zeros of Permuted Delaunay Weight Matrix
3.5 Plot of Exact log-determinant with Chebyshev Approximation and Taylor Bounds
3.6 Plot of Exact log-determinant with Monte Carlo Approximation and Limits
3.7 SAR Profile Likelihoods by Model
3.8 Plot of Fringe prediction error versus subsample size
3.9 Plot of prediction error on center of area versus subsample size
3.10 Spatial dependence parameter estimate versus subsample size
3.11 Map of influence of homeownership on voting

Chapter 1

Why the Toolbox Exists

Individual data arise at a time and a place. Randomization can destroy, and aggregation can obscure, spatial and temporal information, but the original data points potentially exhibit spatiotemporal dependence. Often, simple models fitted to these data will produce spatially, temporally, or spatiotemporally correlated errors, provided reality is more complex than the model. Ignoring the spatial, temporal, or spatiotemporal dependence among errors results in inefficient parameter estimation and biased inference, and it discards information that can greatly improve prediction accuracy. In addition, if the data generating process is an autoregressive one, ignoring the dependence will lead to biased estimates and inference.

Historically, spatial statistics software floundered with problems involving even thousands of observations. For example, Li (1995) required 8515 seconds to compute a 2500 observation spatial autoregression using an IBM RS6000 Model 550 workstation. The culprit for the difficulty lies in the maximum likelihood estimator's need for the determinant of the n by n matrix of the covariances among the spatially scattered observations.

The conventional computational approach relies upon eigenvalues (Ord (1975)). Even with faster computers, the calculation of eigenvalues requires substantial time and memory. For example, on a 1700+ Athlon it requires 2475 minutes to compute the eigenvalues of the spatial weight matrix based on 30 nearest neighbors. The resulting matrix takes 77 megabytes of storage as well.
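To make the role of the eigenvalues concrete, the identity behind the eigenvalue approach can be written as follows. The symbols are assumptions for illustration and are not defined in this chapter: D is the n by n spatial weight matrix, λ_i are its eigenvalues (taken to be real, as they are when D is similar to a symmetric matrix), and α is the autoregressive parameter.

```latex
\ln\left| I - \alpha D \right| \;=\; \sum_{i=1}^{n} \ln\left( 1 - \alpha \lambda_i \right)
```

Once the eigenvalues are known, evaluating the log-determinant at any α costs only n logarithms. The expense lies in obtaining the eigenvalues of a dense n by n matrix in the first place.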

However, many problems of practical importance generate large spatial data sets. Obvious examples include census data (over 200,000 block groups for the US) and housing sales (many millions sold per year). The Spatial Statistics Toolbox addresses the need to quickly estimate large problems. In contrast to the eigenvalue approach, by focusing upon direct computation of determinants and using sparsity, the same operation takes 338 seconds in the toolbox. Using a Chebyshev approximation reduces the time to under 0.2 seconds. As the eigenvalue computations rise at the cube of n while the log-determinant functions in the toolbox rise with n or n ln(n), large problems further increase the performance of the toolbox relative to conventional approaches.
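The contrast between the two routes can be illustrated with a minimal sketch that uses only base Matlab functions, not the toolbox routines. The lattice weight matrix, the value of alpha, and all variable names are assumptions chosen so the example is small and repeatable.

```matlab
% Sketch: log-determinant of (I - alpha*D) by two routes, using base Matlab only.
% The lattice weight matrix is a stand-in, not a toolbox weight matrix.
m = 15;                                   % lattice is m by m, so n = 225
n = m * m;
e1 = ones(m, 1);
T = spdiags([e1 e1], [-1 1], m, m);       % chain adjacency in one dimension
C = kron(speye(m), T) + kron(T, speye(m));        % rook contiguity, symmetric and binary
D = spdiags(1 ./ full(sum(C, 2)), 0, n, n) * C;   % scale each row to sum to one
alpha = 0.7;

% Conventional route: dense eigenvalues, cubic time and quadratic memory in n.
lambda = real(eig(full(D)));              % D is similar to a symmetric matrix
lndet_eig = sum(log(1 - alpha * lambda));

% Sparse route: LU factorization of the sparse matrix I - alpha*D.
[L, U, P, Q] = lu(speye(n) - alpha * D);
lndet_lu = sum(log(abs(full(diag(U)))));

fprintf('eigenvalue route: %.10f\n', lndet_eig);
fprintf('sparse LU route:  %.10f\n', lndet_lu);
```

Both routes should agree to rounding error. Only the sparse route avoids forming and storing the dense n by n matrix.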

The speed of Spatial Statistics Toolbox 2.0 permits users to explore alternative specifications, spatial and aspatial, in a timely fashion. For example, one can estimate local spatial autoregressions (spatial autoregressive local estimation, or SALE) as in Pace and LeSage (forthcoming). In the SALE example provided, the toolbox was able to estimate a sequence of over 400 spatial autoregressions around each of 3107 points in under four minutes.

The Spatial Statistics Toolbox 2.0 conserves on memory as well. It is not difficult to estimate a spatial autoregression with over one million observations. In fact, the toolbox provides an example under the dataset directory whereby a one million observation spatial autoregression is estimated in just under 20 seconds. It took 130.63 seconds to find the weight matrix, 60.24 seconds to simulate the dependent variable, and 19.42 seconds to estimate the autoregression.

Relative to the previous incarnation, the Spatial Statistics Toolbox 2.0 introduces multidimensional and spatiotemporal weight matrices, new scalings of the weight matrices (doubly stochastic), faster computation of exact log-determinants (actually interpolated from exact computations of a smooth function), a couple of approximations to log-determinants, and some new computationally and theoretically interesting models such as the aforementioned SALE and the matrix exponential spatial specification (MESS).

Chapter 2

Using the Toolbox

2.1 Hardware and Software Requirements

The toolbox has been developed under Matlab 6.5 and tested under 6.5 and 6.1 across W2K and Windows ME. However, some routines run faster under 6.5 than under 6.1. The total installation takes around 15 megabytes. The routines have been tested on PC compatibles. The routines should run on other platforms, but have not been tested on non-PC compatibles.

2.2 Installation

For users who can extract files from zip archives, follow the instructions for your product (e.g., Winzip) and extract the files into the drive in which you wish to install the toolbox. The installation program will create the following directory structure in whichever drive you choose:

drive_letter:\space
+---articles
+---datasets
    +---big_one
    +---election
    +---housing
    +---space_time
+---documentation
+---examples
    +---CAR
    +---CAR_SIM
    +---CHEBYSHEV
    +---CHEBYSHEV_SEQUENCE
    +---CLOSEST_NEIGHBOR
    +---DELAUNAY2
    +---DOUBLY
    +---LNDET_INTERP
    +---LNDET_INTERP_SEQUENCE
    +---LNDET_MONTECARLO
    +---MESS_AR
    +---MESS_CAR
    +---MESS_SIM
    +---MIXED
    +---MULTIVARIATE
    +---NEAREST_NEIGHBORS
    +---OLS
    +---SALE
    +---SAR
    +---SAR_SIM
    +---SPACE_TIME
+---fmex_code
+---functions
+---fmex_code

To see whether the installation has succeeded, change the directory in Matlab to one of the supplied examples and type run x_*.m. For example, go to the examples\car subdirectory and type run x_car2_ga1. The use of * in this context serves as a wildcard or placeholder for the intervening characters. This should cause the example script x_car2_ga1.m to run.

The multidimensional weight matrix routines require the installation of TSTOOL. Go to www.physik3.gwdg.de/tstool to find this useful package and follow the instructions to install it in Matlab. If you have Matlab 6.5, you can easily add the relevant paths to the mex functions by going to the File menu, selecting Set Path, selecting Add Folder in the resulting applet, and adding the paths so Matlab can find the functions. On my machine I added:

opentstool\tstoolbox\mex\dll
opentstool\tstoolbox\mex
opentstool\tstoolbox\utils
tstool\opentstool\tstoolbox\gui
tstool\opentstool\tstoolbox

2.3 Help and documentation

All the example scripts should follow the form x_*.m (e.g., x_car2_ga1.m, x_sar2_ga1.m). Functions follow the form f*.m (e.g., fsar2.m, fols2.m). Matlab matrix files (which may include multiple matrices) have the form *.mat.

If you wish to access the functions in other directories, you can define a path to *:\space and *:\space\functions using the Set Path command under the File menu in Matlab 6.5, where * refers to the drive where the spatial statistics toolbox resides, and provided space is the name of the directory you chose. This is the same procedure outlined in installing TSTOOL. On my machine I added:

space
space\functions

As in previous versions of Matlab, users can employ the addpath command to add these directories to the search path.
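A command-line equivalent might look like the following sketch. The install location C:\space is hypothetical, so substitute the drive and directory you actually chose. The check with exist only keeps the example from warning when the directory is absent.

```matlab
% Sketch: add the toolbox directories to the search path from the command line.
% 'C:\space' is a hypothetical install location, not one fixed by the toolbox.
toolbox_root = 'C:\space';
toolbox_dirs = {toolbox_root, fullfile(toolbox_root, 'functions')};
for k = 1:numel(toolbox_dirs)
    if exist(toolbox_dirs{k}, 'dir')
        addpath(toolbox_dirs{k});         % directory found, so add it to the path
    else
        fprintf('Directory not found, skipped: %s\n', toolbox_dirs{k});
    end
end
```

Placing these lines in a startup file would make the path available in each session, which is one way to avoid repeating the menu steps.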

Both the functions and examples are well-documented internally. One can use the help functions of Matlab in the usual fashion when the toolbox is on the search path or the functions are in the current directory. If the toolbox is on the search path, a user can type help space and see the functions contained in the Spatial Statistics Toolbox 2 (provided space is the name of the directory you chose). If the toolbox is on the search path or the function lies in the current directory, a user can obtain help on the function by typing help function_name or doc function_name.

The examples provide the best means of understanding the toolbox functions. Each subdirectory under the examples subdirectory contains all the files needed for that particular example (except the multivariate subdirectory, which requires installation of the TSTOOL package). Users can substitute their data for the example data and see how the functions perform for their problem. Note, a few of these functions take several minutes to run (notably the SALE examples). Most, however, are quite fast.

2.4 Known Limitations

None. All are unknown.

2.5 Tips on Using the Toolbox

Typical sessions with the toolbox proceed in four steps. First, import the data into Matlab. If the file is fixed-format or tab-delimited ASCII, the command load name.extension (whatever that name.extension may be) will load the file contents into memory as a Matlab variable called name. Saving this will convert it into a Matlab file (e.g., save name name will save the variable name into the file name.mat). Failure to specify both names will result in saving all defined variables into one file. The data would include the dependent variable, the independent variables, and the locational coordinates. For example, suppose the user has the dependent variable in the text file y.txt. Issuing the command load y.txt results in a Matlab variable y. Issuing the command save y y results in saving y.mat in the current directory.
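The import step can be tried end to end with the short sketch below. The file y.txt is written first, in a temporary directory, only so that the example is self-contained; in practice the text file would already exist. The variable names are assumptions for illustration.

```matlab
% Sketch: import an ASCII file and save it in Matlab format.
work_dir = tempname;                   % temporary working directory
mkdir(work_dir);
old_dir = cd(work_dir);

y_demo = [1.5; 2.5; 3.5];              % stand-in for a dependent variable
save('y.txt', 'y_demo', '-ascii');     % write a plain ASCII column to y.txt
clear y_demo

load y.txt                             % creates the variable y from y.txt
save('y.mat', 'y');                    % same effect as typing: save y y
clear y
load('y.mat');                         % restores y from the Matlab file
disp(y');

cd(old_dir);                           % return to the original directory
```

Naming the variable explicitly in the save command is what keeps the other variables in the workspace out of the file.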

Second, create a spatial weight matrix. Users can choose weight matrices based upon nearest neighbors (symmetric or asymmetric), multidimensional symmetric neighbors, spatiotemporal neighbors (asymmetric), and Delaunay triangles (symmetric). In almost all cases, one must make sure each location is unique. One may need to add slight amounts of random noise to the locational coordinates to meet this restriction (some of the latest versions of Matlab do this automatically; do not dither the coordinates in this case). Note, some estimators only use symmetric matrices. You can specify the number of neighbors used and their relative weightings.
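The idea of a Delaunay-based weight matrix can be sketched with base Matlab alone. This is not the toolbox routine; the five coordinates and the variable names are assumptions chosen so the resulting triangulation is easy to check by hand (four corner points each joined to a center point and to two adjacent corners).

```matlab
% Sketch: symmetric contiguity matrix from a Delaunay triangulation, then row scaling.
xc = [0; 1; 1; 0; 0.5];                   % hypothetical x coordinates
yc = [0; 0; 1; 1; 0.5];                   % hypothetical y coordinates
n = numel(xc);

tri = delaunay(xc, yc);                   % each row lists the vertices of one triangle
i = [tri(:, 1); tri(:, 2); tri(:, 3)];    % the three edges of every triangle
j = [tri(:, 2); tri(:, 3); tri(:, 1)];
C = sparse([i; j], [j; i], 1, n, n);      % both directions; repeated edges are summed
C = double(C > 0);                        % binary contiguity, symmetric by construction

D = spdiags(1 ./ full(sum(C, 2)), 0, n, n) * C;   % each row of D sums to one
disp(full(C));
```

Row scaling generally destroys the symmetry of C, which matters for the estimators that require symmetric matrices.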

Note, the Delaunay spatial weight matrix leads to a concentration matrix or a variance-covariance matrix that depends upon only one parameter (α, the autoregressive parameter). In contrast, the nearest neighbor concentration matrices or variance-covariance matrices depend upon three parameters (α, the autoregressive parameter; m, the number of neighbors; and ρ, which governs the rate weights decline with the order of the neighbors, with the closest neighbor given the highest weighting, the second closest given a lower weighting, and so forth). Three parameters should make this specification sufficiently flexible for many purposes.

Third, one computes the log-determinants for a grid of autoregressive parameters (prespecified by the routine as a default or specified by the user as an option). Determinant computations proceed faster for symmetric matrices. You must choose the appropriate log-determinant routines for the type of spatial weight matrix you have specified. Computing the log-determinants is slower than estimation, but only needs to be done when changing the spatial weight matrix. For example, one can use the same weight matrix and log-determinant files when exploring transformations or specifications of the dependent and independent variables (for the same observations).

Fourth, pick a statistical routine to run given the data matrices, the spatial weight matrix, and the log-determinant vector. One can choose among conditional autoregressions (CAR), simultaneous autoregressions (SAR), matrix exponential spatial specifications (MESS), mixed regressive spatially autoregressive estimators (which include pure autoregressive models and spatially lagged independent variable models as special cases), and OLS. In addition, one can explore multivariate spatiotemporal and multivariate estimation. These routines require little time to run. One can change models, weightings, and transformations and reestimate in the vast majority of cases without rerunning the spatial weight matrix or log-determinant routines (you may need to add another simple Jacobian term when performing weighting or transformations of the dependent variables). This aids interactive data exploration.
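The division of labor between the third and fourth steps can be sketched as follows for a mixed regressive spatially autoregressive model, y = αDy + Xβ + ε. The sketch uses base Matlab rather than the toolbox functions, and the lattice weight matrix, the deterministic stand-in data, and all names are assumptions. The profile log-likelihood is computed up to an additive constant as ln|I − αD| − (n/2) ln SSE(α).

```matlab
% Sketch: log-determinants on a grid, then a profile likelihood over the same grid.
m = 12;  n = m * m;
e1 = ones(m, 1);
T = spdiags([e1 e1], [-1 1], m, m);
C = kron(speye(m), T) + kron(T, speye(m));        % lattice contiguity (stand-in)
D = spdiags(1 ./ full(sum(C, 2)), 0, n, n) * C;   % rows scaled to sum to one

% Deterministic stand-in data, so the example is repeatable.
t = (1:n)';
X = [ones(n, 1), sin(t), cos(0.3 * t)];
noise = 0.5 * sin(7.1 * t .^ 2);
alpha_true = 0.6;
y = (speye(n) - alpha_true * D) \ (X * [1; 2; -1] + noise);

% Step three: log-determinants over a grid (needed once per weight matrix).
alpha_grid = (0:0.01:0.99)';
lndet = zeros(size(alpha_grid));
for k = 1:numel(alpha_grid)
    [L, U, P, Q] = lu(speye(n) - alpha_grid(k) * D);
    lndet(k) = sum(log(abs(full(diag(U)))));
end

% Step four: residuals of y and D*y on X give SSE(alpha) for every grid point.
e0 = y - X * (X \ y);
ed = D * y - X * (X \ (D * y));
sse = sum((repmat(e0, 1, numel(alpha_grid)) - ed * alpha_grid') .^ 2, 1)';
proflik = lndet - (n / 2) * log(sse);             % up to an additive constant
[best, kmax] = max(proflik);
alpha_hat = alpha_grid(kmax);
fprintf('alpha_hat = %.2f\n', alpha_hat);
```

Changing y or X only requires repeating the last block, which is why the log-determinant vector can be stored and reused.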

Finally, these procedures provide a wealth of information. Many of these routines yield the profile likelihood in the autoregressive parameter for each submodel (corresponding to the deletion of individual variables or the spatial term). All of the inference, even for the OLS routine, uses likelihood ratio statistics in the form of signed root deviances. This is just the square root of twice the difference in log-likelihoods, given the sign of the parameter estimate. It has a t-like interpretation (Chen and Jennrich (1996)). The use of signed root deviances (SRDs) facilitates comparisons among different models.
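Written out, the signed root deviance for variable j takes the following form. The symbols are assumptions for illustration: L_full is the maximized log-likelihood of the full model, L_(-j) is the maximized log-likelihood of the submodel that deletes variable j, and the estimate of β_j comes from the full model.

```latex
\mathrm{SRD}_j \;=\; \operatorname{sign}\!\left(\hat{\beta}_j\right)
\sqrt{\, 2\left[\, L_{\mathrm{full}} - L_{(-j)} \,\right] \,}
```

Because the submodel is nested in the full model, the difference under the square root is nonnegative.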

2.6 Included Examples

The Spatial Statistics Toolbox comes with many examples. These are found in the subdirectories under spatial_toolbox_2\examples. To run the examples, change the directory in Matlab into one of the many subdirectories that illustrate individual routines. Look at the documentation in each example directory for more detail. Almost all of the specific models have examples. In addition, the simulation routine examples serve as minor Monte Carlo studies, which also help verify the functioning of the estimators. The examples use the 3107 observation dataset from the Pace and Barry (1997) Geographical Analysis article.

2.7 Included Datasets

The spatial_toolbox_2\datasets subdirectory contains subdirectories with individual data sets in Matlab file formats as well as their documentation. The data sets include example programs and output. Note, due to the many improvements incorporated into the Spatial Statistics Toolbox over time, the running times have greatly improved over those in the articles.

Hopefully, these data sets should provide a good starting point for exploring applications of spatial statistics.

2.8 Included Manuscripts

In the manuscript subdirectory we provide pdf versions of the Geographical Analysis 1997 and 2000 articles. I would like to thank the publishers (Ohio State Press and Elsevier) for having given us copyright permission to distribute these works. One can also go to www.spatial-statistics.com to access some other articles (e.g., the Linear Algebra and its Applications article, which proposed the Monte Carlo log-determinant estimator).

Chapter 3

A Brief Selected Tour of the Toolbox

The weight matrix specifies the dependence among observations. One form of weight matrix (Delaunay) uses the notion of contiguity to specify dependence, as depicted in Figure 3.1.

[Figure: "Delaunay Nearest Neighbor Graph". Horizontal axis: Longitude in Decimal Degrees. Vertical axis: Latitude in Decimal Degrees.]

Figure 3.1: Connections among Counties via Delaunay

Note the somewhat strange behavior of connections to outlying observations in Figure 3.1. This arises from the geometric nature of contiguity. Using nearest neighbors based upon some metric can avoid this, as shown in Figure 3.2.

[Figure: "8 Nearest Neighbor Graph". Horizontal axis: Longitude in Decimal Degrees. Vertical axis: Latitude in Decimal Degrees.]

Figure 3.2: Connections among Counties via Eight Nearest Neighbors

Using only nearby observations implies that the weight matrix has mainly zeros, or is sparse, as shown in Figure 3.3.

[Figure: "Original Non-zero Weight Pattern for Delaunay". Both axes: Observation Number.]

Figure 3.3: Plot of Non-zeros of Delaunay Weight Matrix

This becomes even more apparent when reordering the observations, as in Figure 3.4.

[Figure: "Permuted Non-zero Weight Pattern for Delaunay". Both axes: Observation Number.]

Figure 3.4: Plot of Non-zeros of Permuted Delaunay Weight Matrix

Sparsity, as well as finding an appropriate ordering, are key in quickly computing the log-determinants used in maximum likelihood. The toolbox has functions for exact computation of the log-determinants (actually interpolation of exact computations at various points). However, users can select approximations as well, which depend only on sparsity and not upon orderings. The quadratic Chebyshev is the fastest and most approximate technique (Figure 3.5).
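The effect of ordering can be seen in a small sketch with base Matlab. It uses the reverse Cuthill-McKee ordering from symrcm as one example of a bandwidth-reducing permutation; whether the toolbox uses this particular ordering is not stated here, so treat the choice as an assumption, along with the scrambled lattice matrix and the variable names.

```matlab
% Sketch: reordering leaves the log-determinant unchanged but narrows the band.
m = 10;  n = m * m;
e1 = ones(m, 1);
T = spdiags([e1 e1], [-1 1], m, m);
C = kron(speye(m), T) + kron(T, speye(m));   % lattice contiguity (stand-in)
scramble = mod((0:n-1) * 7, n) + 1;          % fixed permutation (7 and 100 are coprime)
A = speye(n) - 0.1 * C(scramble, scramble);  % symmetric positive definite

p = symrcm(A);                               % reverse Cuthill-McKee ordering
[i0, j0] = find(A);
bw_before = max(abs(i0 - j0));               % bandwidth in the scrambled order
[i1, j1] = find(A(p, p));
bw_after = max(abs(i1 - j1));                % bandwidth after reordering

R0 = chol(A);                                % Cholesky factor, scrambled order
R1 = chol(A(p, p));                          % Cholesky factor, reordered
lndet_before = 2 * sum(log(full(diag(R0))));
lndet_after = 2 * sum(log(full(diag(R1))));

fprintf('bandwidth: %d before, %d after\n', bw_before, bw_after);
fprintf('non-zeros in factor: %d before, %d after\n', nnz(R0), nnz(R1));
```

The two log-determinants should agree to rounding error, since a symmetric permutation does not change the determinant. The printed non-zero counts of the factors show how much work the ordering affects.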

[Figure: "Exact log-determinant, Chebyshev approximation, Taylor bounds". Horizontal axis: α. Vertical axis: log-determinants. Legend: Taylor Lower, Chebyshev, Taylor Upper, Exact.]

Figure 3.5: Plot of Exact log-determinant with Chebyshev Approximation and Taylor Bounds

The Chebyshev approximation appears quite good for positive, moderate values of the dependence parameter, but could use improvement for materially negative values of the spatial dependence parameter. Fortunately, such negative values seem rare in practice.

The Monte Carlo log-determinant estimator is also quite fast, but more accurate (Figure 3.6).

[Figure: "Exact log-determinant, Monte Carlo approximation and confidence limits". Horizontal axis: α. Vertical axis: log-determinants. Legend: Lower Confidence, Monte Carlo, Upper Confidence, Exact.]

Figure 3.6: Plot of Exact log-determinant with Monte Carlo Approximation and Limits

To see the effects of exact versus approximate log-determinant computations, consider Tables 3.1 and 3.2 using the 3107 county election data. The estimated autoregressive parameter is only off by 0.01 from using the approximation. The approximate method also uses likelihood dominance inference, which results in a lower bound to the signed root deviances. As shown by the tables, the likelihood dominance SRDs are smaller in magnitude than the exact SRDs. However, they can still document statistical significance for many variables and thus can prove useful in many circumstances.

Variables         Beta Estimates   Signed Root Deviances   PR of Higher SRDs

Voting Pop           -0.7784            -31.1227                0.0000
Education             0.2696              9.3438                0.0000
Home Ownership        0.4530             26.2661                0.0000
Income                0.0071              0.0035                0.9972
Intercept             0.5443              7.8773                0.0000
Alpha                 0.7250             32.7719                0.0000

Table 3.1: SAR Estimation Results using Chebyshev lndet approximation and likelihood dominance inference

Variables         Beta Estimates   Signed Root Deviances   PR of Higher SRDs

Voting Pop           -0.7806            -32.1748                0.0000
Education             0.2746             11.9438                0.0000
Home Ownership        0.4525             27.0073                0.0000
Income                0.0047              0.2187                0.8269
Intercept             0.5528              9.4908                0.0000
Alpha                 0.7150             34.8648                0.0000

Table 3.2: SAR Estimation results using exact lndet

Some of the routines not only yield the maximum of the likelihood function, but also profile likelihoods in the dependence parameter α by model, as shown in Figure 3.7.

[Figure: "Profile likelihoods vs α for global model and delete-1 submodels". Horizontal axis: Dependence parameter (α). Vertical axis: Profile log-likelihood. Legend: Global likelihood, Voting Pop, Education, Home Ownership, Income, Intercept.]

Figure 3.7: SAR Profile Likelihoods by Model

The toolbox includes SAR, CAR, and MESS error models as well as MESS, closest neighbor, and MIX autoregressive models, as shown in Table 3.3 and Table 3.4.

Variables             b OLS     b closest    b MESS     b Mix

Voting Pop           -0.8464    -0.7489     -0.7693    -0.7298
Education             0.5167     0.2899      0.1941     0.1818
Home Ownership        0.4291     0.4457      0.4832     0.4580
Income               -0.1439    -0.0332      0.0423     0.0427
Lag Voting Pop        0.0000     0.1878      0.4186     0.4616
Lag Education         0.0000     0.0975      0.1205     0.0450
Lag Home Ownership    0.0000    -0.1569     -0.3249    -0.3299
Lag Income            0.0000    -0.1079     -0.1802    -0.1410
Intercept             0.9814     0.7495      0.5636     0.4205
α                     0.0000     0.3352      1.4628     0.6550

Table 3.3: Estimates on Election Data

The closest neighbor approach is intermediate to a non-spatial approach (OLS) and a full spatial approach (MESS) or the approximate mixed routine. Note the close agreement between MESS and the mixed routine. Note, OLS in this case uses the spatial averages of the basic independent variables as additional independent variables.

None of these operations take long for the election data.

Variables             b OLS     b closest    b MESS     b Mix

Voting Pop          -34.7211   -30.5020    -28.8899   -29.4018
Education            30.8928    13.4922      7.6654     7.7169
Home Ownership       23.0066    25.4083     26.8836    27.3515
Income               -7.2373    -1.5742      1.7478     1.8952
Lag Voting Pop        0.0000     7.7600     10.7656    12.5507
Lag Education         0.0000     4.5090      3.9785     1.5705
Lag Home Ownership    0.0000    -9.4245    -11.1259   -12.1200
Lag Income            0.0000    -5.1399     -5.5551    -4.6603
Intercept            20.2782    16.0533     10.7480     8.4626
α                     0.0000    24.2762     31.5948    33.7097

Table 3.4: Signed Root Deviances using Election Data

Operation                    Timings in seconds

OLS                               0.0150
Closest AR                        0.0470
MESS                              0.0940
Exact Log-det                     0.8750
Mix                               0.0470
Doubly Stochastic Scaling         0.2660

Table 3.5: Timing for operations to Election data (n=3107)

In addition to global models, the toolbox has spatial autoregressive local estimation (SALE). The user chooses the bandwidth (subsample size) by examining cross-validation error at the fringe observations, as in Figure 3.8, or at each observation, as in Figure 3.9.

[Figure: "Smoothed SALE Recursive Residuals of Fringe Observations". Horizontal axis: Number of Local Observations. Vertical axis: Absolute Error.]

Figure 3.8: Plot of Fringe prediction error versus subsample size

[Figure: "Smoothed SALE Initial Holdout Residuals". Horizontal axis: Number of Local Observations. Vertical axis: Absolute Error.]

Figure 3.9: Plot of prediction error on center of area versus subsample size

Usually there is spatial dependence even in small subsamples, as shown in Figure 3.10.

[Figure: "SALE Autoregressive Parameter Estimates". Horizontal axis: Number of Local Observations. Vertical axis: Median Autoregressive Parameter Estimates.]

Figure 3.10: Spatial dependence parameter estimate versus subsample size

Local estimation leads to spatially varying parameter estimates, such as those shown in Figure 3.11.

[Figure: map of β3. Horizontal axis: Longitude. Vertical axis: Latitude.]

Figure 3.11: Map of influence of homeownership on voting

In addition, a user can obtain an idea of the sensitivity of parameter estimates to spatial variation, such as summarized in Table 3.6.

Percentiles              0        10        25        50        75        90       100

α                     0.3100    0.4300    0.5500    0.6300    0.6700    0.7100    0.7800
Voting Pop           -1.0079   -0.8875   -0.8075   -0.6962   -0.5967   -0.5299   -0.4066
Education            -0.1871    0.0517    0.1142    0.1788    0.3135    0.4923    0.7118
Home Ownership        0.1530    0.2964    0.3419    0.4253    0.5161    0.7660    0.8461
Income               -0.2576   -0.1644   -0.1072   -0.0473    0.0628    0.1448    0.2325
Lag Voting Pop       -0.1525    0.0655    0.2868    0.4527    0.5490    0.6523    0.8806
Lag Education        -0.5734   -0.3666   -0.1812   -0.0299    0.0677    0.1322    0.3318
Lag Home Ownership   -0.7010   -0.4499   -0.3278   -0.2337   -0.1100   -0.0382    0.1880
Lag Income           -0.4764   -0.2374   -0.1733   -0.0897    0.0216    0.0862    0.2786
Intercept            -0.1605    0.0473    0.1463    0.2847    0.6612    1.0772    1.8902

Table 3.6: Distribution of local estimates

Method                       Time in seconds

OLS                               0.1560
Closest AR                        0.1090
MESS                              0.6090
Approximate Mix                   0.3280
Doubly Stochastic Scaling         7.1880
Delaunay Weight Matrix            5.4220

Table 3.7: Times for Different Methods for 57647 Observations

To provide an idea about the performance of the techniques for a larger problem, we estimated a simple hedonic regression over US census tracts. This resulted in 57647 observations. Table 3.7 shows the timings for some of the various operations. All of these seem quite fast. A user can find a Delaunay weight matrix and estimate a spatial autoregression in under 10 seconds on desktop machines.

Just selecting a particular weight matrix seems arbitrary. Here we take 30 nearest neighbors and weight these geometrically. A ρ of 1 indicates no decline in the weight given to further neighbors relative to closer ones, while a ρ of 0.5 would give half the weight to the second nearest neighbor as it would to the first nearest neighbor. Thus, ρ allows changes in the effective number of neighbors used without actually varying the number of neighbors. It often makes sense in this approach to set the number of neighbors to a fairly high level (such as 30). Table 3.8 shows the effect of varying ρ on the profile log-likelihood. Small changes in ρ make large changes in the profile log-likelihood, evidence of the importance of this parameter.
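A geometric weighting of this kind can be sketched in a few lines of base Matlab. The scaling of the weights to sum to one, and the variable names, are assumptions for illustration; the toolbox routines may scale the weights differently (for example, through the doubly stochastic scaling discussed below).

```matlab
% Sketch: geometrically declining weights over m ordered neighbors.
m = 30;                            % number of neighbors, closest first
rho = 0.5;                         % rate of decline; rho = 1 gives equal weights
w = rho .^ (0:m-1);                % relative weight of each neighbor
w = w / sum(w);                    % scale to sum to one (an assumed convention)

share_first_five = sum(w(1:5));    % weight carried by the five closest neighbors
fprintf('ratio of second to first weight: %.2f\n', w(2) / w(1));
fprintf('share of the five closest neighbors: %.4f\n', share_first_five);
```

With a small ρ almost all of the weight falls on the first few neighbors, which is the sense in which ρ changes the effective number of neighbors while m stays fixed.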

It did not take overly long to find the nearest neighbors or the optimal ρ, even with doubly stochastic scalings of the weight matrix, as shown by Table 3.9.

These operations lead to a table of profile log-likelihoods (Table 3.10) across weight matrices. Examining the MESS log-likelihoods over ρ and Delaunay, and contrasting them with the log-likelihood from applying OLS to the basic non-spatial independent variables, demonstrates that even a suboptimal choice of ρ or Delaunay still dominates the use of an aspatial model in this case, and that optimizing over ρ dominates an arbitrary choice of weight matrices (Table 3.10). Moreover, the doubly stochastic (DS) scaling helped greatly for this example over the regular scaling (RS). In addition, inspection of aspatial OLS versus MESS with an optimal selection of ρ in Table 3.11 shows clear differences among the approaches. Note, the land area variable became insignificant after modeling space.

ρ          log-likelihood

0.8000     -231895.0426
0.8500     -230968.8487
0.9000     -230828.6413
0.9500     -231699.1089
1.0000     -233434.5371

Table 3.8: Likelihoods across ρ for Doubly Stochastic Scaling

Operation                      Time in seconds

NN computation                     31.4060
RS Time to find optimum ρ          56.7810
DS Time to find optimum ρ          94.8750

Table 3.9: Times for Optimizing the Likelihood over ρ for Both Scalings

It is not difficult to estimate a spatial autoregression with over one million observations. In fact, the toolbox provides an example (big_one subdirectory) under the dataset directory whereby a one million observation spatial autoregression is estimated in just under 20 seconds. It took 130.63 seconds to find the weight matrix, 60.24 seconds to simulate the dependent variable, and 19.42 seconds to estimate the autoregression.

Variable                              Value

Aspatial likelihood (OLS)             -266505.1663
Closest Neighbor                      -243986.1676
RS Delaunay maximum likelihood        -244509.8257
RS Maximum likelihood across ρ        -254216.3172
RS Optimum ρ                                0.9000
DS Delaunay maximum likelihood        -235626.6213
DS Maximum likelihood across ρ        -230828.6413
DS Optimum ρ                                0.9000

Table 3.10: Likelihoods Across Doubly Stochastic and Regular Scalings

Variables                 b OLS      SRD OLS     b MESS     SRD MESS

Land area                -0.0850    -96.8455    -0.0008     -0.7159
Pop                       0.1146     36.8592     0.0239     12.7630
Per cap Income            1.0837    208.2192     0.6786    152.7645
Age                      -0.1269    -34.2809    -0.1384    -45.0489
Lag Land area                                   -0.0178    -14.5650
Lag Pop                                          0.0165      5.2437
Lag Per cap Income                              -0.3702    -65.2683
Lag Age                                          0.1088     27.5807
Intercept                 1.2236     22.9837    -0.5988    -15.2122
α                                                3.0986    253.2835
ρ (relative to ρ = 1)                            0.90       72.1927
Nearest Neighbors                               30
parameters                5                     11

Table 3.11: OLS versus MESS Results Using Optimal ρ for Doubly Stochastic Scaling

Chapter 4

References

If you need to know more about spatial statistics or about some of the specificroutines you may wish to examine

Anselin Luc (1988) Spatial Econometrics Methods and Models DordrechtKluwer Academic Publishers

Barry Ronald and R Kelley Pace ldquoA Monte Carlo Estimator of the LogDeterminant of Large Sparse Matricesrdquo Linear Algebra and its ApplicationsVolume 289 Number 1-3 1999 p 41-54

Chen Jian-Shen and Robert Jennrich (1996) ldquoThe Signed Root DevianceProfile and Confidence Intervals in Maximum Likelihood AnalysisrdquoJournal of the American Statistical Association Volume 91 Number 435 p993-998

Christensen Ronald (1991) Linear Models for Multivariate Time Series andSpatial Data New York Springer-Verlag

Cressie Noel AC (1993) Statistics for Spatial Data Revised ed New YorkJohn Wiley

Dubin Robin A (1988) ldquoEstimation of Regression Coefficients in the Pres-ence of Spatially Autocorrelated Error Termsrdquo Review of Economics andStatistics 70 466-474

Haining Robert (1990) Spatial Data Analysis in the Social and EnvironmentalSciences Cambridge

35


  • Why the Toolbox Exists
  • Using the Toolbox
    • Hardware and Software Requirements
    • Installation
    • Help and documentation
    • Known Limitations
    • Tips on Using the Toolbox
    • Included Examples
    • Included Datasets
    • Included Manuscripts
  • A Brief Selected Tour of the Toolbox
  • References

Chapter 1

Why the Toolbox Exists

Individual data arise at a time and a place. Randomization can destroy, and aggregation can obscure, spatial and temporal information, but the original data points potentially exhibit spatiotemporal dependence. Often, simple models fitted to these data will produce spatially, temporally, or spatiotemporally correlated errors, provided reality is more complex than the model. Ignoring the spatial, temporal, or spatiotemporal dependence among errors results in inefficient parameter estimation and biased inference, and discards information that can greatly improve prediction accuracy. In addition, if the data generating process is an autoregressive one, ignoring the dependence will lead to biased estimates and inference.

Historically, spatial statistics software floundered with problems involving even thousands of observations. For example, Li (1995) required 8515 seconds to compute a 2,500-observation spatial autoregression using an IBM RS6000 Model 550 workstation. The difficulty lies in the maximum likelihood estimator's need for the determinant of the n by n matrix of the covariances among the spatially scattered observations.
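To make the role of the determinant explicit, here is the standard concentrated log-likelihood for a spatial autoregression in the dependent variable. It is given as background under assumed notation, not as the toolbox's exact formulation: the symbols D (the n by n spatial weight matrix), X (the regressors), and β (the coefficients) are not defined in this document, while α is the autoregressive parameter used throughout.

```latex
\begin{align*}
y &= \alpha D y + X\beta + \varepsilon, \qquad \varepsilon \sim N(0, \sigma^2 I_n) \\
L(\alpha) &= C + \ln\lvert I_n - \alpha D\rvert
            - \frac{n}{2}\,\ln\!\big(e(\alpha)^{\top} e(\alpha)\big) \\
e(\alpha) &= (I_n - \alpha D)\,y - X\hat\beta(\alpha), \qquad
\hat\beta(\alpha) = (X^{\top}X)^{-1}X^{\top}(I_n - \alpha D)\,y
\end{align*}
```

The term ln|I - αD| has to be evaluated for every trial value of α. The residual term only needs a few regressions, so the determinant dominates the cost for large n.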

The conventional computational approach relies upon eigenvalues (Ord (1975)). Even with faster computers, the calculation of eigenvalues requires substantial time and memory. For example, on a 1700 Athlon it requires 24.75 minutes to compute the eigenvalues of the spatial weight matrix based on 30 nearest neighbors. The resulting matrix takes 77 megabytes of storage as well.
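The 77 megabyte figure is consistent with a dense double-precision matrix for the 3107-observation election data used in the examples. The short calculation below checks that arithmetic and contrasts it with sparse storage. Two things are assumed here rather than stated in the text: that n = 3107 is the problem size behind the timing, and the simple per-nonzero cost model used for the sparse case.

```python
# Back-of-the-envelope storage comparison for an n-by-n spatial weight matrix.
n = 3107              # assumed: the county election data set
m = 30                # neighbors per observation, as in the text
bytes_per_double = 8

# A dense matrix stores every one of the n*n entries.
dense_bytes = n * n * bytes_per_double

# Assumed cost model for sparse storage: one 8-byte value plus one 8-byte
# index per nonzero. Real sparse formats differ in detail.
sparse_bytes = n * m * (bytes_per_double + 8)

dense_mb = dense_bytes / 1e6
sparse_mb = sparse_bytes / 1e6
print(f"dense:  {dense_mb:.1f} MB")
print(f"sparse: {sparse_mb:.1f} MB")
```

Under these assumptions the dense matrix needs about 77 MB while the sparse one needs about 1.5 MB, which is the gap that the sparsity-based routines exploit.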

However, many problems of practical importance generate large spatial data sets. Obvious examples include census data (over 200,000 block groups for the US) and housing sales (many millions sold per year). The Spatial Statistics Toolbox addresses the need to quickly estimate large problems. In contrast to the eigenvalue approach, by focusing upon direct computation of determinants and using sparsity, the same operation takes 3.38 seconds in the toolbox. Using a Chebyshev approximation reduces the time to under 0.2 seconds. As the eigenvalue computations rise at the cube of n while the log-determinant functions in the toolbox rise with n or n ln(n), large problems further increase the performance of the toolbox relative to conventional approaches.

The speed of Spatial Statistics Toolbox 2.0 permits users to explore alternative specifications, spatial and aspatial, in a timely fashion. For example, one can estimate local spatial autoregressions (spatial autoregressive local estimation, or SALE) as in Pace and LeSage (forthcoming). In the SALE example provided, the toolbox was able to estimate a sequence of over 400 spatial autoregressions around each of 3107 points in under four minutes.

The Spatial Statistics Toolbox 2.0 conserves memory as well. It is not difficult to estimate a spatial autoregression with over one million observations. In fact, the toolbox provides an example under the dataset directory whereby a one million observation spatial autoregression is estimated in just under 20 seconds. It took 130.63 seconds to find the weight matrix, 60.24 seconds to simulate the dependent variable, and 19.42 seconds to estimate the autoregression.

Relative to the previous incarnation, the Spatial Statistics Toolbox 2.0 introduces multidimensional and spatiotemporal weight matrices, new scalings of the weight matrices (doubly stochastic), faster computation of exact log-determinants (actually interpolated from exact computations of a smooth function), a couple of approximations to log-determinants, and some new computationally and theoretically interesting models such as the aforementioned SALE and the matrix exponential spatial specification (MESS).

Chapter 2

Using the Toolbox

2.1 Hardware and Software Requirements

The toolbox has been developed under Matlab 6.5 and tested under Matlab 6.5 and 6.1 across W2K and Windows ME. However, some routines run faster under 6.5 than under 6.1. The total installation takes around 15 megabytes. The routines have been tested on PC compatibles. They should run on other platforms but have not been tested there.

2.2 Installation

For users who can extract files from zip archives, follow the instructions for your product (e.g., Winzip) and extract the files into the drive in which you wish to install the toolbox. The installation program will create the following directory structure in whichever drive you choose:

drive_letter:\space
+---articles
+---datasets
|   +---big_one
|   +---election
|   +---housing
|   +---space_time
+---documentation
+---examples
|   +---CAR
|   +---CAR_SIM
|   +---CHEBYSHEV
|   +---CHEBYSHEV_SEQUENCE
|   +---CLOSEST_NEIGHBOR
|   +---DELAUNAY2
|   +---DOUBLY
|   +---LNDET_INTERP
|   +---LNDET_INTERP_SEQUENCE
|   +---LNDET_MONTECARLO
|   +---MESS_AR
|   +---MESS_CAR
|   +---MESS_SIM
|   +---MIXED
|   +---MULTIVARIATE
|   +---NEAREST_NEIGHBORS
|   +---OLS
|   +---SALE
|   +---SAR
|   +---SAR_SIM
|   +---SPACE_TIME
+---fmex_code
+---functions
    +---fmex_code

To see whether the installation has succeeded, change the directory in Matlab to one of the supplied examples and type run x_*.m. For example, go to the examples\car subdirectory and type run x_car2_ga1. The use of * in this context serves as a wildcard or placeholder for the intervening characters. This should cause the example script x_car2_ga1.m to run.

The multidimensional weight matrix routines require the installation of TSTOOL. Go to www.physik3.gwdg.de/tstool to find this useful package and follow the instructions to install it in Matlab. If you have Matlab 6.5, you can easily add the relevant paths to the mex functions by going to the File menu, selecting Set Path, selecting Add Folder under the applet, and adding the paths so Matlab can find the functions. On my machine I added:

opentstool\tstoolbox\mex\dll

opentstool\tstoolbox\mex

opentstool\tstoolbox\utils

tstool\opentstool\tstoolbox\gui

tstool\opentstool\tstoolbox

2.3 Help and documentation

All the example scripts follow the form x_*.m (e.g., x_car2_ga1.m, x_sar2_ga1.m). Functions follow the form f*.m (e.g., fsar2.m, fols2.m). Matlab matrix files (which may include multiple matrices) have the form *.mat.

If you wish to access the functions in other directories, you can define a path to the space and space\functions directories, on the drive where the Spatial Statistics Toolbox resides, using the Set Path command under the File menu in Matlab 6.5 (provided space is the name of the directory you chose). This is the same procedure outlined for installing TSTOOL. On my machine I added:

space

space\functions

As in previous versions of Matlab, users can employ the addpath command to add these directories to the search path.

Both the functions and examples are well-documented internally. One can use the help functions of Matlab in the usual fashion when the toolbox is on the search path or the functions are in the current directory. If the toolbox is on the search path, a user can type help space and see the functions contained in the Spatial Statistics Toolbox 2 (provided space is the name of the directory you chose). If the toolbox is on the search path or the function lies in the current directory, a user can obtain help on the function by typing help function_name or doc function_name.

The examples provide the best means of understanding the toolbox functions. Each subdirectory under the examples subdirectory contains all the files needed for that particular example (except the multivariate subdirectory, which requires installation of the TSTOOL package). Users can substitute their data for the example data and see how the functions perform for their problem. Note that a few of these functions take several minutes to run (notably the SALE examples). Most, however, are quite fast.

2.4 Known Limitations

None. All are unknown.

2.5 Tips on Using the Toolbox

Typical sessions with the toolbox proceed in four steps. First, import the data into Matlab. If the file is fixed-format or tab-delimited ASCII, the command load name.extension (whatever that name.extension may be) will load the file contents into memory as a Matlab variable called name. Saving this will convert it into a Matlab file (e.g., save name name will save variable name into matrix name stored in name.mat). Failure to specify both names will result in saving all defined variables into one file. The data would include the dependent variable, the independent variables, and the locational coordinates. For example, suppose the user has the dependent variable in the text file y.txt. Issuing the command load y.txt results in a Matlab variable y. Issuing the command save y y results in saving y.mat in the directory.

Second, create a spatial weight matrix. Users can choose weight matrices based upon nearest neighbors (symmetric or asymmetric), multidimensional symmetric neighbors, spatiotemporal neighbors (asymmetric), and Delaunay triangles (symmetric). In almost all cases, one must make sure each location is unique. One may need to add slight amounts of random noise to the locational coordinates to meet this restriction (some of the latest versions of Matlab do this automatically; do not dither the coordinates in this case). Note that some estimators only use symmetric matrices. You can specify the number of neighbors used and their relative weightings.

Note that the Delaunay spatial weight matrix leads to a concentration matrix or a variance-covariance matrix that depends upon only one parameter (α, the autoregressive parameter). In contrast, the nearest neighbor concentration matrices or variance-covariance matrices depend upon three parameters: α, the autoregressive parameter; m, the number of neighbors; and ρ, which governs the rate at which weights decline with the order of the neighbors, with the closest neighbor given the highest weighting, the second closest given a lower weighting, and so forth. Three parameters should make this specification sufficiently flexible for many purposes.
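The roles of m and ρ can be illustrated with a small sketch. This is not the toolbox's routine: the function name knn_weights, the brute-force neighbor search, and the row normalization are all assumptions made for illustration, and the coordinates are made up. It gives the k-th nearest neighbor a raw weight of ρ raised to the power k - 1 and then scales each row to sum to one.

```python
import math

def knn_weights(coords, m, rho):
    """Row-normalized nearest-neighbor weights with geometric decline.

    coords: list of (x, y) tuples, assumed unique.
    m: number of neighbors; rho: decline rate (rho = 1 gives equal weights).
    Returns one dict per observation mapping neighbor index -> weight.
    """
    n = len(coords)
    rows = []
    for i in range(n):
        # Brute-force distances: fine for a demonstration, too slow for large n.
        nearest = sorted(
            (math.dist(coords[i], coords[j]), j) for j in range(n) if j != i
        )[:m]
        raw = [rho ** k for k in range(len(nearest))]   # 1, rho, rho^2, ...
        total = sum(raw)
        rows.append({j: w / total for (_, j), w in zip(nearest, raw)})
    return rows

# Four made-up points on a line.
coords = [(0.0, 0.0), (1.0, 0.0), (3.0, 0.0), (7.0, 0.0)]
W = knn_weights(coords, m=2, rho=0.5)
for i, row in enumerate(W):
    print(i, row)
```

With ρ = 0.5 and m = 2, the nearest neighbor of each point receives weight 2/3 and the second nearest 1/3. The result is asymmetric: point 3 lists point 1 as a neighbor, but point 1 does not list point 3.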

Third, compute the log-determinants for a grid of autoregressive parameters (prespecified by the routine as a default or specified by the user as an option). Determinant computations proceed faster for symmetric matrices. You must choose the appropriate log-determinant routine for the type of spatial weight matrix you have specified. Computing the log-determinants is slower than estimation, but only needs to be done when changing the spatial weight matrix. For example, one can use the same weight matrix and log-determinant files when exploring transformations or specifications of the dependent and independent variables (for the same observations).

Fourth, pick a statistical routine to run, given the data matrices, the spatial weight matrix, and the log-determinant vector. One can choose among conditional autoregressions (CAR), simultaneous autoregressions (SAR), matrix exponential spatial specifications (MESS), mixed regressive spatially autoregressive estimators (which include pure autoregressive models and spatially lagged independent variable models as special cases), and OLS. In addition, one can explore multivariate spatiotemporal and multivariate estimation. These routines require little time to run. One can change models, weightings, and transformations and reestimate, in the vast majority of cases, without rerunning the spatial weight matrix or log-determinant routines (you may need to add another simple Jacobian term when performing weighting or transformations of the dependent variables). This aids interactive data exploration.
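The idea behind the third step, computing the log-determinant once on a grid of α values and reusing it during estimation, can be sketched as follows. This is a toy illustration, not the toolbox's sparse routines: it uses dense Gaussian elimination on a made-up 3 by 3 weight matrix, and the names logdet_i_minus_ad and interp_logdet are hypothetical.

```python
import math

def logdet_i_minus_ad(D, alpha):
    """ln|det(I - alpha*D)| by Gaussian elimination with partial pivoting.

    Dense and O(n^3): only meant for tiny demonstration matrices.
    """
    n = len(D)
    A = [[(1.0 if i == j else 0.0) - alpha * D[i][j] for j in range(n)]
         for i in range(n)]
    logdet = 0.0
    for k in range(n):
        p = max(range(k, n), key=lambda r: abs(A[r][k]))
        if A[p][k] == 0.0:
            raise ValueError("I - alpha*D is singular at this alpha")
        A[k], A[p] = A[p], A[k]              # row swap only flips the sign
        logdet += math.log(abs(A[k][k]))     # accumulate log of each pivot
        for r in range(k + 1, n):
            f = A[r][k] / A[k][k]
            for c in range(k, n):
                A[r][c] -= f * A[k][c]
    return logdet

# Made-up row-stochastic weight matrix with a zero diagonal.
D = [[0.0, 0.5, 0.5],
     [0.5, 0.0, 0.5],
     [0.5, 0.5, 0.0]]

# Done once per weight matrix: a table of (alpha, log-determinant) pairs.
grid = [i / 10 for i in range(10)]           # alpha = 0.0, 0.1, ..., 0.9
table = [(a, logdet_i_minus_ad(D, a)) for a in grid]

def interp_logdet(alpha):
    """Linear interpolation in the precomputed table."""
    for (a0, v0), (a1, v1) in zip(table, table[1:]):
        if a0 <= alpha <= a1:
            t = (alpha - a0) / (a1 - a0)
            return v0 + t * (v1 - v0)
    raise ValueError("alpha outside the grid")

print(interp_logdet(0.45), logdet_i_minus_ad(D, 0.45))
```

Because the table depends only on the weight matrix, it can be reused across every change of variables or transformation, which is why the fourth step is fast.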

Finally, these procedures provide a wealth of information. Many of these routines yield the profile likelihood in the autoregressive parameter for each submodel (corresponding to the deletion of individual variables or the spatial term). All of the inference, even for the OLS routine, uses likelihood ratio statistics in the form of signed root deviances. A signed root deviance is just the square root of twice the difference in log-likelihoods, given the sign of the parameter estimate. It has a t-like interpretation (Chen and Jennrich (1996)). The use of signed root deviances (SRDs) facilitates comparisons among different models.
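The definition of the signed root deviance translates directly into a few lines of code. The function below is an illustrative sketch, not a toolbox function. The inputs are the doubly stochastic log-likelihoods at ρ = 0.9 and ρ = 1 from Table 3.8, read with four decimal places; the result should be close to the SRD of 72.1927 reported for ρ in Table 3.11.

```python
import math

def signed_root_deviance(loglik_full, loglik_restricted, estimate):
    """sign(estimate) * sqrt(2 * (loglik_full - loglik_restricted))."""
    deviance = 2.0 * (loglik_full - loglik_restricted)
    # The restricted model is nested in the full one, so the deviance should
    # be nonnegative; clip tiny negative values caused by rounding.
    deviance = max(deviance, 0.0)
    return math.copysign(math.sqrt(deviance), estimate)

# Log-likelihood at the optimum rho = 0.9 versus the restriction rho = 1.
srd = signed_root_deviance(-230828.6413, -233434.5371, estimate=0.90)
print(round(srd, 4))
```

A negative parameter estimate simply flips the sign of the result, which is what gives the statistic its t-like reading.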

2.6 Included Examples

The Spatial Statistics Toolbox comes with many examples. These are found in the subdirectories under spatial_toolbox_2\examples. To run the examples, change the directory in Matlab to one of the many subdirectories that illustrate individual routines. Look at the documentation in each example directory for more detail. Almost all of the specific models have examples. In addition, the simulation routine examples serve as minor Monte Carlo studies, which also help verify the functioning of the estimators. The examples use the 3107-observation dataset from the Pace and Barry (1997) Geographical Analysis article.

2.7 Included Datasets

The spatial_toolbox_2\datasets subdirectory contains subdirectories with individual data sets in Matlab file formats, as well as their documentation. The data sets include example programs and output. Note that, due to the many improvements incorporated into the Spatial Statistics Toolbox over time, the running times have greatly improved over those in the articles.

These data sets should provide a good starting point for exploring applications of spatial statistics.

2.8 Included Manuscripts

In the manuscript subdirectory we provide pdf versions of the Geographical Analysis 1997 and 2000 articles. I would like to thank the publishers (Ohio State Press and Elsevier) for having given us copyright permission to distribute these works. One can also go to www.spatial-statistics.com to access some other articles (e.g., the Linear Algebra and its Applications article, which proposed the Monte Carlo log-determinant estimator).

Chapter 3

A Brief Selected Tour of the Toolbox

The weight matrix specifies the dependence among observations. One form of weight matrix (Delaunay) uses the notion of contiguity to specify dependence, as depicted in Figure 3.1.

[Figure 3.1: Connections among Counties via Delaunay. Plot title: Delaunay Nearest Neighbor Graph. Axes: Longitude in Decimal Degrees (x), Latitude in Decimal Degrees (y).]

Note the somewhat strange behavior of connections to outlying observations in Figure 3.1. This arises from the geometric nature of contiguity. Using nearest neighbors based upon some metric can avoid this, as shown in Figure 3.2.

[Figure 3.2: Connections among Counties via Eight Nearest Neighbors. Plot title: 8 Nearest Neighbor Graph. Axes: Longitude in Decimal Degrees (x), Latitude in Decimal Degrees (y).]

Using only nearby observations implies that the weight matrix has mainly zeros, or is sparse, as shown in Figure 3.3.

[Figure 3.3: Plot of Non-zeros of Delaunay Weight Matrix. Plot title: Original Non-zero Weight Pattern for Delaunay. Axes: Observation Number (x and y).]

This becomes even more apparent when reordering the observations, as in Figure 3.4.

[Figure 3.4: Plot of Non-zeros of Permuted Delaunay Weight Matrix. Plot title: Permuted Non-zero Weight Pattern for Delaunay. Axes: Observation Number (x and y).]

Sparsity, as well as finding an appropriate ordering, is key to quickly computing the log-determinants used in maximum likelihood. The toolbox has functions for exact computation of the log-determinants (actually interpolation of exact computations at various points). However, users can select approximations as well, which depend only on sparsity and not upon orderings. The quadratic Chebyshev is the fastest and most approximate technique (Figure 3.5).
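As background for the approximations, the log-determinant has a standard series expansion in traces of powers of the weight matrix. This is a general identity, stated here under the assumed notation D for the weight matrix; it is not a description of the toolbox's exact algorithms. It holds when |α| times the largest eigenvalue modulus of D is below one.

```latex
\ln\lvert I_n - \alpha D\rvert
  \;=\; -\sum_{k=1}^{\infty} \frac{\alpha^{k}\,\operatorname{tr}(D^{k})}{k},
\qquad \lvert\alpha\rvert \,\max_i \lvert\lambda_i(D)\rvert < 1
```

When D has a zero diagonal, tr(D) = 0 and the series starts at the quadratic term. Truncating the series, or estimating the traces from products of D with vectors, needs only sparse matrix operations, which is consistent with the remark that the approximations depend on sparsity but not on orderings.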

[Figure 3.5: Plot of Exact log-determinant with Chebyshev Approximation and Taylor Bounds. Plot title: Exact log-determinant, Chebyshev approximation, Taylor bounds. Axes: α from -1 to 1 (x), log-determinants (y). Legend: Taylor Lower (-), Chebyshev (o), Taylor Upper (+), Exact.]

The Chebyshev approximation appears quite good for positive, moderate values of the dependence parameter, but could use improvement for materially negative values of the spatial dependence parameter. Fortunately, such negative values seem rare in practice.

The Monte Carlo log-determinant estimator is also quite fast, but more accurate (Figure 3.6).

[Figure 3.6: Plot of Exact log-determinant with Monte Carlo Approximation and Limits. Plot title: Exact log-determinant, Monte Carlo approximation, and confidence limits. Axes: α from -1 to 1 (x), log-determinants (y). Legend: Lower Confidence, Monte Carlo, Upper Confidence, Exact.]

To see the effects of exact versus approximate log-determinant computations, consider Tables 3.1 and 3.2, using the 3107 county election data. The estimated autoregressive parameter is only off by 0.01 from using the approximation. The approximate method also uses likelihood dominance inference, which results in a lower bound to the signed root deviances. As shown by the tables, the likelihood dominance SRDs are smaller in magnitude than the exact SRDs. However, they can still document statistical significance for many variables and thus can prove useful in many circumstances.

Variables        Beta Estimates   Signed Root Deviances   Pr of Higher SRDs
Voting Pop              -0.7784                -31.1227              0.0000
Education                0.2696                  9.3438              0.0000
Home Ownership           0.4530                 26.2661              0.0000
Income                   0.0071                  0.0035              0.9972
Intercept                0.5443                  7.8773              0.0000
Alpha                    0.7250                 32.7719              0.0000

Table 3.1: SAR Estimation Results using Chebyshev lndet approximation and likelihood dominance inference

Variables        Beta Estimates   Signed Root Deviances   Pr of Higher SRDs
Voting Pop              -0.7806                -32.1748              0.0000
Education                0.2746                 11.9438              0.0000
Home Ownership           0.4525                 27.0073              0.0000
Income                   0.0047                  0.2187              0.8269
Intercept                0.5528                  9.4908              0.0000
Alpha                    0.7150                 34.8648              0.0000

Table 3.2: SAR Estimation results using exact lndet

Some of the routines not only yield the maximum of the likelihood function, but also profile likelihoods in the dependence parameter α by model, as shown in Figure 3.7.

[Figure 3.7: SAR Profile Likelihoods by Model. Plot title: Profile likelihoods vs α for global model and delete-1 submodels. Axes: Dependence parameter (α) from 0 to 1 (x), Profile log-likelihood (y). Legend: Global likelihood, Voting Pop, Education, Home Ownership, Income, Intercept.]

The toolbox includes SAR, CAR, and MESS error models, as well as MESS, closest neighbor, and MIX autoregressive models, as shown in Table 3.3 and Table 3.4.

Variables             b OLS   b closest    b MESS     b Mix
Voting Pop          -0.8464     -0.7489   -0.7693   -0.7298
Education            0.5167      0.2899    0.1941    0.1818
Home Ownership       0.4291      0.4457    0.4832    0.4580
Income              -0.1439     -0.0332    0.0423    0.0427
Lag Voting Pop       0.0000      0.1878    0.4186    0.4616
Lag Education        0.0000      0.0975    0.1205    0.0450
Lag Home Ownership   0.0000     -0.1569   -0.3249   -0.3299
Lag Income           0.0000     -0.1079   -0.1802   -0.1410
Intercept            0.9814      0.7495    0.5636    0.4205
α                    0.0000      0.3352    1.4628    0.6550

Table 3.3: Estimates on Election Data

The closest neighbor approach is intermediate to a non-spatial approach (OLS) and a full spatial approach (MESS) or the approximate mixed routine. Note the close agreement between MESS and the mixed routine. Note that OLS in this case uses the spatial averages of the basic independent variables as additional independent variables.
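One reason the MESS estimates are cheap to obtain can be stated in a single identity. It is given as background under assumed notation (D for a weight matrix with zero diagonal); sign and scaling conventions for α in MESS vary between sources, and the convention used by the toolbox is not stated here.

```latex
S(\alpha) = e^{\alpha D} = \sum_{k=0}^{\infty} \frac{\alpha^{k} D^{k}}{k!},
\qquad
\ln\lvert e^{\alpha D}\rvert = \alpha\,\operatorname{tr}(D) = 0
\quad \text{when } \operatorname{diag}(D) = 0
```

Because the determinant of a matrix exponential is the exponential of the trace, the log-determinant term vanishes for any α and estimation reduces to minimizing a sum of squared errors.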

None of these operations take long for the election data.

Variables              b OLS   b closest     b MESS      b Mix
Voting Pop          -34.7211    -30.5020   -28.8899   -29.4018
Education            30.8928     13.4922     7.6654     7.7169
Home Ownership       23.0066     25.4083    26.8836    27.3515
Income               -7.2373     -1.5742     1.7478     1.8952
Lag Voting Pop        0.0000      7.7600    10.7656    12.5507
Lag Education         0.0000      4.5090     3.9785     1.5705
Lag Home Ownership    0.0000     -9.4245   -11.1259   -12.1200
Lag Income            0.0000     -5.1399    -5.5551    -4.6603
Intercept            20.2782     16.0533    10.7480     8.4626
α                     0.0000     24.2762    31.5948    33.7097

Table 3.4: Signed Root Deviances using Election Data

Operation                   Timings in seconds
OLS                                     0.0150
Closest AR                              0.0470
MESS                                    0.0940
Exact Log-det                           0.8750
Mix                                     0.0470
Doubly Stochastic Scaling               0.2660

Table 3.5: Timing for operations on the Election data (n = 3107)

In addition to global models, the toolbox has spatial autoregressive local estimation (SALE). The user chooses the bandwidth (subsample size) by examining cross-validation error at the fringe observations, as in Figure 3.8, or at each observation, as in Figure 3.9.

[Figure 3.8: Plot of Fringe prediction error versus subsample size. Plot title: Smoothed SALE Recursive Residuals of Fringe Observations. Axes: Number of Local Observations (x), Absolute Error (y).]

[Figure 3.9: Plot of prediction error on center of area versus subsample size. Plot title: Smoothed SALE Initial Holdout Residuals. Axes: Number of Local Observations (x), Absolute Error (y).]

Usually there is spatial dependence even in small subsamples, as shown in Figure 3.10.

[Figure 3.10: Spatial dependence parameter estimate versus subsample size. Plot title: SALE Autoregressive Parameter Estimates. Axes: Number of Local Observations (x), Median Autoregressive Parameter Estimates (y).]

Local estimation leads to spatially varying parameter estimates, such as those shown in Figure 3.11.

[Figure 3.11: Map of influence of homeownership on voting. Axes: Longitude (x), Latitude (y). Plotted quantity: β3.]

In addition, a user can obtain an idea of the sensitivity of parameter estimates to spatial variation, such as summarized in Table 3.6.

Percentiles                0        10        25        50        75        90       100
α                     0.3100    0.4300    0.5500    0.6300    0.6700    0.7100    0.7800
Voting Pop           -1.0079   -0.8875   -0.8075   -0.6962   -0.5967   -0.5299   -0.4066
Education            -0.1871    0.0517    0.1142    0.1788    0.3135    0.4923    0.7118
Home Ownership        0.1530    0.2964    0.3419    0.4253    0.5161    0.7660    0.8461
Income               -0.2576   -0.1644   -0.1072   -0.0473    0.0628    0.1448    0.2325
Lag Voting Pop       -0.1525    0.0655    0.2868    0.4527    0.5490    0.6523    0.8806
Lag Education        -0.5734   -0.3666   -0.1812   -0.0299    0.0677    0.1322    0.3318
Lag Home Ownership   -0.7010   -0.4499   -0.3278   -0.2337   -0.1100   -0.0382    0.1880
Lag Income           -0.4764   -0.2374   -0.1733   -0.0897    0.0216    0.0862    0.2786
Intercept            -0.1605    0.0473    0.1463    0.2847    0.6612    1.0772    1.8902

Table 3.6: Distribution of local estimates

Method                      Time in seconds
OLS                                  0.1560
Closest AR                           0.1090
MESS                                 0.6090
Approximate Mix                      0.3280
Doubly Stochastic Scaling            7.1880
Delaunay Weight Matrix               5.4220

Table 3.7: Times for Different Methods for 57647 Observations

To provide an idea about the performance of the techniques for a larger problem, we estimated a simple hedonic regression over US census tracts. This resulted in 57647 observations. Table 3.7 shows the timings for some of the various operations. All of these seem quite fast. A user can find a Delaunay weight matrix and estimate a spatial autoregression in under 10 seconds on desktop machines.

Just selecting a particular weight matrix seems arbitrary. Here we take 30 nearest neighbors and weight these geometrically. A ρ of 1 indicates no decline in the weight given to further neighbors relative to closer ones, while a ρ of 0.5 would give half the weight to the second nearest neighbor as it would to the first nearest neighbor. Thus, ρ allows changes in the effective number of neighbors used without actually varying the number of neighbors. It often makes sense in this approach to set the number of neighbors to a fairly high level (such as 30). Table 3.8 shows the effect of varying ρ on the profile log-likelihood. Small changes in ρ make large changes in the profile log-likelihood, evidence of the importance of this parameter.

It did not take overly long to find the nearest neighbors or the optimal ρ, even with doubly stochastic scalings of the weight matrix, as shown by Table 3.9.

ρ         log-likelihood
0.8000      -231895.0426
0.8500      -230968.8487
0.9000      -230828.6413
0.9500      -231699.1089
1.0000      -233434.5371

Table 3.8: Likelihoods across ρ for Doubly Stochastic Scaling

Operation                     Time in seconds
NN computation                        31.4060
RS Time to find optimum ρ             56.7810
DS Time to find optimum ρ             94.8750

Table 3.9: Times for Optimizing the Likelihood over ρ for Both Scalings

These operations lead to a table of profile log-likelihoods (Table 3.10) across weight matrices. Examining the MESS log-likelihoods over ρ and Delaunay, and contrasting them with the log-likelihood from applying OLS to the basic non-spatial independent variables, demonstrates that even a suboptimal choice of ρ or Delaunay still dominates the use of an aspatial model in this case, and that optimizing over ρ dominates an arbitrary choice of weight matrices. Moreover, the doubly stochastic (DS) scaling helped greatly for this example over the regular scaling (RS). In addition, inspection of aspatial OLS versus MESS with an optimal selection of ρ in Table 3.11 shows clear differences among the approaches. Note that the land area variable became insignificant after modeling space.
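For readers unfamiliar with doubly stochastic scaling, the sketch below shows one common way to obtain it, the Sinkhorn-Knopp iteration, which alternately rescales rows and columns until both sum to one. This is a named standard technique offered for illustration only. The document does not say how the toolbox computes its scaling, and both the function name doubly_stochastic and the 4 by 4 symmetric matrix are made up.

```python
def doubly_stochastic(W, iterations=500, tol=1e-12):
    """Sinkhorn-Knopp iteration: alternately normalize rows and columns."""
    n = len(W)
    A = [row[:] for row in W]                 # work on a copy
    for _ in range(iterations):
        # Scale every row to sum to one.
        for i in range(n):
            s = sum(A[i])
            A[i] = [v / s for v in A[i]]
        # Scale every column to sum to one.
        col = [sum(A[i][j] for i in range(n)) for j in range(n)]
        for i in range(n):
            for j in range(n):
                A[i][j] /= col[j]
        # Columns now sum to one exactly; stop once the rows do as well.
        if all(abs(sum(r) - 1.0) < tol for r in A):
            break
    return A

# Made-up symmetric weights with a zero diagonal.
W = [[0.0, 4.0, 2.0, 1.0],
     [4.0, 0.0, 3.0, 2.0],
     [2.0, 3.0, 0.0, 5.0],
     [1.0, 2.0, 5.0, 0.0]]
A = doubly_stochastic(W)
for row in A:
    print([round(v, 4) for v in row])
```

Unlike plain row normalization, this scaling keeps a symmetric weight matrix symmetric, which matters for estimators that require symmetric matrices.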

It is not difficult to estimate a spatial autoregression with over one million observations. In fact, the toolbox provides an example (big_one subdirectory) under the dataset directory whereby a one million observation spatial autoregression is estimated in just under 20 seconds. It took 130.63 seconds to find the weight matrix, 60.24 seconds to simulate the dependent variable, and 19.42 seconds to estimate the autoregression.

Variable                                   Value
Aspatial likelihood (OLS)           -266505.1663
Closest Neighbor                    -243986.1676
RS Delaunay maximum likelihood      -244509.8257
RS Maximum likelihood across ρ      -254216.3172
RS Optimum ρ                              0.9000
DS Delaunay maximum likelihood      -235626.6213
DS Maximum likelihood across ρ      -230828.6413
DS Optimum ρ                              0.9000

Table 3.10: Likelihoods Across Doubly Stochastic and Regular Scalings

Variables                  b OLS     SRD OLS    b MESS    SRD MESS
Land area                -0.0850    -96.8455   -0.0008     -0.7159
Pop                       0.1146     36.8592    0.0239     12.7630
Per cap Income            1.0837    208.2192    0.6786    152.7645
Age                      -0.1269    -34.2809   -0.1384    -45.0489
Lag Land area                                  -0.0178    -14.5650
Lag Pop                                         0.0165      5.2437
Lag Per cap Income                             -0.3702    -65.2683
Lag Age                                         0.1088     27.5807
Intercept                 1.2236     22.9837   -0.5988    -15.2122
α                                               3.0986    253.2835
ρ (relative to ρ = 1)                           0.90       72.1927
Nearest Neighbors                              30
parameters                5                    11

Table 3.11: OLS versus MESS Results Using Optimal ρ for Doubly Stochastic Scaling

Chapter 4

References

If you need to know more about spatial statistics or about some of the specific routines, you may wish to examine:

Anselin, Luc (1988). Spatial Econometrics: Methods and Models. Dordrecht: Kluwer Academic Publishers.

Barry, Ronald, and R. Kelley Pace. "A Monte Carlo Estimator of the Log Determinant of Large Sparse Matrices," Linear Algebra and its Applications, Volume 289, Number 1-3, 1999, p. 41-54.

Chen, Jian-Shen, and Robert Jennrich (1996). "The Signed Root Deviance Profile and Confidence Intervals in Maximum Likelihood Analysis," Journal of the American Statistical Association, Volume 91, Number 435, p. 993-998.

Christensen, Ronald (1991). Linear Models for Multivariate, Time Series, and Spatial Data. New York: Springer-Verlag.

Cressie, Noel A.C. (1993). Statistics for Spatial Data, Revised ed. New York: John Wiley.

Dubin, Robin A. (1988). "Estimation of Regression Coefficients in the Presence of Spatially Autocorrelated Error Terms," Review of Economics and Statistics, 70, 466-474.

Haining, Robert (1990). Spatial Data Analysis in the Social and Environmental Sciences. Cambridge.

LeSage, James, and R. Kelley Pace. "Spatial Dependence in Data Mining," Data Mining for Scientific and Engineering Applications, edited by Robert L. Grossman, Chandrika Kamath, Philip Kegelmeyer, Vipin Kumar, and Raju R. Namburu, Kluwer Academic Publishing, 2001.

LeSage, James, and R. Kelley Pace. "Spatial Probit and Tobit," Spatial Statistics and Spatial Econometrics, edited by Art Getis, Palgrave, 2003.

Li, Bin (1995). "Implementing Spatial Statistics on Parallel Computers," in Arlinghaus, S., ed., Practical Handbook of Spatial Statistics (CRC Press, Boca Raton), pp. 107-148.

Ord, J.K. (1975). "Estimation Methods for Models of Spatial Interaction," Journal of the American Statistical Association, 70, 120-126.

Pace, R. Kelley, and Ronald Barry (1997). "Fast CARs," Journal of Statistical Computation and Simulation, 59, p. 123-147.

Pace, R. Kelley, and Ronald Barry (1997). "Quick Computation of Regressions with a Spatially Autoregressive Dependent Variable," Geographical Analysis, 29, 232-247.

Pace, R. Kelley, and Dongya Zou. "Closed-Form Maximum Likelihood Estimates of Nearest Neighbor Spatial Dependence," Geographical Analysis, Volume 32, Number 2, April 2000, p. 154-172.

Pace, R. Kelley, Ronald Barry, O.W. Gilley, and C.F. Sirmans. "A Method for Spatial-temporal Forecasting with an Application to Real Estate Prices," International Journal of Forecasting, Volume 16, Number 2, April-June 2000, p. 229-246.

Pace, R. Kelley, and James P. LeSage. "Semiparametric Maximum Likelihood Estimates of Spatial Dependence," Geographical Analysis, Volume 34, Number 1, January 2002, p. 75-90.

Pace, R. Kelley, and James LeSage. "Likelihood Dominance Spatial Inference," forthcoming, Geographical Analysis, in January 2003.

Pace, R. Kelley, and James LeSage. "Spatial Autoregressive Local Estimation," Spatial Statistics and Spatial Econometrics, edited by Art Getis, Palgrave, 2003.

Pace, R. Kelley, and James LeSage. "Chebyshev Approximation of Log-determinants of Spatial Weight Matrices," forthcoming in Computational Statistics and Data Analysis.

Ripley, Brian D. (1981). Spatial Statistics. New York: John Wiley.

www.spatial-statistics.com

  • Why the Toolbox Exists
  • Using the Toolbox
    • Hardware and Software Requirements
    • Installation
    • Help and documentation
    • Known Limitations
    • Tips on Using the Toolbox
    • Included Examples
    • Included Datasets
    • Included Manuscripts
      • A Brief Selected Tour of the Toolbox
      • References
Page 6: Spatial Statistics Toolbox for Matlab 2.0 - Documentation Only (pdf)

6 CHAPTER 1 WHY THE TOOLBOX EXISTS

on 30 nearest neighbors The resulting matrix takes 77 megabytes of storageas well

However many problems of practical importance generate large spatialdata sets Obvious examples include census data (over 200000 block groupsfor the US) and housing sales (many millions sold per year) The SpatialStatistics Toolbox addresses the need to quickly estimate large problems Incontrast to the eigenvalue approach by focusing upon direct computation ofdeterminants and using sparsity the same operation takes 338 seconds in thetoolbox Using a Chebyshev approximation reduces the time to under 02seconds As the eigenvalue computations rise at the cube of n while the log-determinant functions in the toolbox rise with n or n ln(n) large problemsfurther increase the performance of the toolbox relative to conventional ap-proaches

The speed of Spatial Statistics Toolbox 2.0 permits users to explore alternative specifications, spatial and aspatial, in a timely fashion. For example, one can estimate local spatial autoregressions (spatial autoregressive local estimation, or SALE) as in Pace and LeSage (forthcoming). In the SALE example provided, the toolbox was able to estimate a sequence of over 400 spatial autoregressions around each of 3107 points in under four minutes.

The Spatial Statistics Toolbox 2.0 conserves on memory as well. It is not difficult to estimate a spatial autoregression with over one million observations. In fact, the toolbox provides an example under the dataset directory whereby a one million observation spatial autoregression is estimated in just under 20 seconds. It took 130.63 seconds to find the weight matrix, 60.24 seconds to simulate the dependent variable, and 19.42 seconds to estimate the autoregression.

Relative to the previous incarnation, the Spatial Statistics Toolbox 2.0 introduces multidimensional and spatiotemporal weight matrices, new scalings of the weight matrices (doubly stochastic), faster computation of exact log-determinants (actually interpolated from exact computations of a smooth function), a couple of approximations to log-determinants, and some new computationally and theoretically interesting models such as the aforementioned SALE and the matrix exponential spatial specification (MESS).

Chapter 2

Using the Toolbox

2.1 Hardware and Software Requirements

The toolbox has been developed under Matlab 6.5 and tested under Matlab 6.5 and 6.1 across Windows 2000 (W2K) and Windows ME. However, some routines run faster under 6.5 than under 6.1. The total installation takes around 15 megabytes. The routines have been tested on PC compatibles. The routines should run on other platforms, but have not been tested on non-PC compatibles.

2.2 Installation

For users who can extract files from zip archives, follow the instructions for your product (e.g., Winzip) and extract the files into the drive in which you wish to install the toolbox. The installation program will create the following directory structure in whichever drive you choose:

drive_letter:\space
+---articles
+---datasets
|   +---big_one
|   +---election
|   +---housing
|   +---space_time
+---documentation
+---examples
|   +---CAR
|   +---CAR_SIM
|   +---CHEBYSHEV
|   +---CHEBYSHEV_SEQUENCE
|   +---CLOSEST_NEIGHBOR
|   +---DELAUNAY2
|   +---DOUBLY
|   +---LNDET_INTERP
|   +---LNDET_INTERP_SEQUENCE
|   +---LNDET_MONTECARLO
|   +---MESS_AR
|   +---MESS_CAR
|   +---MESS_SIM
|   +---MIXED
|   +---MULTIVARIATE
|   +---NEAREST_NEIGHBORS
|   +---OLS
|   +---SALE
|   +---SAR
|   +---SAR_SIM
|   +---SPACE_TIME
|   +---fmex_code
+---functions
    +---fmex_code

To see whether the installation has succeeded, change the directory in Matlab to one of the supplied examples and type run x_*.m. For example, go to the examples\car subdirectory and type run x_car2_ga1. The use of * in this context serves as a wildcard or placeholder for the intervening characters. This should cause the example script x_car2_ga1.m to run.

The multidimensional weight matrix routines require the installation of TSTOOL. Go to www.physik3.gwdg.de/tstool to find this useful package and follow the instructions to install it in Matlab. If you have Matlab 6.5, you can easily add the relevant paths to the mex functions by going to the File menu, selecting Set Path, selecting Add Folder in the applet, and adding the paths so Matlab can find the functions. On my machine I added:

opentstool\tstoolbox\mex\dll
opentstool\tstoolbox\mex
opentstool\tstoolbox\utils
tstool\opentstool\tstoolbox\gui
tstool\opentstool\tstoolbox

2.3 Help and Documentation

All the example scripts should follow the form x_*.m (e.g., x_car2_ga1.m, x_sar2_ga1.m). Functions follow the form f*.m (e.g., fsar2.m, fols2.m). Matlab matrix files (which may include multiple matrices) have the form *.mat.

If you wish to access the functions in other directories, you can define a path to *:\space and *:\space\functions using the Set Path command under the File menu in Matlab 6.5, where * refers to the drive where the spatial statistics toolbox resides, and provided space is the name of the directory you chose. This is the same procedure outlined in installing TSTOOL. On my machine I added:

space
space\functions

As in previous versions of Matlab, users can employ the addpath command to add these directories to the search path.
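A minimal sketch of the addpath route follows. The drive letter and directory name are hypothetical placeholders (the manual does not fix them), so substitute the location you actually chose when extracting the archive.

```matlab
% Hypothetical install location; replace with the directory you extracted to.
toolboxRoot = fullfile('C:', 'space');

if exist(toolboxRoot, 'dir')
    % Add the top-level directory and its functions subdirectory.
    addpath(toolboxRoot, fullfile(toolboxRoot, 'functions'));
else
    warning('Toolbox directory not found: %s', toolboxRoot);
end
```

Placing these lines in a startup.m file is one way to avoid repeating them in every session.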

Both the functions and examples are well-documented internally. One can use the help functions of Matlab in the usual fashion when the toolbox is on the search path or the functions are in the current directory. If the toolbox is on the search path, a user can type help space and see the functions contained in the Spatial Statistics Toolbox 2 (provided space is the name of the directory you chose). If the toolbox is on the search path or the function lies in the current directory, a user can obtain help on the function by typing help function_name or doc function_name.

The examples provide the best means of understanding the toolbox functions. Each subdirectory under the examples subdirectory contains all the files needed for that particular example (except the multivariate subdirectory, which requires installation of the TSTOOL package). Users can substitute their data for the example data and see how the functions perform for their problem. Note, a few of these functions take several minutes to run (notably the SALE examples). Most, however, are quite fast.

2.4 Known Limitations

None. All are unknown.

2.5 Tips on Using the Toolbox

Typical sessions with the toolbox proceed in four steps. First, import the data into Matlab. If the file is fixed-format or tab-delimited ASCII, the command load name.extension (whatever that name.extension may be) will load the file contents into memory into a Matlab variable called name. Saving this will convert it into a Matlab file (e.g., save name name will save variable name into matrix name stored in name.mat). Failure to specify both names will result in saving all defined variables into one file. The data would include the dependent variable, the independent variables, and the locational coordinates. For example, suppose the user has the dependent variable in the text file y.txt. Issuing the command load y.txt results in a Matlab variable y. Issuing the command save y y results in saving y.mat in the current directory.
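The import step can be tried end to end with a small made-up file. In this sketch the three values written to y.txt are hypothetical; the load and save commands themselves are the ones described above.

```matlab
% Write a small hypothetical text file so the sketch is self-contained.
yDemo = [1.5; 2.5; 3.5];
save('y.txt', 'yDemo', '-ascii');

clear y
load y.txt        % creates the variable y, named after the file
save y y          % writes only the variable y to y.mat

S = whos('-file', 'y.mat');   % list what the MAT-file contains
disp({S.name});
```

Naming the variable explicitly in the save command is what keeps unrelated workspace variables out of y.mat.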

Second, create a spatial weight matrix. Users can choose weight matrices based upon nearest neighbors (symmetric or asymmetric), multidimensional symmetric neighbors, spatiotemporal neighbors (asymmetric), and Delaunay triangles (symmetric). In almost all cases, one must make sure each location is unique. One may need to add slight amounts of random noise to the locational coordinates to meet this restriction (some of the latest versions of Matlab do this automatically; do not dither the coordinates in this case). Note, some estimators only use symmetric matrices. You can specify the number of neighbors used and their relative weightings.

Note, the Delaunay spatial weight matrix leads to a concentration matrix or a variance-covariance matrix that depends upon only one parameter (α, the autoregressive parameter). In contrast, the nearest neighbor concentration matrices or variance-covariance matrices depend upon three parameters: α, the autoregressive parameter; m, the number of neighbors; and ρ, which governs the rate at which weights decline with the order of the neighbors, with the closest neighbor given the highest weighting, the second closest given a lower weighting, and so forth. Three parameters should make this specification sufficiently flexible for many purposes.
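To make the roles of m and ρ concrete, the following sketch builds an asymmetric, row-stochastic nearest neighbor weight matrix in base Matlab. It is an illustration under stated assumptions rather than the toolbox's routine: the coordinates are random, the weights are assumed to be proportional to ρ^(k-1) for the k-th nearest neighbor, and the brute-force distance computation only suits small n.

```matlab
% Sketch: weight matrix from m nearest neighbors with geometric decay rho.
rng(0);
n = 200; m = 5; rho = 0.5;
xy = rand(n, 2);                          % hypothetical coordinates

% Squared Euclidean distances between all pairs (brute force).
sq = sum(xy.^2, 2);
d2 = bsxfun(@plus, sq, sq') - 2 * (xy * xy');
d2(1:n+1:end) = Inf;                      % an observation is not its own neighbor

[~, idx] = sort(d2, 2);                   % neighbors ordered by distance
nbr = idx(:, 1:m);

w = rho .^ (0:m-1);                       % weight for 1st, 2nd, ..., m-th neighbor
w = w / sum(w);                           % each row sums to one

rows = repmat((1:n)', 1, m);
vals = repmat(w, n, 1);
D = sparse(rows(:), nbr(:), vals(:), n, n);
```

With rho = 0.5 the second nearest neighbor receives half the weight of the first; with rho = 1 all m neighbors are weighted equally.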

Third, one computes the log-determinants for a grid of autoregressive parameters (prespecified by the routine as a default or specified by the user as an option). Determinant computations proceed faster for symmetric matrices. You must choose the appropriate log-determinant routines for the type of spatial weight matrix you have specified. Computing the log-determinants is slower than estimation, but only needs to be done when changing the spatial weight matrix. For example, one can use the same weight matrix and log-determinant files when exploring transformations or specifications of the dependent and independent variables (for the same observations).

Fourth, pick a statistical routine to run given the data matrices, the spatial weight matrix, and the log-determinant vector. One can choose among conditional autoregressions (CAR), simultaneous autoregressions (SAR), matrix exponential spatial specifications (MESS), mixed regressive spatially autoregressive estimators (which include pure autoregressive models and spatially lagged independent variable models as special cases), and OLS. In addition, one can explore multivariate spatiotemporal and multivariate estimation. These routines require little time to run. One can change models, weightings, and transformations and reestimate in the vast majority of cases without rerunning the spatial weight matrix or log-determinant routines (you may need to add another simple Jacobian term when performing weighting or transformations of the dependent variables). This aids interactive data exploration.
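The division of labor between the third and fourth steps can be sketched as follows. This is a generic textbook form of the concentrated likelihood for the model y = αDy + Xβ + ε, written with built-in functions only; it is not the toolbox's code, and the ring weight matrix, the simulated data, and the grid spacing are all hypothetical.

```matlab
% Sketch: log-determinant grid (step three) reused in a profile likelihood
% (step four) for y = alpha*D*y + X*beta + e.
rng(1);
n = 400;
i = (1:n)';
D = sparse([i; i], [mod(i, n) + 1; mod(i - 2, n) + 1], 0.5, n, n);
X = [ones(n, 1), randn(n, 2)];
alphaTrue = 0.6;
y = (speye(n) - alphaTrue * D) \ (X * [1; 2; -1] + 0.5 * randn(n, 1));

% Step three: depends only on D, so it is computed once.
alphaGrid = (0:0.01:0.99)';
lndet = zeros(size(alphaGrid));
for k = 1:numel(alphaGrid)
    [~, U] = lu(speye(n) - alphaGrid(k) * D);
    lndet(k) = sum(log(abs(diag(U))));
end

% Step four: cheap regressions plus a lookup into the lndet vector.
ey  = y - X * (X \ y);                  % residuals of y on X
eWy = D * y - X * (X \ (D * y));        % residuals of D*y on X
sse = (ey' * ey) - 2 * alphaGrid * (ey' * eWy) + alphaGrid.^2 * (eWy' * eWy);
profileLL = lndet - (n / 2) * log(sse); % log-likelihood up to a constant
[~, best] = max(profileLL);
alphaHat = alphaGrid(best);
fprintf('alpha maximizing the profile likelihood: %.2f\n', alphaHat);
```

Changing y or X only requires rerunning the last block, which is why the log-determinant vector is worth storing alongside the weight matrix.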

Fifth, these procedures provide a wealth of information. Many of these routines yield the profile likelihood in the autoregressive parameter for each submodel (corresponding to the deletion of individual variables or the spatial term). All of the inference, even for the OLS routine, uses likelihood ratio statistics in the form of signed root deviances. A signed root deviance is just the square root of twice the difference in log-likelihoods between the full model and the submodel, given the sign of the parameter estimate. It has a t-like interpretation (Chen and Jennrich (1996)). The use of signed root deviances (SRDs) facilitates comparisons among different models.
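As a worked check of the definition, the sketch below computes a signed root deviance from two log-likelihoods. The helper name srd is hypothetical; the two likelihood values are the doubly stochastic entries for ρ = 0.9 and ρ = 1 reported in Chapter 3, read here on the assumption that they carry four decimal places.

```matlab
% Signed root deviance: sign(estimate) * sqrt(2 * (L_full - L_restricted)).
srd = @(estimate, llFull, llRestricted) ...
    sign(estimate) .* sqrt(2 * (llFull - llRestricted));

% Log-likelihoods read from the Chapter 3 tables (four decimals assumed).
llRhoOpt = -230828.6413;    % doubly stochastic scaling, rho = 0.9
llRhoOne = -233434.5371;    % doubly stochastic scaling, rho = 1.0

srdRho = srd(0.9, llRhoOpt, llRhoOne);
fprintf('SRD for rho relative to rho = 1: %.4f\n', srdRho);
```

The result, about 72.19, appears to agree with the SRD listed for ρ in the OLS versus MESS table of Chapter 3.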

2.6 Included Examples

The Spatial Statistics Toolbox comes with many examples. These are found in the subdirectories under spatial_toolbox_2\examples. To run the examples, change the directory in Matlab into the many subdirectories that illustrate individual routines. Look at the documentation in each example directory for more detail. Almost all of the specific models have examples. In addition, the simulation routine examples serve as minor Monte Carlo studies, which also help verify the functioning of the estimators. The examples use the 3107 observation dataset from the Pace and Barry (1997) Geographical Analysis article.

2.7 Included Datasets

The spatial_toolbox_2\datasets subdirectory contains subdirectories with individual data sets in Matlab file formats as well as their documentation. The data sets include example programs and output. Note, due to the many improvements incorporated into the Spatial Statistics Toolbox over time, the running times have greatly improved over those in the articles.

Hopefully, these data sets should provide a good starting point for exploring applications of spatial statistics.

2.8 Included Manuscripts

In the manuscript subdirectory we provide pdf versions of the Geographical Analysis 1997 and 2000 articles. I would like to thank the publishers (Ohio State Press and Elsevier) for having given us copyright permission to distribute these works. One can also go to www.spatial-statistics.com to access some other articles (e.g., the Linear Algebra and its Applications article, which proposed the Monte Carlo log-determinant estimator).

Chapter 3

A Brief Selected Tour of the Toolbox

The weight matrix specifies the dependence among observations. One form of weight matrix (Delaunay) uses the notion of contiguity to specify dependence, as depicted in Figure 3.1.

[Plot: "Delaunay Nearest Neighbor Graph"; x-axis: Longitude in Decimal Degrees; y-axis: Latitude in Decimal Degrees]

Figure 3.1: Connections among Counties via Delaunay

Note the somewhat strange behavior of connections to outlying observations in Figure 3.1. This arises from the geometric nature of contiguity. Using nearest neighbors based upon some metric can avoid this, as shown in Figure 3.2.

[Plot: "8 Nearest Neighbor Graph"; x-axis: Longitude in Decimal Degrees; y-axis: Latitude in Decimal Degrees]

Figure 3.2: Connections among Counties via Eight Nearest Neighbors

Using only nearby observations implies that the weight matrix has mainly zeros, or is sparse, as shown in Figure 3.3.

[Plot: "Original Non-zero Weight Pattern for Delaunay"; x-axis: Observation Number; y-axis: Observation Number]

Figure 3.3: Plot of Non-zeros of Delaunay Weight Matrix

This becomes even more apparent when reordering the observations, as in Figure 3.4.

[Plot: "Permuted Non-zero Weight Pattern for Delaunay"; x-axis: Observation Number; y-axis: Observation Number]

Figure 3.4: Plot of Non-zeros of Permuted Delaunay Weight Matrix

Sparsity, as well as finding an appropriate ordering, is key in quickly computing the log-determinants used in maximum likelihood. The toolbox has functions for exact computation of the log-determinants (actually interpolation of exact computations at various points). However, users can select approximations as well, which depend only on sparsity and not upon orderings. The quadratic Chebyshev is the fastest and most approximate technique (Figure 3.5).
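The effect of reordering can be reproduced with built-in Matlab functions. The sketch below is only an illustration: the point locations are random rather than the county data, and reverse Cuthill-McKee (symrcm) is used as one readily available ordering, which may differ from the ordering the toolbox applies.

```matlab
% Sketch: Delaunay contiguity matrix and the effect of a reordering.
rng(2);
n = 500;
x = rand(n, 1); y = rand(n, 1);            % hypothetical locations

tri = delaunay(x, y);                      % triangles as rows of vertex indices
e = [tri(:, [1 2]); tri(:, [2 3]); tri(:, [1 3])];
A = sparse(e(:, 1), e(:, 2), 1, n, n);
A = spones(A + A');                        % symmetric 0/1 contiguity matrix

p = symrcm(A);                             % reverse Cuthill-McKee permutation

[r, c] = find(A);         bwOriginal = max(abs(r - c));
[r, c] = find(A(p, p));   bwPermuted = max(abs(r - c));
fprintf('bandwidth before: %d, after: %d\n', bwOriginal, bwPermuted);
% spy(A) and spy(A(p, p)) show the two non-zero patterns.
```

A smaller bandwidth keeps the non-zeros near the diagonal, which tends to limit fill-in during the sparse factorization behind the exact log-determinants.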

[Plot: "Exact log-determinant, Chebyshev approximation, Taylor bounds"; series: Taylor Lower, Chebyshev, Taylor Upper, Exact; x-axis: α; y-axis: log-determinants]

Figure 3.5: Plot of Exact log-determinant with Chebyshev Approximation and Taylor Bounds

The Chebyshev approximation appears quite good for positive, moderate values of the dependence parameter, but could use improvement for materially negative values of the spatial dependence parameter. Fortunately, such negative values seem rare in practice.

The Monte Carlo log-determinant estimator is also quite fast, but more accurate (Figure 3.6).

[Plot: "Exact log-determinant, Monte Carlo approximation and confidence limits"; series: Lower Confidence, Monte Carlo, Upper Confidence, Exact; x-axis: α; y-axis: log-determinants]

Figure 3.6: Plot of Exact log-determinant with Monte Carlo Approximation and Limits
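The idea behind a Monte Carlo log-determinant estimator can be sketched from the series ln|I - αD| = -Σ α^k tr(D^k)/k, with each trace estimated by n u'D^k u / u'u for a standard normal vector u. The code below is a simplified illustration of that idea and not the toolbox routine; the ring weight matrix, the number of terms, and the number of draws are hypothetical choices.

```matlab
% Sketch: Monte Carlo estimate of ln|I - alpha*D| from a truncated series.
rng(3);
n = 1000;
i = (1:n)';
D = sparse([i; i], [mod(i, n) + 1; mod(i - 2, n) + 1], 0.5, n, n);
alpha = 0.5; nTerms = 30; nDraws = 100;

est = zeros(nDraws, 1);
for r = 1:nDraws
    u = randn(n, 1);
    v = u; s = 0;
    for k = 1:nTerms
        v = D * v;                         % v now holds D^k * u
        s = s + alpha^k * (u' * v) / k;
    end
    est(r) = -n * s / (u' * u);
end
lndetMC = mean(est);
seMC = std(est) / sqrt(nDraws);            % rough Monte Carlo standard error

% Exact value by sparse LU for comparison.
[~, U] = lu(speye(n) - alpha * D);
lndetExact = sum(log(abs(diag(U))));
fprintf('Monte Carlo: %.2f (s.e. %.2f), exact: %.2f\n', lndetMC, seMC, lndetExact);
```

Only sparse matrix-vector products are needed, so the cost grows with the number of non-zeros in D rather than with the cube of n.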

To see the effects of exact versus approximate log-determinant computations, consider Tables 3.1 and 3.2 using the 3107 county election data. The estimated autoregressive parameter is only off by 0.01 from using the approximation. The approximate method also uses likelihood dominance inference, which results in a lower bound to the signed root deviances. As shown by the tables, the likelihood dominance SRDs are smaller in magnitude than the exact SRDs. However, they can still document statistical significance for many variables and thus can prove useful in many circumstances.

Variables         Beta Estimates   Signed Root Deviances   Pr. of Higher SRDs
Voting Pop              -0.7784                -31.1227               0.0000
Education                0.2696                  9.3438               0.0000
Home Ownership           0.4530                 26.2661               0.0000
Income                   0.0071                  0.0035               0.9972
Intercept                0.5443                  7.8773               0.0000
Alpha                    0.7250                 32.7719               0.0000

Table 3.1: SAR Estimation Results using Chebyshev lndet approximation and likelihood dominance inference

Variables         Beta Estimates   Signed Root Deviances   Pr. of Higher SRDs
Voting Pop              -0.7806                -32.1748               0.0000
Education                0.2746                 11.9438               0.0000
Home Ownership           0.4525                 27.0073               0.0000
Income                   0.0047                  0.2187               0.8269
Intercept                0.5528                  9.4908               0.0000
Alpha                    0.7150                 34.8648               0.0000

Table 3.2: SAR Estimation results using exact lndet

Some of the routines not only yield the maximum of the likelihood function, but also profile likelihoods in the dependence parameter α by model, as shown in Figure 3.7.

[Plot: "Profile likelihoods vs. α for global model and delete-1 submodels"; series: Global likelihood, Voting Pop, Education, Home Ownership, Income, Intercept; x-axis: Dependence parameter (α); y-axis: Profile log-likelihood]

Figure 3.7: SAR Profile Likelihoods by Model

The toolbox includes SAR, CAR, and MESS error models as well as MESS, closest neighbor, and MIX autoregressive models, as shown in Table 3.3 and Table 3.4.

Variables              b OLS   b closest    b MESS     b Mix
Voting Pop           -0.8464     -0.7489   -0.7693   -0.7298
Education             0.5167      0.2899    0.1941    0.1818
Home Ownership        0.4291      0.4457    0.4832    0.4580
Income               -0.1439     -0.0332    0.0423    0.0427
Lag Voting Pop        0.0000      0.1878    0.4186    0.4616
Lag Education         0.0000      0.0975    0.1205    0.0450
Lag Home Ownership    0.0000     -0.1569   -0.3249   -0.3299
Lag Income            0.0000     -0.1079   -0.1802   -0.1410
Intercept             0.9814      0.7495    0.5636    0.4205
α                     0.0000      0.3352    1.4628    0.6550

Table 3.3: Estimates on Election Data

The closest neighbor approach is intermediate to a non-spatial approach (OLS) and a full spatial approach (MESS) or the approximate mixed routine. Note the close agreement between MESS and the mixed routine. Note, OLS in this case uses the spatial averages of the basic independent variables as additional independent variables.

None of these operations take long for the election data.

Variables                OLS     Closest      MESS       Mix
Voting Pop          -34.7211    -30.5020  -28.8899  -29.4018
Education            30.8928     13.4922    7.6654    7.7169
Home Ownership       23.0066     25.4083   26.8836   27.3515
Income               -7.2373     -1.5742    1.7478    1.8952
Lag Voting Pop        0.0000      7.7600   10.7656   12.5507
Lag Education         0.0000      4.5090    3.9785    1.5705
Lag Home Ownership    0.0000     -9.4245  -11.1259  -12.1200
Lag Income            0.0000     -5.1399   -5.5551   -4.6603
Intercept            20.2782     16.0533   10.7480    8.4626
α                     0.0000     24.2762   31.5948   33.7097

Table 3.4: Signed Root Deviances using Election Data

Operation                    Timings in seconds
OLS                                      0.0150
Closest AR                               0.0470
MESS                                     0.0940
Exact Log-det                            0.8750
Mix                                      0.0470
Doubly Stochastic Scaling                0.2660

Table 3.5: Timing for operations on Election data (n=3107)

In addition to global models, the toolbox has spatial autoregressive local estimation (SALE). The user chooses the bandwidth (subsample size) by examining cross-validation error at the fringe observations, as in Figure 3.8, or at the center of each area, as in Figure 3.9.

[Plot: "Smoothed SALE Recursive Residuals of Fringe Observations"; x-axis: Number of Local Observations; y-axis: Absolute Error]

Figure 3.8: Plot of Fringe prediction error versus subsample size

[Plot: "Smoothed SALE Initial Holdout Residuals"; x-axis: Number of Local Observations; y-axis: Absolute Error]

Figure 3.9: Plot of prediction error on center of area versus subsample size

Usually there is spatial dependence even in small subsamples, as shown in Figure 3.10.

[Plot: "SALE Autoregressive Parameter Estimates"; x-axis: Number of Local Observations; y-axis: Median Autoregressive Parameter Estimates]

Figure 3.10: Spatial dependence parameter estimate versus subsample size

Local estimation leads to spatially varying parameter estimates, such as those shown in Figure 3.11.

[Map of β3; x-axis: Longitude; y-axis: Latitude]

Figure 3.11: Map of influence of homeownership on voting

In addition, a user can obtain an idea of the sensitivity of parameter estimates to spatial variation, such as summarized in Table 3.6.

Percentiles               0       10       25       50       75       90      100
α                    0.3100   0.4300   0.5500   0.6300   0.6700   0.7100   0.7800
Voting Pop          -1.0079  -0.8875  -0.8075  -0.6962  -0.5967  -0.5299  -0.4066
Education           -0.1871   0.0517   0.1142   0.1788   0.3135   0.4923   0.7118
Home Ownership       0.1530   0.2964   0.3419   0.4253   0.5161   0.7660   0.8461
Income              -0.2576  -0.1644  -0.1072  -0.0473   0.0628   0.1448   0.2325
Lag Voting Pop      -0.1525   0.0655   0.2868   0.4527   0.5490   0.6523   0.8806
Lag Education       -0.5734  -0.3666  -0.1812  -0.0299   0.0677   0.1322   0.3318
Lag Home Ownership  -0.7010  -0.4499  -0.3278  -0.2337  -0.1100  -0.0382   0.1880
Lag Income          -0.4764  -0.2374  -0.1733  -0.0897   0.0216   0.0862   0.2786
Intercept           -0.1605   0.0473   0.1463   0.2847   0.6612   1.0772   1.8902

Table 3.6: Distribution of local estimates

Method                       Time in seconds
OLS                                   0.1560
Closest AR                            0.1090
MESS                                  0.6090
Approximate Mix                       0.3280
Doubly Stochastic Scaling             7.1880
Delaunay Weight Matrix                5.4220

Table 3.7: Times for Different Methods for 57647 Observations

To provide an idea about the performance of the techniques for a larger problem, we estimated a simple hedonic regression over US census tracts. This resulted in 57647 observations. Table 3.7 shows the timings for some of the various operations. All of these seem quite fast. A user can find a Delaunay weight matrix and estimate a spatial autoregression in under 10 seconds on desktop machines.

Just selecting a particular weight matrix seems arbitrary. Here we take 30 nearest neighbors and weight these geometrically. A ρ of 1 indicates no decline in the weight given to further neighbors relative to closer ones, while a ρ of 0.5 would give half the weight to the second nearest neighbor as it would to the first nearest neighbor. Thus, ρ allows changes in the effective number of neighbors used without actually varying the number of neighbors. It often makes sense in this approach to set the number of neighbors to a fairly high level (such as 30). Table 3.8 shows the effect of varying ρ on the profile log-likelihood. Small changes in ρ make large changes in the profile log-likelihood, evidence of the importance of this parameter.

It did not take overly long to find the nearest neighbors or the optimal ρ, even with doubly stochastic scalings of the weight matrix, as shown by Table 3.9.

ρ          log-likelihood
0.8000       -231895.0426
0.8500       -230968.8487
0.9000       -230828.6413
0.9500       -231699.1089
1.0000       -233434.5371

Table 3.8: Likelihoods across ρ for Doubly Stochastic Scaling

Operation                      Time in seconds
NN computation                         31.4060
RS Time to find optimum ρ              56.7810
DS Time to find optimum ρ              94.8750

Table 3.9: Times for Optimizing the Likelihood over ρ for Both Scalings

These operations lead to a table of profile log-likelihoods (Table 3.10) across weight matrices. Examining the MESS log-likelihoods over ρ and Delaunay, and contrasting them with the log-likelihood from applying OLS to the basic non-spatial independent variables, demonstrates that even a suboptimal choice of ρ or Delaunay still dominates the use of an aspatial model in this case, and that optimizing over ρ dominates an arbitrary choice of weight matrices (Table 3.10). Moreover, the doubly stochastic (DS) scaling helped greatly for this example over the regular scaling (RS). In addition, inspection of aspatial OLS versus MESS with an optimal selection of ρ in Table 3.11 shows clear differences among the approaches. Note, the land area variable became insignificant after modeling space.

It is not difficult to estimate a spatial autoregression with over one million observations. In fact, the toolbox provides an example (big_one subdirectory) under the dataset directory whereby a one million observation spatial autoregression is estimated in just under 20 seconds. It took 130.63 seconds to find the weight matrix, 60.24 seconds to simulate the dependent variable, and 19.42 seconds to estimate the autoregression.

Variable                                     Value
Aspatial likelihood (OLS)             -266505.1663
Closest Neighbor                      -243986.1676
RS Delaunay maximum likelihood        -244509.8257
RS Maximum likelihood across ρ        -254216.3172
RS Optimum ρ                                0.9000
DS Delaunay maximum likelihood        -235626.6213
DS Maximum likelihood across ρ        -230828.6413
DS Optimum ρ                                0.9000

Table 3.10: Likelihoods Across Doubly Stochastic and Regular Scalings

Variables                  b OLS    OLS SRD    b MESS   MESS SRD
Land area                -0.0850   -96.8455   -0.0008    -0.7159
Pop                       0.1146    36.8592    0.0239    12.7630
Per cap Income            1.0837   208.2192    0.6786   152.7645
Age                      -0.1269   -34.2809   -0.1384   -45.0489
Lag Land area                                 -0.0178   -14.5650
Lag Pop                                        0.0165     5.2437
Per cap Income                                -0.3702   -65.2683
Lag Age                                        0.1088    27.5807
Intercept                 1.2236    22.9837   -0.5988   -15.2122
α                                              3.0986   253.2835
ρ (relative to ρ = 1)                          0.90      72.1927
Nearest Neighbors                              30
parameters                5                    11

Table 3.11: OLS versus MESS Results Using Optimal ρ for Doubly Stochastic Scaling

Chapter 4

References

If you need to know more about spatial statistics or about some of the specific routines, you may wish to examine:

Anselin, Luc (1988). Spatial Econometrics: Methods and Models. Dordrecht: Kluwer Academic Publishers.

Barry, Ronald, and R. Kelley Pace. "A Monte Carlo Estimator of the Log Determinant of Large Sparse Matrices," Linear Algebra and its Applications, Volume 289, Number 1-3, 1999, p. 41-54.

Chen, Jian-Shen, and Robert Jennrich (1996). "The Signed Root Deviance Profile and Confidence Intervals in Maximum Likelihood Analysis," Journal of the American Statistical Association, Volume 91, Number 435, p. 993-998.

Christensen, Ronald (1991). Linear Models for Multivariate, Time Series, and Spatial Data. New York: Springer-Verlag.

Cressie, Noel A.C. (1993). Statistics for Spatial Data. Revised ed. New York: John Wiley.

Dubin, Robin A. (1988). "Estimation of Regression Coefficients in the Presence of Spatially Autocorrelated Error Terms," Review of Economics and Statistics 70, 466-474.

Haining, Robert (1990). Spatial Data Analysis in the Social and Environmental Sciences. Cambridge.

LeSage, James, and R. Kelley Pace. "Spatial Dependence in Data Mining," Data Mining for Scientific and Engineering Applications, Edited by Robert L. Grossman, Chandrika Kamath, Philip Kegelmeyer, Vipin Kumar, and Raju R. Namburu, Kluwer Academic Publishing, 2001.

LeSage, James, and R. Kelley Pace. "Spatial Probit and Tobit," Spatial Statistics and Spatial Econometrics, Edited by Art Getis, Palgrave, 2003.

Li, Bin (1995). "Implementing Spatial Statistics on Parallel Computers," in: Arlinghaus, S., ed. Practical Handbook of Spatial Statistics (CRC Press, Boca Raton), pp. 107-148.

Ord, J.K. (1975). "Estimation Methods for Models of Spatial Interaction," Journal of the American Statistical Association 70, 120-126.

Pace, R. Kelley, and Ronald Barry (1997). "Fast CARs," Journal of Statistical Computation and Simulation 59, p. 123-147.

Pace, R. Kelley, and Ronald Barry (1997). "Quick Computation of Regressions with a Spatially Autoregressive Dependent Variable," Geographical Analysis 29, 232-247.

Pace, R. Kelley, and Dongya Zou. "Closed-Form Maximum Likelihood Estimates of Nearest Neighbor Spatial Dependence," Geographical Analysis, Volume 32, Number 2, April 2000, p. 154-172.

Pace, R. Kelley, Ronald Barry, O.W. Gilley, and C.F. Sirmans. "A Method for Spatial-temporal Forecasting with an Application to Real Estate Prices," International Journal of Forecasting, Volume 16, Number 2, April-June 2000, p. 229-246.

Pace, R. Kelley, and James P. LeSage. "Semiparametric Maximum Likelihood Estimates of Spatial Dependence," Geographical Analysis, Volume 34, Number 1, January 2002, p. 75-90.

Pace, R. Kelley, and James LeSage. "Likelihood Dominance Spatial Inference," forthcoming, Geographical Analysis, in January 2003.

Pace, R. Kelley, and James LeSage. "Spatial Autoregressive Local Estimation," Spatial Statistics and Spatial Econometrics, Edited by Art Getis, Palgrave, 2003.

Pace, R. Kelley, and James LeSage. "Chebyshev Approximation of Log-determinants of Spatial Weight Matrices," forthcoming in Computational Statistics and Data Analysis.

Ripley, Brian D. (1981). Spatial Statistics. New York: John Wiley.

www.spatial-statistics.com

  • Why the Toolbox Exists
  • Using the Toolbox
    • Hardware and Software Requirements
    • Installation
    • Help and documentation
    • Known Limitations
    • Tips on Using the Toolbox
    • Included Examples
    • Included Datasets
    • Included Manuscripts
      • A Brief Selected Tour of the Toolbox
      • References
Page 7: Spatial Statistics Toolbox for Matlab 2.0 - Documentation Only (pdf)

7

computationally and theoretically interesting models such as the aforemen-tioned SALE and the matrix exponential spatial specification (MESS)

Chapter 2

Using the Toolbox

21 Hardware and Software Requirements

The toolbox has been developed under 65 and tested under 65 and 61across W2K and Windows ME However some routines run faster under65 than under 61 The total installation takes around 15 megabytes Theroutines have been tested on PC compatibles The routines should run onother platforms but have not been tested on non-PC compatibles

22 Installation

For users who can extract files from zip archives follow the instructions foryour product (eg Winzip) and extract the files into the drive in which youwish to install the toolbox The installation program will create the followingdirectory structure in whichever drive you choose

drive_letterspace

+---articles

+---datasets

+---big_one

+---election

8

22 INSTALLATION 9

+---housing

+---space_time

+---documentation

+---examples

+---CAR

+---CAR_SIM

+---CHEBYSHEV

+---CHEBYSHEV_SEQUENCE

+---CLOSEST_NEIGHBOR

+---DELAUNAY2

+---DOUBLY

+---LNDET_INTERP

+---LNDET_INTERP_SEQUENCE

+---LNDET_MONTECARLO

+---MESS_AR

+---MESS_CAR

+---MESS_SIM

+---MIXED

+---MULTIVARIATE

+---NEAREST_NEIGHBORS

+---OLS

+---SALE

+---SAR

+---SAR_SIM

+---SPACE_TIME

+---fmex_code

+---functions

+---fmex_code

To see whether the installation has succeeded change the directory inMatlab to one of the supplied examples and type run x_ m For ex-

10 CHAPTER 2 USING THE TOOLBOX

ample go to the examplescar subdirectory and type run x_car2_ga1The use of in this context serves as a wildcard or placeholder for the in-tervening characters This should cause the example script x_car2_ga1m torun

The multidimensional weight matrix routines require the installation ofTSTOOL Go to wwwphysik3gwdgdetstool to find this useful packageand follow the instructions to install it in Matlab If you have Matlab 65 youcan easily add the relevant paths to the mex functions by going to the File

menu selecting Set Path under the applet selecting Add Folder and addthe paths so Matlab can find the functions On my machine I added

opentstooltstoolboxmexdll

opentstooltstoolboxmex

opentstooltstoolboxutils

tstoolopentstooltstoolboxgui

tstoolopentstooltstoolbox

23 Help and documentation

All the example scripts should follow the form x_ m (eg x_car2_ga1mx_sar2_ga1m) Functions follow the form f m (eg fsar2m fols2m)Matlab matrix files (which may include multiple matrices) have the form mat

If you wish to access the functions in other directories you can define apath to space and spacefunctions using the set path commandunder the file menu in Matlab 65 where refers to the drive where thespatial statistics toobox resides and provided space is the name of the direc-tory you chose This is the same procedure outlined in installing TSTOOLOn my machine I added

space

spacefunctions

24 KNOWN LIMITATIONS 11

As in previous versions of Matlab users can employ the addpath commandto add these directories to the search path

Both the functions and examples are well-documented internally Onecan use the help functions of Matlab in the usual fashion when the toolbox ison the search path or the functions are in the current directory If the toolboxis on the search path a user can type help space and see the functionscontained in the Spatial Statistics Toolbox 2 (provided space is the name ofthe directory you chose) If the toolbox is on the search path or the functionlies in the current directory a user can obtain help on the function by typinghelp function_name or doc function_name

The examples provide the best means of understanding the toolbox functions Eachsubdirectory under the examples subdirectory contains all the files neededfor that particular example (except the multivariate subdirectory which re-quires installation of the TSTOOL package) Users can substitute their datafor the example data and see how the functions perform for their problemNote a few of these functions take several minutes to run (notably the SALEexamples) Most however are quite fast

24 Known Limitations

None All are unknown

25 Tips on Using the Toolbox

Typical sessions with the toolbox proceed in five steps. First, import the data into Matlab. If the file is fixed-format or tab-delimited ASCII, the command load name.extension (whatever that name.extension may be) will load the file contents into memory as a Matlab variable called name. Saving this will convert it into a Matlab file (e.g., save name name will save variable name into matrix name stored in name.mat). Failure to specify both names will result in saving all defined variables into one file. The data would include the dependent variable, the independent variables, and the locational coordinates. For example, suppose the user has the dependent variable in the text file y.txt. Issuing the command load y.txt results in a Matlab variable y. Issuing the command save y y results in saving y.mat in the current directory.
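The import step can be tried end to end with a small sketch. The file contents below are made-up values and the scratch directory is an assumption chosen only to keep the example from overwriting real files; only the load and save commands come from the text above.

```matlab
% Work in a scratch directory so no existing files are overwritten.
workDir = tempname;
mkdir(workDir);
oldDir = cd(workDir);

% Create a small ASCII file standing in for the user's y.txt
% (hypothetical values, one observation per line).
yOriginal = [1.5; 2.25; 3.0; 4.75];
fid = fopen('y.txt', 'w');
fprintf(fid, '%g\n', yOriginal);
fclose(fid);

% load creates a variable named after the file (here y).
load y.txt

% save y y writes only the variable y into y.mat; a bare "save y"
% would store every variable in the workspace instead.
save y y

% Reload from y.mat to confirm the round trip.
clear y
S = load('y.mat');
roundTripOK = isequal(S.y, yOriginal);

cd(oldDir);
```

The same pattern applies to the independent variables and the locational coordinates, each stored under its own name.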

Second, create a spatial weight matrix. Users can choose weight matrices based upon nearest neighbors (symmetric or asymmetric), multidimensional symmetric neighbors, spatiotemporal neighbors (asymmetric), and Delaunay triangles (symmetric). In almost all cases, one must make sure each location is unique. One may need to add slight amounts of random noise to the locational coordinates to meet this restriction (some of the latest versions of Matlab do this automatically; do not dither the coordinates in this case). Note, some estimators only use symmetric matrices. You can specify the number of neighbors used and their relative weightings.
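To make the idea of a Delaunay-based weight matrix concrete, here is a minimal sketch that uses only built-in Matlab functions (delaunay, sparse, spones, spdiags). It is not the toolbox's own routine; the coordinates are synthetic, and the row-standardization at the end is just one common scaling chosen for illustration.

```matlab
% Synthetic coordinates (hypothetical data): a quasi-random sequence
% that yields distinct locations without a random number generator.
n = 60;
k = (1:n)';
x = mod(k * 0.6180339887, 1);
y = mod(k * 0.4142135624, 1);

% Check that every location is unique before triangulating.
allUnique = size(unique([x y], 'rows'), 1) == n;

% Each row of tri lists the three vertices of one Delaunay triangle.
tri = delaunay(x, y);

% Every triangle contributes three edges; enter each in both directions.
i = [tri(:,1); tri(:,2); tri(:,3)];
j = [tri(:,2); tri(:,3); tri(:,1)];
A = sparse([i; j], [j; i], 1, n, n);
A = spones(A);            % edges shared by two triangles count once

% A is a symmetric 0/1 contiguity matrix. Row-standardizing it (an
% illustrative choice) gives weights that sum to one for each observation.
rowSums = full(sum(A, 2));
D = spdiags(1 ./ rowSums, 0, n, n) * A;
```

Note that the binary matrix A is symmetric, while the row-standardized D generally is not, which matters for estimators that require symmetric matrices.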

Note, the Delaunay spatial weight matrix leads to a concentration matrix or a variance-covariance matrix that depends upon only one parameter (α, the autoregressive parameter). In contrast, the nearest neighbor concentration matrices or variance-covariance matrices depend upon three parameters (α, the autoregressive parameter; m, the number of neighbors; and ρ, which governs the rate at which weights decline with the order of the neighbors, with the closest neighbor given the highest weighting, the second closest given a lower weighting, and so forth). Three parameters should make this specification sufficiently flexible for many purposes.
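The roles of m and ρ can be illustrated with a small sketch. It assumes that the j-th nearest neighbor receives a weight proportional to ρ^(j-1), which is one way to obtain the geometric decline described above; the brute-force neighbor search and the synthetic coordinates are illustrative and are not the toolbox's implementation.

```matlab
% Synthetic coordinates (hypothetical data), all locations distinct.
n   = 40;       % number of observations
m   = 5;        % number of neighbors
rho = 0.8;      % rate of geometric decline in the weights
k   = (1:n)';
xy  = [mod(k * 0.6180339887, 1), mod(k * 0.4142135624, 1)];

% Squared Euclidean distances between all pairs (fine for small n).
sq = sum(xy.^2, 2);
d2 = bsxfun(@plus, sq, sq') - 2 * (xy * xy');
d2(1:n+1:end) = Inf;          % a point is never its own neighbor

% Sort each row; column j of idx holds the j-th nearest neighbor.
[~, idx] = sort(d2, 2);
idx = idx(:, 1:m);

% Assumed weighting: the j-th nearest neighbor gets rho^(j-1).
w    = rho .^ (0:m-1);
rows = repmat((1:n)', 1, m);
vals = repmat(w, n, 1);
N    = sparse(rows(:), idx(:), vals(:), n, n);

% Row-standardize so the weights for each observation sum to one.
D = spdiags(1 ./ full(sum(N, 2)), 0, n, n) * N;
```

This nearest neighbor matrix is generally asymmetric, since observation i can be among the neighbors of j without the reverse holding.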

Third, one computes the log-determinants for a grid of autoregressive parameters (prespecified by the routine as a default or specified by the user as an option). Determinant computations proceed faster for symmetric matrices. You must choose the appropriate log-determinant routine for the type of spatial weight matrix you have specified. Computing the log-determinants is slower than estimation, but only needs to be done when changing the spatial weight matrix. For example, one can use the same weight matrix and log-determinant files when exploring transformations or specifications of the dependent and independent variables (for the same observations).

Fourth, pick a statistical routine to run, given the data matrices, the spatial weight matrix, and the log-determinant vector. One can choose among conditional autoregressions (CAR), simultaneous autoregressions (SAR), matrix exponential spatial specifications (MESS), mixed regressive spatially autoregressive estimators (which include pure autoregressive models and spatially lagged independent variable models as special cases), and OLS. In addition, one can explore multivariate spatiotemporal and multivariate estimation. These routines require little time to run. One can change models, weightings, and transformations and reestimate, in the vast majority of cases, without rerunning the spatial weight matrix or log-determinant routines (you may need to add another simple Jacobian term when performing weighting or transformations of the dependent variable). This aids interactive data exploration.

Fifth, these procedures provide a wealth of information. Many of these routines yield the profile likelihood in the autoregressive parameter for each submodel (corresponding to the deletion of individual variables or the spatial term). All of the inference, even for the OLS routine, uses likelihood ratio statistics in the form of signed root deviances. A signed root deviance is just the square root of twice the difference in log-likelihoods, given the sign of the parameter estimate. It has a t-like interpretation (Chen and Jennrich (1996)). The use of signed root deviances (SRDs) facilitates comparisons among different models.
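For the non-spatial (OLS) case, the signed root deviance can be computed directly from restricted and unrestricted fits. The following sketch assumes Gaussian errors, so that the concentrated log-likelihood is -(n/2) ln(SSE/n) up to a constant; the data are synthetic and the variable names are illustrative, not the toolbox's.

```matlab
% Synthetic regression data (hypothetical): intercept plus two regressors,
% with a deterministic disturbance so the example is reproducible.
n = 200;
t = (1:n)';
X = [ones(n, 1), sin(t), cos(1.7 * t)];
y = X * [1; 0.5; -0.25] + 0.3 * sin(3.1 * t + 1);

% Full model: OLS estimates and concentrated log-likelihood
% (additive constants dropped, since they cancel in differences).
b          = X \ y;
sseFull    = sum((y - X * b).^2);
loglikFull = -(n / 2) * log(sseFull / n);

% Delete one variable at a time and compare log-likelihoods.
p   = size(X, 2);
srd = zeros(p, 1);
for j = 1:p
    Xr      = X(:, [1:j-1, j+1:p]);          % submodel without variable j
    br      = Xr \ y;
    sseR    = sum((y - Xr * br).^2);
    loglikR = -(n / 2) * log(sseR / n);
    dev     = max(0, 2 * (loglikFull - loglikR));   % deviance, non-negative
    srd(j)  = sign(b(j)) * sqrt(dev);        % give it the sign of the estimate
end
```

Large absolute values of srd play the same role as large t-statistics; in the spatial models, the same comparison is made between maximized profile likelihoods.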

2.6 Included Examples

The Spatial Statistics Toolbox comes with many examples. These are found in the subdirectories under spatial_toolbox_2\examples. To run the examples, change the directory in Matlab to one of the many subdirectories that illustrate individual routines. Look at the documentation in each example directory for more detail. Almost all of the specific models have examples. In addition, the simulation routine examples serve as minor Monte Carlo studies, which also help verify the functioning of the estimators. The examples use the 3107 observation dataset from the Pace and Barry (1997) Geographical Analysis article.

2.7 Included Datasets

The spatial_toolbox_2\datasets subdirectory contains subdirectories with individual data sets in Matlab file formats as well as their documentation. The data sets include example programs and output. Note, due to the many improvements incorporated into the Spatial Statistics Toolbox over time, the running times have greatly improved over those in the articles.

Hopefully, these data sets should provide a good starting point for exploring applications of spatial statistics.

2.8 Included Manuscripts

In the manuscript subdirectory, we provide pdf versions of the Geographical Analysis 1997 and 2000 articles. I would like to thank the publishers (Ohio State Press and Elsevier) for having given us copyright permission to distribute these works. One can also go to www.spatial-statistics.com to access some other articles (e.g., the Linear Algebra and its Applications article, which proposed the Monte Carlo log-determinant estimator).

Chapter 3

A Brief Selected Tour of the Toolbox

The weight matrix specifies the dependence among observations. One form of weight matrix (Delaunay) uses the notion of contiguity to specify dependence, as depicted in Figure 3.1.

[Figure: "Delaunay Nearest Neighbor Graph"; x-axis: Longitude in Decimal Degrees; y-axis: Latitude in Decimal Degrees]

Figure 3.1: Connections among Counties via Delaunay

Note the somewhat strange behavior of connections to outlying observations in Figure 3.1. This arises from the geometric nature of contiguity. Using nearest neighbors based upon some metric can avoid this, as shown in Figure 3.2.

[Figure: "8 Nearest Neighbor Graph"; x-axis: Longitude in Decimal Degrees; y-axis: Latitude in Decimal Degrees]

Figure 3.2: Connections among Counties via Eight Nearest Neighbors

Using only nearby observations implies that the weight matrix has mainly zeros, or is sparse, as shown in Figure 3.3.

[Figure: "Original Non-zero Weight Pattern for Delaunay"; both axes: Observation Number]

Figure 3.3: Plot of Non-zeros of Delaunay Weight Matrix

This becomes even more apparent when reordering the observations, as in Figure 3.4.

[Figure: "Permuted Non-zero Weight Pattern for Delaunay"; both axes: Observation Number]

Figure 3.4: Plot of Non-zeros of Permuted Delaunay Weight Matrix

Sparsity, as well as finding an appropriate ordering, is key to quickly computing the log-determinants used in maximum likelihood. The toolbox has functions for exact computation of the log-determinants (actually, interpolation of exact computations at various points). However, users can select approximations as well, which depend only on sparsity and not upon orderings. The quadratic Chebyshev is the fastest and most approximate technique (Figure 3.5).
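The role of sparsity and ordering in an exact log-determinant computation can be sketched with built-in Matlab functions. This is an illustration under stated assumptions rather than the toolbox's routine: the weight matrix is a synthetic symmetric grid-contiguity matrix, the factorization is a sparse Cholesky (valid here because I - αD is symmetric positive definite for the values of α used), and symamd supplies the fill-reducing ordering.

```matlab
% Synthetic symmetric weight matrix: rook contiguity on a g-by-g grid,
% scaled so that row sums are at most one (eigenvalues lie in [-1, 1]).
g = 20;
n = g^2;
e = ones(g, 1);
P = spdiags([e e], [-1 1], g, g);              % path graph on g nodes
A = kron(speye(g), P) + kron(P, speye(g));
D = A / 4;

% Scramble the observation order (stride 7 is coprime to n = 400),
% then look for a fill-reducing reordering of the scrambled matrix.
scramble = mod((0:n-1) * 7, n) + 1;
Ds   = D(scramble, scramble);
perm = symamd(Ds);
Dp   = Ds(perm, perm);

% ln|I - alpha*D| over a grid of alpha, via sparse Cholesky factors:
% if Z = R'*R, then ln|Z| = 2 * sum(log(diag(R))).
alphas         = 0:0.1:0.9;
lndet          = zeros(size(alphas));
lndetScrambled = zeros(size(alphas));
for k = 1:numel(alphas)
    Rp = chol(speye(n) - alphas(k) * Dp);
    Rs = chol(speye(n) - alphas(k) * Ds);
    lndet(k)          = 2 * sum(log(full(diag(Rp))));
    lndetScrambled(k) = 2 * sum(log(full(diag(Rs))));
end

% The ordering changes the work (non-zeros in the factor), not the answer.
fillPermuted  = nnz(Rp);
fillScrambled = nnz(Rs);
```

Comparing fillPermuted with fillScrambled shows how much the ordering affects the size of the factor, while lndet and lndetScrambled agree up to rounding error.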

[Figure: "Exact log-determinant, Chebyshev approximation, Taylor bounds"; x-axis: α; y-axis: log-determinants; legend: Taylor Lower, Chebyshev, Taylor Upper, Exact]

Figure 3.5: Plot of Exact log-determinant with Chebyshev Approximation and Taylor Bounds

The Chebyshev approximation appears quite good for moderate positive values of the dependence parameter, but could use improvement for materially negative values of the spatial dependence parameter. Fortunately, such negative values seem rare in practice.
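A quadratic Chebyshev approximation of the log-determinant can be sketched as follows. The sketch assumes a symmetric weight matrix D with zero diagonal and eigenvalues in [-1, 1], and it applies the standard Chebyshev interpolation formula to ln(1 - αx) and then sums over eigenvalues through traces; it is written from that general formula and is not taken from the toolbox's code. The grid weight matrix is synthetic.

```matlab
% Synthetic symmetric weight matrix with zero diagonal and eigenvalues
% in [-1, 1]: rook contiguity on a g-by-g grid, divided by 4.
g = 20;
n = g^2;
e = ones(g, 1);
P = spdiags([e e], [-1 1], g, g);
D = (kron(speye(g), P) + kron(P, speye(g))) / 4;

% Traces of the Chebyshev polynomials T0(D) = I, T1(D) = D and
% T2(D) = 2*D^2 - I. For symmetric D, tr(D^2) is the sum of squared
% elements, so no matrix product is needed.
trD2 = full(sum(sum(D .* D)));
trT  = [n, full(sum(diag(D))), 2 * trD2 - n];

q      = 2;                                   % quadratic approximation
theta  = pi * ((1:q+1) - 0.5) / (q + 1);
nodes  = cos(theta);                          % Chebyshev nodes on [-1, 1]
alphas = 0:0.1:0.9;
approx = zeros(size(alphas));
exact  = zeros(size(alphas));
for k = 1:numel(alphas)
    f = log(1 - alphas(k) * nodes);           % function values at the nodes
    c = zeros(1, q + 1);
    for j = 0:q
        c(j + 1) = (2 / (q + 1)) * sum(f .* cos(j * theta));
    end
    % Sum of the interpolant over all eigenvalues, expressed with traces.
    approx(k) = c * trT' - n * c(1) / 2;
    % Exact value from a sparse Cholesky factor, for comparison only.
    R = chol(speye(n) - alphas(k) * D);
    exact(k) = 2 * sum(log(full(diag(R))));
end
```

Only tr(D^2) depends on the weight matrix, which is why this approximation is so cheap; the gap between approx and exact widens as α moves away from zero.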

The Monte Carlo log-determinant estimator is also quite fast, and it is more accurate (Figure 3.6).

[Figure: "Exact log-determinant, Monte Carlo approximation, and confidence limits"; x-axis: α; y-axis: log-determinants; legend: Lower Confidence, Monte Carlo, Upper Confidence, Exact]

Figure 3.6: Plot of Exact log-determinant with Monte Carlo Approximation and Limits

To see the effects of exact versus approximate log-determinant computations, consider Tables 3.1 and 3.2, using the 3107 county election data. The estimated autoregressive parameter is only off by 0.01 from using the approximation. The approximate method also uses likelihood dominance inference, which results in a lower bound to the signed root deviances. As shown by the tables, the likelihood dominance SRDs are smaller in magnitude than the exact SRDs. However, they can still document statistical significance for many variables and thus can prove useful in many circumstances.

Variables         Beta Estimates   Signed Root Deviances   PR of Higher SRDs
Voting Pop               -0.7784                -31.1227              0.0000
Education                 0.2696                  9.3438              0.0000
Home Ownership            0.4530                 26.2661              0.0000
Income                    0.0071                  0.0035              0.9972
Intercept                 0.5443                  7.8773              0.0000
Alpha                     0.7250                 32.7719              0.0000

Table 3.1: SAR Estimation Results using Chebyshev lndet approximation and likelihood dominance inference

Variables         Beta Estimates   Signed Root Deviances   PR of Higher SRDs
Voting Pop               -0.7806                -32.1748              0.0000
Education                 0.2746                 11.9438              0.0000
Home Ownership            0.4525                 27.0073              0.0000
Income                    0.0047                  0.2187              0.8269
Intercept                 0.5528                  9.4908              0.0000
Alpha                     0.7150                 34.8648              0.0000

Table 3.2: SAR Estimation Results using exact lndet

Some of the routines not only yield the maximum of the likelihood function, but also profile likelihoods in the dependence parameter α by model, as shown in Figure 3.7.

[Figure: "Profile likelihoods vs α for global model and delete-1 submodels"; x-axis: Dependence parameter (α); y-axis: Profile log-likelihood; legend: Global likelihood, Voting Pop, Education, Home Ownership, Income, Intercept]

Figure 3.7: SAR Profile Likelihoods by Model

The toolbox includes SAR, CAR, and MESS error models as well as MESS, closest neighbor, and MIX autoregressive models, as shown in Table 3.3 and Table 3.4.

Variables              b OLS   b closest    b MESS     b Mix
Voting Pop           -0.8464     -0.7489   -0.7693   -0.7298
Education             0.5167      0.2899    0.1941    0.1818
Home Ownership        0.4291      0.4457    0.4832    0.4580
Income               -0.1439     -0.0332    0.0423    0.0427
Lag Voting Pop        0.0000      0.1878    0.4186    0.4616
Lag Education         0.0000      0.0975    0.1205    0.0450
Lag Home Ownership    0.0000     -0.1569   -0.3249   -0.3299
Lag Income            0.0000     -0.1079   -0.1802   -0.1410
Intercept             0.9814      0.7495    0.5636    0.4205
α                     0.0000      0.3352    1.4628    0.6550

Table 3.3: Estimates on Election Data

The closest neighbor approach is intermediate to a non-spatial approach (OLS) and a full spatial approach (MESS) or the approximate mixed routine. Note the close agreement between MESS and the mixed routine. Note, OLS in this case uses the spatial averages of the basic independent variables as additional independent variables.

None of these operations takes long for the election data.

Variables               b OLS   b closest     b MESS      b Mix
Voting Pop           -34.7211    -30.5020   -28.8899   -29.4018
Education             30.8928     13.4922     7.6654     7.7169
Home Ownership        23.0066     25.4083    26.8836    27.3515
Income                -7.2373     -1.5742     1.7478     1.8952
Lag Voting Pop         0.0000      7.7600    10.7656    12.5507
Lag Education          0.0000      4.5090     3.9785     1.5705
Lag Home Ownership     0.0000     -9.4245   -11.1259   -12.1200
Lag Income             0.0000     -5.1399    -5.5551    -4.6603
Intercept             20.2782     16.0533    10.7480     8.4626
α                      0.0000     24.2762    31.5948    33.7097

Table 3.4: Signed Root Deviances using Election Data

Operation                    Timings in seconds
OLS                                      0.0150
Closest AR                               0.0470
MESS                                     0.0940
Exact Log-det                            0.8750
Mix                                      0.0470
Doubly Stochastic Scaling                0.2660

Table 3.5: Timings for Operations on Election Data (n = 3107)

In addition to global models, the toolbox has spatial autoregressive local estimation (SALE). The user chooses the bandwidth (subsample size) by examining cross-validation error at the fringe observations, as in Figure 3.8, or at each observation, as in Figure 3.9.

[Figure: "Smoothed SALE Recursive Residuals of Fringe Observations"; x-axis: Number of Local Observations; y-axis: Absolute Error]

Figure 3.8: Plot of Fringe prediction error versus subsample size

[Figure: "Smoothed SALE Initial Holdout Residuals"; x-axis: Number of Local Observations; y-axis: Absolute Error]

Figure 3.9: Plot of prediction error on center of area versus subsample size

Usually, there is spatial dependence even in small subsamples, as shown in Figure 3.10.

[Figure: "SALE Autoregressive Parameter Estimates"; x-axis: Number of Local Observations; y-axis: Median Autoregressive Parameter Estimates]

Figure 3.10: Spatial dependence parameter estimate versus subsample size

Local estimation leads to spatially varying parameter estimates, such as those shown in Figure 3.11.

[Figure: map of β3; x-axis: Longitude; y-axis: Latitude]

Figure 3.11: Map of influence of homeownership on voting

In addition, a user can obtain an idea of the sensitivity of parameter estimates to spatial variation, such as summarized in Table 3.6.

Percentiles                0        10        25        50        75        90       100
α                     0.3100    0.4300    0.5500    0.6300    0.6700    0.7100    0.7800
Voting Pop           -1.0079   -0.8875   -0.8075   -0.6962   -0.5967   -0.5299   -0.4066
Education            -0.1871    0.0517    0.1142    0.1788    0.3135    0.4923    0.7118
Home Ownership        0.1530    0.2964    0.3419    0.4253    0.5161    0.7660    0.8461
Income               -0.2576   -0.1644   -0.1072   -0.0473    0.0628    0.1448    0.2325
Lag Voting Pop       -0.1525    0.0655    0.2868    0.4527    0.5490    0.6523    0.8806
Lag Education        -0.5734   -0.3666   -0.1812   -0.0299    0.0677    0.1322    0.3318
Lag Home Ownership   -0.7010   -0.4499   -0.3278   -0.2337   -0.1100   -0.0382    0.1880
Lag Income           -0.4764   -0.2374   -0.1733   -0.0897    0.0216    0.0862    0.2786
Intercept            -0.1605    0.0473    0.1463    0.2847    0.6612    1.0772    1.8902

Table 3.6: Distribution of local estimates

Method                       Time in seconds
OLS                                   0.1560
Closest AR                            0.1090
MESS                                  0.6090
Approximate Mix                       0.3280
Doubly Stochastic Scaling             7.1880
Delaunay Weight Matrix                5.4220

Table 3.7: Times for Different Methods for 57647 Observations

To provide an idea about the performance of the techniques for a larger problem, we estimated a simple hedonic regression over US census tracts. This resulted in 57647 observations. Table 3.7 shows the timings for some of the various operations. All of these seem quite fast. A user can find a Delaunay weight matrix and estimate a spatial autoregression in under 10 seconds on desktop machines.

Just selecting a particular weight matrix seems arbitrary. Here, we take 30 nearest neighbors and weight these geometrically. A ρ of 1 indicates no decline in the weight given to further neighbors relative to closer ones, while a ρ of 0.5 would give half the weight to the second nearest neighbor as it would to the first nearest neighbor. Thus, ρ allows changes in the effective number of neighbors used without actually varying the number of neighbors. It often makes sense in this approach to set the number of neighbors to a fairly high level (such as 30). Table 3.8 shows the effect of varying ρ on the profile log-likelihood. Small changes in ρ make large changes in the profile log-likelihood, evidence of the importance of this parameter.
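The effect of ρ on the weights given to 30 neighbors can be tabulated with a few lines of Matlab. The sketch assumes weights proportional to ρ^(j-1) for the j-th nearest neighbor, normalized to sum to one; the share of weight carried by the 10 nearest neighbors is an illustrative summary, not a quantity defined by the toolbox.

```matlab
m    = 30;                 % number of neighbors, as in the text
rhos = [1.0 0.9 0.5];      % candidate values of rho
j    = (1:m)';             % neighbor order: 1 = nearest

% Column c holds the normalized weights for rhos(c).
W = bsxfun(@power, rhos, j - 1);
W = bsxfun(@rdivide, W, sum(W, 1));

% Weight of the second nearest neighbor relative to the nearest.
secondToFirst = W(2, :) ./ W(1, :);

% Share of the total weight carried by the 10 nearest neighbors
% (an illustrative summary of how concentrated the weights are).
shareNearest10 = sum(W(1:10, :), 1);
```

With ρ = 1 all 30 neighbors count equally, while smaller values of ρ concentrate the weight on the closest neighbors even though m stays fixed.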

It did not take overly long to find the nearest neighbors or the optimal ρ, even with doubly stochastic scalings of the weight matrix, as shown by Table 3.9.

ρ         log-likelihood
0.8000      -231895.0426
0.8500      -230968.8487
0.9000      -230828.6413
0.9500      -231699.1089
1.0000      -233434.5371

Table 3.8: Likelihoods across ρ for Doubly Stochastic Scaling

Operation                      Time in seconds
NN computation                         31.4060
RS Time to find optimum ρ              56.7810
DS Time to find optimum ρ              94.8750

Table 3.9: Times for Optimizing the Likelihood over ρ for Both Scalings

These operations lead to a table of profile log-likelihoods (Table 3.10) across weight matrices. Examining the MESS log-likelihoods over ρ and Delaunay, and contrasting them with the log-likelihood from applying OLS to the basic non-spatial independent variables, demonstrates that even a suboptimal choice of ρ or Delaunay still dominates the use of an aspatial model in this case, and that optimizing over ρ dominates an arbitrary choice of weight matrices (Table 3.10). Moreover, the doubly stochastic (DS) scaling helped greatly for this example over the regular scaling (RS). In addition, inspection of aspatial OLS versus MESS with an optimal selection of ρ in Table 3.11 shows clear differences among the approaches. Note, the land area variable became insignificant after modeling space.

It is not difficult to estimate a spatial autoregression with over one million observations. In fact, the toolbox provides an example (big_one subdirectory) under the dataset directory, whereby a one million observation spatial autoregression is estimated in just under 20 seconds. It took 130.63 seconds to find the weight matrix, 60.24 seconds to simulate the dependent variable, and 19.42 seconds to estimate the autoregression.

Variable                                    Value
Aspatial likelihood (OLS)            -266505.1663
Closest Neighbor                     -243986.1676
RS Delaunay maximum likelihood       -244509.8257
RS Maximum likelihood across ρ       -254216.3172
RS Optimum ρ                               0.9000
DS Delaunay maximum likelihood       -235626.6213
DS Maximum likelihood across ρ       -230828.6413
DS Optimum ρ                               0.9000

Table 3.10: Likelihoods Across Doubly Stochastic and Regular Scalings

Variables                  b OLS     SRD OLS    b MESS    SRD MESS
Land area                -0.0850    -96.8455   -0.0008     -0.7159
Pop                       0.1146     36.8592    0.0239     12.7630
Per cap Income            1.0837    208.2192    0.6786    152.7645
Age                      -0.1269    -34.2809   -0.1384    -45.0489
Lag Land area                                  -0.0178    -14.5650
Lag Pop                                         0.0165      5.2437
Per cap Income                                 -0.3702    -65.2683
Lag Age                                         0.1088     27.5807
Intercept                 1.2236     22.9837   -0.5988    -15.2122
α                                               3.0986    253.2835
ρ (relative to ρ = 1)                             0.90     72.1927
Nearest Neighbors                                   30
parameters                     5                    11

Table 3.11: OLS versus MESS Results Using Optimal ρ for Doubly Stochastic Scaling

Chapter 4

References

If you need to know more about spatial statistics or about some of the specific routines, you may wish to examine:

Anselin, Luc (1988), Spatial Econometrics: Methods and Models, Dordrecht: Kluwer Academic Publishers.

Barry, Ronald, and R. Kelley Pace, "A Monte Carlo Estimator of the Log Determinant of Large Sparse Matrices," Linear Algebra and its Applications, Volume 289, Number 1-3, 1999, p. 41-54.

Chen, Jian-Shen, and Robert Jennrich (1996), "The Signed Root Deviance Profile and Confidence Intervals in Maximum Likelihood Analysis," Journal of the American Statistical Association, Volume 91, Number 435, p. 993-998.

Christensen, Ronald (1991), Linear Models for Multivariate, Time Series, and Spatial Data, New York: Springer-Verlag.

Cressie, Noel A.C. (1993), Statistics for Spatial Data, Revised ed., New York: John Wiley.

Dubin, Robin A. (1988), "Estimation of Regression Coefficients in the Presence of Spatially Autocorrelated Error Terms," Review of Economics and Statistics, 70, 466-474.

Haining, Robert (1990), Spatial Data Analysis in the Social and Environmental Sciences, Cambridge.

LeSage, James, and R. Kelley Pace, "Spatial Dependence in Data Mining," Data Mining for Scientific and Engineering Applications, Edited by Robert L. Grossman, Chandrika Kamath, Philip Kegelmeyer, Vipin Kumar, and Raju R. Namburu, Kluwer Academic Publishing, 2001.

LeSage, James, and R. Kelley Pace, "Spatial Probit and Tobit," Spatial Statistics and Spatial Econometrics, Edited by Art Getis, Palgrave, 2003.

Li, Bin (1995), "Implementing Spatial Statistics on Parallel Computers," in: Arlinghaus, S., ed., Practical Handbook of Spatial Statistics (CRC Press, Boca Raton), pp. 107-148.

Ord, J.K. (1975), "Estimation Methods for Models of Spatial Interaction," Journal of the American Statistical Association, 70, 120-126.

Pace, R. Kelley, and Ronald Barry (1997), "Fast CARs," Journal of Statistical Computation and Simulation, 59, p. 123-147.

Pace, R. Kelley, and Ronald Barry (1997), "Quick Computation of Regressions with a Spatially Autoregressive Dependent Variable," Geographical Analysis, 29, 232-247.

Pace, R. Kelley, and Dongya Zou, "Closed-Form Maximum Likelihood Estimates of Nearest Neighbor Spatial Dependence," Geographical Analysis, Volume 32, Number 2, April 2000, p. 154-172.

Pace, R. Kelley, Ronald Barry, O.W. Gilley, and C.F. Sirmans, "A Method for Spatial-temporal Forecasting with an Application to Real Estate Prices," International Journal of Forecasting, Volume 16, Number 2, April-June 2000, p. 229-246.

Pace, R. Kelley, and James P. LeSage, "Semiparametric Maximum Likelihood Estimates of Spatial Dependence," Geographical Analysis, Volume 34, Number 1, January 2002, p. 75-90.

Pace, R. Kelley, and James LeSage, "Likelihood Dominance Spatial Inference," forthcoming, Geographical Analysis, in January 2003.

Pace, R. Kelley, and James LeSage, "Spatial Autoregressive Local Estimation," Spatial Statistics and Spatial Econometrics, Edited by Art Getis, Palgrave, 2003.

Pace, R. Kelley, and James LeSage, "Chebyshev Approximation of Log-determinants of Spatial Weight Matrices," forthcoming in Computational Statistics and Data Analysis.

Ripley, Brian D. (1981), Spatial Statistics, New York: John Wiley.

www.spatial-statistics.com

  • Why the Toolbox Exists
  • Using the Toolbox
    • Hardware and Software Requirements
    • Installation
    • Help and documentation
    • Known Limitations
    • Tips on Using the Toolbox
    • Included Examples
    • Included Datasets
    • Included Manuscripts
      • A Brief Selected Tour of the Toolbox
      • References

Chapter 2

Using the Toolbox

2.1 Hardware and Software Requirements

The toolbox has been developed under Matlab 6.5 and tested under Matlab 6.5 and 6.1 across W2K and Windows ME. However, some routines run faster under 6.5 than under 6.1. The total installation takes around 15 megabytes. The routines have been tested on PC compatibles. The routines should run on other platforms, but have not been tested on non-PC compatibles.

2.2 Installation

For users who can extract files from zip archives, follow the instructions for your product (e.g., Winzip) and extract the files into the drive in which you wish to install the toolbox. The installation program will create the following directory structure in whichever drive you choose:

drive_letter:\space
+---articles
+---datasets
    +---big_one
    +---election
    +---housing
    +---space_time
+---documentation
+---examples
    +---CAR
    +---CAR_SIM
    +---CHEBYSHEV
    +---CHEBYSHEV_SEQUENCE
    +---CLOSEST_NEIGHBOR
    +---DELAUNAY2
    +---DOUBLY
    +---LNDET_INTERP
    +---LNDET_INTERP_SEQUENCE
    +---LNDET_MONTECARLO
    +---MESS_AR
    +---MESS_CAR
    +---MESS_SIM
    +---MIXED
    +---MULTIVARIATE
    +---NEAREST_NEIGHBORS
    +---OLS
    +---SALE
    +---SAR
    +---SAR_SIM
    +---SPACE_TIME
+---fmex_code
+---functions
    +---fmex_code

To see whether the installation has succeeded, change the directory in Matlab to one of the supplied examples and type run x_*.m. For example, go to the examples\car subdirectory and type run x_car2_ga1. The use of * in this context serves as a wildcard or placeholder for the intervening characters. This should cause the example script x_car2_ga1.m to run.

The multidimensional weight matrix routines require the installation of TSTOOL. Go to www.physik3.gwdg.de/tstool to find this useful package and follow the instructions to install it in Matlab. If you have Matlab 6.5, you can easily add the relevant paths to the mex functions by going to the File menu, selecting Set Path, selecting Add Folder under the applet, and adding the paths so Matlab can find the functions. On my machine, I added

opentstool\tstoolbox\mex\dll

opentstool\tstoolbox\mex

opentstool\tstoolbox\utils

tstool\opentstool\tstoolbox\gui

tstool\opentstool\tstoolbox

2.3 Help and documentation

All the example scripts should follow the form x_*.m (e.g., x_car2_ga1.m, x_sar2_ga1.m). Functions follow the form f*.m (e.g., fsar2.m, fols2.m). Matlab matrix files (which may include multiple matrices) have the form *.mat.

If you wish to access the functions in other directories you can define apath to space and spacefunctions using the set path commandunder the file menu in Matlab 65 where refers to the drive where thespatial statistics toobox resides and provided space is the name of the direc-tory you chose This is the same procedure outlined in installing TSTOOLOn my machine I added

space

spacefunctions

24 KNOWN LIMITATIONS 11

As in previous versions of Matlab users can employ the addpath commandto add these directories to the search path

Both the functions and examples are well-documented internally Onecan use the help functions of Matlab in the usual fashion when the toolbox ison the search path or the functions are in the current directory If the toolboxis on the search path a user can type help space and see the functionscontained in the Spatial Statistics Toolbox 2 (provided space is the name ofthe directory you chose) If the toolbox is on the search path or the functionlies in the current directory a user can obtain help on the function by typinghelp function_name or doc function_name

The examples provide the best means of understanding the toolbox functions Eachsubdirectory under the examples subdirectory contains all the files neededfor that particular example (except the multivariate subdirectory which re-quires installation of the TSTOOL package) Users can substitute their datafor the example data and see how the functions perform for their problemNote a few of these functions take several minutes to run (notably the SALEexamples) Most however are quite fast

24 Known Limitations

None All are unknown

25 Tips on Using the Toolbox

Typical sessions with the toolbox proceed in four steps First import thedata into Matlab If the file is fixed-format or tab-delimited ASCII the com-mand load nameextension (whatever that nameextension may be) willload the filename contents into memory into a Matlab variable name Savingthis will convert it into a matlab file (eg save name name will save variablename into matrix name stored in namemat Failure to specify both names will

12 CHAPTER 2 USING THE TOOLBOX

result in saving all defined variables into one file The data would includethe dependent variable the independent variables and the locational coor-dinates For example suppose the user has the dependent variable in the textfile ytxt Issuing the command load ytxt results in a Matlab variable yIssuing the command save y y results in saving ymat in the directory

Second create a spatial weight matrix Users can choose weight matricesbased upon nearest neighbors (symmetric or asymmetric) multidimensionalsymmetric neighbors spatiotemporal neighbors (asymmetric) and Delaunaytriangles (symmetric) In almost all cases one must make sure each locationis unique One may need to add slight amounts of random noise to thelocational coordinates to meet this restriction (some of the latest versions ofMatlab do this automatically mdash do not dither the coordinates in this case)Note some estimators only use symmetric matrices You can specify thenumber of neighbors used and their relative weightings

Note the Delaunay spatial weight matrix leads to a concentration matrixor a variance-covariance matrix that depends upon only one-parameter (αthe autoregressive parameter) In contrast the nearest neighbor concentra-tion matrices or variance-covariance matrices depend upon three parameters(α the autoregressive parameter m the number of neighbors and ρ whichgoverns the rate weights decline with the order of the neighbors with the clos-est neighbor given the highest weighting the second closest given a lowerweighting and so forth) Three parameters should make this specificationsufficiently flexible for many purposes

Third one computes the log-determinants for a grid of autoregressive pa-rameters (prespecified by the routine as a default or specified by the user asan option) Determinant computations proceed faster for symmetric matri-ces You must choose the appropriate log-determinant routines for the typeof spatial weight matrix you have specified Computing the log-determinantsis the slower than estimation but only needs to be done when changing thespatial weight matrix For example one can use the same weight matrix andlog-determinant files when exploring transformations or specifications of the

26 INCLUDED EXAMPLES 13

dependent and independent variables (for the same observations)Fourth pick a statistical routine to run given the data matrices the spatial

weight matrix and the log-determinant vector One can choose among con-ditional autoregressions (CAR) simultaneous autoregressions (SAR) matrixexponential spatial specifications (MESS) mixed regressive spatially autore-gressive estimators (which include pure autoregressive models and spatiallylagged independent variable models as special cases) and OLS In additionone can explore multivariate spatiotemporal and multivariate estimationThese routines require little time to run One can change models weight-ings and transformations and reestimate in the vast majority of cases withoutrerunning the spatial weight matrix or log-determinant routines (you mayneed to add another simple Jacobian term when performing weighting ortransformations of the dependent variables) This aids interactive data explo-ration

Fifth these procedures provide a wealth of information Many of theseroutines yield the profile likelihood in the autoregressive parameter for eachsubmodel (corresponding to the deletion of individual variables or the spatialterm) All of the inference even for the OLS routine uses likelihood ratiostatistics in the form of signed root deviances This is just the square root oftwice the difference in likelihoods given the sign of the parameter estimateIt has a t-like interpretation (Chen and Jennrich (1996)) The use of signedroot deviances (SRDs) facilitates comparisons among different models

26 Included Examples

The Spatial Statistics Toolbox comes with many examples These are foundin the subdirectories under spatial_toolbox_2examples To run theexamples change the directory in Matlab into the many subdirectories thatillustrate individual routines Look at the documentation in each example di-rectory for more detail Almost all of the specific models have examples Inaddition the simulation routine examples serve as minor Monte Carlo stud-

14 CHAPTER 2 USING THE TOOLBOX

ies which also help verify the functioning of the estimators The examplesuse the 3107 observation dataset from the Pace and Barry (1997) GeographicalAnalysis article

27 Included Datasets

The spatial_toolbox_2datasets subdirectory contains subdirectorieswith individual data sets in Matlab file formats as well as their documentationThe data sets include example programs and output Note due to the manyimprovements incorporated into the Spatial Statistics Toolbox over time therunning times have greatly improved over those in the articles

Hopefully these data sets should provide a good starting point for ex-ploring applications of spatial statistics

28 Included Manuscripts

In the manuscript subdirectory we provide pdf versions of the GeographicalAnalysis 1997 and 2000 articles I would like to thank the publishers (OhioState Press and Elsevier) for having given us copyright permission to dis-tribute these works One can also go to wwwspatial-statisticscom to accesssome other articles (eg the Linear Algebra and its Applications article whichproposed the Monte Carlo log-determinant estimator)

Chapter 3

A Brief Selected Tour of theToolbox

The weight matrix specifies the dependence among observations. One form of weight matrix (Delaunay) uses the notion of contiguity to specify dependence, as depicted in Figure 3.1.

Figure 3.1: Connections among Counties via Delaunay (plot title: Delaunay Nearest Neighbor Graph; horizontal axis: Longitude in Decimal Degrees; vertical axis: Latitude in Decimal Degrees)

Note the somewhat strange behavior of connections to outlying observations in Figure 3.1. This arises due to the geometric nature of contiguity. Using nearest neighbors based upon some metric can avoid this, as shown in Figure 3.2.

Figure 3.2: Connections among Counties via Eight Nearest Neighbors (plot title: 8 Nearest Neighbor Graph; horizontal axis: Longitude in Decimal Degrees; vertical axis: Latitude in Decimal Degrees)

Using only nearby observations implies that the weight matrix has mainly zeros, or is sparse, as shown in Figure 3.3.

Figure 3.3: Plot of Non-zeros of Delaunay Weight Matrix (plot title: Original Non-zero Weight Pattern for Delaunay; both axes: Observation Number)

This becomes even more apparent when reordering the observations, as in Figure 3.4.

Figure 3.4: Plot of Non-zeros of Permuted Delaunay Weight Matrix (plot title: Permuted Non-zero Weight Pattern for Delaunay; both axes: Observation Number)

Sparsity, as well as finding an appropriate ordering, is key to quickly computing the log-determinants used in maximum likelihood. The toolbox has functions for exact computation of the log-determinants (actually interpolation of exact computations at various points). However, users can select approximations as well, which depend only on sparsity and not upon orderings. The quadratic Chebyshev is the fastest and most approximate technique (Figure 3.5).
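To make the role of sparsity concrete, the following is a minimal sketch of an exact log-determinant computation on a grid of autoregressive parameters using Matlab's sparse LU factorization. It is not the toolbox's own routine. The ring-shaped weight matrix W, the grid alphas, and all variable names are assumptions chosen only to keep the example self-contained.

```matlab
% Sketch only: exact ln|I - alpha*W| on a grid of alpha values via sparse LU.
% W here is a hypothetical row-stochastic ring matrix, not toolbox output.
n = 200;
i = (1:n)';
next = mod(i, n) + 1;          % neighbor on one side (wraps around)
prev = mod(i - 2, n) + 1;      % neighbor on the other side (wraps around)
W = sparse([i; i], [next; prev], 0.5, n, n);   % each row sums to one

alphas = 0:0.1:0.9;            % grid of autoregressive parameters
lndet  = zeros(size(alphas));
for k = 1:numel(alphas)
    % Four-output sparse LU also returns a fill-reducing column ordering Q.
    [L, U, P, Q] = lu(speye(n) - alphas(k) * W);
    % |det| is the product of |diag(U)| because L has a unit diagonal
    % and the permutations P and Q have determinant +1 or -1.
    lndet(k) = sum(log(abs(full(diag(U)))));
end
```

The resulting vector lndet can then be interpolated between grid points, which is the sense in which an "exact" routine still involves interpolation.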

Figure 3.5: Plot of Exact log-determinant with Chebyshev Approximation and Taylor Bounds (plot title: Exact log-determinant, Chebyshev approximation, Taylor bounds; horizontal axis: α; vertical axis: log-determinants; legend: Taylor Lower, Chebyshev, Taylor Upper, Exact)

The Chebyshev approximation appears quite good for positive, moderate values of the dependence parameter, but could use improvement for materially negative values of the spatial dependence parameter. Fortunately, such negative values seem rare in practice.

The Monte Carlo log-determinant estimator is also quite fast, but more accurate (Figure 3.6).
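For orientation, the idea behind a Monte Carlo log-determinant estimator can be written down briefly. The series below is a standard identity when the spectral radius of αW is below one. The trace approximation is given here only in outline, as recalled from the Linear Algebra and its Applications article cited in the references, so the exact form used by the toolbox should be checked against that article. The truncation order m and the random vector x are notation introduced for this sketch.

```latex
\ln\lvert I - \alpha W\rvert
  \;=\; -\sum_{k=1}^{\infty} \frac{\alpha^{k}\,\operatorname{tr}(W^{k})}{k},
\qquad
\operatorname{tr}(W^{k}) \;\approx\; n\,\frac{x^{\top} W^{k} x}{x^{\top} x},
\quad x \sim N(0, I_{n}),
```

```latex
\widehat{\ln\lvert I - \alpha W\rvert}
  \;=\; -\,n \sum_{k=1}^{m} \frac{\alpha^{k}}{k}\,
        \frac{x^{\top} W^{k} x}{x^{\top} x}.
```

Averaging this quantity over several independent draws of x gives both an estimate and a measure of its variability, and it only requires sparse matrix-vector products.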

Figure 3.6: Plot of Exact log-determinant with Monte Carlo Approximation and Limits (plot title: Exact log-determinant, Monte Carlo approximation, and confidence limits; horizontal axis: α; vertical axis: log-determinants; legend: Lower Confidence, Monte Carlo, Upper Confidence, Exact)

To see the effects of exact versus approximate log-determinant computations, consider Tables 3.1 and 3.2, using the 3,107 county election data. The estimated autoregressive parameter is only off by 0.01 from using the approximation. The approximate method also uses likelihood dominance inference, which results in a lower bound to the signed root deviances. As shown by the tables, the likelihood dominance SRDs are smaller in magnitude than the exact SRDs. However, they can still document statistical significance for many variables and thus can prove useful in many circumstances.

Variables         Beta Estimates   Signed Root Deviances   Pr. of Higher SRDs
Voting Pop               -0.7784                -31.1227               0.0000
Education                 0.2696                  9.3438               0.0000
Home Ownership            0.4530                 26.2661               0.0000
Income                    0.0071                  0.0035               0.9972
Intercept                 0.5443                  7.8773               0.0000
Alpha                     0.7250                 32.7719               0.0000

Table 3.1: SAR Estimation Results using Chebyshev lndet approximation and likelihood dominance inference

Variables         Beta Estimates   Signed Root Deviances   Pr. of Higher SRDs
Voting Pop               -0.7806                -32.1748               0.0000
Education                 0.2746                 11.9438               0.0000
Home Ownership            0.4525                 27.0073               0.0000
Income                    0.0047                  0.2187               0.8269
Intercept                 0.5528                  9.4908               0.0000
Alpha                     0.7150                 34.8648               0.0000

Table 3.2: SAR Estimation Results using exact lndet


Some of the routines not only yield the maximum of the likelihood function but also profile likelihoods in the dependence parameter α by model, as shown in Figure 3.7.

Figure 3.7: SAR Profile Likelihoods by Model (plot title: Profile likelihoods vs α for global model and delete-1 submodels; horizontal axis: Dependence parameter (α); vertical axis: Profile log-likelihood; legend: Global likelihood, Voting Pop, Education, Home Ownership, Income, Intercept)

The toolbox includes SAR, CAR, and MESS error models as well as MESS, closest neighbor, and MIX autoregressive models, as shown in Table 3.3 and Table 3.4.

Variables              b OLS   b closest    b MESS     b Mix
Voting Pop           -0.8464     -0.7489   -0.7693   -0.7298
Education             0.5167      0.2899    0.1941    0.1818
Home Ownership        0.4291      0.4457    0.4832    0.4580
Income               -0.1439     -0.0332    0.0423    0.0427
Lag Voting Pop        0.0000      0.1878    0.4186    0.4616
Lag Education         0.0000      0.0975    0.1205    0.0450
Lag Home Ownership    0.0000     -0.1569   -0.3249   -0.3299
Lag Income            0.0000     -0.1079   -0.1802   -0.1410
Intercept             0.9814      0.7495    0.5636    0.4205
α                     0.0000      0.3352    1.4628    0.6550

Table 3.3: Estimates on Election Data

The closest neighbor approach is intermediate to a non-spatial approach (OLS) and a full spatial approach (MESS) or the approximate mixed routine. Note the close agreement between MESS and the mixed routine. Note, OLS in this case uses the spatial averages of the basic independent variables as additional independent variables.

None of these operations take long for the election data.

Variables              b OLS   b closest    b MESS     b Mix
Voting Pop          -34.7211    -30.5020  -28.8899  -29.4018
Education            30.8928     13.4922    7.6654    7.7169
Home Ownership       23.0066     25.4083   26.8836   27.3515
Income               -7.2373     -1.5742    1.7478    1.8952
Lag Voting Pop        0.0000      7.7600   10.7656   12.5507
Lag Education         0.0000      4.5090    3.9785    1.5705
Lag Home Ownership    0.0000     -9.4245  -11.1259  -12.1200
Lag Income            0.0000     -5.1399   -5.5551   -4.6603
Intercept            20.2782     16.0533   10.7480    8.4626
α                     0.0000     24.2762   31.5948   33.7097

Table 3.4: Signed Root Deviances using Election Data

Operation                    Timings in seconds
OLS                                      0.0150
Closest AR                               0.0470
MESS                                     0.0940
Exact Log-det                            0.8750
Mix                                      0.0470
Doubly Stochastic Scaling                0.2660

Table 3.5: Timing for operations on Election data (n=3107)

In addition to global models, the toolbox has spatial autoregressive local estimation (SALE). The user chooses the bandwidth (subsample size) by examining cross-validation error at the fringe observations, as in Figure 3.8, or at each observation, as in Figure 3.9.

Figure 3.8: Plot of Fringe prediction error versus subsample size (plot title: Smoothed SALE Recursive Residuals of Fringe Observations; horizontal axis: Number of Local Observations; vertical axis: Absolute Error)

Figure 3.9: Plot of prediction error on center of area versus subsample size (plot title: Smoothed SALE Initial Holdout Residuals; horizontal axis: Number of Local Observations; vertical axis: Absolute Error)

Usually there is spatial dependence even in small subsamples, as shown in Figure 3.10.

Figure 3.10: Spatial dependence parameter estimate versus subsample size (plot title: SALE Autoregressive Parameter Estimates; horizontal axis: Number of Local Observations; vertical axis: Median Autoregressive Parameter Estimates)

Local estimation leads to spatially varying parameter estimates, such as those shown in Figure 3.11.

Figure 3.11: Map of influence of homeownership on voting (horizontal axis: Longitude; vertical axis: Latitude; plotted quantity: β3)

In addition, a user can obtain an idea of the sensitivity of parameter estimates to spatial variation, such as summarized in Table 3.6.

Percentiles                0        10        25        50        75        90       100
α                     0.3100    0.4300    0.5500    0.6300    0.6700    0.7100    0.7800
Voting Pop           -1.0079   -0.8875   -0.8075   -0.6962   -0.5967   -0.5299   -0.4066
Education            -0.1871    0.0517    0.1142    0.1788    0.3135    0.4923    0.7118
Home Ownership        0.1530    0.2964    0.3419    0.4253    0.5161    0.7660    0.8461
Income               -0.2576   -0.1644   -0.1072   -0.0473    0.0628    0.1448    0.2325
Lag Voting Pop       -0.1525    0.0655    0.2868    0.4527    0.5490    0.6523    0.8806
Lag Education        -0.5734   -0.3666   -0.1812   -0.0299    0.0677    0.1322    0.3318
Lag Home Ownership   -0.7010   -0.4499   -0.3278   -0.2337   -0.1100   -0.0382    0.1880
Lag Income           -0.4764   -0.2374   -0.1733   -0.0897    0.0216    0.0862    0.2786
Intercept            -0.1605    0.0473    0.1463    0.2847    0.6612    1.0772    1.8902

Table 3.6: Distribution of local estimates

Method                       Time in seconds
OLS                                   0.1560
Closest AR                            0.1090
MESS                                  0.6090
Approximate Mix                       0.3280
Doubly Stochastic Scaling             7.1880
Delaunay Weight Matrix                5.4220

Table 3.7: Times for Different Methods for 57,647 Observations

To provide an idea about the performance of the techniques for a larger problem, we estimated a simple hedonic regression over US census tracts. This resulted in 57,647 observations. Table 3.7 shows the timings for some of the various operations. All of these seem quite fast. A user can find a Delaunay weight matrix and estimate a spatial autoregression in under 10 seconds on desktop machines.

Just selecting a particular weight matrix seems arbitrary. Here we take 30 nearest neighbors and weight these geometrically. A ρ of 1 indicates no decline in the weight given to further neighbors relative to closer ones, while a ρ of 0.5 would give half the weight to the second nearest neighbor as it would to the first nearest neighbor. Thus, ρ allows changes in the effective number of neighbors used without actually varying the number of neighbors. It often makes sense in this approach to set the number of neighbors to a fairly high level (such as 30). Table 3.8 shows the effect of varying ρ on the profile log-likelihood. Small changes in ρ make large changes in the profile log-likelihood, evidence of the importance of this parameter.
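The geometric weighting described above can be illustrated with a short Matlab sketch. It only reproduces the stated ratio between successive neighbors. Scaling the weights to sum to one, and the variable names, are assumptions made for the illustration and need not match the toolbox's internal scaling.

```matlab
m   = 30;                 % number of nearest neighbors, as in the text
rho = 0.5;                % geometric decay parameter

w = rho .^ (0:m-1);       % unnormalized weight on the 1st, 2nd, ..., m-th neighbor
w = w / sum(w);           % assumption: scale so the m weights sum to one

ratio = w(2) / w(1);      % equals rho: the 2nd neighbor gets rho times the 1st neighbor's weight

% Share of the total weight carried by the five closest neighbors
% for several values of rho (rho = 1 gives equal weights).
for r = [0.5 0.8 0.9 1.0]
    wr = r .^ (0:m-1);
    wr = wr / sum(wr);
    fprintf('rho = %.2f: first five neighbors carry %.3f of the weight\n', r, sum(wr(1:5)));
end
```

Lower values of ρ concentrate the weight on the closest neighbors, which is how ρ changes the effective number of neighbors while m stays fixed at 30.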

It did not take overly long to find the nearest neighbors or the optimal ρ, even with doubly stochastic scalings of the weight matrix, as shown by Table 3.9.

ρ           log-likelihood
0.8000        -231895.0426
0.8500        -230968.8487
0.9000        -230828.6413
0.9500        -231699.1089
1.0000        -233434.5371

Table 3.8: Likelihoods across ρ for Doubly Stochastic Scaling

Operation                        Time in seconds
NN computation                           31.4060
RS Time to find optimum ρ                56.7810
DS Time to find optimum ρ                94.8750

Table 3.9: Times for Optimizing the Likelihood over ρ for Both Scalings

These operations lead to a table of profile log-likelihoods (Table 3.10) across weight matrices. Examining the MESS log-likelihoods over ρ and Delaunay, and contrasting them with the log-likelihood from applying OLS to the basic non-spatial independent variables, demonstrates that even a suboptimal choice of ρ or Delaunay still dominates the use of an aspatial model in this case, and that optimizing over ρ dominates an arbitrary choice of weight matrices (Table 3.10). Moreover, the doubly stochastic (DS) scaling helped greatly for this example over the regular scaling (RS). In addition, inspection of aspatial OLS versus MESS with an optimal selection of ρ in Table 3.11 shows clear differences among the approaches. Note, the land area variable became insignificant after modeling space.

It is not difficult to estimate a spatial autoregression with over one million observations. In fact, the toolbox provides an example (big_one subdirectory) under the dataset directory whereby a one million observation spatial autoregression is estimated in just under 20 seconds. It took 130.63 seconds to find the weight matrix, 60.24 seconds to simulate the dependent variable, and 19.42 seconds to estimate the autoregression.

Variable                                   Value
Aspatial likelihood (OLS)           -266505.1663
Closest Neighbor                    -243986.1676
RS Delaunay maximum likelihood      -244509.8257
RS Maximum likelihood across ρ      -254216.3172
RS Optimum ρ                              0.9000
DS Delaunay maximum likelihood      -235626.6213
DS Maximum likelihood across ρ      -230828.6413
DS Optimum ρ                              0.9000

Table 3.10: Likelihoods Across Doubly Stochastic and Regular Scalings

Variables                   b OLS     OLS SRD     b MESS    MESS SRD
Land area                 -0.0850    -96.8455    -0.0008     -0.7159
Pop                        0.1146     36.8592     0.0239     12.7630
Per cap Income             1.0837    208.2192     0.6786    152.7645
Age                       -0.1269    -34.2809    -0.1384    -45.0489
Lag Land area                                    -0.0178    -14.5650
Lag Pop                                           0.0165      5.2437
Lag Per cap Income                               -0.3702    -65.2683
Lag Age                                           0.1088     27.5807
Intercept                  1.2236     22.9837    -0.5988    -15.2122
α                                                 3.0986    253.2835
ρ (relative to ρ = 1)                               0.90     72.1927
Nearest Neighbors                                     30
parameters                      5                     11

Table 3.11: OLS versus MESS Results Using Optimal ρ for Doubly Stochastic Scaling

Chapter 4

References

If you need to know more about spatial statistics or about some of the specific routines, you may wish to examine:

Anselin, Luc. (1988) Spatial Econometrics: Methods and Models, Dordrecht: Kluwer Academic Publishers.

Barry, Ronald, and R. Kelley Pace, "A Monte Carlo Estimator of the Log Determinant of Large Sparse Matrices," Linear Algebra and its Applications, Volume 289, Number 1-3, 1999, p. 41-54.

Chen, Jian-Shen, and Robert Jennrich. (1996) "The Signed Root Deviance Profile and Confidence Intervals in Maximum Likelihood Analysis," Journal of the American Statistical Association, Volume 91, Number 435, p. 993-998.

Christensen, Ronald. (1991) Linear Models for Multivariate, Time Series, and Spatial Data, New York: Springer-Verlag.

Cressie, Noel A.C. (1993) Statistics for Spatial Data, Revised ed., New York: John Wiley.

Dubin, Robin A. (1988) "Estimation of Regression Coefficients in the Presence of Spatially Autocorrelated Error Terms," Review of Economics and Statistics, 70, 466-474.

Haining, Robert. (1990) Spatial Data Analysis in the Social and Environmental Sciences, Cambridge.


LeSage, James, and R. Kelley Pace, "Spatial Dependence in Data Mining," Data Mining for Scientific and Engineering Applications, Edited by Robert L. Grossman, Chandrika Kamath, Philip Kegelmeyer, Vipin Kumar, and Raju R. Namburu, Kluwer Academic Publishing, 2001.

LeSage, James, and R. Kelley Pace, "Spatial Probit and Tobit," Spatial Statistics and Spatial Econometrics, Edited by Art Getis, Palgrave, 2003.

Li, Bin. (1995) "Implementing Spatial Statistics on Parallel Computers," in: Arlinghaus, S., ed., Practical Handbook of Spatial Statistics (CRC Press, Boca Raton), pp. 107-148.

Ord, J.K. (1975) "Estimation Methods for Models of Spatial Interaction," Journal of the American Statistical Association, 70, 120-126.

Pace, R. Kelley, and Ronald Barry. (1997) "Fast CARs," Journal of Statistical Computation and Simulation, 59, p. 123-147.

Pace, R. Kelley, and Ronald Barry. (1997) "Quick Computation of Regressions with a Spatially Autoregressive Dependent Variable," Geographical Analysis, 29, 232-247.

Pace, R. Kelley, and Dongya Zou, "Closed-Form Maximum Likelihood Estimates of Nearest Neighbor Spatial Dependence," Geographical Analysis, Volume 32, Number 2, April 2000, p. 154-172.

Pace, R. Kelley, Ronald Barry, O.W. Gilley, and C.F. Sirmans, "A Method for Spatial-temporal Forecasting with an Application to Real Estate Prices," International Journal of Forecasting, Volume 16, Number 2, April-June 2000, p. 229-246.

Pace, R. Kelley, and James P. LeSage, "Semiparametric Maximum Likelihood Estimates of Spatial Dependence," Geographical Analysis, Volume 34, Number 1, January 2002, p. 75-90.

Pace, R. Kelley, and James LeSage, "Likelihood Dominance Spatial Inference," forthcoming, Geographical Analysis, in January 2003.

Pace, R. Kelley, and James LeSage, "Spatial Autoregressive Local Estimation," Spatial Statistics and Spatial Econometrics, Edited by Art Getis, Palgrave, 2003.

Pace, R. Kelley, and James LeSage, "Chebyshev Approximation of Log-determinants of Spatial Weight Matrices," forthcoming in Computational Statistics and Data Analysis.

Ripley, Brian D. (1981) Spatial Statistics, New York: John Wiley.

www.spatial-statistics.com



+---housing
+---space_time
+---documentation
+---examples
+---CAR
+---CAR_SIM
+---CHEBYSHEV
+---CHEBYSHEV_SEQUENCE
+---CLOSEST_NEIGHBOR
+---DELAUNAY2
+---DOUBLY
+---LNDET_INTERP
+---LNDET_INTERP_SEQUENCE
+---LNDET_MONTECARLO
+---MESS_AR
+---MESS_CAR
+---MESS_SIM
+---MIXED
+---MULTIVARIATE
+---NEAREST_NEIGHBORS
+---OLS
+---SALE
+---SAR
+---SAR_SIM
+---SPACE_TIME
+---fmex_code
+---functions
+---fmex_code

To see whether the installation has succeeded, change the directory in Matlab to one of the supplied examples and type run x_*.m. For example, go to the examples\car subdirectory and type run x_car2_ga1. The use of * in this context serves as a wildcard or placeholder for the intervening characters. This should cause the example script x_car2_ga1.m to run.

The multidimensional weight matrix routines require the installation of TSTOOL. Go to www.physik3.gwdg.de/tstool to find this useful package and follow the instructions to install it in Matlab. If you have Matlab 6.5, you can easily add the relevant paths to the mex functions by going to the File menu, selecting Set Path, selecting Add Folder in the resulting applet, and adding the paths so Matlab can find the functions. On my machine I added:

opentstooltstoolboxmexdll

opentstooltstoolboxmex

opentstooltstoolboxutils

tstoolopentstooltstoolboxgui

tstoolopentstooltstoolbox

2.3 Help and documentation

All the example scripts should follow the form x_*.m (e.g., x_car2_ga1.m, x_sar2_ga1.m). Functions follow the form f*.m (e.g., fsar2.m, fols2.m). Matlab matrix files (which may include multiple matrices) have the form *.mat.

If you wish to access the functions in other directories, you can define a path to ...\space and ...\space\functions using the Set Path command under the File menu in Matlab 6.5, where ... refers to the drive where the Spatial Statistics Toolbox resides, and provided space is the name of the directory you chose. This is the same procedure outlined in installing TSTOOL. On my machine I added:

space

spacefunctions


As in previous versions of Matlab, users can employ the addpath command to add these directories to the search path.
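A minimal sketch of the addpath route follows. The install location C:\space is purely hypothetical and the variable names are chosen for this illustration, so substitute the directory you actually used.

```matlab
% Hypothetical install location; substitute the directory you chose.
toolboxRoot = fullfile('C:', 'space');
functionDir = fullfile(toolboxRoot, 'functions');

% Add the directories only if they exist, to avoid warnings on other machines.
if exist(toolboxRoot, 'dir') && exist(functionDir, 'dir')
    addpath(toolboxRoot, functionDir);
else
    fprintf('Edit toolboxRoot to point at your copy of the toolbox.\n');
end
```

Placing these lines in a startup file is one way to avoid repeating the step in each session.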

Both the functions and examples are well-documented internally. One can use the help functions of Matlab in the usual fashion when the toolbox is on the search path or the functions are in the current directory. If the toolbox is on the search path, a user can type help space and see the functions contained in the Spatial Statistics Toolbox 2 (provided space is the name of the directory you chose). If the toolbox is on the search path or the function lies in the current directory, a user can obtain help on the function by typing help function_name or doc function_name.

The examples provide the best means of understanding the toolbox functions. Each subdirectory under the examples subdirectory contains all the files needed for that particular example (except the multivariate subdirectory, which requires installation of the TSTOOL package). Users can substitute their data for the example data and see how the functions perform for their problem. Note, a few of these functions take several minutes to run (notably the SALE examples). Most, however, are quite fast.

2.4 Known Limitations

None. All are unknown.

2.5 Tips on Using the Toolbox

Typical sessions with the toolbox proceed in four steps. First, import the data into Matlab. If the file is fixed-format or tab-delimited ASCII, the command load name.extension (whatever that name.extension may be) will load the file contents into memory as a Matlab variable called name. Saving this will convert it into a Matlab file (e.g., save name name will save variable name into matrix name stored in name.mat). Failure to specify both names will result in saving all defined variables into one file. The data would include the dependent variable, the independent variables, and the locational coordinates. For example, suppose the user has the dependent variable in the text file y.txt. Issuing the command load y.txt results in a Matlab variable y. Issuing the command save y y results in saving y.mat in the directory.
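The import step can be tried end to end with a short, self-contained sketch. It writes a small text file with made-up values to stand in for y.txt, then uses the functional forms of the load and save commands described above. The temporary directory and the variable names txtFile and S are choices made for this illustration.

```matlab
% Create a small ASCII file to stand in for y.txt (made-up values).
workDir = tempdir;
txtFile = fullfile(workDir, 'y.txt');
fid = fopen(txtFile, 'w');
fprintf(fid, '%g\n', [1.5; 2.0; 3.25]);
fclose(fid);

% Functional form of "load y.txt": returns the numeric column.
y = load(txtFile);

% Functional form of "save y y": stores only the variable y in y.mat.
save(fullfile(workDir, 'y.mat'), 'y');

% Read the saved file back; S.y holds the saved variable.
clear y
S = load(fullfile(workDir, 'y.mat'));
disp(S.y');
```

Naming the variable explicitly in save is what keeps other workspace variables out of the file.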

Second, create a spatial weight matrix. Users can choose weight matrices based upon nearest neighbors (symmetric or asymmetric), multidimensional symmetric neighbors, spatiotemporal neighbors (asymmetric), and Delaunay triangles (symmetric). In almost all cases, one must make sure each location is unique. One may need to add slight amounts of random noise to the locational coordinates to meet this restriction (some of the latest versions of Matlab do this automatically; do not dither the coordinates in this case). Note, some estimators only use symmetric matrices. You can specify the number of neighbors used and their relative weightings.

Note, the Delaunay spatial weight matrix leads to a concentration matrix or a variance-covariance matrix that depends upon only one parameter (α, the autoregressive parameter). In contrast, the nearest neighbor concentration matrices or variance-covariance matrices depend upon three parameters (α, the autoregressive parameter; m, the number of neighbors; and ρ, which governs the rate at which weights decline with the order of the neighbors, with the closest neighbor given the highest weighting, the second closest given a lower weighting, and so forth). Three parameters should make this specification sufficiently flexible for many purposes.
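As a rough illustration of a contiguity-based weight matrix, the sketch below builds a sparse matrix from Matlab's built-in delaunay function rather than from the toolbox routines. The random coordinates are hypothetical, and scaling each row to sum to one is an assumption made here for simplicity. The toolbox's own scaling may differ.

```matlab
% Hypothetical coordinates standing in for the locational data.
n = 50;
x = rand(n, 1);
y = rand(n, 1);

tri   = delaunay(x, y);                                % one triangle per row (vertex indices)
edges = [tri(:, [1 2]); tri(:, [2 3]); tri(:, [1 3])]; % the three sides of each triangle

% Symmetric 0/1 contiguity: i and j are neighbors if they share a triangle side.
A = sparse(edges(:, 1), edges(:, 2), 1, n, n);
A = double((A + A') > 0);

% Assumption: scale rows to sum to one (this breaks symmetry, so it would
% not suit the estimators that need a symmetric matrix).
rowSums = full(sum(A, 2));
W = spdiags(1 ./ rowSums, 0, n, n) * A;

fprintf('Share of non-zero weights: %.4f\n', nnz(W) / n^2);
```

The only free parameter left after this construction is α, in line with the one-parameter description of the Delaunay specification.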

Third, one computes the log-determinants for a grid of autoregressive parameters (prespecified by the routine as a default or specified by the user as an option). Determinant computations proceed faster for symmetric matrices. You must choose the appropriate log-determinant routines for the type of spatial weight matrix you have specified. Computing the log-determinants is slower than estimation, but only needs to be done when changing the spatial weight matrix. For example, one can use the same weight matrix and log-determinant files when exploring transformations or specifications of the dependent and independent variables (for the same observations).

Fourth, pick a statistical routine to run given the data matrices, the spatial weight matrix, and the log-determinant vector. One can choose among conditional autoregressions (CAR), simultaneous autoregressions (SAR), matrix exponential spatial specifications (MESS), mixed regressive spatially autoregressive estimators (which include pure autoregressive models and spatially lagged independent variable models as special cases), and OLS. In addition, one can explore multivariate spatiotemporal and multivariate estimation. These routines require little time to run. One can change models, weightings, and transformations and reestimate, in the vast majority of cases, without rerunning the spatial weight matrix or log-determinant routines (you may need to add another simple Jacobian term when performing weighting or transformations of the dependent variables). This aids interactive data exploration.

Fifth these procedures provide a wealth of information Many of theseroutines yield the profile likelihood in the autoregressive parameter for eachsubmodel (corresponding to the deletion of individual variables or the spatialterm) All of the inference even for the OLS routine uses likelihood ratiostatistics in the form of signed root deviances This is just the square root oftwice the difference in likelihoods given the sign of the parameter estimateIt has a t-like interpretation (Chen and Jennrich (1996)) The use of signedroot deviances (SRDs) facilitates comparisons among different models

26 Included Examples

The Spatial Statistics Toolbox comes with many examples These are foundin the subdirectories under spatial_toolbox_2examples To run theexamples change the directory in Matlab into the many subdirectories thatillustrate individual routines Look at the documentation in each example di-rectory for more detail Almost all of the specific models have examples Inaddition the simulation routine examples serve as minor Monte Carlo stud-

14 CHAPTER 2 USING THE TOOLBOX

ies which also help verify the functioning of the estimators The examplesuse the 3107 observation dataset from the Pace and Barry (1997) GeographicalAnalysis article

27 Included Datasets

The spatial_toolbox_2datasets subdirectory contains subdirectorieswith individual data sets in Matlab file formats as well as their documentationThe data sets include example programs and output Note due to the manyimprovements incorporated into the Spatial Statistics Toolbox over time therunning times have greatly improved over those in the articles

Hopefully these data sets should provide a good starting point for ex-ploring applications of spatial statistics

28 Included Manuscripts

In the manuscript subdirectory we provide pdf versions of the GeographicalAnalysis 1997 and 2000 articles I would like to thank the publishers (OhioState Press and Elsevier) for having given us copyright permission to dis-tribute these works One can also go to wwwspatial-statisticscom to accesssome other articles (eg the Linear Algebra and its Applications article whichproposed the Monte Carlo log-determinant estimator)

Chapter 3

A Brief Selected Tour of theToolbox

The weight matrix specifies the dependence among observations One formof weight matrix (Delaunay) uses the notion of contiguity to specify depen-dence as depicted in Figure 31

15

16 CHAPTER 3 A BRIEF SELECTED TOUR OF THE TOOLBOX

minus130 minus120 minus110 minus100 minus90 minus80 minus70 minus6025

30

35

40

45

50

Longitude in Decimal Degrees

Latit

utde

in D

ecim

al D

egre

es

Delaunay Nearest Neighbor Graph

Figure 31 Connections among Counties via Delaunay

Note the somewhat strange behavior of connections to outlying observa-tions in Figure 31 This arises to the geometric nature of contiguity Usingnearest neighbors based upon some metric can avoid this as shown in Fig-ure 32

17

minus130 minus120 minus110 minus100 minus90 minus80 minus70 minus6025

30

35

40

45

50

Longitude in Decimal Degrees

Latit

utde

in D

ecim

al D

egre

es

8 Nearest Neighbor Graph

Figure 32 Connections among Counties via Eight Nearest Neighbors

Using only nearby observations implies that the weight matrix has mainzeros or is sparse as shown in Figure 33

18 CHAPTER 3 A BRIEF SELECTED TOUR OF THE TOOLBOX

0 500 1000 1500 2000 2500 3000

0

500

1000

1500

2000

2500

3000

Observation Number

Obs

erva

tion

Num

ber

Original Nonminuszero Weight Pattern for Delaunay

Figure 33 Plot of Non-zeros of Delaunay Weight Matrix

This becomes even more apparent when reordering the observations asin Figure 34

19

0 500 1000 1500 2000 2500 3000

0

500

1000

1500

2000

2500

3000

Observation Number

Obs

erva

tion

Num

ber

Permuted Nonminuszero Weight Pattern for Delaunay

Figure 34 Plot of Non-zeros of Permuted Delaunay Weight Matrix

Sparsity as well as finding an appropriate ordering are key in quicklycomputing the log-determinants used in maximum likelihood The toolboxhas functions for exact computation of the log-determinants (actually inter-polation of exact computations at various points) However users can selectapproximations as well which depend only on sparsity and not upon order-ings The quadratic Chebyshev is the fastest and most approximate technique(Figure 35)

20 CHAPTER 3 A BRIEF SELECTED TOUR OF THE TOOLBOX

minus1 minus08 minus06 minus04 minus02 0 02 04 06 08 1minus4000

minus3500

minus3000

minus2500

minus2000

minus1500

minus1000

minus500

0

α

logminus

dete

rmin

ants

Exact logminusdeterminant Chebyshev approximation Taylor bounds

Taylor Lower minusChebyshev oTaylor Upper +Exact

Figure 35 Plot of Exact log-determinant with Chebyshev Approximationand Taylor Bounds

The Chebyshev approximation appears quite good for positive moderatevalues of the dependence parameter but could use improvement for materi-ally negative values of the spatial dependence parameter Fortunately suchnegative values seem rare in practice

The Monte Carlo log-determinant estimator is quite fast but more accu-rate (Figure 36)

21

minus1 minus08 minus06 minus04 minus02 0 02 04 06 08 1minus500

minus450

minus400

minus350

minus300

minus250

minus200

minus150

minus100

minus50

0

α

logminus

dete

rmin

ants

Exact logminusdeterminant Monte Carlo approximation and confidence limits

Lower ConfidenceMonte CarloUpper ConfidenceExact

Figure 36 Plot of Exact log-determinant with Monte Carlo Approximationand Limits

To see the effects of exact versus approximate log-determinant computa-tions consider Tables 31 and 32 using the 3107 county election data Theestimated autoregressive parameter is only off by 001 from using the approx-imation The approximate method also uses likelihood dominance inferencewhich results in a lower bound to the signed root deviances As shown bythe tables the likelihood dominance SRDs are smaller in magnitude thanthe exact SRDs However they can still document statistical significance formany variables and thus can prove useful in many circumstances

22 CHAPTER 3 A BRIEF SELECTED TOUR OF THE TOOLBOX

Variables Beta Estimates Signed Root Deviances PR of Higher SRDS

Voting Pop -07784 -311227 00000Education 02696 93438 00000Home Ownership 04530 262661 00000Income 00071 00035 09972Intercept 05443 78773 00000Alpha 07250 327719 00000

Table 31 SAR Estimation Results using Chebyshev lndet approximationand likelihood dominance inference

Variables Beta Estimates Signed Root Deviances PR of Higher SRDS

Voting Pop -07806 -321748 00000Education 02746 119438 00000Home Ownership 04525 270073 00000Income 00047 02187 08269Intercept 05528 94908 00000Alpha 07150 348648 00000

Table 32 SAR Estimation results using exact lndet

23

Some of the routines not only yield the maximum of the likelihood func-tion but also profile likelihoods in the dependence parameter α by model asshown in Figure 37

0 01 02 03 04 05 06 07 08 09 1minus7000

minus6800

minus6600

minus6400

minus6200

minus6000

minus5800

minus5600Profile likelihoods vs α for global model and deleteminus1 submodels

Dependence parameter (α)

Pro

file

logminus

likel

ihoo

d

Global likelihoodVoting PopEducationHome OwnershipIncomeIntercept

Figure 37 SAR Profile Likelihoods by Model

The toolbox includes SAR CAR and MESS error models as well asMESS closest neighbor and MIX autoregressive models as shown in Ta-ble 33 and Table 34

24 CHAPTER 3 A BRIEF SELECTED TOUR OF THE TOOLBOX

Variables b OLS b closest b MESS b Mix

Voting Pop -08464 -07489 -07693 -07298Education 05167 02899 01941 01818Home Ownership 04291 04457 04832 04580Income -01439 -00332 00423 00427Lag Voting Pop 00000 01878 04186 04616Lag Education 00000 00975 01205 00450Lag Home Ownership 00000 -01569 -03249 -03299Lag Income 00000 -01079 -01802 -01410Intercept 09814 07495 05636 04205α 00000 03352 14628 06550

Table 33 Estimates on Election Data

The closest neighbor approach is intermediate to a non-spatial approach(OLS) and a full spatial approach (MESS) or the approximate mixed routineNote the close agreement between MESS and the mixed routine Note OLSin this case uses the spatial averages of the basic independent variables asadditional independent variables

None of these operations take long for the election data

25

Variables            b OLS     b closest   b MESS     b Mix

Voting Pop          -34.7211   -30.5020   -28.8899   -29.4018
Education            30.8928    13.4922     7.6654     7.7169
Home Ownership       23.0066    25.4083    26.8836    27.3515
Income               -7.2373    -1.5742     1.7478     1.8952
Lag Voting Pop        0.0000     7.7600    10.7656    12.5507
Lag Education         0.0000     4.5090     3.9785     1.5705
Lag Home Ownership    0.0000    -9.4245   -11.1259   -12.1200
Lag Income            0.0000    -5.1399    -5.5551    -4.6603
Intercept            20.2782    16.0533    10.7480     8.4626
α                     0.0000    24.2762    31.5948    33.7097

Table 3.4: Signed Root Deviances using Election Data

Operation                   Timings in seconds

OLS                         0.0150
Closest AR                  0.0470
MESS                        0.0940
Exact Log-det               0.8750
Mix                         0.0470
Doubly Stochastic Scaling   0.2660

Table 3.5: Timing for Operations on Election Data (n=3107)

In addition to global models, the toolbox has spatial autoregressive local estimation (SALE). The user chooses the bandwidth (subsample size) by examining cross-validation error at the fringe observations, as in Figure 3.8, or at each observation, as in Figure 3.9.

[Plot: "Smoothed SALE Recursive Residuals of Fringe Observations"; x-axis: Number of Local Observations; y-axis: Absolute Error]

Figure 3.8: Plot of Fringe prediction error versus subsample size

[Plot: "Smoothed SALE Initial Holdout Residuals"; x-axis: Number of Local Observations; y-axis: Absolute Error]

Figure 3.9: Plot of prediction error on center of area versus subsample size

Usually there is spatial dependence even in small subsamples, as shown in Figure 3.10.

[Plot: "SALE Autoregressive Parameter Estimates"; x-axis: Number of Local Observations; y-axis: Median Autoregressive Parameter Estimates]

Figure 3.10: Spatial dependence parameter estimate versus subsample size

Local estimation leads to spatially varying parameter estimates, such as those shown in Figure 3.11.

[Map of local estimates of β3; x-axis: Longitude; y-axis: Latitude]

Figure 3.11: Map of influence of homeownership on voting

In addition, a user can obtain an idea of the sensitivity of parameter estimates to spatial variation, such as summarized in Table 3.6.

Percentiles            0        10       25       50       75       90       100

α                    0.3100   0.4300   0.5500   0.6300   0.6700   0.7100   0.7800
Voting Pop          -1.0079  -0.8875  -0.8075  -0.6962  -0.5967  -0.5299  -0.4066
Education           -0.1871   0.0517   0.1142   0.1788   0.3135   0.4923   0.7118
Home Ownership       0.1530   0.2964   0.3419   0.4253   0.5161   0.7660   0.8461
Income              -0.2576  -0.1644  -0.1072  -0.0473   0.0628   0.1448   0.2325
Lag Voting Pop      -0.1525   0.0655   0.2868   0.4527   0.5490   0.6523   0.8806
Lag Education       -0.5734  -0.3666  -0.1812  -0.0299   0.0677   0.1322   0.3318
Lag Home Ownership  -0.7010  -0.4499  -0.3278  -0.2337  -0.1100  -0.0382   0.1880
Lag Income          -0.4764  -0.2374  -0.1733  -0.0897   0.0216   0.0862   0.2786
Intercept           -0.1605   0.0473   0.1463   0.2847   0.6612   1.0772   1.8902

Table 3.6: Distribution of local estimates

Method                      Time in seconds

OLS                         0.1560
Closest AR                  0.1090
MESS                        0.6090
Approximate Mix             0.3280
Doubly Stochastic Scaling   7.1880
Delaunay Weight Matrix      5.4220

Table 3.7: Times for Different Methods for 57647 Observations

To provide an idea about the performance of the techniques for a larger problem, we estimated a simple hedonic regression over US census tracts. This resulted in 57647 observations. Table 3.7 shows the timings for some of the various operations. All of these seem quite fast. A user can find a Delaunay weight matrix and estimate a spatial autoregression in under 10 seconds on desktop machines.

Just selecting a particular weight matrix seems arbitrary. Here we take 30 nearest neighbors and weight these geometrically. A ρ of 1 indicates no decline in the weight given to further neighbors relative to closer ones, while a ρ of 0.5 would give the second nearest neighbor half the weight given to the first nearest neighbor. Thus, ρ allows changes in the effective number of neighbors used without actually varying the number of neighbors. It often makes sense in this approach to set the number of neighbors to a fairly high level (such as 30). Table 3.8 shows the effect of varying ρ on the profile log-likelihood. Small changes in ρ make large changes in the profile log-likelihood, evidence of the importance of this parameter.
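The geometric weighting just described can be sketched in a few lines of Matlab. This is only an illustration: the exact form used by the toolbox may differ, the variable names are hypothetical, and the "effective number of neighbors" measure at the end (the inverse of the sum of squared weights) is one convenient choice of ours, not something the toolbox defines.

```matlab
m = 30;                      % number of nearest neighbours, as in the text
rho = 0.5;                   % geometric decay parameter
w = rho .^ (0:m-1);          % weight for the 1st, 2nd, ..., m-th nearest neighbour
w = w / sum(w);              % normalise so the weights sum to one

ratio = w(2) / w(1);         % equals rho: the 2nd neighbour gets half the weight
n_eff = 1 / sum(w .^ 2);     % a rough measure of the effective number of neighbours

w_flat = ones(1, m) / m;     % rho = 1: all m neighbours weighted equally
n_eff_flat = 1 / sum(w_flat .^ 2);
```

With ρ = 0.5 this measure is about 3, while with ρ = 1 it equals m = 30, which is the sense in which ρ changes the effective number of neighbors without changing m.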

It did not take overly long to find the nearest neighbors or the optimal ρ, even with doubly stochastic scalings of the weight matrix, as shown by Table 3.9.

These operations lead to a table of profile log-likelihoods (Table 3.10) across weight matrices. Examining the MESS log-likelihoods over ρ and Delaunay, and contrasting them with the log-likelihood from applying OLS to the basic non-spatial independent variables, demonstrates that even a suboptimal choice of ρ or Delaunay still dominates the use of an aspatial model in this case, and that optimizing over ρ dominates an arbitrary choice of weight matrices (Table 3.10). Moreover, the doubly stochastic (DS) scaling helped greatly for this example over the regular scaling (RS). In addition, inspection of aspatial OLS versus MESS with an optimal selection of ρ in Table 3.11 shows clear differences among the approaches. Note that the land area variable became insignificant after modeling space.

ρ         log-likelihood

0.8000    -231895.0426
0.8500    -230968.8487
0.9000    -230828.6413
0.9500    -231699.1089
1.0000    -233434.5371

Table 3.8: Likelihoods across ρ for Doubly Stochastic Scaling

Operation                    Time in seconds

NN computation               31.4060
RS Time to find optimum ρ    56.7810
DS Time to find optimum ρ    94.8750

Table 3.9: Times for Optimizing the Likelihood over ρ for Both Scalings

It is not difficult to estimate a spatial autoregression with over one million observations. In fact, the toolbox provides an example (big_one subdirectory) under the dataset directory whereby a one million observation spatial autoregression is estimated in just under 20 seconds. It took 130.63 seconds to find the weight matrix, 60.24 seconds to simulate the dependent variable, and 19.42 seconds to estimate the autoregression.

Variable                           Value

Aspatial likelihood (OLS)          -266505.1663
Closest Neighbor                   -243986.1676
RS Delaunay maximum likelihood     -244509.8257
RS Maximum likelihood across ρ     -254216.3172
RS Optimum ρ                       0.9000
DS Delaunay maximum likelihood     -235626.6213
DS Maximum likelihood across ρ     -230828.6413
DS Optimum ρ                       0.9000

Table 3.10: Likelihoods Across Doubly Stochastic and Regular Scalings

Variables                b OLS     SRD OLS    b MESS    SRD MESS

Land area               -0.0850   -96.8455   -0.0008    -0.7159
Pop                      0.1146    36.8592    0.0239    12.7630
Per cap Income           1.0837   208.2192    0.6786   152.7645
Age                     -0.1269   -34.2809   -0.1384   -45.0489
Lag Land area                                -0.0178   -14.5650
Lag Pop                                       0.0165     5.2437
Per cap Income                               -0.3702   -65.2683
Lag Age                                       0.1088    27.5807
Intercept                1.2236    22.9837   -0.5988   -15.2122
α                                             3.0986   253.2835
ρ (relative to ρ = 1)                         0.90      72.1927
Nearest Neighbors                             30
parameters               5                    11

Table 3.11: OLS versus MESS Results Using Optimal ρ for Doubly Stochastic Scaling

Chapter 4

References

If you need to know more about spatial statistics or about some of the specific routines, you may wish to examine:

Anselin, Luc (1988). Spatial Econometrics: Methods and Models. Dordrecht: Kluwer Academic Publishers.

Barry, Ronald, and R. Kelley Pace. "A Monte Carlo Estimator of the Log Determinant of Large Sparse Matrices," Linear Algebra and its Applications, Volume 289, Number 1-3, 1999, p. 41-54.

Chen, Jian-Shen, and Robert Jennrich (1996). "The Signed Root Deviance Profile and Confidence Intervals in Maximum Likelihood Analysis," Journal of the American Statistical Association, Volume 91, Number 435, p. 993-998.

Christensen, Ronald (1991). Linear Models for Multivariate, Time Series, and Spatial Data. New York: Springer-Verlag.

Cressie, Noel A.C. (1993). Statistics for Spatial Data. Revised ed. New York: John Wiley.

Dubin, Robin A. (1988). "Estimation of Regression Coefficients in the Presence of Spatially Autocorrelated Error Terms," Review of Economics and Statistics, 70, 466-474.

Haining, Robert (1990). Spatial Data Analysis in the Social and Environmental Sciences. Cambridge.

LeSage, James, and R. Kelley Pace. "Spatial Dependence in Data Mining," Data Mining for Scientific and Engineering Applications, edited by Robert L. Grossman, Chandrika Kamath, Philip Kegelmeyer, Vipin Kumar, and Raju R. Namburu. Kluwer Academic Publishing, 2001.

LeSage, James, and R. Kelley Pace. "Spatial Probit and Tobit," Spatial Statistics and Spatial Econometrics, edited by Art Getis. Palgrave, 2003.

Li, Bin (1995). "Implementing Spatial Statistics on Parallel Computers," in Arlinghaus, S., ed., Practical Handbook of Spatial Statistics (CRC Press, Boca Raton), pp. 107-148.

Ord, J.K. (1975). "Estimation Methods for Models of Spatial Interaction," Journal of the American Statistical Association, 70, 120-126.

Pace, R. Kelley, and Ronald Barry (1997). "Fast CARs," Journal of Statistical Computation and Simulation, 59, p. 123-147.

Pace, R. Kelley, and Ronald Barry (1997). "Quick Computation of Regressions with a Spatially Autoregressive Dependent Variable," Geographical Analysis, 29, 232-247.

Pace, R. Kelley, and Dongya Zou. "Closed-Form Maximum Likelihood Estimates of Nearest Neighbor Spatial Dependence," Geographical Analysis, Volume 32, Number 2, April 2000, p. 154-172.

Pace, R. Kelley, Ronald Barry, O.W. Gilley, and C.F. Sirmans. "A Method for Spatial-temporal Forecasting with an Application to Real Estate Prices," International Journal of Forecasting, Volume 16, Number 2, April-June 2000, p. 229-246.

Pace, R. Kelley, and James P. LeSage. "Semiparametric Maximum Likelihood Estimates of Spatial Dependence," Geographical Analysis, Volume 34, Number 1, January 2002, p. 75-90.

Pace, R. Kelley, and James LeSage. "Likelihood Dominance Spatial Inference," forthcoming, Geographical Analysis, in January 2003.

Pace, R. Kelley, and James LeSage. "Spatial Autoregressive Local Estimation," Spatial Statistics and Spatial Econometrics, edited by Art Getis. Palgrave, 2003.

Pace, R. Kelley, and James LeSage. "Chebyshev Approximation of Log-determinants of Spatial Weight Matrices," forthcoming in Computational Statistics and Data Analysis.

Ripley, Brian D. (1981). Spatial Statistics. New York: John Wiley.

www.spatial-statistics.com

  • Why the Toolbox Exists
  • Using the Toolbox
    • Hardware and Software Requirements
    • Installation
    • Help and documentation
    • Known Limitations
    • Tips on Using the Toolbox
    • Included Examples
    • Included Datasets
    • Included Manuscripts
  • A Brief Selected Tour of the Toolbox
  • References

10 CHAPTER 2 USING THE TOOLBOX

ample, go to the examples\car subdirectory and type run x_car2_ga1. The use of in this context serves as a wildcard or placeholder for the intervening characters. This should cause the example script x_car2_ga1.m to run.

The multidimensional weight matrix routines require the installation of TSTOOL. Go to www.physik3.gwdg.de/tstool to find this useful package and follow the instructions to install it in Matlab. If you have Matlab 6.5, you can easily add the relevant paths to the mex functions by going to the File menu, selecting Set Path, selecting Add Folder in the resulting applet, and adding the paths so Matlab can find the functions. On my machine I added:

opentstool\tstoolbox\mex\dll
opentstool\tstoolbox\mex
opentstool\tstoolbox\utils
tstool\opentstool\tstoolbox\gui
tstool\opentstool\tstoolbox

2.3 Help and documentation

All the example scripts have names that begin with x_ and end in .m (e.g., x_car2_ga1.m, x_sar2_ga1.m). Function files begin with f and end in .m (e.g., fsar2.m, fols2.m). Matlab matrix files (which may include multiple matrices) end in .mat.

If you wish to access the functions in other directories, you can define a path to the space and space\functions directories, on whichever drive the Spatial Statistics Toolbox resides, using the Set Path command under the File menu in Matlab 6.5 (provided space is the name of the directory you chose). This is the same procedure outlined in installing TSTOOL. On my machine I added:

space
space\functions

As in previous versions of Matlab, users can employ the addpath command to add these directories to the search path.

Both the functions and examples are well-documented internally. One can use the help functions of Matlab in the usual fashion when the toolbox is on the search path or the functions are in the current directory. If the toolbox is on the search path, a user can type help space and see the functions contained in the Spatial Statistics Toolbox 2 (provided space is the name of the directory you chose). If the toolbox is on the search path or the function lies in the current directory, a user can obtain help on the function by typing help function_name or doc function_name.

The examples provide the best means of understanding the toolbox functions. Each subdirectory under the examples subdirectory contains all the files needed for that particular example (except the multivariate subdirectory, which requires installation of the TSTOOL package). Users can substitute their data for the example data and see how the functions perform for their problem. Note that a few of these functions take several minutes to run (notably the SALE examples). Most, however, are quite fast.

2.4 Known Limitations

None. All are unknown.

2.5 Tips on Using the Toolbox

Typical sessions with the toolbox proceed in four steps. First, import the data into Matlab. If the file is fixed-format or tab-delimited ASCII, the command load name.extension (whatever that name.extension may be) will load the file contents into memory in a Matlab variable name. Saving this will convert it into a Matlab file (e.g., save name name will save variable name into matrix name stored in name.mat). Failure to specify both names will result in saving all defined variables into one file. The data would include the dependent variable, the independent variables, and the locational coordinates. For example, suppose the user has the dependent variable in the text file y.txt. Issuing the command load y.txt results in a Matlab variable y. Issuing the command save y y results in saving y.mat in the directory.
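The import step can be sketched with the function forms of load and save. The file location and the three data values below are hypothetical and exist only so that the sketch runs on its own; in practice y.txt would already hold the user's dependent variable.

```matlab
% Hypothetical file name and values, used only so the sketch is self-contained.
fname = fullfile(tempdir, 'y.txt');
fid = fopen(fname, 'w');
fprintf(fid, '%g\n', [1.5; 2.5; 4.0]);    % one observation per line
fclose(fid);

y = load(fname);                          % plain numeric text -> Matlab matrix
matname = fullfile(tempdir, 'y.mat');
save(matname, 'y');                       % store only the variable y

clear y
s = load(matname);                        % s.y holds the saved vector
```

Naming the variable explicitly in save mirrors the advice above: omitting it would write every variable in the workspace to the file.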

Second, create a spatial weight matrix. Users can choose weight matrices based upon nearest neighbors (symmetric or asymmetric), multidimensional symmetric neighbors, spatiotemporal neighbors (asymmetric), and Delaunay triangles (symmetric). In almost all cases, one must make sure each location is unique. One may need to add slight amounts of random noise to the locational coordinates to meet this restriction (some of the latest versions of Matlab do this automatically; do not dither the coordinates in this case). Note that some estimators only use symmetric matrices. You can specify the number of neighbors used and their relative weightings.

Note that the Delaunay spatial weight matrix leads to a concentration matrix or a variance-covariance matrix that depends upon only one parameter (α, the autoregressive parameter). In contrast, the nearest neighbor concentration matrices or variance-covariance matrices depend upon three parameters (α, the autoregressive parameter; m, the number of neighbors; and ρ, which governs the rate at which weights decline with the order of the neighbors, with the closest neighbor given the highest weighting, the second closest given a lower weighting, and so forth). Three parameters should make this specification sufficiently flexible for many purposes.
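To make the idea of a Delaunay-based weight matrix concrete, here is a minimal sketch built only from standard Matlab functions (delaunay, sparse, spdiags). It is not the toolbox's own routine. The five coordinates are hypothetical, and the row scaling at the end is just one common choice, assumed here for illustration.

```matlab
% Hypothetical coordinates: four corners of a square plus one interior point.
xc = [0; 1; 0; 1; 0.5];
yc = [0; 0; 1; 1; 0.4];
n = numel(xc);

tri = delaunay(xc, yc);                 % each row holds the vertices of one triangle

% Every triangle contributes three edges; collect both endpoints of each.
from = [tri(:, 1); tri(:, 2); tri(:, 3)];
to   = [tri(:, 2); tri(:, 3); tri(:, 1)];
A = sparse(from, to, 1, n, n);          % repeated edges are summed
A = double((A + A') > 0);               % symmetric 0/1 contiguity matrix

% One common scaling (an assumption here): divide each row by its neighbour count.
d = full(sum(A, 2));
W = spdiags(1 ./ d, 0, n, n) * A;       % rows of W sum to one
```

Dividing by the row counts generally makes W asymmetric even though A is symmetric, so an estimator that requires a symmetric matrix would need a different scaling.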

Third, one computes the log-determinants for a grid of autoregressive parameters (prespecified by the routine as a default or specified by the user as an option). Determinant computations proceed faster for symmetric matrices. You must choose the appropriate log-determinant routines for the type of spatial weight matrix you have specified. Computing the log-determinants is slower than estimation, but only needs to be done when changing the spatial weight matrix. For example, one can use the same weight matrix and log-determinant files when exploring transformations or specifications of the dependent and independent variables (for the same observations).

Fourth, pick a statistical routine to run given the data matrices, the spatial weight matrix, and the log-determinant vector. One can choose among conditional autoregressions (CAR), simultaneous autoregressions (SAR), matrix exponential spatial specifications (MESS), mixed regressive spatially autoregressive estimators (which include pure autoregressive models and spatially lagged independent variable models as special cases), and OLS. In addition, one can explore multivariate spatiotemporal and multivariate estimation. These routines require little time to run. One can change models, weightings, and transformations and reestimate, in the vast majority of cases, without rerunning the spatial weight matrix or log-determinant routines (you may need to add another simple Jacobian term when performing weighting or transformations of the dependent variables). This aids interactive data exploration.
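The third step above, computing log-determinants over a grid of autoregressive parameters, can be illustrated with a short sketch that uses Matlab's sparse LU factorization. This is not one of the toolbox's log-determinant routines. The ring-shaped weight matrix and the variable names are hypothetical, and the interpolation at the end only suggests how a stored grid might be reused.

```matlab
% Hypothetical weight matrix: a ring of n observations, two neighbours each.
n = 6;
e = ones(n, 1);
A = spdiags([e e], [-1 1], n, n);
A(1, n) = 1;
A(n, 1) = 1;
W = A / 2;                                   % symmetric, rows sum to one

alphas = 0:0.1:0.9;                          % grid of autoregressive parameters
lndet = zeros(size(alphas));
for k = 1:numel(alphas)
    [L, U, P] = lu(speye(n) - alphas(k) * W);   % sparse LU factorisation
    % L has a unit diagonal and P is a permutation, so |det| comes from diag(U).
    lndet(k) = sum(log(abs(full(diag(U)))));
end

% The grid can be stored once and reused while the weight matrix is unchanged.
lndet_at_055 = interp1(alphas, lndet, 0.55);    % simple linear interpolation
```

Since the grid depends only on the weight matrix, it can be saved alongside that matrix and reloaded whenever the variables or transformations change.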

Finally, these procedures provide a wealth of information. Many of these routines yield the profile likelihood in the autoregressive parameter for each submodel (corresponding to the deletion of individual variables or the spatial term). All of the inference, even for the OLS routine, uses likelihood ratio statistics in the form of signed root deviances. A signed root deviance is just the square root of twice the difference in log-likelihoods, given the sign of the parameter estimate. It has a t-like interpretation (Chen and Jennrich (1996)). The use of signed root deviances (SRDs) facilitates comparisons among different models.
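The signed root deviance described above is simple to compute once the two log-likelihoods are in hand. The sketch below defines it as a one-line helper (the name srd is ours, not a toolbox function) and checks it against values from the tour in Chapter 3, assuming the log-likelihoods in Table 3.8 are read with four decimal places.

```matlab
% Signed root deviance: sign of the estimate times sqrt(2 * log-likelihood gain).
srd = @(estimate, loglik_full, loglik_restricted) ...
    sign(estimate) .* sqrt(2 * (loglik_full - loglik_restricted));

% Worked check with values read from the toolbox tour (Tables 3.8 and 3.11):
loglik_rho_090 = -230828.6413;     % profile log-likelihood at rho = 0.90
loglik_rho_100 = -233434.5371;     % profile log-likelihood at rho = 1.00
srd_rho = srd(0.90, loglik_rho_090, loglik_rho_100);   % about 72.19
```

The result agrees with the signed root deviance reported for ρ (relative to ρ = 1) in Table 3.11, which is what one would expect if that entry compares the optimal ρ against an unweighted set of 30 neighbors.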

2.6 Included Examples

The Spatial Statistics Toolbox comes with many examples. These are found in the subdirectories under spatial_toolbox_2\examples. To run the examples, change the directory in Matlab into the many subdirectories that illustrate individual routines. Look at the documentation in each example directory for more detail. Almost all of the specific models have examples. In addition, the simulation routine examples serve as minor Monte Carlo studies, which also help verify the functioning of the estimators. The examples use the 3107 observation dataset from the Pace and Barry (1997) Geographical Analysis article.

2.7 Included Datasets

The spatial_toolbox_2\datasets subdirectory contains subdirectories with individual data sets in Matlab file formats as well as their documentation. The data sets include example programs and output. Note that, due to the many improvements incorporated into the Spatial Statistics Toolbox over time, the running times have greatly improved over those in the articles.

Hopefully, these data sets should provide a good starting point for exploring applications of spatial statistics.

2.8 Included Manuscripts

In the manuscript subdirectory, we provide pdf versions of the Geographical Analysis 1997 and 2000 articles. I would like to thank the publishers (Ohio State Press and Elsevier) for having given us copyright permission to distribute these works. One can also go to www.spatial-statistics.com to access some other articles (e.g., the Linear Algebra and its Applications article, which proposed the Monte Carlo log-determinant estimator).

Chapter 3

A Brief Selected Tour of the Toolbox

The weight matrix specifies the dependence among observations. One form of weight matrix (Delaunay) uses the notion of contiguity to specify dependence, as depicted in Figure 3.1.

[Map: "Delaunay Nearest Neighbor Graph"; x-axis: Longitude in Decimal Degrees; y-axis: Latitude in Decimal Degrees]

Figure 3.1: Connections among Counties via Delaunay

Note the somewhat strange behavior of connections to outlying observations in Figure 3.1. This arises from the geometric nature of contiguity. Using nearest neighbors based upon some metric can avoid this, as shown in Figure 3.2.

[Map: "8 Nearest Neighbor Graph"; x-axis: Longitude in Decimal Degrees; y-axis: Latitude in Decimal Degrees]

Figure 3.2: Connections among Counties via Eight Nearest Neighbors

Using only nearby observations implies that the weight matrix has mainly zeros, or is sparse, as shown in Figure 3.3.

[Sparsity plot: "Original Non-zero Weight Pattern for Delaunay"; both axes: Observation Number]

Figure 3.3: Plot of Non-zeros of Delaunay Weight Matrix

This becomes even more apparent when reordering the observations, as in Figure 3.4.

[Sparsity plot: "Permuted Non-zero Weight Pattern for Delaunay"; both axes: Observation Number]

Figure 3.4: Plot of Non-zeros of Permuted Delaunay Weight Matrix

Sparsity, as well as finding an appropriate ordering, are key in quickly computing the log-determinants used in maximum likelihood. The toolbox has functions for exact computation of the log-determinants (actually interpolation of exact computations at various points). However, users can select approximations as well, which depend only on sparsity and not upon orderings. The quadratic Chebyshev is the fastest and most approximate technique (Figure 3.5).
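The effect of ordering can be seen on a toy problem. The sketch below uses Matlab's standard symrcm function (reverse Cuthill-McKee); whether the toolbox uses this particular ordering is not stated here, so treat it as an assumption. The chain of eight observations and the scrambling vector p are hypothetical.

```matlab
% Hypothetical example: a chain of 8 observations stored in a scrambled order.
n = 8;
e = ones(n, 1);
A = spdiags([e e], [-1 1], n, n);        % neighbours along the chain
p = [5 2 8 1 6 3 7 4];                   % arbitrary scrambling of the observations
B = A(p, p);

[rows, cols] = find(B);
bw_before = max(abs(rows - cols));       % nonzeros lie far from the diagonal

r = symrcm(B);                           % reverse Cuthill-McKee ordering
C = B(r, r);
[rows, cols] = find(C);
bw_after = max(abs(rows - cols));        % nonzeros pulled toward the diagonal

% A symmetric permutation leaves the log-determinant unchanged.
alpha = 0.4;
ld_B = log(det(full(speye(n) - alpha * B / 2)));
ld_C = log(det(full(speye(n) - alpha * C / 2)));
```

Reordering changes only how much fill-in a factorization creates, not the determinant itself, which is why a good ordering speeds up exact log-determinant computation without affecting the answer.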

[Plot: "Exact log-determinant, Chebyshev approximation, Taylor bounds"; x-axis: α; y-axis: log-determinants; legend: Taylor Lower, Chebyshev, Taylor Upper, Exact]

Figure 3.5: Plot of Exact log-determinant with Chebyshev Approximation and Taylor Bounds

The Chebyshev approximation appears quite good for positive moderate values of the dependence parameter, but could use improvement for materially negative values of the spatial dependence parameter. Fortunately, such negative values seem rare in practice.
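To give a feel for how a series approximation to the log-determinant works, the sketch below uses the standard truncated Taylor expansion ln|I - αW| = -Σ α^k tr(W^k)/k, valid when the spectral radius of αW is below one. This is a generic illustration, not the toolbox's Chebyshev routine or its Taylor bounds. The ring-shaped weight matrix, the value of α, and the number of terms q are hypothetical.

```matlab
% Hypothetical weight matrix: ring of n observations, two neighbours each.
n = 6;
e = ones(n, 1);
A = spdiags([e e], [-1 1], n, n);
A(1, n) = 1;
A(n, 1) = 1;
W = A / 2;

alpha = 0.3;
q = 20;                                  % number of series terms kept
approx = 0;
Wk = speye(n);
for k = 1:q
    Wk = Wk * W;                         % W^k
    approx = approx - alpha^k * full(sum(diag(Wk))) / k;   % trace(W^k) term
end
exact = log(det(eye(n) - alpha * full(W)));
err = abs(approx - exact);               % truncation error of the series
```

The terms shrink like α^k, so the series converges quickly for moderate α and slowly as |α| approaches one, which is consistent with approximations of this kind being weakest at extreme values of the dependence parameter.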

The Monte Carlo log-determinant estimator is still quite fast, and it is more accurate (Figure 3.6).

[Plot: "Exact log-determinant, Monte Carlo approximation, and confidence limits"; x-axis: α; y-axis: log-determinants; legend: Lower Confidence, Monte Carlo, Upper Confidence, Exact]

Figure 3.6: Plot of Exact log-determinant with Monte Carlo Approximation and Limits

To see the effects of exact versus approximate log-determinant computations, consider Tables 3.1 and 3.2 using the 3107 county election data. The estimated autoregressive parameter is only off by 0.01 from using the approximation. The approximate method also uses likelihood dominance inference, which results in a lower bound to the signed root deviances. As shown by the tables, the likelihood dominance SRDs are smaller in magnitude than the exact SRDs. However, they can still document statistical significance for many variables, and thus can prove useful in many circumstances.

Variables         Beta Estimates   Signed Root Deviances   Pr of Higher SRDs

Voting Pop        -0.7784          -31.1227                0.0000
Education          0.2696            9.3438                0.0000
Home Ownership     0.4530           26.2661                0.0000
Income             0.0071            0.0035                0.9972
Intercept          0.5443            7.8773                0.0000
Alpha              0.7250           32.7719                0.0000

Table 3.1: SAR Estimation Results using Chebyshev lndet approximation and likelihood dominance inference

Variables         Beta Estimates   Signed Root Deviances   Pr of Higher SRDs

Voting Pop        -0.7806          -32.1748                0.0000
Education          0.2746           11.9438                0.0000
Home Ownership     0.4525           27.0073                0.0000
Income             0.0047            0.2187                0.8269
Intercept          0.5528            9.4908                0.0000
Alpha              0.7150           34.8648                0.0000

Table 3.2: SAR Estimation Results using exact lndet


Pace R Kelley and James P LeSage Semiparametric Maximum Likeli-hood Estimates of Spatial Dependence Geographical Analysis Vol-ume 34 Number 1 January 2002 p 75-90

Pace R Kelley and James LeSage ldquoLikelihood Dominance Spatial Infer-ence forthcoming Geographical Analysis in January 2003

Pace R Kelley and James LeSage ldquoSpatial Autoregressive Local Estima-tion Spatial Statistics and Spatial Econometrics Edited by Art Getis Pal-grave 2003

Pace R Kelley and James LeSage ldquoChebyshev Approximation of Log-determinants of Spatial Weight Matrices forthcoming in ComputationalStatistics and Data Analysis

Ripley Brian D (1981) Spatial Statistics New York John Wiley

wwwspatial-statisticscom


As in previous versions of Matlab, users can employ the addpath command to add these directories to the search path.

Both the functions and examples are well-documented internally. One can use the help functions of Matlab in the usual fashion when the toolbox is on the search path or the functions are in the current directory. If the toolbox is on the search path, a user can type help space and see the functions contained in the Spatial Statistics Toolbox 2 (provided space is the name of the directory you chose). If the toolbox is on the search path or the function lies in the current directory, a user can obtain help on the function by typing help function_name or doc function_name.

The examples provide the best means of understanding the toolbox functions. Each subdirectory under the examples subdirectory contains all the files needed for that particular example (except the multivariate subdirectory, which requires installation of the TSTOOL package). Users can substitute their data for the example data and see how the functions perform for their problem. Note, a few of these examples take several minutes to run (notably the SALE examples). Most, however, are quite fast.

2.4 Known Limitations

None. All are unknown.

2.5 Tips on Using the Toolbox

Typical sessions with the toolbox proceed in four steps. First, import the data into Matlab. If the file is fixed-format or tab-delimited ASCII, the command load name.extension (whatever that name.extension may be) will load the file contents into memory as a Matlab variable called name. Saving this will convert it into a Matlab file (e.g., save name name will save the variable name into the file name.mat). Failure to specify both names will result in saving all defined variables into one file. The data would include the dependent variable, the independent variables, and the locational coordinates. For example, suppose the user has the dependent variable in the text file y.txt. Issuing the command load y.txt results in a Matlab variable y. Issuing the command save y y results in saving y.mat in the current directory.

Second, create a spatial weight matrix. Users can choose weight matrices based upon nearest neighbors (symmetric or asymmetric), multidimensional symmetric neighbors, spatiotemporal neighbors (asymmetric), and Delaunay triangles (symmetric). In almost all cases, one must make sure each location is unique. One may need to add slight amounts of random noise to the locational coordinates to meet this restriction (some of the latest versions of Matlab do this automatically; do not dither the coordinates in this case). Note, some estimators only use symmetric matrices. You can specify the number of neighbors used and their relative weightings.
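The uniqueness requirement can be illustrated with a small sketch. It is written in Python purely for illustration and is not toolbox code; the function name dither_duplicates, the noise scale, and the sample coordinates are all hypothetical. It leaves the first occurrence of each location untouched and nudges only the repeats.

```python
import random

def dither_duplicates(coords, scale=1e-6, seed=0):
    """Return a copy of coords in which repeated locations are nudged apart."""
    rng = random.Random(seed)          # seeded so the result is reproducible
    seen = set()
    result = []
    for x, y in coords:
        point = (x, y)
        # Redraw the noise until the perturbed point is not already taken.
        while point in seen:
            point = (x + rng.uniform(-scale, scale),
                     y + rng.uniform(-scale, scale))
        seen.add(point)
        result.append(point)
    return result

# Hypothetical longitude/latitude pairs; the first two coincide.
locations = [(-91.15, 30.45), (-91.15, 30.45), (-90.07, 29.95)]
dithered = dither_duplicates(locations)
```

The noise scale should be small relative to the typical distance between neighbors, so that the neighbor structure is not altered.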

Note, the Delaunay spatial weight matrix leads to a concentration matrix or a variance-covariance matrix that depends upon only one parameter (α, the autoregressive parameter). In contrast, the nearest neighbor concentration matrices or variance-covariance matrices depend upon three parameters (α, the autoregressive parameter; m, the number of neighbors; and ρ, which governs the rate at which weights decline with the order of the neighbors, with the closest neighbor given the highest weighting, the second closest given a lower weighting, and so forth). Three parameters should make this specification sufficiently flexible for many purposes.

Third, one computes the log-determinants for a grid of autoregressive parameters (prespecified by the routine as a default or specified by the user as an option). Determinant computations proceed faster for symmetric matrices. You must choose the appropriate log-determinant routines for the type of spatial weight matrix you have specified. Computing the log-determinants is slower than estimation but only needs to be done when changing the spatial weight matrix. For example, one can use the same weight matrix and log-determinant files when exploring transformations or specifications of the dependent and independent variables (for the same observations).

Fourth, pick a statistical routine to run given the data matrices, the spatial weight matrix, and the log-determinant vector. One can choose among conditional autoregressions (CAR), simultaneous autoregressions (SAR), matrix exponential spatial specifications (MESS), mixed regressive spatially autoregressive estimators (which include pure autoregressive models and spatially lagged independent variable models as special cases), and OLS. In addition, one can explore multivariate spatiotemporal and multivariate estimation. These routines require little time to run. One can change models, weightings, and transformations and reestimate in the vast majority of cases without rerunning the spatial weight matrix or log-determinant routines (you may need to add another simple Jacobian term when performing weighting or transformations of the dependent variables). This aids interactive data exploration.
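The division of labor between the third and fourth steps can be seen in the profile log-likelihood of a spatial autoregression. The block below is a sketch in standard textbook notation, not a transcription of the toolbox code; the symbols D (spatial weight matrix), X, y, b(α), e(α), and the constant C are assumptions introduced here for illustration.

```latex
\begin{aligned}
y &= \alpha D y + X\beta + \varepsilon, \qquad \varepsilon \sim N(0, \sigma^2 I_n) \\
b(\alpha) &= (X'X)^{-1} X' (I_n - \alpha D)\, y \\
e(\alpha) &= (I_n - \alpha D)\, y - X\, b(\alpha) \\
L(\alpha) &= C + \ln\lvert I_n - \alpha D \rvert - \frac{n}{2} \ln\!\big( e(\alpha)' e(\alpha) \big)
\end{aligned}
```

Only the term ln|I − αD| depends on the weight matrix alone, so it can be tabulated once over a grid of α values and reused whenever y or X changes. The remaining term requires only least squares computations, which is consistent with the routines requiring little time to run.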

Finally, these procedures provide a wealth of information. Many of these routines yield the profile likelihood in the autoregressive parameter for each submodel (corresponding to the deletion of individual variables or the spatial term). All of the inference, even for the OLS routine, uses likelihood ratio statistics in the form of signed root deviances. A signed root deviance is just the square root of twice the difference in log-likelihoods, given the sign of the parameter estimate. It has a t-like interpretation (Chen and Jennrich (1996)). The use of signed root deviances (SRDs) facilitates comparisons among different models.
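The signed root deviance just described can be sketched in a few lines. The following is an illustration in Python, not toolbox code: the function names and the log-likelihood values are hypothetical, and the normal reference distribution used for the tail probability is an assumption of this sketch rather than something the text specifies.

```python
import math

def signed_root_deviance(loglik_full, loglik_restricted, estimate):
    """Square root of twice the log-likelihood gain, carrying the estimate's sign."""
    deviance = 2.0 * (loglik_full - loglik_restricted)
    # Guard against tiny negative values caused by rounding.
    root = math.sqrt(max(deviance, 0.0))
    return math.copysign(root, estimate)

def two_sided_p(srd):
    """Two-sided tail probability under a standard normal reference
    (an assumption of this sketch)."""
    return math.erfc(abs(srd) / math.sqrt(2.0))

# Hypothetical log-likelihoods: the full model versus the submodel that
# deletes one variable whose estimated coefficient is negative.
srd = signed_root_deviance(-100.0, -112.5, estimate=-0.3)
print(srd, two_sided_p(srd))
```

Because the statistic is built from log-likelihoods alone, the same calculation applies to any of the models, which is what makes SRDs comparable across them.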

2.6 Included Examples

The Spatial Statistics Toolbox comes with many examples. These are found in the subdirectories under the examples subdirectory of spatial_toolbox_2. To run the examples, change the current directory in Matlab to one of the many subdirectories that illustrate individual routines. Look at the documentation in each example directory for more detail. Almost all of the specific models have examples. In addition, the simulation routine examples serve as minor Monte Carlo studies, which also help verify the functioning of the estimators. The examples use the 3,107 observation dataset from the Pace and Barry (1997) Geographical Analysis article.

2.7 Included Datasets

The datasets subdirectory of spatial_toolbox_2 contains subdirectories with individual data sets in Matlab file formats as well as their documentation. The data sets include example programs and output. Note, due to the many improvements incorporated into the Spatial Statistics Toolbox over time, the running times have greatly improved over those in the articles.

These data sets should provide a good starting point for exploring applications of spatial statistics.

2.8 Included Manuscripts

In the manuscript subdirectory, we provide pdf versions of the Geographical Analysis 1997 and 2000 articles. I would like to thank the publishers (Ohio State Press and Elsevier) for having given us copyright permission to distribute these works. One can also go to www.spatial-statistics.com to access some other articles (e.g., the Linear Algebra and its Applications article, which proposed the Monte Carlo log-determinant estimator).

Chapter 3

A Brief Selected Tour of the Toolbox

The weight matrix specifies the dependence among observations. One form of weight matrix (Delaunay) uses the notion of contiguity to specify dependence, as depicted in Figure 3.1.

Figure 3.1: Connections among Counties via Delaunay. [Plot "Delaunay Nearest Neighbor Graph": Longitude in Decimal Degrees (x-axis) versus Latitude in Decimal Degrees (y-axis).]

Note the somewhat strange behavior of connections to outlying observations in Figure 3.1. This arises from the geometric nature of contiguity. Using nearest neighbors based upon some metric can avoid this, as shown in Figure 3.2.

Figure 3.2: Connections among Counties via Eight Nearest Neighbors. [Plot "8 Nearest Neighbor Graph": Longitude in Decimal Degrees (x-axis) versus Latitude in Decimal Degrees (y-axis).]

Using only nearby observations implies that the weight matrix has mainly zeros, or is sparse, as shown in Figure 3.3.

Figure 3.3: Plot of Non-zeros of Delaunay Weight Matrix. [Plot "Original Non-zero Weight Pattern for Delaunay": Observation Number on both axes, 0 to 3000.]

This becomes even more apparent when reordering the observations, as in Figure 3.4.

Figure 3.4: Plot of Non-zeros of Permuted Delaunay Weight Matrix. [Plot "Permuted Non-zero Weight Pattern for Delaunay": Observation Number on both axes, 0 to 3000.]

Sparsity, as well as finding an appropriate ordering, is key to quickly computing the log-determinants used in maximum likelihood. The toolbox has functions for exact computation of the log-determinants (actually interpolation of exact computations at various points). However, users can select approximations as well, which depend only on sparsity and not upon orderings. The quadratic Chebyshev is the fastest and most approximate technique (Figure 3.5).
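The idea of computing exact log-determinants at grid points and interpolating between them can be sketched as follows. This is a Python illustration under simplifying assumptions, not the toolbox implementation: it uses dense Gaussian elimination on a tiny hypothetical 3 by 3 weight matrix D, whereas a practical routine would exploit sparsity and ordering. All function names are hypothetical.

```python
import math

def log_abs_det(matrix):
    """ln|det(matrix)| by Gaussian elimination with partial pivoting."""
    a = [row[:] for row in matrix]
    n = len(a)
    total = 0.0
    for k in range(n):
        pivot = max(range(k, n), key=lambda i: abs(a[i][k]))
        if a[pivot][k] == 0.0:
            return float("-inf")          # singular matrix
        a[k], a[pivot] = a[pivot], a[k]
        total += math.log(abs(a[k][k]))
        for i in range(k + 1, n):
            factor = a[i][k] / a[k][k]
            for j in range(k, n):
                a[i][j] -= factor * a[k][j]
    return total

def lndet_grid(weights, alphas):
    """Exact ln|I - alpha*D| at each grid value of alpha."""
    n = len(weights)
    values = []
    for alpha in alphas:
        m = [[(1.0 if i == j else 0.0) - alpha * weights[i][j]
              for j in range(n)] for i in range(n)]
        values.append(log_abs_det(m))
    return values

def interpolate(alphas, values, alpha):
    """Linear interpolation between neighboring grid points."""
    for k in range(len(alphas) - 1):
        lo, hi = alphas[k], alphas[k + 1]
        if lo <= alpha <= hi:
            t = (alpha - lo) / (hi - lo)
            return (1.0 - t) * values[k] + t * values[k + 1]
    raise ValueError("alpha lies outside the grid")

# Hypothetical row-stochastic weight matrix for three observations on a line.
D = [[0.0, 1.0, 0.0],
     [0.5, 0.0, 0.5],
     [0.0, 1.0, 0.0]]
grid = [k / 10 for k in range(10)]        # 0.0, 0.1, ..., 0.9
lndets = lndet_grid(D, grid)
approx = interpolate(grid, lndets, 0.55)  # value between two grid points
```

For this particular D the determinant of I − αD equals 1 − α², which gives a simple check on the grid values. A finer grid reduces the interpolation error at the cost of more determinant evaluations.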

Figure 3.5: Plot of Exact log-determinant with Chebyshev Approximation and Taylor Bounds. [Plot "Exact log-determinant, Chebyshev approximation, Taylor bounds": α (x-axis, -1 to 1) versus log-determinants (y-axis); series: Taylor Lower, Chebyshev, Taylor Upper, Exact.]

The Chebyshev approximation appears quite good for positive, moderate values of the dependence parameter but could use improvement for materially negative values of the spatial dependence parameter. Fortunately, such negative values seem rare in practice.

The Monte Carlo log-determinant estimator is also quite fast but more accurate (Figure 3.6).

Figure 3.6: Plot of Exact log-determinant with Monte Carlo Approximation and Limits. [Plot "Exact log-determinant, Monte Carlo approximation and confidence limits": α (x-axis, -1 to 1) versus log-determinants (y-axis); series: Lower Confidence, Monte Carlo, Upper Confidence, Exact.]

To see the effects of exact versus approximate log-determinant computations, consider Tables 3.1 and 3.2, using the 3,107 county election data. The estimated autoregressive parameter is only off by 0.01 from using the approximation. The approximate method also uses likelihood dominance inference, which results in a lower bound to the signed root deviances. As shown by the tables, the likelihood dominance SRDs are smaller in magnitude than the exact SRDs. However, they can still document statistical significance for many variables and thus can prove useful in many circumstances.

Variables        Beta Estimates   Signed Root Deviances   Pr of Higher SRDs

Voting Pop              -0.7784                -31.1227              0.0000
Education                0.2696                  9.3438              0.0000
Home Ownership           0.4530                 26.2661              0.0000
Income                   0.0071                  0.0035              0.9972
Intercept                0.5443                  7.8773              0.0000
Alpha                    0.7250                 32.7719              0.0000

Table 3.1: SAR Estimation Results using Chebyshev lndet approximation and likelihood dominance inference

Variables        Beta Estimates   Signed Root Deviances   Pr of Higher SRDs

Voting Pop              -0.7806                -32.1748              0.0000
Education                0.2746                 11.9438              0.0000
Home Ownership           0.4525                 27.0073              0.0000
Income                   0.0047                  0.2187              0.8269
Intercept                0.5528                  9.4908              0.0000
Alpha                    0.7150                 34.8648              0.0000

Table 3.2: SAR Estimation results using exact lndet

Some of the routines not only yield the maximum of the likelihood function but also profile likelihoods in the dependence parameter α by model, as shown in Figure 3.7.

Figure 3.7: SAR Profile Likelihoods by Model. [Plot "Profile likelihoods vs α for global model and delete-1 submodels": Dependence parameter (α) (x-axis, 0 to 1) versus Profile log-likelihood (y-axis); series: Global likelihood, Voting Pop, Education, Home Ownership, Income, Intercept.]

The toolbox includes SAR, CAR, and MESS error models, as well as MESS, closest neighbor, and MIX autoregressive models, as shown in Table 3.3 and Table 3.4.

Variables             b OLS    b closest   b MESS    b Mix

Voting Pop           -0.8464    -0.7489   -0.7693   -0.7298
Education             0.5167     0.2899    0.1941    0.1818
Home Ownership        0.4291     0.4457    0.4832    0.4580
Income               -0.1439    -0.0332    0.0423    0.0427
Lag Voting Pop        0.0000     0.1878    0.4186    0.4616
Lag Education         0.0000     0.0975    0.1205    0.0450
Lag Home Ownership    0.0000    -0.1569   -0.3249   -0.3299
Lag Income            0.0000    -0.1079   -0.1802   -0.1410
Intercept             0.9814     0.7495    0.5636    0.4205
α                     0.0000     0.3352    1.4628    0.6550

Table 3.3: Estimates on Election Data

The closest neighbor approach is intermediate to a non-spatial approach (OLS) and a full spatial approach (MESS) or the approximate mixed routine. Note the close agreement between MESS and the mixed routine. Note, OLS in this case uses the spatial averages of the basic independent variables as additional independent variables.

None of these operations take long for the election data (Table 3.5).

Variables              b OLS    b closest    b MESS     b Mix

Voting Pop           -34.7211   -30.5020   -28.8899   -29.4018
Education             30.8928    13.4922     7.6654     7.7169
Home Ownership        23.0066    25.4083    26.8836    27.3515
Income                -7.2373    -1.5742     1.7478     1.8952
Lag Voting Pop         0.0000     7.7600    10.7656    12.5507
Lag Education          0.0000     4.5090     3.9785     1.5705
Lag Home Ownership     0.0000    -9.4245   -11.1259   -12.1200
Lag Income             0.0000    -5.1399    -5.5551    -4.6603
Intercept             20.2782    16.0533    10.7480     8.4626
α                      0.0000    24.2762    31.5948    33.7097

Table 3.4: Signed Root Deviances using Election Data

Operation                    Timings in seconds

OLS                                      0.0150
Closest AR                               0.0470
MESS                                     0.0940
Exact Log-det                            0.8750
Mix                                      0.0470
Doubly Stochastic Scaling                0.2660

Table 3.5: Timing for operations on Election data (n = 3107)

In addition to global models, the toolbox has spatial autoregressive local estimation (SALE). The user chooses the bandwidth (subsample size) by examining cross-validation error at the fringe observations, as in Figure 3.8, or at each observation, as in Figure 3.9.

Figure 3.8: Plot of Fringe prediction error versus subsample size. [Plot "Smoothed SALE Recursive Residuals of Fringe Observations": Number of Local Observations (x-axis, 0 to 500) versus Absolute Error (y-axis).]

Figure 3.9: Plot of prediction error on center of area versus subsample size. [Plot "Smoothed SALE Initial Holdout Residuals": Number of Local Observations (x-axis, 0 to 500) versus Absolute Error (y-axis).]

Usually, there is spatial dependence even in small subsamples, as shown in Figure 3.10.

Figure 3.10: Spatial dependence parameter estimate versus subsample size. [Plot "SALE Autoregressive Parameter Estimates": Number of Local Observations (x-axis, 0 to 500) versus Median Autoregressive Parameter Estimates (y-axis).]

Local estimation leads to spatially varying parameter estimates, such as those shown in Figure 3.11.

Figure 3.11: Map of influence of homeownership on voting. [Map of β3: Longitude (x-axis) versus Latitude (y-axis).]

In addition, a user can obtain an idea of the sensitivity of parameter estimates to spatial variation, such as summarized in Table 3.6.

Percentiles               0        10        25        50        75        90       100

α                    0.3100    0.4300    0.5500    0.6300    0.6700    0.7100    0.7800
Voting Pop          -1.0079   -0.8875   -0.8075   -0.6962   -0.5967   -0.5299   -0.4066
Education           -0.1871    0.0517    0.1142    0.1788    0.3135    0.4923    0.7118
Home Ownership       0.1530    0.2964    0.3419    0.4253    0.5161    0.7660    0.8461
Income              -0.2576   -0.1644   -0.1072   -0.0473    0.0628    0.1448    0.2325
Lag Voting Pop      -0.1525    0.0655    0.2868    0.4527    0.5490    0.6523    0.8806
Lag Education       -0.5734   -0.3666   -0.1812   -0.0299    0.0677    0.1322    0.3318
Lag Home Ownership  -0.7010   -0.4499   -0.3278   -0.2337   -0.1100   -0.0382    0.1880
Lag Income          -0.4764   -0.2374   -0.1733   -0.0897    0.0216    0.0862    0.2786
Intercept           -0.1605    0.0473    0.1463    0.2847    0.6612    1.0772    1.8902

Table 3.6: Distribution of local estimates

Method                       Time in seconds

OLS                                   0.1560
Closest AR                            0.1090
MESS                                  0.6090
Approximate Mix                       0.3280
Doubly Stochastic Scaling             7.1880
Delaunay Weight Matrix                5.4220

Table 3.7: Times for Different Methods for 57,647 Observations

To provide an idea about the performance of the techniques for a larger problem, we estimated a simple hedonic regression over US census tracts. This resulted in 57,647 observations. Table 3.7 shows the timings for some of the various operations. All of these seem quite fast. A user can find a Delaunay weight matrix and estimate a spatial autoregression in under 10 seconds on desktop machines.

Just selecting a particular weight matrix seems arbitrary. Here, we take 30 nearest neighbors and weight these geometrically. A ρ of 1 indicates no decline in the weight given to further neighbors relative to closer ones, while a ρ of 0.5 would give half the weight to the second nearest neighbor as it would to the first nearest neighbor. Thus, ρ allows changes in the effective number of neighbors used without actually varying the number of neighbors. It often makes sense in this approach to set the number of neighbors to a fairly high level (such as 30). Table 3.8 shows the effect of varying ρ on the profile log-likelihood. Small changes in ρ make large changes in the profile log-likelihood, evidence of the importance of this parameter.
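The geometric weighting described above can be sketched briefly. The following Python illustration is not toolbox code; the function name geometric_weights is hypothetical, and normalizing the weights to sum to one is an assumption of the sketch (the toolbox may scale them differently).

```python
def geometric_weights(m, rho):
    """Weights for the 1st through m-th nearest neighbors, each rho times
    the weight of the previous one, normalized to sum to one (an assumption)."""
    raw = [rho ** j for j in range(m)]
    total = sum(raw)
    return [w / total for w in raw]

equal = geometric_weights(30, 1.0)    # rho = 1: every neighbor weighted alike
steep = geometric_weights(30, 0.5)    # rho = 0.5: each weight halves in turn
ratio = steep[1] / steep[0]           # second neighbor relative to the first
top_five_share = sum(steep[:5])       # weight carried by the five closest
print(ratio, top_five_share)
```

With ρ = 0.5 nearly all of the weight falls on the first few neighbors even though 30 are retained, which is the sense in which ρ changes the effective number of neighbors without changing m.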

It did not take overly long to find the nearest neighbors or the optimal ρ, even with doubly stochastic scalings of the weight matrix, as shown by Table 3.9.

These operations lead to a table of profile log-likelihoods (Table 3.10) across weight matrices. Examining the MESS log-likelihoods over ρ and Delaunay and contrasting them with the log-likelihood from applying OLS to the basic non-spatial independent variables demonstrates that even a suboptimal choice of ρ or Delaunay still dominates the use of an aspatial model in this case, and that optimizing over ρ dominates an arbitrary choice of weight matrices (Table 3.10). Moreover, the doubly stochastic (DS) scaling helped greatly for this example over the regular scaling (RS). In addition, inspection of aspatial OLS versus MESS with an optimal selection of ρ in Table 3.11 shows clear differences among the approaches. Note, the land area variable became insignificant after modeling space.

ρ          log-likelihood

0.8000     -231895.0426
0.8500     -230968.8487
0.9000     -230828.6413
0.9500     -231699.1089
1.0000     -233434.5371

Table 3.8: Likelihoods across ρ for Doubly Stochastic Scaling

Operation                        Time in seconds

NN computation                           31.4060
RS Time to find optimum ρ                56.7810
DS Time to find optimum ρ                94.8750

Table 3.9: Times for Optimizing the Likelihood over ρ for Both Scalings

It is not difficult to estimate a spatial autoregression with over one million observations. In fact, the toolbox provides an example (big_one subdirectory) under the dataset directory whereby a one million observation spatial autoregression is estimated in just under 20 seconds. It took 130.63 seconds to find the weight matrix, 60.24 seconds to simulate the dependent variable, and 19.42 seconds to estimate the autoregression.

Variable                                    Value

Aspatial likelihood (OLS)            -266505.1663
Closest Neighbor                     -243986.1676
RS Delaunay maximum likelihood       -244509.8257
RS Maximum likelihood across ρ       -254216.3172
RS Optimum ρ                               0.9000
DS Delaunay maximum likelihood       -235626.6213
DS Maximum likelihood across ρ       -230828.6413
DS Optimum ρ                               0.9000

Table 3.10: Likelihoods Across Doubly Stochastic and Regular Scalings

Variables                 b OLS     SRD OLS    b MESS    SRD MESS

Land area               -0.0850    -96.8455   -0.0008     -0.7159
Pop                      0.1146     36.8592    0.0239     12.7630
Per cap Income           1.0837    208.2192    0.6786    152.7645
Age                     -0.1269    -34.2809   -0.1384    -45.0489
Lag Land area                                 -0.0178    -14.5650
Lag Pop                                        0.0165      5.2437
Lag Per cap Income                            -0.3702    -65.2683
Lag Age                                        0.1088     27.5807
Intercept                1.2236     22.9837   -0.5988    -15.2122
α                                              3.0986    253.2835
ρ (relative to ρ = 1)                          0.90       72.1927
Nearest Neighbors                              30
parameters               5                     11

Table 3.11: OLS versus MESS Results Using Optimal ρ for Doubly Stochastic Scaling

Chapter 4

References

If you need to know more about spatial statistics or about some of the specific routines, you may wish to examine:

Anselin, Luc. (1988) Spatial Econometrics: Methods and Models, Dordrecht: Kluwer Academic Publishers.

Barry, Ronald, and R. Kelley Pace, "A Monte Carlo Estimator of the Log Determinant of Large Sparse Matrices," Linear Algebra and its Applications, Volume 289, Number 1-3, 1999, p. 41-54.

Chen, Jian-Shen, and Robert Jennrich. (1996) "The Signed Root Deviance Profile and Confidence Intervals in Maximum Likelihood Analysis," Journal of the American Statistical Association, Volume 91, Number 435, p. 993-998.

Christensen, Ronald. (1991) Linear Models for Multivariate, Time Series, and Spatial Data, New York: Springer-Verlag.

Cressie, Noel A.C. (1993) Statistics for Spatial Data, Revised ed., New York: John Wiley.

Dubin, Robin A. (1988) "Estimation of Regression Coefficients in the Presence of Spatially Autocorrelated Error Terms," Review of Economics and Statistics, 70, 466-474.

Haining, Robert. (1990) Spatial Data Analysis in the Social and Environmental Sciences, Cambridge.

LeSage, James, and R. Kelley Pace, "Spatial Dependence in Data Mining," Data Mining for Scientific and Engineering Applications, Edited by Robert L. Grossman, Chandrika Kamath, Philip Kegelmeyer, Vipin Kumar, and Raju R. Namburu, Kluwer Academic Publishing, 2001.

LeSage, James, and R. Kelley Pace, "Spatial Probit and Tobit," Spatial Statistics and Spatial Econometrics, Edited by Art Getis, Palgrave, 2003.

Li, Bin. (1995) "Implementing Spatial Statistics on Parallel Computers," in: Arlinghaus, S., ed. Practical Handbook of Spatial Statistics (CRC Press, Boca Raton), pp. 107-148.

Ord, J.K. (1975) "Estimation Methods for Models of Spatial Interaction," Journal of the American Statistical Association, 70, 120-126.

Pace, R. Kelley, and Ronald Barry. (1997) "Fast CARs," Journal of Statistical Computation and Simulation, 59, p. 123-147.

Pace, R. Kelley, and Ronald Barry. (1997) "Quick Computation of Regressions with a Spatially Autoregressive Dependent Variable," Geographical Analysis, 29, 232-247.

Pace, R. Kelley, and Dongya Zou, "Closed-Form Maximum Likelihood Estimates of Nearest Neighbor Spatial Dependence," Geographical Analysis, Volume 32, Number 2, April 2000, p. 154-172.

Pace, R. Kelley, Ronald Barry, O.W. Gilley, and C.F. Sirmans, "A Method for Spatial-temporal Forecasting with an Application to Real Estate Prices," International Journal of Forecasting, Volume 16, Number 2, April-June 2000, p. 229-246.

Pace, R. Kelley, and James P. LeSage, "Semiparametric Maximum Likelihood Estimates of Spatial Dependence," Geographical Analysis, Volume 34, Number 1, January 2002, p. 75-90.

Pace, R. Kelley, and James LeSage, "Likelihood Dominance Spatial Inference," forthcoming, Geographical Analysis, in January 2003.

Pace, R. Kelley, and James LeSage, "Spatial Autoregressive Local Estimation," Spatial Statistics and Spatial Econometrics, Edited by Art Getis, Palgrave, 2003.

Pace, R. Kelley, and James LeSage, "Chebyshev Approximation of Log-determinants of Spatial Weight Matrices," forthcoming in Computational Statistics and Data Analysis.

Ripley, Brian D. (1981) Spatial Statistics, New York: John Wiley.

www.spatial-statistics.com


12 CHAPTER 2 USING THE TOOLBOX

result in saving all defined variables into one file The data would includethe dependent variable the independent variables and the locational coor-dinates For example suppose the user has the dependent variable in the textfile ytxt Issuing the command load ytxt results in a Matlab variable yIssuing the command save y y results in saving ymat in the directory

Second create a spatial weight matrix Users can choose weight matricesbased upon nearest neighbors (symmetric or asymmetric) multidimensionalsymmetric neighbors spatiotemporal neighbors (asymmetric) and Delaunaytriangles (symmetric) In almost all cases one must make sure each locationis unique One may need to add slight amounts of random noise to thelocational coordinates to meet this restriction (some of the latest versions ofMatlab do this automatically mdash do not dither the coordinates in this case)Note some estimators only use symmetric matrices You can specify thenumber of neighbors used and their relative weightings

Note the Delaunay spatial weight matrix leads to a concentration matrixor a variance-covariance matrix that depends upon only one-parameter (αthe autoregressive parameter) In contrast the nearest neighbor concentra-tion matrices or variance-covariance matrices depend upon three parameters(α the autoregressive parameter m the number of neighbors and ρ whichgoverns the rate weights decline with the order of the neighbors with the clos-est neighbor given the highest weighting the second closest given a lowerweighting and so forth) Three parameters should make this specificationsufficiently flexible for many purposes

Third one computes the log-determinants for a grid of autoregressive pa-rameters (prespecified by the routine as a default or specified by the user asan option) Determinant computations proceed faster for symmetric matri-ces You must choose the appropriate log-determinant routines for the typeof spatial weight matrix you have specified Computing the log-determinantsis the slower than estimation but only needs to be done when changing thespatial weight matrix For example one can use the same weight matrix andlog-determinant files when exploring transformations or specifications of the

26 INCLUDED EXAMPLES 13

dependent and independent variables (for the same observations)Fourth pick a statistical routine to run given the data matrices the spatial

weight matrix and the log-determinant vector One can choose among con-ditional autoregressions (CAR) simultaneous autoregressions (SAR) matrixexponential spatial specifications (MESS) mixed regressive spatially autore-gressive estimators (which include pure autoregressive models and spatiallylagged independent variable models as special cases) and OLS In additionone can explore multivariate spatiotemporal and multivariate estimationThese routines require little time to run One can change models weight-ings and transformations and reestimate in the vast majority of cases withoutrerunning the spatial weight matrix or log-determinant routines (you mayneed to add another simple Jacobian term when performing weighting ortransformations of the dependent variables) This aids interactive data explo-ration

Fifth these procedures provide a wealth of information Many of theseroutines yield the profile likelihood in the autoregressive parameter for eachsubmodel (corresponding to the deletion of individual variables or the spatialterm) All of the inference even for the OLS routine uses likelihood ratiostatistics in the form of signed root deviances This is just the square root oftwice the difference in likelihoods given the sign of the parameter estimateIt has a t-like interpretation (Chen and Jennrich (1996)) The use of signedroot deviances (SRDs) facilitates comparisons among different models

26 Included Examples

The Spatial Statistics Toolbox comes with many examples These are foundin the subdirectories under spatial_toolbox_2examples To run theexamples change the directory in Matlab into the many subdirectories thatillustrate individual routines Look at the documentation in each example di-rectory for more detail Almost all of the specific models have examples Inaddition the simulation routine examples serve as minor Monte Carlo stud-

14 CHAPTER 2 USING THE TOOLBOX

ies which also help verify the functioning of the estimators The examplesuse the 3107 observation dataset from the Pace and Barry (1997) GeographicalAnalysis article

27 Included Datasets

The spatial_toolbox_2datasets subdirectory contains subdirectorieswith individual data sets in Matlab file formats as well as their documentationThe data sets include example programs and output Note due to the manyimprovements incorporated into the Spatial Statistics Toolbox over time therunning times have greatly improved over those in the articles

Hopefully these data sets should provide a good starting point for ex-ploring applications of spatial statistics

28 Included Manuscripts

In the manuscript subdirectory we provide pdf versions of the GeographicalAnalysis 1997 and 2000 articles I would like to thank the publishers (OhioState Press and Elsevier) for having given us copyright permission to dis-tribute these works One can also go to wwwspatial-statisticscom to accesssome other articles (eg the Linear Algebra and its Applications article whichproposed the Monte Carlo log-determinant estimator)

Chapter 3

A Brief Selected Tour of the Toolbox

The weight matrix specifies the dependence among observations. One form of weight matrix (Delaunay) uses the notion of contiguity to specify dependence, as depicted in Figure 3.1.

[Figure: Delaunay Nearest Neighbor Graph. Axes: Longitude in Decimal Degrees (x), Latitude in Decimal Degrees (y).]

Figure 3.1: Connections among Counties via Delaunay

Note the somewhat strange behavior of connections to outlying observations in Figure 3.1. This arises from the geometric nature of contiguity. Using nearest neighbors based upon some metric can avoid this, as shown in Figure 3.2.

[Figure: 8 Nearest Neighbor Graph. Axes: Longitude in Decimal Degrees (x), Latitude in Decimal Degrees (y).]

Figure 3.2: Connections among Counties via Eight Nearest Neighbors
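To make the nearest neighbor construction concrete, the following Python sketch builds a row-stochastic weight matrix from coordinates. It is an illustration under stated assumptions rather than the toolbox's Matlab routine: the function name knn_weights, the Euclidean metric, the equal weights of 1/k, and the coordinates are all hypothetical choices.

```python
import math

def knn_weights(coords, k):
    """Return a row-stochastic weight matrix as a list of {neighbor: weight} dicts."""
    n = len(coords)
    rows = []
    for i in range(n):
        # Distances from point i to every other point (self excluded).
        dist = sorted(
            (math.dist(coords[i], coords[j]), j) for j in range(n) if j != i
        )
        neighbors = [j for _, j in dist[:k]]
        # Equal weights over the k nearest neighbors, so each row sums to one.
        rows.append({j: 1.0 / k for j in neighbors})
    return rows

# Hypothetical coordinates (longitude, latitude), for illustration only.
points = [(-91.1, 30.4), (-90.1, 29.9), (-93.7, 32.5), (-92.0, 30.2), (-89.9, 35.1)]
W = knn_weights(points, k=2)

# Share of non-zero entries: k*n out of n*n, which shrinks as n grows.
nonzero_share = sum(len(r) for r in W) / len(points) ** 2
print(W[0], nonzero_share)
```

Storing only the non-zero entries of each row is one simple way to exploit the fact that every observation has just k neighbors regardless of the sample size.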

Using only nearby observations implies that the weight matrix has mainly zeros, or is sparse, as shown in Figure 3.3.

[Figure: Original Non-zero Weight Pattern for Delaunay. Axes: Observation Number (x), Observation Number (y).]

Figure 3.3: Plot of Non-zeros of Delaunay Weight Matrix

This becomes even more apparent when reordering the observations, as in Figure 3.4.

[Figure: Permuted Non-zero Weight Pattern for Delaunay. Axes: Observation Number (x), Observation Number (y).]

Figure 3.4: Plot of Non-zeros of Permuted Delaunay Weight Matrix

Sparsity, as well as finding an appropriate ordering, are key in quickly computing the log-determinants used in maximum likelihood. The toolbox has functions for exact computation of the log-determinants (actually interpolation of exact computations at various points). However, users can select approximations as well, which depend only on sparsity and not upon orderings. The quadratic Chebyshev is the fastest and most approximate technique (Figure 3.5).

[Figure: Exact log-determinant, Chebyshev approximation, Taylor bounds. Axes: α (x), log-determinants (y). Legend: Taylor Lower, Chebyshev, Taylor Upper, Exact.]

Figure 3.5: Plot of Exact log-determinant with Chebyshev Approximation and Taylor Bounds
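The idea of computing exact log-determinants at a set of points and interpolating between them can be sketched as follows. This is a small dense Python illustration, not the toolbox's sparse Matlab implementation: the function names, the grid spacing of 0.01, and the use of linear interpolation are assumptions. The example uses four observations on a ring, for which ln|I - αD| = ln(1 - α^2) exactly.

```python
import math

def logdet(matrix):
    """ln|det(A)| by Gaussian elimination with partial pivoting (dense, small n only)."""
    a = [row[:] for row in matrix]
    n = len(a)
    total = 0.0
    for c in range(n):
        p = max(range(c, n), key=lambda r: abs(a[r][c]))
        if a[p][c] == 0.0:
            return float("-inf")
        a[c], a[p] = a[p], a[c]
        total += math.log(abs(a[c][c]))
        for r in range(c + 1, n):
            f = a[r][c] / a[c][c]
            for j in range(c, n):
                a[r][j] -= f * a[c][j]
    return total

def logdet_grid(D, alphas):
    """Exact ln|I - alpha*D| at each grid value of alpha."""
    n = len(D)
    return [
        logdet([[(1.0 if i == j else 0.0) - a * D[i][j] for j in range(n)]
                for i in range(n)])
        for a in alphas
    ]

def interpolate(alphas, values, a):
    """Linear interpolation between the two surrounding grid points."""
    for k in range(len(alphas) - 1):
        if alphas[k] <= a <= alphas[k + 1]:
            t = (a - alphas[k]) / (alphas[k + 1] - alphas[k])
            return (1 - t) * values[k] + t * values[k + 1]
    raise ValueError("alpha outside the grid")

# Four observations on a ring, each with two neighbors weighted 1/2.
D = [[0, .5, 0, .5], [.5, 0, .5, 0], [0, .5, 0, .5], [.5, 0, .5, 0]]
grid = [i / 100 for i in range(0, 100)]
table = logdet_grid(D, grid)
print(interpolate(grid, table, 0.505), math.log(1 - 0.505 ** 2))
```

The grid of log-determinants is computed once per weight matrix, so later likelihood evaluations at any α only need a table lookup.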

The Chebyshev approximation appears quite good for moderate positive values of the dependence parameter, but could use improvement for materially negative values of the spatial dependence parameter. Fortunately, such negative values seem rare in practice.
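For readers who want to see how a quadratic Chebyshev approximation can work, here is a hedged Python sketch. It applies the textbook Chebyshev interpolation of ln(1 - αx) on [-1, 1] and sums it over the eigenvalues of D through the traces of I, D, and D^2. It assumes that all eigenvalues of D are real and lie in [-1, 1], and it is not taken from the toolbox source, so details may differ from the routine the toolbox actually uses. The function name and the four-observation ring example (for which the exact value is ln(1 - α^2)) are illustrative.

```python
import math

def chebyshev_logdet(alpha, n, trace_D, trace_D2):
    """Quadratic Chebyshev approximation of ln|I - alpha*D|.

    Assumes every eigenvalue of D is real and lies in [-1, 1].
    """
    q = 2
    # Chebyshev nodes and the function ln(1 - alpha*x) evaluated at them.
    nodes = [math.cos(math.pi * (k - 0.5) / (q + 1)) for k in range(1, q + 2)]
    f = [math.log(1.0 - alpha * x) for x in nodes]
    # Interpolation coefficients c_0, c_1, c_2.
    c = [
        2.0 / (q + 1) * sum(
            f[k] * math.cos(math.pi * j * (k + 0.5) / (q + 1)) for k in range(q + 1)
        )
        for j in range(q + 1)
    ]
    # Traces of T0(D) = I, T1(D) = D, T2(D) = 2*D^2 - I.
    traces = [n, trace_D, 2.0 * trace_D2 - n]
    return sum(cj * tj for cj, tj in zip(c, traces)) - 0.5 * n * c[0]

# Ring of four observations with weights 1/2: tr(D) = 0 and tr(D^2) = 2.
n, trace_D, trace_D2 = 4, 0.0, 2.0
approx = chebyshev_logdet(0.5, n, trace_D, trace_D2)
exact = math.log(1.0 - 0.5 ** 2)
print(approx, exact)
```

Only tr(D^2) depends on the data here, which is why an approximation of this kind needs the sparsity pattern but no particular ordering of the observations.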

The Monte Carlo log-determinant estimator is also quite fast, and it is more accurate than the quadratic Chebyshev approximation (Figure 3.6).

[Figure: Exact log-determinant, Monte Carlo approximation and confidence limits. Axes: α (x), log-determinants (y). Legend: Lower Confidence, Monte Carlo, Upper Confidence, Exact.]

Figure 3.6: Plot of Exact log-determinant with Monte Carlo Approximation and Limits
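The following Python sketch illustrates the kind of estimator proposed in the Barry and Pace (1999) article. It expands ln|I - αD| as the series -Σ α^k tr(D^k)/k and estimates each trace by n x'D^k x / x'x for random normal vectors x, so that only matrix-vector products are required. The function names, the number of series terms and draws, the seed, and the ring-shaped test matrix are assumptions chosen for illustration, and the code is not the toolbox's Matlab routine.

```python
import math
import random

def ring_multiply(x):
    """y = D x for a ring where each observation averages its two neighbors."""
    n = len(x)
    return [0.5 * (x[i - 1] + x[(i + 1) % n]) for i in range(n)]

def monte_carlo_logdet(multiply, n, alpha, terms=20, draws=50, seed=0):
    """Estimate ln|I - alpha*D| using only matrix-vector products with D."""
    rng = random.Random(seed)
    estimates = []
    for _ in range(draws):
        x = [rng.gauss(0.0, 1.0) for _ in range(n)]
        xx = sum(v * v for v in x)
        v = x
        total = 0.0
        for k in range(1, terms + 1):
            v = multiply(v)                      # v now holds D^k x
            ratio = sum(a * b for a, b in zip(x, v)) / xx
            total -= n * alpha ** k / k * ratio  # n * ratio estimates tr(D^k)
        estimates.append(total)
    mean = sum(estimates) / draws
    var = sum((e - mean) ** 2 for e in estimates) / (draws - 1)
    return mean, math.sqrt(var / draws)

n = 1000
estimate, std_error = monte_carlo_logdet(ring_multiply, n, alpha=0.5)
# The eigenvalues of the ring matrix are cos(2*pi*k/n), which gives an exact check.
exact = sum(math.log(1.0 - 0.5 * math.cos(2 * math.pi * k / n)) for k in range(n))
print(estimate, std_error, exact)
```

The standard error across draws gives a rough basis for confidence limits of the kind plotted in Figure 3.6.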

To see the effects of exact versus approximate log-determinant computations, consider Tables 3.1 and 3.2 using the 3107 county election data. The estimated autoregressive parameter is only off by 0.01 from using the approximation. The approximate method also uses likelihood dominance inference, which results in a lower bound to the signed root deviances. As shown by the tables, the likelihood dominance SRDs are smaller in magnitude than the exact SRDs. However, they can still document statistical significance for many variables and thus can prove useful in many circumstances.

Variables         Beta Estimates   Signed Root Deviances   Pr. of Higher SRDs
Voting Pop               -0.7784                -31.1227               0.0000
Education                 0.2696                  9.3438               0.0000
Home Ownership            0.4530                 26.2661               0.0000
Income                    0.0071                  0.0035               0.9972
Intercept                 0.5443                  7.8773               0.0000
Alpha                     0.7250                 32.7719               0.0000

Table 3.1: SAR Estimation Results using Chebyshev lndet approximation and likelihood dominance inference

Variables         Beta Estimates   Signed Root Deviances   Pr. of Higher SRDs
Voting Pop               -0.7806                -32.1748               0.0000
Education                 0.2746                 11.9438               0.0000
Home Ownership            0.4525                 27.0073               0.0000
Income                    0.0047                  0.2187               0.8269
Intercept                 0.5528                  9.4908               0.0000
Alpha                     0.7150                 34.8648               0.0000

Table 3.2: SAR Estimation results using exact lndet
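The probabilities in the last column of these tables appear consistent with treating each SRD as a standard normal deviate and taking the two-sided tail area. That reading is an inference from the tabulated values, not a statement from the toolbox source. A short Python sketch of the calculation, with a hypothetical function name:

```python
import math

def prob_higher_srd(srd):
    """Two-sided standard normal tail probability for a signed root deviance."""
    return math.erfc(abs(srd) / math.sqrt(2.0))

# SRDs for Income from Table 3.2 and Table 3.1, and for Education from Table 3.2.
for name, srd in [("Income (exact)", 0.2187),
                  ("Income (Chebyshev)", 0.0035),
                  ("Education (exact)", 11.9438)]:
    print(name, round(prob_higher_srd(srd), 4))
```

Under this assumption, the Income SRDs of 0.2187 and 0.0035 map to probabilities that agree with the tabulated 0.8269 and 0.9972 to four decimals.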

Some of the routines not only yield the maximum of the likelihood function, but also profile likelihoods in the dependence parameter α by model, as shown in Figure 3.7.

[Figure: Profile likelihoods vs α for global model and delete-1 submodels. Axes: Dependence parameter (α) (x), Profile log-likelihood (y). Legend: Global likelihood, Voting Pop, Education, Home Ownership, Income, Intercept.]

Figure 3.7: SAR Profile Likelihoods by Model
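A profile likelihood in α can be sketched by evaluating the concentrated log-likelihood over a grid. The Python example below assumes a deliberately simple model, y = αDy + b0 + e with only an intercept, on a ring-shaped weight matrix whose log-determinant has a closed form. The model, the simulated data, the seed, and all names are illustrative assumptions and do not reproduce the toolbox's SAR routine.

```python
import math
import random

def ring_lag(y):
    """Spatial lag D y on a ring: the average of the two adjacent observations."""
    n = len(y)
    return [0.5 * (y[i - 1] + y[(i + 1) % n]) for i in range(n)]

def profile_loglik(y, alpha):
    """Concentrated log-likelihood (constants dropped) of y = alpha*D*y + b0 + e."""
    n = len(y)
    u = [yi - alpha * li for yi, li in zip(y, ring_lag(y))]
    mean_u = sum(u) / n                       # OLS fit of the intercept
    sse = sum((ui - mean_u) ** 2 for ui in u)
    # Ring eigenvalues are cos(2*pi*k/n), so the log-determinant is a simple sum.
    lndet = sum(math.log(1.0 - alpha * math.cos(2 * math.pi * k / n)) for k in range(n))
    return lndet - 0.5 * n * math.log(sse)

# Simulate y = (I - alpha*D)^(-1) e through the series sum of alpha^k D^k e.
rng = random.Random(1)
n, true_alpha = 400, 0.6
e = [rng.gauss(0.0, 1.0) for _ in range(n)]
y, term = e[:], e[:]
for _ in range(200):
    term = [true_alpha * t for t in ring_lag(term)]
    y = [a + b for a, b in zip(y, term)]

grid = [i / 100 for i in range(0, 100)]
profile = [profile_loglik(y, a) for a in grid]
alpha_hat = grid[max(range(len(grid)), key=profile.__getitem__)]
print(alpha_hat)
```

Repeating the grid evaluation with one variable deleted at a time would give one curve per submodel, and the gaps between the curve maxima feed directly into the signed root deviances.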

The toolbox includes SAR, CAR, and MESS error models, as well as MESS, closest neighbor, and MIX autoregressive models, as shown in Table 3.3 and Table 3.4.

Variables              b OLS   b closest    b MESS     b Mix
Voting Pop           -0.8464     -0.7489   -0.7693   -0.7298
Education             0.5167      0.2899    0.1941    0.1818
Home Ownership        0.4291      0.4457    0.4832    0.4580
Income               -0.1439     -0.0332    0.0423    0.0427
Lag Voting Pop        0.0000      0.1878    0.4186    0.4616
Lag Education         0.0000      0.0975    0.1205    0.0450
Lag Home Ownership    0.0000     -0.1569   -0.3249   -0.3299
Lag Income            0.0000     -0.1079   -0.1802   -0.1410
Intercept             0.9814      0.7495    0.5636    0.4205
α                     0.0000      0.3352    1.4628    0.6550

Table 3.3: Estimates on Election Data

The closest neighbor approach is intermediate to a non-spatial approach (OLS) and a full spatial approach (MESS) or the approximate mixed routine. Note the close agreement between MESS and the mixed routine. Note that OLS in this case uses the spatial averages of the basic independent variables as additional independent variables.
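The "Lag" variables are spatial averages of the basic independent variables. A minimal Python sketch of how such a lagged variable can be formed and appended to a design matrix is shown below. The weight matrix, the income values, and the function name are hypothetical.

```python
def spatial_lag(neighbors, x):
    """Average x over each observation's neighbors (row-stochastic weights)."""
    return [sum(w * x[j] for j, w in row.items()) for row in neighbors]

# Hypothetical three-observation example; each row's weights sum to one.
W = [{1: 0.5, 2: 0.5}, {0: 1.0}, {0: 0.5, 1: 0.5}]
income = [10.0, 20.0, 40.0]

# Design matrix rows: intercept, the variable itself, and its spatial lag.
design = [[1.0, xi, lag] for xi, lag in zip(income, spatial_lag(W, income))]
print(design)
```

Each lagged column is just the weight matrix applied to the original column, so adding lags costs one sparse matrix-vector product per variable.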

Variables                OLS     closest      MESS       Mix
Voting Pop          -34.7211    -30.5020  -28.8899  -29.4018
Education            30.8928     13.4922    7.6654    7.7169
Home Ownership       23.0066     25.4083   26.8836   27.3515
Income               -7.2373     -1.5742    1.7478    1.8952
Lag Voting Pop        0.0000      7.7600   10.7656   12.5507
Lag Education         0.0000      4.5090    3.9785    1.5705
Lag Home Ownership    0.0000     -9.4245  -11.1259  -12.1200
Lag Income            0.0000     -5.1399   -5.5551   -4.6603
Intercept            20.2782     16.0533   10.7480    8.4626
α                     0.0000     24.2762   31.5948   33.7097

Table 3.4: Signed Root Deviances using Election Data

None of these operations take long for the election data.

Operation                   Timings in seconds
OLS                                     0.0150
Closest AR                              0.0470
MESS                                    0.0940
Exact Log-det                           0.8750
Mix                                     0.0470
Doubly Stochastic Scaling               0.2660

Table 3.5: Timing for operations to Election data (n=3107)

In addition to global models, the toolbox has spatial autoregressive local estimation (SALE). The user chooses the bandwidth (subsample size) by examining cross-validation error at the fringe observations, as in Figure 3.8, or at each observation, as in Figure 3.9.

[Figure: Smoothed SALE Recursive Residuals of Fringe Observations. Axes: Number of Local Observations (x), Absolute Error (y).]

Figure 3.8: Plot of Fringe prediction error versus subsample size

[Figure: Smoothed SALE Initial Holdout Residuals. Axes: Number of Local Observations (x), Absolute Error (y).]

Figure 3.9: Plot of prediction error on center of area versus subsample size

Usually there is spatial dependence even in small subsamples, as shown in Figure 3.10.

[Figure: SALE Autoregressive Parameter Estimates. Axes: Number of Local Observations (x), Median Autoregressive Parameter Estimates (y).]

Figure 3.10: Spatial dependence parameter estimate versus subsample size

Local estimation leads to spatially varying parameter estimates, such as those shown in Figure 3.11.

[Figure: map of local estimates of β3. Axes: Longitude (x), Latitude (y).]

Figure 3.11: Map of influence of homeownership on voting

In addition, a user can obtain an idea of the sensitivity of parameter estimates to spatial variation, such as summarized in Table 3.6.

Percentiles                0        10        25        50        75        90       100
α                     0.3100    0.4300    0.5500    0.6300    0.6700    0.7100    0.7800
Voting Pop           -1.0079   -0.8875   -0.8075   -0.6962   -0.5967   -0.5299   -0.4066
Education            -0.1871    0.0517    0.1142    0.1788    0.3135    0.4923    0.7118
Home Ownership        0.1530    0.2964    0.3419    0.4253    0.5161    0.7660    0.8461
Income               -0.2576   -0.1644   -0.1072   -0.0473    0.0628    0.1448    0.2325
Lag Voting Pop       -0.1525    0.0655    0.2868    0.4527    0.5490    0.6523    0.8806
Lag Education        -0.5734   -0.3666   -0.1812   -0.0299    0.0677    0.1322    0.3318
Lag Home Ownership   -0.7010   -0.4499   -0.3278   -0.2337   -0.1100   -0.0382    0.1880
Lag Income           -0.4764   -0.2374   -0.1733   -0.0897    0.0216    0.0862    0.2786
Intercept            -0.1605    0.0473    0.1463    0.2847    0.6612    1.0772    1.8902

Table 3.6: Distribution of local estimates

To provide an idea about the performance of the techniques for a larger problem, we estimated a simple hedonic regression over US census tracts. This resulted in 57647 observations. Table 3.7 shows the timings for some of the various operations. All of these seem quite fast. A user can find a Delaunay weight matrix and estimate a spatial autoregression in under 10 seconds on desktop machines.

Method                      Time in seconds
OLS                                  0.1560
Closest AR                           0.1090
MESS                                 0.6090
Approximate Mix                      0.3280
Doubly Stochastic Scaling            7.1880
Delaunay Weight Matrix               5.4220

Table 3.7: Times for Different Methods for 57647 Observations

Just selecting a particular weight matrix seems arbitrary. Here we take 30 nearest neighbors and weight these geometrically. A ρ of 1 indicates no decline in the weight given to further neighbors relative to closer ones, while a ρ of 0.5 would give half the weight to the second nearest neighbor as it would to the first nearest neighbor. Thus, ρ allows changes in the effective number of neighbors used without actually varying the number of neighbors. It often makes sense in this approach to set the number of neighbors to a fairly high level (such as 30). Table 3.8 shows the effect of varying ρ on the profile log-likelihood. Small changes in ρ make large changes in the profile log-likelihood, evidence of the importance of this parameter.
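The geometric weighting can be illustrated with a few lines of Python. The sketch assumes that the k-th nearest neighbor receives a weight proportional to ρ^(k-1) and that the weights are normalized to sum to one. The inverse sum of squared weights is used here as one possible measure of the effective number of neighbors. Both function names and that measure are assumptions made for illustration, not definitions taken from the toolbox.

```python
def geometric_weights(rho, m):
    """Weights for the 1st..m-th nearest neighbors, proportional to rho**(k-1)."""
    raw = [rho ** k for k in range(m)]
    total = sum(raw)
    return [r / total for r in raw]

def effective_neighbors(weights):
    """Inverse sum of squared weights: m for equal weights, fewer as weights concentrate."""
    return 1.0 / sum(w * w for w in weights)

for rho in (1.0, 0.9, 0.5):
    w = geometric_weights(rho, 30)
    # Ratio of second to first neighbor weight, then the effective neighbor count.
    print(rho, round(w[1] / w[0], 2), round(effective_neighbors(w), 2))
```

With ρ = 1 all 30 neighbors count equally, while with ρ = 0.5 the weights concentrate so strongly that only about three neighbors matter under this measure.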

It did not take overly long to find the nearest neighbors or the optimal ρ, even with doubly stochastic scalings of the weight matrix, as shown by Table 3.9.
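A doubly stochastic scaling makes every row and every column of the weight matrix sum to one. One standard way to obtain such a scaling is to alternate row and column normalization (the Sinkhorn-Knopp iteration), sketched below in Python. The toolbox may use a different algorithm, and the function name, the fixed number of sweeps, and the example matrix are assumptions.

```python
def doubly_stochastic(matrix, sweeps=500):
    """Alternately rescale rows and columns until both sum to one."""
    a = [row[:] for row in matrix]
    n = len(a)
    for _ in range(sweeps):
        # Row normalization.
        for i in range(n):
            s = sum(a[i])
            a[i] = [v / s for v in a[i]]
        # Column normalization.
        col = [sum(a[i][j] for i in range(n)) for j in range(n)]
        a = [[a[i][j] / col[j] for j in range(n)] for i in range(n)]
    return a

# Hypothetical symmetric weights among four observations (zero diagonal).
S = [[0, 3, 1, 2], [3, 0, 2, 1], [1, 2, 0, 4], [2, 1, 4, 0]]
DS = doubly_stochastic(S)
for row in DS:
    print([round(v, 4) for v in row])
```

For a symmetric starting matrix such as this one, the scaled matrix remains symmetric, so its eigenvalues are real.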

ρ         log-likelihood
0.8000      -231895.0426
0.8500      -230968.8487
0.9000      -230828.6413
0.9500      -231699.1089
1.0000      -233434.5371

Table 3.8: Likelihoods across ρ for Doubly Stochastic Scaling

Operation                     Time in seconds
NN computation                        31.4060
RS Time to find optimum ρ             56.7810
DS Time to find optimum ρ             94.8750

Table 3.9: Times for Optimizing the Likelihood over ρ for Both Scalings

These operations lead to a table of profile log-likelihoods (Table 3.10) across weight matrices. Examining the MESS log-likelihoods over ρ and Delaunay, and contrasting them with the log-likelihood from applying OLS to the basic non-spatial independent variables, demonstrates that even a suboptimal choice of ρ or Delaunay still dominates the use of an aspatial model in this case, and that optimizing over ρ dominates an arbitrary choice of weight matrices (Table 3.10). Moreover, the doubly stochastic (DS) scaling helped greatly for this example over the regular scaling (RS). In addition, inspection of aspatial OLS versus MESS with an optimal selection of ρ in Table 3.11 shows clear differences among the approaches. Note that the land area variable became insignificant after modeling space.

It is not difficult to estimate a spatial autoregression with over one million observations. In fact, the toolbox provides an example (big_one subdirectory) under the dataset directory whereby a one million observation spatial autoregression is estimated in just under 20 seconds. It took 130.63 seconds to find the weight matrix, 60.24 seconds to simulate the dependent variable, and 19.42 seconds to estimate the autoregression.

Variable                                 Value
Aspatial likelihood (OLS)         -266505.1663
Closest Neighbor                  -243986.1676
RS Delaunay maximum likelihood    -244509.8257
RS Maximum likelihood across ρ    -254216.3172
RS Optimum ρ                            0.9000
DS Delaunay maximum likelihood    -235626.6213
DS Maximum likelihood across ρ    -230828.6413
DS Optimum ρ                            0.9000

Table 3.10: Likelihoods Across Doubly Stochastic and Regular Scalings

Variables                 b OLS     SRD OLS    b MESS    SRD MESS
Land area               -0.0850    -96.8455   -0.0008     -0.7159
Pop                      0.1146     36.8592    0.0239     12.7630
Per cap Income           1.0837    208.2192    0.6786    152.7645
Age                     -0.1269    -34.2809   -0.1384    -45.0489
Lag Land area                                 -0.0178    -14.5650
Lag Pop                                        0.0165      5.2437
Per cap Income                                -0.3702    -65.2683
Lag Age                                        0.1088     27.5807
Intercept                1.2236     22.9837   -0.5988    -15.2122
α                                              3.0986    253.2835
ρ (relative to ρ = 1)                          0.90       72.1927
Nearest Neighbors                             30
parameters               5                    11

Table 3.11: OLS versus MESS Results Using Optimal ρ for Doubly Stochastic Scaling

Chapter 4

References

If you need to know more about spatial statistics or about some of the specific routines, you may wish to examine:

Anselin, Luc. (1988) Spatial Econometrics: Methods and Models. Dordrecht: Kluwer Academic Publishers.

Barry, Ronald, and R. Kelley Pace. "A Monte Carlo Estimator of the Log Determinant of Large Sparse Matrices," Linear Algebra and its Applications, Volume 289, Number 1-3, 1999, p. 41-54.

Chen, Jian-Shen, and Robert Jennrich. (1996) "The Signed Root Deviance Profile and Confidence Intervals in Maximum Likelihood Analysis," Journal of the American Statistical Association, Volume 91, Number 435, p. 993-998.

Christensen, Ronald. (1991) Linear Models for Multivariate, Time Series, and Spatial Data. New York: Springer-Verlag.

Cressie, Noel A.C. (1993) Statistics for Spatial Data, Revised ed. New York: John Wiley.

Dubin, Robin A. (1988) "Estimation of Regression Coefficients in the Presence of Spatially Autocorrelated Error Terms," Review of Economics and Statistics, 70, 466-474.

Haining, Robert. (1990) Spatial Data Analysis in the Social and Environmental Sciences. Cambridge.

LeSage, James, and R. Kelley Pace. "Spatial Dependence in Data Mining," Data Mining for Scientific and Engineering Applications, edited by Robert L. Grossman, Chandrika Kamath, Philip Kegelmeyer, Vipin Kumar, and Raju R. Namburu. Kluwer Academic Publishing, 2001.

LeSage, James, and R. Kelley Pace. "Spatial Probit and Tobit," Spatial Statistics and Spatial Econometrics, edited by Art Getis. Palgrave, 2003.

Li, Bin. (1995) "Implementing Spatial Statistics on Parallel Computers," in Arlinghaus, S., ed., Practical Handbook of Spatial Statistics (CRC Press, Boca Raton), pp. 107-148.

Ord, J.K. (1975) "Estimation Methods for Models of Spatial Interaction," Journal of the American Statistical Association, 70, 120-126.

Pace, R. Kelley, and Ronald Barry. (1997) "Fast CARs," Journal of Statistical Computation and Simulation, 59, p. 123-147.

Pace, R. Kelley, and Ronald Barry. (1997) "Quick Computation of Regressions with a Spatially Autoregressive Dependent Variable," Geographical Analysis, 29, 232-247.

Pace, R. Kelley, and Dongya Zou. "Closed-Form Maximum Likelihood Estimates of Nearest Neighbor Spatial Dependence," Geographical Analysis, Volume 32, Number 2, April 2000, p. 154-172.

Pace, R. Kelley, Ronald Barry, O.W. Gilley, and C.F. Sirmans. "A Method for Spatial-temporal Forecasting with an Application to Real Estate Prices," International Journal of Forecasting, Volume 16, Number 2, April-June 2000, p. 229-246.

Pace, R. Kelley, and James P. LeSage. "Semiparametric Maximum Likelihood Estimates of Spatial Dependence," Geographical Analysis, Volume 34, Number 1, January 2002, p. 75-90.

Pace, R. Kelley, and James LeSage. "Likelihood Dominance Spatial Inference," forthcoming, Geographical Analysis, in January 2003.

Pace, R. Kelley, and James LeSage. "Spatial Autoregressive Local Estimation," Spatial Statistics and Spatial Econometrics, edited by Art Getis. Palgrave, 2003.

Pace, R. Kelley, and James LeSage. "Chebyshev Approximation of Log-determinants of Spatial Weight Matrices," forthcoming in Computational Statistics and Data Analysis.

Ripley, Brian D. (1981) Spatial Statistics. New York: John Wiley.

www.spatial-statistics.com



Fifth these procedures provide a wealth of information Many of theseroutines yield the profile likelihood in the autoregressive parameter for eachsubmodel (corresponding to the deletion of individual variables or the spatialterm) All of the inference even for the OLS routine uses likelihood ratiostatistics in the form of signed root deviances This is just the square root oftwice the difference in likelihoods given the sign of the parameter estimateIt has a t-like interpretation (Chen and Jennrich (1996)) The use of signedroot deviances (SRDs) facilitates comparisons among different models

26 Included Examples

The Spatial Statistics Toolbox comes with many examples These are foundin the subdirectories under spatial_toolbox_2examples To run theexamples change the directory in Matlab into the many subdirectories thatillustrate individual routines Look at the documentation in each example di-rectory for more detail Almost all of the specific models have examples Inaddition the simulation routine examples serve as minor Monte Carlo stud-

14 CHAPTER 2 USING THE TOOLBOX

ies which also help verify the functioning of the estimators The examplesuse the 3107 observation dataset from the Pace and Barry (1997) GeographicalAnalysis article

27 Included Datasets

The spatial_toolbox_2datasets subdirectory contains subdirectorieswith individual data sets in Matlab file formats as well as their documentationThe data sets include example programs and output Note due to the manyimprovements incorporated into the Spatial Statistics Toolbox over time therunning times have greatly improved over those in the articles

Hopefully these data sets should provide a good starting point for ex-ploring applications of spatial statistics

28 Included Manuscripts

In the manuscript subdirectory we provide pdf versions of the GeographicalAnalysis 1997 and 2000 articles I would like to thank the publishers (OhioState Press and Elsevier) for having given us copyright permission to dis-tribute these works One can also go to wwwspatial-statisticscom to accesssome other articles (eg the Linear Algebra and its Applications article whichproposed the Monte Carlo log-determinant estimator)

Chapter 3

A Brief Selected Tour of theToolbox

The weight matrix specifies the dependence among observations One formof weight matrix (Delaunay) uses the notion of contiguity to specify depen-dence as depicted in Figure 31

15

16 CHAPTER 3 A BRIEF SELECTED TOUR OF THE TOOLBOX

minus130 minus120 minus110 minus100 minus90 minus80 minus70 minus6025

30

35

40

45

50

Longitude in Decimal Degrees

Latit

utde

in D

ecim

al D

egre

es

Delaunay Nearest Neighbor Graph

Figure 31 Connections among Counties via Delaunay

Note the somewhat strange behavior of connections to outlying observa-tions in Figure 31 This arises to the geometric nature of contiguity Usingnearest neighbors based upon some metric can avoid this as shown in Fig-ure 32

17

minus130 minus120 minus110 minus100 minus90 minus80 minus70 minus6025

30

35

40

45

50

Longitude in Decimal Degrees

Latit

utde

in D

ecim

al D

egre

es

8 Nearest Neighbor Graph

Figure 32 Connections among Counties via Eight Nearest Neighbors

Using only nearby observations implies that the weight matrix has mainzeros or is sparse as shown in Figure 33

18 CHAPTER 3 A BRIEF SELECTED TOUR OF THE TOOLBOX

0 500 1000 1500 2000 2500 3000

0

500

1000

1500

2000

2500

3000

Observation Number

Obs

erva

tion

Num

ber

Original Nonminuszero Weight Pattern for Delaunay

Figure 33 Plot of Non-zeros of Delaunay Weight Matrix

This becomes even more apparent when reordering the observations asin Figure 34

19

0 500 1000 1500 2000 2500 3000

0

500

1000

1500

2000

2500

3000

Observation Number

Obs

erva

tion

Num

ber

Permuted Nonminuszero Weight Pattern for Delaunay

Figure 34 Plot of Non-zeros of Permuted Delaunay Weight Matrix

Sparsity as well as finding an appropriate ordering are key in quicklycomputing the log-determinants used in maximum likelihood The toolboxhas functions for exact computation of the log-determinants (actually inter-polation of exact computations at various points) However users can selectapproximations as well which depend only on sparsity and not upon order-ings The quadratic Chebyshev is the fastest and most approximate technique(Figure 35)

20 CHAPTER 3 A BRIEF SELECTED TOUR OF THE TOOLBOX

minus1 minus08 minus06 minus04 minus02 0 02 04 06 08 1minus4000

minus3500

minus3000

minus2500

minus2000

minus1500

minus1000

minus500

0

α

logminus

dete

rmin

ants

Exact logminusdeterminant Chebyshev approximation Taylor bounds

Taylor Lower minusChebyshev oTaylor Upper +Exact

Figure 35 Plot of Exact log-determinant with Chebyshev Approximationand Taylor Bounds

The Chebyshev approximation appears quite good for positive moderatevalues of the dependence parameter but could use improvement for materi-ally negative values of the spatial dependence parameter Fortunately suchnegative values seem rare in practice

The Monte Carlo log-determinant estimator is quite fast but more accu-rate (Figure 36)

21

minus1 minus08 minus06 minus04 minus02 0 02 04 06 08 1minus500

minus450

minus400

minus350

minus300

minus250

minus200

minus150

minus100

minus50

0

α

logminus

dete

rmin

ants

Exact logminusdeterminant Monte Carlo approximation and confidence limits

Lower ConfidenceMonte CarloUpper ConfidenceExact

Figure 36 Plot of Exact log-determinant with Monte Carlo Approximationand Limits

To see the effects of exact versus approximate log-determinant computa-tions consider Tables 31 and 32 using the 3107 county election data Theestimated autoregressive parameter is only off by 001 from using the approx-imation The approximate method also uses likelihood dominance inferencewhich results in a lower bound to the signed root deviances As shown bythe tables the likelihood dominance SRDs are smaller in magnitude thanthe exact SRDs However they can still document statistical significance formany variables and thus can prove useful in many circumstances

22 CHAPTER 3 A BRIEF SELECTED TOUR OF THE TOOLBOX

Variables Beta Estimates Signed Root Deviances PR of Higher SRDS

Voting Pop -07784 -311227 00000Education 02696 93438 00000Home Ownership 04530 262661 00000Income 00071 00035 09972Intercept 05443 78773 00000Alpha 07250 327719 00000

Table 31 SAR Estimation Results using Chebyshev lndet approximationand likelihood dominance inference

Variables Beta Estimates Signed Root Deviances PR of Higher SRDS

Voting Pop -07806 -321748 00000Education 02746 119438 00000Home Ownership 04525 270073 00000Income 00047 02187 08269Intercept 05528 94908 00000Alpha 07150 348648 00000

Table 32 SAR Estimation results using exact lndet

23

Some of the routines not only yield the maximum of the likelihood func-tion but also profile likelihoods in the dependence parameter α by model asshown in Figure 37

0 01 02 03 04 05 06 07 08 09 1minus7000

minus6800

minus6600

minus6400

minus6200

minus6000

minus5800

minus5600Profile likelihoods vs α for global model and deleteminus1 submodels

Dependence parameter (α)

Pro

file

logminus

likel

ihoo

d

Global likelihoodVoting PopEducationHome OwnershipIncomeIntercept

Figure 37 SAR Profile Likelihoods by Model

The toolbox includes SAR CAR and MESS error models as well asMESS closest neighbor and MIX autoregressive models as shown in Ta-ble 33 and Table 34

24 CHAPTER 3 A BRIEF SELECTED TOUR OF THE TOOLBOX

Variables b OLS b closest b MESS b Mix

Voting Pop -08464 -07489 -07693 -07298Education 05167 02899 01941 01818Home Ownership 04291 04457 04832 04580Income -01439 -00332 00423 00427Lag Voting Pop 00000 01878 04186 04616Lag Education 00000 00975 01205 00450Lag Home Ownership 00000 -01569 -03249 -03299Lag Income 00000 -01079 -01802 -01410Intercept 09814 07495 05636 04205α 00000 03352 14628 06550

Table 33 Estimates on Election Data

The closest neighbor approach is intermediate to a non-spatial approach(OLS) and a full spatial approach (MESS) or the approximate mixed routineNote the close agreement between MESS and the mixed routine Note OLSin this case uses the spatial averages of the basic independent variables asadditional independent variables

None of these operations take long for the election data

25

Variables b OLS b closest b MESS b Mix

Voting Pop -347211 -305020 -288899 -294018Education 308928 134922 76654 77169Home Ownership 230066 254083 268836 273515Income -72373 -15742 17478 18952Lag Voting Pop 00000 77600 107656 125507Lag Education 00000 45090 39785 15705Lag Home Ownership 00000 -94245 -111259 -121200Lag Income 00000 -51399 -55551 -46603Intercept 202782 160533 107480 84626α 00000 242762 315948 337097

Table 34 Signed Root Deviances using Election Data

26 CHAPTER 3 A BRIEF SELECTED TOUR OF THE TOOLBOX

Operation Timings in seconds

OLS 00150Closest AR 00470MESS 00940Exact Log-det 08750Mix 00470Doubly Stochastic Scaling 02660

Table 35 Timing for operations to Election data (n=3107)

In addition to global models the toolbox has spatial autoregressive localestimation (SALE) The user chooses the bandwidth (subsample size) by ex-amining cross-validation error at the fringe observations as in Figure 39 oreach observation as in Figure 38

27

0 50 100 150 200 250 300 350 400 450 5000055

006

0065

007

0075Smoothed SALE Recursive Residuals of Fringe Observations

Number of Local Observations

Abs

olut

e E

rror

Figure 38 Plot of Fringe prediction error versus subsample size

28 CHAPTER 3 A BRIEF SELECTED TOUR OF THE TOOLBOX

0 50 100 150 200 250 300 350 400 450 500005

0052

0054

0056

0058

006

0062

0064Smoothed SALE Initial Holdout Residuals

Number of Local Observations

Abs

olut

e E

rror

Figure 39 Plot of prediction error on center of area versus subsample size

Usually there is spatial dependence even in small subsamples as shownin Figure 310

29

0 50 100 150 200 250 300 350 400 450 5000

01

02

03

04

05

06

07SALE Autoregressive Parameter Estimates

Number of Local Observations

Med

ian

Aut

oreg

ress

ive

Par

amet

er E

stim

ates

Figure 310 Spatial dependence parameter estimate versus subsample size

Local estimation leads to spatially varying parameter estimates such asthose shown in Figure 311

30 CHAPTER 3 A BRIEF SELECTED TOUR OF THE TOOLBOX

minus130 minus120 minus110 minus100 minus90 minus80 minus70 minus6025

30

35

40

45

50

Longitude

Latit

ude

β3

Figure 311 Map of influence of homeownership on voting

In addition a user can obtain an idea of the sensitivity of parameter esti-mates to spatial variation such as summarized in Table 36

31

Percentiles 0 10 25 50 75 90 100

α 03100 04300 05500 06300 06700 07100 07800Voting Pop -10079 -08875 -08075 -06962 -05967 -05299 -04066Education -01871 00517 01142 01788 03135 04923 07118Home Ownership 01530 02964 03419 04253 05161 07660 08461Income -02576 -01644 -01072 -00473 00628 01448 02325Lag Voting Pop -01525 00655 02868 04527 05490 06523 08806Lag Education -05734 -03666 -01812 -00299 00677 01322 03318Lag Home Ownership -07010 -04499 -03278 -02337 -01100 -00382 01880Lag Income -04764 -02374 -01733 -00897 00216 00862 02786Intercept -01605 00473 01463 02847 06612 10772 18902

Table 36 Distribution of local estimates

32 CHAPTER 3 A BRIEF SELECTED TOUR OF THE TOOLBOX

Method Time in seconds

OLS 01560Closest AR 01090MESS 06090Approximate Mix 03280Doubly Stochastic Scaling 71880Delaunay Weight Matrix 54220

Table 37 Times for Different Methods for 57647 Observations

To provide an idea about the performance of the techniques for a largerproblem we estimated a simple hedonic regression over US census tractsThis resulted in 57647 observations Table 37 shows the timings for someof the various operations All of these seem quite fast A user can find aDelaunay weight matrix and estimate a spatial autoregression in under 10seconds on desktop machines

Just selecting a particular weight matrix seems arbitrary Here we take30 nearest neighbors and weight these geometrically A ρ of 1 indicates nodecline in the weight given to further neighbors relative to closer ones whilea ρ of 05 would give half the weight to the second nearest neighbor as itwould to the first nearest neighbor Thus ρ allows changes in the effectivenumber of neighbors used without actually varying the number of neighborsIt often makes sense in this approach to set the number of neighbors to afairly high level (such as 30) Table 38 shows the effect of varying ρ on theprofile log-likelihood Small changes in ρ make large changes in the profilelog-likelihood evidence of the importance of this parameter

It did not take overly long to find the nearest neighbors or the optimalρ even with doubly stochastic scalings of the weight matrix as shown byTable 39

These operations lead to a table of profile log-likelihoods (Table 310)across weight matrices Examining the MESS loglikelihoods over ρ and De-launay and contrasting it with the loglikelihood from appling OLS to the

33

ρ log-likelihood

08000 -231895042608500 -230968848709000 -230828641309500 -231699108910000 -2334345371

Table 38 Likelihoods across ρ for Doubly Stochastic Scaling

Operation Time in seconds

NN computation 314060RS Time to find optimum ρ 567810DS Time to find optimum ρ 948750

Table 39 Times for Optimizing the Likelihood over ρ for Both Scalings

basic non-spatial independent variables demonstrates that even a subopti-mal choice of ρ or Delaunay still dominates the use of an aspatial model inthis case and that optimizing over ρ dominates an arbitrary choice of weightmatrices (Table 310) Moreover the doubly stochastic (DS) scaling helpedgreatly for this example over the regular scaling (RS) In addition inspectionof aspatial OLS versus MESS with an optimal selection of ρ in Table 311shows clear differences among the approaches Note the land area variablebecame insignificant after modeling space

It is not difficult to estimate a spatial autoregression with over one millionobservations In fact the toolbox provides an example (big_one subdirec-tory) under the dataset directory whereby a one million observation spatialautoregression is estimated in just under 20 seconds It took 13063 secondsto find the weight matrix 6024 seconds to simulate the dependent variableand 1942 seconds to estimate the autoregression

Variable                            Value
Aspatial likelihood (OLS)           -266505.1663
Closest Neighbor                    -243986.1676
RS Delaunay maximum likelihood      -244509.8257
RS Maximum likelihood across ρ      -254216.3172
RS Optimum ρ                        0.9000
DS Delaunay maximum likelihood      -235626.6213
DS Maximum likelihood across ρ      -230828.6413
DS Optimum ρ                        0.9000

Table 3.10: Likelihoods Across Doubly Stochastic and Regular Scalings

Variables                 b OLS     SRD OLS     b MESS     SRD MESS
Land area                -0.0850    -96.8455    -0.0008     -0.7159
Pop                       0.1146     36.8592     0.0239     12.7630
Per cap Income            1.0837    208.2192     0.6786    152.7645
Age                      -0.1269    -34.2809    -0.1384    -45.0489
Lag Land area                                   -0.0178    -14.5650
Lag Pop                                          0.0165      5.2437
Lag Per cap Income                              -0.3702    -65.2683
Lag Age                                          0.1088     27.5807
Intercept                 1.2236     22.9837    -0.5988    -15.2122
α                                                3.0986    253.2835
ρ (relative to ρ = 1)                            0.90       72.1927
Nearest Neighbors                                30
parameters                5                      11

Table 3.11: OLS versus MESS Results Using Optimal ρ for Doubly Stochastic Scaling
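The signed root deviance reported for ρ (relative to ρ = 1) appears to follow directly from the profile log-likelihoods in Table 3.8. The sketch below assumes the usual definition of a signed root deviance, the signed square root of twice the gap between the unrestricted and restricted log-likelihoods. The function name is hypothetical and the code is only a worked check of the arithmetic, not the toolbox's routine.

```python
import math


def signed_root_deviance(loglik_full, loglik_restricted, sign=1.0):
    """Signed square root of the deviance 2 * (L_full - L_restricted)."""
    deviance = 2.0 * (loglik_full - loglik_restricted)
    return math.copysign(math.sqrt(max(deviance, 0.0)), sign)


if __name__ == "__main__":
    loglik_rho_090 = -230828.6413   # Table 3.8, rho = 0.9000 (the optimum)
    loglik_rho_100 = -233434.5371   # Table 3.8, rho = 1.0000 (the restriction)
    srd = signed_root_deviance(loglik_rho_090, loglik_rho_100)
    print(f"SRD of rho relative to rho = 1: {srd:.4f}")
```

Under this assumption the result agrees with the 72.1927 shown in Table 3.11 to four decimals.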

Chapter 4

References

If you need to know more about spatial statistics or about some of the specific routines, you may wish to examine:

Anselin, Luc. (1988) Spatial Econometrics: Methods and Models. Dordrecht: Kluwer Academic Publishers.

Barry, Ronald, and R. Kelley Pace. "A Monte Carlo Estimator of the Log Determinant of Large Sparse Matrices," Linear Algebra and its Applications, Volume 289, Number 1-3, 1999, p. 41-54.

Chen, Jian-Shen, and Robert Jennrich. (1996) "The Signed Root Deviance Profile and Confidence Intervals in Maximum Likelihood Analysis," Journal of the American Statistical Association, Volume 91, Number 435, p. 993-998.

Christensen, Ronald. (1991) Linear Models for Multivariate, Time Series, and Spatial Data. New York: Springer-Verlag.

Cressie, Noel A.C. (1993) Statistics for Spatial Data. Revised ed. New York: John Wiley.

Dubin, Robin A. (1988) "Estimation of Regression Coefficients in the Presence of Spatially Autocorrelated Error Terms," Review of Economics and Statistics, 70, 466-474.

Haining, Robert. (1990) Spatial Data Analysis in the Social and Environmental Sciences. Cambridge.

LeSage, James, and R. Kelley Pace. "Spatial Dependence in Data Mining," Data Mining for Scientific and Engineering Applications, Edited by Robert L. Grossman, Chandrika Kamath, Philip Kegelmeyer, Vipin Kumar, and Raju R. Namburu, Kluwer Academic Publishing, 2001.

LeSage, James, and R. Kelley Pace. "Spatial Probit and Tobit," Spatial Statistics and Spatial Econometrics, Edited by Art Getis, Palgrave, 2003.

Li, Bin. (1995) "Implementing Spatial Statistics on Parallel Computers," in: Arlinghaus, S., ed. Practical Handbook of Spatial Statistics (CRC Press, Boca Raton), pp. 107-148.

Ord, J.K. (1975) "Estimation Methods for Models of Spatial Interaction," Journal of the American Statistical Association, 70, 120-126.

Pace, R. Kelley, and Ronald Barry. (1997) "Fast CARs," Journal of Statistical Computation and Simulation, 59, p. 123-147.

Pace, R. Kelley, and Ronald Barry. (1997) "Quick Computation of Regressions with a Spatially Autoregressive Dependent Variable," Geographical Analysis, 29, 232-247.

Pace, R. Kelley, and Dongya Zou. "Closed-Form Maximum Likelihood Estimates of Nearest Neighbor Spatial Dependence," Geographical Analysis, Volume 32, Number 2, April 2000, p. 154-172.

Pace, R. Kelley, Ronald Barry, O.W. Gilley, and C.F. Sirmans. "A Method for Spatial-temporal Forecasting with an Application to Real Estate Prices," International Journal of Forecasting, Volume 16, Number 2, April-June 2000, p. 229-246.

Pace, R. Kelley, and James P. LeSage. "Semiparametric Maximum Likelihood Estimates of Spatial Dependence," Geographical Analysis, Volume 34, Number 1, January 2002, p. 75-90.

Pace, R. Kelley, and James LeSage. "Likelihood Dominance Spatial Inference," forthcoming, Geographical Analysis, in January 2003.

Pace, R. Kelley, and James LeSage. "Spatial Autoregressive Local Estimation," Spatial Statistics and Spatial Econometrics, Edited by Art Getis, Palgrave, 2003.

Pace, R. Kelley, and James LeSage. "Chebyshev Approximation of Log-determinants of Spatial Weight Matrices," forthcoming in Computational Statistics and Data Analysis.

Ripley, Brian D. (1981) Spatial Statistics. New York: John Wiley.

www.spatial-statistics.com


ies which also help verify the functioning of the estimators. The examples use the 3107 observation dataset from the Pace and Barry (1997) Geographical Analysis article.

2.7 Included Datasets

The spatial_toolbox_2datasets subdirectory contains subdirectories with individual data sets in Matlab file formats as well as their documentation. The data sets include example programs and output. Note, due to the many improvements incorporated into the Spatial Statistics Toolbox over time, the running times have greatly improved over those in the articles.

Hopefully, these data sets should provide a good starting point for exploring applications of spatial statistics.

2.8 Included Manuscripts

In the manuscript subdirectory we provide pdf versions of the Geographical Analysis 1997 and 2000 articles. I would like to thank the publishers (Ohio State Press and Elsevier) for having given us copyright permission to distribute these works. One can also go to www.spatial-statistics.com to access some other articles (e.g., the Linear Algebra and its Applications article, which proposed the Monte Carlo log-determinant estimator).

Chapter 3

A Brief Selected Tour of the Toolbox

The weight matrix specifies the dependence among observations. One form of weight matrix (Delaunay) uses the notion of contiguity to specify dependence, as depicted in Figure 3.1.

[Figure: "Delaunay Nearest Neighbor Graph"; x-axis: Longitude in Decimal Degrees; y-axis: Latitude in Decimal Degrees]

Figure 3.1: Connections among Counties via Delaunay

Note the somewhat strange behavior of connections to outlying observations in Figure 3.1. This arises from the geometric nature of contiguity. Using nearest neighbors based upon some metric can avoid this, as shown in Figure 3.2.
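To make the nearest neighbor idea concrete, here is a minimal Python sketch that builds a row-standardized weight matrix from coordinates, storing only the non-zero entries of each row. It assumes plain Euclidean distance on the coordinates and a brute-force search; the function name and the sample coordinates are hypothetical, and the toolbox's own Matlab routines are not reproduced here.

```python
import math


def knn_weight_rows(points, m):
    """Return one dict per observation mapping neighbor index -> weight 1/m."""
    rows = []
    for i, (xi, yi) in enumerate(points):
        # Distances from observation i to every other observation.
        dists = [(math.hypot(xi - xj, yi - yj), j)
                 for j, (xj, yj) in enumerate(points) if j != i]
        nearest = sorted(dists)[:m]          # the m closest observations
        rows.append({j: 1.0 / m for _, j in nearest})
    return rows


if __name__ == "__main__":
    # Hypothetical (longitude, latitude) pairs, for illustration only.
    pts = [(-91.1, 30.4), (-90.1, 30.0), (-93.7, 32.5),
           (-92.0, 30.2), (-89.9, 35.1)]
    for i, row in enumerate(knn_weight_rows(pts, m=2)):
        print(i, sorted(row))
```

Every observation receives exactly m neighbors regardless of how isolated it is, although the resulting matrix is generally not symmetric.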

[Figure: "8 Nearest Neighbor Graph"; x-axis: Longitude in Decimal Degrees; y-axis: Latitude in Decimal Degrees]

Figure 3.2: Connections among Counties via Eight Nearest Neighbors

Using only nearby observations implies that the weight matrix has mainly zeros, or is sparse, as shown in Figure 3.3.
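A quick calculation shows how sparse such a matrix is. The sketch below assumes the 3107 county observations with eight neighbors each, as in Figure 3.2; the function name is hypothetical.

```python
def weight_matrix_density(n_obs, neighbors_per_row):
    """Fraction of the n-by-n weight matrix entries that are non-zero."""
    nonzeros = n_obs * neighbors_per_row
    return nonzeros / (n_obs * n_obs)


if __name__ == "__main__":
    n, m = 3107, 8
    print("non-zero entries:", n * m)
    print("all entries:     ", n * n)
    print(f"density:          {weight_matrix_density(n, m):.4%}")
```

Only the non-zero entries need to be stored and multiplied, which is what makes computations on large samples feasible.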

[Figure: "Original Non-zero Weight Pattern for Delaunay"; x-axis: Observation Number; y-axis: Observation Number]

Figure 3.3: Plot of Non-zeros of Delaunay Weight Matrix

This becomes even more apparent when reordering the observations, as in Figure 3.4.

[Figure: "Permuted Non-zero Weight Pattern for Delaunay"; x-axis: Observation Number; y-axis: Observation Number]

Figure 3.4: Plot of Non-zeros of Permuted Delaunay Weight Matrix

Sparsity, as well as finding an appropriate ordering, is key in quickly computing the log-determinants used in maximum likelihood. The toolbox has functions for exact computation of the log-determinants (actually, interpolation of exact computations at various points). However, users can select approximations as well, which depend only on sparsity and not upon orderings. The quadratic Chebyshev is the fastest and most approximate technique (Figure 3.5).
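To give a feel for what approximating a log-determinant involves, the Python sketch below compares an exact value of ln|I - αW| with a truncated Taylor series, -Σ α^k tr(W^k)/k, which converges when the eigenvalues of αW lie inside the unit circle. This is a plain truncated series on a small dense matrix, used here as a stand-in; it is not the toolbox's Chebyshev routine or its Taylor bounds, and the ring-shaped weight matrix and all function names are assumptions for the example.

```python
import math


def matmul(a, b):
    n = len(a)
    return [[sum(a[i][k] * b[k][j] for k in range(n)) for j in range(n)]
            for i in range(n)]


def logdet(a):
    """Log of |det(a)| by Gaussian elimination with partial pivoting."""
    a = [row[:] for row in a]
    n = len(a)
    total = 0.0
    for c in range(n):
        p = max(range(c, n), key=lambda r: abs(a[r][c]))
        a[c], a[p] = a[p], a[c]
        total += math.log(abs(a[c][c]))
        for r in range(c + 1, n):
            f = a[r][c] / a[c][c]
            for j in range(c, n):
                a[r][j] -= f * a[c][j]
    return total


def taylor_logdet(w, alpha, order):
    """Truncated series: -sum_{k=1..order} alpha^k * trace(W^k) / k."""
    n = len(w)
    power = [row[:] for row in w]          # holds W^k
    total = 0.0
    for k in range(1, order + 1):
        trace = sum(power[i][i] for i in range(n))
        total -= alpha ** k * trace / k
        power = matmul(power, w)
    return total


def ring_weights(n):
    """Row-stochastic ring: each observation's two neighbors get weight 1/2."""
    w = [[0.0] * n for _ in range(n)]
    for i in range(n):
        w[i][(i - 1) % n] = 0.5
        w[i][(i + 1) % n] = 0.5
    return w


def i_minus_alpha_w(w, alpha):
    n = len(w)
    return [[(1.0 if i == j else 0.0) - alpha * w[i][j] for j in range(n)]
            for i in range(n)]


if __name__ == "__main__":
    w, alpha = ring_weights(10), 0.5
    print("exact", logdet(i_minus_alpha_w(w, alpha)))
    for order in (2, 4, 8, 16):
        print("order", order, taylor_logdet(w, alpha, order))
```

Low orders are cheap because they need only a few traces of powers of W, at the cost of accuracy as the dependence parameter moves away from zero.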

[Figure: "Exact log-determinant, Chebyshev approximation, Taylor bounds"; x-axis: α; y-axis: log-determinants; legend: Taylor Lower, Chebyshev, Taylor Upper, Exact]

Figure 3.5: Plot of Exact log-determinant with Chebyshev Approximation and Taylor Bounds

The Chebyshev approximation appears quite good for positive, moderate values of the dependence parameter, but could use improvement for materially negative values of the spatial dependence parameter. Fortunately, such negative values seem rare in practice.

The Monte Carlo log-determinant estimator is also quite fast, but more accurate (Figure 3.6).
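The idea behind a Monte Carlo log-determinant estimator can be sketched as follows, in the spirit of the Barry and Pace article listed in the references. For a random normal vector x, the ratio x'W^k x / x'x estimates tr(W^k)/n, so the series for ln|I - αW| can be estimated with sparse matrix-vector products alone. This is a simplified Python illustration with hypothetical function names and a ring-shaped weight matrix; it omits the confidence limits shown in Figure 3.6 and is not the toolbox's implementation.

```python
import math
import random


def sparse_matvec(rows, x):
    """Multiply a sparse matrix (list of {column: value} dicts) by a vector."""
    return [sum(v * x[j] for j, v in row.items()) for row in rows]


def mc_logdet(rows, alpha, order=20, draws=100, seed=0):
    """Average over random draws of -n * sum_k alpha^k/k * (x'W^k x)/(x'x)."""
    rng = random.Random(seed)
    n = len(rows)
    estimates = []
    for _ in range(draws):
        x = [rng.gauss(0.0, 1.0) for _ in range(n)]
        xx = sum(v * v for v in x)
        wkx = x[:]                              # holds W^k x
        total = 0.0
        for k in range(1, order + 1):
            wkx = sparse_matvec(rows, wkx)
            quad = sum(a * b for a, b in zip(x, wkx))
            total -= alpha ** k / k * quad / xx
        estimates.append(n * total)
    return sum(estimates) / draws


def ring_rows(n):
    """Sparse rows of a ring where each neighbor gets weight 1/2."""
    return [{(i - 1) % n: 0.5, (i + 1) % n: 0.5} for i in range(n)]


def ring_exact_logdet(n, alpha):
    """Exact value for the ring, whose eigenvalues are cos(2*pi*j/n)."""
    return sum(math.log(1.0 - alpha * math.cos(2.0 * math.pi * j / n))
               for j in range(n))


if __name__ == "__main__":
    n, alpha = 200, 0.5
    print("Monte Carlo:", mc_logdet(ring_rows(n), alpha))
    print("exact:      ", ring_exact_logdet(n, alpha))
```

Each draw costs only a handful of sparse multiplications, so the work grows with the number of non-zero weights rather than with the square of the sample size.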

[Figure: "Exact log-determinant, Monte Carlo approximation, and confidence limits"; x-axis: α; y-axis: log-determinants; legend: Lower Confidence, Monte Carlo, Upper Confidence, Exact]

Figure 3.6: Plot of Exact log-determinant with Monte Carlo Approximation and Limits

To see the effects of exact versus approximate log-determinant computations, consider Tables 3.1 and 3.2 using the 3107 county election data. The estimated autoregressive parameter is only off by 0.01 from using the approximation. The approximate method also uses likelihood dominance inference, which results in a lower bound to the signed root deviances. As shown by the tables, the likelihood dominance SRDs are smaller in magnitude than the exact SRDs. However, they can still document statistical significance for many variables and thus can prove useful in many circumstances.

Variables         Beta Estimates   Signed Root Deviances   Pr of Higher SRDs
Voting Pop        -0.7784          -31.1227                0.0000
Education          0.2696            9.3438                0.0000
Home Ownership     0.4530           26.2661                0.0000
Income             0.0071            0.0035                0.9972
Intercept          0.5443            7.8773                0.0000
Alpha              0.7250           32.7719                0.0000

Table 3.1: SAR Estimation Results using Chebyshev lndet approximation and likelihood dominance inference

Variables         Beta Estimates   Signed Root Deviances   Pr of Higher SRDs
Voting Pop        -0.7806          -32.1748                0.0000
Education          0.2746           11.9438                0.0000
Home Ownership     0.4525           27.0073                0.0000
Income             0.0047            0.2187                0.8269
Intercept          0.5528            9.4908                0.0000
Alpha              0.7150           34.8648                0.0000

Table 3.2: SAR Estimation results using exact lndet
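The last column of Tables 3.1 and 3.2 appears consistent with treating each signed root deviance as a standard normal variate and reporting the two-sided tail probability. That reading is inferred from the tabulated values rather than from the toolbox code, and the function name below is hypothetical. A short Python check:

```python
import math


def prob_higher_srd(srd):
    """Two-sided standard normal tail probability, P(|Z| > |srd|)."""
    return math.erfc(abs(srd) / math.sqrt(2.0))


if __name__ == "__main__":
    # Income rows of Table 3.1 (approximate lndet) and Table 3.2 (exact lndet).
    for label, srd in (("Chebyshev", 0.0035), ("exact", 0.2187)):
        print(f"{label}: SRD={srd:.4f}  Pr={prob_higher_srd(srd):.4f}")
```

Under this assumption the Income probabilities come out close to the tabulated 0.9972 and 0.8269, while the large SRDs of the other variables give probabilities that round to 0.0000.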

Some of the routines not only yield the maximum of the likelihood function, but also profile likelihoods in the dependence parameter α by model, as shown in Figure 3.7.

[Figure: "Profile likelihoods vs α for global model and delete-1 submodels"; x-axis: Dependence parameter (α); y-axis: Profile log-likelihood; legend: Global likelihood, Voting Pop, Education, Home Ownership, Income, Intercept]

Figure 3.7: SAR Profile Likelihoods by Model

The toolbox includes SAR, CAR, and MESS error models as well as MESS, closest neighbor, and MIX autoregressive models, as shown in Table 3.3 and Table 3.4.

Variables             b OLS     b closest   b MESS    b Mix
Voting Pop           -0.8464   -0.7489     -0.7693   -0.7298
Education             0.5167    0.2899      0.1941    0.1818
Home Ownership        0.4291    0.4457      0.4832    0.4580
Income               -0.1439   -0.0332      0.0423    0.0427
Lag Voting Pop        0.0000    0.1878      0.4186    0.4616
Lag Education         0.0000    0.0975      0.1205    0.0450
Lag Home Ownership    0.0000   -0.1569     -0.3249   -0.3299
Lag Income            0.0000   -0.1079     -0.1802   -0.1410
Intercept             0.9814    0.7495      0.5636    0.4205
α                     0.0000    0.3352      1.4628    0.6550

Table 3.3: Estimates on Election Data

The closest neighbor approach is intermediate to a non-spatial approach (OLS) and a full spatial approach (MESS) or the approximate mixed routine. Note the close agreement between MESS and the mixed routine. Note, OLS in this case uses the spatial averages of the basic independent variables as additional independent variables.

None of these operations take long for the election data.

Variables             SRD OLS    SRD closest   SRD MESS   SRD Mix
Voting Pop           -34.7211   -30.5020      -28.8899   -29.4018
Education             30.8928    13.4922        7.6654     7.7169
Home Ownership        23.0066    25.4083       26.8836    27.3515
Income                -7.2373    -1.5742        1.7478     1.8952
Lag Voting Pop         0.0000     7.7600       10.7656    12.5507
Lag Education          0.0000     4.5090        3.9785     1.5705
Lag Home Ownership     0.0000    -9.4245      -11.1259   -12.1200
Lag Income             0.0000    -5.1399       -5.5551    -4.6603
Intercept             20.2782    16.0533       10.7480     8.4626
α                      0.0000    24.2762       31.5948    33.7097

Table 3.4: Signed Root Deviances using Election Data

Operation                    Timings in seconds
OLS                          0.0150
Closest AR                   0.0470
MESS                         0.0940
Exact Log-det                0.8750
Mix                          0.0470
Doubly Stochastic Scaling    0.2660

Table 3.5: Timing for operations on Election data (n=3107)

In addition to global models, the toolbox has spatial autoregressive local estimation (SALE). The user chooses the bandwidth (subsample size) by examining cross-validation error at the fringe observations, as in Figure 3.8, or at each observation, as in Figure 3.9.

[Figure: "Smoothed SALE Recursive Residuals of Fringe Observations"; x-axis: Number of Local Observations; y-axis: Absolute Error]

Figure 3.8: Plot of Fringe prediction error versus subsample size

[Figure: "Smoothed SALE Initial Holdout Residuals"; x-axis: Number of Local Observations; y-axis: Absolute Error]

Figure 3.9: Plot of prediction error on center of area versus subsample size

Usually, there is spatial dependence even in small subsamples, as shown in Figure 3.10.

[Figure: "SALE Autoregressive Parameter Estimates"; x-axis: Number of Local Observations; y-axis: Median Autoregressive Parameter Estimates]

Figure 3.10: Spatial dependence parameter estimate versus subsample size

Local estimation leads to spatially varying parameter estimates, such as those shown in Figure 3.11.

[Figure: map of β3; x-axis: Longitude; y-axis: Latitude]

Figure 3.11: Map of influence of homeownership on voting

In addition, a user can obtain an idea of the sensitivity of parameter estimates to spatial variation, such as summarized in Table 3.6.

Percentiles            0         10        25        50        75        90        100
α                      0.3100    0.4300    0.5500    0.6300    0.6700    0.7100    0.7800
Voting Pop            -1.0079   -0.8875   -0.8075   -0.6962   -0.5967   -0.5299   -0.4066
Education             -0.1871    0.0517    0.1142    0.1788    0.3135    0.4923    0.7118
Home Ownership         0.1530    0.2964    0.3419    0.4253    0.5161    0.7660    0.8461
Income                -0.2576   -0.1644   -0.1072   -0.0473    0.0628    0.1448    0.2325
Lag Voting Pop        -0.1525    0.0655    0.2868    0.4527    0.5490    0.6523    0.8806
Lag Education         -0.5734   -0.3666   -0.1812   -0.0299    0.0677    0.1322    0.3318
Lag Home Ownership    -0.7010   -0.4499   -0.3278   -0.2337   -0.1100   -0.0382    0.1880
Lag Income            -0.4764   -0.2374   -0.1733   -0.0897    0.0216    0.0862    0.2786
Intercept             -0.1605    0.0473    0.1463    0.2847    0.6612    1.0772    1.8902

Table 3.6: Distribution of local estimates

Method                       Time in seconds
OLS                          0.1560
Closest AR                   0.1090
MESS                         0.6090
Approximate Mix              0.3280
Doubly Stochastic Scaling    7.1880
Delaunay Weight Matrix       5.4220

Table 3.7: Times for Different Methods for 57647 Observations

To provide an idea about the performance of the techniques for a larger problem, we estimated a simple hedonic regression over US census tracts. This resulted in 57647 observations. Table 3.7 shows the timings for some of the various operations. All of these seem quite fast. A user can find a Delaunay weight matrix and estimate a spatial autoregression in under 10 seconds on desktop machines.




Table 310 Likelihoods Across Doubly Stochastic and Regular Scalings

Variables OLS OLS SRD b MESS SRDS MESS

Land area -00850 -968455 -00008 -07159Pop 01146 368592 00239 127630Per cap Income 10837 2082192 06786 1527645Age -01269 -342809 -01384 -450489Lag Land area -00178 -145650Lag Pop 00165 52437Per cap Income -03702 -652683Lag Age 01088 275807Intercept 12236 229837 -05988 -152122α 30986 2532835ρ (relative to ρ = 1) 090 721927Nearest Neighbors 30parameters 5 11

Table 311 OLS versus MESS Results Using Optimal ρ for Doubly Stochas-tic Scaling

Chapter 4

References

If you need to know more about spatial statistics or about some of the specificroutines you may wish to examine

Anselin Luc (1988) Spatial Econometrics Methods and Models DordrechtKluwer Academic Publishers

Barry Ronald and R Kelley Pace ldquoA Monte Carlo Estimator of the LogDeterminant of Large Sparse Matricesrdquo Linear Algebra and its ApplicationsVolume 289 Number 1-3 1999 p 41-54

Chen Jian-Shen and Robert Jennrich (1996) ldquoThe Signed Root DevianceProfile and Confidence Intervals in Maximum Likelihood AnalysisrdquoJournal of the American Statistical Association Volume 91 Number 435 p993-998

Christensen Ronald (1991) Linear Models for Multivariate Time Series andSpatial Data New York Springer-Verlag

Cressie Noel AC (1993) Statistics for Spatial Data Revised ed New YorkJohn Wiley

Dubin Robin A (1988) ldquoEstimation of Regression Coefficients in the Pres-ence of Spatially Autocorrelated Error Termsrdquo Review of Economics andStatistics 70 466-474

Haining Robert (1990) Spatial Data Analysis in the Social and EnvironmentalSciences Cambridge

35

36 CHAPTER 4 REFERENCES

LeSage James and R Kelley Pace ldquoSpatial Dependence in Data MiningData Mining for Scientific and Engineering Applications Edited by Robert LGrossman Chandrika Kamath Philip Kegelmeyer Vipin Kumar andRaju R Namburu Kluwer Academic Publishing 2001

LeSage James and R Kelley Pace ldquoSpatial Probit and Tobit Spatial Statisticsand Spatial Econometrics Edited by Art Getis Palgrave 2003

Li Bin (1995) ldquoImplementing Spatial Statistics on Parallel Computersrdquo inArlinghaus S ed Practical Handbook of Spatial Statistics (CRC PressBoca Raton) pp 107-148

Ord JK (1975) ldquoEstimation Methods for Models of Spatial InteractionrdquoJournal of the American Statistical Association 70 120-126

Pace R Kelley and Ronald Barry (1997) ldquoFast CARsrdquo Journal of StatisticalComputation and Simulation 59 p 123-147

Pace R Kelley and Ronald Barry (1997) ldquoQuick Computation of Regres-sions with a Spatially Autoregressive Dependent Variablerdquo GeographicalAnalysis 29 232-247

Pace R Kelley and Dongya Zou ldquoClosed-Form Maximum Likelihood Es-timates of Nearest Neighbor Spatial Dependence Geographical AnalysisVolume 32 Number 2 April 2000 p 154-172

Pace R Kelley and Ronald Barry OW Gilley CF Sirmans ldquoA Method forSpatial-temporal Forecasting with an Application to Real Estate PricesInternational Journal of Forecasting Volume 16 Number 2 April-June 2000p 229-246

Pace R Kelley and James P LeSage Semiparametric Maximum Likeli-hood Estimates of Spatial Dependence Geographical Analysis Vol-ume 34 Number 1 January 2002 p 75-90

Pace R Kelley and James LeSage ldquoLikelihood Dominance Spatial Infer-ence forthcoming Geographical Analysis in January 2003

Pace R Kelley and James LeSage ldquoSpatial Autoregressive Local Estima-tion Spatial Statistics and Spatial Econometrics Edited by Art Getis Pal-grave 2003

Pace R Kelley and James LeSage ldquoChebyshev Approximation of Log-determinants of Spatial Weight Matrices forthcoming in ComputationalStatistics and Data Analysis

Ripley Brian D (1981) Spatial Statistics New York John Wiley

wwwspatial-statisticscom

  • Why the Toolbox Exists
  • Using the Toolbox
    • Hardware and Software Requirements
    • Installation
    • Help and documentation
    • Known Limitations
    • Tips on Using the Toolbox
    • Included Examples
    • Included Datasets
    • Included Manuscripts
      • A Brief Selected Tour of the Toolbox
      • References
Page 16: Spatial Statistics Toolbox for Matlab 2.0 - Documentation Only (pdf)

16 CHAPTER 3 A BRIEF SELECTED TOUR OF THE TOOLBOX

Figure 3.1: Connections among Counties via Delaunay (plot title: Delaunay Nearest Neighbor Graph; axes: longitude and latitude in decimal degrees)

Note the somewhat strange behavior of connections to outlying observations in Figure 3.1. This arises from the geometric nature of contiguity. Using nearest neighbors based upon some metric can avoid this, as shown in Figure 3.2.

Figure 3.2: Connections among Counties via Eight Nearest Neighbors (plot title: 8 Nearest Neighbor Graph; axes: longitude and latitude in decimal degrees)

Using only nearby observations implies that the weight matrix has mainly zeros, or is sparse, as shown in Figure 3.3.

Figure 3.3: Plot of Non-zeros of Delaunay Weight Matrix (plot title: Original Non-zero Weight Pattern for Delaunay; both axes: observation number)

This becomes even more apparent when reordering the observations, as in Figure 3.4.

Figure 3.4: Plot of Non-zeros of Permuted Delaunay Weight Matrix (plot title: Permuted Non-zero Weight Pattern for Delaunay; both axes: observation number)

Sparsity, as well as finding an appropriate ordering, is key to quickly computing the log-determinants used in maximum likelihood. The toolbox has functions for exact computation of the log-determinants (actually, interpolation of exact computations at various points). However, users can select approximations as well, which depend only on sparsity and not upon orderings. The quadratic Chebyshev is the fastest and most approximate technique (Figure 3.5).
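As a rough illustration of the trade-off between exact and approximate log-determinants, the sketch below compares ln|I - αD| computed by Gaussian elimination with a truncated Taylor series, -Σ α^k tr(D^k)/k. It is a toy dense example in plain Python rather than the toolbox's MATLAB code. The ring-shaped weight matrix, the function names, and the truncation order are assumptions made for illustration, and the Taylor series stands in for (and is not the same as) the Chebyshev approximation.

```python
import math

def ring_weights(n):
    """Toy row-stochastic weight matrix: two neighbors per observation on a ring."""
    D = [[0.0] * n for _ in range(n)]
    for i in range(n):
        D[i][(i - 1) % n] = 0.5
        D[i][(i + 1) % n] = 0.5
    return D

def exact_logdet(D, alpha):
    """ln|I - alpha*D| via Gaussian elimination with partial pivoting."""
    n = len(D)
    A = [[(1.0 if i == j else 0.0) - alpha * D[i][j] for j in range(n)]
         for i in range(n)]
    logdet = 0.0
    for k in range(n):
        p = max(range(k, n), key=lambda r: abs(A[r][k]))
        A[k], A[p] = A[p], A[k]
        pivot = A[k][k]
        # For |alpha| < 1 and this D the determinant is positive,
        # so summing log|pivot| gives the log-determinant.
        logdet += math.log(abs(pivot))
        for r in range(k + 1, n):
            factor = A[r][k] / pivot
            for c in range(k, n):
                A[r][c] -= factor * A[k][c]
    return logdet

def taylor_logdet(D, alpha, order):
    """Truncated series: -sum_{k=1..order} alpha**k * trace(D**k) / k."""
    n = len(D)
    power = [row[:] for row in D]  # holds D**k, starting at k = 1
    total = 0.0
    for k in range(1, order + 1):
        trace = sum(power[i][i] for i in range(n))
        total -= alpha ** k * trace / k
        power = [[sum(power[i][m] * D[m][j] for m in range(n)) for j in range(n)]
                 for i in range(n)]
    return total

D = ring_weights(8)
for alpha in (0.2, 0.5, 0.8):
    print(alpha, exact_logdet(D, alpha), taylor_logdet(D, alpha, 30))
```

The dense elimination costs on the order of n^3 operations, which is why sparsity and ordering matter for the exact route, while the series only needs traces of powers of D.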

Figure 3.5: Plot of Exact log-determinant with Chebyshev Approximation and Taylor Bounds (horizontal axis: α; vertical axis: log-determinants; series: Taylor lower bound, Chebyshev, Taylor upper bound, exact)

The Chebyshev approximation appears quite good for moderate positive values of the dependence parameter, but could use improvement for materially negative values of the spatial dependence parameter. Fortunately, such negative values seem rare in practice.

The Monte Carlo log-determinant estimator is still quite fast but more accurate than the Chebyshev approximation (Figure 3.6).
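The idea behind a Monte Carlo log-determinant estimator can be sketched as follows, in the spirit of Barry and Pace (1999) from the reference list: for random normal vectors x, the average of -n Σ (x'D^k x / x'x) α^k / k estimates ln|I - αD| using only sparse matrix-vector products. This is a minimal plain-Python sketch, not the toolbox routine. The ring neighbor structure, the function names, the number of draws, and the way the confidence limits are formed (a normal approximation from the draw-to-draw variance) are all assumptions.

```python
import math
import random

def ring_neighbors(n):
    """Each observation's two neighbors on a ring (a toy sparse structure)."""
    return [((i - 1) % n, (i + 1) % n) for i in range(n)]

def apply_weights(neighbors, x):
    """Sparse product D @ x, where each row averages its two ring neighbors."""
    return [0.5 * (x[a] + x[b]) for a, b in neighbors]

def monte_carlo_logdet(neighbors, alpha, order, draws, rng):
    """Return (estimate, half-width of an approximate 95% interval)."""
    n = len(neighbors)
    estimates = []
    for _ in range(draws):
        x = [rng.gauss(0.0, 1.0) for _ in range(n)]
        xx = sum(v * v for v in x)
        v = x
        total = 0.0
        for k in range(1, order + 1):
            v = apply_weights(neighbors, v)  # v now holds D**k @ x
            total += alpha ** k / k * sum(a * b for a, b in zip(x, v)) / xx
        estimates.append(-n * total)
    mean = sum(estimates) / draws
    var = sum((e - mean) ** 2 for e in estimates) / (draws - 1)
    return mean, 1.96 * math.sqrt(var / draws)

def exact_ring_logdet(n, alpha):
    """Closed form for the ring: the eigenvalues of D are cos(2*pi*j/n)."""
    return sum(math.log(1.0 - alpha * math.cos(2.0 * math.pi * j / n))
               for j in range(n))

rng = random.Random(0)  # fixed seed so the run is repeatable
n, alpha = 20, 0.5
estimate, half_width = monte_carlo_logdet(ring_neighbors(n), alpha,
                                          order=20, draws=1000, rng=rng)
exact = exact_ring_logdet(n, alpha)
print(estimate, half_width, exact)
```

Because each draw touches only the non-zero weights, the cost grows with the number of neighbors rather than with n squared.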

Figure 3.6: Plot of Exact log-determinant with Monte Carlo Approximation and Limits (horizontal axis: α; vertical axis: log-determinants; series: lower confidence limit, Monte Carlo, upper confidence limit, exact)

To see the effects of exact versus approximate log-determinant computations, consider Tables 3.1 and 3.2, which use the 3,107 county election data. The estimated autoregressive parameter is only off by 0.01 when using the approximation. The approximate method also uses likelihood dominance inference, which results in a lower bound on the signed root deviances (SRDs). As shown by the tables, the likelihood dominance SRDs are smaller in magnitude than the exact SRDs. However, they can still document statistical significance for many variables, and thus can prove useful in many circumstances.
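For readers unfamiliar with signed root deviances, the following sketch shows one common way to compute them and the associated probability. It assumes the SRD is the square root of twice the log-likelihood difference between the full and restricted (delete-1) models, signed by the coefficient estimate, and that the probability column in Tables 3.1 and 3.2 is a two-sided standard normal tail probability. The latter is an inference from the reported values (an SRD of 0.2187 pairs with 0.8269), not something the text states. The function names and the example log-likelihoods are hypothetical.

```python
import math

def signed_root_deviance(loglik_full, loglik_restricted, estimate):
    """sign(estimate) * sqrt(2 * (L_full - L_restricted))."""
    deviance = 2.0 * (loglik_full - loglik_restricted)
    return math.copysign(math.sqrt(max(deviance, 0.0)), estimate)

def srd_p_value(srd):
    """Two-sided standard normal tail probability for a signed root deviance."""
    return math.erfc(abs(srd) / math.sqrt(2.0))

# Hypothetical log-likelihoods for a model with and without one variable.
print(signed_root_deviance(-100.0, -102.0, -0.5))

# SRDs for Income as reported in Tables 3.2 and 3.1.
print(round(srd_p_value(0.2187), 4))
print(round(srd_p_value(0.0035), 4))
```

Under these assumptions, a lower bound on the magnitude of the SRD gives an upper bound on the tail probability, which is why the likelihood dominance SRDs err on the conservative side.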

Variables          Beta Estimates   Signed Root Deviances   Pr. of Higher SRDs
Voting Pop             -0.7784            -31.1227                0.0000
Education               0.2696              9.3438                0.0000
Home Ownership          0.4530             26.2661                0.0000
Income                  0.0071              0.0035                0.9972
Intercept               0.5443              7.8773                0.0000
Alpha                   0.7250             32.7719                0.0000

Table 3.1: SAR Estimation Results using Chebyshev lndet approximation and likelihood dominance inference

Variables          Beta Estimates   Signed Root Deviances   Pr. of Higher SRDs
Voting Pop             -0.7806            -32.1748                0.0000
Education               0.2746             11.9438                0.0000
Home Ownership          0.4525             27.0073                0.0000
Income                  0.0047              0.2187                0.8269
Intercept               0.5528              9.4908                0.0000
Alpha                   0.7150             34.8648                0.0000

Table 3.2: SAR Estimation results using exact lndet

Some of the routines not only yield the maximum of the likelihood function, but also profile likelihoods in the dependence parameter α by model, as shown in Figure 3.7.

Figure 3.7: SAR Profile Likelihoods by Model (plot title: Profile likelihoods vs α for global model and delete-1 submodels; horizontal axis: dependence parameter α; vertical axis: profile log-likelihood; series: global likelihood, Voting Pop, Education, Home Ownership, Income, Intercept)

The toolbox includes SAR, CAR, and MESS error models as well as MESS, closest neighbor, and MIX autoregressive models, as shown in Table 3.3 and Table 3.4.

Variables             b OLS    b closest    b MESS     b Mix
Voting Pop           -0.8464    -0.7489    -0.7693    -0.7298
Education             0.5167     0.2899     0.1941     0.1818
Home Ownership        0.4291     0.4457     0.4832     0.4580
Income               -0.1439    -0.0332     0.0423     0.0427
Lag Voting Pop        0.0000     0.1878     0.4186     0.4616
Lag Education         0.0000     0.0975     0.1205     0.0450
Lag Home Ownership    0.0000    -0.1569    -0.3249    -0.3299
Lag Income            0.0000    -0.1079    -0.1802    -0.1410
Intercept             0.9814     0.7495     0.5636     0.4205
α                     0.0000     0.3352     1.4628     0.6550

Table 3.3: Estimates on Election Data

The closest neighbor approach is intermediate between a non-spatial approach (OLS) and a full spatial approach (MESS) or the approximate mixed routine. Note the close agreement between MESS and the mixed routine. Note that OLS in this case uses the spatial averages of the basic independent variables as additional independent variables.

None of these operations takes long for the election data.

Variables             b OLS    b closest    b MESS     b Mix
Voting Pop          -34.7211   -30.5020   -28.8899   -29.4018
Education            30.8928    13.4922     7.6654     7.7169
Home Ownership       23.0066    25.4083    26.8836    27.3515
Income               -7.2373    -1.5742     1.7478     1.8952
Lag Voting Pop        0.0000     7.7600    10.7656    12.5507
Lag Education         0.0000     4.5090     3.9785     1.5705
Lag Home Ownership    0.0000    -9.4245   -11.1259   -12.1200
Lag Income            0.0000    -5.1399    -5.5551    -4.6603
Intercept            20.2782    16.0533    10.7480     8.4626
α                     0.0000    24.2762    31.5948    33.7097

Table 3.4: Signed Root Deviances using Election Data

Operation                    Timings in seconds
OLS                               0.0150
Closest AR                        0.0470
MESS                              0.0940
Exact Log-det                     0.8750
Mix                               0.0470
Doubly Stochastic Scaling         0.2660

Table 3.5: Timing for operations on Election data (n=3107)

In addition to global models, the toolbox has spatial autoregressive local estimation (SALE). The user chooses the bandwidth (subsample size) by examining cross-validation error at the fringe observations, as in Figure 3.8, or at each observation, as in Figure 3.9.
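The bandwidth choice described above can be sketched as a search over candidate subsample sizes, keeping the one with the smallest holdout prediction error. To stay short, the sketch below swaps the local spatial autoregression for a plain local mean of the m nearest neighbors and uses one-dimensional locations, so it shows only the selection logic and not SALE itself. The data, function names, and candidate sizes are hypothetical.

```python
def holdout_error(xs, ys, m):
    """Mean absolute error when each held-out point is predicted from the
    average of its m nearest neighbors (a stand-in for a local model)."""
    total = 0.0
    for i, (xi, yi) in enumerate(zip(xs, ys)):
        others = sorted((abs(xj - xi), j) for j, xj in enumerate(xs) if j != i)
        nearest = [j for _, j in others[:m]]
        prediction = sum(ys[j] for j in nearest) / m
        total += abs(yi - prediction)
    return total / len(xs)

def choose_bandwidth(xs, ys, candidates):
    """Return the candidate subsample size with the smallest holdout error."""
    errors = {m: holdout_error(xs, ys, m) for m in candidates}
    best = min(candidates, key=lambda m: errors[m])
    return best, errors

# Hypothetical noise-free trend on a line of 50 locations.
xs = [float(i) for i in range(50)]
ys = [2.0 * x for x in xs]
best, errors = choose_bandwidth(xs, ys, [2, 4, 8])
print(best, errors)
```

With noisy data the error curve typically falls and then rises as the subsample grows, which is the shape one looks for when reading plots of error against subsample size.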

Figure 3.8: Plot of Fringe prediction error versus subsample size (plot title: Smoothed SALE Recursive Residuals of Fringe Observations; horizontal axis: number of local observations; vertical axis: absolute error)

Figure 3.9: Plot of prediction error on center of area versus subsample size (plot title: Smoothed SALE Initial Holdout Residuals; horizontal axis: number of local observations; vertical axis: absolute error)

Usually there is spatial dependence even in small subsamples, as shown in Figure 3.10.

Figure 3.10: Spatial dependence parameter estimate versus subsample size (plot title: SALE Autoregressive Parameter Estimates; horizontal axis: number of local observations; vertical axis: median autoregressive parameter estimates)

Local estimation leads to spatially varying parameter estimates, such as those shown in Figure 3.11.

Figure 3.11: Map of influence of homeownership on voting (axes: longitude and latitude; mapped quantity: β3)

In addition, a user can obtain an idea of the sensitivity of parameter estimates to spatial variation, as summarized in Table 3.6.

Percentiles              0        10        25        50        75        90       100
α                     0.3100    0.4300    0.5500    0.6300    0.6700    0.7100    0.7800
Voting Pop           -1.0079   -0.8875   -0.8075   -0.6962   -0.5967   -0.5299   -0.4066
Education            -0.1871    0.0517    0.1142    0.1788    0.3135    0.4923    0.7118
Home Ownership        0.1530    0.2964    0.3419    0.4253    0.5161    0.7660    0.8461
Income               -0.2576   -0.1644   -0.1072   -0.0473    0.0628    0.1448    0.2325
Lag Voting Pop       -0.1525    0.0655    0.2868    0.4527    0.5490    0.6523    0.8806
Lag Education        -0.5734   -0.3666   -0.1812   -0.0299    0.0677    0.1322    0.3318
Lag Home Ownership   -0.7010   -0.4499   -0.3278   -0.2337   -0.1100   -0.0382    0.1880
Lag Income           -0.4764   -0.2374   -0.1733   -0.0897    0.0216    0.0862    0.2786
Intercept            -0.1605    0.0473    0.1463    0.2847    0.6612    1.0772    1.8902

Table 3.6: Distribution of local estimates

Method                       Time in seconds
OLS                               0.1560
Closest AR                        0.1090
MESS                              0.6090
Approximate Mix                   0.3280
Doubly Stochastic Scaling         7.1880
Delaunay Weight Matrix            5.4220

Table 3.7: Times for Different Methods for 57,647 Observations

To provide an idea about the performance of the techniques for a larger problem, we estimated a simple hedonic regression over US census tracts. This resulted in 57,647 observations. Table 3.7 shows the timings for some of the various operations. All of these seem quite fast. A user can find a Delaunay weight matrix and estimate a spatial autoregression in under 10 seconds on desktop machines.

Just selecting a particular weight matrix seems arbitrary. Here we take 30 nearest neighbors and weight these geometrically. A ρ of 1 indicates no decline in the weight given to further neighbors relative to closer ones, while a ρ of 0.5 would give half the weight to the second nearest neighbor as it would to the first nearest neighbor. Thus, ρ allows changes in the effective number of neighbors used without actually varying the number of neighbors. It often makes sense in this approach to set the number of neighbors to a fairly high level (such as 30). Table 3.8 shows the effect of varying ρ on the profile log-likelihood. Small changes in ρ make large changes in the profile log-likelihood, evidence of the importance of this parameter.
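The geometric weighting can be illustrated with a few lines of Python. The sketch assumes the k-th nearest neighbor receives weight proportional to ρ^(k-1) and that weights are rescaled to sum to one. The "effective number of neighbors" used here, the inverse of the sum of squared weights, is an assumed summary measure chosen for illustration and is not taken from the toolbox.

```python
def geometric_weights(rho, m):
    """Weights proportional to rho**(k-1) for the k-th nearest neighbor,
    rescaled to sum to one."""
    raw = [rho ** k for k in range(m)]
    total = sum(raw)
    return [w / total for w in raw]

def effective_neighbors(weights):
    """Inverse sum of squared weights (an assumed summary measure)."""
    return 1.0 / sum(w * w for w in weights)

for rho in (1.0, 0.9, 0.5):
    w = geometric_weights(rho, 30)
    # Ratio of second to first neighbor weight, then the effective count.
    print(rho, w[1] / w[0], effective_neighbors(w))
```

With ρ = 1 all 30 neighbors count equally, while with ρ = 0.5 the weights fall off so quickly that only about three neighbors matter under this measure, even though 30 are still stored.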

It did not take overly long to find the nearest neighbors or the optimal ρ, even with doubly stochastic scalings of the weight matrix, as shown by Table 3.9.
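A doubly stochastic scaling rescales a non-negative weight matrix so that every row and every column sums to one. The text does not say how the toolbox computes it, so the sketch below uses a Sinkhorn-style alternating normalization purely as an assumed illustration. The small symmetric matrix and the function name are hypothetical, and the iteration converges only for matrices whose non-zero pattern admits such a scaling.

```python
def sinkhorn_scale(W, iterations=500):
    """Alternately rescale rows and columns toward unit sums."""
    n = len(W)
    S = [row[:] for row in W]
    for _ in range(iterations):
        # Row step: divide each row by its sum.
        for i in range(n):
            r = sum(S[i])
            S[i] = [v / r for v in S[i]]
        # Column step: divide each column by its sum.
        col = [sum(S[i][j] for i in range(n)) for j in range(n)]
        S = [[S[i][j] / col[j] for j in range(n)] for i in range(n)]
    return S

# Hypothetical symmetric weights among four mutually neighboring observations.
W = [[0.0, 1.0, 2.0, 3.0],
     [1.0, 0.0, 4.0, 5.0],
     [2.0, 4.0, 0.0, 6.0],
     [3.0, 5.0, 6.0, 0.0]]
S = sinkhorn_scale(W)
row_sums = [sum(r) for r in S]
col_sums = [sum(S[i][j] for i in range(4)) for j in range(4)]
print(row_sums, col_sums)
```

Unlike simple row scaling, the result stays symmetric when the input is symmetric, which is one reason such a scaling can be attractive for weight matrices.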

ρ         log-likelihood
0.8000    -231895.0426
0.8500    -230968.8487
0.9000    -230828.6413
0.9500    -231699.1089
1.0000    -233434.5371

Table 3.8: Likelihoods across ρ for Doubly Stochastic Scaling

Operation                      Time in seconds
NN computation                     31.4060
RS Time to find optimum ρ          56.7810
DS Time to find optimum ρ          94.8750

Table 3.9: Times for Optimizing the Likelihood over ρ for Both Scalings

These operations lead to a table of profile log-likelihoods (Table 3.10) across weight matrices. Examining the MESS log-likelihoods over ρ and Delaunay, and contrasting them with the log-likelihood from applying OLS to the basic non-spatial independent variables, demonstrates that even a suboptimal choice of ρ or Delaunay still dominates the use of an aspatial model in this case, and that optimizing over ρ dominates an arbitrary choice of weight matrices (Table 3.10). Moreover, the doubly stochastic (DS) scaling helped greatly for this example over the regular scaling (RS). In addition, inspection of aspatial OLS versus MESS with an optimal selection of ρ in Table 3.11 shows clear differences among the approaches. Note that the land area variable became insignificant after modeling space.
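The comparison above amounts to a grid search: evaluate the profile log-likelihood at each candidate ρ, keep the maximizer, and compare it with the aspatial benchmark. The short sketch below does this with the values reported in Tables 3.8 and 3.10 for the doubly stochastic scaling (read with four decimal places). The variable names are hypothetical.

```python
# Profile log-likelihoods by rho, doubly stochastic scaling (Table 3.8).
profile = {
    0.80: -231895.0426,
    0.85: -230968.8487,
    0.90: -230828.6413,
    0.95: -231699.1089,
    1.00: -233434.5371,
}
# Aspatial (OLS) log-likelihood (Table 3.10).
ols_loglik = -266505.1663

# Grid search: keep the rho with the largest profile log-likelihood.
best_rho = max(profile, key=profile.get)
gain_over_ols = profile[best_rho] - ols_loglik
worst_gain = min(profile.values()) - ols_loglik

print(best_rho, gain_over_ols, worst_gain)
```

Even the worst ρ on this grid improves on the aspatial log-likelihood by a wide margin, which is the sense in which a suboptimal spatial specification still dominates here.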

It is not difficult to estimate a spatial autoregression with over one million observations. In fact, the toolbox provides an example (big_one subdirectory) under the dataset directory, whereby a one million observation spatial autoregression is estimated in just under 20 seconds. It took 130.63 seconds to find the weight matrix, 60.24 seconds to simulate the dependent variable, and 19.42 seconds to estimate the autoregression.

Variable                              Value
Aspatial likelihood (OLS)          -266505.1663
Closest Neighbor                   -243986.1676
RS Delaunay maximum likelihood     -244509.8257
RS Maximum likelihood across ρ     -254216.3172
RS Optimum ρ                             0.9000
DS Delaunay maximum likelihood     -235626.6213
DS Maximum likelihood across ρ     -230828.6413
DS Optimum ρ                             0.9000

Table 3.10: Likelihoods Across Doubly Stochastic and Regular Scalings

Variables                  b OLS     SRD OLS     b MESS    SRD MESS
Land area                -0.0850    -96.8455    -0.0008     -0.7159
Pop                       0.1146     36.8592     0.0239     12.7630
Per cap Income            1.0837    208.2192     0.6786    152.7645
Age                      -0.1269    -34.2809    -0.1384    -45.0489
Lag Land area                                   -0.0178    -14.5650
Lag Pop                                          0.0165      5.2437
Lag Per cap Income                              -0.3702    -65.2683
Lag Age                                          0.1088     27.5807
Intercept                 1.2236     22.9837    -0.5988    -15.2122
α                                                3.0986    253.2835
ρ (relative to ρ = 1)                            0.90       72.1927
Nearest Neighbors                                30
parameters                5                      11

Table 3.11: OLS versus MESS Results Using Optimal ρ for Doubly Stochastic Scaling

Chapter 4

References

If you need to know more about spatial statistics or about some of the specific routines, you may wish to examine:

Anselin, Luc. (1988) Spatial Econometrics: Methods and Models. Dordrecht: Kluwer Academic Publishers.

Barry, Ronald, and R. Kelley Pace. "A Monte Carlo Estimator of the Log Determinant of Large Sparse Matrices," Linear Algebra and its Applications, Volume 289, Number 1-3, 1999, p. 41-54.

Chen, Jian-Shen, and Robert Jennrich. (1996) "The Signed Root Deviance Profile and Confidence Intervals in Maximum Likelihood Analysis," Journal of the American Statistical Association, Volume 91, Number 435, p. 993-998.

Christensen, Ronald. (1991) Linear Models for Multivariate, Time Series, and Spatial Data. New York: Springer-Verlag.

Cressie, Noel A.C. (1993) Statistics for Spatial Data, Revised ed. New York: John Wiley.

Dubin, Robin A. (1988) "Estimation of Regression Coefficients in the Presence of Spatially Autocorrelated Error Terms," Review of Economics and Statistics, 70, 466-474.

Haining, Robert. (1990) Spatial Data Analysis in the Social and Environmental Sciences. Cambridge.

LeSage, James, and R. Kelley Pace. "Spatial Dependence in Data Mining," Data Mining for Scientific and Engineering Applications, Edited by Robert L. Grossman, Chandrika Kamath, Philip Kegelmeyer, Vipin Kumar, and Raju R. Namburu, Kluwer Academic Publishing, 2001.

LeSage, James, and R. Kelley Pace. "Spatial Probit and Tobit," Spatial Statistics and Spatial Econometrics, Edited by Art Getis, Palgrave, 2003.

Li, Bin. (1995) "Implementing Spatial Statistics on Parallel Computers," in Arlinghaus, S., ed. Practical Handbook of Spatial Statistics (CRC Press, Boca Raton), pp. 107-148.

Ord, J.K. (1975) "Estimation Methods for Models of Spatial Interaction," Journal of the American Statistical Association, 70, 120-126.

Pace, R. Kelley, and Ronald Barry. (1997) "Fast CARs," Journal of Statistical Computation and Simulation, 59, p. 123-147.

Pace, R. Kelley, and Ronald Barry. (1997) "Quick Computation of Regressions with a Spatially Autoregressive Dependent Variable," Geographical Analysis, 29, 232-247.

Pace, R. Kelley, and Dongya Zou. "Closed-Form Maximum Likelihood Estimates of Nearest Neighbor Spatial Dependence," Geographical Analysis, Volume 32, Number 2, April 2000, p. 154-172.

Pace, R. Kelley, Ronald Barry, O.W. Gilley, and C.F. Sirmans. "A Method for Spatial-temporal Forecasting with an Application to Real Estate Prices," International Journal of Forecasting, Volume 16, Number 2, April-June 2000, p. 229-246.

Pace, R. Kelley, and James P. LeSage. "Semiparametric Maximum Likelihood Estimates of Spatial Dependence," Geographical Analysis, Volume 34, Number 1, January 2002, p. 75-90.

Pace, R. Kelley, and James LeSage. "Likelihood Dominance Spatial Inference," forthcoming, Geographical Analysis, in January 2003.

Pace, R. Kelley, and James LeSage. "Spatial Autoregressive Local Estimation," Spatial Statistics and Spatial Econometrics, Edited by Art Getis, Palgrave, 2003.

Pace, R. Kelley, and James LeSage. "Chebyshev Approximation of Log-determinants of Spatial Weight Matrices," forthcoming in Computational Statistics and Data Analysis.

Ripley, Brian D. (1981) Spatial Statistics. New York: John Wiley.

www.spatial-statistics.com

  • Why the Toolbox Exists
  • Using the Toolbox
    • Hardware and Software Requirements
    • Installation
    • Help and documentation
    • Known Limitations
    • Tips on Using the Toolbox
    • Included Examples
    • Included Datasets
    • Included Manuscripts
      • A Brief Selected Tour of the Toolbox
      • References
Page 17: Spatial Statistics Toolbox for Matlab 2.0 - Documentation Only (pdf)

17

minus130 minus120 minus110 minus100 minus90 minus80 minus70 minus6025

30

35

40

45

50

Longitude in Decimal Degrees

Latit

utde

in D

ecim

al D

egre

es

8 Nearest Neighbor Graph

Figure 32 Connections among Counties via Eight Nearest Neighbors

Using only nearby observations implies that the weight matrix has mainzeros or is sparse as shown in Figure 33

18 CHAPTER 3 A BRIEF SELECTED TOUR OF THE TOOLBOX

0 500 1000 1500 2000 2500 3000

0

500

1000

1500

2000

2500

3000

Observation Number

Obs

erva

tion

Num

ber

Original Nonminuszero Weight Pattern for Delaunay

Figure 33 Plot of Non-zeros of Delaunay Weight Matrix

This becomes even more apparent when reordering the observations asin Figure 34

19

0 500 1000 1500 2000 2500 3000

0

500

1000

1500

2000

2500

3000

Observation Number

Obs

erva

tion

Num

ber

Permuted Nonminuszero Weight Pattern for Delaunay

Figure 34 Plot of Non-zeros of Permuted Delaunay Weight Matrix

Sparsity as well as finding an appropriate ordering are key in quicklycomputing the log-determinants used in maximum likelihood The toolboxhas functions for exact computation of the log-determinants (actually inter-polation of exact computations at various points) However users can selectapproximations as well which depend only on sparsity and not upon order-ings The quadratic Chebyshev is the fastest and most approximate technique(Figure 35)

20 CHAPTER 3 A BRIEF SELECTED TOUR OF THE TOOLBOX

minus1 minus08 minus06 minus04 minus02 0 02 04 06 08 1minus4000

minus3500

minus3000

minus2500

minus2000

minus1500

minus1000

minus500

0

α

logminus

dete

rmin

ants

Exact logminusdeterminant Chebyshev approximation Taylor bounds

Taylor Lower minusChebyshev oTaylor Upper +Exact

Figure 35 Plot of Exact log-determinant with Chebyshev Approximationand Taylor Bounds

The Chebyshev approximation appears quite good for positive moderatevalues of the dependence parameter but could use improvement for materi-ally negative values of the spatial dependence parameter Fortunately suchnegative values seem rare in practice

The Monte Carlo log-determinant estimator is quite fast but more accu-rate (Figure 36)

21

minus1 minus08 minus06 minus04 minus02 0 02 04 06 08 1minus500

minus450

minus400

minus350

minus300

minus250

minus200

minus150

minus100

minus50

0

α

logminus

dete

rmin

ants

Exact logminusdeterminant Monte Carlo approximation and confidence limits

Lower ConfidenceMonte CarloUpper ConfidenceExact

Figure 36 Plot of Exact log-determinant with Monte Carlo Approximationand Limits

To see the effects of exact versus approximate log-determinant computa-tions consider Tables 31 and 32 using the 3107 county election data Theestimated autoregressive parameter is only off by 001 from using the approx-imation The approximate method also uses likelihood dominance inferencewhich results in a lower bound to the signed root deviances As shown bythe tables the likelihood dominance SRDs are smaller in magnitude thanthe exact SRDs However they can still document statistical significance formany variables and thus can prove useful in many circumstances

22 CHAPTER 3 A BRIEF SELECTED TOUR OF THE TOOLBOX

Variables Beta Estimates Signed Root Deviances PR of Higher SRDS

Voting Pop -07784 -311227 00000Education 02696 93438 00000Home Ownership 04530 262661 00000Income 00071 00035 09972Intercept 05443 78773 00000Alpha 07250 327719 00000

Table 31 SAR Estimation Results using Chebyshev lndet approximationand likelihood dominance inference

Variables Beta Estimates Signed Root Deviances PR of Higher SRDS

Voting Pop -07806 -321748 00000Education 02746 119438 00000Home Ownership 04525 270073 00000Income 00047 02187 08269Intercept 05528 94908 00000Alpha 07150 348648 00000

Table 32 SAR Estimation results using exact lndet

23

Some of the routines not only yield the maximum of the likelihood func-tion but also profile likelihoods in the dependence parameter α by model asshown in Figure 37

0 01 02 03 04 05 06 07 08 09 1minus7000

minus6800

minus6600

minus6400

minus6200

minus6000

minus5800

minus5600Profile likelihoods vs α for global model and deleteminus1 submodels

Dependence parameter (α)

Pro

file

logminus

likel

ihoo

d

Global likelihoodVoting PopEducationHome OwnershipIncomeIntercept

Figure 37 SAR Profile Likelihoods by Model

The toolbox includes SAR CAR and MESS error models as well asMESS closest neighbor and MIX autoregressive models as shown in Ta-ble 33 and Table 34

24 CHAPTER 3 A BRIEF SELECTED TOUR OF THE TOOLBOX

Variables b OLS b closest b MESS b Mix

Voting Pop -08464 -07489 -07693 -07298Education 05167 02899 01941 01818Home Ownership 04291 04457 04832 04580Income -01439 -00332 00423 00427Lag Voting Pop 00000 01878 04186 04616Lag Education 00000 00975 01205 00450Lag Home Ownership 00000 -01569 -03249 -03299Lag Income 00000 -01079 -01802 -01410Intercept 09814 07495 05636 04205α 00000 03352 14628 06550

Table 33 Estimates on Election Data

The closest neighbor approach is intermediate to a non-spatial approach(OLS) and a full spatial approach (MESS) or the approximate mixed routineNote the close agreement between MESS and the mixed routine Note OLSin this case uses the spatial averages of the basic independent variables asadditional independent variables

None of these operations take long for the election data

25

Variables b OLS b closest b MESS b Mix

Voting Pop -347211 -305020 -288899 -294018Education 308928 134922 76654 77169Home Ownership 230066 254083 268836 273515Income -72373 -15742 17478 18952Lag Voting Pop 00000 77600 107656 125507Lag Education 00000 45090 39785 15705Lag Home Ownership 00000 -94245 -111259 -121200Lag Income 00000 -51399 -55551 -46603Intercept 202782 160533 107480 84626α 00000 242762 315948 337097

Table 34 Signed Root Deviances using Election Data

26 CHAPTER 3 A BRIEF SELECTED TOUR OF THE TOOLBOX

Operation Timings in seconds

OLS 00150Closest AR 00470MESS 00940Exact Log-det 08750Mix 00470Doubly Stochastic Scaling 02660

Table 35 Timing for operations to Election data (n=3107)

In addition to global models the toolbox has spatial autoregressive localestimation (SALE) The user chooses the bandwidth (subsample size) by ex-amining cross-validation error at the fringe observations as in Figure 39 oreach observation as in Figure 38

27

0 50 100 150 200 250 300 350 400 450 5000055

006

0065

007

0075Smoothed SALE Recursive Residuals of Fringe Observations

Number of Local Observations

Abs

olut

e E

rror

Figure 38 Plot of Fringe prediction error versus subsample size

28 CHAPTER 3 A BRIEF SELECTED TOUR OF THE TOOLBOX

0 50 100 150 200 250 300 350 400 450 500005

0052

0054

0056

0058

006

0062

0064Smoothed SALE Initial Holdout Residuals

Number of Local Observations

Abs

olut

e E

rror

Figure 39 Plot of prediction error on center of area versus subsample size

Usually there is spatial dependence even in small subsamples as shownin Figure 310

29

0 50 100 150 200 250 300 350 400 450 5000

01

02

03

04

05

06

07SALE Autoregressive Parameter Estimates

Number of Local Observations

Med

ian

Aut

oreg

ress

ive

Par

amet

er E

stim

ates

Figure 310 Spatial dependence parameter estimate versus subsample size

Local estimation leads to spatially varying parameter estimates such asthose shown in Figure 311

30 CHAPTER 3 A BRIEF SELECTED TOUR OF THE TOOLBOX

minus130 minus120 minus110 minus100 minus90 minus80 minus70 minus6025

30

35

40

45

50

Longitude

Latit

ude

β3

Figure 311 Map of influence of homeownership on voting

In addition a user can obtain an idea of the sensitivity of parameter esti-mates to spatial variation such as summarized in Table 36

31

Percentiles 0 10 25 50 75 90 100

α 03100 04300 05500 06300 06700 07100 07800Voting Pop -10079 -08875 -08075 -06962 -05967 -05299 -04066Education -01871 00517 01142 01788 03135 04923 07118Home Ownership 01530 02964 03419 04253 05161 07660 08461Income -02576 -01644 -01072 -00473 00628 01448 02325Lag Voting Pop -01525 00655 02868 04527 05490 06523 08806Lag Education -05734 -03666 -01812 -00299 00677 01322 03318Lag Home Ownership -07010 -04499 -03278 -02337 -01100 -00382 01880Lag Income -04764 -02374 -01733 -00897 00216 00862 02786Intercept -01605 00473 01463 02847 06612 10772 18902

Table 36 Distribution of local estimates

32 CHAPTER 3 A BRIEF SELECTED TOUR OF THE TOOLBOX

Method Time in seconds

OLS 01560Closest AR 01090MESS 06090Approximate Mix 03280Doubly Stochastic Scaling 71880Delaunay Weight Matrix 54220

Table 37 Times for Different Methods for 57647 Observations

To provide an idea about the performance of the techniques for a largerproblem we estimated a simple hedonic regression over US census tractsThis resulted in 57647 observations Table 37 shows the timings for someof the various operations All of these seem quite fast A user can find aDelaunay weight matrix and estimate a spatial autoregression in under 10seconds on desktop machines

Just selecting a particular weight matrix seems arbitrary Here we take30 nearest neighbors and weight these geometrically A ρ of 1 indicates nodecline in the weight given to further neighbors relative to closer ones whilea ρ of 05 would give half the weight to the second nearest neighbor as itwould to the first nearest neighbor Thus ρ allows changes in the effectivenumber of neighbors used without actually varying the number of neighborsIt often makes sense in this approach to set the number of neighbors to afairly high level (such as 30) Table 38 shows the effect of varying ρ on theprofile log-likelihood Small changes in ρ make large changes in the profilelog-likelihood evidence of the importance of this parameter

It did not take overly long to find the nearest neighbors or the optimal ρ, even with doubly stochastic scalings of the weight matrix, as shown by Table 3.9.

ρ         log-likelihood

0.8000    -231895.0426
0.8500    -230968.8487
0.9000    -230828.6413
0.9500    -231699.1089
1.0000    -233434.5371

Table 3.8: Likelihoods across ρ for Doubly Stochastic Scaling

Operation                     Time in seconds

NN computation                31.4060
RS Time to find optimum ρ     56.7810
DS Time to find optimum ρ     94.8750

Table 3.9: Times for Optimizing the Likelihood over ρ for Both Scalings

These operations lead to a table of profile log-likelihoods (Table 3.10) across weight matrices. Examining the MESS log-likelihoods over ρ and Delaunay, and contrasting them with the log-likelihood from applying OLS to the basic non-spatial independent variables, demonstrates that even a suboptimal choice of ρ or Delaunay still dominates the use of an aspatial model in this case, and that optimizing over ρ dominates an arbitrary choice of weight matrices (Table 3.10). Moreover, the doubly stochastic (DS) scaling helped greatly for this example over the regular scaling (RS). In addition, inspection of aspatial OLS versus MESS with an optimal selection of ρ in Table 3.11 shows clear differences among the approaches. Note the land area variable became insignificant after modeling space.
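The difference between the two scalings can be sketched in Python. The text here does not describe how the toolbox computes the doubly stochastic scaling, so the sketch substitutes a generic Sinkhorn-Knopp style iteration (alternating row and column normalization). It also assumes that the regular scaling is ordinary row normalization. The 4-by-4 matrix `W` is made-up illustrative data.

```python
def row_scale(w):
    """Regular scaling (assumed): divide each row by its sum."""
    scaled = []
    for row in w:
        s = sum(row)
        scaled.append([x / s for x in row])
    return scaled


def doubly_stochastic_scale(w, iterations=500):
    """Alternate row and column normalization (Sinkhorn-Knopp style)."""
    n = len(w)
    m = [row[:] for row in w]
    for _ in range(iterations):
        m = row_scale(m)
        col = [sum(m[i][j] for i in range(n)) for j in range(n)]
        m = [[m[i][j] / col[j] for j in range(n)] for i in range(n)]
    return m


def column_sums(m):
    return [sum(row[j] for row in m) for j in range(len(m))]


# Hypothetical symmetric weights with a zero diagonal.
W = [[0.0, 1.0, 2.0, 1.0],
     [1.0, 0.0, 1.0, 3.0],
     [2.0, 1.0, 0.0, 1.0],
     [1.0, 3.0, 1.0, 0.0]]

print("RS column sums:", [round(s, 4) for s in column_sums(row_scale(W))])
print("DS column sums:", [round(s, 4) for s in column_sums(doubly_stochastic_scale(W))])
```

Under row scaling only the rows sum to one; under the doubly stochastic scaling both the rows and the columns do.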

It is not difficult to estimate a spatial autoregression with over one million observations. In fact, the toolbox provides an example (big_one subdirectory) under the dataset directory whereby a one million observation spatial autoregression is estimated in just under 20 seconds. It took 130.63 seconds to find the weight matrix, 60.24 seconds to simulate the dependent variable, and 19.42 seconds to estimate the autoregression.

Variable                            Value

Aspatial likelihood (OLS)           -266505.1663
Closest Neighbor                    -243986.1676
RS Delaunay maximum likelihood      -244509.8257
RS Maximum likelihood across ρ      -254216.3172
RS Optimum ρ                              0.9000
DS Delaunay maximum likelihood      -235626.6213
DS Maximum likelihood across ρ      -230828.6413
DS Optimum ρ                              0.9000

Table 3.10: Likelihoods Across Doubly Stochastic and Regular Scalings

Variables                  b OLS    SRD OLS     b MESS   SRD MESS

Land area                -0.0850   -96.8455    -0.0008    -0.7159
Pop                       0.1146    36.8592     0.0239    12.7630
Per cap Income            1.0837   208.2192     0.6786   152.7645
Age                      -0.1269   -34.2809    -0.1384   -45.0489
Lag Land area                                  -0.0178   -14.5650
Lag Pop                                         0.0165     5.2437
Per cap Income                                 -0.3702   -65.2683
Lag Age                                         0.1088    27.5807
Intercept                 1.2236    22.9837    -0.5988   -15.2122
α                                               3.0986   253.2835
ρ (relative to ρ = 1)                           0.90      72.1927
Nearest Neighbors                              30
parameters                5                    11

Table 3.11: OLS versus MESS Results Using Optimal ρ for Doubly Stochastic Scaling

Chapter 4

References

If you need to know more about spatial statistics or about some of the specific routines, you may wish to examine:

Anselin, Luc. (1988) Spatial Econometrics: Methods and Models. Dordrecht: Kluwer Academic Publishers.

Barry, Ronald, and R. Kelley Pace. "A Monte Carlo Estimator of the Log Determinant of Large Sparse Matrices," Linear Algebra and its Applications, Volume 289, Number 1-3, 1999, p. 41-54.

Chen, Jian-Shen, and Robert Jennrich. (1996) "The Signed Root Deviance Profile and Confidence Intervals in Maximum Likelihood Analysis," Journal of the American Statistical Association, Volume 91, Number 435, p. 993-998.

Christensen, Ronald. (1991) Linear Models for Multivariate, Time Series, and Spatial Data. New York: Springer-Verlag.

Cressie, Noel A.C. (1993) Statistics for Spatial Data, Revised ed. New York: John Wiley.

Dubin, Robin A. (1988) "Estimation of Regression Coefficients in the Presence of Spatially Autocorrelated Error Terms," Review of Economics and Statistics, 70, 466-474.

Haining, Robert. (1990) Spatial Data Analysis in the Social and Environmental Sciences. Cambridge.

LeSage, James, and R. Kelley Pace. "Spatial Dependence in Data Mining," Data Mining for Scientific and Engineering Applications, Edited by Robert L. Grossman, Chandrika Kamath, Philip Kegelmeyer, Vipin Kumar, and Raju R. Namburu, Kluwer Academic Publishing, 2001.

LeSage, James, and R. Kelley Pace. "Spatial Probit and Tobit," Spatial Statistics and Spatial Econometrics, Edited by Art Getis, Palgrave, 2003.

Li, Bin. (1995) "Implementing Spatial Statistics on Parallel Computers," in: Arlinghaus, S., ed. Practical Handbook of Spatial Statistics (CRC Press, Boca Raton), pp. 107-148.

Ord, J.K. (1975) "Estimation Methods for Models of Spatial Interaction," Journal of the American Statistical Association, 70, 120-126.

Pace, R. Kelley, and Ronald Barry. (1997) "Fast CARs," Journal of Statistical Computation and Simulation, 59, p. 123-147.

Pace, R. Kelley, and Ronald Barry. (1997) "Quick Computation of Regressions with a Spatially Autoregressive Dependent Variable," Geographical Analysis, 29, 232-247.

Pace, R. Kelley, and Dongya Zou. "Closed-Form Maximum Likelihood Estimates of Nearest Neighbor Spatial Dependence," Geographical Analysis, Volume 32, Number 2, April 2000, p. 154-172.

Pace, R. Kelley, Ronald Barry, O.W. Gilley, and C.F. Sirmans. "A Method for Spatial-temporal Forecasting with an Application to Real Estate Prices," International Journal of Forecasting, Volume 16, Number 2, April-June 2000, p. 229-246.

Pace, R. Kelley, and James P. LeSage. "Semiparametric Maximum Likelihood Estimates of Spatial Dependence," Geographical Analysis, Volume 34, Number 1, January 2002, p. 75-90.

Pace, R. Kelley, and James LeSage. "Likelihood Dominance Spatial Inference," forthcoming, Geographical Analysis, in January 2003.

Pace, R. Kelley, and James LeSage. "Spatial Autoregressive Local Estimation," Spatial Statistics and Spatial Econometrics, Edited by Art Getis, Palgrave, 2003.

Pace, R. Kelley, and James LeSage. "Chebyshev Approximation of Log-determinants of Spatial Weight Matrices," forthcoming in Computational Statistics and Data Analysis.

Ripley, Brian D. (1981) Spatial Statistics. New York: John Wiley.

www.spatial-statistics.com


[Figure 3.3: Plot of Non-zeros of Delaunay Weight Matrix. Plot title: "Original Non-zero Weight Pattern for Delaunay"; both axes: Observation Number, 0 to 3000.]

This becomes even more apparent when reordering the observations, as in Figure 3.4.

[Figure 3.4: Plot of Non-zeros of Permuted Delaunay Weight Matrix. Plot title: "Permuted Non-zero Weight Pattern for Delaunay"; both axes: Observation Number, 0 to 3000.]

Sparsity and an appropriate ordering are both key to quickly computing the log-determinants used in maximum likelihood. The toolbox has functions for exact computation of the log-determinants (actually interpolation of exact computations at various points). However, users can also select approximations, which depend only on sparsity and not upon orderings. The quadratic Chebyshev is the fastest and most approximate technique (Figure 3.5).
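To make the log-determinant term concrete, the following Python sketch compares an exact value of ln|I - αD| with a truncated Taylor series, -Σ α^k tr(D^k)/k, on a tiny example. It is only an illustration under stated assumptions: the weight matrix is a hypothetical 6-node ring in which each node averages its two neighbors, the exact value comes from dense Gaussian elimination rather than the sparse routines of the toolbox, and the series shown is a plain Taylor expansion, not the Chebyshev approximation or the Taylor bounds plotted in the figure.

```python
import math


def ring_matrix(n):
    """Hypothetical weight matrix: each node averages its two ring neighbors."""
    d = [[0.0] * n for _ in range(n)]
    for i in range(n):
        d[i][(i - 1) % n] = 0.5
        d[i][(i + 1) % n] = 0.5
    return d


def matmul(a, b):
    n = len(a)
    return [[sum(a[i][k] * b[k][j] for k in range(n)) for j in range(n)]
            for i in range(n)]


def exact_logdet(alpha, d):
    """ln|I - alpha*D| by Gaussian elimination with partial pivoting."""
    n = len(d)
    a = [[(1.0 if i == j else 0.0) - alpha * d[i][j] for j in range(n)]
         for i in range(n)]
    logdet = 0.0
    for c in range(n):
        p = max(range(c, n), key=lambda r: abs(a[r][c]))
        a[c], a[p] = a[p], a[c]
        logdet += math.log(abs(a[c][c]))
        for r in range(c + 1, n):
            f = a[r][c] / a[c][c]
            for j in range(c, n):
                a[r][j] -= f * a[c][j]
    return logdet


def taylor_logdet(alpha, d, order):
    """Truncated series: -sum over k=1..order of alpha**k * tr(D**k) / k."""
    total, power = 0.0, d
    for k in range(1, order + 1):
        total -= alpha ** k * sum(power[i][i] for i in range(len(d))) / k
        power = matmul(power, d)
    return total


D = ring_matrix(6)
for alpha in (-0.6, 0.3, 0.6):
    print(f"alpha={alpha:+.1f}  exact={exact_logdet(alpha, D):.4f}  "
          f"order 2={taylor_logdet(alpha, D, 2):.4f}  "
          f"order 10={taylor_logdet(alpha, D, 10):.4f}")
```

The low-order series only needs traces of powers of D, which is why approximations of this kind depend on sparsity but not on the ordering of the observations.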

[Figure 3.5: Plot of Exact log-determinant with Chebyshev Approximation and Taylor Bounds. Plot title: "Exact log-determinant, Chebyshev approximation, Taylor bounds"; x-axis: α, -1 to 1; y-axis: log-determinants, -4000 to 0; legend: Taylor Lower, Chebyshev, Taylor Upper, Exact.]

The Chebyshev approximation appears quite good for positive, moderate values of the dependence parameter, but could use improvement for materially negative values of the spatial dependence parameter. Fortunately, such negative values seem rare in practice.

The Monte Carlo log-determinant estimator is also quite fast, and it is more accurate than the quadratic Chebyshev approximation (Figure 3.6).
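The idea behind a Monte Carlo log-determinant estimator can be sketched as follows. The sketch follows the general form of the estimator in Barry and Pace (1999) as understood here: for random normal vectors x, average -n Σ (α^k / k) (x'D^k x)/(x'x) over draws. The number of draws, the number of series terms, the seed, and the 50-node ring weight matrix are all illustrative assumptions, and the code is not the toolbox routine.

```python
import math
import random


def ring_apply(x):
    """Multiply by D for a hypothetical ring: each node averages two neighbors."""
    n = len(x)
    return [0.5 * (x[i - 1] + x[(i + 1) % n]) for i in range(n)]


def mc_logdet(alpha, apply_d, n, draws=400, terms=30, seed=0):
    """Average of -n * sum_k alpha**k/k * (x'D^k x)/(x'x) over random x."""
    rng = random.Random(seed)
    total = 0.0
    for _ in range(draws):
        x = [rng.gauss(0.0, 1.0) for _ in range(n)]
        xx = sum(v * v for v in x)
        z, series = x, 0.0
        for k in range(1, terms + 1):
            z = apply_d(z)  # z now holds D^k x
            series += alpha ** k / k * sum(a * b for a, b in zip(x, z)) / xx
        total += -n * series
    return total / draws


def ring_exact_logdet(alpha, n):
    """Closed form for the ring, whose eigenvalues are cos(2*pi*j/n)."""
    return sum(math.log(1.0 - alpha * math.cos(2.0 * math.pi * j / n))
               for j in range(n))


n = 50
print("Monte Carlo estimate:", round(mc_logdet(0.5, ring_apply, n), 3))
print("Exact value:         ", round(ring_exact_logdet(0.5, n), 3))
```

Only matrix-vector products with D are needed, so the cost grows with the number of non-zero weights rather than with n squared. The estimate carries sampling error, which is why the figure reports confidence limits alongside it.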

[Figure 3.6: Plot of Exact log-determinant with Monte Carlo Approximation and Limits. Plot title: "Exact log-determinant, Monte Carlo approximation, and confidence limits"; x-axis: α, -1 to 1; y-axis: log-determinants, -500 to 0; legend: Lower Confidence, Monte Carlo, Upper Confidence, Exact.]

To see the effects of exact versus approximate log-determinant computations, consider Tables 3.1 and 3.2, using the 3,107 county election data. The estimated autoregressive parameter is only off by 0.01 from using the approximation. The approximate method also uses likelihood dominance inference, which results in a lower bound to the signed root deviances. As shown by the tables, the likelihood dominance SRDs are smaller in magnitude than the exact SRDs. However, they can still document statistical significance for many variables and thus can prove useful in many circumstances.

Variables         Beta Estimates   Signed Root Deviances   Pr of Higher SRDs

Voting Pop           -0.7784            -31.1227               0.0000
Education             0.2696              9.3438               0.0000
Home Ownership        0.4530             26.2661               0.0000
Income                0.0071              0.0035               0.9972
Intercept             0.5443              7.8773               0.0000
Alpha                 0.7250             32.7719               0.0000

Table 3.1: SAR Estimation Results using Chebyshev lndet approximation and likelihood dominance inference

Variables         Beta Estimates   Signed Root Deviances   Pr of Higher SRDs

Voting Pop           -0.7806            -32.1748               0.0000
Education             0.2746             11.9438               0.0000
Home Ownership        0.4525             27.0073               0.0000
Income                0.0047              0.2187               0.8269
Intercept             0.5528              9.4908               0.0000
Alpha                 0.7150             34.8648               0.0000

Table 3.2: SAR Estimation results using exact lndet
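The relationship between a signed root deviance and the probability column can be sketched in Python. The sketch assumes the usual definition, the sign of the estimate times the square root of twice the gain in log-likelihood from freeing the parameter, and assumes the probability is a two-sided standard normal tail. The function names and the log-likelihood values in the first call are hypothetical.

```python
import math


def signed_root_deviance(loglik_full, loglik_restricted, estimate):
    """sign(estimate) * sqrt(2 * (L_full - L_restricted))."""
    deviance = 2.0 * (loglik_full - loglik_restricted)
    return math.copysign(math.sqrt(max(deviance, 0.0)), estimate)


def prob_higher_srd(srd):
    """Two-sided standard normal tail probability of |SRD| (assumed reference)."""
    return math.erfc(abs(srd) / math.sqrt(2.0))


# Hypothetical log-likelihoods: dropping the variable costs 2 units.
print(signed_root_deviance(-100.0, -102.0, -0.3))

# Income rows of the two tables above (exact and likelihood dominance SRDs).
print(round(prob_higher_srd(0.2187), 4))
print(round(prob_higher_srd(0.0035), 4))
```

Under these assumptions the last two lines give 0.8269 and 0.9972, which agree with the probabilities reported for Income in the two tables.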


Some of the routines not only yield the maximum of the likelihood function, but also profile likelihoods in the dependence parameter α by model, as shown in Figure 3.7.

[Figure 3.7: SAR Profile Likelihoods by Model. Plot title: "Profile likelihoods vs α for global model and delete-1 submodels"; x-axis: Dependence parameter (α), 0 to 1; y-axis: Profile log-likelihood, -7000 to -5600; legend: Global likelihood, Voting Pop, Education, Home Ownership, Income, Intercept.]

The toolbox includes SAR, CAR, and MESS error models as well as MESS, closest neighbor, and MIX autoregressive models, as shown in Table 3.3 and Table 3.4.

Variables              b OLS   b closest    b MESS     b Mix

Voting Pop           -0.8464    -0.7489    -0.7693   -0.7298
Education             0.5167     0.2899     0.1941    0.1818
Home Ownership        0.4291     0.4457     0.4832    0.4580
Income               -0.1439    -0.0332     0.0423    0.0427
Lag Voting Pop        0.0000     0.1878     0.4186    0.4616
Lag Education         0.0000     0.0975     0.1205    0.0450
Lag Home Ownership    0.0000    -0.1569    -0.3249   -0.3299
Lag Income            0.0000    -0.1079    -0.1802   -0.1410
Intercept             0.9814     0.7495     0.5636    0.4205
α                     0.0000     0.3352     1.4628    0.6550

Table 3.3: Estimates on Election Data

The closest neighbor approach is intermediate to a non-spatial approach (OLS) and a full spatial approach (MESS) or the approximate mixed routine. Note the close agreement between MESS and the mixed routine. Note OLS in this case uses the spatial averages of the basic independent variables as additional independent variables.

None of these operations takes long for the election data (Table 3.5).

Variables                OLS     closest      MESS       Mix

Voting Pop          -34.7211   -30.5020   -28.8899  -29.4018
Education            30.8928    13.4922     7.6654    7.7169
Home Ownership       23.0066    25.4083    26.8836   27.3515
Income               -7.2373    -1.5742     1.7478    1.8952
Lag Voting Pop        0.0000     7.7600    10.7656   12.5507
Lag Education         0.0000     4.5090     3.9785    1.5705
Lag Home Ownership    0.0000    -9.4245   -11.1259  -12.1200
Lag Income            0.0000    -5.1399    -5.5551   -4.6603
Intercept            20.2782    16.0533    10.7480    8.4626
α                     0.0000    24.2762    31.5948   33.7097

Table 3.4: Signed Root Deviances using Election Data

Operation                    Timings in seconds

OLS                          0.0150
Closest AR                   0.0470
MESS                         0.0940
Exact Log-det                0.8750
Mix                          0.0470
Doubly Stochastic Scaling    0.2660

Table 3.5: Timing for operations on Election data (n=3107)

In addition to global models, the toolbox has spatial autoregressive local estimation (SALE). The user chooses the bandwidth (subsample size) by examining cross-validation error at the fringe observations, as in Figure 3.8, or at each observation, as in Figure 3.9.

[Figure 3.8: Plot of Fringe prediction error versus subsample size. Plot title: "Smoothed SALE Recursive Residuals of Fringe Observations"; x-axis: Number of Local Observations, 0 to 500; y-axis: Absolute Error, 0.055 to 0.075.]

[Figure 3.9: Plot of prediction error on center of area versus subsample size. Plot title: "Smoothed SALE Initial Holdout Residuals"; x-axis: Number of Local Observations, 0 to 500; y-axis: Absolute Error, 0.05 to 0.064.]

Usually, there is spatial dependence even in small subsamples, as shown in Figure 3.10.

[Figure 3.10: Spatial dependence parameter estimate versus subsample size. Plot title: "SALE Autoregressive Parameter Estimates"; x-axis: Number of Local Observations, 0 to 500; y-axis: Median Autoregressive Parameter Estimates, 0 to 0.7.]

Local estimation leads to spatially varying parameter estimates, such as those shown in Figure 3.11.

[Figure 3.11: Map of influence of homeownership on voting. Plotted quantity: β3; x-axis: Longitude, -130 to -60; y-axis: Latitude, 25 to 50.]

Page 19: Spatial Statistics Toolbox for Matlab 2.0 - Documentation Only (pdf)

19

0 500 1000 1500 2000 2500 3000

0

500

1000

1500

2000

2500

3000

Observation Number

Obs

erva

tion

Num

ber

Permuted Nonminuszero Weight Pattern for Delaunay

Figure 34 Plot of Non-zeros of Permuted Delaunay Weight Matrix

Sparsity as well as finding an appropriate ordering are key in quicklycomputing the log-determinants used in maximum likelihood The toolboxhas functions for exact computation of the log-determinants (actually inter-polation of exact computations at various points) However users can selectapproximations as well which depend only on sparsity and not upon order-ings The quadratic Chebyshev is the fastest and most approximate technique(Figure 35)

20 CHAPTER 3 A BRIEF SELECTED TOUR OF THE TOOLBOX

minus1 minus08 minus06 minus04 minus02 0 02 04 06 08 1minus4000

minus3500

minus3000

minus2500

minus2000

minus1500

minus1000

minus500

0

α

logminus

dete

rmin

ants

Exact logminusdeterminant Chebyshev approximation Taylor bounds

Taylor Lower minusChebyshev oTaylor Upper +Exact

Figure 35 Plot of Exact log-determinant with Chebyshev Approximationand Taylor Bounds

The Chebyshev approximation appears quite good for positive moderatevalues of the dependence parameter but could use improvement for materi-ally negative values of the spatial dependence parameter Fortunately suchnegative values seem rare in practice

The Monte Carlo log-determinant estimator is quite fast but more accu-rate (Figure 36)

21

minus1 minus08 minus06 minus04 minus02 0 02 04 06 08 1minus500

minus450

minus400

minus350

minus300

minus250

minus200

minus150

minus100

minus50

0

α

logminus

dete

rmin

ants

Exact logminusdeterminant Monte Carlo approximation and confidence limits

Lower ConfidenceMonte CarloUpper ConfidenceExact

Figure 36 Plot of Exact log-determinant with Monte Carlo Approximationand Limits

To see the effects of exact versus approximate log-determinant computa-tions consider Tables 31 and 32 using the 3107 county election data Theestimated autoregressive parameter is only off by 001 from using the approx-imation The approximate method also uses likelihood dominance inferencewhich results in a lower bound to the signed root deviances As shown bythe tables the likelihood dominance SRDs are smaller in magnitude thanthe exact SRDs However they can still document statistical significance formany variables and thus can prove useful in many circumstances

22 CHAPTER 3 A BRIEF SELECTED TOUR OF THE TOOLBOX

Variables Beta Estimates Signed Root Deviances PR of Higher SRDS

Voting Pop -07784 -311227 00000Education 02696 93438 00000Home Ownership 04530 262661 00000Income 00071 00035 09972Intercept 05443 78773 00000Alpha 07250 327719 00000

Table 31 SAR Estimation Results using Chebyshev lndet approximationand likelihood dominance inference

Variables Beta Estimates Signed Root Deviances PR of Higher SRDS

Voting Pop -07806 -321748 00000Education 02746 119438 00000Home Ownership 04525 270073 00000Income 00047 02187 08269Intercept 05528 94908 00000Alpha 07150 348648 00000

Table 32 SAR Estimation results using exact lndet

23

Some of the routines not only yield the maximum of the likelihood func-tion but also profile likelihoods in the dependence parameter α by model asshown in Figure 37

0 01 02 03 04 05 06 07 08 09 1minus7000

minus6800

minus6600

minus6400

minus6200

minus6000

minus5800

minus5600Profile likelihoods vs α for global model and deleteminus1 submodels

Dependence parameter (α)

Pro

file

logminus

likel

ihoo

d

Global likelihoodVoting PopEducationHome OwnershipIncomeIntercept

Figure 37 SAR Profile Likelihoods by Model

The toolbox includes SAR CAR and MESS error models as well asMESS closest neighbor and MIX autoregressive models as shown in Ta-ble 33 and Table 34

24 CHAPTER 3 A BRIEF SELECTED TOUR OF THE TOOLBOX

Variables b OLS b closest b MESS b Mix

Voting Pop -08464 -07489 -07693 -07298Education 05167 02899 01941 01818Home Ownership 04291 04457 04832 04580Income -01439 -00332 00423 00427Lag Voting Pop 00000 01878 04186 04616Lag Education 00000 00975 01205 00450Lag Home Ownership 00000 -01569 -03249 -03299Lag Income 00000 -01079 -01802 -01410Intercept 09814 07495 05636 04205α 00000 03352 14628 06550

Table 33 Estimates on Election Data

The closest neighbor approach is intermediate to a non-spatial approach(OLS) and a full spatial approach (MESS) or the approximate mixed routineNote the close agreement between MESS and the mixed routine Note OLSin this case uses the spatial averages of the basic independent variables asadditional independent variables

None of these operations take long for the election data

25

Variables b OLS b closest b MESS b Mix

Voting Pop -347211 -305020 -288899 -294018Education 308928 134922 76654 77169Home Ownership 230066 254083 268836 273515Income -72373 -15742 17478 18952Lag Voting Pop 00000 77600 107656 125507Lag Education 00000 45090 39785 15705Lag Home Ownership 00000 -94245 -111259 -121200Lag Income 00000 -51399 -55551 -46603Intercept 202782 160533 107480 84626α 00000 242762 315948 337097

Table 34 Signed Root Deviances using Election Data

26 CHAPTER 3 A BRIEF SELECTED TOUR OF THE TOOLBOX

Operation Timings in seconds

OLS 00150Closest AR 00470MESS 00940Exact Log-det 08750Mix 00470Doubly Stochastic Scaling 02660

Table 35 Timing for operations to Election data (n=3107)

In addition to global models the toolbox has spatial autoregressive localestimation (SALE) The user chooses the bandwidth (subsample size) by ex-amining cross-validation error at the fringe observations as in Figure 39 oreach observation as in Figure 38

27

0 50 100 150 200 250 300 350 400 450 5000055

006

0065

007

0075Smoothed SALE Recursive Residuals of Fringe Observations

Number of Local Observations

Abs

olut

e E

rror

Figure 38 Plot of Fringe prediction error versus subsample size

28 CHAPTER 3 A BRIEF SELECTED TOUR OF THE TOOLBOX

0 50 100 150 200 250 300 350 400 450 500005

0052

0054

0056

0058

006

0062

0064Smoothed SALE Initial Holdout Residuals

Number of Local Observations

Abs

olut

e E

rror

Figure 39 Plot of prediction error on center of area versus subsample size

Usually there is spatial dependence even in small subsamples as shownin Figure 310

29

0 50 100 150 200 250 300 350 400 450 5000

01

02

03

04

05

06

07SALE Autoregressive Parameter Estimates

Number of Local Observations

Med

ian

Aut

oreg

ress

ive

Par

amet

er E

stim

ates

Figure 310 Spatial dependence parameter estimate versus subsample size

Local estimation leads to spatially varying parameter estimates such asthose shown in Figure 311

30 CHAPTER 3 A BRIEF SELECTED TOUR OF THE TOOLBOX

minus130 minus120 minus110 minus100 minus90 minus80 minus70 minus6025

30

35

40

45

50

Longitude

Latit

ude

β3

Figure 311 Map of influence of homeownership on voting

In addition a user can obtain an idea of the sensitivity of parameter esti-mates to spatial variation such as summarized in Table 36

31

Percentiles 0 10 25 50 75 90 100

α 03100 04300 05500 06300 06700 07100 07800Voting Pop -10079 -08875 -08075 -06962 -05967 -05299 -04066Education -01871 00517 01142 01788 03135 04923 07118Home Ownership 01530 02964 03419 04253 05161 07660 08461Income -02576 -01644 -01072 -00473 00628 01448 02325Lag Voting Pop -01525 00655 02868 04527 05490 06523 08806Lag Education -05734 -03666 -01812 -00299 00677 01322 03318Lag Home Ownership -07010 -04499 -03278 -02337 -01100 -00382 01880Lag Income -04764 -02374 -01733 -00897 00216 00862 02786Intercept -01605 00473 01463 02847 06612 10772 18902

Table 36 Distribution of local estimates

32 CHAPTER 3 A BRIEF SELECTED TOUR OF THE TOOLBOX

Method Time in seconds

OLS 01560Closest AR 01090MESS 06090Approximate Mix 03280Doubly Stochastic Scaling 71880Delaunay Weight Matrix 54220

Table 37 Times for Different Methods for 57647 Observations

To provide an idea about the performance of the techniques for a largerproblem we estimated a simple hedonic regression over US census tractsThis resulted in 57647 observations Table 37 shows the timings for someof the various operations All of these seem quite fast A user can find aDelaunay weight matrix and estimate a spatial autoregression in under 10seconds on desktop machines

Just selecting a particular weight matrix seems arbitrary Here we take30 nearest neighbors and weight these geometrically A ρ of 1 indicates nodecline in the weight given to further neighbors relative to closer ones whilea ρ of 05 would give half the weight to the second nearest neighbor as itwould to the first nearest neighbor Thus ρ allows changes in the effectivenumber of neighbors used without actually varying the number of neighborsIt often makes sense in this approach to set the number of neighbors to afairly high level (such as 30) Table 38 shows the effect of varying ρ on theprofile log-likelihood Small changes in ρ make large changes in the profilelog-likelihood evidence of the importance of this parameter

It did not take overly long to find the nearest neighbors or the optimalρ even with doubly stochastic scalings of the weight matrix as shown byTable 39

These operations lead to a table of profile log-likelihoods (Table 310)across weight matrices Examining the MESS loglikelihoods over ρ and De-launay and contrasting it with the loglikelihood from appling OLS to the

33

ρ log-likelihood

08000 -231895042608500 -230968848709000 -230828641309500 -231699108910000 -2334345371

Table 38 Likelihoods across ρ for Doubly Stochastic Scaling

Operation Time in seconds

NN computation 314060RS Time to find optimum ρ 567810DS Time to find optimum ρ 948750

Table 39 Times for Optimizing the Likelihood over ρ for Both Scalings

basic non-spatial independent variables demonstrates that even a subopti-mal choice of ρ or Delaunay still dominates the use of an aspatial model inthis case and that optimizing over ρ dominates an arbitrary choice of weightmatrices (Table 310) Moreover the doubly stochastic (DS) scaling helpedgreatly for this example over the regular scaling (RS) In addition inspectionof aspatial OLS versus MESS with an optimal selection of ρ in Table 311shows clear differences among the approaches Note the land area variablebecame insignificant after modeling space

It is not difficult to estimate a spatial autoregression with over one millionobservations In fact the toolbox provides an example (big_one subdirec-tory) under the dataset directory whereby a one million observation spatialautoregression is estimated in just under 20 seconds It took 13063 secondsto find the weight matrix 6024 seconds to simulate the dependent variableand 1942 seconds to estimate the autoregression

Variable                            Value

Aspatial likelihood (OLS)           -266505.1663
Closest Neighbor                    -243986.1676
RS Delaunay maximum likelihood      -244509.8257
RS Maximum likelihood across ρ      -254216.3172
RS Optimum ρ                        0.9000
DS Delaunay maximum likelihood      -235626.6213
DS Maximum likelihood across ρ      -230828.6413
DS Optimum ρ                        0.9000

Table 3.10: Likelihoods Across Doubly Stochastic and Regular Scalings

Variables                b OLS      SRD OLS     b MESS     SRD MESS

Land area                -0.0850    -96.8455    -0.0008    -0.7159
Pop                      0.1146     36.8592     0.0239     12.7630
Per cap Income           1.0837     208.2192    0.6786     152.7645
Age                      -0.1269    -34.2809    -0.1384    -45.0489
Lag Land area                                   -0.0178    -14.5650
Lag Pop                                         0.0165     5.2437
Lag Per cap Income                              -0.3702    -65.2683
Lag Age                                         0.1088     27.5807
Intercept                1.2236     22.9837     -0.5988    -15.2122
α                                               3.0986     253.2835
ρ (relative to ρ = 1)                           0.90       72.1927
Nearest Neighbors                               30
parameters               5                      11

Table 3.11: OLS versus MESS Results Using Optimal ρ for Doubly Stochastic Scaling
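The signed root deviance reported for ρ in Table 3.11 can be checked against the profile log-likelihoods in Table 3.8. The Python sketch below assumes the usual definition, the signed square root of twice the gap between the unrestricted and restricted log-likelihoods. The function name is hypothetical, and the two log-likelihood values are copied from Table 3.8.

```python
import math


def signed_root_deviance(loglik_full, loglik_restricted, sign=1.0):
    """Signed square root of twice the log-likelihood gap (assumed definition)."""
    deviance = 2.0 * (loglik_full - loglik_restricted)
    return math.copysign(math.sqrt(max(deviance, 0.0)), sign)


# Profile log-likelihoods from Table 3.8 (doubly stochastic scaling).
loglik_rho_090 = -230828.6413   # optimum, rho = 0.90
loglik_rho_100 = -233434.5371   # restriction rho = 1

srd_rho = signed_root_deviance(loglik_rho_090, loglik_rho_100)
print(round(srd_rho, 4))
```

The result is approximately 72.19, in line with the SRD listed for ρ (relative to ρ = 1).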

Chapter 4

References

If you need to know more about spatial statistics or about some of the specific routines, you may wish to examine:

Anselin, Luc. (1988) Spatial Econometrics: Methods and Models, Dordrecht: Kluwer Academic Publishers.

Barry, Ronald, and R. Kelley Pace, "A Monte Carlo Estimator of the Log Determinant of Large Sparse Matrices," Linear Algebra and its Applications, Volume 289, Number 1-3, 1999, p. 41-54.

Chen, Jian-Shen, and Robert Jennrich. (1996) "The Signed Root Deviance Profile and Confidence Intervals in Maximum Likelihood Analysis," Journal of the American Statistical Association, Volume 91, Number 435, p. 993-998.

Christensen, Ronald. (1991) Linear Models for Multivariate, Time Series, and Spatial Data, New York: Springer-Verlag.

Cressie, Noel A.C. (1993) Statistics for Spatial Data, Revised ed., New York: John Wiley.

Dubin, Robin A. (1988) "Estimation of Regression Coefficients in the Presence of Spatially Autocorrelated Error Terms," Review of Economics and Statistics, 70, 466-474.

Haining, Robert. (1990) Spatial Data Analysis in the Social and Environmental Sciences, Cambridge.

LeSage, James, and R. Kelley Pace, "Spatial Dependence in Data Mining," Data Mining for Scientific and Engineering Applications, Edited by Robert L. Grossman, Chandrika Kamath, Philip Kegelmeyer, Vipin Kumar, and Raju R. Namburu, Kluwer Academic Publishing, 2001.

LeSage, James, and R. Kelley Pace, "Spatial Probit and Tobit," Spatial Statistics and Spatial Econometrics, Edited by Art Getis, Palgrave, 2003.

Li, Bin. (1995) "Implementing Spatial Statistics on Parallel Computers," in: Arlinghaus, S., ed. Practical Handbook of Spatial Statistics (CRC Press, Boca Raton), pp. 107-148.

Ord, J.K. (1975) "Estimation Methods for Models of Spatial Interaction," Journal of the American Statistical Association, 70, 120-126.

Pace, R. Kelley, and Ronald Barry. (1997) "Fast CARs," Journal of Statistical Computation and Simulation, 59, p. 123-147.

Pace, R. Kelley, and Ronald Barry. (1997) "Quick Computation of Regressions with a Spatially Autoregressive Dependent Variable," Geographical Analysis, 29, 232-247.

Pace, R. Kelley, and Dongya Zou, "Closed-Form Maximum Likelihood Estimates of Nearest Neighbor Spatial Dependence," Geographical Analysis, Volume 32, Number 2, April 2000, p. 154-172.

Pace, R. Kelley, and Ronald Barry, O.W. Gilley, C.F. Sirmans, "A Method for Spatial-temporal Forecasting with an Application to Real Estate Prices," International Journal of Forecasting, Volume 16, Number 2, April-June 2000, p. 229-246.

Pace, R. Kelley, and James P. LeSage, "Semiparametric Maximum Likelihood Estimates of Spatial Dependence," Geographical Analysis, Volume 34, Number 1, January 2002, p. 75-90.

Pace, R. Kelley, and James LeSage, "Likelihood Dominance Spatial Inference," forthcoming, Geographical Analysis, in January 2003.

Pace, R. Kelley, and James LeSage, "Spatial Autoregressive Local Estimation," Spatial Statistics and Spatial Econometrics, Edited by Art Getis, Palgrave, 2003.

Pace, R. Kelley, and James LeSage, "Chebyshev Approximation of Log-determinants of Spatial Weight Matrices," forthcoming in Computational Statistics and Data Analysis.

Ripley, Brian D. (1981) Spatial Statistics, New York: John Wiley.

www.spatial-statistics.com


Figure 3.5: Plot of Exact log-determinant with Chebyshev Approximation and Taylor Bounds
(Plot not reproduced. Horizontal axis: α, from -1 to 1; vertical axis: log-determinants, from -4000 to 0; series: Taylor lower bound, Chebyshev approximation, Taylor upper bound, exact.)

The Chebyshev approximation appears quite good for positive, moderate values of the dependence parameter, but could use improvement for materially negative values of the spatial dependence parameter. Fortunately, such negative values seem rare in practice.
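To make the idea of a series approximation to the log-determinant concrete, here is a small Python sketch. It is not the toolbox's Chebyshev routine. It uses a plain truncated Taylor series, ln|I - αW| = -Σ_k α^k tr(W^k)/k, on an assumed toy weight matrix: a ring in which every observation has two neighbors of weight 1/2. The eigenvalues of that matrix, cos(2πj/n), are known in closed form, so the exact value is available for comparison. All function names are hypothetical.

```python
import math


def ring_eigenvalues(n):
    """Eigenvalues of the row-stochastic ring weight matrix W
    (two neighbors per observation, weight 1/2 each)."""
    return [math.cos(2.0 * math.pi * j / n) for j in range(n)]


def exact_logdet(alpha, eigs):
    """ln|I - alpha*W| computed as the sum of ln(1 - alpha*lambda_j)."""
    return sum(math.log(1.0 - alpha * lam) for lam in eigs)


def taylor_logdet(alpha, eigs, order):
    """Truncated series -sum_{k=1..order} alpha**k * tr(W**k) / k,
    with tr(W**k) obtained from the eigenvalues."""
    total = 0.0
    for k in range(1, order + 1):
        trace_k = sum(lam ** k for lam in eigs)
        total -= alpha ** k * trace_k / k
    return total


if __name__ == "__main__":
    eigs = ring_eigenvalues(100)
    for alpha in (-0.9, -0.5, 0.5, 0.9):
        print(alpha,
              exact_logdet(alpha, eigs),
              taylor_logdet(alpha, eigs, 4),
              taylor_logdet(alpha, eigs, 30))
```

For this even-sized ring the odd traces vanish and the even ones are positive, so every omitted term is negative and the truncated series lies above the exact log-determinant. The gap widens as |α| approaches 1, which is why low-order series work best for moderate dependence.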

The Monte Carlo log-determinant estimator is quite fast but more accurate (Figure 3.6).

Figure 3.6: Plot of Exact log-determinant with Monte Carlo Approximation and Limits
(Plot not reproduced. Horizontal axis: α, from -1 to 1; vertical axis: log-determinants, from -500 to 0; series: lower confidence limit, Monte Carlo estimate, upper confidence limit, exact.)

To see the effects of exact versus approximate log-determinant computations, consider Tables 3.1 and 3.2, using the 3107 county election data. The estimated autoregressive parameter is only off by 0.01 from using the approximation. The approximate method also uses likelihood dominance inference, which results in a lower bound to the signed root deviances. As shown by the tables, the likelihood dominance SRDs are smaller in magnitude than the exact SRDs. However, they can still document statistical significance for many variables and thus can prove useful in many circumstances.

Variables          Beta Estimates    Signed Root Deviances    PR of Higher SRDs

Voting Pop         -0.7784           -31.1227                 0.0000
Education          0.2696            9.3438                   0.0000
Home Ownership     0.4530            26.2661                  0.0000
Income             0.0071            0.0035                   0.9972
Intercept          0.5443            7.8773                   0.0000
Alpha              0.7250            32.7719                  0.0000

Table 3.1: SAR Estimation Results using Chebyshev lndet approximation and likelihood dominance inference

Variables          Beta Estimates    Signed Root Deviances    PR of Higher SRDs

Voting Pop         -0.7806           -32.1748                 0.0000
Education          0.2746            11.9438                  0.0000
Home Ownership     0.4525            27.0073                  0.0000
Income             0.0047            0.2187                   0.8269
Intercept          0.5528            9.4908                   0.0000
Alpha              0.7150            34.8648                  0.0000

Table 3.2: SAR Estimation results using exact lndet
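The last column of Tables 3.1 and 3.2 can be reproduced from the SRDs under the assumption that a squared SRD is referred to a chi-squared distribution with one degree of freedom, which is equivalent to a two-sided standard normal tail probability. The Python sketch below uses a hypothetical function name and is an illustration, not the toolbox code. The SRD values are the Income rows of the two tables.

```python
import math


def srd_p_value(srd):
    """Two-sided tail probability P(|Z| > |srd|) for a standard normal Z,
    equal to the chi-squared(1) tail probability of srd**2."""
    return math.erfc(abs(srd) / math.sqrt(2.0))


# Income SRDs from Table 3.1 (likelihood dominance) and Table 3.2 (exact).
for label, srd in (("likelihood dominance", 0.0035), ("exact", 0.2187)):
    print(label, round(srd_p_value(srd), 4))
```

Both approaches leave Income far from significant, while SRDs of the size reported for the other variables give tail probabilities that print as 0.0000.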

Some of the routines not only yield the maximum of the likelihood function but also profile likelihoods in the dependence parameter α by model, as shown in Figure 3.7.

Figure 3.7: SAR Profile Likelihoods by Model
(Plot not reproduced. Title: Profile likelihoods vs α for global model and delete-1 submodels; horizontal axis: dependence parameter (α), from 0 to 1; vertical axis: profile log-likelihood, from -7000 to -5600; series: global likelihood, Voting Pop, Education, Home Ownership, Income, Intercept.)

The toolbox includes SAR, CAR, and MESS error models, as well as MESS, closest neighbor, and MIX autoregressive models, as shown in Table 3.3 and Table 3.4.

Variables             b OLS      b closest    b MESS     b Mix

Voting Pop            -0.8464    -0.7489      -0.7693    -0.7298
Education             0.5167     0.2899       0.1941     0.1818
Home Ownership        0.4291     0.4457       0.4832     0.4580
Income                -0.1439    -0.0332      0.0423     0.0427
Lag Voting Pop        0.0000     0.1878       0.4186     0.4616
Lag Education         0.0000     0.0975       0.1205     0.0450
Lag Home Ownership    0.0000     -0.1569      -0.3249    -0.3299
Lag Income            0.0000     -0.1079      -0.1802    -0.1410
Intercept             0.9814     0.7495       0.5636     0.4205
α                     0.0000     0.3352       1.4628     0.6550

Table 3.3: Estimates on Election Data

The closest neighbor approach is intermediate to a non-spatial approach (OLS) and a full spatial approach (MESS) or the approximate mixed routine. Note the close agreement between MESS and the mixed routine. Note that OLS in this case uses the spatial averages of the basic independent variables as additional independent variables.
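The spatial averages mentioned above are spatial lags of the independent variables: for each observation, a weighted average of the variable over its neighbors. A minimal Python sketch follows, assuming equal weights over a hypothetical neighbor list, so that each row of the implied weight matrix sums to one. The data and names are illustrative only and do not come from the toolbox.

```python
def spatial_lag(values, neighbors):
    """Average of each observation's neighbors: row i of W*x when W gives
    equal weight to every neighbor of i and the rows of W sum to one."""
    lagged = []
    for nbrs in neighbors:
        lagged.append(sum(values[j] for j in nbrs) / len(nbrs))
    return lagged


# Four observations on a ring; each has the two adjacent observations as neighbors.
neighbors = [[1, 3], [0, 2], [1, 3], [2, 0]]
x = [1.0, 2.0, 3.0, 4.0]
lag_x = spatial_lag(x, neighbors)
print(lag_x)  # [3.0, 2.0, 3.0, 2.0]
```

In a regression with lagged independent variables, both `x` and `lag_x` would enter as columns of the design matrix, which corresponds to the "Lag" rows in the tables.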

None of these operations take long for the election data.

Variables             SRD OLS     SRD closest    SRD MESS    SRD Mix

Voting Pop            -34.7211    -30.5020       -28.8899    -29.4018
Education             30.8928     13.4922        7.6654      7.7169
Home Ownership        23.0066     25.4083        26.8836     27.3515
Income                -7.2373     -1.5742        1.7478      1.8952
Lag Voting Pop        0.0000      7.7600         10.7656     12.5507
Lag Education         0.0000      4.5090         3.9785      1.5705
Lag Home Ownership    0.0000      -9.4245        -11.1259    -12.1200
Lag Income            0.0000      -5.1399        -5.5551     -4.6603
Intercept             20.2782     16.0533        10.7480     8.4626
α                     0.0000      24.2762        31.5948     33.7097

Table 3.4: Signed Root Deviances using Election Data

Operation                    Timings in seconds

OLS                          0.0150
Closest AR                   0.0470
MESS                         0.0940
Exact Log-det                0.8750
Mix                          0.0470
Doubly Stochastic Scaling    0.2660

Table 3.5: Timing for operations to Election data (n=3107)

In addition to global models, the toolbox has spatial autoregressive local estimation (SALE). The user chooses the bandwidth (subsample size) by examining cross-validation error at the fringe observations, as in Figure 3.8, or at each observation, as in Figure 3.9.

Figure 3.8: Plot of Fringe prediction error versus subsample size
(Plot not reproduced. Title: Smoothed SALE Recursive Residuals of Fringe Observations; horizontal axis: number of local observations, from 0 to 500; vertical axis: absolute error, from 0.055 to 0.075.)

Figure 3.9: Plot of prediction error on center of area versus subsample size
(Plot not reproduced. Title: Smoothed SALE Initial Holdout Residuals; horizontal axis: number of local observations, from 0 to 500; vertical axis: absolute error, from 0.05 to 0.064.)

Usually there is spatial dependence even in small subsamples, as shown in Figure 3.10.

Figure 3.10: Spatial dependence parameter estimate versus subsample size
(Plot not reproduced. Title: SALE Autoregressive Parameter Estimates; horizontal axis: number of local observations, from 0 to 500; vertical axis: median autoregressive parameter estimates, from 0 to 0.7.)

Local estimation leads to spatially varying parameter estimates such as those shown in Figure 3.11.

Figure 3.11: Map of influence of homeownership on voting
(Map not reproduced. Horizontal axis: longitude, from -130 to -60; vertical axis: latitude, from 25 to 50; mapped quantity: β3.)

In addition, a user can obtain an idea of the sensitivity of parameter estimates to spatial variation, such as summarized in Table 3.6.

Percentiles           0          10         25         50         75         90         100

α                     0.3100     0.4300     0.5500     0.6300     0.6700     0.7100     0.7800
Voting Pop            -1.0079    -0.8875    -0.8075    -0.6962    -0.5967    -0.5299    -0.4066
Education             -0.1871    0.0517     0.1142     0.1788     0.3135     0.4923     0.7118
Home Ownership        0.1530     0.2964     0.3419     0.4253     0.5161     0.7660     0.8461
Income                -0.2576    -0.1644    -0.1072    -0.0473    0.0628     0.1448     0.2325
Lag Voting Pop        -0.1525    0.0655     0.2868     0.4527     0.5490     0.6523     0.8806
Lag Education         -0.5734    -0.3666    -0.1812    -0.0299    0.0677     0.1322     0.3318
Lag Home Ownership    -0.7010    -0.4499    -0.3278    -0.2337    -0.1100    -0.0382    0.1880
Lag Income            -0.4764    -0.2374    -0.1733    -0.0897    0.0216     0.0862     0.2786
Intercept             -0.1605    0.0473     0.1463     0.2847     0.6612     1.0772     1.8902

Table 3.6: Distribution of local estimates

Method                       Time in seconds

OLS                          0.1560
Closest AR                   0.1090
MESS                         0.6090
Approximate Mix              0.3280
Doubly Stochastic Scaling    7.1880
Delaunay Weight Matrix       5.4220

Table 3.7: Times for Different Methods for 57647 Observations

To provide an idea about the performance of the techniques for a larger problem, we estimated a simple hedonic regression over US census tracts. This resulted in 57647 observations. Table 3.7 shows the timings for some of the various operations. All of these seem quite fast. A user can find a Delaunay weight matrix and estimate a spatial autoregression in under 10 seconds on desktop machines.

Page 22: Spatial Statistics Toolbox for Matlab 2.0 - Documentation Only (pdf)

22 CHAPTER 3 A BRIEF SELECTED TOUR OF THE TOOLBOX

Variables         Beta Estimates   Signed Root Deviances   Pr. of Higher SRDs
Voting Pop               -0.7784                -31.1227               0.0000
Education                 0.2696                  9.3438               0.0000
Home Ownership            0.4530                 26.2661               0.0000
Income                    0.0071                  0.0035               0.9972
Intercept                 0.5443                  7.8773               0.0000
Alpha                     0.7250                 32.7719               0.0000

Table 3.1: SAR estimation results using the Chebyshev lndet approximation and likelihood dominance inference

Variables         Beta Estimates   Signed Root Deviances   Pr. of Higher SRDs
Voting Pop               -0.7806                -32.1748               0.0000
Education                 0.2746                 11.9438               0.0000
Home Ownership            0.4525                 27.0073               0.0000
Income                    0.0047                  0.2187               0.8269
Intercept                 0.5528                  9.4908               0.0000
Alpha                     0.7150                 34.8648               0.0000

Table 3.2: SAR estimation results using the exact lndet
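
The last column of Tables 3.1 and 3.2 can be related to the signed root deviance (SRD) column with a short calculation. The sketch below is in Python rather than the toolbox's Matlab and is not the toolbox's own code. It assumes that the reported probability is the two-sided standard normal tail area beyond |SRD|; the function name pr_higher_srd is hypothetical. Under that assumption, the Income rows (SRD 0.0035 and 0.2187) give probabilities that agree with the tabulated 0.9972 and 0.8269 to four decimals.

```python
import math

def pr_higher_srd(srd: float) -> float:
    """Two-sided standard normal tail probability of exceeding |srd| (assumed definition)."""
    return math.erfc(abs(srd) / math.sqrt(2.0))

# SRD values taken from the Income and Education rows of Tables 3.1 and 3.2
for srd in (0.0035, 0.2187, 9.3438):
    print(f"SRD = {srd:8.4f}   Pr = {pr_higher_srd(srd):.4f}")
```

Large SRDs such as those for Education or Alpha produce probabilities that round to 0.0000, as in the tables.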

Some of the routines yield not only the maximum of the likelihood function but also profile likelihoods in the dependence parameter α by model, as shown in Figure 3.7.

[Figure: profile log-likelihood (vertical axis, about -7000 to -5600) versus dependence parameter α (horizontal axis, 0 to 1). Plot title: Profile likelihoods vs. α for global model and delete-1 submodels. Legend: Global likelihood, Voting Pop, Education, Home Ownership, Income, Intercept.]

Figure 3.7: SAR Profile Likelihoods by Model
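
To make the idea of a profile likelihood in α concrete, here is a minimal sketch in Python (not the toolbox's Matlab code). It makes several assumptions that are not in the original: the data are simulated rather than the election data, the weight matrix D is a ring in which each site averages its two neighbors (so the eigenvalues of D are known in closed form and the log-determinant is exact), and the model has only an intercept. All function names are hypothetical. For each α on a grid, the concentrated log-likelihood is ln|I - αD| - (n/2) ln(SSE(α)/n), where SSE(α) comes from the residuals of y - αDy on the intercept.

```python
import math
import random

def ring_eigenvalues(n):
    """Eigenvalues of a ring weight matrix where each site averages its two neighbors."""
    return [math.cos(2.0 * math.pi * k / n) for k in range(n)]

def ring_lag(v):
    """Spatial lag D v on the ring: mean of the two adjacent values."""
    n = len(v)
    return [0.5 * (v[i - 1] + v[(i + 1) % n]) for i in range(n)]

def simulate_sar(n, alpha, seed=0, iters=300):
    """Draw y = (I - alpha D)^(-1) eps by fixed-point iteration (converges for |alpha| < 1)."""
    rng = random.Random(seed)
    eps = [rng.gauss(0.0, 1.0) for _ in range(n)]
    y = eps[:]
    for _ in range(iters):
        y = [alpha * d + e for d, e in zip(ring_lag(y), eps)]
    return y

def demean(v):
    """Residuals from regressing v on an intercept."""
    m = sum(v) / len(v)
    return [x - m for x in v]

def profile_loglik(y, alphas):
    """Concentrated SAR log-likelihood (up to a constant) for an intercept-only model."""
    n = len(y)
    lam = ring_eigenvalues(n)
    ey = demean(y)               # residuals of y on the intercept
    edy = demean(ring_lag(y))    # residuals of Dy on the intercept
    out = []
    for a in alphas:
        lndet = sum(math.log(1.0 - a * l) for l in lam)
        sse = sum((u - a * w) ** 2 for u, w in zip(ey, edy))
        out.append(lndet - 0.5 * n * math.log(sse / n))
    return out

alphas = [i / 100.0 for i in range(100)]      # grid on [0, 0.99]
y = simulate_sar(n=400, alpha=0.7)            # true alpha of 0.7 is an arbitrary choice
profile = profile_loglik(y, alphas)
alpha_hat = alphas[max(range(len(alphas)), key=profile.__getitem__)]
print("grid maximizer of the profile log-likelihood:", alpha_hat)
```

Plotting profile against alphas gives a curve of the kind shown in Figure 3.7. Repeating the calculation with one regressor deleted at a time would give the delete-1 submodel curves.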

The toolbox includes SAR, CAR, and MESS error models as well as MESS, closest neighbor, and MIX autoregressive models, as shown in Table 3.3 and Table 3.4.

Variables             b OLS   b closest    b MESS     b Mix
Voting Pop          -0.8464     -0.7489   -0.7693   -0.7298
Education            0.5167      0.2899    0.1941    0.1818
Home Ownership       0.4291      0.4457    0.4832    0.4580
Income              -0.1439     -0.0332    0.0423    0.0427
Lag Voting Pop       0.0000      0.1878    0.4186    0.4616
Lag Education        0.0000      0.0975    0.1205    0.0450
Lag Home Ownership   0.0000     -0.1569   -0.3249   -0.3299
Lag Income           0.0000     -0.1079   -0.1802   -0.1410
Intercept            0.9814      0.7495    0.5636    0.4205
α                    0.0000      0.3352    1.4628    0.6550

Table 3.3: Estimates on Election Data

The closest neighbor approach is intermediate between a non-spatial approach (OLS) and a full spatial approach (MESS) or the approximate mixed routine. Note the close agreement between MESS and the mixed routine. Note that OLS in this case uses the spatial averages of the basic independent variables as additional independent variables.
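
The "Lag" rows in Tables 3.3 and 3.4 refer to spatial averages of the independent variables. As a rough illustration (in Python, not the toolbox's Matlab code), the sketch below row-standardizes a small neighbor list and averages a variable over each site's neighbors. The three sites, their neighbor lists, the education values, and the function names are all made up for the example.

```python
def row_standardize(neighbors):
    """Turn {site: [neighbor sites]} into row-stochastic weights {site: {neighbor: weight}}."""
    return {i: {j: 1.0 / len(js) for j in js} for i, js in neighbors.items()}

def spatial_lag(weights, x):
    """Weighted average of x over each site's neighbors (one row of D times x per site)."""
    return {i: sum(w * x[j] for j, w in row.items()) for i, row in weights.items()}

# Hypothetical toy data: three sites with an education share each
neighbors = {"a": ["b", "c"], "b": ["a"], "c": ["a", "b"]}
education = {"a": 0.20, "b": 0.35, "c": 0.50}

lag_education = spatial_lag(row_standardize(neighbors), education)
print(lag_education)
```

Each lagged value depends only on the neighbors' values, so it can enter a regression as an additional independent variable alongside the original one.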

None of these operations takes long for the election data.

Variables              b OLS   b closest     b MESS      b Mix
Voting Pop          -34.7211    -30.5020   -28.8899   -29.4018
Education            30.8928     13.4922     7.6654     7.7169
Home Ownership       23.0066     25.4083    26.8836    27.3515
Income               -7.2373     -1.5742     1.7478     1.8952
Lag Voting Pop        0.0000      7.7600    10.7656    12.5507
Lag Education         0.0000      4.5090     3.9785     1.5705
Lag Home Ownership    0.0000     -9.4245   -11.1259   -12.1200
Lag Income            0.0000     -5.1399    -5.5551    -4.6603
Intercept            20.2782     16.0533    10.7480     8.4626
α                     0.0000     24.2762    31.5948    33.7097

Table 3.4: Signed Root Deviances using Election Data

Operation                    Time in seconds
OLS                                   0.0150
Closest AR                            0.0470
MESS                                  0.0940
Exact Log-det                         0.8750
Mix                                   0.0470
Doubly Stochastic Scaling             0.2660

Table 3.5: Timing for operations on the Election data (n = 3107)

In addition to global models, the toolbox has spatial autoregressive local estimation (SALE). The user chooses the bandwidth (subsample size) by examining cross-validation error at the fringe observations, as in Figure 3.8, or at each observation (the center of each local area), as in Figure 3.9.

[Figure: absolute error (vertical axis, about 0.055 to 0.075) versus number of local observations (horizontal axis, 0 to 500). Plot title: Smoothed SALE Recursive Residuals of Fringe Observations.]

Figure 3.8: Plot of fringe prediction error versus subsample size

[Figure: absolute error (vertical axis, about 0.05 to 0.064) versus number of local observations (horizontal axis, 0 to 500). Plot title: Smoothed SALE Initial Holdout Residuals.]

Figure 3.9: Plot of prediction error on center of area versus subsample size

Usually there is spatial dependence even in small subsamples, as shown in Figure 3.10.

[Figure: median autoregressive parameter estimates (vertical axis, 0 to 0.7) versus number of local observations (horizontal axis, 0 to 500). Plot title: SALE Autoregressive Parameter Estimates.]

Figure 3.10: Spatial dependence parameter estimate versus subsample size

Local estimation leads to spatially varying parameter estimates such as those shown in Figure 3.11.

[Figure: map of β3 by location, with longitude on the horizontal axis (-130 to -60) and latitude on the vertical axis (25 to 50).]

Figure 3.11: Map of influence of homeownership on voting

In addition, a user can obtain an idea of the sensitivity of parameter estimates to spatial variation, such as summarized in Table 3.6.

Percentiles                0        10        25        50        75        90       100
α                     0.3100    0.4300    0.5500    0.6300    0.6700    0.7100    0.7800
Voting Pop           -1.0079   -0.8875   -0.8075   -0.6962   -0.5967   -0.5299   -0.4066
Education            -0.1871    0.0517    0.1142    0.1788    0.3135    0.4923    0.7118
Home Ownership        0.1530    0.2964    0.3419    0.4253    0.5161    0.7660    0.8461
Income               -0.2576   -0.1644   -0.1072   -0.0473    0.0628    0.1448    0.2325
Lag Voting Pop       -0.1525    0.0655    0.2868    0.4527    0.5490    0.6523    0.8806
Lag Education        -0.5734   -0.3666   -0.1812   -0.0299    0.0677    0.1322    0.3318
Lag Home Ownership   -0.7010   -0.4499   -0.3278   -0.2337   -0.1100   -0.0382    0.1880
Lag Income           -0.4764   -0.2374   -0.1733   -0.0897    0.0216    0.0862    0.2786
Intercept            -0.1605    0.0473    0.1463    0.2847    0.6612    1.0772    1.8902

Table 3.6: Distribution of local estimates

Method                       Time in seconds
OLS                                   0.1560
Closest AR                            0.1090
MESS                                  0.6090
Approximate Mix                       0.3280
Doubly Stochastic Scaling             7.1880
Delaunay Weight Matrix                5.4220

Table 3.7: Times for Different Methods for 57,647 Observations

To provide an idea about the performance of the techniques for a larger problem, we estimated a simple hedonic regression over US census tracts. This resulted in 57,647 observations. Table 3.7 shows the timings for some of the various operations. All of these seem quite fast. A user can find a Delaunay weight matrix and estimate a spatial autoregression in under 10 seconds on desktop machines.

Just selecting a particular weight matrix seems arbitrary. Here we take 30 nearest neighbors and weight these geometrically. A ρ of 1 indicates no decline in the weight given to further neighbors relative to closer ones, while a ρ of 0.5 would give half the weight to the second nearest neighbor as it would to the first nearest neighbor. Thus, ρ allows changes in the effective number of neighbors used without actually varying the number of neighbors. It often makes sense in this approach to set the number of neighbors to a fairly high level (such as 30). Table 3.8 shows the effect of varying ρ on the profile log-likelihood. Small changes in ρ make large changes in the profile log-likelihood, evidence of the importance of this parameter.
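
The geometric weighting just described can be sketched in a few lines of Python (not the toolbox's Matlab code). The sketch assumes the k-th nearest neighbor receives a weight proportional to ρ^(k-1) and that the weights are rescaled to sum to one; the rescaling, the function names, and the use of the inverse sum of squared weights as a measure of the "effective number of neighbors" are assumptions made for illustration only.

```python
def geometric_weights(rho, m=30):
    """Weights proportional to rho**(k-1) for the k-th nearest neighbor, scaled to sum to 1."""
    raw = [rho ** k for k in range(m)]
    total = sum(raw)
    return [w / total for w in raw]

def effective_neighbors(weights):
    """Inverse sum of squared weights: equals m for equal weights, shrinks as weight concentrates."""
    return 1.0 / sum(w * w for w in weights)

for rho in (1.0, 0.9, 0.5):
    w = geometric_weights(rho)
    ratio = w[1] / w[0]          # weight of 2nd nearest neighbor relative to the 1st
    print(f"rho = {rho:.1f}   ratio = {ratio:.2f}   effective neighbors = {effective_neighbors(w):.2f}")
```

With ρ = 1 all 30 neighbors count equally, while with ρ = 0.5 nearly all of the weight falls on the first few neighbors even though 30 are still stored.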

It did not take overly long to find the nearest neighbors or the optimal ρ, even with doubly stochastic scalings of the weight matrix, as shown by Table 3.9.
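
For readers unfamiliar with the term, a doubly stochastic scaling rescales a nonnegative weight matrix so that every row and every column sums to one. The sketch below, in Python, uses the standard Sinkhorn-Knopp alternating normalization as a stand-in; it is not claimed to be the algorithm the toolbox uses, and the 4 by 4 symmetric matrix and the function name are hypothetical.

```python
def sinkhorn(w, iters=500):
    """Alternately rescale rows and columns of a nonnegative matrix so both sum to 1."""
    n = len(w)
    a = [[float(v) for v in row] for row in w]
    for _ in range(iters):
        for i in range(n):                                   # row step
            s = sum(a[i])
            a[i] = [v / s for v in a[i]]
        col = [sum(a[i][j] for i in range(n)) for j in range(n)]
        for i in range(n):                                   # column step
            a[i] = [a[i][j] / col[j] for j in range(n)]
    return a

# Hypothetical symmetric weights with a zero diagonal (no site neighbors itself)
w = [[0, 1, 2, 1],
     [1, 0, 1, 3],
     [2, 1, 0, 1],
     [1, 3, 1, 0]]

ds = sinkhorn(w)
row_sums = [sum(r) for r in ds]
col_sums = [sum(ds[i][j] for i in range(4)) for j in range(4)]
print(row_sums, col_sums)
```

Unlike ordinary row standardization, this scaling keeps a symmetric weight matrix symmetric, which is one reason it can be attractive. Note that the iteration needs a suitable sparsity pattern to converge; the dense off-diagonal example above was chosen so that it does.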

ρ         log-likelihood
0.8000      -231895.0426
0.8500      -230968.8487
0.9000      -230828.6413
0.9500      -231699.1089
1.0000      -233434.5371

Table 3.8: Likelihoods across ρ for Doubly Stochastic Scaling

Operation                      Time in seconds
NN computation                         31.4060
RS Time to find optimum ρ              56.7810
DS Time to find optimum ρ              94.8750

Table 3.9: Times for Optimizing the Likelihood over ρ for Both Scalings

These operations lead to a table of profile log-likelihoods (Table 3.10) across weight matrices. Examining the MESS log-likelihoods over ρ and Delaunay and contrasting them with the log-likelihood from applying OLS to the basic non-spatial independent variables demonstrates that even a suboptimal choice of ρ or Delaunay still dominates the use of an aspatial model in this case, and that optimizing over ρ dominates an arbitrary choice of weight matrices (Table 3.10). Moreover, the doubly stochastic (DS) scaling helped greatly for this example over the regular scaling (RS). In addition, inspection of aspatial OLS versus MESS with an optimal selection of ρ in Table 3.11 shows clear differences among the approaches. Note that the land area variable became insignificant after modeling space.

It is not difficult to estimate a spatial autoregression with over one million observations. In fact, the toolbox provides an example (big_one subdirectory) under the dataset directory whereby a one-million-observation spatial autoregression is estimated in just under 20 seconds. It took 130.63 seconds to find the weight matrix, 60.24 seconds to simulate the dependent variable, and 19.42 seconds to estimate the autoregression.

Variable                                    Value
Aspatial likelihood (OLS)            -266505.1663
Closest Neighbor                     -243986.1676
RS Delaunay maximum likelihood       -244509.8257
RS Maximum likelihood across ρ       -254216.3172
RS Optimum ρ                               0.9000
DS Delaunay maximum likelihood       -235626.6213
DS Maximum likelihood across ρ       -230828.6413
DS Optimum ρ                               0.9000

Table 3.10: Likelihoods Across Doubly Stochastic and Regular Scalings

Variables                   b OLS     SRD OLS    b MESS    SRD MESS
Land area                 -0.0850    -96.8455   -0.0008     -0.7159
Pop                        0.1146     36.8592    0.0239     12.7630
Per cap Income             1.0837    208.2192    0.6786    152.7645
Age                       -0.1269    -34.2809   -0.1384    -45.0489
Lag Land area                                   -0.0178    -14.5650
Lag Pop                                          0.0165      5.2437
Lag Per cap Income                              -0.3702    -65.2683
Lag Age                                          0.1088     27.5807
Intercept                  1.2236     22.9837   -0.5988    -15.2122
α                                                3.0986    253.2835
ρ (relative to ρ = 1)                            0.90       72.1927
Nearest Neighbors                               30
Parameters                 5                    11

Table 3.11: OLS versus MESS Results Using Optimal ρ for Doubly Stochastic Scaling
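
The signed root deviance reported for ρ in Table 3.11 can be checked against the profile log-likelihoods in Table 3.8. The Python sketch below (not the toolbox's Matlab code; the function name is hypothetical) computes the square root of twice the gap between the unrestricted and restricted log-likelihoods and attaches a sign. Using the doubly stochastic log-likelihoods at ρ = 0.90 and ρ = 1.00, it gives a value that agrees with the 72.1927 shown in the table to about four decimals.

```python
import math

def signed_root_deviance(loglik_full, loglik_restricted, sign=1.0):
    """sign * sqrt(2 * (L_full - L_restricted)); the sign follows the estimate's direction."""
    deviance = 2.0 * (loglik_full - loglik_restricted)
    return math.copysign(math.sqrt(max(deviance, 0.0)), sign)

# Doubly stochastic scaling: profile log-likelihoods at rho = 0.90 and rho = 1.00 (Table 3.8)
srd_rho = signed_root_deviance(-230828.6413, -233434.5371)
print(f"SRD for rho relative to rho = 1: {srd_rho:.4f}")
```

The same function applies to any pair of nested models, for example a full model and a delete-1 submodel, with the sign taken from the corresponding coefficient estimate.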

Chapter 4

References

If you need to know more about spatial statistics or about some of the specific routines, you may wish to examine:

Anselin, Luc (1988). Spatial Econometrics: Methods and Models. Dordrecht: Kluwer Academic Publishers.

Barry, Ronald, and R. Kelley Pace. "A Monte Carlo Estimator of the Log Determinant of Large Sparse Matrices," Linear Algebra and its Applications, Volume 289, Number 1-3, 1999, p. 41-54.

Chen, Jian-Shen, and Robert Jennrich (1996). "The Signed Root Deviance Profile and Confidence Intervals in Maximum Likelihood Analysis," Journal of the American Statistical Association, Volume 91, Number 435, p. 993-998.

Christensen, Ronald (1991). Linear Models for Multivariate, Time Series, and Spatial Data. New York: Springer-Verlag.

Cressie, Noel A.C. (1993). Statistics for Spatial Data, Revised ed. New York: John Wiley.

Dubin, Robin A. (1988). "Estimation of Regression Coefficients in the Presence of Spatially Autocorrelated Error Terms," Review of Economics and Statistics, 70, 466-474.

Haining, Robert (1990). Spatial Data Analysis in the Social and Environmental Sciences. Cambridge.

LeSage, James, and R. Kelley Pace. "Spatial Dependence in Data Mining," Data Mining for Scientific and Engineering Applications, edited by Robert L. Grossman, Chandrika Kamath, Philip Kegelmeyer, Vipin Kumar, and Raju R. Namburu. Kluwer Academic Publishing, 2001.

LeSage, James, and R. Kelley Pace. "Spatial Probit and Tobit," Spatial Statistics and Spatial Econometrics, edited by Art Getis. Palgrave, 2003.

Li, Bin (1995). "Implementing Spatial Statistics on Parallel Computers," in Arlinghaus, S., ed., Practical Handbook of Spatial Statistics (CRC Press, Boca Raton), pp. 107-148.

Ord, J.K. (1975). "Estimation Methods for Models of Spatial Interaction," Journal of the American Statistical Association, 70, 120-126.

Pace, R. Kelley, and Ronald Barry (1997). "Fast CARs," Journal of Statistical Computation and Simulation, 59, p. 123-147.

Pace, R. Kelley, and Ronald Barry (1997). "Quick Computation of Regressions with a Spatially Autoregressive Dependent Variable," Geographical Analysis, 29, 232-247.

Pace, R. Kelley, and Dongya Zou. "Closed-Form Maximum Likelihood Estimates of Nearest Neighbor Spatial Dependence," Geographical Analysis, Volume 32, Number 2, April 2000, p. 154-172.

Pace, R. Kelley, Ronald Barry, O.W. Gilley, and C.F. Sirmans. "A Method for Spatial-temporal Forecasting with an Application to Real Estate Prices," International Journal of Forecasting, Volume 16, Number 2, April-June 2000, p. 229-246.

Pace, R. Kelley, and James P. LeSage. "Semiparametric Maximum Likelihood Estimates of Spatial Dependence," Geographical Analysis, Volume 34, Number 1, January 2002, p. 75-90.

Pace, R. Kelley, and James LeSage. "Likelihood Dominance Spatial Inference," forthcoming, Geographical Analysis, in January 2003.

Pace, R. Kelley, and James LeSage. "Spatial Autoregressive Local Estimation," Spatial Statistics and Spatial Econometrics, edited by Art Getis. Palgrave, 2003.

Pace, R. Kelley, and James LeSage. "Chebyshev Approximation of Log-determinants of Spatial Weight Matrices," forthcoming in Computational Statistics and Data Analysis.

Ripley, Brian D. (1981). Spatial Statistics. New York: John Wiley.

www.spatial-statistics.com

Page 24: Spatial Statistics Toolbox for Matlab 2.0 - Documentation Only (pdf)

24 CHAPTER 3 A BRIEF SELECTED TOUR OF THE TOOLBOX

Variables b OLS b closest b MESS b Mix

Voting Pop -08464 -07489 -07693 -07298Education 05167 02899 01941 01818Home Ownership 04291 04457 04832 04580Income -01439 -00332 00423 00427Lag Voting Pop 00000 01878 04186 04616Lag Education 00000 00975 01205 00450Lag Home Ownership 00000 -01569 -03249 -03299Lag Income 00000 -01079 -01802 -01410Intercept 09814 07495 05636 04205α 00000 03352 14628 06550

Table 33 Estimates on Election Data

The closest neighbor approach is intermediate to a non-spatial approach(OLS) and a full spatial approach (MESS) or the approximate mixed routineNote the close agreement between MESS and the mixed routine Note OLSin this case uses the spatial averages of the basic independent variables asadditional independent variables

None of these operations take long for the election data

25

Variables b OLS b closest b MESS b Mix

Voting Pop -347211 -305020 -288899 -294018Education 308928 134922 76654 77169Home Ownership 230066 254083 268836 273515Income -72373 -15742 17478 18952Lag Voting Pop 00000 77600 107656 125507Lag Education 00000 45090 39785 15705Lag Home Ownership 00000 -94245 -111259 -121200Lag Income 00000 -51399 -55551 -46603Intercept 202782 160533 107480 84626α 00000 242762 315948 337097

Table 34 Signed Root Deviances using Election Data

26 CHAPTER 3 A BRIEF SELECTED TOUR OF THE TOOLBOX

Operation Timings in seconds

OLS 00150Closest AR 00470MESS 00940Exact Log-det 08750Mix 00470Doubly Stochastic Scaling 02660

Table 35 Timing for operations to Election data (n=3107)

In addition to global models the toolbox has spatial autoregressive localestimation (SALE) The user chooses the bandwidth (subsample size) by ex-amining cross-validation error at the fringe observations as in Figure 39 oreach observation as in Figure 38

27

0 50 100 150 200 250 300 350 400 450 5000055

006

0065

007

0075Smoothed SALE Recursive Residuals of Fringe Observations

Number of Local Observations

Abs

olut

e E

rror

Figure 38 Plot of Fringe prediction error versus subsample size

28 CHAPTER 3 A BRIEF SELECTED TOUR OF THE TOOLBOX

0 50 100 150 200 250 300 350 400 450 500005

0052

0054

0056

0058

006

0062

0064Smoothed SALE Initial Holdout Residuals

Number of Local Observations

Abs

olut

e E

rror

Figure 39 Plot of prediction error on center of area versus subsample size

Usually there is spatial dependence even in small subsamples as shownin Figure 310

29

0 50 100 150 200 250 300 350 400 450 5000

01

02

03

04

05

06

07SALE Autoregressive Parameter Estimates

Number of Local Observations

Med

ian

Aut

oreg

ress

ive

Par

amet

er E

stim

ates

Figure 310 Spatial dependence parameter estimate versus subsample size

Local estimation leads to spatially varying parameter estimates such asthose shown in Figure 311

30 CHAPTER 3 A BRIEF SELECTED TOUR OF THE TOOLBOX

minus130 minus120 minus110 minus100 minus90 minus80 minus70 minus6025

30

35

40

45

50

Longitude

Latit

ude

β3

Figure 311 Map of influence of homeownership on voting

In addition a user can obtain an idea of the sensitivity of parameter esti-mates to spatial variation such as summarized in Table 36

31

Percentiles 0 10 25 50 75 90 100

α 03100 04300 05500 06300 06700 07100 07800Voting Pop -10079 -08875 -08075 -06962 -05967 -05299 -04066Education -01871 00517 01142 01788 03135 04923 07118Home Ownership 01530 02964 03419 04253 05161 07660 08461Income -02576 -01644 -01072 -00473 00628 01448 02325Lag Voting Pop -01525 00655 02868 04527 05490 06523 08806Lag Education -05734 -03666 -01812 -00299 00677 01322 03318Lag Home Ownership -07010 -04499 -03278 -02337 -01100 -00382 01880Lag Income -04764 -02374 -01733 -00897 00216 00862 02786Intercept -01605 00473 01463 02847 06612 10772 18902

Table 36 Distribution of local estimates

32 CHAPTER 3 A BRIEF SELECTED TOUR OF THE TOOLBOX

Method Time in seconds

OLS 01560Closest AR 01090MESS 06090Approximate Mix 03280Doubly Stochastic Scaling 71880Delaunay Weight Matrix 54220

Table 37 Times for Different Methods for 57647 Observations

To provide an idea about the performance of the techniques for a largerproblem we estimated a simple hedonic regression over US census tractsThis resulted in 57647 observations Table 37 shows the timings for someof the various operations All of these seem quite fast A user can find aDelaunay weight matrix and estimate a spatial autoregression in under 10seconds on desktop machines

Just selecting a particular weight matrix seems arbitrary Here we take30 nearest neighbors and weight these geometrically A ρ of 1 indicates nodecline in the weight given to further neighbors relative to closer ones whilea ρ of 05 would give half the weight to the second nearest neighbor as itwould to the first nearest neighbor Thus ρ allows changes in the effectivenumber of neighbors used without actually varying the number of neighborsIt often makes sense in this approach to set the number of neighbors to afairly high level (such as 30) Table 38 shows the effect of varying ρ on theprofile log-likelihood Small changes in ρ make large changes in the profilelog-likelihood evidence of the importance of this parameter

It did not take overly long to find the nearest neighbors or the optimalρ even with doubly stochastic scalings of the weight matrix as shown byTable 39

These operations lead to a table of profile log-likelihoods (Table 310)across weight matrices Examining the MESS loglikelihoods over ρ and De-launay and contrasting it with the loglikelihood from appling OLS to the

33

ρ log-likelihood

08000 -231895042608500 -230968848709000 -230828641309500 -231699108910000 -2334345371

Table 38 Likelihoods across ρ for Doubly Stochastic Scaling

Operation Time in seconds

NN computation 314060RS Time to find optimum ρ 567810DS Time to find optimum ρ 948750

Table 39 Times for Optimizing the Likelihood over ρ for Both Scalings

basic non-spatial independent variables demonstrates that even a subopti-mal choice of ρ or Delaunay still dominates the use of an aspatial model inthis case and that optimizing over ρ dominates an arbitrary choice of weightmatrices (Table 310) Moreover the doubly stochastic (DS) scaling helpedgreatly for this example over the regular scaling (RS) In addition inspectionof aspatial OLS versus MESS with an optimal selection of ρ in Table 311shows clear differences among the approaches Note the land area variablebecame insignificant after modeling space

It is not difficult to estimate a spatial autoregression with over one millionobservations In fact the toolbox provides an example (big_one subdirec-tory) under the dataset directory whereby a one million observation spatialautoregression is estimated in just under 20 seconds It took 13063 secondsto find the weight matrix 6024 seconds to simulate the dependent variableand 1942 seconds to estimate the autoregression

Variable                                  Value

Aspatial likelihood (OLS)          -266505.1663
Closest Neighbor                   -243986.1676
RS Delaunay maximum likelihood     -244509.8257
RS Maximum likelihood across ρ     -254216.3172
RS Optimum ρ                             0.9000
DS Delaunay maximum likelihood     -235626.6213
DS Maximum likelihood across ρ     -230828.6413
DS Optimum ρ                             0.9000

Table 3.10: Likelihoods Across Doubly Stochastic and Regular Scalings

Variables                 b OLS    SRD OLS    b MESS   SRD MESS

Land area               -0.0850   -96.8455   -0.0008    -0.7159
Pop                      0.1146    36.8592    0.0239    12.7630
Per cap Income           1.0837   208.2192    0.6786   152.7645
Age                     -0.1269   -34.2809   -0.1384   -45.0489
Lag Land area                                -0.0178   -14.5650
Lag Pop                                       0.0165     5.2437
Lag Per cap Income                           -0.3702   -65.2683
Lag Age                                       0.1088    27.5807
Intercept                1.2236    22.9837   -0.5988   -15.2122
α                                             3.0986   253.2835
ρ (relative to ρ = 1)                           0.90    72.1927
Nearest Neighbors                                 30
parameters                    5                   11

Table 3.11: OLS versus MESS Results Using Optimal ρ for Doubly Stochastic Scaling
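The signed root deviance (SRD) entries can be read as the signed square root of twice the gap between the unrestricted and restricted log-likelihoods. The Python sketch below applies that definition to the ρ row, using the doubly stochastic profile log-likelihoods at ρ = 0.90 and ρ = 1 from Table 3.8, read with four decimal places (an assumption about the table formatting). The function name and the convention that the caller supplies the sign are illustrative, not the toolbox's interface.

```python
import math


def signed_root_deviance(loglik_full, loglik_restricted, sign=1.0):
    """Return sign * sqrt(2 * (L_full - L_restricted)).

    The sign is supplied by the caller, for example the sign of the
    coefficient estimate being tested.
    """
    deviance = 2.0 * (loglik_full - loglik_restricted)
    if deviance < 0:
        raise ValueError("restricted log-likelihood exceeds the full one")
    return math.copysign(math.sqrt(deviance), sign)


# Optimum rho = 0.90 versus the restriction rho = 1.
srd_rho = signed_root_deviance(-230828.6413, -233434.5371)
print(f"SRD for rho = 0.90 relative to rho = 1: {srd_rho:.4f}")
```

Under that reading of the tables, the result agrees with the 72.1927 reported in the ρ row of Table 3.11.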

Chapter 4

References

If you need to know more about spatial statistics or about some of the specific routines, you may wish to examine:

Anselin, Luc. (1988) Spatial Econometrics: Methods and Models, Dordrecht: Kluwer Academic Publishers.

Barry, Ronald, and R. Kelley Pace, "A Monte Carlo Estimator of the Log Determinant of Large Sparse Matrices," Linear Algebra and its Applications, Volume 289, Number 1-3, 1999, p. 41-54.

Chen, Jian-Shen, and Robert Jennrich. (1996) "The Signed Root Deviance Profile and Confidence Intervals in Maximum Likelihood Analysis," Journal of the American Statistical Association, Volume 91, Number 435, p. 993-998.

Christensen, Ronald. (1991) Linear Models for Multivariate, Time Series, and Spatial Data, New York: Springer-Verlag.

Cressie, Noel A.C. (1993) Statistics for Spatial Data, Revised ed., New York: John Wiley.

Dubin, Robin A. (1988) "Estimation of Regression Coefficients in the Presence of Spatially Autocorrelated Error Terms," Review of Economics and Statistics, 70, 466-474.

Haining, Robert. (1990) Spatial Data Analysis in the Social and Environmental Sciences, Cambridge.


LeSage, James, and R. Kelley Pace, "Spatial Dependence in Data Mining," Data Mining for Scientific and Engineering Applications, Edited by Robert L. Grossman, Chandrika Kamath, Philip Kegelmeyer, Vipin Kumar, and Raju R. Namburu, Kluwer Academic Publishing, 2001.

LeSage, James, and R. Kelley Pace, "Spatial Probit and Tobit," Spatial Statistics and Spatial Econometrics, Edited by Art Getis, Palgrave, 2003.

Li, Bin. (1995) "Implementing Spatial Statistics on Parallel Computers," in: Arlinghaus, S., ed. Practical Handbook of Spatial Statistics (CRC Press, Boca Raton), pp. 107-148.

Ord, J.K. (1975) "Estimation Methods for Models of Spatial Interaction," Journal of the American Statistical Association, 70, 120-126.

Pace, R. Kelley, and Ronald Barry. (1997) "Fast CARs," Journal of Statistical Computation and Simulation, 59, p. 123-147.

Pace, R. Kelley, and Ronald Barry. (1997) "Quick Computation of Regressions with a Spatially Autoregressive Dependent Variable," Geographical Analysis, 29, 232-247.

Pace, R. Kelley, and Dongya Zou, "Closed-Form Maximum Likelihood Estimates of Nearest Neighbor Spatial Dependence," Geographical Analysis, Volume 32, Number 2, April 2000, p. 154-172.

Pace, R. Kelley, Ronald Barry, O.W. Gilley, and C.F. Sirmans, "A Method for Spatial-temporal Forecasting with an Application to Real Estate Prices," International Journal of Forecasting, Volume 16, Number 2, April-June 2000, p. 229-246.

Pace, R. Kelley, and James P. LeSage, "Semiparametric Maximum Likelihood Estimates of Spatial Dependence," Geographical Analysis, Volume 34, Number 1, January 2002, p. 75-90.

Pace, R. Kelley, and James LeSage, "Likelihood Dominance Spatial Inference," forthcoming, Geographical Analysis, in January 2003.

Pace, R. Kelley, and James LeSage, "Spatial Autoregressive Local Estimation," Spatial Statistics and Spatial Econometrics, Edited by Art Getis, Palgrave, 2003.

Pace, R. Kelley, and James LeSage, "Chebyshev Approximation of Log-determinants of Spatial Weight Matrices," forthcoming in Computational Statistics and Data Analysis.

Ripley, Brian D. (1981) Spatial Statistics, New York: John Wiley.

www.spatial-statistics.com


Variables                b OLS   b closest     b MESS      b Mix

Voting Pop            -34.7211    -30.5020   -28.8899   -29.4018
Education              30.8928     13.4922     7.6654     7.7169
Home Ownership         23.0066     25.4083    26.8836    27.3515
Income                 -7.2373     -1.5742     1.7478     1.8952
Lag Voting Pop          0.0000      7.7600    10.7656    12.5507
Lag Education           0.0000      4.5090     3.9785     1.5705
Lag Home Ownership      0.0000     -9.4245   -11.1259   -12.1200
Lag Income              0.0000     -5.1399    -5.5551    -4.6603
Intercept              20.2782     16.0533    10.7480     8.4626
α                       0.0000     24.2762    31.5948    33.7097

Table 3.4: Signed Root Deviances using Election Data

Operation                    Timings in seconds

OLS                                      0.0150
Closest AR                               0.0470
MESS                                     0.0940
Exact Log-det                            0.8750
Mix                                      0.0470
Doubly Stochastic Scaling                0.2660

Table 3.5: Timing for operations to Election data (n=3107)

In addition to global models, the toolbox has spatial autoregressive local estimation (SALE). The user chooses the bandwidth (subsample size) by examining cross-validation error at the fringe observations, as in Figure 3.8, or at each observation, as in Figure 3.9.
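The bandwidth choice described above can be sketched as picking the subsample size with the smallest smoothed absolute cross-validation error. The Python example below is only a schematic of that idea: the moving-average smoother, the function names, and the error values are hypothetical and do not reproduce the toolbox's SALE routines.

```python
def moving_average(values, window=3):
    """Centered moving average; the window shrinks near the ends."""
    half = window // 2
    out = []
    for i in range(len(values)):
        chunk = values[max(0, i - half): i + half + 1]
        out.append(sum(chunk) / len(chunk))
    return out


def choose_subsample_size(sizes, abs_errors, window=3):
    """Return the size whose smoothed absolute error is smallest."""
    smoothed = moving_average(abs_errors, window)
    best = min(range(len(sizes)), key=smoothed.__getitem__)
    return sizes[best], smoothed[best]


# Hypothetical cross-validation errors, not values from the toolbox.
sizes = [50, 100, 150, 200, 250, 300]
errors = [0.072, 0.064, 0.059, 0.057, 0.058, 0.061]
best_size, best_error = choose_subsample_size(sizes, errors)
print(best_size, round(best_error, 4))
```

Smoothing before taking the minimum keeps a single noisy error value from determining the bandwidth.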

[Figure: "Smoothed SALE Recursive Residuals of Fringe Observations"; x-axis: Number of Local Observations (0 to 500); y-axis: Absolute Error (0.055 to 0.075)]

Figure 3.8: Plot of fringe prediction error versus subsample size

[Figure: "Smoothed SALE Initial Holdout Residuals"; x-axis: Number of Local Observations (0 to 500); y-axis: Absolute Error (0.05 to 0.064)]

Figure 3.9: Plot of prediction error on center of area versus subsample size

Usually, there is spatial dependence even in small subsamples, as shown in Figure 3.10.

[Figure: "SALE Autoregressive Parameter Estimates"; x-axis: Number of Local Observations (0 to 500); y-axis: Median Autoregressive Parameter Estimates (0 to 0.7)]

Figure 3.10: Spatial dependence parameter estimate versus subsample size

Local estimation leads to spatially varying parameter estimates, such as those shown in Figure 3.11.

[Figure: map of β3; x-axis: Longitude (-130 to -60); y-axis: Latitude (25 to 50)]

Figure 3.11: Map of influence of homeownership on voting

In addition, a user can obtain an idea of the sensitivity of parameter estimates to spatial variation, such as summarized in Table 3.6.

Percentiles                 0        10        25        50        75        90       100

α                      0.3100    0.4300    0.5500    0.6300    0.6700    0.7100    0.7800
Voting Pop            -1.0079   -0.8875   -0.8075   -0.6962   -0.5967   -0.5299   -0.4066
Education             -0.1871    0.0517    0.1142    0.1788    0.3135    0.4923    0.7118
Home Ownership         0.1530    0.2964    0.3419    0.4253    0.5161    0.7660    0.8461
Income                -0.2576   -0.1644   -0.1072   -0.0473    0.0628    0.1448    0.2325
Lag Voting Pop        -0.1525    0.0655    0.2868    0.4527    0.5490    0.6523    0.8806
Lag Education         -0.5734   -0.3666   -0.1812   -0.0299    0.0677    0.1322    0.3318
Lag Home Ownership    -0.7010   -0.4499   -0.3278   -0.2337   -0.1100   -0.0382    0.1880
Lag Income            -0.4764   -0.2374   -0.1733   -0.0897    0.0216    0.0862    0.2786
Intercept             -0.1605    0.0473    0.1463    0.2847    0.6612    1.0772    1.8902

Table 3.6: Distribution of local estimates
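A percentile summary like the one in Table 3.6 can be produced from any collection of local estimates. The Python sketch below uses linear interpolation between order statistics; the interpolation rule, the function name, and the sample values are assumptions for illustration, and the toolbox may use a different percentile convention.

```python
def percentile(values, p):
    """Percentile by linear interpolation between order statistics.

    Assumption: p = 0 returns the minimum and p = 100 the maximum; other
    conventions give slightly different interior values.
    """
    if not values:
        raise ValueError("no values")
    xs = sorted(values)
    pos = (len(xs) - 1) * p / 100.0
    lo = int(pos)
    hi = min(lo + 1, len(xs) - 1)
    return xs[lo] + (xs[hi] - xs[lo]) * (pos - lo)


# Hypothetical local estimates of one coefficient, one per location.
local_estimates = [0.31, 0.55, 0.63, 0.67, 0.43, 0.71, 0.78, 0.60, 0.66]
levels = (0, 10, 25, 50, 75, 90, 100)
summary = [percentile(local_estimates, p) for p in levels]
print(" ".join(f"{v:.4f}" for v in summary))
```

A wide spread between the low and high percentiles, or a change of sign across them, signals a coefficient that varies materially over space.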

Method                       Time in seconds

OLS                                   0.1560
Closest AR                            0.1090
MESS                                  0.6090
Approximate Mix                       0.3280
Doubly Stochastic Scaling             7.1880
Delaunay Weight Matrix                5.4220

Table 3.7: Times for Different Methods for 57,647 Observations

To provide an idea about the performance of the techniques for a larger problem, we estimated a simple hedonic regression over US census tracts. This resulted in 57,647 observations. Table 3.7 shows the timings for some of the various operations. All of these seem quite fast. A user can find a Delaunay weight matrix and estimate a spatial autoregression in under 10 seconds on desktop machines.



27

0 50 100 150 200 250 300 350 400 450 5000055

006

0065

007

0075Smoothed SALE Recursive Residuals of Fringe Observations

Number of Local Observations

Abs

olut

e E

rror

Figure 38 Plot of Fringe prediction error versus subsample size

28 CHAPTER 3 A BRIEF SELECTED TOUR OF THE TOOLBOX

0 50 100 150 200 250 300 350 400 450 500005

0052

0054

0056

0058

006

0062

0064Smoothed SALE Initial Holdout Residuals

Number of Local Observations

Abs

olut

e E

rror

Figure 39 Plot of prediction error on center of area versus subsample size

Usually there is spatial dependence even in small subsamples as shownin Figure 310

29

0 50 100 150 200 250 300 350 400 450 5000

01

02

03

04

05

06

07SALE Autoregressive Parameter Estimates

Number of Local Observations

Med

ian

Aut

oreg

ress

ive

Par

amet

er E

stim

ates

Figure 310 Spatial dependence parameter estimate versus subsample size

Local estimation leads to spatially varying parameter estimates such asthose shown in Figure 311

30 CHAPTER 3 A BRIEF SELECTED TOUR OF THE TOOLBOX

minus130 minus120 minus110 minus100 minus90 minus80 minus70 minus6025

30

35

40

45

50

Longitude

Latit

ude

β3

Figure 311 Map of influence of homeownership on voting

In addition a user can obtain an idea of the sensitivity of parameter esti-mates to spatial variation such as summarized in Table 36

31

Percentiles 0 10 25 50 75 90 100

α 03100 04300 05500 06300 06700 07100 07800Voting Pop -10079 -08875 -08075 -06962 -05967 -05299 -04066Education -01871 00517 01142 01788 03135 04923 07118Home Ownership 01530 02964 03419 04253 05161 07660 08461Income -02576 -01644 -01072 -00473 00628 01448 02325Lag Voting Pop -01525 00655 02868 04527 05490 06523 08806Lag Education -05734 -03666 -01812 -00299 00677 01322 03318Lag Home Ownership -07010 -04499 -03278 -02337 -01100 -00382 01880Lag Income -04764 -02374 -01733 -00897 00216 00862 02786Intercept -01605 00473 01463 02847 06612 10772 18902

Table 36 Distribution of local estimates

32 CHAPTER 3 A BRIEF SELECTED TOUR OF THE TOOLBOX

Method Time in seconds

OLS 01560Closest AR 01090MESS 06090Approximate Mix 03280Doubly Stochastic Scaling 71880Delaunay Weight Matrix 54220

Table 37 Times for Different Methods for 57647 Observations

To provide an idea about the performance of the techniques for a largerproblem we estimated a simple hedonic regression over US census tractsThis resulted in 57647 observations Table 37 shows the timings for someof the various operations All of these seem quite fast A user can find aDelaunay weight matrix and estimate a spatial autoregression in under 10seconds on desktop machines

Just selecting a particular weight matrix seems arbitrary Here we take30 nearest neighbors and weight these geometrically A ρ of 1 indicates nodecline in the weight given to further neighbors relative to closer ones whilea ρ of 05 would give half the weight to the second nearest neighbor as itwould to the first nearest neighbor Thus ρ allows changes in the effectivenumber of neighbors used without actually varying the number of neighborsIt often makes sense in this approach to set the number of neighbors to afairly high level (such as 30) Table 38 shows the effect of varying ρ on theprofile log-likelihood Small changes in ρ make large changes in the profilelog-likelihood evidence of the importance of this parameter

It did not take overly long to find the nearest neighbors or the optimalρ even with doubly stochastic scalings of the weight matrix as shown byTable 39

These operations lead to a table of profile log-likelihoods (Table 310)across weight matrices Examining the MESS loglikelihoods over ρ and De-launay and contrasting it with the loglikelihood from appling OLS to the

33

ρ log-likelihood

08000 -231895042608500 -230968848709000 -230828641309500 -231699108910000 -2334345371

Table 38 Likelihoods across ρ for Doubly Stochastic Scaling

Operation Time in seconds

NN computation 314060RS Time to find optimum ρ 567810DS Time to find optimum ρ 948750

Table 39 Times for Optimizing the Likelihood over ρ for Both Scalings

basic non-spatial independent variables demonstrates that even a subopti-mal choice of ρ or Delaunay still dominates the use of an aspatial model inthis case and that optimizing over ρ dominates an arbitrary choice of weightmatrices (Table 310) Moreover the doubly stochastic (DS) scaling helpedgreatly for this example over the regular scaling (RS) In addition inspectionof aspatial OLS versus MESS with an optimal selection of ρ in Table 311shows clear differences among the approaches Note the land area variablebecame insignificant after modeling space

It is not difficult to estimate a spatial autoregression with over one millionobservations In fact the toolbox provides an example (big_one subdirec-tory) under the dataset directory whereby a one million observation spatialautoregression is estimated in just under 20 seconds It took 13063 secondsto find the weight matrix 6024 seconds to simulate the dependent variableand 1942 seconds to estimate the autoregression

34 CHAPTER 3 A BRIEF SELECTED TOUR OF THE TOOLBOX

Variable Value

Aspatial likelihood (OLS) -2665051663Closest Neighbor -2439861676RS Delaunay maximum likelihood -2445098257RS Maximum likelihood across ρ -2542163172RS Optimum ρ 09000DS Delaunay maximum likelihood -2356266213DS Maximum likelihood across ρ -2308286413DS Optimum ρ 09000

Table 310 Likelihoods Across Doubly Stochastic and Regular Scalings

Variables OLS OLS SRD b MESS SRDS MESS

Land area -00850 -968455 -00008 -07159Pop 01146 368592 00239 127630Per cap Income 10837 2082192 06786 1527645Age -01269 -342809 -01384 -450489Lag Land area -00178 -145650Lag Pop 00165 52437Per cap Income -03702 -652683Lag Age 01088 275807Intercept 12236 229837 -05988 -152122α 30986 2532835ρ (relative to ρ = 1) 090 721927Nearest Neighbors 30parameters 5 11

Table 311 OLS versus MESS Results Using Optimal ρ for Doubly Stochas-tic Scaling



[Figure: "Smoothed SALE Initial Holdout Residuals"; x-axis: Number of Local Observations; y-axis: Absolute Error]

Figure 3.9: Plot of prediction error on center of area versus subsample size

Usually there is spatial dependence even in small subsamples, as shown in Figure 3.10.

[Figure: "SALE Autoregressive Parameter Estimates"; x-axis: Number of Local Observations; y-axis: Median Autoregressive Parameter Estimates]

Figure 3.10: Spatial dependence parameter estimate versus subsample size

Local estimation leads to spatially varying parameter estimates, such as those shown in Figure 3.11.

[Figure: map of β3; x-axis: Longitude; y-axis: Latitude]

Figure 3.11: Map of influence of homeownership on voting

In addition, a user can obtain an idea of the sensitivity of parameter estimates to spatial variation, such as summarized in Table 3.6.

Percentiles                 0        10        25        50        75        90       100

α                      0.3100    0.4300    0.5500    0.6300    0.6700    0.7100    0.7800
Voting Pop            -1.0079   -0.8875   -0.8075   -0.6962   -0.5967   -0.5299   -0.4066
Education             -0.1871    0.0517    0.1142    0.1788    0.3135    0.4923    0.7118
Home Ownership         0.1530    0.2964    0.3419    0.4253    0.5161    0.7660    0.8461
Income                -0.2576   -0.1644   -0.1072   -0.0473    0.0628    0.1448    0.2325
Lag Voting Pop        -0.1525    0.0655    0.2868    0.4527    0.5490    0.6523    0.8806
Lag Education         -0.5734   -0.3666   -0.1812   -0.0299    0.0677    0.1322    0.3318
Lag Home Ownership    -0.7010   -0.4499   -0.3278   -0.2337   -0.1100   -0.0382    0.1880
Lag Income            -0.4764   -0.2374   -0.1733   -0.0897    0.0216    0.0862    0.2786
Intercept             -0.1605    0.0473    0.1463    0.2847    0.6612    1.0772    1.8902

Table 3.6: Distribution of local estimates

Method                        Time in seconds

OLS                           0.1560
Closest AR                    0.1090
MESS                          0.6090
Approximate Mix               0.3280
Doubly Stochastic Scaling     7.1880
Delaunay Weight Matrix        5.4220

Table 3.7: Times for Different Methods for 57,647 Observations

To provide an idea about the performance of the techniques for a larger problem, we estimated a simple hedonic regression over US census tracts. This resulted in 57,647 observations. Table 3.7 shows the timings for some of the various operations. All of these seem quite fast. A user can find a Delaunay weight matrix and estimate a spatial autoregression in under 10 seconds on desktop machines.

Just selecting a particular weight matrix seems arbitrary. Here we take 30 nearest neighbors and weight these geometrically. A ρ of 1 indicates no decline in the weight given to farther neighbors relative to closer ones, while a ρ of 0.5 would give the second nearest neighbor half the weight given to the first nearest neighbor. Thus ρ allows changes in the effective number of neighbors used without actually varying the number of neighbors. It often makes sense in this approach to set the number of neighbors to a fairly high level (such as 30). Table 3.8 shows the effect of varying ρ on the profile log-likelihood. Small changes in ρ make large changes in the profile log-likelihood, evidence of the importance of this parameter.
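The geometric weighting described above can be illustrated with a minimal Python sketch. It is not the toolbox's Matlab code. It assumes the k-th nearest neighbor receives a raw weight of ρ^(k-1) and that the weights are then scaled to sum to one. The function names, and the inverse sum of squared weights used as a measure of the "effective number of neighbors", are hypothetical choices made only for illustration.

```python
def geometric_weights(m, rho):
    """Weights proportional to rho**(k-1) for the k-th nearest neighbor, k = 1..m.

    Assumption: weights are normalized to sum to one (row-stochastic scaling).
    """
    if m < 1:
        raise ValueError("m must be at least 1")
    if not 0.0 < rho <= 1.0:
        raise ValueError("rho must lie in (0, 1]")
    raw = [rho ** k for k in range(m)]  # 1, rho, rho^2, ...
    total = sum(raw)
    return [w / total for w in raw]


def effective_neighbors(weights):
    """Hypothetical summary: inverse of the sum of squared weights.

    Equals m when all m weights are equal and shrinks as weight concentrates.
    """
    return 1.0 / sum(w * w for w in weights)


for rho in (1.0, 0.9, 0.5):
    w = geometric_weights(30, rho)
    print(f"rho={rho:.1f}  second/first weight={w[1] / w[0]:.2f}  "
          f"effective neighbors={effective_neighbors(w):.1f}")
```

With ρ = 1 all 30 neighbors count equally. As ρ falls, the weight concentrates on the closest neighbors even though the list of 30 neighbors itself never changes.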

It did not take overly long to find the nearest neighbors or the optimal ρ, even with doubly stochastic scalings of the weight matrix, as shown by Table 3.9.
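For readers unfamiliar with the term, a doubly stochastic scaling rescales a non-negative weight matrix so that every row and every column sums to one. This section does not say how the toolbox computes it, so the Python sketch below substitutes a standard technique, alternating row and column normalization (Sinkhorn-Knopp iteration), on a small dense matrix. The function name, tolerance, and example matrix are assumptions made for illustration and are not taken from the toolbox.

```python
def doubly_stochastic_scale(w, tol=1e-10, max_iter=10_000):
    """Alternately normalize rows and columns until row sums are within tol of 1.

    Assumes a square, non-negative matrix with no all-zero row or column.
    """
    a = [row[:] for row in w]  # work on a copy
    n = len(a)
    for _ in range(max_iter):
        # Scale each row to sum to one.
        for i in range(n):
            s = sum(a[i])
            a[i] = [x / s for x in a[i]]
        # Scale each column to sum to one.
        col = [sum(a[i][j] for i in range(n)) for j in range(n)]
        a = [[a[i][j] / col[j] for j in range(n)] for i in range(n)]
        # Columns now sum to one; stop once the rows do as well.
        if max(abs(sum(row) - 1.0) for row in a) < tol:
            return a
    raise RuntimeError("scaling did not converge")


# Hypothetical symmetric weight matrix with a zero diagonal.
W = [[0.0, 1.0, 2.0],
     [1.0, 0.0, 1.0],
     [2.0, 1.0, 0.0]]
S = doubly_stochastic_scale(W)
for row in S:
    print([round(x, 4) for x in row])
```

A production routine for tens of thousands of observations would work on sparse matrices. The dense lists here only keep the idea visible.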

ρ         log-likelihood

0.8000    -231895.0426
0.8500    -230968.8487
0.9000    -230828.6413
0.9500    -231699.1089
1.0000    -233434.5371

Table 3.8: Likelihoods across ρ for Doubly Stochastic Scaling

Operation                      Time in seconds

NN computation                 31.4060
RS Time to find optimum ρ      56.7810
DS Time to find optimum ρ      94.8750

Table 3.9: Times for Optimizing the Likelihood over ρ for Both Scalings

These operations lead to a table of profile log-likelihoods (Table 3.10) across weight matrices. Examining the MESS log-likelihoods over ρ and Delaunay, and contrasting them with the log-likelihood from applying OLS to the basic non-spatial independent variables, demonstrates that even a suboptimal choice of ρ or Delaunay still dominates the use of an aspatial model in this case, and that optimizing over ρ dominates an arbitrary choice of weight matrices (Table 3.10). Moreover, the doubly stochastic (DS) scaling helped greatly for this example over the regular scaling (RS). In addition, inspection of aspatial OLS versus MESS with an optimal selection of ρ in Table 3.11 shows clear differences among the approaches. Note that the land area variable became insignificant after modeling space.
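The comparison in this paragraph amounts to a small grid search over ρ followed by differences in log-likelihood. The Python sketch below works through that arithmetic using the values reported in Tables 3.8 and 3.10. It assumes those values carry four decimal places, as the other tables in this chapter do. The variable names are illustrative, and the code is not part of the toolbox.

```python
# Profile log-likelihoods by rho for doubly stochastic scaling (Table 3.8),
# read under the assumption of four decimal places.
profile = {
    0.80: -231895.0426,
    0.85: -230968.8487,
    0.90: -230828.6413,
    0.95: -231699.1089,
    1.00: -233434.5371,
}
ols_loglik = -266505.1663  # aspatial likelihood (OLS), Table 3.10

# Grid search: keep the rho with the highest profile log-likelihood.
best_rho = max(profile, key=profile.get)
best_loglik = profile[best_rho]

# Twice the log-likelihood gap (the deviance) between the best fit and each alternative.
deviance_vs_ols = 2.0 * (best_loglik - ols_loglik)
deviance_by_rho = {rho: 2.0 * (best_loglik - ll) for rho, ll in profile.items()}

print(f"best rho = {best_rho:.2f}, log-likelihood = {best_loglik:.4f}")
print(f"deviance versus OLS = {deviance_vs_ols:.2f}")
for rho, dev in sorted(deviance_by_rho.items()):
    print(f"rho = {rho:.2f}: deviance from optimum = {dev:.2f}")
```

Even the worst ρ on this grid sits far closer to the optimum than the aspatial model does, which is the sense in which a suboptimal spatial specification still dominates OLS here.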

Page 31: Spatial Statistics Toolbox for Matlab 2.0 - Documentation Only (pdf)

31

Percentiles 0 10 25 50 75 90 100

α 03100 04300 05500 06300 06700 07100 07800Voting Pop -10079 -08875 -08075 -06962 -05967 -05299 -04066Education -01871 00517 01142 01788 03135 04923 07118Home Ownership 01530 02964 03419 04253 05161 07660 08461Income -02576 -01644 -01072 -00473 00628 01448 02325Lag Voting Pop -01525 00655 02868 04527 05490 06523 08806Lag Education -05734 -03666 -01812 -00299 00677 01322 03318Lag Home Ownership -07010 -04499 -03278 -02337 -01100 -00382 01880Lag Income -04764 -02374 -01733 -00897 00216 00862 02786Intercept -01605 00473 01463 02847 06612 10772 18902

Table 36 Distribution of local estimates

32 CHAPTER 3 A BRIEF SELECTED TOUR OF THE TOOLBOX

Method Time in seconds

OLS 01560Closest AR 01090MESS 06090Approximate Mix 03280Doubly Stochastic Scaling 71880Delaunay Weight Matrix 54220

Table 37 Times for Different Methods for 57647 Observations

To provide an idea about the performance of the techniques for a largerproblem we estimated a simple hedonic regression over US census tractsThis resulted in 57647 observations Table 37 shows the timings for someof the various operations All of these seem quite fast A user can find aDelaunay weight matrix and estimate a spatial autoregression in under 10seconds on desktop machines

Just selecting a particular weight matrix seems arbitrary Here we take30 nearest neighbors and weight these geometrically A ρ of 1 indicates nodecline in the weight given to further neighbors relative to closer ones whilea ρ of 05 would give half the weight to the second nearest neighbor as itwould to the first nearest neighbor Thus ρ allows changes in the effectivenumber of neighbors used without actually varying the number of neighborsIt often makes sense in this approach to set the number of neighbors to afairly high level (such as 30) Table 38 shows the effect of varying ρ on theprofile log-likelihood Small changes in ρ make large changes in the profilelog-likelihood evidence of the importance of this parameter

It did not take overly long to find the nearest neighbors or the optimalρ even with doubly stochastic scalings of the weight matrix as shown byTable 39

These operations lead to a table of profile log-likelihoods (Table 3.10) across weight matrices. Examining the MESS log-likelihoods over ρ and Delaunay and contrasting them with the log-likelihood from applying OLS to the basic non-spatial independent variables demonstrates that even a suboptimal choice of ρ or Delaunay still dominates the use of an aspatial model in this case, and that optimizing over ρ dominates an arbitrary choice of weight matrices (Table 3.10). Moreover, the doubly stochastic (DS) scaling helped greatly for this example over the regular scaling (RS). In addition, inspection of aspatial OLS versus MESS with an optimal selection of ρ in Table 3.11 shows clear differences among the approaches. Note the land area variable became insignificant after modeling space.

ρ        log-likelihood

0.8000   -231895.0426
0.8500   -230968.8487
0.9000   -230828.6413
0.9500   -231699.1089
1.0000   -233434.5371

Table 3.8: Likelihoods across ρ for Doubly Stochastic Scaling

Operation                    Time in seconds

NN computation               31.4060
RS Time to find optimum ρ    56.7810
DS Time to find optimum ρ    94.8750

Table 3.9: Times for Optimizing the Likelihood over ρ for Both Scalings

It is not difficult to estimate a spatial autoregression with over one million observations. In fact, the toolbox provides an example (big_one subdirectory) under the dataset directory whereby a one million observation spatial autoregression is estimated in just under 20 seconds. It took 130.63 seconds to find the weight matrix, 60.24 seconds to simulate the dependent variable, and 19.42 seconds to estimate the autoregression.


Variable                          Value

Aspatial likelihood (OLS)         -266505.1663
Closest Neighbor                  -243986.1676
RS Delaunay maximum likelihood    -244509.8257
RS Maximum likelihood across ρ    -254216.3172
RS Optimum ρ                      0.9000
DS Delaunay maximum likelihood    -235626.6213
DS Maximum likelihood across ρ    -230828.6413
DS Optimum ρ                      0.9000

Table 3.10: Likelihoods Across Doubly Stochastic and Regular Scalings
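To make the comparison across weight matrices concrete, the following Python sketch computes each model's improvement in log-likelihood over the aspatial OLS baseline. The values are transcribed from Table 3.10, read with four decimal places; the dictionary labels and variable names are illustrative only and do not correspond to toolbox output.

```python
# Profile log-likelihoods transcribed from Table 3.10 (four decimal places assumed).
loglik = {
    "Aspatial (OLS)": -266505.1663,
    "Closest neighbor": -243986.1676,
    "RS, Delaunay": -244509.8257,
    "RS, optimized over rho": -254216.3172,
    "DS, Delaunay": -235626.6213,
    "DS, optimized over rho": -230828.6413,
}

baseline = loglik["Aspatial (OLS)"]

# Gain in log-likelihood of each spatial specification relative to OLS.
gains = {
    name: value - baseline
    for name, value in loglik.items()
    if name != "Aspatial (OLS)"
}

best = max(gains, key=gains.get)

if __name__ == "__main__":
    for name, gain in sorted(gains.items(), key=lambda item: item[1]):
        print(f"{name:<24s} {gain:12.4f}")
    print("largest gain:", best)
```

Every spatial specification in the table improves on the aspatial baseline, and the doubly stochastic scaling optimized over ρ shows the largest gain.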

Variables               b (OLS)   SRD (OLS)   b (MESS)   SRD (MESS)

Land area               -0.0850    -96.8455    -0.0008     -0.7159
Pop                      0.1146     36.8592     0.0239     12.7630
Per cap Income           1.0837    208.2192     0.6786    152.7645
Age                     -0.1269    -34.2809    -0.1384    -45.0489
Lag Land area                                  -0.0178    -14.5650
Lag Pop                                         0.0165      5.2437
Lag Per cap Income                             -0.3702    -65.2683
Lag Age                                         0.1088     27.5807
Intercept                1.2236     22.9837    -0.5988    -15.2122
α                                               3.0986    253.2835
ρ (relative to ρ = 1)                           0.90       72.1927
Nearest Neighbors                               30
parameters               5                      11

Table 3.11: OLS versus MESS Results Using Optimal ρ for Doubly Stochastic Scaling
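The signed root deviance (SRD) reported for ρ can be checked against the profile log-likelihoods in Table 3.8. The Python sketch below assumes that the SRD is the square root of twice the difference between the unrestricted and restricted log-likelihoods, carrying the sign of the estimate, and that the table entries are read with four decimal places. The function name and its arguments are hypothetical.

```python
import math


def signed_root_deviance(loglik_unrestricted, loglik_restricted, sign=1.0):
    """Square root of the deviance 2 * (L_unrestricted - L_restricted), with a sign.

    Assumption: the unrestricted model nests the restricted one, so the
    deviance is non-negative.
    """
    deviance = 2.0 * (loglik_unrestricted - loglik_restricted)
    if deviance < 0:
        raise ValueError("restricted log-likelihood exceeds the unrestricted one")
    return math.copysign(math.sqrt(deviance), sign)


# DS log-likelihood at the optimum (rho = 0.9) versus the restriction rho = 1,
# both transcribed from Table 3.8.
srd_rho = signed_root_deviance(-230828.6413, -233434.5371)

if __name__ == "__main__":
    print(round(srd_rho, 4))
```

Under these assumptions the result agrees, to within rounding, with the value of 72.1927 listed for ρ (relative to ρ = 1) in Table 3.11.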

Chapter 4

References

If you need to know more about spatial statistics or about some of the specific routines, you may wish to examine:

Anselin, Luc. (1988) Spatial Econometrics: Methods and Models, Dordrecht: Kluwer Academic Publishers.

Barry, Ronald, and R. Kelley Pace, “A Monte Carlo Estimator of the Log Determinant of Large Sparse Matrices,” Linear Algebra and its Applications, Volume 289, Number 1-3, 1999, p. 41-54.

Chen, Jian-Shen, and Robert Jennrich. (1996) “The Signed Root Deviance Profile and Confidence Intervals in Maximum Likelihood Analysis,” Journal of the American Statistical Association, Volume 91, Number 435, p. 993-998.

Christensen, Ronald. (1991) Linear Models for Multivariate, Time Series, and Spatial Data, New York: Springer-Verlag.

Cressie, Noel A.C. (1993) Statistics for Spatial Data, Revised ed., New York: John Wiley.

Dubin, Robin A. (1988) “Estimation of Regression Coefficients in the Presence of Spatially Autocorrelated Error Terms,” Review of Economics and Statistics 70, 466-474.

Haining, Robert. (1990) Spatial Data Analysis in the Social and Environmental Sciences, Cambridge.


LeSage, James, and R. Kelley Pace, “Spatial Dependence in Data Mining,” Data Mining for Scientific and Engineering Applications, Edited by Robert L. Grossman, Chandrika Kamath, Philip Kegelmeyer, Vipin Kumar, and Raju R. Namburu, Kluwer Academic Publishing, 2001.

LeSage, James, and R. Kelley Pace, “Spatial Probit and Tobit,” Spatial Statistics and Spatial Econometrics, Edited by Art Getis, Palgrave, 2003.

Li, Bin. (1995) “Implementing Spatial Statistics on Parallel Computers,” in: Arlinghaus, S., ed., Practical Handbook of Spatial Statistics (CRC Press, Boca Raton), pp. 107-148.

Ord, J.K. (1975) “Estimation Methods for Models of Spatial Interaction,” Journal of the American Statistical Association 70, 120-126.

Pace, R. Kelley, and Ronald Barry. (1997) “Fast CARs,” Journal of Statistical Computation and Simulation 59, p. 123-147.

Pace, R. Kelley, and Ronald Barry. (1997) “Quick Computation of Regressions with a Spatially Autoregressive Dependent Variable,” Geographical Analysis 29, 232-247.

Pace, R. Kelley, and Dongya Zou, “Closed-Form Maximum Likelihood Estimates of Nearest Neighbor Spatial Dependence,” Geographical Analysis, Volume 32, Number 2, April 2000, p. 154-172.

Pace, R. Kelley, Ronald Barry, O.W. Gilley, and C.F. Sirmans, “A Method for Spatial-temporal Forecasting with an Application to Real Estate Prices,” International Journal of Forecasting, Volume 16, Number 2, April-June 2000, p. 229-246.

Pace, R. Kelley, and James P. LeSage, “Semiparametric Maximum Likelihood Estimates of Spatial Dependence,” Geographical Analysis, Volume 34, Number 1, January 2002, p. 75-90.

Pace, R. Kelley, and James LeSage, “Likelihood Dominance Spatial Inference,” forthcoming, Geographical Analysis, in January 2003.

Pace, R. Kelley, and James LeSage, “Spatial Autoregressive Local Estimation,” Spatial Statistics and Spatial Econometrics, Edited by Art Getis, Palgrave, 2003.

Pace, R. Kelley, and James LeSage, “Chebyshev Approximation of Log-determinants of Spatial Weight Matrices,” forthcoming in Computational Statistics and Data Analysis.

Ripley, Brian D. (1981) Spatial Statistics, New York: John Wiley.

www.spatial-statistics.com
