Statistical Analysis in GIS Dr. M. Gavrilova


• Importance of correct data representation
• Variance and covariance
• Autocorrelation
• Applications to pattern analysis and geometric modeling


http://www.math.yorku.ca/SCS/Gallery/

Four colors, three dimensions, and two plots to visualize five data points


Steven Skiena, Stony Brook, NY

http://www.cs.sunysb.edu/skiena


http://www.math.yorku.ca/SCS/Gallery/


Results of a World Values Survey poll of happiness among people throughout the world, plotted against an economic indicator, GNP per capita. Many countries, particularly those in Latin America, had higher marks for happiness than their economic situation would predict. This conclusion rests on the assumption that happiness should be linearly related to GNP.


An organized collection of computer hardware, software, geographic data, and personnel designed to efficiently capture, store, update, manipulate, analyze, and display all forms of geographically referenced data.


Provides
◦ an efficient and generally reliable means of obtaining knowledge about spatial processes,
◦ a way of maximizing our knowledge of spatial processes with the minimum of error.


Spatial Data
◦ location and attribute
◦ Pi (x, y, z)

Spatial Stochastic Processes
◦ statistics and inference

Spatial is special
◦ spatial autocorrelation
◦ spatial non-stationarity
◦ proximity


The Space Shuttle Challenger exploded shortly after take-off in January 1986. Cause: failure of the O-ring seals used to isolate the fuel supply from burning gases. Graph from the Report of the Presidential Commission on the Space Shuttle Challenger Accident, 1986. NASA staff had analysed the data on the relation between temperature and number of O-ring failures (out of 6), but they had excluded observations where no O-rings failed, believing that they were uninformative. Those were mainly observations showing no failures at warm temperatures (65-80 degF).


Apart from correcting the disastrous omission of the observations with 0 failures, the graph can be improved by:
1. drawing a smoothed curve to fit the points
2. removing the background grid, which obscures the data
This gives a graph which shows excessive risks associated with both high and low temperatures.


Reanalysis of the O-ring data involved fitting a logistic regression model. This provides a predicted extrapolation (black curve) of the probability of failure to the low (31 degF) temperature at the time of the launch and confidence bands on that extrapolation (red curves). See also Tappin, L. (1994). "Analyzing data relating to the Challenger disaster". Mathematics Teacher, 87, 423-426

There's not much data at low temperatures (the confidence band is quite wide), but the predicted probability of failure is uncomfortably high. Would you take a ride on Challenger when the weather is cold?


The French engineer, Charles Minard (1781-1870), illustrated the disastrous result of Napoleon's failed Russian campaign of 1812. The graph shows the size of the army by the width of the band across the map of the campaign on its outward and return legs, with temperature on the retreat shown on the line graph at the bottom. Many consider Minard's original the best statistical graphic ever drawn.


• Samples and populations consist of individuals.

• Values of certain attributes are called observations (e.g. age, income).

• Attributes vary across individuals, and they are called variables.

• Variables are described by distributions and their parameters (e.g. Normal, Poisson).

• A random variable X assumes its value according to the outcome of a chance experiment (coin, dice).


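Variance is the sum of squared deviations from the mean divided by the number of observations: n for a population, or n-1 for a sample.

Sample Variance

\[
s^2 = \frac{1}{n-1} \sum_{i=1}^{n} (x_i - \bar{x})^2
\]

Population Variance

\[
\sigma^2 = \frac{1}{n} \sum_{i=1}^{n} (x_i - \mu)^2
\]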
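A minimal sketch of the two variance definitions, in plain Python. The function names and the list `ages` are hypothetical and only illustrate the n versus n-1 divisor; they are not taken from the slides.

```python
def population_variance(values):
    """Sum of squared deviations from the mean, divided by n."""
    n = len(values)
    mean = sum(values) / n
    return sum((x - mean) ** 2 for x in values) / n


def sample_variance(values):
    """Sum of squared deviations from the mean, divided by n - 1."""
    n = len(values)
    mean = sum(values) / n
    return sum((x - mean) ** 2 for x in values) / (n - 1)


# Hypothetical observations of one attribute (illustrative numbers only).
ages = [2.0, 4.0, 4.0, 4.0, 5.0, 5.0, 7.0, 9.0]

pop_var = population_variance(ages)   # squared deviations sum to 32, so 32 / 8
samp_var = sample_variance(ages)      # same sum divided by 7
```

The sample version is always the larger of the two for the same data, and the gap shrinks as n grows.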


Spatial autocorrelation is a measure of the similarity of objects within an area.

Jay Lee and Louis K. Marion, 2001


The formula to compute Moran’s index is the following:

\[
M = \frac{n}{A} \cdot \frac{\sum_{i,j} w_{ij} z_i z_j}{\sum_i z_i^2},
\]

where n is the number of individual points,

A is the area of the bounding polygon, i.e. the total area of the map including all points,

z_i is the value of the parameter (attribute) measured for point i.
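The index can be sketched in a few lines of NumPy. This is only an illustration under stated assumptions: the function name `morans_index` is hypothetical, the weights w_ij are passed in as a ready-made n x n matrix, and the terms with i = j are dropped from the sum.

```python
import numpy as np


def morans_index(z, weights, area):
    """M = (n / A) * sum_{i != j} w_ij * z_i * z_j / sum_i z_i^2."""
    z = np.asarray(z, dtype=float)
    w = np.array(weights, dtype=float)   # copy, so the caller's matrix is untouched
    n = z.size
    if w.shape != (n, n):
        raise ValueError("weights must be an n x n matrix")
    np.fill_diagonal(w, 0.0)             # exclude the i == j terms
    numerator = z @ w @ z                # sum over ordered pairs of w_ij * z_i * z_j
    denominator = np.sum(z ** 2)
    return (n / area) * numerator / denominator


# Hypothetical toy input: three points, unit weights, map area of 6.
m = morans_index([1.0, 2.0, 3.0], np.ones((3, 3)), area=6.0)
# numerator = (1+2+3)^2 - (1+4+9) = 22, denominator = 14, n / A = 0.5
```

Overwriting the diagonal with zeros also makes the function tolerate a weight matrix whose diagonal entries are infinite.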


w_ij is computed according to the following rule, where min(d_ij) is the smallest of the distances between all pairs of points:

\[
w_{ij} = \frac{\min(d_{ij})}{d_{ij}} \cdot \frac{z_i}{z_j}
\]

In this formula, the distance d_ij is computed according to the formula for the Euclidean, supremum or Manhattan metric. Since d_ii is equal to 0, w_ii would become infinite, so the cases where i = j should be excluded. This leaves n^2 - n pairs of points.
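A rough sketch of the distance part of the weights follows. It computes d_ij under the three metrics named above and then scales the smallest distance by each d_ij. The ratio min(d_ij) / d_ij is an assumption about the distance-based part of the rule only; any attribute-dependent factor is left out, and the function names are hypothetical. All points are assumed to be distinct.

```python
import numpy as np


def pairwise_distances(points, metric="euclidean"):
    """Return the n x n matrix of distances d_ij for the chosen metric."""
    pts = np.asarray(points, dtype=float)
    # Absolute coordinate differences, shape (n, n, dims).
    diff = np.abs(pts[:, None, :] - pts[None, :, :])
    if metric == "euclidean":
        return np.sqrt((diff ** 2).sum(axis=-1))
    if metric == "manhattan":
        return diff.sum(axis=-1)
    if metric == "supremum":
        return diff.max(axis=-1)
    raise ValueError(f"unknown metric: {metric}")


def distance_weights(d):
    """Assumed rule: w_ij = min(d) / d_ij for i != j, and 0 on the diagonal."""
    d = np.asarray(d, dtype=float)
    off_diagonal = ~np.eye(d.shape[0], dtype=bool)
    d_min = d[off_diagonal].min()                 # smallest distance between distinct points
    w = np.zeros_like(d)
    w[off_diagonal] = d_min / d[off_diagonal]     # i == j is skipped, so no division by zero
    return w


# Hypothetical points on a line through the origin.
points = [(0.0, 0.0), (3.0, 4.0), (6.0, 8.0)]
d_euclid = pairwise_distances(points, "euclidean")   # d_01 = 5, d_02 = 10
d_manhattan = pairwise_distances(points, "manhattan")
d_supremum = pairwise_distances(points, "supremum")
w = distance_weights(d_euclid)
```

With this scaling the closest pair gets weight 1 and every other pair gets a weight between 0 and 1, and exactly n^2 - n entries are non-zero.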


◦ The sum over all i, j means that ALL ORDERED PAIRS of points (i.e. the pairs ij and ji are counted separately) should be included in the formula.

◦ Sometimes, only pairs of sample points within a specific distance of each other are considered.
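One way to picture the ordered pairs and the optional distance limit is the short sketch below. The function name and the `cutoff` parameter are hypothetical, and Euclidean distance is assumed here purely for illustration.

```python
import numpy as np


def ordered_pairs_within(points, cutoff):
    """List every ordered pair (i, j), i != j, whose distance is at most cutoff."""
    pts = np.asarray(points, dtype=float)
    n = len(pts)
    pairs = []
    for i in range(n):
        for j in range(n):
            if i == j:
                continue                      # a point is never paired with itself
            if np.linalg.norm(pts[i] - pts[j]) <= cutoff:
                pairs.append((i, j))          # (i, j) and (j, i) are both kept
    return pairs


# Hypothetical points: two close together, one far away.
points = [(0.0, 0.0), (1.0, 0.0), (5.0, 0.0)]
all_pairs = ordered_pairs_within(points, cutoff=float("inf"))   # n^2 - n = 6 pairs
near_pairs = ordered_pairs_within(points, cutoff=2.0)           # only points 0 and 1 qualify
```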


Example: autocorrelation on a grid. Sample points are combined into cells. The size and location of the cells define the autocorrelation parameters.

Consider all pairs of GRID CELLS, where XC and YC now denote the coordinates of the center of each grid cell, and the attribute z for each grid cell is the sum of the attributes of all points that belong to that cell.

Result: insight into pattern analysis and correlation can be obtained.
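The grid step described above can be sketched as follows. The function name, the `cell_size` and `origin` parameters, and the sample coordinates are all hypothetical; a square grid aligned with the axes is assumed, and only cells that contain at least one point are returned.

```python
import numpy as np


def aggregate_to_grid(x, y, z, cell_size, origin=(0.0, 0.0)):
    """Sum the attribute z of all points in each square cell; return cell centers and sums."""
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    col = np.floor((x - origin[0]) / cell_size).astype(int)
    row = np.floor((y - origin[1]) / cell_size).astype(int)

    totals = {}
    for r, c, value in zip(row, col, z):
        key = (int(r), int(c))
        totals[key] = totals.get(key, 0.0) + float(value)

    cells = sorted(totals)                    # fixed (row, col) order for reproducibility
    xc = np.array([origin[0] + (c + 0.5) * cell_size for _, c in cells])
    yc = np.array([origin[1] + (r + 0.5) * cell_size for r, _ in cells])
    zc = np.array([totals[cell] for cell in cells])
    return xc, yc, zc


# Hypothetical points: the first two fall into the same unit cell.
xc, yc, zc = aggregate_to_grid(
    x=[0.2, 0.8, 1.5, 3.1],
    y=[0.3, 0.9, 0.4, 2.2],
    z=[1.0, 2.0, 4.0, 8.0],
    cell_size=1.0,
)
```

The arrays XC, YC and z per cell can then be used in place of the original points. Whether empty cells should enter the analysis with z = 0 is a separate choice that this sketch does not make.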


Analysis of instances of patients undergoing cardiac catheterization, and location of those instances, i.e. city blocks.

Primary question: spatial variation of heart disease: random or non-random pattern?

Secondary question: relationship between disease occurrence and social and demographic factors (Spatial Regression).


Analysis results are affected by grid size

• prone to subjective choices

• constrained by spatial resolution of data

Solving the problem by

• using a non-arbitrary grid(s)

• implementing a “guided” selection of the square unit area or grid size


• Definition of a city-block grid based on the main division in the city, i.e. using the square grid centered on the intersection between Center Street and Center Avenue as the main axes of the geometric plan thus created.

• Grid regularity decreases as distance increases from its center.

• L_p norms provide flexibility to adjust the grid’s size and shape accordingly.


Application of varying L_p norms

Varying spatial weights for spatial autocorrelation

Autocorrelation analysis at varying scales (CDA, community)

Data: 2001/1996 census


Spatial Correlation Estimate

Statistic = "moran" Sampling = "free"

Correlation = 0.1429

Variance = 0.001341

Std. Error = 0.03662

Normal statistic = 3.921

Normal p-value (2-sided) = 8.802e-5

Null Hypothesis: No spatial autocorrelation
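Some of the figures in this output can be related to each other with a short check. The sketch below assumes that the standard error is the square root of the reported variance and that the p-value is the two-sided tail of a standard normal distribution at the reported statistic. The function name is hypothetical. The statistic itself is normally the estimate minus its expected value under the null hypothesis, divided by the standard error; that expected value is not printed above, so it is not recomputed here.

```python
import math


def two_sided_normal_p(z_score):
    """P(|Z| >= |z|) for a standard normal variable Z."""
    return math.erfc(abs(z_score) / math.sqrt(2.0))


variance = 0.001341                      # "Variance" from the output above
std_error = math.sqrt(variance)          # should be close to "Std. Error"
p_value = two_sided_normal_p(3.921)      # "Normal statistic" from the output above
```

With the rounded statistic 3.921 the p-value comes out near the reported 8.802e-5; the small difference is due to rounding. A p-value this small rejects the null hypothesis of no spatial autocorrelation at any conventional significance level.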


Sensitivity of Spatial Autocorrelation to

• L_p norm

• spatial weight

Proposed method useful in determining

• best distance

• best spatial weight

In context of multivariate spatial regression

• “best”: lowest variance


The Calgary Journal, Regional publication, “Researchers link heart disease to urban lifestyles” on SPARCS activity profile, Oct. 26 – Nov. 8, 2005

High risk of heart attack: male, high education, married


                   # cells*        Min.  Max.  Mean   St. dev.  Sum     Skew  Kurt.
Oil spill counts   44 (2,741)      0     3     0.02   0.162     53      9.85  113.6
Flight counts      2,151 (2,741)   0     309   13.75  27.12     37,681  4.21  25.6

The mean and the standard deviation describe the central tendency and the statistical dispersion of the data; large values of skewness (asymmetry) and kurtosis (from the Greek for "bulging") indicate highly skewed distributions or lack of normality in the data.
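Skewness and kurtosis can be computed from the central moments of the data, as in the sketch below. It uses the simple moment-based estimators and reports excess kurtosis (0 for a normal distribution). Statistical packages often apply bias corrections, and the slides do not say which convention the table uses, so the numbers need not match a particular package. The function name and the sample arrays are hypothetical.

```python
import numpy as np


def skewness_and_excess_kurtosis(values):
    """Moment-based skewness m3 / m2^1.5 and excess kurtosis m4 / m2^2 - 3."""
    x = np.asarray(values, dtype=float)
    deviations = x - x.mean()
    m2 = np.mean(deviations ** 2)        # second central moment (population variance)
    m3 = np.mean(deviations ** 3)
    m4 = np.mean(deviations ** 4)
    skew = m3 / m2 ** 1.5
    excess_kurtosis = m4 / m2 ** 2 - 3.0
    return skew, excess_kurtosis


# Hypothetical count data: mostly zeros with one large value, like sparse event counts.
sparse_counts = [0, 0, 0, 0, 0, 0, 0, 0, 0, 10]
skew_sparse, kurt_sparse = skewness_and_excess_kurtosis(sparse_counts)

# A symmetric sample for comparison: skewness is zero.
skew_sym, kurt_sym = skewness_and_excess_kurtosis([1, 2, 3, 4, 5])
```

Count data dominated by zeros, as in the oil spill row, produces large positive skewness and kurtosis, which is the pattern the table reports.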


Our exploratory analyses indicate that there is a positive spatial autocorrelation within datasets for all variables.

An initial overview of the statistical distribution and normality of each of the variables selected for this study indicated absence of normality in the data.

Exploratory Spatial Analysis of Illegal Oil Discharges Detected off Canada’s Pacific Coast.

Norma Serra-Sogas, Patrick O’Hara, Rosaline Canessa, Stefania Bertazzon and Marina Gavrilova


• Proper statistical analysis is important

• Variance and autocorrelation are two important vehicles for data analysis

• Combining these measures with various metrics, hierarchical structures, grids, attributes and also data filtering/visualization methods is a direction of current research.