Sa Presentation 20070917111 Thomas

Vector based spatial analysis

Nikolaos Spyropoulos and Thomas K. Andersen, Institute of Geography

The ESRI Guide to GIS Analysis, Mitchell 2005

• Chapter 4, Identifying Clusters

• Chapter 5, Analyzing Geographic Relationships

Chapter 4, Identifying Clusters

Identifying Clusters

Why identify clusters?

• Get an understanding of the location pattern in an area

• Compare these patterns with other features to identify possible contributing factors

• Take action based on the identified clusters

Figures: Clusters of burglaries; Income and emergency calls

Using statistics to identify clusters

Conclusions can be drawn by looking at a map (e.g. where the cluster is). Statistics make it possible to test and validate those conclusions.

With statistics, each event is counted as a unique occurrence. This is hard to see on a map, where events at the same location are drawn on top of each other.

Time period of data

The time period covered by the data can vary a lot, from current conditions to long time periods.

- For vacant parcels you need a snapshot of current conditions. For crimes or earthquakes, a time period must be defined.

- Vacant houses: now
- Crimes: 6 months
- Earthquakes: 100 years

Therefore: the appropriate time period differs from case to case and has to be defined.

Distance within clusters

Clusters are usually defined using Euclidean distance, though travel time or cost can also be used.

- Clusters of burglaries may depend on the driving time between the crimes. Euclidean distance does not take barriers (such as a river) into account. Two crime locations can therefore seem very close even though the travel time between them is long.
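The river example can be illustrated with a minimal Python sketch on a hypothetical grid. In this grid a river (`~`) can only be crossed at one bridge (`B`). The grid, the coordinates and the helper names `euclidean` and `travel_steps` are all assumptions made for illustration. Travel cost is approximated by the number of grid moves found with a breadth-first search, not by real driving time.

```python
import math
from collections import deque

# Hypothetical 5 x 5 study area: "." is land, "~" is a river, "B" is the only bridge.
GRID = [
    "..~..",
    "..~..",
    "..~..",
    "..~..",
    "..B..",
]


def euclidean(a, b):
    """Straight-line distance between two (row, col) cells; ignores the river."""
    return math.dist(a, b)


def travel_steps(grid, start, goal):
    """Fewest up/down/left/right moves from start to goal without entering river cells.

    Breadth-first search; returns None if the goal cannot be reached.
    """
    rows, cols = len(grid), len(grid[0])
    queue = deque([(start, 0)])
    seen = {start}
    while queue:
        (r, c), steps = queue.popleft()
        if (r, c) == goal:
            return steps
        for dr, dc in ((1, 0), (-1, 0), (0, 1), (0, -1)):
            nr, nc = r + dr, c + dc
            if (0 <= nr < rows and 0 <= nc < cols
                    and (nr, nc) not in seen and grid[nr][nc] != "~"):
                seen.add((nr, nc))
                queue.append(((nr, nc), steps + 1))
    return None


a, b = (0, 1), (0, 3)            # two hypothetical burglary locations
print(euclidean(a, b))           # 2.0
print(travel_steps(GRID, a, b))  # 10
```

The two burglaries are only 2 units apart in a straight line. They are 10 moves apart once the river has to be crossed at the bridge.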

Identifying clusters - methods

Two methods for identifying clusters:

1. Finding clusters of features

when features are found in close proximity

2. Finding clusters of similar value

when groups of high and low values are found together (“hot and cold spots”)

Finding clusters of features

Nearest neighbour hierarchical clustering (1)

“One method for finding clusters is to specify the distance features can be from each other, in order to be part of a cluster, and the minimum number of features that make up a cluster.”

(Mitchell 2005:152)

Clusters with a specified number of features within a specified distance

The method is hierarchical because the routine continues by grouping the clusters into larger clusters. This reveals patterns at several geographic scales (e.g. neighbourhood and citywide for crimes).

Figures: Clusters at a small scale (neighbourhood); clusters at a larger scale (citywide)
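The hierarchical grouping can be sketched in Python. This is a simplified illustration, not the exact routine from Mitchell or any GIS package. Points are chained together when they lie within a threshold distance (single linkage), and groups smaller than `min_size` are dropped. The centroids of the surviving clusters are then clustered again at the next level. Passing one threshold per level is an assumption: the actual method derives the threshold from a probability level. All function names and the sample locations are hypothetical.

```python
import math


def group_within(points, threshold):
    """Single-linkage grouping: points within `threshold` of each other (directly or
    through a chain of such points) end up in the same group."""
    parent = list(range(len(points)))

    def find(i):
        while parent[i] != i:
            parent[i] = parent[parent[i]]  # path halving keeps the trees shallow
            i = parent[i]
        return i

    for i in range(len(points)):
        for j in range(i + 1, len(points)):
            if math.dist(points[i], points[j]) <= threshold:
                parent[find(i)] = find(j)
    groups = {}
    for i, p in enumerate(points):
        groups.setdefault(find(i), []).append(p)
    return list(groups.values())


def centroid(pts):
    return (sum(p[0] for p in pts) / len(pts), sum(p[1] for p in pts) / len(pts))


def hierarchical_clusters(points, thresholds, min_size):
    """Cluster centroids per level: level 1 groups the points, level 2 groups the
    level-1 centroids, and so on, using one threshold distance per level."""
    levels = []
    current = list(points)
    for threshold in thresholds:
        clusters = [g for g in group_within(current, threshold) if len(g) >= min_size]
        if not clusters:
            break
        current = [centroid(g) for g in clusters]
        levels.append(current)
    return levels


crimes = [(0, 0), (0, 1), (1, 0), (5, 0), (5, 1), (6, 0),
          (50, 50), (50, 51), (51, 50), (20, 40)]  # hypothetical locations
levels = hierarchical_clusters(crimes, thresholds=[1.5, 10], min_size=2)
print(len(levels[0]), len(levels[1]))  # 3 1
```

At level 1 the three tight groups become clusters and the isolated point is dropped. At level 2 the two nearby clusters merge into one larger cluster.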



How nearest neighbour hierarchical clustering works:

A probability level is specified and used to calculate the threshold distance within which features are considered part of a cluster.

The probability level defines a confidence interval. If the distance between features is greater than the high end of the interval, the features are further apart than you would expect by chance. For clustering it is the opposite: the low end of the interval is of interest, because features closer than that are closer than you would expect by chance.

The confidence interval is calculated using the mean distance that would occur between points in a random distribution (the “mean random distance”).

-- See pages 155 and 156 for the calculation
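This is a hedged sketch of the threshold calculation. It assumes the Clark-Evans formulas for the mean random distance, 0.5 · √(A / N), and its standard error, 0.26136 · √A / N. Here A is the study area and N the number of features. The threshold is taken as the lower end of a one-tailed confidence interval. Using the normal approximation from `statistics.NormalDist` is an assumption, since implementations may use a t-value instead. The function names are hypothetical, so check pages 155 and 156 for the exact calculation.

```python
import math
from statistics import NormalDist


def mean_random_distance(area, n):
    """Expected mean nearest-neighbour distance for n points placed at random in `area`."""
    return 0.5 * math.sqrt(area / n)


def standard_error(area, n):
    """Standard error of the mean random distance (Clark-Evans)."""
    return 0.26136 * math.sqrt(area) / n


def threshold_distance(area, n, probability=0.05):
    """Lower end of a one-tailed confidence interval around the mean random distance.

    Pairs of features closer than this are closer than expected by chance at the given
    probability level. Uses a normal approximation (assumption) instead of a t-value.
    """
    z = NormalDist().inv_cdf(1 - probability)
    return mean_random_distance(area, n) - z * standard_error(area, n)


# Hypothetical study area of 100 km² containing 25 burglaries.
print(threshold_distance(100, 25, probability=0.05))  # roughly 0.83 km
```

Area and distances must use consistent units (here km² and km).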

Finding clusters of similar value

Finding clusters of similar values

The GIS looks at the attribute values of each feature and its neighbours, as well as the proximity of the features.

It then calculates the degree to which nearby features have similar values for a given attribute.

Figures: Percent aged 65 or over; percentages of seniors similar to their neighbours (blue: less similar, red: more similar)

Identifying clusters of similar values (1)

Where high values are surrounded by high values, or low values are surrounded by low values, the features are similar.

Identifying clusters of similar values (2)

A statistic is calculated for each feature. The features can then be mapped based on this value to show where features of similar value are located.

Moran’s Ii (1)

A method to identify clusters of similar values

Emphasizes how features differ from the values in the study area as a whole

Compares the value of each feature in a pair to the mean value of all features in the study area (local variation: the method looks at what is happening right around each feature)

-- See page 167 for the calculation

Moran’s Ii (2)

The value of Moran’s Ii depends on the difference in attribute values, the number of neighbours with similar values, and the magnitude of the attribute data.

• A high positive value indicates that the feature is surrounded by features with similar values, either high or low.

• A negative value indicates that the feature is surrounded by features with dissimilar values.

Gi statistic

Identifying concentrations (clusters) of high and low values within a distance

Compares neighbouring features within a specified distance

Two versions:

1. Gi statistic

2. Gi*

Version 1 - Gi statistic

Used to find out what is going on around a feature or cell, without taking the value of the target feature itself into account

- Used to study the dispersion of a phenomenon in an area. Gi has been used to track the spread of AIDS in the counties of the San Francisco area, making it possible to see the increase over time and distance.

Version 2 – Gi*

The value of the target feature is included. Used to find hot or cold spots.

A distance (search radius) is defined

This distance is based on knowledge of the features and their behaviour. Example: how far are people willing to travel to a certain store? (Euclidean distance, travel time, etc.)
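The difference between Gi and Gi* can be sketched in Python with a fixed search radius `d`. The weights are binary: 1 inside the radius, 0 outside.

- `g_ratio` follows the original ratio form: the sum of values within `d` divided by the sum of all values. The target feature is excluded in Gi and included in Gi*.
- `gi_star_z` is the standardised (z-score) form of Gi*. Large positive values suggest hot spots and large negative values suggest cold spots.

The function names and the sample data are hypothetical.

```python
import math


def g_ratio(points, values, i, d, include_self):
    """Getis-Ord ratio for feature i within search radius d.

    include_self=False -> Gi:  the target feature's own value is left out entirely
    include_self=True  -> Gi*: the target feature's value is counted as well
    """
    num = den = 0.0
    for j, (p, x) in enumerate(zip(points, values)):
        if j == i and not include_self:
            continue
        den += x
        if math.dist(points[i], p) <= d:
            num += x
    return num / den


def gi_star_z(points, values, i, d):
    """Standardised Gi* (z-score): large positive = hot spot, large negative = cold spot."""
    n = len(values)
    mean = sum(values) / n
    s = math.sqrt(sum(x * x for x in values) / n - mean ** 2)
    w = [1.0 if math.dist(points[i], p) <= d else 0.0 for p in points]  # includes i itself
    sw = sum(w)
    sw2 = sum(wj * wj for wj in w)
    num = sum(wj * x for wj, x in zip(w, values)) - mean * sw
    # The denominator is zero if every feature falls inside the radius (sw == n)
    den = s * math.sqrt((n * sw2 - sw ** 2) / (n - 1))
    return num / den


# Hypothetical features along a road, 1 km apart, with a high-value end and a low-value end.
pts = [(0, 0), (1, 0), (2, 0), (3, 0), (4, 0), (5, 0)]
vals = [10, 10, 10, 1, 1, 1]
print([round(gi_star_z(pts, vals, i, d=1.0), 2) for i in range(len(pts))])
```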

Chapter 5, Analyzing Geographic Relationships

Analyzing Geographic Relationships

Why Analyze Geographic Relationships?

Analysis of feature distributions.

Analysis of relationships between features.

• Understanding what is going on in a place.
• Predicting where something is likely to occur.
• Examining why things occur where they do.


Understanding what is going on in a place.

Example: Analysis of accidents in relation to highway speed limits.


Predicting where something is likely to occur.

Example: Analysis of landforms to identify likely locations of artifacts.


Examine why things occur where they do.

Example: Improving the health of newborns.

Using Statistics to Analyze Relationships

• When we look for relationships, we form an opinion based on personal knowledge of the phenomena or on visual analysis of the map.

• Statistics allow us to verify those relationships and measure how strong they are.

• The idea behind using statistics is to see to what extent the value of one attribute changes when another changes. In other words, we measure the relationship between two or more maps representing the variables (the relationship between two sets of attribute data).

Assigning Variables to Geography

• Variables from different layers must be associated with the same geographic unit.

If this is not the case:
i) Different cell sizes: ratio
ii) Different sets of features: combine features
iii) Points representing different categories of features: sum features to area
iv) Combining two or more sets of features: raster

Example: Emergency calls and population data.

Using Statistics to Analyze Geographic Relationships

Two statistical assumptions:
• Each value is equally likely to occur in the sample
• The value of one observation does not affect another value

In Geography:

• Attribute values vary across a region: regional trends influence attribute values

• Nearby features are more similar than distant ones

Spatial autocorrelation

A violation of the independence of observations

Smaller units tend to be more similar to each other than bigger units.


Identifying relationships vs. analyzing processes

Identifying relationships between (x, y):
• Measure the extent of variation
• Take action
• Understand

Analyzing processes:
• Identify the main variables that drive a process
• Predict the values of a variable

Identifying Geographic Relationships

How much two attributes vary together.

Direct relationship (positive correlation) vs. inverse relationship (negative correlation)

If you suspect a relationship:

measure the relationship to confirm it and to measure its direction and strength

Methods for Identifying Geographic Relationships

• Pearson’s Correlation Coefficient

measures the strength and direction of the linear relationship between two variables


•Spearman’s Rank Correlation Coefficient

measures the extent to which two lists of ranked values correspond
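Both coefficients can be sketched in plain Python.

- `pearson` computes the usual product-moment coefficient.
- `spearman` ranks each list, giving tied values the average of their ranks, and then applies Pearson's formula to the ranks.

The function names are illustrative.

```python
import math


def pearson(x, y):
    """Pearson's r: strength and direction of the linear relationship between x and y."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def ranks(values):
    """1-based ranks; tied values share the average of the ranks they occupy."""
    order = sorted(range(len(values)), key=lambda k: values[k])
    result = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        for k in range(i, j + 1):
            result[order[k]] = (i + j) / 2 + 1
        i = j + 1
    return result


def spearman(x, y):
    """Spearman's rho: Pearson's r computed on the ranks of x and y."""
    return pearson(ranks(x), ranks(y))


x = [1, 2, 3, 4, 5]
y = [v * v for v in x]  # monotonic but not linear
print(spearman(x, y), pearson(x, y))
```

For y = x², Spearman's coefficient is 1 because the two rankings agree perfectly. Pearson's coefficient is slightly below 1 because the points do not lie on a straight line.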

Identifying Geographic Relationships

What correlation coefficient doesn’t measure

• Correlation results cannot be transferred to another scale, e.g. from a county to the nation.

• It does not measure causation: a correlation between X and Y does not mean that X causes Y.

• Correlation does not explain why there is a relationship.

• It does not measure the form of the relationship, only the dispersion around a straight line.

Analyzing Geographic Processes

We analyze geographic processes in order to predict whether something will occur.

Steps:
1. Develop a theory as to what is driving the process.
2. Analyze the relationships between various attributes of your data (build a model).


Linear Regression Analysis

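• Plot the variables on a chart.
• Find the line that best fits the data points, i.e. the line that minimizes the sum of the squared vertical distances between the points and the line (ordinary least squares method).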


Ordinary Least Squares

Example from Wikipedia


Interpreting the results of regression analysis

We can see how well our model works by comparing the variance in the predicted values with the variance in the observed values (R²).

• Perfect fit (all points on the line): R² = 1

• Any other case: R² < 1, i.e. the fit is not perfect

Calculate the residuals (differences between observed and predicted values).

Using More Than One Independent Variable

Most geographic processes aren’t controlled by a single variable

New regression analysis equation: y = a + b1x1 + b2x2 + ... + bnxn

R² in multivariate regression describes the proportion of the variation in y explained by the combination of independent variables.

Using More Than One Independent Variable

Identifying the key variables

Analysis

Test the significance of each variable (t-test)

Goal
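With several independent variables, the coefficients and their t-values can be sketched as follows. This is a plain-Python OLS that solves the normal equations (XᵀX)b = Xᵀy. Each coefficient's t-value is computed as b_k / se(b_k). This is an illustration under those assumptions, not the book's procedure. The t-values still have to be compared with a t-distribution with n − k degrees of freedom. The function names and the data are hypothetical.

```python
import math


def invert(matrix):
    """Invert a small square matrix with Gauss-Jordan elimination (partial pivoting).
    Raises ZeroDivisionError if the matrix is singular (e.g. perfectly collinear variables)."""
    n = len(matrix)
    aug = [list(row) + [1.0 if i == j else 0.0 for j in range(n)] for i, row in enumerate(matrix)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(aug[r][col]))
        aug[col], aug[pivot] = aug[pivot], aug[col]
        p = aug[col][col]
        aug[col] = [v / p for v in aug[col]]
        for r in range(n):
            if r != col:
                f = aug[r][col]
                aug[r] = [a - f * b for a, b in zip(aug[r], aug[col])]
    return [row[n:] for row in aug]


def multiple_regression(xs, y):
    """OLS with an intercept. xs is a list of predictor columns.
    Returns (coefficients, t_values, r2); coefficients[0] is the intercept."""
    n = len(y)
    X = [[1.0] + [col[i] for col in xs] for i in range(n)]  # design matrix
    k = len(X[0])
    xtx = [[sum(X[r][a] * X[r][b] for r in range(n)) for b in range(k)] for a in range(k)]
    xty = [sum(X[r][a] * y[r] for r in range(n)) for a in range(k)]
    inv = invert(xtx)
    coef = [sum(inv[a][b] * xty[b] for b in range(k)) for a in range(k)]
    resid = [obs - sum(c * v for c, v in zip(coef, row)) for obs, row in zip(y, X)]
    ss_res = sum(e * e for e in resid)
    sigma2 = ss_res / (n - k)  # residual variance; must be > 0 for t-values
    t_values = [c / math.sqrt(sigma2 * inv[a][a]) for a, c in enumerate(coef)]
    mean_y = sum(y) / n
    r2 = 1 - ss_res / sum((v - mean_y) ** 2 for v in y)
    return coef, t_values, r2


x1 = [1, 2, 3, 4, 5, 6, 7, 8]
x2 = [2, 1, 4, 3, 6, 5, 8, 7]
noise = [0.1, -0.1, 0.05, -0.05, 0.1, -0.1, 0.05, -0.05]
y = [1 + 2 * a + 3 * b + e for a, b, e in zip(x1, x2, noise)]  # hypothetical data
coef, t_values, r2 = multiple_regression([x1, x2], y)
print(coef, t_values, r2)
```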

Factors Influencing the Regression Analysis Results

Least squares regression analysis is effective only if the following are true:

1. There is a linear relationship between Y and X.
2. The residuals have a mean of 0.
3. The residuals have a constant variance.
4. The residuals are randomly arranged along the regression line.
5. The residuals are normally distributed.
6. The independent variables are not highly correlated.

Regression Analysis & Geographic Data

For geographic data, misspecification can result from many sources.

It can occur when:

• data are analyzed at the wrong scale for the process
• variables are missing

Dealing With Regional Variation

Geographically Weighted Regression (GWR)

• Allows model coefficients to vary regionally.

• A regression is run for each location rather than once for the whole study area.
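The idea of GWR can be sketched as a series of weighted least-squares fits, one per location, in which nearby observations get more weight. Several details are assumptions made for illustration: the Gaussian distance kernel, the fixed `bandwidth` and the single independent variable. Real GWR implementations also choose the bandwidth from the data. The function names and the data are hypothetical.

```python
import math


def weighted_fit(x, y, w):
    """Weighted least squares for y = a + b*x with observation weights w."""
    sw = sum(w)
    mx = sum(wi * xi for wi, xi in zip(w, x)) / sw
    my = sum(wi * yi for wi, yi in zip(w, y)) / sw
    sxy = sum(wi * (xi - mx) * (yi - my) for wi, xi, yi in zip(w, x, y))
    sxx = sum(wi * (xi - mx) ** 2 for wi, xi in zip(w, x))
    b = sxy / sxx
    return my - b * mx, b


def gwr(locations, x, y, bandwidth):
    """One local (intercept, slope) per location; weights fall off with distance."""
    results = []
    for here in locations:
        w = [math.exp(-0.5 * (math.dist(here, there) / bandwidth) ** 2) for there in locations]
        results.append(weighted_fit(x, y, w))
    return results


# Hypothetical data: a western region where y = x and an eastern region where y = 3x.
locations = [(0, 0), (1, 0), (2, 0), (3, 0), (100, 0), (101, 0), (102, 0), (103, 0)]
x = [1, 2, 3, 4, 1, 2, 3, 4]
y = [1, 2, 3, 4, 3, 6, 9, 12]
for (a, b), loc in zip(gwr(locations, x, y, bandwidth=2.0), locations):
    print(loc, round(b, 3))
```

Each region gets its own local slope (about 1 in the west and about 3 in the east), whereas a single global regression would average the two.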

Example: Per capita income.

Dealing with Local Trends

Methods to address local trends.

• Resampling
• Spatial filtering (removes spatial autocorrelation)

Running A Linear Regression Analysis With Geographic Data.

1. Determine what you are trying to predict.

2. Identify the key independent variables.

3. Examine the distribution of your variables.

4. Run the ordinary least squares regression.

5. Examine the coefficients for each independent variable.

6. Examine the residuals:
• Test for spatial autocorrelation
• Look for missing variables
• Plot y-values against residuals
• Create a frequency curve
