Vector based spatial analysis
Nikolaos Spyropoulos and Thomas K. Andersen, Institute of Geography
The ESRI Guide to GIS Analysis, Mitchell 2005
• Chapter 4, Identifying Clusters
• Chapter 5, Analyzing Geographic Relationships
Chapter 4, Identifying Clusters
Why identify clusters?
• Get an understanding of the location pattern in an area
• Compare these patterns with other features to identify possible contributing factors
• Take action based on the identified clusters
Figures: Clusters of burglaries; Income and emergency calls
Using statistics to identify clusters
Conclusions can be drawn by looking at a map (e.g. where the cluster is); statistics make it possible to test and validate those conclusions.
With statistics, each event is counted as a unique occurrence, which is hard to see on a map.
Time period of data
The time period of data can vary a lot, from current conditions to long time periods
- For vacant parcels you need a snapshot of the current condition; for crimes or earthquakes you need to define a time period.
Vacant houses: now
Crimes: 6 months
Earthquakes: 100 years
Therefore: the appropriate time period differs by phenomenon and has to be defined.
Distance within clusters
Clusters are usually defined using Euclidean distance,
though travel time or cost can also be used.
- Clusters of burglaries can depend on the driving time between the crimes. Because Euclidean distance doesn't take barriers (such as a river) into account, two locations can appear very close even though the travel time between them is long.
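The difference between the two distance measures can be illustrated with a minimal Python sketch. All coordinates are hypothetical: two burglaries on opposite banks of a river, with the only bridge 4 km away. The function names are chosen here for illustration.

```python
import math

def euclidean(a, b):
    """Straight-line distance between two (x, y) points."""
    return math.hypot(b[0] - a[0], b[1] - a[1])

def path_length(points):
    """Total length of a route given as a list of (x, y) points."""
    return sum(euclidean(p, q) for p, q in zip(points, points[1:]))

# Hypothetical coordinates in km: two burglaries on opposite river banks,
# and the two ends of the only bridge, 4 km along the river.
burglary_a = (0.0, 0.0)
burglary_b = (0.0, 1.0)
bridge_south = (4.0, 0.0)
bridge_north = (4.0, 1.0)

straight = euclidean(burglary_a, burglary_b)
by_road = path_length([burglary_a, bridge_south, bridge_north, burglary_b])
print(f"Euclidean: {straight} km, via bridge: {by_road} km")
```

With a distance threshold of, say, 2 km, the two burglaries would fall in the same cluster under Euclidean distance but not under travel distance.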
Identifying clusters - methods
Two methods for identifying clusters:
1. Finding clusters of features
when features are found in close proximity
2. Finding clusters of similar value
when groups of high and low values are found together ("hot and cold spots")
Finding clusters of features
Nearest neighbour hierarchical clustering (1)
"One method for finding clusters is to specify the distance features can be from each other, in order to be part of a cluster, and the minimum number of features that make up a cluster."
(Mitchell 2005:152)
Clusters with a specified number of features within a specified distance
The method is hierarchical because the routine goes on to group the clusters into larger clusters, which shows the pattern at several geographic scales (e.g. neighbourhood and citywide for crimes).
Figures: Clusters at small scale (neighbourhood); Clusters at larger scale (citywide)
Nearest neighbour hierarchical clustering (2)
Nearest neighbour hierarchical clustering (3)
How nearest neighbour hierarchical clustering works:
A probability level is specified and used to calculate the distance within which features will be considered a cluster.
The probability level defines a confidence interval around the mean distance that would occur between points in a random distribution (the "mean random distance").
If the distance between features is greater than the high end of the interval, the features are further apart than you would expect by chance. For clustering the opposite end is of interest: features closer together than the low end of the interval are nearer than chance would suggest.
--See pages 155 and 156 for the calculation
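The idea can be sketched in Python under stated assumptions. The threshold uses the standard nearest neighbour formulas for the mean random distance, 0.5 * sqrt(A / N), and its standard error, 0.26136 / sqrt(N^2 / A); whether this matches the calculation on the cited pages term by term is an assumption. The grouping step is a simplification that chains together all points lying within the threshold of each other, and the sample points, study area and t-value are hypothetical.

```python
import math

def threshold_distance(n_points, area, t_value=1.645):
    """Low end of the confidence interval around the mean random distance.

    t_value is an assumed setting; a larger value gives a smaller,
    stricter threshold distance.
    """
    mean_random = 0.5 * math.sqrt(area / n_points)
    std_error = 0.26136 / math.sqrt(n_points ** 2 / area)
    return mean_random - t_value * std_error

def first_order_clusters(points, max_dist, min_points):
    """Group points linked by distances <= max_dist; keep groups of >= min_points."""
    unvisited = set(range(len(points)))
    clusters = []
    while unvisited:
        seed = unvisited.pop()
        group, frontier = [seed], [seed]
        while frontier:
            i = frontier.pop()
            near = [j for j in unvisited
                    if math.dist(points[i], points[j]) <= max_dist]
            for j in near:
                unvisited.remove(j)
            group.extend(near)
            frontier.extend(near)
        if len(group) >= min_points:
            clusters.append(sorted(group))
    return sorted(clusters)

# Hypothetical crime locations in a 20 x 20 km study area.
points = [(0, 0), (0.5, 0), (0, 0.5), (10, 10), (10.5, 10), (10, 10.5), (5, 5)]
limit = threshold_distance(len(points), area=400)
clusters = first_order_clusters(points, max_dist=limit, min_points=3)
print(round(limit, 2), clusters)
```

To obtain the larger, second-order clusters, the same routine could be run again on the centres of the first-order clusters.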
Finding clusters of similar value
The GIS looks at the attribute values of each feature and its neighbours, as well as the proximity of the features.
It then calculates the degree to which nearby features have similar values for a given attribute.
Figures: Percent age 65 or over; Percentages of seniors similar to their neighbours (blue less similar, red more similar)
Identifying clusters of similar values (1)
Where high values are surrounded by high values or low values are surrounded by low values, the features are similar
Identifying clusters of similar values (2)
A statistic is calculated for each feature. It is then possible to map the features based on this value, to see the locations of features of similar value
Moran’s Ii (1)
A method to identify similar values
Emphasizes how features differ from the values in the study area as a whole
Compares the value of each feature in a pair to the mean value for all features in the study area (local variation: the method looks at what is happening right around each feature)
--See page 167 for the calculation
Moran’s Ii (2)
The value for Moran’s Ii depends on the difference in attribute values, the number of neighbours with similar values, and the magnitude of the attribute data
• A high positive value for Ii indicates that the feature is surrounded by features with similar values, either high or low.
• A negative value indicates that the feature is surrounded by features with dissimilar values.
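A minimal Python sketch of a local Moran's Ii calculation follows. It assumes row-standardised neighbour weights and a variance computed over all n features; the formulation on the cited page may differ in these details. The values and neighbour lists are hypothetical.

```python
def local_morans_i(values, neighbours):
    """Local Moran's Ii for each feature.

    values:     attribute value per feature
    neighbours: for each feature, the indices of its neighbours
    Each neighbour gets the weight 1 / (number of neighbours).
    """
    n = len(values)
    mean = sum(values) / n
    variance = sum((v - mean) ** 2 for v in values) / n
    result = []
    for i, nbrs in enumerate(neighbours):
        # Average deviation from the mean among the neighbours of feature i.
        lag = sum(values[j] - mean for j in nbrs) / len(nbrs)
        result.append((values[i] - mean) / variance * lag)
    return result

# Hypothetical row of six areas: three high values next to three low values.
chain = [[1], [0, 2], [1, 3], [2, 4], [3, 5], [4]]
clustered = local_morans_i([10, 10, 10, 2, 2, 2], chain)

# Hypothetical row of four areas with alternating high and low values.
alternating = local_morans_i([10, 2, 10, 2], [[1], [0, 2], [1, 3], [2]])

print(clustered)
print(alternating)
```

In the first row the high-high and low-low ends both get positive values, while the two areas at the boundary get 0; in the alternating row every value is negative.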
Gi statistic
Identifying concentrations (clusters) of high and low values within a distance
Compares neighbouring features within a specified distance
Two versions:
1. Gi statistic
2. Gi*
Version 1 - Gi statistic
Used to find out what is going on around a feature or cell, without taking the value of the target itself into account
- Used to study the dispersion of a phenomenon in an area. Gi has been used to track the spread of AIDS across counties in the San Francisco area, which made it possible to see the increase over time and distance
Version 2 – Gi*
The value of the target feature is included. Used to find hot or cold spots.
A distance (search radius) is defined
This distance is based on knowledge of the features and their behaviour. Example: how far are people willing to travel to reach a certain store? (Euclidean distance, travel time, etc.)
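The difference between the two versions can be sketched in Python. The sketch computes only the raw ratio (the sum of values within the search distance divided by the sum of all values), using straight-line distance; in practice the ratio is usually converted to a standardised score before mapping. Points, values and the search distance are hypothetical.

```python
import math

def g_statistic(points, values, i, distance, include_target):
    """Share of the total attribute value found within `distance` of feature i.

    include_target=False: Gi  (target excluded from both sums)
    include_target=True:  Gi* (target included)
    """
    near_sum = 0.0
    total = 0.0
    for j, (p, v) in enumerate(zip(points, values)):
        if j == i and not include_target:
            continue
        total += v
        if math.dist(points[i], p) <= distance:
            near_sum += v
    return near_sum / total

# Hypothetical features: three high values close together, two low values elsewhere.
points = [(0, 0), (1, 0), (0, 1), (5, 5), (6, 5)]
values = [8, 6, 6, 1, 1]

gi_hot = g_statistic(points, values, 0, distance=1.5, include_target=False)
gi_star_hot = g_statistic(points, values, 0, distance=1.5, include_target=True)
gi_star_cold = g_statistic(points, values, 3, distance=1.5, include_target=True)
print(round(gi_hot, 3), round(gi_star_hot, 3), round(gi_star_cold, 3))
```

Feature 0 sits in a concentration of high values (a hot spot candidate), while feature 3 sits among low values.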
Chapter 5, Analyzing Geographic Relationships
Why Analyze Geographic Relationships?
Analysis of feature distributions.
Analysis of relationships between features.
• Understanding what is going on in a place.
• Predicting where something is likely to occur.
• Examining why things occur where they do.
Understanding what is going on in a place.
Example: Analysis of accidents in relation to speed limits on highways.
Predicting where something is likely to occur.
Example: Analysis of landforms in order to identify artifact locations.
Examine why things occur where they do.
Example: Improvement of newborns' health.
Using Statistics to Analyze Relationships
• When we look for relationships we form an opinion about things based on personal knowledge of phenomena or visual analysis of the map.
• Statistics allow us to verify those relationships and measure how strong they are.
• The idea behind using statistics is to see to what extent the value of one attribute changes when another changes, and to measure the relationship between two or more maps representing the variables (that is, to analyze the relationship between two attributes).
Assigning Variables to Geography
•Variables from different layers must be associated with the same geographic unit.
If that is not the case:
i) Different cell sizes: Ratio
ii) Different sets of features: Combine features
iii) Points representing different categories of features: Sum features to area
iv) Combine two or more sets of features: Raster
Example: Emergency calls and population data.
Using Statistics to Analyze Geographic Relationships
Two statistical assumptions:
• Each value is equally likely to occur in the sample
• The value of one observation doesn't affect the value of another
In Geography:
• Attribute values vary across a region: regional trends influence attribute values
•Nearby features are more similar than distant ones
Spatial autocorrelation
Violation of the independence of observations
Smaller units tend to be more similar than larger ones.
Identifying relationships vs. analyzing processes
Identifying relationships: asking for relationships between (x, y); measure the extent of variation; take actions.
Analyzing processes: main variables that drive a process; predict values of a variable; understand.
Identifying Geographic Relationships
How much two attributes vary together.
direct relationship (positive correlation); inverse relationship (negative correlation)
If you suspect a relationship, then:
measure the relationship to confirm it, and measure its direction and strength
Methods for Identifying Geographic Relationships
•Pearson’s Correlation Coefficient
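Pearson's coefficient divides the covariance of the two attributes by the product of their standard deviations, giving a value between -1 (perfect inverse relationship) and +1 (perfect direct relationship). A minimal Python sketch, with hypothetical speed limit and accident counts per road segment:

```python
import math

def pearson_r(x, y):
    """Pearson's correlation coefficient for two equally long lists."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    co_variation = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    ss_x = sum((a - mean_x) ** 2 for a in x)
    ss_y = sum((b - mean_y) ** 2 for b in y)
    return co_variation / math.sqrt(ss_x * ss_y)

# Hypothetical data: speed limit (km/h) and number of accidents per segment.
speed_limit = [50, 60, 70, 80, 90]
accidents = [3, 5, 4, 8, 9]

r = pearson_r(speed_limit, accidents)
r_inverse = pearson_r([1, 2, 3], [6, 4, 2])
print(round(r, 3), round(r_inverse, 3))
```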
•Spearman’s Rank Correlation Coefficient
measures the extent to which two lists of ranked values correspond
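A minimal Python sketch of the rank comparison, using the common shortcut formula 1 - 6 * sum(d^2) / (n * (n^2 - 1)), where d is the difference between the two ranks of each feature. The shortcut is exact only when there are no tied values; the data and function names are hypothetical.

```python
def ranks(values):
    """Rank values from 1 (smallest); tied values share their average rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    result = [0.0] * len(values)
    pos = 0
    while pos < len(order):
        end = pos
        while end + 1 < len(order) and values[order[end + 1]] == values[order[pos]]:
            end += 1
        average_rank = (pos + end) / 2 + 1
        for k in range(pos, end + 1):
            result[order[k]] = average_rank
        pos = end + 1
    return result

def spearman_rs(x, y):
    """Spearman's rank correlation via the shortcut formula (no ties assumed)."""
    rank_x, rank_y = ranks(x), ranks(y)
    n = len(x)
    d_squared = sum((a - b) ** 2 for a, b in zip(rank_x, rank_y))
    return 1 - 6 * d_squared / (n * (n ** 2 - 1))

# Hypothetical data per district: median income and number of emergency calls.
income = [20, 35, 50, 65, 80]
calls = [9, 7, 8, 3, 1]

rs = spearman_rs(income, calls)
print(ranks(calls), round(rs, 2))
```

Because only the ranks are compared, the coefficient is less sensitive to extreme values than Pearson's coefficient.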
Identifying Geographic Relationships
What correlation coefficient doesn’t measure
• Results of a correlation cannot be transferred to another scale, e.g. from a county to the nation.
• Doesn't measure causation (X → Y)
• Correlation doesn't explain why there is a relationship.
• Doesn't measure the form of the relationship, just the dispersion around a straight line.
Analyzing Geographic Processes
We analyze geographic processes in order to predict that something will occur.
Steps:
1. Develop a theory as to what is driving the process.
2. Analyze the relationships between various attributes of your data (build a model).
Linear Regression Analysis
• Plot the variables on a chart.
• Find the line that passes as close as possible to all data points (ordinary least squares method).
Ordinary Least Squares
Example from Wikipedia
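For one independent variable, the ordinary least squares line is the one that minimises the sum of squared vertical distances between the points and the line. A minimal Python sketch, with illustrative data points that are not taken from the slide's figure:

```python
def ols_line(x, y):
    """Return (intercept, slope) of the least squares line y = intercept + slope * x."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    co_variation = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    ss_x = sum((a - mean_x) ** 2 for a in x)
    slope = co_variation / ss_x
    intercept = mean_y - slope * mean_x
    return intercept, slope

# Illustrative data points.
x = [1, 2, 3, 4]
y = [6, 5, 7, 10]

intercept, slope = ols_line(x, y)
print(f"y = {intercept:.1f} + {slope:.1f} x")
```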
Interpreting the results of regression analysis
We can see how well our model works by comparing the variance in the predicted values to the variance in the observed values.
• Perfect fit (all points on the line): R2 = 1
• Any other case: R2 < 1, meaning the fit is not perfect
Calculate residuals (differences between predicted & observed values)
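A minimal Python sketch of both steps. Residuals are taken here as observed minus predicted (the usual sign convention, an assumption about the slide's intent), and R2 is computed as 1 minus the ratio of residual variation to total variation. The observed values are illustrative, and the predictions come from an assumed fitted line y = 3.5 + 1.4x.

```python
def r_squared(observed, predicted):
    """Share of the variation in the observed values explained by the model."""
    mean_obs = sum(observed) / len(observed)
    ss_residual = sum((o - p) ** 2 for o, p in zip(observed, predicted))
    ss_total = sum((o - mean_obs) ** 2 for o in observed)
    return 1 - ss_residual / ss_total

# Illustrative observations at x = 1..4 and predictions from y = 3.5 + 1.4x.
x = [1, 2, 3, 4]
observed = [6, 5, 7, 10]
predicted = [3.5 + 1.4 * xi for xi in x]

residuals = [o - p for o, p in zip(observed, predicted)]
r2 = r_squared(observed, predicted)
r2_perfect = r_squared(observed, observed)
print([round(e, 1) for e in residuals], round(r2, 2))
```

Mapping the residuals shows where the model over-predicts (negative residuals) and under-predicts (positive residuals).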
Using More Than One Independent Variable
Most geographic processes aren’t controlled by a single variable
New Regression Analysis Equation
r2 in multivariate regression describes the variation in y explained by the combination of independent variables.
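With several independent variables the model takes the general form y = b0 + b1*x1 + ... + bk*xk, and the coefficients are again chosen by least squares. The Python sketch below solves the normal equations with a small Gaussian elimination routine so that it needs no external library; the helper names and the data (generated from y = 2 + 3*x1 - x2) are hypothetical.

```python
def solve(a, b):
    """Solve the linear system a * coef = b by Gaussian elimination with pivoting."""
    n = len(b)
    m = [row[:] + [rhs] for row, rhs in zip(a, b)]  # augmented matrix
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    coef = [0.0] * n
    for r in range(n - 1, -1, -1):
        known = sum(m[r][c] * coef[c] for c in range(r + 1, n))
        coef[r] = (m[r][n] - known) / m[r][r]
    return coef

def multiple_regression(rows, y):
    """Least squares coefficients [b0, b1, ..., bk] for y = b0 + b1*x1 + ... + bk*xk."""
    design = [[1.0] + list(row) for row in rows]  # first column is the intercept
    k = len(design[0])
    xtx = [[sum(d[i] * d[j] for d in design) for j in range(k)] for i in range(k)]
    xty = [sum(d[i] * v for d, v in zip(design, y)) for i in range(k)]
    return solve(xtx, xty)

# Hypothetical observations of two independent variables (x1, x2).
rows = [(1, 2), (2, 1), (3, 4), (4, 3), (5, 5)]
y = [2 + 3 * x1 - 1 * x2 for x1, x2 in rows]

coefficients = multiple_regression(rows, y)
print([round(b, 3) for b in coefficients])
```

Because the sample data follow the equation exactly, the routine recovers the coefficients 2, 3 and -1; with real data the fit would leave residuals.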
Goal: Identifying the key variables
Analysis: Test the significance of each variable (t-test)
Factors Influencing the Regression Analysis Results
Least squares regression analysis is effective only if the following are true:
1. Linear relationship between Y and X.
2. Residuals have a mean of 0.
3. Residuals have a constant variance.
4. Residuals are randomly arranged along the regression line.
5. Residuals are normally distributed.
6. Independent variables are not highly correlated.
Regression Analysis & Geographic Data
For geographic data, misspecification can result from many sources.
It can occur when:
• Data are analyzed at the wrong scale for the process
• Variables are missing
Dealing With Regional Variation
Geographically Weighted Regression (GWR)
• Allows model coefficients to vary regionally.
• A regression is run for each location rather than once for the study area as a whole.
Example: Per capita income.
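The principle can be sketched in Python for a single independent variable: at every location a weighted least squares line is fitted, with observations weighted by a Gaussian kernel of their distance to that location. The kernel, the bandwidth and the data are assumptions made for illustration; GWR software offers further options such as other kernels and automatic bandwidth selection.

```python
import math

def gwr_coefficients(coords, x, y, bandwidth):
    """Fit a weighted regression y = b0 + b1 * x at every location.

    Nearby observations get larger weights (Gaussian kernel of distance).
    Returns one (intercept, slope) pair per location.
    """
    results = []
    for here in coords:
        w = [math.exp(-0.5 * (math.dist(here, c) / bandwidth) ** 2) for c in coords]
        sum_w = sum(w)
        mean_x = sum(wi * xi for wi, xi in zip(w, x)) / sum_w
        mean_y = sum(wi * yi for wi, yi in zip(w, y)) / sum_w
        s_xy = sum(wi * (xi - mean_x) * (yi - mean_y) for wi, xi, yi in zip(w, x, y))
        s_xx = sum(wi * (xi - mean_x) ** 2 for wi, xi in zip(w, x))
        slope = s_xy / s_xx
        results.append((mean_y - slope * mean_x, slope))
    return results

# Hypothetical data: a western group where y rises with x (y = 2x)
# and a distant eastern group where y falls with x (y = 10 - x).
coords = [(0, 0), (1, 0), (2, 0), (100, 0), (101, 0), (102, 0)]
x = [1, 2, 3, 1, 2, 3]
y = [2, 4, 6, 9, 8, 7]

local_models = gwr_coefficients(coords, x, y, bandwidth=5)
west_slope = local_models[0][1]
east_slope = local_models[3][1]
print(round(west_slope, 3), round(east_slope, 3))
```

A single regression over all six observations would average the two opposite relationships away; the local fits recover a slope near 2 in the west and near -1 in the east.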
Dealing with Local Trends
Methods to address local trends.
• Resampling
• Spatial filtering (removes spatial autocorrelation)
Running A Linear Regression Analysis With Geographic Data.
1. Determine what you are trying to predict.
2. Identify the key independent variables.
3. Examine the distribution of your variables.
4. Run the ordinary least squares regression.
5. Examine the coefficients for each independent variable.
6. Examine the residuals.
• Test for spatial autocorrelation
• Look for missing variables
• Plot y-values against residuals
• Create a frequency curve
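One common way to test residuals for spatial autocorrelation is the global Moran's I statistic; that this is the test intended here is an assumption. The Python sketch below uses binary neighbour weights and hypothetical residuals for six areas arranged in a row. Values well above 0 indicate that similar residuals cluster together, values well below 0 that neighbouring residuals tend to differ.

```python
def morans_i(values, neighbours):
    """Global Moran's I with binary weights (1 for neighbours, 0 otherwise)."""
    n = len(values)
    mean = sum(values) / n
    dev = [v - mean for v in values]
    cross = sum(dev[i] * dev[j] for i, nbrs in enumerate(neighbours) for j in nbrs)
    weight_sum = sum(len(nbrs) for nbrs in neighbours)
    return (n / weight_sum) * cross / sum(d * d for d in dev)

# Six areas in a row; each area neighbours the areas next to it.
chain = [[1], [0, 2], [1, 3], [2, 4], [3, 5], [4]]

# Hypothetical residuals: positive in the west, negative in the east.
clustered_residuals = [2.0, 1.5, 1.0, -1.0, -1.5, -2.0]
# Hypothetical residuals that alternate in sign from area to area.
alternating_residuals = [1.0, -1.0, 1.0, -1.0, 1.0, -1.0]

i_clustered = morans_i(clustered_residuals, chain)
i_alternating = morans_i(alternating_residuals, chain)
print(round(i_clustered, 3), round(i_alternating, 3))
```

Clustered residuals such as the first set suggest that a variable with a regional pattern may be missing from the model.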