Statistical methods for real estate data
prof. RNDr. Beáta Stehlíková, CSc.
2013
Beáta Stehliková, Bratislava 2
Information is currently, besides financial, energy, and material resources, the main factor of progress.
How to obtain new knowledge?
We want to answer the question: how do we obtain new information, new knowledge, from data?
We will talk about only one method of spatial statistics.
Why spatial statistics?
Methods of spatial statistics are designed for spatial data.
Real estate data very often contain information about geographic location, so they are spatial data.
Variable and data
A variable is a characteristic of a population or sample that is of interest to us.
Data are the actual values of variables.
Different kinds of data in real estate
Cross-sectional data are data on one or more variables collected at a single point in time.
Time series data are data on one or more variables collected over a period of time.
Panel data are the same cross-section observed over time.

Example of cross-sectional data:
Obs.   Price (SEK)   Living area
1      600 000       80
2      750 000       95
3      675 000       75
4      825 000       84
...
200    925 000       96

Example of time series data:
Obs.   Year   Index   GDP
1      1981   101     900
2      1982   105     1050
3      1983   110     1200
...
20     1999   250     8500
Types of data (scale)
We have said that data are the actual values of variables.
Types of data:
• Interval data are numerical observations
• Ordinal data are ordered categorical observations
• Nominal data are categorical observations
Types of data (scale)
Knowing the type of data (scale) is necessary to properly
select the technique to be used when analyzing data.
Descriptive statistics involves arranging, summarizing, and presenting a set of data in such a way that useful
information is produced.
Descriptive statistics
• graphical techniques (histogram)
• numerical descriptive measures: mean (average), median (middle value), mode (most frequent value), variance, standard deviation
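The numerical measures listed above can be computed in a few lines. The following is a minimal sketch using Python's standard `statistics` module; the living-area values are hypothetical and are not taken from the slides.

```python
import statistics

# Hypothetical living areas; illustrative values only, not data from the slides.
living_area = [80, 95, 75, 84, 96, 84]

summary = {
    "mean": statistics.fmean(living_area),
    "median": statistics.median(living_area),
    "mode": statistics.mode(living_area),
    "variance": statistics.variance(living_area),  # sample variance (divides by n - 1)
    "std_dev": statistics.stdev(living_area),      # sample standard deviation
}

for name, value in summary.items():
    print(f"{name}: {value:.2f}")
```

Whether the sample (n - 1) or population (n) form of the variance is appropriate depends on the analysis; the sample form is used here as an assumption.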
Descriptive statistics are not enough
Consider two data sets A and B, both described by the same summary statistics:
Average 17.8, standard deviation 4.7, coefficient of variation 26.4 %, n = 25
[Figure: histograms of data sets A and B]
It is necessary to know the probability distribution.
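To illustrate why summary statistics alone are not enough, the sketch below builds two small data sets with the same mean and standard deviation but different shapes. The raw values and the helper `rescale` are hypothetical; only the target mean 17.8 and standard deviation 4.7 are borrowed from the slide.

```python
import statistics

def rescale(values, target_mean, target_sd):
    """Linearly rescale values to a given mean and population standard deviation."""
    m = statistics.fmean(values)
    s = statistics.pstdev(values)
    return [target_mean + target_sd * (v - m) / s for v in values]

# Hypothetical raw shapes: one symmetric, one strongly right-skewed.
symmetric = [1, 2, 3, 4, 5, 6, 7, 8, 9]
skewed = [1, 1, 1, 1, 2, 2, 3, 5, 20]

a = rescale(symmetric, 17.8, 4.7)
b = rescale(skewed, 17.8, 4.7)

for name, data in (("A", a), ("B", b)):
    print(name,
          "mean", round(statistics.fmean(data), 1),
          "sd", round(statistics.pstdev(data), 1),
          "median", round(statistics.median(data), 1))
```

Both sets report the same mean and standard deviation, yet their medians differ, which is only visible once the distribution itself is inspected.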
Second example
Consider two large data sets A and B
The location information
It is not possible to identify differences between the data sets unless we take the location information into account.
The location information
Variograms quantify how values change across space.
• There is spatial autocorrelation: small distances correspond to small changes in values.
• There is no spatial autocorrelation: small distances correspond to large changes in values.
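As a rough illustration of what a variogram measures, the sketch below computes the empirical semivariance (half the mean squared difference between values a given lag apart) for observations on a regularly spaced line. The function name, the one-dimensional layout, and the two data series are assumptions made for this example.

```python
def semivariogram(values, max_lag):
    """Empirical semivariance for integer lags on a regularly spaced transect."""
    result = {}
    for h in range(1, max_lag + 1):
        # Squared differences between all pairs of points exactly h steps apart.
        sq_diffs = [(values[i + h] - values[i]) ** 2 for i in range(len(values) - h)]
        result[h] = sum(sq_diffs) / (2 * len(sq_diffs))
    return result

# Hypothetical series: neighbours are similar in the first, dissimilar in the second.
smooth = [10, 11, 12, 13, 14, 15, 16, 17]
rough = [10, 17, 11, 16, 12, 15, 13, 14]

print("smooth:", semivariogram(smooth, 3))
print("rough: ", semivariogram(rough, 3))
```

For the smooth series the semivariance at lag 1 is small and grows with distance; for the rough series it is already large at lag 1.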
Spatial autocorrelation
The degree to which near and more distant things are interrelated
Measures of spatial autocorrelation attempt to deal with similarities
in the location of spatial objects and their attributes
Spatial autocorrelation
Positive (objects similar in location are similar in attribute)
Negative (objects similar in location are very different)
Zero (attributes are independent of location)
Spatial autocorrelation - measures.
Several measures available: Moran’s coefficient I, Geary’s C coefficient, Getis-Ord coefficient G.
These measures may be
• “global” - they apply to the entire study region
• or “local” - autocorrelation may exist in some parts of the region but not in others.
Moran’s coefficient I
• typically varies between -1.0 and +1.0
• a value near 0 (more precisely, the expected value -1/(n-1)) indicates no spatial autocorrelation (a random pattern)
• when autocorrelation is strong, the I coefficient is close to 1 or -1
• negative values of I indicate negative autocorrelation
• positive values of I indicate positive autocorrelation (a tendency toward clustering)
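A minimal sketch of the global Moran's I calculation may make these properties concrete. It uses the usual formula I = (n / S0) · Σ_i Σ_j w_ij z_i z_j / Σ_i z_i², where z are deviations from the mean and S0 is the sum of all weights. The helper `chain_weights` (locations on a line, each neighbouring the next) and the example values are hypothetical.

```python
def chain_weights(n):
    """Binary contiguity weights for n locations on a line (hypothetical layout)."""
    return [[1.0 if abs(i - j) == 1 else 0.0 for j in range(n)] for i in range(n)]

def morans_i(values, weights):
    """Global Moran's I for a list of values and a square weight matrix."""
    n = len(values)
    mean = sum(values) / n
    z = [v - mean for v in values]
    s0 = sum(sum(row) for row in weights)
    cross = sum(weights[i][j] * z[i] * z[j] for i in range(n) for j in range(n))
    return n * cross / (s0 * sum(d * d for d in z))

w = chain_weights(6)
clustered = morans_i([1, 1, 1, 5, 5, 5], w)    # similar values sit next to each other
alternating = morans_i([1, 5, 1, 5, 1, 5], w)  # neighbours always differ

print("clustered:", clustered)
print("alternating:", alternating)
```

The clustered pattern yields a positive I and the alternating pattern a negative one, matching the interpretation above.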
Regression analysis
is a technique for using data to identify relationships among variables and for using these relationships to make predictions.
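As a small worked example, the sketch below fits a simple least-squares line of price on living area, using only the first four observations of the cross-sectional table shown earlier in the slides. Four points are far too few for a real analysis; the function `ols_fit` and the 90-unit prediction are illustrative assumptions.

```python
def ols_fit(x, y):
    """Simple linear regression y = intercept + slope * x by least squares."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return intercept, slope

# First four rows of the slide's cross-sectional table (price in SEK, living area).
area = [80, 95, 75, 84]
price = [600_000, 750_000, 675_000, 825_000]

intercept, slope = ols_fit(area, price)
residuals = [p - (intercept + slope * a) for a, p in zip(area, price)]

print(f"price = {intercept:.0f} + {slope:.0f} * area")
print(f"predicted price for area 90: {intercept + slope * 90:.0f}")
```

The residuals computed here are the quantities that are later checked for spatial autocorrelation.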
Regression analyses that ignore spatial dependency can have
unstable parameter estimates and unreliable significance tests.
Solution: spatial autoregressive models
• spatial lag model
• spatial error model
Spatial Models
Ordinary Least Squares: no influence from neighbors
    Y = β0 + Xβ + ε
Spatial lag: dependent variable influenced by neighbors
    Y = β0 + λWY + Xβ + ε
Spatial error: residuals influenced by neighbors
    Y = β0 + Xβ + ρWε + ξ
The lag model controls for spatial autocorrelation in the dependent variable.
The error model controls for spatial autocorrelation in the residuals; thus it controls for autocorrelation in both the dependent and the independent variables.
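The term WY in the lag model can be read as "the neighbors' values of Y". The sketch below assumes a row-standardized weight matrix W (each row sums to 1), in which case WY is simply the average of each location's neighbors. The four-location layout, the prices, and the helper names are hypothetical.

```python
def row_standardize(weights):
    """Scale each row of a weight matrix so that it sums to 1 (rows of zeros stay zero)."""
    out = []
    for row in weights:
        total = sum(row)
        out.append([w / total if total else 0.0 for w in row])
    return out

def spatial_lag(weights, y):
    """Compute W y: for each location, the weighted sum of its neighbours' values."""
    return [sum(w * v for w, v in zip(row, y)) for row in weights]

# Hypothetical layout: four dwellings on a street, each adjacent to the next.
binary = [
    [0, 1, 0, 0],
    [1, 0, 1, 0],
    [0, 1, 0, 1],
    [0, 0, 1, 0],
]
w = row_standardize(binary)
prices = [100.0, 200.0, 300.0, 400.0]  # illustrative values only

wy = spatial_lag(w, prices)
print(wy)
```

Estimating λ or ρ themselves requires maximum likelihood or a similar method, which dedicated software such as GeoDa handles; this sketch only shows how the lagged variable is constructed.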
Software GeoDa
Compare different spatial models
Neither R² nor adjusted R² can be used to compare different spatial regression models.
We can use the Akaike Information Criterion (AIC): the smaller the AIC value, the better the model.
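The comparison can be sketched with the standard definition AIC = 2k - 2 ln L, where k is the number of estimated parameters and ln L the maximized log-likelihood. The log-likelihoods and parameter counts below are hypothetical placeholders, not results from the slides.

```python
def aic(log_likelihood, n_params):
    """Akaike Information Criterion: 2k - 2 ln L."""
    return 2 * n_params - 2 * log_likelihood

# Hypothetical fits: (maximized log-likelihood, number of estimated parameters).
candidates = {
    "OLS": (-512.4, 2),
    "spatial lag": (-508.9, 3),
    "spatial error": (-506.1, 3),
}

scores = {name: aic(ll, k) for name, (ll, k) in candidates.items()}
best = min(scores, key=scores.get)

for name, score in scores.items():
    print(f"{name}: AIC = {score:.1f}")
print("preferred model:", best)
```

How parameters are counted (for example, whether the error variance is included) varies between software packages, so AIC values should be compared only when computed the same way.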
Example
dependent variable y – price of dwelling
independent variable x – living area
Classical regression analysis
Residuals
Moran’s I = 0.193022
Significance: p-value = 0.03140 < 0.05
This indicates positive spatial autocorrelation among the residuals.
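The slides do not state how the p-value was obtained. One common approach is a permutation test: the observed values are shuffled across locations many times, and the p-value is the share of shuffles whose Moran's I is at least as large as the observed one. The sketch below assumes this approach, a one-sided test for positive autocorrelation, and a hypothetical line of ten locations with made-up values.

```python
import random

def chain_weights(n):
    """Binary contiguity weights for n locations on a line (hypothetical layout)."""
    return [[1.0 if abs(i - j) == 1 else 0.0 for j in range(n)] for i in range(n)]

def morans_i(values, weights):
    """Global Moran's I for a list of values and a square weight matrix."""
    n = len(values)
    mean = sum(values) / n
    z = [v - mean for v in values]
    s0 = sum(sum(row) for row in weights)
    cross = sum(weights[i][j] * z[i] * z[j] for i in range(n) for j in range(n))
    return n * cross / (s0 * sum(d * d for d in z))

def permutation_p_value(values, weights, n_perm=999, seed=0):
    """One-sided pseudo p-value for positive spatial autocorrelation."""
    rng = random.Random(seed)  # fixed seed so the sketch is reproducible
    observed = morans_i(values, weights)
    shuffled = list(values)
    count = 0
    for _ in range(n_perm):
        rng.shuffle(shuffled)
        if morans_i(shuffled, weights) >= observed:
            count += 1
    return observed, (count + 1) / (n_perm + 1)

# Hypothetical residual-like values: low values clustered at one end, high at the other.
values = [1] * 5 + [9] * 5
observed, p_value = permutation_p_value(values, chain_weights(10))
print(f"Moran's I = {observed:.4f}, pseudo p-value = {p_value:.3f}")
```

A small pseudo p-value means that a clustering this strong rarely arises when values are assigned to locations at random.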
Spatial error model
Local Moran’s coefficients
Which values produce spatial autocorrelation?
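Local Moran's coefficients answer this question location by location. A minimal sketch, assuming the common form I_i = (z_i / m2) · Σ_j w_ij z_j with m2 = Σ z² / n, is given below; the line-shaped layout and the values are hypothetical.

```python
def chain_weights(n):
    """Binary contiguity weights for n locations on a line (hypothetical layout)."""
    return [[1.0 if abs(i - j) == 1 else 0.0 for j in range(n)] for i in range(n)]

def local_morans_i(values, weights):
    """Local Moran's I_i for each location."""
    n = len(values)
    mean = sum(values) / n
    z = [v - mean for v in values]
    m2 = sum(d * d for d in z) / n
    return [z[i] / m2 * sum(weights[i][j] * z[j] for j in range(n)) for i in range(n)]

values = [1, 1, 1, 5, 5, 5]  # illustrative values only
w = chain_weights(len(values))
local = local_morans_i(values, w)

for i, li in enumerate(local):
    print(f"location {i}: I_i = {li:.2f}")

# With this definition the local values sum to S0 times the global Moran's I.
s0 = sum(sum(row) for row in w)
print("global I recovered from local values:", sum(local) / s0)
```

Locations inside a cluster of similar values get large positive I_i, while locations on the boundary between the low and high groups contribute nothing here.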
Spatial statistics
Methods of spatial statistics are very useful for data with location information.
Art looks for beauty, and science looks for truth.
Spatial statistics will help us find the truth when we use the right methods.