Uppsala University Department of Statistics Bachelor Thesis Spring 2018 Modelling Migration An evaluation of existing spatial interaction models and decay functions on municipality level in Sweden Klara Hvarfner and Teodor Sandell Supervisors: John Östh and Marina Toger


Uppsala University

Department of Statistics

Bachelor Thesis

Spring 2018

Modelling Migration

An evaluation of existing spatial interaction models and decay

functions on municipality level in Sweden

Klara Hvarfner and Teodor Sandell

Supervisors: John Östh and Marina Toger


Abstract

This study examines which distance decay parameter and spatial interaction model should be

used when studying migration on municipality level in Sweden. The data used consists of

100000 randomly sampled observations from the Uppsala University database PLACE from

the years 2013-2014. The decay functions that are examined are exponential decay, power

decay and exponential normal decay. The interaction models that are used are the

unconstrained, doubly constrained and half-life model. The different functions and models

were evaluated using RMSE and Pearson’s correlation. The results show that the power

function and doubly constrained model most accurately estimated the flows of migration.

However, all of the model types can be considered preferable depending on how the data are

constructed and what the objective of the study is.

Keywords: Migration, Spatial interaction model, Unconstrained, Half-life, Doubly

constrained, Distance decay.


Acknowledgements

We would like to express our gratitude to our supervisors John Östh and Marina Toger from

the department of Social and Economic Geography at Uppsala University. Thank you for

always having your doors open and guiding us through an interesting and new subject.


Table of Contents

1. Introduction
1.1. Spatial Interaction Modelling
1.2. Background
1.3. Purpose and Research Question
2. Data
2.1. Variables
Distance
The Flow Variables
3. Method
3.1. The Distance Decay Functions
3.2. The Unconstrained Model
OLS Assumptions
3.3. The Doubly Constrained Model
3.4. The Half-Life Model
3.5. Goodness of Fit Measures
3.6. Limitations of Spatial Analysis
MAUP
Spatial Autocorrelation
4. Results
4.1. The Unconstrained Model
Regression Assumption Evaluation
4.2. The Doubly Constrained Model
4.3. The Half-Life Model
4.4. Decay Parameter Evaluation
5. Discussion
6. Conclusions
7. References
8. Appendix
8.1. The Unconstrained Exponential Decay Model
8.2. The Unconstrained Power Decay Model
8.3. Moran’s Index for the Unconstrained Models


1. Introduction

Within human geography, flows of goods, information and people have long been analyzed through
spatial interaction analysis. Migration, or the movement of people, has been of particular
interest since it affects several societal functions. Understanding migration matters for urban
planning, for example when more housing is needed or when there is a risk of depopulation. Such
issues are easier to prevent or prepare for if the flows of migration can be predicted. One of
the first to study migration was Ernst Georg Ravenstein, who in The Laws of Migration (1885)
stated seven rules to explain who migrates and why (Ravenstein, 1885). Since then,
researchers have tried to model migration and understand its “laws”.

Spatial interaction is, simply put, flows between locations in geographic space. In order to
understand migration, or spatial interaction in general, it is important to understand how
spatial separation, or distance, works as a deterrent to interaction (Fotheringham, 1980). To
measure how distance deters interaction, there are several different spatial interaction models
(SIM) that can be used and within them several different distance decay functions. Distance
decay can be defined as: “The rate at which the volume of interaction decreases as the distance
over which the interaction is taking place increases, ceteris paribus” (Fotheringham, 1980,
p.2). In practice, this means that flows tend to travel the shortest possible distance, i.e.
the principle of least effort.

1.1. Spatial Interaction Modelling

The most widely used model for studying spatial interaction is the gravity model, which has
long been used to understand, analyze and predict flows in geographic space such as migration.
The gravity model stems from an analogy with Newton’s law of gravity between planets as a
function of mass and distance and is now used in different areas of social science as well as
other fields (Haynes and Fotheringham, 1984). Part of its success is due to three facts: its
intuitive consistency with migration theories, the ease of estimating it and its goodness of
fit (Poot et al. 2016). The model has the following form:

$$T_{ij} = K O_i D_j f(d_{ij}) \tag{1}$$


Model (1) explains the number of flows $T_{ij}$ between an origin $i$ and a destination $j$ as
a function of repulsion ($O_i$), attraction ($D_j$) and distance ($d_{ij}$), scaled by a
constant $K$. From this general form, there are several more advanced versions of the model
that are explained thoroughly in the method section.
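As a minimal illustration of model (1), the Python sketch below computes predicted flows for a toy system of three zones. The zone totals, the distances, the choice of an exponential decay function and the values of K and beta are made-up assumptions for the example, not figures from this study.

```python
import math

def gravity_flows(origins, destinations, distances, decay, k=1.0):
    """Return the matrix T[i][j] = K * O_i * D_j * f(d_ij)."""
    return [
        [k * o_i * d_j * decay(distances[i][j])
         for j, d_j in enumerate(destinations)]
        for i, o_i in enumerate(origins)
    ]

# Hypothetical toy data: outflows, inflows and distances in km.
origins = [10.0, 20.0, 30.0]
destinations = [25.0, 15.0, 20.0]
distances = [[1.0, 50.0, 100.0],
             [50.0, 1.0, 70.0],
             [100.0, 70.0, 1.0]]

beta = 0.02  # assumed decay parameter, chosen only for illustration
flows = gravity_flows(origins, destinations, distances,
                      decay=lambda d: math.exp(-beta * d))
```

In this toy system the predicted flow from zone 0 to zone 1 exceeds the flow from zone 0 to zone 2, even though zone 2 is the more attractive destination, because zone 2 is twice as far away.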

1.2. Background

In “A new way of determining distance decay parameters in spatial interaction models with

application to job accessibility analysis in Sweden” by Östh, Reggiani and Lyhagen three

different techniques for calculating distance decay parameters are examined and compared.

The more commonly used unconstrained and doubly constrained spatial interaction models are

compared with the half-life model, which is a relatively new model in spatial interaction

analysis. Five different decay functions are used: exponential decay, power decay, exponential

normal decay, exponential square root decay and the log normal decay. Two empirical

applications related to job accessibility in Sweden are made in order to compare the different

decay functions. The first example is a smaller dataset studying job accessibility flows on
municipality level and the second is a more disaggregated dataset with 5 km × 5 km squares. By
comparing RMSE and Pearson’s correlation for the estimated parameters using the different
techniques, conclusions were drawn about which estimated parameters worked best. For smaller

to midsized datasets the doubly constrained spatial interaction models worked better. Half-life

models and unconstrained models behaved in a similar way, but when using larger datasets,

the doubly constrained models became impossible to estimate while the half-life models

became more accurate (Östh et al. 2016).

The authors state that since half-life models can be used for several different decay functions

it can be assumed that they can be useful in both long- and short-span trips in studies of

accessibility (Östh et al. 2016). Migration is an important long-span type of flow, and in
this study it is explored through the three types of spatial interaction models, similarly
to Östh et al. (2016).

1.3. Purpose and Research Question

The aim of this study is to examine which decay function and spatial interaction model fits best

for examining migration within Sweden. This is done by examining three of the most

commonly used decay functions on unconstrained and doubly constrained spatial interaction


models as well as a half-life model on individual data from Sweden aggregated to municipality
level. The research question the study attempts to answer is therefore the following:

Which spatial interaction model and what decay function should be used when studying

migration flows on municipality level in Sweden?

2. Data

The data used in the study are from the database PLACE 2013-2014 which is a subsample from

Statistics Sweden’s database LISA (Longitudinell integrationsdatabas för Sjukförsäkrings- och

Arbetsmarknadsstudier).1 The sample used in this study contains 100000 observations which

were randomly sampled from PLACE containing roughly 10 million observations of Swedish

citizens.

Out of the total 100000 observations in the sample, only the 12612 individuals that actually
migrated between 2013 and 2014 are studied. Since the aim is to study migration on municipality
level, the data are aggregated to $290^2 = 84100$ observations, representing flows from every
municipality to every municipality. The variables needed are the number of migrational inflows
and outflows for each municipality, the distances between all pairs of municipalities and the
total number of interactions between all pairs of municipalities. Naturally, many of these
observations have missing values, since the number of observations is far larger than the
number of individuals in the data; these observations are set to zero.
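The aggregation step described above can be sketched in Python as follows. The individual records and the zero-based municipality codes are hypothetical; the sketch only shows how moves are counted per origin-destination pair and how pairs without any observed move end up as zero.

```python
from collections import Counter

def aggregate_flows(moves, n_zones):
    """Count moves per (origin, destination) pair; unobserved pairs become 0."""
    counts = Counter(moves)
    return [[counts.get((i, j), 0) for j in range(n_zones)]
            for i in range(n_zones)]

# Hypothetical individual records: (municipality in 2013, municipality in 2014).
moves = [(0, 1), (0, 1), (2, 0), (1, 2), (0, 1)]
T = aggregate_flows(moves, n_zones=3)
```

With 290 municipalities the same function would return a 290 by 290 matrix, most of whose cells are zero when only a few thousand moves are observed.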

2.1. Variables

The original dataset contains 17 different variables with information about where the

individuals lived and worked in 2013 and 2014, as well as if they moved and properties such

as gender, whether the individuals were rich or poor etc. The variables needed in the study to

create the models are: distance, outflows, inflows and total number of interactions. These

variables are further explained in the coming sections.

1 The data was generously provided by our supervisor John Östh from the Department of Social and Economic

Geography at Uppsala University.


Distance

The measurement for distance that is used is the Cartesian distance which is calculated by

measuring the distance between the coordinates of where an individual lived 2013 and 2014

and it is given in meters. Using the Cartesian distance instead of the actual transport distance2

can be criticized for not being completely accurate. However, the available data do not contain

information about anything related to the travel distance except the coordinates hence the

Cartesian distance is the only available option. More precisely, the coordinate system that is
used is a projected coordinate system called RT90 (Lantmäteriet, 2018). The distance between
two positions is then calculated with the distance formula stemming from Pythagoras’ theorem,
stated below:

$$\text{Distance} = \sqrt{(x_2 - x_1)^2 + (y_2 - y_1)^2} \tag{2}$$

The actual distances that are used are between municipalities, aggregated from the individual
data, and it is therefore important to choose an appropriate position within each zone from
which the distance is measured. This point is decided by calculating the average position of
each municipality’s inflow and of each municipality’s outflow, i.e. the average point that
people in each municipality migrate from as well as to. This might cause some irregularities,
e.g. if the average position of a zone ends up in a lake. This is one of the downsides of
aggregating the data, but it is still the method that captures the most information. Another
way of determining the positions of inflows and outflows would have been to use the median
coordinates. This would perhaps yield more accurate positions, but the difference this would
make for the results can be considered negligible.

The variable itself is calculated for each distance between all zones, meaning that there are
in total $290^2 = 84100$ distances. It is calculated by using two vectors and creating a matrix.
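A minimal Python sketch of this calculation is given below. It assumes that origins are located at the average position of a municipality's out-movers and destinations at the average position of its in-movers, which is one reading of the description above; all coordinates are hypothetical.

```python
import math

def mean_position(points):
    """Average of a list of (x, y) coordinates in meters."""
    xs, ys = zip(*points)
    return sum(xs) / len(xs), sum(ys) / len(ys)

def distance_matrix(origin_points, destination_points):
    """Euclidean distance, equation (2), for every origin-destination pair."""
    return [[math.hypot(dx - ox, dy - oy) for dx, dy in destination_points]
            for ox, oy in origin_points]

# Hypothetical coordinates of movers, grouped by municipality.
out_positions = [mean_position([(0.0, 0.0), (2000.0, 0.0)]),  # zone 0
                 mean_position([(4000.0, 3000.0)])]           # zone 1
in_positions = [mean_position([(1000.0, 0.0)]),               # zone 0
                mean_position([(4000.0, 4000.0)])]            # zone 1

d = distance_matrix(out_positions, in_positions)
```

Because the outflow and inflow positions of a zone need not coincide, the distance from a municipality to itself is not necessarily zero under this assumption.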

2 A more complex way of calculating distance that takes into account existing transportation infrastructure.

(Rodrigue et al. 2017)


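$$\left|\begin{array}{ccc}
d & O & D \\
1 & 1 & 1 \\
\vdots & 1 & \vdots \\
n & 1 & n \\
\vdots & \vdots & \vdots \\
n^2 - n + 1 & n & 1 \\
\vdots & \vdots & \vdots \\
n^2 & n & n
\end{array}\right| \tag{3}$$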

Matrix (3) shows how the distances for all $i$-to-$j$ pairs are calculated, where $d$ represents
distance, $O$ represents origin $i$ and $D$ represents destination $j$. The last $n$ elements
represent the distances from the $n$-th origin to the destinations 1 to $n$.

Out of the 84100 distances, only 2519 contain actual flows; the rest are zero interactions, and
these distances are ignored. This was expected since there are, as previously mentioned, only
12612 individuals in the data.

Differences in distance on individual and municipal level

Figure 2.1.1 shows the distribution of migration distance on municipality level and visualizes
the distance decay effect on migration. Figure 2.1.2 shows the distribution of distance in the
individual level data. The two figures differ in how rapid the decaying effect is. The
difference in the distribution of distance between the two datasets is also shown in
Table 2.1.1. The differences are a result of aggregating the data.

Table 2.1.1: Properties of distance on municipality level data in meters

Distance             Minimum     Median       Mean      Maximum
Municipality data       60.4    73193.4   163378.3    1263211.0
Individual data        100.0     3700.0    44759.8    1373085.7


Figure 2.1.1: Distribution of distance on municipality level data in meters

Figure 2.1.2: Distribution of distance on individual level data in meters


The Flow Variables

The gravity models require three different types of flow variables. The first two are the
Origin and Destination variables, where $Origin_i$ is the number of outflows from each
municipality $i$, representing the emission of the municipality, and $Destination_j$ is the
number of inflows to each municipality $j$, representing the attraction of the municipality.

The third flow variable is $T_{ij}$, which is the total number of interactions between each $i$
and $j$. $T_{ij}$ can be visualized as a matrix with $O_i$ vertically and $D_j$ horizontally.
Since the total number of municipalities in Sweden is 290, $O_i$ and $D_j$ each consist of 290
values, meaning that $T_{ij}$ is a matrix of $290^2 = 84100$ values. However, as previously
mentioned, only 2519 of these cells have values that are not 0. This means that only 2519 of
the $T_{ij}$ observations are used when estimations are conducted.
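The relationship between the three flow variables can be illustrated with a short Python sketch. The 3 by 3 flow matrix is hypothetical, and treating the outflows and inflows as row and column sums of the flow matrix is an assumption about how the variables were built.

```python
def flow_margins(T):
    """Return (O, D): row sums are outflows, column sums are inflows."""
    O = [sum(row) for row in T]
    D = [sum(col) for col in zip(*T)]
    return O, D

def nonzero_pairs(T):
    """List the (i, j) cells with at least one observed move."""
    return [(i, j) for i, row in enumerate(T)
            for j, t in enumerate(row) if t > 0]

# Hypothetical flow matrix with origins as rows and destinations as columns.
T = [[0, 3, 0],
     [0, 0, 1],
     [1, 0, 0]]
O, D = flow_margins(T)
pairs = nonzero_pairs(T)
```

Only the cells returned by `nonzero_pairs` would enter an estimation that ignores zero interactions.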

3. Method

This section explains how the method of the study is conducted. The software packages SAS, ArcGIS3
and RStudio are used to analyze and process the data. Some calculations have been carried out
in Matlab.

3.1. The Distance Decay Functions

The beta parameter, or the decay parameter, is calculated differently depending on which
spatial interaction model is being used. For the unconstrained model, the parameter is
estimated using a log-log least squares regression. For the doubly constrained model, it is
calibrated through iterations. For the half-life model, it is derived mathematically from an
integral function. The calculations of the decay parameter for the different spatial
interaction models are presented in the following sections.

The types of distance decay functions that are used are listed below. Functions 1-2 are used
for the unconstrained model. The doubly constrained model also uses functions 1-2, but there
are only results from function 1 in this study. For the half-life model, functions 1 and 3 are
used, since the power function cannot be combined with it (see Section 3.4). The three decay
functions are chosen because they are among the most commonly used

3 ArcGIS is a licensed product to which we did not have access; our supervisor, John Östh,
provided help in using it for Figures 4.1.1, 8.3.1 and 8.3.2.


within spatial interaction analysis when studying migration. The exponential function is
considered the best for short distances and the power function the best for long distances
(Fotheringham and O’Kelly, 1989).

However, it is important to remember that not all function types are examined in this study,
and plenty of others are used in practice. Consequently, this study can only claim to examine
which of these three decay functions fits best for migration in Sweden on municipality level.
The type of decay function used in a spatial interaction model is also often chosen depending
on whether short- or long-distance migration is being examined (Hipp and Boessen, 2017). In
this study both types of distances are used at the same time, which might affect how well the
model fits.

(1) Exponential decay

$$f(d_{ij}) = e^{-\beta d_{ij}}$$

(2) Power decay

$$f(d_{ij}) = d_{ij}^{-\beta}$$

(3) Exponential normal decay

$$f(d_{ij}) = e^{-\beta d_{ij}^2}$$
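The three decay functions can be written as short Python functions, which may help when comparing how quickly each of them falls off with distance. The function names and the example values of beta and distance are illustrative assumptions only.

```python
import math

def exponential_decay(d, beta):
    """Function (1): f(d) = exp(-beta * d)."""
    return math.exp(-beta * d)

def power_decay(d, beta):
    """Function (2): f(d) = d ** (-beta); requires d > 0."""
    return d ** (-beta)

def exponential_normal_decay(d, beta):
    """Function (3): f(d) = exp(-beta * d ** 2)."""
    return math.exp(-beta * d ** 2)

# Hypothetical comparison at two distances with an arbitrary beta per function.
for d in (10.0, 100.0):
    print(d,
          exponential_decay(d, 0.05),
          power_decay(d, 1.5),
          exponential_normal_decay(d, 0.0005))
```

Note that the power function is undefined at a distance of zero, whereas both exponential forms equal 1 there.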

3.2. The Unconstrained Model

The unconstrained spatial interaction model has the following general form:

$$T_{ij} = K O_i D_j f(\beta, d_{ij}) \tag{4}$$

Here $T_{ij}$ is the number of flows between each origin ($O_i$) and destination ($D_j$), in
this case the number of migration interactions. These flows are a function of the number of
flows from $O_i$ and flows to $D_j$ as well as of a distance decay function $f(\beta, d_{ij})$,
which takes into account the deterring effect of distance and is a function of distance ($d$).
The beta value is the decay parameter and is calibrated from equation (4). It determines the
migration behavior and expresses how the probability to migrate changes over distance. The
constant $K$ is a result of the calibration made on real data when estimating the model and
works as a scaling factor. $K$ is set to 1 since this enables comparisons of different models
(Östh et al. 2016). The decay functions that are used with this particular SIM are the power
decay (2) and the exponential decay (1) functions.

The unconstrained model is estimated as a log-log least squares regression model, where the
dependent variable is expressed as $\ln\left(\frac{T_{ij}}{O_i D_j}\right)$ and the independent
variable is expressed as $d_{ij}$ in the exponential decay model and $\ln(d_{ij})$ in the power
decay model. Taking logarithms of equation (4) makes the model linear in the independent
variable, with slope $-\beta$. When the decay parameter has been estimated, the model can be
used to calculate the estimated number of total interactions $\hat{T}_{ij}$, which can be
compared to the actual number of total interactions $T_{ij}$ to evaluate the performance of the
model.
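The regression step can be sketched in Python with the closed-form solution for a simple least squares line. This is an illustration under stated assumptions rather than the procedure run in SAS or R for this study: the function name is hypothetical, an intercept (an estimate of ln K) is included, cells with zero flows are skipped, and the toy data are generated from a known power law so that the estimate can be checked.

```python
import math

def fit_decay_parameter(T, O, D, d, kind="power"):
    """Estimate beta by regressing ln(T_ij / (O_i D_j)) on d_ij or ln(d_ij).

    Only cells with T_ij > 0 are used, since ln(0) is undefined.
    Returns (beta, intercept); the fitted slope equals -beta.
    """
    xs, ys = [], []
    for i, row in enumerate(T):
        for j, t in enumerate(row):
            if t > 0:
                xs.append(math.log(d[i][j]) if kind == "power" else d[i][j])
                ys.append(math.log(t / (O[i] * D[j])))
    x_bar = sum(xs) / len(xs)
    y_bar = sum(ys) / len(ys)
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    sxx = sum((x - x_bar) ** 2 for x in xs)
    slope = sxy / sxx
    return -slope, y_bar - slope * x_bar

# Hypothetical toy data generated from a power law with beta = 1.5 and K = 1.
O = [10.0, 20.0]
D = [5.0, 8.0]
d = [[2.0, 10.0], [10.0, 4.0]]
true_beta = 1.5
T = [[O[i] * D[j] * d[i][j] ** (-true_beta) for j in range(2)] for i in range(2)]

beta_hat, intercept = fit_decay_parameter(T, O, D, d, kind="power")
```

Because the toy flows follow the power law exactly, the fit recovers the beta used to generate them, and the intercept is zero since K equals 1.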

OLS Assumptions

Before using Ordinary Least Squares regression (OLS) the assumptions have to be checked.

The first one is that the dependent variable has a linear relationship with the independent

variable combined with some sort of error term. The second assumption is that the independent

variable has variation i.e. it is not always the same value. The independent variable also needs

to be non-stochastic, which means that it is not determined by chance. The number of

observations must exceed the number of independent variables. The last four assumptions deal

with the error terms, which need to be independently, identically and normally distributed with
a zero mean and a common variance (Asteriou and Hall, 2016). A few of these assumptions

are already known to be fulfilled and will therefore not be further discussed. The independent

variable is already known to be varying and not be stochastic as not all people move the exact

same distance and not completely at random. The number of observations also, clearly, exceeds

the number of independent variables. When obtaining the decay parameter through regression

for the unconstrained model, the remaining assumptions will need to be evaluated.


3.3. The Doubly Constrained Model

The doubly constrained model differs from the unconstrained SIM in that the model includes

restrictions on the origin and destination variables. The general form of the model therefore

looks very similar to the unconstrained model and is stated below:

$$T_{ij} = A_i B_j O_i D_j f(\beta, d_{ij}) \tag{5}$$

The variables are the same as in the unconstrained model, except that the equation has two
additional variables $A_i$ and $B_j$ and that the scaling factor $K$ is excluded. $A_i$ and
$B_j$ are the constraints for the origins and the destinations, ensuring that the totals of the
origins and destinations are predicted correctly. One effect of this is that the balancing
factors for the origins and destinations take into account spatial autocorrelation, something
the unconstrained model does not adjust for (Griffith and Fisher, 2013). One downside of the
model is that the estimated $A_i$ and $B_j$ parameters lack any sort of meaningful
interpretation (Yang et al. 2014). Below, the construction of a doubly constrained model is
explained.

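Summing equation (5) over the origins gives

$$\sum_i T_{ij} = \sum_i A_i B_j O_i D_j f(\beta, d_{ij})$$

The model is subject to the following constraints

$$\sum_i T_{ij} = D_j$$

and

$$\sum_j T_{ij} = O_i$$

It then follows that

$$D_j = D_j B_j \sum_i A_i O_i f(\beta, d_{ij})$$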


And it is then possible to solve for $A_i$ and $B_j$

$$A_i = \left(\sum_j B_j D_j f(\beta, d_{ij})\right)^{-1}$$

$$B_j = \left(\sum_i A_i O_i f(\beta, d_{ij})\right)^{-1}$$

From the two previous equations, notice that $A_i$ and $B_j$ are interdependent; the two
equations are therefore solved through iterations. By assigning $B_j$ the value 1 it is
possible to solve for $A_i$. After $A_i$ has been solved, the same procedure is repeated for
$B_j$. The process is repeated until the errors are no longer significant. The errors are
calculated by the following formula.

$$E = \sum_i |O_i - O_i^1| + \sum_j |D_j - D_j^1| \tag{6}$$

$O_i$ is the observed number of outflows from origin $i$, while $O_i^1$ is the calculated
number of outflows from the same zone. The same holds for $D_j$ and $D_j^1$, but for
destinations. These steps are repeated until the errors dissipate, which is when the real and
calculated flows converge.

Different values of the decay parameter will result in different flow matrices, and since the
parameter is unknown the beta value is calibrated until it fits the model. This is done by
first assigning the decay parameter an arbitrary value. If the iterative process demands too
many iterations, it is an indication that the decay parameter needs adjusting. When the
iterations are completed, the model has automatically generated the $\hat{T}_{ij}$ estimates.
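The iterative balancing of the origin and destination factors can be sketched in Python as follows. The sketch treats the decay parameter as given and does not show the outer calibration of beta. The function name, the tolerance, the iteration cap and the toy data are assumptions, and the total outflows must equal the total inflows for the iteration to converge.

```python
import math

def balance(O, D, f, tol=1e-9, max_iter=1000):
    """Iterate A_i and B_j until the predicted margins match O and D.

    f[i][j] holds the decay values f(beta, d_ij).
    Returns (T_hat, number_of_iterations).
    """
    n, m = len(O), len(D)
    B = [1.0] * m  # start by assigning every B_j the value 1
    for iteration in range(1, max_iter + 1):
        A = [1.0 / sum(B[j] * D[j] * f[i][j] for j in range(m))
             for i in range(n)]
        B = [1.0 / sum(A[i] * O[i] * f[i][j] for i in range(n))
             for j in range(m)]
        T = [[A[i] * B[j] * O[i] * D[j] * f[i][j] for j in range(m)]
             for i in range(n)]
        # Error in the spirit of equation (6): margin deviations.
        error = (sum(abs(O[i] - sum(T[i])) for i in range(n))
                 + sum(abs(D[j] - sum(T[i][j] for i in range(n)))
                       for j in range(m)))
        if error < tol:
            return T, iteration
    raise RuntimeError("balancing factors did not converge")

# Hypothetical toy system with exponential decay and an assumed beta.
O = [30.0, 70.0]
D = [40.0, 60.0]
d = [[1.0, 5.0], [5.0, 1.0]]
beta = 0.3
f = [[math.exp(-beta * d_ij) for d_ij in row] for row in d]

T_hat, n_iter = balance(O, D, f)
```

After convergence the row sums of `T_hat` reproduce the outflows and the column sums reproduce the inflows, which is what the two constraints require.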

3.4. The Half-Life Model

The half-life model has the same functional form as the unconstrained model (4), except that
the beta value is obtained through mathematical calculations instead of regressions. The
difference between half-life models and other spatial interaction models is that while most
statistical models try to reduce the deviation from the mean, the half-life models instead use
the median distance. Half of the moving population has migrated once the median distance is
reached, i.e. the probability of migrating among the migrating population is 0.5 at the median
distance. Mathematically this means that the median distance


will intersect half of the Area Under Curve (AUC) of an integral function describing the

probability to move (Östh et al. 2016).

Using the individual level median distance of migration between municipalities will lead to
relatively large systematic errors (Östh et al. 2016). This is because of the difference in
median migration distance between the individual and the municipality level datasets, as
previously mentioned in Section 2.1.

The decay functions used for the half-life model are exponential decay and exponential normal
decay. The power decay function cannot be used with the half-life model, since the integral of
the power function over all distances from zero to infinity does not converge and the AUC
therefore cannot be calculated. When calculating a decay function for the half-life model, the
decay function should be perceived as an integral function which then describes the
probability to migrate over all possible distances (Östh et al. 2016). The derivations for the
exponential decay function and the exponential normal decay function below are retrieved from
Östh et al. (2016).

Exponential decay

$$\int_0^{\infty} e^{-\beta x}\,dx = \frac{1}{\beta}$$

For obtaining half of the AUC we use the following expression:

$$\int_0^{m} e^{-\beta x}\,dx = \frac{0.5}{\beta}$$

The integral now spans from zero to the median value m instead of infinity and is now only

integrated to 0.5 instead of 1, m will later be replaced by the true value of the median migration

distance. The equation can now be solved for 𝛽.

$$\beta \int_0^{m} e^{-\beta x}\,dx = 0.5 = 1 - e^{-\beta m}$$

$$0.5 = e^{-\beta m}$$

$$\ln(0.5) = -\beta m$$

$$\beta = -\frac{\ln(0.5)}{m} = \frac{\ln(2)}{m}$$
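Solving $0.5 = e^{-\beta m}$ for the decay parameter gives the positive value $\beta = \ln(2)/m$. As a small worked example in Python, the sketch below evaluates this for the median municipality level distance reported in Table 2.1.1; the function name is a hypothetical choice.

```python
import math

def half_life_beta_exponential(median_distance):
    """Beta for f(d) = exp(-beta * d) such that f(median_distance) = 0.5."""
    return math.log(2) / median_distance

m = 73193.4  # median municipality level distance in meters, Table 2.1.1
beta = half_life_beta_exponential(m)

# By construction the decay function equals one half at the median distance.
value_at_median = math.exp(-beta * m)
```

A larger median distance gives a smaller beta, i.e. a decay curve that falls off more slowly.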

Exponential Normal decay

$$\int_0^{\infty} e^{-\beta x^2}\,dx = \frac{\sqrt{\pi}}{2\sqrt{\beta}}$$

For obtaining half of the AUC we use the following expression:

$$\int_0^{m} e^{-\beta x^2}\,dx = \frac{0.5\sqrt{\pi}}{2\sqrt{\beta}}$$

𝛽 can then be solved for in the following way:

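$$0.5 = \frac{2\sqrt{\beta}}{\sqrt{\pi}} \cdot \frac{\sqrt{\pi}}{2\sqrt{\beta}} \operatorname{erf}(m\sqrt{\beta}) = \operatorname{erf}(m\sqrt{\beta})$$

$$\operatorname{erf}^{-1}(0.5) = \operatorname{erf}^{-1}(\operatorname{erf}(m\sqrt{\beta})) = m\sqrt{\beta}$$

$$\beta = \left(\frac{\operatorname{erf}^{-1}(0.5)}{m}\right)^2$$

$$\beta \approx \left(\frac{0.47693628}{m}\right)^2$$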
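The constant in the last expression can be checked numerically. The Python sketch below obtains the inverse error function from the standard normal quantile function in the standard library, which is one of several possible ways to compute it; the function names are hypothetical, and the median distance is the municipality level value from Table 2.1.1.

```python
import math
from statistics import NormalDist

def inverse_erf(y):
    """Inverse error function, using erf(x) = 2 * Phi(x * sqrt(2)) - 1."""
    return NormalDist().inv_cdf((y + 1.0) / 2.0) / math.sqrt(2.0)

def half_life_beta_exponential_normal(median_distance):
    """Beta for f(d) = exp(-beta * d**2) with half the area before the median."""
    return (inverse_erf(0.5) / median_distance) ** 2

m = 73193.4  # median municipality level distance in meters, Table 2.1.1
beta = half_life_beta_exponential_normal(m)

# Half of the area under the curve lies below m exactly when this equals 0.5.
share_below_median = math.erf(m * math.sqrt(beta))
```

The identity used in `inverse_erf` follows from the relation between the error function and the standard normal distribution function.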

Using the calculated beta value, the half-life model can estimate the total number of
interactions $\hat{T}_{ij}$ in the same way as the unconstrained model.


3.5. Goodness of Fit Measures

The goodness of fit of the models is determined by studying and comparing the RMSE and
Pearson’s correlation of the estimated and observed number of total interactions. RMSE and
Pearson’s correlation could indicate different models as being the best since they measure
goodness of fit differently. Where RMSE measures the deviation of the estimated values from the
observed ones, correlation measures how the observed and estimated values relate. Since the two
variables can follow a similar pattern, and therefore be highly correlated, without being close
to each other, correlation and RMSE can give different results (Östh et al. 2016).
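The point that the two measures can disagree is easy to demonstrate with a small Python sketch. The observed and estimated values are made up for the illustration: one set of estimates is perfectly correlated with the observations but far from them, the other is close to them but less correlated.

```python
import math

def rmse(observed, estimated):
    """Root mean squared error between two equally long sequences."""
    n = len(observed)
    return math.sqrt(sum((o - e) ** 2 for o, e in zip(observed, estimated)) / n)

def pearson(observed, estimated):
    """Pearson's correlation coefficient between two equally long sequences."""
    n = len(observed)
    mean_o = sum(observed) / n
    mean_e = sum(estimated) / n
    cov = sum((o - mean_o) * (e - mean_e) for o, e in zip(observed, estimated))
    var_o = sum((o - mean_o) ** 2 for o in observed)
    var_e = sum((e - mean_e) ** 2 for e in estimated)
    return cov / math.sqrt(var_o * var_e)

# Hypothetical flows.
observed = [1.0, 2.0, 3.0, 4.0]
scaled = [10.0, 20.0, 30.0, 40.0]  # same pattern, wrong level
close = [1.5, 1.5, 3.5, 3.5]       # right level, coarser pattern

print(rmse(observed, scaled), pearson(observed, scaled))
print(rmse(observed, close), pearson(observed, close))
```

Here the scaled estimates have the higher correlation while the close estimates have the lower RMSE, so the two measures would rank these estimates differently.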

3.6. Limitations of Spatial Analysis

MAUP

The Modifiable Areal Unit Problem (MAUP) deals with how the units used in spatial analysis
affect the results of the analysis. MAUP can be explained as two separate problems. Firstly,
different levels of aggregation of the same data lead to different results, something called
the scale problem. Secondly, zones of equal scale that are divided into different combinations
also lead to different results, which is called the aggregation problem (Openshaw and Taylor,
1979). In this study municipalities are chosen as the spatial units. Another type of division
of space would yield different results. This means that the results cannot be generalized to
units of other sizes than municipalities or to differently combined units of the same size.

Another problem with Swedish municipalities is that they are not equal in size (SCB, 2017). Units of unequal size can cause problems for the analysis as well: differently sized zones lead to different results in correlation analysis even though the underlying individual-level data are the same (Robinson, 1956). These difficulties should therefore be kept in mind when interpreting the results for Swedish municipalities. The main differences in municipality size in Sweden are between larger urban areas and smaller rural municipalities. The difficulties are therefore expected to surface predominantly in the bigger cities and in the sparsely populated areas of northern Sweden.


Spatial Autocorrelation

Spatial autocorrelation is the correlation between values of the same variable across geographic space, which makes the estimated values of the dependent variable deviate systematically from its true values. Positive spatial autocorrelation is common and indicates that units that are close in geographic space share common properties (Griffith, 2003). Two measures that can be used to examine the spatial association in a dataset are the Global Moran's Index (Moran's I) and the Local Indicators of Spatial Association (LISA). Moran's I indicates whether any statistically significant autocorrelation is present in the models (Li et al. 2007). It is computed as follows:

$$I = \frac{n}{S_0} \cdot \frac{\sum_{i=1}^{n}\sum_{j=1}^{n} w_{i,j} z_i z_j}{\sum_{i=1}^{n} z_i^2}$$

where $z_i$ is the average residual for municipality $i$, $w_{i,j}$ is the spatial weight between $i$ and $j$, which is 1 if the municipalities share a border and 0 otherwise, $S_0$ is the sum of all spatial weights and $n$ is the number of municipalities. The expected value under the null hypothesis that there is no spatial autocorrelation is:

$$E(I) = \frac{-1}{n - 1}$$

The z statistic is calculated by the following expression:

$$z_I = \frac{I - E(I)}{\sqrt{V(I)}}$$
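The global statistic can be sketched in a few lines. The example below is a toy illustration under stated assumptions: the four units, their contiguity matrix and the values of `z` are hypothetical, and `z` is assumed to be centred (mean zero) already. The variance $V(I)$ is not spelled out in the text, so the z statistic is deliberately left out.

```python
def morans_i(z, w):
    """Global Moran's I for centred values z and a binary contiguity matrix w.

    w[i][j] is 1 if units i and j share a border and 0 otherwise.
    """
    n = len(z)
    s0 = sum(sum(row) for row in w)  # sum of all spatial weights
    num = sum(w[i][j] * z[i] * z[j] for i in range(n) for j in range(n))
    den = sum(v * v for v in z)
    return (n / s0) * (num / den)


def expected_i(n):
    """Expected value of Moran's I under no spatial autocorrelation."""
    return -1.0 / (n - 1)


# Hypothetical example: four units on a line, 0-1-2-3.
w = [
    [0, 1, 0, 0],
    [1, 0, 1, 0],
    [0, 1, 0, 1],
    [0, 0, 1, 0],
]
clustered = [-2.0, -1.0, 1.0, 2.0]    # similar values sit next to each other
alternating = [1.0, -1.0, 1.0, -1.0]  # neighbours have opposite signs

print(morans_i(clustered, w), morans_i(alternating, w), expected_i(len(w)))
```

The clustered pattern gives a value above the expectation and the alternating pattern a value below it, which matches the interpretation of positive and negative spatial autocorrelation.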


Moran's I does not, however, give information about where the autocorrelation is located; it only indicates that it is present. LISA is a measure that detects local autocorrelation and clusters (Anselin, 1995). Unlike Moran's I, LISA can highlight the locations of autocorrelation. It is computed as follows:

$$I_i = \frac{z_i}{S_i^2} \sum_{j=1,\, j \neq i}^{n} w_{i,j} z_j$$

$$S_i^2 = \frac{\sum_{j=1,\, j \neq i}^{n} z_j^2}{n - 1}$$

where $I_i$ is the local Moran's I statistic. A positive local Moran's I value indicates a cluster of spatially autocorrelated values, and a negative value indicates that an observation is an outlier. The expected value of $I_i$ and the statistic $z_{I_i}$ are calculated in the same way as for Moran's I.
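The local statistic can be sketched in the same way. Again this is only an illustration: the four units on a line and the centred values in `z` are hypothetical, and the significance test is omitted.

```python
def local_morans_i(z, w):
    """Local Moran's I_i = (z_i / S_i^2) * sum over j != i of w_ij * z_j."""
    n = len(z)
    result = []
    for i in range(n):
        # S_i^2: squared values of all units except i, divided by n - 1
        s2 = sum(z[j] ** 2 for j in range(n) if j != i) / (n - 1)
        lag = sum(w[i][j] * z[j] for j in range(n) if j != i)
        result.append(z[i] / s2 * lag)
    return result


# Hypothetical example: four units on a line, 0-1-2-3.
w = [
    [0, 1, 0, 0],
    [1, 0, 1, 0],
    [0, 1, 0, 1],
    [0, 0, 1, 0],
]
z = [1.0, 1.0, -3.0, 1.0]  # unit 2 is a low value surrounded by high values

local = local_morans_i(z, w)
print(local)
```

Unit 2, whose value has the opposite sign to both of its neighbours, receives a strongly negative $I_i$ and would be read as an outlier, while unit 0, which resembles its only neighbour, receives a positive value.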

4. Results

The results of applying the unconstrained SIM, the doubly constrained model and the half-life model to migration flows at municipality level in Sweden are presented in the following sections.

4.1. The Unconstrained Model

To obtain the decay parameter for each decay function used with the unconstrained SIM, two regressions are run. For the model with exponential decay the dependent variable is $\ln\left(\frac{T_{ij}}{O_i D_j}\right)$ and the independent variable is $d_{ij}$. For the model with power decay the only difference is that the independent variable is instead $\ln(d_{ij})$. These regressions determine the decay parameters that govern the effect of distance.
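The two regressions can be sketched with a plain least squares fit. This is a minimal illustration rather than the study's estimation code: the origin-destination records are hypothetical, and the flows are generated from an exact power-law decay with exponent 0.75 so that the regression has a known answer to recover.

```python
import math


def ols_intercept_slope(x, y):
    """Ordinary least squares fit of y = a + b * x; returns (a, b)."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y))
    sxx = sum((xi - mx) ** 2 for xi in x)
    b = sxy / sxx
    return my - b * mx, b


# Hypothetical records: (O_i, D_j, d_ij in metres).
pairs = [(500, 800, 2_000.0), (500, 300, 15_000.0), (900, 800, 40_000.0),
         (900, 300, 90_000.0), (200, 800, 250_000.0)]

# Synthetic flows from a known power decay, for illustration only.
true_beta = 0.75
flows = [o * d * dist ** (-true_beta) for o, d, dist in pairs]

y = [math.log(t / (o * d)) for t, (o, d, _) in zip(flows, pairs)]
x_power = [math.log(dist) for _, _, dist in pairs]  # power decay regressor
x_exp = [dist for _, _, dist in pairs]              # exponential decay regressor

_, slope_power = ols_intercept_slope(x_power, y)
_, slope_exp = ols_intercept_slope(x_exp, y)
print(slope_power, slope_exp)
```

The slope on $\ln(d_{ij})$ recovers the negative of the exponent used to generate the data, and the slope on $d_{ij}$ is also negative, in line with distance acting as a deterrent.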

Table 4.1.1: Decay function parameter estimates for the unconstrained model

| Decay function | Decay parameter estimate | Significance level |
|---|---|---|
| Exponential decay | -0.00000385 | <.0001 |
| Power decay | -0.74711 | <.0001 |


As shown in Table 4.1.1, the distance parameters, or decay parameters, for both functions are significant at the 1% significance level.⁴

Using the estimated decay values, the estimated numbers of total interactions $\hat{T}_{ij}$ are calculated as shown in equation (7) for the exponential decay function and in equation (8) for the power decay function.

$$\hat{T}_{ij} = O_i D_j e^{-\beta d_{ij}} \quad (7)$$

$$\hat{T}_{ij} = O_i D_j d_{ij}^{-\beta} \quad (8)$$

The evaluation measures are discussed in Section 4.4.
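As a rough sketch, equations (7) and (8) translate directly into code. One assumption is made explicit here: Table 4.1.1 reports the regression slopes, which are negative, while equations (7) and (8) already carry a minus sign, so the sketch takes $\beta$ to be the magnitude of the reported slope. The origin and destination totals and the distances are hypothetical.

```python
import math


def flow_exponential(o_i, d_j, dist, beta):
    """Equation (7): O_i * D_j * exp(-beta * d_ij)."""
    return o_i * d_j * math.exp(-beta * dist)


def flow_power(o_i, d_j, dist, beta):
    """Equation (8): O_i * D_j * d_ij ** (-beta)."""
    return o_i * d_j * dist ** (-beta)


# Assumption: beta is the magnitude of the slopes reported in Table 4.1.1.
beta_exp = 0.00000385
beta_pow = 0.74711

# Hypothetical totals and distances in metres, for illustration only.
for dist in (1_000.0, 100_000.0):
    print(dist,
          flow_exponential(100, 200, dist, beta_exp),
          flow_power(100, 200, dist, beta_pow))
```

With positive $\beta$, both functions give smaller estimated flows at the longer distance, and the power function falls off much faster over the first kilometres than the exponential function does with these parameter values.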

Regression Assumption Evaluation

The full regression diagnostics are found in the appendix in section 8.

Linear Relationship

The first assumption, that the independent and dependent variables have a linear relationship, can be examined by studying the regression plots. On visual inspection, the scatter of observations for the exponential decay function in Figure 8.1.1 in the appendix does not appear clearly linear, which can be considered a problem. The power decay scatter in Figure 8.2.1 in the appendix appears more linear than the exponential one, although it also looks somewhat problematic: the plot appears to be divided into two separate clusters. On closer examination, the clusters turn out to be within-municipality migration and between-municipality migration.

Violations of the linearity assumption can create misspecification errors such as wrong regressors (Asteriou and Hall, 2016). Since distance is assumed to be a deterrent to interaction, the coefficients in both models are expected to be negative, and since distance is measured in meters their values are expected to be small. The obtained coefficients at least fulfill these requirements, but for the exponential model in particular the nonlinearity means that the estimate should be viewed with caution. Nevertheless, the adjusted R-square values in Tables 8.1.3 and 8.2.3 in the appendix make clear that distance does not explain much of the variation in the dependent variable: only 13.7% for the exponential function and 39.9% for the power function. This could be explained by the low degree of linearity.

⁴ The full regression outputs are found in the appendix.

Normality of the Residuals

The results in Tables 8.1.4 and 8.2.4 in the appendix show that the Kolmogorov-Smirnov test of normality of the residuals is significant for both the exponential and the power decay functions. The null hypothesis that the residuals are normal is thus rejected in both cases, which means that the normality assumption is violated. However, with large samples the normality test should be interpreted with caution, since a large sample size makes even small deviations from normality significant. Therefore, the residual distribution plots and residual quantile plots in Figures 8.1.3 and 8.2.3 in the appendix are also studied. In the residual distribution plots the distribution is approximately bell-shaped in both cases, and in the residual quantile plots the observations approximately follow the diagonal line. These observations indicate that the deviations from normality are not serious and that the violation of the assumption should not be a problem.

Homoscedasticity

Tables 8.1.5 and 8.2.5 in the appendix display the results of the White test of homoscedasticity, i.e. a common variance among the residuals. The test is significant for both functions, which means that the null hypothesis of homogeneous residual variance is rejected. The assumption of homoscedasticity is therefore violated. However, judging from the residual plots in Figures 8.1.2 and 8.2.2 in the appendix, the violation does not seem to be of great magnitude.

Violating the homoscedasticity assumption mainly affects the distribution and variance of the coefficient estimators but does not make them biased or inconsistent (Asteriou and Hall, 2016). Since only the point estimates of the decay parameters are needed in this case, this violation is not considered a serious problem.


Spatial Autocorrelation

The spatial autocorrelation is evaluated with a Moran's Index test. The Moran's I value is 0.402 for the unconstrained power model (Figure 8.3.2 in the appendix) and 0.285 for the unconstrained exponential model (Figure 8.3.1 in the appendix). Both values are significant, which means that there is significant positive autocorrelation, i.e. clustered patterns. Figure 4.1.1 displays the residuals for the exponential and power functions, showing how they differ between municipalities. The figure also displays spatially dependent patterns in the Local Indicators of Spatial Association (LISA) maps. The pink clusters represent areas where residual values are high and correlated, i.e. where the flows are overestimated. These clusters are found in areas of Sweden where the population is relatively small. The light blue clusters represent areas where residual values are low and correlated, i.e. where the flows are underestimated. These clusters are found in the more densely populated urban areas. The red and dark blue areas represent municipalities with high or low residual values that are not correlated with those of their neighboring municipalities. They are to be interpreted as outliers.

The presence of autocorrelation causes the coefficient estimators to be inefficient and their estimated variances to be biased and inconsistent, which causes problems with hypothesis testing. R-square also becomes overestimated. However, the point estimates themselves can still be unbiased and consistent (Asteriou and Hall, 2016). In this case, it is reasonable to assume that the autocorrelation is a product of omitted variable bias, which means that more factors than distance are needed to explain differences in flows between municipalities. The unconstrained model thus suffers from some level of bias (Asteriou and Hall, 2016). The half-life model can be assumed to suffer from the same issue, since it is constructed using the same variables as the unconstrained model.


Figure 4.1.1: Spatial autocorrelation. Residuals at municipality level and local autocorrelation clusters.

4.2. The Doubly Constrained Model

The decay parameter is calibrated through an iterative process that also generates the balancing factors $A_i$ and $B_j$. Table 4.2.1 below presents the resulting decay parameters. As can be seen, the parameter for the exponential decay is missing: due to computational difficulties with the iterative process, no results for the exponential decay were produced.

Table 4.2.1: Decay parameter estimates for the doubly constrained spatial interaction model

| Decay function | Decay parameter |
|---|---|
| Power decay | -1.074354 |
| Exponential decay | n/a |

The estimation is evaluated and compared to the other models in section 4.4.
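The balancing step of a doubly constrained model can be sketched as follows. This is an illustration under stated assumptions, not the calibration routine used in the study: it assumes the standard form $\hat{T}_{ij} = A_i O_i B_j D_j d_{ij}^{-\beta}$, keeps $\beta$ fixed at the magnitude of the value in Table 4.2.1 instead of calibrating it, and uses hypothetical totals and distances.

```python
def doubly_constrained(origins, destinations, dist, beta, iterations=200):
    """Estimate flows A_i * O_i * B_j * D_j * d_ij ** (-beta).

    A_i and B_j are updated in turn so that row sums approach the origin
    totals and column sums match the destination totals.
    """
    n, m = len(origins), len(destinations)
    f = [[dist[i][j] ** (-beta) for j in range(m)] for i in range(n)]
    a = [1.0] * n
    b = [1.0] * m
    for _ in range(iterations):
        a = [1.0 / sum(b[j] * destinations[j] * f[i][j] for j in range(m))
             for i in range(n)]
        b = [1.0 / sum(a[i] * origins[i] * f[i][j] for i in range(n))
             for j in range(m)]
    return [[a[i] * origins[i] * b[j] * destinations[j] * f[i][j]
             for j in range(m)] for i in range(n)]


# Hypothetical totals (equal sums) and distances in kilometres.
origins = [100.0, 200.0, 300.0]
destinations = [150.0, 250.0, 200.0]
dist = [
    [1.0, 10.0, 20.0],
    [10.0, 1.0, 15.0],
    [20.0, 15.0, 1.0],
]

flows = doubly_constrained(origins, destinations, dist, beta=1.074354)
for row in flows:
    print([round(t, 2) for t in row])
```

Because the balancing factors are fitted to the observed totals of the same dataset, the estimated matrix reproduces those totals by construction, which is one way to see why this model type fits its own data so closely.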


4.3. The Half-Life Model

When estimating the total number of interactions between every pair of municipalities using the half-life model, the decay parameter is calculated analytically from the integral of the decay function and the median migration distance. The two decay functions used are the exponential decay function and the exponential normal decay function. The median distance is 3700 m, as displayed in Table 2.1.1. The decay parameters are:

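Exponential decay:

$$\beta = -\frac{\ln(0.5)}{m} = -\frac{\ln(0.5)}{3700} = 0.000187337$$

Exponential normal decay:

$$\beta = \left[\frac{\operatorname{erf}^{-1}(0.5)}{m}\right]^2 = \left[\frac{\operatorname{erf}^{-1}(0.5)}{3700}\right]^2 = 0.000000016616$$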

Table 4.3.1: Decay parameter estimates for the half-life model

| Decay function | Beta value |
|---|---|
| Exponential decay | -0.000187337 |
| Exponential normal decay | -0.000000016616 |

These decay parameters are used in the same way as the ones estimated by regression for the unconstrained SIM in order to estimate the total number of interactions, as shown in equations (9) and (10).

$$\hat{T}_{ij} = O_i D_j e^{-\beta d_{ij}} \quad (9)$$

$$\hat{T}_{ij} = O_i D_j e^{-\beta d_{ij}^2} \quad (10)$$

The evaluation measures are discussed in Section 4.4.

4.4. Decay Parameter Evaluation

The decay parameters used for the different spatial interaction models are evaluated by their

respective Pearson’s correlation and RMSE. The evaluation measurements are presented in

Table 4.4.1.


Table 4.4.1: Decay parameter estimates and evaluation measures.
*** indicates that the p-value of the correlation is less than 0.001.

| Model | Decay parameter estimate | $Corr(\hat{T}_{ij}, T_{ij})$ | RMSE |
|---|---|---|---|
| Unconstrained Exponential decay | -0.00000385 | 0.724*** | 39104.03 |
| Unconstrained Power decay | -0.74711 | 0.734*** | 1364.56 |
| Half-life Exponential decay | -0.000187337 | 0.842*** | 31821.50 |
| Half-life Exponential Normal decay | -0.000000016616 | 0.846*** | 32464.84 |
| Doubly Constrained Power decay | -1.074354 | 0.985*** | 6.99 |

Table 4.4.1 shows that all of the correlations are relatively high, ranging from approximately 0.724 to 0.985. This indicates that all of the spatial interaction models explain the actual flows to some extent. According to Pearson's correlation, both half-life models also estimate the flows somewhat better than both unconstrained models. However, the doubly constrained model far outperforms the other models and predicts the migration flows very well.

The other evaluation measure in Table 4.4.1, RMSE, produces a somewhat different outcome. Here the unconstrained power decay model has a substantially lower value than the half-life models and the unconstrained exponential decay model. The two half-life models perform approximately equally, and the unconstrained exponential decay model performs somewhat worse. The RMSE of the doubly constrained power model is lower than that of all other models by orders of magnitude.

Taking the two measures together, the unconstrained exponential decay model performed worst on both. What is quite interesting is that the unconstrained power decay model has a much lower RMSE than the half-life models but at the same time a lower correlation. The doubly constrained power model excels on both measures and is considerably better than the other models.


5. Discussion

The doubly constrained power model outperformed both the unconstrained and the half-life models regardless of their decay function. The result is not surprising, since a doubly constrained model is constructed from the very data it later predicts, which means that it adjusts itself through iterations to the specific dataset. The implication is that the model fits the data it was calibrated on but cannot be transferred to other data: predictions on other data will not be valid, since the model is adjusted to a specific dataset. This severely limits the applicability of the model and should be taken into consideration when using the doubly constrained model.

The spatial autocorrelation that clearly affects the unconstrained models, as shown in Figure 4.1.1, is almost certainly present in the half-life model as well, by the very construction of the model. The only difference between the models is how the decay parameters are estimated, and thus the flaws of the unconstrained models are bound to be found in the half-life model too. The residuals in Figure 4.1.1 show that the models over- and underestimate the migration flows depending on where in Sweden the municipality is situated. The models underestimate the flows for municipalities around the largest urban areas in Sweden, and overestimate the flows for the large, sparsely populated municipalities in the north. These findings suggest that when studying migration at municipality level in Sweden with the unconstrained and half-life models, it is necessary to include municipality-specific parameters that take the existing structural differences into account; otherwise the models will suffer from omitted variable bias.

Which model works best for studying migration flows at municipality level in Sweden also depends on what type of analysis the model is used for. While the doubly constrained model gives the most accurate estimates, its $A_i$ and $B_j$ values have no meaningful interpretation and make the model completely adjusted to the specific dataset. The unconstrained model is more intuitive but less accurate. Including municipality-specific variables in the model could potentially improve its estimates substantially. The half-life model, which turned out to perform at a similar level to the unconstrained model, is easier to calculate and can be used to obtain a decay parameter where only small amounts of flow data are available, since all it needs is the median travelling distance. The half-life models also had a higher correlation than the unconstrained models. However, since the power function seemed to provide a better fit than the exponential in general, the half-life model, which cannot use the power function, could be considered an inferior option in the context of migration.

As mentioned in the section covering the decay parameters, the literature suggests that the exponential function provides the most accurate results for short distances while the power function is better suited to longer distances. In this study both types of distances were used and the power function had the best model fit. This does not necessarily mean that it is the preferred functional form for models that combine distance types. It might be a product of a shortage of short distances and an overrepresentation of longer distances in the data. This shortage can partially be explained by how the data were aggregated. It might be more prudent to split the data in two and model short-distance and long-distance migration separately.

6. Conclusions

When studying migration flows in Sweden at municipality level there is no clear answer to which model should be used. Judged by the goodness of fit measures alone, the doubly constrained model is the most suitable choice. However, the three model types that have been examined all have strengths and weaknesses. Depending on the research goal and the resources available, each researcher must consider which model is the most appropriate.

The functional form of the decay function that best describes the migration flows in this study is the power function. This result might be misleading to some extent due to how the study was conducted and how the data were aggregated. Further investigation of how the aggregation of the data affects the decay parameters could clarify this question. Based on our findings, however, the power function has been the best at predicting migration flows.


7. References

Anselin, L. 1995. Local Indicators of Spatial Association-LISA. Geographical Analysis. 27(2): 93-115.
https://doi.org/10.1111/j.1538-4632.1995.tb00338.x
(Accessed 2018-05-10)

Asteriou, D. and Hall, S. G. 2016. Applied Econometrics. 3rd ed. London: Palgrave.

Fotheringham, A. S. 1980. Spatial Structure, Spatial Interaction, and Distance Decay Parameters. Diss. McMaster University.
https://macsphere.mcmaster.ca/bitstream/11375/11312/1/fulltext.pdf
(Accessed 2018-04-20)

Fotheringham, A. S. and O'Kelly, M. E. 1989. Spatial Interaction Models: Formulations and Applications. Dordrecht: Kluwer Academic Publishers.

Griffith, D. A. 2003. Spatial Autocorrelation and Spatial Filtering: Gaining Understanding Through Theory and Scientific Visualization. New York: Springer-Verlag Berlin Heidelberg.
https://link-springer-com.ezproxy.its.uu.se/content/pdf/10.1007%2F978-3-540-24806-4.pdf
(Accessed 2018-05-11)

Griffith, D. A. and Fischer, M. M. 2013. Constrained Variants of the Gravity Model and Spatial Dependence: Model Specification and Estimation Issues. Journal of Geographical Systems. 15(3): 291-317.
https://doi.org/10.1007/s10109-013-0182-7
(Accessed 2018-05-05)

Haynes, K. E. and Fotheringham, A. S. 1984. Gravity and Spatial Interaction Models. Beverly Hills: Sage Publications.

Hipp, J. R. and Boessen, A. 2017. The Shape of Mobility: Measuring the Distance Decay Function of Household Mobility. The Professional Geographer. 69(1): 32-44.
DOI: 10.1080/00330124.2016.1157495

Lantmäteriet. RT 90.
https://www.lantmateriet.se/sv/Kartor-och-geografisk-information/GPS-och-geodetisk-matning/Referenssystem/Tvadimensionella-system/RT-90/
(Accessed 2018-05-03)


Li, H. Calder, C. A. and Cressie, N. 2007. Beyond Moran's I: Testing for Spatial Dependence Based on the Spatial Autoregressive Model. Geographical Analysis. 39(4): 357-375.
https://doi.org/10.1111/j.1538-4632.2007.00708.x
(Accessed 2018-05-17)

Openshaw, S. and Taylor, P. J. 1979. A Million or so Correlation Coefficients: Three Experiments on the Modifiable Areal Unit Problem. In Wrigley, N. (ed.). Statistical Applications in the Spatial Sciences. London: Pion, 127-144.
http://www.csiss.org/GISPopSci/workshops/2009/UCSB/readings/Openshaw-Taylor-1979.pdf
(Accessed 2018-05-12)

Poot, J. Alimi, O. Cameron, M. P. and Maré, D. C. 2016. The Gravity Model of Migration: The Successful Comeback of an Ageing Superstar in Regional Science. IZA Discussion Paper No. 10329.
Available at SSRN: https://ssrn.com/abstract=2864830

Ravenstein, E. G. 1885. The Laws of Migration. Journal of the Statistical Society of London. 48(2): 167-235.
http://www.jstor.org/stable/pdf/2979181.pdf?refreqid=excelsior:c68114a1919ab9acde5ba396f17c693b
(Accessed 2018-04-10)

Robinson, A. H. 1956. The Necessity of Weighting Values in Correlation Analysis of Areal Data. Annals of the Association of American Geographers. 46(2): 233-236.
http://www.jstor.org/stable/pdf/2561482.pdf?refreqid=excelsior%3A58d46761d574134843a35350fd6b8aa8
(Accessed 2018-05-10)

Rodrigue, J. Comtois, C. and Slack, B. 2017. The Geography of Transport Systems. 4th ed. London: Routledge.

Statistiska centralbyrån. (2017-05-10). Folkmängd i riket, län och kommuner 31 mars 2017 och befolkningsförändringar 1 januari–31 mars 2017.
https://www.scb.se/hitta-statistik/statistik-efter-amne/befolkning/befolkningens-sammansattning/befolkningsstatistik/pong/tabell-och-diagram/kvartals--och-halvarsstatistik--kommun-lan-och-riket/kvartal-1-2017/
(Accessed 2018-05-08)

Yang, Y. Herrera, C. Eagle, N. and González, M. C. 2014. Limits of Predictability in Commuting Flows in the Absence of Data for Calibration. Scientific Reports. 4, Article number: 5662.
doi:10.1038/srep05662


Östh, J. Lyhagen, J. and Reggiani, A. 2016. A New Way of Determining Distance Decay Parameter in Spatial Interaction Models with Application to Job Accessibility Analysis. European Journal of Transport and Infrastructure Research. 16(2): 344-363.
http://uu.diva-portal.org/smash/get/diva2:912729/FULLTEXT01.pdf
(Accessed 2018-03-26)


8. Appendix

8.1. The Unconstrained Exponential Decay Model

This section contains the regression output from the unconstrained exponential decay model. Table 8.1.1 shows the parameter estimates of the model, and Tables 8.1.2 and 8.1.3 show some further evaluation measures.

Figure 8.1.1 shows the regression plot, where the y-axis represents the dependent variable $\ln\left(\frac{T_{ij}}{O_i D_j}\right)$ and the x-axis represents the distance. Figure 8.1.2 shows the residual plot and Figure 8.1.3 shows some fit diagnostics.

Thereafter, the formal tests used to evaluate the assumptions are shown. Table 8.1.4 contains the Kolmogorov-Smirnov test of normality and Table 8.1.5 contains the White test.

Table 8.1.1: Parameter estimates for the unconstrained exponential decay model

Parameter Estimates

| Variable | DF | Parameter Estimate | Standard Error | t Value | Pr > \|t\| |
|---|---|---|---|---|---|
| Intercept | 1 | -6.84205 | 0.05079 | -134.71 | <.0001 |
| Distance | 1 | -0.00000385 | 1.919705E-7 | -20.03 | <.0001 |

Table 8.1.2: Analysis of variance for the unconstrained exponential decay model

Analysis of Variance

| Source | DF | Sum of Squares | Mean Square | F Value | Pr > F |
|---|---|---|---|---|---|
| Model | 1 | 1613.44346 | 1613.44346 | 401.33 | <.0001 |
| Error | 2517 | 10119 | 4.02029 | | |
| Corrected Total | 2518 | 11733 | | | |


Table 8.1.3: Evaluation measures for the unconstrained exponential decay model

| Measure | Value | Measure | Value |
|---|---|---|---|
| Root MSE | 2.00507 | R-Square | 0.1375 |
| Dependent Mean | -7.47037 | Adj R-Sq | 0.1372 |
| Coeff Var | -26.84026 | | |

Figure 8.1.1: Regression plot of the unconstrained exponential decay model, with $\ln\left(\frac{T_{ij}}{O_i D_j}\right)$ on the y-axis and distance on the x-axis.


Figure 8.1.2: Residual plot of the unconstrained exponential decay model


Figure 8.1.3: Fit diagnostics for the unconstrained exponential decay model

Table 8.1.4: Test of normality of residuals for the unconstrained exponential decay model

| Test for normality | Statistic | | p Value | |
|---|---|---|---|---|
| Kolmogorov-Smirnov | D | 0.055644 | Pr > D | <0.0100 |


Table 8.1.5: White test of heteroscedasticity of residuals for the unconstrained exponential decay model

Test of First and Second Moment Specification

| DF | Chi-Square | Pr > ChiSq |
|---|---|---|
| 2 | 37.39 | <.0001 |

8.2. The Unconstrained Power Decay Model

This section contains the regression output from the unconstrained power decay model. Table 8.2.1 shows the parameter estimates of the model, and Tables 8.2.2 and 8.2.3 show some further evaluation measures.

Figure 8.2.1 shows the regression plot, where the y-axis represents the dependent variable $\ln\left(\frac{T_{ij}}{O_i D_j}\right)$ and the x-axis represents the natural logarithm of distance. Figure 8.2.2 shows the residual plot and Figure 8.2.3 shows some fit diagnostics.

Thereafter, the formal tests used to evaluate the assumptions are shown. Table 8.2.4 contains the Kolmogorov-Smirnov test of normality and Table 8.2.5 contains the White test.

Table 8.2.1: Parameter estimates for the unconstrained power decay model

Parameter Estimates

| Variable | DF | Parameter Estimate | Standard Error | t Value | Pr > \|t\| |
|---|---|---|---|---|---|
| Intercept | 1 | 0.72810 | 0.20314 | 3.58 | 0.0003 |
| dlog | 1 | -0.74711 | 0.01826 | -40.91 | <.0001 |


Table 8.2.2: Analysis of variance for the unconstrained power decay model

Analysis of Variance

| Source | DF | Sum of Squares | Mean Square | F Value | Pr > F |
|---|---|---|---|---|---|
| Model | 1 | 4686.20765 | 4686.20765 | 1673.95 | <.0001 |
| Error | 2517 | 7046.29940 | 2.79948 | | |
| Corrected Total | 2518 | 11733 | | | |

Table 8.2.3: Evaluation measures for the unconstrained power decay model

| Measure | Value | Measure | Value |
|---|---|---|---|
| Root MSE | 1.67317 | R-Square | 0.3994 |
| Dependent Mean | -7.47037 | Adj R-Sq | 0.3992 |
| Coeff Var | -22.39737 | | |

Figure 8.2.1: Regression plot for the unconstrained power decay model, with $\ln\left(\frac{T_{ij}}{O_i D_j}\right)$ on the y-axis and the natural logarithm of distance on the x-axis.


Figure 8.2.2: Residual plot for the unconstrained power decay model


Figure 8.2.3: Fit diagnostics for the unconstrained power decay model

Table 8.2.4: Test of normality of residuals for the unconstrained power decay model

| Test for normality | Statistic | | p Value | |
|---|---|---|---|---|
| Kolmogorov-Smirnov | D | 0.035495 | Pr > D | <0.0100 |


Table 8.2.5: White test of heteroscedasticity of residuals for the unconstrained power decay model

Test of First and Second Moment Specification

| DF | Chi-Square | Pr > ChiSq |
|---|---|---|
| 2 | 8.34 | 0.0154 |

8.3. Moran’s Index for the Unconstrained Models

In the following two figures the results of the global Moran’s Index test are found. Figure

8.3.1. shows the results for the unconstrained exponential decay model and Figure 8.3.2.

shows the results for the unconstrained power decay model.


Figure 8.3.1: Moran’s Index for the unconstrained exponential decay model


Figure 8.3.2: Moran’s Index for the unconstrained power decay model