A Spatial Disaggregation Model for Maximising the Application of Long-Term Forecasting Land-Use Transport Models Based on Zonal Data

Youngsoo, An1, Valentina, Nacar2, Seungil, Lee3

Abstract

This paper presents the process and an empirical analysis of a new spatial disaggregation model. Such a model is needed to maximise the application of long-term forecasting land-use transport models based on zonal data. The approach disaggregates the results predicted at the zonal level to cells; in doing so, it draws on base-year cell data aggregated from individual building units. The disaggregation process consists of two parts: "Which cell will be reconstructed before the target year?" and "How much floor space will be reconstructed in the cell?" The paper presents the results of an empirical analysis for both parts. First, we calculated the probability that a cell will be reconstructed using a binary logistic regression model. Using the rank of the accessibility value and the rank of the density value of each cell, the model classified whether cells would be reconstructed with an accuracy of up to 80.4%. Second, we estimated the expected reconstructed floor space in a cell using its location utility. The $R^2$ value, which represents the explanatory power of the regression model, was approximately 69.5%, which can be considered high. The results of this study are expected to support the development of more detailed disaggregation models.

1 The University of Seoul (first author)

2 David Simmonds Consultancy

3 The University of Seoul

1 Background and Goal

A city has been recognised, within the framework of urban ecological theory, as an object that continues to evolve (Heeyun, H., 2002). For a long time, many researchers have attempted to find patterns in the changes of a city and then to make forecasts using these patterns. Such attempts are a natural part of pursuing sustainable urban growth. Early research in this area concentrated on indexes such as total population and the numbers of households, employees, and jobs, which represent a city, region, or country at the level of macro-spatial units. Since then, various urban simulation models have been developed using macro or meso spatial units, i.e. zones, alongside an increasing demand for forecasting results based on smaller spatial units. In particular, models were developed in some countries in Europe, with the most representative models including MEPLAN (Marcial Echenique and Partners Ltd in the UK, Echenique et al., 1990; Echenique, 1994), IRPUD (Wegener, 1998, 1982), ITLUP (Integrated Transportation and Land Use Package, Putman, 1991, 1989), TRANUS (de la Barra, 1989), and DELTA (Simmonds, 1999), which usually consider the interaction between land use and transport. Following this demand, studies on urban simulation models since the 2000s have focused more on micro-spatial units.

Today there are several microsimulation models of urban land use and transport under

development in North America: the California Urban Futures (CUF) Model at the

University of California at Berkeley (Landis and Zhang 1998a, 1998b), the Integrated

Land Use, Transport and Environment (ILUTE) model at Canadian universities (Miller

2001), the Urban Simulation (UrbanSim) model at the University of Washington, Seattle

(Waddell 2000), and the 'second-generation' model of the Transport and Land Use

Model Integration Program (TLUMIP) of the State of Oregon, USA. There are no efforts of comparable size in Europe. There are a few national projects, such as A Learning-Based Transportation Oriented Simulation System (ALBATROSS) of Dutch universities (Arentze and Timmermans 2000) or the Integrated Land-Use Modelling and Transportation System Simulation (ILUMASS) in Germany (Moeckel et al., 2003). A microsimulation model can increase the demand for high-quality spatial data (Wagner, P. and Wegener, M., 2007). Accounting for the nature of social phenomena, which involve many complex factors (both known and unknown), is the first problem in simulating these systems. Therefore, although much academic attention has been given to the subject, there have been very few applications (Bazghandi et al., 2012).

Using spatial units with a higher resolution for simulating a city makes it possible to find more specific and detailed spatial changes in the future, and it widely expands the range of uses of the results. On the other hand, such a model may be more complicated and require the validation of many parameters. It is also difficult to construct a database based on extremely small spatial units. These difficulties matter more for long-term forecasting than for short-term forecasting, because the number of variables to validate increases.

This study focuses on how changes in a city can be estimated effectively and how the results can be used efficiently. To that end, it suggests using a zone-based model for the long-term forecasting of a city and then refining the results with a microsimulation model that disaggregates the zones. In particular, this study attempts to develop a microsimulation model that disaggregates zonal data. We expect that combining the zone-based model and the microsimulation model in this way addresses long-term forecasting both effectively and efficiently.

2 Literature Review

In this section, the literature on spatial disaggregation methods and their applications in urban simulation models is reviewed in turn. For the spatial disaggregation methods, we mainly reviewed spatial interpolation, because spatial disaggregation methods are based on areal interpolation techniques, which can be classified according to various criteria such as underlying assumptions or the use of ancillary data (2015? Wu?). We then reviewed the applications of spatial disaggregation methods in urban models, namely PROPOLIS, ILUMASS and SOLUTIONS. Considerations and suggestions for model estimation and selection are discussed at the end.

2.1 Spatial Disaggregation Method

Spatial disaggregation methods were first studied principally by geography researchers because of the limitations of data aggregated by zone or region, such as population and employment numbers. Tobler (1979) noted that aggregate data can indicate how population density, a continuous quantity, varies over a particular portion of the earth. The usual assumption is that the density within any individual reporting region is constant; given a lack of information to the contrary, this is implicitly treated as an optimal viewpoint. To overcome this assumption, Tobler (1979) suggested the pycnophylactic interpolation method for isopleth mapping. This method assumes the existence of a smooth density function that takes into account the effect of adjacent source zones (Lam, 1983). In practice, polygon data, such as population by region, are rasterized into cell units and then smoothed (see Figure 1).

Figure 1. Process of smoothing (data polygons, rasterized and smoothed) (Source: Tobler, 1979)
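
The rasterize-then-smooth idea can be illustrated with a minimal sketch. It is not Tobler's original algorithm: the 4-neighbour averaging, the fixed iteration count, and the function name `pycnophylactic` are simplifying assumptions made here for illustration. The key point it shows is that each zone total is restored after every smoothing pass, so the zone totals are preserved.

```python
def pycnophylactic(zone_of_cell, zone_totals, iterations=50):
    """Smooth zonal totals over a raster while preserving each zone total.

    zone_of_cell: 2D list of zone ids (one id per raster cell).
    zone_totals:  dict mapping zone id -> total (e.g. population).
    """
    rows, cols = len(zone_of_cell), len(zone_of_cell[0])
    counts = {}
    for row in zone_of_cell:
        for z in row:
            counts[z] = counts.get(z, 0) + 1
    # Step 1: rasterize, spreading each zone total evenly over its cells.
    grid = [[zone_totals[z] / counts[z] for z in row] for row in zone_of_cell]
    for _ in range(iterations):
        # Step 2: smooth each cell with its available 4-neighbours.
        smooth = [[0.0] * cols for _ in range(rows)]
        for r in range(rows):
            for c in range(cols):
                values = [grid[r][c]]
                for dr, dc in ((1, 0), (-1, 0), (0, 1), (0, -1)):
                    rr, cc = r + dr, c + dc
                    if 0 <= rr < rows and 0 <= cc < cols:
                        values.append(grid[rr][cc])
                smooth[r][c] = sum(values) / len(values)
        # Step 3: rescale within each zone so its total is unchanged.
        sums = {z: 0.0 for z in zone_totals}
        for r in range(rows):
            for c in range(cols):
                sums[zone_of_cell[r][c]] += smooth[r][c]
        grid = [
            [
                smooth[r][c] * zone_totals[zone_of_cell[r][c]] / sums[zone_of_cell[r][c]]
                if sums[zone_of_cell[r][c]] > 0 else 0.0
                for c in range(cols)
            ]
            for r in range(rows)
        ]
    return grid


zones = [[0, 0, 1, 1], [0, 0, 1, 1]]
surface = pycnophylactic(zones, {0: 80.0, 1: 8.0})
```

In this toy example the cells of the dense zone that border the sparse zone end up with lower values than the cells farther away, while each zone still sums to its original total.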

Since then, the limitation of aggregated data has been continuously studied by many researchers; it is known in the literature as the modifiable areal unit problem (MAUP) (Fotheringham & Wong 1991; Fotheringham & Rogerson 1993; Dennis & Wu 1996; Moon & Farmer 2001). Spatial disaggregation methods have been studied as a way to address the MAUP, and they can be divided into five types according to the method used, as follows.

Table 1 A comparison of different spatial disaggregation techniques in terms of their assumptions, methods and data demand. (Source: Li et al., 2007)

| Technique | Method | Assumption | Control surface (ancillary data) | Complexity (1-5) |
|---|---|---|---|---|
| Simple Area Weighting | Cartographic | Homogeneous source zones | None | 5 |
| Regression Model | Statistical | Source zone composed of land classes with global uniform density | Discrete or continuous | 3 |
| Binary Dasymetric Mapping | Cartographic | Source zone composed of populated and unpopulated areas | Discrete (binary) | 2 |
| Three-Class Dasymetric Mapping | Cartographic | Homogeneity at different land class (at each source zone) | Discrete | 1-2 |
| EM Algorithm | Statistical | Source zone composed of land classes with global uniform density that conserve aggregate value | Discrete or continuous | 1-2 |

The simple area weighting method assumes that the distribution within a region is homogeneous. This assumption is far removed from the spatial distributions expected in the real world. In some cases, simple cartographic processing methods, such as overlay, are used to disaggregate the source zones. Other more advanced techniques embrace the more realistic expectation that source zones are heterogeneous but with an unknown structure (Li et al., 2007).

Different approaches have been proposed based upon the assumptions made about

the spatial structure imposed on the source zones that resulted from the overlaid spatial

data (Li et al., 2007).

Regression models (Langford et al. 1991; Yuan et al. 1997) assume that the ancillary land use classes define areas of global uniform density. That is, each land class has a uniform area density that is related to the parameter of interest over the whole of the area, but this density is unknown. Using a combination of the aggregate source values and the ancillary data with unknown densities, it is possible to develop regression equations to numerically resolve this relationship (Li et al., 2007).

A drawback of this approach is that the global densities it computes allow for small

errors between the estimated and the actual source-zone values. The quality of resolved

densities maintaining the volume of the aggregate data value is called the

pycnophylactic property (Tobler 1979; Goodchild et al. 1993).

Hence there is another statistical technique for estimating the globally uniform density for each land class while satisfying the pycnophylactic property: the EM algorithm (Flowerdew & Green 1991; Flowerdew & Green 1992; Gregory & Paul 2005). However, the assumption of uniform area density for each land class might be problematic when dealing with many areas over a large region where relationships between population and land class are not spatially uniform. Langford (2006) argued that globally fitted densities can be estimated at the local level by dasymetric mapping, which allows for some global variability in density for each land class.

A simple example is binary dasymetric mapping (Eicher & Brewer 2001) which takes a

binary land classification to control the population allocation. It assumes a non-zero

density in the populated areas within each source zone and a zero density elsewhere.

Hence varying assumptions can be made about the density in a functional way. A

further refinement to this is three-class dasymetric mapping (Mennis 2003), which

incorporates a functional relationship with area densities so that densities are uniform

within a source zone even though they may vary across the larger region.
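
The difference between simple area weighting and binary dasymetric mapping can be shown with a small worked sketch. It assumes equal-sized cells within one source zone and a hypothetical boolean mask marking the populated cells; the function names are illustrative and do not come from the cited papers.

```python
def area_weighting(zone_total, n_cells):
    """Simple area weighting: every (equal-sized) cell gets the same share."""
    return [zone_total / n_cells] * n_cells


def binary_dasymetric(zone_total, populated):
    """Binary dasymetric mapping: only populated cells receive a share.

    populated: list of booleans, one per cell (the binary land classification).
    """
    n_populated = sum(populated)
    if n_populated == 0:
        raise ValueError("zone has no populated cells to allocate to")
    return [zone_total / n_populated if p else 0.0 for p in populated]


mask = [True, False, True, True]
uniform = area_weighting(120.0, len(mask))      # 30.0 in every cell
dasymetric = binary_dasymetric(120.0, mask)     # 40.0 in populated cells only
```

Both allocations conserve the zone total of 120; they differ only in where the total is placed.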

Overall, the density assumptions of different spatial disaggregation techniques can be illustrated by Figure 1, where the vertical bars represent density for each land use class and the parallel bar represents the density of the source zones. By comparison, the most relaxed homogeneity assumption, used by three-class dasymetric mapping, is closest to the complexity of the real world.

The three-class dasymetric mapping is theoretically more appropriate to accommodate

the spatial heterogeneity of a large geographical area. Langford (2006) evaluated spatial

disaggregation techniques using UK Census data for the county of Leicestershire. The

results show that the three-class dasymetric method largely outperforms other spatial

disaggregation techniques, apart from the comparatively simpler binary dasymetric

method.

One possible reason for this inconclusive result is that the more complex three-class dasymetric technique is more sensitive to land classification errors. On the other hand, as Fisher and Langford (1995) pointed out, the significance of comparative results is always limited by the simplicity of the spatial structure of the study area, and a more conclusive result could be validated experimentally by broadening the study area to include more spatially heterogeneous densities.

2.2 Cases of Application in Urban Model

In this section, two urban models that include a spatial disaggregation method are reviewed: first the PROPOLIS project, and then the ILUMASS project. Figure 2 shows how PROPOLIS and ILUMASS differ in their use of the spatial disaggregation method.

Figure 2. Cases of using the spatial disaggregation method in the urban model based on zonal data (Source: Wegener, 2010)

Row (A) indicates urban models in which the spatial disaggregation method is not used, while rows (B) and (C) indicate urban models that use it in different ways. In row (B), the urban model first produces a long-term forecast from zonal input data, and the spatial disaggregation method is then applied to the results to extend their usefulness, for example to assess environmental impacts. PROPOLIS belongs to this case. In row (C), the zonal data are spatially disaggregated before the long-term forecast is made, and the forecast is then carried out by microsimulation using the disaggregated cell data. The methods of the two models, PROPOLIS and ILUMASS, are described in detail below.

PROPOLIS

The major objective of the PROPOLIS (Planning and Research of Policies for Land Use

and Transport for Increasing Urban Sustainability) project is to research, develop, and

test integrated land use and transport policy assessment tools and methodologies. The

project also defines sustainable urban strategies and demonstrates their long-term

effects (Spiekermann, 2003). In order to calculate the PROPOLIS indicators, a spatial disaggregation module, called the raster module, has been developed in the model. The raster module maintains the zonal organisation of the land use transport models and adds a disaggregated raster-based representation of space for some of the specific environmental and social impact sub-models. Because the raster module is based on the output of aggregate urban models, several steps must be taken to go from a polygon and vector representation of zones and networks, through a small-scale assessment of environmental and social impacts, to a re-aggregation of indicators for assessing sustainability.

There are two main sources of input for the raster module. On the one hand, there is

a spatial database that depicts zone boundaries and land use categories as polygons,

and vectors are used to code the network. On the other hand, there are the policy-

dependent forecasts implemented by the land use transport models for the location of

households according to socio-economic group, employment, and zonal floor space

and traffic flow on the links of the network. This information is then converted to raster

cells. The main assumption concerning the disaggregation of activity locations is that

population and employment are not equally distributed over the territory of a zone,

but that there are differentiations in density. The assumption is that intra-zonal

differentiation is reflected by weights assigned to the raster cells based on typical

densities of land use categories (e.g. Bosserhof, 2000). These weights are converted to

probabilities by dividing them by the zone’s total weights. This gives a probability

distribution of households in a zone. Cumulating the weights over the cells of a zone

provides a range of numbers that can be associated with each cell. Using a random

number generator for each household, a cell is selected as the household's location.

This allocation of households takes into account the different weighting schemes for

the three socio-economic groups. The disaggregation of employment follows the same

procedure but with different weights (Spiekermann and Wegener, 1998).
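
The weight-based random allocation described above can be sketched as follows. This is a minimal illustration rather than the PROPOLIS implementation: the function name `allocate_households`, the example weights, and the fixed random seed are assumptions made here.

```python
import bisect
import random
from itertools import accumulate


def allocate_households(cell_weights, n_households, rng):
    """Allocate households of one zone to its raster cells by weighted draws.

    cell_weights: land-use based weight of each cell in the zone.
    Returns the number of households placed in each cell.
    """
    cumulative = list(accumulate(cell_weights))  # range of numbers per cell
    total = cumulative[-1]
    if total <= 0:
        raise ValueError("zone has no cell with a positive weight")
    counts = [0] * len(cell_weights)
    for _ in range(n_households):
        draw = rng.random() * total              # one random number per household
        idx = bisect.bisect_right(cumulative, draw)
        counts[min(idx, len(counts) - 1)] += 1   # guard against rounding at the top end
    return counts


# Hypothetical zone with three cells: open space, dense housing, sparse housing.
counts = allocate_households([0.0, 3.0, 1.0], 1000, random.Random(1))
```

Dividing each weight by the zone total gives the selection probability of that cell, so a cell with zero weight never receives a household and the dense cell receives about three times as many as the sparse one.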

ILUMASS

The ILUMASS project aims to develop, test, and apply a new type of integrated urban

land use/transport/environment (LTE) planning model. Urban LTE models simulate the

interaction between urban land use development, transport demand, traffic, and

environment. The distribution of land use in the urban region, such as residences,

workplaces, shops, and leisure facilities, creates a demand for spatial interaction, such

as work, shopping, or leisure trips. These trips occur as road, rail, bicycle, or walking

trips over the transport network in the region, and they have environmental impacts.

The land use component of the ILUMASS model is based on the land use parts of the

mostly aggregate land use transport model developed at IRPUD (Wegener, 1999).

However, the ILUMASS model is microscopic, i.e. all land use changes and traffic flow

are modelled by microsimulation. The micro database contains a listing of residential

buildings and floor space details of non-residential buildings. Features associated with

each dwelling include building type, size, quality, tenure, and price, and every dwelling

has a raster cell as a micro location. The non-residential floor space is also distinguished

by industrial, retail, office, and public use. Raster cells are used as addresses for the

microsimulation, and for the disaggregation of zonal activities to raster cells, GIS-based

techniques are used.

To disaggregate spatially aggregated data within a zone, the land use distribution within

that zone is taken into account, i.e. it is assumed that there are areas of different density

in that zone. As a result, the spatial disaggregation of zonal data consists of three steps:

the generation of a raster representation of land use, the assignment of probabilities

to land use categories, and the allocation of the data to raster cells. Figure 3 illustrates the three steps for a simple example (Spiekermann and Wegener 1999, 2000).

- First, land use data and zone borders, in vector-based GIS usually stored as polygons,

are converted to a raster representation by using a point-in-polygon algorithm for the

centroids of the raster cells. As a result, each cell has two attributes, the land use

category and the zone number of its centroid. These cells represent the addresses for

the disaggregation of zonal activity data.

- For each activity to be disaggregated, weights are assigned to each land use category,

and all cells are attributed with the weights of their land use category. Dividing the

weight of a cell by the total weight of all the cells of the zone gives the probability that

the cell is the address of just one element of the zonal activity. Cumulating the weights

over the cells of a zone then yields the range of numbers associated with each cell.

- Using a random number generator for each element of the zonal activity, one cell is

selected as its address. The results are individual addresses for all activities with a raster

representation of the distribution of each activity within the zone.

Figure 3 Disaggregation of zonal data to raster data (Spiekermann and Wegener

2000: 48)
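
The first of the three steps, converting polygons to a raster by testing the cell centroids, can be sketched as below. The ray-casting test is a standard point-in-polygon technique chosen here for illustration; the source does not state which algorithm ILUMASS uses, and the function names and coordinates are hypothetical.

```python
def point_in_polygon(x, y, polygon):
    """Ray-casting test: is point (x, y) inside the polygon [(x, y), ...]?"""
    inside = False
    n = len(polygon)
    for i in range(n):
        x1, y1 = polygon[i]
        x2, y2 = polygon[(i + 1) % n]
        if (y1 > y) != (y2 > y):  # edge crosses the horizontal line through y
            x_cross = x1 + (y - y1) * (x2 - x1) / (y2 - y1)
            if x < x_cross:
                inside = not inside
    return inside


def rasterize(polygons, x_min, y_min, n_cols, n_rows, cell_size):
    """Label each raster cell with the polygon that contains its centroid.

    polygons: dict mapping a label (zone id or land use class) to vertices.
    Returns a list of rows; cells outside every polygon are labelled None.
    """
    grid = []
    for r in range(n_rows):
        row = []
        for c in range(n_cols):
            cx = x_min + (c + 0.5) * cell_size
            cy = y_min + (r + 0.5) * cell_size
            label = next(
                (name for name, poly in polygons.items()
                 if point_in_polygon(cx, cy, poly)),
                None,
            )
            row.append(label)
        grid.append(row)
    return grid


zones = {
    "zone_a": [(0, 0), (100, 0), (100, 100), (0, 100)],
    "zone_b": [(100, 0), (200, 0), (200, 100), (100, 100)],
}
raster = rasterize(zones, 0, 0, n_cols=4, n_rows=2, cell_size=50)
```

Running the same function once with zone polygons and once with land use polygons yields the two cell attributes mentioned in the first step.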

Although this study resembles the others in that it uses zone-based results, here those of the Seoul model, together with spatial disaggregation, its methodology is different. This study also uses microsimulation; however, the microsimulation is used not for long-term forecasting but for the spatial disaggregation of the estimated zonal data. This study therefore proposes using a zone-based macro model for long-term forecasting, together with a microsimulation model that disaggregates the zonal results so that they can be used efficiently.

3 Implementation

3.1 Conceptual Diagram

Figure 4 shows the main conceptual diagram of this study. Because the Seoul model for long-term forecasting is based on zones, it needs zonal data (A). Zonal data such as population, employment, and jobs usually come from the National Statistical Office, but some zonal data, such as the average land price and the floor space area for each land use type (recorded by building or parcel), come from GIS spatial data (A’), which are aggregated by zone (①). Using the zonal data as input (②), the Seoul model forecasts from the base year and calculates the estimated zonal data (B) for the target year (③). Because zonal data are of limited use, the results need to be disaggregated to cells (④). At this point, the microsimulation model disaggregates the estimated zonal data to each cell (B’), reflecting the building data from the base year (⑤).

Figure 4 Conceptual diagram

This concept, in particular the reflection of base-year building data (⑤), is the main feature that distinguishes this study, and it is expected to produce more realistic values when the predicted zonal results of the Seoul model are disaggregated. This study focuses on using the microsimulation model to disaggregate by cell (④+⑤). Because our focus is the disaggregation methodology, we assumed that the zone-based results of the Seoul model for the target year were acceptable, and we therefore omit a detailed description of the Seoul model.

3.2 Study area and analysis spatial unit

The Seoul model used for the Seoul Metropolitan Area consists of a total of 579 zones: 522 small zones (Seoul city) and 57 large zones (Incheon city and Keong-gi province) (see the top-left map (①) in Fig. 5). We selected four small zones in Seoul city (see the bottom-right map (③) in Fig. 5) and applied the disaggregation method of this study to them. In addition, we considered only retail and business land use; residential land use and other uses, such as education and parks, were not considered. The disaggregation method in this study does not disaggregate all of the predicted zonal data to cells at one time, but rather first selects some of the zones as targets and then applies the disaggregation method to each of them.

Figure 5 Study area and analysis spatial unit

In addition, we divided the four zones into 1,015 cells of 50 m × 50 m. Previous research (PROPOLIS, ILUMASS) used 100 m × 100 m cells to disaggregate all of the zones; this study used smaller 50 m × 50 m cells because only four zones were considered.

The four zones selected as the study area included a metro station in the middle (Shillim,

refer to ③ in Fig. 5) and two other stations (Shindaebang and Bongcheon) outside the

zones. These areas are already developed, and many retail and office facilities are

located in the catchment area of the Shillim metro station. Recently, there has been a

significant increase in the total number of passengers at the Shillim station. Because the zones have been under strong pressure to redevelop, they need long-term management, and a long-term forecast for these zones can support their sustainable management.

3.3 Model process

The Seoul model provides zonal data for the four zones, including the total floor space for each land use type in the base year (2010) and the target year (2030). By comparing the total retail and office floor space in the base and target years, we obtain the total difference in floor space (TDFs) for the four zones. A positive TDFs value indicates that some cells in the four zones will be redeveloped, while the remaining cells keep the same total floor space as before. This distinction cannot be captured by the allocation methods of previous studies, in which the zonal data were simply disaggregated to cell units. Therefore, this study uses an iterative (rotation) process that selects a cell, redevelops it using a set of functions, and repeats until a stopping condition is satisfied.

In addition, the increase in TDFs values prompted two main research questions: ‘Which cell

will be redeveloped before the target year?’ and ‘How much will the cell be redeveloped?’

In terms of the first research question, because the cell to be redeveloped will have a

relatively higher probability value compared to the other cells, we used a probability

function to select a cell that is likely to be redeveloped. The second research question is

related to how much the new total floor space will increase when the cell is redeveloped.

As a result, we used a utility function to calculate the new total floor space for the cell

because it is related to the location utility. The process diagram created for this step is

shown in Figure 6.

Figure 6 Process of disaggregation model

The detailed explanation is as follows. Reconstruction may already have started in the area before the rotation process begins; in 2008, redevelopment had already started in three places. The floor space that had already been planned was excluded from the total amount handled in the rotation process.

Apply existing plan

The first step is to check for existing reconstruction sites and fixed plans for reconstruction between the base year and the target year, and to apply them.

Calculate probability for each cell

The probability function is used to calculate, for each cell, the probability that the cell will be reconstructed. This probability is given by Equation 1.

$$p(y_i = 1 \mid x_i) = f(\beta_0 + \beta_1 x_i), \qquad f(A) = \frac{e^{A}}{1 + e^{A}} \qquad \text{(Equation 1)}$$

where $y_i$ denotes the redevelopment outcome of cell $i$, and $y_i = 1$ means that the cell ($x_i$) was redeveloped.

The logistic regression model is used to obtain the probability that a cell will be redeveloped. Two independent variables are used: the rank of accessibility and the rank of density. The rank of accessibility (RankAcc) reflects the potential users of the cell. The underlying accessibility is calculated by multiplying the total number of passengers at the nearest metro station by the estimated number of users of the cell's total floor space, and then dividing by the distance between the nearest metro station and the cell.

$$A = \beta_0 + \beta_1 \, RankDensity_i + \beta_2 \, RankAcc_i \qquad \text{(Equation 2)}$$

where $RankDensity_i$ denotes the rank of the density value of cell $i$ ($x_i$), and $RankAcc_i$ denotes the rank of the accessibility value of cell $i$ ($x_i$).
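
Equations 1 and 2 can be combined into a short function, sketched here for illustration. The coefficient values in the example call are placeholders chosen only to show the call, not estimates from this study, and the function name is an assumption.

```python
import math


def redevelopment_probability(rank_density, rank_acc, beta0, beta1, beta2):
    """Logistic probability that a cell is redeveloped (Equations 1 and 2)."""
    a = beta0 + beta1 * rank_density + beta2 * rank_acc   # Equation 2
    # f(A) = e^A / (1 + e^A), written in a numerically stable form.
    if a >= 0:
        return 1.0 / (1.0 + math.exp(-a))
    e = math.exp(a)
    return e / (1.0 + e)


# Placeholder coefficients, for illustration only.
p = redevelopment_probability(rank_density=120, rank_acc=15,
                              beta0=-1.0, beta1=0.002, beta2=-0.01)
```

With all coefficients set to zero the function returns 0.5, which is a convenient sanity check of the logistic form.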

The ranking value is calculated from the accessibility value of each cell to allow a relative comparison between cells; it is used as an independent variable. The accessibility value is based on the gravity function, and it is calculated as follows:

$$Acc_i = \frac{TNP_k \cdot TFU_i}{PND_{i \cdot k}} \qquad \text{(Equation 3)}$$

where

$Acc_i$ = accessibility of cell $i$.

$TNP_k$ = total number of passengers at metro station $k$, where $k$ is the metro station nearest to cell $i$ (calculated from the transport model of the Seoul model).

$TFU_i$ = total number of floor space users in cell $i$ = total floor space area in cell $i$ / floor area occupied per person (calculated from the land use model of the Seoul model).

$PND_{i \cdot k}$ = pedestrian network distance from cell $i$ to the nearest metro station $k$.

The rank of density (RankDen) reflects how intensively the cell has already been developed. The density value is calculated by dividing the total existing floor space by the area of the cell (2,500 m² = 50 m × 50 m). We converted the density value to a ranking value for the same reason as for RankAcc.
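
The accessibility, density, and rank calculations can be sketched as follows. The rank direction (rank 1 for the largest value) and the tie handling are assumptions, since the text does not specify them; the input numbers are invented for illustration.

```python
CELL_AREA_M2 = 50.0 * 50.0  # 2,500 m2 per cell


def accessibility(passengers_k, floor_space_i, area_per_person, network_distance_ik):
    """Gravity-type accessibility of a cell (Equation 3)."""
    users_i = floor_space_i / area_per_person          # TFU_i
    return passengers_k * users_i / network_distance_ik


def density(floor_space_i, cell_area=CELL_AREA_M2):
    """Existing floor space divided by the cell area."""
    return floor_space_i / cell_area


def rank_values(values):
    """Rank cells by value; assumption: rank 1 is the largest value,
    and ties keep their input order."""
    order = sorted(range(len(values)), key=lambda i: -values[i])
    ranks = [0] * len(values)
    for position, i in enumerate(order, start=1):
        ranks[i] = position
    return ranks


acc = accessibility(passengers_k=1000, floor_space_i=500.0,
                    area_per_person=25.0, network_distance_ik=200.0)
ranks = rank_values([3.0, 10.0, 7.0])
```

Here 500 m² of floor space at 25 m² per person gives 20 users, so the accessibility is 1000 × 20 / 200 = 100.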

Select a cell

We select a cell randomly, weighted by the estimated probability value of each cell. The selection is random because a high probability does not mean that the cell will certainly be redeveloped, only that redevelopment is likely. At this step, if the age of a building (establishment year: 2016) in the cell is less than 10 years, the cell is not selected. A minimum building age of 10 years is required for selection, based on the results of an empirical analysis.
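
The selection step can be sketched as a weighted random draw over the eligible cells. Treating a building of exactly 10 years as eligible is an assumption made here, as are the function name and the example values.

```python
import random


def select_cell(probabilities, building_ages, rng, min_age=10):
    """Pick one cell at random, weighted by its redevelopment probability.

    Cells whose building is younger than min_age years are excluded.
    Returns the index of the selected cell, or None if no cell is eligible.
    """
    eligible = [i for i, age in enumerate(building_ages) if age >= min_age]
    weights = [probabilities[i] for i in eligible]
    if not eligible or sum(weights) <= 0:
        return None
    return rng.choices(eligible, weights=weights, k=1)[0]


rng = random.Random(0)
chosen = select_cell([0.9, 0.5, 0.1], building_ages=[3, 20, 40], rng=rng)
```

In this example the first cell has the highest probability but is never chosen, because its building is only 3 years old.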

Calculate expected reconstructed floor space of the cell

At the next step, we calculate the expected total floor space of the selected cell after redevelopment. The expected reconstructed floor space is calculated using a multiple regression model based on the location utility of the cell, with the location factors shown in Equation 4.

$$ReConstructFlr_i = \beta_0 + \beta_1 \, DistMetro_{i \cdot k} + \beta_2 \, DistRoad_{i \cdot m} + \beta_3 \, Density_i \qquad \text{(Equation 4)}$$

where

$ReConstructFlr_i$ = expected reconstructed floor space in cell $i$

$DistMetro_{i \cdot k}$ = network distance from cell $i$ to the nearest metro station $k$

$DistRoad_{i \cdot m}$ = distance from cell $i$ to the nearest main road $m$

$Density_i$ = density of cell $i$

There are three independent variables in this equation: the network distance from the cell to the nearest metro station (DistMetro), the distance from the cell to the nearest main road (DistRoad), and the density of the cell (Density). In addition, we apply a random variable in the range of ±10% in the calculation of the new total floor space. At this step, the age of the building in the redeveloped cell is reset to zero years to avoid reselecting it.
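
Equation 4 with the ±10% random variation can be sketched as below. Applying the variation as a uniform multiplicative factor is an assumption, since the text does not state its distribution, and the coefficients in the example are placeholders rather than estimated values.

```python
import random


def expected_floor_space(dist_metro, dist_road, density, betas, rng, noise=0.10):
    """Expected reconstructed floor space of a cell (Equation 4).

    betas: (beta0, beta1, beta2, beta3).
    noise: half-width of the random variation (0.10 means +/-10%).
    """
    b0, b1, b2, b3 = betas
    base = b0 + b1 * dist_metro + b2 * dist_road + b3 * density
    factor = 1.0 + rng.uniform(-noise, noise)   # assumed uniform in [-10%, +10%]
    return base * factor


# Placeholder coefficients, for illustration only.
betas = (1000.0, -1.0, -2.0, 300.0)
value = expected_floor_space(dist_metro=300.0, dist_road=50.0, density=2.0,
                             betas=betas, rng=random.Random(42))
```

With these placeholder values the deterministic part is 1000 - 300 - 100 + 600 = 1200, so the result lies between 1080 and 1320.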

Conditional statement

Lastly, the accumulated total increased floor space (TIFs) from the redevelopment is compared to TDFs. If TIFs is less than TDFs, the process returns to the second step and repeats. TIFs and TDFs are defined in Equations 5 and 6.

$$TIFs = \sum_{i=0}^{n} \left( ReConstructFlr_i - ExistFlr_i \right) \qquad \text{(Equation 5)}$$

TIFs: total increased floor space produced by the disaggregation model

$ReConstructFlr_i$: total amount of reconstructed floor space in cell $i$

$ExistFlr_i$: total amount of existing floor space in cell $i$

$i \; (0, \cdots, n)$: cells selected and reconstructed by the model

$$TDFs = TFs^{zones}_{target\ year} - TFs^{zones}_{base\ year} \qquad \text{(Equation 6)}$$

TDFs: total difference in floor space from the base year to the target year in the zones

$TFs^{zones}_{base\ year}$: total floor space of the zones in the base year

$TFs^{zones}_{target\ year}$: total floor space of the zones in the target year

If the condition is satisfied, the simulation ends. To recap briefly, the simulation model iterates to distribute the total increase of floor space in the zones among the cells. Two statistical models are used: the binary logistic model, which calculates the redevelopment probability of each cell, and the multiple regression model, which estimates the expected reconstructed floor space.
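
The overall loop can be sketched in a few lines. This is a simplified illustration under stated assumptions: each cell carries a precomputed probability and a precomputed new floor space (in the model these come from the logistic and regression steps), probabilities are held fixed between iterations, and the dictionary keys and function name are hypothetical.

```python
import random


def disaggregate(cells, tdfs, rng, min_age=10, max_iter=10_000):
    """Redevelop cells one at a time until TIFs reaches TDFs.

    cells: list of dicts with keys 'prob', 'age', 'exist_flr', 'new_flr'.
    tdfs:  total difference in floor space to distribute over the cells.
    Returns (indices of redeveloped cells, accumulated TIFs).
    """
    tifs = 0.0
    rebuilt = []
    for _ in range(max_iter):
        if tifs >= tdfs:                       # conditional statement
            break
        eligible = [i for i, c in enumerate(cells)
                    if c["age"] >= min_age and c["prob"] > 0]
        if not eligible:                       # nothing left to redevelop
            break
        weights = [cells[i]["prob"] for i in eligible]
        i = rng.choices(eligible, weights=weights, k=1)[0]
        cell = cells[i]
        tifs += cell["new_flr"] - cell["exist_flr"]   # one term of Equation 5
        cell["exist_flr"] = cell["new_flr"]
        cell["age"] = 0                        # avoid reselecting the cell
        rebuilt.append(i)
    return rebuilt, tifs


cells = [
    {"prob": 0.6, "age": 30, "exist_flr": 500.0, "new_flr": 1500.0},
    {"prob": 0.3, "age": 25, "exist_flr": 800.0, "new_flr": 1800.0},
    {"prob": 0.1, "age": 40, "exist_flr": 200.0, "new_flr": 1200.0},
]
rebuilt, tifs = disaggregate(cells, tdfs=1500.0, rng=random.Random(7))
```

Because every redevelopment in this example adds 1,000 m², exactly two cells are rebuilt before the accumulated increase exceeds the target of 1,500 m².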

4 Coefficient Estimation and Validation

4.1 Introduction and base data

In this section, we use an empirical analysis to estimate the parameters of the model constructed in the previous section. The base year is 2008, and the verification year is 2016. Verification of the model is divided into two parts: selecting a cell and calculating the reconstructed floor space.

We constructed the basic data for the empirical analysis as follows. First, we needed the total floor space of each cell for retail and business land use in 2008. We extracted the buildings whose main land use was retail or business from the full building GIS data set, and then aggregated these buildings into 50 m × 50 m cells (Fig. 7).

Figure 7 Process for calculating base data
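
The aggregation of buildings into 50 m × 50 m cells can be sketched as follows. It assumes that each building is represented by a single point (for example its centroid) in projected metre coordinates, which is a simplification of the GIS overlay; the record layout and names are hypothetical.

```python
from collections import defaultdict


def aggregate_to_cells(buildings, x_min, y_min, cell_size=50.0,
                       uses=("retail", "business")):
    """Sum building floor space per grid cell for the selected land uses.

    buildings: iterable of (x, y, main_use, floor_space) tuples.
    Returns a dict mapping (row, col) -> total floor space.
    """
    totals = defaultdict(float)
    for x, y, main_use, floor_space in buildings:
        if main_use not in uses:
            continue                      # e.g. residential buildings are skipped
        col = int((x - x_min) // cell_size)
        row = int((y - y_min) // cell_size)
        totals[(row, col)] += floor_space
    return dict(totals)


buildings = [
    (10.0, 10.0, "retail", 200.0),
    (40.0, 45.0, "business", 300.0),
    (60.0, 10.0, "residential", 999.0),
    (120.0, 70.0, "retail", 50.0),
]
cell_totals = aggregate_to_cells(buildings, x_min=0.0, y_min=0.0)
```

The first two buildings fall in the same cell and are summed, the residential building is ignored, and the last building lands in row 1, column 2.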

Some cells had to be excluded because they would not change, such as foothills, streams, or main roads. We also excluded cells with only residential floor space, because we did not consider changes of land use type, such as from residential to retail or business. Finally, 459 cells were analysed, as shown in Fig. 8.

Figure 8 Exclusion of some cells and final studied cells.

Fig. 9 shows the redeveloped buildings and their aggregation by cell. We collected data on the buildings redeveloped from 2008 to 2016, a total of 107 buildings. When these data are aggregated by cell, 83 cells contain at least one redeveloped building. However, construction had already started on two of these buildings in 2008. Therefore, we treated these two buildings as existing plans, which the model applies in its first step. We performed the empirical analysis and verified the disaggregation model using the cell-based data, with these two buildings handled in this way.

Figure 9 Redeveloped buildings and their aggregation by cell

4.2 Estimating the parameters of the probability function

This study used a binary logistic model, which is one of the probabilistic choice models. Redeveloping an existing building is a discrete decision made by a developer, landowner, or planner, and such decisions are made under uncertainty. When a decision is discrete and uncertain, the most widely used method is a probabilistic choice model based on random utility theory. The probability of a cell being redeveloped is given by Equation 1.

Of the cells analysed, 81 were redeveloped from 2008 to 2016. These cells were coded as $y_i = 1$ in the dependent variable, and we estimated the binary logistic model. We used two independent variables, RankAcc and RankDen, as previously stated. RankAcc is the rank value of cell accessibility; the accessibility and its rank value are shown in the top maps of Fig. 10. Likewise, RankDen is the rank value of cell density; the density and its rank value are shown in the bottom maps of Fig. 10.

Figure 10 Calculated RankAcc and RankDen based on cell units

Table 2 lists the results of the binary logistic regression model. The coefficients of both RankAcc and RankDen are statistically significant. RankAcc has a negative coefficient, which means that a cell with a higher accessibility ranking (ranks are in ascending order, so a higher ranking is a smaller rank value) has a higher probability. RankDen has a positive coefficient, which means that a cell with a higher density ranking has a lower probability. Classification accuracy, an important measure for a binary logistic regression model, was 80.4%, which we consider acceptable.

Table 2 Results of binary logistic analysis

Explanatory variables

Variable      β        S.E.    Wald     Sig.    Exp(β)
(constant)    -0.830   0.228   13.216   0.000   0.436
RankAcc       -0.010   0.002   18.535   0.000   0.990
RankDen        0.007   0.002    9.599   0.002   1.007

Goodness of fit for the model

Statistic                   Value
χ² (Sig.)                   26.238 (0.000)
Classification accuracy     80.4%
-2 Log-likelihood           428.097
Cox and Snell R²            0.056
Nagelkerke R²               0.088
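The Exp(β) column is the odds ratio, that is, the factor by which the odds of reconstruction change when the variable increases by one unit. The short sketch below recomputes it from the β values in Table 2 and, as a worked example, shows the effect of a change of 100 rank positions. The function name `odds_multiplier` is hypothetical and used only for illustration.

```python
import math

# Coefficients (beta) as reported in Table 2.
coefficients = {"(constant)": -0.830, "RankAcc": -0.010, "RankDen": 0.007}

# Exp(beta): odds ratio for a one-unit increase in each variable.
odds_ratios = {name: math.exp(beta) for name, beta in coefficients.items()}


def odds_multiplier(beta, rank_change):
    """Factor applied to the odds when the rank value changes by `rank_change`."""
    return math.exp(beta * rank_change)


if __name__ == "__main__":
    for name, ratio in odds_ratios.items():
        print(f"{name}: Exp(beta) = {ratio:.3f}")
    # A RankAcc value that is larger by 100 (a lower accessibility ranking)
    # multiplies the odds of reconstruction by exp(-1).
    print(f"{odds_multiplier(coefficients['RankAcc'], 100):.3f}")
```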

We can derive Equation 7 by substituting the parameters in Table 2 into Equation 1.

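$$\mathrm{Prob}(y_i = 1 \mid x_i) = \frac{\exp\left(-0.830 + (-0.010)\,\mathit{RankAcc}_i + (0.007)\,\mathit{RankDen}_i\right)}{1 + \exp\left(-0.830 + (-0.010)\,\mathit{RankAcc}_i + (0.007)\,\mathit{RankDen}_i\right)}$$ (Equation 7)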

We calculated the probability for each cell using Equation 7. The left side of Fig. 11 shows the cells ordered by calculated probability (blue line), along with the cells that were actually reconstructed (orange vertical lines). The graph shows that a cell with a high probability value is not necessarily reconstructed; it is only more likely to be. The right side of Fig. 11 makes this point: the higher the probability range, the higher the average frequency of reconstruction. Therefore, we applied a random sampling function based on the probability value of each cell.

Figure 11 Comparison of estimated probability and actual reconstructed cells (left),

along with average frequency for each range of probability (right).
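A minimal Python sketch of the probability calculation and the random sampling step is given below. The coefficients are those of Table 2; the sampling rule (an independent uniform draw per cell compared with its probability) and the function names are assumptions for illustration, since the exact sampling function used in the study is not specified here.

```python
import math
import random

# Coefficients from Table 2.
B0, B_ACC, B_DEN = -0.830, -0.010, 0.007


def reconstruction_probability(rank_acc, rank_den):
    """Logistic probability that a cell is reconstructed, from its two rank values."""
    z = B0 + B_ACC * rank_acc + B_DEN * rank_den
    # exp(z) / (1 + exp(z)) written in the equivalent form 1 / (1 + exp(-z)).
    return 1.0 / (1.0 + math.exp(-z))


def sample_reconstructed(cells, rng):
    """Draw one scenario: keep each cell with its own probability (assumed rule).

    cells: dict cell_id -> (rank_acc, rank_den)
    """
    return [
        cid
        for cid, (rank_acc, rank_den) in cells.items()
        if rng.random() < reconstruction_probability(rank_acc, rank_den)
    ]


if __name__ == "__main__":
    demo = {101: (1, 459), 102: (230, 230), 103: (459, 1)}
    for cid, ranks in demo.items():
        print(cid, round(reconstruction_probability(*ranks), 3))
    print(sample_reconstructed(demo, random.Random(42)))
```

Because the draw is random, repeated runs with different seeds give different sets of reconstructed cells, which matches the observation that a high probability does not guarantee reconstruction.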

4.3 Estimating the parameters of the expected reconstructed floor space

In this section, we verify the calculation of the expected reconstructed floor space in a cell. The floor space is calculated with a multiple regression model based on the location utility, as in Equation 4.

Regarding the independent variables, the calculation processes for the DistMetro and Density variables were presented in Section 3. Fig. 12 presents the process for the DistRoad variable. On the left side of Fig. 12, the bold red lines represent the main roads, and the right side shows the distance from each cell to the main road.

Figure 12 Calculated distances from main roads to each cell

The two distance variables (DistMetro and DistRoad) were transformed with the natural logarithm before the regression model was estimated. The dependent variable was the total actual reconstructed floor space of the 81 cells. The results of the analysis are listed in Table 3. The $R^2$ value, which represents the explanatory power of the regression model, was approximately 0.695 (69.5%), which is high. Ln_DistMetro and Ln_DistRoad had negative coefficients, which means that the closer a cell is to the nearest metro station and main road, the larger its reconstructed floor space, although the coefficient of Ln_DistRoad was not statistically significant (Sig. = 0.212). In addition, a cell with a higher Density value had a larger reconstructed floor space.

Table 3 Results of multiple regression analysis

                Unstandardized coefficients   Standardized coefficients                        Collinearity statistics
Independent
variable        β          Std. Error         β*                          t         Sig.      Tolerance   VIF
(constant)      10.874     0.847                                          12.832    0.000
Ln_DistMetro    -0.605     0.135              -0.326                      -4.492    0.000     0.747       1.338
Ln_DistRoad     -0.040     0.032              -0.081                      -1.258    0.212     0.960       1.042
Density          0.773     0.089               0.627                       8.701    0.000     0.759       1.318

Substituting the parameters listed in Table 3 into Equation 4 gives Equation 8.

$$\mathit{ReConstructFlr}_i = (10.874) + (-0.605)\,\ln(\mathit{DistMetro}_{i \cdot k}) + (-0.040)\,\ln(\mathit{DistRoad}_{i \cdot m}) + (0.773)\,\mathit{Density}_i$$ (Equation 8)
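For illustration, the regression with the coefficients from Table 3 can be evaluated as in the sketch below. The function name and argument names are hypothetical, and the units of the inputs and of the result (including any transformation of the dependent variable) are assumed to follow Equation 4 and Section 3; the distances are log-transformed as described for the regression.

```python
import math

# Coefficients from Table 3 (constant, Ln_DistMetro, Ln_DistRoad, Density).
CONSTANT, B_METRO, B_ROAD, B_DENSITY = 10.874, -0.605, -0.040, 0.773


def expected_reconstructed_floor_space(dist_metro, dist_road, density):
    """Evaluate the fitted regression for one cell.

    dist_metro, dist_road: positive distances (natural log is applied here).
    density: the cell's Density value as defined in the paper.
    """
    if dist_metro <= 0 or dist_road <= 0:
        raise ValueError("distances must be positive to take the natural log")
    return (
        CONSTANT
        + B_METRO * math.log(dist_metro)
        + B_ROAD * math.log(dist_road)
        + B_DENSITY * density
    )


if __name__ == "__main__":
    near = expected_reconstructed_floor_space(100.0, 20.0, 2.0)
    far = expected_reconstructed_floor_space(800.0, 20.0, 2.0)
    print(near > far)  # a cell closer to the metro station gets the larger value
```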

Using this equation, we estimated the reconstructed floor space of each reconstructed cell. Fig. 13 shows the estimated and actual reconstructed floor space for each cell. On the left side of Fig. 13, the patterns of the two graphs are very similar, except for cell number 824. In addition, when we compare the estimated and actual values on the right side of Fig. 13, the graph (bold red dotted line) lies between the two graphs marking ±10%, which means that the agreement between the estimated and actual values is acceptable.

Figure 13 Comparison of estimated floor space and actual floor space of reconstructed cells.

In the case of cell 824, we checked the actual data to determine why the difference was so large. In 2008, cell 797 contained a hospital and cell 824 a restaurant, but the hospital was expanded into cell 824 in 2011. Therefore, the characteristics of the hospital affected cell 824 (refer to Fig. 14).

Figure 14 Before (left) and after (right) reconstruction in cells 824 and 797.

5 Conclusion

In this paper, we proposed an improved disaggregation model to enhance urban models based on zonal data. Previous studies usually used a spatial interpolation method or an allocation method with weight values to disaggregate zonal data to cell units. These methods have advantages for visualizing the results predicted by the models, but they are limited by the difference between the results and reality. This paper presented a disaggregation process to overcome these limitations. Starting from cell data aggregated from building data in the base year, the process repeatedly selects the cell with the highest probability of being reconstructed and calculates the expected floor space after its reconstruction using a utility for the cell's location, thereby distributing the increased zonal floor space among the cells. We performed an empirical analysis to verify the disaggregation process using actual data collected until 2016.

In this empirical analysis, we first calculated the probability of each cell being reconstructed using a binary logistic regression model. We used two independent variables (RankAcc and RankDen), and the accuracy in classifying whether or not a cell would be reconstructed was 80.4%. The probability was high when a cell ranked high in accessibility and low in density. Next, we estimated the new floor space of a reconstructed cell using a multiple regression model as the utility function. Three independent variables were used: DistMetro, DistRoad, and Density. The $R^2$ value, which represents the explanatory power of the regression model, was approximately 69.5%, which is high. A comparison of the estimated and actual reconstructed floor space showed similar patterns, except for some cells.

The following conclusions were reached in this study. First, the zonal data predicted by urban simulation models based on zonal spatial units can be disaggregated using building data in the base year. This is beneficial not only for visualizing the zones but also for reducing the difference compared to the actual data from the base year. In particular, this study focused on retail and business land use and the catchment areas of metro stations, which makes it useful for managing a business district, analysing long-term changes in commercial power, or opening a store. Second, this study used random functions in two steps, when selecting a cell and when estimating the floor space, based on uncertainty theory. We believe that a future spatial structure cannot be represented by a single scenario because there are many unpredictable conditions. Therefore, we presented various future scenarios within an acceptable statistical range. In this sense, the study contributes to the application of the disaggregation process and is a step forward from previous studies.

As a final remark, we would like to point out a limitation of the research. The dataset covers only retail and business land use. A separate disaggregation model is needed for residential floor space because retail and business facilities are not separated from residential facilities. If a disaggregation model for the residential part is developed and integrated in a future study, the model is expected to be more useful.

References

Bazghandi, A., (2012). Techniques, Advantages and Problems of Agent Based Modeling

for Traffic Simulation, International Journal of Computer Science Issues, 9_1(3): 115-119.

Fotheringham, A.S., Wegener, M., Eds., (2000): Spatial Models and GIS: New Potential

and New Models. GISDATA 7. London: Taylor & Francis, 45-61.

Goldner, W., (1971). The Lowry Model Heritage, Journal of the American Institute of

Planners, 37( ): 100-110.

Heonsoo, P. and Kyuyoung, C., (2008). An Empirical Analysis of Land Use Changes in

Daegu Metropolitan City by Using Probabilistic Choice Model, Korea Spatial Planning

Review, 58(): 137-150.

Heeyun, H., (2002). Urban Ecology and Urban Spatial Structure, Boseonggak Publishers,

Seoul, Republic of Korea.

Lam N. S., (1983). Spatial Interpolation Methods: A Review, The American Cartographer,

10(2): 129-149.

Moeckel, R., Spiekermann, K., Schürmann, C. and Wegener, M., (2003). Microsimulation of Land Use, International Journal of Urban Sciences, 7(1): 14-31.

Oryani, K., (1997). Review of Land Use Models: Theory and Application, Transportation

Research Board, pp: 80-91

Pratesi, M., (2015). Spatial Disaggregation and Small-Area Estimation Methods for Agricultural Surveys: Solutions and Perspectives, Technical Report Series.

PROPOLIS: Planning and Research of Policies for Land Use and Transport for Increasing Urban Sustainability, (2004). DG Research, Brussels, Belgium.

Putman, S. H., (1983). Integrated Urban Models, Policy Analysis of Transportation and

Land Use, Pion Limited, London

Putman S. H., (1991). Integrated Urban Models 2: New Research and Application of

Optimization and Dynamics, Pion Limited, London.

Sungsil, H. and Munjung, K., (2004). Acceptance Probability Model Using the Logistic

Regression Model, Journal of the Korean Data Analysis Society, 6(4): 1153-1161.

Tobler W. R., (1979). Smooth Pycnophylactic Interpolation for Geographical Regions,

Journal of the American Statistical Association, 74(367): 519-530.

Wagner P. and Wegener, M., (2007). Urban Land Use, Transport and Environment Models:

Experiences with an Integrated Microscopic Approach, disP-The Planning Review, 170(3):

45-56.

Wegener, M., (1994). Operational Urban Models: State of the Art, Journal of American

Planning Association, 60(1): 17-29.

Youngsoo, A., Seongman, J. and Seungil, L., (2016). An Empirical Study on the

Relationship between Pedestrian Network Distance and Building Density in the Area of

Urban Rail Station, Journal of Korea Planning Association, 51(2): 179-192.

Youngsoo, A., Yeonggyeong, K. and Seungil, L., (2014). A Study on the Impact of Soft

Location Factors in the Relocation of Service and Manufacturing Firms, International

Journal of Urban Science, 18(3): 327-339.

Youngsoo, A., Seongman, J. and Seungil, L., (2012). A Study on the Distribution Pattern

of Commercial Facilities around a Subway Station Using GIS Network Analysis, Journal

of Korea Planning Association, 47(1): 199-213.
