An Integrated Multistage Framework for Automatic Road Extraction From High Resolution Satellite Imagery


    RESEARCH ARTICLE

    An Integrated Multistage Framework for Automatic Road

Extraction from High Resolution Satellite Imagery

T. T. Mirnalinee & Sukhendu Das & Koshy Varghese

Received: 6 October 2009 / Accepted: 6 April 2010 / Published online: 12 March 2011
© Indian Society of Remote Sensing 2011
J Indian Soc Remote Sens (March 2011) 39(1):1-25, DOI 10.1007/s12524-011-0063-9

Abstract Automated procedures to rapidly identify road networks from high-resolution satellite imagery are necessary for modern applications in GIS. In this paper, we propose an approach for automatic road extraction by integrating a set of appropriate modules in a unified framework to solve this complex problem. The two main properties of roads used are: (1) spectral contrast with respect to the background and (2) a locally linear path. A Support Vector Machine is used to discriminate between road and non-road segments. We propose a Dominant Singular Measure (DSM) for the task of detecting locally linear road boundaries. This pair of information about road segments, obtained using a Probabilistic SVM (PSVM) and DSM, is integrated using a modified Constraint Satisfaction Neural Network. Results of this integration alone are not satisfactory due to occlusion of roads, variation of road material, and curvilinear patterns. Suitable post-processing modules (segment linking and region part segmentation) have been designed to address these issues. The proposed non-model-based approach is verified with extensive experimentation, and its performance is compared with two state-of-the-art techniques and a GIS-based tool, using multi-spectral satellite images. The proposed methodology is robust and shows superior performance (completeness and correctness are used as measures) in automating the process of road network extraction.

Keywords Dominant singular measure · PSVM · CSNN-CII · Road edges · Road segments · Fusion · Segment linking · Region part segmentation

    Introduction

    Road networks are essential modes of transportation,

    and provide a backbone for human civilization.

    Cartographic object extraction from digital imagery

is a fundamental operation for GIS update. However, the complete automation of the extraction process is still an unsolved problem. Road feature extraction from a raster image is a non-trivial and image-specific

    process. Hence, it is difficult to have a general method

    to extract roads from any given raster image. Road

    layers on raster maps typically have two distinguish-

    able geometric properties from other layers: (1) Road

    lines are straight within a small distance (i.e., several

    meters in a street block); (2) Unlike building layers,

    which could have many small distinct connected


T. T. Mirnalinee · S. Das (*)
Visualization and Perception Lab, Dept. of CSE, Indian Institute of Technology, Madras,

    Chennai 600 036, India

    e-mail: [email protected]

    T. T. Mirnalinee

    e-mail: [email protected]

    K. Varghese

    Dept. of Civil Engg, Indian Institute of Technology,

    Madras,

    Chennai 600 036, India

    e-mail: [email protected]


    components, roads are connected to each other to

    form a road network. Road layers usually have few

    connected objects or even only one huge connected

    object forming a whole road layer. Many works on

    this topic have been presented (Laptev et al. 2000; Shi

    and Zhu 2002; Hinz and Baumgartner 2003; Hu and

Tao 2007; Mokhtarzade and Zoej 2007; Mena 2003; Tupin et al. 2002). However, the manual intervention

    of the operator in extracting, defining and validating

    cartographic objects for GIS update is still needed.

    Applications of road extraction process are found in

    updating GIS records, urban planning, traffic control,

    car navigation, map generation etc.

    Most of the works published in literature on road

    detection from satellite images are classified in two

    categories: (1) Semi Automatic (Gruen and Li 1995;

    Udomhunsakul 2004; Bucha et al. 2006; Zhang et al.

2008; Hu et al. 2004; Xiao et al. 2005) processes that require help from a human operator. In contrast to the

    automatic methods they demand a number of seed

    points which are usually chosen by the operator in an

    interactive fashion. Given such seed points the semi-

    automatic algorithm connects them by a path which is

    most likely a road. On the other hand, (2) Automatic

    (Laptev et al. 2000; Shi and Zhu 2002; Hinz and

    Baumgartner 2003; Mokhtarzade and Zoej 2007;

    Baumgartner et al. 2002; Zhu et al. 2005) road

    extraction methods require no initial (prior) informa-

tion about the presence and location of roads. In the following, we will discuss the automatic road extraction

    process.

Automated extraction of roads from high-

    resolution imagery is a difficult task because of

    the complexity in spatial and spectral variability of

    the road network. Roads exhibit a variety of

    spectral responses due to differences in age and/or

    material and vary widely in physical dimensions. In

addition, road networks in dense urban areas typically have different geometric characteristics than those in suburban and rural areas. Techniques to extract road networks using binarization and line

    segment matching of high-resolution IKONOS

    urban imagery were presented in (Shi and Zhu

    2002; Zhu et al. 2005). A line segment match

    method was used to detect long linear groups of

    pixels for classification as roads. These road pixels

    are then simplified into the road centerlines with the

    use of morphological operators. Mayer et al. (1997)

presented a complex road network extraction ap-

    proach that attempts to accurately map both the road

    network and the road edges through the use of

    snakes (Kass et al. 1987). In another approach, Hinz

    and Baumgartner (2003) utilized multiple very high-

    resolution aerial images and detailed scene models,

    to perform road extraction.

One can find a survey of road extraction methods from satellite images by Mena (2003). Tupin et al.

    (2002) presented the road extraction algorithm using

    feature extraction (line detector) and network recon-

    struction (graph labeling), which uses multiple views

    of the same scene. According to McKeown (1996),

    roads extracted from one raster image need not be

    extracted in the same way from another raster image,

    as there can be a drastic change in the value of

    important parameters based on nature, instrument

    variation, and photographic orientation. Yang and

Wang (2007) proposed a road extraction algorithm which deals with detecting two types of road

    primitives, namely blob-like primitive and line-like

    primitive. These primitives are defined, measured,

    extracted and linked using different methods for

    dissimilar road scenes.

Tuncer (2007) proposed a method which comprises preprocessing the image via a series of wavelet

    based filter banks and reducing the data into a single

    image which is of the same size as the original

    satellite image. Then a fuzzy inference algorithm is

utilized to perform road detection. Each wavelet function resolves features at a different resolution

    level associated with the frequency response of the

    corresponding FIR filter. Resulting two images are

fused together using the Karhunen-Loève transform

    (KLT) which is based on principal component

    analysis (PCA). This process underlines the promi-

    nent features of the original image as well as

    denoising it, since the prominent features appear in

    both of the wavelet transformed images while noise

    does not strongly correlate between scales. Next a

fuzzy logic inference algorithm which is based on statistical information and geometry is used to extract

    the road pixels. The approach is only suitable for the

    Ikonos data on rural areas where roads are mostly

    homogeneous and are not disturbed by shadows or

    occlusions. The central idea is to take into account the

    spectral information by means of a (fuzzy) classifica-

    tion approach.

    A back-propagation neural network (BPNN) with

    one hidden layer has been proposed for road


    extraction in Mokhtarzade and Zoej (2007). The

output layer consists of one neurode that expresses the network's response as a number between 0 and 1, corresponding to background and road pixels respectively. Back-propagation neural networks with different sizes of hidden layers were trained with different numbers of iterations before converging. The training and recalling stages were time consuming in this approach.

    Doucette et al. (2001) introduced a self-organizing

    road map algorithm to extract roads from high-

    resolution Multi-Spectral imagery. The self organizing

    road map, a specialized version of the self organizing

    neural network model, performs spatial clustering to

    identify and group together elongated regions.

    Most of the methods discussed so far use a limited

    set of image samples of a particular area to obtain

decent results. Some of them do not exhibit performance analysis and a comparative study with existing state-of-the-art techniques. The techniques adopted are often ad hoc and tuned for a particular

    set of (few) samples acquired to show results. Our

    study of road extraction is solely based on the road

    characteristics (geometrical and spectral) stored in an

    implicit manner in a raster image.

It is often difficult to obtain satisfactory results by

    using only one of these methods to detect road

    structures in complex pictures. However, it is possible

    to improve the results by using the complementary

nature of edge-based and region-based information. A large amount of work on the fusion of edge and region information has been reported in the literature

    (Haddon and Boyce 1990; Chu and Aggarwal 1993;

    Moigne and Tilton 1995; Pavlidis and Liow 1990) for

    image segmentation. Pavlidis and Liow (1990) de-

    scribed a method to combine segments obtained using

    a region growing (over-segmented) approach, where

    the edges between regions are eliminated or modified

    based on contrast, gradient and smoothness of

    the boundary. Haddon and Boyce (1990) generate

regions by partitioning the image co-occurrence matrix and then refining them by relaxation using

    the edge information. Chu and Aggarwal (1993)

    present an optimization method to integrate segmen-

    tation and edge maps obtained from several channels,

    including visible, infrared, etc., where user specified

    weights and arbitrary mixing of region and edge

    maps are allowed. Most of the methods proposed for

    combining region and edge information are highly

sensitive to the correctness of the edge map.

Lin et al. (1992) proposed a constraint satisfaction

    neural network for image segmentation. They posed

    the image segmentation problem as a constraint

    satisfaction problem (CSP) by interpreting the process

    as one of assigning labels to pixels subject to certain

spatial constraints. Kurugollu and Sankur (1999)

proposed a segmentation algorithm for color images, which implements the MAP estimation of the label

    field using a CSNN. In their work, the initial class

    probabilities are obtained via a fuzzy C-means

algorithm, in contrast to the Lin et al. (1992) method, where an ad hoc fuzzification of an initial map takes

    place. They have tried to combine advantages of

    GMRF formulation (Raghu and Yegnanarayana 1996)

    with those of the CSNN based (Lin et al. 1992)

    relaxation. The results are shown on synthetic images.

    In a recent work proposed by Lalit et al. (2008), a

CSNN-CII (Constraint Satisfaction Neural Network with Complementary Information Integration) has been

    used for texture segmentation. Results are shown on

    simulated and real world images.

    The focus of this paper is on the design and

    development of a technique, which enables the user to

    extract road segments from an input image without

    much of user interaction. The motivation of our work

comes from the fact that the complementary informa-

    tion of regions (road pixels in our case) and edges

(road boundaries) has not been exploited together to

obtain a decent road map from satellite images. Either of these techniques, when applied alone, produces errors which do not occur together (simultaneously),

    in general. This is due to the fact that the criteria for

    classification of pixels as road regions look for

    continuity and local smoothness, whereas methods

    to detect road boundaries look for discontinuities in

    raster images. Road regions are separated from non-

    road regions in our proposed framework using a

    PSVM (Probabilistic Support Vector Machine) classi-

    fier. In our previous work on DSM (Dominant

Singular Measure) (Mirnalinee et al. 2009) based road extractor, the performance was low as the local

    contrast between the regions was only considered.

    Therefore, we decided to merge the information from

    both DSM and PSVM using a CSNN-CII (Constraint

Satisfaction Neural Network with Complementary

    Information Integration) (Lalit et al. 2008) to produce

    better results. A modified constraint satisfaction

    neural network (CSNN) has been designed for this

    task, which uses a novel dynamic window to merge


the complementary information of edges and regions.

    The output of CSNN-CII needs to be processed

    further to remove some undesired artifacts and errors.

A segment linking algorithm is used to bridge the discontinuities detected between road segments. A region part segmentation algorithm separates the roads from protruding or attached non-road regions, thereby improving the accuracy. Results are shown using four

    categories of database of high-resolution satellite

    images from the following areas: (1) Developed

    suburban, (2) Developed Urban, (3) Emerging subur-

    ban and (4) Emerging Urban. Performance analysis is

    presented using completeness and correctness meas-

    ures (Heipke et al. 1997).

    This paper is organized as follows: Section

    Research Issues and Design Strategy deals with

    the research issues and design strategies. Section

    Proposed Method

    deals with the overall proposed

methodology. A description of the various stages in the proposed framework is presented in Section

    Description of the Different Stages in Our Proposed

    Framework. We present experimental results in Section

    Experimental Results and Comparative Study and

    conclude the paper in Section Conclusions.

    Research Issues and Design Strategy

The difficulties in the design of an automated road network extraction system using remotely-sensed

    imagery lie in the fact that the image characteristics

    of road feature vary according to sensor type, spectral

    and spatial resolution, ground characteristics, etc.

    Even for an image taken over a particular urban area,

    different parts of the road network reveal different

    characteristics. In real world, a road network is too

    complex to be modeled using a mathematical formu-

    lation or an abstract model. The existence of other

    objects (e.g., buildings and trees) cast shadows to

occlude road features, thus complicating the extraction process.

    Human perceptual ways of recognizing a road

involve (Jin and Davis 2005) extracting geometric,

    radiometric and topological characteristics of an

    image. Humans usually recognize a road using first

    its geometric characteristics considering a road to be a

    long, elongated feature with uniform width and

    similar radiometric variance along its path. Even

    though spectral characteristics of road vary within an

    image, its physical appearance tends to exist as long

    continuous features. Humans fuse these vital clues to

    identify a foreground road object from the back-

    ground layer. This motivated us to develop a generic

    framework that integrates suitable processing modules

    necessary for extracting the different types of features

present in road objects available in satellite scenes. We present the characteristics of roads next, followed

    by suitable modules designed specifically to address

    these issues. We also validate the efficiency of the

    extraction system using experimental results.

The most significant characteristics of roads, as they appear in high-resolution satellite imagery, are:

    1. Roads have a distinctively contrasting spectral

    signature (both locally and globally) with respect

    to the background layer (e.g. vegetation, soil,

    waterways, manmade structures etc.).

    2. Roads are mostly elongated structures, with

    locally linear properties.

    3. The road surface is usually homogeneous, with

    occasional variations.

    4. Discontinuities appear in a road structure mainly

    due to occluding objects, such as trees, buildings,

    large vehicles etc. or even shadows.

    5. Roads do not appear as a small segment or patch;

    either in isolation or attached to a large linear

    segment.

    6. Roads rarely terminate (no abrupt ending) within

    short distances. In fact, they intersect, occlude

    one another (bridges and highways) and bifurcate

    to build a network (global appearance).

    7. Roads have near-parallel boundaries, with both

    linear and curvilinear patterns.

    8. Road structures are rarely non-smooth and occur

generally without many sharp bends.

    Among the different properties stated above, the

    two major characteristics of roads are their geometri-

    cal shape and spectral contrast (as stated in (1) and (2)

above). Roads in high spatial-resolution images of urban areas appear as piecewise linear segments with

    spectrally homogeneous characteristics. These are

    vital clues, which form the basis of the design of

    our framework for automatically detecting roads in

    satellite imagery.

    In the design of a framework for road detection, we

    first need to exploit these two vital characteristics of

    roads. In such a case, one may be tempted to use a

    foreground extracting algorithm trained with spectral


    patterns for roads and then use linear features on top

    of it. However, a classifier based on only spectral

    features will produce false alarms (identify non-road

    objects as roads and filter parts of roads as back-

    ground, due to reasons mentioned in points (3) and

    (4) above). On the other hand, a pattern classifier (for

classifying roads) trained with geometrical features is useless, unless the target (road, in this case) is

    available. It is also not possible to simultaneously

extract and fuse this pair of distinct/disconnected features together, since the linear features cannot be estimated unless the road-like structures are first filtered from the background. It is impossible to design an

    operator or mask for this purpose, as that would need

    to simultaneously extract spectral and RST-invariant

    shape (geometrical) features from the image data. It is

    also not possible to formulate a mathematical (para-

metric) model for a road network, which will work for all complex variations in the geometric design

    patterns (linear and curvilinear) formed by roads in

    urban scenarios.

    Due to the existence of these complex phenomena

    for roads, it is almost impossible to consider and

    model all these situations and incorporate them in a

    single module or processing stage for road network

    extraction. This drove us to formulate and design a

    hierarchical pipelined framework, consisting of the

    classification (supervised), information integration,

filtering and local neighborhood analysis to obtain decent results with acceptable quality. Results will be

    compared with two state-of-the-art methods (Tuncer

    2007; Mokhtarzade and Zoej 2007) published in

    literature and one GIS-based software (Geospace

    2008) used for raster image analysis.

    Because of the issues mentioned earlier, in most

cases with hyper-spectral datasets, the spectral infor-

    mation alone is not sufficient to define roads. We

    need an integrated multistage framework to achieve

    our goal. Each stage of the framework deals with a

particular characteristic of roads; these are given in the

    left column of Table 1. The center column gives the

    corresponding strategy (processing module) used by

    us to solve the problem, while the right-hand side

column specifies the difficulties/drawbacks that one may face in the execution of that stage. In the next

    section we describe our proposed multistage method

    based on the issues discussed in this section, followed

    by design details of the road extraction modules listed

    in Table 1.

    Proposed Method

    A multistage pipelined framework for road extrac-

tion has been proposed in this paper. Figure 1 shows the flowchart of our proposed method of road

    extraction, which is a hierarchical pipelined multi-

    stage framework based on details specified in

    Table 1. The first stage consists of an iterative

    merging of region and edge based information using

    a set of constraints. Road edges (boundaries) are

    extracted from edge features using DSM. We assume

    roads appearing in satellite images to be locally

    linear. Soft class labels (probabilities) for each pixel

    belonging to either road or non-road regions are

produced by the PSVM. Then a modified CSNN, termed CSNN-CII (Lalit et al. 2008), is used for

integrating the complementary information from the

    edge and region outputs. A fruitful cooperation

    could be established between region-based and

    edge-based methods to extract elongated thick

    objects like roads in high-resolution satellite imag-

ery. An elongatedness measure (shape feature) is used to

    remove the isolated non-road structures. Then a

Table 1 Road characteristics & corresponding processing modules

Sl. No | Characteristics | Strategy/module | Remarks
1 | Contrast w.r.t. background; mostly homogeneous | SVM classifier using mean and variance of spectral response | Misclassification of non-road objects with identical spectral response
2 | Elongated structure | DSM on edge map; shape features | Discontinuity due to occlusion
3 | Discontinuities and distortions in linear pattern | CSNN-CII and segment linking | Chance of linking roads with other structures
4 | Not appearing in isolation, rarely terminates | Region part segmentation | Removal of small road fragments


    segment linking algorithm is used to link the

    discontinuous road segments which result due to

occlusion. The region part segmentation module removes

    the non-road structures which appear due to adjacent

    manmade structures. The steps of the algorithm,

depicting the process illustrated in Fig. 1, are given in

    Algorithm 1. In the next section, we present the

    description of the different stages of our framework

    along with intermediate results of processing using

    two satellite image samples.

    Algorithm 1 Proposed framework for road detection.

    Input: Image.

    Output: Segmented Image.

    Steps:

    1. Compute edge maps of the image using DSM.

    2. Compute the probability of class-label for each pixel using PSVM.

3. Integrate the region information and edge information (outputs of steps (2) and (1)) using CSNN-CII (Lalit et al. 2008):
   Initialize the neurons in CSNN-CII using the probabilities obtained from PSVM.
   Iterate and update the probabilities and the edge map to get the final segmented map.
4. Post-process the CSNN-CII output to remove stray patches and unnecessary artifacts.
5. Perform segment linking to reduce the false negatives.
6. Perform the region part segmentation algorithm to reduce the false positives.
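The control flow of Algorithm 1 can be expressed as a thin driver that chains the six stages. The sketch below (Python) only illustrates this flow; the stage functions are supplied by the caller, and the names used here are our own, not identifiers from the paper.

```python
from typing import Callable
import numpy as np

Stage = Callable[[np.ndarray], np.ndarray]

def extract_roads(image: np.ndarray,
                  dsm_edges: Stage,                      # Step 1: DSM road-edge map
                  psvm_prob: Stage,                      # Step 2: per-pixel P(road)
                  csnn_cii: Callable[[np.ndarray, np.ndarray], np.ndarray],
                  post_process: Stage,                   # Step 4: remove stray patches
                  link_segments: Stage,                  # Step 5: bridge occlusion gaps
                  region_part_segment: Stage) -> np.ndarray:
    """Chain the stages of Algorithm 1; each stage is a caller-supplied callable."""
    edges = dsm_edges(image)
    prob = psvm_prob(image)
    fused = csnn_cii(prob, edges)        # Step 3: fuse region and edge information
    cleaned = post_process(fused)
    linked = link_segments(cleaned)
    return region_part_segment(linked)   # Step 6: detach fused non-road structures
```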

    Description of the Different Stages

    in Our Proposed Framework

    DSM Based Edge Detection

    Roads are expected to be locally linear. Hence, we

    extract the local orientation from the image of the

    road network. Extracting linear features from satellite

    images have been of interest to pattern recognition

    community for some time (Cooper and Cowan 2007;

    Granlund and Knutsson 1995; Lyvers and Mitchell

    1988; Wei and Xin 2008; B.Majidi and BabHadiashar

    2009). In the work by Cooper and Cowan (2007),

    amplitude balanced horizontal derivatives were used

for enhancing linear features in images. However, if the dataset possesses features with large variations in

    amplitude then the horizontal derivative will also have

    the same property, and the smaller amplitude features

    (which may be of considerable importance) may be

    hard to discern. Granlund and Knutsson (1995)

    devised an elegant method for combining the outputs

    of quadrature pairs to extract a measure of orientation.

    Perona (1998) extended the idea of anisotropic

    diffusion to orientation maps. Bigun et al. (1991)

    posed the problem as the least squares fitting of a

plane in the Fourier transform domain. Another technique (Haglund and Fleet 1994) based on

    steerable filters (Jacob and Unser 2004), is limited

    in precision and generalization. In (Lyvers and

    Mitchell 1988), Lyvers et al. examined the accuracy

    of various local differential operators for noiseless

    situations, as well as in the presence of additive

    Gaussian noise. In (Jiang 2007), Jiang proposed an

    image integration operator which leads to unbiased

orientation estimation.

Fig. 1 Framework of the proposed method for road detection


    Our method of obtaining the dominant direction

    using PCA and a gradient matrix (obtained using 1-D

    Canny (Kumar et al. 2000)) for orientation estimation

    to extract road segments is novel, more efficient and

    produces more robust results. Most established local

    orientation estimation techniques are based on the

analysis of the local gradient field of the image. But the local gradients are very sensitive to noise, thus

    making the estimate of local orientation from these

    images unreliable. We use the method of Principal

    Component Analysis (PCA) for image orientation

    estimation. For each pixel in the image, we first

    calculate the local image gradients (using 1-D Canny

    (Kumar et al. 2000)) and then perform SVD of the

    gradient matrix. Gradient of image f(x,y) at point (xk,

    yk) is denoted by:

$$\nabla f_k \equiv \nabla f(x_k, y_k) = \left[ \frac{d f(x_k, y_k)}{dx},\; \frac{d f(x_k, y_k)}{dy} \right]^{T} \qquad (1)$$

    which involves 1-D processing along orthogonal

    directions (for details see (Kumar et al. 2000)). For

    example, the smoothing operator used along one

    direction (say, x) is the Gaussian filter:

$$G(x) = \frac{1}{\sqrt{2\pi}\,\sigma_1} \exp\!\left(-\frac{x^{2}}{2\sigma_1^{2}}\right) \qquad (2)$$

    and the 1-D Canny operator for computing the

    derivative along y is:

$$dG(y) = \frac{-y}{\sqrt{2\pi}\,\sigma_2^{3}} \exp\!\left(-\frac{y^{2}}{2\sigma_2^{2}}\right) \qquad (3)$$
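A minimal sketch of these separable filters, assuming NumPy/SciPy: gaussian_filter1d with order=1 convolves with a derivative of Gaussian, which corresponds to Eqs. 2 and 3. The sigma values are illustrative, not values prescribed by the paper.

```python
import numpy as np
from scipy.ndimage import gaussian_filter1d

def gradient_field(image: np.ndarray, sigma1: float = 1.0,
                   sigma2: float = 1.0) -> tuple:
    """Return (df/dx, df/dy) via 1-D Gaussian smoothing and 1-D Canny derivatives."""
    img = image.astype(float)
    # df/dx: smooth along y (Eq. 2), differentiate along x (Eq. 3)
    fx = gaussian_filter1d(img, sigma1, axis=0)
    fx = gaussian_filter1d(fx, sigma2, axis=1, order=1)
    # df/dy: the two operators interchange their directions of processing
    fy = gaussian_filter1d(img, sigma1, axis=1)
    fy = gaussian_filter1d(fy, sigma2, axis=0, order=1)
    return fx, fy
```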

Similar processing is applied along the y and x

    directions, where the two operators interchange their

    directions of processing. This method is efficient and

    produces better gradient vectors which are orthogonal

    to the dominant orientation of the image pattern. Let

    us assume that in the image of interest f(x,y), the

orientation field is piece-wise constant. Under this assumption, the gradient vectors in an image block

    should on average be orthogonal to the dominant

    orientation of the image pattern. So orientation

    estimation can be formulated as the task of finding a

unit vector a, which maximizes the average of the angles between a and the gradient vectors (Feng and Milanfar 2002). The computational basis of PCA is

    the calculation of the Singular Value Decomposition

    (SVD) of the data covariance matrix. The majority of

    the eigenvectors form a cluster along a dominant

    direction indicating the presence of a linear structure.

    The eigenvalue will reflect the strength (peakiness in

    domain) of the distribution of the gradients towards

    a particular direction. Generally, the first eigenvalue is

    larger than the second one, and hence in case of an

ideal straight line the second eigenvalue is zero (indicating no spread along the orthogonal direction).

    However, a digital line is represented stepwise

    (aliased), and hence the second eigenvalue for the

    case of a line in a digital image is a non-zero value. In

    order to get the local orientation estimate, we

rearrange the gradient vectors into a 2 × N² matrix, where a window of size N × N is used for processing around each pixel, as shown below:

$$G = \left[ \nabla f_1 \;\; \nabla f_2 \;\; \nabla f_3 \;\; \cdots \;\; \nabla f_{N^2} \right] \qquad (4)$$

where $\nabla f_i = \nabla f(x_i, y_i),\; i = 1, 2, \ldots, N^2$ (see Eq. 1). We then compute the SVD (Singular Value Decomposition) (Strang 2005) of the gradient matrix for each pixel, computed over a window of size N × N. The SVD of the gradient matrix is computed as

$$G = U S V^{T} \qquad (5)$$

where U is an orthogonal 2 × 2 matrix, in which the first column represents the dominant orientation of the gradient field, S is a 2 × N² matrix representing the energy along the dominant directions, and V is an orthogonal matrix of size N² × N² representing each vector's contribution to the singular values.

    Dominant Singular Measure

The Dominant Singular Measure (DSM) is computed as the ratio between the singular value of the major axis and the sum of the singular values. This measure approaches 1 for an elongated shape. DSM is defined as:

$$\mathrm{DSM} = \frac{s_1}{s_1 + s_2}, \qquad s_1 \ge s_2 \qquad (6)$$

    When all the gradient components have the same

    direction, only one singular value (s1) is non-zero,

    which in turn makes the DSM value equal to 1. If both

    the singular values are equal and non-zero, the DSM

value is 0.5. The DSM value thus lies in the range [0.5, 1]. We use the DSM measure to distinguish

    between scattered or disoriented image patterns and an


    image region with an orientation pattern. If the DSM is

less than a threshold (0.5 < threshold < 1), the local pattern is treated as scattered rather than as a locally linear road boundary.
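The per-pixel DSM computation of Eqs. 4-6 can be sketched as follows, assuming NumPy and the gradient_field helper sketched earlier; the window size N and the plain double loop are illustrative simplifications.

```python
import numpy as np

def dsm_map(fx: np.ndarray, fy: np.ndarray, N: int = 5) -> np.ndarray:
    """DSM = s1 / (s1 + s2) from the local 2 x N^2 gradient matrix (Eqs. 4-6)."""
    h, w = fx.shape
    r = N // 2
    out = np.zeros((h, w))
    for i in range(r, h - r):
        for j in range(r, w - r):
            gx = fx[i - r:i + r + 1, j - r:j + r + 1].ravel()
            gy = fy[i - r:i + r + 1, j - r:j + r + 1].ravel()
            G = np.stack([gx, gy])                   # 2 x N^2 gradient matrix (Eq. 4)
            s = np.linalg.svd(G, compute_uv=False)   # singular values, s1 >= s2 (Eq. 5)
            if s.sum() > 0:
                out[i, j] = s[0] / s.sum()           # DSM in [0.5, 1] (Eq. 6)
    return out
```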


    allows the identification of samples drawn from

    unknown classes through the application of a suitable

    Bayesian decision rule (Duda et al. 2000). This

approach is based on support vector machines (SVMs) for the estimation of probability density

    functions, which uses a recursive procedure to

    generate prior probability estimates for known and

    unknown classes. SVMs are exploited by Yager and

    Sowmya (2003) as a classifier for road extraction,

    which involves two stages of processing. Here, SVM

is trained using edge-based features such as edge

    length, gradient and intensity within the edge pair. In

    level 1, SVM is used to classify edges as road edges

    or non-road edges. Edges classified as road edges are

given as input to the SVM in level 2, where opposite edges are paired as road segments. However, they

    have reported very low correctness measure. A new

method (Miliaresis and Kokkas 2007) is presented

    for the extraction of buildings from light detection

    and ranging (LIDAR) digital elevation models

    (DEMs) on the basis of segmentation principles. The

    accuracy of supervised classification largely depends

    on the quality of the training data. The locations and

sample size of the training data are difficult to optimize, depending on the image data types and

classifiers to be used. Support vector machines (SVMs) represent a prom-

    ising development in machine learning research that is

    not widely used within the remote sensing community

(Pal and Mather 2005). The architecture of an SVM (Theodoridis and Koutroumbas 2006) is given in Fig. 5. The number of nodes is determined by

    the number of support vectors Ns.

    The main idea of SVM is to separate the classes

    with a hyperplane surface so as to maximize the

    margin among them. In this paper, support vector

    machines are used to classify roads from satellite

    imagery. In SVM the input vectors are mapped

nonlinearly to a very high-dimensional feature space (Cortes and Vapnik 1995). Considering a two-class pattern classification problem, let the training set of size N be $\{(X_i, d_i)\}_{i=1}^{N}$, where $X_i \in \mathbb{R}^{n}$ is the input pattern for the i-th example and $d_i \in \{-1, +1\}$ is the corresponding desired response. The classifier is represented by the function $f(x; \alpha) = y$, with $\alpha$ as the parameters of the classifier. The SVM method

    involves finding the optimum separating hyperplane

    so that:

1. Samples with labels y = ±1 are located on each

    side of the hyperplane.

    2. The distances of the closest vectors to the

    hyperplane on each side are maximum. These

    are called support vectors and the distance is the

    optimal margin.

    The membership decision rule is based on the

    function f(x) where, f(x) represents the discriminant

Fig. 4 The results of DSM on a satellite image of a suburban scene: a input image, b edge map extracted using multi-scale Canny (Kumar et al. 2000; Qian and Huang 1996), c corresponding DSM output

Fig. 5 Architecture of SVM


    function associated with the hyperplane in the trans-

    formed space and is defined as:

$$f(x) = w^{*} \cdot \phi(x) + w_0 \qquad (7)$$

where w* is the weight vector, w₀ is the bias, and φ(x) ∈ ℝ^{d₀} (d₀ > d) is the mapping of x into the transformed feature space. SVM is used to classify every pixel into either road or non-road groups based on the sign of the discriminant function (y = sgn(f(x))). Pixels

    belonging to roads are assigned as group 1 and others

    to group 2 from training sample images. Since SVM

    has good generalization ability, this decision function

    can be applied to extract road structures from satellite

    images. Through training, we obtain the decision

    function. The feature vectors are fed into the SVM

    classifier initially for training (to learn the pattern)

    from known examples, and then for predicting the

    labels of unknown samples once the training is

complete. Having a classifier produce a posterior probability is very useful in practical

    recognition problems. Posterior probabilities are also

    required when a classifier is making a small part of an

    overall decision, and the classification output is

    combined for overall decision. As described above,

SVM is principally a binary classifier. A polynomial

    kernel of degree two was used due to its superiority

    over other kernels for most of the applications.

    However, SVM (Cortes and Vapnik 1995) produces

    an uncalibrated value that is not a probability. In the

next section, we describe a mechanism to obtain probabilistic classification of pixels as roads or non-

    roads, using soft-class labels from SVM.

    Soft Class Labels Using PSVM

SVMs do not provide any estimation of their

    classification confidence. Thus, SVM does not allow

    us to incorporate any a-priori information. Hence we

    use PSVM to produce posterior probability P(Class/

    Input). The posterior probability outputs of SVMs are

    based on the distance of testing vectors and support

    vectors. Following a method presented in Platt

    (1999), a sigmoid model is used to map binary

    SVM scores into probabilities as shown below:

$$P(y = 1 \mid f) = \frac{1}{1 + \exp(A f + B)} \qquad (8)$$

    where y is the binary class label and f is an output

of the SVM decision function (Eq. 7). The two parame-

ters A and B are obtained by maximum likelihood estimation on the training set (fi, yi), i.e., by minimizing the negative log-likelihood

    of the training data. An image block is said to be road

    if its probability output by PSVM is larger than a

    predetermined threshold. As a result, the model has a

probabilistic output for further processing. The probabilistic output of a classifier makes it possible to use

    existing results for fusion theories, especially in cases

    when a classifier is making a small part of an overall

    decision, and the classification outputs must be

    combined for the overall decision.
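A minimal PSVM sketch assuming scikit-learn: SVC(probability=True) fits a Platt-style sigmoid (Eq. 8) on the SVM decision values. The degree-2 polynomial kernel follows the text, the per-patch mean/variance features follow Table 1, and the array names below are hypothetical.

```python
import numpy as np
from sklearn.svm import SVC

def patch_features(patches: np.ndarray) -> np.ndarray:
    """Mean and variance of each spectral band for (n, 21, 21, bands) patches."""
    return np.concatenate([patches.mean(axis=(1, 2)), patches.var(axis=(1, 2))], axis=1)

def train_psvm(road_patches: np.ndarray, nonroad_patches: np.ndarray) -> SVC:
    X = np.vstack([patch_features(road_patches), patch_features(nonroad_patches)])
    y = np.concatenate([np.ones(len(road_patches)), np.zeros(len(nonroad_patches))])
    # probability=True enables Platt-style scaling of the decision values (Eq. 8)
    return SVC(kernel="poly", degree=2, probability=True).fit(X, y)

# Usage: prob_road = train_psvm(road_patches, nonroad_patches)
#                    .predict_proba(patch_features(test_patches))[:, 1]
```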

    Training samples are gathered from regions sur-

    rounding the road pixels. The sample sub images shown

    in Fig. 6, illustrate the discriminative feature between

    road and non-road samples. Spectral characteristics

    vary for both the classes, which is analyzed by PSVM.

As seen from Fig. 6a, the local homogeneous orientation for the road class will be captured by DSM, whereas

    non-road structures as shown in Fig. 6b will produce

    distributed orientations. In order to demonstrate the

    performance of the proposed method, we used the

    generated dataset described in Section Dataset

    Description and Performance Measures. Our system

    is trained with 5,000 samples of road and 7,200

    samples of non-road classes. Once the classifier is

    trained, it is asked to predict the labels for the test

Fig. 6 a Road samples and b non-road samples, of size 21 × 21


    image pixels. Figure 7 shows the results of P-SVM for

the images given in Figs. 3a & 4a. Experimental results

    for different scenarios, namely, urban and suburban

    areas of developed and emerging countries and their

    discussions are presented in Section Results and

    Discussion. In the next section, we discuss the method

of fusing the two complementary pieces of information (the segment class map from PSVM and the linear edge map obtained

    using DSM), using a CSNN (Constraint Satisfaction

    Neural Network) based integrator.

    CSNN for Integration

    Edge extraction from satellite images often delivers

    partly fragmented and erroneous results. Attributes

    describing geometrical and radiometric properties of

    the line segments can be helpful in sorting out the

most probable false alarms. However, these attributes may be ambiguous and are not considered to be

    reliable enough when used alone. Region based

    segmentation produces over-segmentation whereas

edge-based segmentation may lead to under-

    segmentation. We used a fusion strategy proposed

    by Lalit et al. (2008), which uses a constraint to

    iteratively correct both these erroneous outputs to

    produce a better result. The method is described

briefly in the following for the sake of completeness of

    this paper.

Each neuron in CSNN-CII contains two fields: probability and rank. The rank field stores the rank of the

    probability in a decreasing order for that neuron. We

    exploit the soft class labels produced by PSVM to

    compute ranks, which in turn is used to initialize the

    interconnection weights of the CSNN. In addition to

    region-based constraints CSNN-CII also incorporates

    edge constraints. The number of neighbors considered

    for computation is determined using edge informa-

    tion. The initial class probabilities can be obtained

    using PSVM (Platt 1999). The initial edge maps can

    be obtained using DSM based techniques for road

    edge extraction.

    Dynamic Window

    The interconnection weights of the CSNN are

    computed only for those neurons which are within

    the effective size of the dynamic window. This

    effective width is based on the presence of edge

    information around the seed pixel. The stopping

    criterion is based on the presence of the edge pixels.

    Hence this process helps to mutually exploit both the

    complementary information of regions and edges

    inside the window. The window is considered to be

dynamic (or adaptive), as its effective size depends on both pieces of information: one (region) for initial estima-

    tion and the other (edge) for convergence. The

    obvious advantage of using dynamic window at

    region boundaries is that only the neurons which

    correspond to a single class will be processed and the

    neurons which may confuse the network would not be

    used for computation. The optimal size of dynamic

window (m × n) was obtained empirically as 31 × 21.

    Lalit et al. (2008) used a square window, whereas we

    use a rectangular oriented window in our work. The

orientation of the rectangular window is obtained from the DSM output. It was observed from experi-

    mentation, that when a larger window size was used

    small regions (or small sections of a region) were

    merged with larger adjacent regions. The use of a

    smaller window size makes the CSNN take a longer

time to converge to the final solution. Figure 8 shows the results of CSNN-CII using inputs from the intermediate results of processing shown in Figs. 3, 4 and 7, for the images in Figs. 3a and 4a.

Fig. 7 a The results of P-SVM for the image shown in Fig. 3a; b the results of P-SVM for the image shown in Fig. 4a

Fig. 8 The results of CSNN-CII obtained by: a combining those in Fig. 3c & Fig. 7a; b combining those in Fig. 4c & Fig. 7b, for the images in Figs. 3a & 4a respectively
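The effect of the dynamic window can be illustrated with a deliberately simplified, axis-aligned sketch: starting from a seed pixel, the extent used for the CSNN-CII weight computation grows until it meets an edge pixel from the DSM edge map or reaches the maximum size. The paper's window is rectangular (31 × 21) and oriented along the DSM output; the function and parameter names below are our own illustrative choices.

```python
import numpy as np

def effective_extent(edge_map: np.ndarray, seed: tuple, max_half: int = 15) -> tuple:
    """Half-extents (up, down, left, right) of the dynamic window around a seed pixel."""
    r, c = seed
    h, w = edge_map.shape
    extents = []
    for dr, dc in [(-1, 0), (1, 0), (0, -1), (0, 1)]:
        step = 0
        while step < max_half:
            rr, cc = r + (step + 1) * dr, c + (step + 1) * dc
            if not (0 <= rr < h and 0 <= cc < w) or edge_map[rr, cc]:
                break  # stop at the image border or at an edge pixel
            step += 1
        extents.append(step)
    return tuple(extents)
```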

    Post-Processing and Segment Linking

The objective of the refinement process presented here is to eliminate the false segments which do not

    belong to roads. The result of CSNN integration

    produces a few undesired patches, which do not

    correspond to road segments. In the case of satellite

    images, a few undesired or noisy structures will be

    erroneously classified as road segments. To eliminate

    these false alarms (segments), we use connected

    component labeling (Haralick and Shapiro 1992) to

    extract the disjoint segments from the output of our

    algorithm. Segments with area less than a prefixed

    threshold TA are deleted. Major axis and minor axis

    lengths of each component are computed using

    normalized second central moments for each segment

    as shown below:

$$\mu_{20} = M_{20} - \bar{x}\,M_{10}, \qquad \mu_{02} = M_{02} - \bar{y}\,M_{01},$$
$$\bar{x} = \frac{M_{10}}{M_{00}}, \qquad \bar{y} = \frac{M_{01}}{M_{00}}, \qquad M_{pq} = \sum_{x}\sum_{y} x^{p} y^{q}\, I(x, y)$$

We computed the ratio of the major axis length to the minor axis length of each component as $E = \mu_{20} / \mu_{02}$.

Components having a value of E less than a threshold TE are usually non-road structures and are hence deleted.

    The steps of the algorithm, depicting the post-

    processing stage is given below in Algorithm 2.

Algorithm 2 Steps of post-processing for refining the result.

Compute the connected components.
1. Compute the area (A) of each connected component.
2. Compute the eccentricity (E) of each connected component.
3. For each component:
   if (E < TE) then
      delete that component
   else
      if (A < TA) then
         delete that component
      end if
   end if
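A minimal sketch of Algorithm 2 assuming scikit-image; the thresholds T_A and T_E are illustrative values, not ones fixed by the paper.

```python
import numpy as np
from skimage.measure import label, regionprops

def post_process(road_mask: np.ndarray, t_area: int = 200, t_elong: float = 3.0) -> np.ndarray:
    """Keep only connected components that are both large and elongated enough."""
    labels = label(road_mask > 0)
    keep = np.zeros(road_mask.shape, dtype=bool)
    for region in regionprops(labels):
        if region.minor_axis_length == 0:
            continue  # degenerate component, treat as non-road
        elong = region.major_axis_length / region.minor_axis_length
        if elong >= t_elong and region.area >= t_area:  # E >= T_E and A >= T_A
            keep[labels == region.label] = True
    return keep
```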

    We used a region linking algorithm (Rizvandi et al.

    2008) to eliminate the discontinuities detected between

    road segments. Initially a dilation operation is performed

    on the input image. Since dilation is an operation that

    thickens or grows objects in the original image, the

result of this operation is that edge segments which are very close to each other are automatically linked. In our

    algorithm the structural element used for the dilation

    operation is a disk of radius 10. The image is then

    thinned and the edges are broken down into smaller

    straight line edge segments. Heuristics based upon

    proximity properties and alignment of road features are

    used to cluster and integrate fragmented segments. For

    each segment, the best neighbor is determined based on

    the difference in direction and the minimum distance

    between the end points. Results of post-processing and

    segment linking are shown in Figs. 9 and 10.
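The morphological part of this linking step can be sketched as below, assuming scikit-image: dilate with a disk of radius 10 so that nearby segments merge, then thin back to one-pixel-wide segments. The heuristic clustering of the remaining fragments by direction and end-point distance is omitted here.

```python
import numpy as np
from skimage.morphology import binary_dilation, disk, thin

def link_close_segments(road_mask: np.ndarray, radius: int = 10) -> np.ndarray:
    dilated = binary_dilation(road_mask > 0, disk(radius))  # bridges small gaps
    return thin(dilated)                                    # back to thin segments
```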

Fig. 9 The results of a post-processing using the output shown in Fig. 8a; b segment linking using the output shown in (a)


    Region Part-Segmentation

    Region part segmentation is necessary to eliminate

    some large patches of non-road structures which

appear to be fused to roads. These patches are man-made structures, such as roof-tops and parking lots, with spectral characteristics similar to those of roads. The proposed

    algorithm for Region part-segmentation is based on

    part-segmentation (Bennamoun and Mamic 2002),

    consisting of the following steps:

    1. Compute the smoothed inner and outer contours

    (closed) of the image

    2. Compute the smoothed curvature of the con-

    tours.

3. Determine the local extrema, where the derivative of the smoothed curvature equals zero, with

    curvature value greater than a threshold.

    4. Compute Convex/Concave Dominant Points at

    which the interior angle is greater/less than

180°, by tracing the outer/inner contour of the

    region as shown in Fig. 11.

    5. Compute effective Convex (CDPcx) and Con-

    cave (CDPce) dominant points, on outer and

    inner contours respectively by logical AND

    operation of the output in steps 3 and 4.

    6. The CDPs (both CDPcx & CDPce) are moved

    along the normal for a fixed number of iterations

    (all the CDPs must move simultaneously) on the

    respective contours.

7. A moving CDP will stop (freeze) only if it touches another moving CDP or a point on the

    same contour within a specified path distance

    from it. For the outer contour, if the contour of

    the segment touches the boundary of the image,

then the respective CDPs are not frozen.
8. Trace back all the frozen CDPs and join the

    pair of corresponding CDPs or the CDP and the

    contour point using a line segment.

    9. For each line segment obtained in step 8: form

    two adjacent regions within a closed contour,

using the line as the new boundary.
10. Merge the new pair of adjacent regions, if they

    have similar structural properties (orientation of

    line segments near the CDPs).

    11. Set a threshold and eliminate all the connected

    components with area below the threshold.

    Curvature Computation

    A curve is represented in parametric form, where t is

    the path length, x and y are the coordinates of the

    contour.

$$r(t) = \big(x(t),\, y(t)\big) \qquad (9)$$

If there is more than one object, then the outer contour

    is traced for each object. If there is a child object

    inside an object, we have to then trace the outer

    contour for the child object as well.

    Inner boundary pixels are extracted by tracing the

    pixels at the inner contour in an object. A smoothing

    of the contour with a Gaussian kernel is then needed

prior to the computation of the curvature, to overcome the problem of discontinuities in derivatives needed

    for curvature calculation (Pei and Lin 1992). The

smoothed contour is represented as

$$x_s(t) = x(t) * G, \qquad y_s(t) = y(t) * G \qquad (10)$$

Figure 11a shows an image having one object with

    two holes. The outermost pixels of the object are

    traced to extract the outer contour and the boundary

of the holes gives the inner contours, as shown in Fig. 11b.

Fig. 11 a Input image; b inner and outer contours

Fig. 10 The results of a post-processing using the output shown in Fig. 8b; b segment linking using the output shown in (a)


Curvature is defined as the rate of change of

    slope as a function of arc length t:

$$K(t) = \frac{d\theta(t)}{dt} \qquad (11)$$

where θ(t) is the tangent angle of the curve at t. The curvature is computed as (Bennamoun and Mamic 2002)

$$K_s(t) = \frac{\dot{x}_s \ddot{y}_s - \dot{y}_s \ddot{x}_s}{\left(\dot{x}_s^{2} + \dot{y}_s^{2}\right)^{3/2}} \qquad (12)$$

    The curvature obtained from Eq. 12 is smoothed

    with a Gaussian kernel (Eq. 2) to obtain a smoothed

    curvature, as given by the following equation:

$$K_s(t) = K(t) * G \qquad (13)$$

Figure 12c shows the curvature plot of the image shown in Fig. 12a. The smoothed curvature obtained using Eq. 13 is shown in Fig. 12d.

    Extraction of Dominant Points

It has been suggested from the viewpoint of the human

    visual system (Bennamoun 1994) that the dominant

    points have high curvature or the rate of change of

    slope along the path length is high. In this paper, we

    detect these points and use them to decompose theobject to remove the non-road structures. Dominant

points are points having a curvature value greater than a

threshold. Local extrema are defined by the points at

    which the derivative of the curvature equals zero (Pei

    and Lin 1992), as

$$\dot{K}_s(t) = \frac{d K_s(t)}{dt} = 0 \qquad (14)$$

    which is equivalent to convolving the curvature with

    the derivative of Gaussian and taking the zero cross-

ings of this operation. Figure 12e shows the local extrema for the input image in Fig. 12a.
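The smoothed curvature and its extrema (Eqs. 10-14) can be sketched with Gaussian-derivative filtering of the contour coordinates, assuming SciPy; sigma and the curvature threshold are illustrative values.

```python
import numpy as np
from scipy.ndimage import gaussian_filter1d

def dominant_point_candidates(x: np.ndarray, y: np.ndarray,
                              sigma: float = 3.0, k_thresh: float = 0.2) -> np.ndarray:
    """Indices of high-curvature local extrema on a closed contour (x(t), y(t))."""
    dx = gaussian_filter1d(x, sigma, order=1, mode="wrap")    # smoothed x'(t)
    dy = gaussian_filter1d(y, sigma, order=1, mode="wrap")    # smoothed y'(t)
    ddx = gaussian_filter1d(x, sigma, order=2, mode="wrap")   # smoothed x''(t)
    ddy = gaussian_filter1d(y, sigma, order=2, mode="wrap")   # smoothed y''(t)
    k = (dx * ddy - dy * ddx) / (np.power(dx**2 + dy**2, 1.5) + 1e-12)  # Eq. 12
    ks = gaussian_filter1d(k, sigma, mode="wrap")                        # Eq. 13
    dks = gaussian_filter1d(ks, sigma, order=1, mode="wrap")             # dKs/dt
    zeros = np.where(np.diff(np.sign(dks)) != 0)[0]                      # Eq. 14
    return zeros[np.abs(ks[zeros]) > k_thresh]   # keep only high-curvature extrema
```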

Fig. 12 a Input synthetic image; b smoothed contour; c curvature plot; d smoothed curvature; e local extrema; f effective CDPs marked on the smoothed curvature in (d); g CDPs marked on the smoothed contour; h contour normals at the CDPs; i segmented map of (a)


The convex dominant points on the outer contour are combined with the local extrema using an AND operation to give the effective CDPcx. Similarly, the concave dominant points on the inner contour are combined with the local extrema to get the effective CDPce.

    These points are then used to segment the non-road

parts from the given image. The CDPcx are moved inwards along the direction of their normals, whereas the CDPce are moved outwards along the direction

    of the normal. For a particular contour all the

    CDPs (both CDPcx & CDPce) are allowed to move

    simultaneously, and a CDP freezes only when it

    touches another moving CDP in the same contour or

    a point in the contour itself, which is within a

    specified path length. The specified path length of

    the moving CDP dictates the maximum perimeter of

    the non-road region for the purpose of elimination.

All the frozen CDPs are traced back to their origins

    and the corresponding CDPs or the CDP and the

    contour point are joined using a line segment. The

effective CDPs on the smoothed curvature in Fig. 12d

    are shown in Fig. 12f. The same are marked on the

smoothed contour of Fig. 12b, in Fig. 12g.

Figure 12i shows the results of region part segmen-

    tation for the image in Fig. 12a.

Unlike the Bennamoun algorithm (Bennamoun and

    Mamic 2002) there is no necessity to freeze all the

    CDPs and we only move the CDPs for a particular

number of iterations. Unfrozen CDPs are not taken into account for segmentation. Now the regions fitted

    with the new line segments are isolated as separate

components. By setting an area threshold, small noisy

    non-road structures are eliminated. Figure 13 shows

    the results of region part segmentation algorithm for

    the images shown in Figs. 9b and 10b. It is observed

    that the non-road regions have been eliminated

    thereby improving the accuracy of road extraction

    results (Fig. 13).

    Experimental Results and Comparative Study

We now describe the results of experimentation using our proposed framework. The performance of the

    proposed method is verified on satellite images of size

512 × 512 each. The performance of the proposed

    technique is compared with two state of the art

    techniques: Tuncer (2007) and Mokhtarzade et al.

    (2007), as well as a free commercial tool for feature

    extraction (Geospace 2008), termed as FeatureObjeX.

    FeatureObjeX (Geospace 2008) is a semi-

automatic system, which allows the user to select

the training samples. Once the seed is created, intensity distributions are computed for a set of pixels around the seed, which are then used to fit a

    multivariate normal distribution. Each seed region is

    modeled by a Naive Bayes classifier (Duda et al.

    2000). Then the likelihood of a given pixel is

    computed with respect to each of the seed distribu-

tion. If the likelihood of a particular pixel is the same as or

    greater than the likelihood of the seed, then that pixel

    is classified as a target class. FeatureObjeX was used

    to segment the image into road and non-road classes

    using color features. Several configuration changes

were made in FeatureObjeX before the tests, to make it more efficient and closer to our requirement for

    working in road scenes over urban and suburban

    environments.
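The seed-based classification described above can be illustrated roughly as follows (assuming SciPy); this is a paraphrase of the description of the tool, not its actual implementation.

```python
import numpy as np
from scipy.stats import multivariate_normal

def seed_likelihood_mask(image: np.ndarray, seed_pixels: np.ndarray,
                         seed_value: np.ndarray) -> np.ndarray:
    """image: (H, W, bands); seed_pixels: (n, bands) around the seed; seed_value: (bands,)."""
    dist = multivariate_normal(mean=seed_pixels.mean(axis=0),
                               cov=np.cov(seed_pixels, rowvar=False),
                               allow_singular=True)
    ref = dist.pdf(seed_value)                          # likelihood of the seed itself
    lik = dist.pdf(image.reshape(-1, image.shape[-1]))  # likelihood of every pixel
    return (lik >= ref).reshape(image.shape[:2])        # keep pixels at least as likely
```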

    Dataset Description and Performance Measures

    We created a database for satellite images with 1-m/

    pixel resolution from Wikimapia (Koriakine and

Saveliev 2006). Commercially, this type of imagery is very expensive. We screen-captured 100

images of Developed countries and 100 images of Emerging countries that we considered useful for our

    work. In our case the place and date were not very

    critical, and the only characteristic that we were

    looking for was the content of the images which had

    views of highways and roads. For creating the dataset,

we consider selected sections (512 × 512 pixels) of

    scenes from satellite images of 1 m/pixel resolution

    acquired from Wikimapia (Koriakine and Saveliev

    2006), which includes: (1) sub-urban and (2) urban

    (a) (b)

    Fig. 13 The results of region part segmentation for: a output

    shown in Fig. 9b; and b output shown in Fig. 10b

    J Indian Soc Remote Sens (March 2011) 39(1):125 15

  • 8/2/2019 An Integrate Multistage Framework for Automatic Road Extraction From High Resolution Satellite Imagery

    16/25

    areas from Developed and Emerging countries.

Figures 15a and 17a show three examples each of images from suburban areas in Developed and Emerging countries respectively, whereas Figs. 16a and 18a show three examples each of images from urban areas in Developed and Emerging countries. For each image in the dataset, a ground-truth (road) map was also obtained using a human operator. A portion of the dataset can be downloaded from (Visualisation and Perception Lab 2006). The categorization of the data into four groups was done with the advice (based on visual observation and geo-location) of a GIS expert. As the data was distributed into four groups of 50 images each, we trained four different P-SVMs with data (25 images) from each respective group; the remaining 25 images of each group were used for testing and performance analysis of the output of our proposed multistage framework. A sketch of this per-category protocol is given below.
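A minimal sketch of the split-and-train protocol follows; `train_psvm` is a hypothetical stand-in for the probabilistic SVM training stage described earlier, and the category names are illustrative labels for the four groups.

    # Sketch of the per-category training protocol (assumptions: 50 images per
    # category; `train_psvm` is a hypothetical stand-in for the P-SVM stage).
    import random

    CATEGORIES = ["developed_suburban", "developed_urban",
                  "emerging_suburban", "emerging_urban"]

    def split_and_train(images_by_category, train_psvm, seed=0):
        """Train one P-SVM per category on 25 images; keep 25 for testing."""
        rng = random.Random(seed)
        models, test_sets = {}, {}
        for cat in CATEGORIES:
            imgs = list(images_by_category[cat])      # expected: 50 images
            rng.shuffle(imgs)
            models[cat] = train_psvm(imgs[:25])       # 25 training images
            test_sets[cat] = imgs[25:]                # 25 held-out test images
        return models, test_sets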

To assess the performance of the road extraction system, the length of the extracted road network (obtained after morphological thinning) that falls within a prespecified range of the reference road network is used to calculate the accuracy measures. The road segments in the test sites are manually digitized to form the reference road network. This subjectively obtained reference network, used to evaluate the proposed road extraction system, covers all roads present in the image and is therefore used as the ground truth for estimating the accuracy measures. Two measures are used to evaluate the accuracy of the extracted road network (Heipke et al. 1997), defined as follows. Completeness is the percentage of the reference data that was detected during road extraction:

\text{completeness} = \frac{\text{length of matched reference}}{\text{length of reference}} \quad (15)

Correctness represents the percentage of the extracted road data that is correct:

\text{correctness} = \frac{\text{length of matched extraction}}{\text{length of extraction}} \quad (16)
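These two measures can be computed directly from the thinned binary networks. The sketch below assumes both the extracted and reference networks are one-pixel-wide binary skeletons, approximates length by pixel counts, and uses a buffer of a few pixels as the prespecified matching range; the buffer value shown is illustrative, not the one used in the evaluation.

    # Minimal sketch of the completeness / correctness computation (Eqs. 15-16),
    # assuming `extracted` and `reference` are binary one-pixel-wide skeletons
    # and length is approximated by pixel counts. `buffer_px` is the prespecified
    # matching range in pixels (illustrative value).
    import numpy as np
    from scipy.ndimage import distance_transform_edt

    def completeness_correctness(extracted, reference, buffer_px=3.0):
        extracted = extracted.astype(bool)
        reference = reference.astype(bool)
        # Distance from every pixel to the nearest extracted / reference pixel.
        dist_to_extracted = distance_transform_edt(~extracted)
        dist_to_reference = distance_transform_edt(~reference)

        matched_reference = np.count_nonzero(reference & (dist_to_extracted <= buffer_px))
        matched_extraction = np.count_nonzero(extracted & (dist_to_reference <= buffer_px))

        completeness = matched_reference / max(reference.sum(), 1)    # Eq. (15)
        correctness = matched_extraction / max(extracted.sum(), 1)    # Eq. (16)
        return completeness, correctness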

    Results and Discussion

Separate training of the P-SVM was necessary for the four categories of image samples, as the spectral characteristics exhibited by roads differ across the four cases of our study; road intensity and contrast also vary between the four types of image samples. The proposed CSNN-based algorithm iteratively shuttles between adding new and removing redundant edge pixels, and hence inherently provides a correction mechanism for the fusion process. Edge maps are obtained using the method discussed in Section DSM Based Edge Detection. The CSNN-CII algorithm requires the probability values of all pixels for each class in an image; the initial probability values and segmented maps are obtained using the method discussed in Section Segmentation Using Probabilistic SVM.
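Purely for illustration, the shapes of these fusion inputs and a straightforward initial labelling are sketched below. This assumes the P-SVM stage yields an (H, W, 2) array of per-pixel class probabilities and the DSM stage a binary edge map; it does not reproduce the CSNN-CII update rules described in the earlier sections.

    # Illustrative sketch of the inputs consumed by the fusion stage (assumed
    # shapes only; the CSNN-CII update rules themselves are not reproduced here).
    import numpy as np

    def fusion_inputs(psvm_probs, dsm_edges):
        """psvm_probs: (H, W, 2) per-pixel class probabilities (non-road, road);
        dsm_edges: (H, W) binary edge map from the DSM stage.
        Bundles the inputs and seeds an initial label map from the most
        probable class at each pixel."""
        assert psvm_probs.ndim == 3 and psvm_probs.shape[2] == 2
        assert dsm_edges.shape == psvm_probs.shape[:2]
        initial_labels = np.argmax(psvm_probs, axis=2).astype(np.uint8)
        return {"probabilities": psvm_probs,
                "edges": dsm_edges.astype(bool),
                "initial_labels": initial_labels}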

In order to compare our approach directly with the recently published results in (Tuncer 2007; Mokhtarzade and Zoej 2007), we use a pair of images published by them for evaluation. We will show the results first on the images used in (Tuncer 2007) and (Mokhtarzade and Zoej 2007), and then on a few examples from the testing dataset acquired from Wikimapia (Koriakine and Saveliev 2006) in Figs. 15, 16, 17 and 18. Figure 14a shows the two sample images used in (Tuncer 2007) and (Mokhtarzade and Zoej 2007). The output of a human operator detecting roads in the two images is presented in Fig. 14b. Figure 14c-I presents the result published in (Tuncer 2007), while that in Fig. 14c-II is taken from (Mokhtarzade and Zoej 2007). The result in Fig. 14c-I shows that only roads with rather large pixel widths, such as the main highways, are recovered as thinned structures. Prominent roads are recovered with good accuracy, but narrow inner-city roads and road intersections have not been recovered. Similarly, for the method presented in (Mokhtarzade and Zoej 2007), more false positives (non-road structures) occur, which reduces the correctness measure for this method (see Fig. 14c-II); some pixels belonging to rooftops of buildings were falsely identified as roads. The completeness and correctness measures for the given test images, calculated for Tuncer (2007) and Mokhtarzade and Zoej (2007) as well as for our proposed method, are shown in Table 2. The completeness measure in (Mokhtarzade and Zoej 2007) is higher than that in (Tuncer 2007), as the true positives (actual road parts) are detected more accurately. The results of our proposed method are much better in both cases, as shown in Fig. 14d. It can be observed that our method outperforms both prior published methods.

Figures 15 and 16 show the results obtained using the proposed methodology on satellite images of Developed countries, whereas Figs. 17 and 18 show the results for Emerging countries. Figures 15b, 16b, 17b and 18b show the results of feature extraction using the FeatureObjeX tool for the images in Figs. 15a, 16a, 17a and 18a respectively. Figures 15c, 16c, 17c and 18c show the results of the algorithm proposed in (Tuncer 2007). Figures 15d, 16d, 17d and 18d show the road segments extracted from the input satellite images using the technique presented in (Mokhtarzade and Zoej 2007). Figures 15e, 16e, 17e and 18e show manually plotted reference road layouts for the respective input images. It can be observed that the results of our proposed method, given in Figs. 15f, 16f, 17f and 18f, are significantly better than those of the other approaches and quite close to the ground truth given in Figs. 15e, 16e, 17e and 18e. Our system outperforms FeatureObjeX (Geospace 2008) and the other state-of-the-art methods in all the cases. The optimal values of the parameters used in our proposed approach are given in Table 3; they were obtained empirically through a large set of experiments.

Table 4 compares the accuracy of the results presented in Figs. 15, 16, 17 and 18 using the completeness and correctness measures. From Table 4 it can be seen that our proposed method outperforms the other techniques in almost all the cases in terms of both completeness and correctness; in very few cases, the completeness measure of the FeatureObjeX tool is marginally better than that of our method. Tables 5, 6, 7 and 8 show the average classification accuracy obtained by analyzing images using the proposed method, FeatureObjeX (Geospace 2008) and the two state-of-the-art techniques (Tuncer 2007) and (Mokhtarzade and Zoej 2007), over 25 images in each of the four categories respectively.

Table 2 Performance of the proposed approach and the algorithms presented in (Tuncer 2007; Mokhtarzade and Zoej 2007)

Methods                                        Completeness   Correctness
(Tuncer 2007) (Fig. 14c-I)                     82%            96%
Proposed (Fig. 14d-I)                          100%           100%
(Mokhtarzade and Zoej 2007) (Fig. 14c-II)      92%            82%
Proposed (Fig. 14d-II)                         96%            85%

Fig. 14 a Images presented in (Tuncer 2007) and (Mokhtarzade and Zoej 2007); b output of manual (hand-drawn) extraction; c results reproduced from (I) Tuncer (2007) and (II) Mokhtarzade and Zoej (2007); d results of our proposed approach


Fig. 15 a Three satellite images of size 512 × 512, from a suburban area of a developed region; b results from FeatureObjeX (Geospace 2008); c results of the method proposed in (Tuncer 2007); d results of the method proposed in (Mokhtarzade and Zoej 2007); e hand-drawn (manual) road map; f results of our proposed method


Fig. 16 a Three satellite images of size 512 × 512, from an urban area of a developed region; b results from FeatureObjeX (Geospace 2008); c results of the method proposed in (Tuncer 2007); d results of the method proposed in (Mokhtarzade and Zoej 2007); e hand-drawn (manual) road map; f results of our proposed method


Fig. 17 a Three satellite images of size 512 × 512, from a suburban area of an emerging region; b results from FeatureObjeX (Geospace 2008); c results of the method proposed in (Tuncer 2007); d results of the method proposed in (Mokhtarzade and Zoej 2007); e hand-drawn (manual) road map; f results of our proposed method


Fig. 18 a Three satellite images of size 512 × 512, from an urban area of an emerging region; b results from FeatureObjeX (Geospace 2008); c results of the method proposed in (Tuncer 2007); d results of the method proposed in (Mokhtarzade and Zoej 2007); e hand-drawn (manual) road map; f results of our proposed method


It is observed from the results shown in Figs. 15, 16, 17 and 18 and Tables 4, 5, 6, 7 and 8 that the performance of our proposed algorithm is superior to that of the other methods. The results obtained using the proposed methodology are much better than those of the methods presented in (Tuncer 2007; Mokhtarzade and Zoej 2007) and close to the manually drawn reference road network. Compared to our preliminary investigation in (Mirnalinee et al. 2009), the performance in terms of the completeness and correctness measures has been enhanced significantly: the segment linking algorithm improves the completeness measure, whereas region part segmentation improves the correctness measure.

The correctness and completeness measures obtained for scenes from emerging countries are in most cases lower than those for scenes from developed countries. This decrease in accuracy is expected, since there are many more opportunities for errors in these areas due to the large numbers of linear non-road features, four-way crossings, non-linear road structures and unplanned layouts. Comparing the results of developed urban and suburban scenes, the performance on urban scenes is lower because of distortions: images of urban areas exhibit a more complex structure than scenes of suburban areas, as the number of different objects and their heterogeneity is much higher in urban scenes, and some of the roads comprise several lanes linked by complex road crossings. Generally, as shown in Fig. 15, the extraction results for open landscape areas are nearly complete and correct.

Suburban scenes of emerging countries are covered by vegetation. Moreover, the spectral response of roads in these areas is on certain occasions similar to the spectral response of open fields and rooftops, which increases the false positives and thereby reduces the correctness measure. Overall, our proposed method outperforms FeatureObjeX (Geospace 2008) and the two state-of-the-art methods (Tuncer 2007; Mokhtarzade and Zoej 2007), for observations averaged over 50 images each of developed and emerging areas.

    Conclusions

A novel and efficient method for automatically extracting roads using low-level information, directly from satellite images, based on region and edge integration has been introduced and demonstrated. This new method combines the outputs of PSVM and DSM in such a way that it preserves the strong discriminative ability of the SVM while simultaneously exploiting the linear-like characteristics in the features derived using DSM. For the determination of discontinuities and the elimination of non-road parts, two approaches were presented, based on several criteria concerning properties of the road parts and their relations to each other. The segment linking module solves the problem of discontinuity to some extent, thereby increasing the completeness.

Table 4 Performance of the system for the images shown in Figs. 15, 16, 17 and 18 (A: Completeness, B: Correctness)

Road image type                    FeatureObjeX      Tuncer            Mokhtarzade       Proposed
                                   I    II   III     I    II   III     I    II   III     I    II   III
Developed suburban (Fig. 15)   A   97   84   100     98   82   97      98   68   95      100  94   100
                               B   88   72   91      93   74   92      86   56   85      98   89   90
Developed urban (Fig. 16)      A   85   96   97      75   92   91      65   66   76      92   100  99
                               B   79   82   72      83   83   68      52   54   57      96   100  94
Emerging suburban (Fig. 17)    A   96   83   94      73   62   92      61   51   87      88   96   95
                               B   68   73   63      67   56   74      58   51   62      92   93   89
Emerging urban (Fig. 18)       A   91   87   83      62   52   74      71   64   59      89   92   82
                               B   74   71   75      72   51   61      57   58   58      83   92   85

Table 3 Values of the parameters used in our proposed approach

Road image type    1     2     N          TE     TA
Suburban           2     2.5   9 × 9      0.6    0.7    50
Urban              3     3.5   11 × 11    0.7    0.7    50


Table 8 Performance of the system averaged over 25 images of urban scenes of emerging countries

Methods                                           Completeness   Correctness
FeatureObjeX (Geospace 2008)                      78%            60%
Tuncer (Tuncer 2007)                              58%            52%
Mokhtarzade et al. (Mokhtarzade and Zoej 2007)    63%            52%
Proposed Method                                   85%            87%

Table 7 Performance of the system averaged over 25 images of suburban scenes of emerging countries

Methods                                           Completeness   Correctness
FeatureObjeX (Geospace 2008)                      89%            62%
Tuncer (Tuncer 2007)                              78%            64%
Mokhtarzade et al. (Mokhtarzade and Zoej 2007)    64%            58%
Proposed Method                                   87%            91%

Table 6 Performance of the system averaged over 25 images of urban scenes of developed countries

Methods                                           Completeness   Correctness
FeatureObjeX (Geospace 2008)                      84%            74%
Tuncer (Tuncer 2007)                              81%            65%
Mokhtarzade et al. (Mokhtarzade and Zoej 2007)    62%            89%
Proposed Method                                   93%            91%

Table 5 Performance of the system averaged over 25 images of suburban scenes of developed countries

Methods                                           Completeness   Correctness
FeatureObjeX (Geospace 2008)                      86%            79%
Tuncer (Tuncer 2007)                              81%            72%
Mokhtarzade et al. (Mokhtarzade and Zoej 2007)    63%            60%
Proposed Method                                   93%            89%


Region part segmentation and shape analysis based on an elongatedness measure eliminate non-road parts and increase the correctness. The results demonstrate that the proposed system is able to effectively extract major sections of the road network, a few junctions and curved roads from high-resolution satellite images.

It is observed that the road detection process achieves a high degree of accuracy, especially for the scenes of developed countries. In urban areas, however, only major roads with larger pixel widths have been detected; moreover, the presence of buildings and other features similar to roads made the extraction process somewhat more difficult compared to the suburban case. Linking of discontinuous segments, road junction detection and modeling of shadows are issues to be addressed in the future scope of this work. Vectorization of the extracted road segments could also be a useful extension of this work for GIS updates. The next step may include the formation of a road network by searching for junctions connecting road segments. Results may improve with the help of road hypothesis verification using the parallelism of road boundaries and the use of a graph data structure to form a complete road network representation.

    References

Baumgartner, A., Hinz, S., & Wiedemann, C. (2002). Efficient methods and interfaces for road tracking. In: Proceedings of the ISPRS Commission III Symp. Photogrammet. Comput. Vision, pp. 28–31.

Bennamoun, M. (1994). A contour based part segmentation algorithm. In: Proc. of the IEEE ICASSP, pp. 41–44.

Bennamoun, M., & Mamic, G. J. (2002). Object recognition fundamentals and case studies. Springer.

Bigun, J., Granlund, G., & Wiklund, J. (1991). Multidimensional orientation estimation with applications to texture analysis and optical flow. IEEE Transactions on Pattern Analysis and Machine Intelligence, 13(8), 775–790.

Bucha, V., Uchida, S., & Ablameyko, S. (2006). Interactive road extraction with pixel force fields. In: IEEE 18th International Conference on Pattern Recognition (ICPR'06), pp. 829–832.

Chu, J., & Aggarwal, J. (1993). The integration of image segmentation maps using region and edge information. IEEE Transactions on Pattern Analysis and Machine Intelligence, 15, 1241–1252.

Cooper, G., & Cowan, D. (2007). Enhancing linear features in image data using horizontal orthogonal gradient ratios. Computers and Geosciences, 33, 981–984.

Cortes, C., & Vapnik, V. (1995). Support vector networks. Machine Learning, 20(3), 273–297.

Doucette, P., Agouris, P., Stefanidis, A., & Musavi, M. (2001). Self-organized clustering for road extraction in classified imagery. ISPRS Journal of Photogrammetry and Remote Sensing, 55, 347–358.

Duda, R., Hart, P., & Stork, D. (2000). Pattern classification. Wiley Interscience.

Feng, X., & Milanfar, P. (2002). Multiscale principal components analysis for image local orientation estimation. In: Proceedings of the 36th Asilomar Conference on Signals, Systems and Computers, pp. 478–482.

Geospace (2008). FeatureObjeX. http://www.pcigeomatics.com/.

Granlund, G., & Knutsson, H. (1995). Signal processing for computer vision. Boston: Kluwer Academic.

Gruen, A., & Li, H. (1995). Road extraction from aerial and satellite images by dynamic programming. ISPRS Journal of Photogrammetry and Remote Sensing, 50(4), 11–20.

Haddon, J., & Boyce, J. (1990). Image segmentation by unifying region and boundary information. IEEE Transactions on Pattern Analysis and Machine Intelligence, 12(10), 929–948.

Haglund, L., & Fleet, D. (1994). Stable estimation of image orientation. In: Proceedings of the First IEEE International Conference on Image Processing, III, pp. 68–72.

Haralick, R., & Shapiro, L. (1992). Computer and robot vision. Addison Wesley.

Heipke, C., Mayer, H., Wiedemann, C., & Jamet, O. (1997). Evaluation of automatic road extraction. International Archives of Photogrammetry and Remote Sensing, pp. 47–56.

Hinz, S., & Baumgartner, A. (2003). Multiview fusion of road objects supported by self diagnosis. In: Proceedings of the 2nd GRSS/ISPRS Joint Workshop on Remote Sensing and Data Fusion over Urban Areas, pp. 137–141.

Hu, X., & Tao, V. (2007). Automatic extraction of main road centerlines from high resolution satellite imagery using hierarchical grouping. Photogrammetric Engineering and Remote Sensing, 73(9), 1049–1056.

Hu, X., Zhang, Z., & Tao, V. (2004). A robust method for semi-automatic extraction of road centerlines using a piece-wise parabolic model and least square template matching. The International Journal of Photogrammetric Engineering and Remote Sensing, 70(12), 1393–1398.

Jacob, M., & Unser, M. (2004). Design of steerable filters for feature detection using Canny like criteria. IEEE Transactions on Pattern Analysis and Machine Intelligence, 26(8), 1007–1019.

Jiang, X. (2007). Extracting image orientation feature by using integration operator. Pattern Recognition, 40, 705–717.

Jin, X., & Davis, C. (2005). An integrated system for automatic road mapping from high-resolution multispectral satellite imagery by information fusion. Information Fusion, pp. 257–273.

Kass, M., Witkin, A., & Terzopoulos, D. (1987). Snakes: active contour models. International Journal of Computer Vision, 1, 321–331.

Koriakine, A., & Saveliev, E. (2006). Data. http://www.wikimapia.org/.

Kumar, P., Das, S., & Yegnanarayana, B. (2000). One-dimensional processing of images. In: International Conference on Multimedia Processing and Systems, pp. 451–454.


Kurugollu, F., & Sankur, B. (1999). Map segmentation of color images using constraint satisfaction neural network. In: International Conference on Image Processing, pp. 236–239.

Lalit, G., Mangai, U. G., & Das, S. (2008). Integrating region and edge information for texture segmentation using a modified constraint satisfaction neural network. Image and Vision Computing, pp. 1106–1117.

Laptev, I., Mayer, H., Lindeberg, T., Eckstein, W., Steger, C., & Baumgartner, A. (2000). Automatic extraction of roads from aerial images based on scale space and snakes. Machine Vision and Applications, 12(1), 23–31.

Lin, W., Kuo, E., & Chen, C. (1992). Constraint satisfaction neural networks for image segmentation. Pattern Recognition, 25(7), 679–693.

Lyvers, E., & Mitchell, O. (1988). Precision edge contrast and orientation estimation. IEEE Transactions on Pattern Analysis and Machine Intelligence, 10(6), 927–937.

Majidi, B., & Bab-Hadiashar, A. (2009). Aerial tracking of elongated objects in rural environments. Machine Vision and Applications, 20, 23–34.

Mantero, P., Moser, G., & Serpico, S. (2005). Partially supervised classification of remote sensing images through SVM-based probability density estimation. IEEE Transactions on Geoscience and Remote Sensing, 43(3), 559–570.

Mayer, H., Laptev, I., Baumgartner, A., & Steger, C. (1997). Automatic road extraction based on multi-scale modelling, context and snakes. In: International Archives of Photogrammetry and Remote Sensing, pp. 106–113.

McKeown, D. (1996). Top ten lessons learned in automated cartography.

Mena, J. B. (2003). State of the art on automatic road extraction for GIS update: a novel classification. Pattern Recognition Letters, 24(16), 3037–3058.

Miliaresis, G., & Kokkas, N. (2007). Segmentation and object-based classification for the extraction of the building class from LIDAR DEMs. Computers and Geosciences, 33, 1076–1087.

Mirnalinee, T., Das, S., & Varghese, K. (2009). Integration of region and edge based information for efficient road extraction from high resolution satellite imagery. In: IEEE Proceedings of ICAPR, Kolkata, India, pp. 373–376.

Moigne, J., & Tilton, J. (1995). Refining image segmentation by integration of edge and region data. IEEE Transactions on Geoscience and Remote Sensing, 33, 605–615.

Mokhtarzade, M., & Zoej, M. (2007). Road detection from high-resolution satellite images using artificial neural networks. International Journal of Applied Earth Observation and Geoinformation, 9(1), 32–40.

Pal, M., & Mather, P. (2005). Support Vector Machines for classification in remote sensing. International Journal of Remote Sensing, 26(5), 1007–1011.

Pavlidis, T., & Liow, Y. (1990). Integrating region growing and edge detection. IEEE Transactions on Pattern Analysis and Machine Intelligence, 12, 225–233.

Pei, S., & Lin, C. (1992). The detection of dominant points on digital curves by scale space filtering. Pattern Recognition, pp. 1307–1314.

Perona, P. (1998). Orientation diffusions. IEEE Transactions on Image Processing, 7(3), 457–467.

Platt, J. C. (1999). Probabilistic outputs for support vector machines and comparisons to regularized likelihood methods. In: Advances in Large Margin Classifiers, MIT Press, pp. 61–74.

Qian, R., & Huang, T. (1996). Optimal edge detection in two-dimensional images. IEEE Transactions on Image Processing, 5, 1215–1220.

Raghu, P., & Yegnanarayana, B. (1996). Segmentation of Gabor-filtered textures using deterministic relaxation. IEEE Transactions on Image Processing, 5(12), 424–429.

Rizvandi, N., Pizurica, A., Philips, W., & Ochoa, D. (2008). Edge linking based method to detect and separate individual C. elegans worms in culture. In: DICTA, pp. 65–70.

Shi, W., & Zhu, C. (2002). The line segment match method for extracting road network from high-resolution satellite images. IEEE Transactions on Geoscience and Remote Sensing, 40(2), 511–514.

Strang, G. (2005). Linear algebra and its applications. Thomson Brooks.

Theodoridis, S., & Koutroumbas, K. (2006). Pattern recognition. Academic.

Tuncer, O. (2007). Fully automatic road network extraction from satellite images. In: Recent Advances in Space Technologies, pp. 708–714.

Tupin, F., Houshmand, B., & Datcu, M. (2002). Road detection in dense urban areas using SAR imagery and the usefulness of multiple views. IEEE Transactions on Geoscience and Remote Sensing, 40, 2405–2414.

Udomhunsakul, S. (2004). Semi-automatic road detection from satellite imagery. In: IEEE International Conference on Image Processing (ICIP), pp. 1723–1726.

Visualisation and Perception Lab (2006). http://www.cse.iitm.ac.in/~sdas/vplab/downloads.html.

Wei, W., & Xin, Y. (2008). Feature extraction for man-made objects segmentation in aerial images. Machine Vision and Applications, 19, 57–64.

Xiao, Y., Tan, T., & Tay, S. (2005). Utilizing edge to extract roads in high-resolution satellite imagery. In: IEEE International Conference on Image Processing (ICIP), pp. 637–640.

Yager, N., & Sowmya, A. (2003). Support vector machines for road extraction from remotely sensed images. LNCS, 2756, 285–292.

Yang, J., & Wang, R. (2007). Classified road detection from satellite images based on perceptual organization. International Journal of Remote Sensing, 28, 4653–4669.

Zhang, H., Xiao, Z., & Zhou, Q. (2008). Research on road extraction semi-automatically from high resolution remote sensing images. The International Archives of the Photogrammetry, Remote Sensing and Spatial Information Sciences, XXXVII (Part B), 536–538.

Zhu, C., Shi, W., Pesaresi, M., & Liu, L. (2005). The recognition of road network from high-resolution satellite remotely sensed data using image morphological characteristics. International Journal of Remote Sensing, 26(24), 5493–5508.
