
BUILDING MODEL RECONSTRUCTION FROM LIDAR DATA AND AERIAL PHOTOGRAPHS

DISSERTATION

Presented in Partial Fulfillment of the Requirements for

The Degree Doctor of Philosophy in the Graduate

School of the Ohio State University

By

Ruijin Ma *****

The Ohio State University

2004

Dissertation Committee:

Rongxing Li, Advisor
Alan Saalfeld
Raul Ramirez

Approved by
_________________________________
Advisor, Graduate Program of Geodetic Science

© Copyright by

Ruijin Ma

2004

ABSTRACT

The objective of this research is to reconstruct 3D building models from imagery and

LIDAR data. The images used are stereo aerial photographs with known imaging

orientation parameters so that 3D ground coordinates can be calculated from conjugate

points; and 3D ground objects can be projected to image spaces. To achieve this

objective, a method of synthesizing both imagery data and LIDAR data is explored; thus,

the advantages of both data sets are utilized to derive 3D building models with a high

accuracy. In order to reconstruct complex building models, the polyhedral building model

is employed in this research. Correspondingly, the reconstruction method is data-driven.

The general research procedure can be summarized as: a) building detection from

LIDAR data; b) 3D building model reconstruction; c) LIDAR data and imagery data co-

registration; and d) building model refinement. The main role of aerial image data in this

research is to improve the geometric accuracy of a building model.

The major contributions of this research lie in four aspects: 1) Two algorithms are

developed to perform LIDAR segmentation. Compared with the algorithms proposed by

other researchers, these two algorithms work well in urban and suburban areas. In

addition, they can keep fine features on the ground; 2) An algorithm of building boundary


regularization is proposed in this study. Compared with the commonly used MDL

algorithm, it is simple to implement and fast in computation. Longer line segments have

larger weights in its adjustment process. This agrees with the fact that longer line

segments have more accurate azimuths, provided that the accuracy of the end points is

the same for all segments; 3) A new method of 3D building model reconstruction from

LIDAR data is developed. It is comprised of constructing surface topology, calculating

corners from surface intersection, and ordering points of a roof surface in their correct

sequence; and 4) A new framework of building model refinement from aerial imagery

data is proposed. It refines building models in a consistent manner and utilizes stereo

imagery information and roof constraints in deriving refined building models.


Dedicated to my parents


ACKNOWLEDGMENTS First of all, I would like to express my gratitude to my advisor, Dr. Ron Li, for his

support, patience, and encouragement throughout my graduate studies. Many

opportunities he gave me to practice stimulated my interests and enabled me to gain more experience in my research field.

I am especially grateful to Dr. Raul Ramirez for his kindness and help in all aspects. His unreserved kindness has made the past years a lasting good memory in my life. His supervision and continuous support laid a smooth way for my studies and research work.

Special thanks also go to Dr. Alan Saalfeld. While serving on my dissertation

committee, he offered many constructive comments and suggestions. In addition, I

benefited enormously from taking his classes.

I am greatly thankful to Mr. Qian Xiao at Woolpert Inc. and Mr. Will Meyer in Harris

County, Texas, for their help in providing experimental data for my research.

I wish to express my sincere appreciation to Dr. Kaichang Di. His comments and

suggestions on my dissertation were valuable. My gratitude is also extended to Dr. Tarig

Ali. Our friendship has led to many interesting and good-spirited discussions relating to

this research.


I also deeply appreciate my colleagues and friends both in the GIS and Mapping Lab and in the Center for Mapping: Fengliang Xu, Xutong Niu, Dr. Mingjuan Huang, Dr. Lawrence

Spencer, Jue Wang, Leslie Smith, and many others.

Finally, I want to express my deep gratitude to my friend, Fang Ren, for her

emotional encouragement and support, and to my parents for their dedication and so

many years of support during my studies.


VITA

April 27, 1974 . . . . . . . . . . . . . . . . . . . . . . . Born – Gaomi, Shandong Province, China

July 1996 . . . . . . . . . . . . . . . . . . . . . . . . . . . BS, Survey Engineering, Shandong

University of Science and Technology

June 1999. . . . . . . . . . . . . . . . . . . . . . . . . . . MS, Survey Engineering, Shandong

University of Science and Technology

August 2001. . . . . . . . . . . . . . . . . . . . . . . . . MS, Mapping and GIS, The Ohio State

University

1999 – 2004. . . . . . . . . . . . . . . . . . . . . . . . . Research Assistant at The Ohio State

University

2004 – present. . . . . . . . . . . . . . . . . . . . . . . . Lecturer at SUNY Alfred

PUBLICATIONS

Research Publications

1. Di, K., R. Ma and R. Li, “Geometric Processing of IKONOS Stereo Imagery for

Coastal Mapping Application”, Journal of Photogrammetric Engineering & Remote

Sensing. Vol. 69 (8), pp. 873-879 (2003)

2. R. Li, K. Di and R. Ma, “3-D Shoreline Extraction from IKONOS Satellite

Imagery”, The 4th Special Issue on C&MGIS, Journal of Marine Geodesy. Vol. 26 (1/2),

pp. 107-115 (2003)

3. Di, K., R. Ma and R. Li, “Rational Functions and Potential for Rigorous Sensor

Model Recovery”, Journal of Photogrammetric Engineering & Remote Sensing Vol. 69

(1), pp. 33-44 (2003)

4. Li, R., R. Ma and K. Di, “Digital Tide-Coordinated Shoreline”, Journal of

Marine Geodesy, Vol. 25, pp. 27-36 (2002)

FIELDS OF STUDY

Major Field: Geodetic Science Studies in:

GIS Mapping Photogrammetry and Remote Sensing


TABLE OF CONTENTS

Abstract ...............................................................................................................................ii

Dedication ..........................................................................................................................iv

Acknowledgments...............................................................................................................v

Vita....................................................................................................................................vii

List of Figures ....................................................................................................................xi

List of Tables ...................................................................................................................xiv

Chapters:

1. Introduction and Problem Statement.............................................................................. 1

1.1 Motivation ........................................................................................................... 1
1.2 Building Model and Model Reconstruction........................................................... 3
1.3 Peer Research......................................................................................................... 6
1.3.1 Reconstruction from LIDAR Data................................................................ 6
1.3.2 Reconstruction from Imagery Data............................................................. 10
1.3.3 Reconstruction from LIDAR, Imagery, and Other Auxiliary Data............. 16
1.4 Statement of Problem........................................................................................... 19
1.5 Research Focus and Methodology........................................................................ 22
1.6 Fundamental Concepts......................................................................................... 23
1.6.1 LIDAR vs. Photogrammetry....................................................................... 24
1.6.2 DTM vs. DSM............................................................................................. 28
1.6.3 Building Detection and Building Reconstruction....................................... 29

1.7 Dissertation Organization......................................................................................29

2. Building Detection From LIDAR Data......................................................................... 31

2.1 Conventional Terms............................................................................................. 33
2.2 DTM and DSM Generation.................................................................................. 34
2.2.1 Transformation from Point to Grid............................................................. 35
2.2.2 LIDAR Data Segmentation......................................................................... 37
2.2.2.1 Morphology Segmentation................................................................. 42
2.2.2.2 Planar-fitting Segmentation............................................................... 45
2.2.2.3 Height-jump Segmentation................................................................ 51
2.2.3 Comparison................................................................................................. 57
2.3 Building Detection from Normalized DSM......................................................... 59
2.4 Analysis and Conclusion...................................................................................... 65

3. Building Model Reconstruction.................................................................................... 67

3.1 Conventional Terms............................................................................................. 67
3.2 Boundary Extraction and Regularization............................................................. 68
3.2.1 Line Simplification...................................................................................... 70
3.2.2 Boundary Regularization............................................................................. 75
3.3 Building Model Reconstruction........................................................................... 80
3.3.1 Roof Detection and Reconstruction............................................................ 84
3.3.1.1 Mean-shift Algorithm......................................................................... 86
3.3.1.2 Roof Reconstruction........................................................................... 93

3.3.2 Model Reconstruction................................................................................. 99

4. Building Model Refinement....................................................................................... 110

4.1 Co-registration of LIDAR and Aerial Photograph............................................. 112
4.1.1 3D Lines from LIDAR Data and 2D Edges from Photograph.................. 114
4.1.2 Image Resection from Linear Features..................................................... 115
4.2 Line Refinement in 2D Image Space................................................................. 122
4.3 Reconstruct 3D Building Models with Refined Geometry................................ 128
4.4 Implementation................................................................................................... 133

5. Experiments and Results............................................................................................. 135

5.1 Data..................................................................................................................... 135
5.2 LIDAR Segmentation......................................................................................... 136
5.3 Building Reconstruction..................................................................................... 142
5.4 Building Model Refinement from Data Integration........................................... 147
5.5 Discussion........................................................................................................... 149

6. Conclusions and Future Research............................................................................... 152

6.1 Conclusions........................................................................................................ 152
6.2 Future Works...................................................................................................... 154

Bibliography................................................................................................................... 156


LIST OF FIGURES

Figure 1.1 Aerial photograph image geometry................................................................. 24

Figure 1.2 Stereo images and space intersection.............................................................. 25

Figure 1.3 Imaging geometry of a LIDAR system........................................................... 26

Figure 1.4 LIDAR height and reflectance data................................................................. 27

Figure 1.5 DTM and DSM................................................................................................ 28

Figure 2.1 Flowchart of DTM generation and building detection from LIDAR data...... 32

Figure 2.2 Conversion from points to grid........................................................................ 35

Figure 2.3 View of DSM from LIDAR data..................................................................... 37

Figure 2.4 A profile of LIDAR DSM............................................................................... 39

Figure 2.5 Morphology filter results................................................................................. 44

Figure 2.6 3D visualization of DTM and NDSM............................................................. 44

Figure 2.7 Flowchart of DTM generation from plane-fitting segmentation..................... 46

Figure 2.8 Planar surface conditions in classification...................................................... 48

Figure 2.9 Planar-fitting results........................................................................................ 49

Figure 2.10 3D visualization of planar-fitting DTM and NDSM..................................... 51

Figure 2.11 Flowchart of DTM generation from height-jump segmentation................... 52

Figure 2.12 Object height and topographical difference.................................................. 54

Figure 2.13 Height-jump segmentation results................................................................. 55

Figure 2.14 3D visualization of height-jump DTM and NDSM....................................... 56

Figure 2.15 Objects detected from height constraint........................................................ 60

Figure 2.16 Buildings detected from size constraint........................................................ 62

Figure 2.17 Separating buildings and trees from planar-fitting difference....................... 65

Figure 3.1 Extracted building boundaries......................................................................... 70

Figure 3.2 Line simplification using “sleeve” algorithm.................................................. 71

Figure 3.3 Line simplification using refined “sleeve” algorithm..................................... 74

Figure 3.4 An example of line simplification................................................................... 75


Figure 3.5 Flowchart of boundary regularization............................................................. 76

Figure 3.6 An example of boundary regularization.......................................................... 79

Figure 3.7 Regularized building boundaries with DSM................................................... 79

Figure 3.8 Slope, aspect, and normal derived from DSM................................................ 85

Figure 3.9 Normal divergences in level surface............................................................... 85

Figure 3.10 Mean-shift in one dimension domain............................................................ 90

Figure 3.11 Feature space before applying mean-shift filtering....................................... 92

Figure 3.12 Feature space after applying mean-shift filtering.......................................... 92

Figure 3.13 Normal data calculated using different windows.......................................... 94

Figure 3.14 Normal data before and after applying mean-shift filtering.......................... 95

Figure 3.15 3D visualization of the X component from mean-shift filtering................... 96

Figure 3.16 Roof classification and extraction................................................................. 97

Figure 3.17 Point-in-polygon analysis.............................................................................. 98

Figure 3.18 Construct roofs and building boundary topology........................................ 100

Figure 3.19 Numbering roofs and vertical walls............................................................ 101

Figure 3.20 Reconstructed building corners................................................................... 105

Figure 3.21 Ordering roof polygon points...................................................................... 107

Figure 3.22 An example of reconstructed surface topology........................................... 109

Figure 3.23 An example of reconstructed 3D building model........................................ 109

Figure 4.1 Updating 2D lines in stereo images............................................................... 111

Figure 4.2 Co-registration of LIDAR and aerial images................................................ 113

Figure 4.3 Co-planarity of 2D and 3D line..................................................................... 115

Figure 4.4 A projected building model onto stereo images............................................ 123

Figure 4.5 Building image and detected edge pixels...................................................... 124

Figure 4.6 Detected pixels for line refinement............................................................... 126

Figure 4.7 Searching pixels and refined 2D line segments............................................. 128

Figure 5.1 Experimental data.......................................................................................... 136

Figure 5.2 The ground region detected from LIDAR segmentation............................... 138

Figure 5.3 Detected non-ground objects and buildings.................................................. 139

Figure 5.4 Regularized building boundaries with DSM................................................. 140

Figure 5.5 An example of boundary regularized............................................................ 141

Figure 5.6 Normal data after filtering and extracted roofs............................................. 143


Figure 5.7 An example of 3D building models from LIDAR data................................. 145

Figure 5.8 A subset of reconstructed building................................................................ 146

Figure 5.9 3D visualization of reconstructed building models....................................... 146

Figure 5.10 Discrepancy among duplicated corners....................................................... 147

Figure 5.11 A refined model with consistency............................................................... 148

Figure 5.12 Deviations of reconstructed models from actual objects............................. 150


LIST OF TABLES

Table 3.1 Adjacency matrix of roofs and vertical walls................................................. 101

Table 3.2 An example of point-surface matrix............................................................... 105

CHAPTER 1

INTRODUCTION AND PROBLEM STATEMENT

1.1. Motivation

In the past few years, virtual city models have been used more and more in research

activities due to a great demand from a variety of users. A virtual city model can be

utilized in urban planning, cartography, architecture, environmental planning,

telecommunication, and tourism. One of the most challenging parts in building a virtual

city model is building model reconstruction. According to the survey conducted by The

European Organization for Experimental Photogrammetric Research (OEEPE), the part of a virtual city model of greatest interest to users is 3D building data, ranking above traffic network data; the survey also showed that photogrammetry is the only economical

approach to acquire 3D city data (Förstner, 1999). Building extraction, especially in

urban areas, is one of the major problems in image understanding and photogrammetry

(Elaksher, et al., 2002).

Photogrammetry is the primary approach for cartographic and GIS production at

present. Experts from both computer science and photogrammetry are working together

for new applications. At the same time, new sensors are being developed and bring new


technologies into photogrammetry community, such as SAR (Synthetic Aperture Radar)

and LIDAR (LIght Detection And Ranging).

Introduced in the 1980s, LIDAR technology has continued to draw great attention

from researchers; and many commercial systems were already available by the mid-

1990s. LIDAR is now a mature technology, and it is widely used in flood control

applications, forestry applications, cartography, and other disciplines. Unlike traditional

photogrammetric sensors, a LIDAR system measures 3D coordinates of ground points

and makes it easy to automatically derive a digital surface model (DSM).
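For illustration, the conversion from irregular LIDAR points to a grid DSM can be sketched as follows. This is a minimal Python sketch, assuming a regular cell size and a simple highest-return rule; the cell size, the NaN fill value, and the function name are chosen for the example only, and the actual point-to-grid conversion used in this research is the one described in Section 2.2.1.

import numpy as np

def lidar_points_to_dsm(points, cell_size=1.0):
    """Grid irregular LIDAR points (x, y, z) into a raster DSM.

    Each cell keeps the highest return falling inside it; empty cells are
    left as NaN and could later be filled by interpolation.
    """
    x, y, z = points[:, 0], points[:, 1], points[:, 2]
    x0, y0 = x.min(), y.min()
    cols = int(np.ceil((x.max() - x0) / cell_size)) + 1
    rows = int(np.ceil((y.max() - y0) / cell_size)) + 1
    dsm = np.full((rows, cols), np.nan)
    ci = ((x - x0) / cell_size).astype(int)
    ri = ((y - y0) / cell_size).astype(int)
    for r, c, h in zip(ri, ci, z):
        if np.isnan(dsm[r, c]) or h > dsm[r, c]:
            dsm[r, c] = h
    return dsm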

Photogrammetrists and computer scientists are demonstrating more and more interest

in building extraction and reconstruction. Much research has already been conducted

using imagery data captured from aerial platforms or ground platforms. People interested

in building extraction have begun using LIDAR technology due to its great degree of

automation for deriving a DSM from 3D ground coordinates. In reality, many building

reconstruction research studies based on imagery data began with the process of a DSM,

which was generated from stereo images. The major contribution of DSM in these studies

is to provide building candidates. In other words, the building detection usually is

achieved from a DSM. Many research works have been reported regarding LIDAR data

processing, among which building detection and reconstruction are two major issues

addressed by many experts. Despite the accomplishments achieved by researchers, there

are still many problems unsolved in this discipline. The objective of this research is to

find an approach, or methodology, to perform building model reconstruction from both


LIDAR data and aerial imagery. In this chapter, relevant research activities will be

reviewed and unsolved problems will be highlighted.

1.2. Building model and model reconstruction

Buildings in the real world have a great variety of forms. In the building research

community, a commonly accepted definition of building models is not available. People

are using their own models for research. Among those studies on building reconstruction,

there are roughly two kinds of building models defined (Förstner, 1999; Maas and

Vosselman, 1999). The first one is the parametric model and the second one is the

generic model. Because the generic model is too abstract, some sub models are proposed

such as the prismatic model, the polyhedral model and the CSG (Constructive Solid

Geometry) model (Förstner, 1999; Haala and Hahn, 1995; Wang, 1999). In reality, the

classification of building models is closely tied with the methods for model

reconstruction. Methods for building model reconstruction are generally classified as

model-driven, data-driven, and CSG methods. Generally speaking, a model-driven

approach deals with parametric building models and a data-driven approach deals with

generic models. A CSG approach is a hybrid of the other two approaches. In this section,

building reconstruction research will be reviewed according to these three major

reconstruction methods, namely, the model-driven approach, the data-driven approach,

and the CSG approach.

The model-driven method uses a finite set of fixed building models. A building

reconstruction system using this method has a building model database. Each model in


the model base serves as a hypothesis in building reconstruction. Such a hypothesis

should be tested and verified from data. Several algorithms and strategies have been

developed to verify a building model hypothesis based on several kinds of information

derived from data. This approach is called the model-driven approach because it starts

from a model as a hypothesis, and it uses data to verify the model. This reconstruction

schema is easy to understand and to implement, but it can only handle simple building

models such as flat-roof and gable buildings. As mentioned above, actual buildings in the

world appear in a variety of forms and a model-driven system cannot model all kinds of

buildings in its model database. Some experimental systems have demonstrated very

good results in reconstructing simple buildings using this method.

The second approach of reconstruction is the data-driven method. This method deals

with generic building models, which are comprised of a series of building surfaces. The

data-driven approach usually follows three steps: 1) extraction of building primitives (the

surfaces of a building); 2) reconstruction of surface topology; and 3) construction of a

building model. This method does not assume fixed building structures; thus it can theoretically handle all kinds of buildings, because buildings in the real world can be represented

as a set of primitives, regardless of whether they are planar facets or curved facets.

Compared with systems using parametric models, a system using generic models is more

difficult to implement.

Since the generic model is very complex and abstract, some sub models are proposed

based on specific distinctions (Förstner, 1999). They are prismatic models, polyhedral

models, and CSG models. Among these, the polyhedral model is the most important one.


It assumes a building is bounded by planar surfaces. This assumption is true for the

majority of actual buildings. In this research, a CSG model will be treated as a hybrid of a

parametric model and a generic model instead of being a sub model of the generic model.

A CSG model divides a complex building model into several simple primitives. Each

primitive is stored in a model base. The primitives play the same roles as the models in

the model-driven method. The critical procedure for a CSG reconstruction method is the

division of a complex building model into primitives. Sometimes it may generate

primitives not existing in the model base.

Building reconstruction systems can be distinguished as semi-automatic and

automatic systems depending on the degree to which a system user needs to interact with the system to guide it in finishing a project (Weidner and Förstner, 1995; Förstner, 1999).

Automatic systems are still at the proposal stage because many problems remain unsolved. Semi-automatic systems have made great progress and generated promising results because users can guide the systems in a reasonable direction and verify the results. User interaction can solve or avoid problems that cannot be solved by the computer itself.

The LIDAR research community is becoming very active in building reconstruction

(Maas and Vosselman, 1999; Maas, 1999a, 1999b, 1999c; Stamos and Allen, 2000;

Alharthy and Bethel, 2002; Haala and Hahn, 1995). One important reason is that

producing a DSM from LIDAR data is more easily automated than producing a DSM by

traditional photogrammetric techniques. Many research works on building reconstruction

begin with the process of a DSM, which is either obtained from LIDAR points or


imagery data. LIDAR data has an advantage over imagery data in deriving a DSM, especially in situations with poor image context. Generally, LIDAR data has a vertical accuracy of 25-30 centimeters, and even as good as 15 centimeters; its horizontal accuracy varies with the horizontal resolution.

1.3. Peer Research

Building reconstruction involves two procedures. One is building detection, and the

other is 3D model reconstruction. Many studies have been carried out using different

types of data. The majority of this kind of research was conducted on aerial photographs.

In this section, the related research will be reviewed and analyzed.

1.3.1 Reconstruction from LIDAR data

Many researchers have reported their works on LIDAR data processing recently.

These works mainly include bald DTM generation, DSM generation, and building

reconstruction. Building reconstruction can be started from original LIDAR point data

(Maas and Vosselman, 1999) or from a grid DSM interpolated from LIDAR point data.

Generally, there are four steps involved to reconstruct building models from LIDAR data

(Alharthy and Bethel, 2002; Axelsson, 1999; Elberink et al., 2000; Maas, 1999a, 1999b,

1999c):

• Data segmentation to distinguish LIDAR points falling onto different objects, particularly to separate points falling on the ground from points falling on non-ground


objects such as buildings and trees. This work can be accomplished using

image filtering algorithms such as morphological filters;

• Building detection to differentiate building points from non-building points

(mainly points on trees) among the non-ground points extracted in LIDAR

segmentation. This task can be accomplished by computing and comparing

the region size, shape and elevation variance. Some LIDAR systems can

record the reflectance from objects together with the range recording. Thus,

reflectance data can also be used in this classification. Other auxiliary data

such as hyper-spectral data can also be used to help building detection

(Elberink and Maas, 2000; Alharthy and Bethel, 2002; Maas, 1999a; Haala and

Brenner, 1999);

• Building reconstruction to generate 3D building models. In this procedure,

either a parametric model or a generic model can be used based on prior

knowledge of buildings. Some primitives are extracted here such as lines and

planes depending on how a building model is represented;

• Model refinement to improve model accuracy. For generic models, this task

involves plane combination, topology and geometry analysis. Due to poor

morphologic quality of LIDAR data, some algorithms or strategies are

employed to refine a building model. An important objective is to get sharp

and regular boundaries, typically rectangular boundaries for a building model.

Those algorithms usually use internal building characteristics like parallelism,

orthogonality, symmetry, and so on.


LIDAR segmentation is a major issue in LIDAR data processing. Several algorithms

have been developed to perform the classification of LIDAR points. To distinguish

ground points from LIDAR point data, morphology filters can be applied based on the

assumption that the ground point height is lower than its neighboring object points.

Another assumption is that the ground is smooth. In other words, there is no abrupt

elevation change on the ground. Some studies on this separation have been carried out

and promising results were produced (Weidner and Förstner, 1995; Morgan and Habib,

2002). However, morphology filters are sensitive to noise. Although a median filter can

be used to decrease the effects from a single error point, the effects from errors in the form

of a point patch cannot be eliminated or decreased. Kilian et al. (1996) used a “multi-

level opening” morphology operator in order to keep small ground features while

removing large non-ground objects. Because small windows have small weights in their

method, small features can still be removed.
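To make the morphology-based idea concrete, the following is a minimal sketch, assuming a gap-free grid DSM, a structuring-element window larger than the largest building, and an arbitrary height threshold; neither the values nor the function is the parameter set or implementation used in this dissertation (see Section 2.2.2.1).

import numpy as np
from scipy import ndimage

def morphological_ground_filter(dsm, window=15, height_threshold=1.0):
    """Separate ground from non-ground cells in a gridded DSM.

    A grey-scale opening (erosion followed by dilation) with a window
    larger than the biggest building suppresses raised objects; cells
    sitting well above the opened surface are flagged as non-ground.
    """
    opened = ndimage.grey_opening(dsm, size=(window, window))
    non_ground = (dsm - opened) > height_threshold
    # ground mask and an approximate bare-earth surface
    return ~non_ground, opened

As the text notes, such a filter is sensitive to noise and to the window size: a window smaller than a building leaves it in the ground class, while a large window can also remove small genuine terrain features.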

“Linear prediction” is a statistical interpolation method. It is employed in LIDAR

data segmentation by researchers to generate digital surfaces (Lohmann and Koch, 1999;

Lohmann, et al., 2000; Kraus and Pfeifer, 1998; Lee and Younan, 2003). Vosselman

(2000) proposed a slope-based method to filter out non-ground points. It is a modification

of the morphological erosion operator. Sithole (2001) modified this method to use different

maximal slope thresholds according to local terrain characteristics. Several other studies

have also been conducted to perform LIDAR segmentation (Matikainen, et al., 2003;

Rottensteiner and Briese, 2002; Lohmann, 2001; Brunn and Weidner, 1997; Axelsson,

1999; Haala and Brenner, 1999; Schiewe 2003).


After building regions are detected, a 3D building model can be reconstructed from

the LIDAR points falling within the detected building regions. Mass et al. (1999)

reported their works using raw LIDAR data. In one method they used, invariant moments

were applied to reconstruct parametric building models. They concluded that high order

invariant moment can be used to derive complex building models but these moments are

sensitive to noise. In their experiment, the 1st and 2nd order moments were used to derive

gable building models including dorms on building roofs. They also employed the

generic model (polyhedral model) to reconstruct buildings. The building planar facets of

a building were detected first using a clustering algorithm. To detect roof facets, a 3D

Hough transformation was performed on a Delauney Triangulation mesh generated from

building roof LIDAR points. The LIDAR data they used has a density of 5 points/m2.

They assumed that the point distribution is homogeneous in order to use invariant

moments. Inhomogeneous point distribution will introduce biases into the derived

building models.
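As a concrete illustration of detecting planar roof facets by Hough voting, the sketch below parameterizes a plane as z = sx·x + sy·y + d and lets every point vote in a discretized (sx, sy, d) accumulator; the slope range, bin counts, and the use of raw points rather than a triangulation are assumptions for the example, not the clustering implementation of Maas and Vosselman.

import numpy as np

def hough_dominant_plane(points, slope_range=(-2.0, 2.0), n_slope=41, n_d=200):
    """Find the dominant plane z = sx*x + sy*y + d by Hough voting.

    For every discretized slope pair (sx, sy), each point votes for the
    offset d that would let the plane pass through it; the accumulator
    maximum identifies the strongest planar roof facet.
    """
    x, y, z = points[:, 0], points[:, 1], points[:, 2]
    sx_vals = np.linspace(*slope_range, n_slope)
    sy_vals = np.linspace(*slope_range, n_slope)
    # offsets for every (sx, sy, point) combination: shape (n_slope, n_slope, n_points)
    d = z - sx_vals[:, None, None] * x - sy_vals[None, :, None] * y
    d_min, d_max = d.min(), d.max()
    bins = ((d - d_min) / (d_max - d_min + 1e-9) * (n_d - 1)).astype(int)
    acc = np.zeros((n_slope, n_slope, n_d), dtype=int)
    for i in range(n_slope):
        for j in range(n_slope):
            np.add.at(acc[i, j], bins[i, j], 1)
    i, j, k = np.unravel_index(acc.argmax(), acc.shape)
    return sx_vals[i], sy_vals[j], d_min + (d_max - d_min) * k / (n_d - 1)

Points supporting the winning plane can be removed and the voting repeated to find the remaining facets of a multi-plane roof.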

Some recent LIDAR systems are capable of capturing multi-pulse information,

especially the first and the last pulses. Alharthy and Bethel (2002) reported their works

conducted on the first and the last pulse laser scanner data. They obtained sound results in

separating vegetation/trees from buildings using these two pulse reflection data because a

building has no or very low reflection in the last pulse while a tree area has a high

reflection due to laser penetration. Other objects like cars were eliminated based on

height and size thresholds. For computation convenience, they calculated the major-

minor directions for a building region using the cross-correlation between a building


region and a template; then the building region was rotated to have a horizontal/vertical

pose.
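A minimal sketch of this two-pulse idea, on gridded data, might look like the following; the threshold values, the grid formulation, and the use of a precomputed ground surface are assumptions made only for illustration, not the actual parameters or workflow of Alharthy and Bethel (2002).

import numpy as np
from scipy import ndimage

def classify_building_candidates(first_dsm, last_dsm, ground,
                                 pulse_diff=2.0, min_height=2.5, min_cells=30):
    """Separate building candidates from vegetation using two-pulse data.

    Vegetation lets part of the laser pulse penetrate, so first and last
    returns differ strongly; solid roofs give nearly identical returns.
    Small or low raised regions (e.g. cars) are removed by height and
    size thresholds.
    """
    height = first_dsm - ground                      # normalized heights above ground
    vegetation = (first_dsm - last_dsm) > pulse_diff
    raised = (height > min_height) & ~vegetation
    labels, n = ndimage.label(raised)                # connected raised regions
    sizes = ndimage.sum(raised, labels, index=np.arange(1, n + 1))
    keep_labels = np.flatnonzero(sizes >= min_cells) + 1
    return np.isin(labels, keep_labels)              # boolean building-candidate mask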

LIDAR data has special characteristics and it needs some particular methodologies to

process. For general LIDAR post-processing, Tao and Hu (2001) gave an overview of

commonly used algorithms. Point densities of LIDAR data used by researchers in the

building reconstruction community are very high; usually the studies were carried out

using LIDAR data of a density of approximate 4 points/m2 in order to get fine building

models. Thus, the cost of LIDAR data acquisition and data processing will be high.

Another disadvantage of LIDAR data is its poor morphological quality; it cannot capture

sharp linear features such as building boundaries. The consequence is that it is difficult to

get high accuracy building models only from LIDAR data if its point density is not high.

1.3.2 Reconstruction from imagery data

Plenty of studies on building reconstruction using imagery data have been reported.

Researchers have explored methods to reconstruct building models using different image

data sources. Basically, research methods using imagery data to reconstruct building

models can be differentiated as methods using monocular images, methods using stereo

images, and methods using multi-images.

Monocular imagery

Monocular imagery is usually used for building detection rather than building

reconstruction. Although some studies have been conducted to reconstruct 3D building

models from monocular imagery, the reconstructed building models are quite simple.

Generally, there are two commonly used clues in monocular imagery related research:


building shadows and vertical walls. These two clues are very useful in detecting

buildings and in verifying building hypotheses. In order to reconstruct 3D building

models, some auxiliary information is necessary such as the sun angle and the flight

height. Lin et al. (1995) used a perceptual grouping approach to generate, select, and

verify a building hypothesis. They processed oblique view images in order to use vertical

walls as detection clues and verification criteria in their experiments. They extracted

edges and grouped them to form parallel pairs as primitives for building detection. Their

further studies were focused on model verification and error correction (Nevatia et al.,

1997). They built a system through which system users can interact with it so that users

can guide the system to produce results qualitatively and quantitatively. A qualitative

interaction indicates a problem, whether it is a missing building or a falsely detected

building. A quantitative interaction performs spatial or geometric corrections. They

concluded that user interaction could dramatically improve the accuracy of system

outputs. Xu et al. (2002) reported their work using Hopfield Neural Network reasoning

in building reconstruction. A Gabor filter was employed to eliminate noisy edges; and

then a Normalized Central Contour Sequence Moment (NCCSM) was used to pick up

regular contours, which are building boundary candidates. After the regular contours

were generalized using Hough Transformation, a Hopfield Neural Network was applied

to reconstruct building models. The algorithm was tested on flat-roof and gable buildings.

The sun angle and the approximate flight height related to the image under study are

needed in their experiments.


McGlone and Shufelt (1994) used a projective imaging geometry to extract

buildings and to estimate building parameters from monocular aerial images. They

calculated vanishing points using the Gaussian Sphere technique to detect horizontal

and vertical lines that are candidates for building edges. They detected corners from

perpendicular lines; then they used perpendicular line pairs to form boxes, which are

building hypotheses to be confirmed using clues. By geometric consistency

checking, they eliminated some false hypotheses. For surviving hypotheses, they

estimated shadow intensity to verify building models. The height of a building model

was estimated from its roof points in object space.

These methods using monocular imagery usually assume that the surrounding

ground of a building is flat and level. This assumption is very strict. These methods cannot deal with the occlusion problem either. In addition, they can only reconstruct simple building

models such as flat-roof and symmetric gable buildings.

Stereo and multiple images

For methods using stereo images, researchers usually follow a procedure of

building detection, and then model reconstruction. Weidner and Förstner (1995) used

stereo images to construct high resolution DSMs. From this DSM, they performed

image segmentation using gray-scale morphology operators to detect building

boundaries. The geometric constraints in the form of a parametric building model

were applied in their experiments. They also developed a new MDL-based (Minimum

Description Length) approach to regularize the polygonal ground building footprint.


They used the parallelism and perpendicularity characteristics of a building boundary

to eliminate noise introduced by the conversion from raster to vector.

Haala and Hahn (1995) reported their studies using stereo images. A DHM

(Digital Height Model) was generated from stereo images; then buildings were

initialized from the DHM as the regions with a local height maximum. 3D line

segments were generated from matched 2D stereo image edges. They used parametric

building models in their research. Building models were compared with extracted 3D

line segments. They estimated the parameters of a building model by minimizing

the distances between model lines and the calculated 3D lines from stereo images.

The problem with this method is the poor morphological quality of the stereo-image-derived DHM, which introduces errors into the building models.

Elaksher and Bethel (2002) used multiple images to extract 3D building wire-

frames with a robust multiple image line-matching algorithm. They intended to

overcome the occlusion problem existing in stereo image processing. The image

regions from segmentation were classified based on region shape, size, and spectral information. Roof regions are then matched pair-wise among multiple images using the Scott and Longuet-Higgins algorithm (Scott and Longuet-Higgins,

1991; Pilu and Lorusso, 1997).

Elaksher (2002) elaborated his research on building reconstruction in his Ph.D.

dissertation. He used multiple images in reconstructing building models. The

primitives he used for building reconstruction are homogeneous regions because

region matching is more robust than point and line matching. He extracted


homogeneous regions using the split-and-merge methodology from each image; and

then he matched these homogeneous regions pair-wise among multiple images. One

major objective of his research is to solve the occlusion problem in object

reconstruction using photogrammetric technology. A neural network algorithm

was employed to distinguish roof regions from these extracted image regions using

the indices he derived, which are height and linearity. The height information came

from a high accuracy DSM. He projected the DSM into the image space; then he used

the height information in roof region classification. Another criterion for roof region

classification is the linearity of region borders. The linearity indicates a percentage of

the points that can be represented by a linear segment. Finally, he used geometric

constraints to merge adjacent corners or points on roofs in order to obtain a correct

building model topology.
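The border-linearity criterion can be illustrated with a small sketch using the shapely geometry library: the region border is coarsely simplified into its dominant straight segments, and the linearity is the fraction of original border points that still lie close to those segments. The tolerances and the Douglas-Peucker simplification are placeholders chosen for the example, not Elaksher's formulation.

import numpy as np
from shapely.geometry import LineString, Point

def border_linearity(border_points, simplify_tol=3.0, fit_tol=1.0):
    """Fraction of border points lying on long straight segments.

    The border is simplified (Douglas-Peucker) so only its dominant
    segments remain; the share of original points within a tight
    tolerance of that simplified outline is returned. Regular roof
    outlines score close to 1, ragged tree crowns much lower.
    """
    outline = LineString(border_points).simplify(simplify_tol)
    near = [Point(p).distance(outline) <= fit_tol for p in border_points]
    return float(np.mean(near))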

Nevatia et al. (1997) also used multiple images in their research. But they

reconstructed building models from a single aerial photograph. Reconstructed

building models were then projected to other aerial photographs so that they could

verify and refine these building models. They reconstructed building models from

each of the multiple images, and then verified and integrated the models using

information from other photographs.

There are several other research works using stereo or multiple images for

building reconstruction. Zimmermann (2001) used multiple clues to isolate, locate

and identify buildings. The clues include color, edge, textures and a DSM. Brunn

(2001) extracted buildings using statistical methods. He detected and reconstructed


building models using a Bayesian Network, then he refined the models using the

Markov-Random-Fields. Scholze et al. (2001) used high-resolution color stereo

images to reconstruct building models based on a polyhedral model. 3D lines were

grouped into plane patches and the Bayesian technique was used to generate and

verify building hypotheses. They employed a bootstrap strategy to iteratively generate, verify, and improve hypotheses from those plane patches that passed the Bayesian test, until a complete building model hypothesis was found. Fuchs

(2001) proposed a structural approach for building reconstruction aiming at dealing

with mid-level 3D features in a unified framework. The roof shapes were represented

using an attributed relational graph. Frère et al. (1997) reconstructed polyhedral

building models from multiple images. The primitives they used for reconstruction

were homogeneous regions. Spreeuwers et al. (1997) used a model driven approach

for building reconstruction. Building models were compared and verified based on

the clues extracted from multiple images.

Basically, the methodology of building reconstruction based solely on imagery

data has a low degree of automation. Systems based on this methodology still need much guidance or interaction from users in order to get accurate and robust results. The

model-driven approach is better than the data-driven approach because the former has

more prior knowledge of a building model.

The research studies mentioned above usually assume some conditions such as

homogeneous point distribution, level and flat surrounding ground, and no abrupt

height changes on roofs. In addition, the reconstructed building models usually are


simple building types. In order to make a system handle more complex building

models, some auxiliary data should be used.

1.3.3 Reconstruction from LIDAR, imagery and other auxiliary data

It is believed that the synergy of data from different sources will give more

effective information than the sum of the individual data sets (Csathó et al., 1999; Schenk, 2002). The

two technologies, LIDAR and photogrammetry, are treated by researchers as

complementary to each other (Baltsavias, 1999). The integration of both technologies

is believed to lead to more accurate and complete products (Baltsavias, 1999). But

currently, there is no hardware integration to simultaneously capture LIDAR data and

imagery data that have the same accuracy level as traditional aerial photographs.

Some research works tried to integrate LIDAR data, imagery data and GIS data from

different sources for building reconstruction. How to fuse or integrate the data is an

important and active research topic.

Haala and Brenner (1999) reported their works on building and tree extraction in

urban areas using both LIDAR data and imagery data. They used multi-spectral

imagery data and LIDAR data to classify buildings, trees and grass-covered areas;

then they used the LIDAR data and building ground plan data to reconstruct building

models. The ground plan data has the basic information of a building, especially the

boundary information. They assumed that the ground plans are correct and exactly

define the boundary of building roofs. This assumption would not work in cases

where the ground plan data does not match the LIDAR data.


Stamos and Allen (2000) reconstructed building models using LIDAR data and

images. Both data sets were obtained from ground platforms. LIDAR data was

segmented to identify planar facets. Linear features were extracted from both range

data and images. These linear features were used to co-register images with LIDAR

data. Imagery data was projected onto the 3D building models. Using this approach, they

built a geometric and photogrammetric 3D scene. Because the LIDAR data they used

are very dense and highly accurate, fine linear features can be extracted directly from

it.

McIntosh and Krupnik (2002) presented their research work on generating

accurate surface models. A laser-derived DSM has poor textural and structural

information. Thus, they tried to improve the DSM quality from image information.

They extracted 3D line segments from stereo image processing; and they registered

the 3D line segments with the DSM derived from laser data. Those 3D lines acted as

discontinuity lines and were used to improve the surface model. The DSM was

improved from a TIN model, which was generated from laser point data using the 3D

line segments as break lines. They provided a good example of data fusion for

LIDAR data and imagery data.

Vosselman and Suveg (2001) used ground plan data and LIDAR data to

reconstruct building models. They decomposed a building ground plan into polygon

segments. Each segment indicates a planar facet of a building roof. Different segment

composites were tested. Each segment was used to extract LIDAR points. The

parameters of a planar surface can be derived from the extracted points using the least


squares method. The topology of these planes was analyzed and intersection lines were

derived, as were the corners. The planar surfaces from each segment were analyzed

and tied together using CSG operators such as the union operator and the intersection

operator. They also used ground plan data and aerial images to reconstruct building

models. The second approach is more model-driven oriented. For each segment, three

building hypotheses were generated. 3D lines were extracted from stereo pairs with

the constraints from the ground plan data. The gradients of images were calculated.

The edges of building hypotheses were projected onto images. Visible projected

edges on images were compared with the gradients to verify the hypothesis and to

compute building model parameters. They also pointed out that the fusion for LIDAR

data and image data will be very promising due to the fact that the two data sets are very

complementary.
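The plane-fitting step mentioned above can be written as an ordinary least-squares problem: for the LIDAR points extracted by one ground-plan segment, solve z ≈ a·x + b·y + c for the roof-plane parameters. The small sketch below is a generic formulation given only for illustration, not Vosselman and Suveg's implementation.

import numpy as np

def fit_roof_plane(points):
    """Least-squares fit of a plane z = a*x + b*y + c to LIDAR points.

    `points` is an (n, 3) array of the returns falling inside one
    ground-plan segment; the coefficients describe the corresponding
    roof facet.
    """
    A = np.column_stack([points[:, 0], points[:, 1], np.ones(len(points))])
    coeffs, _, _, _ = np.linalg.lstsq(A, points[:, 2], rcond=None)
    return coeffs  # (a, b, c)

Two such fitted planes can then be intersected analytically to obtain a roof ridge, which is one way the intersection lines and corners mentioned above can be derived.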

Csathó et al. (1999) proposed a theoretical framework of data fusion for aerial

images, LIDAR data, and other multi-sensor images in order to obtain more

information for object recognition, especially building reconstruction. The fusion can

be performed at different levels, namely the data level, the feature level and the object

level. Multi-spectral data can be classified and split into different class regions; and

region boundaries can be extracted. Surfaces will be constructed from LIDAR data

using a perceptual organization methodology. Edges extracted from stereo aerial

images can be used as discontinuity lines to match LIDAR surfaces. Thus, a LIDAR

surface can be improved. Objects will be extracted from the surfaces and multi-

spectral classes. They proposed that objects could also be analyzed and integrated,


which is a kind of fusion at the object level. Schenk and Csathó (2002) fused LIDAR,

aerial images, and hyper-spectral images for object recognition. Similar results were

also reported by Schenk (2002). The hyper-spectral data he used is AVIRIS (Airborne

Visible/Infrared Imaging Spectrometer). Seo (2002) used LIDAR data to extract

contours; and he classified these contours to distinguish building regions. He used

point, line and region as feature primitives and compared the features from different

data sets.

Data fusion for LIDAR data and aerial images is usually performed to take

advantage of both height information and context information. Actually, a great

percentage of stereo-image based studies in building reconstruction used a DSM,

which was generated from stereo images. LIDAR can provide much better quality

height information, and it is easier to process automatically than the photogrammetric approach. In comparison, photogrammetry can provide much better

surface discontinuities (or break lines). A fusion involving other GIS data like

building ground plans is also very helpful as demonstrated from reported

experiments. Generally, integration of data from different sources can provide an

effective or even more efficient approach for building reconstruction.

1.4. Problem statement

The ultimate objective of building reconstruction is to automatically reconstruct

building models. The data used can be LIDAR data, imagery data, and other auxiliary

data. Despite the achievements of active research in the last two decades, there are


still many problems that must be solved before an automatic system can be realized. The major problems can be summarized as follows:

1. Monocular image based approaches usually cannot deal with complex building

models. They cannot deal with a scene with high relief because shadow clues will

introduce great errors in building hypothesis verification. In addition, vertical

walls are not always available to be used as clues. Urban areas are not suitable for

applying the monocular approach. The accuracy of reconstructed building models is low because less information is available in a single image than in stereo or multiple images.

2. In urban areas, the occlusion problem is not fully solved by researchers regardless

of whether a monocular image based or a stereo image based approach is used.

Multi-image based approaches could be an alternative to overcome the occlusion

problem as demonstrated in some experiments, but they increase the expense of

data acquisition and processing. Still, complex building models are difficult to

reconstruct.

3. LIDAR data has high vertical accuracy at the level of 15-30cm or even better, but

its horizontal accuracy is at meter level depending on specific applications. The

majority of the works on building reconstruction addressed by researchers used

high-density LIDAR data, usually around 4 points per square meter. This

increases the expense for data acquisition. LIDAR data has poor structure or

texture information, thus it is difficult to extract accurate sharp boundaries of

objects solely from LIDAR data.


4. Aerial photographs generally provide higher horizontal accuracy than LIDAR

data (Ackermann, 1999). They can provide plenty of texture and structure

information about buildings and accurate edges can be extracted from imagery

data. But building boundaries are usually not complete due to poor contrast of

optical reflectance from adjacent but different objects. Furthermore, processing of

image data is difficult to automate. A problem with imagery data

based approaches is that building detection is an expensive and low-accuracy task.

5. Complex building models have not been fully investigated yet. Polyhedral or CSG models have potential in complex building reconstruction. A particular problem is the detection of vertical facets or height-jump lines within a building complex. In reality, this also applies to imagery-based approaches.

6. Data fusion has been applied to some extent, but it has not been fully explored yet.

Further research should be performed to investigate how to integrate data, features

and objects at different levels.

In general, there are three steps for building reconstruction, namely building

detection, building model reconstruction, and model refinement. Because a surface is

easy to generate and human-made objects can be recognized on a surface, building reconstruction usually starts from processing a DSM when one is available. LIDAR

technology has the advantage in providing a high vertical accuracy DSM and a great

degree of automation in data processing. Meanwhile, photogrammetry data can provide

plentiful structure and texture information. Both technologies also have their own

disadvantages. LIDAR data has poor structure information; it cannot capture sharp


features such as break lines (Ackermann, 1999), while photogrammetry has difficulties

with object recognition due to image interpretation complexity and data processing cost.

As addressed by several researchers, it is a trend in the photogrammetry community

that imagery data and LIDAR data be combined together for industrial application. As

Ackerman stated in 1999, it would be a revolution in photogrammetry if imagery data

could be directly combined with spatial position data, specifically a LIDAR-derived digital

surface model.

The research presented here will describe and analyze a new method to integrate

LIDAR data and aerial imagery data to take advantage of both kinds of data. The DSM

used in this study is derived from LIDAR data. The LIDAR data and the aerial

photographs were acquired separately.

1.5. Research Focus and Methodology

The objective of this research is to reconstruct 3D building models from imagery and

LIDAR data. The images to be used are stereo aerial photographs with known imaging

orientation parameters so that 3D ground coordinates can be calculated from conjugate

points and 3D ground objects can be projected to image spaces. To achieve this objective,

a method of synthesizing both imagery data and LIDAR data will be explored; thus, the

advantages of both data sets can be utilized to derive 3D building models with a high

accuracy. In order to reconstruct complex building models, the polyhedral building model

will be employed in this research. Correspondingly, the reconstruction method is data-

driven.


The general research procedure can be summarized as: a) building detection from

LIDAR data; b) 3D building model reconstruction; c) LIDAR data and imagery data co-

registration; and d) building model refinement. The main role of aerial image data in this

research will be to improve the geometric accuracy of a building model. With a point density of approximately 1 point/m², buildings can be detected through LIDAR segmentation. In this research, new algorithms will be developed to perform LIDAR segmentation and to differentiate buildings from other non-ground objects such as trees. The features of a building model reconstructed from LIDAR data have different geometric accuracies. Edges generated from roof intersection, such as the ridge of a gable-roof building, will have high accuracy. However, vertical walls will have low accuracy and need to be refined with the help of aerial image data. One important challenge of this research is to derive a consistent and topologically correct building model with sufficient detail, even though its geometric accuracy may not be very high. The expected contributions of this research lie in three aspects: 1) an effective and

efficient approach to detecting building regions from LIDAR data; 2) a well-organized

methodology used to reconstruct 3D building models from LIDAR data; and 3) a well-

developed methodology for integrating LIDAR data with imagery data to improve the

accuracy of reconstructed building models from LIDAR data.

1.6. Fundamental Concepts

This section provides the basic terms, concepts, and technologies that will be used in

this research.


1.6.1. LIDAR vs. Photogrammetry

Photogrammetry is defined by ASPRS (American Society for Photogrammetry and

Remote Sensing) as “the art, science, and technology of obtaining reliable information

about physical objects and the environment through processing of recording, measuring,

and interpreting photographic images and patterns of recorded radiant electromagnetic

energy and other phenomena” (Wolf and Dewitt, 2000). The information can be collected

from terrestrial, aerial, and space-based platforms. The media for recording and storing information can be film or electronic chips. The most commonly used photographs are aerial photographs. According to the imaging geometry, aerial photographs can be classified as vertical, low oblique, and high oblique photographs. See Figure 1.1.


Figure 1.1. Imaging geometries of vertical, low oblique and high oblique photograph (from left to right)

Photographs record spectral or electromagnetic information from the object space. The object space is continuous while the image space is discrete; thus, recording is a sampling process from the continuous object space to the discrete image space.

The sampling interval depends on the resolution of the media and the scale of the

photograph, which is usually called the ground resolution. Finer resolution will record

more details of the object space.


The objective of photogrammetry is to reconstruct object information in the object

space from image information. From the geometry of imaging, it is easy to understand

that the imaging is a process of information transformation from 3D space to 2D space.

The forward mapping from 3D space to 2D space is unique: each point in 3D space corresponds to exactly one point in 2D image space. However, the inverse transformation from 2D space to 3D space is one-to-many: each image point corresponds to many possible points in 3D space. Thus, the reconstruction of 3D space cannot be

achieved from a single 2D image. In order to reconstruct the object space, stereo images

are utilized. See Figure 1.2. The fundamental principle used in photogrammetry is the

collinearity of the three points, i.e., the perspective center, image object (point) and the

object (point) in object space.
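For reference, the collinearity condition can be written in its standard photogrammetric form (a textbook formulation, not a formula specific to this dissertation): with perspective center $(X_S, Y_S, Z_S)$, rotation matrix elements $r_{ij}$, principal point $(x_0, y_0)$, and focal length $f$,

$$x - x_0 = -f\,\frac{r_{11}(X - X_S) + r_{12}(Y - Y_S) + r_{13}(Z - Z_S)}{r_{31}(X - X_S) + r_{32}(Y - Y_S) + r_{33}(Z - Z_S)}, \qquad y - y_0 = -f\,\frac{r_{21}(X - X_S) + r_{22}(Y - Y_S) + r_{23}(Z - Z_S)}{r_{31}(X - X_S) + r_{32}(Y - Y_S) + r_{33}(Z - Z_S)}.$$

Writing these equations for the same ground point in both images of a stereo pair gives four equations in the three unknown ground coordinates $(X, Y, Z)$, which is the space intersection illustrated in Figure 1.2.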


Figure 1.2. Stereo images (left) and space intersection for 3D space object reconstruction from stereo images (right)

According to the definition of photogrammetry, LIDAR (Light Detection And

Ranging) belongs to the scope of photogrammetry. However, due to its special

characteristics, LIDAR is usually treated as a separate technique different from

photogrammetry. The light used in a LIDAR system is laser. There are two kinds of laser


systems: the pulse laser system and the continuous-wave laser system. The commonly

used one is the pulse laser system.

A photogrammetry image is compact in space; the data looks like a tile covering a

study area. LIDAR data is different. It captures random points on the ground although the

distribution of the points may demonstrate a regular pattern. See Figure 1.3. The major

data that a LIDAR system delivers is the 3D coordinates of the measured point. Some

systems can also record the intensity of the reflected pulse and generate reflectance data. The reflectance data can be used to generate a reflectance image using an interpolation method. It is easy to distinguish an aerial image from a LIDAR reflectance image; the latter has very coarse texture information. Figure 1.4 shows an example of LIDAR height data

and LIDAR reflectance data.


Figure 1.3. Imaging geometry of a LIDAR system


Figure 1.4. LIDAR height data (left) and reflectance data (right)

Accuracy is always a critical issue in both photogrammetry and LIDAR technologies.

The ultimate goal of both technologies is to reconstruct 3D information in object space.

Photogrammetry uses two 3D rays to get 3D point information by intersection. LIDAR

uses one 3D line segment and one point (the laser emitter) to determine a 3D point. The

errors in photogrammetry come from the interior orientation, the exterior orientation, and

the measurements. For LIDAR data, the errors come from the attitude of the 3D line

segment, the length of the line segment, and the footprint of a laser point. Different errors

demonstrate different patterns in ground position. Some researchers have published

papers containing detailed information about basic LIDAR system formulas (Baltsavias, 1999; Wehr and Lohr, 1999). Related material about error propagation can be found in Schenk (1997) and Schenk (2000).


1.6.2. DTM vs. DSM

DTM refers to a digital terrain model while DSM refers to a digital surface model. A DTM depicts the topography of the bare earth. In a DSM, both natural and man-made objects are captured in the topography. See Figure 1.5.


Figure 1.5. DTM (left) and DSM (right)

Both photogrammetry and LIDAR data can be used to generate DTM and DSM.

However, the direct product from these two techniques is DSM. In order to generate a

DTM, filtering algorithms are employed to remove natural and man-made objects from a DSM. Morphological filtering is one commonly used algorithm for filtering out non-ground objects to generate a DTM. A DTM can be used for ground planning applications such as flood control, while a DSM can be used to detect objects on the earth's surface. The difference between a DSM and a DTM is usually called the normalized DSM; it keeps information about non-terrain objects such as buildings, trees, and cars. Both the DTM and the normalized DSM are derived from the DSM. Thus the

first data set to be processed for object reconstruction, especially building reconstruction,

is a DSM.


1.6.3. Building Detection and Building Reconstruction

Building detection refers to the process of differentiating buildings from other objects

measured within data. Taking an aerial image as an instance, which area or region on the

2D image is a building? This process is a qualitative process. An image can be segmented

into regions using algorithms; and each region can be analyzed and classified as an object

according to its spectral characteristics. Color and multispectral images are used in building detection because they capture more spectral information than black-and-white

images. LIDAR can also be used for building detection. LIDAR data maps the

topography of the earth’s surface. Information about a building such as shape and

parallelism can be derived. These internal characteristics can also be calculated from

images.

Building reconstruction is the process of deriving building model parameters. The

commonly used building model is a CAD model, which has specific parameters such as

height, width, direction and other necessary information to reconstruct a building model.

These parameters cannot be captured by aerial images or LIDAR data directly; they must be calculated through complex spatial and topological analysis. In this research, building detection and building reconstruction are addressed separately.

1.7. Organization of this Dissertation

This dissertation is organized into 6 chapters. Chapter 1 (the current chapter)

addresses the background related to the research.


Chapter 2 describes two new algorithms developed to perform LIDAR segmentation.

It illustrates how to extract building regions from LIDAR data. The so-called “height-

jump” and “planar-fitting” algorithms are developed and elaborated.

Chapter 3 presents a method for 3D building model reconstruction using a polyhedral

model. It describes a methodology for building model primitive construction, model

topology construction, and model reconstruction.

Chapter 4 elaborates an approach to building model refinement through the

integration of LIDAR data and imagery data. The focus of the refinement is the geometry

of a building model instead of its topology.

Chapter 5 demonstrates experimental results involved in this research to show how

the algorithms developed here work.

Chapter 6 concludes the dissertation research. It highlights the contributions and analyzes the shortcomings of the research. It also suggests directions for further research on this topic.

CHAPTER 2

BUILDING DETECTION FROM LIDAR DATA

Building detection from LIDAR data is part of the LIDAR segmentation process. In

order to derive 3D building models, building regions will be detected and extracted from

LIDAR data first. In this study, two methods are developed to detect building regions in a

LIDAR segmentation process. These two methods are described in this chapter. The

comparison between these two methods and the morphology method is also presented.

The whole process of DTM generation and building detection can be illustrated in a

flowchart shown in Figure 2.1.


Figure 2.1. DTM generation and building detection from LIDAR data


2.1. Conventional Terms

Morphology operators: Morphology operators include a group of filters. The

commonly used ones are open, close, dilation, and erosion. A basic characteristic of these operators is to order the pixel values within a neighbor defined using structure elements and to assign a specific value to the pixel under test. Morphology

operators are commonly used in binary image data. When these operators are

applied to gray scale images, they are also called gray morphology operators.

Structure elements: Structure elements of a morphology operator define the

neighbor of a pixel under analysis. For example, a 3 by 3 square window defines

the pixels directly adjacent to the central pixel as the neighbor of the central pixel.

The structure elements of a morphology operator can be of other shapes such as a

circle and a cross.

Neighbor: A neighbor of a point is defined as points within a distance threshold to

the point under study. In different processes, different distance thresholds can be

applied. Points falling in a point’s neighbor are used in analysis.

Ground region: A ground region includes roads and open ground. Bridges are also

included in a ground region. In an urban or a suburban area, it is usually the

largest area.

Building boundary: In this study, a building boundary refers to a building's 2D

footprint on the ground.


2.2. DSM and DTM Generation

The original LIDAR range data are random points and it is not convenient to perform

building detection and DTM generation directly from these points. Two intermediate products are commonly used by researchers for further processing. One is a grid-format image, and the other is a Delaunay triangulation, the latter also referred to as a Triangulated Irregular Network (TIN). These are two forms of representing a 2.5D surface. LIDAR range data measure the surface exposed to the LIDAR antenna. For airborne LIDAR data, this includes the surfaces of objects on the ground and the parts of the earth's surface exposed to the antenna. In the GIS and remote sensing communities, this surface is called a Digital Surface Model (DSM). Rather than the DSM itself, the useful products for a GIS system are the Digital Terrain Model (DTM) and information about objects sitting on the ground. Currently, how to separate the DTM and objects from a DSM has become a popular

research topic, especially in the LIDAR research community. It is usually referred to as

LIDAR data segmentation.

In order to detect buildings, the DTM will be generated first, and then, a normalized

DSM (NDSM) will be generated for building detection. An NDSM is a surface relative to the DTM; objects within an NDSM can be viewed as sitting on a level plane. Building detection is then conducted on such an NDSM. In this research, the data format used is the grid format. Two reasons for choosing the grid format over the TIN model are: 1) it is simple to process; and 2) algorithms for preliminary processing are readily available. The grid

format is simple; the spatial and topological relationships among pixels are easy to


calculate. In addition, there are many mature image processing algorithms that can be

applied to grid format LIDAR data.

2.2.1. Transformation of Points to Grid

To generate a grid DSM from random points, a transformation between the ground

horizontal coordinate system (X, Y) and the image coordinate system (i, j) is applied. For convenience, the X and i axes of the two systems point in the same direction while the Y and j axes point in opposite directions (see Figure 2.2). The

transformation between these two systems is expressed in equation 2.1:

$$i = \operatorname{Integer}\!\left(\frac{X - X_{\min}}{step}\right), \qquad j = \operatorname{Integer}\!\left(\frac{Y_{\max} - Y}{step}\right) \tag{2.1}$$

Figure 2.2. Conversion from points to grid


The step, or the resolution, of a grid is calculated as the inverse of the average density

of LIDAR points (see next paragraph for details). For each pixel (i, j) of a grid, its value

is assigned as the value of the LIDAR point that falls into the pixel calculated through

equation 2.1. However, due to uneven distribution of LIDAR points, some pixels have no

corresponding LIDAR points, while some pixels have more than one corresponding

LIDAR point. If a pixel has no corresponding point, an interpolation method will be

applied to derive its pixel value. If more than one point falls within a pixel, only the

minimum value is assigned to the pixel. The basic steps can be described as follows:

a. Calculate the maximum and minimum X and Y coordinates. Determine the spatial

resolution of the grid according to the range point density: with approximately $n$ points per unit area, the resolution is about $1/\sqrt{n}$;

b. Using equation 2.1, for each LIDAR point, the corresponding grid pixel location

is calculated and the Z value of the point is assigned to the grid pixel. During this

process, a test is performed. If a pixel is already assigned a value from another

point, this LIDAR point value is compared with the existing pixel value, and the

smaller value will be assigned to the pixel;

c. After each LIDAR point is processed, the value of a single vacant or an empty

pixel will be derived from its neighboring pixels using an interpolation algorithm.

In this research, the nearest neighbor method is applied to avoid introducing new

height values into the generated DSM. For a large empty patch like a pond, it will


remain empty. By doing so, new elevation values will not be introduced so that

the grid data will not be smoothed out.

From the procedure listed above, it can be seen that new height values are avoided.

The reason for avoiding new height values is that the DSM will be used to detect and

reconstruct building models; thus, it is preferable to keep the original height values within

the DSM instead of smoothing out the height differences by introducing new values.
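The following sketch illustrates this point-to-grid conversion under the assumptions above; the array layout, the NumPy/SciPy routines, and the simple nearest-neighbor fill are illustrative choices rather than the implementation used in this research:

```python
import numpy as np
from scipy import ndimage

def points_to_grid(points, step):
    """Convert LIDAR points (N x 3 array of X, Y, Z) to a grid DSM.

    When several points fall into one pixel, the minimum Z is kept;
    empty pixels are filled from their nearest valid neighbor so that
    no new height values are introduced (equation 2.1 gives the indices).
    """
    x, y, z = points[:, 0], points[:, 1], points[:, 2]
    xmin, ymax = x.min(), y.max()
    cols = int(np.floor((x.max() - xmin) / step)) + 1
    rows = int(np.floor((ymax - y.min()) / step)) + 1

    dsm = np.full((rows, cols), np.inf)
    i = ((x - xmin) / step).astype(int)      # column index, equation 2.1
    j = ((ymax - y) / step).astype(int)      # row index, equation 2.1
    np.minimum.at(dsm, (j, i), z)            # keep the lowest point per pixel

    # Nearest-neighbor fill for empty pixels; large empty patches such as
    # ponds would need extra handling to remain empty, as described in the text.
    empty = np.isinf(dsm)
    idx = ndimage.distance_transform_edt(
        empty, return_distances=False, return_indices=True)
    return dsm[tuple(idx)]
```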

Figure 2.3 depicts the DSM generated from LIDAR range points using the method

described above.

Figure 2.3. View of DSM from LIDAR range points (2D at left and 3D at right)

2.2.2. LIDAR Data Segmentation

After a DSM is generated from range data, LIDAR data segmentation is conducted to

separate points falling on the bare earth from points falling on objects such as buildings, cars, trees, and other natural and human-made objects. Due to the great diversity of natural phenomena, no single algorithm can work in all situations. Algorithms are usually application-dependent; in other words, they are developed to solve specific

problems using specific data. Thus, the data used should be analyzed and an appropriate

algorithm should be employed for best performance. Several algorithms have been

developed by researchers for LIDAR data segmentation.

Based on an assumption that a ground point’s height is lower than its neighboring

object points, morphology filters can be applied to distinguish ground points in a LIDAR

data set. Figure 2.4 shows a profile demonstrating the difference between ground and

non-ground objects. Another assumption is that the ground is smooth. In other words,

there is no abrupt change on the ground. Some studies on this separation have been

carried out and good results were produced (Weidner and Förstner, 1995; Morgan and

Habib, 2002). However, morphology filters are sensitive to errors. Although a median

filter can be used to decrease the effect of single error points, errors in the form of a point patch cannot be eliminated or reduced. Kilian et al. (1996) used a "multi-level opening" morphology operator in order to keep small ground features while removing large non-ground objects; because the small windows they used have small weights, fine features can still be removed.


Figure 2.4. A profile depicting the difference between ground and non-ground objects like building and trees

Axelsson (1999) presented an idea to separate ground and non-ground points from

LIDAR data. The idea is to move a surface upward from below the LIDAR point cloud until it touches the ground surface. Controlled by a few parameters, the moving surface can adjust itself

to include points on the ground.

“Linear prediction” is a statistical interpolation method. It is employed in LIDAR

data segmentation by researchers to generate digital surfaces (Lohmann and Koch, 1999;

Lohmann et al., 2000; Kraus and Pfeifer, 1998; Lee and Younan, 2003). The

interpolation is founded on spatial correlations of neighboring points, which are

expressed in the form of covariance. A covariance is calculated using a covariance

function based on the distance between two points. By comparing the original digital

surface with the predicted one that is generated using the “linear prediction” method,

points’ weights are calculated according to their residuals. The weights are also used in

calculating a covariance matrix. A DTM is generated from the terrain points with their

weights. Iterative execution is necessary to obtain a high accuracy DTM using the “linear


prediction” method. An initial DTM surface is required in order to use this method,

which can be calculated using a moving plane.

Vosselman (2000) proposed a slope-based method to filter out non-ground points.

The method is a modification of the morphology erosion operator. A point is classified as

a ground point if the maximal slope of the vectors connecting this point under test to all

its defined neighbors does not exceed the maximal slope within the study area. Sithole

(2001) modified this method to use different maximal slope thresholds according to local

terrain characteristics. A rough slope map is required to calculate the local slope

threshold. As with morphology operators, the problem with these methods is how to

define the neighbor of a point.

Several studies have been conducted to investigate building detection from LIDAR

data. After performing a region-growing segmentation, Matikainen et al. (2003) used

height information to separate trees and buildings from the ground; then a fuzzy

classification method was applied to detect buildings based on three attributes: the Gray

Level Co-occurrence Matrix (GLCM) homogeneity of height; the GLCM homogeneity of

LIDAR intensity; and the average length of edges from a “shape polygon” derived from

the segment under test. Rottensteiner and Briese (2002) used the height difference

between a DSM and a DTM, a morphological opening operator, and a size measurement

to detect buildings and tree groups. A polymorphic feature extraction method was then

used to detect “point-like” pixels. Based on analysis of the number of “point-like” pixels

of each segment, tree groups were eliminated. This whole process was repeated before

final building regions were generated. Lohmann (2001) investigated the Gaussian


Laplace (GL) filter to detect break lines like dike edges. He proposed to use the mean

curvature to overcome the difficulty in determining thresholds for a GL filter result. This

method could be used to detect building outlines. Brunn and Weidner (1997) used a

Bayesian Network classification to detect buildings based on the height difference

between a DSM and a DTM, detected steep edges, and surface normal variation.

Axelsson (1999) used a classification method based on the minimum description length

criteria to separate buildings and trees. A cost function was calculated from the second

derivatives of the surface.

The measurement delivered directly in LIDAR data is the height information of the

earth’s surface. Other information such as texture can be derived based on neighboring

points, and this kind of information can be used to perform LIDAR data segmentation

(Elberink and Maas, 2000; Lohmann, 2001). Textures that can be derived from height

information are slope, variance, aspect, and so on.

From the information delivered in a DSM, an important yet simple characteristic is that non-ground objects are higher than their neighboring ground region. Figure 2.4 shows a profile demonstrating this characteristic. In this research, two algorithms were developed based on this characteristic to perform LIDAR data segmentation. The first algorithm is based

on the observation that objects are higher than the ground, and the ground and buildings

can be defined as planar surfaces locally. The second algorithm is based on the

observation that objects are separated, in other words, isolated from the ground by their

boundary points, which have high elevation differences within their neighborhood, such


as a 3 by 3 window. These two algorithms were tested and the results were compared

with the result from the morphology method.

2.2.2.1. Morphology Segmentation

The procedure using a gray morphology operator to perform LIDAR segmentation can be summarized as follows:

a. Based on a prior knowledge of the largest building size within a study area, the

window size of a morphology operator is determined to be larger than the largest

building. This window defines the neighbor of a pixel under analysis.

b. For each pixel of a DSM grid, the pixels within its neighbor are checked and their

height values are compared. The minimum value in its neighbor is assigned to it.

This operation is called erosion.

c. After the erosion is finished, each grid pixel is compared with the pixels within

its neighbor defined in step a. The maximum value in the neighbor is assigned to

it in this operation, which is called dilation.

This procedure is a morphology open filter on a gray-scale image. In order to apply this method to segmentation, the neighborhood, in the form of a window, should be

determined in advance. Such a window is called structural element W. The size of W is

determined by prior knowledge of the maximum size of non-ground objects, and usually

it is the maximum building size in urban or suburban areas. The size of W is determined

in such a way that no object within the study area can totally cover the structural element.

Otherwise, points falling on objects other than the bare ground will not be filtered out, and consequently the generated DTM will be biased. These biases will also propagate to the NDSM and the final building models. On the other hand, a large window will remove fine features on the ground, such as a cliff. Obviously, the critical step in the morphology

operation is to correctly determine an optimal size of the structural element.
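A minimal sketch of this grey-scale opening (erosion followed by dilation) on the grid DSM, using SciPy; the square structural element of side W follows the description above, and the function name is only illustrative:

```python
import numpy as np
from scipy import ndimage

def morphology_dtm(dsm, window_size):
    """Approximate the DTM by a grey-scale morphological opening of the DSM.

    window_size is the side length (in pixels) of the square structural
    element W; it must exceed the largest non-ground object in the scene.
    """
    eroded = ndimage.grey_erosion(dsm, size=(window_size, window_size))    # step b
    opened = ndimage.grey_dilation(eroded, size=(window_size, window_size))  # step c
    return opened
```

The opened surface approximates the DTM; choosing W larger than the largest building, as discussed above, is what removes the buildings from the surface.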

A filtered DSM is a version of the DTM. Due to the internal characteristic of the

morphology operator, some or many small ground features were eliminated. The quality

of the DTM can be improved by tracking back the removed ground points. To achieve

this objective, LIDAR points were compared with the DTM. Those points with small

height differences are tracked back as ground points. Then, a new version of the DTM can be generated with better quality. This comparison can be repeated until no new points, or only a small number, can be tracked back. The height difference threshold is calculated

based on the vertical accuracy of LIDAR points.
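A sketch of this track-back refinement, assuming a callable DTM surface and an interpolation routine supplied by the caller (both are placeholder interfaces, not components defined in this dissertation):

```python
import numpy as np

def track_back_ground_points(points, initial_dtm, rebuild_dtm, threshold, max_iter=5):
    """Iteratively add LIDAR points back to the ground class.

    points      : N x 3 array of (X, Y, Z) LIDAR points
    initial_dtm : callable z = dtm(x, y) built from the first segmentation result
    rebuild_dtm : callable that interpolates a new DTM from an M x 3 point array
                  and returns a new dtm(x, y) function (placeholder interface)
    threshold   : height-difference threshold, e.g. twice the stated vertical
                  accuracy of the LIDAR data
    """
    dtm = initial_dtm
    ground = np.zeros(len(points), dtype=bool)
    for _ in range(max_iter):
        diff = np.abs(points[:, 2] - dtm(points[:, 0], points[:, 1]))
        new_ground = (diff < threshold) & ~ground
        if not new_ground.any():           # stop when no more points are tracked back
            break
        ground |= new_ground
        dtm = rebuild_dtm(points[ground])  # regenerate the DTM with the added points
    return ground, dtm
```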

Figure 2.5 depicts a DTM generated from the original DSM using the gray-scale morphology open filter. The ground points were detected by comparing the DTM with the LIDAR points and keeping those points whose elevation difference is smaller than the difference threshold. The final DTM and the normalized DSM are displayed in Figure 2.6.


Figure 2.5. DTM from morphology filter (left) and ground points detected from normalized DSM (right)

Figure 2.6. 3D visualization of final DTM (left) and normalized DSM (right) from morphology opening

In Figure 2.5, the DTM has a pattern of tiles. This is caused by a large structural

element used in the morphology open filtering. In gray-scale morphology opening, the local minimum value spreads over an area the size of the structural element. Since the structural element used in this experiment is a square window, the image shows a tile pattern. It should be noted that there are blank pixels in the original data. These

blank regions are areas that have low reflectance so that a LIDAR sensor cannot

detect them correctly. Examples of these areas are water regions. During the

morphology opening operation, these regions were not processed. They appear as

holes in Figure 2.6.

2.2.2.2. Planar-fitting Segmentation

In urban and suburban areas, the ground region is usually continuous without

abrupt changes in topography. Given LIDAR data with a certain resolution and

accuracy, such a smooth ground can be observed locally as a planar surface. That

means a group of neighboring ground LIDAR points, for example points within one 3

by 3 window, can form a local planar surface. Such a planar surface can be derived

using a regression method, such as the least-squares method, with a fitting accuracy no

worse than the accuracy of the original data points. Another observation is that the

ground is continuous. The planar-fitting segmentation in this research is developed

based on these observations to detect the ground as a large, continuous, and locally

planar surface. Figure 2.7 shows the flowchart of this method.


Figure 2.7. The flow chart of DTM generation from planar-fitting segmentation

For each LIDAR point, its neighboring points are used to calculate a regression

planar surface; and how well the point under test matches the regression planar

surface tells if it is on a planar surface. The neighbor can be defined as a 3 by 3

square window. Two measurements for the point under test can be derived using this

regression planar surface. One is the Root Mean Square Error (RMSE) calculated

from the neighboring points that are used to derive the regression planar surface, and

the other is the height difference between the actual height value of the point under


test and its height value calculated from the regression planar surface. Both

measurements indicate how well a point fits onto a planar surface formed by its

neighboring LIDAR points. These two measurements correlate with each other and

thus produce similar results. Points with a small RMSE or a small height difference

are classified as points on ground or building roofs, and points with a large RMSE or

height difference are classified as falling onto objects with non-planar surfaces like

trees and shrubs.
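A sketch of the per-pixel plane fitting described above, using a 3-by-3 neighborhood and a least-squares plane with the RMSE as the fitting measure; the plain double loop is for clarity only:

```python
import numpy as np

def planar_fit_rmse(dsm, half=1):
    """For each pixel, fit a plane z = a*x + b*y + c to its (2*half+1)^2
    neighborhood by least squares and return the fitting RMSE per pixel."""
    rows, cols = dsm.shape
    rmse = np.full(dsm.shape, np.nan)
    # Local coordinates of the window, reused for every pixel.
    dy, dx = np.mgrid[-half:half + 1, -half:half + 1]
    A = np.column_stack([dx.ravel(), dy.ravel(), np.ones(dx.size)])
    for r in range(half, rows - half):
        for c in range(half, cols - half):
            z = dsm[r - half:r + half + 1, c - half:c + half + 1].ravel()
            coeff, *_ = np.linalg.lstsq(A, z, rcond=None)
            residuals = z - A @ coeff
            rmse[r, c] = np.sqrt(np.mean(residuals ** 2))
    return rmse

# Pixels with RMSE below a threshold (e.g. twice the LIDAR vertical accuracy)
# are labeled as planar (ground or roof); the rest as non-planar (e.g. trees).
```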

From the vertical accuracy of the original LIDAR data, a threshold on the height difference or RMSE is determined to differentiate points falling onto planar surfaces from points falling onto non-planar surfaces like trees. In this research, twice the stated vertical accuracy of the original LIDAR data is used as the threshold. This threshold

can correctly classify most planar points, around 96 percent assuming a normal error

distribution. Further processing will be performed to test and classify more LIDAR

points as ground points. The classified planar points include points on the ground,

points on building roofs, and some scattered points on objects like trees and cars.

Another issue to deal with is what kind of ground surface this algorithm can

detect. The real terrain is not a perfect plane. The question is to what degree a rough

terrain can be classified as a local planar surface. This can be deduced from data

accuracy and data density, or the resolution of grid format data. Figure 2.8

demonstrates their relationship. The roughness of a terrain can be measured as

changes of terrain slopes.


$$\beta = \arctan\!\left(\frac{h}{\sqrt{2}\,R}\right), \qquad \alpha = 2\beta$$

Figure 2.8. Conditions for a rough terrain to be classified as a planar surface using the planar-fitting algorithm

In Figure 2.8, α is the slope change caused by a terrain change; it is double β. The symbol h is the fitting threshold used to test whether or not a point falls onto a planar surface, and R is the resolution of the grid DSM. Since the window used for plane fitting in this experiment is a 3 by 3 window, the maximal distance to the center of the window is √2·R. In this experiment, R is 1 meter and h is 0.3 meter, so α is approximately 24 degrees. That means a ground surface with slope changes

smaller than 24 degrees can be classified as a planar surface. Thus, it can be correctly

extracted as a ground surface. In actual implementation, a rougher surface could be

classified as a planar surface because the central point is also used to define the

regression-fitting plane.
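A quick numeric check of this tolerance with the stated values (h = 0.3 m, R = 1 m):

```python
import numpy as np

h, R = 0.3, 1.0                                    # fitting threshold and grid resolution
beta = np.degrees(np.arctan(h / (np.sqrt(2) * R)))
alpha = 2 * beta                                   # maximal slope change
print(round(alpha, 1))                             # -> 24.0 degrees
```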

Points or pixels classified as falling onto planar surfaces were extracted for further processing. First, connected regions were detected and labeled, and the areas of these connected regions were calculated by counting their pixels. To differentiate

ground points from building roof points, it is assumed that the ground points form one


or more connected planar surfaces that have larger areas than the largest building

within a study area. In other words, the largest connected planar region is on the

ground. This assumption is true in urban and suburban areas where the ground region

is connected by a road system. Regions with areas larger than the largest building size

within a study area were extracted as ground regions. The ground region can also be

simply picked manually. LIDAR points falling within the ground regions were

extracted and used to generate an initial DTM. Figure 2.9 shows the calculated fitting

difference and an extracted ground region.

Figure 2.9. Calculated planar-fitting difference (left) and detected ground points (right) from planar-fitting segmentation method

After ground points were extracted, a DTM was generated using an interpolation

method. In this experiment, a TIN (Triangulated Irregular Network) model was

created first; then a DTM grid was generated from the TIN model. However, this

DTM is a rough one whose accuracy can be improved. A refinement was conducted

on the initial DTM to get a higher accuracy by tracking back more ground points

through a comparison of LIDAR point heights with the DTM elevation. In the planar-


surface fitting procedure, points falling on boundary regions of a planar surface also

had large fitting differences due to height jumps occurring in boundary regions. These

boundary points that are not initially classified as ground points should be included in

the final DTM. After the initial DTM was generated from initially classified ground

points, all LIDAR points classified as non-ground points were compared with the

DTM. Those points with differences smaller than the threshold were re-classified as

ground points, and these ground points were used to refine the DTM. Again, the

difference threshold in the comparison is twice the stated vertical accuracy of original

LIDAR point data.

Ground points were updated to include newly classified ground points from the

comparison of the previously generated DTM and actual point heights. The newly generated DTM had better accuracy since more ground points were included. This procedure is repeated until no significant number of new points is classified as ground, or for a fixed number of iterations. For point

detection purpose during iteration, a simple interpolation method can be applied to

generate an intermediate DTM. The final DTM can be generated using a complicated

but accurate interpolation method, such as the Kriging method. A normalized DSM

can be obtained by subtracting the final DTM from the DSM. The final DTM and the

normalized DSM are displayed in Figure 2.10.


Figure 2.10. 3D visualization of final DTM (left) and normalized DSM (right) with 2 times exaggeration in elevation from planar-fitting segmentation

In the normalized DSM, it can be seen that there are wall-like boundary regions.

These regions are caused by the different sizes of the DTM and the DSM. The size

difference exists as a consequence of point loss during LIDAR segmentation.

2.2.2.3. Height-jump Segmentation

An important kind of information delivered in LIDAR data is expressed in the

form of changes in the data distribution. For example, a person can distinguish roads, buildings, and trees in a photograph by differentiating the spectral

changes of these objects. For a computer vision system, it is important to detect

changes among data under certain criteria. This rationale can also be applied to

LIDAR data segmentation. LIDAR data captures height information of the earth’s

surface. It can be observed that large changes within the elevation data indicate

changes of objects or structures. This observation is the basis of the height-jump

segmentation algorithm. The flow chart of this method is shown in Figure 2.11.


Figure 2.11. The flow chart of DTM generation from height-jump segmentation

Height changes happen mostly at object boundaries. They also happen at rough

topographical features, such as cliffs. It is easy to understand that objects on the

ground have higher elevations than the ground itself. Furthermore, objects are isolated from the ground by large height-change points. This defines the major tasks of the height-jump segmentation algorithm: to detect boundary points from height changes and to detect connected regions separated by large height-change points.

The question is, what is the height threshold to detect height-jump points? The

solution is determined by the vertical accuracy of the LIDAR points, the point density or resolution of the LIDAR grid data, topographical changes of the ground, and actual object heights. First of all, the height difference should be significantly larger than the vertical error of the data; otherwise, the change information is at the same level as the data error and the signal-to-noise ratio is too small to extract information correctly. If the height

difference is larger than 3 times the vertical error, we have approximately 98 percent

confidence to conclude that the difference is caused by an actual physical elevation

change based on the assumption that the error follows a normal distribution.

How do data resolution and ground topography affect height difference? The

criterion is to differentiate the height difference caused by objects from the height

difference caused by topographic change. Figure 2.12 illustrates the relationship

among data resolution, topography, and objects. In Figure 2.12, h represents the

height change introduced by a terrain surface with a slope angle of α ; R is the

resolution of the grid DSM while H is the height of an object. H should be

significantly larger than h in order to distinguish object heights from terrain-induced height changes. Thus, with an

estimation of the largest terrain slope within a study area, a height threshold of

minimal H can be calculated. For instance, a system can use double h as the minimum


H. Considering the vertical accuracy of the data, the larger of 2h and 3δ (where δ is the vertical error) will be used as the height difference threshold.

$$h = 2R\tan(\alpha)$$

Figure 2.12. Object height vs. topographical difference

For each pixel of a grid DSM, its elevation is compared with the maximum

elevation within its neighborhood, such as a 3 by 3 square window. A new grid is

generated with each pixel representing the elevation difference as shown in Figure

2.13a. Compared with the difference threshold as described above, each pixel is

classified as either a pixel indicating existence of a shift from the terrain to an object

or a pixel without such a shift, see Figure 2.13b with white pixels representing large

difference points. From Figure 2.13b, it can be seen that the whole grid area is

divided into separate regions by such large height difference pixels. These isolated

regions are then labeled; and the ground region is detected and extracted using the

same method as in the planar-fitting segmentation. Figure 2.13 shows the process and

intermediate results of the algorithm.
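A sketch of the height-jump test on the grid DSM, combining the threshold rule from the previous paragraphs with a 3-by-3 maximum filter; the SciPy calls and parameter names are illustrative:

```python
import numpy as np
from scipy import ndimage

def height_jump_mask(dsm, resolution, max_slope_deg, vertical_error):
    """Flag pixels whose elevation differs strongly from the local maximum.

    The threshold is the larger of 2*h and 3*delta, where h = 2*R*tan(alpha)
    is the height change a terrain slope alpha can produce over the window
    and delta is the vertical error of the LIDAR data.
    """
    h = 2.0 * resolution * np.tan(np.radians(max_slope_deg))
    threshold = max(2.0 * h, 3.0 * vertical_error)
    local_max = ndimage.maximum_filter(dsm, size=3)
    return (local_max - dsm) > threshold        # True = large height-jump pixel
```

The connected regions separated by the flagged pixels can then be labeled (for example with ndimage.label) and the ground region extracted as in the planar-fitting segmentation.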


Figure 2.13. Process results of the height-jump segmentation algorithm: a) calculated height difference; b) classified large-difference points (white); c) isolated regions (white); and d) the detected ground region (white)

The ground region can be detected automatically. One straightforward way to

achieve this goal is to extract the ground as the region with the largest area. This

method is based on the observation that the largest connected ground region is usually

the largest object within an urban or a suburban area. Another approach is to compare

the elevation of each region with those of its adjacent regions. Across each shared boundary

segment, the ground region should be lower than its adjacent regions. This is based on

the assumption that all objects are sitting on the ground and thus they are higher than

the ground. The first method is simple to implement. However, some small isolated


ground regions, such as a yard surrounded by buildings, cannot be detected. The

second approach can detect small ground regions, but it can also include some

isolated lower roof regions surrounded by higher roof regions. In addition, the second method requires complicated comparisons of adjacent regions.

The extracted ground region is used to retrieve ground LIDAR points. Ground

points are then used to generate a DTM through a TIN model. As with the planar-

fitting algorithm, the generated DTM can be improved by including more ground

points. Using the same method as described in the planar-fitting segmentation

algorithm, new ground points are added to the DTM iteratively by comparing LIDAR

points' elevations with the DTM. The DTM refinement stops after a preset number of iterations or when no significant number of new points can be detected. Consequently, a final DTM is generated. Figure 2.14 shows two 3D

scenes displaying the DTM and the normalized DSM, which is generated by

subtracting the DTM from the DSM.

Figure 2.14. DTM and normalized DSM from height-jump segmentation


2.2.3. Comparison

LIDAR segmentation algorithms are all application-dependent. That means

algorithms are implemented based on an analysis of specific data sets. No single algorithm can be pre-set to apply to all kinds of LIDAR data under all conditions.

The characteristics of data should be analyzed before any segmentation algorithm can be

applied.

The morphology method needs prior knowledge of the maximal non-ground object size in order to determine the moving window size; this is a disadvantage of morphology operators. In addition, large non-ground objects are common in LIDAR data sets, so a

large window size is required in order to correctly remove non-ground objects. At the

same time, a larger window will produce a smoother result. In many cases, this will

remove fine features and change the topography dramatically. This will make it very

complex and difficult, even impossible, to recover the topography. However, morphology

operators are easy to understand and simple to implement. In urban or suburban areas

without large topographic changes, this group of operators can work well in LIDAR

segmentation to obtain a good approximation of the terrain.

The planar-fitting algorithm is based on the assumption that the ground surface can be

observed as a planar surface considering the tolerance from data errors. It requires the

setup of the planar-fitting threshold, which is based on data errors, topography, data

resolution, and the roughness of the terrain. Sometimes it also needs an analysis of the minimal height of the objects within a study area. However, these requirements are easy to meet in applications, especially in urban and suburban areas. The advantage is that this


method can keep important ground linear features as long as a feature is continuous

across the data coverage. For example, a freeway intersection can be correctly classified

onto the ground. On the other hand, if an application does not want the freeway intersection included in the ground, this becomes a disadvantage, and a further processing step or a new algorithm would be needed to handle it. Compared with morphology operators, this algorithm preserves the topography much better; thus, it is faster to recover the final DTM by tracking back ground points excluded at the beginning of the process.

The height-jump algorithm is based on the observation that non-ground objects are

higher than their neighboring or surrounding ground, and their boundaries have

significant height differences from the ground. In practical applications, this method should be applied after analyzing the topographic characteristics and data resolution: in coarse-resolution data, a building may be connected to steeply sloped ground. Similar to the planar-fitting algorithm, this algorithm can keep fine ground linear features as long as the features are not broken. Compared with the planar-fitting algorithm,

the height-jump algorithm doesn’t have requirements for the roughness of a terrain while

it has requirements for terrain slopes. The planar-fitting algorithm doesn’t require the

terrain slope to meet a certain condition, but it does have requirements for the terrain

roughness.

The differences among the three LIDAR data segmentation algorithms in this research are reflected in the detection of the ground region, which yields an initial DTM. The

following DTM refinement processes are the same. They all compare the original LIDAR

point data with the generated DTM to include more ground points so that the quality of


the DTM can be improved because more ground points provide more information about

the terrain. Different interpolation methods can be applied to the same ground data to

generate DTMs and normalized DSMs of different quality. The method used in this research is a TIN model, which is a linear interpolation. Based on an analysis of terrain

characteristics, complex methods such as B-spline surface or the Kriging method can be

utilized to generate more accurate results. But the rationale or the philosophy related to

this research topic is the same.

2.3. Building Detection from Normalized DSM

After a normalized DSM is generated, non-ground objects such as buildings and trees can be viewed as sitting on a level plane. In reality, errors are inevitable; even so, the observation still holds from the data point of view. By segmenting a normalized DSM using a height threshold, trees and buildings can be distinguished within the normalized DSM. A height threshold is determined from prior

knowledge of the minimum height of objects the application wants to detect. For

example, a threshold of 3 meters can be used to detect buildings while eliminating other

objects like cars and bushes. In this research, the focus is to detect buildings from LIDAR

data, so a threshold of 3 meters is employed. Consequently, objects like cars and bushes

were eliminated, while large trucks and some trees survive such a height threshold. Other measurements or derived textures can be applied together with height information to differentiate buildings from other objects. For example, shape measurements like

parallelism and size can be used to separate buildings and trees.


To reconstruct building models, buildings should be separated from other objects,

most likely trees, and be identified in order to derive parameters of building models. By

analyzing the characteristics of trees and buildings, we can find that they demonstrate

different spatial distributions and patterns as illustrated in Figure 2.15. It is true that

different objects can always be separated as long as sufficient information is available. In

the following discussion, the focus is how to separate buildings from trees.

Figure 2.15. Objects (buildings and trees) detected as objects higher than 3 meters


Buildings can be separated from trees because they have different characteristics.

With sufficient information, we can totally differentiate them. The questions are: what

kind of information do we need? And what kind of information can we obtain from a data

set? From elevation data, we can see that buildings and trees have different shapes, sizes, and elevation distribution patterns. Buildings usually have regular shapes exhibiting perpendicularity and parallelism. Their boundaries are straight line segments or curves that can be described by mathematical equations, such as a full or half circle. Most buildings have rectilinear boundaries, in which opposite boundary segments are parallel and two consecutive boundary segments are perpendicular to each other. Individual trees usually have random boundaries. Thus parallelism can be

utilized to differentiate some buildings from trees but not all of them. The reason is some

buildings have curved boundaries.

Before separating objects, objects themselves should be detected and labeled first.

After the height threshold was applied to a normalized DSM, building-like objects were

detected (see Figure 2.15). They are called "building-like" objects because non-building objects are merged with buildings, and buildings can only be identified through further processing. First, all connected regions are extracted and assigned unique labels. Connectivity is determined using the 8-neighbor relationship. At the same time, the pixel count of each connected region is calculated, producing a histogram of region sizes. Then, prior knowledge of the minimal building size is used to eliminate

all objects with an area smaller than the minimum building size. After the elimination,

objects remaining are mostly buildings. Figure 2.16 shows the result after an elimination

using a minimum building size of 100 m². Due to noisy points adjacent to buildings, detected building regions may have ragged boundaries. To smooth the boundaries, a morphology close operator was applied to the resulting grid. This operator is

comprised of a dilation operator followed by an erosion operator.
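A sketch of this detection chain (height threshold, 8-neighbor labeling, minimum-size filtering, and morphological closing); the 3 m and 100 m² values follow the text, while the function interface is only illustrative:

```python
import numpy as np
from scipy import ndimage

def detect_building_regions(ndsm, resolution, min_height=3.0, min_area=100.0):
    """Detect building-like regions in a normalized DSM (grid of heights)."""
    objects = ndsm > min_height                 # trees, buildings, large trucks, ...

    eight = np.ones((3, 3), dtype=bool)         # 8-neighbor connectivity
    labels, n = ndimage.label(objects, structure=eight)

    # Keep only regions whose area exceeds the minimum building size.
    pixel_area = resolution ** 2
    sizes = ndimage.sum(objects, labels, index=np.arange(1, n + 1))
    keep_ids = np.flatnonzero(sizes * pixel_area >= min_area) + 1
    keep = np.isin(labels, keep_ids)

    # Morphological closing (dilation then erosion) to smooth ragged boundaries.
    return ndimage.binary_closing(keep, structure=eight)
```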

Figure 2.16. Regions detected as buildings using the size threshold

The size threshold can dramatically reduce the number of objects remaining within

the resulting grid. However, the size measurement alone cannot differentiate all buildings and

trees. Some trees have areas larger than the minimum building size, for example, large

trees and a group of trees. In addition, some trees are adjacent to buildings; and thus they


are connected to their adjacent buildings. Trees connected with buildings will introduce blunders into the reconstructed building models. Thus, additional information or

measurements should be used to further classify buildings and trees.

In the real world, the elevation distribution of human-made objects like buildings has regular patterns. Most buildings have roofs comprised of planar facets, and each roof face can be represented as a planar facet. Thus the elevation of a

point on a planar roof can be predicted from the plane formed by its neighboring points.

On the other hand, trees usually have irregular elevation distribution patterns. This

distribution difference can be utilized to differentiate buildings from trees. One

measurement could be the distribution of slopes. Points on a building roof will form an approximately homogeneous slope region except at its boundary, while points on top of a tree will form a heterogeneous slope region due to the random distribution of point elevations. A second texture can be the planar-fitting difference used for detecting the ground region in the previous section. The fitting difference, or variation, can be used to detect

planar-roof buildings.

The planar fitting measurements and the slope measurement are all based on point

elevation distribution. Internally, they will produce similar results. Since the planar fitting

difference image was already generated in DTM generation, this difference data will be

used to separate buildings from trees.

The size, slope, fitting difference, and height measurements are constraints applied to

differentiate buildings from trees. Each constraint can be applied individually, and its result is the set of regions detected as buildings by that constraint. The intersection of these sets is then taken as the building regions. The intersection operation will eliminate building regions not detected by one of the constraints. Thus

the constraints should be relaxed. It is preferable to get class I (commission) error instead

of class II (omission) error. That means buildings should be detected maximally at the

cost of trees surviving an individual constraint test. Commission errors can be reduced by applying other constraints later, while omission errors are difficult to reduce because

buildings that failed to pass a test are eliminated.
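A minimal sketch of combining the relaxed constraint masks by intersection as described above; each mask is assumed to be a boolean grid produced by one constraint test, and the names are illustrative:

```python
import numpy as np

def combine_constraints(height_mask, size_mask, planar_mask, slope_mask=None):
    """Intersect the per-constraint building masks.

    Each mask should be produced with a relaxed threshold so that buildings
    are rarely rejected (commission errors are preferred over omission errors,
    since later constraints can still remove surviving trees).
    """
    building = height_mask & size_mask & planar_mask
    if slope_mask is not None:
        building &= slope_mask
    return building
```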

In detecting buildings with LIDAR data, a challenging problem is how to separate

large trees adjacent to buildings. After the elevation and size constraint tests, these trees remain because they are connected to building regions and are detected as part of the same object. The

planar fitting algorithm demonstrates its advantage in differentiating between these two

different objects. The method produces more precise building objects by eliminating the

tree areas connected to buildings. Figure 2.17 shows an example.

In Figure 2.17, the image on the left shows a DSM generated from LIDAR points. It

can be seen that there are trees surrounded by one building and these trees are connected

to the building. The image in the middle shows the objects detected after a height

threshold was applied to the normalized DSM. It is obvious that these trees survived the height constraint. It is also clear that a size constraint cannot remove these trees, because they are connected to the building and form an object large enough to pass the size threshold test. The image on the right presents the result after the planar-fitting algorithm was applied to the DSM together with the size constraint. It can be seen that the trees were removed because they cannot pass the planar-fitting constraint. The


structure of the building is correctly recovered. It can also be seen from the right image

that the skeleton of the building is thinner than in the left and middle images. This is caused by the planar-fitting algorithm, because boundary points of planar surfaces cannot pass the planar-fitting test. A further processing step is needed to compensate for the loss; for example, a morphology dilation operator can be applied to widen the

building region.

Figure 2.17. Separation of buildings and trees using planar fitting difference

2.4. Analysis and Conclusion

DTM generation and building detection from LIDAR data are application-dependent

processes. The results and the accuracy depend on data quality and the characteristics of

an application area under study. Data resolution/density and data accuracy can

dramatically affect the results. The topography of a study area also plays an important

role. However, the algorithms presented in this chapter can be applied to most urban and

suburban areas. The height-jump algorithm can even be applied in forested areas for DTM generation and forest studies.


The building detection algorithm proposed in this research is a new method. It should

be noted that this method is not intended to detect every kind of building in every detail.

Actually, no building detection algorithm is capable of detecting all kinds of buildings. In

this research, buildings with arch roofs cannot be detected because they cannot meet the

planar surface requirement. In addition, some small structures like dormers on buildings

cannot be detected because they are too small given a data resolution of 1 meter.

CHAPTER 3

BUILDING MODEL RECONSTRUCTION

Building model reconstruction is a process to derive or calculate CAD building

models, which are of vector format. In this chapter, the primitives of a building model,

which are roof faces of a building, will be detected and their parameters will be

calculated. The topological relationships of the primitives will then be analyzed to obtain a correct building model topology, followed by an analysis of the spatial or geometric characteristics of the building model to reduce the effects of errors propagating through the proposed processing sequence.

3.1. Conventional Terms

Building boundary: A building boundary in this part refers to the outline of a

building footprint on the ground. It is a series of line segments in 2D space forming a closed 2D polygon.

Boundary regularization: A building boundary is assumed to have a rectangular

shape in this study. Boundary regularization is the process of adjusting a building's


boundary into a 2D polygon with a rectangular shape. After regularization, two

consecutive line segments in a building boundary are perpendicular to each other.

3.2. Boundary Extraction and Regularization

Based on their imaging geometry, neither LIDAR data nor optical imagery can

capture or observe every structure of a building. The common missing information is the

vertical walls of a building. Some vertical walls cannot be observed by imaging sensors.

Thus, there are no measurements for these vertical walls. During the conversion from

LIDAR point data to grid DSM, available vertical measurements, if any, were lost. One

method could be going back to the original LIDAR data to recover the lost vertical wall

information. However, this method still cannot recover information about vertical walls

not captured in original point data.

In the real world, we can observe that almost every building is surrounded by vertical

walls. In other words, vertical walls form the boundary of a building. Thus, the boundary

of a building indicates existence of vertical walls. In this research, vertical walls will be

recovered from building boundaries. Another observation is that boundary segments of a

building, or vertical walls of a building, are perpendicular to each other. Two connected

vertical walls form a right angle. Thus an assumption is made in this research to

reconstruct building models. The assumption is that all boundary segments of a building

are perpendicular to each other. In other words, buildings have rectangular-shaped

boundaries. Buildings with arch boundaries will not be modeled in this research. Instead,

they are forced to have rectangular boundaries.


Before the roofs of a building are reconstructed, vertical walls will be recovered.

Thus, the boundary of a building will be extracted and then regularized. Due to errors within the LIDAR data and the limited data resolution, detected building regions have noisy boundaries. One obvious distortion is that a straight building edge appears as a series of ragged small line segments. In addition, consecutive segments are not perpendicular

to each other. So methods are needed to generalize line segments and to adjust them to

have right intersection angles. This is the purpose of building boundary regularization.

In this research, a refined version of the so-called “sleeve” method is first employed

to generalize boundary segments so that redundant points will be discarded. Then, a newly developed regularization method is applied to adjust the generalized segments to be

perpendicular to each other.

The boundary of a building is first extracted and recorded as an ordered sequence of

points. Several commercial programs provide this function, such as ERDAS Imagine.

Figure 3.1 shows an example of extracted vector building boundaries.


Figure 3.1. Building boundary of vector format

3.2.1. Line Simplification

From Figure 3.1, it can be seen that the extracted building boundaries are very noisy.

In order to get regularized boundaries, a necessary step is to extract skeletons of these

buildings. A building skeleton represents the structure and topology of the building. It is a

generalization of the original, noisy building boundary. Thus, some line generalization

algorithms can be applied to the original boundary line segments. One example of such

algorithms is the Douglas-Peucker algorithm. However, the Douglas-Peucker algorithm may generate unexpected results when applied to polygons like building boundaries because it chooses critical points as the points with large distances to base lines. In a building

boundary, the starting base line could be any two consecutive points. In addition, it

cannot be applied in the boundary extraction process because it needs all points to be

available before it starts to generalize.

In this research, an algorithm originally proposed by Zhao (2001) is modified to

perform the generalization work. The great advantage of this algorithm is that it can

process points in sequence, which is very suitable for processing boundary points when

they are extracted from raster to vector. Starting from the beginning point, a point can be

determined to be kept or discarded. The idea is illustrated in Figure 3.2. A pipe (dashed line) with a diameter of d is used to match line segments in a sequence. This method is called the "sleeve" algorithm.

Figure 3.2. Line simplification using the "sleeve" algorithm

The algorithm can be described as follows (a code sketch is given after the list):

1. The diameter d of the pipe is determined and given as the input parameter to the algorithm. It indicates how far a point may deviate from a line before it is kept as a critical point;

2. Starting from the first two points P1 and P2, the direction β₀ and length l₀ of the line connecting these two points are calculated;

3. Perpendicular to the current line segment (the line P1P2 at the beginning), the direction range α₀ at the current point Pi (the second point P2 at the beginning) is calculated according to the line segment length and d. The range α₀ is the interval β₀ ± Δα₀, where Δα₀ = arctan(d / (2·l₀));

4. The next point Pi+1 is connected with the starting point P1 to form a new segment, and its direction β and length l are calculated. In addition, a new direction range α is calculated as β ± Δα with Δα = arctan(d / (2·l));

• If the direction β is within the calculated direction range α₀, the current point Pi is discarded because it is not a critical point. Then a new direction range at the point Pi+1 is generated as the intersection of the new direction range α and the current direction range α₀. The new range α₀ for further testing is [max(β₀ − Δα₀, β − Δα), min(β₀ + Δα₀, β + Δα)]. Go back to step 3;

• If the new direction β is out of the direction range α₀, the current point Pi is kept in the generalized line as a critical point. The current point Pi is taken as the first point of a new line segment and the point Pi+1 is taken as the second point for the next generalization process. Repeat procedures 1 to 3 until the last point.
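The following is a minimal sketch of the sleeve test described above, written for illustration; the function name sleeve_simplify and the (x, y) tuple representation are assumptions, and the closed-polygon handling and the angle wrap-around near ±180 degrees are omitted for brevity.

```python
import math

def _dir_and_halfwidth(p, q, d):
    """Azimuth of segment p->q and the angular half-width arctan(d / (2*l))."""
    dx, dy = q[0] - p[0], q[1] - p[1]
    length = math.hypot(dx, dy)
    beta = math.atan2(dy, dx)
    half = math.atan2(d, 2.0 * length)      # equals arctan(d / (2*l))
    return beta, half

def sleeve_simplify(points, d):
    """Return the indices of the critical points of an ordered 2D polyline."""
    n = len(points)
    if n < 3:
        return list(range(n))
    kept = [0]                               # the first point is always kept
    start = 0                                # start of the current sleeve
    beta0, half0 = _dir_and_halfwidth(points[0], points[1], d)
    lo, hi = beta0 - half0, beta0 + half0
    for i in range(2, n):
        beta, half = _dir_and_halfwidth(points[start], points[i], d)
        if lo <= beta <= hi:
            # the previous point stays inside the sleeve: discard it and
            # intersect the old and the new direction ranges
            lo, hi = max(lo, beta - half), min(hi, beta + half)
        else:
            # the previous point is a critical point: restart the sleeve there
            kept.append(i - 1)
            start = i - 1
            beta0, half0 = _dir_and_halfwidth(points[start], points[i], d)
            lo, hi = beta0 - half0, beta0 + half0
    kept.append(n - 1)                       # the last point is always kept
    return kept
```

Because the test only needs the point currently being traced, the function can be fed points one at a time, which is the property that makes the method attractive for boundary tracing.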


The advantage of the “sleeve” algorithm is that it can process points in a dynamic

manner. It is not necessary to wait until all points are available before processing. This

method is suitable for building boundary generalization because it can generalize the

boundary points when these points were traced. However, one special process should be

applied to building boundary generalization using the “sleeve” algorithm. A building

boundary is a closed polygon. The starting point could be any point on the boundary depending on the scanning process during the conversion from raster boundary to vector boundary. In addition, the starting point is also the ending point in a vector building boundary polygon.

In order to apply the “sleeve” algorithm in generalizing building boundaries, two

refinements were applied to the original algorithm. The first one is to process the ending

point of a boundary. After the regular “sleeve” algorithm was performed, the starting and

the ending points in the original polygon were kept. These two points are actually the

same point. It is compared with the line formed by the second point and the second to the

last point. If the distance from the point to the line is larger than the distance tolerance d ,

the starting and the ending points are kept. Otherwise, if the point to line distance is

smaller than the distance tolerance, the original starting and ending points will be

discarded. In this case, either the second point is taken as the ending point, or the second

to the last point is taken as the starting point.

The second refinement is to improve the accuracy of generalized line segments. In the

original “sleeve” algorithm, all intermediate, non-critical points were discarded. In

reality, these discarded points also provide useful information. In this research, all


intermediate points of a line segment are used to derive parameters of the line using a

least square regression model. In this case, consecutive lines will intersect with each

other to generate critical points. Most likely, the generated critical points will not be in

the original data set. This intersection method will provide a more accurate result because

it takes much more information into account for calculating a line’s parameters. Figure

3.3 shows the refinement of the original algorithm. The bold line is the calculated one

from all intermediate points using a least-square regression model.

Figure 3.3. Line simplification using the refined “sleeve” algorithm
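As a small illustration of this refinement, the sketch below fits each run of intermediate points with a line (here via an orthogonal, SVD-based fit rather than any particular regression formulation used in this research) and intersects two consecutive fitted lines to obtain a critical point; the function names and the sample coordinates are hypothetical.

```python
import numpy as np

def fit_line(points):
    """Fit a 2D line through points; returns (centroid, unit direction)."""
    pts = np.asarray(points, dtype=float)
    centroid = pts.mean(axis=0)
    # the dominant right singular vector of the centred points is the direction
    _, _, vt = np.linalg.svd(pts - centroid)
    return centroid, vt[0]

def intersect_lines(p1, d1, p2, d2):
    """Intersection of two 2D lines given in point-direction form."""
    A = np.column_stack([d1, -d2])           # solve p1 + t*d1 = p2 + s*d2
    t, _ = np.linalg.solve(A, p2 - p1)
    return p1 + t * d1

# two consecutive boundary runs (hypothetical, noisy coordinates)
run1 = [(0.0, 0.1), (1.0, -0.1), (2.0, 0.05), (3.0, 0.0)]   # roughly y = 0
run2 = [(3.1, 0.5), (3.0, 1.5), (2.9, 2.5)]                 # roughly x = 3
p1, d1 = fit_line(run1)
p2, d2 = fit_line(run2)
print(intersect_lines(p1, d1, p2, d2))       # critical point near (3, 0)
```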

The parameter to be set up in this algorithm is the distance threshold, which can be

adjusted according to specific situations. Figure 3.4 depicts a simplification result using

this refined “sleeve” algorithm. The original boundary is the one extracted from a

building region in raster format.


Figure 3.4. Line simplification: The graph on the left shows the original line

segments. The squares are the remaining points after the simplification. The right one shows the simplified line segment

3.2.2. Boundary Regularization

The purpose of regularization is to adjust the boundary of a building to have a

rectangular shape. All line segments are either perpendicular to, or parallel to each other.

To eliminate noise effects and to get parallel or perpendicular boundary segments, some

researchers used the Minimum Description Length (MDL) method to regularize ragged

building boundaries (Weidner and Förstner, 1995). MDL is a statistical method, and it is

very expensive in computation due to iterative comparison. Mayer (2001) used a

constrained active contour method in optimizing building boundaries. However, the

constrained active contour needs a good initial approximation of a boundary. In addition,

it cannot merge or eliminate small line segments. Besides, the image used in the study

should provide sharp boundaries, which is difficult with a LIDAR DSM. In this study, a new regularization method was developed to obtain rectangular-shaped building boundaries.


After boundary simplification is finished using the refined “sleeve” algorithm,

fragmented line segments and redundant points were eliminated. Instead, only the

skeleton of a boundary was kept. Then the regularization algorithm was conducted on

simplified building boundaries from the “sleeve” algorithm. Figure 3.5 illustrates the

process of this algorithm in a flow chart.

Figure 3.5. The process of boundary regularization


The proposed algorithm is comprised of a clustering process and an adjustment process. The clustering method is similar to the k-means method; it is described in the first item of the following list. The whole algorithm is described as follows (a code sketch is given after the list):

• The azimuth of each line segment of a building boundary was calculated, and all

segments were clustered into two classes according to their azimuths. The

criterion used here was inter-class distances. A segment was classified into class

A if the difference between its azimuth and the averaged azimuth of class A was

smaller than the difference between its azimuth and the averaged azimuth of class

B. The result of this step is two groups of line segments, and they are supposed to

be perpendicular to each other;

• For each segment class, the weighted average of the line azimuths was calculated. The weight used for each line segment is its length:

$$\overline{azimuth} = \frac{\sum_i azimuth_i \cdot l_i}{\sum_i l_i}$$

where l_i is the length of the i-th segment in one class. This matches the observation that a longer line segment has higher azimuth accuracy than a shorter line segment provided that the position accuracy of the end points is the same. The output of this step is two azimuths that are supposed to be perpendicular to each other;

• A weighted adjustment using the Gauss-Markov model was carried out to make

the azimuths of these two classes perpendicular. Again, the weight was calculated

as the total length of all segments in each class. After the adjustment, these two

azimuths assigned to two classes are perpendicular;


• Each segment is adjusted in such a way that it is rotated around its central point

until it has the azimuth of its class. Up to this point, all line segments of a building

under investigation are parallel or perpendicular to each other;

• Adjacent parallel segments are merged to form one new line segment. The new line passes through a calculated central point, which is a weighted average of the central points of the merged adjacent segments. The weights are the lengths of the merged segments. For example, the x coordinate of the central point of two merged adjacent parallel lines can be calculated as

$$x = \frac{l_1 \cdot x_1 + l_2 \cdot x_2}{l_1 + l_2}$$

For segments parallel to each other but not adjacent, if the distance between them is smaller than a pre-defined threshold, they were adjusted in a similar way to pass through the same line. But they are not merged, so they are still two different line segments. The threshold is an experimentally determined value; 2 meters was employed in this study, which corresponds to two pixels in the grid DTM;

• Regularized building boundaries were calculated by intersecting adjacent line

segments.
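A minimal sketch of the clustering and weighted-averaging steps is given below; the function regularize_azimuths is hypothetical, the Gauss-Markov adjustment is reduced to a single length-weighted average, and the merging and intersection steps are not shown.

```python
import math

def regularize_azimuths(segments):
    """Cluster boundary segments into two perpendicular azimuth classes and
    return one adjusted azimuth per class (class B = class A + 90 degrees).

    segments : list of ((x1, y1), (x2, y2)) simplified boundary segments
    """
    def azimuth(seg):                        # azimuth in degrees, in [0, 180)
        (x1, y1), (x2, y2) = seg
        return math.degrees(math.atan2(x2 - x1, y2 - y1)) % 180.0

    def length(seg):
        (x1, y1), (x2, y2) = seg
        return math.hypot(x2 - x1, y2 - y1)

    az = [azimuth(s) for s in segments]
    wt = [length(s) for s in segments]

    # class A is seeded with the azimuth of the longest segment
    seed = az[wt.index(max(wt))]
    labels = []
    for a in az:
        dev_a = min(abs(a - seed), 180.0 - abs(a - seed))   # 0..90 degrees
        labels.append('A' if dev_a <= 45.0 else 'B')

    # length-weighted mean deviation from the seed direction; class-B
    # segments vote after being rotated by 90 degrees onto class A
    num = den = 0.0
    for a, w in zip(az, wt):
        dev = (a - seed + 90.0) % 180.0 - 90.0              # -90..90 degrees
        if abs(dev) > 45.0:
            dev = dev - 90.0 if dev > 0 else dev + 90.0
        num += w * dev
        den += w
    azimuth_a = (seed + num / den) % 180.0
    azimuth_b = (azimuth_a + 90.0) % 180.0
    return azimuth_a, azimuth_b, labels
```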

Figure 3.6 shows an instance of building boundary regularization using the method

described above. The advantage of this method is that it takes information from all lines

into account by calculating and adjusting azimuths using line segment lengths as weights.

This method agrees with the fact that a longer segment has better azimuth accuracy given

the end point position accuracy. Figure 3.7 presents regularized building boundaries

overlapping with the LIDAR DSM.


Figure 3.6. Results of building boundary regularization: the boundary before regularization (left); the boundary after regularization (central); and the

comparison of the regularized boundary with the original boundary (right)

Figure 3.7. Regularized building boundaries overlaid on the

grid DSM generated from the LIDAR point data


Compared with the MDL method, the proposed regularization method is efficient in computation. It follows a top-down scheme: it first finds the general picture of a building boundary and then refines the details of the boundary. In this way, it achieves a globally optimized result.

3.3. Building Model Reconstruction

After the boundary of a building is extracted, a 3D model can be reconstructed from

LIDAR points falling within the building footprint. The process of 3D building model

reconstruction is to derive 3D building CAD models. Generally, there are two basic

approaches to building model reconstruction. One is model based, and the other is data

based. The former one has a database of known building models. A known building

model has available building structures with fixed topology. Thus, the parameter needed

to be calculated is its geometry. This type of method can only work with buildings that

match the models in the model database.

The data-driven approach usually assumes that a building is a polyhedral model,

which is comprised of a group of connected planar surfaces. The tasks are to derive the

topology and to intersect the planar roofs of a building to calculate model parameters.

This approach is flexible in working with different types of building models.

Theoretically, it can reconstruct all kinds of planar-roof buildings because it doesn’t

require prior knowledge about a building’s structure. Some researchers combined the

data-driven method and the model-driven method to take advantage of both approaches. This combined method is referred to as the CSG (Constructive Solid Geometry) method. In


the CSG method, a complex building model is decomposed into small building

primitives. These building primitives are stored in a primitive database. Each

decomposed component of a building model is matched with one primitive in the

database; and its geometric parameters are calculated. In the CSG approach, only a

limited number of building primitives are stored in the database, but a large number of

building models can be reconstructed by composing a group of such primitives. However,

the decomposition is usually very tricky.

The algorithm proposed in this research is a data-driven approach instead of a model-

driven approach, which means no a priori knowledge about a specific building model is

required. The assumption made here is that a building is a polyhedral model, which is

comprised of planar surfaces. This approach can theoretically handle most buildings

because most buildings are encompassed by planar surfaces in the real world. Each

surface in such a model is a model primitive. The mission for a building reconstruction

thus becomes detecting and reconstructing building roof primitives. Vertical walls can be

derived from building boundaries. Some researchers have presented their works on

building reconstruction using LIDAR data.

Maas and Vosselman (1999) proposed two algorithms to reconstruct 3D building

models using model-driven and data-driven approaches. They applied invariant moments

to reconstruct simple rectangular shaped buildings using model-driven approach. The

invariant moments cannot be applied to complex buildings. For generic data-driven

model reconstruction, they applied Hough-Transform to detect planar roof surfaces.

Points were organized into a TIN model. One triangle is a plane in 3D; and its plane


parameters were used to vote in the Hough parameter space. Vosselman (1999) also used

Hough-Transform to detect planar roofs. However, there are several drawbacks using

Hough-Transform in roof detection: 1) the computation is expensive; 2) the interval in

the parameter space is difficult to determine. A smaller interval will generate a higher

accuracy result but the computation is more expensive; a larger interval decreases the

computation but it generates a lower accuracy result; 3) the peak points in the parameter

space are difficult to detect; and further process is necessary for peak point detection. For

example, the entry with the second largest vote in the parameter space usually does not

represent a second roof of a building. It usually comes from the points contributing to the

first roof. Some algorithms can be applied to detect local peaks in order to correctly

detect roofs. However, this will increase computation load. Another option is to apply the

Hough-Transform algorithm iteratively. Each time only one roof will be detected. Points

contributing to the detected roof(s) will be masked out of the next iteration. Only

remaining points are used to perform Hough-Transform in a next iteration. However,

each of the alternatives will dramatically increase the computation load. Even though a high-dimensional Hough-Transform can be implemented as a combination of one-dimensional Hough-Transforms, finding the right solution out of the combination is still computationally intensive if there is a great number of building LIDAR points, which is usually the case.

A cluster algorithm, i.e., the k-means algorithm, can also be utilized to detect

building roofs based on surface normal data. However, the small horizontal spacing in high-density LIDAR data will cause its normal data to be very sensitive to noise (Maas and


Vosselman, 1999). Thus, a robust filter should be developed to decrease noise effects in

order to use normal data for roof detection.

Some studies were carried out using a region-growth method. Rottensteiner and

Briese (2002) detected seed regions for planar roofs by counting the percentage of “point-

like” points, which were classified using a polymorphic feature extraction based on the

Förstner operator. These seed regions were then compared with adjacent points to grow.

The topology of roofs was generated using a raster Voronoi Diagram. Adjacent co-planar

roof regions were then grouped together; and the roof topology was updated. From the

topology, a building model can be reconstructed by intersecting adjacent roofs.

To avoid direct detection of roofs from LIDAR data, building ground plans can be

utilized to decompose a building into simple building primitives. Such primitives can be

matched with LIDAR points; and their parameters can be calculated from a least square

regression. Finally, a CSG building model is constructed out of the primitives

(Vosselman and Dijkman, 2001; Haala, et al., 1998). Haala, et al. (1998) also used the

relationship between gradient direction and direction of ground plan segments to detect

roofs. This is based on the observation that the normal of a roof is perpendicular to the

bounding line segments of a building. However, the decomposition of the ground plan

could be very tricky sometimes. It may generate primitives that don’t match the models

in a primitive database. As a consequence, wrong models will be generated. In some

cases, the decomposition could generate too many small primitives although these small

primitives are in fact from the same roof.


3.3.1. Roof Detection and Reconstruction

In this study, a polyhedral model will be used to reconstruct buildings with planar

roofs. To reconstruct a building model, the roofs of a building should be detected first

and their parameters will then be calculated from the LIDAR points falling onto them. One roof is different from another in terms of slope, aspect, location, and height. The height

information is used to differentiate adjacent roofs with the same or similar slopes and

aspects. The location information is expressed in form of adjacency. It is obvious that two

roofs not adjacent to each other cannot be merged together. The most important

information is slope and aspect. For continuous roofs, they are adjacent to each other,

and their heights at the intersection boundaries are the same. Thus, slope and aspect are

the only information that can be used to separate these surfaces.

In this research, surface normal data will be used instead of slope and aspect. The

reason is that aspect has a circular representation and it needs a special and careful

process. For example, the aspect 360 and the aspect 1 are close enough to be grouped

together. However, the mathematical difference, 359, will separate them into two distinct

groups. Figure 3.8 shows slope, aspect, and normal data. It can be seen from the figure

that a roof facing north (area 1 in Figure 3.8(b)) has heterogeneous values in the aspect

data while they are more homogeneous in the normal data.


Figure 3.8. Slope (a), aspect (b), and normal (c) derived from the DSM

As addressed above, the normal of a surface is sensitive to errors in height data,

especially when the LIDAR data has a high density. A higher density indicates a finer

resolution. The same vertical deviation causes larger divergence in higher density data.

See Figure 3.9. The level roof of building 1 has large normal divergence. In this research,

a method has been developed to overcome this disadvantage.

Figure 3.9. Normal divergence in level surfaces

Statistically, normal divergence caused by random errors can be averaged out if a

group of neighboring points belonging to the same surface are used in calculating the


normal. Generally, the more points used for calculating normal at each point, the more

consistent the calculated normal is. To calculate normal, a regression plane is used to fit

neighboring points of a point under study. The normal of the fitting plane is taken as the

normal at this point. Although the regression calculation can decrease the divergence of the normal data, the data still need to be smoothed for clustering. In this study, a mode-seeking algorithm, the mean-shift algorithm, will be used.

3.3.1.1. Mean-shift

Mean-shift is an algorithm for nonparametric density gradient estimation using a

kernel. A mode means a local density maximum. It was first proposed by Fukunaga

and Hostetler (1975) to calculate density gradient. Cheng (1995) further investigated

this algorithm. He proved the convergence of the algorithm and applied it to

clustering. Comaniciu and Meer (2002) further proved its convergence and provided

the sufficient conditions for a mean-shift algorithm to converge. They proved that the

algorithm would converge as long as the kernel applied has a convex and

monotonically decreasing profile. In actual implementation, they argue that

convergence can be achieved if a uniform kernel is applied to estimating density.

They applied the mean-shift algorithm to filtering and segmenting gray scale and

color images.

Given a set of n data points x_i (i = 1, …, n) in a d-dimensional space R^d, the multivariate kernel density with a kernel K(x) and a bandwidth matrix H (defined in equation 3-3) is calculated as


$$\hat{f}(x) = \frac{1}{n} \sum_{i=1}^{n} K_H(x - x_i) \qquad (3\text{-}1)$$

where

$$K_H(x) = |H|^{-\frac{1}{2}}\, K\!\left(H^{-\frac{1}{2}} x\right) \qquad (3\text{-}2)$$

$$H = \begin{bmatrix} h_1^2 & 0 & \cdots & 0 \\ 0 & h_2^2 & \cdots & 0 \\ \vdots & \vdots & \ddots & \vdots \\ 0 & 0 & \cdots & h_d^2 \end{bmatrix} \quad \text{or} \quad H = h^2 I \qquad (3\text{-}3)$$

The bandwidth matrix H defines how to scale each component of the d-dimensional variable.

A d-variate kernel K(x) should satisfy

$$\int_{R^d} K(x)\, dx = 1, \qquad \int_{R^d} x\, K(x)\, dx = 0, \qquad K(x) = c_d\, k\!\left(\|x\|^2\right), \qquad \int_0^{\infty} k(r)\, dr < \infty \qquad (3\text{-}4)$$

In equation 3-4, k(‖x‖²) is the profile of the kernel K(x) and c_d is a constant. The profile is nonnegative, non-increasing, and piecewise continuous. It relates the kernel to a function of the squared 2-norm of the d-dimensional vector x. Thus, the density gradient can be estimated as


$$\hat{\nabla} f(x) = \frac{1}{n} \sum_{i=1}^{n} \nabla K_H(x - x_i)
= \frac{2\,c_d}{n\,|H|^{\frac{1}{2}}}\, H^{-1}
\left[-\sum_{i=1}^{n} k'\!\left(\left\|H^{-\frac{1}{2}}(x - x_i)\right\|^2\right)\right]
\left[\frac{\sum_{i=1}^{n} x_i\, k'\!\left(\left\|H^{-\frac{1}{2}}(x - x_i)\right\|^2\right)}{\sum_{i=1}^{n} k'\!\left(\left\|H^{-\frac{1}{2}}(x - x_i)\right\|^2\right)} - x\right] \qquad (3\text{-}5)$$

The mean shift refers to the second term in equation 3-5.

$$m(x) = \frac{\sum_{i=1}^{n} x_i\, k'\!\left(\left\|H^{-\frac{1}{2}}(x - x_i)\right\|^2\right)}{\sum_{i=1}^{n} k'\!\left(\left\|H^{-\frac{1}{2}}(x - x_i)\right\|^2\right)} - x \qquad (3\text{-}6)$$

A function g(x) is defined as g(x)=-k’(x). Consequently, the kernel G(x) is defined

as

$$G(x) = c_d\, g\!\left(\|x\|^2\right) \qquad (3\text{-}7)$$

The kernel K(x) is called the shadow of the kernel G(x) (Cheng, 1995; Comaniciu

and Meer, 2002). Thus, the mean shift can be represented as

$$m(x) = \frac{\sum_{i=1}^{n} x_i\, g\!\left(\left\|H^{-\frac{1}{2}}(x - x_i)\right\|^2\right)}{\sum_{i=1}^{n} g\!\left(\left\|H^{-\frac{1}{2}}(x - x_i)\right\|^2\right)} - x \qquad (3\text{-}8)$$

The shadow of a Gaussian kernel is also a Gaussian kernel, and the Epanechnikov kernel is the shadow of the Uniform kernel. The Epanechnikov kernel is defined as

$$E(x) = \begin{cases} C_d\,(1 - \|x\|^2) & \text{if } \|x\| \le 1 \\ 0 & \text{if } \|x\| > 1 \end{cases} \qquad (3\text{-}9)$$

and the Uniform Kernel is defined as

$$F(x) = \begin{cases} C_d & \text{if } \|x\| \le 1 \\ 0 & \text{if } \|x\| > 1 \end{cases} \qquad (3\text{-}10)$$

In equations 3-9 and 3-10, C_d is a constant that normalizes the function so that its integral equals 1. In this study, the mean shift will be used to perform filtering and clustering of the DSM based on surface normal data. The Uniform kernel will be used.

In order to apply the mean-shift method to clustering, a feature space needs to be

built up first. The advantage of the mean-shift method is that the feature space can be

built simultaneously from spatial information (x, y, and z) and attribute information

(gray values, color information, and texture measurements). Another filter that works

on both spatial domain and attribute domain simultaneously is the bilateral filtering

method (Tomasi and Manduchi, 1998). The difference between the mean-shift

method and the bilateral filtering method is that the former uses a dynamic searching

window while the latter uses a static one. Another advantage of the mean-shift method


is that it is a nonparametric method, which means it doesn’t have embedded

assumptions. The parameter that a user needs to specify is H. In an actual implementation, H can be selected as a diagonal matrix of (h1², h1², h2², …, h2²). Here, h1 is the bandwidth for the spatial domain while h2 is the bandwidth for the attribute domain.

If the mean-shift method is applied to a gray scale image, the feature space

without spatial information is a histogram of the gray image. For a color or a multi-

spectral image, the feature space without spatial information can be treated as an

extended histogram. Supposing there is a histogram of a gray image (the density

distribution of gray values), at each position, the mean-shift window will move to the

local density maximum as illustrated in Figure 3.10. The convergence of a moving

window actually indicates a cluster of the gray values. There are three clusters in the

example shown in Figure 3.10.

Figure 3.10. Mean-shift on a one-dimensional domain (gray value vs. pixel number)


As addressed before, the mean-shift algorithm can be applied to spatial domain

and attribute domain jointly. In this case, the bandwidth of spatial domain and the

bandwidth of attribute domain should be selected individually. However, after

normalization, these two domains can be joined together to perform filtering. Figure

3.11 and Figure 3.12 illustrate a feature space before and after a filtering applied to a

gray value image using the mean-shift algorithm. X and Y axes are spatial

coordinates; and the gray axis is image pixel value. It can be seen that the feature

space has a plateau-like shape. After the filtering, the plateau shaped surface is

smoothed out (Figure 3.12).

The general procedure to perform the mean-shift filtering can be summarized as follows (a code sketch is given after the list):

• The user inputs the bandwidths hs for the spatial domain and ha for the attribute domain. Build up the feature space F(X | X = (x, y, a1, …, an));

• For each data point in the data set (for each pixel in the image), initialize Y = Xi;

• Calculate the mean shift using equation 3-8. The kernel is a uniform kernel (equation 3-10). Update Y = Y + m(x). Repeat until m(x) equals 0;

• Output the result Zi = (Xi(x, y), Y(a1, …, an)). This assigns the spatial information of the initial position and the attribute information of the convergent position.
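The brute-force sketch below follows this procedure with a uniform kernel; the function name, the column layout of the feature array, and the simple convergence test are assumptions, and none of the speed-ups discussed later are applied.

```python
import numpy as np

def mean_shift_filter(features, hs, ha, max_iter=50):
    """Mean-shift filtering of a joint spatial/attribute feature space.

    features : (N, d) array whose first two columns are x, y and whose
               remaining columns are attributes (e.g. nx, ny, nz)
    hs, ha   : bandwidths of the spatial and the attribute domains
    Returns an (N, d) array keeping each point's spatial position and
    replacing its attributes by those of the mode it converges to.
    """
    feats = np.asarray(features, dtype=float)
    band = np.r_[[float(hs)] * 2, [float(ha)] * (feats.shape[1] - 2)]
    scaled = feats / band                     # normalize both domains
    result = feats.copy()
    for i in range(len(feats)):
        y = scaled[i].copy()
        for _ in range(max_iter):
            # uniform kernel: all points inside the unit ball around y
            inside = np.sum((scaled - y) ** 2, axis=1) <= 1.0
            new_y = scaled[inside].mean(axis=0)
            if np.allclose(new_y, y):         # mean shift is (numerically) zero
                break
            y = new_y
        result[i, 2:] = y[2:] * ha            # attributes of the convergent mode
    return result
```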


Figure 3.11. The feature space before applying the mean-shift filtering

Figure 3.12. Feature space after the mean-shift filtering is applied


Some methods can be employed to speed up the computation of the mean-shift

algorithm. The most expensive computation part is to search neighboring points,

which fall within bandwidths and are used to calculate the mean shift. Some

researchers proposed certain computation optimization methods in calculating the

mean shift (Georgescu, et al., 2003). In this research, the searching of neighboring

points is conducted in the spatial domain first, and then in the attribute domain. The

reason is that pixels’ coordinates provide the spatial topology of an image. Thus, the

points neighboring in spatial domain can be retrieved quickly.

3.3.1.2. Roof Reconstruction

The roofs of a building are detected based on the mean-shift algorithm as

described above. The data used is surface normal data derived from a LIDAR DSM.

As addressed before, the divergence in surface normal data introduced by errors is

largely due to small point spacing. Neighboring points are used to derive a fitting

plane; and the plane’s normal was assigned to the point under study. In this study, a 5

by 5 window is used to fit a DSM at each point as a plane. Figure 3.13 displays

calculated normal from a DSM. The one on the left is calculated from a 3 by 3 fitting

window, while the one on the right is calculated from a 5 by 5 window. It is obvious

that the normal calculated from a 5 by 5 window is much more consistent than the one calculated from the 3 by 3 window (see region 1 in Figure 3.13).


Figure 3.13. Normals calculated from the DSM using different fitting windows: a 3 by 3 window (left) and a 5 by 5 window (right)
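A brute-force sketch of this per-point plane fitting is shown below; the function dsm_normals and the simple border handling are assumptions made for illustration.

```python
import numpy as np

def dsm_normals(dsm, win=5, cell=1.0):
    """Per-cell unit surface normals from a grid DSM by local plane fitting.

    dsm  : 2D array of heights
    win  : odd window size (e.g. 3 or 5) used to fit z = a*x + b*y + c
    cell : ground spacing of the grid
    Returns a (rows, cols, 3) array of unit normals (nx, ny, nz).
    """
    rows, cols = dsm.shape
    half = win // 2
    # the design matrix of window offsets is the same for every cell
    oy, ox = np.mgrid[-half:half + 1, -half:half + 1]
    A = np.column_stack([ox.ravel() * cell, oy.ravel() * cell,
                         np.ones(win * win)])
    normals = np.zeros((rows, cols, 3))
    for r in range(rows):
        for c in range(cols):
            # clamp the window inside the grid near the borders
            r0 = min(max(r, half), rows - half - 1)
            c0 = min(max(c, half), cols - half - 1)
            z = dsm[r0 - half:r0 + half + 1, c0 - half:c0 + half + 1].ravel()
            coeffs = np.linalg.lstsq(A, z, rcond=None)[0]
            n = np.array([-coeffs[0], -coeffs[1], 1.0])   # normal of z = a*x + b*y + c
            normals[r, c] = n / np.linalg.norm(n)
    return normals
```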

The calculated normal is clustered using the mean-shift algorithm. The normal is

filtered first by assigning the value of a density mode to pixels that converge to it.

One mode indicates one cluster in the data. All data points belonging to the same

mode are classified into one cluster. In the actual implementation, points belonging to the same mode do not have exactly the same value due to rounding in the conversion from float/double format data to integer format data. Thus, after filtering, all points

with a difference smaller than the bandwidths employed in the filtering process are

grouped together as one cluster and are labeled. The mean-shift algorithm was carried

out on the joint feature space, (x, y, nx, ny, nz). The (x, y) components are spatial

coordinates; the (nx, ny, nz) components are used to represent the red, green, and blue

channels. Figure 3.14 shows the normal data before and after the filtering. It can be

seen that the normal is smoothed out. Figure 3.15 illustrates the X component of the

normal before and after the filtering for building A outlined in Figure 3.14.


Figure 3.14. The normal data before (left) and after (right) the filtering using the mean-shift method


Figure 3.15. 3D visualization of the X normal component before (left) and after (right) the filtering for the building A outlined in Figure 3.14


After filtering, the data can be classified into clusters by grouping similar points.

In this study, a supervised classification was performed on the normal data using commercial software. Small segments were merged into their adjacent large segments,

which share the longest boundary with them. This merge was achieved in vector

format. The segmented image was converted into Arc coverage using ERDAS

Imagine software. Polygons with a small area were then identified and merged to

their neighboring segments. The elimination of small polygons was conducted in

ArcGIS software. Figure 3.16 presents the classification result and roofs converted to

vector format. In a building reconstruction system, roofs can be extracted using the

detected building boundaries as described in the previous chapter.

Figure 3.16. Roof clustering and extraction

After the roofs of a building were extracted, their plane parameters can be

calculated from 3D LIDAR points. The boundary of a roof was used to extract

LIDAR points falling within its footprint using a point-in-polygon algorithm. The

algorithm applied here is the well-known Ray-Crossing method. Figure 3.17


illustrates its concept. The algorithm can be summarized as follows: 1) Given a point and a polygon, a horizontal half-line is drawn from the point in the positive x-axis direction; 2) This half-line is compared against the polygon's boundary segments, and the number of segments that intersect the half-line is counted; 3) If the number is even, the point under analysis is outside the polygon; if the intersection number is odd, the point under test is inside the

polygon. Points within a roof boundary are retrieved to calculate the roof's plane parameters using the Gauss-Markov model. The 3D plane is represented as a·X + b·Y + c·Z = 1, and the parameters to be calculated are a, b, and c.

Figure 3.17. Point-in-polygon analysis
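A compact sketch of the ray-crossing test is given below; the function name and the vertex-list representation are assumptions, and points lying exactly on an edge or a vertex would need extra care that is omitted here.

```python
def point_in_polygon(px, py, polygon):
    """Ray-crossing test: is point (px, py) inside the closed 2D polygon?

    polygon : list of (x, y) vertices in order (the first vertex does not
              need to be repeated at the end)
    """
    inside = False
    n = len(polygon)
    for i in range(n):
        x1, y1 = polygon[i]
        x2, y2 = polygon[(i + 1) % n]
        # does this edge straddle the horizontal line y = py?
        if (y1 > py) != (y2 > py):
            # x coordinate where the edge crosses that horizontal line
            x_cross = x1 + (py - y1) * (x2 - x1) / (y2 - y1)
            if x_cross > px:          # the crossing lies on the +x ray
                inside = not inside   # odd number of crossings -> inside
    return inside
```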

Due to roof detection and extraction biases, some points not belonging to a roof

could be extracted by the roof’s boundary. In order to improve the accuracy of a

derived roof plane, the plane-fitting error δ (the mean square error, MSE) was calculated, and those points with a residual larger than 2δ were detected and rejected. The remaining points were used to calculate a new set of parameters for the roof plane. This refinement process can be repeated until no more points are rejected. Usually, the number of repetitions is small; in this research, the refinement was performed twice.
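The sketch below illustrates this fit-and-reject loop for the plane form aX + bY + cZ = 1; the function name and the use of the algebraic residual aX + bY + cZ − 1 (rather than an orthogonal distance) are simplifying assumptions.

```python
import numpy as np

def fit_roof_plane(points, passes=2):
    """Fit a roof plane a*X + b*Y + c*Z = 1 with iterative outlier rejection.

    points : (N, 3) array of LIDAR points extracted by the roof boundary
    passes : number of rejection passes (two were used in this research)
    Returns the coefficients (a, b, c) and the mask of points finally kept.
    """
    pts = np.asarray(points, dtype=float)
    keep = np.ones(len(pts), dtype=bool)
    coeffs = np.linalg.lstsq(pts, np.ones(len(pts)), rcond=None)[0]
    for _ in range(passes):
        residuals = pts @ coeffs - 1.0
        delta = np.sqrt(np.mean(residuals[keep] ** 2))    # RMS fitting error
        keep = np.abs(residuals) <= 2.0 * delta           # reject beyond 2*delta
        coeffs = np.linalg.lstsq(pts[keep], np.ones(keep.sum()), rcond=None)[0]
    return coeffs, keep
```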

3.3.2. Model Reconstruction

To reconstruct a 3D building model, the topology (the adjacency relationship) of its

roofs and vertical walls needs to be built after the 3D roof surfaces are calculated. This

produces an adjacency graph depicting the inter-relationship of those 3D surfaces. With

the adjacency graph, adjacent planes can be used to derive building corners.

Consequently, a 3D building model can be reconstructed.

The topology among those detected roofs can be derived immediately by checking if

they share a common boundary line segment. This can be done after roof polygons were

extracted or converted from a raster thematic image. To build an adjacency graph, some

researchers performed the raster Voronoi algorithm using the Chamfer mask (Ameri and

Fritsch, 2000). In their study, the detected roof planes are not complete. Roofs do not

touch each other due to the data and algorithm they employed. In this study, the detected

roof planes are complete. They share common boundaries. The topology of these detected

roofs was retrieved using a commercial software package, the ArcGIS software. After a

raster thematic image was generated, it was converted into Arc Polygon Coverage. The

topology was read out through an ArcView script. Obviously, the topology was built in a

2D space.

Vertical walls are represented as line segments in a 2D horizontal space, which forms

the boundary of a building. During roof detection, a roof’s bounding regions were lost

due to a large plane-fitting window. Thus, a building’s boundary (vertical walls) is


separated from its roof boundaries. Their adjacency cannot be constructed by directly

checking if they share common line segments. A new method is proposed in this study to

construct the topology between a building’s boundary (vertical walls) and its roofs.

The disconnection between a building’s boundary and its roof boundaries is mainly

caused by the plane fitting in normal calculation. Thus, the general distance of such a

disconnection is proportional to the size of a fitting window. If the building boundary is

shrunk or the roof boundaries are expanded, the building’s boundary will intersect or

touch its roof boundaries. In this research, a building’s boundary will be shrunk to

determine which roof polygon it is adjacent to. See Figure 3.18. A building boundary line

segment is expanded to its perpendicular direction to form a rectangle. If the rectangle

touches or intersects a boundary segment of a roof, the boundary is adjacent to the roof.

The width of such a rectangle is determined by the plane-fitting window size. It can be

double the size of the fitting window. In reality, the width can be controlled as a

parameter input by users. If the rectangle intersects or touches one roof polygon, the

adjacency is built between these two surfaces (the vertical wall and the roof).

Figure 3.18. Topology of roofs and the building boundary


Based on the method described above, the topology of a building can be constructed.

Supposing the roofs (1-4) and the vertical walls (5-12) are numbered as shown in Figure

3.19, the topology can be represented in an adjacency graph or a correlation table, which

is represented as an adjacency matrix. Table 3.1 shows the adjacency matrix for the

building example in Figure 3.19.

Figure 3.19. Numbering roofs and vertical walls

Table 3.1: Adjacency matrix for roofs and vertical walls


Table 3.1 demonstrates an adjacency matrix. One entry of the matrix indicates the

relationship between two surfaces. The value of 1 indicates that the two surfaces are

adjacent to each other, while the value of 0 indicates that these two surfaces are not

connected. The matrix is symmetric. The diagonal elements are set to 0. Thus, adjacent

surfaces to a surface under study can be immediately retrieved by checking the row or

column entry values.

Each surface has an attribute indicating whether it is a roof surface or a bounding

vertical wall in an actual implementation. The bounding vertical walls of a building are

numbered in sequence. In this manner, the adjacency relationship among vertical walls

can be constructed from their numbers. Each vertical wall can only be adjacent to another

two vertical walls.
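As a small, hypothetical illustration of how such an adjacency matrix can be stored and queried (the surface numbers 1 to 12 follow Figure 3.19), consider the sketch below.

```python
import numpy as np

# A hypothetical 12-surface building: surfaces 1-4 are roofs, 5-12 are walls.
# adj[i, j] = 1 when surface i+1 and surface j+1 share a boundary.
n_surfaces = 12
adj = np.zeros((n_surfaces, n_surfaces), dtype=int)

def connect(i, j):
    """Record that surfaces i and j (1-based numbers) are adjacent."""
    adj[i - 1, j - 1] = adj[j - 1, i - 1] = 1   # keep the matrix symmetric

def neighbors(i):
    """Return the 1-based numbers of all surfaces adjacent to surface i."""
    return [j + 1 for j in range(n_surfaces) if adj[i - 1, j]]

# example: walls 5 and 12 both border roof 2
connect(5, 2)
connect(12, 2)
print(neighbors(2))        # -> [5, 12]
```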

To reconstruct a 3D building model, building corners will be reconstructed first

because they are the primitives of a CAD model. In this research, a building’s corners are

divided into two groups; and they will be calculated using different approaches. The first

group is the corners formed by two vertical walls; the second group is formed by at least

two roof surfaces. This division is not an exclusive division. That means there is an

overlap between these two groups. After these two groups of corners were calculated

independently, all corners will be checked to eliminate any duplication. The method for

reconstructing corners formed by two vertical walls is described as follows (a sketch of the plane-intersection step is given after the list):

1. Given two adjacent vertical walls a and b, retrieve their common adjacent roof

surface from the surface adjacency matrix. If they have common roof

surfaces, the common roof surfaces will generate a corner with these two


vertical walls. For example, in Figure 3.19, vertical walls 5 and 12 will find

the common roof 2. Record the belonging-to relationship between the

calculated corner and its calculating surfaces in a point-surface matrix;

2. If no common roof surfaces were found (see vertical walls 11 and 12 in

Figure 3.19), all roofs adjacent to either a or b are retrieved. If exactly two

roofs were retrieved, these four surfaces will be used to generate one corner

(see vertical walls 11, 12 and roofs 2, 3). Record the belonging-to relationship

between the calculated corner and its calculating surfaces in a point-surface

matrix;

3. If more than two roofs were found adjacent to a or to b, each of the retrieved

roofs generates a corner with these two vertical walls. The actual corner will

have at least two duplications among those corners. These duplications of the

actual corner will be close to each other while other corners will be far away

from the duplications. In this study, a threshold of 1 meter in distance was

used to test if two calculated corners are close to each other. If their distance is

smaller than the threshold, they will be taken as duplications of the actual

corner. The average of these two calculated corners will be taken as the actual

corner. Record the belonging-to relationship between the calculated corner

and its calculating surfaces in a point-surface matrix;

4. Repeat the above calculations for all consecutive vertical walls.
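The corner itself is obtained by intersecting the surfaces involved. A minimal sketch of that step is given below; representing every surface as a 4-tuple (a, b, c, d) of the plane aX + bY + cZ = d is an assumption made for illustration (roofs fitted as aX + bY + cZ = 1 have d = 1, and a vertical wall through the boundary line aX + bY = d has c = 0).

```python
import numpy as np

def corner_from_planes(p1, p2, p3):
    """Intersect three planes given as (a, b, c, d) with a*X + b*Y + c*Z = d."""
    A = np.array([p1[:3], p2[:3], p3[:3]], dtype=float)
    d = np.array([p1[3], p2[3], p3[3]], dtype=float)
    return np.linalg.solve(A, d)      # raises LinAlgError for parallel planes

# hypothetical example: two vertical walls meeting under a flat roof
print(corner_from_planes((1, 0, 0, 5),     # wall  X = 5
                         (0, 1, 0, 8),     # wall  Y = 8
                         (0, 0, 1, 10)))   # roof  Z = 10   -> [ 5.  8. 10.]
```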

The building corners in the second group are formed by at least two roof surfaces.

Similarly, the belonging-to relationship between the calculated corner and its calculating


surfaces is also recorded in the point-surface matrix when a corner is constructed. The

method can be summarized as:

1. Given a roof surface S, find out all its adjacent roof surfaces {S1, …, Sn};

2. For each T in {S1, …, Sn}, find out all the surfaces adjacent to both T and S

including roofs and vertical walls {T1, …, Tm}. For example, in Figure 3.19,

vertical walls 5 and 7 are adjacent to both roofs 1 and 2. They will be

retrieved as the set {T1, T2};

3. If {T1, …, Tm} is empty, go to step 5. Otherwise, for each I in {T1, …, Tm}, a

corner is calculated from {I, S, T}. Record the belonging-to relationship in the

point-surface matrix;

4. After all roofs S are processed using the steps from 1 to 3, corners in the

second group are compared. If two corners share at least three common roofs,

they are merged together. The point-surface matrix will be changed

accordingly. The reason to perform this operation is that the process is

redundant. For example, starting from roof 1 in Figure 3.19, roof 2 will be

retrieved and a corner will be generated with vertical wall 5. When starting

from roof 2, roof 1 will be retrieved to form a new corner with vertical wall 5.

In reality, these two corners are the same actual corner;

5. Repeat the process for each roof surface.


After all corners in the first group and the second group were generated, they are all

compared to check for corner redundancy. If the distance between two corners is smaller than a threshold of 1 meter, they will be merged into one corner. The point-surface matrix will be changed

accordingly. Figure 3.20 shows an example of reconstructed building corners. The

Roman numerals represent corner number while the Arabic numerals represent surface

number. Correspondingly, Table 3.2 shows its point-surface matrix. The example matrix

has equal row and column numbers.

Figure 3.20. Reconstructed building corners

Table 3.2. Point-surface matrix for the example in Figure 3.20


Up to this point, 3D corners of a building are constructed. A 3D building model can

be reconstructed through its 3D surfaces. A surface can be described by its bounding

corners, which are retrieved from the point-surface matrix. However, these bounding points are not ordered in their sequence along the surface boundary. A convex hull can be calculated from the corners as a surface for a convex roof polygon, but this does not work for a concave polygon. For a vertical wall, the lower points can be derived by intersecting the wall with the ground DTM. Before the lower points are derived, the upper points retrieved from the point-surface matrix are ordered first. By doing so, the points can be ordered consistently with the boundary. The method to order points on vertical walls can be

summarized as:

1. Given a vertical wall surface, retrieve all its upper points from the point-surface matrix, P{p1, …, pn};

2. If n equals 2, take the order P {p1, p2}. Go to step 5;

3. If n is larger than 2, the horizontal coordinates will be used to order these

points. These points are projected to a horizontal plane and a 2D line is

formed. This line can be derived from the first two points, (x1, y1) and (x2, y2).

The line can be represented in the form shown in equation 3-11. Each point on

the line can be represented by a unique value t;

$$\begin{cases} x = x_1 + t \cdot \Delta x \\ y = y_1 + t \cdot \Delta y \end{cases} \quad \text{where} \quad \Delta x = x_2 - x_1, \;\; \Delta y = y_2 - y_1, \;\; t \in (-\infty, +\infty) \qquad (3\text{-}11)$$

4. Calculate t for every point and order the points based on t. This order applies

to the 3D points, P′{p1′, …, pn′};


5. The lower points are ordered in the reverse sequence of P′, with their Z values taken from the DTM. The upper and lower points are joined together and a vertical wall polygon is formed correctly, {p1′, …, pn′} followed by the corresponding lower points in reverse order;

6. Take a new vertical wall and go to step 1 until all vertical walls are ordered (a sketch of the point-ordering step follows this list).
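A minimal sketch of steps 3 and 4 (ordering the upper points by the line parameter t of equation 3-11) is shown below; the function name and the simple choice between Δx and Δy for a better-conditioned parameterization are assumptions.

```python
def order_wall_points(upper_points):
    """Order the upper corners of a vertical wall along its 2D trace.

    upper_points : list of (x, y, z) corners retrieved from the
                   point-surface matrix (at least two points)
    """
    (x1, y1, _), (x2, y2, _) = upper_points[0], upper_points[1]
    dx, dy = x2 - x1, y2 - y1

    def t(p):
        # parameter of equation 3-11; use the larger of dx, dy for stability
        return (p[0] - x1) / dx if abs(dx) >= abs(dy) else (p[1] - y1) / dy

    return sorted(upper_points, key=t)

# the lower points of the wall are these points in reverse order, with their
# Z values replaced by the ground DTM height (step 5 above)
```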

For a roof surface, the projection to a horizontal plane is a 2D polygon instead of a

1D line segment. In this research, a method is developed to order the points along its

boundary. This method is based on the assumption that an originally detected roof

polygon keeps the shape of the roof’s actual polygon. It can be illustrated as in Figure

3.21. The rectangular polygon drawn in dash line is a roof to be reconstructed; and the

solid polygon is the originally detected roof polygon. The circle centers are the corners

calculated in the reconstruction process. Each calculated corner will find a closest point

in the original polygon. The closest points are shown as bold dots in Figure 3.21.

Figure 3.21. Ordering roof polygon points

from the original roof polygon


The method for ordering a roof polygon requires its original roof polygon detected

from normal data, which has its points ordered in the correct sequence. The method can

be described as follows:

1. Given a roof polygon, retrieve all its points from the point-surface matrix,

P{p1, …, pn};

2. Retrieve the original roof polygon detected from normal data, which is

represented as a sequence of points in order. Number its points based on their

orders in the sequence, P′{p1′, …, pm′};

3. For each corner point pi in P, find the closest point pj′ in P′. Assign the

number j to the point pi as an attribute;

4. Order the corner points in P according to their attribute numbers j. This order

is the sequence for these calculated corners along a roof boundary. Thus, a

roof polygon is reconstructed.

Figure 3.22 presents an example of topology construction between vertical walls and

roof surfaces. The red segments are expanded segments parallel to vertical walls, which

are represented as blue segments. The green polygons are roof boundaries. After the

topology was built, coordinates of building corners were calculated and plotted as star

marks in Figure 3.22. The reconstructed 3D model of the same building from Figure 3.22

is shown in Figure 3.23.


Figure 3.22. The reconstruction of surface topology

Figure 3.23. An example of reconstructed 3D building models

CHAPTER 4

BUILDING MODEL REFINEMENT

LIDAR point data are random points with high vertical accuracy; however, these

points have lower horizontal accuracy mainly caused by large laser footprints on the

ground and navigation errors. LIDAR points are sample points of the earth’s surface.

This determines that LIDAR data cannot capture sharp linear features. Thus, a building

model reconstructed from LIDAR will have low geometrical accuracy, especially the

bounding boundary. This means that there is a great potential to improve a building’s

geometry, specifically its boundary, through data or information integration. A linear

feature derived from intersection of two roof surfaces will have a high accuracy because

the roofs derived from LIDAR points have high accuracy. Instead of being refined, these

features can be used as control features in data registration.

In this research, aerial photographs are employed to integrate with LIDAR data for

building model refinement. Aerial photographs are mainly used to extract linear features,

specifically building boundary edges. A method will be developed to integrate these

edges with reconstructed building models from LIDAR data to improve the accuracy of

the models. The method for data registration will be described first; then data integration


will be illustrated, followed by a description of building model refinement. Figure 4.1

shows the flowchart of the refinement process.

Figure 4.1. Simultaneously updating 2D lines in both stereo images


4.1. Co-Registration of LIDAR and Image Data

Integration of data from different sensors or platforms can be performed at different levels: data level, feature level, and object level (Csathó, 1999). In order to perform

integration, data from different sources should be registered under the same framework, a

common coordinate system. In order to integrate aerial photograph and LIDAR data for

refinement, these two data sets should be co-registered within one coordinate system.

Some research studies have been conducted to use linear features in image resection.

Habib (2002) used 2D ground linear features to estimate photograph exterior orientation

parameters (EOP) using a Modified Iterated Hough Transform methodology. Stamos

(2000) registered 3D and 2D linear features derived from LIDAR data and image data

respectively with known image EOP. Straight line segments serve as control features

to derive EOP. Figure 4.2 shows the general procedure for co-registration of LIDAR data

and aerial photographs.

LIDAR data points are already in a 3D coordinate system. So it is convenient to take

the coordinate system of the LIDAR data as the common framework. Thus, an aerial

photograph will be registered to the LIDAR data system. The registration is the

calculation of interior and exterior orientation parameters for an aerial image. The interior

orientation is a transformation from a measuring system to the image system originating at its calibrated principal point. The exterior orientation is to derive the EOP, which are the

position of the exposure center (X0, Y0, Z0) and camera pose (ω, φ, κ). The EOP of an

aerial photograph can be calculated from image resection using ground control points like

GPS control points or points from other sources with high accuracy.


Figure 4.2. Co-registration of LIDAR and aerial images

For data integration, the internal consistency of the integrated data is very important. To achieve high consistency, it is wise to select points or features from the LIDAR data as known control features for image resection. As mentioned before, LIDAR data cannot

capture either sharp linear or point features directly. Consequently, control features


cannot be measured directly from LIDAR data. However, accurate linear features can be

calculated from LIDAR data. Planar roofs have high-accuracy parameters because a large number of points were used to derive their parameters. Linear features derived from

intersection of adjacent planar roofs can be employed as control features. Compared with

linear features, corners formed by three intersecting roofs are less common in a LIDAR

data set. So linear features derived from roof intersection will be utilized to perform

image resection. After image resection, two data sets are registered in a common

coordinate system.

4.1.1. 3D Intersection Lines from LIDAR data and 2D Edges from Photograph

To perform image resection, conjugate line segments should be measured in 3D space

of LIDAR data and 2D space of aerial photographs. 3D line segments from LIDAR data

are calculated from adjacent intersecting planar roofs. This requires that 3D planar roofs

be calculated first.

To calculate 3D roof plane parameters for image resection, points belonging to a

plane are distinguished and extracted. After the points of a plane were extracted, the

plane’s parameters can be calculated using the least square regression method. A 3D line

is obtained by intersecting two adjacent 3D planes; and it will be represented using two

points on the line.

From a 2D image, the conjugate line segment of a 3D line segment in the 3D ground

space is interpreted manually, and its end points are measured. After applying the image

interior orientation transformation, the 2D edge will be ready for image resection

described in the following section.


4.1.2. Image Resection Using Linear Features

After corresponding 2D and 3D line segments are extracted, image resection can be

carried out using the so-called co-planarity condition. The advantage of using co-

planarity is that the end points of corresponding line segments do not need to be

conjugate points. The only requirement is that a 2D line segment in image space is the

conjugate line of the 3D line segment on the ground. No constraints are put on end points.

The co-planarity condition can be illustrated using Figure 4.3.

Figure 4.3. Co-planarity of 2D and 3D line segments

In Figure 4.3, O is the exposure center of a camera, AB is a 3D ground line segment, ab is the 2D conjugate line segment in image space, and the vector v is the normal of the plane formed by the exposure center O and the 3D ground line segment AB. Since AB and ab are conjugate control features, ab should lie on the plane formed by line AB and the exposure center O. In other words, the end points of line ab should be on the

plane. In general, all 5 points, O, a, b, A, and B should be on the same planar surface,

which is determined by a camera’s imaging geometry, a central perspective projection.


In order to represent and to use co-planarity conditions in deriving EOP, two vectors

are drawn from the exposure center O to points a and b respectively. Thus, the co-planarity

condition of the 5 points is equivalent to the condition that both vectors, Oa and Ob , are

perpendicular to the normal vector v . This guarantees that the line ab lies on the

common plane formed by O, A, and B. Vector v is also perpendicular to vectors OA and

OB . This condition can be written as equation 4-1.

$$v \cdot \overrightarrow{OA} = 0, \qquad v \cdot \overrightarrow{OB} = 0, \qquad v = \overrightarrow{Oa} \times \overrightarrow{Ob} \qquad (4\text{-}1)$$

Each pair of control features provides two equations, v · OA = 0 and v · OB = 0. Thus, for the 6 unknowns of the EOP, three pairs of linear control features can solve the problem. For better accuracy, more than three pairs of control features are needed for redundancy checks. Then, the least-squares method is used to minimize the discrepancy among the conditions.

In order to use the co-planarity condition, all coordinates should be expressed in one common coordinate system. Here, the 3D coordinate system of the image space is used, which originates at the exposure center O. Thus, a translation and a rotation are performed to transform ground coordinates into the image space system.

Suppose the rotation matrix from ground space to image space is R, the coordinates of the exposure center O in the ground space are (X0, Y0, Z0), the coordinates of the end points of a 2D line segment after interior orientation are (x1, y1) and (x2, y2) respectively, the coordinates of the end points of the conjugate 3D line segment in the ground space are (X1, Y1, Z1) and (X2, Y2, Z2) respectively, and the calibrated camera lens focal length is f. The co-planarity condition equations can then be re-written for the EOP calculation.

R = \begin{bmatrix} r_{11} & r_{12} & r_{13} \\ r_{21} & r_{22} & r_{23} \\ r_{31} & r_{32} & r_{33} \end{bmatrix} = \begin{bmatrix} \cos\varphi\cos\kappa & \sin\omega\sin\varphi\cos\kappa + \cos\omega\sin\kappa & -\cos\omega\sin\varphi\cos\kappa + \sin\omega\sin\kappa \\ -\cos\varphi\sin\kappa & -\sin\omega\sin\varphi\sin\kappa + \cos\omega\cos\kappa & \cos\omega\sin\varphi\sin\kappa + \sin\omega\cos\kappa \\ \sin\varphi & -\sin\omega\cos\varphi & \cos\omega\cos\varphi \end{bmatrix}    (4-2)

\vec{Oa} = \begin{bmatrix} x_1 \\ y_1 \\ -f \end{bmatrix}, \qquad \vec{Ob} = \begin{bmatrix} x_2 \\ y_2 \\ -f \end{bmatrix}    (4-3)

\vec{OA} = R \begin{bmatrix} X_1 - X_0 \\ Y_1 - Y_0 \\ Z_1 - Z_0 \end{bmatrix} = \begin{bmatrix} r_{11}(X_1 - X_0) + r_{12}(Y_1 - Y_0) + r_{13}(Z_1 - Z_0) \\ r_{21}(X_1 - X_0) + r_{22}(Y_1 - Y_0) + r_{23}(Z_1 - Z_0) \\ r_{31}(X_1 - X_0) + r_{32}(Y_1 - Y_0) + r_{33}(Z_1 - Z_0) \end{bmatrix}    (4-4)

\vec{OB} = R \begin{bmatrix} X_2 - X_0 \\ Y_2 - Y_0 \\ Z_2 - Z_0 \end{bmatrix} = \begin{bmatrix} r_{11}(X_2 - X_0) + r_{12}(Y_2 - Y_0) + r_{13}(Z_2 - Z_0) \\ r_{21}(X_2 - X_0) + r_{22}(Y_2 - Y_0) + r_{23}(Z_2 - Z_0) \\ r_{31}(X_2 - X_0) + r_{32}(Y_2 - Y_0) + r_{33}(Z_2 - Z_0) \end{bmatrix}    (4-5)

\vec{v} = \vec{Oa} \times \vec{Ob} = \begin{bmatrix} a \\ b \\ c \end{bmatrix} = \begin{bmatrix} f(y_2 - y_1) \\ f(x_1 - x_2) \\ x_1 y_2 - x_2 y_1 \end{bmatrix}    (4-6)


Thus, the co-planarity conditions can be written as

F_1 = \vec{OA} \cdot \vec{v} = a[r_{11}(X_1 - X_0) + r_{12}(Y_1 - Y_0) + r_{13}(Z_1 - Z_0)] + b[r_{21}(X_1 - X_0) + r_{22}(Y_1 - Y_0) + r_{23}(Z_1 - Z_0)] + c[r_{31}(X_1 - X_0) + r_{32}(Y_1 - Y_0) + r_{33}(Z_1 - Z_0)] = 0    (4-7)

F_2 = \vec{OB} \cdot \vec{v} = a[r_{11}(X_2 - X_0) + r_{12}(Y_2 - Y_0) + r_{13}(Z_2 - Z_0)] + b[r_{21}(X_2 - X_0) + r_{22}(Y_2 - Y_0) + r_{23}(Z_2 - Z_0)] + c[r_{31}(X_2 - X_0) + r_{32}(Y_2 - Y_0) + r_{33}(Z_2 - Z_0)] = 0    (4-8)

The six unknown exterior orientation parameters of a camera are included in equations 4-7 and 4-8, and these two equations can be used to solve for the unknowns. However, the equations are non-linear. Linear forms are therefore derived using the Taylor series, and an iterative approach is necessary to solve for these parameters. The derivation is presented below.

Suppose the unknowns are collected in a vector u, with

u^T = [X_0 \;\; Y_0 \;\; Z_0 \;\; \omega \;\; \varphi \;\; \kappa]

F(u) = F(u_0) + \frac{1}{1!}F'(u_0)(u - u_0) + \ldots + \frac{1}{n!}F^{(n)}(u_0)(u - u_0)^n + \ldots \approx F(u_0) + \frac{1}{1!}F'(u_0)(u - u_0) \approx 0    (4-9)


In equation 4-9, the function F(u) is written as a Taylor series expansion. The terms of order higher than 1 are omitted because they have small values; they are treated as errors. The non-linear function F(u) is thus approximated by a linear function. The Gauss-Markov model can be employed to perform the least-squares adjustment and calculate the parameters from the linear function. Knowing the initial value u_0, the value of u can be obtained through a standard iterative approach. The value of u is the unknown EOP to be calculated from image resection. The relation is u = u_0 + \partial u_0, where \partial u_0 is the correction to u_0 at each iteration. Equation 4-9 can be written in the form

-F(u_0) = \frac{\partial F}{\partial X_0}\Delta X_0 + \frac{\partial F}{\partial Y_0}\Delta Y_0 + \frac{\partial F}{\partial Z_0}\Delta Z_0 + \frac{\partial F}{\partial \omega}\Delta \omega + \frac{\partial F}{\partial \varphi}\Delta \varphi + \frac{\partial F}{\partial \kappa}\Delta \kappa + e    (4-10)

To solve the problem, at least 6 independent equations are needed from equation 4-

10. That means at least 3 pairs of conjugate line segments should be measured because

each conjugate line pair contributes two equations. In practice, more than 6 equations are

required to obtain a high accuracy and a robust estimation. The Gauss-Markov model is

used to derive the solution.

Y_{n \times 1} = A_{n \times m}\,\xi_{m \times 1} + e_{n \times 1}, \qquad \hat{\xi}_{m \times 1} = (A^T_{m \times n} P_{n \times n} A_{n \times m})^{-1} A^T P Y    (4-11)

Here,


Y^T = [-F_1(u_0), \ldots, -F_n(u_0)]

\xi^T = [\Delta X_0, \Delta Y_0, \Delta Z_0, \Delta \omega, \Delta \varphi, \Delta \kappa]

A = \begin{bmatrix} \dfrac{\partial F_1}{\partial X_0} & \dfrac{\partial F_1}{\partial Y_0} & \dfrac{\partial F_1}{\partial Z_0} & \dfrac{\partial F_1}{\partial \omega} & \dfrac{\partial F_1}{\partial \varphi} & \dfrac{\partial F_1}{\partial \kappa} \\ \vdots & \vdots & \vdots & \vdots & \vdots & \vdots \\ \dfrac{\partial F_n}{\partial X_0} & \dfrac{\partial F_n}{\partial Y_0} & \dfrac{\partial F_n}{\partial Z_0} & \dfrac{\partial F_n}{\partial \omega} & \dfrac{\partial F_n}{\partial \varphi} & \dfrac{\partial F_n}{\partial \kappa} \end{bmatrix}    (4-12)

The coefficients of A can be calculated as in equation 4-13. In equation 4-13, the constants (a, b, c) are the components of vector v (see equation 4-6). The Gauss-Markov model can be used to calculate the increments, or corrections, (\partial X_0, \partial Y_0, \partial Z_0, \partial \omega, \partial \varphi, \partial \kappa). A camera's EOP can then be updated from its initial values. After several iterations, the EOP can be calculated with the required accuracy. The procedure can be summarized in the following steps:

• Knowing the initial values of the EOP, u_0, matrix A and vector Y are calculated using equations 4-12 and 4-13;

• Use the Gauss-Markov model to calculate the parameter increments \partial u_0 (equation 4-11) and update u_0 by adding \partial u_0 to u_0. The weight matrix is determined by the actual measurement accuracy. To simplify the calculation, matrix P can be assigned as an identity matrix, which means the measurements are equally weighted;

• Check the increments \partial u_0; if \partial u_0 is smaller than a pre-defined value, then stop, and u_0 is the calculated EOP. Otherwise, go to step 1 and continue. (A minimal sketch of this iterative loop is given below.)
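The following fragment sketches this iterative loop in Python/NumPy under stated assumptions: a user-supplied function (here called coplanarity_residuals, a hypothetical name) returns the stacked co-planarity conditions F_1, ..., F_n of equations 4-7 and 4-8 for a given EOP vector, and the Jacobian is formed numerically instead of with the analytic derivatives of equation 4-13. It is an illustration of the adjustment, not the dissertation's code.

```python
import numpy as np

def resect_eop(coplanarity_residuals, u0, tol=1e-8, max_iter=20):
    """Iterative Gauss-Markov estimation of the six EOP (X0, Y0, Z0, omega, phi, kappa)."""
    u = np.array(u0, dtype=float)
    for _ in range(max_iter):
        F = coplanarity_residuals(u)          # co-planarity conditions at the current EOP
        A = np.empty((F.size, 6))
        for j in range(6):                    # numerical Jacobian, one column per parameter
            du = np.zeros(6)
            du[j] = 1e-6
            A[:, j] = (coplanarity_residuals(u + du) - F) / 1e-6
        Y = -F                                # right-hand side of equation 4-10
        P = np.eye(F.size)                    # identity weights: equally weighted measurements
        d_u = np.linalg.solve(A.T @ P @ A, A.T @ P @ Y)   # equation 4-11
        u += d_u
        if np.linalg.norm(d_u) < tol:         # stop when the corrections become small
            break
    return u
```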


\frac{\partial F}{\partial X_0} = -(a\,r_{11} + b\,r_{21} + c\,r_{31})

\frac{\partial F}{\partial Y_0} = -(a\,r_{12} + b\,r_{22} + c\,r_{32})

\frac{\partial F}{\partial Z_0} = -(a\,r_{13} + b\,r_{23} + c\,r_{33})

\frac{\partial F}{\partial \omega} = [a\;\;b\;\;c]\,\frac{\partial R}{\partial \omega}\begin{bmatrix} X - X_0 \\ Y - Y_0 \\ Z - Z_0 \end{bmatrix}, \qquad \frac{\partial F}{\partial \varphi} = [a\;\;b\;\;c]\,\frac{\partial R}{\partial \varphi}\begin{bmatrix} X - X_0 \\ Y - Y_0 \\ Z - Z_0 \end{bmatrix}, \qquad \frac{\partial F}{\partial \kappa} = [a\;\;b\;\;c]\,\frac{\partial R}{\partial \kappa}\begin{bmatrix} X - X_0 \\ Y - Y_0 \\ Z - Z_0 \end{bmatrix}    (4-13)

where (X, Y, Z) are the ground coordinates of the corresponding line end point ((X_1, Y_1, Z_1) for F_1 and (X_2, Y_2, Z_2) for F_2), and the derivatives of R with respect to \omega, \varphi, and \kappa expand into trigonometric expressions of the three rotation angles.

To assess the accuracy of the derived EOP, a covariance matrix \Sigma is used. From such a covariance matrix, the variance of each parameter and the covariances among the EOP parameters can be checked. The variance component \sigma_0 and the cofactor matrix Q are estimated and calculated using equation 4-14 and equation 4-15, respectively. Suppose the estimate of \sigma_0 is m_0. Then

m_0^2 = \frac{\sum_i v_i v_i}{n - 6}    (4-14)


In equation 4-14, n is the number of equations/observations used for deriving the EOP, and v_i is the discrepancy of the objective function F(u); here, v_i = -F_i(u). The cofactor matrix is

Q = (A^T P A)^{-1}    (4-15)

Then the covariance matrix of the derived exterior orientation parameters is

\Sigma = m_0^2\, Q    (4-16)

4.2. Line Refinement in 2D Image Space

Only a few researchers have reported their work conducted on integration of LIDAR

and imagery data for building reconstruction. Ameri and Fritsch (2000) presented their

work on building reconstruction from planar roof structures. They detected building seed

regions from a LIDAR DSM based on surface mean curvatures. The seed regions were

then projected to images to perform image segmentation using a region-growing

algorithm. The detected roof regions from images were then projected to the LIDAR

DSM to calculate 3D parameters using LIDAR points. After a 3D building model was

constructed, it was projected to the images again for refinement. Image gradient was

employed as a clue for searching edge pixels belonging to a building line segment. Seo

(2003) used contours generated from LIDAR data to detect buildings and he

reconstructed building models from the LIDAR data after buildings were detected.

Reconstructed models were then projected to images for refinement. He detected image

edges and selected edges with a certain length to refine building models.


The direct information provided by images is in 2D image space. In order to improve

its geometry, a 3D building model will be projected to the image space in this study.

Then the projected model will be refined using extracted image information.

Consequently, the 3D model is refined. With known EOP, a 3D building model can be

projected to an image. Figure 4.4 shows an example of projected building roofs on a pair

of stereo aerial photographs. It is clear that there are discrepancies between the projected

model lines and their corresponding image lines. It should be noted that the discrepancies at the upper right corners are caused by the model assumption: in this research, building models are assumed to have rectangular shapes. The refinement of model topology is beyond the scope of this study.

Figure 4.4. A building model from LIDAR data projected onto a pair of stereo images

The advantage of aerial photographs over LIDAR data is that they can capture sharp

linear features, i.e., edges. In order to use linear features to perform refinement, these


edges should be detected and extracted first. The Canny edge detector is employed to

detect edge pixels from aerial photograph data. Figure 4.5 presents an example of

detected edge pixels from a sub image covering a projected building model. The edge

detection was conducted on the red band of a color image.
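As a brief illustration, the edge detection step can be reproduced with OpenCV's Canny operator; the file name and thresholds below are illustrative assumptions, not the values used in this study.

```python
import cv2

image = cv2.imread("aerial_subimage.tif")   # hypothetical sub-image around a projected model
red_band = image[:, :, 2]                   # OpenCV stores bands as BGR; index 2 is the red band
edges = cv2.Canny(red_band, 50, 150)        # binary map of detected edge pixels
cv2.imwrite("edge_pixels.png", edges)
```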

Figure 4.5. Building image and the edge pixels detected from the Canny detector

In reality, a building under study is not the only foreground object in a study area.

There are also many other objects like cars and other constructions. Some of these objects

adjacent to a building have similar spectral characteristics to buildings. In this case, an

edge detector will be confused and will fail to detect edge pixels of the building under

study. Some of the unwanted objects have very different spectral characteristics. In this

case, they will introduce unwanted edge pixels. This can be seen from Figure 4.5. In

order to use the right edge pixels to refine a projected 2D building model, pixels

belonging to the building should be correctly separated from other edge pixels.


Although LIDAR data cannot capture sharp features, reconstructed building models

are still good approximations of actual buildings. Thus, a projected 2D building model is

close to the image of a real model; and provides valuable clues for further processing. To

pick the correct pixels of a building edge, a projected model provides two important

clues, orientation and position. An edge line of a projected model is parallel or almost

parallel to its corresponding image line. In addition, a pixel on a line has a gradient

azimuth perpendicular to the line it belongs to. The gradients of image pixels were

computed; and gradient azimuths were also calculated. Only those detected edge pixels

with a gradient azimuth perpendicular to a projected model line will be extracted for

further investigation. Here, "perpendicular" does not mean an exact 90-degree difference. Instead, it is a confidence range; for example, a 5-degree deviation from 90 degrees can still be treated as "perpendicular."

Besides the azimuth, the position of a projected model line provides a second clue for

edge pixel searching. A reconstructed building model is a good approximation of a real

building. A projected 2D model on an image should be close to the actual image model.

To use this information, a buffer can be drawn to eliminate unwanted edge pixels far away from a model line. The width of the buffer reflects how well a model derived from LIDAR data approximates the real building.

In this study, the buffer sizes of the azimuth search and the distance search are determined empirically. In future research, they could be determined automatically. For

example, LIDAR data accuracy, image EOP accuracy, and comparison of building

boundaries before and after the regularization will provide some information for


calculating a buffer’s size. Figure 4.6 shows the detected edge pixels belonging to their

corresponding line of a projected model edge line.

Figure 4.6. Edge pixels detected from the azimuth constraint (left) and from both the azimuth and the position constraints (right)
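A minimal sketch of this two-constraint selection is given below (Python/NumPy with OpenCV gradients). The angle tolerance and buffer width are illustrative parameters and the interface is hypothetical; the fragment only shows how the azimuth and position clues can be combined.

```python
import numpy as np
import cv2

def select_edge_pixels(gray, edges, line_p1, line_p2,
                       angle_tol_deg=5.0, buffer_px=10.0):
    """Keep Canny edge pixels whose gradient azimuth is roughly perpendicular
    to a projected model line and which lie inside a distance buffer."""
    gx = cv2.Sobel(gray, cv2.CV_64F, 1, 0, ksize=3)
    gy = cv2.Sobel(gray, cv2.CV_64F, 0, 1, ksize=3)
    line_dir = np.arctan2(line_p2[1] - line_p1[1], line_p2[0] - line_p1[0])

    ys, xs = np.nonzero(edges)
    grad_az = np.arctan2(gy[ys, xs], gx[ys, xs])
    # Angle between gradient azimuth and line direction, folded to [0, 90] degrees.
    diff = np.abs(np.degrees(grad_az - line_dir)) % 180.0
    diff = np.minimum(diff, 180.0 - diff)
    perpendicular = np.abs(diff - 90.0) <= angle_tol_deg

    # Perpendicular distance of each edge pixel to the projected model line.
    d = np.array([line_p2[0] - line_p1[0], line_p2[1] - line_p1[1]], dtype=float)
    d /= np.linalg.norm(d)
    rel = np.stack([xs - line_p1[0], ys - line_p1[1]], axis=1).astype(float)
    dist = np.abs(rel[:, 0] * d[1] - rel[:, 1] * d[0])
    inside = dist <= buffer_px

    keep = perpendicular & inside
    return xs[keep], ys[keep]
```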

After edge pixels were correctly detected, they can be used to refine projected model

edges. The refinement will be conducted line by line. In this study, the orthogonal least-squares regression method will be employed to derive line parameters. Instead of minimizing the vertical distance between the observations and the fitting line, as traditional least-squares regression does, orthogonal regression minimizes the perpendicular distance between the observations and the fitting line. For easier implementation, a line in a

2D image space will be represented using polar coordinates. The polar representation of a

line is (d, θ). Here, d is the perpendicular distance from the origin to this line while θ is


the normal direction of the line. This line is formed by all points whose projection on the

vector (cos(θ),sin(θ)) has a length of d. Such a line can be represented in equation (4-17).

d = x\cos(\theta) + y\sin(\theta)    (4-17)

Equation (4-17) can be rewritten as a constraint function in equation (4-18).

F(x, y) = d - x\cos(\theta) - y\sin(\theta) = 0    (4-18)

Given an arbitrary point (x', y'), its perpendicular distance to the line is the absolute value |F(x', y')|. To derive the parameters of the fitting line using orthogonal regression, F(x, y) can be rewritten as

F(d, \theta) = d - x\cos(\theta) - y\sin(\theta) = 0    (4-19)

Equation (4-19) has a nonlinear form. Thus, a linear form will be derived based on

the Taylor series. In order to get accurate line parameters, iterative calculations will be

performed.

F(d, \theta) \approx F(d_0, \theta_0) + F'_d(d_0, \theta_0)\,\Delta d + F'_\theta(d_0, \theta_0)\,\Delta \theta \approx 0    (4-20)

-F(d_0, \theta_0) = F'_d(d_0, \theta_0)\,\Delta d + F'_\theta(d_0, \theta_0)\,\Delta \theta + e    (4-21)

Equation (4-20) is the form applied to derive the increments of line parameters. The

Gauss-Markov model will be used to minimize the error e. The projected 2D model lines

provide initial values for distance d and azimuth θ. At each iteration, increments of d and

θ are calculated; then d and θ are updated using the increments. Figure 4.7 shows the

searching for edge pixels and the refined edges using the orthogonal regression method.


Figure 4.7. Searching edge pixels (left) and refined edge lines (yellow line in right)
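A compact sketch of this iterative fit is given below (Python/NumPy). It follows the polar-form update of equations 4-19 to 4-21 with identity weights; the interface and convergence settings are assumptions made for illustration only.

```python
import numpy as np

def fit_line_polar(xs, ys, d0, theta0, max_iter=10, tol=1e-6):
    """Orthogonal regression of a line d = x cos(theta) + y sin(theta).

    (d0, theta0) come from the projected model line; the loop applies the
    Gauss-Markov update of equations 4-20/4-21."""
    xs = np.asarray(xs, dtype=float)
    ys = np.asarray(ys, dtype=float)
    d, theta = float(d0), float(theta0)
    for _ in range(max_iter):
        # Residuals F(d, theta) = d - x cos(theta) - y sin(theta) for each edge pixel.
        F = d - xs * np.cos(theta) - ys * np.sin(theta)
        # Partial derivatives: dF/dd = 1, dF/dtheta = x sin(theta) - y cos(theta).
        A = np.column_stack([np.ones_like(xs),
                             xs * np.sin(theta) - ys * np.cos(theta)])
        delta, *_ = np.linalg.lstsq(A, -F, rcond=None)
        d += delta[0]
        theta += delta[1]
        if np.linalg.norm(delta) < tol:
            break
    return d, theta
```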

4.3. Reconstruct 3D Building Model with Refined Geometry

Based on imaging geometry, a 3D surface on the ground can be reconstructed through

stereo image processing. After 2D projected building lines were refined using image

information, specifically edge pixels detected from the Canny operator in this study, new

corners can be generated by intersecting the refined lines. In this way, the coordinates of

original corners can be updated from the refined lines. Due to different imaging

geometries, stereo images have different qualities. Some edges can be detected in one

image, but not in the other. In this case, the corners updated independently from each

image of a stereo pair are actually not conjugate points. They are not the same point on

the ground. So they cannot be used to perform space intersection for calculating 3D

ground coordinates.


To solve this problem, it is necessary to keep track of the updating of each corner.

Only corners updated from both stereo images from the same refined lines will be

updated. From the description of the previous section, it can be seen that the fundamental

refinement primitives are line segments. To simplify the processing, updating information

of a line segment was kept instead of a corner’s updating information. A boundary line is

updated if and only if it is updated in both stereo image spaces. Otherwise, the line will

remain unchanged. After all lines were processed, new corners were generated through

line intersection. Using this strategy, corners generated from stereo images are conjugate

points; and they can be processed using space intersection to derive 3D coordinates. It

should be noted that some of the new corners are actually not updated; however, it is not necessary to differentiate updated corners from non-updated ones.

After the refinement of line segments was finished, coordinates of corners can be

calculated through intersecting adjacent line segments. Based on the line refinement

principle, conjugate corners are updated together in both stereo images. There will be no

case that a corner is updated in one image while its conjugate corner in the other image is

not updated correspondingly. 3D ground coordinates are calculated from conjugate points

using collinearity conditions of a ground point, its image point, and a camera’s exposure

center.

While the collinearity equations are the basic ones applied in space intersection, there

is other information that can be applied to derive a reliable and high accuracy for 3D

ground coordinates. From a 3D model reconstructed from LIDAR data, the relationship

among corners and surfaces is available. Due to high accuracy of LIDAR data in


mapping 3D surfaces, a planar roof’s parameters derived from a large number of points

have a high accuracy. This is the idea of quantity for quality, which means parameters

with high accuracy can be calculated from a large number of measurements with

relatively lower accuracy. In this study, a detected roof can have tens of or even hundreds

of LIDAR points. The points provide a great redundancy to derive a high accuracy planar

surface. Roof plane information can thus be enforced as constraints in space intersection.

Knowing the interior and exterior orientation parameters of an aerial image,

collinearity conditions can be written in equation 4-22. Collinearity means that the three

points (the ground point, the corresponding image point, and the camera exposure center)

are lying in the same 3D line. It is the nature of central perspective projection.

x_1 - x_{10} = -f\,\dfrac{r^1_{11}(X - X_{10}) + r^1_{12}(Y - Y_{10}) + r^1_{13}(Z - Z_{10})}{r^1_{31}(X - X_{10}) + r^1_{32}(Y - Y_{10}) + r^1_{33}(Z - Z_{10})}

y_1 - y_{10} = -f\,\dfrac{r^1_{21}(X - X_{10}) + r^1_{22}(Y - Y_{10}) + r^1_{23}(Z - Z_{10})}{r^1_{31}(X - X_{10}) + r^1_{32}(Y - Y_{10}) + r^1_{33}(Z - Z_{10})}

x_2 - x_{20} = -f\,\dfrac{r^2_{11}(X - X_{20}) + r^2_{12}(Y - Y_{20}) + r^2_{13}(Z - Z_{20})}{r^2_{31}(X - X_{20}) + r^2_{32}(Y - Y_{20}) + r^2_{33}(Z - Z_{20})}

y_2 - y_{20} = -f\,\dfrac{r^2_{21}(X - X_{20}) + r^2_{22}(Y - Y_{20}) + r^2_{23}(Z - Z_{20})}{r^2_{31}(X - X_{20}) + r^2_{32}(Y - Y_{20}) + r^2_{33}(Z - Z_{20})}    (4-22)

where,

(x_1, y_1) and (x_2, y_2) are the image coordinates of the conjugate points after the interior orientation transformation;

(x_{10}, y_{10}) and (x_{20}, y_{20}) are the calibrated principal point coordinates, respectively;

f is the calibrated camera focal length;

(X_{10}, Y_{10}, Z_{10}) and (X_{20}, Y_{20}, Z_{20}) are the exposure centers of the stereo images;

(X, Y, Z) are the ground coordinates to be solved from the equations;

r^1_{ij} and r^2_{ij} are the (i, j) entries of the rotation matrices of the two images, respectively.


Equation 4-22 can be rewritten in a uniform formula to calculate ground coordinates.

See equation 4-23.

F_1(X, Y, Z) = (x_1 - x_{10}) + f\,\dfrac{r^1_{11}(X - X_{10}) + r^1_{12}(Y - Y_{10}) + r^1_{13}(Z - Z_{10})}{r^1_{31}(X - X_{10}) + r^1_{32}(Y - Y_{10}) + r^1_{33}(Z - Z_{10})} = 0

F_2(X, Y, Z) = (y_1 - y_{10}) + f\,\dfrac{r^1_{21}(X - X_{10}) + r^1_{22}(Y - Y_{10}) + r^1_{23}(Z - Z_{10})}{r^1_{31}(X - X_{10}) + r^1_{32}(Y - Y_{10}) + r^1_{33}(Z - Z_{10})} = 0

F_3(X, Y, Z) = (x_2 - x_{20}) + f\,\dfrac{r^2_{11}(X - X_{20}) + r^2_{12}(Y - Y_{20}) + r^2_{13}(Z - Z_{20})}{r^2_{31}(X - X_{20}) + r^2_{32}(Y - Y_{20}) + r^2_{33}(Z - Z_{20})} = 0

F_4(X, Y, Z) = (y_2 - y_{20}) + f\,\dfrac{r^2_{21}(X - X_{20}) + r^2_{22}(Y - Y_{20}) + r^2_{23}(Z - Z_{20})}{r^2_{31}(X - X_{20}) + r^2_{32}(Y - Y_{20}) + r^2_{33}(Z - Z_{20})} = 0    (4-23)

In order to calculate the unknowns from equation 4-23 numerically, its linearized form should be used. The linear form can be written as

F_i(X, Y, Z) = F_i(X_0, Y_0, Z_0) + \frac{\partial F_i}{\partial X}\Delta X + \frac{\partial F_i}{\partial Y}\Delta Y + \frac{\partial F_i}{\partial Z}\Delta Z + e_i = 0    (4-24)

-F_i(X_0, Y_0, Z_0) = \frac{\partial F_i}{\partial X}\Delta X + \frac{\partial F_i}{\partial Y}\Delta Y + \frac{\partial F_i}{\partial Z}\Delta Z + e_i    (4-25)

In equations 4-24 and 4-25, e_i is the error introduced by omitting the higher-order terms. The subscript i ranges from 1 to 4, representing the 4 equations in

equation 4-23. Using a vector form, the observation equations can be written as

L = A\,\xi + e    (4-26)

where

L = \begin{bmatrix} -F_1(X_0, Y_0, Z_0) \\ -F_2(X_0, Y_0, Z_0) \\ -F_3(X_0, Y_0, Z_0) \\ -F_4(X_0, Y_0, Z_0) \end{bmatrix}, \qquad \xi = \begin{bmatrix} \Delta X \\ \Delta Y \\ \Delta Z \end{bmatrix}, \qquad e = \begin{bmatrix} e_1 \\ e_2 \\ e_3 \\ e_4 \end{bmatrix}


and

A = \begin{bmatrix} \dfrac{\partial F_1}{\partial X} & \dfrac{\partial F_1}{\partial Y} & \dfrac{\partial F_1}{\partial Z} \\ \dfrac{\partial F_2}{\partial X} & \dfrac{\partial F_2}{\partial Y} & \dfrac{\partial F_2}{\partial Z} \\ \dfrac{\partial F_3}{\partial X} & \dfrac{\partial F_3}{\partial Y} & \dfrac{\partial F_3}{\partial Z} \\ \dfrac{\partial F_4}{\partial X} & \dfrac{\partial F_4}{\partial Y} & \dfrac{\partial F_4}{\partial Z} \end{bmatrix}

The constraint that a ground corner is lying in a roof surface is applied together with

the observation equations to derive ground coordinates for a corner. A corner point can

belong to more than one roof surface. Generally, the constraint from the ith roof surface

can be written as

a_i X + b_i Y + c_i Z = 1    (4-27)

To be integrated with the observation equations, equation 4-27 should use the increments (\Delta X, \Delta Y, \Delta Z) as unknowns. It can be expressed as equation 4-28.

a_i \Delta X + b_i \Delta Y + c_i \Delta Z = 1 - a_i X_0 - b_i Y_0 - c_i Z_0    (4-28)

Applying the same form as equation 4-26, equations 4-28 and 4-26 can be written in

combination as the equation 4-29.

L = A\,\xi + e

B\,\xi = W    (4-29)

Applying the least-squares method together with the Lagrange approach, the unknowns can be solved as in equations 4-30 and 4-31.


\begin{bmatrix} A^T P A & B^T \\ B & 0 \end{bmatrix} \begin{bmatrix} \xi \\ \lambda \end{bmatrix} = \begin{bmatrix} A^T P L \\ W \end{bmatrix}    (4-30)

\begin{bmatrix} \xi \\ \lambda \end{bmatrix} = \begin{bmatrix} A^T P A & B^T \\ B & 0 \end{bmatrix}^{-1} \begin{bmatrix} A^T P L \\ W \end{bmatrix}    (4-31)

where,

P is the weight matrix;

\lambda is the vector of unknowns (Lagrange multipliers) introduced by the Lagrange method, with dimension n by 1, where n is the number of roof constraints applied to the current corner.
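The bordered system of equations 4-30/4-31 can be solved directly, as sketched below in Python/NumPy. A, L, and P are assumed to come from the linearized collinearity equations (4-26) and B, W from the roof-plane constraints (4-28); this is an illustrative fragment, not the dissertation's implementation.

```python
import numpy as np

def constrained_increments(A, L, P, B, W):
    """Solve equation 4-31 for the coordinate increments under roof-plane constraints."""
    n_con = B.shape[0]                      # number of roof constraints for this corner
    N = A.T @ P @ A                         # 3 x 3 normal-equation matrix
    K = np.block([[N, B.T],
                  [B, np.zeros((n_con, n_con))]])
    rhs = np.concatenate([A.T @ P @ L, W])
    solution = np.linalg.solve(K, rhs)
    xi = solution[:3]                       # (dX, dY, dZ)
    lam = solution[3:]                      # Lagrange multipliers
    return xi, lam
```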

After ground coordinates were calculated for all corners of a building, a building

model was refined. The refined model from integration of LIDAR and imagery data has a

high geometric quality compared with the one reconstructed solely from LIDAR data. In

this study, corners belonging to building roofs were processed. Lower corners of vertical

walls were derived from their corresponding upper corners.

4.4. Implementation

The refinement can be implemented on roofs individually or on a whole building model simultaneously. When individual roofs are used, the 3D corners are processed for each roof separately. Because these corners are processed independently for different roofs, it is likely that the coordinates of a corner calculated in different processes do not match each other. This is mainly caused by the different roof planar constraints applied to the same corner in different processes.


To solve the discrepancy problem, the refinement is carried out on a whole building model simultaneously. The procedure can be described as follows:

1. Initialize a weight vector for the corners, N = {n_{p1}, n_{p2}, ..., n_{pm}}, to zero;

2. Given a roof polygon, retrieve its points and project the 3D polygon to the stereo image spaces;

3. Update each roof segment using the method presented previously. Record the number of edge pixels used to update each line segment, n_{li}. For a non-updated segment, the number is zero;

4. Intersect the segments to calculate corners. Each new corner is assigned a weight, calculated as the sum of the numbers of updating points of its two intersecting line segments: n'_{pi} = n_{li} + n_{l(i+1)};

5. Update each corner's coordinates (xc_i, yc_i) as weighted averages,

xc_i = (n_{pi} \cdot xc_i + n'_{pi} \cdot xc'_i) / (n_{pi} + n'_{pi})   and   yc_i = (n_{pi} \cdot yc_i + n'_{pi} \cdot yc'_i) / (n_{pi} + n'_{pi}),

and update the corner's weight as n_{pi} = n_{pi} + n'_{pi};

6. Repeat steps 2-5 until all roof surfaces are processed. (A sketch of the weighted update is given below.)
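The weighted corner update of steps 4 and 5 can be sketched as follows (Python/NumPy). The function merges the corner coordinates obtained from the current roof polygon with the running estimate; the variable names are hypothetical, and the guard against a zero total weight is an added assumption for corners that were never updated.

```python
import numpy as np

def merge_corner(old_xy, old_weight, new_xy, new_weight):
    """Weighted average of a duplicated corner and update of its weight (step 5)."""
    if old_weight + new_weight == 0:
        return np.asarray(old_xy, dtype=float), 0     # corner never updated; keep it unchanged
    merged = (old_weight * np.asarray(old_xy, dtype=float) +
              new_weight * np.asarray(new_xy, dtype=float)) / (old_weight + new_weight)
    return merged, old_weight + new_weight
```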


CHAPTER 5

EXPERIMENTS AND RESULTS

In the previous chapters, the methodology of building model reconstruction from LIDAR and imagery data is described in detail. Some examples were also presented to illustrate how the proposed method works. To examine this methodology more substantially, a large data set was employed to demonstrate how the method can be applied to reconstructing building models from data integration. Instead of an autonomous system, the implementation of this methodology was carried out as a sequence of user-guided operations, namely building detection, 3D model reconstruction, and model refinement.

They will be demonstrated in the same sequence in this chapter.

5.1. Data

Two different types of data were used in this experiment: LIDAR data and aerial photographs. The experimental site is located in a suburban area of Houston, Texas. The LIDAR data has a point density of approximately 1 point per square meter, with a vertical accuracy of approximately 15 centimeters and a horizontal accuracy of approximately 0.5 meter. The flying height is approximately 915 meters with a data swath of approximately 550 meters. The aerial photographs have a ground resolution of approximately 0.3 meter at a scale of approximately 1:20,000. The flying height is approximately 3350 meters. Figure 5.1 shows the data used in this study: the image on the left is the LIDAR data converted into grid format, and the one on the right is an aerial image subset covering approximately the same area. The study area covers approximately 1.5 square kilometers.

Figure 5.1. Experimental data: the LIDAR data (left) and the aerial imagery data (right)

5.2. LIDAR Segmentation

LIDAR segmentation is the process of differentiating LIDAR points falling onto different types of objects. One important task is to distinguish points falling on the

ground. Extracted ground points can be used to generate the DTM, a commonly used

product in geo-science. In this study, the focus is detection of building points. This is

achieved from LIDAR segmentation using the proposed methods. After a DTM was

generated from classified ground points, non-ground points were extracted by testing


their heights above the ground. These non-ground points were then analyzed to differentiate buildings from other objects such as trees and cars. The information applied in this process is surface texture and object size.

In order to separate ground points, two methods were proposed in this study, the

planar-fitting segmentation algorithm and the height-jump segmentation algorithm. In

fact, all LIDAR data segmentation algorithms are based on height difference among

different objects. The same principle applies to these two methods. The LIDAR data was converted to grid format for processing. The ground region was extracted as the region with the largest area after the different objects were disconnected by the segmentation methods.

The classification of ground points is an iterative process. At the first iteration, the

majority of ground points were distinguished. However, points at boundaries and points

inside an inner court cannot be correctly classified at this stage. Boundary points have characteristics similar to tree points in the planar-fitting segmentation. In the height-jump method, some of the classified height-jump points sit on the boundary of the ground region. Inner-court points form an independent object, separate from the detected ground region, which is usually connected by a road network. These height-jump

points and inner-court points can be re-classified as ground points by checking their

heights with the DTM generated from ground points detected in the previous iteration.

Figure 5.2 presents a detected ground region after the first iteration and the 5th iteration

using the height-jump algorithm. From this figure, it is clear that the ground region after

the 5th iteration has much more details than the one detected in the first iteration.


Figure 5.2. The ground region detected from an iterative process: the 1st iteration (top) and the 5th iteration (bottom)

After a final DTM is generated from ground points, the so-called normalized DSM

can be generated by subtracting the DTM from the DSM, which is generated from


LIDAR data through a point-to-grid process. In order to detect buildings, a height

threshold was applied to filter out the ground and other objects such as cars. In this study,

a threshold of 3 meters was applied. However, the detected objects include some trees.

To eliminate these trees, the planar-fitting algorithm was used. The rationale is that tree

crowns are rougher than building roofs. Based on the assumption that building roofs are

planar surfaces, most tree objects can be differentiated from buildings. Still some trees

with very large crowns cannot be eliminated completely from the planar-fitting

algorithm. However, their crown areas are dramatically decreased, which is the motivation for using a size threshold in building detection. The regions eliminated by the size constraint include trees and some buildings. In this study area, many houses are covered by trees. Because of the imaging geometry of LIDAR data, these parts cannot be

recovered directly from data without prior knowledge. Some of these building parts were

eliminated by the size constraint. Figure 5.3 shows buildings detected using the height

and size constraints.

Figure 5.3. Non-ground objects detected after applying the height constraint

(left) and buildings after applying the size constraint (right)
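The height- and size-constraint filtering can be illustrated with the short fragment below (Python with NumPy/SciPy). The 3 m height threshold follows the text, while the minimum-area value is an illustrative assumption; the planar-fitting step that removes the remaining trees is not included here.

```python
import numpy as np
from scipy import ndimage

def building_candidates(dsm, dtm, height_thresh=3.0, min_area_cells=50):
    """Normalized-DSM thresholding followed by a connected-component size constraint."""
    ndsm = dsm - dtm                                  # normalized DSM (heights above ground)
    mask = ndsm > height_thresh                       # drop ground, cars, and other low objects
    labels, n = ndimage.label(mask)                   # connected non-ground regions
    sizes = ndimage.sum(mask, labels, index=np.arange(1, n + 1))
    big = np.nonzero(sizes >= min_area_cells)[0] + 1  # labels of sufficiently large regions
    return np.isin(labels, big)
```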


Due to imaging geometry, the vertical walls of a building can rarely be measured like roofs in either LIDAR data or aerial photographs. In fact, their existence is implied by a building's boundary. In order to fully reconstruct a 3D building model, the boundary of a

building should be extracted. Here, a building’s boundary is its 2D footprint on the

ground. In the proposed methodology, boundaries are assumed to have rectangular

shapes. This means line segments forming a boundary are parallel or perpendicular to one

another. Rectangular shapes are achieved using the regularization algorithm developed in

this research. Figure 5.4 displays building boundaries after regularization with a DSM.

Figure 5.5 presents a building example of regularization.

Figure 5.4. Building boundaries after regularization with LIDAR DSM


Figure 5.5. An example of boundary regularization

Boundary regularization involves two operations: line simplification and boundary

adjustment. During line simplification, the distance threshold employed here is 3 meters.

This parameter was applied to all boundary simplification processes. In practical implementation, it could be determined for each individual building. From Figure 5.5a, it can be seen that there are many points on a straight segment. When a boundary is extracted from grid-format data, each pixel on the boundary is recorded as a point in its vector format. Before the simplification was performed, these intermediate

points on a straight segment were eliminated. Otherwise, they would introduce biases in

the simplified boundary. In case the distance threshold applied in simplification is larger

than twice the distance between two consecutive points on a boundary, an intermediate

point will be detected as a corner instead of the endpoints of a straight segment. In this

situation, a straight segment will be broken up. This bias will propagate to its final

regularized boundary.
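For illustration, a distance-threshold simplification of the Douglas-Peucker type is sketched below (Python/NumPy); the dissertation's own simplification routine is not reproduced here, and only the 3 m tolerance matches the value reported above.

```python
import numpy as np

def simplify_boundary(points, tol=3.0):
    """Keep only boundary points whose deviation from the chord exceeds tol (metres)."""
    pts = np.asarray(points, dtype=float)
    if len(pts) < 3:
        return pts
    chord = pts[-1] - pts[0]
    length = np.linalg.norm(chord)
    if length == 0.0:
        dists = np.linalg.norm(pts - pts[0], axis=1)
    else:
        dists = np.abs(np.cross(chord, pts - pts[0])) / length   # perpendicular distances
    k = int(np.argmax(dists))
    if dists[k] <= tol:
        return np.vstack([pts[0], pts[-1]])        # the whole run is one straight segment
    left = simplify_boundary(pts[:k + 1], tol)     # recurse on both halves around the corner
    right = simplify_boundary(pts[k:], tol)
    return np.vstack([left[:-1], right])           # drop the duplicated split point
```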


A visual check against the aerial photos shows that the detected building objects all correspond to actual buildings on the ground. However, some buildings on the ground were not classified as building objects in the detection. The major cause is the size constraint applied: these undetected buildings are obscured by trees and cannot be mapped in the LIDAR data. In total, 144 buildings were detected in this experiment. Although some buildings are partly covered by trees from the top, they can still be detected because of the large area they occupy. Around 90 percent of the buildings in this study area are detected, but this number should be verified through a field check.

5.3. Building Reconstruction

After boundaries were regularized to rectangular shapes, they were included in the

building reconstruction process as vertical walls. To detect building roofs, the normal

data of a DSM was employed. The normal data was calculated from a plane fitted to a 5 by 5 square window in this experiment. A small window, such as 3 by 3, will not be able to smooth out noise, while a large window, such as 9 by 9, will blur the difference between different roof surfaces.
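A brute-force sketch of the per-cell normal computation is given below (Python/NumPy); half_win = 2 reproduces the 5 by 5 window, while the grid spacing is an assumed parameter.

```python
import numpy as np

def surface_normals(dsm, cell_size=1.0, half_win=2):
    """Per-cell surface normal from a plane fitted to a (2*half_win + 1)^2 window."""
    rows, cols = dsm.shape
    normals = np.zeros((rows, cols, 3))
    for r in range(half_win, rows - half_win):
        for c in range(half_win, cols - half_win):
            win = dsm[r - half_win:r + half_win + 1, c - half_win:c + half_win + 1]
            dy, dx = np.mgrid[-half_win:half_win + 1, -half_win:half_win + 1]
            # Fit z = a*x + b*y + c over the window by least squares.
            A = np.column_stack([dx.ravel() * cell_size,
                                 dy.ravel() * cell_size,
                                 np.ones(win.size)])
            a, b, _ = np.linalg.lstsq(A, win.ravel(), rcond=None)[0]
            n = np.array([-a, -b, 1.0])
            normals[r, c] = n / np.linalg.norm(n)
    return normals
```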

After the normal data was smoothed using the mean-shift filter, a supervised

classification was carried out to perform surface segmentation. The mean-shift algorithm

can also be applied to perform segmentation. The classification was employed because it

can be conducted in commercial software. The principles of the applied classification are

the same as the mean-shift algorithm.


After the classification was finished, roof surfaces were converted from the

classification theme to vector format and extracted using building boundaries. The roofs

with their centers falling within a building's boundary were extracted for further processing.

Some small features like chimneys were also detected. However, they will not be

processed in building reconstruction. These small features were eliminated using a size

constraint. Figure 5.6 demonstrates the normal data after mean-shift filtering, and

extracted roof surfaces are displayed on the top of the classification result. It should be

noted that the chimney regions were not eliminated yet.

Figure 5.6. Normal data after the filtering (left) and extracted roof surfaces (right)

To reconstruct a 3D building model, the topology among its surfaces was built using

the method developed in this study. The topology was built in a 2D space; and vertical

walls were represented as boundary line segments. After the boundary of a building was

regularized, the topology among the vertical walls can be directly extracted. For roof

surfaces, their adjacency relationship was determined by testing if they share a common


boundary point. The major difficulty in building a topology is to determine the adjacency

relationship between a vertical wall and a roof surface.

In this study, an algorithm was designed to calculate the adjacency between a roof

surface and a vertical wall. The 2D line segment representing a vertical wall was

extended perpendicular to its azimuth, thus forming a rectangle. This rectangle was then tested for overlap with a roof surface. If there is an overlap between these two surfaces,

the vertical wall and the roof are considered adjacent to each other. The extension width

is a user-controlled parameter. Basically, two factors affect the parameter. One is the size

of the window used in calculating normal data; and the other one is the changes to a

boundary during the regularization process. The first one can be obtained directly from

the window’s size. The second factor is not fully investigated in this study. But it could

be tracked to evaluate its effect on the extension width. For example, the comparison

between a boundary before regularization and after regularization can provide some

clues.
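A possible implementation of this overlap test, using the Shapely geometry library rather than the dissertation's own code, is sketched below; the extension width is the user-controlled parameter discussed above, and the value shown is only illustrative.

```python
from shapely.geometry import LineString, Polygon

def wall_adjacent_to_roof(wall_p1, wall_p2, roof_polygon_coords, extension=2.5):
    """Adjacency test between a vertical wall segment and a roof surface."""
    wall = LineString([wall_p1, wall_p2])
    # cap_style=2 (flat) keeps the buffer a rectangle around the wall segment.
    rectangle = wall.buffer(extension, cap_style=2)
    roof = Polygon(roof_polygon_coords)
    return rectangle.intersects(roof)
```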

Among the 144 buildings detected in the detection stage, 141 3D building models were reconstructed. Three buildings failed in reconstruction because the corners of their reconstructed roof polygons could not be correctly ordered, due to a relatively large deviation of the reconstructed roofs from their original roof polygons.

Figure 5.7 presents an example of 3D building model reconstructed from LIDAR

data. In Figure 5.7a, parallel red line pairs are those segments expanded perpendicularly

to both sides of vertical walls, which form the blue rectangular polygon in the middle of

the parallel pairs. The green polygons with ragged boundaries are extracted roofs. After


the topology was built, coordinates of 3D corners were calculated and plotted as red star

marks in Figure 5.7a. Figure 5.7b shows a 3D view of the reconstructed building model.

Figures 5.7c and 5.7d present the projection of the same 3D model onto a stereo pair of

aerial photographs. The figure shows that the reconstructed model has an accurate

geometry. Figure 5.8 demonstrates a close view of a subset of building models

reconstructed from the experimental data. Figure 5.9 presents a 3D view of all the

reconstructed building models.


Figure 5.7. An example of 3D building model reconstructed from LIDAR data


Figure 5.8. A close view of 3D building models

Figure 5.9. 3D view of all reconstructed building models


5.4. Building Refinement from Data Integration

The building models from LIDAR data were refined using stereo aerial imagery data.

As described in the previous chapter, refinement was implemented on model geometry.

The topology of a building model will not be changed from the refinement process. To

perform refinement, a 3D model from LIDAR data is projected to stereo images. 2D

models from the projection will be updated from imagery information in both stereo

images. Through space intersection, a 3D building model with refined geometry can be

obtained from the refined 2D models. When refinement is implemented on individual

roofs, discrepancies will occur at roof intersections. Figure 5.10 demonstrates such a

discrepancy at duplicated corners.

Figure 5.10. The discrepancy between duplicated corners caused by the refinement based on individual roofs


To solve this problem, a building model should be refined as a whole.

In this case, corners can be updated uniquely instead of multiple times. The algorithm is

described in Chapter 4. The refinement primitives are line segments. Whenever a line

segment is refined, it triggers the updating of its two end points.

After all roof surfaces of a building were processed for refinement, its corners were

updated consistently. From the surface-point relationship matrix built in the

reconstruction process, the surfaces containing a corner can be retrieved. Their planar

constraints will be applied to the space intersection for calculating 3D corner coordinates.

Compared with the corners’ original coordinates derived from LIDAR, the new

coordinates have a higher accuracy. In this way, a 3D building model was refined from

the imagery information. Figure 5.11 presents the same model as in Figure 5.10. It can be

seen that the discrepancy disappeared.

Figure 5.11. An example of model refinement with consistency


5.5. Discussion

Building models provide important information used in many disciplines. Building reconstruction is the process of calculating the parameters of building models in a model space, which is an abstraction of the real world. "Abstraction" means that the model space cannot represent exactly the same details as the real world. Thus, it is normal that some

exceptions or errors will exist in reconstructed models due to limitations from both data

and algorithms applied.

Due to the limitations of LIDAR point data, small features such as chimneys cannot be detected or reconstructed. In addition, some buildings close to each other were

detected as one single building because of the lack of points between them. This will

introduce errors in detection results, and it may fail to provide a 3D model reconstruction.

As for imagery data, it is not possible to extract all the information needed for

refinement. The aerial photographs used are optical images. In some places, different

features have similar spectral characteristics. In this situation, the required information

for refinement cannot be extracted. Thus, the refinement in this study was applied to

those building features for which the required imagery information was available. Buildings obscured by trees either cannot be detected or can only be partially detected. Imagery data cannot provide more information about the obscured parts of

buildings. This problem can be solved or partially solved by integrating other sources of

data into application. For example, hyper-spectral imagery data can be applied to

differentiate trees from buildings. Multi-return LIDAR data can also differentiate trees

from buildings effectively. Another possible solution is to infer the complete shape of an obscured building from its exposed part using artificial intelligence algorithms. Further research can be conducted on this topic.

In this study, building models are assumed to have rectangular 2D boundaries.

Buildings without a rectangular footprint will have large deviations because they are

forced by the modeling to have rectangular shapes. Figure 5.12 provides two examples of

this type of deviation. Improvements to the modeling and algorithms can be investigated in future studies.

Figure 5.12. Deviations of reconstructed models from actual building objects


In spite of the limitations mentioned above, the methodology developed in this research can produce high-accuracy building models from the integration of LIDAR data

and aerial photographs. The integration provides more information than either single data

type. From the experiments, the refinement from aerial photographs can improve the

accuracy of the models derived from LIDAR.

The co-registration of LIDAR and aerial photographs was not tested experimentally in this

study because the exterior orientation parameters of the photographs were available from

the data provider. The method and related equations are described in chapter 4 for

completeness of the proposed building reconstruction methodology.

CHAPTER 6

CONCLUSIONS AND FUTURE RESEARCH

To reconstruct 3D building models from LIDAR and aerial photographs, a new

methodology is proposed in this research. The methodology is basically comprised of

three procedures: building detection, 3D model reconstruction, and 3D model refinement.

6.1. Conclusions

Under the proposed framework, a 3D building model can be reconstructed using

LIDAR and aerial imagery data. The methodology is implemented on polyhedral building

models. The major contributions of this research can be summarized as follows:

• Two algorithms are developed to perform LIDAR segmentation. Compared with

the algorithms proposed by other researchers, these two algorithms work well in

urban and suburban areas. In addition, they can keep fine features on the ground;

• An algorithm of building boundary regularization is proposed in this study.

Compared with the commonly used MDL algorithm, it is simple to implement

and fast in computation. Longer line segments have larger weights in its


adjustment process. This agrees with the fact that longer line segments have more accurate azimuths, provided that the accuracy of the end points is the same for all segments;

• A new method of 3D building model reconstruction from LIDAR data is

developed. It is comprised of constructing surface topology, calculating corners

from surface intersection, and ordering points of a roof surface in their correct

sequence;

• A new framework of building model refinement from aerial imagery data is

proposed. It refines building models in a consistent manner, and it utilizes

stereo imagery information and roof constraints in deriving refined building

models.

This approach does not require much prior information about the building model to be reconstructed. Thus, it can be used to reconstruct more types of buildings than methods using a model-based approach. It also avoids the decomposition of complex buildings that CSG-based methods require, which is usually a difficult task. The

experiments have shown that the methodology works successfully in building detection

and building model reconstruction. Besides the methodology of 3D model reconstruction

itself, there are also several methods developed to perform specific tasks in order to

achieve the ultimate goal of 3D building model reconstruction. These methods include

the algorithm for building footprint boundary regularization, the method for constructing

the topology of building surfaces, and the method for filtering surface normal data.


6.2. Future Works

It is a fact that no algorithm can work in every situation. The methodology proposed

in this study is not an exception. It also has some limitations. Some of the limitations are

caused by the data applied. For example, two different buildings might be detected as one

single building because they are very close to each other. They are so close that only a few points fall on the ground between them, and these points are not sufficient to separate the two buildings. Another limitation is that the algorithm for ordering roof polygon points

may fail to order a polygon in the correct sequence. This usually happens when a roof

polygon has a narrow part on it. When buildings are covered by trees, either they cannot

be detected or they can only be detected partially. Data from other sources can be applied

in the detection stage to help solve the problem. In addition, multi-return LIDAR data can

be applied to effectively differentiate trees from buildings.

A second type of limitation stems from the building modeling itself. In this study, buildings are assumed to have rectangular footprints; thus, non-rectangular footprints are forced into rectangular shapes, as the examples in the experiments showed. Another limitation is that the 3D modeling assumes no height jumps within a building; in other words, no vertical walls inside a 3D building model are allowed. These issues can be further

investigated in future research. Major research topics in future work are the following:

• The reconstruction of small features using high-resolution data. For example, LIDAR data with a point density of more than approximately 4 points per square meter could be used to detect dormers;


• The inclusion of vertical walls inside a building model. New data structures and algorithms need to be developed to handle such inner vertical walls;

• The integration of LIDAR data and aerial imagery data in building detection. The

spectral information from imagery data could be integrated with LIDAR height

information to perform building detection;

• Model refinement from a single image. Currently, the information used in

refinement is an intersection of the information from both stereo images. For

example, a line segment is updated if and only if its corresponding segment in the

other image is updated. In future research, the union of the information from both

stereo images could be applied;

• Refinement of the topology of a building model from imagery data. New methods

should be developed to refine the topologies of building models derived from

LIDAR data. Fine features or structures not detected/reconstructed from LIDAR

data can be added to building models to be refined.


BIBLIOGRAPHY

Ackermann, F., 1999. Airborne laser scanning-present status and future expectations,

ISPRS Journal of Photogrammetry & Remote Sensing, Vol.54 pp64-67, 1999

Aelst, S., X. Wang, and R. H. Zamar, 2003. Linear grouping using orthogonal regression,

http://hajek.stat.ubc.ca/~ruben/website/ORCpaper.pdf, visited March 2004

Alharthy, A. and J. Bethel, 2002. Building extraction and reconstruction from LIDAR

data, Proceedings of ASPRS annual conference, 18-26, Washington

Ameri, B. and D. Fritsch, 2000. Automatic 3D building reconstruction using plane roof

structures, ASPRA Conference, Washington, DC, 2000

Ameri, B., 2000. Feature Based Model Verification (FBMV): A new concept for

hypothesis validation in building reconstruction, IAPRS Vol. XXXIII, Part B3/1,

Comm. III, pp. 24-35, ISPRS Congress, Amsterdam. 2000

Axelsson, P., 1999. Processing of laser scanner data – algorithm and applications, ISPRS

Journal of Photogrammetry & Remote Sensing, Vol.54 pp138 - 147, 1999

Axelsson, P., 2000. DEM generation from laser scanner data using adaptive TIN models,

IAPRA, 33, B4/1

Baltsavias, E. P., 1999. A comparison between photogrammetry and laser scanning,

ISPRS Journal of Photogrammetry & Remote Sensing, Vol.54 pp83-94, 1999

Bourke, Paul. 1987. http://astronomy.swin.edu.au/~pbourke/geometry/insidepoly/

Brenner, C., 2000. Towards fully automatic generation of city models, ISPRS,

Vol.XXXIII, Amsterdan, 2000

Brunn A. and U. Weidner, 1997. Extracting buildings from digital surface models,

IAPRS, 32, Stuttgart

Brunn, A., 2001. Statistical interpretation of DEM and image data for building extraction,

in Baltsavias et al. (edit), Automatic Extraction of Man-made Objects from Aerial and

Space Images (III), 2001.

Cawsey, Alison, 1998. Line Intersection.

http://www.cee.hw.ac.uk/~alison/ds98/node114.html, visited May 7, 2004

Cheng, Y., 1995. Mean shift, mode seeking, and clustering, IEEE Transaction on Pattern

Analysis and Machine Intelligence, Vol.17 (8), pp790-799, 1995

Comaniciu, D. and P. Meer, 2002. Mean shift: a robust approach toward feature space

analysis, IEEE Transaction on Pattern Analysis and Machine Intelligence, Vol.24 (5),

pp603-619, 2002

Csathó, B. and T. Schenk, 2002. Multisensor fusion to aid automatic image

understanding of urban scenes, http://dfc.jrc.it/doc/csatho%20020624.pdf


Csathó, B., T. Schenk, D.C. Lee, and S. Filin, 1999. Inclusion of multispectral data into

object recognition, International Archive of Photogrammetry and Remote Sensing,

Vol. 32, Part 7-4-3 W6, Valladolid, Spain, 3-4 June, 1999.

Dissanaike, G. and Sh. Wang, A Critical Examination of Orthogonal Regression and an

Application to tests of firm size interchangeability,

http://les1.man.ac.uk/sapcourses/Semstuff/Ort-wang.PDF, visited March 2004

Elaksher, A., 2002. Building extraction from multiple images, Ph.D. thesis, Purdue

University, 2002

Elaksher, A., J. Bethel, and E. Mikhail, 2002. Building extraction using multiple images,

Proceedings of ACSM-ASPRS 2002 Annual Conference, Washington, May 2002

Elberink, S.O. and H.G. Maas, 2000. The use of anisotropic height texture measurements

for the segmentation of airborne laser scanner data, IAPRS, 33, Amsterdam


Förstner, W., 1999. 3D-City Models: Automatic and semiautomatic acquisition methods.

http://www.ifp.uni-stuttgart.de/publications/phowo99/foerstner.pdf, visited in 2002.

Frère, D., M. Hendrickx, J. Vandekerckhove, T. Moons and L. Van Gool, 1997. On the

reconstruction of urban house roofs from aerial images, in Automatic Extraction of

Man-made Objects from Aerial and Space Images (II), edited by Gruen et al., 1997.


Fua, P. and C. Brechbüler, 1997. Imposing hard constraints on deformable models

through optimization in orthogonal subspaces, Computer Vision and Image

Understanding, 65(2):148-162

Fuchs, F., 2001. Building reconstruction in urban environment: a graph-based approach,

in Baltsavias et al. (edit), Automatic Extraction of Man-made Objects from Aerial and

Space Images (III), 2001.

Fukunaga, K. and L.D. Hostetler, 1975. The estimation of the gradient of a density

function, with applications in pattern recognition, IEEE Transaction on Pattern

Analysis and Machine Intelligence, Vol.IT21 (1), pp32-40, 1975

Gamba, P., 2000. Digital Surface Models and Building Extraction: A Comparison of

IFSAR and LIDAR Data, IEEE Transactions on Geoscience and Remote Sensing,

Vol.38, No.4, July 2000

Geogescu, B., I. Shimshoni, and P. Meer, 2003. Mean shift based clustering in high

dimensions: a texture classification example, Proceedings of the 9th IEEE

International Conference on Computer Vision, 2003

Haala, N. and C. Brenner, 1999. Extraction of buildings and trees in urban environments,

ISPRS Journal of Photogrammetry & Remote Sensing, Vol.54 pp130 - 137, 1999

Haala, N. and M. Hahn, 1995. Data fusion for the detection and reconstruction of

buildings. In Automatic Extraction of Man-Made Objexts from Aerial and Space

Images, edited by A. Gruen, O. Kuebler and P. Agouris, 1995.


Haala, N., C. Brenner, and K. Anders, 1998. 3D urban GIS from laser altimeter and 2D

map data, http://www.ifp.uni-stuttgart.de/publications/1998/ohio_laser.pdf, visited

July 2003

Habib, A.F., S.W. Shin, and M.F. Morgan, 2002. Automatic pose estimation of imagery

using free-form control linear features. ISPRS Commission III Symposium

“Photogrametric Computer Vision”, Graz, Austria, September 9-13, 2002

Hough, P.V.C., 1962. Method and means for recognizing Complex Patterns. U.S. Patent

3.069.654

Hu, Y. and C.V. Tao, 2002. Bald DEM generation and building extraction using range

and reflectance LIDAR data, Proceeding of ACSM-ASPRS 2002 Annual Conference,

Washington, D.C. (CD-ROM)

Hubeli, A., K. Meyer, and M. Gross, 2000. Mesh edge detection,

http://graphics.cs.ucdavis.edu/hvm00/abstracts/hubeli.pdf

Huising, E.J. and L.M. G. Pereira, 1998. Errors and accuracy estimates of laser data

acquired by various laser scanning systems for topographic applications, ISPRS

Journal of Photogrammetry & Remote Sensing, Vol.53 pp245-261, 1998

Iisaka, J. and T. S.A., 2000. Image analysis of remote sensing data integrating spectral

and spatial features of objects,

http://www.gisdevelopment.net/aars/acrs/2000/ts9/imgp0013.shtml, ACRS 2000

Kilian, J., N. Haala, and M. Englich, 1996. Capture and evaluation of airborne laser

scanner data, IAPRS, 31, B3


Kraus, K. and N. Pfeifer, 1998. Determination of terrain models in wooded areas with

airborne laser scanner data, ISPRS Journal of Photogrammetry & Remote Sensing, 53

(1998): 193-203

Lacroix, V. and M. Acheroy, 1998. Feature extraction using the constrained gradient,

ISPRS Journal of Photogrammetry & Remote Sensing, Vol.53 pp85 - 94, 1998

Lee, H. and N. H., Younan, 2003. DEM extraction of LIDAR returns via adaptive

processing, IEEE Transaction on Geoscience and Remote Sensing, 41(9): 2063-2069

Lin Ch., A. Huertas, and R. Nevatia, 1995. Detection of buildings from monocular

images. In Automatic Extraction of Man-Made Objexts from Aerial and Space

Images, edited by A. Gruen, O. Kuebler and P. Agouris, 1995.

Lohmann, P. and A. Koch, 1999. Quality assessment of laser-scanner-data,

http://www.ipi.uni-

hannover.de/html/publikationen/1999/koch/isprs99%20koch%20lohmann.pdf, visited

in September 2003

Lohmann, P., 2001. Segmentation and filtering of laser scanner digital surface models,

IAPRS, 34, 2

Lohmann, P., A. Kock, and M. Schaeffer, 2000. Approaches to the filtering of laser

scanner data, IAPRS, 33, Amsterdam

Ma, Y.W. and B.S. Manjunath, 1997. Edge flow: a framework of boundary detection and

image segmentation, Proceedings of the IEEE Conference on Computer Vision and

pattern recognition, Puerto Rico, 1997


Maas, H. and G. Vosselman, 1999. Two algorithms for extracting building models from

raw laser altimetry data, ISPRS Journal of Photogrammetry & Remote Sensing,

Vol.54 pp153 - 163, 1999

Maas, H.G., 1999a. Closed solution for the determination of parametric building models

from invariant moments of airborne laserscanner data, ISPRS conference ‘Automatic

Extraction of GIS Objects from Digital Imagery’, Munchen/Germany, 8-10. 9. 1999’.

(IAPRS Vol.32 pp193-199)

Maas, H.G., 1999b. Fast determination of parametric house models from dense airborne

laserscanner data, ISPRS workshop on mobile mapping technology, Bangkok,

Thailand, April 21-23, 1999.

Maas, H.G., 1999c. The potential of height texture measures for the segmentation of

airborne laserscanner data, 4th International Airborne Remote Sensing and Exhibition

/ 21st Canadian Symposium on Remote Sensing, Ottawa, Ontario, Canada, 21-24 June

1999

Matikainen, L., J. Hyyppä, and H. Hyyppä, 2003. Automatic detection of buildings from

laser scanner data for map updating, IAPRS, 34, Dresden

Mayer, Stefan, 2001. Constrained optimization of building contours from high-resolution

ortho-images, ICIP 2001, Thessaloniki, Greece

McGlone, J.Ch. and J. A. Shufelt, 1994. Projective and object space geometry for

monocular building extraction. Proceedings of IEEE Computer Society Conference

on Computer Vision and Pattern Recognition, 1994. Page(s): 54 -61


Mcintosh, K. and A. Krupnik, 2002. Integration of laser-derived DSMs and matched

image edges for generating an accurate surface model, ISPRS Journal of

photogrammetry & remote sensing, Vol. 56 pp167-176, 2002.

Morgan, M. and A. Habib, 2002. Interpolation of LIDAR data and automatic building

extraction, Proceeding of ACSM-ASPRS 2002 Annual Conference, Washington, D.C..

(CD-ROM)

Murakami, H., K. Nakagawa, H. Hasegawa, T. Shibata, and E. Iwanami, 1999. Change

detection of buildings using an airborne laser scanner, ISPRS Journal of

Photogrammetry & Remote Sensing, Vol.54 pp148 - 152, 1999.

Nevatia, R., Ch. Lin, and A. Huertas, 1997. A system for building detection from aerial

images. In Automatic Extraction of Man-Made Objects from Aerial and Space Images

(II), edited by A. Gruen, O. Kuebler and P. Agouris, 1997.

Pilu, M. and A. Lorusso, 1997. Uncalibrated stereo correspondence by singular value

decomposition, http://www.hpl.hp.co.uk/people/mp/docs/bmvc97/index.htm#content

Rottensteiner, DI. F., 2001. Semi-automatic extraction of buildings based on hybrid

adjustment using 3D surface models and management of building data in a TIS, PhD

thesis, http://www.ipf.tuwien.ac.at/fr/buildings/diss/dissertation.html

Rottensteiner, F. and Ch. Briese, 2002. A new method for building extraction in urban areas from high-resolution LIDAR data, IAPRS, 34, Graz.

Schenk, T., 1997. Towards automatic aerial triangulation, ISPRS Journal of Photogrammetry & Remote Sensing, Vol. 52, pp. 110-121, 1997.

Schenk, T., 2002. Fusion of LIDAR and Imaging Data, in Mapping Geo-Surficial Processing Using Laser Altimetry, The 3rd International LIDAR Workshop, Columbus, Ohio, 2002.

Schiewe, J., 2003. Integration of multi-sensor data for landscape modeling using a region based approach, ISPRS Journal of Photogrammetry & Remote Sensing, Vol. 57, pp. 371-379, 2003.

Scholze, S., T. Moons, and L. Van Gool, 2001. A probabilistic approach to roof patch extraction and reconstruction, in Baltsavias et al. (eds.), Automatic Extraction of Man-made Objects from Aerial and Space Images (III), 2001.

Scott, G. and H. Longuet-Higgins, 1991. An algorithm for associating the features of two patterns. Proceedings of the Royal Society of London, Vol. B244, pp. 21-26, 1991.

Seo, S., 2002. Data fusion of aerial images and LIDAR data for automation of building recognition, The 3rd International LIDAR Workshop, Columbus, Ohio, 2002.

Seo, S., 2003. Model Based Automatic Building Extraction from LIDAR and Aerial Imagery, Ph.D. dissertation, The Ohio State University, 2003.

Sithole, G., 2001. Filtering of laser altimetry data using a slope adaptive filter, IAPRS, 34, 3W4.

Spreeuwers, L., K. Schutte, and Zweitze, 1997. A model driven approach to extract buildings from multi-view aerial images, in Automatic Extraction of Man-made Objects from Aerial and Space Images (II), edited by Gruen et al., 1997.

Stamos, I. and P.K. Allen, 2000. 3-D Model Construction Using Range and Image Data, http://www.cs.columbia.edu/~allen/PAPERS/cvpr2000.pdf, 2000.

Tao, V. and Y. Hu, 2001. A review of post-processing algorithms for airborne LIDAR data. Proceedings of ASPRS Annual Conference (CD-ROM), 23-27 April, St. Louis, 2001.

Teh, Ch. and R.T. Chin, 1988. On image analysis by the methods of moments, IEEE Transactions on Pattern Analysis and Machine Intelligence, Vol. 10, No. 4, July 1988.

Tomasi, C. and R. Manduchi, 1998. Bilateral filtering for gray and color images, Proceedings of the 1998 IEEE International Conference on Computer Vision, Bombay, India.

Vosselman, G. and I. Suveg, 2001. Map based building reconstruction from laser data and images, in Baltsavias et al. (eds.), Automatic Extraction of Man-made Objects from Aerial and Space Images (III), 2001.

Vosselman, G. and S. Dijkman, 2001. 3D building model reconstruction from point cloud and ground plans, IAPRS, V.34/3-4W3, October 2001, Annapolis, Maryland.

Vosselman, G., 2000. Slope based filtering of laser altimetry data, IAPRS, 33, Amsterdam.

Vosselman, G., 1999. Building reconstruction using planar faces in very high density height data, IAPRS, V.32/3-2W5.

Wang, Zh., 1999. Surface reconstruction for object recognition, Ph.D. dissertation, The Ohio State University, 1999.

Wehr, A. and U. Lohr, 1999. Airborne laser scanning - an introduction and overview, ISPRS Journal of Photogrammetry & Remote Sensing, Vol. 54, pp. 68-82, 1999.

Weidner, U. and W. Förstner, 1995. Towards automatic building extraction from high-resolution digital elevation models, ISPRS Journal of Photogrammetry & Remote Sensing, Vol. 50, pp. 38-49, 1995.

Wolf, P.R. and B.A. Dewitt, 2000. Elements of Photogrammetry with Applications in GIS, 3rd edition. Published by Thomas Casson, 2000.

Xu, F., X. Niu, and R. Li, 2002. Automatic recognition of civil infrastructure objects using Hopfield Neural Networks, ASPRS Annual Conference, 18-26, Washington.

Zhao, Z., 2001. Line Simplification, http://www-cg-hci.informatik.uni-oldenburg.de/~da/peters/Kalvin/Doku-CG.htm, visited November 12, 2002.

Zimmermann, P., 2001. Automatic building detection analyzing multiple data, in Baltsavias et al. (eds.), Automatic Extraction of Man-made Objects from Aerial and Space Images (III), 2001.