
ON THE USE OF GEOMETRIC AND SEMANTIC MODELS FOR COMPONENT-BASED BUILDING RECONSTRUCTION

André Fischer, Thomas H. Kolbe, and Felicitas Lang
Institute for Computer Science III, Institute for Cartography & Geoinformation, Institute for Photogrammetry

University of Bonn
e-mail: [email protected], [email protected], [email protected]

Semantische Modellierung; SMATI 99

KEY WORDS: 3D building reconstruction, hierarchical 2D/3D modeling, component-based reconstruction, inexact relational matching, minimum description length principle, multi-image correspondence analysis

ABSTRACT

3D building data is needed in many application areas. Besides the geometric description, an increasing number of applications also demand thematic information about acquired buildings. We present a concept for the automatic extraction of buildings from aerial images. In contrast to other approaches, generic building structures are both geometrically reconstructed and semantically classified. A component-based, parameterized building model is employed to control the reconstruction of buildings.
This paper describes how geometric and semantic knowledge of buildings is propagated through the different aggregation levels of the building model. Furthermore, it is shown how rules and constraints are derived from the model and exploited at each stage of the reconstruction process.

1 INTRODUCTION

3D building data play an important role for digital city models and geoinformation systems, which are used in numerous application areas: architecture, urban planning, telecommunication and environmental investigations show an increasing need for up-to-date 3D building models. To reduce the cost of time-consuming manual data acquisition, the automation of 3D building reconstruction has become an active research area in recent years.

The quality requirements for the 3D building data to be acquired, such as accuracy, degree of abstraction, and timeliness, generally depend on the application area. Especially in GIS applications like urban planning, not only the geometric description of the buildings is necessary, but also their semantic classification, to facilitate thematic inquiries. Therefore, it is essential that automatic building reconstruction procedures extract both the geometric and the semantic aspects of buildings.

Several research groups are presently working on 3D building reconstruction and have demonstrated promising results. The work on automatic building reconstruction reveals essentially two different approaches. One employs a database of predefined buildings, which are matched with the images to extract the corresponding building instances. The other utilizes a general or prismatic polyhedral model, where buildings are reconstructed by geometric grouping of image features. Whereas the former approaches yield both geometric and semantic 3D building information, they are limited by their fixed number of predefined buildings. In contrast, the generic model of the latter approaches allows the representation of arbitrarily shaped buildings, but provides no building-specific interpretation of the reconstructed polyhedrons.
Up to now, only semiautomatic systems are able to perform both geometric and semantic reconstruction using generic scene descriptions (cf. [LLS95], [EG96], [TDM96]).
In earlier articles (see [BKL+95, FKL97, FKL+98]) we have presented a concept for automatic building reconstruction which enables both the reconstruction and the interpretation of generic building structures. The employed building model is based on building-specific components that are combined in a flexible way. Rules and constraints represent the building-specific knowledge. They are derived from the building model and, respectively, guide and restrict the reconstruction. In this paper we present these rules and constraints in detail, and show how they are exploited in the different stages of the reconstruction process to ensure the reliable extraction and classification of buildings.

2 DOMAIN SPECIFIC MODELING

2.1 Related Work

In the past ten years the development of concepts for automatic building extraction has been a topic of active research. For an up-to-date overview of the different approaches see the Ascona workshop proceedings of 1995 and 1997 [GKA95, GBH97], the SMATI'97 workshop proceedings [FP97], and the special issue of the CVIU journal on building extraction [FKL+98].

Published in: W. Förstner, C.-E. Liedtke, J. Bückner (eds): Proceedings of the Workshop on Semantic Modeling for the Acquisition of Topographic Information from Images and Maps in München (SMATI'99), 1999

Although the proposed concepts differ with respect to the type of input data (e.g. digital elevation models and raster images), their reconstruction strategy, and their implementation, they can be compared by the building model they employ. Following the line of Steinhage in [Ste98], we basically classify the employed modeling schemes into three categories:

Polyhedral models provide the most flexible approximation for the representation of buildings. They pose no constraints on the form of the objects and thus allow the representation of arbitrarily shaped buildings with plane faces.
The use of general polyhedral descriptions is problematic because of the missing semantic classification of the objects and the object parts. To avoid the reconstruction of impossible objects, building knowledge has to be incorporated into the reconstruction procedure, for instance in the form of heuristics and implicit assumptions. This can be observed in the approaches of Bignone et al. [BHFS96] and Frere et al. [HVF+97]. Both concepts are based on a polyhedral model. A bottom-up strategy is used, starting with an initial image feature extraction, where features are successively grouped into 3D roof patches using photometric and chromatic attributes of image regions together with spatial edge geometry. Roof patches are then grouped by an overall optimization according to the simplicity, compactness, and completeness of the resulting roof shape. To complete the building shape, vertical walls are assumed. Henricsson et al. [HB97] present impressive results of their approach on some test data, but show no explicit modeling of building types or building-specific parameterizations. Due to the lack of further building-specific modeling, it cannot be decided whether a reconstructed object is a building, a lorry or a doghouse.
Another drawback of a general polyhedral model is that, due to the absence of a more specific building modeling, no predictions about occluded parts can be made. Buildings can only be extracted completely if every part is visible and detectable by the feature extraction process. Thus, this approach will only give good results when several images from different viewpoints are available and the quality of the images is high.

Prismatic models can be seen as a special case of polyhedral models. They allow the representation of buildings with arbitrary ground plans, but are restricted to vertical walls and flat roofs. They are based on the assumption that the ground plan of a flat-roof house is congruent with the roof's outline.

Prismatic models are used in a number of different approaches: Nevatia et al. [NP82, MN88, LHN95, NLH97], Fua and Hanson [FH87], and Weidner et al. [WF95, Wei97]. Whereas the possible building shape is restricted explicitly in these concepts, buildings are not further classified and have no specific parameterization. In all cases the reconstruction strategy is bottom-up, mostly following the principles of perceptual grouping (cf. [Moh89]) to successively construct geometrically (but not necessarily semantically) more meaningful objects. Thus, prismatic models suffer from the same lack of building-specific knowledge as general polyhedral models.

Parameterized volumetric primitives are building models with fixed topology and variable size. Modeling buildings by volumetric primitives has several particular advantages: first, because every building type is explicitly modeled, its different forms of appearance can be derived a priori; second, even partially occluded buildings can be fully reconstructed; and third, the identification of an instance of a volumetric primitive in the image implies its classification.
Parameterized volumetric primitives are employed by, among others, McGlone and Shufelt [MS94], Jaynes et al. [JHR97], and Schutte et al. [Sch96, SSH97]. All approaches use a hypothesize-and-verify strategy, where building primitive instances are hypothesized in a data-driven way and then verified top-down according to the model. The major drawback of volumetric primitives is their lack of flexibility with respect to different building shapes, because only buildings that are explicitly stored in the model database can be detected.

Discussion: Whereas (prismatic) polyhedral models allow the representation of buildings in a generic way, they are too unspecific and do not explicitly take into account constraints of the domain 'building'. Parameterized volumetric models instead represent a semantically more meaningful modeling scheme, because building-type-specific parameters like width, height, and length can be used to further restrict models to reasonable sizes. Unfortunately, they are restricted to the detection of a small number of predefined building types. To bridge the gap between both approaches, we have developed a component-based building model, which is described in the next section.

2.2 Building Model

To overcome the limitations of parameterized volumetric primitives with their fixed topology, we employ a component-based modeling. Buildings with complex topology can be constructed by connecting parameterized building part primitives in a generic way.

Each building part primitive is parameterized by its own set of form parameters like roof height, building width etc. A set of constraints is defined on these parameters, introducing further specific building knowledge into our model. Simple constraints define upper and lower bounds for valid parameter intervals. More complex constraints relate parameters to each other and enforce geometric properties (e.g. roof slopes). Parameters keep their meaning during the aggregation of building part primitives. Thus, a meaningful parameterization is maintained during the aggregation process.
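The two kinds of constraints described above, simple interval bounds and relations between parameters, can be sketched as predicate checks. The parameter names, numeric ranges and slope interval below are illustrative assumptions, not values from the paper:

```python
import math

# Sketch of form-parameter constraints for a gabled-roof building
# primitive (parameter names and bounds are illustrative assumptions).

def check_constraints(params):
    """Return the list of violated constraint names."""
    violations = []
    # Simple constraints: upper and lower bounds on valid parameter intervals.
    bounds = {"width": (2.0, 30.0), "length": (2.0, 60.0),
              "wall_height": (2.0, 20.0), "roof_height": (0.5, 10.0)}
    for name, (lo, hi) in bounds.items():
        if not lo <= params[name] <= hi:
            violations.append(f"bound:{name}")
    # Complex constraint: relate roof height and width to enforce
    # a plausible roof slope.
    slope = math.degrees(math.atan2(params["roof_height"], params["width"] / 2.0))
    if not 15.0 <= slope <= 60.0:
        violations.append("roof_slope")
    return violations

ok = {"width": 10.0, "length": 15.0, "wall_height": 6.0, "roof_height": 3.5}
bad = dict(ok, roof_height=0.6)   # within its bound, but the slope is too shallow
```

Because the parameters keep their meaning under aggregation, such checks can be re-applied unchanged after building parts are connected.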

In detail, the proposed building model consists of a four-level part-of hierarchy (see fig. 1). It reflects different levels of semantic abstraction. The primitives of each aggregation level are specialized by an is-a hierarchy into subclasses. Building-specific knowledge and restrictions are propagated top-down from the higher levels to the lower ones. To allow a tight coupling of the 2D and 3D reconstruction processes, the primitives on each level are coherently modeled in 2D and 3D.

[Figure 1 is a diagram of the building model: the part-of hierarchy with the levels Building, Building Part, Feature Aggregate and Feature, each modeled coherently in 2D and 3D; on the feature-aggregate level the is-a specializations Corner, Wing and Face, on the feature level Point, Line and Region, plus a Connector element.]

Figure 1: Building model: different semantic levels of the part-of hierarchy are shown in the vertical direction, different levels of the is-a hierarchy in the horizontal direction (here shown only for the 3D side). The 2D image model describes the expected appearance of the building in the images.
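A minimal sketch of the four-level part-of hierarchy of fig. 1, with the is-a specialization on the feature-aggregate level; the class names follow the figure, but the Python layout is an assumption, not the paper's implementation:

```python
# Sketch of the part-of / is-a hierarchy of fig. 1 (illustrative layout).

class Feature:                 # level 4: attributed features
    pass
class Point(Feature): pass
class Line(Feature): pass
class Region(Feature): pass

class FeatureAggregate:        # level 3: a feature plus its neighbours
    def __init__(self, parts):
        self.parts = list(parts)
class Corner(FeatureAggregate): pass   # point-induced
class Wing(FeatureAggregate): pass     # line-induced
class Face(FeatureAggregate): pass     # region-induced

class BuildingPart:            # level 2: represented by corner graphs
    def __init__(self, corners):
        self.corners = list(corners)

class Building:                # level 1: aggregation of building parts
    def __init__(self, parts):
        self.parts = list(parts)

# A corner of order 3: one point with its adjacent lines and regions.
c = Corner([Point(), Line(), Line(), Line(), Region(), Region(), Region()])
b = Building([BuildingPart([c])])
```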

The lowest level contains attributed features, namely points, lines and regions. They establish the link to a symbolic image description, which is derived from the original raster images by a feature extraction process. Attributes for lines and regions are, for instance, the orientation classifications horizontal, vertical, and oblique. Regions have an additional attribute describing their role: valid values are, among others, wall and roof.

The next level consists of feature aggregates, which are induced by points, lines and regions and contain all their topological neighbours. They are subdivided into corners, wings and faces. A corner, for example, contains one point and all its adjacent lines and regions.

The two top levels contain the already mentioned building parts and buildings. Building parts are composed from feature aggregates; in fact, they are represented by corner graphs.
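The point-induced aggregation described above (a corner is a point plus all its topological neighbours) can be sketched as a lookup in a feature adjacency graph. The graph encoding as a dict and the 'L'/'R' naming scheme are assumptions for illustration:

```python
# Point-induced 2D aggregation: build a corner from a feature adjacency
# graph by collecting a point and all its topological neighbours.
# Graph encoding (feature id -> set of neighbour ids) is an assumption.

def corner_at(point, adjacency):
    """The corner induced by `point`: the point plus its adjacent
    lines and regions, separated by feature-type prefix ('L'/'R')."""
    neigh = adjacency.get(point, set())
    return {"point": point,
            "lines": sorted(n for n in neigh if n.startswith("L")),
            "regions": sorted(n for n in neigh if n.startswith("R"))}

# A corner of order 3 around point P1.
adj = {"P1": {"L1", "L2", "L3", "R1", "R2", "R3"},
       "L1": {"P1", "R1", "R3"}}
c = corner_at("P1", adj)
```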

3 STRATEGY

In a data-driven preprocessing step the aerial images are segmented and features are extracted from them. Regions of interest which are likely to contain buildings are determined, and all following steps focus on these regions and the features contained in them.

The further process is strictly model-driven and follows the paradigm of hypothesize and verify. It can be divided into three parts which use the building model in different ways.

The transition from 2D image data to 3D object space is realized by the reconstruction of 3D vertices from 2D image features. Classification transforms the vertices into 3D corners.

The corners are used to index into the database of parameterized building part primitives, which are instantiated and aggregated into building hypotheses.

In order to verify the building hypotheses, they are projected back into the images. The obtained 2D views are matched with the image features using constraint-solving techniques. Among all best matches, the most probable one is selected. A final parameter estimation determines the form and pose parameters.
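Read as pseudocode, the control flow of these three model-driven stages might look as follows. The function bodies are mere placeholders (the actual procedures are the subject of the following sections); only the hypothesize-and-verify structure is meant to be accurate:

```python
# Control-flow sketch of the model-driven strategy (stage bodies are
# placeholder stand-ins, not the paper's procedures).

def reconstruct_corners(features_2d):
    """Stage 1: 2D image features -> classified 3D corners."""
    return [{"vertex": f, "class": "corner"} for f in features_2d]

def hypothesize_buildings(corners):
    """Stage 2: index building part primitives by corners, aggregate."""
    return [{"parts": corners}]

def verify(hypotheses, features_2d):
    """Stage 3: back-project, match against image features, score."""
    scored = [(len(h["parts"]), h) for h in hypotheses]
    return max(scored, key=lambda s: s[0])[1]

def reconstruct(features_2d):
    corners = reconstruct_corners(features_2d)
    return verify(hypothesize_buildings(corners), features_2d)
```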

A detailed description of these three parts follows in the next three sections.

4 3D CORNER RECONSTRUCTION

One crucial part of the complete reconstruction process is to derive 3D object parts which serve as an appropriate basis for the subsequent aggregation process. For the transition into object space we use the level of feature aggregates, especially the building components of type corner. The use of corners is motivated by the following reasons:
Observability: The projections of corners into the images are image structures which show a high stability against occlusions.

Structure: The topologic and structural properties of corners are helpful for steering the search procedure during the correspondence analysis; using aggregates offers advantages over using single features.

Interpretation: The richness of the observable geometry and topology of the corners, especially in 3D object space, is suitable for interpreting the observed 3D descriptions. The interpretation can then be used for stabilization and verification.


Aggregation: The corner geometry as well as the corner semantics provide powerful information for the subsequent 3D aggregation process (cf. section 5.4) and thus enable the use of a generic model for 3D aggregation.

The corner reconstruction process follows the hypothesize-and-verify paradigm. Hypotheses are built up mainly data-driven, inferring the class membership of the observed data to a corner class by analyzing the observable geometric description. They represent model instances with fixed geometry, topology and structure. Once the hypotheses are built up, the verification can be performed model-driven; it exploits the class-specific constraints for evaluation and for increasing the accuracy of the reconstruction.

The corner reconstruction is performed as a multi-image procedure using all available image data simultaneously. The interior and exterior image orientation is presumed to be known and is used to geometrically restrict the search during the correspondence analysis.
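Known orientation restricts corresponding points to an epipolar band (cf. fig. 4a): a candidate point in a second image is accepted only if its distance to the epipolar line induced by the point in the first image stays below the band width. A minimal sketch, where the fundamental matrix F is a toy example for an ideal horizontal stereo pair rather than data from the paper:

```python
# Epipolar search restriction: accept a candidate point in image j only
# if it lies within a band of width sigma_d around the epipolar line
# induced by a point in image i.

def epipolar_line(F, x):
    """l' = F x for a homogeneous image point x = (u, v, 1)."""
    return [sum(F[r][c] * x[c] for c in range(3)) for r in range(3)]

def on_epipolar_band(F, x_i, x_j, sigma_d=1.5):
    """Point-to-line distance of x_j from the epipolar line of x_i."""
    a, b, c = epipolar_line(F, x_i)
    d = abs(a * x_j[0] + b * x_j[1] + c) / (a * a + b * b) ** 0.5
    return d <= sigma_d

# Toy fundamental matrix of an ideal horizontal stereo pair:
# corresponding points share the same image row (l' = (0, -1, v)).
F = [[0, 0, 0],
     [0, 0, -1],
     [0, 1, 0]]
```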

4.1 3D and 2D Corner Model

The level of feature aggregates of our building hierarchy distinguishes three different object types, namely corners C, wings W and faces F (cf. [FKL+98]), each of them given by its geometric and topologic form description and its semantic description. They represent components of a general polyhedron which are additionally specified by building-specific geometric, topological and structural properties. In this context we restrict ourselves to describing corners, because wings and faces are so far of subsidiary importance for our building reconstruction procedure.

3D Corner Model: Each corner c = (v, λ) is given by its form description, a vertex v composed from elements on the level of features, and its semantic description, a class label λ. The distinction of corners from wings and faces is due to the structural composition of the underlying feature aggregates. Each corner c of order n is described by the corner point P and by n lines L and regions R that are preliminarily open-ended in their spatial extension (cf. fig. 2a) and thus represent plug elements for grouping and aggregating corners. The graph representation (cf. fig. 2b) expresses the topologic and structural description of a corner, where the graph nodes represent features and the graph arcs the adjacency relations between them.

The building-specific part of the corner model, which distinguishes a building corner from a general polyhedral corner, depends on the corner specialization hierarchy, which divides corners into subclasses. The specialization depends on the corner topology as well as on its geometry.

[Figure 2 depicts a corner of order 3 with corner point P, lines L1, L2, L3 and regions R1, R2, R3.]

Figure 2: a. shows a corner represented by its component points, lines and regions. b. shows the graph representation of a corner.

We use a two-level specialization hierarchy. Each subclass implies class-dependent constraints on the form description of the vertex v. On the first level of the specialization hierarchy we use unary constraints which refer to single components of a corner. They especially restrict the corner components of type line to building-specific qualitative attributes like being horizontal, vertical or sloped. On the second level of the specialization hierarchy we use binary constraints which refer to the geometric relationship of pairs of corner components and likewise restrict the corner components of type region to building-specific attributes like being horizontal or vertical.
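The unary attributes (per line) and binary constraints (per line pair) can be sketched as geometric tests on 3D direction vectors. The angular tolerances below are illustrative assumptions, not the paper's thresholds:

```python
import math

# Sketch of unary and binary corner constraints on 3D line directions
# (unit vectors pointing away from the corner point); tolerances are
# illustrative assumptions.

def unary_label(d, tol_deg=10.0):
    """Qualitative attribute of one line: horizontal / vertical / sloped."""
    elev = math.degrees(math.asin(max(-1.0, min(1.0, d[2]))))
    if abs(elev) <= tol_deg:
        return "horizontal"
    if abs(elev) >= 90.0 - tol_deg:
        return "vertical"
    return "sloped"

def binary_orthogonal(d1, d2, tol_deg=5.0):
    """Binary constraint: are two corner lines orthogonal?"""
    dot = sum(a * b for a, b in zip(d1, d2))
    ang = math.degrees(math.acos(max(-1.0, min(1.0, dot))))
    return abs(ang - 90.0) <= tol_deg

ridge = (1.0, 0.0, 0.0)    # horizontal ridge line
wall = (0.0, 0.0, -1.0)    # vertical wall edge
roof = (0.0, 0.8, 0.6)     # sloped roof edge
```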

If no constraints are attached to the corner, we call it the unconstrained corner with class label λ0; it is identical to a polyhedral corner, that is, a vertex v.

2D Corner Model: The image model aims at describing object components coherently with the 3D modeling, to provide direct access from image observations to 3D objects and vice versa. The image representation coherently uses components on the hierarchy level of features. We therefore use an appropriate feature extraction to derive a polymorphic symbolic image description consisting of points, lines and regions and their mutual neighborhood relations (cf. [Fuc98]). The result is stored in a feature adjacency graph analogous to the graph representation in object space (cf. fig. 2b).

At the beginning of the corner reconstruction, the geometric corner instances are unknown. As the geometric variability of the appearance of the different corner types is too large for an efficient representation in the image model, we refrain from integrating the corner specialization into the image model. The image model is thus a vertex model, modeling especially the topological and structural properties of the appearance of a corner depending on its underlying vertex.

Furthermore, due to the image characteristics and the characteristics of the feature extraction procedure (cf. [FLF94]), the observed symbolic image description in general differs from the ideal projection of the corresponding 3D corner. Fig. 3 shows some typical deviations of the observed vertices from their ideal projection for the example of a corner of order n = 3. To take the uncertainties of the feature aggregates into account, we therefore formulate the image model statistically as a vertex classification model (cf. [FKL+98]) which classifies feature aggregates as vertices or non-vertices, taking as reference the ideal projection of a corner of order 3 (cf. fig. 3a).

Figure 3: shows the extracted features (first row) and the features with a neighborhood relation to the corner point (second row). a. shows the ideal projection of a corner of order 3; b.-d. show possible appearances of extracted image corners differing from the ideal projection.

4.2 Generation of 3D Corner Hypotheses

The corner reconstruction starts with the generation of corner hypotheses and comprises the reconstruction of 3D vertices and their interpretation as 3D corners.

Initial form reconstruction of 3D vertices: By multi-image correspondence analysis we build an initial form reconstruction of 3D vertices v which possess the topologic and structural properties of corners as defined by the image model.

• Feature extraction and vertex selection: The first step in the reconstruction is the extraction of a symbolic image description. Based on the extracted points, lines and regions and their mutual neighborhood relations, we build 2D vertices by point-induced 2D aggregation, analyzing the feature adjacency graph. The selected vertices serve as a basis for the correspondence analysis.

• Correspondence analysis of vertices: The correspondence analysis starts with the selected vertices and is formulated as a search procedure which extends over three layers. In the first layer a most promising vertex is selected for reference in one image, using a priority list of the 2D vertices obtained by evaluating their suitability for the correspondence analysis and reconstruction. The evaluation is done by vertex classification using the image model, considering the stability, uniqueness and structural richness of a vertex. The second layer establishes a stereo correspondence of vertices in a pair of images. In addition to the evaluation score of single

Figure 4: a.) shows the epipolar constraint for the corner point of a vertex pair: the corner point in the second image is restricted to lie on the epipolar band of width σd around the epipolar line defined by the point in the first image. b.) and c.) show two examples of corresponding vertices; the epipolar geometry restricts the matching of the corner lines. Example b.) is unambiguous, so three corner lines can be reconstructed. In example c.) only one corner line can be reconstructed from an unambiguous line match; for the remaining line the matching is ambiguous, and either of two lines can be taken for correspondence, leading to two different vertex alternatives.

vertices, the structural similarity of matching candidates which fulfill the epipolar constraints is incorporated. The third layer generates a multi-image correspondence tuple of vertices. Epipolar geometry once again restricts the search space.
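The first layer's priority list can be sketched as sorting the 2D vertices by an evaluation score rewarding the criteria named in the text. The scoring weights and attribute names are invented for illustration:

```python
# Sketch of the first search layer: rank 2D vertices by suitability and
# pick the most promising one as reference. The score combines stability,
# uniqueness and structural richness with invented weights.

def score(vertex):
    return (2.0 * vertex["stability"]
            + 1.0 * vertex["uniqueness"]
            + 0.5 * vertex["n_lines"])   # structural richness

def most_promising(vertices):
    """Head of the priority list of 2D vertices."""
    return max(vertices, key=score)

candidates = [
    {"id": "v1", "stability": 0.9, "uniqueness": 0.8, "n_lines": 3},
    {"id": "v2", "stability": 0.6, "uniqueness": 0.9, "n_lines": 2},
]
```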

• Transition to 3D vertices: Based on the correspondence tuple, the transition to object space is performed by a joint forward intersection of the corresponding image feature components, the points and lines of the vertex correspondence tuple, using all images simultaneously. Epipolar geometry once again defines geometric restrictions which facilitate the matching of the features.
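A joint forward intersection of a corresponding corner point over several images can be sketched as a linear least-squares (DLT-style) triangulation. The cameras and observations below are synthetic; in the paper, the projection geometry comes from the known interior and exterior orientation:

```python
import numpy as np

# Joint forward intersection of one corner point observed in n images,
# as a linear least-squares (DLT-style) triangulation.
# Cameras and observations are synthetic stand-ins.

def triangulate(projections, points_2d):
    """Intersect rays from n >= 2 cameras; P: 3x4 matrices, x: (u, v)."""
    rows = []
    for P, (u, v) in zip(projections, points_2d):
        rows.append(u * P[2] - P[0])   # u * (p3 . X) - (p1 . X) = 0
        rows.append(v * P[2] - P[1])   # v * (p3 . X) - (p2 . X) = 0
    _, _, vt = np.linalg.svd(np.asarray(rows))
    X = vt[-1]                         # null vector = homogeneous point
    return X[:3] / X[3]

P1 = np.hstack([np.eye(3), np.zeros((3, 1))])               # camera at origin
P2 = np.hstack([np.eye(3), np.array([[-1.0], [0], [0]])])   # baseline of 1 unit
X_true = np.array([0.4, 0.2, 5.0])
x1 = (X_true[0] / X_true[2], X_true[1] / X_true[2])
x2 = ((X_true[0] - 1.0) / X_true[2], X_true[1] / X_true[2])
X = triangulate([P1, P2], [x1, x2])
```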

Interpretation: For the 3D vertices we perform the interpretation by class assignment of each vertex to a corner class given by the object model, which results in corner hypotheses. The interpretation is performed in object space, as the 3D information defines stronger restrictions during the interpretation than the 2D information. By geometric analysis each 3D vertex is assigned to one or several alternative corner classes defined by the 3D corner specialization hierarchy. The 3D vertex classification is performed hierarchically by checking possible class-specific constraints and inferring the corresponding corner classes:

• Checking unary constraints: We start by checking unary constraints on the first level of the corner specialization hierarchy. We use qualitative line attributes depending on the slope orientation with respect to the corner point, given by the qualitative geometric labels horizontal (h), vertical (v+, v−) and oblique (o+, o−), where + and − distinguish upward and downward direction [Gul92]. Currently we specialize corners c of node degree 3. We exclude meaningless corners, i.e. corners with line attribute combinations which make no sense in the context of buildings and their functionality. Thus the first specialization level comprises a fixed set of possible classes.

• Checking binary constraints: Depending on the unary constraints that are identified as satisfied, binary constraints are checked. Examples of binary constraints are the symmetry of two lines with respect to the vertical or the orthogonality of two lines. The explicit definition of subclasses on this level is not sensible, as we cannot predefine them without restricting the variability of buildings. The set of identified constraints of each vertex is related to a class label λ, resulting in a corner hypothesis c = (v, λ). If no constraints are attached to the corner, we call it the unconstrained corner with the class label λ0.

• Model-based corner prediction: In general the vertex observations may be incomplete. We therefore use the corner model on the one hand for the prediction of unobserved corner components, due to missing or fragmented features or missing feature adjacency relations, and on the other hand for the prediction of constraints. The prediction can only be performed if the predicted instances are geometrically defined by the model. E.g., if we observe a reconstructed vertex of node degree 2 with line labels (h) and (o+), with the two lines further fulfilling a binary constraint, we perform a model-based prediction of a corner of order 3 of class (h, o+, o−), where the line pair (o+, o−) fulfills the constraint of spanning a vertical plane, as occurs for the gable corner of a gabled roof building.
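Model-based prediction can be sketched as looking up the corner classes whose line-label multiset contains the observed labels. The class table below is a small illustrative subset modeled after the gable-corner example, not the paper's full specialization hierarchy:

```python
from collections import Counter

# Sketch of model-based corner prediction: complete a partially observed
# vertex by finding corner classes whose line-label multiset contains
# the observed labels. Class table is an illustrative subset.

CORNER_CLASSES = {
    "gable corner, gable roof": ("h", "o+", "o-"),
    "eaves corner, gable roof": ("h", "h", "o+", "v-"),
    "orthogonal corner, flat roof": ("h", "h", "v-"),
}

def predict(observed_labels):
    """Corner classes consistent with (i.e. containing) the observation."""
    obs = Counter(observed_labels)
    return sorted(name for name, labels in CORNER_CLASSES.items()
                  if not obs - Counter(labels))
```

A degree-2 observation (h, o+) is still ambiguous between two classes, while (h, o+, o-) pins down the gable corner, mirroring how additional constraints narrow the hypothesis set.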

[Figure 5 shows, for the roof types flat roof, prismatic roof, gable roof, hip roof and crippled hip roof, the corner types defined by the specialization hierarchy, labeled with the qualitative line attributes h, v+, v−, o+, o−; among them: orthogonal ground corner, general ground corner, orthogonal corner flat roof, general corner flat roof, gable corner gable roof, eaves corner gable roof, gable corner hip/crippled hip roof, eaves corner hip roof, and eaves corner crippled hip roof.]

Figure 5: Different corner types that are defined by the building-specific corner specialization hierarchy.

4.3 Verification of 3D Corners

In contrast to the mainly data-driven generation of corner hypotheses, the verification is performed model-driven and profits from a strong model. On the one hand, the geometric model instances provide approximate values for further steps of the analysis. On the other hand, the class membership defines class-specific constraints that can be used for geometric stabilization and for checking the conformity of data and model.

As each 3D vertex can be assigned to several corresponding corner classes and thus leads to alternative 3D corners, the verification has to resolve this ambiguity. Furthermore, the verification is necessary to sustain predicted corners, possibly by additional observations, to avoid blind prediction. Otherwise the predicted corners have to be rejected. In both cases the knowledge of the model instances defines stronger constraints than during the generation of corner hypotheses.

The verification of the corner hypotheses is performed by statistical analysis and is formulated as an optimization problem for finding the best interpretation c* of the data, i.e. the vertex correspondence tuple v, from all possible corner hypotheses. Using Bayes' theorem, the optimization of the conditional probabilities P(c_i | v, M) can be broken down to

    c* = argmax_{c_i} P(c_i | v, M) = argmax_{c_i} P(v | c_i, M) · P(c_i | M),   (1)

neglecting the denominator by normalization. Equation 1 clearly shows the influence of the model M: The conditional probability P(v | c_i, M) evaluates how well the corner instance fits the observed image features of the vertex correspondence tuple v and can be derived using the classical modeling techniques of observation errors. The a priori probability P(c_i | M) gives information on the probability distribution of the different corner classes.
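The MAP selection of eq. 1 amounts to a one-line maximization once likelihoods and priors are available. A minimal sketch; the class names and probability values are purely illustrative:

```python
def map_corner_class(likelihoods, priors):
    """Pick the corner class maximizing P(v | c_i, M) * P(c_i | M),
    i.e. eq. (1) with the normalizing denominator dropped.
    Both arguments are dicts keyed by (hypothetical) class labels."""
    scores = {c: likelihoods[c] * priors[c] for c in likelihoods}
    best = max(scores, key=scores.get)
    return best, scores[best]

# With uniform priors, MAP reduces to maximum likelihood:
likes = {"gable": 0.6, "eaves": 0.3, "unconstrained": 0.1}
priors = {"gable": 1 / 3, "eaves": 1 / 3, "unconstrained": 1 / 3}
```

Since the denominator P(v | M) is the same for all classes, dropping it does not change which class wins.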

For each corner hypothesis we perform a maximum likelihood parameter estimation using all supporting image features simultaneously.

The verification process is subdivided into the following steps: a. the model-based selection of observations, b. the model-based form reconstruction of each corner hypothesis, and c. the optimization of the interpretation.

Model-based selection of observations: For each corner hypothesis the model instance defines approximate values for the model-based reconstruction. Thus the selection of matching features is done by back-projecting the instantiated corner model into the images and analyzing the deviations between the projected model and the symbolic image description.
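The back-projection step can be sketched as follows; the 3x4 camera matrix, the nearest-neighbour matching, and the pixel threshold are assumptions for illustration, not the paper's matching procedure:

```python
import numpy as np

def backproject(P, X):
    """Project a 3D point X into the image with a 3x4 camera matrix P."""
    x = P @ np.append(X, 1.0)
    return x[:2] / x[2]

def select_observations(P, model_points, image_points, max_dev=3.0):
    """Match each back-projected model point to the nearest extracted
    image point within max_dev pixels (a hypothetical threshold)."""
    matches = {}
    for i, X in enumerate(model_points):
        proj = backproject(P, X)
        dists = [np.linalg.norm(proj - p) for p in image_points]
        j = int(np.argmin(dists))
        if dists[j] < max_dev:
            matches[i] = j
    return matches
```

Model points with no extracted feature nearby simply stay unmatched, which corresponds to the incomplete observations discussed above.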


That way we possibly get access to features which were originally not contained in the selected vertices and are able to partially bridge the incompleteness, fragmentation and missing neighborhood relations of the extracted symbolic image description.

Figure 6: Image features of type line and point (left) which are used for the corner reconstruction (right) during the verification steps. They are selected model-based by back-projection of the corner hypotheses into the images. Note that the matching features contain fragmented lines and bridge missing feature adjacency relations to the vertex points.

Model-based form reconstruction: For each hypothesis, we estimate the geometric parameters by a maximum likelihood parameter estimation using all supporting image features simultaneously. The result of the parameter estimation gives for each hypothesis the optimal geometric reconstruction with the expected values E(l) of the observations depending on the functional model E(l) = f(β), which defines the relation between the observations l and the geometric parameters β. The parameters β of each corner class c_i depend on the class-specific constraints and thus are predefined by the model. Each corner of order n in principle requires 3 + 2n geometric parameters β, being the three coordinates of the point vector X of the corner point P and two parameters for each of the n corner lines, namely the direction angle φ_i and the azimuth λ_i (cf. fig. 7 a.). These parameters are reduced by the class-specific constraints (cf. fig. 7 b.). For the corner representation we actually store n + 1 3D points, being the corner point P and virtual points P_i = P + r_i along the corner line directions, where the normalized direction vectors r_i are determined from the angle parameters of the corner line i. The estimation uses multi-image point and line observations simultaneously.
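The n + 1 point representation follows directly from the angle parameterization. A small sketch (the elevation/azimuth convention is an assumption; the paper does not fix the axis convention in our copy):

```python
import numpy as np

def direction(phi, lam):
    """Unit direction of a corner line from its direction angle phi
    (elevation) and azimuth lam, the two angle parameters per line."""
    return np.array([np.cos(phi) * np.cos(lam),
                     np.cos(phi) * np.sin(lam),
                     np.sin(phi)])

def corner_points(X, angles):
    """Represent an order-n corner by n + 1 3D points: the corner
    point X plus one virtual point X + r_i per corner line."""
    return [X] + [X + direction(phi, lam) for phi, lam in angles]
```

An order-n corner thus carries 3 + 2n parameters (three coordinates plus two angles per line) but is stored as n + 1 points, which is convenient for the multi-image estimation.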

Optimization of reconstruction and interpretation: The optimization of the corner reconstruction uses eq. 1, which contains both the data-dependent part P(v | c_i, M), which is responsible for the optimal geometric reconstruction, and the model-dependent part


Figure 7: a. Parametrization of an unconstrained corner line. b. Parametrization of a constrained corner: the 3 + 2n parameters β of the order-n corner are reduced to the three coordinates of the point vector X and two angle parameters φ and λ.

P(c_i | M), which is responsible for the optimal semantic reconstruction.

• Optimal geometric reconstruction: For the optimization of the geometric corner reconstruction, the evaluation can be derived from the residuals Δl = l − l̂ of the optimal estimation l̂ = f(β̂) using the probability density function p(Δl) = P(v | c_i, M), in case the features exist and have been successfully matched to the model:

    p(Δl) = 1 / ((2π)^{n/2} · sqrt(det Σ_ll)) · exp(−Ω/2)   (2)

The evaluation depends on the squared sum of the residuals Ω = Δl^T Σ_ll^{-1} Δl of the estimation l̂ = f(β̂). n is the number of unknowns in β and Σ_ll is the covariance matrix of the observations l.

• Optimal semantic reconstruction: The a priori probability P(c_i | M) for the corner class c_i can be obtained empirically by learning (cf. [Eng97]) and can in principle be integrated to evaluate the model-dependent influence on the result. Currently we prefer specialized corners c_i over unconstrained corners c_0 and assume equally distributed corner classes c_i.
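The Gaussian density of eq. 2 is straightforward to evaluate numerically; in practice one would work with its logarithm, but the direct form makes the correspondence to the formula visible. Residual vector and covariance below are toy values:

```python
import numpy as np

def residual_density(dl, cov):
    """Multivariate normal density of the residual vector dl = l - l_hat
    with observation covariance cov, as in eq. (2)."""
    n = dl.size
    omega = dl @ np.linalg.solve(cov, dl)          # dl^T Sigma^-1 dl
    norm = np.sqrt((2 * np.pi) ** n * np.linalg.det(cov))
    return np.exp(-0.5 * omega) / norm
```

For zero residuals the density reduces to the normalization constant, so hypotheses are ranked purely by how far Ω grows with the misfit.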

4.4 Grouping of 3D Corners

The next step in reconstruction is to perform a 3D grouping of corners. In contrast to the subsequent 3D aggregation, the grouping connects the corners solely using the knowledge of the aggregation level feature aggregates, without using any building-specific aggregation relations. The result of the grouping process is a corner adjacency graph (CAG), where the graph nodes are classified and attributed corners and the graph arcs denote a connection between corners, possibly evaluated by its connection probability.

The grouping is performed in two steps:

• Qualitative compatibility: The first step in grouping is to check the qualitative compatibility of the corner pairs, which depends on the class-specific constraints of the corners. Therefore we postulated


the class compatibility of the corners and their components, stored in a class-connection table. Due to their class-specific constraints, a corner of a given class can be connected to corners of compatible classes but not to corners of incompatible classes. On the feature level a corner line of a given type can only be connected to a line of the same type. As the class compatibility of the corresponding corner components is partially expressed by the corner classes themselves, we first check the corner classes and then the corresponding feature classes.

• Quantitative compatibility: The class compatibility of corners is necessary but not sufficient for the grouping of corners. Therefore the second step in grouping is to check the quantitative compatibility of the class-compatible corner pairs by analysing the underlying geometric corner instances. E.g., for a connection two lines of the same type have to be collinear, and two regions of the same type have to be coplanar. The geometry can be statistically tested exploiting the statistical properties of the features given by the covariance matrix Σ_β̂β̂ of the estimated feature geometry β̂, which results from the parameter estimation during corner reconstruction (cf. [HLF99]).
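A statistical collinearity check of this kind can be sketched with a chi-square test on a discrepancy vector. This is not the authors' exact test statistic: the choice of cross products as discrepancies and the critical value are assumptions.

```python
import numpy as np

CHI2_6_95 = 12.59  # chi-square critical value, 6 d.o.f., alpha = 0.05

def collinear(p1, d1, p2, d2, cov, crit=CHI2_6_95):
    """Statistical collinearity test of two matched corner lines
    (point p_i, direction d_i): the direction cross product and the
    point-on-line cross product must vanish up to the noise level
    described by the 6x6 covariance cov of the discrepancy vector."""
    disc = np.concatenate([np.cross(d1, d2),        # parallel directions
                           np.cross(d1, p2 - p1)])  # p2 lies on line 1
    t = disc @ np.linalg.solve(cov, disc)           # disc^T cov^-1 disc
    return t < crit
```

The same pattern (discrepancy vector, covariance from the estimation, chi-square threshold) carries over to the coplanarity test for regions.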

During the 3D aggregation the corner adjacency graph can be used to facilitate and focus the indexing into parameterized building parts (cf. chap. 5.4) or building primitives (cf. [Lan99]).


Figure 8: Reconstructed corners and corner grouping. a.) Wire frame representation of the reconstructed corners. b.) Corner connections of class-compatible corners denoted in black. c.) Adjacency relations of the corners which fulfill the class compatibility as well as the compatibility of the geometry. d.) Representation of the corner adjacency graph CAG. The graph arcs denote connectivity relations between corners. The graph nodes denote corners attributed by their class membership.

4.5 Results and Accuracy

The presented procedure for corner reconstruction and interpretation for the time being aims at delivering an appropriate basis for the reconstruction of generic buildings by 3D aggregation of the corners using the proposed hierarchical building model.

The corner reconstruction is tested with stereo image data with … image overlap, using image scales between 1:… and 1:… and a pixel size on ground of …. The achieved accuracy of the reconstructed corner points is internally estimated with σ_X = σ_Y = … and σ_Z = …. The orientation accuracy of the corner main direction amounts to …, whereas the slope of the roof is determined with an accuracy of ….

For testing the complete building reconstruction procedure we used the internationally distributed data set Avenches residential with scale 1:…, which was disposed for the Ascona workshop on Automatic Extraction of Man-Made Objects from Aerial and Space Imagery (cf. [GKA95], [MBS94]). The corner reconstruction is performed using the grey level images with a pixel resolution of …. Altogether 56% of the building corners were reconstructed and identified to have an adjacency relation to at least one other corner. 82% of these are correctly reconstructed and 18% are false corners. The correct and false reconstructed corners are distributed with 72% to 10% and 4% to 14% onto the building-specific classes c_i and the unconstrained corners of class c_0. The reconstructed corners are sufficient for aggregating the 11 buildings contained in the data set (cf. [FKL+98]).

For details concerning the 3D corner reconstruction process we refer to [Lan99].

5 GENERATION OF BUILDING HYPOTHESES

This section describes the construction of building hypotheses for a given set of reconstructed corner observations. Furthermore, it is shown how 3D building hypotheses are transformed into a parameterized 2D view hierarchy, which allows a subsequent verification.

5.1 Model

In our component-based building model, parameterized building part primitives are instantiated and aggregated to more complex building parts and finally complete buildings. Instantiated primitives and aggregations of them are referred to as building aggregates. Each building aggregate is represented by a graph of corners G = (C, N), where N ⊆ C × C is an adjacency relation on the set of corners C.


Each building part has one or more plug faces, by which the building parts can be connected with each other. A plug face F = (C_F, N_F) is a subgraph of G consisting of one single loop of corners C_F ⊆ C. The building model assigns a type to each plug face. Only plug faces with compatible types may be connected together. Thus the model reduces the number of connectable building aggregates considerably. Building parts with one plug face are called terminals, such with two or more plug faces are called connectors (see fig. 9).

Terminals: Connectors:

Figure 9: Some examples of building part primitives. Plug faces, which are used to connect them, are drawn dashed.

The connection of two building aggregates to a new aggregate reduces the set of plug faces. The two plug faces taking part in the connection are removed and do not belong to the newly generated aggregate. The goal of the aggregation process is the generation of a building aggregate with an empty set of plug faces. Such aggregates are called closed, otherwise open.
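The plug-face bookkeeping can be sketched with a tiny data structure; the corner and type names are hypothetical placeholders:

```python
class Aggregate:
    """Minimal sketch of a building aggregate: a set of corners plus
    a set of typed plug faces (names and types are illustrative)."""
    def __init__(self, corners, plug_faces):
        self.corners = set(corners)
        self.plug_faces = dict(plug_faces)  # face name -> face type

    @property
    def closed(self):
        return not self.plug_faces

def connect(a1, f1, a2, f2):
    """Connecting two aggregates consumes one type-compatible plug
    face on each side; the result inherits all remaining faces."""
    if a1.plug_faces[f1] != a2.plug_faces[f2]:
        raise ValueError("incompatible plug face types")
    faces = {**a1.plug_faces, **a2.plug_faces}
    del faces[f1], faces[f2]
    return Aggregate(a1.corners | a2.corners, faces)
```

Connecting two terminals (one plug face each) of the same face type thus immediately yields a closed aggregate, while connectors keep their remaining faces open.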

The coordinates of each vertex of all corners in C are parameterized by a set of pose parameters and a set of form parameters. The four pose parameters determine the location and the rotation about the vertical axis. The reduction to one rotation parameter is imposed by our knowledge of buildings, because other rotations are seldom observed. The form parameters define features like the slope of the roof, width and height (see fig. 10). The coordinates may depend nonlinearly on the parameters. If a parameter has an assigned value it is called fixed, otherwise it is called free.

Figure 10: Parameterized description of a saddle roof terminal with vertices v1, ..., v5 and edges l1, ..., l5. The table on the right shows the coordinates of the vertices v1, ..., v5 as expressions over the form parameters.

The parameterized coordinates are stored as symbolic expressions to allow their adaption and modification during connections and the construction of equation systems that are needed for computing parameter estimations.

5.2 Operations

Four different types of operations are used for the generation and aggregation of the building aggregates:

Indexing provides the link between the corner observations and the building part primitives. For a given corner observation an indexing operation selects a building part primitive from a library that has a matching corner, instantiates that primitive and generates by this a new building aggregate. Subsequent invocations of indexing operations for the same corner observation yield different building aggregates as long as there are corners of building part primitives that have not yet been used in this process. The classification of the corner observation as described in section 4.2 is used to limit the number of potential candidates for instantiation. Model corners with a classification different from that of the given corner observation are ignored.

Merging unifies two building aggregates A_1 and A_2 which are instantiations of the same building part primitive. This operation is necessary because for n corner observations that belong to the same building part, the indexing operations generate n instantiations of the same primitive instead of one. The set of plug faces of the resulting aggregate is identical to that of A_1 and A_2.

Connection aggregates two building aggregates A_1 and A_2 to a more complex aggregate A. The set of plug faces is reduced by the two elements F_1 and F_2 whose corners are to be connected with each other. F_1 and F_2 must have the same type. With this condition the building model restricts the number of connectable building aggregates.

During the connection process each corner of F_1 is identified with one corner of F_2. In a gluing process, both building aggregates are combined into the new aggregate, such that each pair of identified corners is replaced by a new corner having all of their edges. By design of the plug faces these are exactly two edges lying on a straight line. In a second step each of these newly created corners is removed and their adjoining edges are replaced by one new edge.

The following six steps describe the processing of the parameters in detail. Pose and form parameters are independent of each other, so the pose parameters can be ignored until the last step. For an example refer to figure 11.

1. The connection is carried out in the coordinate system of A_1. Both A_1 and A_2 are translated and rotated such that the normal vectors of the plug faces are aligned with the x-axis, the origin of F_1 lies in the origin of the coordinate system and the origin of F_2 lies on the positive x-axis.


2. The set of form parameters B_1 of A_1 is divided into the disjoint subsets B_1 = B_1F ∪ B_1R. The set B_1F contains all parameters that are referenced by vertices of the plug face F_1. The remaining parameters of A_1 make up the set B_1R.

3. A new length parameter l is introduced into B_2R by translation of A_2 by the vector (l, 0, 0)^T. If B_1 already contains a parameter with that name, l has to be renamed. This length parameter describes the distance between the plug faces F_1 and F_2. In fig. 11 the origins of the two plug faces are depicted as two discs. It is not chosen as a fixed value because different observations might be contradictory about its value.

4. In order to build the set of form parameters B of the resulting aggregate A, the four sets of parameters B_1R, B_1F, B_2F and B_2R are processed by the following steps:

a) Though being of the same type, F_1 and F_2 may have different parameterizations. To cope with that, each plug face has two translation tables. One translates its parameters into expressions over the set of parameters of a generic parameterization of the plug face type; the second translates in the other direction. With these tables the parameters of one plug face are translated to expressions over the generic parameters of the plug face type and then mapped to expressions over the parameter set of the other plug face. In fig. 11 the parameters b, h_1 and h_2 are first mapped to the generic parameters of the plug face type F and then to b, h and b tan α, since the roof of A_2 is described by its slope α instead of a height like h_2 in A_1.

b) To avoid duplicate names, some of the parameters in B_2R may have to be renamed. In fig. 11, the parameter d of A_2 is renamed to d' in A. The resulting set is named B'_2R.

The set of parameters of the new building aggregate A is B = B_1R ∪ B_1F ∪ B'_2R.

5. All the renamings and translations of form parameters in step 4 have to be propagated to all parameter references of A_1 and A_2, especially the vertex coordinates. In fig. 11 the origin of the coordinate system coincides, as stated above, with the origin of the plug face of A_1, which is depicted by the left disc. Then the point Q of A_2, marked by a circle, has after the connection the coordinates (l, h, b tan α)^T.

6. The transformation back into the world coordinate system reintroduces the pose parameters. It is done by a subsequent parameter estimation of form and pose parameters, which is described in section 5.3.
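The parameter translation and renaming of step 4 can be sketched as a dictionary merge; the parameter names (and the d → d' rename) are illustrative, not the paper's data structures:

```python
def merge_parameters(params1, params2, renames):
    """Sketch of step 4: merge two aggregates' form parameter sets,
    renaming clashes in the second set so the union is free of
    duplicates. 'renames' maps old to new names (e.g. d -> d')."""
    out = dict(params1)
    for name, expr in params2.items():
        name = renames.get(name, name)
        if name in out and out[name] != expr:
            raise ValueError(f"unresolved name clash for {name!r}")
        out[name] = expr
    return out
```

In the real system the values would be symbolic expressions over the generic plug-face parameters rather than plain strings.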

Figure 11: Connection of two hip roof terminals A_1 and A_2. The plug face type F is shown between them. The diagram shows how parameters are mapped, renamed or introduced.

Prediction operations instantiate building part primitives for which no observations exist, but which are needed for the construction of a closed building hypothesis. A prediction only instantiates building part primitives if no other operation is executable. The new building part is chosen so that two existing building aggregates can be joined by it or that one open building aggregate can be closed by it. This ensures that a prediction operation is carried out only as a last resort and that some information of one or two other building aggregates can be used to guide the prediction. The connection itself is not done by this operation, but by one or two subsequent connection operations.

5.3 Parameter Estimation

Each operation except the prediction operations is followed by a parameter estimation, which is also used to verify that the newly created building aggregate matches its associated corner observations well enough. If a verification is not possible, the new building aggregate is rejected.

In the previous section we have shown how building aggregates are constructed at runtime. The drawback of this is that it is not possible to set up the equation system that represents the least-squares optimization problem beforehand. Instead it is constructed on demand every time a parameter estimation is required. The symbolic representation of the vertex coordinates makes this possible.


Each vertex of a building aggregate has an x-, y- and z-coordinate which is parameterized with respect to the aggregate's form parameters. The multiplication of these vectors with the matrix that transforms the aggregate into the world coordinate system introduces the four pose parameters. Expressions for edge directions can be derived from the edges' end points.

Therefore we have for every corner observation which is assigned to one model corner three equations for the corner point and three equations for every edge of the corner. Equations that do not contain a parameter are removed. The resulting nonlinear equation system describes a least-squares parameter estimation problem. It is solved with a standard Levenberg-Marquardt algorithm, with the only difference that it operates symbolically on the equation system. Note that the derivatives used in the algorithm are also computed symbolically.
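The symbolic construction of the equation system and its derivatives can be sketched with sympy. The paper uses Levenberg-Marquardt; for brevity this sketch runs plain Gauss-Newton on a tiny, made-up residual system (the residuals stand in for the corner-point and edge equations):

```python
import numpy as np
import sympy as sp

w, h = sp.symbols("w h")
# Toy residual system: symbolic vertex coordinates minus observed
# values (a stand-in for the equation system built at runtime).
res = sp.Matrix([w - 4.1, h - 2.9, w + h - 7.2])
params = sp.Matrix([w, h])
J = res.jacobian(params)                 # derivatives, symbolically

f = sp.lambdify((w, h), res, "numpy")    # compile for fast evaluation
Jf = sp.lambdify((w, h), J, "numpy")

x = np.array([1.0, 1.0])                 # initial parameter values
for _ in range(10):                      # Gauss-Newton iterations
    r = np.asarray(f(*x), float).ravel()
    Jx = np.asarray(Jf(*x), float)
    x = x - np.linalg.lstsq(Jx, r, rcond=None)[0]
```

Because the residuals here are linear, one iteration already reaches the least-squares optimum; a Levenberg-Marquardt variant would additionally damp the normal equations for robustness on nonlinear systems.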

5.4 Aggregation Strategy

Building aggregates are connected to more complex aggregates and finally to closed building hypotheses in an iterated process. A priority queue holds all operations not yet carried out. Initially it is filled with all indexing operations. If, after the execution of the first operation in the queue, the resulting building aggregate is not rejected, new operations are created which have the new aggregate as one argument and are inserted into the queue. Because the execution of the operations and the parameter estimations are computationally expensive, and there exists an exponential number of combinations of the building parts with respect to the number of corner observations, we employ a heuristic to define the order of the operations inside the queue. The goal is to use as few operations as possible to generate a building hypothesis that explains the reconstructed corners.

Our construction principles for this ordering are:

S1 Indexing is performed before any aggregations or mergings take place. This provides the starting point for the algorithm.

S2 Aggregation is done depth first. This constructs building aggregates with as many covered corner observations as early as possible.

S3 The growth of competing building aggregates shall be as uniform as possible. This avoids having one aggregate growing while others are neglected.

S4 Predictions take place only if necessary. Because a prediction operation cannot use a corner observation directly, it has less information than the other operations and therefore its results are more uncertain.

These construction principles result in the following ordering of the operations. Let o_1 and o_2 be two operations.

1. If o_1 and o_2 have different operation types, indexing operations come first, then merging, connection and finally prediction operations (S1 and S4).

2. Otherwise o_1 and o_2 have the same operation type.

Indexing operations: If the operations belong to different corner observations,

a) then the corner with fewer building part interpretations is preferred (S2),

b) else the operation with the a priori better expected result is performed first (S2).

Merging or connection operations: Let the building aggregates A_i1 and A_i2 be the arguments of operation o_i.

a) The operation o_i with the higher number of corner observations covered by A_i1 and A_i2 is performed first (S2).

b) The operation o_i with the smaller difference in corner observations covered by A_i1 and A_i2 is performed first (S3).

c) Finally, the operations o_i are sorted according to the sum of the evaluations of A_i1 and A_i2, smaller values first (S2).

Prediction operations: Prefer the operation with the larger number of corner observations covered by its arguments.
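The ordering rules above translate naturally into a lexicographic priority key for the operation queue. A sketch with hypothetical operation records (the field names and the per-rule tie-breaking are assumptions):

```python
import heapq

TYPE_RANK = {"index": 0, "merge": 1, "connect": 2, "predict": 3}

def priority(op):
    """Ordering key sketch: operation type first (S1/S4), then more
    covered corner observations (S2), then more uniform growth of the
    two argument aggregates (S3). 'op' is a dict with assumed fields."""
    c1, c2 = op.get("covered", (0, 0))
    return (TYPE_RANK[op["type"]], -(c1 + c2), abs(c1 - c2))

ops = [
    {"type": "predict", "covered": (5, 5)},
    {"type": "connect", "covered": (2, 2)},
    {"type": "connect", "covered": (3, 1)},
    {"type": "index",   "covered": (1, 0)},
]
heap = [(priority(o), i, o) for i, o in enumerate(ops)]
heapq.heapify(heap)
order = [heapq.heappop(heap)[2]["type"] for _ in range(len(heap))]
```

Note how the uniform connect (2, 2) outranks (3, 1) despite equal coverage, implementing S3, while the prediction always drains last, implementing S4.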

5.5 Example

For an example of the aggregation process refer to figure 12. Six corners have been reconstructed from the images (a). The indexing operations assign several instances of building part primitives to each corner. Those with the best scores are shown in the figure (b). Two merging operations combine the three terminals in the front into one aggregate (c). The three terminals and the one connector in the center are aggregated in three connection operations to a closed building hypothesis (d). Note that there are no corner observations at ground level, and therefore the height parameter is undetermined. For visualization a default value has been chosen.

5.6 Parameterized View Hierarchies

In order to verify the generated building hypotheses, they are transformed into parameterized view hierarchies. For each aerial image one parameterized view hierarchy is generated. This hierarchical image model is a modified aspect hierarchy [DPR92]. It consists of four levels:

Level 1: This level contains all of the three-dimensional building hypotheses generated by the algorithm described in section 5.4. Note that a building hypothesis may still have undetermined parameters, hence the name parameterized view hierarchy.



Figure 12: Aggregation: a) The reconstructed corners; b) Best fitting building aggregates after all indexing steps; c) The three building aggregates in the front are merged; d) Connection of these building aggregates results in a closed building hypothesis.

Level 2: For each building hypothesis all topologically different views are stored on this level. In contrast to an aspect hierarchy, we assume a fixed eye position defined by the known camera parameters of the aerial images. Ideally this would lead to one view per building hypothesis and image. But uncertainty of the camera parameters and free parameters of the building hypotheses make it necessary to sample this parameter space, which may result in more than one view.

Level 3: Each view is decomposed into its regions. A region is either a (part of a) face projection or a (part of a) shadow region.

Level 4: The lowest level consists of image features, namely points, lines and regions, and their interrelationships, e.g. line parallelism.

The image model currently employed for hypothesis verification utilizes a pinhole sensor model with weak perspective projection of the visible object contours. Knowledge about the date and time when the aerial images were taken, the geographical position of the building hypotheses, their geometrical appearance and the physical material of the different parts of a building allow the inclusion of lighting information. Currently we are investigating view representations which employ a standard lighting model including ambient and directed light and diffuse reflection. The analysis of illumination and shadows allows the derivation of

3D object parameters and the testing of the consistency of shadow-ground transitions and inter-surface intensity ratios [Ste97].

6 VERIFICATION OF BUILDING HYPOTHESES

The previous processing step can produce different possible 3D building hypotheses, depending on the ambiguity and the quality of the 3D corner observations. This is illustrated in figure 13. Furthermore, the

Figure 13: Generated building hypotheses for three reconstructed roof corners. The corner on the bottom right does not fit in completely with the T-shaped building on the right. Thus, two more hypotheses were generated neglecting the corner.

3D hypotheses are projected into the different possible 2D building views. Since free parameters may remain in the hypotheses (e.g. the building width and height in figure 13), the verification has to accomplish two tasks: 1) From the different hypotheses select the one which has the most support in the image, or reject all hypotheses if not enough evidence for any of them can be found in the image. 2) Determine the actual geometric extent of the 2D building views in each image and estimate the unknown 3D building parameters.

For the identification of the appropriate building view the following circumstances have to be considered:

1.) Due to occlusions and noise it is likely that not every part of a building view can be observed in the image. For this reason, the verification is done on the lowest level of our building model hierarchy. Building views are decomposed into point, line, and region features.

2.) Because of possibly free parameters of the building views, the absolute coordinates and geometric properties of image features (like e.g. line lengths or the area of regions) are not fixed in general. Moreover, due to noise, shadows and low contrast the extracted image features often are fragmented into smaller parts.


Therefore, no simple feature-based matching comparing the attributes of building view and image features can be done. Geometrical and topological relations, however, are stable with respect to free building parameters. These relations are also more robust in case of disturbances (for example, fragmented lines remain parallel). Therefore, the identification of an instance of a building view in the image is done using relational matching (cf. [HS93, Vos92]).

3.) The selection of the best fitting hypothesis or the rejection of all hypotheses should be done in a consistent and conclusive way. Therefore, the employed evaluation function for measuring the similarity between a building view and the image data is based on probability theory, which also provides an objective interpretation of the reconstruction results.

4.) The evaluation function has to take into account that building hypotheses both of the same and of different complexity have to be compared (see figure 13). This is achieved by incorporating the complexity of the building views into the evaluation function using the Minimum Description Length principle (MDL, cf. [Ris87]). The application of the MDL principle requires optimally coded hypotheses. As an approximation, a compact encoding scheme for the relational representation of arbitrary building views had to be developed.
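The MDL trade-off between model complexity and explained data can be sketched as a two-part code. The bit costs below are hypothetical stand-ins for the paper's encoding scheme:

```python
def description_length(n_matched, n_total, model_bits,
                       bits_per_leftover=16.0):
    """Two-part MDL sketch: bits to encode the hypothesis itself plus
    bits to encode the image features it fails to explain. All bit
    costs here are assumed values, not the paper's coding scheme."""
    unexplained = n_total - n_matched
    return model_bits + unexplained * bits_per_leftover

# A more complex hypothesis only wins if its extra model bits are
# paid for by the additional image features it explains.
simple = description_length(n_matched=20, n_total=30, model_bits=64)
complex_ = description_length(n_matched=28, n_total=30, model_bits=120)
```

Here the more complex view explains 8 additional features and therefore achieves the shorter total description, so MDL prefers it; had it explained only one extra feature, the simpler view would win.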

The verification process is done in two steps: 1.) For every building view find the best matching to the extracted image features (maximum likelihood estimation). Reject every hypothesis for which too little evidence was found in the image data. 2.) From all remaining hypotheses determine the most probable one (maximum a posteriori classification) and finally estimate the building parameters.

In the following, we first show which types of relations are used for the relational representation of building views. Then we describe the evaluation function for measuring the similarity of relational structures and explain how the needed probabilities are acquired. Finally, it is shown how the evaluation function is extended to realize the MDL criterion.

6.1 Relational Representation of Building Views

The relational representation for 2D building views is defined by a set of relations. The selection of the appropriate relations is of utmost importance because of the following two considerations:

1.) The search for an instance of a building view is done by finding the image structures having the maximum relational similarity with the building view. For this task, the set of relations representing a building view has to be specific enough to discriminate the image features that correspond to a building view from image features of the background.

2.) Different building views must have different relational representations to allow unambiguous classification. This becomes particularly important for resembling building views.

Furthermore, each type of relation has to satisfy the following condition: As explained above, the geometry of 2D building views may remain unfixed. The relational representation of a building view, however, should abstract from this variability and must remain constant for every possible geometrical extent of this building view. Therefore, the employed relations must be stable within all possible geometric instantiations of a given building view.

We have found that the following set of relations meets these requirements (cf. [Kol99]):

Feature Adjacency: The feature adjacency relation states that two image features have to be neighboured. Due to the applied image model (cf. section 4.1) only features of different type are assumed to be neighboured, namely (point, line), (point, region), and (line, region).

Line Parallelism: states that two line segments have to be parallel. It is realized by computing the angle difference between both lines and testing whether it is smaller than a given threshold value.

Collinearity: This relation defines that a line and a point feature will be considered collinear, iff the distance between the point and the straight line going through the line segment is below a given threshold.

Lines on the same side: The Same Side Lines (SSL) relation constrains two line segments to lie in the same halfplane that is defined by the straight line going through a third line segment. Because this is a rather unusual relation, it is illustrated in figure 14.

Figure 14: Definition of the (ternary) Same Side Lines relation. Left: Three line segments L1, L2 and Lref satisfy the Same Side Lines relation, iff L1 and L2 lie in the same half plane that is spanned by the line running through Lref. Right: All Same Side Lines relations of the top view of a saddle roof (L1, ..., L7 denote the lines).
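As an illustration, the SSL relation can be tested with two signed cross products. The sketch below is our own, not the paper's implementation; segments are assumed to be given as pairs of (x, y) endpoints.

```python
def side(p, a, b):
    """Sign of the 2D cross product: on which side of the directed line
    a -> b does point p lie (positive, negative, or zero)?"""
    return (b[0] - a[0]) * (p[1] - a[1]) - (b[1] - a[1]) * (p[0] - a[0])

def same_side_lines(l1, l2, ref):
    """True iff both endpoints of segments l1 and l2 lie strictly in the
    same half plane spanned by the line running through segment ref."""
    a, b = ref
    signs = [side(p, a, b) for p in (*l1, *l2)]
    return all(s > 0 for s in signs) or all(s < 0 for s in signs)
```

For the saddle roof top view of figure 14, this test would hold for every eave/ridge pair lying on the same side of the reference line.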

Distinctness of Features: This relation states that two different building view features have to be assigned different extracted image features. This relation is generated for all pairs of features of a building view.


In order to search for an instance of a building view in the image, each view is transformed to its relational description. Since building hypotheses are generated at runtime, the relational representation for each building view is derived dynamically, too. For a more detailed explanation refer to [Kol98, Kol99].

6.2 Relational Matching of a Building View with the Extracted Image Features

Finding a relational matching between a building view and the extracted image features amounts to the search for an assignment of appropriate image features to the features of the building view, such that the image features satisfy the same relations that are given between the building view's features.

This task defines a combinatorial problem and is computationally hard to solve (see [HS79]). Thus, efficient search techniques have to be applied to find the best matching. We employ constraint solving techniques, but with a special extension: standard constraint solving methods [Mes89] demand that every object part is observable and extracted by the image segmentation, and that every constraint can be satisfied. They fail if only one of these conditions does not hold. Due to occlusions, noise and segmentation errors this happens rather often. Therefore, the standard constraint techniques were extended to allow the explicit handling of unobserved features and constraint violations. These operational aspects are explained in detail in [Kol98, KPC99]. In the following, we will focus on the description of the evaluation function that is used to measure the similarity of an inexact matching between a building view hypothesis and the extracted image features.
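The idea of an inexact matcher that tolerates unobserved features and constraint violations can be sketched as follows. This brute-force search is a stand-in for the paper's optimized constraint solver; the feature names and the support/contradiction weights (the example bit values from tables 1 and 2) are illustrative assumptions.

```python
from itertools import product

def best_matching(model_feats, image_feats, relations, holds,
                  w_sat=2.45, w_viol=-1.41, w_unobs=0.0):
    """Exhaustively score every assignment of image features (or None for
    'unobserved') to model features and return the best one."""
    domains = [list(image_feats) + [None] for _ in model_feats]
    best, best_score = None, float("-inf")
    for combo in product(*domains):
        assigned = [c for c in combo if c is not None]
        if len(assigned) != len(set(assigned)):  # distinctness of features
            continue
        mu = dict(zip(model_feats, combo))
        score = 0.0
        for rel in relations:
            img = tuple(mu[f] for f in rel)
            if None in img:
                score += w_unobs   # undefined: neither supports nor contradicts
            elif holds(img):
                score += w_sat     # relation also holds between the image features
            else:
                score += w_viol    # tolerated constraint violation
        if score > best_score:
            best, best_score = mu, score
    return best, best_score

# toy run: two model lines that must be parallel, three extracted image lines
parallel = {("i1", "i2"), ("i2", "i1")}
mu, score = best_matching(["m1", "m2"], ["i1", "i2", "i3"],
                          [("m1", "m2")], lambda t: t in parallel)
```

The exponential enumeration is of course exactly what the constraint propagation techniques of [Kol98, KPC99] avoid; the sketch only shows the scoring semantics.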

In the literature several evaluation functions for measuring the similarity resp. difference between two relational descriptions have been proposed (see [SH85, BK88, Vos92]). Following the line of Boyer and Kak [BK88] and Vosselman [Vos92] we employ an evaluation function that is based on information theory. Relational matching can be regarded as a communication problem, where the first relational description S1 is transmitted over a discrete, noisy communication channel and is received as a somewhat distorted description S2. The similarity between the transmitted structure S1 and the received structure S2 is measured by the mutual information I(S1; S2) of both structures. The mutual information is a measure for the true information content that is transmitted over the noisy channel (cf. [Ing71]).

Since a matching μ assigns image features (on the transmitter side) to features of the building view (on the receiver side), it defines a channel. Vosselman showed that finding the assignment μ (the best channel) with the maximum mutual information is equal to maximizing P(μ | S1, S2), which means that μ is the most likely matching for the given relational structures.

Building views are represented by sets of relations. Therefore, the mutual information between the building view S_B and the image features S_I is defined by the sum of the mutual information of every single relation:

I(S_B; S_I) = Σ_{r_i ∈ R_B} I(r_i; μ(r_i))    (3)

μ(r_i) denotes the relation r_i(μ(f_1), ..., μ(f_n)) between the image features that are assigned by the matching μ; f_1, ..., f_n denote the features of the building view that are related by r_i. In the following μ(r_i) will be referred to as r'_i, denoting that this is a relation between extracted image features. Using this notation, the mutual information for a relation on features of the building view and the image is defined as

I(r_i; r'_i) = log_2 ( P(r'_i | r_i) / P(r'_i) )  [Bit].    (4)

The conditional probability P(r'_i | r_i) models the characteristics of the observation process. For instance, the probability P(r'_i = true | r_i = true) expresses the chances that if relation r_i is true in the building view it can also be observed between the corresponding features in the image. Actually, we estimate these probabilities from manually trained matchings. The a priori probability P(r'_i) for observing image relation r'_i can be derived from the a priori probability P(r_i) by applying Jeffrey's rule of total probability, getting

I(r_i; r'_i) = log_2 [ P(r'_i | r_i) / ( P(r'_i | r_i = true) P(r_i = true) + P(r'_i | r_i = false) P(r_i = false) ) ]  [Bit].    (5)

Since every relation can only be true or false, the a priori probability P(r_i) has the two values P(r_i = true) and P(r_i = false). Hence, the conditional probability P(r'_i | r_i) is given by a 2×2 table. Tables 1 and 2 show an example for the computation of the mutual information.

r_i     P(r_i)
true    0.17
false   0.83

P(r'_i | r_i)    r_i = true    r_i = false
r'_i = true      0.67          0.01
r'_i = false     0.33          0.99

Table 1: Example for a priori probabilities for building view relation r_i (left) and conditional probabilities of image relation r'_i wrt. building view relation r_i (right).

I(r_i; r'_i)     r_i = true    r_i = false
r'_i = true      2.45          -3.61
r'_i = false     -1.41         0.17

Table 2: Resulting mutual information calculated from the probabilities shown in table 1 using equation 5.

From this example one can see that if the relation is true in the building view and also holds for the corresponding image features, the matching will be supported by 2.45 bit. Otherwise, if the relation cannot be observed on the assigned image features, the relation contradicts the matching by 1.41 bit. In case that one incident feature cannot be observed in the image the relation is rated 0, because then it is undefined and can neither support nor contradict a matching.
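The numbers of tables 1 and 2 can be reproduced in a few lines. This is a sketch using the probability values of table 1; the dictionary layout is our own.

```python
from math import log2

# Table 1: a priori probabilities P(r_i) and conditional probabilities
# P(r'_i | r_i), keyed as (r_img, r_model).
p_model = {True: 0.17, False: 0.83}
p_cond = {(True, True): 0.67, (True, False): 0.01,
          (False, True): 0.33, (False, False): 0.99}

def mutual_information(r_img, r_model):
    """I(r_i; r'_i) = log2(P(r'_i | r_i) / P(r'_i)), where P(r'_i) comes
    from Jeffrey's rule of total probability (equations 4 and 5)."""
    p_img = sum(p_cond[(r_img, r)] * p_model[r] for r in (True, False))
    return log2(p_cond[(r_img, r_model)] / p_img)

print(round(mutual_information(True, True), 2))    # -> 2.45  (supports a matching)
print(round(mutual_information(False, True), 2))   # -> -1.41 (contradicts it)
```

The remaining two entries of table 2 (-3.61 and 0.17 bit) follow from the other two argument combinations.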

In the following two sections it is described how the a priori probabilities P(r_i) and the conditional probabilities P(r'_i | r_i) are derived. Because roof parts and walls differ in their likely visibility, relations are subdivided into roof and wall relations having their own probability distributions. A roof feature is either a roof region or a line resp. a point that is incident to a roof region. The information which regions are roof regions is derived from the building model. Every relation that is incident exclusively to roof features belongs to the set of roof relations, otherwise to the set of wall relations. For the sake of simplicity this distinction is omitted in the following.

Computation of the a priori probabilities. P(r_i) expresses the probability that a relation between its incident features is true for a given building view. It is computed for each relation type from the ratio of the number |R^t_i| of existing relations r_i ∈ R^t_i and the number |R_i| of possible relations of the building view:

P(r_i = true) = |R^t_i| / |R_i|,    P(r_i = false) = 1 − P(r_i = true).    (6)

The number of possible relations is bounded by the cardinality of the relation type. For a given set of model features F the cardinality of an n-ary relation R_i over F is in the general case |R_i| = |F|^n, because every feature can be in relation to every other (including itself). The number of possible relations reduces if the relation is symmetric or reflexive. For example, if one line l_1 is parallel to another line l_2, parallel(l_1, l_2) represents the same relation as parallel(l_2, l_1). Table 3 shows the properties and cardinalities of the different employed relations.

Relation r_i        Arity   Properties                Cardinality |R_i|
FAG(Line,Point)     2       irrefl., asymm.           |L| · |P|
FAG(Region,Line)    2       irrefl., asymm.           |R| · |L|
FAG(Region,Point)   2       irrefl., asymm.           |R| · |P|
Line parallelism    2       refl., sym., transitive   |L|(|L|−1)/2
Collinearity        2       irrefl., asymm.           |L| · |P|
Same Side Lines     3       irrefl., arg. 2 & 3 sym.  |L| · (|L|−1)(|L|−2)/2
≠(Points)           2       irrefl., symm.            |P|(|P|−1)/2
≠(Lines)            2       irrefl., symm.            |L|(|L|−1)/2
≠(Regions)          2       irrefl., symm.            |R|(|R|−1)/2

Table 3: Properties and cardinalities of the employed relations. |P|, |L| and |R| denote the number of points, lines, and regions of the building view. FAG means the feature adjacency relation and ≠ the distinctness relation.
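Table 3 and equation 6 can be expressed as a small helper. The cardinality formulas below are our reading of table 3 and should be treated as assumptions where the scanned table is ambiguous.

```python
from math import comb

# p, l, r: numbers of points, lines and regions of a building view
def cardinalities(p, l, r):
    return {
        "FAG(Line,Point)":   l * p,
        "FAG(Region,Line)":  r * l,
        "FAG(Region,Point)": r * p,
        "Line parallelism":  comb(l, 2),           # symmetric line pairs
        "Collinearity":      l * p,
        "Same Side Lines":   l * comb(l - 1, 2),   # arguments 2 and 3 symmetric
        "distinct(Points)":  comb(p, 2),
        "distinct(Lines)":   comb(l, 2),
        "distinct(Regions)": comb(r, 2),
    }

def p_true(n_existing, n_possible):
    """A priori probability of a relation type being true, eq. (6)."""
    return n_existing / n_possible
```

For example, a view with 12 lines has comb(12, 2) = 66 possible parallelism relations; if 11 of them hold, P(parallel = true) = 11/66 ≈ 0.17.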

Training of the conditional probabilities. The conditional probability P(r'_i | r_i) expresses the chances that a model relation r_i can be observed between corresponding features in the image. It is a measure of the stability resp. robustness of each type of relation wrt. noise and disturbing effects of the image segmentation.

The probabilities are estimated from training matches by building the ratios of the number |R'^t_i| of relations that hold in the image and the number |R^t_i| of given relations in the corresponding building views, where |R'^t_i| ≤ |R^t_i|. For instance, P(r'_i = true | r_i = true) is estimated by

P(r'_i = true | r_i = true) = |R'^t_i| / |R^t_i|.    (7)

Since relations are only distinguished into true or false, P(r'_i | r_i) is given as a 2×2 matrix for every relation type (cf. table 1).

6.3 Finding the Most Probable Model

In the previous section it was shown how the most likely matching between one building view and the extracted image features is determined. The maximization of the mutual information always results in a matching, independent of the true resemblance of the hypothesis and the image data. Therefore, a statistical test is defined on the expected amount of information needed for a reliable matching, ruling out all definitely wrong hypotheses (for details see [Kol99, Vos92]).

To conclude the verification step, it has to be determined which of the remaining building views is the most probable one. This is problematic, because in general the building views do not have the same complexity and thus may be represented by a different number of relations. The maximum achievable mutual information between the building views and the image data increases with the complexity of a hypothesis. Therefore, the matching scores (maximum mutual information) of different hypotheses are not directly comparable.

To overcome this difficulty we employ the Minimum Description Length Principle (MDL), which was introduced by Rissanen [Ris87]. The MDL criterion balances the goodness of fit between hypothesis and data on the one hand against the complexity of the hypothesis on the other hand. In [Kol99] we show that the matching scores between building views and image data can be normalized if from each score the complexity of the corresponding building view is subtracted. The complexity of a building view is measured by the size of the bit string that is needed to fully encode the view. If the encoding scheme is optimal in the sense that it achieves the minimum length, then due to the MDL principle the hypothesis with the highest score after subtraction of its coding length is the most probable explanation of the data.


The mutual information between building view and image data is defined by the mutual information of the relations. Thus, a compact coding scheme for the relational representation of building views is needed as an approximation for the optimal encoding.

Coding of relational building views. Relational structures can be represented by (hyper) graphs. Instead of using standard data structures for the encoding of graphs like adjacency matrices and adjacency lists (cf. [AHU87]), we apply a more efficient encoding scheme that is inspired by the work of Cook and Holder from the field of machine learning about automatic substructure discovery [CH94]. Permutation series are used to encode the corresponding adjacency matrix of a given graph. Each row of the adjacency matrix can be seen as a bit string of fixed length n. When all rows are appended, we get one string of length n² consisting of k 1's and n² − k 0's. Since there exist exactly (n² choose k) different possibilities to place the k 1's on the bit string, and these combinations may be enumerated by an algorithm, we simply store the number of 1's and the number of the permutation to represent the graph. Thus, the coding length is given by

L_code(G) = log_2 (n²) + log_2 (n² choose k)  [Bit].    (8)

Graphs can be encoded even more compactly if the underlying relations possess structural properties like symmetry, reflexivity, and transitivity (cf. table 3). In case of a symmetric relation the adjacency matrix of the corresponding graph is mirrored at the main diagonal. Therefore, it is sufficient to encode only the entries on one side of the diagonal. For transitive relations, adjacent nodes in the corresponding graph form cliques that can be thinned out to minimal spanning trees, reducing the number of 1's. To restore the original structure, one only has to compute the transitive closure of the encoded graph.
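Equation 8 and the symmetry saving can be checked numerically. The node and edge counts below are illustrative, not taken from the paper.

```python
from math import comb, log2

# Eq. (8): the n x n adjacency matrix is a bit string with k 1's; we store
# k plus the index of the chosen k-combination.
def coding_length(n_positions, k):
    return log2(n_positions) + log2(comb(n_positions, k))

# For a symmetric, irreflexive relation (e.g. line parallelism) every pair
# would appear twice in the full matrix, so the upper triangle suffices.
n, pairs = 12, 11
full = coding_length(n * n, 2 * pairs)            # both (a,b) and (b,a) stored
upper = coding_length(n * (n - 1) // 2, pairs)    # one side of the diagonal
```

For these counts the upper-triangle encoding is strictly shorter, which is exactly the saving exploited above.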

For the encoding of the relational representation of building views not only the structural properties of the different relations have to be considered, but also the interdependencies of the relations. For instance, the feature adjacency relation between regions and points can be derived from the feature adjacency relations between regions and lines in conjunction with the feature adjacency relation between lines and points. Thus, the former must not be included in the calculation of the overall coding length.

Furthermore, only structures that lead to codes with variable length have to be considered. Model information with a constant coding length, such as the number of points, lines, and regions of a building view, is the same for all different building views and can be neglected in the classification. Table 4 shows the different types of coding lengths that are taken into account for the computation of the total code length of building views. A more detailed explanation of the coding scheme is given in [Kol99].

                         View 1    View 2    View 3
No. of points |P|        9         14        19
No. of lines |L|         12        21        27
No. of regions |R|       4         8         9

No. of relations
|FAG(region,line)|       17        34        42
|FAG(line,point)|        24        42        54
|Line parallelism|       11        30        53
|Same Side Lines|        17        34        38

Coding lengths
L_FAG(region,line)       46.00     123.00    160.14
L_FAG(line,point)        76.08     159.91    229.39
L_parallel               35.58     70.99     107.50
L_SSL                    4.09      5.09      22.16
L(roof regions)          2.00      3.00      3.17
Total coding length L    163.75    361.99    522.36

Table 4: Coding lengths for three building views of different complexity seen from the same perspective. All coding lengths are given in bit. As expected, the coding length increases with the complexity of the building view.

Example. Figure 15 shows an example for the relational matching and classification applied to a portion of an aerial image showing an L-shaped hip roof house. Three hypotheses were generated (simple, L-shaped, and T-shaped hip roof house). The mutual information and the assigned image features of the best matchings for all three building views are shown. As expected, the mutual information increases as the building views become more complex. In the bottom row one can see that the subtraction of the coding length correctly normalises the matching scores. The minimal distance between the highest total score and the others is about 40 bits, indicating a clear decision for the (correct) L-shaped hip roof building hypothesis.

6.4 Final Parameter Estimation

A building hypothesis is successfully verified when not all generated hypotheses were rejected. In this case the most probable building view is selected for each image. The determined matchings assign extracted image features to the features of the building views, fixing the possibly unknown geometric extents of the 2D views. In order to estimate the parameters of the underlying 3D building hypothesis, the observations (assigned extracted image features) for each image are inserted into the building view specific projection equations, resulting in one nonlinear equation system. Finally, the 3D building parameters are estimated simultaneously using the same procedure as described in section 5.3.


Figure 15: Identification of the most probable model for the given image. Each column shows a building hypothesis, the extracted image features (highlighted wrt. the best matching μ), the mutual information of the best matching, the coding length of the building view, and the final rating:

I(Hypot.; Image)    85.2      224.24    237.56
L                   107.31    206.09    325.19
I − L               −22.11    18.15     −87.63

As can be seen from the bottom row, even from very noisy data the L-shaped hypothesis is correctly selected.
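The bottom row of figure 15 amounts to a one-line computation: subtract each building view's coding length from its matching score and take the maximum.

```python
# Numbers from figure 15; the hypothesis labels are taken from the text.
hypotheses = ["simple", "L-shaped", "T-shaped"]
mutual_info = [85.2, 224.24, 237.56]    # I(Hypot.; Image) in bit
code_length = [107.31, 206.09, 325.19]  # coding length L in bit

scores = [i - l for i, l in zip(mutual_info, code_length)]
best = hypotheses[scores.index(max(scores))]
print(best, [round(s, 2) for s in scores])
# -> L-shaped [-22.11, 18.15, -87.63]
```

Only the L-shaped view survives the normalisation with a positive rating, matching the decision reported above.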

7 CONCLUSIONS

In this paper we have shown how complex buildings are automatically reconstructed from aerial images using a component-based building model. The reconstructed buildings are represented by parameterized components which, besides the determined geometric shape and location, also contain the information about the building type and the corresponding parameters.

The hierarchical structure of the building model reflects the four levels of semantic abstraction. It enables the stepwise interpretation of the initially extracted image features. The coherent modeling in 2D and 3D allows the verification of hypotheses at any aggregation level.

Building specific knowledge is used in all stages of the analysis to reduce the search space for hypothesis generation, to rate the matchings between data and model, and to increase the accuracy of the geometric reconstruction. The most important benefits wrt. the three main steps of the reconstruction process are:

- The transition from image space to object space is restricted to building corner types. Only corners that actually appear as a part of an existing building part primitive will be reconstructed.

- The construction of aggregates is restricted to sensible building structures and sizes by the building specific aggregation relations resp. parameter constraints.

- Building specific constraints of generated building hypotheses are propagated down to the lowest level of our hierarchy and are used to perform the relational matching of building view features with the image features during the final verification. The knowledge about the interdependence of different relation types is taken into account to accomplish a compact encoding of relational representations of building hypotheses, thus making the application of the MDL principle for classification feasible.

The presented concept has been implemented and shows promising results in suburban areas. It was successfully applied to the international data set which was distributed by ETH Zürich for the Ascona Workshop 1995 on Automatic Extraction of Man-Made Objects from Aerial and Space Imagery (cf. [GKA95, MBS94]). All buildings in the images could be reconstructed. In contrast to other approaches, the type and the specific parameters like height, roof height, width and length even of the complex buildings were determined. The detailed results are given in [FKL+98].

ACKNOWLEDGEMENTS

This work was done within the project "Semantic Modeling and Extraction of Spatial Objects from Images and Maps", especially in the subproject "Building Extraction", funded by the German Research Council (DFG). Thanks go to Wolfgang Förstner, Lutz Plümer, Volker Steinhage, and Armin B. Cremers, who helped with many discussions and critical comments to develop these ideas. We thank the DFG for supporting our work.

REFERENCES

A. Aho, J. E. Hopcroft, and J. D. Ullman. Data Structures and Algorithms. Addison-Wesley, London, 1987.

F. Bignone, O. Henricsson, P. Fua, and M. Stricker. Automatic Extraction of Generic House Roofs from High Resolution Aerial Images. In Computer Vision — Proceedings of ECCV '96, Lecture Notes in Computer Science No. 1064, pages 85–96. Springer-Verlag, Berlin, 1996.

K. L. Boyer and A. C. Kak. Structural Stereopsis for 3D-Vision. IEEE Transactions on Pattern Analysis and Machine Intelligence, 10(2):144–166, March 1988.

C. Braun, T. H. Kolbe, F. Lang, W. Schickler, V. Steinhage, A. B. Cremers, W. Förstner, and L. Plümer. Models for Photogrammetric Building Reconstruction. Computer & Graphics, 19(1):109–118, 1995.


D. J. Cook and L. B. Holder. Substructure Discovery Using Minimum Description Length and Background Knowledge. Journal of Artificial Intelligence Research, 1:231–255, 1994.

S. J. Dickinson, A. P. Pentland, and A. Rosenfeld. 3-D Shape Recovery Using Distributed Aspect Matching. IEEE Transactions on Pattern Analysis and Machine Intelligence, 14(2):174–198, 1992.

R. Englert and E. Gülch. A One-Eye Stereo System for the Acquisition of Complex 3D-Building Structures. GEO-INFORMATION-SYSTEMS, Journal for Spatial Information and Decision Making, 9(4):16–21, August 1996.

R. Englert. Systematic Acquisition of Generic 3D Building Model Knowledge. In W. Förstner and L. Plümer, editors, Semantic Modeling for the Acquisition of Topographic Information from Images and Maps, pages 181–195. Birkhäuser Verlag, Basel, Switzerland, 1997.

P. Fua and A. J. Hanson. Resegmentation using generic shape locating general cultural objects. Pattern Recognition Letters, 5:243–252, 1987.

A. Fischer, T. H. Kolbe, and F. Lang. Integration of 2D and 3D Reasoning for Building Reconstruction Using a Generic Hierarchical Model. In W. Förstner and L. Plümer, editors, Semantic Modeling for the Acquisition of Topographic Information from Images and Maps, pages 159–180. Birkhäuser Verlag, Basel, Switzerland, 1997.

A. Fischer, T. H. Kolbe, F. Lang, A. B. Cremers, W. Förstner, L. Plümer, and V. Steinhage. Extracting Buildings from Aerial Images using Hierarchical Aggregation in 2D and 3D. Computer Vision & Image Understanding, 72(2), 1998.

C. Fuchs, F. Lang, and W. Förstner. On the Noise and Scale Behaviour of Relational Descriptions. In Ebner, Heipke, and Eder, editors, ISPRS, volume 30, pages 256–267. SPIE, 1994.

W. Förstner and L. Plümer, editors. Semantic Modeling for the Acquisition of Topographic Information from Images and Maps. Birkhäuser Verlag, Basel, Switzerland, 1997.

C. Fuchs. Extraktion polymorpher Bildstrukturen und ihre topologische und geometrische Gruppierung. PhD thesis, Deutsche Geodätische Kommission, München, Vol. C 502, 1998.

A. Grün, E. P. Baltsavias, and O. Henricsson, editors. Automatic Extraction of Man-Made Objects from Aerial and Space Images (II). Birkhäuser, Basel, 1997.

A. Grün, O. Kübler, and P. Agouris, editors. Automatic Extraction of Man-Made Objects from Aerial and Space Images. Birkhäuser, Basel, 1995.

E. Gülch. A Knowledge-based Approach to Reconstruct Buildings in Digital Aerial Imagery. In L. W. Fritz and J. R. Lucas, editors, International Archives of Photogrammetry and Remote Sensing, volume 29, B2, pages 410–417, 1992.

O. Henricsson and E. P. Baltsavias. 3-D Building Reconstruction with ARUBA: A Qualitative and Quantitative Evaluation. In A. Grün, E. P. Baltsavias, and O. Henricsson, editors, Automatic Extraction of Man-Made Objects from Aerial and Space Images (II), pages 139–148. Birkhäuser, Basel, 1997.

S. Heuel, F. Lang, and W. Förstner. Topological and geometrical reasoning in 3d grouping for reconstructing polyhedral surfaces. Internal report, Institute for Photogrammetry, Bonn University, 1999.

R. M. Haralick and L. G. Shapiro. The Consistent Labeling Problem, Part I. IEEE Transactions on Pattern Analysis and Machine Intelligence, 1:173–184, 1979.

R. M. Haralick and L. G. Shapiro. Computer and Robot Vision, volume II. Addison-Wesley Publishing Company, New York, 1993.

M. Hendrickx, J. Vandekerckhove, D. Frère, T. Moons, and L. Van Gool. On the 3D Reconstruction of House Roofs from Aerial Images of Urban Areas. In International Archives of Photogrammetry and Remote Sensing, volume 32, Part 3-4W2, pages 87–96. ISPRS, 1997.

F. M. Ingels. Information and Coding Theory. Intext Educational Publishers, San Francisco, Toronto, London, 1971.

C. Jaynes, A. Hanson, and E. Riseman. Model-Based Surface Recovery of Buildings in Optical and Range Images. In W. Förstner and L. Plümer, editors, Semantic Modeling for the Acquisition of Topographic Information from Images and Maps, pages 211–227. Birkhäuser Verlag, Basel, Switzerland, 1997.

T. H. Kolbe. Constraints for Object Recognition in Aerial Images — Handling of Unobserved Features. In M. Maher and J.-F. Puget, editors, Principles and Practice of Constraint Programming – Proceedings of CP'98, Lecture Notes in Computer Science No. 1520. Springer-Verlag, Berlin, 1998.

T. H. Kolbe. Identifikation und Rekonstruktion von Gebäuden in Luftbildern mittels unscharfer Constraints. PhD thesis, Institut für Umweltwissenschaften, Hochschule Vechta, 1999.

T. H. Kolbe, L. Plümer, and A. B. Cremers. Identification of Buildings in Aerial Images Using Constraint Relaxation and Variable Elimination. IEEE Intelligent Systems, to appear, 1999.

F. Lang. Geometrische und semantische Rekonstruktion von Gebäuden durch Ableitung von 3D-Gebäudeecken. PhD thesis, Institut für Photogrammetrie, Universität Bonn, 1999.

C. Lin, A. Huertas, and R. Nevatia. Detection of Buildings from Monocular Images. In A. Grün, O. Kübler, and P. Agouris, editors, Automatic Extraction of Man-Made Objects from Aerial and Space Images, pages 125–134. Birkhäuser, Basel, 1995.

F. Lang, T. Löcherbach, and W. Schickler. A one-eye stereo system for semi-automatic 3d-building extraction. Geomatics Info Magazine, 1995.

S. Mason, E. Baltsavias, and D. Stallman. High precision photogrammetric data set for building reconstruction and terrain modelling. Internal report, Institute of Geodesy and Photogrammetry, ETH Zürich, 1994.

P. Meseguer. Constraint Satisfaction Problems: An Overview. AICOM, 2(1):3–17, 1989.

R. Mohan and R. Nevatia. Perceptual Grouping for the Detection and Description of Structures in Aerial Images. In Proceedings of the Image Understanding Workshop 1988, 1988.

R. Mohan. Perceptual Organization for Computer Vision. PhD thesis, Institute for Robotics and Intelligent Systems, University of Southern California, 1989.

J. C. McGlone and J. A. Shufelt. Projective and object space geometry for monocular building extraction. In Proceedings Computer Vision and Pattern Recognition, pages 54–61, 1994.

R. Nevatia, C. Lin, and A. Huertas. A System for Building Detection from Aerial Images. In A. Grün, E. P. Baltsavias, and O. Henricsson, editors, Automatic Extraction of Man-Made Objects from Aerial and Space Images (II), pages 77–86. Birkhäuser, Basel, 1997.

R. Nevatia and K. E. Price. Locating Structures in Aerial Images. IEEE Transactions on Pattern Analysis and Machine Intelligence, 4(5):476–484, 1982.

J. Rissanen. Minimum-Description-Length Principle. Encyclopedia of Statistical Sciences, 5:523–527, 1987.

K. Schutte. Recognition of buildings from aerial images. In J. A. C. Bernsen, J. J. Gerbrands, A. A. Hoeve, A. W. M. Smeulders, M. A. Viergever, and A. M. Vossepoel, editors, 3rd Quinquennial Review 1991-1996 Dutch Society for Pattern Recognition and Image Processing, pages 211–225. NVPHBV, Delft, 1996.

L. G. Shapiro and R. M. Haralick. A Metric for Comparing Relational Descriptions. IEEE Transactions on Pattern Analysis and Machine Intelligence, 7(1):90–94, 1985.

L. Spreeuwers, K. Schutte, and Z. Houkes. A model driven approach to extract buildings from multi-view aerial imagery. In A. Grün, E. P. Baltsavias, and O. Henricsson, editors, Automatic Extraction of Man-Made Objects from Aerial and Space Images (II), pages 109–118. Birkhäuser Verlag, Basel, 1997.

V. Steinhage. On the Integration of Object Modeling and Image Modeling in Automated Building Extraction from Aerial Images. In A. Grün, E. P. Baltsavias, and O. Henricsson, editors, Automatic Extraction of Man-Made Objects from Aerial and Space Images (II), pages 139–148. Birkhäuser, Basel, 1997.

V. Steinhage. Zur automatischen Gebäuderekonstruktion aus Luftbildern. Habilitationsschrift, Institut für Informatik, Universität Bonn, 1998.

C. J. Taylor, P. E. Debevec, and J. Malik. Reconstructing polyhedral models of architectural scenes from photographs. In B. Buxton and R. Cipolla, editors, Computer Vision — ECCV '96, Vol. II, Proc. of the 4th Europ. Conf. on Computer Vision, pages 659–668. Lecture Notes in Computer Science, 1065, Springer-Verlag, 1996.

G. Vosselman. Relational Matching. Lecture Notes in Computer Science No. 628. Springer-Verlag, Berlin, 1992.

U. Weidner. Digital Surface Models for Building Extraction. In Grün et al. [GBH97], pages 193–202.

U. Weidner and W. Förstner. Towards Automatic Building Extraction from High Resolution Digital Elevation Models. ISPRS Journal of Photogrammetry and Remote Sensing, 50(4):38–49, 1995.