13
ARTICLE IN PRESS JID: YCVIU [m5G;July 7, 2016;21:54] Computer Vision and Image Understanding 000 (2016) 1–13 Contents lists available at ScienceDirect Computer Vision and Image Understanding journal homepage: www.elsevier.com/locate/cviu Efficient architectural structural element decomposition Nikolay Kobyshev a,, Hayko Riemenschneider a , András Bódis-Szomorú a , Luc Van Gool a,b a Computer Vision Laboratory, ETH Zurich, Sternwartstrasse 7, CH-8092 Zurich, Switzerland b Vision for Industry Communications and Services (VISICS), KU Leuven, Kasteelpark Arenberg 10, B-3001 Heverlee, Belgium a r t i c l e i n f o Article history: Received 30 November 2015 Revised 10 May 2016 Accepted 16 June 2016 Available online xxx Keywords: 3D city model Architecture Structure Element Landmark Decomposition Optimization a b s t r a c t Decomposing 3D building models into architectural elements is an essential step in understanding their 3D structure. Although we focus on landmark buildings, our approach generalizes to arbitrary 3D ob- jects. We formulate the decomposition as a multi-label optimization that identifies individual elements of a landmark. This allows our system to cope with noisy, incomplete, outlier-contaminated 3D point clouds. We detect four types of structural cues, namely dominant mirror symmetries, rotational sym- metries, shape primitives, and polylines capturing free-form shapes of the landmark not explained by symmetry. Our novel method combine these cues enables modeling the variability present in complex 3D models, and robustly decomposing them into architectural structural elements. Our proposed archi- tectural decomposition facilitates significant 3D model compression and shape-specific modeling. © 2016 Elsevier Inc. All rights reserved. 1. Introduction Modeling our environment is a common strive in photogram- metry, computer vision and graphics. 3D modeling from imagery has been going through a great evolution over the past decades, maturing methods like incremental Structure-from-Motion (SfM) Locher et al. (2016), Tingdahl and Van Gool (2012), Wu (2013), Internet-scale point cloud reconstruction from imagery Agarwal et al. (2009), high-accuracy detailed surface reconstructions via dense Multi-View Stereo (MVS) Furukawa and Ponce (2010), Hiep et al. (2012), and ach ieved success in procedural modeling of fa- cades Müller et al. (2007), Riemenschneider et al. (2012). LiDAR is an alternative dominant technology to obtain point clouds of urban scenes Toshev et al. (2010). In this work, we tackle the abstraction and understanding of 3D point clouds delivered by such state-of-the-art technologies. Pla- nar priors Bódis-Szomorú et al. (2014, 2015), Sinha et al. (2009), Werner and Zisserman (2002), or a Manhattan-world assumption Vanegas et al. (2010) proved to be enough for many man-made structures. However, for a large mass of buildings, especially land- mark architecture or general objects, a simple shape primitive ab- straction will not suffice. Instead, we propose to decompose a 3D reconstruction by exploiting symmetries within the model. Such a decomposition is a first step towards understanding and compactly modeling the architectural elements of a landmark. Corresponding author. E-mail address: [email protected] (N. Kobyshev). Our method is based on weak architectural priors that natu- rally hold for a majority of buildings, namely mirror symmetries, rotational symmetries and wall verticality. The method starts with a semi-dense 3D point cloud that may be contaminated by noise and gross outliers, and may be highly inhomogeneous. Structure- from-Motion (SfM) point clouds often suffer from such contamina- tion. We show how to robustly detect structural cues, more pre- cisely, axis directions of dominant mirror symmetries, the pivot of rotational symmetries, shape primitives, and free-form parts that are not explained by the symmetries. These cues provide a strong guidance for extracting dominant and semantically mean- ingful components of a model, such as a wall, a tower, an arch, etc., as illustrated in Fig. 1. We refer to these components as architec- tural structural elements (ASE) throughout this work. We formulate the decomposition problem as an energy-driven, multi-label point cloud segmentation. Our contributions: A model that combines symmetries and free-form polylines for decomposing a point cloud into ASEs, Methods for detecting structural cues (dominant mirror sym- metries and bodies of rotation, as well as residual free-form parts) in point clouds, A global energy formulation and optimization approach for par- titioning a point cloud into meaningful structural components based on structural cues. Our proposed abstraction paves the road to 3D model compres- sion Wu and Agarwal (2012) or to shape-specific models Bao et al. (2013), Dame et al. (2013). http://dx.doi.org/10.1016/j.cviu.2016.06.004 1077-3142/© 2016 Elsevier Inc. All rights reserved. Please cite this article as: N. Kobyshev et al., Efficient architectural structural element decomposition, Computer Vision and Image Un- derstanding (2016), http://dx.doi.org/10.1016/j.cviu.2016.06.004

ARTICLE IN PRESSkonijn/publications/2016/...For Haussmannian archi- tecture, strong priors or regular floors may be sufficient to model the buildings Teboul et al. (2010). For more

  • Upload
    others

  • View
    0

  • Download
    0

Embed Size (px)

Citation preview

  • ARTICLE IN PRESS JID: YCVIU [m5G; July 7, 2016;21:54 ]

    Computer Vision and Image Understanding 0 0 0 (2016) 1–13

    Contents lists available at ScienceDirect

    Computer Vision and Image Understanding

    journal homepage: www.elsevier.com/locate/cviu

    Efficient architectural structural element decomposition

    Nikolay Kobyshev a , ∗, Hayko Riemenschneider a , András Bódis-Szomorú a , Luc Van Gool a , b

    a Computer Vision Laboratory, ETH Zurich, Sternwartstrasse 7, CH-8092 Zurich, Switzerland b Vision for Industry Communications and Services (VISICS), KU Leuven, Kasteelpark Arenberg 10, B-3001 Heverlee, Belgium

    a r t i c l e i n f o

    Article history:

    Received 30 November 2015

    Revised 10 May 2016

    Accepted 16 June 2016

    Available online xxx

    Keywords:

    3D city model

    Architecture

    Structure

    Element

    Landmark

    Decomposition

    Optimization

    a b s t r a c t

    Decomposing 3D building models into architectural elements is an essential step in understanding their

    3D structure. Although we focus on landmark buildings, our approach generalizes to arbitrary 3D ob-

    jects. We formulate the decomposition as a multi-label optimization that identifies individual elements

    of a landmark. This allows our system to cope with noisy, incomplete, outlier-contaminated 3D point

    clouds. We detect four types of structural cues, namely dominant mirror symmetries, rotational sym-

    metries, shape primitives, and polylines capturing free-form shapes of the landmark not explained by

    symmetry. Our novel method combine these cues enables modeling the variability present in complex

    3D models, and robustly decomposing them into architectural structural elements. Our proposed archi-

    tectural decomposition facilitates significant 3D model compression and shape-specific modeling.

    © 2016 Elsevier Inc. All rights reserved.

    1

    m

    h

    m

    L

    I

    e

    d

    e

    c

    a

    s

    p

    n

    W

    V

    s

    m

    s

    r

    d

    m

    r

    r

    a

    a

    f

    t

    c

    o

    t

    s

    i

    a

    t

    t

    c

    h

    1

    . Introduction

    Modeling our environment is a common strive in photogram-

    etry, computer vision and graphics. 3D modeling from imagery

    as been going through a great evolution over the past decades,

    aturing methods like incremental Structure-from-Motion (SfM)

    ocher et al. (2016) , Tingdahl and Van Gool (2012) , Wu (2013) ,

    nternet-scale point cloud reconstruction from imagery Agarwal

    t al. (2009) , high-accuracy detailed surface reconstructions via

    ense Multi-View Stereo (MVS) Furukawa and Ponce (2010) , Hiep

    t al. (2012) , and ach ieved success in procedural modeling of fa-

    ades Müller et al. (2007) , Riemenschneider et al. (2012) . LiDAR is

    n alternative dominant technology to obtain point clouds of urban

    cenes Toshev et al. (2010) .

    In this work, we tackle the abstraction and understanding of 3D

    oint clouds delivered by such state-of-the-art technologies. Pla-

    ar priors Bódis-Szomorú et al. (2014, 2015) , Sinha et al. (2009) ,

    erner and Zisserman (2002) , or a Manhattan-world assumption

    anegas et al. (2010) proved to be enough for many man-made

    tructures. However, for a large mass of buildings, especially land-

    ark architecture or general objects, a simple shape primitive ab-

    traction will not suffice. Instead, we propose to decompose a 3D

    econstruction by exploiting symmetries within the model. Such a

    ecomposition is a first step towards understanding and compactly

    odeling the architectural elements of a landmark.

    ∗ Corresponding author. E-mail address: [email protected] (N. Kobyshev).

    s

    (

    ttp://dx.doi.org/10.1016/j.cviu.2016.06.004

    077-3142/© 2016 Elsevier Inc. All rights reserved.

    Please cite this article as: N. Kobyshev et al., Efficient architectural str

    derstanding (2016), http://dx.doi.org/10.1016/j.cviu.2016.06.004

    Our method is based on weak architectural priors that natu-

    ally hold for a majority of buildings, namely mirror symmetries,

    otational symmetries and wall verticality. The method starts with

    semi-dense 3D point cloud that may be contaminated by noise

    nd gross outliers, and may be highly inhomogeneous. Structure-

    rom-Motion (SfM) point clouds often suffer from such contamina-

    ion. We show how to robustly detect structural cues , more pre-

    isely, axis directions of dominant mirror symmetries, the pivot

    f rotational symmetries, shape primitives, and free-form parts

    hat are not explained by the symmetries. These cues provide a

    trong guidance for extracting dominant and semantically mean-

    ngful components of a model, such as a wall, a tower, an arch, etc.,

    s illustrated in Fig. 1 . We refer to these components as architec-

    ural structural elements (ASE) throughout this work. We formulate

    he decomposition problem as an energy-driven, multi-label point

    loud segmentation. Our contributions:

    • A model that combines symmetries and free-form polylines for

    decomposing a point cloud into ASEs, • Methods for detecting structural cues (dominant mirror sym-

    metries and bodies of rotation, as well as residual free-form

    parts) in point clouds, • A global energy formulation and optimization approach for par-

    titioning a point cloud into meaningful structural components

    based on structural cues.

    Our proposed abstraction paves the road to 3D model compres-

    ion Wu and Agarwal (2012) or to shape-specific models Bao et al.

    2013) , Dame et al. (2013) .

    uctural element decomposition, Computer Vision and Image Un-

    http://dx.doi.org/10.1016/j.cviu.2016.06.004http://www.ScienceDirect.comhttp://www.elsevier.com/locate/cviumailto:[email protected]://dx.doi.org/10.1016/j.cviu.2016.06.004http://dx.doi.org/10.1016/j.cviu.2016.06.004

  • 2 N. Kobyshev et al. / Computer Vision and Image Understanding 0 0 0 (2016) 1–13

    ARTICLE IN PRESS JID: YCVIU [m5G; July 7, 2016;21:54 ]

    Fig. 1. Our method segments a point cloud of a landmark building into coherent

    architectural structural elements, such as walls, towers based on structural cues.

    The noisy points that do not belong to any component get a random assignment.

    o

    t

    s

    p

    t

    i

    s

    s

    s

    c

    s

    F

    v

    s

    d

    o

    s

    t

    o

    p

    r

    p

    (

    2

    t

    (

    M

    t

    (

    t

    P

    p

    t

    r

    t

    w

    c

    c

    fi

    e

    t

    l

    s

    w

    s

    m

    a

    b

    m

    t

    v

    b

    2. Related work

    Reconstructing surfaces from point clouds is a dominant prob-

    lem in computer vision as well computer graphics. The range

    of research varies from volumetric segmentation to the detection

    of symmetries and repetitions, enforcing shape priors and shape

    primitives. Depending on the architecture or manufacturing Berger

    et al. (2014) there may be other constraints. For example, for sim-

    ple Manhattan-style skyscrapers the modeling can be as simple as

    a rectangular box Vanegas et al. (2010) . For Haussmannian archi-

    tecture, strong priors or regular floors may be sufficient to model

    the buildings Teboul et al. (2010) . For more general architecture,

    more relaxed structural principles such as symmetries have to be

    used Martinovic et al. (2015, 2012) , Riemenschneider et al. (2012) .

    Further even, in the case of real cities with regular planar buildings

    and complex shapes like statues, a hybrid model can be applied

    Lafarge et al. (2013) .

    For landmark architecture there are few rules that hold across

    multiple landmarks, hence a more per-exemplar approach is

    needed. This is the direction we propose in this work, where we

    tackle the decomposition and understanding of architectural struc-

    tures for landmarks.

    2.1. Primitive detection

    In the line of shape priors for arbitrary surface reconstruction,

    there are two general cases. Either the raw data is replaced by a

    fitted shape primitive (hard prior), or in the other case an attrac-

    tion force to the fitted shape primitive is used (soft prior). Both

    hard and soft priors are used in various forms (e.g. primitive fit-

    ting, shape grammars, etc.) to produce robust and clean results.

    Methods for hard priors use robust fitting of models like planes,

    cylinders Fischler and Bolles (1981) . Schnabel et al. (2007) de-

    tect simple primitives such as planes, spheres, cylinders, etc., and

    further extend shape primitives across the remaining surfaces for

    completion Schnabel et al. (2009) .

    For the soft priors the prior is only included within the opti-

    mization which smooths out the final surface by guiding the shape

    as suggested by the prior. Haene et al. Haene et al. (2012) have

    shown this for piecewise planar priors and the works of Bao et al.

    (2013) , Dame et al. (2013) showed this for arbitrary shape priors

    learned from 3D training data.

    Monszpart and Brostow (2015) have recently shown that in-

    doors and outdoors man-made scenes can be efficiently rep-

    resented as regular arrangements of planes. Moreover, just

    piecewise-planar representation can produce very high quality

    models of indoor environments, as shown in Boulch et al. (2014) .

    Plane fitting can be useful to enhance the representation of urban

    data, as shown in Chauve et al. (2010) . Oesau et al. (2015) showed

    that it is possible to interleave plane detection and regularization

    of the scene with the detections. Lafarge et al. (2009) detected

    multiple types of shape primitives in a recognition-style method,

    as they first cluster the input data into planar, concave, convex

    and non-developable surface types. This in turn defines the type

    Please cite this article as: N. Kobyshev et al., Efficient architectural str

    derstanding (2016), http://dx.doi.org/10.1016/j.cviu.2016.06.004

    f primitive to detect and removes much of the complexity of de-

    ecting all feasible shape primitives.

    Verdie and Lafarge (2014) proposed an efficient Monte Carlo

    ampler for detecting parametric objects in large scenes exploiting

    arallel processing and reversible jump between different primitive

    ypes.

    In another work by Lafarge et al. (2010) it is demonstrated that

    ntegration of primitive models can be useful for multi-view recon-

    truction.

    Lafarge et al. (2013) also propose a hybrid solution between

    hape primitives and arbitrary mesh topology. The authors initially

    how how to estimate the fitting of multiple shape primitives effi-

    iently. The hybrid solution then allows for compact models while

    till preserving the details for arbitrary structures.

    Primitive detection can be also done in 2D, on image sequences.

    or example, Pham et al. (2014) fit orthogonal models to multi-

    iew images.

    Overall, these methods provide a better understanding through

    hape priors and primitives. Yet all methods have implicitly two

    rawbacks. First, they still require models for each specific types

    f primitive. Second, the related works cannot handle non-standard

    hape primitive which are large, complex shapes such as architec-

    ural elements (e.g. an entire tower) in landmark buildings.

    To the best of our knowledge, Wu and Agarwal (2012) is the

    nly related work for abstraction of buildings as it tries to decom-

    ose buildings into 2D sweeping profiles. However, in our expe-

    ience, it iteratively finds the profiles and has problems decom-

    osing into these architectural elements. Hence, Wu and Agarwal

    2012) would benefit from a more holistic decomposition.

    .2. Symmetry detection

    Symmetries have been explored in computer vision for a long

    ime and review reports are available Liu et al. (2009) ; Mitra et al.

    2013) . However, among the first to apply it for 3D buildings were

    itra et al. (2006, 2007) , Pauly et al. (2008) .

    Mitra et al. (2006) introduce a voting for symmetries for reflec-

    ion, rotation and translation. In their follow-up work, Mitra et al.

    2007) , they use the voting space to create a symmetrization effect

    o enhance symmetries while maintaining the shape of the model.

    auly et al. (2008) discover structural regularity by detecting re-

    eated structures in 3D objects, which have been generated by

    he means of computer graphics. They simultaneously evaluate the

    epetition pattern and detect the repeating geometric elements.

    Cohen et al. Cohen et al. (2012) take it one step further and let

    he symmetries influence the Structure-from-Motion optimization,

    hereas Kser et al. (2011) exploit mirror symmetries in dense re-

    onstructions from a single view.

    Zheng et al. (2014) rearrange parts of objects within large yet

    lean shape collections. In their recent work, Liu et al. (2015) de-

    ne replaceable substructures and use the shape graph for remod-

    ling.

    Contrary to these approaches, we start from point clouds and

    ackle the decomposition and understanding of real-world 3D

    andmark reconstructions. In summary, (1) we are the first to

    how a working method on noisy data, as other research has only

    orked with symmetries on clean 3D models, (2) others take pre-

    eparated components for structural analysis and we aim to auto-

    ate this separation, (3) our strengths over ill-suited point-wise

    nalysis or recognition is that we can extract high-level groupings

    ased on symmetries and coherent global similarities rather than

    aking local decisions, and (4) we propose a new notion that ex-

    ends from fixed primitives sets to arbitrary shapes, hence we pro-

    ide a more general and scalable approach.

    In particular, our point clouds are built from image-

    ased reconstructions using Structure-from-Motion (SfM) tools

    uctural element decomposition, Computer Vision and Image Un-

    http://dx.doi.org/10.1016/j.cviu.2016.06.004

  • N. Kobyshev et al. / Computer Vision and Image Understanding 0 0 0 (2016) 1–13 3

    ARTICLE IN PRESS JID: YCVIU [m5G; July 7, 2016;21:54 ]

    Fig. 2. An overview of our method: (a) semi-dense point cloud as input, (b) identification of mirror symmetries and centers of bodies of revolution (axes in red), (c)

    extraction of free-form polylines, (d) our segmentation result. (Best viewed in color.)

    T

    n

    t

    s

    m

    f

    o

    3

    c

    t

    o

    i

    t

    t

    r

    i

    n

    N

    b

    s

    g

    p

    a

    r

    t

    v

    i

    l

    fi

    r

    p

    w

    p

    w

    e

    b

    3

    v

    o

    o

    3

    c

    a

    d

    t

    g

    c

    m

    t

    t

    s

    a

    v

    s

    m

    a

    p

    p

    3

    w

    p

    i

    o

    | w

    a

    t

    W

    p

    J

    K

    3

    H

    p

    p

    s

    φ d

    g

    s

    m

    m

    n

    r

    s

    l

    c

    H

    ingdahl and Van Gool (2012) , Wu (2013) and contains many

    oisy points and inaccuracies over clean CAD-style models. Fur-

    her, we propose a method for segmenting complex landmark

    tructures into parts using a range of structural cues such as

    irror symmetries, bodies of rotation, plane primitives, and free-

    orm polylines. More cues can easily be included in our unified

    ptimization scheme.

    . ASE decomposition

    The overall goal of our method is to decompose a 3D point

    loud representing a landmark into semantically meaningful archi-

    ectural structural elements (ASE) of the building. Fig. 2 gives an

    verview of our approach.

    The input to our algorithm is a 3D point cloud of a build-

    ng. In a preprocessing step, we align our input point cloud with

    he gravity vector (we extract it from a raw point cloud using

    he method suggested in Wu and Agarwal (2012) ) and scale it to

    eal-world (metric) scale, which can be easily automatized know-

    ng the (coarse) GPS positions of the cameras. Next, we extract

    ormals by Principal Component Analysis (PCA) over the 3D k -

    N neighborhood of each point. We denote an oriented 3D point

    y p i = (X i , N i ) . Exploiting the natural vertical prior for walls, forome of the cues we project the points and their normals to the

    round. We denote the coordinates and normals of the projected

    oints as p i = (x i , n i ) . The corresponding point height is denoteds h i .

    We only use the 2D projections for detecting the mirror and

    otational symmetries and extracting free-form shapes. By doing

    hat, we assume that the rotational and mirror symmetries have

    ertical lines and planes of rotation. This is true for the vast major-

    ty of buildings (with an exception of, for example, falling towers).

    Next, we perform detection of structural cues. First, we ana-

    yze the point cloud to detect its dominant mirror symmetries (to

    nd opposing walls) and rotational symmetries (to detect bodies of

    evolution, e.g. towers) ( Fig. 2 b). Second, we extract a rough floor

    lan of the point cloud to capture free-form structures that are not

    ell explained by the symmetries ( Fig. 2 c). Additionally, we extract

    lanes and use them as low-level structural cues as well. Finally,

    e formulate a global energy minimization to robustly assign ev-

    ry 3D point to the structural elements that are generated either

    y a symmetry or a free-form shape ( Fig. 2 d).

    It is important to note that the symmetry detection in Section

    .1 and the free-form polyline extraction in Section 3.2 only pro-

    ide symmetries and polylines that enter as hypotheses into a final

    ptimization, discussion of which is detailed in Section 3.4 . This

    ptimization can suppress unlikely structural cues.

    .1. Symmetry analysis

    Symmetries are prominent properties of many landmarks. It is

    ommon for buildings to have self-reflection (e.g. opposite walls

    Please cite this article as: N. Kobyshev et al., Efficient architectural str

    derstanding (2016), http://dx.doi.org/10.1016/j.cviu.2016.06.004

    re often symmetric) or rotational symmetry (e.g. for towers or

    omes).

    In this work, we demonstrate the detection of mirror and ro-

    ational symmetries. We extract them in the aforementioned 2D

    round projection. Our symmetry extraction scheme is general and

    an be extended to other types of symmetries and to 3D sym-

    etries. The main goal is the optimization over various struc-

    ural cues, however additionally we demonstrate our methods in

    wo variants for collecting symmetry evidence, namely Hough-

    pace voting Ballard (1981) , Hough (1959) and RANSAC Fischler

    nd Bolles (1981) . Inspired by Mitra et al. (2006) , we generate

    otes for each pair of points for a symmetry in Hough-space, and

    how the extension of the approach with RANSAC that is less de-

    anding to computational resources and memory. The RANSAC

    pproach is greatly inspired by the Hough-space voting, and we

    resent them both for clearer comparison and analysis of each ap-

    roach.

    .1.1. Point matching

    For Hough voting, in order to prevent filling in the voting space

    ith votes for unlikely 2D symmetries, only a selected subset of all

    ossible point pairs is allowed to vote for symmetries. For simplic-

    ty, our criterion for a point pair { p i , p j } (a matching pair from now

    n) to generate a vote is

    h i − h j | < t h , (1)here h i is the height of p i over the ground as introduced earlier,

    nd t h is a height difference threshold defined as

    h = 0 . 1 · ( max i h i − min i h i ) . (2)e note that this simple criterion could be replaced by a more so-

    histicated matching of local 3D shape descriptors, e.g. spin images

    ohnson and Hebert (1999) , FPFH Rusu et al. (2009) or 3D SURF

    nopp et al. (2010) .

    .1.2. Detecting mirror symmetries

    ough voting approach. For mirror symmetries, every matching

    air ( p i , p j ) votes for a hypothesized symmetry line, which is the

    erpendicular bisector of the 2D segment connecting p i and p j , as

    hown in Fig. 3 a. This symmetry line is parametrized by a pair ( D ij ,

    ij ) (i.e. a point) in the Hough-space H mir (D, φ) , where D ij is theistance of the line from the origin and φij is the characteristic an-le of the line shown in the figure. Fig. 4 a visualizes these Hough-

    pace votes for the example of Fig. 2 a. In this parametrization, we

    ove the origin to be below and to the left from the lowest left-

    ost point of the cloud. This is done to reduce the voting space:

    o symmetry line will have the characteristic angle φij in (π ; 3 2 π) .Next, we extract dominant peaks in Hough-space, which cor-

    espond to likely axes of mirror symmetry. Here, we restrict our-

    elves to the two dominant perpendicular symmetries, which al-

    ows for a robust and parameter-free peak detection. More pre-

    isely, we seek the global maximum of

    mir (D 1 , φ) + H mir (D 2 , { φ + π/ 2 } mod 2 π) (3)

    uctural element decomposition, Computer Vision and Image Un-

    http://dx.doi.org/10.1016/j.cviu.2016.06.004

  • 4 N. Kobyshev et al. / Computer Vision and Image Understanding 0 0 0 (2016) 1–13

    ARTICLE IN PRESS JID: YCVIU [m5G; July 7, 2016;21:54 ]

    Fig. 3. Our voting schemes for mirror (a) and rotational symmetries (b). (a) the

    points p i and p j vote for the direction of the red perpendicular bisector of the seg-

    ment connecting the points, parametrized by its distance D ij to the origin and its

    depicted angle φ ij . (b) The points p i and p j vote for a pivot point that lies at the

    intersection of the green lines passing through the points parallel to the normals.

    (Best viewed in color.)

    Fig. 4. Hough-spaces for voting for (a) symmetry lines of mirror symmetries, (b)

    pivot points of rotational symmetries for the point cloud shown in Fig. 2 . Red dots

    show the extracted peaks (see the text for details). (Best viewed in color.)

    Fig. 5. Mirror symmetry detection with RANSAC. Left: two randomly selected

    points (red inside) form a hypothesis (red line). Points from one side (with a blue

    outline) are mirrored, the result is shown on the right. Inliers (i.e. the points that

    have a point from another symmetry side nearby) are colored green. In the first

    guess (top row) the guess is not lucky and there are not many inliers. In the sec-

    ond guess (bottom row) every point is an inlier. (Best viewed in color.)

    Fig. 6. The RANSAC solution stability test of parameter t m [inlier distance in me-

    ters] on the Fraumuenster dataset. The columns show different solutions of the

    same mirror symmetry problem by RANSAC. For different rows, different values of

    t m are chosen. It can be seen that the smaller t m goes, the more stable the solution

    becomes.

    t

    i

    (

    s

    t

    b

    u

    p

    t

    i

    a

    p

    w

    as a function of ( D 1 , D 2 , φ) to obtain the two peaks, i.e., two sym-metry axes (D ∗1 , φ

    ∗) and (D ∗2 , { φ∗ + π/ 2 } mod 2 π) . This is solved exhaustively, where for each discrete value of φ,

    the maximal H mir (D 1 ) and H mir (D 2 ) are found (1D searches) andsummed. However, this comes at virtually no cost as a coarse dis-

    cretization (in the order of 360 × 200) of our 2D Hough-spaceproved to work well with our datasets.

    RANSAC approach. To form a mirror symmetry hypothesis, only

    two points are needed. If we have randomly selected a pair of

    points ( p i , p j ), they form a hypothesis for a symmetry line that is

    a perpendicular bisector of the 2D line between x i and x j . Then, a

    part of the point cloud lying on one side of the symmetry line (the

    side can be chosen arbitrarily, we use the part with less points to

    save computational resources) is flipped over the symmetry line.

    We perform mirror reflection of the point cloud, i.e., we add the

    vector from the point to its closest point on the symmetry line

    to its coordinates twice. As a result of this operation, we have

    two groups of points lying on the same side of the symmetry

    line. We count a point as an inlier if it has a point from another

    group within the proximity threshold t m = 0 . 05 for a metric-scaledpoint cloud. An example of the described procedure is shown in

    Fig. 5 . We repeat the random hypothesis generation iterations un-

    til RANSAC confidence is higher than 99%. We define confidence ξas

    ξ = 1 − (1 − w 2 ) i , (4)where i is the number of iterations and w is the ratio of number

    of inliers to the overall number of points. In the equation, w 2 is

    a best guess of the probability that both randomly selected points

    are inliers, and the value of ξ therefore indicates the probabilitythat at least one hypothesis out of i ones is made with two inliers.

    To define a good value for the parameter t m , we run a solution

    stability test shown in Fig. 6 . In the picture one can see results of

    the RANSAC procedure run multiple times on the same dataset. It

    can be seen that the more small the threshold of point proximity

    t m goes, the more similar are the solutions over multiple iterations.

    Please cite this article as: N. Kobyshev et al., Efficient architectural str

    derstanding (2016), http://dx.doi.org/10.1016/j.cviu.2016.06.004

    As before, we aim to find the dominant perpendicular symme-

    ries of the dataset. Therefore, we count the points as inliers only

    f they have a normal that is perpendicular to the symmetry line

    the absolute value of the dot product between the normal and the

    ymmetry line direction is lower than a threshold t d = 0 . 1 ) and runhe procedure twice, restricting the result of the second run from

    eing too close to the first detected symmetry. As a result, we end

    p with the two symmetry lines that correspond to the dominant

    erpendicular symmetries of the point cloud. It should be noted

    hat orthogonality of the detected symmetries is not enforced (as

    n the case of Hough transform). However, the search terminates

    fter two symmetries are found.

    Although, as will be shown further, our optimization is inde-

    endent on the orthogonality of the detected mirror symmetries,

    e restrict ourselves to the two dominant directions. There are

    uctural element decomposition, Computer Vision and Image Un-

    http://dx.doi.org/10.1016/j.cviu.2016.06.004

  • N. Kobyshev et al. / Computer Vision and Image Understanding 0 0 0 (2016) 1–13 5

    ARTICLE IN PRESS JID: YCVIU [m5G; July 7, 2016;21:54 ]

    Fig. 7. Mirror symmetry detection performance on different cases. Top: initial

    datasets (a house with orthogonal walls and a hexagonal building), middle: Hough-

    space voting, bottom: RANSAC detections (from a top view). See text for the de-

    tailed discussion.

    t

    r

    D

    l

    t

    i

    H

    t

    l

    m

    h

    v

    d

    l

    m

    t

    s

    u

    i

    M

    r

    e

    t

    o

    (

    b

    o

    c

    3

    H

    m

    p

    T

    e

    m

    r

    r

    t

    p

    r

    Fig. 8. Splitting the circle hypothesis into angular bins. The red lines define a circle

    hypothesis. The circle is split into eight segments, and points in each are counted

    to form a density distribution illustrated on the histogram to the right.

    c

    t

    i

    s

    R

    2

    s

    A

    c

    fi

    t

    e

    c

    l

    c

    c

    t

    i

    a

    i

    s

    i

    r

    i

    c

    i

    b

    a

    t

    c

    H

    t

    d

    b

    i

    d

    o

    e

    w

    e

    wo reasons for choice of such a model (in the discussion we are

    eferring to Fig. 7 ):

    etection stability. Detecting maxima in Hough-space is a chal-

    enging task. Usually, non-maximum suppression (NMS) is used for

    hat. In Fig. 7 , the house model in the left column has two dom-

    nant symmetries (that are highlighted with green circles on the

    ough-space plot). However, there are two diagonal flip symme-

    ries that also contribute a high score to the Hough-space (high-

    ighted with red). If our detections are based only on NMS, the

    ethod becomes extremely sensitive to the NMS parameters that

    ave to be tuned for every model. For Hough-spaces, another ad-

    antage the restrictive search brings is the solution to the space

    iscontinuity problem. In case of general Hough voting symmetry

    ines that pass close to origin can have very different angles. This

    ight be a problem for classical Hough space maxima detection

    echniques, such as NMS (as points that are far away in the Hough

    pace might be very close in the original space). However, when

    sing the restrictive search the problem disappears as the search

    s performed row-wise.

    odel simplicity. In our experiment, RANSAC was always able to

    eturn two dominant mirror symmetries after two iterations. How-

    ver, if the number of mirror symmetries is unknown, we need

    o determine the number of iterations after which to stop. In case

    f the house (left side of the picture), four iterations is too much

    as it only has two mirror symmetries), whereas for a hexagonal

    uilding six iterations is needed to detect all symmetries. The task

    f analyzing multiple detections and determining the number of

    orrect ones we leave for future work.

    .1.3. Detecting rotational symmetries

    ough voting approach. In the case of rotational symmetries, each

    atching point pair ( p i , p j ) votes for a hypothesized rotational

    ivot point ( x ij , y ij ) in a 2D Hough-space H rot (x, y ) of pivot points.he hypothesized pivot resides at the intersection of the two lines,

    ach passing through the corresponding point parallel to its nor-

    al direction, as shown in Fig. 3 b.

    Since for rotational symmetries we do not have such a natu-

    al simplifying constraint for peak detection as in the case of mir-

    or symmetries, the standard scheme is employed for peak extrac-

    ion. We use non-maximum suppression with window size w and

    erform repeated peak extraction until a confidence threshold c is

    eached as H rot (x, y ) > c. We note however, that parameters w and

    Please cite this article as: N. Kobyshev et al., Efficient architectural str

    derstanding (2016), http://dx.doi.org/10.1016/j.cviu.2016.06.004

    are kept fixed throughout the datasets in the experimental sec-

    ion, and recall that the detected symmetries enter as hypotheses

    nto the final optimization in Section 3.4 . An example of the corre-

    ponding Hough-space is shown in Fig. 4 b.

    ANSAC approach. We define body of revolution (BOR) as a set of

    D circles parallel to the ground plane in 3D space that share the

    ame rotational pivot point and are located on different heights.

    natural extension to this is a requirement for the radii of the

    ircles to change smoothly with the height change, but we let the

    nal global optimization take care of this.

    To detect bodies of revolution, we make hypotheses about

    heir rotational pivot points on the 2D ground plane by simply

    xtracting 2D circles. Then (in the final optimization), we use

    ontext-aware point-wise potentials to decide whether a point is

    ikely to belong to BOR with a center given by a particular 2D

    ircle.

    Given a pair of points ( p i , p j ), we generate a hypothesis for a cir-

    le with a center c at the point of intersection of the lines that go

    hrough ( x i , x j ) and are parallel to ( n i , n j ). The radius r of the circle

    s the mean value of distances || x i − c || and || x j − c || of points p i nd p y to the circle centre c . If the difference in these distances

    s too high (larger than half a meter), we discard the hypothe-

    is. We count the point p k as an inlier to the current hypothesis

    f

    − r t ≤ || x k − c || ≤ r + r t , (5).e. it lies on the same circle within a radius threshold t r .

    To prevent the algorithm to find incomplete circles (i.e., the cir-

    les that do not have support from different sides), we bin the

    nliers into eight angular bins over the circle’s circumference. The

    inning process is illustrated in Fig. 8 .

    For every bin, we compute density value P b that is a ratio of

    number of points in the bin b to the total number of points in

    he circle. To evaluate how equally are the points distributed, we

    ompute the entropy

    ¯ (P ) = −

    8 ∑ b=1

    P b log 8 P b (6)

    hat is normalized between zero and one. In the RANSAC proce-

    ure, we compare the hypotheses not by the number of inliers,

    ut by the entropy value. The hypothesis with the highest entropy

    s considered to be the best.

    As in the case of mirror symmetries, we run the RANSAC proce-

    ure until the confidence in the best hypothesis reaches the value

    f 99% (again, we use Eq. (4) to determine the confidence). How-

    ver, to avoid fitting circles to the point clouds that don’t have any,

    e discard the procedure if it reaches the maximal number of it-

    rations.

    uctural element decomposition, Computer Vision and Image Un-

    http://dx.doi.org/10.1016/j.cviu.2016.06.004

  • 6 N. Kobyshev et al. / Computer Vision and Image Understanding 0 0 0 (2016) 1–13

    ARTICLE IN PRESS JID: YCVIU [m5G; July 7, 2016;21:54 ]

    E

    w

    t

    n

    t

    t

    t

    w

    t

    r

    s

    T

    f

    k

    p

    I

    r

    b

    o

    e

    e

    (

    s

    p

    t

    e

    { b

    3

    s

    o

    h

    t

    i

    o

    t

    3.2. Detecting free-form shapes

    To describe parts of the input model that are not explained by

    symmetries (free-form parts), we additionally summarize the 2D

    projected point cloud (including symmetric parts) as a low number

    of polylines. In other words, our method is designed to extract a

    set of 2D polylines that approximate the entire floor plan of the

    object.

    As a preprocessing, points lying on near-vertical surfaces (walls,

    typically) are extracted, as these are more representative for cre-

    ating a floor plan. We remove points lying on slanted surfaces

    (e.g. roofs) and horizontal surfaces based on the angle between

    their 3D normals and the vertical direction, and we aim to sum-

    marize the remaining subset of ground-projected points with a low

    number of 2D polylines.

    First, we decompose the 2D point cloud into disjoint local parti-

    tions and robustly extract a single dominant 2D line segment from

    each such partition on-the-fly. The algorithm performs a single

    run through all points for efficiency, selects the next unpartitioned

    point as seed, and grows a partition in the 2D k -nearest neighbor-

    hood up to a distance threshold. The latter controls the size of the

    aperture, i.e. the extent of the partitions. The dominant line seg-

    ment is extracted from each partition using RANSAC Fischler and

    Bolles (1981) (with soft-scoring) and least-squares segment fitting

    to the inliers. This simple approach tends to preserve locally linear

    structures while summarizing a 2D point cloud into three orders

    of magnitude less segments.

    Second, the line segments are to be snapped and linked into

    polylines. To do so, we mark two end-points of segments for

    snapping if they are nearest neighbors and are within a dis-

    tance threshold. From these pairwise adjacencies, we identify con-

    nected groups of vertices and collapse them into a single ver-

    tex at their centroid. Next, the full set of segments is linked into

    polylines between end-points and junctions in a tracing procedure

    based on vertex valence, and the resulting polylines are subject

    to a Douglas-Peucker polygon simplification Douglas and Peucker

    (1973) . In a final cleaning step, short polylines are eliminated. The

    output is a set of polylines roughly approximating the shape of

    dense clusters in the 2D point cloud. An example of detected poly-

    lines is shown in Figs. 2 c and 11 .

    3.3. Detecting planes

    Planes are a very low-level cue, but are ubiquitous in buildings.

    Hence, we detect the planes in 3D with RANSAC. We always de-

    tect as many planes as needed to explain 90% of points and let

    the global optimization remove the redundant ones w.r.t. the other

    structural cues.

    3.4. Structural element assignment

    The final part of our method is the assignment of an archi-

    tectural structural element to every oriented point p i = (x i , n i ) inthe point cloud, where x i , n i ∈ R 2 , i = 1 . . . , N denote the 2D loca-tion and normal of point p i on the ground, respectively. As will

    be discussed in detail, each structural element is generated by

    one of the structural cues : mirror symmetry, center of body of

    revolution, free-form polyline or plane described in the sections

    above.

    We formulate the assignment as a multi-label energy minimiza-

    tion, where each point obtains a label l i ∈ L of a structural ele-ment, and the optimal labeling (l ∗1 , l

    ∗2 , . . . , l

    ∗N ) ∈ L N is sought for the

    following energy:

    Please cite this article as: N. Kobyshev et al., Efficient architectural str

    derstanding (2016), http://dx.doi.org/10.1016/j.cviu.2016.06.004

    (l 1 , l 2 , . . . , l N ) = ∑

    i =1 ... N �(p i , l i )

    + γ1 ·∑

    i

    ∑ j

    �(p i , p j , l i , l j ) + γ2 ·∑ ∀ l∈L

    �(l) , (7)

    here �( p i , l i ) is the unary cost of assigning a point p i to a struc-

    ural element l i , �( p i , p j , l i , l j ) is a pairwise term enforcing smooth-

    ess of the labeling solution, and �( l ) encodes our prior for a par-

    icular structural element l . In the following discussion, we define

    hese terms in detail and explain how each cue generates struc-

    ural elements.

    Our unary cost for each of the structural cues is

    (p i , l i ) = 1 − W t (l i ) · K t (x i , l i ) · C t (n i , l i ) · H t (p i , l) . (8)here index t ∈ {mir, rot, poly, plane} refers to the type of struc-

    ural cue, i.e., mirror symmetry, rotational symmetry and polyline,

    espectively, and the terms are as follows:

    • C t ( n , l ) ∈ [0, 1] is a measure of how consistent the normal n ofa point p is with a structural element l ,

    • K t ( x , l ) ∈ [0, 1] is a kernel, namely a function of the position xof a point p w.r.t. a structural element l ,

    • W t ( l ) ∈ [0, 1] is a weight encoding how important each struc-tural element l ∈ L is for the understanding of the building,

    • H t ( p, l ) ∈ [0, 1] is a weight encoding how well do other pointssupport the choice of a given point for a structural element l .

    We postpone the exact definition of these terms to the next

    ubsections, as they differ per structural cue (index t ).

    Our pairwise term in Eq. (7) is the weighted Potts penalty

    (p i , p j , l i , l j ) = {

    0 , l i = l j e −λE d E (p i ,p j ) , l i � = l j . (9)

    his term penalizes any pair of adjacent points ( p i , p j ) having dif-

    erent labels, where adjacency is defined in 3D via k -NN search,

    = 100 . The penalty vanishes with the 3D Euclidean distance d E ( p i , j ) between adjacent points, and λE controls the speed of decrease.n our experiments, we set λE as the smallest ball neighborhoodadius such that 75% of all input points have at least 100 neigh-

    ors, as also suggested in Wu and Agarwal (2012) .

    Finally, our label cost �( l ) penalizes once for each label l ∈ Lccurring in the solution, i.e., it reduces the number of structural

    lements occurring in the solution:

    (l) = {

    1 , ∃ i : l i = l 0 , otherwise .

    (10)

    We solve the multi-label optimization efficiently via the α-xpansion algorithm Boykov and Kolmogorov (2004) , Boykov et al.

    2001) , Kolmogorov and R.Zabih (2004) . In Delong et al. (2011) it is

    hown that α-expansion is capable of solving energy minimizationroblems with label cost, such as the one in Eq. (7) . The parame-

    ers γ 1 and γ 2 balance the relative importance of the three majornergy terms, their choice is discussed in the experiments section.

    Next, we explain how each type of structural cue t = mir , rot , poly , plane } is incorporated into the unary term ( Eq. (8) )y defining each of its sub-terms C t ( n , l ), K t ( x , l ), W t ( l ) and H t ( p, l ).

    .4.1. Energy terms for mirror symmetries

    Every detected mirror symmetry generates two architectural

    tructural elements (represented by two labels), one for each side

    f the symmetry axis, designed to separate points into symmetric

    alves, e.g., opposing walls. As discussed in Section 3.1.2 , we ex-

    ract the two dominant orthogonal mirror symmetries for simplic-

    ty and robustness (see Fig. 2 b). It should be noted, however, that

    ur optimization scheme allows for more than two mirror symme-

    ries.

    uctural element decomposition, Computer Vision and Image Un-

    http://dx.doi.org/10.1016/j.cviu.2016.06.004

  • N. Kobyshev et al. / Computer Vision and Image Understanding 0 0 0 (2016) 1–13 7

    ARTICLE IN PRESS JID: YCVIU [m5G; July 7, 2016;21:54 ]

    Fig. 9. Definition of angle αt il

    between a point and a structural cue: (a) mirror symmetry ( t = mir ), (b) rotational symmetry ( t = rot ), (c) free-form polyline ( t = poly ). (Best viewed in color.)

    (

    a

    i

    C

    w

    t

    d

    r

    s

    K

    w

    s

    m

    c

    n

    3

    d

    H

    t

    t

    fi

    C

    w

    l

    s

    g

    T

    t

    K

    w

    t

    m

    p

    G

    i

    c

    a

    t

    b

    r

    P

    c

    t

    3

    b

    a

    t

    s

    p

    C

    w

    l

    s

    e

    K

    w

    i

    2

    a

    d

    w

    e

    s

    r

    m

    u

    U

    w

    s

    t

    h

    i

    a

    W

    w

    t

    We now define the sub-terms of Eq. (8) for mirror symmetries

    t = mir ). The consistency between the normal n i ∈ R 2 of a point p i nd a structural element of label l generated by a mirror symmetry

    s defined by

    mir (n i , l) = 1 − sin (αmir il ) , (11) here αmir

    il ∈ [0 , π2 ] is the angle between the point’s normal n i and

    he normal of the symmetry axis (see Fig. 9 a). Kernel K mir (x i , l)

    efines how well do the coordinates of the point support the mir-

    or symmetry. For the structural element l generated by one of the

    ides of the axis,

    mir (p i , l) = e −λE d E (p i ,p ∗i ) , (12)here p ∗

    i is the closes point to p i after it is flipped over the mirror

    ymmetry line.

    The weight W mir (l) in Eq. (8) is set to 1 for all structural ele-

    ents generated by mirror symmetries (high importance). For this

    ue we also set H t ( p, l ) to 1 because mirror symmetries is a global

    otion that is not dependent on a local point support.

    .4.2. Energy terms for rotational symmetries

    Each rotational symmetry generates a single structural element

    efined by a body of revolution and identified by a label l ∈ L .erein, we define the sub-terms of Eq. (8) for rotational symme-

    ries ( t = rot ). The consistency between the normal n i ∈ R 2 of a point p i and

    he structural element l generated by a rotational symmetry is de-

    ned by

    rot (n i , l) = 1 − sin (αrot il ) , (13)here αrot

    il is the angle between the point’s normal n i and the

    ine passing through the point and the pivot point of the rotational

    ymmetry (red dot in Fig. 9 b).

    Unlike dominant mirror symmetries, which tend to be more

    lobal for a dataset, we aim to extract local rotational symmetries.

    herefore, we limit the maximal radii of detected bodies of revolu-

    ion using a simple kernel:

    rot (x i , l) = {

    1 , d E (x i , c ) < t br 0 , otherwise ,

    (14)

    here c is the center of considered body of revolution and t br is

    he threshold on the maximal circle size (we limit it to half of the

    aximal point cloud width).

    As we would like the detected bodies of revolution to be sup-

    orted from all sides, we also use the point support weight H t ( p, l ).

    iven a point p i , and a body of revolution hypothesis l we assume

    t is assigned to a body of revolution with a center in c . Then, we

    ompute how equally will the points be distributed in a circle with

    center in c and a radius of d E ( x i , c ). To do so, we use same defini-

    ion of entropy as in Eq. (6) . We split the circle into eight angular

    ins and count the number of inliers within the same threshold

    Please cite this article as: N. Kobyshev et al., Efficient architectural str

    derstanding (2016), http://dx.doi.org/10.1016/j.cviu.2016.06.004

    t . We assign the value of entropy H̄ (P ∗(p i , l)) to H rot (p i , l) , where

    ∗( p i , l ) is a set of inliers to the current hypothesis of a circle withenter in c going through a point p i .

    As for mirror symmetries, the importance weight W rot (l) is set

    o one for all rotational symmetries (high importance).

    .4.3. Energy terms for free-form polylines

    Each of the 2D polylines representing the floor plan of the

    uilding ( Section 3.2 ) generates a single structural element (hence,

    label l ∈ L ). They serve to capture free-form structural elementshat are not well explained by symmetry cues. Here, we define the

    ub-terms of Eq. (8) for t = poly . The consistency between the normal n i ∈ R 2 of a point p i and a

    olyline identified by label l is

    poly (n i , l) = 1 − sin (αpoly il ) , (15)

    here αpoly il

    is the angle between the point’s normal n i and the

    ine passing through p i and the closest point ˆ p il of the polyline, as

    hown in Fig. 9 c.

    Similarly to rotational symmetries, polylines have a local influ-

    nce, encoded in the sigmoid kernel

    poly (x i , l) = 1 −1

    1 + exp { λp (τp − || x i − ˆ x il || ) } , (16)

    here x i , ̂ x il ∈ R 2 are the 2D positions of p i and of the closest pointˆ p il on the polyline, respectively. The inflection point of the sigmoid

    s fixed at t p = 1 meter, and the steepness of the descent to λp =0 in all our experiments.

    A strong symmetry cue should dominate over polylines in its

    rea of influence. Therefore, each polyline needs to be weighted

    epending on how well it can fit to any symmetry cue. In this vein,

    e extract the oriented mid-point m lk from each segment e lk of

    very polyline l , and evaluate �( m lk , l ′ ) defined in Eq. (8) for each

    tructural element l ′ generated by detected mirror ( t = mir ) andotational ( t = rot ) symmetries. �( m lk , l ′ ) measures how well the

    id-point m lk fits to a symmetry cue. We introduce the polyline

    sefulness measure

    l = ∑

    k

    f lk · min ( ∑

    l ′ =1 ... L �(m lk , l

    ′ ) , 1) , (17)

    here f lk is the normalized length of segment e lk of polyline l ,

    uch that ∑

    k f lk = 1 . A high U l indicates that the polygon con-ains a large number of segments unexplained by symmetry cues,

    ence, it is worth considering the polyline to explain the neighbor-

    ng points as free-form parts. The weight of the polyline is defined

    s the sigmoid

    poly (l) = 1 1 + exp { λw (t w − U l ) } , (18)

    here we set an abrupt change λw = 20 similarly to Eq. (16) , and w = 0 . 6 in all experiments.

    uctural element decomposition, Computer Vision and Image Un-

    http://dx.doi.org/10.1016/j.cviu.2016.06.004

  • 8 N. Kobyshev et al. / Computer Vision and Image Understanding 0 0 0 (2016) 1–13

    ARTICLE IN PRESS JID: YCVIU [m5G; July 7, 2016;21:54 ]

    Table 1

    Overview of the datasets and comparison of our method with the baselines. Results are reported as segmentation accuracy (Jaccard index multiplied by 100). Colors

    correspond to the components identified by our method: red for mirror symmetries, green for bodies of revolution, yellow for polylines, blue for planes. (Best viewed

    in color.)

    C

    h

    p

    o

    (

    c

    s

    o

    m

    e

    b

    l

    o

    l

    t

    t

    m

    4

    a

    c

    d

    t

    (

    g

    t

    b

    c

    o

    l

    b

    t

    3.4.4. Energy terms for planes

    For each plane, we define the weight terms as follows. The con-

    sistency between the point p i with a normal n i ∈ R 2 and the struc-tural element l that is defined by plane is

    rot (n i , l) = 1 − sin (αplane il ) , (19)where αrot

    il is the angle between the point’s normal n i and normal

    of the plane.

    The distance between the point and the plane is captured by

    the kernel K plane (x i , l) :

    K plane (x i , l) = e −λE d E (p i ,p ∗i ) , (20)where p ∗

    i is the closest point on the plane to p i .

    We also have a preference to planes that are equally supported

    by points. To evaluate that, we study the 2D point distribution of

    the plane inliers. First, we compute a 2D convex hull for the point

    inliers on the plane. Then, we split the space inside the hull into n s 1 by 1 meter squares, and compute the fraction of points for every

    square P n . Finally, we assign the weight

    H̄ (p i , l) = −P n ∑

    b=1 P n log P n P n (21)

    that equals to 1 if the points are completely uniformly distributed

    and fades to 0 as the distribution becomes less equal. We assign

    the same weight H ( p i , l ) to every point p i . As the planes are robust

    to detect, we also set their importance to W plane (l) = 1 .

    4. Experiments

    In this section we evaluate the properties of our architectural

    structural elements (ASE) decomposition. All used 3D models are

    reconstructed from unordered image collections. Despite tremen-

    dous progress in the research, these models are still incomplete

    and very noisy compared to clean computer generated collections

    used in prior work. This method is tested on multiple datasets that

    Please cite this article as: N. Kobyshev et al., Efficient architectural str

    derstanding (2016), http://dx.doi.org/10.1016/j.cviu.2016.06.004

    ave different architectural elements like towers, arches, ellipsoids,

    lanar and free-form walls, as summarized in Table 1 . The majority

    f the datasets are a result of running SfM pipeline, however, some

    namely, Fraumuenster and Grossmuenster) are sampled from a

    ity-scale aerial mesh (and are very coarse). The results are also

    hown in Figs. 15 and 16 .

    We evaluate our results quantitatively by comparing them with

    ur ground truth segmentations. The segmentation accuracy is

    easured by the intersection over union (also Jaccard index) for

    very ground truth label and the corresponding label. Due to possi-

    le over-segmentation, a mapping is needed between the predicted

    abeling (where the number of labels can vary) to the fixed labels

    f the ground truth segmentation. We select the most frequent

    abel per ground truth label, mark it used and as corresponding

    o the ground truth label. This is a similar procedure as used in

    he PASCAL VOC challenge Everingham et al. (2012) . We report the

    ean accuracy over all labels.

    .1. Method parameters

    In this section we investigate the stability of our method

    gainst the internal parameters. Since all the models are metri-

    ally scaled, the geometric parameters are fixed to meter scale as

    escribed in the text above. The main remaining parameters are

    he multi-label optimization weights γ 1 (smoothing cost) and γ 2 label cost). We empirically evaluate the mean accuracy over the

    round truth labels as well the number of labels that appear in

    he final decomposition. Using a grid search over a range of feasi-

    le parameters we fix these for all other experiments.

    The final values are γ1 = 1 (smoothing cost) and γ2 = 100 (labelost). Increasing γ1 = 1 smooths the labels. It is robust in the rangef 1 . . . 5 . Changing the value of γ 2 affects the number of detected

    abels and is very robust. Only very low values result in extra la-

    els and hence in over-segmentation. Results of the grid search for

    hese parameters is shown in Fig. 12 .

    uctural element decomposition, Computer Vision and Image Un-

    http://dx.doi.org/10.1016/j.cviu.2016.06.004

  • N. Kobyshev et al. / Computer Vision and Image Understanding 0 0 0 (2016) 1–13 9

    ARTICLE IN PRESS JID: YCVIU [m5G; July 7, 2016;21:54 ]

    Fig. 10. Close-up pictures of the seams between adjacent ASEs demonstrate the accuracy of our method. Left to right: Orebro, junction between the tower and the wall;

    Arch: two walls; Fraumuenster: rooftop and the tower.

    Fig. 11. Examples of detected free-form shapes on the datasets. Left to right: Orebro, Arch, Colosseum.

    a

    i

    f

    H

    c

    i

    c

    f

    r

    a

    t

    t

    l

    4

    p

    w

    a

    t

    c

    r

    c

    t

    t

    o

    (

    p

    T

    p

    n

    t

    t

    F

    m

    s

    s

    d

    m

    s

    e

    t

    d

    t

    t

    r

    w

    a

    o

    l

    t

    A

    d

    s

    s

    i

    S

    o

    Hough voting and RANSAC give qualitatively similar results,

    nd the main difference is the time needed for computation. This

    s shown in Fig. 13 . In this plot, we compare the time needed

    or finding for a mirror symmetry by RANSAC and Hough voting.

    ough voting is implemented efficiently on C++, while RANSAC is

    omputed with Matlab. However, due to the quadratic complex-

    ty of Hough voting, it is an order of magnitude slower on a point

    loud with 350K points. In the future experiments, we use RANSAC

    or detecting ASEs as the point clouds we are operating with are

    elatively large (up to 1.7M points).

    Regarding the time of computation, the main bottleneck of the

    lgorithm is the final 3D optimization. In Fig. 14 , we show the time

    aken to do a full optimization on Orebro dataset as a function of

    he number of points in the dataset. The computational time grows

    inearly with the number of points.

    .2. Comparison to baselines

    In this section we evaluate our method against baseline ap-

    roaches. Because of a lack of available, directly comparable work,

    e compare to alternative ways of unsupervised decomposition of

    point cloud. Namely, we investigate curvature-based segmenta-

    ion and hard-prior approaches for grouping parts of the 3D point

    loud into coherent groups.

    The first baseline is inspired by Lafarge et al. (2009) . We detect

    egularized regions of different curvatures, and run connected of

    omponents search to identify the components.

    As a second baseline, we iteratively fit planes with an inlier

    hreshold of 0.1 meters until 90% of the point cloud is assigned

    o a plane. For every ground truth structural element, we choose

    ne plane that is best at covering it.

    As a third baseline, we use a method of Schnabel et al.

    2007) that decomposes a point cloud into a set of primitives like

    lanes, tori, spheres, etc.

    Please cite this article as: N. Kobyshev et al., Efficient architectural str

    derstanding (2016), http://dx.doi.org/10.1016/j.cviu.2016.06.004

    The results are provided in Figs. 15 and 16 as well as in

    able 1 . As noise in the ground truth is not segmented, to com-

    are the decomposition to the ground truth, we search the con-

    ected components of the resulting ASE decomposition and match

    he closest connected component to the corresponding ground

    ruth element. This we also show on top view visualizations in

    igs. 15 and 16 .

    The curvature split baseline achieves reasonable results on the

    eshes that are not too smooth. On Fraumuenster and Grossmuen-

    ter datasets (that are extracted from an aerial mesh), the tran-

    ition between different components is so smooth the curvature

    oes not capture it. On other datasets, for example, Orebro, the

    ethod is capable of delivering a reasonable decomposition. It

    hould be noted that sharp edges on parts of the components, for

    xample, a transition from roof to wall in towers, can lead to split-

    ing of the component into multiple ones.

    Plane fitting produces reasonable results when the objects to

    ecompose are flat. For suitable landmarks, such as Arch, it is able

    o capture the ASEs very well (although ignoring the internal struc-

    ure of the object). Orebro’s walls are identified reasonably well,

    esulting in high scores for their segmentation. The method fails

    ith towers and arbitrarily shaped constructions, resulting in a rel-

    tively low mean accuracy for this dataset. Finally, in the absence

    f planar objects, this method does not provide any reasonable so-

    ution (see Colosseum).

    The method from Schnabel et al. (2007) is very efficient at fit-

    ing primitives into point clouds, as it can be seen on, for example,

    rch and Fraumuenster datasets. However, it suffers from several

    rawbacks. First, in the absence of clear primitive it fails to fit a

    ensible model (for example, in case of Colosseum, every side is

    hared between several models). Our method is capable of deal-

    ng with these elements because of the free-form polylines cue.

    econd, the primitives are too simple for buildings. This is seen

    r towers of Orebro and Sant’Angelo. Different parts of towers that

    uctural element decomposition, Computer Vision and Image Un-

    http://dx.doi.org/10.1016/j.cviu.2016.06.004

  • 10 N. Kobyshev et al. / Computer Vision and Image Understanding 0 0 0 (2016) 1–13

    ARTICLE IN PRESS JID: YCVIU [m5G; July 7, 2016;21:54 ]

    Fig. 12. Results of the grid search over the weights of the energy optimization for Orebro dataset.

    Fig. 13. Running time for detecting a mirror symmetry for Hough space voting and

    RANSAC. RANSAC significantly outperforms Hough space voting as the number of

    points grows. Please note that the time axis is in logarithmic scale.

    Fig. 14. Running time of the 3D optimization (measured on a single core). The sub-

    set of points (from Orebro dataset) are generated by parts of the same dataset,

    keeping the point density the same).

    have different width get different primitives (namely, cylinders) as-

    signed to them.

    Our method successfully and consistently outperforms the base-

    lines as it is optimized for multiple specialized structural features

    rather than hard constraints or split based on geometrical pecular-

    ities. It can be also seen in the quantitative evaluation shown in

    Table 1 .

    It prioritizes the structural cues based on the dataset. For Colos-

    seum, that does not have a strong support for rotational or mir-

    ror symmetries, it explains the whole model based on the poly-

    lines. Conversely, Arch is completely explained by the mirror sym-

    metries, resulting for good scores for all components but the

    back wall, where the point support is very weak. For Orebro, the

    method is able to capture all four wall and side towers, as well

    as the two half moon-shaped structures at the entrance. On ev-

    ery wall the mirror symmetries and plane cues have to compete

    as both have high unaries. As a result, two walls are assigned to

    planes, and two – to the mirror symmetries.

    In case of Grossmuenster, all ASEs are successfully determined.

    However, because of too high influence of the pairwise term in Eq.

    (7) , the area assigned to a body of revolution of a small tower is

    extended to parts of the building itself.

    Finally, the Trinity Chapel is a very noisy point cloud with sev-

    eral SfM errors and very irregular point structure. All considered

    methods fail to decompose it. For our method, the main challenges

    related to this datasets were irregular point normals and very ir-

    regular point density that made smoothing extremely challenging.

    The quality of the segmentation is additionally illustrated in

    Fig. 10 , where we show close-up pictures of the seams between

    different com ponents.

    In Fig. 11 we demonstrate our method for detecting free-form

    shapes on several datasets. While detecting floor plans in 2D point

    cloud projections can be simplified by a number of assumptions

    (for example, walls can be assumed to be planar and be there-

    fore detected with a line detector), our method is targeted towards

    general shapes like curved objects (as straight walls can be easily

    detected with other cues, namely mirror symmetries and planes).

    It can be seen that the more precise the data is (for example, on

    Colosseum dataset), the more likely is the method to return one

    polyline per component. In case of noisy thick walls, like in the

    Arch dataset, a single wall can be represented with multiple poly-

    lines. Adding data awareness to our current method can result in

    better performance of very noisy sets.

    Overall our method successfully decomposes all models into

    their ASEs. Failure cases occur when the symmetries or free-form

    lines are not correctly detected. Usually this is the case for missing

    data, as this affects any of the methods as no underlying structural

    cues exist.

    Please cite this article as: N. Kobyshev et al., Efficient architectural structural element decomposition, Computer Vision and Image Un-

    derstanding (2016), http://dx.doi.org/10.1016/j.cviu.2016.06.004

    http://dx.doi.org/10.1016/j.cviu.2016.06.004

  • N. Kobyshev et al. / Computer Vision and Image Understanding 0 0 0 (2016) 1–13 11

    ARTICLE IN PRESS JID: YCVIU [m5G; July 7, 2016;21:54 ]

    Fig. 15. Examples of our method in comparison to standard baseline methods. The baselines fail to capture all structural elements of the landmarks as they either contain

    no higher level features (curvature) or enforce hard priors (plane fitting). Our method successfully decomposes the complex scenes due to the global optimization of our

    structural features.

    Please cite this article as: N. Kobyshev et al., Efficient architectural structural element decomposition, Computer Vision and Image Un-

    derstanding (2016), http://dx.doi.org/10.1016/j.cviu.2016.06.004

    http://dx.doi.org/10.1016/j.cviu.2016.06.004

  • 12 N. Kobyshev et al. / Computer Vision and Image Understanding 0 0 0 (2016) 1–13

    ARTICLE IN PRESS JID: YCVIU [m5G; July 7, 2016;21:54 ]

    Fig. 16. Examples of our method in comparison to standard baseline methods, continued.

    t

    t

    t

    f

    5. Conclusions and future work

    This work takes a step towards understanding the architectural

    and structural elements of landmarks. Although for simple build-

    ings a simple planar abstraction should suffice, we look at architec-

    Please cite this article as: N. Kobyshev et al., Efficient architectural str

    derstanding (2016), http://dx.doi.org/10.1016/j.cviu.2016.06.004

    ural landmarks which contain more complex shapes and structure

    han usual buildings.

    Our method for decomposing 3D reconstructions exploits mul-

    iple structural cues like mirror and rotational symmetries, free-

    orm polylines and planes. As we formulate it as a multi-label

    uctural element decomposition, Computer Vision and Image Un-

    http://dx.doi.org/10.1016/j.cviu.2016.06.004

  • N. Kobyshev et al. / Computer Vision and Image Understanding 0 0 0 (2016) 1–13 13

    ARTICLE IN PRESS JID: YCVIU [m5G; July 7, 2016;21:54 ]

    o

    i

    t

    l

    s

    n

    c

    l

    e

    A

    d

    d

    d

    a

    t

    c

    t

    A

    (

    m

    a

    R

    A

    B

    B

    B

    B

    B

    B

    B

    B

    C

    C

    D

    D

    D

    D

    E

    F

    F

    F

    H

    H

    H

    H

    J

    J

    K

    K

    K

    K

    L

    L

    L

    L

    L

    L

    M

    M

    M

    M

    M

    M

    M

    M

    O

    O

    P

    P

    R

    R

    S

    S

    S

    T

    T

    T

    V

    V

    W

    W

    WW

    Z

    ptimization, our method works on noisy 3D point clouds from

    mage-based reconstruction. Experimental evaluation confirms that

    he results are state-of-the-art for the decomposition of complex

    andmark buildings.

    In future work we plan to iterate symmetry extraction and

    tructural element assignment and infer long-distance graph con-

    ections to complete empty areas of the 3D reconstruction to over-

    ome some aspects of the missing data, as well as to enlarge the

    ist of detected ASEs as well as researching the extraction of gen-

    ral 3D shapes Kalogerakis et al. (2010) , Mansfield et al. (2014) .

    nother contribution we leave for future work is integrating noise

    etection into optimization. That will facilitate simultaneous ASE

    ecomposition and generally image-based structure-from-motion

    ataset denoising.

    The understanding of 3D architectural models such as buildings

    nd landmarks, will pave the way for more automation of creating

    alk maps Weissenberg et al. (2014) and creating more structurally

    orrect textured 3D models Dai et al. (2013) for better visualiza-

    ion.

    cknowledgments

    This work was supported by the European Research Council

    ERC) project VarCity (#273940) and the Institute of Photogram-

    etry and Remote Sensing, ETH Zurich by providing the aerial im-

    gery.

    eferences

    garwal, S. , Snavely, N. , Simon, I. , Seitz, S. , Szeliski, R. , 2009. Building rome in a day.

    ICCV . allard, D. , 1981. Generalizing the hough transform to detect arbitrary shapes. PR

    33 (2), 111–122 . ao, Y. , Chandraker, M. , Lin, Y. , Savarese, S. , 2013. Dense object reconstruction with

    semantic priors. CVPR . erger, M. , Tagliasacchi, A. , Seversky, L. , Alliez, P. , Levine, J. , Sharf, A. , Silva, C. , 2014.

    State of the art in surface reconstruction from point clouds. EUROGRAPHICS .

    ódis-Szomorú, A. , Riemenschneider, H. , Van Gool, L. , 2014. Fast, approximate piece-wise-planar modeling based on sparse structure-from-motion and superpixels.

    CVPR . ódis-Szomorú, A. , Riemenschneider, H. , Van Gool, L. , 2015. Superpixel meshes for

    fast edge-preserving surface reconstruction. CVPR . oulch, A. , De La Gorce, M. , Marlet, R. , 2014. Piecewise-planar 3D reconstruction

    with edge and corner regularization. Comput. Graph. Forum 33 (5), 55–64 . oykov, Y. , Kolmogorov, V. , 2004. An experimental comparison of min-cut/max-flow

    algorithms for energy minimization in vision. PAMI 26 (9), 124–1137 .

    oykov, Y. , Veksler, O. , Zabih, R. , 2001. Fast approximate energy minimization viagraph cuts. PAMI 23 (11), 1222–1239 .

    hauve, A. , Labatut, P. , Pons, J. , 2010. Robust piecewise-planar 3d reconstruction andcompletion from large-scale unstructured point data. CVPR .

    ohen, A. , Zach, C. , Sinha, S.N. , Pollefeys, M. , 2012. Discovering and exploiting 3Dsymmetries in structure from motion. CVPR .

    ai, D. , Riemenschneider, H. , Schmitt, G. , Van Gool, L. , 2013. Example-based facade

    texture synthesis. ICCV . ame, A. , Prisacariu, V. , Ren, C. , Reid, I. , 2013. Dense Reconstruction Using 3D Object

    Shape Priors. CVPR . elong, A. , Osokin, A. , Isack, H.N. , Boykov, Y. , 2011. Fast approximate energy mini-

    mization with label costs. Int. J. Comput. Vis. 96 (1), 1–27 . ouglas, D. , Peucker, T. , 1973. Algorithms for the reduction of the number of points

    required to represent a digitized line or its caricature. Can. Cartograph. 10 (2),

    112–122 . veringham, M., Van Gool, L., Williams, C. K. I., Winn, J., Zisserman, A., 2012. The

    PASCAL Visual Object Classes Challenge 2012 (VOC2012) Results. ischler, M. , Bolles, R. , 1981. Random sample consensus: a paradigm for model fit-

    ting with applications to image analysis and automated cartography. Commun.ACM 24 (6), 381–395 .

    urukawa, Y. , Curless, B. , Seitz, S. , Szeliski, R. , 2010. Towards internet-scale multi-

    -view stereos. CVPR . urukawa, Y. , Ponce, J. , 2010. Accurate, dense, and robust multi-view stereopsis.

    PAMI 32 (8), 1362–1376 . aene, C. , Zach, C. , Zeisl, B. , Pollefeys, M. , 2012. A patch prior for dense 3D recon-

    struction in man-made environments. 3DPVT . ao, Q. , Cai, R. , Li, Z. , Zhang, L. , Pang, Y. , Wu, F. , 2012. 3D visual phrases for land-

    mark recognition. CVPR .

    Please cite this article as: N. Kobyshev et al., Efficient architectural str

    derstanding (2016), http://dx.doi.org/10.1016/j.cviu.2016.06.004

    iep, V. , Labatut, P. , Pons, J. , Keriven, R. , 2012. High accuracy and visibility-consis-tent dense multi-view stereo. PAMI 34 (5), 889–901 .

    ough, P. , 1959. Machine analysis of bubble chamber pictures. In: International Con-ference on High Energy Accelerators and Instrumentation .

    ancosek, M. , Pajdla, T. , 2011. Multi-view reconstruction preserving weakly-sup-ported surfaces. CVPR .

    ohnson, A.E. , Hebert, M. , 1999. Using spin images for efficient object recognition incluttered 3d scenes. PAMI 21 (5), 433–449 .

    alogerakis, E. , Hertzmann, A. , Singh, K. , 2010. Learning 3d mesh segmentation and

    labeling. ACM Trans. Graph. 29 (4), 102 . nopp, J. , Prasad, M. , Willems, G. , Timofte, R. , Van Gool, L. , 2010. Hough transform

    and 3d surf for robust three dimensional classification. ECCV . olmogorov, V. , R.Zabih , 2004. What energy functions can be minimized via graph

    cuts? PAMI 26 (2), 147–159 . ser, K. , Zach, C. , Pollefeys, M. , 2011. Dense 3d reconstruction of symmetric scenes

    from a single image. DAGM .

    afarge, F. , Keriven, R. , Bredif, M. , 2009. Combining meshes and geometric primitivesfor accurate and semantic modeling. BMVC .

    afarge, F. , Keriven, R. , Bredif, M. , Vu, H. , 2010. Hybrid multi-view reconstruction byjump-diffusion. CVPR .

    afarge, F. , Keriven, R. , Bredif, M. , Vu, H. , 2013. A hybrid multi-view stereo algorithmfor modeling urban scenes. PAMI 35 (1), 5–17 .

    iu, H. , Vimont, U. , Wand, M. , Cani, M.-P. , Hahmann, S. , Rohmer, D. , Mitra, N.J. , 2015.

    Replaceable substructures for efficient part-based modeling. EUROGRAPHICS . iu, Y. , Hel-Or, H. , Kaplan, C. , Van Gool, L. , 2009. Computational symmetry in com-

    puter vision and computer graphics. Foundations Trends Comput. Graph. Vis. 5(1), 1–195 .

    ocher, A. , Perdoch, M. , Riemenschneider, H. , Van Gool, L. , 2016. Mobile phone andcloud – a dream team for 3d reconstruction. WACV .

    ansfield, A. , Kobyshev, N. , Riemenschneider, H. , Chang, W. , Van Gool, L. , 2014.

    Frankenhorse: Automatic completion of articulating objects from image-basedreconstruction. BMVC .

    artinovic, A. , Knopp, J. , Riemenschneider, H. , Van Gool, L. , 2015. 3d all the way:semantic segmentation of urban scenes from start to end in 3d. CVPR .

    artinovic, A. , Mathias, M. , Weissenberg, J. , Van Gool, L. , 2012. A three-layered ap-proach to facade parsing. ECCV .

    itra, N.J. , Guibas, L.J. , Pauly, M. , 2006. Partial and approximate symmetry detection

    for 3D geometry. SIGGRAPH . itra, N.J. , Guibas, L.J. , Pauly, M. , 2007. Symmetrization. SIGGRAPH .

    itra, N.J. , Pauly, M. , Wand, M. , Ceylan, D. , 2013. Symmetry in 3D geometry: extrac-tion and applications. Comput. Graph. Forum 32 (6), 1–23 .

    onszpart, A. , Brostow, G.J. , 2015. RAPTER : Rebuilding Man-made Scenes with Reg-ular Arrangements of Planes. SIGGRAPH .

    üller, P. , Zeng, G. , Wonka, P. , Van Gool, L. , 2007. Image-based procedural modeling

    of facades. SIGGRAPH . esau, S. , Lafarge, F. , Alliez, P. , 2015. Planar shape detection and regularization in

    tandem. Comput. Graph. Forum 1–14 . lsson, C. , Enqvist, O. , 2011. Stable structure from motion for unordered image col-

    lections. SCIA . auly, M. , Mitra, N.J. , Wallner, J. , Pottmann, H. , Guibas, L.J. , 2008. Discovering struc-

    tural regularity in 3D geometry. SIGGRAPH . ham, T.T. , Chin, T.J. , Schindler, K. , Suter, D. , 2014. Interacting geometric priors for

    robust multimodel fitting. IEEE Trans. Image Process. 23 (10), 1 .

    iemenschneider, H. , Krispel, U. , Thaller, W. , Donoser, M. , Havemann, S. , Fellner, D. ,Bischof, H. , 2012. Irregular lattices for complex shape grammar facade parsing.

    CVPR . usu, R.B. , Blodow, N. , Beetz, M. , 2009. Fast Point Feature Histograms (FPFH) for 3D

    Registration. ICRA . chnabel, R. , Degener, P. , Klein, R. , 2009. Completion and reconstruction with prim-

    itive shapes. Comput. Graph. Forum 28 (2), 503–512 .

    chnabel, R. , Wahl, R. , Klein, R. , 2007. Efficient ransac for point-cloud shape detec-tion. Comput. Graph. Forum 26 (2), 214–226 .

    inha, S. , Steedly., D. , Szeliski, R. , 2009. Piecewise planar stereo for image-based ren-dering. ICCV .

    eboul, O. , Simon, L. , Koutsourakis, P. , Paragios, N. , 2010. Segmentation of buildingfacades using procedural shape prior. CVPR .

    ingdahl, D. , Van Gool, L.J. , 2012. An enhanced on-line service for 3D model con-

    struction from photographs. Int. J. Heritage Digital Era 1 (2), 277–294 . oshev, A. , Mordohai, P. , Taskar, B. , 2010. Detecting and parsing architecture at city

    scale from range data. CVPR . anegas, C. , Aliaga, D. , Benes, B. , 2010. Building reconstruction using manhattan–

    world grammars. CVPR . erdie, Y. , Lafarge, F. , 2014. Detecting parametric objects in large scenes by monte

    carlo sampling. IJCV 106 (1), 55–75 .

    eissenberg, J. , Gygli, M. , Riemenschneider, H. , Van Gool, L. , 2014. Navigation usingspecial buildings as signposts. MapInteract .

    erner, T. , Zisserman, A. , 2002. New techniques for automated architecture recon-struction from photographs. ECCV .

    u, C. , 2013. Towards linear-time incremental structure from motion. 3DPVT . u, C. , Agarwal, S. , 2012. Schematic surface reconstruction. CVPR .

    heng, Y. , Cohen-Or, D. , Averkiou, M. , Mitra, N.J. , 2014. Recurring part arrangements

    in shape collections. Comput. Graph. Forum 33 (2), 115–124 .

    uctural element decomposition, Computer Vision and Image Un-

    http://dx.doi.org/10.13039/501100000781http://refhub.elsevier.com/S1077-3142(16)30080-7/sbref0001http://refhub.elsevier.com/S1077-3142(16)30080-7/sbref0001http://refhub.elsevier.com/S1077-3142(16)30080-7/sbref0001http://refhub.elsevier.com/S1077-3142(16)30080-7/sbref0001http://refhub.elsevier.com/S1077-3142(16)30080-7/sbref0001http://refhub.elsevier.com/S1077-3142(16)30080-7/sbref0001http://refhub.elsevier.com/S1077-3142(16)30080-7/sbref0002http://refhub.elsevier.com/S1077-3142(16)30080-7/sbref0002http://refhub.elsevier.com/S1077-3142(16)30080-7/sbref0003http://refhub.elsevier.com/S1077-3142(16)30080-7/sbref0003http://refhub.elsevier.com/S1077-3142(16)30080-7/sbref0003http://refhub.elsevier.com/S1077-3142(16)30080-7/sbref0003http://refhub.elsevier.com/S1077-3142(16)30080-7/sbref0003http://refhub.elsevier.com/S1077-3142(16)30080-7/sbref0004http://refhub.elsevier.com/S1077-3142(16)30080-7/sbref0004http://refhub.elsevier.com/S1077-3142(16)30080-7/sbref0004http://refhub.elsevier.com/S1077-3142(16)30080-7/sbref0004http://refhub.elsevier.com/S1077-3142(16)30080-7/sbref0004http://refhub.elsevier.com/S1077-3142(16)30080-7/sbref0004http://refhub.elsevier.com/S1077-3142(16)30080-7/sbref0004http://refhub.elsevier.com/S1077-3142(16)30080-7/sbref0004http://refhub.elsevier.com/S1077-3142(16)30080-7/sbref0005http://refhub.elsevier.com/S1077-3142(16)30080-7/sbref0005http://refhub.elsevier.com/S1077-3142(16)30080-7/sbref0005http://refhub.elsevier.com/S1077-3142(16)30080-7/sbref0005http://refhub.elsevier.com/S1077-3142(16)30080-7/sbref0006http://refhub.elsevier.com/S1077-3142(16)30080-7/sbref0006http://refhub.elsevier.com/S1077-3142(16)30080-7/sbref0006http://refhub.elsevier.com/S1077-3142(16)30080-7/sbref0006http://refhub.elsevier.com/S1077-3142(16)30080-7/sbref0007http://refhub.elsevier.com/S1077-3142(16)30080-7/sbref0007http://refhub.elsevier.com/S1077-3142(16)30080-7/sbref0007http://refhub.elsevier.com/S1077-3142(16)30080-7/sbref0007http://refhub.elsevier.com/S1077-3142(16)30080-7/sbref0008http://refhub.elsevier.com/S1077-3142(16)30080-7/sbref0008http://refhub.elsevier.com/S1077-3142(16)30080-7/sbref0008http://refhub.elsevier.com/S1077-3142(16)30080-7/sbref0009http://refhub.elsevier.com/S1077-3142(16)30080-7/sbref0009http://refhub.elsevier.com/S1077-3142(16)30080-7/sbref0009http://refhub.elsevier.com/S1077-3142(16)30080-7/sbref0009http://refhub.elsevier.com/S1077-3142(16)30080-7/sbref0010http://refhub.elsevier.com/S1077-3142(16)30080-7/sbref0010http://refhub.elsevier.com/S1077-3142(16)30080-7/sbref0010http://refhub.elsevier.com/S1077-3142(16)30080-7/sbref0010http://refhub.elsevier.com/S1077-3142(16)30080-7/sbref0011http://refhub.elsevier.com/S1077-3142(16)30080-7/sbref0011http://refhub.elsevier.com/S1077-3142(16)30080-7/sbref0011http://refhub.elsevier.com/S1077-3142(16)30080-7/sbref0011http://refhub.elsevier.com/S1077-3142(16)30080-7/sbref0011http://refhub.elsevier.com/S1077-3142(16)30080-7/sbref0012http://refhub.elsevier.com/S1077-3142(16)30080-7/sbref0012http://refhub.elsevier.com/S1077-3142(16)30080-7/sbref0012http://refhub.elsevier.com/S1077-3142(16)30080-7/sbref0012http://refhub.elsevier.com/S1077-3142(16)30080-7/sbref0012http://refhub.elsevier.com/S1077-3142(16)30080-7/sbref0013http://refhub.elsevier.com/S1077-3142(16)30080-7/sbref0013http://refhub.elsevier.com/S1077-3142(16)30080-7/sbref0013http://refhub.elsevier.com/S1077-3142(16)30080-7/sbref0013http://refhub.elsevier.com/S1077-3142(16)30080-7/sbref0013http://refhub.elsevier.com/S1077-3142(16)30080-7/sbref0014http://refhub.elsevier.com/S1077-3142(16)30080-7/sbref0014http://refhub.elsevier.com/S1077-3142(16)30080-7/sbref0014http://refhub.elsevier.com/S1077-3142(16)30080-7/sbref0014http://refhub.elsevier.com/S1077-3142(16)30080-7/sbref0014http://refhub.elsevier.com/S1077-3142(16)30080-7/sbref0015http://refhub.elsevier.com/S1077-3142(16)30080-7/sbref0015http://refhub.elsevier.com/S1077-3142(16)30080-7/sbref0015http://refhub.elsevier.com/S1077-3142(16)30080-7/sbref0016http://refhub.elsevier.com/S1077-3142(16)30080-7/sbref0016http://refhub.elsevier.com/S1077-3142(16)30080-7/sbref0016http://refhub.elsevier.com/S1077-3142(16)30080-7/sbref0017http://refhub.elsevier.com/S1077-3142(16)30080-7/sbref0017http://refhub.elsevier.com/S1077-3142(16)30080-7/sbref0017http://refhub.elsevier.com/S1077-3142(16)30080-7/sbref0017http://refhub.elsevier.com/S1077-3142(16)30080-7/sbref0017http://refhub.elsevier.com/S1077-3142(16)30080-7/sbref0018http://refhub.elsevier.com/S1077-3142(16)30080-7/sbref0018http://refhub.elsevier.com/S1077-3142(16)30080-7/sbref0018http://refhub.elsevier.com/S1077-3142(16)30080-7/sbref0019http://refhub.elsevier.com/S1077-3142(16)30080-7/sbref0019http://refhub.elsevier.com/S1077-3142(16)30080-7/sbref0019http://refhub.elsevier.com/S1077-3142(16)30080-7/sbref0019http://refhub.elsevier.com/S1077-3142(16)30080-7/sbref0019http://refhub.elsevier.com/S1077-3142(16)30080-7/sbref0020http://refhub.elsevier.com/S1077-3142(16)30080-7/sbref0020http://refhub.elsevier.com/S1077-3142(16)30080-7/sbref0020http://refhub.elsevier.com/S1077-3142(16)30080-7/sbref0020http://refhub.elsevier.com/S1077-3142(16)30080-7/sbref0020http://refhub.elsevier.com/S1077-3142(16)30080-7/sbref0020http://refhub.elsevier.com/S1077-3142(16)30080-7/sbref0020http://refhub.elsevier.com/S1077-3142(16)30080-7/sbref0021http://refhub.elsevier.com/S1077-3142(16)30080-7/sbref0021http://refhub.elsevier.com/S1077-3142(16)30080-7/sbref0021http://refhub.elsevier.com/S1077-3142(16)30080-7/sbref0021http://refhub.elsevier.com/S1077-3142(16)30080-7/sbref0021http://refhub.elsevier.com/S1077-3142(16)30080-7/sbref0022http://refhub.elsevier.com/S1077-3142(16)30080-7/sbref0022http://refhub.elsevier.com/S1077-3142(16)30080-7/sbref0023http://refhub.elsevier.com/S1077-3142(16)30080-7/sbref0023http://refhub.elsevier.com/S1077-3142(16)30080-7/sbref0023http://refhub.elsevier.com/S1077-3142(16)30080-7/sbref0024http://refhub.elsevier.com/S1077-3142(16)30080-7/sbref0024http://refhub.elsevier.com/S1077-3142(16)30080-7/sbref0024http://refhub.elsevier.com/S1077-3142(16)30080-7/sbref0025http://refhub.elsevier.com/S1077-3142(16)30080-7/sbref0025http://refhub.elsevier.com/S1077-3142(16)30080-7/sbref0025http://refhub.elsevier.com/S1077-3142(16)30080-7/sbref0025http://refhub.elsevier.com/S1077-3142(16)30080-7/sbref0026http://refhub.elsevier.com/S1077-3142(16)30080-7/sbref0026http://refhub.elsevier.com/S1077-3142(16)30080-7/sbref0026http://refhub.elsevier.com/S1077-3142(16)30080-7/sbref0026http://refhub.elsevier.com/S1077-3142(16)30080-7/sbref0026http://refhub.elsevier.com/S1077-3142(16)30080-7/sbref0026http://refhub.elsevier.com/S1077-3142(16)30080-7/sbref0027http://refhub.elsevier.com/S1077-3142(16)30080-7/sbref0027http://refhub.elsevier.com/S1077-3142(16)30080-7/sbref0027http://refhub.elsevier.com/S1077-3142(16)30080-7/sbref0028http://refhub.elsevier.com/S1077-3142(16)30080-7/sbref0028http://refhub.elsevier.com/S1077-3142(16)30080-7/sbref0028http://refhub.elsevier.com/S1077-3142(16)30080-7/sbref0028http://refhub.elsevier.com/S1077-3142(16)30080-7/sbref0029http://refhub.elsevier.com/S1077-3142(16)30080-7/sbref0029http://refhub.elsevier.com/S1077-3142(16)30080-7/sbref0029http://refhub.elsevier.com/S1077-3142(16)30080-7/sbref0029http://refhub.elsevier.com/S1077-3142(16)30080-7/sbref0030http://refhub.elsevier.com/S1077-3142(16)30080-7/sbref0030http://refhub.elsevier.com/S1077-3142(16)30080-7/sbref0030http://refhub.elsevier.com/S1077-3142(16)30080-7/sbref0030http://refhub.elsevier.com/S1077-3142(16)30080-7/sbref0030http://refhub.elsevier.com/S1077-3142(16)30080-7/sbref0031http://refhub.elsevier.com/S1077-3142(16)30080-7/sbref0031http://refhub.elsevier.com/S1077-3142(16)30080-7/sbref0031http://refhub.elsevier.com/S1077-3142(16)30080-7/sbref0031http://refhub.elsevier.com/S1077-3142(16)30080-7/sbref0031http://refhub.elsevier.com/S1077-3142(16)30080-7/sbref0032http://refhub.elsevier.com/S1077-3142(16)30080-7/sbref0032http://refhub.elsevier.com/S1077-3142(16)30080-7/sbref0032http://refhub.elsevier.com/S1077-3142(16)30080-7/sbref0032http://refhub.elsevier.com/S1077-3142(16)30080-7/sbref0032http://refhub.elsevier.com/S1077-3142(16)30080-7/sbref0032http://refhub.elsevier.com/S1077-3142(16)30080-7/sbref0032http://refhub.elsevier.com/S1077-3142(16)30080-7/sbref0032http://refhub.elsevier.com/S1077-3142(16)30080-7/sbref0033http://refhub.elsevier.com/S1077-3142(16)30080-7/sbref0033http://refhub.elsevier.com/S1077-3142(16)30080-7/sbref0033http://refhub.elsevier.com/S1077-3142(16)30080-7/sbref0033http://refhub.elsevier.com/S1077-3142(16)30080-7/sbref0033http://refhub.elsevier.com/S1077-3142(16)30080-7/sbref0034http://refhub.elsevier.com/S1077-3142(16)30080-7/sbref0034http://refhub.elsevier.com/S1077-3142(16)30080-7/sbref0034http://refhub.elsevier.com/S1077-3142(16)30080-7/sbref0034http://refhub.elsevier.com/S1077-3142(16)30080-7/sbref0034http://refhub.elsevier.com/S1077-3142(16)30080-7/sbref0035http://refhub.elsevier.com/S1077-3142(16)30080-7/sbref0035http://refhub.elsevier.com/S1077-3142(16)30080-7/sbref0035http://refhub.elsevier.com/S1077-3142(16)30080-7/sbref0035http://refhub.elsevier.com/S1077-3142(16)30080-7/sbref0035http://refhub.elsevier.com/S1077-3142(16)30080-7/sbref0035http://refhub.elsevier.com/S1077-3142(16)30080-7/sbref0036http://refhub.elsevier.com/S1077-3142(16)30080-7/sbref0036http://refhub.elsevier.com/S1077-3142(16)30080-7/sbref0036http://refhub.elsevier.com/S1077-3142(16)30080-7/sbref0036http://refhub.elsevier.com/S1077-3142(16)30080-7/sbref0036http://refhub.elsevier.com/S1077-3142(16)30080-7/sbref0037http://refhub.elsevier.com/S1077-3142(16)30080-7/sbref0037http://refhub.elsevier.com/S1077-3142(16)30080-7/sbref0037http://refhub.elsevier.com/S1077-3142(16)30080-7/sbref0037http://refhub.elsevier.com/S1077-3142(16)30080-7/sbref0037http://refhub.elsevier.com/S1077-3142(16)30080-7/sbref0038http://refhub.elsevier.com/S1077-3142(16)30080-7/sbref0038http://refhub.elsevier.com/S1077-3142(16)30080-7/sbref0038http://refhub.elsevier.com/S1077-3142(16)30080-7/sbref0038http://refhub.elsevier.com/S1077-3142(16)30080-7/sbref0039http://refhub.elsevier.com/S1077-3142(16)30080-7/sbref0039http://refhub.elsevier.com/S1077-3142(16)30080-7/sbref0039http://refhub.elsevier.com/S1077-3142(16)30080-7/sbref0039http://refhub.elsevier.com/S1077-3142(16)30080-7/sbref0040http://refhub.elsevier.com/S1077-3142(16)30080-7/sbref0040http://refhub.elsevier.com/S1077-3142(16)30080-7/sbref0040http://refhub.elsevier.com/S1077-3142(16)30080-7/sbref0040http://refhub.elsevier.com/S1077-3142(16)30080-7/sbref0040