Validation and Automatic Repair of Planar Partitions Using a Constrained Triangulation

Ken Arroyo Ohori, Hugo Ledoux, Martijn Meijers

This is an author’s version of the paper. The authoritative version is:

Validation and automatic repair of planar partitions using a constrained triangulation. Ken Arroyo Ohori, Hugo Ledoux and Martijn Meijers. Photogrammetrie, Fernerkundung, Geoinformation 5, October 2012, pp. 613–630. ISSN: 1432-8364. doi: 10.1127/1432-8364/2012/0143

Related source code is available at: https://github.com/tudelft3d/pprepair

Planar partitions are frequently used to model, among others, land cover, cadastral parcels and administrative boundaries. In practice, they are often stored as a set of individual polygons to which attributes are attached (e.g. with the Simple Features paradigm), causing different errors and inconsistencies (e.g. gaps, overlaps and disconnected polygons), which are introduced during their creation, manipulation and exchange. These errors severely hamper the use of planar partitions in other software (e.g. due to false assumptions causing erroneous calculations). Existing approaches to validate planar partitions involve first building a planar graph of the polygons and enforcing constraints; repair is then done by snapping vertices and edges of this graph. We argue that these approaches have many shortcomings in terms of complexity, numerical robustness and difficulty of implementation, and do not guarantee valid results. Furthermore, they are semi-automatic, requiring manual user intervention. We propose in this paper a novel method to validate and automatically repair planar partitions. It uses a constrained triangulation of the polygons as a base (which by definition is a planar partition), and only simple operations are needed (i.e. labelling of triangles) to both validate and repair. Perhaps the biggest advantage of our method is that we can guarantee that a planar partition is valid after repair. In the paper we describe the details of our method, our implementation, and the experiments we have done with real-world datasets. We show that our implementation scales to big datasets and that it offers better capabilities and overall performance than existing solutions.



1 Introduction

Planar partitions are frequently used in GIS to model concepts such as land cover, the cadastre or the administrative boundaries of a given country. As shown in Figure 1, a planar partition is a subdivision of a polygonal subset of the plane into non-overlapping polygons. In practice, planar partitions are often represented and stored in a computer as a set of individual polygons to which one or more attributes are attached, and the topological relationships between polygons are not explicitly stored (shared boundaries are thus represented and stored twice). The preferred method of practitioners is representing polygons according to the Simple Features specification [OGC 2006], for instance as an ESRI Shapefile [ESRI 1998] or in a database such as PostGIS [1].

Figure 1: Part of the Corine Land Cover dataset for the region around Delft, The Netherlands.

If a planar partition is stored as a set of individual polygons, then in practice errors, mistakes and inconsistencies will often be introduced when the planar partition is built, updated or exchanged. Examples of common errors are overlapping polygons, gaps between polygons and polygons not connected to the others. This can be, among others, due to human error, the use of floating-point arithmetic or limited precision [Schirra 1997]. These errors can have catastrophic consequences for practitioners, since most software and algorithms using planar partitions as input assume that this input is valid. At best, erroneous results are returned; at worst, it causes a software failure, often without any warning to the user. Moreover, such problems are often not visible at the scale that the data is usually viewed, exacerbating the problem [Laurini and Milleret-Raffort 1994].

[1] http://postgis.refractions.net

Solving that issue entails working on two related problems: (1) how to identify errors in a planar partition, and (2) how to repair these errors. As described in Section 2, both problems have been tackled in the past with the creation of a planar graph of the input. The validation, the simpler of the two problems, is usually implemented as a set of topological and geometrical constraints that the planar graph must have. As for the repair, it is usually performed by snapping together the vertices and edges of the graph, or by using topological information. As we discuss in Section 2, both approaches have drawbacks for users: the former method is error-prone (topological inconsistencies can be created) and the latter is only semi-automatic (and in practice, real-world datasets can easily contain several hundred errors).

We present in this paper a novel method to both validate and automatically repair planar partitions stored according to the Simple Features specification. Our method, which is an extension of our preliminary results [Ledoux and Meijers 2010], uses a constrained triangulation (CT) of the polygons as a support (which is by definition a planar partition), and both the validation and the repair functions are performed with relatively simple operations. These are the labelling of triangles and standard graph traversal algorithms (such as depth-first search). Since errors are repaired by re-labelling triangles (vertices are never moved), we can guarantee that a given repair operation will preserve the topological consistency of the whole planar partition. We describe in Section 3 how the CT is used, how the polygons are labelled, how the validation is performed, and how we can automatically repair a planar partition. Moreover, we describe six different repair operations that can be used to obtain different output.

We have implemented the method in C++, and its most relevant details are discussed in Section 4. Our software takes as input polygons stored according to the Simple Features specification, validates them, repairs them if they contain errors, and returns a new set of polygons that is guaranteed to be a valid planar partition. We also report in that section our experiments with several real-world datasets (some of them rather large), and we compare our method and its implementation to alternatives, both for validation and for repair. Finally, we discuss the advantages and disadvantages of using our method, and the conclusions drawn from this, in Section 5.

2 Related Work

Since a planar partition is formed by a set of individual polygons, we first discuss what a valid polygon is in our context, and then we review existing methods to validate and repair planar partitions.

2.1 Simple Features and validity of simple polygons

While there are several definitions of what constitutes a valid polygon (see van Oosterom et al. [2004] for an extensive discussion), we use in the following the international standard Simple Features [OGC 2006], with the addition of the ISO 19107 [ISO 2003] polygon orientation rules. Simple Features defines a polygon as follows: “A Polygon is a planar Surface defined by 1 exterior boundary and 0 or more interior boundaries. Each interior boundary defines a hole in the Polygon”. In the specification, six assertions are given that together define a valid polygon. Essential for a valid polygon is that the boundaries of the polygon must define one connected area. Additionally, a polygon can contain holes. We say that the exterior boundary of the polygon is the outer ring, and a hole is an inner ring. These holes can be filled by one or more polygons, which can recursively contain holes which are filled by other polygons. Observe also that holes are allowed to interact with each other and the outer boundary under certain conditions, e.g. they are allowed to touch at one point, as long as the interior of the polygon stays one connected area. Each polygon is stored independently from other polygons, and it is not possible to store topological relationships between the polygons. The ISO 19107 specification [ISO 2003] is more ambiguously defined, but it does establish orientation rules (counterclockwise for the outer ring, clockwise for the inner ones), which we use in our output.
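The orientation rule mentioned above (counterclockwise outer ring, clockwise inner rings) can be illustrated with the signed area of a ring. The following is only a minimal sketch and not the code of our implementation: the function names are hypothetical, and rings are assumed to be lists of (x, y) tuples without a repeated closing vertex.

```python
def signed_area(ring):
    """Shoelace formula: positive for a counterclockwise ring, negative for clockwise."""
    area = 0.0
    n = len(ring)
    for i in range(n):
        x1, y1 = ring[i]
        x2, y2 = ring[(i + 1) % n]
        area += x1 * y2 - x2 * y1
    return area / 2.0


def orient(ring, counterclockwise=True):
    """Return the ring with the requested orientation, reversing it if needed."""
    is_ccw = signed_area(ring) > 0
    return list(ring) if is_ccw == counterclockwise else list(reversed(ring))


# Outer ring is forced counterclockwise, the hole clockwise.
outer = orient([(0, 0), (0, 4), (4, 4), (4, 0)], counterclockwise=True)
hole = orient([(1, 1), (2, 1), (2, 2), (1, 2)], counterclockwise=False)
```

The sign test assumes a ring with non-zero area; degenerate rings would need separate handling.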

The validation of a single polygon is possible with different libraries, GEOS [2] and JTS [3] being two widely used open-source examples.

The repair of single polygons is a less documented topic than their validation. Different software vendors offer tools to help identify and semi-automatically repair broken polygons. Examples are ST_MakeValid() from PostGIS and the constraints in 1Spatial Radius Topology. The method we present in this paper has been adapted to automatically repair common errors in individual polygons, e.g. wrong ring orientation or holes that split the interior of a polygon (see Ledoux et al. [2012]). However, we focus in this paper on the validation and repair of planar partitions only, and we assume that the input polygons are individually valid.

2.2 Validation of a Planar Partition Using a Planar Graph

Assuming that individual polygons have been deemed to be valid, it is possible to test the validity of a planar partition by identifying the two types of invalid configurations: overlaps and gaps.

[2] Geometry Engine, Open Source: http://trac.osgeo.org/geos
[3] Java Topology Suite: http://www.vividsolutions.com/jts/jtshome.htm

If individual polygons are checked without building a planar graph (or an indexing structure), finding overlaps involves checking whether any possible pair of polygons overlap. This is a computationally expensive operation to make (quadratic behaviour), even when heuristics to speed up the process are used [Badawy and Aref 1999, Kirkpatrick et al. 2002]. Additionally, robustness issues are significant in polygon intersection tests [Hoffmann et al. 1988]. Finding the potential gaps in a planar partition is even more problematic. For this, computing the union of the entire set of polygons is required, which is also computationally expensive [Margalit and Knott 1989, Rivero and Feito 2000].
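To make the quadratic behaviour concrete, the sketch below enumerates every pair of polygons and keeps only those whose bounding boxes intersect. The bounding-box filter is a generic heuristic chosen for illustration, not necessarily the one used in the cited works, and all names are hypothetical. Even with the filter, n(n-1)/2 box comparisons are made, and each surviving pair still needs an exact (and numerically delicate) intersection test.

```python
from itertools import combinations


def bbox(polygon):
    """Axis-aligned bounding box (xmin, ymin, xmax, ymax) of a list of (x, y) vertices."""
    xs = [x for x, _ in polygon]
    ys = [y for _, y in polygon]
    return min(xs), min(ys), max(xs), max(ys)


def boxes_intersect(a, b):
    """True if two boxes share at least one point (touching boxes count)."""
    return a[0] <= b[2] and b[0] <= a[2] and a[1] <= b[3] and b[1] <= a[3]


def candidate_pairs(polygons):
    """All index pairs (i, j), i < j, whose bounding boxes intersect."""
    boxes = [bbox(p) for p in polygons]
    return [
        (i, j)
        for i, j in combinations(range(len(polygons)), 2)  # n(n-1)/2 pairs
        if boxes_intersect(boxes[i], boxes[j])
    ]


squares = [
    [(0, 0), (2, 0), (2, 2), (0, 2)],
    [(1, 1), (3, 1), (3, 3), (1, 3)],
    [(10, 10), (11, 10), (11, 11), (10, 11)],
]
pairs = candidate_pairs(squares)
```

Note that in a planar partition adjacent polygons always have touching boxes, so the filter removes far fewer pairs than one might hope.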

The validation process can be sped up by first building a planar graph of the input polygons, which is afterwards checked for consistency. It should first be noticed that while different approaches are available to construct a planar graph [Shamos and Hoey 1976, van Roessel 1991], it is still sometimes difficult, especially if the polygon contains holes. The graph of the boundary can then be unconnected, and extra machinery is necessary to represent the knowledge of holes in the graph structure. The fact that holes are also allowed to touch complicates the task of validation even further, since holes cannot be assumed to form an unconnected planar graph.

Based on this graph, Plümer and Gröger [1997] specify a list of minimal mathematical axioms that can be used to check the validity of a planar partition: no dangling edges, no zero-length edges, planarity, no holes, no self-intersections, no overlaps, and having a connected graph. It is important to note that Plümer and Gröger base their axioms on concepts from graph theory, but they also highlight the fact that a graph-based approach alone is not enough: the graph has to be augmented with geometrical knowledge (each vertex has geometry attached, i.e. the coordinates of points have to be stored). Validation is thus underpinned by both geometrical and topological concepts, and systems thus have to deal with those two concepts at the same time. The method we propose in this paper, using a constrained triangulation, permits us to do exactly this: to embed both geometry and topology in the same structure.

2.3 Repair Using Point and Edge Snapping and Splitting

The most common method for planar partition repair is based on the assumption that polygons approximately match each other at their common boundaries. This implies that they should be within a certain distance of each other along those edges. If, additionally, all parts further apart than this value are known not to be common boundaries, it is possible to “snap” together polygons that are closer to each other than this threshold, while keeping the rest untouched. This method of planar partition repair is available in many GIS packages, including ArcGIS [4], FME [5], GRASS [6] and Radius Topology [7].

Since thresholds are central to this method, it is of utmost importance to select a good threshold value, something that is completely different in each dataset. For planar partition repair to be successful using this method, such a threshold should be chosen in a careful manner and always comply with a few conditions. These have been summarised as follows:

1. Adjacent polygons should not be further apart than this threshold along any part of their common boundaries (shown as the maximum threshold in Figure 2(a)). Otherwise, gaps cannot be fixed.

2. Adjacent polygons should not overlap each other in areas which are further inwards than this threshold from their common boundaries (shown as the maximum threshold in Figure 2(b)). Otherwise, overlaps cannot be fixed.

3. None of the vertices of a polygon should be closer to each other than this threshold, including non-consecutive vertices (shown as the minimum thresholds in Figure 2). Otherwise, they might be snapped together, creating repeated vertices, disjoint regions, or various topological problems.

4. None of the vertices of a polygon should be closer than this threshold to any non-incident edge. Otherwise, they might be snapped together, creating disjoint regions or various topological problems.

[4] http://www.esri.com/software/arcgis/index.html
[5] http://www.safe.com/fme
[6] http://grass.osgeo.org
[7] http://www.1spatial.com/software/radius_topology

This threshold value is usually manually determined, either by trial and error or by analysing certain properties of the dataset(s) involved (e.g. point spacing, precision or map scale). However, it is often hard to find an optimal threshold for a certain dataset, since ensuring that it works well for every part of a dataset is unrealistic. Moreover, sometimes such a threshold does not even exist (e.g. because point spacing in some places might be smaller than the width of the gaps and overlaps present).
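The case in which no threshold exists can be made concrete with a small sketch. It assumes that the largest gap or overlap width has already been measured by some other means (the parameter `max_gap_width` is hypothetical), and it only checks the vertex-to-vertex spacing of condition 3, not the vertex-to-edge distance of condition 4. Rings are lists of (x, y) tuples, and repeated positions within a ring are ignored.

```python
from itertools import combinations
from math import dist


def min_vertex_spacing(polygons):
    """Smallest distance between two distinct vertex positions of any single polygon."""
    best = float("inf")
    for ring in polygons:
        # Non-consecutive vertices are included, as condition 3 requires.
        for p, q in combinations(set(ring), 2):
            best = min(best, dist(p, q))
    return best


def feasible_threshold(max_gap_width, polygons):
    """Return the open interval (lower, upper) of usable thresholds, or None if empty."""
    lower = max_gap_width                 # conditions 1 and 2: must bridge every gap/overlap
    upper = min_vertex_spacing(polygons)  # condition 3: must not merge distinct vertices
    return (lower, upper) if lower < upper else None


coarse = [[(0, 0), (10, 0), (10, 10), (0, 10)]]
dense = [[(0, 0), (0.2, 0), (10, 0), (10, 10), (0, 10)]]
```

With a gap width of 0.5, the first dataset admits a threshold while the second does not, because two of its vertices are only 0.2 apart.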

Since the aforementioned conditions are frequently not met, or are not checked beforehand, and it is still necessary to perform repair of a dataset, snapping is often performed nevertheless, possibly creating invalid polygons and/or planar partitions, or significantly changing the topology of the existing features. Two examples of this phenomenon are shown in Figures 3 and 4.

Figure 3: Spikes and punctures can be created by snapping, since the bases of these elongated forms (encircled) might be narrower than the threshold while their length is not. Panels: (a) before snapping; (b) after snapping.

While these examples show that snapping is not problem-free, it is important to note that commercial GIS packages often implement more complex snapping options (such as point-to-edge, edge-to-edge, or using a reference dataset). These options can help to solve a problematic case, but can also have undesired consequences, such as changing the topology of the polygons. Another problem is that post-processing operations to clean resulting polygons might be required (e.g. disposing of polygons with small areas, removing redundant lines, thresholds for minimum angles, etc.), which could again create invalid configurations, requiring iterative validation or repair processes.

Figure 4: Polygons can be split by snapping, since some parts of them might be narrower than the threshold (encircled). While this result does not create an invalid planar partition, it can change the number of polygons present and their topological relations, and can therefore be undesirable. Panels: (a) before snapping; (b) after snapping.

2.4 Repair Using Topological Information

A different approach for planar partition repair, based on topological information, is available in some software.

GRASS also creates a graph, using edges as a base structure instead of triangles, and could be used to (manually) detect overlaps and gaps based on the number of labels, using the functions v.what and d.what.vect. However, there is no simple automated procedure to get the number of labels at a certain point, which makes it very cumbersome and time-consuming to use GRASS for this purpose.

Meanwhile, ArcGIS provides a more complete solution, using a method similar in some ways to the one developed and described in Section 3. It involves using the Geodatabase feature of the software with some combined validation rules (e.g. must not overlap and must not have gaps). However, fixing every problematic area in an appropriate manner requires extensive user intervention (see Figure 5), since the best choice for each case depends on the specific configuration of the error.

Figure 2: Defining a threshold for vertex, edge and face snapping. The threshold to use should be larger than the largest minimum distance between the matching boundaries, and smaller than the minimum distance between vertices. Panels: (a) gaps between polygons; (b) overlapping polygons. Each panel marks a minimum distance threshold and a maximum distance threshold.

Since neither of the aforementioned programs offers an automated process to correctly solve this problem (and GRASS lacks the ability to visualise problem areas), they are not really comparable to our solution. A planar partition can easily contain tens of thousands of polygons, possibly generating thousands of errors which need to be checked and repaired semi-automatically.

3 Validation and Automatic Repair Using a Constrained Triangulation

Our approach to validation and automatic repair of planar partitions uses a constrained triangulation (CT) as a supporting structure because it has many good properties, including the following:

• It is by definition a planar partition. Therefore, as long as we keep the information about which polygon each triangle belongs to, the reconstructed polygons will be either a valid planar partition or multiple ones.

• It can be built quickly, in $O(n \log n)$, with a variety of approaches [Guibas and Stolfi 1985, Mücke et al. 1999, Clarkson et al. 1992] [8].

• Changes to the triangulation (e.g. adding a new constrained edge) are local, and therefore fast.

• Constrained edges can usually be added in constant time, being only significantly slower (and more complex) when there is an intersection with an existing constrained edge [Shewchuk 1997b].

• Implementation-wise, several stable and fast triangulation libraries exist, including CGAL [CGAL 2011], Triangle [Shewchuk 1997a] and GTS [GTS 2006].

The general workflow of our approach to both validate and repair a planar partition is as follows:

1. the CT of the input segments forming the polygons is constructed;

2. each triangle in the CT is labelled with the label of the polygon inside which it is located;

3. problems are detected by identifying triangles having no or multiple labels, and by verifying the connectivity between triangles;

4. the problems are repaired by re-labelling triangles, to ensure that each triangle has exactly one label;

5. the polygons are extracted from the triangulation (polygons modelled with the Simple Features specification).

[8] The actual computational complexity can be $O(n \log n + k)$, with $k$ being the number of edge-edge intersections, which could conceivably even be $n^2$. However, $k \ll n$ for most GIS datasets.

Figure 5: Planar partition repair in ArcGIS. The user is expected to zoom in to a particular error, analyse the situation (e.g. by looking at the properties from the surrounding polygons) and make a decision to assign the problematic region to a certain polygon. More than 11 000 errors were detected in this tile of the Corine dataset. Panels: (a) viewing a topology error in ArcGIS; (b) assigning an overlapping region to one of the polygons involved.

As mentioned previously, for this workflow we assume that each input polygon is individually valid. We describe in the following section the concepts needed, and we give a detailed description of the different steps.

3.1 Triangulation of a polygon and constrained triangulation

A triangulation subdivides an area into non-overlapping triangles. Using a constrained triangulation, every line segment that defines the boundary of a polygon is ensured to appear as an edge in the triangulation. It is known that any polygon (also with holes) can be triangulated without adding extra vertices [de Berg et al. 2008, Shewchuk 1997a]. Figure 6 shows an example.

Figure 6: (a) A polygon with 4 holes. (b) The constrained triangulation of the segments of this polygon.

In our approach, the triangulation is performed by constructing incrementally a CT of all the segments representing the boundaries (outer + inner) of each polygon. If the set of input polygons forms a planar partition, then each segment will be inserted twice (except those forming the outer boundary of the set of input polygons). This is usually not a problem for triangulation libraries, because they ignore points and segments at the same location (as is the case with the solution we use, see Section 4). When segments are found to intersect, they are split with a new point created at the intersection. This is the only situation in which the generation of new points is required.
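The observation that segments are inserted twice can be checked on a toy input. The sketch below is an illustration only and does not reproduce what a triangulation library does internally. It assumes rings given as lists of (x, y) tuples without a repeated closing vertex, and it assumes that shared boundaries use exactly the same coordinates, which real datasets frequently violate. The function name is hypothetical.

```python
from collections import Counter


def segment_counts(polygons):
    """Count how often each undirected boundary segment occurs over all rings."""
    counts = Counter()
    for ring in polygons:
        n = len(ring)
        for i in range(n):
            # frozenset makes (p, q) and (q, p) the same segment
            counts[frozenset((ring[i], ring[(i + 1) % n]))] += 1
    return counts


# Two unit squares sharing the edge from (1, 0) to (1, 1).
left = [(0, 0), (1, 0), (1, 1), (0, 1)]
right = [(1, 0), (2, 0), (2, 1), (1, 1)]
counts = segment_counts([left, right])
shared = [s for s, c in counts.items() if c == 2]
```

Here the shared edge is counted twice and the six segments on the outer boundary once, matching the description above.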

Notice that our approach requires only a constrained triangulation, and not one that fulfils the Delaunay criterion [Shewchuk 1997a]. However, having well-shaped triangles is useful for repair purposes, and does not significantly increase the processing time. Therefore, our implementation actually constructs and uses a constrained Delaunay triangulation.

3.2 Labelling the triangles of a Planar Partition

Labelling a triangle means assigning it label(s) for the polygon(s) that it belongs to (if two input polygons overlap, each triangle in the overlapping region should have the labels of the two polygons).

In our previous work on validation [Ledoux and Meijers 2010], we used the centroid of a polygon to start the labelling process, but this method is prone to errors (if, for instance, the calculated centroid is outside or on the boundary of the polygon), and does not permit us to differentiate gaps from overlaps.

To solve these problems, we store information about the constrained edges of the CT. Since it is known that the input rings are closed and have a known orientation (according to the ISO 19107 orientation rules), it is also known on which side of a certain line segment the interior of the polygon lies. This property is used for robust labelling of each polygon. Triangles adjacent to the outer ring of the polygons are labelled first, and this is later expanded to triangles further in the interior of the polygon, recursively labelling adjacent unlabelled faces as long as no constrained edges are crossed. After this operation, all triangles that are part of any polygon are labelled, with overlapping regions having multiply labelled triangles. However, holes are then indistinguishable from triangles outside the planar partition, since both have zero labels. Therefore, a special label is created for all the triangles outside the planar partition, referred to as the “universe” label, which are labelled by recursively labelling adjacent triangles from any triangle known to lie in the exterior of the planar partition. We exploit the concept of the “far-away point” to achieve this [Liu and Snoeyink 2006], which is used by several implementations and is also known as the “big triangle” [Facello 1995].
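The labelling step can be sketched as a flood fill that never crosses a constrained edge. This is a simplified illustration under stated assumptions, not our C++ implementation: the triangulation is reduced to a hypothetical `adjacency` dictionary mapping each triangle id to `(neighbour, is_constrained)` pairs, the seed triangles (those lying on the interior side of a polygon's rings) are assumed to be known already, and an explicit queue replaces the recursion.

```python
from collections import deque


def flood_label(adjacency, seeds, label, labels):
    """Add `label` to every triangle reachable from `seeds` without crossing a constrained edge."""
    queue = deque(seeds)
    seen = set(seeds)
    while queue:
        t = queue.popleft()
        labels.setdefault(t, set()).add(label)
        for neighbour, constrained in adjacency[t]:
            if not constrained and neighbour not in seen:
                seen.add(neighbour)
                queue.append(neighbour)
    return labels


# A strip of five triangles 0..4. Polygon A covers 0-2, polygon B covers 2-3,
# so triangle 2 is an overlap and triangle 4 lies outside the partition.
adjacency = {
    0: [(1, False)],
    1: [(0, False), (2, True)],
    2: [(1, True), (3, True)],
    3: [(2, True), (4, True)],
    4: [(3, True)],
}
labels = {}
flood_label(adjacency, [0, 2], "A", labels)
flood_label(adjacency, [2, 3], "B", labels)
flood_label(adjacency, [4], "universe", labels)
```

Seeding from every triangle along a polygon's rings, rather than from a single point such as the centroid, is what lets the overlapping triangle receive both labels even though it is enclosed by constrained edges.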

3.3 Validation

If the set of input polygons forms a planar partition, then all the triangles will be labelled with one and only one label. The problems are easily detected:

• Gaps are detected by finding triangles without any labels.

• Overlaps are detected by finding triangles with two or more labels.

• Disjoint regions are detected by identifying regions separated by the “universe” label. This is done by starting at a given triangle and doing a breadth-first search on the dual of the triangulation, without visiting the triangles labelled as “universe”. If all the triangles can be reached, then no polygons of the planar partition are disjoint.
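The three checks above can be sketched on a labelled triangulation as follows. The data layout is an assumption made for illustration: `labels` maps a triangle id to a set of labels, `neighbours` maps a triangle id to the ids of the triangles sharing an edge with it (the dual graph), and the constant `UNIVERSE` stands for the “universe” label.

```python
from collections import deque

UNIVERSE = "universe"


def find_problems(labels):
    """Return (gaps, overlaps): triangles with no label and with two or more labels."""
    gaps = [t for t, ls in labels.items() if len(ls) == 0]
    overlaps = [t for t, ls in labels.items() if len(ls) >= 2]
    return gaps, overlaps


def is_connected(neighbours, labels):
    """Breadth-first search on the dual graph that never visits universe triangles."""
    inside = [t for t in neighbours if UNIVERSE not in labels[t]]
    if not inside:
        return True
    seen = {inside[0]}
    queue = deque(seen)
    while queue:
        t = queue.popleft()
        for n in neighbours[t]:
            if n not in seen and UNIVERSE not in labels[n]:
                seen.add(n)
                queue.append(n)
    # Every non-universe triangle must have been reached.
    return len(seen) == len(inside)


neighbours = {0: [1], 1: [0, 2], 2: [1, 3], 3: [2]}
with_errors = {0: {"a"}, 1: set(), 2: {"a", "b"}, 3: {UNIVERSE}}
disjoint = {0: {"a"}, 1: {UNIVERSE}, 2: {"b"}, 3: {"b"}}
```

In the first labelling, triangle 1 is a gap and triangle 2 an overlap; in the second, the universe triangle separates polygon a from polygon b, so the search started at triangle 0 cannot reach the rest.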

3.4 Repair Operations

The greatest benefits of using a labelled triangulation for planar partition repair stem from the fact that, while repair operations are performed, the validity of the planar partition is always kept, together with the integrity of the data. Unlike with snapping, vertices are not required to be moved during the process, and unlike snapping, repair is performed using local criteria instead of global ones (in snapping, the threshold is usually fixed for the whole dataset). This comes as a contrast to other methods, where care needs to be taken to ensure that the (geometric or topological) validity is kept.

Figure 7 shows the standard steps required in a repair operation. In order to avoid order dependency when repairing, the repair operation itself is always performed after all the choices for which label to assign have been made.
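The separation between deciding and applying can be sketched in two phases. The code below is an illustration under assumptions rather than our implementation: `labels` maps triangle ids to sets of labels, `neighbours` maps triangle ids to adjacent triangle ids, and the choice function (the label seen most often among the neighbouring triangles) is only a placeholder. Special handling of the “universe” label is left out.

```python
from collections import Counter


def most_common_neighbour_label(t, neighbours, labels):
    """Placeholder criterion: the label occurring most often among the neighbours of t."""
    counts = Counter(l for n in neighbours[t] for l in labels[n])
    return counts.most_common(1)[0][0] if counts else None


def repair(neighbours, labels, choose):
    # Phase 1: decide every new label from the untouched, original labelling.
    decisions = {}
    for t, ls in labels.items():
        if len(ls) != 1:  # gap (0 labels) or overlap (2 or more labels)
            new_label = choose(t, neighbours, labels)
            if new_label is not None:
                decisions[t] = new_label
    # Phase 2: apply all decisions at once, on a copy.
    repaired = dict(labels)
    for t, new_label in decisions.items():
        repaired[t] = {new_label}
    return repaired


# Triangle 1 is a gap with one red and two blue neighbours.
neighbours = {0: [1], 1: [0, 2, 3], 2: [1], 3: [1]}
labels = {0: {"red"}, 1: set(), 2: {"blue"}, 3: {"blue"}}
repaired = repair(neighbours, labels, most_common_neighbour_label)
```

Because every decision reads only the original labelling, the result does not depend on the order in which the triangles are visited. A triangle whose neighbours are all unlabelled is left as it is in this sketch and would need a further pass.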

In particular, we propose six different repair operations that can be used to fix gaps and overlaps. These are shown in Table 1, and all imply re-labelling triangles. Four of them use triangles as a base (i.e. the label assigned is based only on that of its three neighbouring triangles), which is fast and modifies the area of each input polygon the least. Despite their simplicity, they offer substantial control over the results. For instance, triangle by number of neighbours and triangle by absolute majority differ from each other only in their handling of overlapping faces, but the former is better for large overlapping regions, while the latter is better in fixing small problems. Having well-shaped (Delaunay) triangles is most useful for triangle-based repair functions. Two of them use regions of adjacent triangles with equivalent sets of labels, which is slower than a triangle-based method, but yields results that can be cartographically more pleasing. An interesting repair operation for practitioners is the one in which a priority of labels is used, i.e. in case of gaps/overlaps, the labels in the triangle (overlap) or in the adjacent polygons (gap) are ordered according to a user-defined priority, and the highest priority is assigned to the problematic triangles. Notice that these repair operations can be used one after the other (in a hierarchical manner): for instance, if first the repair according to the longest boundary is used, but one zone has two or more boundaries with exactly the same length, then the deadlock can be solved by choosing one randomly. A sample of the results obtained with different repair operators is also shown in Figure 8.

Table 1: The repair operations currently implemented in our software. The types of operations are defined in the map algebra classification [Tomlin 1994].

Repair operation                   Type             Criteria
Triangle by priority list          Varies           Label that has the highest priority according to a predefined priority list
Triangle by number of neighbours   Focal            The label present in the largest number of adjacent faces, overlaps included
Triangle by absolute majority      Focal            Label present in two or more valid adjacent faces
Triangle by longest boundary       Focal            Label present along the longest portion of the boundary of the adjacent faces
Regions by longest boundary        Focal of zonal   Label present along the longest portion of the boundary of the adjacent faces
Regions by random neighbour        Focal of zonal   Random label from the adjacent faces
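The hierarchical use of repair operations can be sketched as a chain of criteria, where the next one is only consulted when the previous one cannot decide. The sketch assumes that the length of boundary shared with each neighbouring label has already been computed and is passed in as a hypothetical `boundary_lengths` dictionary; the function names are illustrative and do not correspond to our software.

```python
import random


def by_longest_boundary(boundary_lengths):
    """Label with the strictly longest shared boundary, or None on an exact tie."""
    if not boundary_lengths:
        return None
    best = max(boundary_lengths.values())
    winners = [l for l, length in boundary_lengths.items() if length == best]
    return winners[0] if len(winners) == 1 else None


def by_random_neighbour(boundary_lengths, rng):
    """Random label among the neighbouring ones (sorted first for reproducibility)."""
    return rng.choice(sorted(boundary_lengths)) if boundary_lengths else None


def repair_region(boundary_lengths, rng=random):
    """Try the longest boundary first; fall back to a random neighbour on a deadlock."""
    label = by_longest_boundary(boundary_lengths)
    if label is None:
        label = by_random_neighbour(boundary_lengths, rng)
    return label


clear_case = repair_region({"red": 5.0, "blue": 3.0})
tie_case = repair_region({"red": 4.0, "blue": 4.0}, random.Random(1))
```

Passing a seeded `random.Random` instance makes the fallback reproducible, which can be convenient when the same dataset is repaired more than once.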

More repair operations, based on extensions to the idea of labelling triangles/regions, can be further developed. For instance, triangles could be split to subdivide an area with problems (as in Bader and Weibel [1997]), or sliver triangles/regions could be discarded (and then filled during repair).

3.5 Extraction of Polygons From a Triangulation

Starting from a labelled triangulation, it is possible to reconstruct the individual polygons of a planar partition to conform to valid polygons according to the ISO 19107 and the Simple Features specifications, which allows users to incorporate automatic validation and repair in their workflow.

We do this operation polygon by polygon. We start at an unprocessed triangle and visit all the connected triangles having the same label (to reconstruct the polygon), marking the triangles as processed as we visit them. Note that since all these triangles are connected, the outer and inner boundaries of a polygon are all simple (non self-intersecting). We repeat this operation until all the triangles have been processed.

Figure 7: The steps in a generic repair operation. (a) A set of overlapping polygons (red and blue) that needs to be repaired. (b) The polygons are triangulated and each triangle is labelled with the polygons it belongs to. (c) An invalid region is found, defined as a set of contiguous polygons with an equivalent set of 0 or ≥ 2 labels. (d) The neighbours of the region are analysed to see how it should be re-labelled, and the result is saved onto a list (in the example, triangles A, B, C and D are all assigned red). (e) The triangles are relabelled according to the list. (f) The output polygons, which now have no gaps or overlaps.

For each polygon, we have to recover not only its outer boundary but also its inner boundaries, which are not connected. Observe that we cannot simply follow the original constrained edges, as these do not have any meaning after a planar partition has been repaired: the boundaries of the repaired polygons are instead formed by edges incident to two triangles having different labels.
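The criterion for boundary edges can be sketched directly. The code below only collects the unordered boundary edges of one label; it does not perform the traversal that chains them into rings. The data layout is an assumption: `triangles` maps a triangle id to its three vertex names and `labels` maps a triangle id to its single label after repair. Edges with only one incident triangle are also treated as boundary, which a triangulation with “universe” triangles would not need.

```python
def boundary_edges(triangles, labels, label):
    """Edges of triangles carrying `label` whose other side has a different label (or no triangle)."""
    incident = {}
    for t, verts in triangles.items():
        for i in range(3):
            edge = frozenset((verts[i], verts[(i + 1) % 3]))
            incident.setdefault(edge, []).append(t)
    result = []
    for edge, ts in incident.items():
        sides = [labels[t] for t in ts]
        if label in sides and (len(ts) == 1 or sides[0] != sides[1]):
            result.append(edge)
    return result


# A red square a-b-c-d split along the diagonal a-c, with a blue triangle attached to edge b-c.
triangles = {0: ("a", "b", "c"), 1: ("a", "c", "d"), 2: ("b", "e", "c")}
labels = {0: "red", 1: "red", 2: "blue"}
red_boundary = set(boundary_edges(triangles, labels, "red"))
```

The diagonal a-c separates two red triangles and is therefore not reported, while b-c is reported because it separates red from blue.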

For each polygon, we start at a triangle and move on to triangles having the same label.

Figure 8: Different repair operations used in the two polygons for the Arribes del Duero Natural Park in Spain (red/darker grey) and the International Douro Natural Park in Portugal (green/lighter grey). (a) The original polygons. (b) Repaired each triangle using the label adjacent along the longest boundary from the neighbouring triangles. (c) Repaired each region using a random label from the neighbouring triangles. (d) Repaired each region using the label adjacent along the longest boundary from the neighbouring triangles. All of them can be considered best by a certain criterion, like preserving the area ratio between the two polygons (b), smoothness of the boundary (d), or a balance between the two (c).

As the process goes, a single polyline that runs along all the boundaries of the polygon is generated. This involves a depth-first search (clockwise) that recurses until it reaches the boundary of a polygon, returning a long chain of edges, in a procedure similar to following the boundary edge by edge. The procedure is shown step by step in Figure 9. The polyline created with this method has "bridges", which allow us to keep all inner boundaries (holes) connected with the outer one in a manner that keeps the interior connected as well. These help to preserve connectivity and the relations between different (outer and inner) boundaries, but are removed later in the process (to conform to the Simple Features specification). Also, its orientation conveys the information of whether a section of it is part of an inner or an outer boundary.

Figure 9: The traversal order when navigating through triangles in a clockwise manner. Starting at A and the edge between A and B, the operations occur according to the encircled numbers. Black arrows denote when an unprocessed triangle is found, red arrows when it is not. Notice how the traversal is performed clockwise (shown in dark blue arrows) for both cases, despite starting from different sides.

This polyline is processed with a stack-based algorithm that generates separate closed rings for the outer boundary and each of the inner boundaries, collapsing the "bridges" that were generated. In order to do this, the polyline is cut at the places where more than two edges join, and these are joined in the correct order by keeping track of (yet) unclosed rings. When a new segment is being processed, it can be one of three options: one that completes a ring, one that is part of a bridge, or one that is not yet closed. Closed rings are stored and bridges are removed, while unclosed rings are saved in the stack until they can be popped to form a closed ring together with a new segment. This is shown in Figure 10.
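The idea of closing rings with a stack and discarding bridges can be sketched in a few lines of Python. This is a simplified, vertex-based variant written for illustration, not the segment-based procedure described above: it assumes the polyline is given as a closed walk of hashable vertex ids (first vertex repeated at the end), and it treats any closed piece with fewer than three vertices as a bridge.

```python
def split_into_rings(walk):
    """Split a closed walk with bridges into simple rings.

    walk: list of vertex ids, e.g. ["a", "e", ..., "a"] (hypothetical input).
    Returns the rings in the order in which they are closed.
    """
    stack = []      # vertices of the rings that are not closed yet
    index_of = {}   # vertex -> its position in the stack
    rings = []
    for v in walk:
        if v in index_of:
            # Coming back to a vertex on the stack closes a piece of the walk.
            i = index_of[v]
            ring = stack[i:]
            for u in ring[1:]:
                del index_of[u]
            del stack[i + 1:]          # keep v itself on the stack
            if len(ring) >= 3:
                rings.append(ring)     # a proper closed ring: store it
            # otherwise: a there-and-back bridge, which is simply dropped
        else:
            index_of[v] = len(stack)
            stack.append(v)
    return rings


# Square a-b-c-d with a triangular hole e-f-g, joined by the bridge a-e.
walk = ["a", "e", "f", "g", "e", "a", "b", "c", "d", "a"]
rings = split_into_rings(walk)
```

Here the hole is closed and stored first, the bridge a-e collapses to a two-vertex piece and is erased, and the outer ring is stored last. The orientation of each ring in the walk is preserved, so it can still be used to tell outer from inner boundaries.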

4 Implementation and Experiments

4.1 Implementation

An implementation of the algorithms described in Section 3 was written in the C++ programming language, using external libraries for some functionality. The developed software is called pprepair and is open source and freely available at http://tudelft3d.github.com/pprepair. C++ was selected in order to have plenty of control with regards to low-level details and to achieve good performance, which makes it possible to compare it with existing solutions. The libraries used are the OGR Simple Features Library⁹ (which allows input and output from a large variety of data formats common in GIS) and CGAL¹⁰ (which has support for many robust spatial data structures and the operations based on them; we use its constrained triangulation module).

⁹ http://www.gdal.org/ogr
¹⁰ Computational Geometry Algorithms Library, http://www.cgal.org

Figure 10: Processing the polyline shown, starting from vertex a and moving towards polyline A, the following operations occur: (1) A: unclosed → push; (2) B: unclosed → push; (3) C: unclosed → push; (4) D: unclosed → push; (5) E: unclosed → push; (6) F: closed → store; (7) pop → EG: closed → store; (8) pop → DH: degenerate → erase; (9) pop → CI: closed → store; (10) pop → BJ: degenerate → erase; (11) pop → AK: closed → store.

4.2 Experiments with real-world planar partition datasets

Figure 11: Overview of the four datasets: (a) E41N27, (b) 4tiles, (c) 16tiles, (d) Mexico.

We have made experiments with four freely available real-world datasets, i.e. we have validated and automatically repaired them with our implementation; the overview of these datasets is shown in Figure 11 and their properties in Table 2. The datasets are the following:

E41N27: Corine 2000¹¹ tile E41N27, which contains a shifted polygon (by about 10 cm), creating many small gaps and overlaps in the dataset. The snapping threshold has been set at 1 m.

4tiles: Corine 2000 tiles E39N32, E39N33, E40N32 and E40N33, which are known to have long and thin overlapping regions (< 1 mm) with each other. The snapping threshold has been set at 1 cm.

16tiles: 16 adjacent Corine 2000 tiles: E39N30, E39N31, E39N32, E39N33, E40N30, E40N31, E40N32, E40N33, E41N30, E41N31, E41N32, E41N33, E42N30, E42N31, E42N32, E42N33. Some have gaps between one another, some overlap, but they match within a few centimetres. The snapping threshold has been set at 10 cm.

Mexico: 1:1 000 000 scale land cover dataset from INEGI, consisting of over 26 000 polygons. Interestingly, it is mostly already valid (according to the Shapefile specification), but contains some very large polygons with tens of thousands of vertices.

¹¹ Corine is a land cover dataset for 32 European countries. It is freely available at http://www.eea.europa.eu/data-and-maps/data/corine-land-cover-2000-clc2000-seamless-vector-database

Table 2: Properties of the datasets used for the experiments.

           # polygons       # pts   # pts largest polygon   avg # pts per polygon
E41N27         14 969     496 303                  26 740                    33.7
4tiles          4 984     365 702                  16 961                    74.7
16tiles        63 868   6 622 133                  95 112                   103.7
Mexico         26 866   4 181 354                 117 736                   155.6

As a comparison, we have also tried to perform the same operations with other available software. While the capabilities for planar partition repair among the software tested vary considerably, with full topological repair only available in ArcGIS (using manual operations only), it is also important to consider how different repair implementations scale to large datasets. For this, a few performance tests were made in our implementation and three planar partition repair tools that perform this process using snapping and splitting. The testing methodology for each tool is as follows:

ArcGIS: In ArcCatalog, a multiple feature dataset is created in a Geodatabase, set with XY resolution values equal to the snapping threshold. The features are imported into it, and the merge and dissolve operations are used to merge adjacent polygons with the same ID. Topology is then generated to check that the planar partition is valid. Everything is finally exported to a single Shapefile. The individual parts of the process are timed and the total is recorded. Memory usage is calculated as the difference between the just loaded ArcCatalog application and its maximum memory usage throughout the process.

FME: A reader is created for each input file, which serve as input to a Snapper transformer; features with the same IDs are then dissolved, and finally they are output into a new Shapefile writer. The topology generator is used to be able to tell whether the result is a valid planar partition. Results are timed and the maximum memory allocation of the fme.exe process is recorded.

GRASS: Input files are imported with v.in.ogr, with all polygon cleaning operations performed and snapping set to the correct values. Boundaries between features with the same IDs are then dissolved using v.dissolve. Files are then exported with v.out.ogr. Times reported by GRASS are added together to give the total, while memory usage by the v.in.ogr.exe, v.dissolve.exe and v.out.ogr.exe processes is monitored and their maximum is recorded.

pprepair: Files are read and put into the triangulation, the triangulation is labelled, and the repair is performed with the longest boundary first (see Table 1), with ambiguous cases resolved with a random choice. Polygons are then extracted from the triangulation and output to a single Shapefile. The entire process is timed and the maximum amount of memory used is recorded.
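The "longest boundary, ties broken at random" choice used in the pprepair runs can be sketched as follows. This is an illustrative Python fragment and not the pprepair implementation: the input format (a list of neighbouring label and shared boundary length pairs for one invalid region) and the function name are assumptions.

```python
import random
from collections import defaultdict


def pick_label(shared_boundaries, rng=random):
    """Choose the new label for one invalid region.

    shared_boundaries: iterable of (neighbour_label, shared_length) pairs,
    one per boundary edge of the region (hypothetical input format).
    """
    totals = defaultdict(float)
    for label, length in shared_boundaries:
        totals[label] += length
    longest = max(totals.values())
    # Several labels may share the longest boundary: the ambiguous case.
    candidates = sorted(lab for lab, total in totals.items() if total == longest)
    return rng.choice(candidates)


# The region touches red along 3.0 units in total and blue along 2.0 units.
region_boundary = [("red", 2.0), ("blue", 1.5), ("red", 1.0), ("blue", 0.5)]
chosen = pick_label(region_boundary)
```

Passing a seeded `random.Random` instance as `rng` makes the tie-breaking reproducible, which is convenient when repeated runs have to be compared.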

Notice that we are somewhat comparing apples with oranges here, since our implementation is able to repair more cases than other methods, keeps topological consistency and does not require finding out the appropriate threshold value (if it exists). More importantly, it is able to directly state whether the result is a planar partition, unlike the three other solutions (with these, snapping is performed, but that does not guarantee that the output is a valid planar partition). The results of the experiments are shown in Table 3.

We have made the experiments in order to have an idea of the processing time and the memory usage involved. Only cases that are acceptably solved by snapping and splitting have been considered, since there is no other comparable automated topological repair tool among those studied that would also work for more complex cases, such as those that require a snapping threshold too high to be practical (e.g. horizontal conflation of independently generated datasets [Yuan and Tao, 1999; Davis, 2003]).

Table 3: Planar partition repair comparison using large datasets.

              pprepair            ArcGIS               FME               GRASS
           memory     time    memory     time    memory     time    memory     time
E41N27     145 MB      19s    145 MB     1m3s    158 MB      31s     59 MB     3m9s
4tiles     116 MB      17s    113 MB      37s    105 MB      31s     49 MB      53s
16tiles   1.45 GB    4m47s   crashes        –    636 MB   15m48s   crashes        –
Mexico    1.01 GB    3m31s    216 MB      >1d    264 MB    2m45s    408 MB   11m38s

All tests have been run 5 times in a machine just booted and the results averaged to account for the small variations that occurred, except in the case where the execution was cancelled after one full day due to time limitations, or when it caused the program to crash. The hardware is a 2.66 GHz Core 2 Duo MacBook Pro with 4 GB of RAM. ArcGIS 9.3, FME 2010 SP1 and GRASS 6.4 were run in Windows 7, while pprepair was run in Mac OS X 10.7.1.

As Table 3 shows, our approach uses somewhat more memory than other solutions. This is explained by the extra (unconstrained) edges that are added to the input when triangulating it. It should however be noticed that both ArcGIS and GRASS crashed with the biggest dataset, 16tiles. Our implementation is the fastest of the four tested, being for instance around three times faster than FME for the biggest datasets. Only for the Mexico dataset is our implementation slower than FME. This is (probably) explained by the fact that its polygons already form a valid planar partition, and therefore very few snapping operations have been performed by FME: the planar graph of the input was basically built, and then the polygons saved back to disk. We believe that ArcGIS and GRASS struggled with this dataset because it contains several very large polygons (with more than 10 000 vertices). Our implementation took 3m31s, but as Table 4 shows, it took CGAL 3m02s to simply triangulate the input edges. This table also demonstrates the efficiency of our approach: for the four datasets, around 85% of the time of our approach was used (by the CGAL library) to read the input from disk and triangulate the input edges.

Table 4: Timed steps of the planar partition repair procedure (m:ss), rounded down to the nearest second. The percentage is the triangulation time over the total time.

                E41N27   4tiles   16tiles   Mexico
Triangulate       0:17     0:15      4:00     3:02
Label             0:01     0:01      0:27     0:16
Repair            0:00     0:00      0:00     0:00
Reconstruct       0:01     0:01      0:11     0:07
Output            0:00     0:00      0:10     0:06
Total             0:19     0:17      4:47     3:31
% triangulate      89%      88%       83%      86%
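The percentage row of Table 4 can be recomputed from the timings, reading each entry as minutes:seconds. The short Python check below is only a worked piece of arithmetic; the helper name is made up for the example, and the percentages are truncated to whole numbers, which is an assumption about how the row was produced.

```python
def to_seconds(mss):
    """Convert a 'm:ss' string, as read from Table 4, to seconds."""
    minutes, seconds = mss.split(":")
    return 60 * int(minutes) + int(seconds)


triangulate = {"E41N27": "0:17", "4tiles": "0:15", "16tiles": "4:00", "Mexico": "3:02"}
total = {"E41N27": "0:19", "4tiles": "0:17", "16tiles": "4:47", "Mexico": "3:31"}

# Share of the total time spent triangulating, truncated to a whole percentage.
share = {
    name: 100 * to_seconds(triangulate[name]) // to_seconds(total[name])
    for name in total
}
average_share = sum(share.values()) / len(share)
```

With these inputs the shares come out as 89, 88, 83 and 86, averaging 86.5, which is consistent with the "around 85%" quoted in the text.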

During the implementation, we took several engineering decisions to optimise our program. One of them was to favour disk space over computation time, as we wanted to be able to process large datasets. This is why we always reconstruct polygons, even if they are not modified by the repair process. Although skipping the unmodified polygons would be possible, it would require us to keep in memory the original polygons (which would significantly increase the memory consumption) and to keep track of which labels have been modified. And as Table 4 shows, the reconstruction is very efficient, as it only takes around 3–4% of the total time for the 16tiles and Mexico datasets.

5 Discussion and Conclusions

We have presented a new method for repairing polygonal area partitions, ensuring that the output partition is valid, i.e. all individual polygons conform to the ISO 19107 [ISO, 2003] and Simple Features [OGC, 2006] specifications, and no gaps nor overlaps are present between any pair of polygons. The novelty of our method lies in the fact that repair is performed according to user-defined criteria, but then takes place without any human intervention. Automatic repair is becoming an increasingly important topic due to data integration, e.g. data collected for the different themes of the European INSPIRE Initiative will finally have to fit together, and data could eventually be matched fully automatically by dedicated web service components [INSPIRE, 2009]. Our approach could be at the heart of such a web service.

The proposed approach excels in automated repair at the cost of increased memory usage compared to a pure graph-based approach; this difference is mainly caused by the unconstrained edges introduced by the triangulation. While it would be possible to use our repair rules together with a (primal/dual) graph-based approach, these additional edges in the triangulation give fine-grained control over the repair operations and ensure that the graph is connected, which facilitates the reconstruction of polygons.

We have implemented our algorithm over a numerically robust triangulator (CGAL), and since repair operations are expressed solely in terms of re-labelling of triangles (no geometric computation is involved), the approach is also fully robust. Since during our experiments most of the time was used to compute the constrained triangulation, another library could also be tested to improve the implementation.

For the future, we plan to:

• Improve the scalability of the approach, and process and repair datasets with more than 10 million polygons. It is known that, using divide-and-conquer techniques, triangulation algorithms can handle big datasets [Amenta et al., 2003; Blandford et al., 2005]. We will investigate whether it is possible to automatically repair each divided part individually and 'glue' the repaired parts together.

• Investigate snap rounding [Hobby, 1999; de Berg et al., 2007] as a preprocessing step (or embedded directly in the triangulation) to guarantee that repaired planar partitions have no vertices that are closer than a certain ε threshold. Snap rounding may change the topology of the input, but the output will nevertheless be a valid partition (as the topology will be repaired). Apart from topological changes, snap rounding can also lead to removed polygon parts that are too small to be preserved based on the chosen ε.

• Add more advanced repair operations to our repair toolkit, e.g. repair could take place based on splitting a collection of triangles.

• Extend our work to include the third dimension, to validate and repair 3D city models using a constrained tetrahedralisation [Si, 2008]. Notice that the tetrahedralisation of a given polyhedron does not always exist, and thus extra (Steiner) points might need to be added. The main concepts of our approach, (re)labelling and reconstruction, extend naturally to 3D. However, appropriate repair operations for 3D city models would need to be defined, and some application-specific constraints (e.g. right angles at corners) are not trivial to implement in our current approach.

References

Nina Amenta, Sunghee Choi, and Günter Rote. Incremental constructions con BRIO. In Proceedings of the 19th Annual Symposium on Computational Geometry, pages 211–219. ACM Press, 2003.


Wael M. Badawy and Walid G. Aref. On local heuristics to speed up polygon-polygon intersection tests. In Proceedings of the 7th ACM International Symposium on Advances in Geographic Information Systems, pages 97–102. ACM, 1999.

M. Bader and R. Weibel. Detecting and resolving size and proximity conflicts in the generalization of polygonal maps. In Proceedings of the 18th International Cartographic Conference, Stockholm, pages 1525–1532, 1997.

Daniel K. Blandford, Guy E. Blelloch, David E. Cardoze, and Clemens Kadow. Compact representations of simplicial meshes in two and three dimensions. International Journal of Computational Geometry and Applications, 15(1):3–24, 2005.

CGAL. CGAL 3.8 User and Reference Manual. CGAL Editorial Board, 2011.

Kenneth L. Clarkson, Kurt Mehlhorn, and Raimund Seidel. Four results on randomized incremental constructions. In Alain Finkel and Matthias Jantzen, editors, Proceedings of the 9th Annual Symposium on Theoretical Aspects of Computer Science, volume 577 of Lecture Notes in Computer Science, pages 461–474. Springer Berlin / Heidelberg, 1992.

Martin Davis. Java conflation suite. Technical report, Vivid Solutions, 2003. Available at http://www.vividsolutions.com/jcs.

Mark de Berg, Dan Halperin, and Mark Overmars. An intersection-sensitive algorithm for snap rounding. Computational Geometry, 36(3):159–165, 2007. ISSN 0925-7721.

Mark de Berg, Marc van Kreveld, Mark Overmars, and Otfried Schwarzkopf. Computational Geometry: Algorithms and Applications. Springer-Verlag, 3 edition, 2008.

ESRI. Shapefile technical description. White paper, ESRI, July 1998.

Michael A. Facello. Implementation of a randomized algorithm for Delaunay and regular triangulations in three dimensions. Computer Aided Geometric Design, 12(4):349–370, 1995.

GTS. GTS Library Reference Manual, 2006. URL http://gts.sourceforge.net/reference/book1.html.

Leonidas Guibas and Jorge Stolfi. Primitives for the manipulation of general subdivisions and the computation of Voronoi. ACM Transactions on Graphics, 4(2):74–123, 1985.

John D. Hobby. Practical segment intersection with finite precision output. Computational Geometry, 13(4):199–214, 1999. ISSN 0925-7721.

Christoph M. Hoffmann, John E. Hopcroft, and Michael S. Karasick. Towards implementing robust geometric computations. In Proceedings of the 4th Annual Symposium on Computational Geometry, pages 106–117. ACM, 1988.

INSPIRE. D2.5: Generic conceptual model, version 3.2. Technical report, Drafting Team "Data Specifications", 2009. URL http://inspire.jrc.ec.europa.eu/documents/Data_Specifications/D2.5_v3.2.pdf.

ISO. 19107: Geographic information – spatial schema, March 2003.

David Kirkpatrick, Jack Snoeyink, and Bettina Speckmann. Kinetic collision detection for simple polygons. International Journal of Computational Geometry and Applications, 12(1–2):3–27, 2002.

Robert Laurini and Françoise Milleret-Raffort. Topological reorganization of inconsistent geographical databases: A step towards their certification. Computers & Graphics, 18(6):803–813, December 1994.

Hugo Ledoux and Martijn Meijers. Validation of planar partitions using constrained triangulations. In Proceedings Joint International Conference on Theory, Data Handling and Modelling in GeoSpatial Information Science, pages 51–55, Hong Kong, May 2010.


Hugo Ledoux, Ken Arroyo Ohori, and Martijn Meijers. Automatically repairing invalid polygons with a constrained triangulation. In Proceedings of the 15th AGILE International Conference on Geographic Information Science, 2012.

Yuanxin Liu and Jack Snoeyink. Faraway point: A sentinel point for Delaunay computation. International Journal of Computational Geometry and Applications, 18(4):343–355, November 2006.

Avraham Margalit and Gary D. Knott. An algorithm for computing the union, intersection or difference of two polygons. Computers & Graphics, 13(2):167–183, 1989.

Ernst P. Mücke, Isaac Saias, and Binhai Zhu. Fast randomized point location without preprocessing in two- and three-dimensional Delaunay triangulations. Computational Geometry: Theory and Applications, 12:63–83, 1999.

OGC. OpenGIS Implementation Specification for Geographic Information - Simple Feature Access - Part 1: Common Architecture. 1.2.0 edition, October 2006.

Lutz Plümer and Gerhard Gröger. Achieving integrity in geographic information systems – maps and nested maps. GeoInformatica, 1(4):345–367, 1997.

M. Rivero and F. R. Feito. Boolean operations on general planar polygons. Computers & Graphics, 24(6):881–896, December 2000.

Stefan Schirra. Precision and Robustness in Geometric Computations, volume Algorithmic Foundations of Geographic Information Systems of Lecture Notes in Computer Science, chapter 9, pages 255–287. Springer Berlin / Heidelberg, 1997.

Michael Ian Shamos and Dan Hoey. Geometric intersection problems. In FOCS, pages 208–215. IEEE, 1976.

Jonathan Richard Shewchuk. Delaunay Refinement Mesh Generation. PhD thesis, School of Computer Science, Carnegie Mellon University, Pittsburg, USA, 1997a.

Jonathan Richard Shewchuk. Adaptive precision floating-point arithmetic and fast robust geometric predicates. Discrete & Computational Geometry, 18:305–363, 1997b.

Hang Si. Three Dimensional Boundary Conforming Delaunay Mesh Generation. PhD thesis, Technische Universität Berlin, 2008.

C. Dana Tomlin. Map algebra: One perspective. Landscape and Urban Planning, 30(1–2):3–12, October 1994.

Peter van Oosterom, Wilko Quak, and Theo Tijssen. About invalid, valid and clean polygons. In Peter F. Fisher, editor, Developments in Spatial Data Handling – 11th International Symposium on Spatial Data Handling, pages 1–16. Springer, 2004.

Jan W. van Roessel. A new approach to plane-sweep overlay: Topological structuring and line-segment classification. Cartography and Geographic Information Science, 18(1):49–67, January 1991.

Shuxin Yuan and Chuang Tao. Development of conflation components. Geoinformatics and Socioinformatics, pages 1–13, June 1999.


  • Introduction
  • Related Work
    • Simple Features and validity of simple polygons
    • Validation of a Planar Partition Using a Planar Graph
    • Repair Using Point and Edge Snapping and Splitting
    • Repair Using Topological Information
  • Validation and Automatic Repair Using a Constrained Triangulation
    • Triangulation of a polygon and constrained triangulation
    • Labelling the triangles of a Planar Partition
    • Validation
    • Repair Operations
    • Extraction of Polygons From a Triangulation
  • Implementation and Experiments
    • Implementation
    • Experiments with real-world planar partition datasets
  • Discussion and Conclusions

1 Introduction

Planar partitions are frequently used in GIS to model concepts such as land cover, the cadastre or the administrative boundaries of a given country. As shown in Figure 1, a planar partition is a subdivision of a polygonal subset of the plane into non-overlapping polygons. In practice, planar partitions are often represented and stored in a computer as a set of individual polygons to which one or more attributes are attached, and the topological relationships between polygons are not explicitly stored (shared boundaries are thus represented and stored twice). The preferred method of practitioners is representing polygons according to the Simple Features specification [OGC, 2006], for instance as an ESRI Shapefile [ESRI, 1998] or in a database such as PostGIS¹.

Figure 1: Part of the Corine Land Cover dataset for the region around Delft, The Netherlands.

If a planar partition is stored as a set of individual polygons, then in practice errors, mistakes and inconsistencies will often be introduced when the planar partition is built, updated or exchanged. Examples of common errors are overlapping polygons, gaps between polygons and polygons not connected to the others. This can be, among others, due to human error, the use of floating-point arithmetic or limited precision [Schirra, 1997]. These errors can have catastrophic consequences for practitioners, since most software and algorithms using planar partitions as input assume that this input is valid. At best, erroneous results are returned; at worst, it causes a software failure, often without any warning to the user. Moreover, such problems are often not visible at the scale that the data is usually viewed, exacerbating the problem [Laurini and Milleret-Raffort, 1994].

¹ http://postgis.refractions.net

Solving that issue entails working on two related problems: (1) how to identify errors in a planar partition, and (2) how to repair these errors. As described in Section 2, both problems have been tackled in the past with the creation of a planar graph of the input. The validation, the simpler of the two problems, is usually implemented as a set of topological and geometrical constraints that the planar graph must satisfy. As for the repair, it is usually performed by snapping together the vertices and edges of the graph, or by using topological information. As we discuss in Section 2, both approaches have drawbacks for users: the former method is error-prone (topological inconsistencies can be created) and the latter is only semi-automatic (and in practice, real-world datasets can easily contain several hundred errors).

We present in this paper a novel method to both validate and automatically repair planar partitions stored according to the Simple Features specification. Our method, which is an extension of our preliminary results [Ledoux and Meijers, 2010], uses a constrained triangulation (CT) of the polygons as a support (which is by definition a planar partition), and both the validation and the repair functions are performed with relatively simple operations. These are the labelling of triangles and standard graph traversal algorithms (such as depth-first search). Since errors are repaired by re-labelling triangles (vertices are never moved), we can guarantee that a given repair operation will preserve the topological consistency of the whole planar partition. We describe in Section 3 how the CT is used, how the polygons are labelled, how the validation is performed and how we can automatically repair a planar partition. Moreover, we describe six different repair operations that can be used to obtain different output.

We have implemented the method in C++, and its most relevant details are discussed in Section 4. Our software takes as input polygons stored according to the Simple Features specification, validates them, repairs them if they contain errors, and returns a new set of polygons that is guaranteed to be a valid planar partition. We also report in that section our experiments with several real-world datasets (some of them rather large), and we compare our method and its implementation to alternatives, both for validation and for repair. Finally, we discuss the advantages and disadvantages of using our method, and the conclusions drawn from this, in Section 5.

2 Related Work

Since a planar partition is formed by a set of individual polygons, we first discuss what a valid polygon is in our context, and then we review existing methods to validate and repair planar partitions.

2.1 Simple Features and validity of simple polygons

While there are several definitions of what constitutes a valid polygon (see van Oosterom et al. [2004] for an extensive discussion), we use in the following the international standard Simple Features [OGC, 2006], with the addition of the ISO 19107 [ISO, 2003] polygon orientation rules. Simple Features defines a polygon as follows: "A Polygon is a planar Surface defined by 1 exterior boundary and 0 or more interior boundaries. Each interior boundary defines a hole in the Polygon". In the specification, six assertions are given that together define a valid polygon. Essential for a valid polygon is that the boundaries of the polygon must define one connected area. Additionally, a polygon can contain holes. We say that the exterior boundary of the polygon is the outer ring, and a hole is an inner ring. These holes can be filled by one or more polygons, which can recursively contain holes which are filled by other polygons. Observe also that holes are allowed to interact with each other and the outer boundary under certain conditions, e.g. they are allowed to touch at one point, as long as the interior of the polygon stays one connected area. Each polygon is stored independently from other polygons, and it is not possible to store topological relationships between the polygons. The ISO 19107 specification [ISO, 2003] is more ambiguously defined, but it does establish orientation rules (counterclockwise for the outer ring, clockwise for the inner ones), which we use in our output.
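The orientation rule mentioned here (counterclockwise outer ring, clockwise inner rings) can be checked and enforced with the shoelace formula. The Python sketch below is only an illustration under simple assumptions: rings are lists of (x, y) tuples without a repeated closing vertex, and the function names are invented for this example.

```python
def signed_area(ring):
    """Shoelace formula: positive for counterclockwise rings, negative for clockwise."""
    total = 0.0
    for (x1, y1), (x2, y2) in zip(ring, ring[1:] + ring[:1]):
        total += x1 * y2 - x2 * y1
    return total / 2.0


def orient(ring, counterclockwise):
    """Return the ring with the requested orientation, reversing it if needed."""
    is_ccw = signed_area(ring) > 0
    return ring if is_ccw == counterclockwise else ring[::-1]


# The outer ring is given clockwise and the hole counterclockwise: both get flipped.
outer = orient([(0, 0), (0, 4), (4, 4), (4, 0)], counterclockwise=True)
hole = orient([(1, 1), (2, 1), (2, 2), (1, 2)], counterclockwise=False)
```

After the calls, the outer ring has a positive signed area and the hole a negative one, matching the convention stated in the text.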

The validation of a single polygon is possible with different libraries, GEOS² and JTS³ being two widely used open-source examples.

The repair of single polygons is a less documented topic than their validation. Different software vendors offer tools to help identify and semi-automatically repair broken polygons. Examples are ST_MakeValid() from PostGIS and the constraints in 1Spatial Radius Topology. The method we present in this paper has been adapted to automatically repair common errors in individual polygons, e.g. wrong ring orientation or holes that split the interior of a polygon (see Ledoux et al. [2012]). However, we focus in this paper on the validation and repair of planar partitions only, and we assume that the input polygons are individually valid.

2.2 Validation of a Planar Partition Using a Planar Graph

Assuming that individual polygons have been deemed to be valid, it is possible to test the validity of a planar partition by identifying the two types of invalid configurations: overlaps and gaps.

² Geometry Engine Open Source, http://trac.osgeo.org/geos
³ Java Topology Suite, http://www.vividsolutions.com/jts/jtshome.htm

If individual polygons are checked without building a planar graph (or an indexing structure), finding overlaps involves checking whether any possible pair of polygons overlap. This is a computationally expensive operation (quadratic behaviour), even when heuristics to speed up the process are used [Badawy and Aref, 1999; Kirkpatrick et al., 2002]. Additionally, robustness issues are significant in polygon intersection tests [Hoffmann et al., 1988]. Finding the potential gaps in a planar partition is even more problematic. For this, computing the union of the entire set of polygons is required, which is also computationally expensive [Margalit and Knott, 1989; Rivero and Feito, 2000].
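To make the quadratic behaviour concrete, the following Python sketch enumerates every pair of polygons and applies a cheap bounding-box filter. It is an illustration only: the bounding-box test is a generic pre-filter chosen for this example, not one of the heuristics cited above, and surviving pairs would still need an exact (and robust) polygon intersection test.

```python
from itertools import combinations


def bbox(polygon):
    """Axis-aligned bounding box (xmin, ymin, xmax, ymax) of a list of (x, y)."""
    xs = [x for x, _ in polygon]
    ys = [y for _, y in polygon]
    return min(xs), min(ys), max(xs), max(ys)


def bboxes_intersect(a, b):
    return a[0] <= b[2] and b[0] <= a[2] and a[1] <= b[3] and b[1] <= a[3]


def candidate_pairs(polygons):
    """Test all n(n-1)/2 pairs; return the number tested and the surviving pairs."""
    boxes = [bbox(p) for p in polygons]
    tested = 0
    candidates = []
    for i, j in combinations(range(len(polygons)), 2):
        tested += 1
        if bboxes_intersect(boxes[i], boxes[j]):
            candidates.append((i, j))
    return tested, candidates


# Four unit squares placed at x = 0, 1, 5 and 9; only the first two touch.
squares = [[(k, 0), (k + 1, 0), (k + 1, 1), (k, 1)] for k in (0, 1, 5, 9)]
tested, candidates = candidate_pairs(squares)
```

Even with the filter, all six pairs of the four polygons are visited, which is why the number of tests grows quadratically with the number of polygons.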

The validation process can be sped up by first building a planar graph of the input polygons, which is afterwards checked for consistency. It should first be noticed that while different approaches are available to construct a planar graph [Shamos and Hoey, 1976; van Roessel, 1991], it is still sometimes difficult, especially if the polygon contains holes. The graph of the boundary can then be unconnected, and extra machinery is necessary to represent the knowledge of holes in the graph structure. The fact that holes are also allowed to touch complicates the task of validation even further, since holes cannot be assumed to form an unconnected planar graph.

Based on this graph, Plümer and Gröger [1997] specify a list of minimal mathematical axioms that can be used to check the validity of a planar partition: no dangling edges, no zero-length edges, planarity, no holes, no self-intersections, no overlaps, and having a connected graph. It is important to note that Plümer and Gröger base their axioms on concepts from graph theory, but they also highlight the fact that a graph-based approach alone is not enough: the graph has to be augmented with geometrical knowledge (each vertex has geometry attached, i.e. the coordinates of points have to be stored). Validation is thus underpinned by both geometrical and topological concepts, and systems thus have to deal with those two concepts at the same time. The method we propose in this paper, using a constrained triangulation, permits us to do exactly this: to embed both geometry and topology in the same structure.
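Three of the listed axioms (no zero-length edges, no dangling edges, connected graph) are simple enough to check directly on a graph with coordinates attached. The Python sketch below is a hedged illustration of that idea only: the input layout (`coords`, `edges`) and the problem names are assumptions, and the remaining axioms (planarity, no self-intersections, no overlaps, no holes) need geometric tests that are not shown.

```python
from collections import defaultdict


def check_graph(coords, edges):
    """Check a subset of the axioms on a graph given as vertices and edges.

    coords: dict, vertex id -> (x, y); edges: list of (u, v) vertex id pairs.
    Returns a list of (problem, offending item) tuples; empty if none is found.
    """
    problems = []
    adjacency = defaultdict(set)
    for u, v in edges:
        if coords[u] == coords[v]:
            problems.append(("zero-length edge", (u, v)))
        adjacency[u].add(v)
        adjacency[v].add(u)

    # A vertex with a single neighbour is the free end of a dangling edge.
    for vertex, nbrs in adjacency.items():
        if len(nbrs) == 1:
            problems.append(("dangling edge", vertex))

    # Connectivity: a depth-first search must reach every vertex.
    if adjacency:
        start = next(iter(adjacency))
        seen = {start}
        stack = [start]
        while stack:
            for nb in adjacency[stack.pop()]:
                if nb not in seen:
                    seen.add(nb)
                    stack.append(nb)
        if len(seen) != len(adjacency):
            problems.append(("graph not connected", len(adjacency) - len(seen)))
    return problems


coords = {0: (0, 0), 1: (1, 0), 2: (1, 1), 3: (0, 1), 4: (2, 2)}
square = [(0, 1), (1, 2), (2, 3), (3, 0)]
clean = check_graph(coords, square)
with_spur = check_graph(coords, square + [(2, 4)])
```

The plain square passes all three checks, while adding the spur from vertex 2 to vertex 4 is reported as a dangling edge.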

2.3 Repair Using Point and Edge Snapping and Splitting

The most common method for planar partition repair is based on the assumption that polygons approximately match each other at their common boundaries. This implies that they should be within a certain distance of each other along those edges. If, additionally, all parts further apart than this value are known not to be common boundaries, it is possible to "snap" together polygons that are closer to each other than this threshold, while keeping the rest untouched. This method of planar partition repair is available in many GIS packages, including ArcGIS⁴, FME⁵, GRASS⁶ and Radius Topology⁷.

Since thresholds are central to this method, it is of utmost importance to select a good threshold value, something that is completely different in each dataset. For planar partition repair to be successful using this method, such a threshold should be chosen in a careful manner and always comply with a few conditions. These have been summarised as follows:

1. Adjacent polygons should not be further apart than this threshold along any part of their common boundaries (shown as the maximum threshold in Figure 2(a)). Otherwise, gaps cannot be fixed.

2. Adjacent polygons should not overlap each other in areas which are further inwards than this threshold from their common boundaries (shown as the maximum threshold in Figure 2(b)). Otherwise, overlaps cannot be fixed.

3. None of the vertices of a polygon should be closer to each other than this threshold, including non-consecutive vertices (shown as the minimum thresholds in Figure 2). Otherwise, they might be snapped together, creating repeated vertices, disjoint regions or various topological problems.

4. None of the vertices of a polygon should be closer than this threshold to any non-incident edge. Otherwise, they might be snapped together, creating disjoint regions or various topological problems.

⁴ http://www.esri.com/software/arcgis/index.html
⁵ http://www.safe.com/fme
⁶ http://grass.osgeo.org
⁷ http://www.1spatial.com/software/radius_topology

This threshold value is usually manually determined, either by trial and error or by analysing certain properties of the dataset(s) involved (e.g. point spacing, precision or map scale). However, it is often hard to find an optimal threshold for a certain dataset, since ensuring that it works well for every part of a dataset is unrealistic. Moreover, sometimes such a threshold does not even exist (e.g. because point spacing in some places might be smaller than the width of the gaps and overlaps present).
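The bounds implied by the conditions above can be illustrated with a small sketch. It assumes that the largest mismatch between matching boundaries has already been measured (the `max_mismatch` argument is a hypothetical input), and it only covers the vertex-to-vertex condition, not the vertex-to-edge one. The function names are illustrative and do not come from any of the packages mentioned.

```python
from itertools import combinations
from math import dist
from typing import Optional, Sequence, Tuple

Point = Tuple[float, float]


def min_vertex_spacing(ring: Sequence[Point]) -> float:
    """Smallest distance between any two distinct vertices, consecutive or not."""
    return min(dist(p, q) for p, q in combinations(set(ring), 2))


def threshold_range(max_mismatch: float, min_spacing: float) -> Optional[Tuple[float, float]]:
    """Interval of usable snapping thresholds, or None when no threshold can work."""
    if max_mismatch < min_spacing:
        return (max_mismatch, min_spacing)
    return None


# A square-like ring whose last vertex lies very close to its first one.
ring = [(0.0, 0.0), (10.0, 0.0), (10.0, 10.0), (0.0, 10.0), (0.0, 0.3)]
spacing = min_vertex_spacing(ring)

small_gaps = threshold_range(0.1, spacing)   # gaps narrower than the vertex spacing
large_gaps = threshold_range(0.5, spacing)   # gaps wider than the vertex spacing
print(small_gaps, large_gaps)
```

In the second call the gaps are wider than the closest pair of vertices, which is the situation described above in which no suitable threshold exists.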

Since the aforementioned conditions are frequently not met or are not checked beforehand, and it is still necessary to perform repair of a dataset, snapping is often performed nevertheless, possibly creating invalid polygons and/or planar partitions, or significantly changing the topology of the existing features. Two examples of this phenomenon are shown in Figures 3 and 4.

(a) Before snapping (b) After snapping

Figure 3: Spikes and punctures can be created by snapping, since the bases of these elongated forms (encircled) might be narrower than the threshold, but their length not.

While these examples show that snapping is not problem-free, it is important to note that commercial GIS packages often implement more complex snapping options (such as point-to-edge, edge-to-edge or using a reference dataset). These options can help to solve a problematic case, but can also have undesired consequences, such as changing the topology of the polygons. Another problem is that post-processing operations to clean the resulting polygons might be required (e.g. disposing of polygons with small areas, removing redundant lines, thresholds for minimum angles, etc.), which could again create invalid configurations, requiring iterative validation or repair processes.

(a) Before snapping (b) After snapping

Figure 4: Polygons can be split by snapping, since some parts of them might be narrower than the threshold (encircled). While this result does not create an invalid planar partition, it can change the number of polygons present and their topological relations, and can therefore be undesirable.

2.4 Repair Using Topological Information

A different approach for planar partition repair, based on topological information, is available in some software.

GRASS also creates a graph, using edges as a base structure instead of triangles, and could be used to (manually) detect overlaps and gaps based on the number of labels, using the functions v.what and d.what.vect. However, there is no simple automated procedure to get the number of labels at a certain point, which makes it very cumbersome and time consuming to use GRASS for this purpose.

Meanwhile, ArcGIS provides a more complete solution, using a method similar in some ways to the one developed and described in Section 3. It involves using the Geodatabase feature of the software with some combined validation rules (e.g. must not overlap and must not have gaps). However, fixing every problematic area in an appropriate manner requires extensive user intervention (see Figure 5), since the best choice for each case depends on the specific configuration of the error.

(a) Gaps between polygons (b) Overlapping polygons

Figure 2: Defining a threshold for vertex, edge and face snapping. The threshold to use should be larger than the largest minimum distance between the matching boundaries and smaller than the minimum distance between vertices. Each panel is annotated with the minimum and maximum distance thresholds.

Since neither of the aforementioned programs offers an automated process to correctly solve this problem (and GRASS lacks the ability to visualise problem areas), they are not really comparable to our solution. A planar partition can easily contain tens of thousands of polygons, possibly generating thousands of errors which, with these tools, need to be checked and repaired semi-automatically.

3 Validation and Automatic Repair Using a Constrained Triangulation

Our approach to validation and automatic repair of planar partitions uses a constrained triangulation (CT) as a supporting structure because it has many good properties, including the following:

• It is by definition a planar partition. Therefore, as long as we keep the information about which polygon each triangle belongs to, the reconstructed polygons will be either a valid planar partition or multiple ones.

• It can be built quickly, in $O(n \log n)$, with a variety of approaches [Guibas and Stolfi, 1985; Mücke et al., 1999; Clarkson et al., 1992]⁸.

• Changes to the triangulation (e.g. adding a new constrained edge) are local and therefore fast.

• Constrained edges can usually be added in constant time, being only significantly slower (and more complex) when there is an intersection with an existing constrained edge [Shewchuk, 1997b].

• Implementation-wise, several stable and fast triangulation libraries exist, including CGAL [CGAL, 2011], Triangle [Shewchuk, 1997a] and GTS [GTS, 2006].

The general workflow of our approach to both validate and repair a planar partition is as follows:

1. the CT of the input segments forming the polygons is constructed;

2. each triangle in the CT is labelled with the label of the polygon inside which it is located;

3. problems are detected by identifying triangles having no or multiple labels, and by verifying the connectivity between triangles;

4. the problems are repaired by re-labelling triangles, so that each triangle has exactly one label;

5. the polygons are extracted from the triangulation (polygons modelled with the Simple Features specification).

As mentioned previously, for this workflow we assume that each input polygon is individually valid. In the following sections, we describe the concepts needed and give a detailed description of the different steps.

⁸ The actual computational complexity can be $O(n \log n + k)$, with $k$ being the number of edge-edge intersections, which could conceivably even be $n^2$. However, $k \ll n$ for most GIS datasets.

(a) Viewing a topology error in ArcGIS (b) Assigning an overlapping region to one of the polygons involved

Figure 5: Planar partition repair in ArcGIS. The user is expected to zoom in to a particular error, analyse the situation (e.g. by looking at the properties from the surrounding polygons) and make a decision to assign the problematic region to a certain polygon. More than 11 000 errors were detected in this tile of the Corine dataset.

3.1 Triangulation of a polygon and constrained triangulation

A triangulation subdivides an area into non-overlapping triangles. Using a constrained triangulation, every line segment that defines the boundary of a polygon is ensured to appear as an edge in the triangulation. It is known that any polygon (also with holes) can be triangulated without adding extra vertices [de Berg et al., 2008; Shewchuk, 1997a]. Figure 6 shows an example.

Figure 6: (a) A polygon with 4 holes. (b) The constrained triangulation of the segments of this polygon.

In our approach, the triangulation is performed by incrementally constructing a CT of all the segments representing the boundaries (outer + inner) of each polygon. If the set of input polygons forms a planar partition, then each segment will be inserted twice (except those forming the outer boundary of the set of input polygons). This is usually not a problem for triangulation libraries, because they ignore points and segments at the same location (as is the case with the solution we use, see Section 4). When segments are found to intersect, they are split with a new point created at the intersection. This is the only situation in which the generation of new points is required.
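The observation that shared segments are inserted twice can be checked on a toy input. The sketch below is only an illustration, not part of the described implementation: it assumes polygons are given as closed rings of coordinate tuples with exactly matching vertices, and `segment_multiplicity` is a hypothetical helper name.

```python
from collections import Counter


def segment_multiplicity(rings):
    """Count how often each undirected segment occurs over all closed rings."""
    counts = Counter()
    for ring in rings:
        for a, b in zip(ring, ring[1:]):
            counts[frozenset((a, b))] += 1
    return counts


# Two unit squares sharing the edge from (1, 0) to (1, 1).
left = [(0, 0), (1, 0), (1, 1), (0, 1), (0, 0)]
right = [(1, 0), (2, 0), (2, 1), (1, 1), (1, 0)]
counts = segment_multiplicity([left, right])

shared = [s for s, c in counts.items() if c == 2]   # interior segments
outer = [s for s, c in counts.items() if c == 1]    # outer boundary of the set
print(len(shared), len(outer))
```

In this toy case, the segments seen once are exactly those on the outer boundary of the two squares, while the common edge is seen twice.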

Notice that our approach requires only a constrained triangulation, and not one that fulfils the Delaunay criterion [Shewchuk, 1997a]. However, having well-shaped triangles is useful for repair purposes and does not significantly increase the processing time. Therefore, our implementation actually constructs and uses a constrained Delaunay triangulation.

3.2 Labelling the triangles of a Planar Partition

Labelling a triangle means assigning it label(s) for the polygon(s) that it belongs to (if two input polygons overlap, each triangle in the overlapping region should have the labels of the two polygons).

In our previous work on validation [Ledoux and Meijers, 2010], we used the centroid of a polygon to start the labelling process, but this method is prone to errors (if, for instance, the calculated centroid is outside or on the boundary of the polygon) and does not permit us to differentiate gaps from overlaps.

To solve these problems, we store information about the constrained edges of the CT. Since it is known that the input rings are closed and have a known orientation (according to the ISO 19107 orientation rules), it is also known on which side of a certain line segment the interior of the polygon lies. This property is used for robust labelling of each polygon. Triangles adjacent to the outer ring of the polygons are labelled first, and this is later expanded to triangles further in the interior of the polygon, recursively labelling adjacent unlabelled faces as long as no constrained edges are crossed. After this operation, all triangles that are part of any polygon are labelled, with overlapping regions having multiply labelled triangles. However, holes are then indistinguishable from triangles outside the planar partition, since both have zero labels. Therefore, a special label is created for all the triangles outside the planar partition, referred to as the “universe” label, which are labelled by recursively labelling adjacent triangles from any triangle known to lie in the exterior of the planar partition. We exploit the concept of the “far-away point” to achieve this [Liu and Snoeyink, 2006], which is used by several implementations and is also known as the “big triangle” [Facello, 1995].
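The expansion step can be sketched on an abstract triangle adjacency structure. This is a minimal illustration under several assumptions: triangles are plain integer ids, the seed triangles next to each ring (and one exterior triangle) are assumed to be known already, and an iterative queue replaces the recursion described above. The names `flood_label`, `neighbours` and `constrained` are hypothetical.

```python
from collections import deque


def flood_label(seeds, neighbours, constrained, label, labels):
    """Add `label` to all triangles reachable from `seeds` without crossing a constrained edge."""
    queue = deque(seeds)
    seen = set(seeds)
    while queue:
        t = queue.popleft()
        labels[t].add(label)
        for n in neighbours[t]:
            # Two triangles share at most one edge, so the pair identifies that edge.
            if n not in seen and frozenset((t, n)) not in constrained:
                seen.add(n)
                queue.append(n)


# A strip of five triangles: 0 - 1 | 2 - 3 | 4, where "|" marks a constrained edge.
neighbours = {0: [1], 1: [0, 2], 2: [1, 3], 3: [2, 4], 4: [3]}
constrained = {frozenset((1, 2)), frozenset((3, 4))}
labels = {t: set() for t in neighbours}

flood_label([0], neighbours, constrained, "red", labels)       # seed inside polygon "red"
flood_label([2], neighbours, constrained, "blue", labels)      # seed inside polygon "blue"
flood_label([4], neighbours, constrained, "universe", labels)  # seed known to be exterior
print(labels)
```

Because each polygon is flooded separately and labels are stored as sets, a triangle reached from two polygons would simply end up with two labels, which is how overlaps become visible later.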

3.3 Validation

If the set of input polygons forms a planar partition, then all the triangles will be labelled with one and only one label. Problems are easily detected:

• Gaps are detected by finding triangles without any labels.

• Overlaps are detected by finding triangles with two or more labels.

• Disjoint regions are detected by identifying regions separated by the “universe” label. This is done by starting at a given triangle and doing a breadth-first search on the dual of the triangulation, without visiting the triangles labelled as “universe”. If all the triangles can be reached, then no polygons of the planar partition are disjoint.
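The three checks above can be sketched as follows, assuming a labelled triangulation stored as two dictionaries (triangle id to set of labels, and triangle id to adjacent triangle ids). The data layout and the name `validate` are assumptions made for this illustration only.

```python
from collections import deque

UNIVERSE = "universe"


def validate(labels, neighbours):
    """Return (gap triangles, overlap triangles, whether the partition is connected)."""
    gaps = [t for t, ls in labels.items() if len(ls) == 0]
    overlaps = [t for t, ls in labels.items() if len(ls) >= 2]

    # Breadth-first search on the dual graph, never entering "universe" triangles.
    interior = [t for t, ls in labels.items() if UNIVERSE not in ls]
    connected = True
    if interior:
        seen = {interior[0]}
        queue = deque([interior[0]])
        while queue:
            t = queue.popleft()
            for n in neighbours[t]:
                if n not in seen and UNIVERSE not in labels[n]:
                    seen.add(n)
                    queue.append(n)
        connected = len(seen) == len(interior)
    return gaps, overlaps, connected


# Chain of six triangles: a gap (1), an overlap (2), and a region cut off by the universe (5).
labels = {0: {"red"}, 1: set(), 2: {"red", "blue"}, 3: {"blue"}, 4: {UNIVERSE}, 5: {"green"}}
neighbours = {0: [1], 1: [0, 2], 2: [1, 3], 3: [2, 4], 4: [3, 5], 5: [4]}
gaps, overlaps, connected = validate(labels, neighbours)
print(gaps, overlaps, connected)
```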

3.4 Repair Operations

The greatest benefits of using a labelled triangulation for planar partition repair stem from the fact that, while repair operations are performed, the validity of the planar partition is always kept, together with the integrity of the data. Unlike with snapping, vertices are not required to be moved during the process, and repair is performed using local criteria instead of global ones (in snapping, the threshold is usually fixed for the whole dataset). This comes as a contrast to other methods, where care needs to be taken to ensure that the (geometric or topological) validity is kept.

Figure 7 shows the standard steps required in a repair operation. In order to avoid order dependency when repairing, the repair operation itself is always performed after all the choices for which label to assign have been made.

In particular, we propose six different repair operations that can be used to fix gaps and overlaps. These are shown in Table 1, and all imply re-labelling triangles. Four of them use triangles as a base (i.e. the label assigned is based only on that of its three neighbouring triangles), which is fast and modifies the area of each input polygon the least. Despite their simplicity, they offer substantial control over the results. For instance, triangle by number of neighbours and triangle by absolute majority only differ from each other in their handling of overlapping faces, but the former is better for large overlapping regions, while the latter is better at fixing small problems. Having well-shaped (Delaunay) triangles is most useful for triangle-based repair functions. Two of them use regions of adjacent triangles with equivalent sets of labels, which is slower than a triangle-based method, but yields results that can be cartographically more pleasing. An interesting repair operation for practitioners is the one in which a priority of labels is used, i.e. in case of gaps/overlaps, the labels in the triangle (overlap) or in the adjacent polygons (gap) are ordered according to a user-defined priority, and the highest priority is assigned to the problematic triangles. Notice that these repair operations can be used one after the other (in a hierarchical manner); for instance, if the repair according to the longest boundary is used first, but one zone has two or more boundaries with exactly the same length, then the deadlock can be solved by choosing one randomly. A sample of the results obtained with different repair operators is also shown in Figure 8.

Table 1: The repair operations currently implemented in our software. The types of operations are defined in the map algebra classification [Tomlin, 1994].

Repair operation                  Type            Criteria
Triangle by priority list         Varies          Label that has the highest priority according to a predefined priority list
Triangle by number of neighbours  Focal           The label present in the largest number of adjacent faces, overlaps included
Triangle by absolute majority     Focal           Label present in two or more valid adjacent faces
Triangle by longest boundary      Focal           Label present along the longest portion of the boundary of the adjacent faces
Regions by longest boundary       Focal of zonal  Label present along the longest portion of the boundary of the adjacent faces
Regions by random neighbour       Focal of zonal  Random label from the adjacent faces
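A triangle-based operation in the spirit of “triangle by number of neighbours” could look like the sketch below. It is a simplified reading of the criterion in Table 1, not the code of the described software: every label of every adjacent triangle counts once, ties are left undecided so that another operation can resolve them, and all decisions are collected before any triangle is re-labelled. The function names and the dictionary layout are hypothetical.

```python
from collections import Counter


def choose_by_neighbours(t, labels, neighbours):
    """Most frequent label among the adjacent triangles, or None if absent or tied."""
    counts = Counter(label for n in neighbours[t] for label in labels[n])
    ranked = counts.most_common(2)
    if not ranked:
        return None
    if len(ranked) == 2 and ranked[0][1] == ranked[1][1]:
        return None  # tie: leave it to a follow-up operation
    return ranked[0][0]


def repair(labels, neighbours):
    """Decide a label for every invalid triangle first, then apply all decisions."""
    invalid = [t for t, ls in labels.items() if len(ls) != 1]
    decisions = {t: choose_by_neighbours(t, labels, neighbours) for t in invalid}
    for t, label in decisions.items():
        if label is not None:
            labels[t] = {label}
    return decisions


# Triangle 2 is a gap with two red neighbours and one blue neighbour.
labels = {0: {"red"}, 1: {"red"}, 2: set(), 3: {"blue"}}
neighbours = {0: [1, 2], 1: [0, 2], 2: [0, 1, 3], 3: [2]}
decisions = repair(labels, neighbours)
print(decisions, labels[2])
```

Separating the decision pass from the re-labelling pass means that the outcome for one triangle cannot depend on whether a neighbouring invalid triangle happened to be processed earlier.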

More repair operations, based on extensions to the idea of labelling triangles/regions, can be further developed. For instance, triangles could be split to subdivide an area with problems (as in Bader and Weibel [1997]), or sliver triangles/regions could be discarded (and then filled during repair).

Figure 7: The steps in a generic repair operation. (a) A set of overlapping polygons that needs to be repaired. (b) The polygons are triangulated and each triangle is labelled with the polygons it belongs to. (c) An invalid region is found, defined as a set of contiguous polygons with an equivalent set of 0 or ≥ 2 labels. (d) The neighbours of the region are analysed to see how it should be re-labelled, and the result is saved onto a list (here, triangles A, B, C and D are all to be re-labelled as red). (e) The triangles are relabelled according to the list. (f) The output polygons, which now have no gaps or overlaps.

3.5 Extraction of Polygons From a Triangulation

Starting from a labelled triangulation, it is possible to reconstruct the individual polygons of a planar partition to conform to valid polygons according to the ISO 19107 and the Simple Features specifications, which allows users to incorporate automatic validation and repair in their workflow.

We do this operation polygon by polygon. We start at an unprocessed triangle and visit all the connected triangles having the same label (to reconstruct the polygon), marking the triangles as processed as we visit them. Note that, since all these triangles are connected, the outer and inner boundaries of a polygon are all simple (non self-intersecting). We repeat this operation until all the triangles have been processed.

For each polygon, we have to recover not only its outer boundary but also its inner boundaries, which are not connected. Observe that we cannot simply follow the original constrained edges, as these do not have any meaning after a planar partition has been repaired: the boundaries of the repaired polygons are instead formed by edges incident to two triangles having different labels.
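The last observation translates directly into a test on each edge. The sketch below assumes triangles are given as triples of vertex ids with exactly one label each (i.e. after repair); edges with only one incident triangle are also treated as boundary, which would not occur if the exterior is triangulated and labelled as well. `boundary_edges` is a hypothetical name.

```python
from collections import defaultdict


def boundary_edges(triangles, label_of, label):
    """Edges of the polygon `label`: those whose two incident triangles differ in label."""
    edge_to_tris = defaultdict(list)
    for t, (a, b, c) in triangles.items():
        for u, v in ((a, b), (b, c), (c, a)):
            edge_to_tris[frozenset((u, v))].append(t)

    result = set()
    for edge, tris in edge_to_tris.items():
        found = [label_of[t] for t in tris]
        if label in found and (len(tris) == 1 or found[0] != found[1]):
            result.add(edge)
    return result


# A square with vertices 0..3 split along the diagonal 0-2 into two triangles.
triangles = {0: (0, 1, 2), 1: (0, 2, 3)}
split = boundary_edges(triangles, {0: "red", 1: "blue"}, "red")
merged = boundary_edges(triangles, {0: "red", 1: "red"}, "red")
print(sorted(map(sorted, split)), sorted(map(sorted, merged)))
```

When both triangles carry the same label, the diagonal is no longer reported, even if it had been a constrained edge of the input.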

For each polygon, we start at a triangle and move on to triangles having the same label. As the process goes, a single polyline that runs along all the boundaries of the polygon is generated. This involves a depth-first search (clockwise) that recurses until it reaches the boundary of a polygon, returning a long chain of edges in a procedure similar to following the boundary edge by edge. The procedure is shown step by step in Figure 9. The polyline created with this method has “bridges”, which allow us to keep all inner boundaries (holes) connected with the outer one, in a manner that keeps the interior connected as well. These help to preserve connectivity and the relations between different (outer and inner) boundaries, but are removed later in the process (to conform to the Simple Features specification). Also, its orientation conveys the information of whether a section of it is part of an inner or an outer boundary.

Figure 8: Different repair operations used in the two polygons for the Arribes del Duero Natural Park in Spain (red/darker grey) and the International Douro Natural Park in Portugal (green/lighter grey). (a) The original polygons. (b) Repaired each triangle using the label adjacent along the longest boundary from the neighbouring triangles. (c) Repaired each region using a random label from the neighbouring triangles. (d) Repaired each region using the label adjacent along the longest boundary from the neighbouring triangles. All of them can be considered best by a certain criterion, like preserving the area ratio between the two polygons (b), smoothness of the boundary (d), or a balance between the two (c).

Figure 9: The traversal order when navigating through triangles in a clockwise manner. Starting at A and the edge between A and B, the operations occur according to the encircled numbers. Black arrows denote when an unprocessed triangle is found, red arrows when it is not. Notice how the traversal is performed clockwise (shown in dark blue arrows) for both cases, despite starting from different sides.

This polyline is processed with a stack-based algorithm that generates separate closed rings for the outer boundary and each of the inner boundaries, collapsing the “bridges” that were generated. In order to do this, the polyline is cut at the places where more than two edges join, and the resulting pieces are joined in the correct order by keeping track of (yet) unclosed rings. When a new segment is being processed, it can be one of three options: one that completes a ring, one that is part of a bridge, or one that is not yet closed. Closed rings are stored and bridges are removed, while unclosed rings are saved in the stack until they can be popped to form a closed ring together with a new segment. This is shown in Figure 10.
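The idea of closing rings with a stack can be illustrated with a simplified, vertex-based variant. It is not the segment-based procedure described above: it assumes the polyline is given as a closed walk of vertex ids in which every bridge is traversed once in each direction, and it ignores ring orientation. `split_rings` is a hypothetical name.

```python
def split_rings(walk):
    """Split a closed vertex walk into simple rings, dropping back-and-forth bridges."""
    rings = []
    stack = []       # vertices of the rings that are not closed yet
    position = {}    # vertex -> index in the stack
    for v in walk:
        if v in position:
            start = position[v]
            chain = stack[start:] + [v]
            # Pop everything above the revisited vertex.
            for u in stack[start + 1:]:
                del position[u]
            del stack[start + 1:]
            if len(chain) >= 4:
                rings.append(chain)   # three or more distinct vertices: a closed ring
            # Shorter chains (v, w, v) are bridges walked twice and are erased.
        else:
            position[v] = len(stack)
            stack.append(v)
    return rings


# Outer ring a-b-c-d with a triangular hole h1-h2-h3 attached to b by a bridge.
walk = ["a", "b", "h1", "h2", "h3", "h1", "b", "c", "d", "a"]
rings = split_rings(walk)
print(rings)
```

As in the description above, rings that cannot be closed yet wait on the stack, closed rings are stored as soon as their start vertex is met again, and the bridge between b and h1 disappears.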

Figure 10: Processing the polyline shown, starting from vertex a and moving towards polyline A, the following operations occur: (1) A: unclosed → push; (2) B: unclosed → push; (3) C: unclosed → push; (4) D: unclosed → push; (5) E: unclosed → push; (6) F: closed → store; (7) pop → EG: closed → store; (8) pop → DH: degenerate → erase; (9) pop → CI: closed → store; (10) pop → BJ: degenerate → erase; (11) pop → AK: closed → store.

4 Implementation and Experiments

4.1 Implementation

An implementation of the algorithms described in Section 3 was written in the C++ programming language, using external libraries for some functionality. The developed software is called pprepair and is open source and freely available at http://tudelft3d.github.com/pprepair. C++ was selected in order to have plenty of control with regards to low-level details and to achieve good performance, which makes it possible to compare it with existing solutions. The libraries used are the OGR Simple Features Library⁹ (which allows input and output from a large variety of data formats common in GIS) and CGAL¹⁰ (which has support for many robust spatial data structures and the operations based on them; we use its constrained triangulation module).

⁹ http://www.gdal.org/ogr
¹⁰ Computational Geometry Algorithms Library: http://www.cgal.org

4.2 Experiments with real-world planar partition datasets

(a) E41N27 (b) 4tiles (c) 16tiles (d) Mexico

Figure 11: Overview of the four datasets.

We have made experiments with four freely available real-world datasets, i.e. we have validated and automatically repaired them with our implementation. The overview of these datasets is shown in Figure 11 and their properties in Table 2. The datasets are the following:

E41N27: Corine 2000¹¹ tile E41N27, which contains a shifted polygon (by about 10 cm), creating many small gaps and overlaps in the dataset. The snapping threshold has been set at 1 m.

4tiles: Corine 2000 tiles E39N32, E39N33, E40N32 and E40N33, which are known to have long and thin overlapping regions (< 1 mm) with each other. The snapping threshold has been set at 1 cm.

16tiles: 16 adjacent Corine 2000 tiles: E39N30, E39N31, E39N32, E39N33, E40N30, E40N31, E40N32, E40N33, E41N30, E41N31, E41N32, E41N33, E42N30, E42N31, E42N32, E42N33. Some have gaps between one another, some overlap, but they match within a few centimetres. The snapping threshold has been set at 10 cm.

Mexico: 1:1 000 000 scale land cover dataset from INEGI, consisting of over 26 000 polygons. Interestingly, it is mostly already valid (according to the Shapefile specification), but contains some very large polygons with tens of thousands of vertices.

¹¹ Corine is a land cover dataset for 32 European countries. It is freely available at http://www.eea.europa.eu/data-and-maps/data/corine-land-cover-2000-clc2000-seamless-vector-database

As a comparison, we have also tried to perform the same operations with other available software. While the capabilities for planar partition repair among the software tested vary considerably, with full topological repair only available in ArcGIS (using manual operations only), it is also important to consider how different repair implementations scale to large datasets. For this, a few performance tests were made with our implementation and three planar partition repair tools that perform this process using snapping and splitting. The testing methodology for each tool is as follows:

Table 2: Properties of the datasets used for the experiments.

         polygons  pts        pts largest polygon  avg pts per polygon
E41N27   14 969    496 303    26 740               33.7
4tiles   4 984     365 702    16 961               74.7
16tiles  63 868    6 622 133  95 112               103.7
Mexico   26 866    4 181 354  117 736              155.6

ArcGIS: In ArcCatalog, a multiple feature dataset is created in a Geodatabase, set with XY resolution values equal to the snapping threshold. The features are imported into it, and the merge and dissolve operations are used to merge adjacent polygons with the same ID. Topology is then generated to check that the planar partition is valid. Everything is finally exported to a single Shapefile. The individual parts of the process are timed and the total is recorded. Memory usage is calculated as the difference between the just-loaded ArcCatalog application and its maximum memory usage throughout the process.

FME: A reader is created for each input file, which serve as input to a Snapper transformer; features with the same IDs are then dissolved, and finally they are output into a new Shapefile writer. The topology generator is used to be able to tell whether the result is a valid planar partition. Results are timed and the maximum memory allocation of the fme.exe process is recorded.

GRASS: Input files are imported with v.in.ogr, with all polygon cleaning operations performed and snapping set to the correct values. Boundaries between features with the same IDs are then dissolved using v.dissolve. Files are then exported with v.out.ogr. Times reported by GRASS are added together to give the total, while memory usage by the v.in.ogr.exe, v.dissolve.exe and v.out.ogr.exe processes is monitored and their maximum is recorded.

pprepair: Files are read and put into the triangulation, the triangulation is labelled, and the repair is performed with the longest boundary first (see Table 1), with ambiguous cases resolved with a random choice. Polygons are then extracted from the triangulation and output to a single Shapefile. The entire process is timed and the maximum amount of memory used is recorded.

Notice that we are somewhat comparing apples with oranges here, since our implementation is able to repair more cases than other methods, keeps topological consistency and does not require finding out the appropriate threshold value (if it exists). More importantly, it is able to directly state whether the result is a planar partition, unlike the three other solutions (with these, snapping is performed, but that does not guarantee that the output is a valid planar partition). The results of the experiments are shown in Table 3.

We have made the experiments in order to have an idea of the processing time and the memory usage involved. Only cases that are acceptably solved by snapping and splitting have been considered, since there is no other comparable automated topological repair tool among those studied that would also work for more complex cases, such as those that require a snapping threshold too high to be practical (e.g. horizontal conflation of independently generated datasets [Yuan and Tao, 1999; Davis, 2003]).


Table 3: Planar partition repair comparison using large datasets.

         pprepair          ArcGIS           FME               GRASS
         memory   time     memory   time    memory   time     memory   time
E41N27   145 MB   19s      145 MB   1m3s    158 MB   31s      59 MB    3m9s
4tiles   116 MB   17s      113 MB   37s     105 MB   31s      49 MB    53s
16tiles  1.45 GB  4m47s    crashes  –       636 MB   15m48s   crashes  –
Mexico   1.01 GB  3m31s    216 MB   >1d     264 MB   2m45s    408 MB   11m38s

All tests have been run 5 times on a machine just booted and the results averaged to account for the small variations that occurred, except in the case where the execution was cancelled after one full day due to time limitations, or when it caused the program to crash. The hardware is a 2.66 GHz Core 2 Duo MacBook Pro with 4 GB of RAM. ArcGIS 9.3, FME 2010 SP1 and GRASS 6.4 were run in Windows 7, while pprepair was run in Mac OS X 10.7.1.

As Table 3 shows, our approach uses somewhat more memory than other solutions. This is explained by the extra (unconstrained) edges that are added to the input when triangulating it. It should however be noticed that both ArcGIS and GRASS crashed with the biggest dataset, 16tiles. Our implementation is the fastest of the four tested, being for instance around three times faster than FME for the biggest datasets. Only for the Mexico dataset is our implementation slower than FME. This is (probably) explained by the fact that its polygons already form a valid planar partition, and therefore very few snapping operations have been performed by FME: the planar graph of the input was basically built and then the polygons saved back to disk. We believe that ArcGIS and GRASS struggled with this dataset because it contains several very large polygons (with more than 10 000 vertices). Our implementation took 3m31s, but as Table 4 shows, it took CGAL 3m02s to simply triangulate the input edges. This table also demonstrates the efficiency of our approach: for the four datasets, around 85% of the time of our approach was used (by the CGAL library) to read the input from disk and triangulate the input edges.

Table 4: Timed steps of the planar partition repair procedure (m:ss), rounded down to the nearest second. The percentage is the triangulation time over the total time.

               E41N27  4tiles  16tiles  Mexico
Triangulate    0:17    0:15    4:00     3:02
Label          0:01    0:01    0:27     0:16
Repair         0:00    0:00    0:00     0:00
Reconstruct    0:01    0:01    0:11     0:07
Output         0:00    0:00    0:10     0:06
Total          0:19    0:17    4:47     3:31
% triangulate  89%     88%     83%      86%

During the implementation, we took several engineering decisions to optimise our program. One of them was to favour disk space over computation time, as we wanted to be able to process large datasets. This is why we always reconstruct polygons, even if they are not modified by the repair process. Although skipping the unmodified polygons is possible, it would require us to keep the original polygons in memory (which would significantly increase the memory consumption) and to keep track of which labels have been modified. And, as Table 4 shows, the reconstruction is very efficient, as it only takes around 3–4% of the total time for the 16tiles and Mexico datasets.


5 Discussion and Conclusions

We have presented a new method for repairing polygonal area partitions, ensuring that the output partition is valid, i.e. all individual polygons conform to the ISO 19107 [ISO, 2003] and Simple Features [OGC, 2006] specifications, and neither gaps nor overlaps are present between any pair of polygons. The novelty of our method lies in the fact that repair is performed according to user-defined criteria, but then takes place without any human intervention. Automatic repair is becoming an increasingly important topic due to data integration, e.g. data collected for the different themes of the European INSPIRE Initiative will finally have to fit together, and data could eventually be matched fully automatically by dedicated web service components [INSPIRE, 2009]. Our approach could be at the heart of such a web service.

The proposed approach excels in automated repair at the cost of increased memory usage compared to a pure graph-based approach; this difference is mainly caused by the unconstrained edges introduced by the triangulation. While it would be possible to use our repair rules together with a (primal/dual) graph-based approach, these additional edges in the triangulation give fine-grained control over the repair operations and ensure that the graph is connected, which facilitates the reconstruction of polygons.

We have implemented our algorithm over a numerically robust triangulator (CGAL), and since repair operations are expressed solely in terms of re-labelling of triangles (no geometric computation is involved), the approach is also fully robust. Since during our experiments most of the time was used to compute the constrained triangulation, another library could also be tested to improve the implementation.

For the future, we plan to:

• Improve the scalability of the approach, and process and repair datasets with more than 10 million polygons. It is known that, using divide-and-conquer techniques, triangulation algorithms can handle big datasets [Amenta et al., 2003; Blandford et al., 2005]. We will investigate whether it is possible to automatically repair each divided part individually and ‘glue’ the repaired parts together.

• Investigate snap rounding [Hobby, 1999; de Berg et al., 2007] as a pre-processing step (or embedded directly in the triangulation) to guarantee that repaired planar partitions have no vertices that are closer than a certain ε threshold. However, snap rounding may change the topology of the input, but the output will nevertheless be a valid partition (as the topology will be repaired). Apart from topological changes, snap rounding can also lead to removed polygon parts that are too small to be preserved, based on the chosen ε.

• Add more advanced repair operations to our repair toolkit, e.g. repair could take place based on splitting a collection of triangles.

• Extend our work to include the third dimension, to validate and repair 3D city models using a constrained tetrahedralisation [Si, 2008]. Notice that the tetrahedralisation of a given polyhedron does not always exist, and thus extra (Steiner) points might need to be added. The main concepts of our approach, (re)labelling and reconstruction, extend naturally to 3D. However, appropriate repair operations for 3D city models would need to be defined, and some application-specific constraints (e.g. right angles at corners) are not trivial to implement in our current approach.

References

Nina Amenta, Sunghee Choi, and Günter Rote. Incremental constructions con BRIO. In Proceedings of the 19th Annual Symposium on Computational Geometry, pages 211–219. ACM Press, 2003.


Wael M. Badawy and Walid G. Aref. On local heuristics to speed up polygon-polygon intersection tests. In Proceedings of the 7th ACM International Symposium on Advances in Geographic Information Systems, pages 97–102. ACM, 1999.

M. Bader and R. Weibel. Detecting and resolving size and proximity conflicts in the generalization of polygonal maps. In Proceedings of the 18th International Cartographic Conference, Stockholm, pages 1525–1532, 1997.

Daniel K. Blandford, Guy E. Blelloch, David E. Cardoze, and Clemens Kadow. Compact representations of simplicial meshes in two and three dimensions. International Journal of Computational Geometry and Applications, 15(1):3–24, 2005.

CGAL. CGAL 3.8 User and Reference Manual. CGAL Editorial Board, 2011.

Kenneth L. Clarkson, Kurt Mehlhorn, and Raimund Seidel. Four results on randomized incremental constructions. In Alain Finkel and Matthias Jantzen, editors, Proceedings of the 9th Annual Symposium on Theoretical Aspects of Computer Science, volume 577 of Lecture Notes in Computer Science, pages 461–474. Springer Berlin / Heidelberg, 1992.

Martin Davis. Java conflation suite. Technical report, Vivid Solutions, 2003. Available at http://www.vividsolutions.com/jcs/.

Mark de Berg, Dan Halperin, and Mark Overmars. An intersection-sensitive algorithm for snap rounding. Computational Geometry, 36(3):159–165, 2007. ISSN 0925-7721.

Mark de Berg, Marc van Kreveld, Mark Overmars, and Otfried Schwarzkopf. Computational Geometry: Algorithms and Applications. Springer-Verlag, 3rd edition, 2008.

ESRI. Shapefile technical description. White paper, ESRI, July 1998.

Michael A. Facello. Implementation of a randomized algorithm for Delaunay and regular triangulations in three dimensions. Computer Aided Geometric Design, 12(4):349–370, 1995.

GTS. GTS Library Reference Manual, 2006. URL http://gts.sourceforge.net/reference/book1.html.

Leonidas Guibas and Jorge Stolfi. Primitives for the manipulation of general subdivisions and the computation of Voronoi. ACM Transactions on Graphics, 4(2):74–123, 1985.

John D. Hobby. Practical segment intersection with finite precision output. Computational Geometry, 13(4):199–214, 1999. ISSN 0925-7721.

Christoph M. Hoffmann, John E. Hopcroft, and Michael S. Karasick. Towards implementing robust geometric computations. In Proceedings of the 4th Annual Symposium on Computational Geometry, pages 106–117. ACM, 1988.

INSPIRE. D2.5: Generic conceptual model, version 3.2. Technical report, Drafting Team “Data Specifications”, 2009. URL http://inspire.jrc.ec.europa.eu/documents/Data_Specifications/D2.5_v3.2.pdf.

ISO. 19107: Geographic information – spatial schema, March 2003.

David Kirkpatrick, Jack Snoeyink, and Bettina Speckmann. Kinetic collision detection for simple polygons. International Journal of Computational Geometry and Applications, 12(1–2):3–27, 2002.

Robert Laurini and Françoise Milleret-Raffort. Topological reorganization of inconsistent geographical databases: A step towards their certification. Computers & Graphics, 18(6):803–813, December 1994.

Hugo Ledoux and Martijn Meijers. Validation of planar partitions using constrained triangulations. In Proceedings Joint International Conference on Theory, Data Handling and Modelling in GeoSpatial Information Science, pages 51–55, Hong Kong, May 2010.


Hugo Ledoux, Ken Arroyo Ohori, and Martijn Meijers. Automatically repairing invalid polygons with a constrained triangulation. In Proceedings of the 15th AGILE International Conference on Geographic Information Science, 2012.

Yuanxin Liu and Jack Snoeyink. Faraway point: A sentinel point for Delaunay computation. International Journal of Computational Geometry and Applications, 18(4):343–355, November 2006.

Avraham Margalit and Gary D. Knott. An algorithm for computing the union, intersection or difference of two polygons. Computers & Graphics, 13(2):167–183, 1989.

Ernst P. Mücke, Isaac Saias, and Binhai Zhu. Fast randomized point location without preprocessing in two- and three-dimensional Delaunay triangulations. Computational Geometry: Theory and Applications, 12:63–83, 1999.

OGC. OpenGIS Implementation Specification for Geographic Information - Simple Feature Access - Part 1: Common Architecture. 1.2.0 edition, October 2006.

Lutz Plümer and Gerhard Gröger. Achieving integrity in geographic information systems: maps and nested maps. GeoInformatica, 1(4):345–367, 1997.

M. Rivero and F. R. Feito. Boolean operations on general planar polygons. Computers & Graphics, 24(6):881–896, December 2000.

Stefan Schirra. Precision and Robustness in Geometric Computations, volume Algorithmic Foundations of Geographic Information Systems of Lecture Notes in Computer Science, chapter 9, pages 255–287. Springer Berlin / Heidelberg, 1997.

Michael Ian Shamos and Dan Hoey. Geometric intersection problems. In FOCS, pages 208–215. IEEE, 1976.

Jonathan Richard Shewchuk. Delaunay Refinement Mesh Generation. PhD thesis, School of Computer Science, Carnegie Mellon University, Pittsburgh, USA, 1997a.

Jonathan Richard Shewchuk. Adaptive precision floating-point arithmetic and fast robust geometric predicates. Discrete & Computational Geometry, 18:305–363, 1997b.

Hang Si. Three Dimensional Boundary Conforming Delaunay Mesh Generation. PhD thesis, Technische Universität Berlin, 2008.

C. Dana Tomlin. Map algebra: One perspective. Landscape and Urban Planning, 30(1–2):3–12, October 1994.

Peter van Oosterom, Wilko Quak, and Theo Tijssen. About invalid, valid and clean polygons. In Peter F. Fisher, editor, Developments in Spatial Data Handling: 11th International Symposium on Spatial Data Handling, pages 1–16. Springer, 2004.

Jan W. van Roessel. A new approach to plane-sweep overlay: Topological structuring and line-segment classification. Cartography and Geographic Information Science, 18(1):49–67, January 1991.

Shuxin Yuan and Chuang Tao. Development of conflation components. Geoinformatics and Socioinformatics, pages 1–13, June 1999.


  • Introduction
  • Related Work
    • Simple Features and validity of simple polygons
    • Validation of a Planar Partition Using a Planar Graph
    • Repair Using Point and Edge Snapping and Splitting
    • Repair Using Topological Information
  • Validation and Automatic Repair Using a Constrained Triangulation
    • Triangulation of a polygon and constrained triangulation
    • Labelling the triangles of a Planar Partition
    • Validation
    • Repair Operations
    • Extraction of Polygons From a Triangulation
  • Implementation and Experiments
    • Implementation
    • Experiments with real-world planar partition datasets
  • Discussion and Conclusions

and its most relevant details are discussed in Section 4. Our software takes as input polygons stored according to the Simple Features specification, validates them, repairs them if they contain errors, and returns a new set of polygons that is guaranteed to be a valid planar partition. We also report in that section our experiments with several real-world datasets (some of them rather large), and we compare our method and its implementation to alternatives, both for validation and for repair. Finally, we discuss the advantages and disadvantages of using our method, and the conclusions drawn from this, in Section 5.

2 Related Work

Since a planar partition is formed by a set of individual polygons, we first discuss what a valid polygon is in our context, and then we review existing methods to validate and repair planar partitions.

2.1 Simple Features and validity of simple polygons

While there are several definitions of what constitutes a valid polygon (see van Oosterom et al. [2004] for an extensive discussion), we use in the following the international standard Simple Features [OGC, 2006], with the addition of the ISO 19107 [ISO, 2003] polygon orientation rules. Simple Features defines a polygon as follows: “A Polygon is a planar Surface defined by 1 exterior boundary and 0 or more interior boundaries. Each interior boundary defines a hole in the Polygon.” In the specification, six assertions are given that together define a valid polygon. Essential for a valid polygon is that the boundaries of the polygon must define one connected area. Additionally, a polygon can contain holes. We say that the exterior boundary of the polygon is the outer ring, and a hole is an inner ring. These holes can be filled by one or more polygons, which can recursively contain holes that are filled by other polygons. Observe also that holes are allowed to interact with each other and with the outer boundary under certain conditions, e.g. they are allowed to touch at one point, as long as the interior of the polygon stays one connected area. Each polygon is stored independently from other polygons, and it is not possible to store topological relationships between the polygons. The ISO 19107 specification [ISO, 2003] is more ambiguously defined, but it does establish orientation rules (counterclockwise for the outer ring, clockwise for the inner ones), which we use in our output.
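The orientation rules mentioned above can be illustrated with a short sketch. It is not taken from the authors' software; it assumes that a ring is a list of (x, y) tuples without a repeated closing vertex, and the function names `signed_area`, `orient` and `orient_polygon` are hypothetical.

```python
def signed_area(ring):
    """Shoelace formula: positive for a counterclockwise ring, negative for clockwise."""
    total = 0.0
    n = len(ring)
    for i in range(n):
        x1, y1 = ring[i]
        x2, y2 = ring[(i + 1) % n]
        total += x1 * y2 - x2 * y1
    return total / 2.0


def orient(ring, counterclockwise):
    """Return a copy of the ring with the requested orientation."""
    is_ccw = signed_area(ring) > 0
    return list(ring) if is_ccw == counterclockwise else list(reversed(ring))


def orient_polygon(outer, inners):
    """Counterclockwise outer ring, clockwise inner rings (the rule used for the output)."""
    return orient(outer, True), [orient(r, False) for r in inners]
```

With this convention, the interior of the polygon always lies to the left of every ring segment, which is the property that the labelling step relies on.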

The validation of a single polygon is possible with different libraries, GEOS² and JTS³ being two widely used open-source examples.

The repair of single polygons is a less documented topic than their validation. Different software vendors offer tools to help identify and semi-automatically repair broken polygons. Examples are ST_MakeValid() from PostGIS and the constraints in 1Spatial Radius Topology. The method we present in this paper has been adapted to automatically repair common errors in individual polygons, e.g. wrong ring orientation or holes that split the interior of a polygon (see Ledoux et al. [2012]). However, we focus in this paper on the validation and repair of planar partitions only, and we assume that the input polygons are individually valid.

2.2 Validation of a Planar Partition Using a Planar Graph

Assuming that individual polygons have been deemed to be valid, it is possible to test the validity of a planar partition by identifying the two types of invalid configurations: overlaps and gaps.

If individual polygons are checked without building a planar graph (or an indexing structure), finding overlaps involves checking whether any possible pair of polygons overlap. This is a computationally expensive operation (quadratic behaviour), even when heuristics to speed up the process are used [Badawy and Aref, 1999; Kirkpatrick et al., 2002]. Additionally, robustness issues are significant in polygon intersection tests [Hoffmann et al., 1988]. Finding the potential gaps in a planar partition is even more problematic. For this, computing the union of the entire set of polygons is required, which is also computationally expensive [Margalit and Knott, 1989; Rivero and Feito, 2000].

² Geometry Engine Open Source, http://trac.osgeo.org/geos

³ Java Topology Suite, http://www.vividsolutions.com/jts/jtshome.htm

The validation process can be sped up by first building a planar graph of the input polygons, which is afterwards checked for consistency. It should first be noticed that, while different approaches are available to construct a planar graph [Shamos and Hoey, 1976; van Roessel, 1991], doing so is still sometimes difficult, especially if the polygons contain holes. The graph of the boundary can then be unconnected, and extra machinery is necessary to represent the knowledge of holes in the graph structure. The fact that holes are also allowed to touch complicates the task of validation even further, since holes cannot be assumed to form an unconnected planar graph.

Based on this graph, Plümer and Gröger [1997] specify a list of minimal mathematical axioms that can be used to check the validity of a planar partition: no dangling edges, no zero-length edges, planarity, no holes, no self-intersections, no overlaps, and having a connected graph. It is important to note that Plümer and Gröger base their axioms on concepts from graph theory, but they also highlight the fact that a graph-based approach alone is not enough: the graph has to be augmented with geometrical knowledge (each vertex has geometry attached, i.e. the coordinates of points have to be stored). Validation is thus underpinned by both geometrical and topological concepts, and systems have to deal with those two concepts at the same time. The method we propose in this paper, using a constrained triangulation, permits us to do exactly this: to embed both geometry and topology in the same structure.

2.3 Repair Using Point and Edge Snapping and Splitting

The most common method for planar partition repair is based on the assumption that polygons approximately match each other at their common boundaries. This implies that they should be within a certain distance of each other along those edges. If, additionally, all parts further apart than this value are known not to be common boundaries, it is possible to “snap” together polygons that are closer to each other than this threshold, while keeping the rest untouched. This method of planar partition repair is available in many GIS packages, including ArcGIS⁴, FME⁵, GRASS⁶ and Radius Topology⁷.

Since thresholds are central to this method, it is of utmost importance to select a good threshold value, something that is completely different for each dataset. For planar partition repair to be successful using this method, such a threshold should be chosen carefully and always comply with a few conditions. These have been summarised as follows:

1. Adjacent polygons should not be further apart than this threshold along any part of their common boundaries (shown as the maximum threshold in Figure 2(a)). Otherwise, gaps cannot be fixed.

2. Adjacent polygons should not overlap each other in areas which are further inwards than this threshold from their common boundaries (shown as the maximum threshold in Figure 2(b)). Otherwise, overlaps cannot be fixed.

3. None of the vertices of a polygon should be closer to each other than this threshold, including non-consecutive vertices (shown as the minimum thresholds in Figure 2). Otherwise, they might be snapped together, creating repeated vertices, disjoint regions, or various topological problems.

4. None of the vertices of a polygon should be closer than this threshold to any non-incident edge. Otherwise, they might be snapped together, creating disjoint regions or various topological problems.

⁴ http://www.esri.com/software/arcgis/index.html

⁵ http://www.safe.com/fme

⁶ http://grass.osgeo.org

⁷ http://www.1spatial.com/software/radius_topology

This threshold value is usually determined manually, either by trial and error or by analysing certain properties of the dataset(s) involved (e.g. point spacing, precision or map scale). However, it is often hard to find an optimal threshold for a certain dataset, since ensuring that it works well for every part of a dataset is unrealistic. Moreover, sometimes such a threshold does not even exist (e.g. because point spacing in some places might be smaller than the width of the gaps and overlaps present).
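To make the last point concrete, the following minimal sketch checks whether a usable snapping threshold can exist at all. It is an illustration under simplifying assumptions, not a method from the paper: rings are lists of (x, y) tuples, only conditions 1 to 3 are considered (condition 4, on vertex-to-edge distances, is ignored), and `max_mismatch`, the largest gap or overlap width, is a hypothetical input that the user would have to measure.

```python
from itertools import combinations
from math import dist


def min_vertex_spacing(rings):
    """Smallest distance between two distinct vertices of the same ring (condition 3)."""
    best = float("inf")
    for ring in rings:
        for p, q in combinations(ring, 2):
            if p != q:  # skip a repeated closing vertex
                best = min(best, dist(p, q))
    return best


def feasible_threshold_range(rings, max_mismatch):
    """Return the open interval of usable thresholds, or None if it is empty.

    The threshold must exceed the widest gap/overlap (conditions 1 and 2)
    and stay below the smallest vertex spacing (condition 3).
    """
    upper = min_vertex_spacing(rings)
    return (max_mismatch, upper) if max_mismatch < upper else None
```

When the function returns `None`, point spacing is somewhere smaller than the widest mismatch, which is exactly the situation in which no single threshold can work.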

Since the aforementioned conditions are frequently not met, or are not checked beforehand, and it is still necessary to repair the dataset, snapping is often performed nevertheless, possibly creating invalid polygons and/or planar partitions, or significantly changing the topology of the existing features. Two examples of this phenomenon are shown in Figures 3 and 4.

Figure 3: Spikes and punctures can be created by snapping, since the bases of these elongated forms (encircled) might be narrower than the threshold, but their length is not. (a) Before snapping. (b) After snapping.

While these examples show that snapping is not problem-free, it is important to note that commercial GIS packages often implement more complex snapping options (such as point-to-edge, edge-to-edge, or using a reference dataset). These options can help to solve a problematic case, but can also have undesired consequences, such as changing the topology of the polygons. Another problem is that post-processing operations to clean the resulting polygons might be required (e.g. disposing of polygons with small areas, removing redundant lines, thresholds for minimum angles, etc.), which could again create invalid configurations, requiring iterative validation or repair processes.

Figure 4: Polygons can be split by snapping, since some parts of them might be narrower than the threshold (encircled). While this result does not create an invalid planar partition, it can change the number of polygons present and their topological relations, and can therefore be undesirable. (a) Before snapping. (b) After snapping.

2.4 Repair Using Topological Information

A different approach for planar partition repair, based on topological information, is available in some software.

GRASS also creates a graph, using edges as a base structure instead of triangles, and could be used to (manually) detect overlaps and gaps based on the number of labels, using the functions v.what and d.what.vect. However, there is no simple automated procedure to get the number of labels at a certain point, which makes it very cumbersome and time-consuming to use GRASS for this purpose.

Meanwhile, ArcGIS provides a more complete solution, using a method similar in some ways to the one developed and described in Section 3. It involves using the Geodatabase feature of the software with some combined validation rules (e.g. must not overlap and must not have gaps). However, fixing every problematic area in an appropriate manner requires extensive user intervention (see Figure 5), since the best choice for each case depends on the specific configuration of the error.

Figure 2: Defining a threshold for vertex, edge and face snapping. The threshold to use should be larger than the largest minimum distance between the matching boundaries, and smaller than the minimum distance between vertices. Both panels mark the minimum and maximum distance thresholds. (a) Gaps between polygons. (b) Overlapping polygons.

Since neither of the aforementioned programs offers an automated process to correctly solve this problem (and GRASS lacks the ability to visualise problem areas), they are not really comparable to our solution. A planar partition can easily contain tens of thousands of polygons, possibly generating thousands of errors, which need to be checked and repaired semi-automatically.

3 Validation and Automatic Repair Using a Constrained Triangulation

Our approach to validation and automatic repair of planar partitions uses a constrained triangulation (CT) as a supporting structure because it has many good properties, including the following:

• It is by definition a planar partition. Therefore, as long as we keep the information about which polygon each triangle belongs to, the reconstructed polygons will be either a valid planar partition or multiple ones.

• It can be built quickly, in $O(n \log n)$, with a variety of approaches [Guibas and Stolfi, 1985; Mücke et al., 1999; Clarkson et al., 1992].⁸

• Changes to the triangulation (e.g. adding a new constrained edge) are local and therefore fast.

• Constrained edges can usually be added in constant time, being only significantly slower (and more complex) when there is an intersection with an existing constrained edge [Shewchuk, 1997b].

• Implementation-wise, several stable and fast triangulation libraries exist, including CGAL [CGAL, 2011], Triangle [Shewchuk, 1997a] and GTS [GTS, 2006].

The general workflow of our approach to both validate and repair a planar partition is as follows:

1. the CT of the input segments forming the polygons is constructed;

2. each triangle in the CT is labelled with the label of the polygon inside which it is located;

3. problems are detected by identifying triangles having no or multiple labels, and by verifying the connectivity between triangles;

4. problems are repaired by re-labelling triangles to ensure that each triangle has exactly one label;

5. the polygons are extracted from the triangulation (polygons modelled with the Simple Features specification).

⁸ The actual computational complexity can be $O(n \log n + k)$, with $k$ being the number of edge-edge intersections, which could conceivably even be $n^2$. However, $k \ll n$ for most GIS datasets.

Figure 5: Planar partition repair in ArcGIS. The user is expected to zoom in to a particular error, analyse the situation (e.g. by looking at the properties of the surrounding polygons) and make a decision to assign the problematic region to a certain polygon. More than 11 000 errors were detected in this tile of the Corine dataset. (a) Viewing a topology error in ArcGIS. (b) Assigning an overlapping region to one of the polygons involved.

As mentioned previously, for this workflow we assume that each input polygon is individually valid. We describe in the following sections the concepts needed, and we give a detailed description of the different steps.

3.1 Triangulation of a polygon and constrained triangulation

A triangulation subdivides an area into non-overlapping triangles. Using a constrained triangulation, every line segment that defines the boundary of a polygon is ensured to appear as an edge in the triangulation. It is known that any polygon (also with holes) can be triangulated without adding extra vertices [de Berg et al., 2008; Shewchuk, 1997a]. Figure 6 shows an example.

In our approach, the triangulation is performed by incrementally constructing a CT of all the segments representing the boundaries (outer + inner) of each polygon. If the set of input polygons forms a planar partition, then each segment will be inserted twice (except those forming the outer boundary of the set of input polygons). This is usually not a problem for triangulation libraries, because they ignore points and segments at the same location (as is the case with the solution we use, see Section 4). When segments are found to intersect, they are split, with a new point created at the intersection. This is the only situation in which the generation of new points is required.

Figure 6: (a) A polygon with 4 holes. (b) The constrained triangulation of the segments of this polygon.

Notice that our approach requires only a constrained triangulation, and not one that fulfils the Delaunay criterion [Shewchuk, 1997a]. However, having well-shaped triangles is useful for repair purposes and does not significantly increase the processing time. Therefore, our implementation actually constructs and uses a constrained Delaunay triangulation.

3.2 Labelling the triangles of a Planar Partition

Labelling a triangle means assigning it the label(s) of the polygon(s) that it belongs to (if two input polygons overlap, each triangle in the overlapping region should have the labels of the two polygons).

In our previous work on validation [Ledoux and Meijers, 2010], we used the centroid of a polygon to start the labelling process, but this method is prone to errors (if, for instance, the calculated centroid is outside or on the boundary of the polygon) and does not permit us to differentiate gaps from overlaps.

To solve these problems, we store information about the constrained edges of the CT. Since it is known that the input rings are closed and have a known orientation (according to the ISO 19107 orientation rules), it is also known on which side of a certain line segment the interior of the polygon lies. This property is used for robust labelling of each polygon. Triangles adjacent to the outer ring of the polygons are labelled first, and this is later expanded to triangles further in the interior of the polygon, recursively labelling adjacent unlabelled faces as long as no constrained edges are crossed. After this operation, all triangles that are part of any polygon are labelled, with overlapping regions having multiply labelled triangles. However, holes are then indistinguishable from triangles outside the planar partition, since both have zero labels. Therefore, a special label, referred to as the “universe” label, is created for all the triangles outside the planar partition; it is assigned by recursively labelling adjacent triangles, starting from any triangle known to lie in the exterior of the planar partition. We exploit the concept of the “far-away point” to achieve this [Liu and Snoeyink, 2006], which is used by several implementations and is also known as the “big triangle” [Facello, 1995].
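The flood-fill idea behind this labelling can be sketched on an abstract dual graph. This is only an illustration, not the authors' implementation: the triangulation is assumed to be given as a list of (triangle, triangle, owners) adjacencies, where `owners` is the set of polygon labels whose boundary runs along the shared edge, and the names `build_adjacency` and `flood_label` are hypothetical. In this sketch a polygon's flood stops only at that polygon's own boundary, so that overlapping regions collect several labels.

```python
from collections import deque


def build_adjacency(edges):
    """edges: (triangle, triangle, owners) triples; owners is empty for unconstrained edges."""
    neighbours = {}
    for s, t, owners in edges:
        neighbours.setdefault(s, []).append((t, frozenset(owners)))
        neighbours.setdefault(t, []).append((s, frozenset(owners)))
    return neighbours


def flood_label(neighbours, seeds, label, labels, stop_at_any_boundary=False):
    """Add `label` to every triangle reachable from `seeds`.

    A polygon label stops at edges belonging to that polygon's boundary;
    the universe label (stop_at_any_boundary=True) stops at every constrained edge.
    """
    queue = deque(seeds)
    visited = set(seeds)
    while queue:
        t = queue.popleft()
        labels.setdefault(t, set()).add(label)
        for other, owners in neighbours[t]:
            blocked = bool(owners) if stop_at_any_boundary else label in owners
            if not blocked and other not in visited:
                visited.add(other)
                queue.append(other)
    return labels
```

The seeds stand for the triangles known to lie just inside a ring (from its orientation) or, for the universe label, a triangle incident to the far-away point. Triangles that end up with no entry in `labels` are gaps.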

3.3 Validation

If the set of input polygons forms a planar partition, then all the triangles will be labelled with one and only one label. Problems are easily detected:

• Gaps are detected by finding triangles without any labels.

• Overlaps are detected by finding triangles with two or more labels.

• Disjoint regions are detected by identifying regions separated by the “universe” label. This is done by starting at a given triangle and doing a breadth-first search on the dual of the triangulation, without visiting the triangles labelled as “universe”. If all the triangles can be reached, then no polygons of the planar partition are disjoint.
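The three checks above can be sketched as follows. This is a minimal illustration under assumed data structures, not the authors' code: `neighbours` maps each triangle id to the ids of its adjacent triangles (the dual graph), `labels` maps a triangle id to its set of labels, and the function name `validate` is hypothetical.

```python
from collections import deque

UNIVERSE = "universe"


def validate(neighbours, labels):
    """Return (gap triangles, overlap triangles, whether the partition is connected)."""
    gaps = sorted(t for t in neighbours if not labels.get(t))
    overlaps = sorted(t for t in neighbours if len(labels.get(t, ())) > 1)

    # Breadth-first search on the dual graph that never enters a universe triangle.
    inside = [t for t in neighbours if UNIVERSE not in labels.get(t, ())]
    connected = True
    if inside:
        seen = {inside[0]}
        queue = deque(seen)
        while queue:
            t = queue.popleft()
            for other in neighbours[t]:
                if other not in seen and UNIVERSE not in labels.get(other, ()):
                    seen.add(other)
                    queue.append(other)
        connected = len(seen) == len(inside)
    return gaps, overlaps, connected
```

A planar partition is valid in this sketch when both lists are empty and `connected` is true.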

3.4 Repair Operations

The greatest benefits of using a labelled triangulation for planar partition repair stem from the fact that, while repair operations are performed, the validity of the planar partition is always kept, together with the integrity of the data. Unlike with snapping, vertices are not required to be moved during the process, and repair is performed using local criteria instead of global ones (in snapping, the threshold is usually fixed for the whole dataset). This is in contrast to other methods, where care needs to be taken to ensure that the (geometric or topological) validity is kept.

Figure 7 shows the standard steps required in a repair operation. In order to avoid order dependency when repairing, the repair operation itself is always performed after all the choices for which label to assign have been made.
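The two-phase structure (decide first, re-label afterwards) can be sketched as below. The data structures are assumptions made for illustration: `neighbours` maps a triangle id to its adjacent triangle ids, `labels` maps a triangle id to its set of labels, and `choose_label` is any hypothetical repair criterion returning a label or `None`.

```python
def repair(neighbours, labels, choose_label):
    """Re-label every triangle that does not have exactly one label."""
    invalid = [t for t in neighbours if len(labels.get(t, ())) != 1]

    # Phase 1: take every decision against the untouched labelling.
    decisions = {}
    for t in invalid:
        choice = choose_label(t, neighbours, labels)
        if choice is not None:
            decisions[t] = choice

    # Phase 2: only now apply the decisions, on a copy of the labelling.
    repaired = {t: set(ls) for t, ls in labels.items()}
    for t, choice in decisions.items():
        repaired[t] = {choice}
    return repaired


def first_valid_neighbour(t, neighbours, labels):
    """Toy criterion: the label of the first neighbour that has exactly one label."""
    for other in neighbours[t]:
        if len(labels.get(other, ())) == 1:
            return next(iter(labels[other]))
    return None
```

Because no decision can see a label written by an earlier decision, the result does not depend on the order in which the invalid triangles are visited. Triangles for which the criterion returns `None` stay invalid and can be handed to another criterion.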

In particular, we propose six different repair operations that can be used to fix gaps and overlaps. These are shown in Table 1, and all imply re-labelling triangles. Four of them use triangles as a base (i.e. the label assigned is based only on that of its three neighbouring triangles), which is fast and modifies the area of each input polygon the least. Despite their simplicity, they offer substantial control over the results. For instance, triangle by number of neighbours and triangle by absolute majority only differ from each other in their handling of overlapping faces, but the former is better for large overlapping regions, while the latter is better at fixing small problems. Having well-shaped (Delaunay) triangles is most useful for triangle-based repair functions. Two of them use regions of adjacent triangles with equivalent sets of labels, which is slower than a triangle-based method, but yields results that can be cartographically more pleasing. An interesting repair operation for practitioners is the one in which a priority of labels is used, i.e. in case of gaps/overlaps, the labels in the triangle (overlap) or in the adjacent polygons (gap) are ordered according to a user-defined priority, and the highest priority is assigned to the problematic triangles. Notice that these repair operations can be used one after the other (in a hierarchical manner): for instance, if the repair according to the longest boundary is used first, but one zone has two or more boundaries with exactly the same length, then the deadlock can be solved by choosing one randomly. A sample of the results obtained with different repair operators is also shown in Figure 8.

Table 1: The repair operations currently implemented in our software. The types of operations are defined in the map algebra classification [Tomlin, 1994].

| Repair operation | Type | Criteria |
|---|---|---|
| Triangle by priority list | Varies | Label that has the highest priority according to a predefined priority list |
| Triangle by number of neighbours | Focal | The label present in the largest number of adjacent faces, overlaps included |
| Triangle by absolute majority | Focal | Label present in two or more valid adjacent faces |
| Triangle by longest boundary | Focal | Label present along the longest portion of the boundary of the adjacent faces |
| Regions by longest boundary | Focal of zonal | Label present along the longest portion of the boundary of the adjacent faces |
| Regions by random neighbour | Focal of zonal | Random label from the adjacent faces |
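Two of the triangle-based criteria, and their hierarchical combination, might be sketched as follows. This is a simplified reading of Table 1, not the authors' code: `neighbours` maps a triangle id to its adjacent triangle ids, `labels` maps a triangle id to its set of labels, the “universe” label receives no special treatment, and all function names are hypothetical.

```python
from collections import Counter


def by_number_of_neighbours(t, neighbours, labels):
    """Label present in the largest number of adjacent faces (overlaps included); None on a tie."""
    counts = Counter()
    for other in neighbours[t]:
        counts.update(labels.get(other, ()))
    if not counts:
        return None
    best = max(counts.values())
    winners = [label for label, count in counts.items() if count == best]
    return winners[0] if len(winners) == 1 else None


def by_priority_list(t, neighbours, labels, priority):
    """Highest-priority label among the triangle's own labels (overlap) or its neighbours' (gap)."""
    candidates = set(labels.get(t, ()))
    if not candidates:
        for other in neighbours[t]:
            candidates |= set(labels.get(other, ()))
    for label in priority:
        if label in candidates:
            return label
    return None


def hierarchical(operations):
    """Try each criterion in turn and return the first decision that is not None."""
    def choose(t, neighbours, labels):
        for operation in operations:
            choice = operation(t, neighbours, labels)
            if choice is not None:
                return choice
        return None
    return choose
```

Chaining the criteria this way mirrors the deadlock example in the text: a criterion that cannot decide returns `None`, and the next one in the list takes over.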

More repair operations, based on extensions to the idea of labelling triangles/regions, can be further developed. For instance, triangles could be split to subdivide an area with problems (as in Bader and Weibel [1997]), or sliver triangles/regions could be discarded (and then filled during repair).

3.5 Extraction of Polygons From a Triangulation

Starting from a labelled triangulation, it is possible to reconstruct the individual polygons of a planar partition so that they conform to valid polygons according to the ISO 19107 and the Simple Features specifications, which allows users to incorporate automatic validation and repair in their workflow.

We do this operation polygon by polygon. We start at an unprocessed triangle and visit all the connected triangles having the same label (to reconstruct the polygon), marking the triangles as processed as we visit them. Note that, since all these triangles are connected, the outer and inner boundaries of a polygon are all simple (non self-intersecting). We repeat this operation until all the triangles have been processed.

Figure 7: The steps in a generic repair operation. (a) A set of overlapping polygons (red and blue) that needs to be repaired. (b) The polygons are triangulated and each triangle is labelled with the polygons it belongs to. (c) An invalid region is found, defined as a set of contiguous polygons with an equivalent set of 0 or $\geq 2$ labels. (d) The neighbours of the region are analysed to see how it should be re-labelled, and the result is saved onto a list (here, triangles A, B, C and D are all to be repaired as red). (e) The triangles are relabelled according to the list. (f) The output polygons, which now have no gaps or overlaps.

For each polygon, we have to recover not only its outer boundary but also its inner boundaries, which are not connected. Observe that we cannot simply follow the original constrained edges, as these do not have any meaning after a planar partition has been repaired: the boundaries of the repaired polygons are instead formed by edges incident to two triangles having different labels.
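The property stated in the last sentence can be illustrated directly. The sketch below is not the traversal used by the authors; it simply collects the boundary edges of one label. It assumes that triangles are triples of vertex ids, all given in counterclockwise order, with one (repaired) label per triangle, and the function name `boundary_edges` is hypothetical.

```python
def boundary_edges(triangles, labels, label):
    """Directed edges of the triangles carrying `label` whose other side has a different label.

    An edge is interior when its reverse also belongs to a triangle with the
    same label; every remaining edge lies on the boundary of the polygon.
    """
    directed = set()
    for (a, b, c), triangle_label in zip(triangles, labels):
        if triangle_label == label:
            directed.update([(a, b), (b, c), (c, a)])
    return sorted(e for e in directed if (e[1], e[0]) not in directed)
```

With counterclockwise triangles, the returned edges chain into a counterclockwise outer ring and clockwise inner rings, which matches the orientation rules used for the output.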

For each polygon we walk at a triangle andmove on to triangles having the same label

Figure 8: Different repair operations used in the two polygons for the Arribes del Duero Natural Park in Spain (red/darker grey) and the International Douro Natural Park in Portugal (green/lighter grey). (a) The original polygons. (b) Repaired each triangle using the label adjacent along the longest boundary from the neighbouring triangles. (c) Repaired each region using a random label from the neighbouring triangles. (d) Repaired each region using the label adjacent along the longest boundary from the neighbouring triangles. All of them can be considered best by a certain criterion, like preserving the area ratio between the two polygons (b), smoothness of the boundary (d), or a balance between the two (c).

As the process goes, a single polyline that runs along all the boundaries of the polygon is generated. This involves a depth-first search (clockwise) that recurses until it reaches the boundary of the polygon, returning a long chain of edges in a procedure similar to following the boundary edge by edge. The procedure is shown step by step in Figure 9. The polyline created with this method has "bridges", which allow us to keep all inner boundaries (holes) connected with the outer one in a manner that keeps the interior connected as well. These help to preserve connectivity and the relations between different (outer and inner) boundaries, but are removed later in the process (to conform to the Simple Features specification). Also, its orientation conveys the information of whether a section of it is part of an inner or an outer boundary.

Figure 9: The traversal order when navigating through triangles in a clockwise manner (triangles labelled A to O, traversal steps numbered 1 to 31). Starting at A and the edge between A and B, the operations occur according to the encircled numbers. Black arrows denote when an unprocessed triangle is found, red arrows when it is not. Notice how the traversal is performed clockwise (shown in dark blue arrows) for both cases, despite starting from different sides.

This polyline is processed with a stack-based algorithm that generates separate closed rings for the outer boundary and each of the inner boundaries, collapsing the "bridges" that were generated. In order to do this, the polyline is cut at the places where more than two edges join, and the resulting segments are joined in the correct order by keeping track of (yet) unclosed rings. When a new segment is being processed, it can be one of three options: one that completes a ring, one that is part of a bridge, or one that is not yet closed. Closed rings are stored and bridges are removed, while unclosed rings are saved in the stack until they can be popped to form a closed ring together with a new segment. This is shown in Figure 10.
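The idea of closing rings with a stack while discarding bridges can be illustrated with a simplified, vertex-based variant. It is only a sketch under stated assumptions, not the segment-based procedure of the paper: the function name `split_rings` is hypothetical, the input is assumed to be a closed walk of hashable vertices (first vertex repeated at the end), and a bridge is assumed to show up as an edge walked forth and back.

```python
def split_rings(walk):
    """Split a closed walk with bridges into separate closed rings.

    walk: list of hashable vertices, with walk[0] == walk[-1].
    Returns the rings as closed vertex lists, in the order they are closed.
    """
    stack = []      # vertices of the rings that are not yet closed
    position = {}   # vertex -> index in the stack
    rings = []
    for v in walk:
        if v in position:
            # Coming back to a vertex on the stack closes a ring.
            i = position[v]
            ring = stack[i:] + [v]
            for u in stack[i + 1:]:
                del position[u]
            del stack[i + 1:]
            # A ring such as [b, e, b] is a bridge walked twice: drop it.
            if len(ring) >= 4:
                rings.append(ring)
        else:
            position[v] = len(stack)
            stack.append(v)
    return rings
```

For a square a-b-c-d with a triangular hole e-f-g attached through a bridge b-e, the walk a, b, e, f, g, e, b, c, d, a yields the hole ring first and the outer ring second, while the bridge disappears. The orientation of each ring is the one it had in the walk.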

4 Implementation and Experiments

4.1 Implementation

An implementation of the algorithms described in Section 3 was written in the C++ programming language, using external libraries for some functionality. The developed software is called pprepair and is open source and freely available at http://tudelft3d.github.com/pprepair. C++ was selected in order to have plenty of control with regards to low level details and to achieve good performance, which makes it possible to compare it with existing solutions. The libraries used are the OGR Simple Features Library⁹ (which allows input and output from a large variety of data formats common in GIS) and CGAL¹⁰ (which has support for many robust spatial data structures and the operations based on them; we use its constrained triangulation module).

4.2 Experiments with real-world planar partition datasets

⁹ http://www.gdal.org/ogr
¹⁰ Computational Geometry Algorithms Library: http://www.cgal.org

Figure 10: Processing the polyline shown, starting from vertex a and moving towards polyline A, the following operations occur: (1) A: unclosed → push; (2) B: unclosed → push; (3) C: unclosed → push; (4) D: unclosed → push; (5) E: unclosed → push; (6) F: closed → store; (7) pop → EG: closed → store; (8) pop → DH: degenerate → erase; (9) pop → CI: closed → store; (10) pop → BJ: degenerate → erase; (11) pop → AK: closed → store.

Figure 11: Overview of the four datasets. (a) E41N27. (b) 4tiles. (c) 16tiles. (d) Mexico.

We have run experiments with four freely available real-world datasets, i.e. we have validated and automatically repaired them with our implementation. An overview of these datasets is shown in Figure 11 and their properties in Table 2. The datasets are the following:

E41N27: Corine 2000¹¹ tile E41N27, which contains a shifted polygon (by about 10 cm), creating many small gaps and overlaps in the dataset. The snapping threshold has been set at 1 m.

¹¹ Corine is a land cover dataset for 32 European countries. It is freely available at http://www.eea.europa.eu/data-and-maps/data/corine-land-cover-2000-clc2000-seamless-vector-database

4tiles: Corine 2000 tiles E39N32, E39N33, E40N32 and E40N33, which are known to have long and thin overlapping regions (< 1 mm) with each other. The snapping threshold has been set at 1 cm.

16tiles: 16 adjacent Corine 2000 tiles: E39N30, E39N31, E39N32, E39N33, E40N30, E40N31, E40N32, E40N33, E41N30, E41N31, E41N32, E41N33, E42N30, E42N31, E42N32 and E42N33. Some have gaps between one another, some overlap, but they match within a few centimetres. The snapping threshold has been set at 10 cm.

Mexico: 1:1 000 000 scale land cover dataset from INEGI, consisting of over 26 000 polygons. Interestingly, it is mostly already valid (according to the Shapefile specification), but contains some very large polygons with tens of thousands of vertices.

Table 2: Properties of the datasets used for the experiments.

Dataset   polygons   pts         pts largest polygon   avg pts per polygon
E41N27    14 969     496 303     26 740                33.7
4tiles    4 984      365 702     16 961                74.7
16tiles   63 868     6 622 133   95 112                103.7
Mexico    26 866     4 181 354   117 736               155.6

As a comparison, we have also tried to perform the same operations with other available software. While the capabilities for planar partition repair among the software tested vary considerably, with full topological repair only available in ArcGIS (using manual operations only), it is also important to consider how different repair implementations scale to large datasets. For this, a few performance tests were made with our implementation and three planar partition repair tools that perform this process using snapping and splitting. The testing methodology for each tool is as follows:

ArcGIS: In ArcCatalog, a multiple feature dataset is created in a Geodatabase, set with XY resolution values equal to the snapping threshold. The features are imported into it, and the merge and dissolve operations are used to merge adjacent polygons with the same ID. Topology is then generated to check that the planar partition is valid. Everything is finally exported to a single Shapefile. The individual parts of the process are timed and the total is recorded. Memory usage is calculated as the difference between the just loaded ArcCatalog application and its maximum memory usage throughout the process.

FME: A reader is created for each input file, all of which serve as input to a Snapper transformer; features with the same IDs are then dissolved, and finally they are output into a new Shapefile writer. The topology generator is used to be able to tell whether the result is a valid planar partition. Results are timed and the maximum memory allocation of the fme.exe process is recorded.

GRASS: Input files are imported with v.in.ogr, with all polygon cleaning operations performed and snapping set to the correct values. Boundaries between features with the same IDs are then dissolved using v.dissolve. Files are then exported with v.out.ogr. Times reported by GRASS are added together to give the total, while memory usage by v.in.ogr.exe, v.dissolve.exe and v.out.ogr.exe is monitored and the maximum is recorded.

pprepair: Files are read and put into the triangulation, the triangulation is labelled, and the repair is performed with the longest boundary first (see Table 1), with ambiguous cases resolved with a random choice. Polygons are then extracted from the triangulation and output to a single Shapefile. The entire process is timed and the maximum amount of memory used is recorded.
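The "longest boundary" choice with random tie-breaking used in the pprepair runs can be sketched as follows. This is one possible reading of that repair operation, not the actual implementation: the function name `longest_boundary_label` and the input format (one `(label, length)` pair per edge separating the invalid region from a validly labelled neighbour) are assumptions made for the example.

```python
import random

def longest_boundary_label(boundary_edges, rng=random):
    """Pick the neighbouring label that shares the longest boundary.

    boundary_edges: iterable of (label, edge_length) pairs, one per edge
    between the invalid region and a neighbouring triangle with that label.
    rng: object with a choice() method, used only to break ties.
    """
    totals = {}
    for label, length in boundary_edges:
        totals[label] = totals.get(label, 0.0) + length
    best = max(totals.values())
    # Ambiguous cases (equal totals) are resolved with a random choice.
    candidates = sorted(lab for lab, total in totals.items() if total == best)
    return rng.choice(candidates)

# Relabel every triangle of a (hypothetical) invalid region A-D at once.
region = ["A", "B", "C", "D"]
chosen = longest_boundary_label([("red", 2.0), ("blue", 1.5), ("red", 1.0)])
repair_list = {triangle: chosen for triangle in region}
```

Storing the decision in a list first and relabelling afterwards mirrors steps (d) and (e) of Figure 7, so that one repair does not influence the analysis of the next region.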

Notice that we are somewhat comparing apples with oranges here, since our implementation is able to repair more cases than other methods, keeps topological consistency, and does not require finding out the appropriate threshold value (if it exists). More importantly, it is able to directly state whether the result is a planar partition, unlike the three other solutions (with these, snapping is performed, but that does not guarantee that the output is a valid planar partition). The results of the experiments are shown in Table 3.

We have made the experiments in order to have an idea of the processing time and the memory usage involved. Only cases that are acceptably solved by snapping and splitting have been considered, since there is no other comparable automated topological repair tool among those studied that would also work for more complex cases, such as those that require a snapping threshold too high to be practical (e.g. horizontal conflation of independently generated datasets [Yuan and Tao, 1999; Davis, 2003]).

Table 3: Planar partition repair comparison using large datasets.

          pprepair            ArcGIS              FME                 GRASS
          memory    time      memory    time      memory    time      memory    time
E41N27    145 MB    19s       145 MB    1m3s      158 MB    31s       59 MB     3m9s
4tiles    116 MB    17s       113 MB    37s       105 MB    31s       49 MB     53s
16tiles   1.45 GB   4m47s     crashes   –         636 MB    15m48s    crashes   –
Mexico    1.01 GB   3m31s     216 MB    >1d       264 MB    2m45s     408 MB    11m38s

All tests have been run 5 times on a machine just booted, and the results averaged to account for the small variations that occurred, except in the case where the execution was cancelled after one full day due to time limitations, or when it caused the program to crash. The hardware is a 2.66 GHz Core 2 Duo MacBook Pro with 4 GB of RAM. ArcGIS 9.3, FME 2010 SP1 and GRASS 6.4 were run in Windows 7, while pprepair was run in Mac OS X 10.7.1.

As Table 3 shows, our approach uses somewhat more memory than other solutions. This is explained by the extra (unconstrained) edges that are added to the input when triangulating it. It should however be noticed that both ArcGIS and GRASS crashed with the biggest dataset, 16tiles. Our implementation is the fastest of the four tested, being for instance around three times faster than FME for the biggest datasets. Only for the Mexico dataset is our implementation slower than FME. This is (probably) explained by the fact that its polygons already form a valid planar partition, and therefore very few snapping operations have been performed by FME; the planar graph of the input was basically built and then the polygons saved back to disk. We believe that ArcGIS and GRASS struggled with the dataset because it contains several very large polygons (with more than 10 000 vertices). Our implementation took 3m31s but, as Table 4 shows, it took CGAL 3m02s to simply triangulate the input edges. This table also demonstrates the efficiency of our approach: for the four datasets, around 85 % of the time of our approach was used (by the CGAL library) to read the input from disk and triangulate the input edges.

Table 4: Timed steps of the planar partition repair procedure (minutes:seconds), rounded down to the nearest second. The percentage is the triangulation time over the total time.

                E41N27   4tiles   16tiles   Mexico
Triangulate     0:17     0:15     4:00      3:02
Label           0:01     0:01     0:27      0:16
Repair          0:00     0:00     0:00      0:00
Reconstruct     0:01     0:01     0:11      0:07
Output          0:00     0:00     0:10      0:06
Total           0:19     0:17     4:47      3:31
% triangulate   89 %     88 %     83 %      86 %

During the implementation, we took several engineering decisions to optimise our program. One of them was to favour disk space over computation time, as we wanted to be able to process large datasets. This is why we always reconstruct polygons, even if they are not modified by the repair process. Although skipping the reconstruction of unmodified polygons is possible, it would require us to keep in memory the original polygons (which would significantly increase the memory consumption) and to keep track of which labels have been modified. And, as Table 4 shows, the reconstruction is very efficient, as it only takes around 3–4 % of the total time for the 16tiles and Mexico datasets.
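The percentages quoted from Table 4 can be checked with a few lines of arithmetic. The sketch below reads the table entries as minutes:seconds; the helper names `seconds` and `percentage` are introduced only for this illustration.

```python
def seconds(mmss):
    """Convert a 'm:ss' string to a number of seconds."""
    minutes, secs = mmss.split(":")
    return 60 * int(minutes) + int(secs)

# Timings as read from Table 4 (minutes:seconds).
TRIANGULATE = {"E41N27": "0:17", "4tiles": "0:15", "16tiles": "4:00", "Mexico": "3:02"}
RECONSTRUCT = {"E41N27": "0:01", "4tiles": "0:01", "16tiles": "0:11", "Mexico": "0:07"}
TOTAL = {"E41N27": "0:19", "4tiles": "0:17", "16tiles": "4:47", "Mexico": "3:31"}

def percentage(part, total):
    """Share of `part` in `total`, both given as 'm:ss' strings."""
    return 100.0 * seconds(part) / seconds(total)

for name in TOTAL:
    tri = percentage(TRIANGULATE[name], TOTAL[name])
    rec = percentage(RECONSTRUCT[name], TOTAL[name])
    # int() rounds down, matching the convention stated for Table 4.
    print(f"{name}: triangulation {int(tri)} %, reconstruction {rec:.1f} %")
```

Rounded down, the triangulation shares come out as 89, 88, 83 and 86 %, and the reconstruction share is 3.8 % for 16tiles and 3.3 % for Mexico, in line with the "around 3–4 %" stated in the text.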

5 Discussion and Conclusions

We have presented a new method for repairing polygonal area partitions, ensuring that the output partition is valid, i.e. all individual polygons conform to the ISO 19107 [ISO, 2003] and Simple Features [OGC, 2006] specifications, and no gaps or overlaps are present between any pair of polygons. The novelty of our method lies in the fact that repair is performed according to user-defined criteria, but then takes place without any human intervention. Automatic repair is becoming an increasingly important topic due to data integration, e.g. data collected for the different themes of the European INSPIRE Initiative will finally have to fit together, and data could eventually be matched fully automatically by dedicated web service components [INSPIRE, 2009]. Our approach could be at the heart of such a web service.

The proposed approach excels in automated repair at the cost of increased memory usage compared to a pure graph-based approach; this difference is mainly caused by the unconstrained edges introduced by the triangulation. While it would be possible to use our repair rules together with a (primal/dual) graph-based approach, these additional edges in the triangulation give fine-grained control over the repair operations and ensure that the graph is connected, which facilitates the reconstruction of polygons.

We have implemented our algorithm over a numerically robust triangulator (CGAL), and since repair operations are expressed solely in terms of re-labelling of triangles (no geometric computation is involved), the approach is also fully robust. Since during our experiments most of the time was used to compute the constrained triangulation, another library could also be tested to improve the implementation.

For the future, we plan to:

• Improve the scalability of the approach, and process and repair datasets with more than 10 million polygons. It is known that, using divide-and-conquer techniques, triangulation algorithms can handle big datasets [Amenta et al., 2003; Blandford et al., 2005]. We will investigate whether it is possible to automatically repair each divided part individually and 'glue' the repaired parts together.

• Investigate snap rounding [Hobby, 1999; de Berg et al., 2007] as a preprocessing step (or embedded directly in the triangulation) to guarantee that repaired planar partitions have no vertices that are closer than a certain ε threshold. However, snap rounding may change the topology of the input, but the output will nevertheless be a valid partition (as the topology will be repaired). Apart from topological changes, snap rounding can also lead to removed polygon parts that are too small to be preserved, based on the chosen ε.

• Add more advanced repair operations to our repair toolkit, e.g. repair could take place based on splitting a collection of triangles.

• Extend our work to include the third dimension, to validate and repair 3D city models using a constrained tetrahedralisation [Si, 2008]. Notice that the tetrahedralisation of a given polyhedron does not always exist, and thus extra (Steiner) points might need to be added. The main concepts of our approach, (re)labelling and reconstruction, extend naturally to 3D. However, appropriate repair operations for 3D city models would need to be defined, and some application-specific constraints (e.g. right angles at corners) are not trivial to implement in our current approach.

References

Nina Amenta, Sunghee Choi, and Günter Rote. Incremental constructions con BRIO. In Proceedings of the 19th Annual Symposium on Computational Geometry, pages 211–219. ACM Press, 2003.

Wael M. Badawy and Walid G. Aref. On local heuristics to speed up polygon-polygon intersection tests. In Proceedings of the 7th ACM International Symposium on Advances in Geographic Information Systems, pages 97–102. ACM, 1999.

M. Bader and R. Weibel. Detecting and resolving size and proximity conflicts in the generalization of polygonal maps. In Proceedings of the 18th International Cartographic Conference, Stockholm, pages 1525–1532, 1997.

Daniel K. Blandford, Guy E. Blelloch, David E. Cardoze, and Clemens Kadow. Compact representations of simplicial meshes in two and three dimensions. International Journal of Computational Geometry and Applications, 15(1):3–24, 2005.

CGAL. CGAL 3.8 User and Reference Manual. CGAL Editorial Board, 2011.

Kenneth L. Clarkson, Kurt Mehlhorn, and Raimund Seidel. Four results on randomized incremental constructions. In Alain Finkel and Matthias Jantzen, editors, Proceedings of the 9th Annual Symposium on Theoretical Aspects of Computer Science, volume 577 of Lecture Notes in Computer Science, pages 461–474. Springer Berlin / Heidelberg, 1992.

Martin Davis. Java conflation suite. Technical report, Vivid Solutions, 2003. Available at http://www.vividsolutions.com/jcs

Mark de Berg, Dan Halperin, and Mark Overmars. An intersection-sensitive algorithm for snap rounding. Computational Geometry, 36(3):159–165, 2007. ISSN 0925-7721.

Mark de Berg, Marc van Kreveld, Mark Overmars, and Otfried Schwarzkopf. Computational Geometry: Algorithms and Applications. Springer-Verlag, 3 edition, 2008.

ESRI. Shapefile technical description. White paper, ESRI, July 1998.

Michael A. Facello. Implementation of a randomized algorithm for Delaunay and regular triangulations in three dimensions. Computer Aided Geometric Design, 12(4):349–370, 1995.

GTS. GTS Library Reference Manual, 2006. URL http://gts.sourceforge.net/reference/book1.html

Leonidas Guibas and Jorge Stolfi. Primitives for the manipulation of general subdivisions and the computation of Voronoi. ACM Transactions on Graphics, 4(2):74–123, 1985.

John D. Hobby. Practical segment intersection with finite precision output. Computational Geometry, 13(4):199–214, 1999. ISSN 0925-7721.

Christoph M. Hoffmann, John E. Hopcroft, and Michael S. Karasick. Towards implementing robust geometric computations. In Proceedings of the 4th Annual Symposium on Computational Geometry, pages 106–117. ACM, 1988.

INSPIRE. D2.5: Generic conceptual model, version 3.2. Technical report, Drafting Team "Data Specifications", 2009. URL http://inspire.jrc.ec.europa.eu/documents/Data_Specifications/D2.5_v3.2.pdf

ISO. 19107: Geographic information – spatial schema, March 2003.

David Kirkpatrick, Jack Snoeyink, and Bettina Speckmann. Kinetic collision detection for simple polygons. International Journal of Computational Geometry and Applications, 12(1–2):3–27, 2002.

Robert Laurini and Françoise Milleret-Raffort. Topological reorganization of inconsistent geographical databases: A step towards their certification. Computers & Graphics, 18(6):803–813, December 1994.

Hugo Ledoux and Martijn Meijers. Validation of planar partitions using constrained triangulations. In Proceedings Joint International Conference on Theory, Data Handling and Modelling in GeoSpatial Information Science, pages 51–55, Hong Kong, May 2010.

Hugo Ledoux, Ken Arroyo Ohori, and Martijn Meijers. Automatically repairing invalid polygons with a constrained triangulation. In Proceedings of the 15th AGILE International Conference on Geographic Information Science, 2012.

Yuanxin Liu and Jack Snoeyink. Faraway point: A sentinel point for Delaunay computation. International Journal of Computational Geometry and Applications, 18(4):343–355, November 2006.

Avraham Margalit and Gary D. Knott. An algorithm for computing the union, intersection or difference of two polygons. Computers & Graphics, 13(2):167–183, 1989.

Ernst P. Mücke, Isaac Saias, and Binhai Zhu. Fast randomized point location without preprocessing in two- and three-dimensional Delaunay triangulations. Computational Geometry: Theory and Applications, 12:63–83, 1999.

OGC. OpenGIS Implementation Specification for Geographic Information - Simple Feature Access - Part 1: Common Architecture. 1.2.0 edition, October 2006.

Lutz Plümer and Gerhard Gröger. Achieving integrity in geographic information systems – maps and nested maps. GeoInformatica, 1(4):345–367, 1997.

M. Rivero and F. R. Feito. Boolean operations on general planar polygons. Computers & Graphics, 24(6):881–896, December 2000.

Stefan Schirra. Precision and Robustness in Geometric Computations, volume Algorithmic Foundations of Geographic Information Systems of Lecture Notes in Computer Science, chapter 9, pages 255–287. Springer Berlin / Heidelberg, 1997.

Michael Ian Shamos and Dan Hoey. Geometric intersection problems. In FOCS, pages 208–215. IEEE, 1976.

Jonathan Richard Shewchuk. Delaunay Refinement Mesh Generation. PhD thesis, School of Computer Science, Carnegie Mellon University, Pittsburgh, USA, 1997a.

Jonathan Richard Shewchuk. Adaptive precision floating-point arithmetic and fast robust geometric predicates. Discrete & Computational Geometry, 18:305–363, 1997b.

Hang Si. Three Dimensional Boundary Conforming Delaunay Mesh Generation. PhD thesis, Technische Universität Berlin, 2008.

C. Dana Tomlin. Map algebra: One perspective. Landscape and Urban Planning, 30(1–2):3–12, October 1994.

Peter van Oosterom, Wilko Quak, and Theo Tijssen. About invalid, valid and clean polygons. In Peter F. Fisher, editor, Developments in Spatial Data Handling – 11th International Symposium on Spatial Data Handling, pages 1–16. Springer, 2004.

Jan W. van Roessel. A new approach to plane-sweep overlay: Topological structuring and line-segment classification. Cartography and Geographic Information Science, 18(1):49–67, January 1991.

Shuxin Yuan and Chuang Tao. Development of conflation components. Geoinformatics and Socioinformatics, pages 1–13, June 1999.

  • Introduction
  • Related Work
    • Simple Features and validity of simple polygons
    • Validation of a Planar Partition Using a Planar Graph
    • Repair Using Point and Edge Snapping and Splitting
    • Repair Using Topological Information
  • Validation and Automatic Repair Using a Constrained Triangulation
    • Triangulation of a polygon and constrained triangulation
    • Labelling the triangles of a Planar Partition
    • Validation
    • Repair Operations
    • Extraction of Polygons From a Triangulation
  • Implementation and Experiments
    • Implementation
    • Experiments with real-world planar partition datasets
  • Discussion and Conclusions

checking whether any possible pair of polygons overlap. This is a computationally expensive operation to make (quadratic behaviour), even when heuristics to speed up the process are used [Badawy and Aref, 1999; Kirkpatrick et al., 2002]. Additionally, robustness issues are significant in polygon intersection tests [Hoffmann et al., 1988]. Finding the potential gaps in a planar partition is even more problematic. For this, computing the union of the entire set of polygons is required, which is also computationally expensive [Margalit and Knott, 1989; Rivero and Feito, 2000].

The validation process can be sped up by first building a planar graph of the input polygons, which is afterwards checked for consistency. It should first be noticed that, while different approaches are available to construct a planar graph [Shamos and Hoey, 1976; van Roessel, 1991], it is still sometimes difficult, especially if the polygon contains holes. The graph of the boundary can then be unconnected, and extra machinery is necessary to represent the knowledge of holes in the graph structure. The fact that holes are also allowed to touch complicates the task of validation even further, since holes cannot be assumed to form an unconnected planar graph.

Based on this graph, Plümer and Gröger [1997] specify a list of minimal mathematical axioms that can be used to check the validity of a planar partition: no dangling edges, no zero-length edges, planarity, no holes, no self-intersections, no overlaps, and having a connected graph. It is important to note that Plümer and Gröger base their axioms on concepts from graph theory, but they also highlight the fact that a graph-based approach alone is not enough: the graph has to be augmented with geometrical knowledge (each vertex has geometry attached, i.e. the coordinates of points have to be stored). Validation is thus underpinned by both geometrical and topological concepts, and systems thus have to deal with those two concepts at the same time. The method we propose in this paper, using a constrained triangulation, permits us to do exactly this: to embed both geometry and topology in the same structure.

2.3 Repair Using Point and Edge Snapping and Splitting

The most common method for planar partition repair is based on the assumption that polygons approximately match each other at their common boundaries. This implies that they should be within a certain distance of each other along those edges. If, additionally, all parts further apart than this value are known not to be common boundaries, it is possible to "snap" together polygons that are closer to each other than this threshold, while keeping the rest untouched. This method of planar partition repair is available in many GIS packages, including ArcGIS⁴, FME⁵, GRASS⁶ and Radius Topology⁷.

Since thresholds are central to this method, it is of utmost importance to select a good threshold value, something that is completely different in each dataset. For planar partition repair to be successful using this method, such a threshold should be chosen in a careful manner and always comply with a few conditions. These have been summarised as follows:

1. Adjacent polygons should not be further apart than this threshold along any part of their common boundaries (shown as the maximum threshold in Figure 2(a)). Otherwise, gaps cannot be fixed.

2. Adjacent polygons should not overlap each other in areas which are further inwards than this threshold from their common boundaries (shown as the maximum threshold in Figure 2(b)). Otherwise, overlaps cannot be fixed.

3. None of the vertices of a polygon should be closer to each other than this threshold, including non-consecutive vertices (shown as the minimum thresholds in Figure 2). Otherwise, they might be snapped together, creating repeated vertices, disjoint regions, or various topological problems.

4. None of the vertices of a polygon should be closer than this threshold to any non-incident edge. Otherwise, they might be snapped together, creating disjoint regions or various topological problems.

⁴ http://www.esri.com/software/arcgis/index.html
⁵ http://www.safe.com/fme
⁶ http://grass.osgeo.org
⁷ http://www.1spatial.com/software/radius_topology

This threshold value is usually manually determined, either by trial and error or by analysing certain properties of the dataset(s) involved (e.g. point spacing, precision or map scale). However, it is often hard to find an optimal threshold for a certain dataset, since ensuring that it works well for every part of a dataset is unrealistic. Moreover, sometimes such a threshold does not even exist (e.g. because point spacing in some places might be smaller than the width of the gaps and overlaps present).
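Why a usable threshold may not exist can be made concrete with a small sketch. It is only an illustration under simplifying assumptions: the largest gap or overlap width along matching boundaries is taken as a known input, only vertex-to-vertex spacing is checked (the vertex-to-edge condition is left out), and both function names are hypothetical.

```python
import itertools
import math

def min_vertex_spacing(points):
    """Smallest distance between any two vertices (brute force, O(n^2))."""
    return min(math.dist(p, q) for p, q in itertools.combinations(points, 2))

def usable_threshold_range(max_boundary_gap, vertex_spacing):
    """Return the open interval of usable snapping thresholds, or None.

    max_boundary_gap: largest gap/overlap width along matching boundaries
    (assumed to be measured beforehand); the threshold must exceed it.
    vertex_spacing: smallest vertex spacing; the threshold must stay below it.
    """
    if max_boundary_gap < vertex_spacing:
        return (max_boundary_gap, vertex_spacing)
    return None  # point spacing is finer than the errors: no valid threshold
```

With vertices at (0, 0), (3, 4) and (10, 0), the smallest spacing is 5.0, so a dataset with gaps of at most 0.1 admits any threshold between 0.1 and 5.0, while gaps of width 6.0 leave no usable value.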

Since the aforementioned conditions are frequently not met, or are not checked beforehand, and it is still necessary to perform repair of a dataset, snapping is often performed nevertheless, possibly creating invalid polygons and/or planar partitions, or significantly changing the topology of the existing features. Two examples of this phenomenon are shown in Figures 3 and 4.

Figure 3: Spikes and punctures can be created by snapping, since the bases of these elongated forms (encircled) might be narrower than the threshold, but their length not. (a) Before snapping. (b) After snapping.

While these examples show that snapping is not problem-free, it is important to note that commercial GIS packages often implement more complex snapping options (such as point-to-edge, edge-to-edge, or using a reference dataset). These options can help to solve a problematic case, but can also have undesired consequences, such as changing the topology of the polygons. Another problem is that post-processing operations to clean resulting polygons might be required (e.g. disposing of polygons with small areas, removing redundant lines, thresholds for minimum angles, etc.), which could again create invalid configurations, requiring iterative validation or repair processes.

Figure 4: Polygons can be split by snapping, since some parts of them might be narrower than the threshold (encircled). While this result does not create an invalid planar partition, it can change the number of polygons present and their topological relations, and can therefore be undesirable. (a) Before snapping. (b) After snapping.

2.4 Repair Using Topological Information

A different approach for planar partition repair, based on topological information, is available in some software.

GRASS also creates a graph, using edges as a base structure instead of triangles, and could be used to (manually) detect overlaps and gaps based on the number of labels, using the functions v.what and d.what.vect. However, there is no simple automated procedure to get the number of labels at a certain point, which makes it very cumbersome and time consuming to use GRASS for this purpose.

Meanwhile, ArcGIS provides a more complete solution using a method similar in some ways to the one developed and described in Section 3. It involves using the Geodatabase feature of the software with some combined validation rules (e.g. must not overlap and must not have gaps). However, fixing every problematic area in an appropriate manner requires extensive user intervention (see Figure 5), since the best choice for each case depends on the specific configuration of the error.

Figure 2: Defining a threshold for vertex, edge and face snapping. The threshold to use should be larger than the largest minimum distance between the matching boundaries, and smaller than the minimum distance between vertices. (a) Gaps between polygons. (b) Overlapping polygons. Both panels mark the minimum and maximum distance thresholds.

Since neither of the aforementioned programs offers an automated process to correctly solve this problem (and GRASS lacks the ability to visualise problem areas), they are not really comparable to our solution. A planar partition can easily contain tens of thousands of polygons, possibly generating thousands of errors which need to be checked and repaired semi-automatically.

3 Validation and Automatic Repair Using a Constrained Triangulation

Our approach to validation and automatic repair of planar partitions uses a constrained triangulation (CT) as a supporting structure because it has many good properties, including the following:

• It is by definition a planar partition. Therefore, as long as we keep the information about which polygon each triangle belongs to, the reconstructed polygons will be either a valid planar partition or multiple ones.

• It can be built quickly, in O(n log n), with a variety of approaches [Guibas and Stolfi, 1985; Mücke et al., 1999; Clarkson et al., 1992].⁸

• Changes to the triangulation (e.g. adding a new constrained edge) are local and therefore fast.

• Constrained edges can usually be added in constant time, being only significantly slower (and more complex) when there is an intersection with an existing constrained edge [Shewchuk, 1997b].

• Implementation-wise, several stable and fast triangulation libraries exist, including CGAL [CGAL, 2011], Triangle [Shewchuk, 1997a] and GTS [GTS, 2006].

⁸ The actual computational complexity can be O(n log n + k), with k being the number of edge-edge intersections, which could conceivably even be n². However, k ≪ n for most GIS datasets.

Figure 5: Planar partition repair in ArcGIS. The user is expected to zoom in to a particular error, analyse the situation (e.g. by looking at the properties from the surrounding polygons) and make a decision to assign the problematic region to a certain polygon. More than 11 000 errors were detected in this tile of the Corine dataset. (a) Viewing a topology error in ArcGIS. (b) Assigning an overlapping region to one of the polygons involved.

The general workflow of our approach to both validate and repair a planar partition is as follows:

1. the CT of the input segments forming the polygons is constructed;

2. each triangle in the CT is labelled with the label of the polygon inside which it is located;

3. problems are detected by identifying triangles having no or multiple labels, and by verifying the connectivity between triangles;

4. the problems are repaired by re-labelling triangles to ensure that each triangle has exactly one label;

5. the polygons are extracted from the triangulation (polygons modelled with the Simple Features specification).

As mentioned previously, for this workflow we assume that each input polygon is individually valid. In the following sections, we describe the concepts needed and give a detailed description of the different steps.

3.1 Triangulation of a polygon and constrained triangulation

A triangulation subdivides an area into non-overlapping triangles. Using a constrained triangulation, every line segment that defines the boundary of a polygon is ensured to appear as an edge in the triangulation. It is known that any polygon (also with holes) can be triangulated without adding extra vertices [de Berg et al., 2008; Shewchuk, 1997a]. Figure 6 shows an example.

Figure 6: (a) A polygon with 4 holes. (b) The constrained triangulation of the segments of this polygon.

In our approach, the triangulation is performed by incrementally constructing a CT of all the segments representing the boundaries (outer + inner) of each polygon. If the set of input polygons forms a planar partition, then each segment will be inserted twice (except those forming the outer boundary of the set of input polygons). This is usually not a problem for triangulation libraries because they ignore points and segments at the same location (as is the case with the solution we use, see Section 4). When segments are found to intersect, they are split with a new point created at the intersection. This is the only situation in which the generation of new points is required.

Notice that our approach requires only a constrained triangulation, and not one that fulfils the Delaunay criterion [Shewchuk, 1997a]. However, having well-shaped triangles is useful for repair purposes and does not significantly increase the processing time. Therefore, our implementation actually constructs and uses a constrained Delaunay triangulation.

3.2 Labelling the triangles of a Planar Partition

Labelling a triangle means assigning it label(s) for the polygon(s) that it belongs to (if two input polygons overlap, each triangle in the overlapping region should have the labels of the two polygons).

In our previous work on validation [Ledoux and Meijers, 2010], we used the centroid of a polygon to start the labelling process, but this method is prone to errors (if, for instance, the calculated centroid is outside or on the boundary of the polygon) and does not permit us to differentiate gaps from overlaps.

To solve these problems, we store information about the constrained edges of the CT. Since it is known that the input rings are closed and have a known orientation (according to the ISO 19107 orientation rules), it is also known on which side of a certain line segment the interior of the polygon lies. This property is used for robust labelling of each polygon. Triangles adjacent to the outer ring of the polygons are labelled first, and this is later expanded to triangles further in the interior of the polygon, recursively labelling adjacent unlabelled faces as long as no constrained edges are crossed. After this operation, all triangles that are part of any polygon are labelled, with overlapping regions having multiply labelled triangles. However, holes are then indistinguishable from triangles outside the planar partition, since both have zero labels. Therefore, a special label is created for all the triangles outside the planar partition, referred to as the “universe” label. These triangles are labelled by recursively labelling adjacent triangles from any triangle known to lie in the exterior of the planar partition. We exploit the concept of the “far-away point” to achieve this [Liu and Snoeyink, 2006], which is used by several implementations and is also known as the “big triangle” [Facello, 1995].
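The expansion step described above can be sketched in a few lines of Python. This is only an illustration under simplifying assumptions, not the pprepair implementation (which is written in C++ on top of CGAL): the dictionaries `neighbours`, `constrained` and `seeds` are hypothetical stand-ins for the triangulation data structure, and the seed triangles are assumed to have already been found from the ring orientation.

```python
from collections import deque

def flood_label(neighbours, constrained, seeds):
    """Spread labels from seed triangles without crossing constrained edges.

    neighbours:  triangle id -> list of adjacent triangle ids
    constrained: set of frozenset({a, b}); the edge shared by a and b is constrained
    seeds:       label -> triangles known to lie just inside that polygon's rings
    Returns triangle id -> set of labels (an empty set means "no label").
    """
    labels = {t: set() for t in neighbours}
    for label, starts in seeds.items():
        queue = deque(starts)
        seen = set(starts)
        while queue:
            t = queue.popleft()
            labels[t].add(label)
            for n in neighbours[t]:
                # Stop the expansion at constrained edges (polygon boundaries).
                if n not in seen and frozenset((t, n)) not in constrained:
                    seen.add(n)
                    queue.append(n)
    return labels

# Toy strip of five triangles 0-1-2-3-4 (hypothetical data): the edges between
# triangles 1|2 and 3|4 are constrained, and triangle 4 lies outside the partition.
neighbours = {0: [1], 1: [0, 2], 2: [1, 3], 3: [2, 4], 4: [3]}
constrained = {frozenset((1, 2)), frozenset((3, 4))}
seeds = {"red": [0], "blue": [2], "universe": [4]}
labels = flood_label(neighbours, constrained, seeds)
```

In this sketch the “universe” label is treated like any other label whose seed is a triangle known to be exterior, which mirrors the role of the far-away point.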

3.3 Validation

If the set of input polygons forms a planar partition, then all the triangles will be labelled with one and only one label. The problems are easily detected:

• Gaps are detected by finding triangles without any labels.

• Overlaps are detected by finding triangles with two or more labels.

• Disjoint regions are detected by identifying regions separated by the “universe” label. This is done by starting at a given triangle and doing a breadth-first search on the dual of the triangulation, without visiting the triangles labelled as “universe”. If all the triangles can be reached, then no polygons of the planar partition are disjoint.
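The three checks above translate almost directly into code. The following Python sketch assumes the labelled triangulation is available as two hypothetical dictionaries (`labels` and `neighbours`); these names and the toy data are illustrative and not taken from pprepair.

```python
from collections import deque

UNIVERSE = "universe"

def find_gaps_and_overlaps(labels):
    """Gaps are triangles with no label, overlaps are triangles with two or more."""
    gaps = sorted(t for t, ls in labels.items() if len(ls) == 0)
    overlaps = sorted(t for t, ls in labels.items() if len(ls) >= 2)
    return gaps, overlaps

def is_connected(labels, neighbours):
    """Breadth-first search on the dual graph that never visits universe triangles."""
    interior = {t for t, ls in labels.items() if UNIVERSE not in ls}
    if not interior:
        return True
    start = next(iter(interior))
    seen = {start}
    queue = deque([start])
    while queue:
        t = queue.popleft()
        for n in neighbours[t]:
            if n in interior and n not in seen:
                seen.add(n)
                queue.append(n)
    # Disjoint regions exist if some non-universe triangle was never reached.
    return seen == interior

# Toy data (hypothetical): triangle 1 is an overlap, triangle 2 a gap, and
# triangle 4 is cut off from the rest by the universe triangle 3.
labels = {0: {"red"}, 1: {"red", "blue"}, 2: set(), 3: {UNIVERSE}, 4: {"green"}}
neighbours = {0: [1], 1: [0, 2], 2: [1, 3], 3: [2, 4], 4: [3]}
gaps, overlaps = find_gaps_and_overlaps(labels)
connected = is_connected(labels, neighbours)
```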

3.4 Repair Operations

The greatest benefits of using a labelled triangulation for planar partition repair stem from the fact that, while repair operations are performed, the validity of the planar partition is always kept, together with the integrity of the data. Unlike with snapping, vertices are not required to be moved during the process, and repair is performed using local criteria instead of global ones (in snapping, the threshold is usually fixed for the whole dataset). This comes as a contrast to other methods, where care needs to be taken to ensure that the (geometric or topological) validity is kept.

Figure 7 shows the standard steps required in a repair operation. In order to avoid order dependency when repairing, the repair operation itself is always performed after all the choices for which label to assign have been made.
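The separation between choosing labels and applying them can be sketched as follows. This Python fragment is a minimal illustration, assuming the same hypothetical `labels` and `neighbours` dictionaries as a stand-in for the triangulation; the selection rule shown counts the labels of the adjacent triangles (overlapping neighbours contribute all of their labels), which is one possible reading of a neighbour-count criterion.

```python
from collections import Counter

def choose_by_neighbours(t, labels, neighbours):
    """Most frequent label among adjacent triangles, or None if there is a tie."""
    counts = Counter(l for n in neighbours[t] for l in labels[n])
    ranked = counts.most_common(2)
    if not ranked:
        return None
    if len(ranked) == 2 and ranked[0][1] == ranked[1][1]:
        return None  # ambiguous: leave it to another repair operation
    return ranked[0][0]

def repair(labels, neighbours, choose=choose_by_neighbours):
    # Phase 1: take every decision against the ORIGINAL labelling.
    decisions = {}
    for t, ls in labels.items():
        if len(ls) != 1:  # gap (0 labels) or overlap (2 or more labels)
            new_label = choose(t, labels, neighbours)
            if new_label is not None:
                decisions[t] = new_label
    # Phase 2: apply all decisions at once, so the result does not depend
    # on the order in which the triangles were visited.
    repaired = {t: set(ls) for t, ls in labels.items()}
    for t, new_label in decisions.items():
        repaired[t] = {new_label}
    return repaired

# Toy data (hypothetical): triangle 1 is an overlap with one red and two blue neighbours.
labels = {0: {"red"}, 1: {"red", "blue"}, 2: {"blue"}, 3: {"blue"}}
neighbours = {0: [1], 1: [0, 2, 3], 2: [1], 3: [1]}
repaired = repair(labels, neighbours)
```

Because the input labelling is never modified while decisions are being taken, two adjacent problem triangles cannot influence each other's outcome.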

In particular, we propose six different repair operations that can be used to fix gaps and overlaps. These are shown in Table 1, and all imply re-labelling triangles. Four of them use triangles as a base (i.e. the label assigned is based only on that of its three neighbouring triangles), which is fast and modifies the area of each input polygon the least. Despite their simplicity, they offer substantial control over the results. For instance, triangle by number of neighbours and triangle by absolute majority only differ from each other in their handling of overlapping faces, but the former is better for large overlapping regions, while the latter is better at fixing small problems. Having well-shaped (Delaunay) triangles is most useful for triangle-based repair functions. Two of them use regions of adjacent triangles with equivalent sets of labels, which is slower than a triangle-based method, but yields results that can be cartographically more pleasing. An interesting repair operation for practitioners is the one in which a priority of labels is used, i.e. in case of gaps/overlaps, the labels in the triangle (overlap) or in the adjacent polygons (gap) are ordered according to a user-defined priority, and the highest priority is assigned to the problematic triangles. Notice that these repair operations can be used one after the other (in a hierarchical manner). For instance, if the repair according to the longest boundary is used first, but one zone has two or more boundaries with exactly the same length, then the deadlock can be solved by choosing one randomly. A sample of the results obtained with different repair operators is also shown in Figure 8.

Table 1: The repair operations currently implemented in our software. The types of operations are defined in the map algebra classification [Tomlin, 1994].

Repair operation                  Type            Criteria
Triangle by priority list         Varies          Label that has the highest priority according to a predefined priority list
Triangle by number of neighbours  Focal           The label present in the largest number of adjacent faces, overlaps included
Triangle by absolute majority     Focal           Label present in two or more valid adjacent faces
Triangle by longest boundary      Focal           Label present along the longest portion of the boundary of the adjacent faces
Regions by longest boundary       Focal of zonal  Label present along the longest portion of the boundary of the adjacent faces
Regions by random neighbour       Focal of zonal  Random label from the adjacent faces
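The hierarchical use of repair operations can be sketched as a chain of rules, each of which either returns a label or declines. The Python below is illustrative only: the function names, the `candidates` mapping (label to length of shared boundary) and the explicit random generator are assumptions made for the example, not the interface of pprepair.

```python
import random

def by_longest_boundary(candidates):
    """candidates: label -> length of boundary shared with the problem region.
    Returns None when two or more labels share exactly the same maximum length."""
    if not candidates:
        return None
    best = max(candidates.values())
    winners = [label for label, length in candidates.items() if length == best]
    return winners[0] if len(winners) == 1 else None

def by_priority(candidates, priority):
    """Pick the candidate label that comes first in a user-defined priority list."""
    ranked = [label for label in priority if label in candidates]
    return ranked[0] if ranked else None

def by_random(candidates, rng):
    """Pick any candidate label; sorting first keeps a seeded run reproducible."""
    return rng.choice(sorted(candidates)) if candidates else None

def choose_label(candidates, rng):
    """Longest boundary first; a tie (deadlock) is resolved by a random choice."""
    label = by_longest_boundary(candidates)
    if label is None:
        label = by_random(candidates, rng)
    return label

rng = random.Random(0)  # seeded only to make the example repeatable
clear_case = choose_label({"red": 3.0, "blue": 1.5}, rng)
tie_case = choose_label({"red": 2.0, "blue": 2.0}, rng)
priority_case = by_priority({"red": 2.0, "blue": 2.0}, ["water", "blue", "red"])
```

Returning `None` to signal "undecided" is a design choice of this sketch; it lets any number of rules be chained in the order the user prefers.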

More repair operations, based on extensions to the idea of labelling triangles/regions, can be further developed. For instance, triangles could be split to subdivide an area with problems (as in Bader and Weibel [1997]), or sliver triangles/regions could be discarded (and then filled during repair).

3.5 Extraction of Polygons From a Triangulation

Starting from a labelled triangulation, it is possible to reconstruct the individual polygons of a planar partition so that they conform to valid polygons according to the ISO 19107 and the Simple Features specifications, which allows users to incorporate automatic validation and repair in their workflow.

We do this operation polygon by polygon. We start at an unprocessed triangle and visit all the connected triangles having the same label (to reconstruct the polygon), marking the triangles as processed as we visit them. Note that, since all these triangles are connected, the outer and inner boundaries of a polygon are all simple (non self-intersecting). We repeat this operation until all the triangles have been processed.

Figure 7: The steps in a generic repair operation. (a) A set of overlapping polygons (red and blue) that needs to be repaired. (b) The polygons are triangulated and each triangle is labelled with the polygons it belongs to. (c) An invalid region is found (triangles A, B, C and D), defined as a set of contiguous polygons with an equivalent set of 0 or ≥ 2 labels. (d) The neighbours of the region are analysed to see how it should be re-labelled, and the result is saved onto a list (here A, B, C and D are all to be repaired as red). (e) The triangles are relabelled according to the list. (f) The output polygons, which now have no gaps or overlaps.

For each polygon, we have to recover not only its outer boundary but also its inner boundaries, which are not connected. Observe that we cannot simply follow the original constrained edges, as these do not have any meaning after a planar partition has been repaired: the boundaries of the repaired polygons are instead formed by edges incident to two triangles having different labels.
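The observation that boundaries are formed by edges between differently labelled triangles can be illustrated with a short Python sketch. It assumes a repaired triangulation given as two hypothetical dictionaries, and it treats edges with a single incident triangle as boundary edges too, since this toy mesh has no universe triangles around it.

```python
def boundary_edges(triangles, labels, label):
    """Return the undirected edges bounding the region that carries `label`.

    triangles: triangle id -> tuple of three vertex ids
    labels:    triangle id -> its single label (the partition is assumed repaired)
    """
    owners = {}
    for t, (a, b, c) in triangles.items():
        for edge in (frozenset((a, b)), frozenset((b, c)), frozenset((c, a))):
            owners.setdefault(edge, []).append(t)
    result = set()
    for edge, incident in owners.items():
        same = [t for t in incident if labels[t] == label]
        # Exactly one incident triangle has the label: the other side is a
        # different label (or nothing at all), so the edge lies on the boundary.
        if len(same) == 1:
            result.add(edge)
    return result

# Toy mesh (hypothetical): a red square split along the diagonal 0-2,
# plus one blue triangle attached to the edge 1-2.
triangles = {0: (0, 1, 2), 1: (0, 2, 3), 2: (1, 4, 2)}
labels = {0: "red", 1: "red", 2: "blue"}
red_boundary = boundary_edges(triangles, labels, "red")
```

The diagonal 0-2 separates two red triangles and is therefore dropped, whether or not it was a constrained edge in the input.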

For each polygon, we start at a triangle and move on to triangles having the same label. As the process goes, a single polyline that runs along all the boundaries of the polygon is generated. This involves a depth-first search (clockwise) that recursively advances until it reaches the boundary of the polygon, returning a long chain of edges in a procedure similar to following the boundary edge by edge. The procedure is shown step by step in Figure 9. The polyline created with this method has “bridges”, which allow us to keep all inner boundaries (holes) connected with the outer one in a manner that keeps the interior connected as well. These help to preserve connectivity and the relations between different (outer and inner) boundaries, but are removed later in the process (to conform to the Simple Features specification). Also, its orientation conveys the information of whether a section of it is part of an inner or an outer boundary.

Figure 8: Different repair operations used in the two polygons for the Arribes del Duero Natural Park in Spain (red/darker grey) and the International Douro Natural Park in Portugal (green/lighter grey). (a) The original polygons. (b) Repaired each triangle using the label adjacent along the longest boundary from the neighbouring triangles. (c) Repaired each region using a random label from the neighbouring triangles. (d) Repaired each region using the label adjacent along the longest boundary from the neighbouring triangles. All of them can be considered best by a certain criterion, like preserving the area ratio between the two polygons (b), smoothness of the boundary (d), or a balance between the two (c).

Figure 9: The traversal order when navigating through triangles in a clockwise manner. Starting at A and the edge between A and B, the operations occur according to the encircled numbers. Black arrows denote when an unprocessed triangle is found, red arrows when it is not. Notice how the traversal is performed clockwise (shown in dark blue arrows) for both cases, despite starting from different sides.

This polyline is processed with a stack-based algorithm that generates separate closed rings for the outer boundary and each of the inner boundaries, collapsing the “bridges” that were generated. In order to do this, the polyline is cut at the places where more than two edges join, and these are joined in the correct order by keeping track of (yet) unclosed rings. When a new segment is being processed, it can be one of three options: one that completes a ring, one that is part of a bridge, or one that is not yet closed. Closed rings are stored and bridges are removed, while unclosed rings are saved in the stack until they can be popped to form a closed ring together with a new segment. This is shown in Figure 10.
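To make the idea concrete, here is a simplified vertex-based variant in Python. It is not the segment-based procedure of Figure 10: instead of cutting the polyline into segments, it pushes vertices on a stack and closes a ring whenever a vertex is revisited. The function name and the toy polyline are assumptions for illustration, and the classification of rings as outer or inner (which relies on orientation) is left out.

```python
def split_rings(polyline):
    """Split a closed walk with bridges into simple rings.

    polyline: list of vertex ids in traversal order; the walk returns to its
    first vertex. Rings are returned without repeating their first vertex.
    """
    stack = []      # vertices of the rings that are not closed yet
    position = {}   # vertex -> its index in the stack
    rings = []
    for v in polyline:
        if v in position:
            # Revisiting a vertex closes whatever was pushed since then.
            start = position[v]
            ring = stack[start:]
            for u in ring[1:]:
                del position[u]
            del stack[start + 1:]
            if len(ring) >= 3:
                rings.append(ring)   # a proper closed ring: store it
            # A "ring" of two vertices is an edge walked back and forth,
            # i.e. a bridge, and is simply discarded.
        else:
            position[v] = len(stack)
            stack.append(v)
    return rings

# Toy polyline (hypothetical): outer ring A-B-C-D, a bridge D-h1,
# and a hole h1-h2-h3 reached through that bridge.
walk = ["A", "B", "C", "D", "h1", "h2", "h3", "h1", "D", "A"]
rings = split_rings(walk)
```

For this input the hole is closed first, the bridge between D and h1 is discarded, and the outer ring is closed last, which matches the push, store and erase pattern described above.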

4 Implementation and Experiments

4.1 Implementation

An implementation of the algorithms described in Section 3 was written in the C++ programming language, using external libraries for some functionality. The developed software is called pprepair and is open source and freely available at http://tudelft3d.github.com/pprepair. C++ was selected in order to have plenty of control with regards to low level details and to achieve good performance, which makes it possible to compare it with existing solutions. The libraries used are the OGR Simple Features Library⁹ (which allows input and output from a large variety of data formats common in GIS) and CGAL¹⁰ (which has support for many robust spatial data structures and the operations based on them; we use its constrained triangulation module).

4.2 Experiments with real-world planar partition datasets

⁹ http://www.gdal.org/ogr

¹⁰ Computational Geometry Algorithms Library: http://www.cgal.org

Figure 10: Processing the polyline shown, starting from vertex a and moving towards polyline A, the following operations occur: (1) A: unclosed → push; (2) B: unclosed → push; (3) C: unclosed → push; (4) D: unclosed → push; (5) E: unclosed → push; (6) F: closed → store; (7) pop → EG: closed → store; (8) pop → DH: degenerate → erase; (9) pop → CI: closed → store; (10) pop → BJ: degenerate → erase; (11) pop → AK: closed → store.

Figure 11: Overview of the four datasets: (a) E41N27, (b) 4tiles, (c) 16tiles, (d) Mexico.

We have made experiments with four freely available real-world datasets, i.e. we have validated and automatically repaired them with our implementation. The overview of these datasets is shown in Figure 11 and their properties in Table 2. The datasets are the following:

E41N27: Corine 2000¹¹ tile E41N27, which contains a shifted polygon (by about 10 cm), creating many small gaps and overlaps in the dataset. The snapping threshold has been set at 1 m.

4tiles: Corine 2000 tiles E39N32, E39N33, E40N32 and E40N33, which are known to have long and thin overlapping regions (< 1 mm) with each other. The snapping threshold has been set at 1 cm.

16tiles: 16 adjacent Corine 2000 tiles: E39N30, E39N31, E39N32, E39N33, E40N30, E40N31, E40N32, E40N33, E41N30, E41N31, E41N32, E41N33, E42N30, E42N31, E42N32, E42N33. Some have gaps between one another, some overlap, but they match within a few centimetres. The snapping threshold has been set at 10 cm.

Mexico: 1:1 000 000 scale land cover dataset from INEGI, consisting of over 26 000 polygons. Interestingly, it is mostly already valid (according to the Shapefile specification), but contains some very large polygons with tens of thousands of vertices.

¹¹ Corine is a land cover dataset for 32 European countries. It is freely available at http://www.eea.europa.eu/data-and-maps/data/corine-land-cover-2000-clc2000-seamless-vector-database

As a comparison, we have also tried to perform the same operations with other available software. While the capabilities for planar partition repair among the software tested vary considerably, with full topological repair only available in ArcGIS (using manual operations only), it is also important to consider how different repair implementations scale to large datasets. For this, a few performance tests were made in our implementation and three planar partition repair tools that perform this process using snapping and splitting. The testing methodology for each tool is as follows:

Table 2: Properties of the datasets used for the experiments.

          polygons   pts         pts in largest polygon   avg pts per polygon
E41N27    14 969     496 303     26 740                   33.7
4tiles    4 984      365 702     16 961                   74.7
16tiles   63 868     6 622 133   95 112                   103.7
Mexico    26 866     4 181 354   117 736                  155.6

ArcGIS: In ArcCatalog, a multiple feature dataset is created in a Geodatabase, set with XY resolution values equal to the snapping threshold. The features are imported into it, and the merge and dissolve operations are used to merge adjacent polygons with the same ID. Topology is then generated to check that the planar partition is valid. Everything is finally exported to a single Shapefile. The individual parts of the process are timed and the total is recorded. Memory usage is calculated as the difference between the just loaded ArcCatalog application and its maximum memory usage throughout the process.

FME: A reader is created for each input file, which serve as input to a Snapper transformer; features with the same IDs are then dissolved, and finally they are output into a new Shapefile writer. The topology generator is used to be able to tell whether the result is a valid planar partition. Results are timed and the maximum memory allocation of the fme.exe process is recorded.

GRASS: Input files are imported with v.in.ogr, with all polygon cleaning operations performed and snapping set to the correct values. Boundaries between features with the same IDs are then dissolved using v.dissolve. Files are then exported with v.out.ogr. Times reported by GRASS are added together to give the total, while memory usage by v.in.ogr.exe, v.dissolve.exe and v.out.ogr.exe is monitored and their maximum is recorded.

pprepair: Files are read and put into the triangulation, the triangulation is labelled, and the repair is performed with the longest boundary first (see Table 1), with ambiguous cases resolved with a random choice. Polygons are then extracted from the triangulation and output to a single Shapefile. The entire process is timed and the maximum amount of memory used is recorded.

Notice that we are somewhat comparing apples with oranges here, since our implementation is able to repair more cases than other methods, keeps topological consistency and does not require finding out the appropriate threshold value (if it exists). More importantly, it is able to directly state whether the result is a planar partition, unlike the three other solutions (with these, snapping is performed, but that does not guarantee that the output is a valid planar partition). The results of the experiments are shown in Table 3.

We have made the experiments in order to have an idea of the processing time and the memory usage involved. Only cases that are acceptably solved by snapping and splitting have been considered, since there is no other comparable automated topological repair tool among those studied that would also work for more complex cases, such as those that require a snapping threshold too high to be practical (e.g. horizontal conflation of independently generated datasets [Yuan and Tao, 1999; Davis, 2003]).

Table 3: Planar partition repair comparison using large datasets.

          pprepair           ArcGIS            FME                GRASS
          memory    time     memory    time    memory    time     memory    time
E41N27    145 MB    19s      145 MB    1m3s    158 MB    31s      59 MB     3m9s
4tiles    116 MB    17s      113 MB    37s     105 MB    31s      49 MB     53s
16tiles   1.45 GB   4m47s    crashes   –       636 MB    15m48s   crashes   –
Mexico    1.01 GB   3m31s    216 MB    >1d     264 MB    2m45s    408 MB    11m38s

All tests have been run 5 times on a machine just booted, and the results averaged to account for the small variations that occurred, except in the cases where the execution was cancelled after one full day due to time limitations, or where it caused the program to crash. The hardware is a 2.66 GHz Core 2 Duo MacBook Pro with 4 GB of RAM. ArcGIS 9.3, FME 2010 SP1 and GRASS 6.4 were run in Windows 7, while pprepair was run in Mac OS X 10.7.1.

As Table 3 shows, our approach uses somewhat more memory than other solutions. This is explained by the extra (unconstrained) edges that are added to the input when triangulating it. It should however be noticed that both ArcGIS and GRASS crashed with the biggest dataset, 16tiles. Our implementation is the fastest of the four tested, being for instance around three times faster than FME for the biggest datasets. Only for the Mexico dataset is our implementation slower than FME. This is (probably) explained by the fact that its polygons already form a valid planar partition, and therefore very few snapping operations have been performed by FME: the planar graph of the input was basically built and then the polygons saved back to disk. We believe that ArcGIS and GRASS struggled with this dataset because it contains several very large polygons (with more than 10 000 vertices). Our implementation took 3m31s, but, as Table 4 shows, it took CGAL 3m02s to simply triangulate the input edges. This table also demonstrates the efficiency of our approach: for the four datasets, around 85 % of the time of our approach was used (by the CGAL library) to read the input from disk and triangulate the input edges.

Table 4: Timed steps of the planar partition repair procedure (min:s), rounded down to the nearest second. The percentage is the triangulation time over the total time.

                E41N27   4tiles   16tiles   Mexico
Triangulate     0:17     0:15     4:00      3:02
Label           0:01     0:01     0:27      0:16
Repair          0:00     0:00     0:00      0:00
Reconstruct     0:01     0:01     0:11      0:07
Output          0:00     0:00     0:10      0:06
Total           0:19     0:17     4:47      3:31
% triangulate   89 %     88 %     83 %      86 %

During the implementation, we took several engineering decisions to optimise our program. One of them was to favour disk space over computation time, as we wanted to be able to process large datasets. This is why we always reconstruct polygons, even if they are not modified by the repair process. Although skipping unmodified polygons would be possible, it would require us to keep the original polygons in memory (which would significantly increase the memory consumption) and to keep track of which labels have been modified. And, as Table 4 shows, the reconstruction is very efficient, as it only takes around 3–4 % of the total time for the 16tiles and Mexico datasets.

5 Discussion and Conclusions

We have presented a new method for repairing polygonal area partitions, ensuring that the output partition is valid, i.e. all individual polygons conform to the ISO 19107 [ISO, 2003] and Simple Features [OGC, 2006] specifications, and no gaps or overlaps are present between any pair of polygons. The novelty of our method lies in the fact that repair is performed according to user-defined criteria, but then takes place without any human intervention. Automatic repair is becoming an increasingly important topic due to data integration, e.g. data collected for the different themes of the European INSPIRE Initiative will finally have to fit together, and data could eventually be matched fully automatically by dedicated web service components [INSPIRE, 2009]. Our approach could be at the heart of such a web service.

The proposed approach excels in automated repair, at the cost of increased memory usage compared to a pure graph-based approach; this difference is mainly caused by the unconstrained edges introduced by the triangulation. While it would be possible to use our repair rules together with a (primal/dual) graph-based approach, these additional edges in the triangulation give fine-grained control over the repair operations and ensure that the graph is connected, which facilitates the reconstruction of polygons.

We have implemented our algorithm over a numerically robust triangulator (CGAL), and since repair operations are expressed solely in terms of re-labelling of triangles (no geometric computation is involved), the approach is also fully robust. Since most of the time during our experiments was used to compute the constrained triangulation, another library could also be tested to improve the implementation.

For the future, we plan to:

• Improve the scalability of the approach, and process and repair datasets with more than 10 million polygons. It is known that, using divide-and-conquer techniques, triangulation algorithms can handle big datasets [Amenta et al., 2003; Blandford et al., 2005]. We will investigate whether it is possible to automatically repair each divided part individually and ‘glue’ the repaired parts together.

• Investigate snap rounding [Hobby, 1999; de Berg et al., 2007] as a pre-processing step (or embedded directly in the triangulation) to guarantee that repaired planar partitions have no vertices that are closer than a certain ε threshold. However, snap rounding may change the topology of the input, but the output will nevertheless be a valid partition (as the topology will be repaired). Apart from topological changes, snap rounding can also lead to removed polygon parts that are too small to be preserved, based on the chosen ε.

• Add more advanced repair operations to our repair toolkit, e.g. repair could take place based on splitting a collection of triangles.

• Extend our work to include the third dimension, to validate and repair 3D city models using a constrained tetrahedralisation [Si, 2008]. Notice that the tetrahedralisation of a given polyhedron does not always exist, and thus extra (Steiner) points might need to be added. The main concepts of our approach, (re)labelling and reconstruction, extend naturally to 3D. However, appropriate repair operations for 3D city models would need to be defined, and some application-specific constraints (e.g. right angles at corners) are not trivial to implement in our current approach.

ReferencesNina Amenta Sunghee Choi and Guumln-ter Rote Incremental constructions conBRIO In Proceedings of the 19th An-nual Symposium on Computational Geome-try pages 211ndash219 ACM Press 2003

15

WaelM Badawy andWalidG Aref On localheuristics to speed up polygon-polygonintersection tests In Proceedings of the 7thACM International Symposium on Advancesin Geographic Information Systems pages97ndash102 ACM 1999

M Bader and R Weibel Detecting andresolving size and proximity conflicts inthe generalization of polygonal maps InProceedings of the 18th International Carto-graphic Conference Stockholm pages 1525ndash1532 1997

Daniel K Blandford Guy E BlellochDavid E Cardoze and Clemens KadowCompact representations of simplicialmeshes in two and three dimensionsInternational Journal of ComputationalGeometry and Applications 15(1)3ndash242005

CGAL CGAL 38 User and Reference ManualCGAL Editorial Board 2011

Kenneth L Clarkson Kurt Mehlhorn andRaimund Seidel Four results on random-ized incremental constructions In AlainFinkel and Matthias Jantzen editors Pro-ceedings of the 9th Annual Symposium onTheoretical Aspects of Computer Science vol-ume 577 of Lecture Notes in Computer Sci-ence pages 461ndash474 SpringerBerlin Hei-delberg 1992

Martin Davis Java conflation suite Tech-nical report Vivid Solutions 2003 Avail-able at httpwwwvividsolutionscomjcs

Mark de Berg Dan Halperin and MarkOvermars An intersection-sensitive al-gorithm for snap rounding Computa-tional Geometry 36(3)159 ndash 165 2007 ISSN0925-7721

Mark de Berg Marc van Kreveld MarkOvermars and Otfried SchwarzkopfComputational Geometry Algorithms andApplications Springer-Verlag 3 edition2008

ESRI Shapefile technical descriptionWhite paper ESRI July 1998

Michael A Facello Implementation of arandomized algorithm for Delaunay and

regular triangulations in three dimen-sions Computer Aided Geometric Design 12(4)349ndash370 1995

GTS GTS Library Reference Manual 2006URL httpgtssourceforgenetreferencebook1html

LeonidasGuibas and Jorge Stolfi Primitivesfor the manipulation of general subdivi-sions and the computation of VoronoiACM Transactions on Graphics 4(2)74ndash1231985

John D Hobby Practical segment intersec-tion with finite precision output Com-putational Geometry 13(4)199 ndash 214 1999ISSN 0925-7721

Christoph M Hoffmann John E Hopcroftand Michael S Karasick Towards imple-menting robust geometric computationsIn Proceedings of the 4th Annual Symposiumon Computational Geometry pages 106ndash117ACM 1988

INSPIRE D25 Generic conceptual modelversion 32 Technical report DraftingTeam ldquoData Specificationsrdquo 2009 URLhttpinspirejrceceuropaeudocumentsData_SpecificationsD25_v32pdf

ISO 19107 Geographic information ndash spa-tial schema March 2003

David Kirkpatrick Jack Snoeyink and Bet-tina Speckmann Kinetic collision detec-tion for simple polygons InternationalJournal of Computational Geometry and Ap-plications 12(1ndash2)3ndash27 2002

Robert Laurini and Franccediloise Milleret-Raffort Topological reorganization ofinconsistent geographical databases Astep towards their certification Comput-ers amp Graphics 18(6)803ndash813 December1994

Hugo Ledoux and Martijn Meijers Val-idation of planar partitions using con-strained triangulations In ProceedingsJoint International Conference on TheoryData Handling and Modelling in GeoSpa-tial Information Science pages 51ndash55 HongKong May 2010

16

Hugo Ledoux Ken Arroyo Ohori and Mar-tijn Meijers Automatically repairing in-valid polygons with a constrained trian-gulation In Proceedings of the 15th AGILEInternational Conference on Geographic In-formation Science 2012

Yuanxin Liu and Jack Snoeyink Farawaypoint A sentinel point forDelaunay com-putation International Journal of Compu-tational Geometry and Applications 18(4)343ndash355 November 2006

Avraham Margalit and Gary D Knott Analgorithm for computing the union in-tersection or difference of two poly-gons Computers amp Graphics 13(2)167ndash183 1989

Ernst P Muumlcke Isaac Saias and Bin-hai Zhu Fast randomized point loca-tion without preprocessing in two- andthree-dimensional Delaunay triangula-tions Computational GeometrymdashTheoryand Applications 1263ndash83 1999

OGC OpenGIS Implementation Specificationfor Geographic Information - Simple FeatureAccess - Part 1 Common Architecture 120edition October 2006

Lutz Pluumlmer and Gerhard Groumlger Achiev-ing integrity in geographic informationsystemsmdashmaps and nested maps GeoIn-formatica 1(4)345ndash367 1997

M Rivero and FR Feito Boolean opera-tions on general planar polygons Com-puters amp Graphics 24(6)881ndash896 Decem-ber 2000

Stefan Schirra Precision and Robustness inGeometric Computations volume Algorith-mic Foundations of Geographic Informa-

tion Systems of Lecture Notes in ComputerScience chapter 9 pages 255ndash287 SpringerBerlin Heidelberg 1997

Michael Ian Shamos and Dan Hoey Geo-metric intersection problems In FOCSpages 208ndash215 IEEE 1976

Jonathan Richard Shewchuk Delaunay Re-finement Mesh Generation PhD thesisSchool of Computer Science CarnegieMellon University Pittsburg USA 1997a

Jonathan Richard Shewchuk Adaptiveprecision floating-point arithmetic andfast robust geometric predicates Dis-crete amp Computational Geometry 18305ndash363 1997b

Hang Si Three Dimensional Boundary Con-forming Delaunay Mesh Generation PhDthesis Technische Universitaumlt Berlin2008

C Dana Tomlin Map algebra One perspec-tive Landscape and Urban Planning 30(1ndash2)3ndash12 October 1994

Peter van OosteromWilko Quak and TheoTijssen About invalid valid and cleanpolygons In Peter F Fisher editor Devel-opments in Spatial Data Handlingmdash11th In-ternational Symposium on Spatial DataHan-dling pages 1ndash16 Springer 2004

Jan W van Roessel A new approach toplane-sweep overlay Topological struc-turing and line-segment classificationCartography and Geographic InformationScience 18(1)49ndash67 January 1991

Shuxin Yuan and Chuang Tao Develop-ment of conflation components Geoin-formatics and Socioinformatics pages 1ndash13June 1999



might be snapped together, creating repeated vertices, disjoint regions, or various topological problems.

4. None of the vertices of a polygon should be closer than this threshold to any non-incident edge. Otherwise, they might be snapped together, creating disjoint regions or various topological problems.

This threshold value is usually manually determined, either by trial and error or by analysing certain properties of the dataset(s) involved (e.g. point spacing, precision or map scale). However, it is often hard to find an optimal threshold for a certain dataset, since ensuring that it works well for every part of a dataset is unrealistic. Moreover, sometimes such a threshold does not even exist (e.g. because point spacing in some places might be smaller than the width of the gaps and overlaps present).
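
The existence condition for such a threshold can be illustrated with a minimal sketch. It assumes the widths of the gaps and overlaps have already been measured (the input `error_widths` is hypothetical), it finds the smallest vertex spacing by brute force, and it ignores the vertex-to-edge condition of item 4. The function name `snapping_threshold_range` is illustrative and does not belong to any GIS package.

```python
from itertools import combinations
from math import dist


def snapping_threshold_range(points, error_widths):
    """Return (low, high) bounds for a usable threshold, or None if none exists.

    low  = widest gap/overlap that snapping has to close
    high = smallest distance between two distinct vertices
    """
    low = max(error_widths)
    high = min(dist(p, q) for p, q in combinations(points, 2) if p != q)
    return (low, high) if low < high else None


# Vertices are at least 1.0 apart and errors are at most 0.1 wide,
# so any threshold strictly between 0.1 and 1.0 is usable.
square = [(0.0, 0.0), (1.0, 0.0), (1.0, 1.0), (0.0, 1.0)]
usable = snapping_threshold_range(square, [0.02, 0.1])

# One extra vertex only 0.05 away from another: the point spacing is now
# smaller than the widest error, so no threshold exists.
dense = square + [(1.05, 1.0)]
unusable = snapping_threshold_range(dense, [0.02, 0.1])

print(usable, unusable)
```

The second case is the situation described above: the point spacing in one place is smaller than the width of the errors to be closed.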

Since the aforementioned conditions are frequently not met, or are not checked beforehand, and it is still necessary to perform repair of a dataset, snapping is often performed nevertheless, possibly creating invalid polygons and/or planar partitions, or significantly changing the topology of the existing features. Two examples of this phenomenon are shown in Figures 3 and 4.

Figure 3: Spikes and punctures can be created by snapping, since the bases of these elongated forms (encircled) might be narrower than the threshold while their lengths are not. (a) Before snapping. (b) After snapping.

While these examples show that snapping is not problem-free, it is important to note that commercial GIS packages often implement more complex snapping options (such as point-to-edge, edge-to-edge, or using a reference dataset). These options can help to solve a problematic case, but can also have undesired consequences, such as changing the topology of the polygons. Another problem is that post-processing operations to clean the resulting polygons might be required (e.g. disposing of polygons with small areas, removing redundant lines, thresholds for minimum angles, etc.), which could again create invalid configurations, requiring iterative validation or repair processes.

Figure 4: Polygons can be split by snapping, since some parts of them might be narrower than the threshold (encircled). While this result does not create an invalid planar partition, it can change the number of polygons present and their topological relations, and can therefore be undesirable. (a) Before snapping. (b) After snapping.

2.4 Repair Using Topological Information

A different approach for planar partition repair, based on topological information, is available in some software.

GRASS also creates a graph, using edges as a base structure instead of triangles, and could be used to (manually) detect overlaps and gaps based on the number of labels, using the functions v.what and d.what.vect. However, there is no simple automated procedure to get the number of labels at a certain point, which makes it very cumbersome and time consuming to use GRASS for this purpose.

Meanwhile, ArcGIS provides a more complete solution, using a method similar in some ways to the one developed and described in Section 3. It involves using the Geodatabase feature of the software with some combined validation rules (e.g. must not overlap and must not have gaps). However, fixing every problematic area in an appropriate manner requires extensive user intervention (see Figure 5), since the best choice for each case depends on the specific configuration of the error.

Figure 2: Defining a threshold for vertex, edge and face snapping. The threshold to use should be larger than the largest minimum distance between the matching boundaries and smaller than the minimum distance between vertices. (a) Gaps between polygons. (b) Overlapping polygons.

Since neither of the aforementioned programs offers an automated process to correctly solve this problem (and GRASS lacks the ability to visualise problem areas), they are not really comparable to our solution. A planar partition can easily contain tens of thousands of polygons, possibly generating thousands of errors, which need to be checked and repaired semi-automatically.

3 Validation and Automatic Repair Using a Constrained Triangulation

Our approach to validation and automatic repair of planar partitions uses a constrained triangulation (CT) as a supporting structure because it has many good properties, including the following:

• It is by definition a planar partition. Therefore, as long as we keep the information about which polygon each triangle belongs to, the reconstructed polygons will be either a valid planar partition or multiple ones.

• It can be built quickly, in $O(n \log n)$, with a variety of approaches [Guibas and Stolfi, 1985; Mücke et al., 1999; Clarkson et al., 1992].⁸

• Changes to the triangulation (e.g. adding a new constrained edge) are local and therefore fast.

• Constrained edges can usually be added in constant time, being only significantly slower (and more complex) when there is an intersection with an existing constrained edge [Shewchuk, 1997b].

• Implementation-wise, several stable and fast triangulation libraries exist, including CGAL [CGAL, 2011], Triangle [Shewchuk, 1997a] and GTS [GTS, 2006].

The general workflow of our approach to both validate and repair a planar partition is as follows:

1. the CT of the input segments forming the polygons is constructed;

2. each triangle in the CT is labelled with the label of the polygon inside which it is located;

3. problems are detected by identifying triangles having no or multiple labels, and by verifying the connectivity between triangles;

4. the problems are repaired by re-labelling triangles to ensure that each triangle has exactly one label;

5. the polygons are extracted from the triangulation (polygons modelled with the Simple Features specification).

As mentioned previously, for this workflow we assume that each input polygon is individually valid. We describe in the following sections the concepts needed, and we give a detailed description of the different steps.

⁸ The actual computational complexity can be $O(n \log n + k)$, with $k$ being the number of edge-edge intersections, which could conceivably even be $n^2$. However, $k \ll n$ for most GIS datasets.

Figure 5: Planar partition repair in ArcGIS. The user is expected to zoom in to a particular error, analyse the situation (e.g. by looking at the properties from the surrounding polygons) and make a decision to assign the problematic region to a certain polygon. More than 11 000 errors were detected in this tile of the Corine dataset. (a) Viewing a topology error in ArcGIS. (b) Assigning an overlapping region to one of the polygons involved.

3.1 Triangulation of a polygon and constrained triangulation

A triangulation subdivides an area into non-overlapping triangles. Using a constrained triangulation, every line segment that defines the boundary of a polygon is ensured to appear as an edge in the triangulation. It is known that any polygon (also with holes) can be triangulated without adding extra vertices [de Berg et al., 2008; Shewchuk, 1997a]. Figure 6 shows an example.

In our approach, the triangulation is performed by incrementally constructing a CT of all the segments representing the boundaries (outer + inner) of each polygon. If the set of input polygons forms a planar partition, then each segment will be inserted twice (except those forming the outer boundary of the set of input polygons). This is usually not a problem for triangulation libraries because they ignore points and segments at the same location (as is the case with the solution we use, see Section 4). When segments are found to intersect, they are split, with a new point created at the intersection. This is the only situation in which the generation of new points is required.

Figure 6: (a) A polygon with 4 holes. (b) The constrained triangulation of the segments of this polygon.

Notice that our approach requires only a constrained triangulation, and not one that fulfils the Delaunay criterion [Shewchuk, 1997a]. However, having well-shaped triangles is useful for repair purposes and does not significantly increase the processing time. Therefore, our implementation actually constructs and uses a constrained Delaunay triangulation.

3.2 Labelling the triangles of a Planar Partition

Labelling a triangle means assigning it the label(s) of the polygon(s) that it belongs to (if two input polygons overlap, each triangle in the overlapping region should have the labels of the two polygons).

In our previous work on validation [Ledoux and Meijers, 2010], we used the centroid of a polygon to start the labelling process, but this method is prone to errors (if, for instance, the calculated centroid is outside or on the boundary of the polygon) and does not permit us to differentiate gaps from overlaps.

To solve these problems, we store information about the constrained edges of the CT. Since it is known that the input rings are closed and have a known orientation (according to the ISO 19107 orientation rules), it is also known on which side of a certain line segment the interior of the polygon lies. This property is used for robust labelling of each polygon. Triangles adjacent to the outer ring of the polygons are labelled first, and this is later expanded to triangles further in the interior of the polygon, recursively labelling adjacent unlabelled faces as long as no constrained edges are crossed. After this operation, all triangles that are part of any polygon are labelled, with overlapping regions having multiply labelled triangles. However, holes are then indistinguishable from triangles outside the planar partition, since both have zero labels. Therefore, a special label is created for all the triangles outside the planar partition, referred to as the “universe” label, which are labelled by recursively labelling adjacent triangles from any triangle known to lie in the exterior of the planar partition. We exploit the concept of the “far-away point” to achieve this [Liu and Snoeyink, 2006], which is used by several implementations and is also known as the “big triangle” [Facello, 1995].
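
The labelling step can be sketched on a toy triangulation. This is a simplified illustration, not the code of our implementation: the triangulation is assumed to be given as a dictionary `neighbours` mapping each triangle id to `(neighbour id, edge is constrained)` pairs, and the triangles lying on the interior side of each polygon's rings (`ring_seeds`) are assumed to be known already from the ring orientation. All names are hypothetical.

```python
from collections import deque

UNIVERSE = "universe"


def flood(seeds, neighbours, label, labels):
    """Attach `label` to every triangle reachable from `seeds`
    without crossing a constrained edge."""
    queue = deque(seeds)
    while queue:
        t = queue.popleft()
        if label in labels[t]:
            continue  # already reached with this label
        labels[t].add(label)
        for other, constrained in neighbours[t]:
            if not constrained:
                queue.append(other)


def label_triangles(neighbours, ring_seeds, exterior_triangle):
    """Label polygons first, then the exterior with the universe label."""
    labels = {t: set() for t in neighbours}
    for polygon, seeds in ring_seeds.items():
        flood(seeds, neighbours, polygon, labels)
    flood([exterior_triangle], neighbours, UNIVERSE, labels)
    return labels


# Toy mesh: 0 is outside, 1 and 5 lie in "red" only, 2 lies in both
# polygons, 3 lies in "blue" only, and 4 is enclosed by both but in neither.
neighbours = {
    0: [(1, True), (3, True)],
    1: [(0, True), (5, False), (4, True)],
    2: [(5, True), (3, True)],
    3: [(2, True), (0, True), (4, True)],
    4: [(1, True), (3, True)],
    5: [(1, False), (2, True)],
}
ring_seeds = {"red": [1, 2], "blue": [3, 2]}
labels = label_triangles(neighbours, ring_seeds, exterior_triangle=0)

for t in sorted(labels):
    print(t, sorted(labels[t]))
```

In this example triangle 5 receives its label only through the flood from triangle 1, triangle 2 ends up with two labels, and triangle 4 keeps none, which distinguishes it from the exterior triangle 0.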

3.3 Validation

If the set of input polygons forms a planar partition, then all the triangles will be labelled with one and only one label. The problems are easily detected:

• Gaps are detected by finding triangles without any labels.

• Overlaps are detected by finding triangles with two or more labels.

• Disjoint regions are detected by identifying regions separated by the “universe” label. This is done by starting at a given triangle and doing a breadth-first search on the dual of the triangulation, without visiting the triangles labelled as “universe”. If all the triangles can be reached, then no polygons of the planar partition are disjoint.
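
The three checks above can be sketched as follows. This is a minimal illustration under assumed data structures (a `labels` dictionary of label sets per triangle and an `adjacency` dictionary for the dual graph), with hypothetical names, rather than the code of our implementation.

```python
from collections import deque

UNIVERSE = "universe"


def validate(labels, adjacency):
    """Return (gap triangles, overlap triangles, is_connected)."""
    gaps = sorted(t for t, ls in labels.items() if len(ls) == 0)
    overlaps = sorted(t for t, ls in labels.items() if len(ls) >= 2)

    # Breadth-first search on the dual graph, never entering universe triangles.
    inside = [t for t, ls in labels.items() if UNIVERSE not in ls]
    reached = set()
    if inside:
        reached.add(inside[0])
        queue = deque([inside[0]])
        while queue:
            t = queue.popleft()
            for n in adjacency[t]:
                if n not in reached and UNIVERSE not in labels[n]:
                    reached.add(n)
                    queue.append(n)
    return gaps, overlaps, len(reached) == len(inside)


# Triangle 4 is a gap, triangle 2 an overlap, and triangle 5 ("green")
# can only be reached from the others through the universe triangle 0.
labels = {
    0: {UNIVERSE},
    1: {"red"},
    2: {"red", "blue"},
    3: {"blue"},
    4: set(),
    5: {"green"},
}
adjacency = {0: [1, 5], 1: [0, 2, 4], 2: [1, 3], 3: [2, 4], 4: [1, 3], 5: [0]}
gaps, overlaps, connected = validate(labels, adjacency)
print(gaps, overlaps, connected)
```

A planar partition is valid when both lists are empty and the connectivity flag is true.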

3.4 Repair Operations

The greatest benefits of using a labelled triangulation for planar partition repair stem from the fact that, while repair operations are performed, the validity of the planar partition is always kept, together with the integrity of the data. Unlike with snapping, vertices are not required to be moved during the process, and unlike snapping, repair is performed using local criteria instead of global ones (in snapping, the threshold is usually fixed for the whole dataset). This comes as a contrast to other methods, where care needs to be taken to ensure that the (geometric or topological) validity is kept.

Figure 7 shows the standard steps required in a repair operation. In order to avoid order dependency when repairing, the repair operation itself is always performed after all the choices for which label to assign have been made.

In particular, we propose six different repair operations that can be used to fix gaps and overlaps. These are shown in Table 1, and all imply re-labelling triangles. Four of them use triangles as a base (i.e. the label assigned is based only on that of its three neighbouring triangles), which is fast and modifies the area of each input polygon the least. Despite their simplicity, they offer substantial control over the results. For instance, the first two operations only differ from each other in their handling of overlapping faces, but triangle by number of neighbours is better for large overlapping regions, while triangle by absolute majority is better in fixing small problems. Having well-shaped (Delaunay) triangles is most useful for triangle-based repair functions. Two of them use regions of adjacent triangles with equivalent sets of labels, which is slower than a triangle-based method, but yields results that can be cartographically more pleasing. An interesting repair operation for practitioners is the one in which a priority of labels is used, i.e. in case of gaps/overlaps, the labels in the triangle (overlap) or in the adjacent polygons (gap) are ordered according to a user-defined priority, and the highest priority is assigned to the problematic triangles. Notice that these repair operations can be used one after the other (in a hierarchical manner); for instance, if first the repair according to the longest boundary is used, but one zone has two or more boundaries with exactly the same length, then the deadlock can be solved by choosing one randomly. A sample of the results obtained with different repair operators is also shown in Figure 8.

Table 1: The repair operations currently implemented in our software. The types of operations are defined in the map algebra classification [Tomlin, 1994].

| Repair operation | Type | Criteria |
|---|---|---|
| Triangle by priority list | Varies | Label that has the highest priority according to a predefined priority list |
| Triangle by number of neighbours | Focal | The label present in the largest number of adjacent faces, overlaps included |
| Triangle by absolute majority | Focal | Label present in two or more valid adjacent faces |
| Triangle by longest boundary | Focal | Label present along the longest portion of the boundary of the adjacent faces |
| Regions by longest boundary | Focal of zonal | Label present along the longest portion of the boundary of the adjacent faces |
| Regions by random neighbour | Focal of zonal | Random label from the adjacent faces |
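
A triangle-based repair can be sketched as below. The sketch chains "triangle by number of neighbours" with a priority list as tie-breaker, which is one possible hierarchical combination chosen here for illustration, not necessarily the one used in our software. It also follows the two-phase scheme: all labels are chosen from the unmodified state before any triangle is relabelled. The data structures and names are assumptions.

```python
from collections import Counter

UNIVERSE = "universe"


def choose_by_neighbours(t, labels, adjacency, priority):
    """Most frequent label among adjacent triangles (overlaps included);
    ties are resolved with the priority list (earlier = higher priority)."""
    counts = Counter(
        label
        for n in adjacency[t]
        for label in labels[n]
        if label != UNIVERSE
    )
    if not counts:
        return None  # no labelled neighbour yet; try again in a later pass
    best = max(counts.values())
    tied = [label for label, count in counts.items() if count == best]
    return min(tied, key=priority.index)


def repair(labels, adjacency, priority):
    """Relabel every triangle that has zero or several labels."""
    repaired = {t: set(ls) for t, ls in labels.items()}
    while True:
        invalid = [t for t, ls in repaired.items() if len(ls) != 1]
        # Phase 1: decide all new labels from the current state.
        decisions = {}
        for t in invalid:
            choice = choose_by_neighbours(t, repaired, adjacency, priority)
            if choice is not None:
                decisions[t] = choice
        if not decisions:
            return repaired
        # Phase 2: apply them together, so the visiting order cannot matter.
        for t, choice in decisions.items():
            repaired[t] = {choice}


# Triangle 2 is an overlap (one red and one blue neighbour: a tie),
# triangle 4 is a gap (two red neighbours, one blue neighbour).
labels = {
    0: {UNIVERSE},
    1: {"red"},
    2: {"red", "blue"},
    3: {"blue"},
    4: set(),
    5: {"red"},
}
adjacency = {0: [1, 3], 1: [0, 2, 4], 2: [1, 3], 3: [0, 2, 4], 4: [1, 3, 5], 5: [4]}
result = repair(labels, adjacency, priority=["blue", "red"])
print({t: sorted(ls) for t, ls in result.items()})
```

Here the tie at triangle 2 is resolved in favour of blue by the priority list, while the gap at triangle 4 goes to red by majority. Since only labels change, the triangulation itself is untouched and the result remains a planar partition.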

More repair operations based on extensions to the idea of labelling triangles/regions can be further developed. For instance, triangles could be split to subdivide an area with problems (as in Bader and Weibel [1997]), or sliver triangles/regions could be discarded (and then filled during repair).

3.5 Extraction of Polygons From a Triangulation

Starting from a labelled triangulation, it is possible to reconstruct the individual polygons of a planar partition to conform to valid polygons according to the ISO 19107 and the Simple Features specifications, which allows users to incorporate automatic validation and repair in their workflow.

We do this operation polygon by polygon. We start at an unprocessed triangle and visit all the connected triangles having the same label (to reconstruct the polygon), marking the triangles as processed as we visit them. Note that since all these triangles are connected, the outer and inner boundaries of a polygon are all simple (non self-intersecting). We repeat this operation until all the triangles have been processed.

Figure 7: The steps in a generic repair operation. (a) A set of overlapping polygons that needs to be repaired. (b) The polygons are triangulated and each triangle is labelled with the polygons it belongs to. (c) An invalid region is found, defined as a set of contiguous polygons with an equivalent set of 0 or ≥ 2 labels. (d) The neighbours of the region are analysed to see how it should be re-labelled, and the result is saved onto a list (here, triangles A, B, C and D are all to be labelled red). (e) The triangles are relabelled according to the list. (f) The output polygons, which now have no gaps or overlaps.

For each polygon, we have to recover not only its outer boundary, but also its inner boundaries, which are not connected. Observe that we cannot simply follow the original constrained edges, as these do not have any meaning after a planar partition has been repaired; the boundaries of the repaired polygons are instead formed by edges incident to two triangles having different labels.
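
The observation that boundaries are defined by label changes, and not by the original constraints, can be illustrated with a short sketch. It assumes triangles are stored as vertex-id triples with exactly one label each after repair; the function name and data layout are hypothetical. In a triangulation with a far-away point, edges on the outside would border universe triangles; in this reduced example they simply have a single incident triangle.

```python
def boundary_edges(triangles, labels, target):
    """Edges with exactly one incident triangle labelled `target`."""
    owners = {}
    for t, vertices in triangles.items():
        for i in range(3):
            edge = frozenset((vertices[i], vertices[(i + 1) % 3]))
            owners.setdefault(edge, []).append(t)
    result = []
    for edge, incident in owners.items():
        inside = [t for t in incident if labels[t] == target]
        if len(inside) == 1:  # the other side has another label or is absent
            result.append(tuple(sorted(edge)))
    return sorted(result)


# A square with corners 0..3, split into two triangles along diagonal 0-2.
triangles = {0: (0, 1, 2), 1: (0, 2, 3)}

# Different labels: the diagonal is part of red's boundary.
split = boundary_edges(triangles, {0: "red", 1: "blue"}, "red")

# Same label on both sides: the diagonal disappears from the boundary.
merged = boundary_edges(triangles, {0: "red", 1: "red"}, "red")

print(split)
print(merged)
```

The second call shows why a relabelled triangle changes the polygon outline without any geometric computation: the diagonal is still an edge of the triangulation, but it no longer separates two labels.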

For each polygon, we start at a triangle and move on to triangles having the same label. As the process advances, a single polyline that runs along all the boundaries of the polygon is generated. This involves a (clockwise) depth-first search that recurses until it reaches the boundary of the polygon, returning a long chain of edges in a procedure similar to following the boundary edge by edge. The procedure is shown step by step in Figure 9. The polyline created with this method has “bridges”, which allow us to keep all inner boundaries (holes) connected with the outer one, in a manner that keeps the interior connected as well. These help to preserve connectivity and the relations between different (outer and inner) boundaries, but are removed later in the process (to conform to the Simple Features specification). Also, its orientation conveys the information of whether a section of it is part of an inner or an outer boundary.

Figure 8: Different repair operations used in the two polygons for the Arribes del Duero Natural Park in Spain (red/darker grey) and the International Douro Natural Park in Portugal (green/lighter grey). All of them can be considered best by a certain criterion, like preserving the area ratio between the two polygons (b), smoothness of the boundary (d), or a balance between the two (c). (a) The original polygons. (b) Repaired each triangle using the label adjacent along the longest boundary from the neighbouring triangles. (c) Repaired each region using a random label from the neighbouring triangles. (d) Repaired each region using the label adjacent along the longest boundary from the neighbouring triangles.

Figure 9: The traversal order when navigating through triangles in a clockwise manner. Starting at A and the edge between A and B, the operations occur according to the encircled numbers. Black arrows denote when an unprocessed triangle is found, red arrows when it is not. Notice how the traversal is performed clockwise (shown in dark blue arrows) for both cases, despite starting from different sides.

This polyline is processed with a stack-based algorithm that generates separate closed rings for the outer boundary and each of the inner boundaries, collapsing the “bridges” that were generated. In order to do this, the polyline is cut at the places where more than two edges join, and these are joined in the correct order by keeping track of (yet) unclosed rings. When a new segment is being processed, it can be one of three options: one that completes a ring, one that is part of a bridge, or one that is not yet closed. Closed rings are stored and bridges are removed, while unclosed rings are saved in the stack until they can be popped to form a closed ring together with a new segment. This is shown in Figure 10.
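
The idea of collapsing bridges with a stack can be sketched with a simplified, vertex-based variant. It is not the segment-based procedure described above: here the polyline is assumed to be given as a closed walk of vertex ids, a ring is closed whenever a vertex already on the stack is met again, and a closed piece with fewer than three vertices is treated as a bridge traversed back and forth and is discarded. The function name is hypothetical.

```python
def split_rings(walk):
    """Split a closed vertex walk with bridges into simple rings."""
    stack = []
    position = {}  # vertex -> its index in the stack
    rings = []
    for v in walk:
        if v in position:
            start = position[v]
            ring = stack[start:]  # everything since v was first pushed
            for u in stack[start + 1:]:
                del position[u]
            del stack[start + 1:]  # keep v itself, it may start another ring
            if len(ring) >= 3:
                rings.append(ring)  # a closed ring: store it
            # otherwise: a degenerate out-and-back bridge, erased
        else:
            position[v] = len(stack)
            stack.append(v)  # still unclosed: push
    return rings


# Outer square A-B-C-D with a triangular hole a-c-b, connected to the
# outer boundary by the bridge A-a, which is walked once in each direction.
walk = ["A", "a", "c", "b", "a", "A", "B", "C", "D", "A"]
rings = split_rings(walk)
print(rings)
```

In this sketch the hole ring is closed first, the bridge A-a then collapses as a two-vertex piece, and the outer ring is closed last. Telling outer from inner rings would rely on their orientation, as noted above.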

4 Implementation and Experiments

4.1 Implementation

An implementation of the algorithms described in Section 3 was written in the C++ programming language, using external libraries for some functionality. The developed software is called pprepair and is open source and freely available at http://tudelft3d.github.com/pprepair. C++ was selected in order to have plenty of control with regards to low level details and to achieve good performance, which makes it possible to compare it with existing solutions. The libraries used are the OGR Simple Features Library⁹ (which allows input and output from a large variety of data formats common in GIS) and CGAL¹⁰ (which has support for many robust spatial data structures and the operations based on them; we use its constrained triangulation module).

4.2 Experiments with real-world planar partition datasets

We have made experiments with four freely available real-world datasets, i.e. we have validated and automatically repaired them with our implementation; the overview of these datasets is shown in Figure 11 and their properties in Table 2. The datasets are the following:

⁹ http://www.gdal.org/ogr

¹⁰ Computational Geometry Algorithms Library: http://www.cgal.org

Figure 10: Processing the polyline shown, starting from vertex a and moving towards polyline A, the following operations occur: (1) A unclosed → push; (2) B unclosed → push; (3) C unclosed → push; (4) D unclosed → push; (5) E unclosed → push; (6) F closed → store; (7) pop → EG closed → store; (8) pop → DH degenerate → erase; (9) pop → CI closed → store; (10) pop → BJ degenerate → erase; (11) pop → AK closed → store.

Figure 11: Overview of the four datasets. (a) E41N27. (b) 4tiles. (c) 16tiles. (d) Mexico.

E41N27: Corine 2000¹¹ tile E41N27, which contains a shifted polygon (by about 10 cm), creating many small gaps and overlaps in the dataset. The snapping threshold has been set at 1 m.

4tiles: Corine 2000 tiles E39N32, E39N33, E40N32 and E40N33, which are known to have long and thin overlapping regions (< 1 mm) with each other. The snapping threshold has been set at 1 cm.

16tiles: 16 adjacent Corine 2000 tiles: E39N30, E39N31, E39N32, E39N33, E40N30, E40N31, E40N32, E40N33, E41N30, E41N31, E41N32, E41N33, E42N30, E42N31, E42N32, E42N33. Some have gaps between one another, some overlap, but match within a few centimetres. The snapping threshold has been set at 10 cm.

Mexico: 1:1 000 000 scale land cover dataset from INEGI, consisting of over 26 000 polygons. Interestingly, it is mostly already valid (according to the Shapefile specification), but contains some very large polygons with tens of thousands of vertices.

¹¹ Corine is a land cover dataset for 32 European countries. It is freely available at http://www.eea.europa.eu/data-and-maps/data/corine-land-cover-2000-clc2000-seamless-vector-database

As a comparison, we have also tried to perform the same operations with other available software. While the capabilities for planar partition repair among the software tested vary considerably, with full topological repair only available in ArcGIS (using manual operations only), it is also important to consider how different repair implementations scale to large datasets. For this, a few performance tests were made with our implementation and three planar partition repair tools that perform this process using snapping and splitting. The testing methodology for each tool is as follows:

Table 2: Properties of the datasets used for the experiments.

| Dataset | polygons | pts | pts largest polygon | avg pts per polygon |
|---|---|---|---|---|
| E41N27 | 14 969 | 496 303 | 26 740 | 33.7 |
| 4tiles | 4 984 | 365 702 | 16 961 | 74.7 |
| 16tiles | 63 868 | 6 622 133 | 95 112 | 103.7 |
| Mexico | 26 866 | 4 181 354 | 117 736 | 155.6 |

ArcGIS: In ArcCatalog, a multiple feature dataset is created in a Geodatabase, set with XY resolution values equal to the snapping threshold. The features are imported into it, and the merge and dissolve operations are used to merge adjacent polygons with the same ID. Topology is then generated to check that the planar partition is valid. Everything is finally exported to a single Shapefile. The individual parts of the process are timed and the total is recorded. Memory usage is calculated as the difference between the just loaded ArcCatalog application and its maximum memory usage throughout the process.

FME: A reader is created for each input file, which serve as input to a Snapper transformer; features with the same IDs are then dissolved, and finally they are output into a new Shapefile writer. The topology generator is used to be able to tell whether the result is a valid planar partition. Results are timed and the maximum memory allocation of the fme.exe process is recorded.

GRASS: Input files are imported with v.in.ogr, with all polygon cleaning operations performed and snapping set to the correct values. Boundaries between features with the same IDs are then dissolved using v.dissolve. Files are then exported with v.out.ogr. Times reported by GRASS are added together to give the total, while memory usage by v.in.ogr.exe, v.dissolve.exe and v.out.ogr.exe is monitored and their maximum is recorded.

pprepair: Files are read and put into the triangulation, the triangulation is labelled, and the repair is performed with the longest boundary first (see Table 1), with ambiguous cases resolved with a random choice. Polygons are then extracted from the triangulation and output to a single Shapefile. The entire process is timed and the maximum amount of memory used is recorded.

Notice that we are somewhat comparing apples with oranges here, since our implementation is able to repair more cases than other methods, keeps topological consistency, and does not require finding out the appropriate threshold value (if it exists). More importantly, it is able to directly state whether the result is a planar partition, unlike the three other solutions (with these, snapping is performed, but that does not guarantee that the output is a valid planar partition). The results of the experiments are shown in Table 3.

We have made the experiments in order to have an idea of the processing time and the memory usage involved. Only cases that are acceptably solved by snapping and splitting have been considered, since there is no other comparable automated topological repair tool among those studied that would also work for more complex cases, such as those that require a snapping threshold too high to be practical (e.g. horizontal conflation of independently generated datasets [Yuan and Tao, 1999; Davis, 2003]).

Table 3: Planar partition repair comparison using large datasets.

| Dataset | pprepair memory | pprepair time | ArcGIS memory | ArcGIS time | FME memory | FME time | GRASS memory | GRASS time |
|---|---|---|---|---|---|---|---|---|
| E41N27 | 145 MB | 19s | 145 MB | 1m3s | 158 MB | 31s | 59 MB | 3m9s |
| 4tiles | 116 MB | 17s | 113 MB | 37s | 105 MB | 31s | 49 MB | 53s |
| 16tiles | 1.45 GB | 4m47s | crashes | – | 636 MB | 15m48s | crashes | – |
| Mexico | 1.01 GB | 3m31s | 216 MB | >1d | 264 MB | 2m45s | 408 MB | 11m38s |

All tests have been run 5 times in a machine just booted, and the results averaged to account for the small variations that occurred, except in the case where the execution was cancelled after one full day due to time limitations, or when it caused the program to crash. The hardware is a 2.66 GHz Core 2 Duo MacBook Pro with 4 GB of RAM. ArcGIS 9.3, FME 2010 SP1 and GRASS 6.4 were run in Windows 7, while pprepair was run in Mac OS X 10.7.1.

As Table 3 shows, our approach uses somewhat more memory than other solutions. This is explained by the extra (unconstrained) edges that are added to the input when triangulating it. It should however be noticed that both ArcGIS and GRASS crashed with the biggest dataset, 16tiles. Our implementation is generally the fastest of the four tested, being for instance around three times faster than FME for the biggest dataset. Only for the Mexico dataset is our implementation slower than FME. This is (probably) explained by the fact that its polygons already form a valid planar partition, and therefore very few snapping operations have been performed by FME; the planar graph of the input was basically built and then the polygons saved back to disk. We believe that ArcGIS and GRASS struggled with this dataset because it contains several very large polygons (with more than 10 000 vertices). Our implementation took 3m31s but, as Table 4 shows, it took CGAL 3m02s to simply triangulate the input edges. This table also demonstrates the efficiency of our approach: for the four datasets, around 85% of the time of our approach was used (by the CGAL library) to read the input from disk and triangulate the input edges.

Table 4: Timed steps of the planar partition repair procedure (m:ss), rounded down to the nearest second. The percentage is the triangulation time over the total time.

| Step | E41N27 | 4tiles | 16tiles | Mexico |
|---|---|---|---|---|
| Triangulate | 0:17 | 0:15 | 4:00 | 3:02 |
| Label | 0:01 | 0:01 | 0:27 | 0:16 |
| Repair | 0:00 | 0:00 | 0:00 | 0:00 |
| Reconstruct | 0:01 | 0:01 | 0:11 | 0:07 |
| Output | 0:00 | 0:00 | 0:10 | 0:06 |
| Total | 0:19 | 0:17 | 4:47 | 3:31 |
| % triangulate | 89% | 88% | 83% | 86% |

During the implementation, we took several engineering decisions to optimise our program. One of them was to favour disk space over computation time, as we wanted to be able to process large datasets. This is why we always reconstruct polygons, even if they are not modified by the repair process. Although it would be possible to skip unmodified polygons, this would require us to keep in memory the original polygons (which would significantly increase the memory consumption) and to keep track of which labels have been modified. And, as Table 4 shows, the reconstruction is very efficient, as it only takes around 3–4% of the total time for the 16tiles and Mexico datasets.

5 Discussion and Conclusions

We have presented a new method for repairing polygonal area partitions, ensuring that the output partition is valid, i.e. all individual polygons conform to the ISO 19107 [ISO, 2003] and Simple Features [OGC, 2006] specifications, and no gaps or overlaps are present between any pair of polygons. The novelty of our method lies in the fact that repair is performed according to user-defined criteria, but then takes place without any human intervention. Automatic repair is becoming an increasingly important topic due to data integration, e.g. data collected for the different themes of the European INSPIRE Initiative will finally have to fit together, and data could eventually be matched fully automatically by dedicated web service components [INSPIRE, 2009]. Our approach could be at the heart of such a web service.

The proposed approach excels in automated repair at the cost of increased memory usage compared to a pure graph-based approach; this difference is mainly caused by the unconstrained edges introduced by the triangulation. While it would be possible to use our repair rules together with a (primal/dual) graph-based approach, these additional edges in the triangulation give fine-grained control over the repair operations and ensure that the graph is connected, which facilitates the reconstruction of polygons.

We have implemented our algorithm over a numerically robust triangulator (CGAL), and since repair operations are expressed solely in terms of re-labelling of triangles (no geometric computation is involved), the approach is also fully robust. Since during our experiments most of the time was used to compute the constrained triangulation, another library could also be tested to improve the implementation.

For the future, we plan to:

• Improve the scalability of the approach, and process and repair datasets with more than 10 million polygons. It is known that, using divide-and-conquer techniques, triangulation algorithms can handle big datasets [Amenta et al., 2003; Blandford et al., 2005]. We will investigate whether it is possible to automatically repair each divided part individually and ‘glue’ the repaired parts together.

• Investigate snap rounding [Hobby, 1999; de Berg et al., 2007] as a pre-processing step (or embedded directly in the triangulation) to guarantee that repaired planar partitions have no vertices that are closer than a certain ε threshold. However, snap rounding may change the topology of the input, but the output will nevertheless be a valid partition (as the topology will be repaired). Apart from topological changes, snap rounding can also lead to removed polygon parts that are too small to be preserved based on the chosen ε.

• Add more advanced repair operations to our repair toolkit, e.g. repair could take place based on splitting a collection of triangles.

• Extend our work to include the third dimension, to validate and repair 3D city models using a constrained tetrahedralisation [Si, 2008]. Notice that the tetrahedralisation of a given polyhedron does not always exist, and thus extra (Steiner) points might need to be added. The main concepts of our approach, (re)labelling and reconstruction, extend naturally to 3D. However, appropriate repair operations for 3D city models would need to be defined, and some application-specific constraints (e.g. right angles at corners) are not trivial to implement in our current approach.

References

Nina Amenta, Sunghee Choi, and Günter Rote. Incremental constructions con BRIO. In Proceedings of the 19th Annual Symposium on Computational Geometry, pages 211–219. ACM Press, 2003.

Wael M. Badawy and Walid G. Aref. On local heuristics to speed up polygon-polygon intersection tests. In Proceedings of the 7th ACM International Symposium on Advances in Geographic Information Systems, pages 97–102. ACM, 1999.

M. Bader and R. Weibel. Detecting and resolving size and proximity conflicts in the generalization of polygonal maps. In Proceedings of the 18th International Cartographic Conference, Stockholm, pages 1525–1532, 1997.

Daniel K. Blandford, Guy E. Blelloch, David E. Cardoze, and Clemens Kadow. Compact representations of simplicial meshes in two and three dimensions. International Journal of Computational Geometry and Applications, 15(1):3–24, 2005.

CGAL. CGAL 3.8 User and Reference Manual. CGAL Editorial Board, 2011.

Kenneth L. Clarkson, Kurt Mehlhorn, and Raimund Seidel. Four results on randomized incremental constructions. In Alain Finkel and Matthias Jantzen, editors, Proceedings of the 9th Annual Symposium on Theoretical Aspects of Computer Science, volume 577 of Lecture Notes in Computer Science, pages 461–474. Springer Berlin/Heidelberg, 1992.

Martin Davis. Java conflation suite. Technical report, Vivid Solutions, 2003. Available at http://www.vividsolutions.com/jcs.

Mark de Berg, Dan Halperin, and Mark Overmars. An intersection-sensitive algorithm for snap rounding. Computational Geometry, 36(3):159–165, 2007. ISSN 0925-7721.

Mark de Berg, Marc van Kreveld, Mark Overmars, and Otfried Schwarzkopf. Computational Geometry: Algorithms and Applications. Springer-Verlag, 3rd edition, 2008.

ESRI. Shapefile technical description. White paper, ESRI, July 1998.

Michael A. Facello. Implementation of a randomized algorithm for Delaunay and regular triangulations in three dimensions. Computer Aided Geometric Design, 12(4):349–370, 1995.

GTS. GTS Library Reference Manual, 2006. URL http://gts.sourceforge.net/reference/book1.html.

Leonidas Guibas and Jorge Stolfi. Primitives for the manipulation of general subdivisions and the computation of Voronoi. ACM Transactions on Graphics, 4(2):74–123, 1985.

John D. Hobby. Practical segment intersection with finite precision output. Computational Geometry, 13(4):199–214, 1999. ISSN 0925-7721.

Christoph M. Hoffmann, John E. Hopcroft, and Michael S. Karasick. Towards implementing robust geometric computations. In Proceedings of the 4th Annual Symposium on Computational Geometry, pages 106–117. ACM, 1988.

INSPIRE. D2.5: Generic conceptual model, version 3.2. Technical report, Drafting Team "Data Specifications", 2009. URL http://inspire.jrc.ec.europa.eu/documents/Data_Specifications/D2.5_v3.2.pdf.

ISO. 19107: Geographic information – spatial schema, March 2003.

David Kirkpatrick, Jack Snoeyink, and Bettina Speckmann. Kinetic collision detection for simple polygons. International Journal of Computational Geometry and Applications, 12(1–2):3–27, 2002.

Robert Laurini and Françoise Milleret-Raffort. Topological reorganization of inconsistent geographical databases: A step towards their certification. Computers & Graphics, 18(6):803–813, December 1994.

Hugo Ledoux and Martijn Meijers. Validation of planar partitions using constrained triangulations. In Proceedings Joint International Conference on Theory, Data Handling and Modelling in GeoSpatial Information Science, pages 51–55, Hong Kong, May 2010.

Hugo Ledoux, Ken Arroyo Ohori, and Martijn Meijers. Automatically repairing invalid polygons with a constrained triangulation. In Proceedings of the 15th AGILE International Conference on Geographic Information Science, 2012.

Yuanxin Liu and Jack Snoeyink. Faraway point: A sentinel point for Delaunay computation. International Journal of Computational Geometry and Applications, 18(4):343–355, November 2006.

Avraham Margalit and Gary D. Knott. An algorithm for computing the union, intersection or difference of two polygons. Computers & Graphics, 13(2):167–183, 1989.

Ernst P. Mücke, Isaac Saias, and Binhai Zhu. Fast randomized point location without preprocessing in two- and three-dimensional Delaunay triangulations. Computational Geometry: Theory and Applications, 12:63–83, 1999.

OGC. OpenGIS Implementation Specification for Geographic Information - Simple Feature Access - Part 1: Common Architecture, 1.2.0 edition, October 2006.

Lutz Plümer and Gerhard Gröger. Achieving integrity in geographic information systems: maps and nested maps. GeoInformatica, 1(4):345–367, 1997.

M. Rivero and F. R. Feito. Boolean operations on general planar polygons. Computers & Graphics, 24(6):881–896, December 2000.

Stefan Schirra. Precision and Robustness in Geometric Computations, volume Algorithmic Foundations of Geographic Information Systems of Lecture Notes in Computer Science, chapter 9, pages 255–287. Springer Berlin/Heidelberg, 1997.

Michael Ian Shamos and Dan Hoey. Geometric intersection problems. In FOCS, pages 208–215. IEEE, 1976.

Jonathan Richard Shewchuk. Delaunay Refinement Mesh Generation. PhD thesis, School of Computer Science, Carnegie Mellon University, Pittsburgh, USA, 1997a.

Jonathan Richard Shewchuk. Adaptive precision floating-point arithmetic and fast robust geometric predicates. Discrete & Computational Geometry, 18:305–363, 1997b.

Hang Si. Three Dimensional Boundary Conforming Delaunay Mesh Generation. PhD thesis, Technische Universität Berlin, 2008.

C. Dana Tomlin. Map algebra: One perspective. Landscape and Urban Planning, 30(1–2):3–12, October 1994.

Peter van Oosterom, Wilko Quak, and Theo Tijssen. About invalid, valid and clean polygons. In Peter F. Fisher, editor, Developments in Spatial Data Handling: 11th International Symposium on Spatial Data Handling, pages 1–16. Springer, 2004.

Jan W. van Roessel. A new approach to plane-sweep overlay: Topological structuring and line-segment classification. Cartography and Geographic Information Science, 18(1):49–67, January 1991.

Shuxin Yuan and Chuang Tao. Development of conflation components. Geoinformatics and Socioinformatics, pages 1–13, June 1999.

  • Introduction
  • Related Work
    • Simple Features and validity of simple polygons
    • Validation of a Planar Partition Using a Planar Graph
    • Repair Using Point and Edge Snapping and Splitting
    • Repair Using Topological Information
  • Validation and Automatic Repair Using a Constrained Triangulation
    • Triangulation of a polygon and constrained triangulation
    • Labelling the triangles of a Planar Partition
    • Validation
    • Repair Operations
    • Extraction of Polygons From a Triangulation
  • Implementation and Experiments
    • Implementation
    • Experiments with real-world planar partition datasets
  • Discussion and Conclusions

Figure 2: Defining a threshold for vertex, edge and face snapping. (a) Gaps between polygons. (b) Overlapping polygons. Both panels mark the minimum and maximum distance thresholds. The threshold to use should be larger than the largest minimum distance between the matching boundaries and smaller than the minimum distance between vertices.

Geodatabase feature of the software, with some combined validation rules (e.g. must not overlap and must not have gaps). However, fixing every problematic area in an appropriate manner requires extensive user intervention (see Figure 5), since the best choice for each case depends on the specific configuration of the error.

Since neither of the aforementioned programs offers an automated process to correctly solve this problem (and GRASS lacks the ability to visualise problem areas), they are not really comparable to our solution. A planar partition can easily contain tens of thousands of polygons, possibly generating thousands of errors, which need to be checked and repaired semi-automatically.

3 Validation and Automatic Repair Using a Constrained Triangulation

Our approach to validation and automatic repair of planar partitions uses a constrained triangulation (CT) as a supporting structure because it has many good properties, including the following:

• It is by definition a planar partition. Therefore, as long as we keep the information about which polygon each triangle belongs to, the reconstructed polygons will be either a valid planar partition or multiple ones.

• It can be built quickly, in $O(n \log n)$ time, with a variety of approaches [Guibas and Stolfi, 1985; Mücke et al., 1999; Clarkson et al., 1992] (see footnote 8).

• Changes to the triangulation (e.g. adding a new constrained edge) are local and therefore fast.

• Constrained edges can usually be added in constant time, being only significantly slower (and more complex) when there is an intersection with an existing constrained edge [Shewchuk, 1997b].

• Implementation-wise, several stable and fast triangulation libraries exist, including CGAL [CGAL, 2011], Triangle [Shewchuk, 1997a] and GTS [GTS, 2006].

The general workflow of our approach to both validate and repair a planar partition is as follows:

1. the CT of the input segments forming the polygons is constructed;

2. each triangle in the CT is labelled with the label of the polygon inside which it is located;

3. problems are detected by identifying triangles having no or multiple labels, and by verifying the connectivity between triangles;

4. problems are repaired by re-labelling triangles to ensure that each triangle has exactly one label;

5. the polygons are extracted from the triangulation (polygons modelled with the Simple Features specification).

Footnote 8: The actual computational complexity can be $O(n \log n + k)$, with $k$ being the number of edge-edge intersections, which could conceivably even be $n^2$. However, $k \ll n$ for most GIS datasets.

Figure 5: Planar partition repair in ArcGIS. (a) Viewing a topology error in ArcGIS. (b) Assigning an overlapping region to one of the polygons involved. The user is expected to zoom in to a particular error, analyse the situation (e.g. by looking at the properties of the surrounding polygons) and make a decision to assign the problematic region to a certain polygon. More than 11 000 errors were detected in this tile of the Corine dataset.

As mentioned previously, for this workflow we assume that each input polygon is individually valid. In the following sections, we describe the concepts needed and give a detailed description of the different steps.

3.1 Triangulation of a polygon and constrained triangulation

A triangulation subdivides an area into non-overlapping triangles. Using a constrained triangulation, every line segment that defines the boundary of a polygon is ensured to appear as an edge in the triangulation. It is known that any polygon (also with holes) can be triangulated without adding extra vertices [de Berg et al., 2008; Shewchuk, 1997a]. Figure 6 shows an example.

In our approach, the triangulation is performed by incrementally constructing a CT of all the segments representing the boundaries (outer + inner) of each polygon. If the set of input polygons forms a planar partition, then each segment will be inserted twice (except those forming the outer boundary of the set of input polygons). This is usually not a problem for triangulation libraries, because they ignore points and segments at the same location (as is the case with the solution we use, see Section 4). When segments are found to intersect, they are split with a new point created at the intersection. This is the only situation in which the generation of new points is required.

Figure 6: (a) A polygon with 4 holes. (b) The constrained triangulation of the segments of this polygon.

Notice that our approach requires only a constrained triangulation, and not one that fulfils the Delaunay criterion [Shewchuk, 1997a]. However, having well-shaped triangles is useful for repair purposes and does not significantly increase the processing time. Therefore, our implementation actually constructs and uses a constrained Delaunay triangulation.

3.2 Labelling the triangles of a Planar Partition

Labelling a triangle means assigning it label(s) for the polygon(s) that it belongs to (if two input polygons overlap, each triangle in the overlapping region should have the labels of the two polygons).

In our previous work on validation [Ledoux and Meijers, 2010], we used the centroid of a polygon to start the labelling process, but this method is prone to errors (if, for instance, the calculated centroid is outside or on the boundary of the polygon) and does not permit us to differentiate gaps from overlaps.

To solve these problems, we store information about the constrained edges of the CT. Since it is known that the input rings are closed and have a known orientation (according to the ISO 19107 orientation rules), it is also known on which side of a certain line segment the interior of the polygon lies. This property is used for robust labelling of each polygon. Triangles adjacent to the outer ring of the polygons are labelled first, and this is later expanded to triangles further in the interior of the polygon, recursively labelling adjacent unlabelled faces as long as no constrained edges are crossed. After this operation, all triangles that are part of any polygon are labelled, with overlapping regions having multiply labelled triangles. However, holes are then indistinguishable from triangles outside the planar partition, since both have zero labels. Therefore, a special label, referred to as the "universe" label, is created for all the triangles outside the planar partition. These are labelled by recursively labelling adjacent triangles from any triangle known to lie in the exterior of the planar partition. We exploit the concept of the "far-away point" to achieve this [Liu and Snoeyink, 2006], which is used by several implementations and is also known as the "big triangle" [Facello, 1995].
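The expansion step described above can be sketched as a flood fill that never crosses a constrained edge. The following is a minimal illustration in Python, not the pprepair implementation (which is written in C++ on top of CGAL). The data layout is an assumption made for the example: `neighbours` maps a triangle id to its adjacent triangles together with a flag telling whether the shared edge is constrained, and `seeds` lists, per polygon, the triangles known to lie on the interior side of its rings.

```python
from collections import deque


def label_triangles(neighbours, seeds):
    """Flood-fill polygon labels over a triangulation (illustrative sketch).

    neighbours: dict triangle id -> list of (adjacent triangle id, is_constrained)
    seeds: dict polygon label -> triangle ids lying just inside that polygon's rings
    Returns dict triangle id -> set of labels (an empty set means no polygon).
    """
    labels = {t: set() for t in neighbours}
    for label, start in seeds.items():
        queue = deque(start)
        seen = set(start)
        while queue:
            t = queue.popleft()
            labels[t].add(label)
            for adj, constrained in neighbours[t]:
                # A constrained edge is a polygon boundary: never cross it.
                if not constrained and adj not in seen:
                    seen.add(adj)
                    queue.append(adj)
    return labels


# Hypothetical strip of triangles: red covers 0, 1 and 4, blue covers 1 and 2,
# triangle 3 belongs to no polygon.
neighbours = {
    0: [(1, True), (4, False)],
    1: [(0, True), (2, True)],
    2: [(1, True), (3, True)],
    3: [(2, True)],
    4: [(0, False)],
}
seeds = {"red": [0, 1], "blue": [1, 2]}
labels = label_triangles(neighbours, seeds)
```

In this toy input, triangle 4 is reached only through the unconstrained edge it shares with triangle 0, triangle 1 ends up with both labels (an overlap), and triangle 3 stays unlabelled (a gap). The "universe" label could be assigned with the same routine, seeded from a triangle known to lie outside the partition.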

3.3 Validation

If the set of input polygons forms a planar partition, then all the triangles will be labelled with one and only one label. Problems are therefore easily detected:

• Gaps are detected by finding triangles without any labels.

• Overlaps are detected by finding triangles with two or more labels.

• Disjoint regions are detected by identifying regions separated by the "universe" label. This is done by starting at a given triangle and doing a breadth-first search on the dual of the triangulation, without visiting the triangles labelled as "universe". If all the triangles can be reached, then no polygons of the planar partition are disjoint.
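The three checks above can be sketched in a few lines. This is a hedged illustration rather than the authors' code: the dictionaries `labels` (triangle id to set of labels) and `adjacency` (triangle id to adjacent triangle ids, i.e. the dual graph) and the constant `UNIVERSE` are names assumed for the example.

```python
from collections import deque

UNIVERSE = "universe"  # assumed name for the special exterior label


def validate(labels, adjacency):
    """Return (gap triangles, overlap triangles, is_connected)."""
    gaps = [t for t, ls in labels.items() if len(ls) == 0]
    overlaps = [t for t, ls in labels.items() if len(ls) >= 2]

    # Breadth-first search on the dual graph, never entering "universe" triangles.
    inside = [t for t, ls in labels.items() if UNIVERSE not in ls]
    reached = set()
    if inside:
        reached.add(inside[0])
        queue = deque([inside[0]])
        while queue:
            t = queue.popleft()
            for adj in adjacency[t]:
                if adj not in reached and UNIVERSE not in labels[adj]:
                    reached.add(adj)
                    queue.append(adj)
    connected = len(reached) == len(inside)
    return gaps, overlaps, connected


# Hypothetical strip: triangle 1 is a gap, triangle 2 an overlap, and
# triangle 4 is cut off from the rest by the universe triangle 3.
labels = {0: {"a"}, 1: set(), 2: {"a", "b"}, 3: {UNIVERSE}, 4: {"c"}}
adjacency = {0: [1], 1: [0, 2], 2: [1, 3], 3: [2, 4], 4: [3]}
gaps, overlaps, connected = validate(labels, adjacency)
```

The partition is valid only when both lists are empty and `connected` is true.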

3.4 Repair Operations

The greatest benefits of using a labelled triangulation for planar partition repair stem from the fact that, while repair operations are performed, the validity of the planar partition is always kept, together with the integrity of the data. Unlike with snapping, vertices are not required to be moved during the process, and repair is performed using local criteria instead of global ones (in snapping, the threshold is usually fixed for the whole dataset). This contrasts with other methods, where care needs to be taken to ensure that the (geometric or topological) validity is kept.

Figure 7 shows the standard steps required in a repair operation. In order to avoid order dependency when repairing, the repair operation itself is always performed after all the choices for which label to assign have been made.

In particular, we propose six different repair operations that can be used to fix gaps and overlaps. These are shown in Table 1, and all imply re-labelling triangles. Four of them use triangles as a base (i.e. the label assigned is based only on that of its three neighbouring triangles), which is fast and modifies the area of each input polygon the least. Despite their simplicity, they offer substantial control over the results. For instance, triangle by number of neighbours and triangle by absolute majority only differ from each other in their handling of overlapping faces, but triangle by number of neighbours is better for large overlapping regions, while triangle by absolute majority is better at fixing small problems. Having well-shaped (Delaunay) triangles is most useful for triangle-based repair functions. Two of them use regions of adjacent triangles with equivalent sets of labels, which is slower than a triangle-based method, but yields results that can be cartographically more pleasing. An interesting repair operation for practitioners is the one in which a priority of labels is used, i.e. in case of gaps/overlaps, the labels in the triangle (overlap) or in the adjacent polygons (gap) are ordered according to a user-defined priority, and the highest priority is assigned to the problematic triangles. Notice that these repair operations can be used one after the other (in a hierarchical manner): for instance, if first the repair according to the longest boundary is used, but one zone has two or more boundaries with exactly the same length, then the deadlock can be solved by choosing one randomly. A sample of the results obtained with different repair operators is also shown in Figure 8.

Table 1: The repair operations currently implemented in our software. The types of operations are defined in the map algebra classification [Tomlin, 1994].

Repair operation                  Type            Criteria
Triangle by priority list         Varies          Label that has the highest priority according to a predefined priority list
Triangle by number of neighbours  Focal           The label present in the largest number of adjacent faces, overlaps included
Triangle by absolute majority     Focal           Label present in two or more valid adjacent faces
Triangle by longest boundary      Focal           Label present along the longest portion of the boundary of the adjacent faces
Regions by longest boundary       Focal of zonal  Label present along the longest portion of the boundary of the adjacent faces
Regions by random neighbour       Focal of zonal  Random label from the adjacent faces
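To make the two-phase scheme concrete (first choose a label for every problematic triangle, then apply all choices at once), here is a minimal Python sketch of a "triangle by number of neighbours" style repair. It is an illustration under stated assumptions, not the pprepair code: the `labels` and `adjacency` dictionaries are hypothetical inputs, and the use of a `priority` list as tie-breaker is one possible hierarchical combination chosen here because it is deterministic (the text above mentions a random choice as another option).

```python
from collections import Counter


def repair_by_neighbours(labels, adjacency, priority=()):
    """Re-label every triangle that does not have exactly one label.

    labels: dict triangle id -> set of labels
    adjacency: dict triangle id -> list of adjacent triangle ids
    priority: optional label order used only to break ties (assumption)
    """
    # Phase 1: decide all new labels from the ORIGINAL labelling only,
    # so the result does not depend on the order of traversal.
    pending = {}
    for t, own in labels.items():
        if len(own) == 1:
            continue
        votes = Counter(l for adj in adjacency[t] for l in labels[adj])
        if not votes:
            continue  # no labelled neighbour: leave for a later pass
        best = max(votes.values())
        tied = sorted(l for l, n in votes.items() if n == best)
        if len(tied) == 1:
            pending[t] = tied[0]
        else:
            ranked = [l for l in priority if l in tied]
            if ranked:
                pending[t] = ranked[0]

    # Phase 2: apply the saved decisions.
    repaired = {t: set(ls) for t, ls in labels.items()}
    for t, choice in pending.items():
        repaired[t] = {choice}
    return repaired


# Hypothetical input: triangle 1 is an overlap, triangle 3 is a gap.
labels = {0: {"red"}, 1: {"red", "blue"}, 2: {"blue"},
          3: set(), 4: {"blue"}, 5: {"red"}}
adjacency = {0: [1], 1: [0, 2, 5], 2: [1, 3], 3: [2, 4], 4: [3], 5: [1]}
repaired = repair_by_neighbours(labels, adjacency)
```

Here triangle 1 has two red neighbours and one blue neighbour, so it becomes red, while the gap triangle 3 is surrounded by blue and becomes blue. A real implementation would also need a rule for neighbours carrying the "universe" label, which this sketch does not treat specially.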

More repair operations, based on extensions to the idea of labelling triangles/regions, can be further developed. For instance, triangles could be split to subdivide an area with problems (as in Bader and Weibel [1997]), or sliver triangles/regions could be discarded (and then filled during repair).

3.5 Extraction of Polygons From a Triangulation

Starting from a labelled triangulation, it is possible to reconstruct the individual polygons of a planar partition so that they conform to valid polygons according to the ISO 19107 and the Simple Features specifications, which allows users to incorporate automatic validation and repair in their workflow.

We do this operation polygon by polygon. We start at an unprocessed triangle and visit all the connected triangles having the same label (to reconstruct the polygon), marking the triangles as processed as we visit them. Note that, since all these triangles are connected, the outer and inner boundaries of a polygon are all simple (non self-intersecting). We repeat this operation until all the triangles have been processed.

Figure 7: The steps in a generic repair operation. (a) A set of overlapping polygons that needs to be repaired. (b) The polygons are triangulated and each triangle is labelled with the polygons it belongs to. (c) An invalid region is found, defined as a set of contiguous polygons with an equivalent set of 0 or ≥ 2 labels. (d) The neighbours of the region are analysed to see how it should be re-labelled, and the result is saved onto a list (in the example, triangles A, B, C and D are each assigned the label red). (e) The triangles are relabelled according to the list. (f) The output polygons, which now have no gaps or overlaps.

For each polygon, we have to recover not only its outer boundary but also its inner boundaries, which are not connected. Observe that we cannot simply follow the original constrained edges, as these do not have any meaning after a planar partition has been repaired: the boundaries of the repaired polygons are instead formed by edges incident to two triangles having different labels.
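The observation about boundaries can be illustrated with a short Python sketch that collects, for one label, the edges that have that label on exactly one side. This is only an unordered edge set, not the ordered traversal the paper uses to build the polyline, and the input layout (`triangles` as vertex-id triples, `label_of` as one label per triangle after repair) is an assumption made for the example.

```python
from collections import defaultdict


def boundary_edges(triangles, label_of, target):
    """Edges of the repaired polygon `target`, as sorted vertex-id pairs.

    triangles: dict triangle id -> (v0, v1, v2) vertex ids
    label_of: dict triangle id -> its single label after repair
    """
    incident = defaultdict(list)
    for t, (a, b, c) in triangles.items():
        for u, v in ((a, b), (b, c), (c, a)):
            incident[frozenset((u, v))].append(t)

    edges = []
    for edge, ts in incident.items():
        inside = sum(1 for t in ts if label_of[t] == target)
        # A boundary edge has the target label on one side only; the other
        # side carries a different label or lies outside the triangulation.
        if inside == 1:
            edges.append(tuple(sorted(edge)))
    return sorted(edges)


# Hypothetical example: a red square (two triangles) next to a blue triangle.
triangles = {"T0": (0, 1, 2), "T1": (0, 2, 3), "T2": (1, 4, 2)}
label_of = {"T0": "red", "T1": "red", "T2": "blue"}
red_boundary = boundary_edges(triangles, label_of, "red")
```

The diagonal (0, 2) is shared by two red triangles and is therefore dropped, while edge (1, 2), shared by a red and a blue triangle, is kept for both polygons.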

For each polygon, we start at a triangle and move on to triangles having the same label. As the process goes on, a single polyline that runs along all the boundaries of the polygon is generated. This involves a depth-first search (clockwise) that recursively continues until it reaches the boundary of the polygon, returning a long chain of edges in a procedure similar to following the boundary edge by edge. The procedure is shown step by step in Figure 9. The polyline created with this method has "bridges", which allow us to keep all inner boundaries (holes) connected with the outer one, in a manner that keeps the interior connected as well. These help to preserve connectivity and the relations between different (outer and inner) boundaries, but are removed later in the process (to conform to the Simple Features specification). Also, its orientation conveys the information of whether a section of it is part of an inner or an outer boundary.

Figure 8: Different repair operations used on the two polygons for the Arribes del Duero Natural Park in Spain (red/darker grey) and the International Douro Natural Park in Portugal (green/lighter grey). (a) The original polygons. (b) Repaired each triangle using the label adjacent along the longest boundary from the neighbouring triangles. (c) Repaired each region using a random label from the neighbouring triangles. (d) Repaired each region using the label adjacent along the longest boundary from the neighbouring triangles. All of them can be considered best by a certain criterion, like preserving the area ratio between the two polygons (b), smoothness of the boundary (d), or a balance between the two (c).

Figure 9: The traversal order when navigating through triangles in a clockwise manner. Starting at A and the edge between A and B, the operations occur according to the encircled numbers. Black arrows denote when an unprocessed triangle is found, red arrows when it is not. Notice how the traversal is performed clockwise (shown in dark blue arrows) for both cases, despite starting from different sides.

This polyline is processed with a stack-based algorithm that generates separate closed rings for the outer boundary and each of the inner boundaries, collapsing the "bridges" that were generated. In order to do this, the polyline is cut at the places where more than two edges join, and these are joined in the correct order by keeping track of (yet) unclosed rings. When a new segment is being processed, it can be one of three options: one that completes a ring, one that is part of a bridge, or one that is not yet closed. Closed rings are stored and bridges are removed, while unclosed rings are saved in the stack until they can be popped to form a closed ring together with a new segment. This is shown in Figure 10.
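The idea of the stack can be illustrated with a simplified, vertex-based variant. This is a sketch of the principle only, not the segment-based algorithm described above: it assumes the polyline is given as a closed walk of hashable vertex ids, treats any loop with fewer than three vertices as a degenerate bridge, and ignores ring orientation.

```python
def split_rings(polyline):
    """Split a closed walk with bridges into simple rings (illustrative sketch).

    polyline: list of vertex ids, in traversal order, ending at the start vertex
    Returns a list of rings, each a list of vertex ids without the closing repeat.
    """
    stack = []
    rings = []
    for v in polyline:
        if v in stack:
            i = stack.index(v)
            loop = stack[i:]       # everything walked since v was last seen
            del stack[i + 1:]      # pop the loop, keep v itself on the stack
            if len(loop) >= 3:
                rings.append(loop)  # closed ring: store
            # otherwise the loop is a bridge walked back and forth: erase
        else:
            stack.append(v)        # not yet closed: push
    return rings


# Hypothetical walk: outer square a-b-c-d, then a bridge a-e to the
# triangular hole e-f-g, and the same bridge back to a.
walk = ["a", "b", "c", "d", "a", "e", "f", "g", "e", "a"]
rings = split_rings(walk)
```

The outer ring and the hole come out as separate rings, and the bridge between a and e disappears because walking it forth and back closes a loop of only two vertices.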

Figure 10: Processing the polyline shown: starting from vertex a and moving towards polyline A, the following operations occur: (1) A: unclosed → push; (2) B: unclosed → push; (3) C: unclosed → push; (4) D: unclosed → push; (5) E: unclosed → push; (6) F: closed → store; (7) pop → EG: closed → store; (8) pop → DH: degenerate → erase; (9) pop → CI: closed → store; (10) pop → BJ: degenerate → erase; (11) pop → AK: closed → store.

4 Implementation and Experiments

4.1 Implementation

An implementation of the algorithms described in Section 3 was written in the C++ programming language, using external libraries for some functionality. The developed software is called pprepair and is open source and freely available at http://tudelft3d.github.com/pprepair. C++ was selected in order to have plenty of control with regards to low-level details and to achieve good performance, which makes it possible to compare it with existing solutions. The libraries used are the OGR Simple Features Library (footnote 9), which allows input and output from a large variety of data formats common in GIS, and CGAL (footnote 10), which has support for many robust spatial data structures and the operations based on them; we use its constrained triangulation module.

Footnote 9: http://www.gdal.org/ogr

Footnote 10: Computational Geometry Algorithms Library, http://www.cgal.org

4.2 Experiments with real-world planar partition datasets

Figure 11: Overview of the four datasets. (a) E41N27. (b) 4tiles. (c) 16tiles. (d) Mexico.

We have made experiments with four freely available real-world datasets, i.e. we have validated and automatically repaired them with our implementation. An overview of these datasets is shown in Figure 11 and their properties in Table 2. The datasets are the following:

E41N27: Corine 2000 (footnote 11) tile E41N27, which contains a shifted polygon (by about 10 cm), creating many small gaps and overlaps in the dataset. The snapping threshold has been set at 1 m.

4tiles: Corine 2000 tiles E39N32, E39N33, E40N32 and E40N33, which are known to have long and thin overlapping regions (< 1 mm) with each other. The snapping threshold has been set at 1 cm.

16tiles: 16 adjacent Corine 2000 tiles: E39N30, E39N31, E39N32, E39N33, E40N30, E40N31, E40N32, E40N33, E41N30, E41N31, E41N32, E41N33, E42N30, E42N31, E42N32 and E42N33. Some have gaps between one another, some overlap, but all match within a few centimetres. The snapping threshold has been set at 10 cm.

Mexico: 1:1 000 000 scale land cover dataset from INEGI, consisting of over 26 000 polygons. Interestingly, it is mostly already valid (according to the Shapefile specification), but contains some very large polygons with tens of thousands of vertices.

Footnote 11: Corine is a land cover dataset for 32 European countries. It is freely available at http://www.eea.europa.eu/data-and-maps/data/corine-land-cover-2000-clc2000-seamless-vector-database

As a comparison, we have also tried to perform the same operations with other available software. While the capabilities for planar partition repair among the software tested vary considerably, with full topological repair only available in ArcGIS (using manual operations only), it is also important to consider how different repair implementations scale to large datasets. For this, a few performance tests were made with our implementation and three planar partition repair tools that perform this process using snapping and splitting. The testing methodology for each tool is as follows:

Table 2: Properties of the datasets used for the experiments.

          polygons   pts         pts largest polygon   avg pts per polygon
E41N27    14 969     496 303     26 740                33.7
4tiles    4 984      365 702     16 961                74.7
16tiles   63 868     6 622 133   95 112                103.7
Mexico    26 866     4 181 354   117 736               155.6

ArcGIS: In ArcCatalog, a multiple feature dataset is created in a Geodatabase, set with XY resolution values equal to the snapping threshold. The features are imported into it, and the merge and dissolve operations are used to merge adjacent polygons with the same ID. Topology is then generated to check that the planar partition is valid. Everything is finally exported to a single Shapefile. The individual parts of the process are timed and the total is recorded. Memory usage is calculated as the difference between the just-loaded ArcCatalog application and its maximum memory usage throughout the process.

FME: A reader is created for each input file, which serves as input to a Snapper transformer; features with the same IDs are then dissolved, and finally they are output into a new Shapefile writer. The topology generator is used to be able to tell whether the result is a valid planar partition. Results are timed and the maximum memory allocation of the fme.exe process is recorded.

GRASS: Input files are imported with v.in.ogr, with all polygon cleaning operations performed and snapping set to the correct values. Boundaries between features with the same IDs are then dissolved using v.dissolve. Files are then exported with v.out.ogr. Times reported by GRASS are added together to give the total, while the memory usage of v.in.ogr.exe, v.dissolve.exe and v.out.ogr.exe is monitored and their maximum is recorded.

pprepair: Files are read and put into the triangulation, the triangulation is labelled, and the repair is performed with the longest boundary first (see Table 1), with ambiguous cases resolved with a random choice. Polygons are then extracted from the triangulation and output to a single Shapefile. The entire process is timed and the maximum amount of memory used is recorded.

Notice that we are somewhat comparing apples with oranges here, since our implementation is able to repair more cases than other methods, keeps topological consistency, and does not require finding out the appropriate threshold value (if it exists). More importantly, it is able to directly state whether the result is a planar partition, unlike the three other solutions (with these, snapping is performed, but that does not guarantee that the output is a valid planar partition). The results of the experiments are shown in Table 3.

We have made the experiments in order to have an idea of the processing time and the memory usage involved. Only cases that are acceptably solved by snapping and splitting have been considered, since there is no other comparable automated topological repair tool among those studied that would also work for more complex cases, such as those that require a snapping threshold too high to be practical (e.g. horizontal conflation of independently generated datasets [Yuan and Tao, 1999; Davis, 2003]).


Table 3: Planar partition repair comparison using large datasets.

          pprepair           ArcGIS             FME                GRASS
          memory    time     memory    time     memory   time      memory    time
E41N27    145 MB    19s      145 MB    1m3s     158 MB   31s       59 MB     3m9s
4tiles    116 MB    17s      113 MB    37s      105 MB   31s       49 MB     53s
16tiles   1.45 GB   4m47s    crashes   –        636 MB   15m48s    crashes   –
Mexico    1.01 GB   3m31s    216 MB    >1d      264 MB   2m45s     408 MB    11m38s

All tests have been run 5 times on a machine just booted, and the results averaged to account for the small variations that occurred, except in the cases where the execution was cancelled after one full day due to time limitations or where it caused the program to crash. The hardware is a 2.66 GHz Core 2 Duo MacBook Pro with 4 GB of RAM. ArcGIS 9.3, FME 2010 SP1 and GRASS 6.4 were run in Windows 7, while pprepair was run in Mac OS X 10.7.1.

As Table 3 shows, our approach uses somewhat more memory than the other solutions. This is explained by the extra (unconstrained) edges that are added on top of the input edges when triangulating them. It should however be noticed that both ArcGIS and GRASS crashed with the biggest dataset, 16tiles. Our implementation is generally the fastest of the four tested, being for instance around three times faster than FME for the biggest dataset (4m47s against 15m48s for 16tiles). Only for the Mexico dataset is our implementation slower than FME. This is (probably) explained by the fact that its polygons already form a valid planar partition, and therefore very few snapping operations were performed by FME: the planar graph of the input was basically built and then the polygons saved back to disk. We believe that ArcGIS and GRASS struggled with this dataset because it contains several very large polygons (with more than 10 000 vertices). Our implementation took 3m31s, but as Table 4 shows, it took CGAL 3m02s to simply triangulate the input edges. This table also demonstrates the efficiency of our approach: for the four datasets, around 85% of the time of our approach was used (by the CGAL library) to read the input from disk and triangulate the input edges.

Table 4: Timed steps of the planar partition repair procedure (m:ss), rounded down to the nearest second. The percentage is the triangulation time over the total time.

                E41N27   4tiles   16tiles   Mexico
Triangulate     0:17     0:15     4:00      3:02
Label           0:01     0:01     0:27      0:16
Repair          0:00     0:00     0:00      0:00
Reconstruct     0:01     0:01     0:11      0:07
Output          0:00     0:00     0:10      0:06
Total           0:19     0:17     4:47      3:31
% triangulate   89%      88%      83%       86%
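As a quick check, the percentage row of Table 4 can be recomputed from its Triangulate and Total rows. The short Python sketch below assumes that the table entries are minutes and seconds (m:ss) and that percentages are truncated rather than rounded; the helper names are illustrative and are not part of pprepair.

```python
def to_seconds(mss: str) -> int:
    """Convert an 'm:ss' string to a number of seconds."""
    minutes, seconds = mss.split(":")
    return int(minutes) * 60 + int(seconds)


# (triangulation time, total time) per dataset, as read from Table 4.
TIMINGS = {
    "E41N27": ("0:17", "0:19"),
    "4tiles": ("0:15", "0:17"),
    "16tiles": ("4:00", "4:47"),
    "Mexico": ("3:02", "3:31"),
}


def triangulation_share(triangulate: str, total: str) -> int:
    """Share of the total time spent triangulating, as a truncated percentage."""
    return 100 * to_seconds(triangulate) // to_seconds(total)


shares = {name: triangulation_share(*times) for name, times in TIMINGS.items()}
print(shares)
```

With these inputs the truncated shares are 89, 88, 83 and 86, which matches the last row of the table and is consistent with the "around 85%" figure quoted in the text.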

During the implementation, we took several engineering decisions to optimise our program. One of them was to favour disk space over computation time, as we wanted to be able to process large datasets. This is why we always reconstruct polygons, even if they are not modified by the repair process. Although reusing the unmodified polygons is possible, it would require us to keep the original polygons in memory (which would significantly increase the memory consumption) and to keep track of which labels have been modified. And as Table 4 shows, the reconstruction is very efficient, as it only takes around 3–4% of the total time for the 16tiles and Mexico datasets.


5 Discussion and Conclusions

We have presented a new method for repairing polygonal area partitions, ensuring that the output partition is valid, i.e. all individual polygons conform to the ISO 19107 [ISO, 2003] and Simple Features [OGC, 2006] specifications, and no gaps or overlaps are present between any pair of polygons. The novelty of our method lies in the fact that repair is performed according to user-defined criteria, but then takes place without any human intervention. Automatic repair is becoming an increasingly important topic due to data integration: e.g. data collected for the different themes of the European INSPIRE Initiative will finally have to fit together, and data could eventually be matched fully automatically by dedicated web service components [INSPIRE, 2009]. Our approach could be at the heart of such a web service.

The proposed approach excels in automated repair at the cost of increased memory usage compared to a pure graph-based approach; this difference is mainly caused by the unconstrained edges introduced by the triangulation. While it would be possible to use our repair rules together with a (primal/dual) graph-based approach, these additional edges in the triangulation give fine-grained control over the repair operations and ensure that the graph is connected, which facilitates the reconstruction of polygons.

We have implemented our algorithm on top of a numerically robust triangulator (CGAL), and since repair operations are expressed solely in terms of re-labelling of triangles (no geometric computation is involved), the approach is also fully robust. Since most of the time during our experiments was used to compute the constrained triangulation, another library could also be tested to improve the implementation.

For the future, we plan to:

• Improve the scalability of the approach, and process and repair datasets with more than 10 million polygons. It is known that, using divide-and-conquer techniques, triangulation algorithms can handle big datasets [Amenta et al., 2003; Blandford et al., 2005]. We will investigate whether it is possible to automatically repair each divided part individually and ‘glue’ the repaired parts together.

• Investigate snap rounding [Hobby, 1999; de Berg et al., 2007] as a preprocessing step (or embedded directly in the triangulation) to guarantee that repaired planar partitions have no vertices that are closer than a certain ε threshold. Snap rounding may change the topology of the input, but the output will nevertheless be a valid partition (as the topology will be repaired). Apart from topological changes, snap rounding can also remove polygon parts that are too small to be preserved based on the chosen ε.

• Add more advanced repair operations to our repair toolkit, e.g. repair could take place based on splitting a collection of triangles.

• Extend our work to the third dimension, to validate and repair 3D city models using a constrained tetrahedralisation [Si, 2008]. Notice that the tetrahedralisation of a given polyhedron does not always exist, and thus extra (Steiner) points might need to be added. The main concepts of our approach, (re)labelling and reconstruction, extend naturally to 3D. However, appropriate repair operations for 3D city models would need to be defined, and some application-specific constraints (e.g. right angles at corners) are not trivial to implement in our current approach.

References

Nina Amenta, Sunghee Choi, and Günter Rote. Incremental constructions con BRIO. In Proceedings of the 19th Annual Symposium on Computational Geometry, pages 211–219. ACM Press, 2003.


Wael M. Badawy and Walid G. Aref. On local heuristics to speed up polygon-polygon intersection tests. In Proceedings of the 7th ACM International Symposium on Advances in Geographic Information Systems, pages 97–102. ACM, 1999.

M. Bader and R. Weibel. Detecting and resolving size and proximity conflicts in the generalization of polygonal maps. In Proceedings of the 18th International Cartographic Conference, Stockholm, pages 1525–1532, 1997.

Daniel K. Blandford, Guy E. Blelloch, David E. Cardoze, and Clemens Kadow. Compact representations of simplicial meshes in two and three dimensions. International Journal of Computational Geometry and Applications, 15(1):3–24, 2005.

CGAL. CGAL 3.8 User and Reference Manual. CGAL Editorial Board, 2011.

Kenneth L. Clarkson, Kurt Mehlhorn, and Raimund Seidel. Four results on randomized incremental constructions. In Alain Finkel and Matthias Jantzen, editors, Proceedings of the 9th Annual Symposium on Theoretical Aspects of Computer Science, volume 577 of Lecture Notes in Computer Science, pages 461–474. Springer Berlin/Heidelberg, 1992.

Martin Davis. Java conflation suite. Technical report, Vivid Solutions, 2003. Available at http://www.vividsolutions.com/jcs.

Mark de Berg, Dan Halperin, and Mark Overmars. An intersection-sensitive algorithm for snap rounding. Computational Geometry, 36(3):159–165, 2007. ISSN 0925-7721.

Mark de Berg, Marc van Kreveld, Mark Overmars, and Otfried Schwarzkopf. Computational Geometry: Algorithms and Applications. Springer-Verlag, 3rd edition, 2008.

ESRI. Shapefile technical description. White paper, ESRI, July 1998.

Michael A. Facello. Implementation of a randomized algorithm for Delaunay and regular triangulations in three dimensions. Computer Aided Geometric Design, 12(4):349–370, 1995.

GTS. GTS Library Reference Manual, 2006. URL http://gts.sourceforge.net/reference/book1.html.

Leonidas Guibas and Jorge Stolfi. Primitives for the manipulation of general subdivisions and the computation of Voronoi. ACM Transactions on Graphics, 4(2):74–123, 1985.

John D. Hobby. Practical segment intersection with finite precision output. Computational Geometry, 13(4):199–214, 1999. ISSN 0925-7721.

Christoph M. Hoffmann, John E. Hopcroft, and Michael S. Karasick. Towards implementing robust geometric computations. In Proceedings of the 4th Annual Symposium on Computational Geometry, pages 106–117. ACM, 1988.

INSPIRE. D2.5: Generic conceptual model, version 3.2. Technical report, Drafting Team “Data Specifications”, 2009. URL http://inspire.jrc.ec.europa.eu/documents/Data_Specifications/D2.5_v3.2.pdf.

ISO. 19107: Geographic information – spatial schema, March 2003.

David Kirkpatrick, Jack Snoeyink, and Bettina Speckmann. Kinetic collision detection for simple polygons. International Journal of Computational Geometry and Applications, 12(1–2):3–27, 2002.

Robert Laurini and Françoise Milleret-Raffort. Topological reorganization of inconsistent geographical databases: A step towards their certification. Computers & Graphics, 18(6):803–813, December 1994.

Hugo Ledoux and Martijn Meijers. Validation of planar partitions using constrained triangulations. In Proceedings Joint International Conference on Theory, Data Handling and Modelling in GeoSpatial Information Science, pages 51–55, Hong Kong, May 2010.


Hugo Ledoux, Ken Arroyo Ohori, and Martijn Meijers. Automatically repairing invalid polygons with a constrained triangulation. In Proceedings of the 15th AGILE International Conference on Geographic Information Science, 2012.

Yuanxin Liu and Jack Snoeyink. Faraway point: A sentinel point for Delaunay computation. International Journal of Computational Geometry and Applications, 18(4):343–355, November 2006.

Avraham Margalit and Gary D. Knott. An algorithm for computing the union, intersection or difference of two polygons. Computers & Graphics, 13(2):167–183, 1989.

Ernst P. Mücke, Isaac Saias, and Binhai Zhu. Fast randomized point location without preprocessing in two- and three-dimensional Delaunay triangulations. Computational Geometry: Theory and Applications, 12:63–83, 1999.

OGC. OpenGIS Implementation Specification for Geographic Information - Simple Feature Access - Part 1: Common Architecture. 1.2.0 edition, October 2006.

Lutz Plümer and Gerhard Gröger. Achieving integrity in geographic information systems: maps and nested maps. GeoInformatica, 1(4):345–367, 1997.

M. Rivero and F. R. Feito. Boolean operations on general planar polygons. Computers & Graphics, 24(6):881–896, December 2000.

Stefan Schirra. Precision and Robustness in Geometric Computations, volume Algorithmic Foundations of Geographic Information Systems of Lecture Notes in Computer Science, chapter 9, pages 255–287. Springer Berlin/Heidelberg, 1997.

Michael Ian Shamos and Dan Hoey. Geometric intersection problems. In FOCS, pages 208–215. IEEE, 1976.

Jonathan Richard Shewchuk. Delaunay Refinement Mesh Generation. PhD thesis, School of Computer Science, Carnegie Mellon University, Pittsburgh, USA, 1997a.

Jonathan Richard Shewchuk. Adaptive precision floating-point arithmetic and fast robust geometric predicates. Discrete & Computational Geometry, 18:305–363, 1997b.

Hang Si. Three Dimensional Boundary Conforming Delaunay Mesh Generation. PhD thesis, Technische Universität Berlin, 2008.

C. Dana Tomlin. Map algebra: One perspective. Landscape and Urban Planning, 30(1–2):3–12, October 1994.

Peter van Oosterom, Wilko Quak, and Theo Tijssen. About invalid, valid and clean polygons. In Peter F. Fisher, editor, Developments in Spatial Data Handling: 11th International Symposium on Spatial Data Handling, pages 1–16. Springer, 2004.

Jan W. van Roessel. A new approach to plane-sweep overlay: Topological structuring and line-segment classification. Cartography and Geographic Information Science, 18(1):49–67, January 1991.

Shuxin Yuan and Chuang Tao. Development of conflation components. Geoinformatics and Socioinformatics, pages 1–13, June 1999.


  • Introduction
  • Related Work
    • Simple Features and validity of simple polygons
    • Validation of a Planar Partition Using a Planar Graph
    • Repair Using Point and Edge Snapping and Splitting
    • Repair Using Topological Information
  • Validation and Automatic Repair Using a Constrained Triangulation
    • Triangulation of a polygon and constrained triangulation
    • Labelling the triangles of a Planar Partition
    • Validation
    • Repair Operations
    • Extraction of Polygons From a Triangulation
  • Implementation and Experiments
    • Implementation
    • Experiments with real-world planar partition datasets
  • Discussion and Conclusions

Figure 5: Planar partition repair in ArcGIS. (a) Viewing a topology error in ArcGIS. (b) Assigning an overlapping region to one of the polygons involved. The user is expected to zoom in to a particular error, analyse the situation (e.g. by looking at the properties of the surrounding polygons) and make a decision to assign the problematic region to a certain polygon. More than 11 000 errors were detected in this tile of the Corine dataset.

and by verifying the connectivity between triangles;

4. repairing the problems, which is done by re-labelling triangles to ensure that each triangle has exactly one label;

5. extracting the polygons from the triangulation (polygons modelled with the Simple Features specification).

As mentioned previously, for this workflow we assume that each input polygon is individually valid. We describe in the following sections the concepts needed, and we give a detailed description of the different steps.

3.1 Triangulation of a polygon and constrained triangulation

A triangulation subdivides an area into non-overlapping triangles. Using a constrained triangulation (CT), every line segment that defines the boundary of a polygon is ensured to appear as an edge in the triangulation. It is known that any polygon (also with holes) can be triangulated without adding extra vertices [de Berg et al., 2008; Shewchuk, 1997a]. Figure 6 shows an example.

In our approach, the triangulation is performed by incrementally constructing a CT of all the segments representing the boundaries (outer + inner) of each polygon. If the set of input polygons forms a planar partition, then each segment will be inserted twice (except those forming the outer boundary of the set of input polygons). This is usually not a problem for triangulation libraries, because they ignore points and segments at the same location (as is the case with the solution we use, see Section 4). When segments are found to intersect, they are split, with a new point created at the intersection. This is the only situation in which the generation of new points is required.

Figure 6: (a) A polygon with 4 holes. (b) The constrained triangulation of the segments of this polygon.

Notice that our approach requires only a constrained triangulation, and not one that fulfils the Delaunay criterion [Shewchuk, 1997a]. However, having well-shaped triangles is useful for repair purposes and does not significantly increase the processing time. Therefore, our implementation actually constructs and uses a constrained Delaunay triangulation.

3.2 Labelling the triangles of a Planar Partition

Labelling a triangle means assigning it the label(s) of the polygon(s) that it belongs to (if two input polygons overlap, each triangle in the overlapping region should have the labels of the two polygons).

In our previous work on validation [Ledoux and Meijers, 2010], we used the centroid of a polygon to start the labelling process, but this method is prone to errors (if, for instance, the calculated centroid is outside or on the boundary of the polygon) and does not permit us to differentiate gaps from overlaps.

To solve these problems, we store information about the constrained edges of the CT. Since it is known that the input rings are closed and have a known orientation (according to the ISO 19107 orientation rules), it is also known on which side of a certain line segment the interior of the polygon lies. This property is used for robust labelling of each polygon. Triangles adjacent to the outer ring of the polygons are labelled first, and this is later expanded to triangles further in the interior of the polygon, recursively labelling adjacent unlabelled faces as long as no constrained edges are crossed. After this operation, all triangles that are part of any polygon are labelled, with overlapping regions having multiply labelled triangles. However, holes are then indistinguishable from triangles outside the planar partition, since both have zero labels. Therefore, a special label is created for all the triangles outside the planar partition, referred to as the “universe” label, which are labelled by recursively labelling adjacent triangles from any triangle known to lie in the exterior of the planar partition. We exploit the concept of the “far-away point” to achieve this [Liu and Snoeyink, 2006], which is used by several implementations and is also known as the “big triangle” [Facello, 1995].
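The expansion step described above can be sketched as a flood fill over the dual of the triangulation. The Python fragment below is only an illustration under simplifying assumptions: triangles are plain integer ids, `neighbours` is a hypothetical adjacency map, a constrained edge is represented by the pair of triangles it separates, and the seed triangle of each polygon is given (in the method itself it follows from the ring orientation). An explicit queue replaces the recursion mentioned in the text.

```python
from collections import deque


def flood_label(neighbours, constrained, seed, label, labels):
    """Add `label` to every triangle reachable from `seed`
    without crossing a constrained edge."""
    queue = deque([seed])
    seen = {seed}
    while queue:
        t = queue.popleft()
        labels[t].add(label)
        for n in neighbours[t]:
            # A constrained edge between t and n stops the expansion.
            if n not in seen and frozenset((t, n)) not in constrained:
                seen.add(n)
                queue.append(n)


# Toy strip of five triangles: 0-1 | 2-3 | 4, where "|" is a constrained edge.
neighbours = {0: [1], 1: [0, 2], 2: [1, 3], 3: [2, 4], 4: [3]}
constrained = {frozenset((1, 2)), frozenset((3, 4))}
labels = {t: set() for t in neighbours}

flood_label(neighbours, constrained, 0, "red", labels)
flood_label(neighbours, constrained, 2, "blue", labels)
flood_label(neighbours, constrained, 4, "universe", labels)  # exterior seed
print(labels)
```

Because each polygon is expanded independently, a triangle covered by two polygons simply accumulates two labels, which is what later allows overlaps to be told apart from gaps.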

3.3 Validation

If the set of input polygons forms a planar partition, then all the triangles will be labelled with one and only one label. The problems are easily detected:

• Gaps are detected by finding triangles without any labels.

• Overlaps are detected by finding triangles with two or more labels.

• Disjoint regions are detected by identifying regions separated by the “universe” label. This is done by starting at a given triangle and doing a breadth-first search on the dual of the triangulation, without visiting the triangles labelled as “universe”. If all the triangles can be reached, then no polygons of the planar partition are disjoint.
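The three checks above translate almost directly into code. The following Python sketch assumes a hypothetical representation in which `labels` maps each triangle id to its set of labels and `neighbours` gives the dual adjacency; it is meant to illustrate the tests, not to reproduce the pprepair implementation.

```python
from collections import deque

UNIVERSE = "universe"


def validate(labels, neighbours):
    """Return (gap triangles, overlap triangles, whether disjoint regions exist)."""
    gaps = [t for t, own in labels.items() if not own]
    overlaps = [t for t, own in labels.items() if len(own) >= 2]

    # Breadth-first search on the dual, never entering "universe" triangles.
    inside = {t for t, own in labels.items() if UNIVERSE not in own}
    reached = set()
    if inside:
        start = min(inside)
        reached.add(start)
        queue = deque([start])
        while queue:
            t = queue.popleft()
            for n in neighbours[t]:
                if n in inside and n not in reached:
                    reached.add(n)
                    queue.append(n)
    return gaps, overlaps, reached != inside


# Toy chain of triangles: 1 is a gap, 2 an overlap, 3 separates 4 from the rest.
labels = {0: {"a"}, 1: set(), 2: {"a", "b"}, 3: {UNIVERSE}, 4: {"c"}}
neighbours = {0: [1], 1: [0, 2], 2: [1, 3], 3: [2, 4], 4: [3]}
gaps, overlaps, has_disjoint_regions = validate(labels, neighbours)
print(gaps, overlaps, has_disjoint_regions)
```

A dataset is a valid planar partition in this sketch when both lists are empty and the third value is false.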

3.4 Repair Operations

The greatest benefits of using a labelled triangulation for planar partition repair stem from the fact that, while repair operations are performed, the validity of the planar partition is always kept, together with the integrity of the data. Unlike with snapping, vertices are not required to be moved during the process, and repair is performed using local criteria instead of global ones (in snapping, the threshold is usually fixed for the whole dataset). This contrasts with other methods, where care needs to be taken to ensure that the (geometric or topological) validity is kept.

Figure 7 shows the standard steps required in a repair operation. In order to avoid order dependency when repairing, the repair operation itself is always performed after all the choices for which label to assign have been made.

In particular, we propose six different repair operations that can be used to fix gaps and overlaps. These are shown in Table 1, and all imply re-labelling triangles. Four of them use triangles as a base (i.e. the label assigned is based only on that of its three neighbouring triangles), which is fast and modifies the area of each input polygon the least. Despite their simplicity, they offer substantial control over the results. For instance, triangle by number of neighbours and triangle by absolute majority only differ from each other in their handling of overlapping faces, but the former is better for large overlapping regions, while the latter is better at fixing small problems. Having well-shaped (Delaunay) triangles is most useful for triangle-based repair functions. Two of them use regions of adjacent triangles with equivalent sets of labels, which is slower than a triangle-based method, but yields results that can be cartographically more pleasing. An interesting repair operation for practitioners is the one in which a priority of labels is used, i.e. in case of gaps/overlaps, the labels in the triangle (overlap) or in the adjacent polygons (gap) are ordered according to a user-defined priority, and the highest priority is assigned to the problematic triangles. Notice that these repair operations can be used one after the other (in a hierarchical manner): for instance, if first the repair according to the longest boundary is used, but one zone has two or more boundaries with exactly the same length, then the deadlock can be solved by choosing one randomly. A sample of the results obtained with different repair operators is shown in Figure 8.

Table 1: The repair operations currently implemented in our software. The types of operations are defined in the map algebra classification [Tomlin, 1994].

Repair operation                    Type             Criteria
Triangle by priority list           Varies           Label that has the highest priority according to a predefined priority list
Triangle by number of neighbours    Focal            The label present in the largest number of adjacent faces, overlaps included
Triangle by absolute majority       Focal            Label present in two or more valid adjacent faces
Triangle by longest boundary        Focal            Label present along the longest portion of the boundary of the adjacent faces
Regions by longest boundary         Focal of zonal   Label present along the longest portion of the boundary of the adjacent faces
Regions by random neighbour         Focal of zonal   Random label from the adjacent faces
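To make the decide-then-apply structure concrete, the sketch below implements a simplified variant of the triangle by number of neighbours operation in Python. Several details are assumptions made for illustration only: the data layout (`labels`, `neighbours`), the exclusion of the “universe” label from the vote, and the deterministic tie-break by label name (the text suggests a random choice instead).

```python
from collections import Counter

UNIVERSE = "universe"


def repair_by_number_of_neighbours(labels, neighbours):
    """Re-label every triangle with 0 or >= 2 labels using the label that
    occurs most often among its adjacent triangles (overlaps included)."""
    # Phase 1: take all decisions against the unmodified labelling,
    # so that the result does not depend on the processing order.
    decisions = {}
    for t, own in labels.items():
        if len(own) == 1:
            continue  # already valid
        votes = Counter(
            label
            for n in neighbours[t]
            for label in labels[n]
            if label != UNIVERSE
        )
        if votes:
            best = max(votes.values())
            # Assumed tie-break: alphabetical order keeps the sketch deterministic.
            decisions[t] = min(l for l, count in votes.items() if count == best)

    # Phase 2: apply the decisions.
    repaired = {t: set(own) for t, own in labels.items()}
    for t, label in decisions.items():
        repaired[t] = {label}
    return repaired


labels = {0: {"red"}, 1: set(), 2: {"red"}, 3: {"blue"}, 4: {"blue"}, 5: {"red", "blue"}}
neighbours = {0: [1], 1: [0, 2, 3], 2: [1], 3: [1, 4, 5], 4: [3, 5], 5: [3, 4]}
repaired = repair_by_number_of_neighbours(labels, neighbours)
print(repaired)
```

In this toy input, the gap triangle 1 becomes red (two red neighbours against one blue) and the overlap triangle 5 becomes blue. A triangle whose neighbours carry no usable label is left untouched and could be handled by a further pass or by another operation, in the hierarchical manner described above.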

More repair operations, based on extensions to the idea of labelling triangles/regions, can be further developed. For instance, triangles could be split to subdivide an area with problems (as in Bader and Weibel [1997]), or sliver triangles/regions could be discarded (and then filled during repair).

3.5 Extraction of Polygons From a Triangulation

Starting from a labelled triangulation, it is possible to reconstruct the individual polygons of a planar partition so that they are valid polygons according to the ISO 19107 and Simple Features specifications, which allows users to incorporate automatic validation and repair in their workflow.

We do this operation polygon by polygon. We start at an unprocessed triangle and visit all the connected triangles having the same label (to reconstruct the polygon), marking the triangles as processed as we visit them. Note that since all these triangles are connected, the outer and inner boundaries of a polygon are all simple (non self-intersecting). We repeat this operation until all the triangles have been processed.

Figure 7: The steps in a generic repair operation. (a) A set of overlapping polygons that needs to be repaired. (b) The polygons are triangulated and each triangle is labelled with the polygons it belongs to. (c) An invalid region is found, defined as a set of contiguous triangles with an equivalent set of 0 or ≥ 2 labels. (d) The neighbours of the region are analysed to see how it should be re-labelled, and the result is saved onto a list (here, triangles A, B, C and D are all to be labelled red). (e) The triangles are relabelled according to the list. (f) The output polygons, which now have no gaps or overlaps.

For each polygon, we have to recover not only its outer boundary but also its inner boundaries, which are not connected to it. Observe that we cannot simply follow the original constrained edges, as these do not have any meaning after a planar partition has been repaired: the boundaries of the repaired polygons are instead formed by edges incident to two triangles having different labels.
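The observation that boundaries are formed by edges between differently labelled triangles can be illustrated with a few lines of Python. The sketch assumes triangles are stored as vertex-id triples and that every triangle already carries exactly one label; edges on the hull of this toy input have a single incident triangle and are treated as boundary too, whereas in a triangulation with a far-away point they would border a “universe” triangle.

```python
from collections import defaultdict


def boundary_edges(triangles, label_of, target):
    """Edges that have a triangle labelled `target` on exactly one side."""
    edge_owners = defaultdict(list)
    for tid, (a, b, c) in triangles.items():
        for u, v in ((a, b), (b, c), (c, a)):
            edge_owners[frozenset((u, v))].append(tid)

    result = []
    for edge, owners in edge_owners.items():
        sides_with_target = sum(1 for t in owners if label_of[t] == target)
        if sides_with_target == 1:
            result.append(tuple(sorted(edge)))
    return sorted(result)


# Two red triangles sharing edge (0, 2), and one blue triangle across edge (1, 2).
triangles = {"t0": (0, 1, 2), "t1": (0, 2, 3), "t2": (1, 4, 2)}
label_of = {"t0": "red", "t1": "red", "t2": "blue"}
red_boundary = boundary_edges(triangles, label_of, "red")
print(red_boundary)
```

The shared edge (0, 2) has red on both sides and is therefore interior, while (1, 2) separates red from blue and is reported as part of the red boundary.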

For each polygon, we start at a triangle and move on to triangles having the same label. As the process goes, a single polyline that runs along all the boundaries of the polygon is generated. This involves a (clockwise) depth-first search that recurses until it reaches the boundary of the polygon, returning a long chain of edges in a procedure similar to following the boundary edge by edge. The procedure is shown step by step in Figure 9. The polyline created with this method has “bridges”, which allow us to keep all inner boundaries (holes) connected with the outer one, in a manner that keeps the interior connected as well. These help to preserve connectivity and the relations between different (outer and inner) boundaries, but are removed later in the process (to conform to the Simple Features specification). Also, its orientation conveys the information of whether a section of it is part of an inner or an outer boundary.

Figure 8: Different repair operations applied to the two polygons for the Arribes del Duero Natural Park in Spain (red/darker grey) and the International Douro Natural Park in Portugal (green/lighter grey). (a) The original polygons. (b) Repaired each triangle using the label adjacent along the longest boundary from the neighbouring triangles. (c) Repaired each region using a random label from the neighbouring triangles. (d) Repaired each region using the label adjacent along the longest boundary from the neighbouring triangles. Each of them can be considered best by a certain criterion, like preserving the area ratio between the two polygons (b), smoothness of the boundary (d), or a balance between the two (c).

Figure 9: The traversal order when navigating through triangles in a clockwise manner. Starting at A and the edge between A and B, the operations occur according to the encircled numbers. Black arrows denote when an unprocessed triangle is found, red arrows when it is not. Notice how the traversal is performed clockwise (shown in dark blue arrows) for both cases, despite starting from different sides.

This polyline is processed with a stack-based algorithm that generates separate closed rings for the outer boundary and each of the inner boundaries, collapsing the “bridges” that were generated. In order to do this, the polyline is cut at the places where more than two edges join, and these are joined in the correct order by keeping track of (yet) unclosed rings. When a new segment is being processed, it can be one of three options: one that completes a ring, one that is part of a bridge, or one that is not yet closed. Closed rings are stored and bridges are removed, while unclosed rings are saved in the stack until they can be popped to form a closed ring together with a new segment. This is shown in Figure 10.
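The idea of collapsing bridges with a stack can be shown on a small example. The Python sketch below is a simplified, vertex-based variant and not the segment-based procedure of pprepair: it assumes the polyline is given as a closed sequence of vertex names, closes a ring whenever a vertex already on the stack is met again, and discards degenerate two-vertex rings, which correspond to bridges traversed back and forth.

```python
def split_rings(polyline):
    """Split a closed polyline with bridges into simple rings.

    `polyline` is a vertex sequence whose first and last vertices coincide.
    Returns the rings as vertex lists (closing vertex not repeated)."""
    stack, rings = [], []
    for v in polyline:
        if v in stack:
            i = stack.index(v)
            ring = stack[i:]      # everything since v was last seen
            del stack[i + 1:]     # keep v itself on the stack
            if len(ring) >= 3:
                rings.append(ring)  # closed ring: store it
            # Otherwise it is a bridge walked in both directions: erase it.
        else:
            stack.append(v)       # still unclosed: push
    return rings


# Outer ring a-b-c-d with a bridge from b to the hole h1-h2-h3.
polyline = ["a", "b", "h1", "h2", "h3", "h1", "b", "c", "d", "a"]
rings = split_rings(polyline)
print(rings)
```

The hole is closed first, the bridge between b and h1 is then erased, and the outer ring is stored last. The linear search in the stack keeps the sketch short; an index from vertex to stack position would avoid it on large polygons.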

4 Implementation and Experiments

4.1 Implementation

An implementation of the algorithms described in Section 3 was written in the C++ programming language, using external libraries for some functionality. The developed software is called pprepair, and is open source and freely available at http://tudelft3d.github.com/pprepair. C++ was selected in order to have plenty of control over low-level details and to achieve good performance, which makes it possible to compare it with existing solutions. The libraries used are the OGR Simple Features Library⁹ (which allows input and output from a large variety of data formats common in GIS) and CGAL¹⁰ (which has support for many robust spatial data structures and the operations based on them; we use its constrained triangulation module).

4.2 Experiments with real-world planar partition datasets

⁹ http://www.gdal.org/ogr
¹⁰ Computational Geometry Algorithms Library: http://www.cgal.org

Figure 10: Processing the polyline shown, starting from vertex a and moving towards polyline A, the following operations occur: (1) A: unclosed → push; (2) B: unclosed → push; (3) C: unclosed → push; (4) D: unclosed → push; (5) E: unclosed → push; (6) F: closed → store; (7) pop → EG: closed → store; (8) pop → DH: degenerate → erase; (9) pop → CI: closed → store; (10) pop → BJ: degenerate → erase; (11) pop → AK: closed → store.

Figure 11: Overview of the four datasets: (a) E41N27, (b) 4tiles, (c) 16tiles, (d) Mexico.

We have experimented with four freely available real-world datasets, i.e. we have validated and automatically repaired them with our implementation. An overview of these datasets is shown in Figure 11 and their properties in Table 2. The datasets are the following:

E41N27: Corine 2000¹¹ tile E41N27, which contains a shifted polygon (by about 10 cm), creating many small gaps and overlaps in the dataset. The snapping threshold has been set at 1 m.

4tiles: Corine 2000 tiles E39N32, E39N33, E40N32 and E40N33, which are known to have long and thin overlapping regions (< 1 mm) with each other. The snapping threshold has been set at 1 cm.

16tiles: 16 adjacent Corine 2000 tiles: E39N30, E39N31, E39N32, E39N33, E40N30, E40N31, E40N32, E40N33, E41N30, E41N31, E41N32, E41N33, E42N30, E42N31, E42N32 and E42N33. Some have gaps between one another, some overlap, but they match within a few centimetres. The snapping threshold has been set at 10 cm.

Mexico: 1:1 000 000 scale land cover dataset from INEGI, consisting of over 26 000 polygons. Interestingly, it is mostly already valid (according to the Shapefile specification), but contains some very large polygons with tens of thousands of vertices.

¹¹ Corine is a land cover dataset for 32 European countries. It is freely available at http://www.eea.europa.eu/data-and-maps/data/corine-land-cover-2000-clc2000-seamless-vector-database

Table 2: Properties of the datasets used for the experiments.

           polygons   points      points in largest polygon   avg. points per polygon
E41N27     14 969     496 303     26 740                      33.7
4tiles     4 984      365 702     16 961                      74.7
16tiles    63 868     6 622 133   95 112                      103.7
Mexico     26 866     4 181 354   117 736                     155.6

As a comparison, we have also tried to perform the same operations with other available software. While the capabilities for planar partition repair among the software tested vary considerably, with full topological repair only available in ArcGIS (using manual operations only), it is also important to consider how different repair implementations scale to large datasets. For this, a few performance tests were made with our implementation and three planar partition repair tools that perform this process using snapping and splitting. The testing methodology for each tool is as follows:

ArcGIS: In ArcCatalog, a multiple feature dataset is created in a Geodatabase, set with XY resolution values equal to the snapping threshold. The features are imported into it, and the merge and dissolve operations are used to merge adjacent polygons with the same ID. Topology is then generated to check that the planar partition is valid. Everything is finally exported to a single Shapefile. The individual parts of the process are timed and the total is recorded. Memory usage is calculated as the difference between the just-loaded ArcCatalog application and its maximum memory usage throughout the process.

FME: A reader is created for each input file, which serves as input to a Snapper transformer; features with the same IDs are then dissolved, and finally they are output into a new Shapefile writer. The topology generator is used to be able to tell whether the result is a valid planar partition. Results are timed and the maximum memory allocation of the fme.exe process is recorded.

GRASS: Input files are imported with v.in.ogr, with all polygon cleaning operations performed and snapping set to the correct values. Boundaries between features with the same IDs are then dissolved using v.dissolve. Files are then exported with v.out.ogr. Times reported by GRASS are added together to give the total, while the memory usage of v.in.ogr.exe, v.dissolve.exe and v.out.ogr.exe is monitored and the maximum is recorded.

pprepair: Files are read and put into the triangulation, the triangulation is labelled, and the repair is performed with the longest boundary first (see Table 1), with ambiguous cases resolved with a random choice. Polygons are then extracted from the triangulation and output to a single Shapefile. The entire process is timed and the maximum amount of memory used is recorded.

Notice that we are somewhat comparing apples with oranges here, since our implementation is able to repair more cases than other methods, keeps topological consistency, and does not require finding out the appropriate threshold value (if it exists). More importantly, it is able to directly state whether the result is a planar partition, unlike the three other solutions (with these, snapping is performed, but that does not guarantee that the output is a valid planar partition). The results of the experiments are shown in Table 3.

We have made the experiments in orderto have an idea of the processing timeand the memory usage involved Onlycases that are acceptably solved by snappingand splitting have been considered sincethere is no other comparable automatedtopological repair tool among those stud-ied that would also work for more complexcases such as those that require a snappingthreshold too high to be practical (eg hor-izontal conflation of independently gener-ated datasets [Yuan and Tao 1999 Davis2003])

13

Table 3 Planar partition repair comparison using large datasets

pprepair ArcGIS FME GRASS

memory time memory time memory time memory time

E41N27 145 MB 19s 145 MB 1m3s 158 MB 31s 59 MB 3m9s4tiles 116 MB 17s 113 MB 37s 105 MB 31s 49 MB 53s16tiles 145 GB 4m47s crashes ndash 636 MB 15m48s crashes ndashMexico 101 GB 3m31s 216 MB gt1d 264 MB 2m45s 408 MB 11m38s

All tests have been run 5 times in a ma-chine just booted and the results averagedto account for the small variations that oc-curred except in the case where the execu-tion was cancelled after one full day due totime limitations or when it causes the pro-gram to crash The hardware is a 266 GHzCore 2DuoMacBookProwith 4GBofRAMArcGIS 93 FME 2010 SP1 and GRASS 64were run in Windows 7 while pprepair wasrun in Mac OS X 1071

As Table 3 shows, our approach uses somewhat more memory than other solutions. This is explained by the extra (unconstrained) edges that are added to the input edges when triangulating them. It should however be noticed that both ArcGIS and GRASS crashed with the biggest dataset, 16tiles. Our implementation is the fastest of the four tested, being for instance around three times faster than FME for the biggest datasets. Only for the Mexico dataset is our implementation slower than FME. This is (probably) explained by the fact that its polygons already form a valid planar partition, and therefore very few snapping operations have been performed by FME: the planar graph of the input was basically built and then the polygons saved back to disk. We believe that ArcGIS and GRASS struggled with the dataset because it contains several very large polygons (with more than 10 000 vertices). Our implementation took 3m31s, but as Table 4 shows, it took CGAL 3m02s to simply triangulate the input edges. This table also demonstrates the efficiency of our approach: for the four datasets, around 85% of the time of our approach was used (by the CGAL library) to read the input from disk and triangulate the input edges.

Table 4: Timed steps of the planar partition repair procedure, rounded down to the nearest second. The percentage is the triangulation time over the total time.

               E41N27   4tiles   16tiles   Mexico
Triangulate    0:17     0:15     4:00      3:02
Label          0:01     0:01     0:27      0:16
Repair         0:00     0:00     0:00      0:00
Reconstruct    0:01     0:01     0:11      0:07
Output         0:00     0:00     0:10      0:06
Total          0:19     0:17     4:47      3:31
% triangulate  89%      88%      83%       86%
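The percentage row of Table 4 can be recomputed from the rounded step timings. The short Python sketch below assumes that the timings are read as minutes:seconds and that the percentages are truncated rather than rounded; both are a reading of the table, not something stated in the text, and the helper names are hypothetical.

```python
def to_seconds(stamp):
    """Convert an 'm:ss' string such as '4:47' to a number of seconds."""
    minutes, seconds = stamp.split(":")
    return int(minutes) * 60 + int(seconds)


# (triangulation time, total time) per dataset, as read from Table 4.
timings = {
    "E41N27": ("0:17", "0:19"),
    "4tiles": ("0:15", "0:17"),
    "16tiles": ("4:00", "4:47"),
    "Mexico": ("3:02", "3:31"),
}


def triangulation_share(timings):
    """Truncated percentage of the total time spent on triangulation."""
    return {
        name: 100 * to_seconds(triangulate) // to_seconds(total)
        for name, (triangulate, total) in timings.items()
    }


shares = triangulation_share(timings)
```

With these inputs the shares come out at 89, 88, 83 and 86, which matches the last row of the table.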

During the implementation, we took several engineering decisions to optimise our program. One of them was to favour disk space over computation time, as we wanted to be able to process large datasets. This is why we always reconstruct polygons, even if they are not modified by the repair process. Although reusing the unmodified polygons is possible, it would require us to keep in memory the original polygons (which would significantly increase the memory consumption) and to keep track of which labels have been modified. And as Table 4 shows, the reconstruction is very efficient, as it only takes around 3–4% of the total time for the 16tiles and Mexico datasets.


5 Discussion and Conclusions

We have presented a new method for repairing polygonal area partitions, ensuring that the output partition is valid, i.e. all individual polygons conform to the ISO 19107 [ISO, 2003] and Simple Features [OGC, 2006] specifications, and no gaps or overlaps are present between any pair of polygons. The novelty of our method lies in the fact that repair is performed according to user-defined criteria, but then takes place without any human intervention. Automatic repair is becoming an increasingly important topic due to data integration, e.g. data collected for the different themes of the European INSPIRE Initiative will finally have to fit together, and data could eventually be matched fully automatically by dedicated web service components [INSPIRE, 2009]. Our approach could be at the heart of such a web service.

The proposed approach excels in automated repair at the cost of increased memory usage compared to a pure graph-based approach; this difference is mainly caused by the unconstrained edges introduced by the triangulation. While it would be possible to use our repair rules together with a (primal/dual) graph-based approach, these additional edges in the triangulation give fine-grained control over the repair operations and ensure that the graph is connected, which facilitates the reconstruction of polygons.

We have implemented our algorithm over a numerically robust triangulator (CGAL), and since repair operations are expressed solely in terms of re-labelling of triangles (no geometric computation is involved), the approach is also fully robust. Since during our experiments most of the time was used to compute the constrained triangulation, another library could also be tested to improve the implementation.

For the future, we plan to:

• Improve the scalability of the approach, and process and repair datasets with more than 10 million polygons. It is known that, using divide-and-conquer techniques, triangulation algorithms can handle big datasets [Amenta et al., 2003; Blandford et al., 2005]. We will investigate whether it is possible to automatically repair each divided part individually and ‘glue’ the repaired parts together.

• Investigate snap rounding [Hobby, 1999; de Berg et al., 2007] as a preprocessing step (or embedded directly in the triangulation) to guarantee that repaired planar partitions have no vertices that are closer than a certain ε threshold. Snap rounding may change the topology of the input, but the output will nevertheless be a valid partition (as the topology will be repaired). Apart from topological changes, snap rounding can also lead to removed polygon parts that are too small to be preserved based on the chosen ε.

• Add more advanced repair operations to our repair toolkit, e.g. repair could take place based on splitting a collection of triangles.

• Extend our work to include the third dimension, to validate and repair 3D city models using a constrained tetrahedralisation [Si, 2008]. Notice that the tetrahedralisation of a given polyhedron does not always exist, and thus extra (Steiner) points might need to be added. The main concepts of our approach, (re)labelling and reconstruction, extend naturally to 3D. However, appropriate repair operations for 3D city models would need to be defined, and some application-specific constraints (e.g. right angles at corners) are not trivial to implement in our current approach.

References

Nina Amenta, Sunghee Choi, and Günter Rote. Incremental constructions con BRIO. In Proceedings of the 19th Annual Symposium on Computational Geometry, pages 211–219. ACM Press, 2003.


Wael M. Badawy and Walid G. Aref. On local heuristics to speed up polygon-polygon intersection tests. In Proceedings of the 7th ACM International Symposium on Advances in Geographic Information Systems, pages 97–102. ACM, 1999.

M. Bader and R. Weibel. Detecting and resolving size and proximity conflicts in the generalization of polygonal maps. In Proceedings of the 18th International Cartographic Conference, Stockholm, pages 1525–1532, 1997.

Daniel K. Blandford, Guy E. Blelloch, David E. Cardoze, and Clemens Kadow. Compact representations of simplicial meshes in two and three dimensions. International Journal of Computational Geometry and Applications, 15(1):3–24, 2005.

CGAL. CGAL 3.8 User and Reference Manual. CGAL Editorial Board, 2011.

Kenneth L. Clarkson, Kurt Mehlhorn, and Raimund Seidel. Four results on randomized incremental constructions. In Alain Finkel and Matthias Jantzen, editors, Proceedings of the 9th Annual Symposium on Theoretical Aspects of Computer Science, volume 577 of Lecture Notes in Computer Science, pages 461–474. Springer Berlin/Heidelberg, 1992.

Martin Davis. Java conflation suite. Technical report, Vivid Solutions, 2003. Available at http://www.vividsolutions.com/jcs.

Mark de Berg, Dan Halperin, and Mark Overmars. An intersection-sensitive algorithm for snap rounding. Computational Geometry, 36(3):159–165, 2007. ISSN 0925-7721.

Mark de Berg, Marc van Kreveld, Mark Overmars, and Otfried Schwarzkopf. Computational Geometry: Algorithms and Applications. Springer-Verlag, 3rd edition, 2008.

ESRI. Shapefile technical description. White paper, ESRI, July 1998.

Michael A. Facello. Implementation of a randomized algorithm for Delaunay and regular triangulations in three dimensions. Computer Aided Geometric Design, 12(4):349–370, 1995.

GTS. GTS Library Reference Manual, 2006. URL http://gts.sourceforge.net/reference/book1.html.

Leonidas Guibas and Jorge Stolfi. Primitives for the manipulation of general subdivisions and the computation of Voronoi. ACM Transactions on Graphics, 4(2):74–123, 1985.

John D. Hobby. Practical segment intersection with finite precision output. Computational Geometry, 13(4):199–214, 1999. ISSN 0925-7721.

Christoph M. Hoffmann, John E. Hopcroft, and Michael S. Karasick. Towards implementing robust geometric computations. In Proceedings of the 4th Annual Symposium on Computational Geometry, pages 106–117. ACM, 1988.

INSPIRE. D2.5: Generic conceptual model, version 3.2. Technical report, Drafting Team “Data Specifications”, 2009. URL http://inspire.jrc.ec.europa.eu/documents/Data_Specifications/D2.5_v3.2.pdf.

ISO. 19107 Geographic information – spatial schema. March 2003.

David Kirkpatrick, Jack Snoeyink, and Bettina Speckmann. Kinetic collision detection for simple polygons. International Journal of Computational Geometry and Applications, 12(1–2):3–27, 2002.

Robert Laurini and Françoise Milleret-Raffort. Topological reorganization of inconsistent geographical databases: A step towards their certification. Computers & Graphics, 18(6):803–813, December 1994.

Hugo Ledoux and Martijn Meijers. Validation of planar partitions using constrained triangulations. In Proceedings Joint International Conference on Theory, Data Handling and Modelling in GeoSpatial Information Science, pages 51–55, Hong Kong, May 2010.


Hugo Ledoux, Ken Arroyo Ohori, and Martijn Meijers. Automatically repairing invalid polygons with a constrained triangulation. In Proceedings of the 15th AGILE International Conference on Geographic Information Science, 2012.

Yuanxin Liu and Jack Snoeyink. Faraway point: A sentinel point for Delaunay computation. International Journal of Computational Geometry and Applications, 18(4):343–355, November 2006.

Avraham Margalit and Gary D. Knott. An algorithm for computing the union, intersection or difference of two polygons. Computers & Graphics, 13(2):167–183, 1989.

Ernst P. Mücke, Isaac Saias, and Binhai Zhu. Fast randomized point location without preprocessing in two- and three-dimensional Delaunay triangulations. Computational Geometry: Theory and Applications, 12:63–83, 1999.

OGC. OpenGIS Implementation Specification for Geographic Information - Simple Feature Access - Part 1: Common Architecture. 1.2.0 edition, October 2006.

Lutz Plümer and Gerhard Gröger. Achieving integrity in geographic information systems: maps and nested maps. GeoInformatica, 1(4):345–367, 1997.

M. Rivero and F.R. Feito. Boolean operations on general planar polygons. Computers & Graphics, 24(6):881–896, December 2000.

Stefan Schirra. Precision and Robustness in Geometric Computations, volume Algorithmic Foundations of Geographic Information Systems of Lecture Notes in Computer Science, chapter 9, pages 255–287. Springer Berlin/Heidelberg, 1997.

Michael Ian Shamos and Dan Hoey. Geometric intersection problems. In FOCS, pages 208–215. IEEE, 1976.

Jonathan Richard Shewchuk. Delaunay Refinement Mesh Generation. PhD thesis, School of Computer Science, Carnegie Mellon University, Pittsburgh, USA, 1997a.

Jonathan Richard Shewchuk. Adaptive precision floating-point arithmetic and fast robust geometric predicates. Discrete & Computational Geometry, 18:305–363, 1997b.

Hang Si. Three Dimensional Boundary Conforming Delaunay Mesh Generation. PhD thesis, Technische Universität Berlin, 2008.

C. Dana Tomlin. Map algebra: One perspective. Landscape and Urban Planning, 30(1–2):3–12, October 1994.

Peter van Oosterom, Wilko Quak, and Theo Tijssen. About invalid, valid and clean polygons. In Peter F. Fisher, editor, Developments in Spatial Data Handling: 11th International Symposium on Spatial Data Handling, pages 1–16. Springer, 2004.

Jan W. van Roessel. A new approach to plane-sweep overlay: Topological structuring and line-segment classification. Cartography and Geographic Information Science, 18(1):49–67, January 1991.

Shuxin Yuan and Chuang Tao. Development of conflation components. Geoinformatics and Socioinformatics, pages 1–13, June 1999.


  • Introduction
  • Related Work
    • Simple Features and validity of simple polygons
    • Validation of a Planar Partition Using a Planar Graph
    • Repair Using Point and Edge Snapping and Splitting
    • Repair Using Topological Information
  • Validation and Automatic Repair Using a Constrained Triangulation
    • Triangulation of a polygon and constrained triangulation
    • Labelling the triangles of a Planar Partition
    • Validation
    • Repair Operations
    • Extraction of Polygons From a Triangulation
  • Implementation and Experiments
    • Implementation
    • Experiments with real-world planar partition datasets
  • Discussion and Conclusions

1997a]. However, having well-shaped triangles is useful for repair purposes, and does not significantly increase the processing time. Therefore, our implementation actually constructs and uses a constrained Delaunay triangulation.

3.2 Labelling the triangles of a Planar Partition

Labelling a triangle means assigning it label(s) for the polygon(s) that it belongs to (if two input polygons overlap, each triangle in the overlapping region should have the labels of the two polygons).

In our previous work on validation [Ledoux and Meijers, 2010], we used the centroid of a polygon to start the labelling process, but this method is prone to errors (if, for instance, the calculated centroid is outside or on the boundary of the polygon) and does not permit us to differentiate gaps from overlaps.

To solve these problems, we store information about the constrained edges of the CT. Since it is known that the input rings are closed and have a known orientation (according to the ISO 19107 orientation rules), it is also known on which side of a certain line segment the interior of the polygon lies. This property is used for robust labelling of each polygon. Triangles adjacent to the outer ring of the polygons are labelled first, and this is later expanded to triangles further in the interior of the polygon, recursively labelling adjacent unlabelled faces as long as no constrained edges are crossed. After this operation, all triangles that are part of any polygon are labelled, with overlapping regions having multiply labelled triangles. However, holes are then indistinguishable from triangles outside the planar partition, since both have zero labels. Therefore, a special label is created for all the triangles outside the planar partition, referred to as the “universe” label, which are labelled by recursively labelling adjacent triangles from any triangle known to lie in the exterior of the planar partition. We exploit the concept of the “far-away point” to achieve this [Liu and Snoeyink, 2006], which is used by several implementations and is also known as the “big triangle” [Facello, 1995].
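As an illustration of the expansion step, the following Python sketch floods a polygon label from seed triangles without crossing constrained edges. It is a minimal sketch and not the pprepair code: the adjacency dictionary, the seed list (triangles assumed to be already known, from the ring orientation, to lie just inside a polygon boundary) and the function name are all assumptions made for the example, and the “universe” label is left out.

```python
from collections import deque


def label_triangles(adjacency, seeds):
    """Flood polygon labels over a triangulation.

    adjacency: {triangle: [(neighbour, crosses_constrained_edge), ...]}
    seeds: [(triangle, polygon_label), ...], one entry per triangle known
           to touch the inner side of that polygon's ring.
    Returns {triangle: set of polygon labels}.
    """
    labels = {tri: set() for tri in adjacency}
    for seed, polygon in seeds:
        queue = deque([seed])
        visited = {seed}
        while queue:
            tri = queue.popleft()
            labels[tri].add(polygon)
            for neighbour, constrained in adjacency[tri]:
                # Never expand across a constrained (input boundary) edge.
                if not constrained and neighbour not in visited:
                    visited.add(neighbour)
                    queue.append(neighbour)
    return labels


# A strip of five triangles: red covers t4, t0, t1 and blue covers t1, t2,
# so t1 is an overlap; t3 belongs to no polygon.
adjacency = {
    "t4": [("t0", False)],
    "t0": [("t4", False), ("t1", True)],
    "t1": [("t0", True), ("t2", True)],
    "t2": [("t1", True), ("t3", True)],
    "t3": [("t2", True)],
}
seeds = [("t4", "red"), ("t1", "red"), ("t1", "blue"), ("t2", "blue")]
labels = label_triangles(adjacency, seeds)
```

In this toy input, t0 receives its label only through the expansion from t4, the overlap triangle t1 ends up with two labels, and t3 stays unlabelled.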

3.3 Validation

If the set of input polygons forms a planar partition, then all the triangles will be labelled with one and only one label. The problems are easily detected:

• Gaps are detected by finding triangles without any labels.

• Overlaps are detected by finding triangles with two or more labels.

• Disjoint regions are detected by identifying regions separated by the “universe” label. This is done by starting at a given triangle and doing a breadth-first search on the dual of the triangulation, without visiting the triangles labelled as “universe”. If all the triangles can be reached, then no polygons of the planar partition are disjoint.
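The three checks above can be sketched in a few lines of Python. This is an illustrative sketch under assumed data structures (a neighbour dictionary standing in for the dual graph, a label set per triangle, and a hypothetical `UNIVERSE` constant), not the actual implementation.

```python
from collections import deque

UNIVERSE = "universe"  # hypothetical name for the special exterior label


def validate(adjacency, labels):
    """Return (gap triangles, overlap triangles, is_connected).

    adjacency: {triangle: [neighbouring triangles]} (dual of the triangulation)
    labels: {triangle: set of labels}
    """
    gaps = sorted(t for t, found in labels.items() if not found)
    overlaps = sorted(t for t, found in labels.items() if len(found) >= 2)

    # Breadth-first search on the dual, never entering "universe" triangles.
    inside = [t for t in adjacency if UNIVERSE not in labels[t]]
    reached = set()
    if inside:
        reached.add(inside[0])
        queue = deque([inside[0]])
        while queue:
            tri = queue.popleft()
            for neighbour in adjacency[tri]:
                if neighbour not in reached and UNIVERSE not in labels[neighbour]:
                    reached.add(neighbour)
                    queue.append(neighbour)
    return gaps, overlaps, len(reached) == len(inside)


adjacency = {0: [1], 1: [0, 2], 2: [1, 3], 3: [2, 4], 4: [3]}
broken = {0: {"a"}, 1: set(), 2: {"a", "b"}, 3: {UNIVERSE}, 4: {"b"}}
valid = {0: {"a"}, 1: {"a"}, 2: {"b"}, 3: {"b"}, 4: {"b"}}

broken_report = validate(adjacency, broken)
valid_report = validate(adjacency, valid)
```

In the `broken` labelling, triangle 1 is a gap, triangle 2 is an overlap, and triangle 4 is cut off from the rest by the universe triangle 3, so the partition is reported as disjoint.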

3.4 Repair Operations

The greatest benefits of using a labelled triangulation for planar partition repair stem from the fact that, while repair operations are performed, the validity of the planar partition is always kept, together with the integrity of the data. Unlike with snapping, vertices are not required to be moved during the process, and repair is performed using local criteria instead of global ones (in snapping, the threshold is usually fixed for the whole dataset). This comes as a contrast to other methods, where care needs to be taken to ensure that the (geometric or topological) validity is kept.

Figure 7 shows the standard steps required in a repair operation. In order to avoid order dependency when repairing, the repair operation itself is always performed after all the choices for which label to assign have been made.
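The decide-first, apply-later structure can be sketched as follows. The Python code is a simplified illustration, not the pprepair implementation: the data structures are assumed, and the rule shown (the label found most often among the labels of the adjacent triangles, with ties left undecided) is only one possible reading of a neighbour-counting criterion.

```python
from collections import Counter


def by_number_of_neighbours(tri, adjacency, labels):
    """Most frequent label among the neighbours, or None on a tie."""
    counts = Counter(
        label for neighbour in adjacency[tri] for label in labels[neighbour]
    )
    ranked = counts.most_common()
    if not ranked:
        return None
    if len(ranked) > 1 and ranked[0][1] == ranked[1][1]:
        return None  # ambiguous, leave it for another rule
    return ranked[0][0]


def repair(adjacency, labels, rule):
    """Two-phase repair: collect all decisions, then relabel."""
    invalid = [t for t, found in labels.items() if len(found) != 1]

    # Phase 1: every decision is taken on the untouched input labelling,
    # so the result does not depend on the order of the triangles.
    decisions = {}
    for tri in invalid:
        choice = rule(tri, adjacency, labels)
        if choice is not None:
            decisions[tri] = choice

    # Phase 2: apply the saved list of decisions to a copy.
    repaired = {t: set(found) for t, found in labels.items()}
    for tri, choice in decisions.items():
        repaired[tri] = {choice}
    return repaired


adjacency = {0: [1, 2, 3], 1: [0], 2: [0], 3: [0, 4], 4: [3, 5], 5: [4]}
labels = {
    0: set(),              # gap
    1: {"red"},
    2: {"red"},
    3: {"blue"},
    4: {"red", "blue"},    # overlap
    5: {"blue"},
}
repaired = repair(adjacency, labels, by_number_of_neighbours)
```

Because phase 1 only reads the original labels, relabelling one problematic triangle can never influence the decision taken for another one.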

In particular, we propose six different repair operations that can be used to fix gaps and overlaps. These are shown in Table 1 and all imply re-labelling triangles. Four of them use triangles as a base (i.e. the label assigned is based only on that of its three neighbouring triangles), which is fast and modifies the area of each input polygon the least. Despite their simplicity, they offer substantial control over the results. For instance, triangle by number of neighbours and triangle by absolute majority only differ from each other in their handling of overlapping faces, but triangle by number of neighbours is better for large overlapping regions, while triangle by absolute majority is better in fixing small problems. Having well-shaped (Delaunay) triangles is most useful for triangle-based repair functions. The other two operations use regions of adjacent triangles with equivalent sets of labels, which is slower than a triangle-based method but yields results that can be cartographically more pleasing. An interesting repair operation for practitioners is the one in which a priority of labels is used, i.e. in case of gaps/overlaps, the labels in the triangle (overlap) or in the adjacent polygons (gap) are ordered according to a user-defined priority, and the highest priority is assigned to the problematic triangles. Notice that these repair operations can be used one after the other (in a hierarchical manner); for instance, if first the repair according to the longest boundary is used but one zone has two or more boundaries with exactly the same length, then the deadlock can be solved by choosing one randomly. A sample of the results obtained with different repair operators is also shown in Figure 8.

Table 1: The repair operations currently implemented in our software. The types of operations are defined in the map algebra classification [Tomlin, 1994].

Repair operation                  Type            Criteria
Triangle by priority list         Varies          Label that has the highest priority according to a predefined priority list
Triangle by number of neighbours  Focal           The label present in the largest number of adjacent faces, overlaps included
Triangle by absolute majority     Focal           Label present in two or more valid adjacent faces
Triangle by longest boundary      Focal           Label present along the longest portion of the boundary of the adjacent faces
Regions by longest boundary       Focal of zonal  Label present along the longest portion of the boundary of the adjacent faces
Regions by random neighbour       Focal of zonal  Random label from the adjacent faces
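The hierarchical use of repair operations can be illustrated with rules that return no decision when they are undecided, so that the next rule in the chain takes over. The Python sketch below is an assumption-laden illustration: the rule signatures, the `chain` helper and the label names (water, forest, urban) are invented for the example, and a deterministic priority list is used as the fallback instead of a random choice.

```python
from collections import Counter


def by_absolute_majority(own_labels, neighbour_labels):
    """Label carried by two or more valid (singly labelled) neighbours."""
    valid = [next(iter(found)) for found in neighbour_labels if len(found) == 1]
    winners = [label for label, count in Counter(valid).items() if count >= 2]
    return winners[0] if len(winners) == 1 else None


def by_priority_list(priority):
    """Build a rule that picks the candidate ranked first in `priority`."""
    rank = {label: position for position, label in enumerate(priority)}

    def rule(own_labels, neighbour_labels):
        # Overlap: choose among the triangle's own labels.
        # Gap: choose among the labels of the adjacent triangles.
        candidates = set(own_labels) or set().union(*neighbour_labels)
        ranked = [label for label in candidates if label in rank]
        return min(ranked, key=rank.get) if ranked else None

    return rule


def chain(*rules):
    """Try each rule in turn and keep the first decision that is made."""
    def rule(own_labels, neighbour_labels):
        for candidate_rule in rules:
            choice = candidate_rule(own_labels, neighbour_labels)
            if choice is not None:
                return choice
        return None

    return rule


rule = chain(by_absolute_majority, by_priority_list(["water", "forest", "urban"]))

gap_with_majority = rule(set(), [{"forest"}, {"forest"}, {"water"}])
gap_without_majority = rule(set(), [{"forest"}, {"urban"}, {"water"}])
overlap = rule({"forest", "urban"}, [{"water"}, {"forest"}, {"urban"}])
```

The first call is settled by the majority rule, while the other two fall through to the priority list; for the overlap, only the two labels already on the triangle are considered.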

More repair operations, based on extensions to the idea of labelling triangles/regions, can be further developed. For instance, triangles could be split to subdivide an area with problems (as in Bader and Weibel [1997]), or sliver triangles/regions could be discarded (and then filled during repair).

3.5 Extraction of Polygons From a Triangulation

Starting from a labelled triangulation, it is possible to reconstruct the individual polygons of a planar partition so that they conform to valid polygons according to the ISO 19107 and the Simple Features specifications, which allows users to incorporate automatic validation and repair in their workflow.

We do this operation polygon by polygon. We start at an unprocessed triangle and visit all the connected triangles having the same label (to reconstruct the polygon), marking the triangles as processed as we visit them. Note that since all these triangles are connected, the outer and inner boundaries of a polygon are all simple (non self-intersecting). We repeat this operation until all the triangles have been processed.

[Figure 7: The steps in a generic repair operation. (a) A set of overlapping polygons (red and blue) that needs to be repaired. (b) The polygons are triangulated and each triangle is labelled with the polygons it belongs to. (c) An invalid region is found, defined as a set of contiguous polygons with an equivalent set of 0 or ≥ 2 labels. (d) The neighbours of the region are analysed to see how it should be re-labelled, and the result is saved onto a list (here triangles A, B, C and D are each assigned red). (e) The triangles are relabelled according to the list. (f) The output polygons, which now have no gaps or overlaps.]

For each polygon, we have to recover not only its outer boundary but also its inner boundaries, which are not connected. Observe that we cannot simply follow the original constrained edges, as these do not have any meaning after a planar partition has been repaired; the boundaries of the repaired polygons are instead formed by edges incident to two triangles having different labels.
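The observation that boundaries are made of edges separating differently labelled triangles can be sketched directly. The Python code below is a minimal illustration under assumed data structures (triangles as vertex-index triples, one label per triangle after repair); the function name is hypothetical and the edges are returned unordered, without the traversal that chains them into rings.

```python
from collections import defaultdict


def boundary_edges(triangles, labels, target):
    """Edges with the target label on exactly one side.

    triangles: {triangle id: (v0, v1, v2)} vertex indices
    labels: {triangle id: label} (one label per triangle after repair)
    """
    incident = defaultdict(list)
    for tri, (a, b, c) in triangles.items():
        for edge in ((a, b), (b, c), (c, a)):
            incident[frozenset(edge)].append(tri)

    boundary = set()
    for edge, tris in incident.items():
        on_target_side = [t for t in tris if labels[t] == target]
        if len(on_target_side) == 1:
            boundary.add(edge)
    return boundary


# Two unit squares side by side, each split into two triangles.
# Vertices: 0=(0,0) 1=(1,0) 2=(2,0) 3=(0,1) 4=(1,1) 5=(2,1)
triangles = {
    "T0": (0, 1, 4),
    "T1": (0, 4, 3),
    "T2": (1, 2, 5),
    "T3": (1, 5, 4),
}
labels = {"T0": "red", "T1": "red", "T2": "blue", "T3": "blue"}
red_boundary = boundary_edges(triangles, labels, "red")
```

For the red square, the diagonal between vertices 0 and 4 is dropped because both of its incident triangles are red, while the edge between vertices 1 and 4 is kept because it separates red from blue.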

For each polygon, we start at a triangle and move on to triangles having the same label. As the process goes, a single polyline that runs along all the boundaries of the polygon is generated. This involves a depth-first search (clockwise) that recurses until it reaches the boundary of the polygon, returning a long chain of edges in a procedure similar to following the boundary edge by edge. The procedure is shown step by step in Figure 9. The polyline created with this method has “bridges”, which allow us to keep all inner boundaries (holes) connected with the outer one in a manner that keeps the interior connected as well. These help to preserve connectivity and the relations between different (outer and inner) boundaries, but are removed later in the process (to conform to the Simple Features specification). Also, its orientation conveys the information of whether a section of it is part of an inner or an outer boundary.

[Figure 8: Different repair operations used in the two polygons for the Arribes del Duero Natural Park in Spain (red/darker grey) and the International Douro Natural Park in Portugal (green/lighter grey). (a) The original polygons. (b) Repaired each triangle using the label adjacent along the longest boundary from the neighbouring triangles. (c) Repaired each region using a random label from the neighbouring triangles. (d) Repaired each region using the label adjacent along the longest boundary from the neighbouring triangles. All of them can be considered best by a certain criterion, like preserving the area ratio between the two polygons (b), smoothness of the boundary (d), or a balance between the two (c).]

[Figure 9: The traversal order when navigating through triangles in a clockwise manner. Starting at A and the edge between A and B, the operations occur according to the encircled numbers. Black arrows denote when an unprocessed triangle is found, red arrows when it is not. Notice how the traversal is performed clockwise (shown in dark blue arrows) for both cases, despite starting from different sides.]

This polyline is processed with a stack-based algorithm that generates separate closed rings for the outer boundary and each of the inner boundaries, collapsing the “bridges” that were generated. In order to do this, the polyline is cut at the places where more than two edges join, and these are joined in the correct order by keeping track of (yet) unclosed rings. When a new segment is being processed, it can be one of three options: one that completes a ring, one that is part of a bridge, or one that is not yet closed. Closed rings are stored and bridges are removed, while unclosed rings are saved in the stack until they can be popped to form a closed ring together with a new segment. This is shown in Figure 10.
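The idea of storing closed rings, erasing bridges and stacking unclosed parts can be illustrated with a simplified variant that works vertex by vertex instead of on the cut segments described above. The Python sketch below is therefore not the algorithm of the paper, only a related technique; the function name and the input walk are assumptions made for the example.

```python
def split_rings(walk):
    """Split a closed walk with bridges into simple rings.

    walk: vertex sequence of the polyline, first vertex repeated at the end.
    Returns a list of rings, each a list of vertices without the repeated
    closing vertex, in the order in which they were closed.
    """
    stack = []
    rings = []
    for vertex in walk:
        if vertex in stack:
            # Coming back to a vertex closes whatever was stacked since it.
            start = stack.index(vertex)
            ring = stack[start:]
            del stack[start + 1:]
            if len(ring) >= 3:
                rings.append(ring)  # closed ring: store it
            # Fewer than three vertices means a back-tracked bridge: erase it.
        else:
            stack.append(vertex)    # still unclosed: push
    return rings


# Outer ring A-B-C-D with a hole P-Q-R, joined by the bridge A-P.
walk = ["A", "P", "Q", "R", "P", "A", "B", "C", "D", "A"]
rings = split_rings(walk)
```

On this walk the hole P, Q, R is closed and stored first, the bridge between A and P is erased when the walk returns to A, and the outer ring A, B, C, D is stored last; the vertex order of each ring follows the direction of the walk.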

4 Implementation and Experiments

4.1 Implementation

An implementation of the algorithms described in Section 3 was written in the C++ programming language, using external libraries for some functionality. The developed software is called pprepair and is open source and freely available at http://tudelft3d.github.com/pprepair. C++ was selected in order to have plenty of control with regards to low level details and to achieve good performance, which makes it possible to compare it with existing solutions. The libraries used are the OGR Simple Features Library⁹ (which allows input and output from a large variety of data formats common in GIS) and CGAL¹⁰ (which has support for many robust spatial data structures and the operations based on them; we use its constrained triangulation module).

⁹ http://www.gdal.org/ogr
¹⁰ Computational Geometry Algorithms Library: http://www.cgal.org

[Figure 10: Processing the polyline shown, starting from vertex a and moving towards polyline A, the following operations occur: (1) A: unclosed → push; (2) B: unclosed → push; (3) C: unclosed → push; (4) D: unclosed → push; (5) E: unclosed → push; (6) F: closed → store; (7) pop → EG: closed → store; (8) pop → DH: degenerate → erase; (9) pop → CI: closed → store; (10) pop → BJ: degenerate → erase; (11) pop → AK: closed → store.]

4.2 Experiments with real-world planar partition datasets

[Figure 11: Overview of the four datasets: (a) E41N27, (b) 4tiles, (c) 16tiles, (d) Mexico.]

We have made experiments with four freely available real-world datasets, i.e. we have validated and automatically repaired them with our implementation; the overview of these datasets is shown in Figure 11 and their properties in Table 2. The datasets are the following:

E41N27: Corine 2000¹¹ tile E41N27, which contains a shifted polygon (by about 10 cm), creating many small gaps and overlaps in the dataset. The snapping threshold has been set at 1 m.

4tiles: Corine 2000 tiles E39N32, E39N33, E40N32 and E40N33, which are known to have long and thin overlapping regions (< 1 mm) with each other. The snapping threshold has been set at 1 cm.

16tiles: 16 adjacent Corine 2000 tiles: E39N30, E39N31, E39N32, E39N33, E40N30, E40N31, E40N32, E40N33, E41N30, E41N31, E41N32, E41N33, E42N30, E42N31, E42N32 and E42N33. Some have gaps between one another, some overlap, but match within a few centimetres. The snapping threshold has been set at 10 cm.

Mexico: 1:1 000 000 scale land cover dataset from INEGI, consisting of over 26 000 polygons. Interestingly, it is mostly already valid (according to the Shapefile specification), but contains some very large polygons with tens of thousands of vertices.

¹¹ Corine is a land cover dataset for 32 European countries. It is freely available at http://www.eea.europa.eu/data-and-maps/data/corine-land-cover-2000-clc2000-seamless-vector-database

Table 2: Properties of the datasets used for the experiments.

          # polygons   # pts       # pts largest polygon   avg # pts per polygon
E41N27    14 969       496 303     26 740                  33.7
4tiles    4 984        365 702     16 961                  74.7
16tiles   63 868       6 622 133   95 112                  103.7
Mexico    26 866       4 181 354   117 736                 155.6

As a comparison, we have also tried to perform the same operations with other available software. While the capabilities for planar partition repair among the software tested vary considerably, with full topological repair only available in ArcGIS (using manual operations only), it is also important to consider how different repair implementations scale to large datasets. For this, a few performance tests were made in our implementation and three planar partition repair tools that perform this process using snapping and splitting. The testing methodology for each tool is as follows:

ArcGIS: In ArcCatalog, a multiple feature dataset is created in a Geodatabase, set with XY resolution values equal to the snapping threshold. The features are imported into it, and the merge and dissolve operations are used to merge adjacent polygons with the same ID. Topology is then generated to check that the planar partition is valid. Everything is finally exported to a single Shapefile. The individual parts of the process are timed and the total is recorded. Memory usage is calculated as the difference between the just loaded ArcCatalog application and its maximum memory usage throughout the process.

FME: A reader is created for each input file, which serve as input to a Snapper transformer; features with the same IDs are then dissolved, and finally they are output into a new Shapefile writer. The topology generator is used to be able to tell whether the result is a valid planar partition. Results are timed and the maximum memory allocation of the fme.exe process is recorded.

GRASS: Input files are imported with v.in.ogr, with all polygon cleaning operations performed and snapping set to the correct values. Boundaries between features with the same IDs are then dissolved using v.dissolve. Files are then exported with v.out.ogr. Times reported by GRASS are added together to give the total, while memory usage by v.in.ogr.exe, v.dissolve.exe and v.out.ogr.exe is monitored and their maximum is recorded.

pprepair: Files are read and put into the triangulation, the triangulation is labelled, and the repair is performed with the longest boundary first (see Table 1), with ambiguous cases resolved with a random choice. Polygons are then extracted from the triangulation and output to a single Shapefile. The entire process is timed and the maximum amount of memory used is recorded.

Notice that we are somewhat comparing apples with oranges here, since our implementation is able to repair more cases than other methods, keeps topological consistency, and does not require finding out the appropriate threshold value (if it exists). More importantly, it is able to directly state whether the result is a planar partition, unlike the three other solutions (with these, snapping is performed, but that does not guarantee that the output is a valid planar partition). The results of the experiments are shown in Table 3.

We have made the experiments in orderto have an idea of the processing timeand the memory usage involved Onlycases that are acceptably solved by snappingand splitting have been considered sincethere is no other comparable automatedtopological repair tool among those stud-ied that would also work for more complexcases such as those that require a snappingthreshold too high to be practical (eg hor-izontal conflation of independently gener-ated datasets [Yuan and Tao 1999 Davis2003])

13

Table 3 Planar partition repair comparison using large datasets

pprepair ArcGIS FME GRASS

memory time memory time memory time memory time

E41N27 145 MB 19s 145 MB 1m3s 158 MB 31s 59 MB 3m9s4tiles 116 MB 17s 113 MB 37s 105 MB 31s 49 MB 53s16tiles 145 GB 4m47s crashes ndash 636 MB 15m48s crashes ndashMexico 101 GB 3m31s 216 MB gt1d 264 MB 2m45s 408 MB 11m38s

All tests have been run 5 times in a ma-chine just booted and the results averagedto account for the small variations that oc-curred except in the case where the execu-tion was cancelled after one full day due totime limitations or when it causes the pro-gram to crash The hardware is a 266 GHzCore 2DuoMacBookProwith 4GBofRAMArcGIS 93 FME 2010 SP1 and GRASS 64were run in Windows 7 while pprepair wasrun in Mac OS X 1071

As Table 3 shows our approach uses some-what more memory than other solutionsThis is explained by the extra (uncon-strained) edges that are added to the inputover when triangulating them It shouldhowever be noticed that both ArcGIS andGRASS crashed with the biggest dataset16tiles Our implementation is the fastestof the four tested being for instance aroundthree times faster than FME for the biggestdatasets Only for the Mexico dataset isour implementation slower than FME Thisis (probably) explained by the fact that itspolygons already form a valid planar parti-tion and therefore very few snapping op-erations have been performed by FME theplanar graph of the input was basicallybuilt and then the polygons saved back todisk We believe that ArcGIS and GRASSstruggled with the dataset because it con-tains several very large polygons (withmorethan 10 000 vertices) Our implementa-tion took 3m31s but as Table 4 shows ittook CGAL 3m02s to simply triangulate theinput edges This table also demonstratesthe efficiency of our approach for the fourdatasets around 85 of the time of our ap-proach was usedmdashby the CGAL librarymdashtoread the input fromdisk and triangulate the

Table 4 Timed steps of the planar partitionrepair procedure rounded down tothe nearest second The percentageis the triangulation time over thetotal time

E41N27 4tiles 16tiles Mexico

Triangulate 017 015 400 302Label 001 001 027 016Repair 000 000 000 000Reconstruct 001 001 011 007Output 000 000 010 006

Total 019 017 447 331triangulate 89 88 83 86

input edges

During the implementation we took sev-eral engineering decisions to optimise ourprogram One of them was to favour diskspace over computation time as we wantedto be able to process large datasets This iswhy we always reconstruct polygons even ifthey not modified by the repair process Al-though possible it would require us to keepin memory the original polygons (whichwould significantly increase the memoryconsumption) and to keep track of whichlabels have been modified And as Table 4shows the reconstruction is very efficient asit only takes around 3ndash4 of the total timefor the 16tiles and Mexico datasets

14

5 Discussion andConclusions

We have presented a new method for re-pairing polygonal area partitions ensur-ing that the output partition is valid ieall individual polygons are conforming tothe ISO 19107 [ISO 2003] and Simple Fea-tures [OGC 2006] specifications and nogaps nor overlaps are present between anypair of polygons Thenovelty of ourmethodlies in the fact that repair is performed ac-cording to user defined criteria but thentakes place without any human interven-tion Automatic repair is becoming increas-ingly an important topic due to data in-tegration eg data collected for the differ-ent themes of the European INSPIRE Ini-tiative will finally have to fit together anddata could eventually bematched fully auto-matically by dedicated web service compo-nents [INSPIRE 2009] Our approach couldbe at the heart of such a web service

The proposed approach excels in automated repair, at the cost of increased memory usage compared to a pure graph-based approach; this difference is mainly caused by the unconstrained edges introduced by the triangulation. While it would be possible to use our repair rules together with a (primal/dual) graph-based approach, these additional edges in the triangulation give fine-grained control over the repair operations and ensure that the graph is connected, which facilitates the reconstruction of polygons.

We have implemented our algorithm over a numerically robust triangulator (CGAL), and since repair operations are expressed solely in terms of re-labelling of triangles (no geometric computation is involved), the approach is also fully robust. Since most of the time during our experiments was used to compute the constrained triangulation, other triangulation libraries could also be tested to speed up the implementation.

For the future, we plan to:

• Improve the scalability of the approach, and process and repair datasets with more than 10 million polygons. It is known that, using divide-and-conquer techniques, triangulation algorithms can handle big datasets [Amenta et al., 2003; Blandford et al., 2005]. We will investigate whether it is possible to automatically repair each divided part individually and 'glue' the repaired parts together.

• Investigate snap rounding [Hobby, 1999; de Berg et al., 2007], as a preprocessing step or embedded directly in the triangulation, to guarantee that repaired planar partitions have no vertices that are closer than a certain ε threshold. Snap rounding may change the topology of the input, but the output will nevertheless be a valid partition (as the topology will be repaired). Apart from topological changes, snap rounding can also lead to the removal of polygon parts that are too small to be preserved, based on the chosen ε.

• Add more advanced repair operations to our repair toolkit, e.g. repair could take place based on splitting a collection of triangles.

• Extend our work to include the third dimension, to validate and repair 3D city models using a constrained tetrahedralisation [Si, 2008]. Notice that the tetrahedralisation of a given polyhedron does not always exist, and thus extra (Steiner) points might need to be added. The main concepts of our approach, (re)labelling and reconstruction, extend naturally to 3D. However, appropriate repair operations for 3D city models would need to be defined, and some application-specific constraints (e.g. right angles at corners) are not trivial to implement in our current approach.

References

Nina Amenta, Sunghee Choi, and Günter Rote. Incremental constructions con BRIO. In Proceedings of the 19th Annual Symposium on Computational Geometry, pages 211–219. ACM Press, 2003.

Wael M. Badawy and Walid G. Aref. On local heuristics to speed up polygon-polygon intersection tests. In Proceedings of the 7th ACM International Symposium on Advances in Geographic Information Systems, pages 97–102. ACM, 1999.

M. Bader and R. Weibel. Detecting and resolving size and proximity conflicts in the generalization of polygonal maps. In Proceedings of the 18th International Cartographic Conference, Stockholm, pages 1525–1532, 1997.

Daniel K. Blandford, Guy E. Blelloch, David E. Cardoze, and Clemens Kadow. Compact representations of simplicial meshes in two and three dimensions. International Journal of Computational Geometry and Applications, 15(1):3–24, 2005.

CGAL. CGAL 3.8 User and Reference Manual. CGAL Editorial Board, 2011.

Kenneth L. Clarkson, Kurt Mehlhorn, and Raimund Seidel. Four results on randomized incremental constructions. In Alain Finkel and Matthias Jantzen, editors, Proceedings of the 9th Annual Symposium on Theoretical Aspects of Computer Science, volume 577 of Lecture Notes in Computer Science, pages 461–474. Springer Berlin/Heidelberg, 1992.

Martin Davis. Java conflation suite. Technical report, Vivid Solutions, 2003. Available at http://www.vividsolutions.com/jcs/.

Mark de Berg, Dan Halperin, and Mark Overmars. An intersection-sensitive algorithm for snap rounding. Computational Geometry, 36(3):159–165, 2007. ISSN 0925-7721.

Mark de Berg, Marc van Kreveld, Mark Overmars, and Otfried Schwarzkopf. Computational Geometry: Algorithms and Applications. Springer-Verlag, 3 edition, 2008.

ESRI. Shapefile technical description. White paper, ESRI, July 1998.

Michael A. Facello. Implementation of a randomized algorithm for Delaunay and regular triangulations in three dimensions. Computer Aided Geometric Design, 12(4):349–370, 1995.

GTS. GTS Library Reference Manual, 2006. URL http://gts.sourceforge.net/reference/book1.html.

Leonidas Guibas and Jorge Stolfi. Primitives for the manipulation of general subdivisions and the computation of Voronoi diagrams. ACM Transactions on Graphics, 4(2):74–123, 1985.

John D. Hobby. Practical segment intersection with finite precision output. Computational Geometry, 13(4):199–214, 1999. ISSN 0925-7721.

Christoph M. Hoffmann, John E. Hopcroft, and Michael S. Karasick. Towards implementing robust geometric computations. In Proceedings of the 4th Annual Symposium on Computational Geometry, pages 106–117. ACM, 1988.

INSPIRE. D2.5: Generic conceptual model, version 3.2. Technical report, Drafting Team "Data Specifications", 2009. URL http://inspire.jrc.ec.europa.eu/documents/Data_Specifications/D2.5_v3.2.pdf.

ISO. 19107: Geographic information – spatial schema. March 2003.

David Kirkpatrick, Jack Snoeyink, and Bettina Speckmann. Kinetic collision detection for simple polygons. International Journal of Computational Geometry and Applications, 12(1–2):3–27, 2002.

Robert Laurini and Françoise Milleret-Raffort. Topological reorganization of inconsistent geographical databases: A step towards their certification. Computers & Graphics, 18(6):803–813, December 1994.

Hugo Ledoux and Martijn Meijers. Validation of planar partitions using constrained triangulations. In Proceedings Joint International Conference on Theory, Data Handling and Modelling in GeoSpatial Information Science, pages 51–55, Hong Kong, May 2010.

Hugo Ledoux, Ken Arroyo Ohori, and Martijn Meijers. Automatically repairing invalid polygons with a constrained triangulation. In Proceedings of the 15th AGILE International Conference on Geographic Information Science, 2012.

Yuanxin Liu and Jack Snoeyink. Faraway point: A sentinel point for Delaunay computation. International Journal of Computational Geometry and Applications, 18(4):343–355, November 2006.

Avraham Margalit and Gary D. Knott. An algorithm for computing the union, intersection or difference of two polygons. Computers & Graphics, 13(2):167–183, 1989.

Ernst P. Mücke, Isaac Saias, and Binhai Zhu. Fast randomized point location without preprocessing in two- and three-dimensional Delaunay triangulations. Computational Geometry: Theory and Applications, 12:63–83, 1999.

OGC. OpenGIS Implementation Specification for Geographic Information - Simple Feature Access - Part 1: Common Architecture. 1.2.0 edition, October 2006.

Lutz Plümer and Gerhard Gröger. Achieving integrity in geographic information systems: maps and nested maps. GeoInformatica, 1(4):345–367, 1997.

M. Rivero and F. R. Feito. Boolean operations on general planar polygons. Computers & Graphics, 24(6):881–896, December 2000.

Stefan Schirra. Precision and robustness in geometric computations. In Algorithmic Foundations of Geographic Information Systems, Lecture Notes in Computer Science, chapter 9, pages 255–287. Springer Berlin/Heidelberg, 1997.

Michael Ian Shamos and Dan Hoey. Geometric intersection problems. In FOCS, pages 208–215. IEEE, 1976.

Jonathan Richard Shewchuk. Delaunay Refinement Mesh Generation. PhD thesis, School of Computer Science, Carnegie Mellon University, Pittsburgh, USA, 1997a.

Jonathan Richard Shewchuk. Adaptive precision floating-point arithmetic and fast robust geometric predicates. Discrete & Computational Geometry, 18:305–363, 1997b.

Hang Si. Three Dimensional Boundary Conforming Delaunay Mesh Generation. PhD thesis, Technische Universität Berlin, 2008.

C. Dana Tomlin. Map algebra: One perspective. Landscape and Urban Planning, 30(1–2):3–12, October 1994.

Peter van Oosterom, Wilko Quak, and Theo Tijssen. About invalid, valid and clean polygons. In Peter F. Fisher, editor, Developments in Spatial Data Handling: 11th International Symposium on Spatial Data Handling, pages 1–16. Springer, 2004.

Jan W. van Roessel. A new approach to plane-sweep overlay: Topological structuring and line-segment classification. Cartography and Geographic Information Science, 18(1):49–67, January 1991.

Shuxin Yuan and Chuang Tao. Development of conflation components. Geoinformatics and Socioinformatics, pages 1–13, June 1999.


Table 1: The repair operations currently implemented in our software. The types of operations are defined in the map algebra classification [Tomlin, 1994].

Repair operation                   Type             Criteria
Triangle by priority list          Varies           Label that has the highest priority according to a predefined priority list
Triangle by number of neighbours   Focal            The label present in the largest number of adjacent faces, overlaps included
Triangle by absolute majority      Focal            Label present in two or more valid adjacent faces
Triangle by longest boundary       Focal            Label present along the longest portion of the boundary of the adjacent faces
Regions by longest boundary        Focal of zonal   Label present along the longest portion of the boundary of the adjacent faces
Regions by random neighbour        Focal of zonal   Random label from the adjacent faces

and overlaps. These are shown in Table 1, and all imply re-labelling triangles. Four of them use triangles as a base (i.e. the label assigned is based only on that of its three neighbouring triangles), which is fast and modifies the area of each input polygon the least. Despite their simplicity, they offer substantial control over the results. For instance, triangle by number of neighbours and triangle by absolute majority only differ from each other in their handling of overlapping faces, but the former is better for large overlapping regions, while the latter is better at fixing small problems. Having well-shaped (Delaunay) triangles is most useful for triangle-based repair functions. Two of them use regions of adjacent triangles with equivalent sets of labels, which is slower than a triangle-based method, but yields results that can be cartographically more pleasing. An interesting repair operation for practitioners is the one in which a priority of labels is used, i.e. in case of gaps/overlaps, the labels in the triangle (overlap) or in the adjacent polygons (gap) are ordered according to a user-defined priority, and the label with the highest priority is assigned to the problematic triangles. Notice that these repair operations can be used one after the other (in a hierarchical manner): for instance, if first the repair according to the longest boundary is used, but one zone has two or more boundaries with exactly the same length, then the deadlock can be solved by choosing one randomly. A sample of the results obtained with different repair operators is also shown in Figure 8.
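To make the idea of chaining repair operations concrete, here is a minimal sketch of two triangle-based operations and a hierarchical fallback. It is not the pprepair code: the function names, the representation of labels as Python sets, and the choice of fallback (a vote followed by a priority list, rather than longest boundary followed by a random choice) are assumptions made for illustration.

```python
from collections import Counter


def by_number_of_neighbours(own, neighbours):
    """Return the label found on the most adjacent triangles, or None on a tie.

    `neighbours` holds one set of labels per adjacent triangle; an overlapping
    neighbour contributes each of its labels ("overlaps included").
    `own` is unused but keeps the signature uniform across operations.
    """
    counts = Counter(label for labels in neighbours for label in labels)
    ranked = counts.most_common(2)
    if not ranked or (len(ranked) == 2 and ranked[0][1] == ranked[1][1]):
        return None
    return ranked[0][0]


def by_priority(own, neighbours, priority):
    """Return the highest-priority candidate label, or None if none is listed.

    Candidates are the triangle's own labels for an overlap, and the labels
    of the adjacent triangles for a gap (a triangle without labels).
    """
    candidates = set(own) if own else set().union(*neighbours)
    for label in priority:
        if label in candidates:
            return label
    return None


def repair_label(own, neighbours, operations):
    """Try each operation in turn and return the first decisive label."""
    for operation in operations:
        label = operation(own, neighbours)
        if label is not None:
            return label
    return None


# A gap whose neighbours disagree: the vote ties, so the priority list decides.
priority = ["water", "forest", "urban"]
operations = [
    by_number_of_neighbours,
    lambda own, neighbours: by_priority(own, neighbours, priority),
]
print(repair_label(set(), [{"forest"}, {"urban"}, {"forest", "urban"}], operations))
# prints: forest
```

Returning `None` for an undecided case is what lets a later operation in the list break the deadlock.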

More repair operations based on extensions to the idea of labelling triangles/regions can be further developed. For instance, triangles could be split to subdivide an area with problems (as in Bader and Weibel [1997]), or sliver triangles/regions could be discarded (and then filled during repair).

3.5 Extraction of Polygons From a Triangulation

Starting from a labelled triangulation, it is possible to reconstruct the individual polygons of a planar partition so that they conform to valid polygons according to the ISO 19107 and the Simple Features specifications, which allows users to incorporate automatic validation and repair in their workflow.

We do this operation polygon by polygon. We start at an unprocessed triangle and visit all the connected triangles having the same label (to reconstruct the polygon), marking the triangles as processed as we visit them. Note that since all these triangles are connected, the outer and inner boundaries of a polygon are all simple (non self-intersecting). We repeat this operation until all the triangles have been processed.

Figure 7: The steps in a generic repair operation. (a) A set of overlapping polygons that needs to be repaired. (b) The polygons are triangulated and each triangle is labelled with the polygons it belongs to. (c) An invalid region is found, defined as a set of contiguous triangles with an equivalent set of 0 or ≥ 2 labels. (d) The neighbours of the region are analysed to see how it should be re-labelled, and the result is saved onto a list (here, triangles A, B, C and D are all to be re-labelled red). (e) The triangles are relabelled according to the list. (f) The output polygons, which now have no gaps or overlaps.
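The polygon-by-polygon visit described above can be sketched as a flood fill over the triangle adjacency. This is only an illustration under assumed data structures (dictionaries keyed by hypothetical triangle ids), not the pprepair implementation, and it only groups the triangles without yet tracing any boundary.

```python
def extract_polygons(labels, adjacency):
    """Group triangles into polygons, i.e. connected sets with the same label.

    labels:    dict mapping a triangle id to its (single) label
    adjacency: dict mapping a triangle id to the ids of its neighbours
    """
    processed = set()
    polygons = []
    for start in labels:
        if start in processed:
            continue
        processed.add(start)
        members = []
        stack = [start]
        while stack:
            triangle = stack.pop()
            members.append(triangle)
            for neighbour in adjacency[triangle]:
                if neighbour not in processed and labels[neighbour] == labels[start]:
                    processed.add(neighbour)  # mark as processed when first seen
                    stack.append(neighbour)
        polygons.append((labels[start], sorted(members)))
    return polygons


# Four triangles in a row: red, red, blue, red.
labels = {0: "red", 1: "red", 2: "blue", 3: "red"}
adjacency = {0: [1], 1: [0, 2], 2: [1, 3], 3: [2]}
print(extract_polygons(labels, adjacency))
# [('red', [0, 1]), ('blue', [2]), ('red', [3])]
```

Note that the two red groups are reported as separate polygons because they are not connected, which is what keeps every reconstructed polygon in one piece.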

For each polygon, we have to recover not only its outer boundary, but also its inner boundaries, which are not connected. Observe that we cannot simply follow the original constrained edges, as these do not have any meaning after a planar partition has been repaired: the boundaries of the repaired polygons are instead formed by edges incident to two triangles having different labels.
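The observation that boundaries are made of edges separating differently labelled triangles can be illustrated with a short sketch. The data layout (triangles as triples of hypothetical vertex ids) is an assumption, and edges on the outside of the input are treated as boundary edges because only one labelled triangle is incident to them.

```python
from collections import defaultdict


def boundary_edges(triangles, labels, label):
    """Return the edges bounding the polygon that carries `label`.

    triangles: dict mapping a triangle id to its three vertex ids
    labels:    dict mapping a triangle id to its label after repair
    An edge is kept when exactly one of its incident triangles has `label`;
    the other side then has a different label or lies outside the input.
    """
    incident = defaultdict(list)
    for tri, (a, b, c) in triangles.items():
        for u, v in ((a, b), (b, c), (c, a)):
            incident[tuple(sorted((u, v)))].append(tri)
    return sorted(
        edge
        for edge, tris in incident.items()
        if sum(labels[t] == label for t in tris) == 1
    )


# Two triangles sharing the edge (1, 2).
triangles = {0: (0, 1, 2), 1: (1, 3, 2)}
print(boundary_edges(triangles, {0: "red", 1: "blue"}, "red"))
# [(0, 1), (0, 2), (1, 2)]
print(boundary_edges(triangles, {0: "red", 1: "red"}, "red"))
# [(0, 1), (0, 2), (1, 3), (2, 3)]
```

In the second call the shared edge disappears from the result, even if it was a constrained edge of the input, because both sides now carry the same label.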

For each polygon, we start at a triangle and move on to triangles having the same label. As the process goes, a single polyline that runs along all the boundaries of the polygon is generated. This involves a depth-first search (clockwise) that recurses until it reaches the boundary of the polygon, returning a long chain of edges in a procedure similar to following the boundary edge by edge. The procedure is shown step by step in Figure 9. The polyline created with this method has "bridges", which allow us to keep all inner boundaries (holes) connected with the outer one in a manner that keeps the interior connected as well. These help to preserve connectivity and the relations between different (outer and inner) boundaries, but are removed later in the process (to conform to the Simple Features specification). Also, its orientation conveys the information of whether a section of it is part of an inner or an outer boundary.

Figure 8: Different repair operations used in the two polygons for the Arribes del Duero Natural Park in Spain (red/darker grey) and the International Douro Natural Park in Portugal (green/lighter grey). (a) The original polygons. (b) Repaired each triangle using the label adjacent along the longest boundary from the neighbouring triangles. (c) Repaired each region using a random label from the neighbouring triangles. (d) Repaired each region using the label adjacent along the longest boundary from the neighbouring triangles. All of them can be considered best by a certain criterion, like preserving the area ratio between the two polygons (b), smoothness of the boundary (d), or a balance between the two (c).

Figure 9: The traversal order when navigating through triangles in a clockwise manner. Starting at A and the edge between A and B, the operations occur according to the encircled numbers. Black arrows denote when an unprocessed triangle is found, red arrows when it is not. Notice how the traversal is performed clockwise (shown in dark blue arrows) for both cases, despite starting from different sides.

This polyline is processed with a stack-based algorithm that generates separate closed rings for the outer boundary and each of the inner boundaries, collapsing the "bridges" that were generated. In order to do this, the polyline is cut at the places where more than two edges join, and the resulting segments are joined in the correct order by keeping track of (yet) unclosed rings. When a new segment is being processed, it can be one of three options: one that completes a ring, one that is part of a bridge, or one that is not yet closed. Closed rings are stored and bridges are removed, while unclosed rings are saved in the stack until they can be popped to form a closed ring together with a new segment. This is shown in Figure 10.
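The effect of this step can be sketched with a simplified, vertex-level variant of the stack idea. It is not the segment-based procedure of the paper: instead of cutting the polyline at junctions, the sketch below (with the hypothetical name `split_rings`) pushes vertices on a stack and closes a loop whenever a vertex already on the stack is met again. A loop with three or more vertices is stored as a ring, while a two-vertex loop is an out-and-back bridge and is dropped. Ring orientation is not examined here.

```python
def split_rings(polyline):
    """Split a closed polyline with bridges into separate rings.

    `polyline` is a list of hashable vertices whose last vertex repeats the
    first. Each returned ring is a list of vertices without the closing copy.
    """
    stack = []       # vertices of the rings that are not closed yet
    position = {}    # vertex -> its index in `stack`
    rings = []
    for vertex in polyline:
        if vertex in position:
            start = position[vertex]
            loop = stack[start:]
            for popped in loop[1:]:
                del position[popped]
            del stack[start + 1:]      # keep `vertex` itself on the stack
            if len(loop) >= 3:
                rings.append(loop)     # closed ring: store
            # otherwise the loop is a bridge (or a repeated vertex): erase
        else:
            position[vertex] = len(stack)
            stack.append(vertex)
    return rings


# Outer ring o1..o4 with a hole h1..h3 attached through the bridge o2-h1.
polyline = ["o1", "o2", "h1", "h2", "h3", "h1", "o2", "o3", "o4", "o1"]
print(split_rings(polyline))
# [['h1', 'h2', 'h3'], ['o1', 'o2', 'o3', 'o4']]
```

As in the description above, the hole is closed and stored first, the bridge is then removed, and the outer ring is completed last.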

4 Implementation and Experiments

4.1 Implementation

An implementation of the algorithms described in Section 3 was written in the C++ programming language, using external libraries for some functionality. The developed software is called pprepair, and is open source and freely available at http://tudelft3d.github.com/pprepair. C++ was selected in order to have plenty of control over low-level details and to achieve good performance, which makes it possible to compare it with existing solutions. The libraries used are the OGR Simple Features Library⁹ (which allows input and output from a large variety of data formats common in GIS) and CGAL¹⁰ (which has support for many robust spatial data structures and the operations based on them; we use its constrained triangulation module).

4.2 Experiments with real-world planar partition datasets

Figure 10: Processing the polyline shown, starting from vertex a and moving towards polyline A, the following operations occur: (1) A: unclosed → push; (2) B: unclosed → push; (3) C: unclosed → push; (4) D: unclosed → push; (5) E: unclosed → push; (6) F: closed → store; (7) pop → EG: closed → store; (8) pop → DH: degenerate → erase; (9) pop → CI: closed → store; (10) pop → BJ: degenerate → erase; (11) pop → AK: closed → store.

Figure 11: Overview of the four datasets: (a) E41N27, (b) 4tiles, (c) 16tiles, (d) Mexico.

⁹ http://www.gdal.org/ogr
¹⁰ Computational Geometry Algorithms Library, http://www.cgal.org

We have made experiments with four freely available real-world datasets, i.e. we have validated and automatically repaired them with our implementation. An overview of these datasets is shown in Figure 11, and their properties in Table 2. The datasets are the following:

E41N27: Corine 2000¹¹ tile E41N27, which contains a shifted polygon (by about 10 cm), creating many small gaps and overlaps in the dataset. The snapping threshold has been set at 1 m.

4tiles: Corine 2000 tiles E39N32, E39N33, E40N32 and E40N33, which are known to have long and thin overlapping regions (< 1 mm) with each other. The snapping threshold has been set at 1 cm.

16tiles: 16 adjacent Corine 2000 tiles (E39N30, E39N31, E39N32, E39N33, E40N30, E40N31, E40N32, E40N33, E41N30, E41N31, E41N32, E41N33, E42N30, E42N31, E42N32, E42N33). Some have gaps between one another, some overlap, but they match within a few centimetres. The snapping threshold has been set at 10 cm.

Mexico: 1:1 000 000 scale land cover dataset from INEGI, consisting of over 26 000 polygons. Interestingly, it is mostly already valid (according to the Shapefile specification), but contains some very large polygons with tens of thousands of vertices.

¹¹ Corine is a land cover dataset for 32 European countries. It is freely available at http://www.eea.europa.eu/data-and-maps/data/corine-land-cover-2000-clc2000-seamless-vector-database

Table 2: Properties of the datasets used for the experiments.

          polygons   pts         pts largest polygon   avg pts per polygon
E41N27    14 969     496 303     26 740                33.7
4tiles    4 984      365 702     16 961                74.7
16tiles   63 868     6 622 133   95 112                103.7
Mexico    26 866     4 181 354   117 736               155.6

As a comparison, we have also tried to perform the same operations with other available software. While the capabilities for planar partition repair among the software tested vary considerably, with full topological repair only available in ArcGIS (using manual operations only), it is also important to consider how different repair implementations scale to large datasets. For this, a few performance tests were made with our implementation and three planar partition repair tools that perform this process using snapping and splitting. The testing methodology for each tool is as follows:

ArcGIS: In ArcCatalog, a multiple feature dataset is created in a Geodatabase, set with XY resolution values equal to the snapping threshold. The features are imported into it, and the merge and dissolve operations are used to merge adjacent polygons with the same ID. Topology is then generated to check that the planar partition is valid. Everything is finally exported to a single Shapefile. The individual parts of the process are timed and the total is recorded. Memory usage is calculated as the difference between the just loaded ArcCatalog application and its maximum memory usage throughout the process.

FME: A reader is created for each input file, which serves as input to a Snapper transformer; features with the same IDs are then dissolved, and finally they are output into a new Shapefile writer. The topology generator is used to be able to tell whether the result is a valid planar partition. Results are timed and the maximum memory allocation of the fme.exe process is recorded.

GRASS: Input files are imported with v.in.ogr, with all polygon cleaning operations performed and snapping set to the correct values. Boundaries between features with the same IDs are then dissolved using v.dissolve. Files are then exported with v.out.ogr. Times reported by GRASS are added together to give the total, while the memory usage of the v.in.ogr.exe, v.dissolve.exe and v.out.ogr.exe processes is monitored and their maximum is recorded.

pprepair: Files are read and put into the triangulation, the triangulation is labelled, and the repair is performed with the longest boundary first (see Table 1), with ambiguous cases resolved with a random choice. Polygons are then extracted from the triangulation and output to a single Shapefile. The entire process is timed and the maximum amount of memory used is recorded.

Notice that we are somewhat comparing apples with oranges here, since our implementation is able to repair more cases than other methods, keeps topological consistency, and does not require finding out the appropriate threshold value (if it exists). More importantly, it is able to directly state whether the result is a planar partition, unlike the three other solutions (with these, snapping is performed, but that does not guarantee that the output is a valid planar partition). The results of the experiments are shown in Table 3.

We have made the experiments in order to have an idea of the processing time and the memory usage involved. Only cases that are acceptably solved by snapping and splitting have been considered, since there is no other comparable automated topological repair tool among those studied that would also work for more complex cases, such as those that require a snapping threshold too high to be practical (e.g. horizontal conflation of independently generated datasets [Yuan and Tao, 1999; Davis, 2003]).


Table 3: Planar partition repair comparison using large datasets.

          pprepair            ArcGIS             FME                 GRASS
          memory    time      memory    time     memory   time       memory    time
E41N27    145 MB    19s       145 MB    1m3s     158 MB   31s        59 MB     3m9s
4tiles    116 MB    17s       113 MB    37s      105 MB   31s        49 MB     53s
16tiles   1.45 GB   4m47s     crashes   –        636 MB   15m48s     crashes   –
Mexico    1.01 GB   3m31s     216 MB    >1d      264 MB   2m45s      408 MB    11m38s

All tests have been run 5 times on a machine just booted, and the results averaged to account for the small variations that occurred, except in the cases where the execution was cancelled after one full day due to time limitations or where it caused the program to crash. The hardware is a 2.66 GHz Core 2 Duo MacBook Pro with 4 GB of RAM. ArcGIS 9.3, FME 2010 SP1 and GRASS 6.4 were run in Windows 7, while pprepair was run in Mac OS X 10.7.1.


Stefan Schirra Precision and Robustness inGeometric Computations volume Algorith-mic Foundations of Geographic Informa-

tion Systems of Lecture Notes in ComputerScience chapter 9 pages 255ndash287 SpringerBerlin Heidelberg 1997

Michael Ian Shamos and Dan Hoey Geo-metric intersection problems In FOCSpages 208ndash215 IEEE 1976

Jonathan Richard Shewchuk Delaunay Re-finement Mesh Generation PhD thesisSchool of Computer Science CarnegieMellon University Pittsburg USA 1997a

Jonathan Richard Shewchuk Adaptiveprecision floating-point arithmetic andfast robust geometric predicates Dis-crete amp Computational Geometry 18305ndash363 1997b

Hang Si Three Dimensional Boundary Con-forming Delaunay Mesh Generation PhDthesis Technische Universitaumlt Berlin2008

C Dana Tomlin Map algebra One perspec-tive Landscape and Urban Planning 30(1ndash2)3ndash12 October 1994

Peter van OosteromWilko Quak and TheoTijssen About invalid valid and cleanpolygons In Peter F Fisher editor Devel-opments in Spatial Data Handlingmdash11th In-ternational Symposium on Spatial DataHan-dling pages 1ndash16 Springer 2004

Jan W van Roessel A new approach toplane-sweep overlay Topological struc-turing and line-segment classificationCartography and Geographic InformationScience 18(1)49ndash67 January 1991

Shuxin Yuan and Chuang Tao Develop-ment of conflation components Geoin-formatics and Socioinformatics pages 1ndash13June 1999

17


Figure 7: The steps in a generic repair operation. (a) A set of overlapping polygons that needs to be repaired. (b) The polygons are triangulated and each triangle is labelled with the polygons it belongs to. (c) An invalid region is found, defined as a set of contiguous polygons with an equivalent set of 0 or ≥ 2 labels. (d) The neighbours of the region are analysed to see how it should be re-labelled, and the result is saved onto a list (here, triangles A, B, C and D are each to be repaired as red). (e) The triangles are relabelled according to the list. (f) The output polygons, which now have no gaps or overlaps.

intersecting). We repeat this operation until all the triangles have been processed.
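One of the repair criteria used in this paper relabels an invalid region with the label that is adjacent along the longest boundary. The following is a minimal sketch of that selection step only, not the pprepair implementation. The function name `longest_boundary_label` and the input format (a list of `(neighbour_label, edge_length)` pairs, one per edge shared with a validly labelled neighbouring triangle) are assumptions made for illustration, and ties are broken alphabetically here rather than at random.

```python
from collections import defaultdict


def longest_boundary_label(border_edges):
    """Pick the neighbouring label with the greatest total shared edge length.

    border_edges: iterable of (neighbour_label, edge_length) pairs, one per
    edge between the invalid region and a validly labelled triangle.
    Returns None when the region has no labelled neighbour.
    """
    totals = defaultdict(float)
    for label, length in border_edges:
        totals[label] += length
    if not totals:
        return None
    # Iterating over the sorted labels makes ties deterministic (hypothetical
    # choice for this sketch; the paper resolves ambiguous cases randomly).
    return max(sorted(totals), key=totals.get)


# Hypothetical region bordered by one red edge and two blue edges.
border = [("red", 2.0), ("blue", 1.5), ("blue", 1.0)]
new_label = longest_boundary_label(border)
```

Because the repair is expressed purely as a choice of label, no new geometry is computed at this step.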

For each polygon, we have to recover not only its outer boundary but also its inner boundaries, which are not connected. Observe that we cannot simply follow the original constrained edges, as these do not have any meaning after a planar partition has been repaired; the boundaries of the repaired polygons are instead formed by edges incident to two triangles having different labels.
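The observation that boundaries are the edges separating differently labelled triangles can be sketched as follows. This is an illustrative sketch under simplifying assumptions: triangles are given as tuples of vertex ids with exactly one label each, and an edge with only one incident triangle is treated as a boundary edge. The name `boundary_edges` is hypothetical and not part of pprepair.

```python
from collections import defaultdict


def boundary_edges(triangles, labels, target):
    """Return the undirected edges that bound the region labelled `target`.

    triangles: list of 3-tuples of vertex ids
    labels:    list with one label per triangle (same order)
    """
    edge_labels = defaultdict(list)
    for tri, label in zip(triangles, labels):
        for i in range(3):
            edge = frozenset((tri[i], tri[(i + 1) % 3]))
            edge_labels[edge].append(label)

    result = set()
    for edge, incident in edge_labels.items():
        # An edge is on the boundary when exactly one of its incident
        # triangles carries the target label.
        if incident.count(target) == 1:
            result.add(edge)
    return result


# Two triangles sharing the diagonal (0, 2) of a square.
tris = [(0, 1, 2), (0, 2, 3)]
red_boundary = boundary_edges(tris, ["red", "blue"], "red")
```

When both triangles carry the same label, the shared diagonal drops out and only the four sides of the square remain.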

For each polygon, we start at a triangle and move on to triangles having the same label. As the process goes, a single polyline that runs along all the boundaries of the polygon is generated. This involves a depth-first search (clockwise) that recurses until it reaches the boundary of a polygon, returning a long chain of edges in a procedure similar to following the boundary edge by edge. The procedure is shown step by step in Figure 9. The polyline created with this method has “bridges”, which allow us to keep all inner boundaries (holes) connected with the outer one in a manner that keeps the interior connected as well. These help to preserve connectivity and the relations between different (outer and inner) boundaries, but are removed later in the process (to conform to the Simple Features specification). Also, its orientation conveys the information of whether a section of it is part of an inner or an outer boundary.

Figure 8: Different repair operations used in the two polygons for the Arribes del Duero Natural Park in Spain (red/darker grey) and the International Douro Natural Park in Portugal (green/lighter grey). (a) The original polygons. (b) Repaired each triangle using the label adjacent along the longest boundary from the neighbouring triangles. (c) Repaired each region using a random label from the neighbouring triangles. (d) Repaired each region using the label adjacent along the longest boundary from the neighbouring triangles. All of them can be considered best by a certain criterion, like preserving the area ratio between the two polygons (b), smoothness of the boundary (d), or a balance between the two (c).

Figure 9: The traversal order when navigating through triangles in a clockwise manner. Starting at A and the edge between A and B, the operations occur according to the encircled numbers. Black arrows denote when an unprocessed triangle is found, red arrows when it is not. Notice how the traversal is performed clockwise (shown in dark blue arrows) for both cases, despite starting from different sides.

This polyline is processed with a stack-based algorithm that generates separate closed rings for the outer boundary and each of the inner boundaries, collapsing the “bridges” that were generated. In order to do this, the polyline is cut at the places where more than two edges join, and these are joined in the correct order by keeping track of (yet) unclosed rings. When a new segment is being processed, it can be one of three options: one that completes a ring, one that is part of a bridge, or one that is not yet closed. Closed rings are stored and bridges are removed, while unclosed rings are saved in the stack until they can be popped to form a closed ring together with a new segment. This is shown in Figure 10.
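The idea of storing closed rings and erasing bridges with a stack can be illustrated with a simplified, vertex-based variant. This sketch is not the segment-based procedure of the paper: it assumes the polyline is given as a closed walk of hashable vertex ids in which every bridge is traversed once in each direction, and the name `split_rings` is hypothetical.

```python
def split_rings(walk):
    """Split a closed vertex walk into simple rings, dropping bridges.

    walk: list of vertex ids; the last vertex connects back to the first.
    Returns the rings in the order in which they are closed.
    """
    rings = []
    stack = []       # vertices of the chain that is not yet closed
    position = {}    # vertex -> index of that vertex in the stack
    for v in walk + walk[:1]:
        if v in position:
            # Revisiting a vertex closes everything pushed since then.
            start = position[v]
            ring = stack[start:]
            for u in ring[1:]:
                del position[u]
            del stack[start + 1:]
            if len(ring) >= 3:
                rings.append(ring)   # closed ring: store
            # Fewer than 3 vertices means a bridge walked back and
            # forth (degenerate): erase.
        else:
            position[v] = len(stack)
            stack.append(v)
    return rings


# Outer ring a-b-c-d with a hole e-f-g reached through the bridge a-e.
rings = split_rings(["a", "e", "f", "g", "e", "a", "b", "c", "d"])
```

In this example the hole `e, f, g` is closed first, the bridge `a, e` is discarded as degenerate, and the outer ring `a, b, c, d` is closed last, each keeping the orientation it had in the walk.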

Figure 10: Processing the polyline shown, starting from vertex a and moving towards polyline A, the following operations occur: (1) A: unclosed → push; (2) B: unclosed → push; (3) C: unclosed → push; (4) D: unclosed → push; (5) E: unclosed → push; (6) F: closed → store; (7) pop → EG: closed → store; (8) pop → DH: degenerate → erase; (9) pop → CI: closed → store; (10) pop → BJ: degenerate → erase; (11) pop → AK: closed → store.

4 Implementation and Experiments

4.1 Implementation

An implementation of the algorithms described in Section 3 was written in the C++ programming language, using external libraries for some functionality. The developed software is called pprepair and is open source and freely available at http://tudelft3d.github.com/pprepair. C++ was selected in order to have plenty of control with regards to low-level details and to achieve good performance, which makes it possible to compare it with existing solutions. The libraries used are the OGR Simple Features Library⁹ (which allows input and output from a large variety of data formats common in GIS) and CGAL¹⁰ (which has support for many robust spatial data structures and the operations based on them; we use its constrained triangulation module).

⁹ http://www.gdal.org/ogr
¹⁰ Computational Geometry Algorithms Library: http://www.cgal.org

4.2 Experiments with real-world planar partition datasets

Figure 11: Overview of the four datasets: (a) E41N27, (b) 4tiles, (c) 16tiles, (d) Mexico.

We have made experiments with four freely available real-world datasets, i.e. we have validated and automatically repaired them with our implementation. The overview of these datasets is shown in Figure 11 and their properties in Table 2. The datasets are the following:

E41N27: Corine 2000¹¹ tile E41N27, which contains a shifted polygon (by about 10 cm), creating many small gaps and overlaps in the dataset. The snapping threshold has been set at 1 m.

4tiles: Corine 2000 tiles E39N32, E39N33, E40N32 and E40N33, which are known to have long and thin overlapping regions (< 1 mm) with each other. The snapping threshold has been set at 1 cm.

16tiles: 16 adjacent Corine 2000 tiles: E39N30, E39N31, E39N32, E39N33, E40N30, E40N31, E40N32, E40N33, E41N30, E41N31, E41N32, E41N33, E42N30, E42N31, E42N32 and E42N33. Some have gaps between one another and some overlap, but they match within a few centimetres. The snapping threshold has been set at 10 cm.

Mexico: 1:1 000 000 scale land cover dataset from INEGI, consisting of over 26 000 polygons. Interestingly, it is mostly already valid (according to the Shapefile specification), but contains some very large polygons with tens of thousands of vertices.

¹¹ Corine is a land cover dataset for 32 European countries. It is freely available at http://www.eea.europa.eu/data-and-maps/data/corine-land-cover-2000-clc2000-seamless-vector-database

As a comparison, we have also tried to perform the same operations with other available software. While the capabilities for planar partition repair among the software tested vary considerably, with full topological repair only available in ArcGIS (using manual operations only), it is also important to consider how different repair implementations scale to large datasets. For this, a few performance tests were made with our implementation and three planar partition repair tools that perform this process using snapping and splitting. The testing methodology for each tool is as follows:

Table 2: Properties of the datasets used for the experiments.

           polygons    pts          pts largest polygon    avg pts per polygon
E41N27     14 969      496 303      26 740                 33.7
4tiles     4 984       365 702      16 961                 74.7
16tiles    63 868      6 622 133    95 112                 103.7
Mexico     26 866      4 181 354    117 736                155.6

ArcGIS: In ArcCatalog, a multiple feature dataset is created in a Geodatabase, set with XY resolution values equal to the snapping threshold. The features are imported into it, and the merge and dissolve operations are used to merge adjacent polygons with the same ID. Topology is then generated to check that the planar partition is valid. Everything is finally exported to a single Shapefile. The individual parts of the process are timed and the total is recorded. Memory usage is calculated as the difference between the just loaded ArcCatalog application and its maximum memory usage throughout the process.

FME: A reader is created for each input file; these serve as input to a Snapper transformer. Features with the same IDs are then dissolved, and finally they are output into a new Shapefile writer. The topology generator is used to be able to tell whether the result is a valid planar partition. Results are timed and the maximum memory allocation of the fme.exe process is recorded.

GRASS: Input files are imported with v.in.ogr, with all polygon cleaning operations performed and snapping set to the correct values. Boundaries between features with the same IDs are then dissolved using v.dissolve. Files are then exported with v.out.ogr. Times reported by GRASS are added together to give the total, while memory usage by the v.in.ogr.exe, v.dissolve.exe and v.out.ogr.exe processes is monitored and the maximum is recorded.

pprepair: Files are read and put into the triangulation, the triangulation is labelled, and the repair is performed with the longest boundary first (see Table 1), with ambiguous cases resolved with a random choice. Polygons are then extracted from the triangulation and output to a single Shapefile. The entire process is timed and the maximum amount of memory used is recorded.

Notice that we are somewhat comparing apples with oranges here, since our implementation is able to repair more cases than other methods, keeps topological consistency, and does not require finding out the appropriate threshold value (if it exists). More importantly, it is able to directly state whether the result is a planar partition, unlike the three other solutions (with these, snapping is performed, but that does not guarantee that the output is a valid planar partition). The results of the experiments are shown in Table 3.

We have made the experiments in order to have an idea of the processing time and the memory usage involved. Only cases that are acceptably solved by snapping and splitting have been considered, since there is no other comparable automated topological repair tool among those studied that would also work for more complex cases, such as those that require a snapping threshold too high to be practical (e.g. horizontal conflation of independently generated datasets [Yuan and Tao, 1999; Davis, 2003]).

Table 3: Planar partition repair comparison using large datasets.

           pprepair            ArcGIS              FME                 GRASS
           memory    time      memory    time      memory    time      memory    time
E41N27     145 MB    19s       145 MB    1m3s      158 MB    31s       59 MB     3m9s
4tiles     116 MB    17s       113 MB    37s       105 MB    31s       49 MB     53s
16tiles    1.45 GB   4m47s     crashes   –         636 MB    15m48s    crashes   –
Mexico     1.01 GB   3m31s     216 MB    >1d       264 MB    2m45s     408 MB    11m38s

All tests have been run 5 times on a machine just booted, and the results averaged to account for the small variations that occurred, except in the case where the execution was cancelled after one full day due to time limitations, or when it caused the program to crash. The hardware is a 2.66 GHz Core 2 Duo MacBook Pro with 4 GB of RAM. ArcGIS 9.3, FME 2010 SP1 and GRASS 6.4 were run in Windows 7, while pprepair was run in Mac OS X 10.7.1.

As Table 3 shows, our approach uses somewhat more memory than other solutions. This is explained by the extra (unconstrained) edges that are added to the input when triangulating it. It should however be noticed that both ArcGIS and GRASS crashed with the biggest dataset, 16tiles. Our implementation is the fastest of the four tested, being for instance around three times faster than FME for the biggest dataset. Only for the Mexico dataset is our implementation slower than FME. This is (probably) explained by the fact that its polygons already form a valid planar partition, and therefore very few snapping operations have been performed by FME: the planar graph of the input was basically built and then the polygons saved back to disk. We believe that ArcGIS and GRASS struggled with this dataset because it contains several very large polygons (with more than 10 000 vertices). Our implementation took 3m31s, but as Table 4 shows, it took CGAL 3m02s to simply triangulate the input edges. This table also demonstrates the efficiency of our approach: for the four datasets, around 85% of the time of our approach was used (by the CGAL library) to read the input from disk and triangulate the input edges.

Table 4: Timed steps of the planar partition repair procedure (m:ss), rounded down to the nearest second. The percentage is the triangulation time over the total time.

                 E41N27    4tiles    16tiles    Mexico
Triangulate      0:17      0:15      4:00       3:02
Label            0:01      0:01      0:27       0:16
Repair           0:00      0:00      0:00       0:00
Reconstruct      0:01      0:01      0:11       0:07
Output           0:00      0:00      0:10       0:06
Total            0:19      0:17      4:47       3:31
% triangulate    89%       88%       83%        86%
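The percentage row of Table 4 can be checked with a few lines of arithmetic. The sketch below assumes the table entries are minutes and seconds (m:ss) and uses only the Triangulate and Total rows as read from the table; the helper name `seconds` is hypothetical. Flooring the ratio of the rounded times happens to reproduce the published percentages, although the authors may have computed them from unrounded timings.

```python
def seconds(mss):
    """Convert an 'm:ss' string to a number of seconds."""
    minutes, secs = mss.split(":")
    return 60 * int(minutes) + int(secs)


# Values as read from Table 4 (assumed to be in m:ss).
triangulate = {"E41N27": "0:17", "4tiles": "0:15", "16tiles": "4:00", "Mexico": "3:02"}
total = {"E41N27": "0:19", "4tiles": "0:17", "16tiles": "4:47", "Mexico": "3:31"}

# Integer percentage of the total time spent triangulating, rounded down.
share = {
    name: 100 * seconds(triangulate[name]) // seconds(total[name])
    for name in total
}
mean_share = sum(share.values()) / len(share)
```

The four shares average to 86.5, in line with the "around 85%" quoted in the text.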

During the implementation, we took several engineering decisions to optimise our program. One of them was to favour disk space over computation time, as we wanted to be able to process large datasets. This is why we always reconstruct polygons, even if they are not modified by the repair process. Although skipping the unmodified polygons is possible, it would require us to keep in memory the original polygons (which would significantly increase the memory consumption) and to keep track of which labels have been modified. And as Table 4 shows, the reconstruction is very efficient, as it only takes around 3–4% of the total time for the 16tiles and Mexico datasets.

5 Discussion and Conclusions

We have presented a new method for repairing polygonal area partitions, ensuring that the output partition is valid, i.e. all individual polygons conform to the ISO 19107 [ISO, 2003] and Simple Features [OGC, 2006] specifications, and no gaps or overlaps are present between any pair of polygons. The novelty of our method lies in the fact that repair is performed according to user-defined criteria, but then takes place without any human intervention. Automatic repair is becoming an increasingly important topic due to data integration, e.g. data collected for the different themes of the European INSPIRE Initiative will finally have to fit together, and data could eventually be matched fully automatically by dedicated web service components [INSPIRE, 2009]. Our approach could be at the heart of such a web service.

The proposed approach excels in automated repair, at the cost of increased memory usage compared to a pure graph-based approach; this difference is mainly caused by the unconstrained edges introduced by the triangulation. While it would be possible to use our repair rules together with a (primal/dual) graph-based approach, these additional edges in the triangulation give fine-grained control over the repair operations and ensure that the graph is connected, which facilitates the reconstruction of polygons.

We have implemented our algorithm over a numerically robust triangulator (CGAL), and since repair operations are expressed solely in terms of re-labelling of triangles (no geometric computation is involved), the approach is also fully robust. Since during our experiments most of the time was used to compute the constrained triangulation, another library could also be tested to improve the implementation.

For the future, we plan to:

• Improve the scalability of the approach, and process and repair datasets with more than 10 million polygons. It is known that, using divide-and-conquer techniques, triangulation algorithms can handle big datasets [Amenta et al., 2003; Blandford et al., 2005]. We will investigate whether it is possible to automatically repair each divided part individually and ‘glue’ the repaired parts together.

• Investigate snap rounding [Hobby, 1999; de Berg et al., 2007] as a preprocessing step (or embedded directly in the triangulation) to guarantee that repaired planar partitions have no vertices that are closer than a certain ε threshold. However, snap rounding may change the topology of the input, but the output will nevertheless be a valid partition (as the topology will be repaired). Apart from topological changes, snap rounding can also lead to removed polygon parts that are too small to be preserved based on the chosen ε.

• Add more advanced repair operations to our repair toolkit, e.g. repair could take place based on splitting a collection of triangles.

• Extend our work to include the third dimension, to validate and repair 3D city models using a constrained tetrahedralisation [Si, 2008]. Notice that the tetrahedralisation of a given polyhedron does not always exist, and thus extra (Steiner) points might need to be added. The main concepts of our approach, (re)labelling and reconstruction, extend naturally to 3D. However, appropriate repair operations for 3D city models would need to be defined, and some application-specific constraints (e.g. right angles at corners) are not trivial to implement in our current approach.

References

Nina Amenta, Sunghee Choi, and Günter Rote. Incremental constructions con BRIO. In Proceedings of the 19th Annual Symposium on Computational Geometry, pages 211–219. ACM Press, 2003.

Wael M. Badawy and Walid G. Aref. On local heuristics to speed up polygon-polygon intersection tests. In Proceedings of the 7th ACM International Symposium on Advances in Geographic Information Systems, pages 97–102. ACM, 1999.

M. Bader and R. Weibel. Detecting and resolving size and proximity conflicts in the generalization of polygonal maps. In Proceedings of the 18th International Cartographic Conference, Stockholm, pages 1525–1532, 1997.

Daniel K. Blandford, Guy E. Blelloch, David E. Cardoze, and Clemens Kadow. Compact representations of simplicial meshes in two and three dimensions. International Journal of Computational Geometry and Applications, 15(1):3–24, 2005.

CGAL. CGAL 3.8 User and Reference Manual. CGAL Editorial Board, 2011.

Kenneth L. Clarkson, Kurt Mehlhorn, and Raimund Seidel. Four results on randomized incremental constructions. In Alain Finkel and Matthias Jantzen, editors, Proceedings of the 9th Annual Symposium on Theoretical Aspects of Computer Science, volume 577 of Lecture Notes in Computer Science, pages 461–474. Springer Berlin Heidelberg, 1992.

Martin Davis. Java conflation suite. Technical report, Vivid Solutions, 2003. Available at http://www.vividsolutions.com/jcs

Mark de Berg, Dan Halperin, and Mark Overmars. An intersection-sensitive algorithm for snap rounding. Computational Geometry, 36(3):159–165, 2007. ISSN 0925-7721.

Mark de Berg, Marc van Kreveld, Mark Overmars, and Otfried Schwarzkopf. Computational Geometry: Algorithms and Applications. Springer-Verlag, 3rd edition, 2008.

ESRI. Shapefile technical description. White paper, ESRI, July 1998.

Michael A. Facello. Implementation of a randomized algorithm for Delaunay and regular triangulations in three dimensions. Computer Aided Geometric Design, 12(4):349–370, 1995.

GTS. GTS Library Reference Manual, 2006. URL http://gts.sourceforge.net/reference/book1.html

Leonidas Guibas and Jorge Stolfi. Primitives for the manipulation of general subdivisions and the computation of Voronoi. ACM Transactions on Graphics, 4(2):74–123, 1985.

John D. Hobby. Practical segment intersection with finite precision output. Computational Geometry, 13(4):199–214, 1999. ISSN 0925-7721.

Christoph M. Hoffmann, John E. Hopcroft, and Michael S. Karasick. Towards implementing robust geometric computations. In Proceedings of the 4th Annual Symposium on Computational Geometry, pages 106–117. ACM, 1988.

INSPIRE. D2.5: Generic conceptual model, version 3.2. Technical report, Drafting Team “Data Specifications”, 2009. URL http://inspire.jrc.ec.europa.eu/documents/Data_Specifications/D2.5_v3.2.pdf

ISO. 19107: Geographic information – spatial schema, March 2003.

David Kirkpatrick, Jack Snoeyink, and Bettina Speckmann. Kinetic collision detection for simple polygons. International Journal of Computational Geometry and Applications, 12(1–2):3–27, 2002.

Robert Laurini and Françoise Milleret-Raffort. Topological reorganization of inconsistent geographical databases: A step towards their certification. Computers & Graphics, 18(6):803–813, December 1994.

Hugo Ledoux and Martijn Meijers. Validation of planar partitions using constrained triangulations. In Proceedings Joint International Conference on Theory, Data Handling and Modelling in GeoSpatial Information Science, pages 51–55, Hong Kong, May 2010.

Hugo Ledoux, Ken Arroyo Ohori, and Martijn Meijers. Automatically repairing invalid polygons with a constrained triangulation. In Proceedings of the 15th AGILE International Conference on Geographic Information Science, 2012.

Yuanxin Liu and Jack Snoeyink. Faraway point: A sentinel point for Delaunay computation. International Journal of Computational Geometry and Applications, 18(4):343–355, November 2006.

Avraham Margalit and Gary D. Knott. An algorithm for computing the union, intersection or difference of two polygons. Computers & Graphics, 13(2):167–183, 1989.

Ernst P. Mücke, Isaac Saias, and Binhai Zhu. Fast randomized point location without preprocessing in two- and three-dimensional Delaunay triangulations. Computational Geometry: Theory and Applications, 12:63–83, 1999.

OGC. OpenGIS Implementation Specification for Geographic Information - Simple Feature Access - Part 1: Common Architecture. 1.2.0 edition, October 2006.

Lutz Plümer and Gerhard Gröger. Achieving integrity in geographic information systems - maps and nested maps. GeoInformatica, 1(4):345–367, 1997.

M. Rivero and F. R. Feito. Boolean operations on general planar polygons. Computers & Graphics, 24(6):881–896, December 2000.

Stefan Schirra. Precision and Robustness in Geometric Computations, volume Algorithmic Foundations of Geographic Information Systems of Lecture Notes in Computer Science, chapter 9, pages 255–287. Springer Berlin Heidelberg, 1997.

Michael Ian Shamos and Dan Hoey. Geometric intersection problems. In FOCS, pages 208–215. IEEE, 1976.

Jonathan Richard Shewchuk. Delaunay Refinement Mesh Generation. PhD thesis, School of Computer Science, Carnegie Mellon University, Pittsburgh, USA, 1997a.

Jonathan Richard Shewchuk. Adaptive precision floating-point arithmetic and fast robust geometric predicates. Discrete & Computational Geometry, 18:305–363, 1997b.

Hang Si. Three Dimensional Boundary Conforming Delaunay Mesh Generation. PhD thesis, Technische Universität Berlin, 2008.

C. Dana Tomlin. Map algebra: One perspective. Landscape and Urban Planning, 30(1–2):3–12, October 1994.

Peter van Oosterom, Wilko Quak, and Theo Tijssen. About invalid, valid and clean polygons. In Peter F. Fisher, editor, Developments in Spatial Data Handling - 11th International Symposium on Spatial Data Handling, pages 1–16. Springer, 2004.

Jan W. van Roessel. A new approach to plane-sweep overlay: Topological structuring and line-segment classification. Cartography and Geographic Information Science, 18(1):49–67, January 1991.

Shuxin Yuan and Chuang Tao. Development of conflation components. Geoinformatics and Socioinformatics, pages 1–13, June 1999.

  • Introduction
  • Related Work
    • Simple Features and validity of simple polygons
    • Validation of a Planar Partition Using a Planar Graph
    • Repair Using Point and Edge Snapping and Splitting
    • Repair Using Topological Information
      • Validation and Automatic Repair Using a Constrained Triangulation
        • Triangulation of a polygon and constrained triangulation
        • Labelling the triangles of a Planar Partition
        • Validation
        • Repair Operations
        • Extraction of Polygons From a Triangulation
          • Implementation and Experiments
            • Implementation
            • Experiments with real-world planar partition datasets
              • Discussion and Conclusions
Page 11: ValidationandAutomaticRepairofPlanar ... · mightbesnappedtogether,creatingre-peatedvertices,disjointregions,orvar-ioustopologicalproblems. 4.None of the vertices of a polygon shouldbecloserthanthisthreshold

AB

CD

FE

123

45

67 8 9

1011 12

1314 15 16 17

1819

20

21

22

23

24

25

2627 28

29

3130G

H IJ

K

L

MN

O

Figure 9 The traversal order when navigating through triangles in a clockwise mannerStarting at 119860 and the edge between 119860 and 119861 the operations occur accordingto the encircled numbers Black arrows denote when an unprocessed triangle isfound red arrowswhen it is not Notice how the traversal is performed clockwise(shown in dark blue arrows) for both cases despite starting from different sides

formation of whether a section of it is partof an inner or an outer boundary

This polyline is processed with a stack-based algorithm that generates separateclosed rings for the outer boundary andeach of the inner boundaries collapsing theldquobridgesrdquo that were generated In order todo this the polyline is cut at the placeswhere more than two edges join and theseare joined in the correct order by keepingtrack of (yet) unclosed rings When a newsegment is being processed it can be oneof three options one that completes a ringone that is part of a bridge or one thatis not yet closed Closed rings are storedand bridges are removed while unclosedrings are saved in the stack until they can bepopped to forma closed ring togetherwith anew segment This is shown in Figure 10

4 Implementation andExperiments

41 Implementation

An implementation of the algorithms de-scribed in Section 3 was written in the C++

programming language using external li-braries for some functionality The de-veloped software is called pprepair and isopen source and freely available at httptudelft3dgithubcompprepair C++was selected in order to have plenty of con-trol with regards to low level details andto achieve good performance which makesit possible to compare it with existing so-lutions The libraries used are the OGRSimple Features Library9 (which allows in-put and output from a large variety ofdata formats common in GIS) and CGAL10(which has support for many robust spatialdata structures and the operations based onthem we use its constrained triangulationmodule)

42 Experiments with real-worldplanar partition datasets

We havemade experiments with four freelyavailable real-world datasets ie we havevalidated and automatically repaired themwith our implementation the overview ofthese datasets is shown in Figure 11 and9httpwwwgdalorgogr10Computational Geometry Algorithms Library

httpwwwcgalorg

11

AB

a

b

CD

EFG

HIJK c

de

f

Figure 10 Processing the polyline shown starting from vertex 119886 and moving to-wards polyline 119860 the following operations occur (1) 119860 unclosedrarrpush(2) 119861 unclosedrarrpush (3) 119862 unclosedrarrpush (4) 119863 unclosedrarrpush (5)119864 unclosedrarrpush (6) 119865 closedrarrstore (7) poprarr 119864119866 closedrarrstore (8)poprarr 119863119867 degeneraterarrerase (9) poprarr 119862119868 closedrarrstore (10) poprarr 119861119869degeneraterarrerase (11) poprarr 119860119870 closedrarrstore

(a) E41N27 (b) 4tiles

(c) 16tiles (d) Mexico

Figure 11 Overview of the four datasets

their properties in Table 2 The datasets arethe following

E41N27: Corine 2000¹¹ tile E41N27, which contains a shifted polygon (by about 10 cm), creating many small gaps and overlaps in the dataset. The snapping threshold has been set at 1 m.

4tiles: Corine 2000 tiles E39N32, E39N33, E40N32 and E40N33, which are known to have long and thin overlapping regions (< 1 mm) with each other. The snapping threshold has been set at 1 cm.

16tiles: 16 adjacent Corine 2000 tiles (E39N30, E39N31, E39N32, E39N33, E40N30, E40N31, E40N32, E40N33, E41N30, E41N31, E41N32, E41N33, E42N30, E42N31, E42N32, E42N33). Some have gaps between one another, some overlap, but all match within a few centimetres. The snapping threshold has been set at 10 cm.

Mexico: 1:1 000 000 scale land cover dataset from INEGI, consisting of over 26 000 polygons. Interestingly, it is mostly already valid (according to the Shapefile specification), but contains some very large polygons with tens of thousands of vertices.

¹¹ Corine is a land cover dataset for 32 European countries. It is freely available at http://www.eea.europa.eu/data-and-maps/data/corine-land-cover-2000-clc2000-seamless-vector-database

Table 2: Properties of the datasets used for the experiments.

            polygons        pts    pts in largest polygon    avg pts per polygon
E41N27        14 969    496 303                    26 740                   33.7
4tiles         4 984    365 702                    16 961                   74.7
16tiles       63 868  6 622 133                    95 112                  103.7
Mexico        26 866  4 181 354                   117 736                  155.6

As a comparison, we have also tried to perform the same operations with other available software. While the capabilities for planar partition repair among the software tested vary considerably, with full topological repair only available in ArcGIS (using manual operations only), it is also important to consider how different repair implementations scale to large datasets. For this, a few performance tests were made in our implementation and three planar partition repair tools that perform this process using snapping and splitting. The testing methodology for each tool is as follows:

ArcGIS: In ArcCatalog, a multiple feature dataset is created in a Geodatabase, set with XY resolution values equal to the snapping threshold. The features are imported into it, and the merge and dissolve operations are used to merge adjacent polygons with the same ID. Topology is then generated to check that the planar partition is valid. Everything is finally exported to a single Shapefile. The individual parts of the process are timed and the total is recorded. Memory usage is calculated as the difference between the just-loaded ArcCatalog application and its maximum memory usage throughout the process.

FME: A reader is created for each input file; these serve as input to a Snapper transformer, features with the same IDs are then dissolved, and finally they are output into a new Shapefile writer. The topology generator is used to be able to tell whether the result is a valid planar partition. Results are timed and the maximum memory allocation of the fme.exe process is recorded.

GRASS: Input files are imported with v.in.ogr, with all polygon cleaning operations performed and snapping set to the correct values. Boundaries between features with the same IDs are then dissolved using v.dissolve. Files are then exported with v.out.ogr. Times reported by GRASS are added together to give the total, while memory usage by v.in.ogr.exe, v.dissolve.exe and v.out.ogr.exe is monitored and the maximum is recorded.

pprepair: Files are read and put into the triangulation, the triangulation is labelled, and the repair is performed with the longest boundary first (see Table 1), with ambiguous cases resolved with a random choice. Polygons are then extracted from the triangulation and output to a single Shapefile. The entire process is timed and the maximum amount of memory used is recorded.
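As a rough illustration of the "longest boundary" criterion with a random choice for ambiguous cases, the following minimal Python sketch picks a label for one invalid region (a gap or an overlap) from the length of boundary it shares with each neighbouring polygon. The data layout (a list of label and length pairs) and the function name are assumptions made for this illustration; they are not taken from the pprepair source code.

```python
import random
from collections import defaultdict

def pick_label_longest_boundary(shared_edges, rng=random):
    """Return the neighbour label sharing the longest total boundary.

    shared_edges: iterable of (label, length) pairs, one per boundary edge
    between the invalid region and a neighbouring, validly labelled polygon
    (hypothetical layout). Returns None if there is no labelled neighbour.
    """
    totals = defaultdict(float)
    for label, length in shared_edges:
        totals[label] += length
    if not totals:
        return None
    longest = max(totals.values())
    # Ambiguous case: several labels share the same longest boundary,
    # so one of them is chosen at random.
    candidates = sorted(l for l, t in totals.items() if t == longest)
    return rng.choice(candidates)

# A gap bordered by polygon "A" along two edges and by "B" along one edge.
gap_boundary = [("A", 2.0), ("B", 3.5), ("A", 2.5)]
chosen = pick_label_longest_boundary(gap_boundary)
print(chosen)  # A (total 4.5 versus 3.5 for B)
```

Passing a seeded `random.Random` instance as `rng` makes the tie-breaking reproducible, which can be convenient when comparing runs.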

Notice that we are somewhat comparing apples with oranges here, since our implementation is able to repair more cases than other methods, keeps topological consistency, and does not require finding out the appropriate threshold value (if it exists). More importantly, it is able to directly state whether the result is a planar partition, unlike the three other solutions (with these, snapping is performed, but that does not guarantee that the output is a valid planar partition). The results of the experiments are shown in Table 3.

We have made the experiments in order to have an idea of the processing time and the memory usage involved. Only cases that are acceptably solved by snapping and splitting have been considered, since there is no other comparable automated topological repair tool among those studied that would also work for more complex cases, such as those that require a snapping threshold too high to be practical (e.g. horizontal conflation of independently generated datasets [Yuan and Tao, 1999; Davis, 2003]).


Table 3: Planar partition repair comparison using large datasets.

           pprepair            ArcGIS              FME                 GRASS
           memory    time      memory    time      memory    time      memory    time
E41N27     145 MB    19s       145 MB    1m3s      158 MB    31s       59 MB     3m9s
4tiles     116 MB    17s       113 MB    37s       105 MB    31s       49 MB     53s
16tiles    1.45 GB   4m47s     crashes   –         636 MB    15m48s    crashes   –
Mexico     1.01 GB   3m31s     216 MB    >1d       264 MB    2m45s     408 MB    11m38s

All tests have been run 5 times on a machine just booted and the results averaged to account for the small variations that occurred, except in the case where the execution was cancelled after one full day due to time limitations, or when it caused the program to crash. The hardware is a 2.66 GHz Core 2 Duo MacBook Pro with 4 GB of RAM. ArcGIS 9.3, FME 2010 SP1 and GRASS 6.4 were run in Windows 7, while pprepair was run in Mac OS X 10.7.1.
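A repeated-run measurement of this kind can be scripted. The sketch below is only an illustration of averaging wall-clock time over five runs of an external command; it is not the harness used for the experiments, it does not measure memory, and the command shown is a placeholder that would have to be replaced by the actual tool invocation.

```python
import statistics
import subprocess
import sys
import time

def average_runtime(command, runs=5):
    """Run an external command `runs` times; return the mean wall-clock time in seconds."""
    timings = []
    for _ in range(runs):
        start = time.perf_counter()
        subprocess.run(command, check=True)  # raises if the tool fails
        timings.append(time.perf_counter() - start)
    return statistics.mean(timings)

# Placeholder command (a Python interpreter that exits immediately).
mean_seconds = average_runtime([sys.executable, "-c", "pass"], runs=5)
print(f"mean over 5 runs: {mean_seconds:.2f} s")
```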

As Table 3 shows, our approach uses somewhat more memory than other solutions. This is explained by the extra (unconstrained) edges that are added to the input when triangulating it. It should however be noticed that both ArcGIS and GRASS crashed with the biggest dataset, 16tiles. Our implementation is generally the fastest of the four tested, being for instance around three times faster than FME for the biggest dataset (16tiles). Only for the Mexico dataset is our implementation slower than FME. This is (probably) explained by the fact that its polygons already form a valid planar partition, and therefore very few snapping operations have been performed by FME: the planar graph of the input was basically built, and then the polygons saved back to disk. We believe that ArcGIS and GRASS struggled with the dataset because it contains several very large polygons (with more than 10 000 vertices). Our implementation took 3m31s, but as Table 4 shows, it took CGAL 3m02s to simply triangulate the input edges. This table also demonstrates the efficiency of our approach: for the four datasets, around 85 % of the time of our approach was used (by the CGAL library) to read the input from disk and triangulate the input edges.

Table 4: Timed steps of the planar partition repair procedure (m:ss), rounded down to the nearest second. The percentage is the triangulation time over the total time.

                 E41N27   4tiles   16tiles   Mexico
Triangulate        0:17     0:15      4:00     3:02
Label              0:01     0:01      0:27     0:16
Repair             0:00     0:00      0:00     0:00
Reconstruct        0:01     0:01      0:11     0:07
Output             0:00     0:00      0:10     0:06
Total              0:19     0:17      4:47     3:31
% triangulate        89       88        83       86

During the implementation, we took several engineering decisions to optimise our program. One of them was to favour disk space over computation time, as we wanted to be able to process large datasets. This is why we always reconstruct polygons, even if they are not modified by the repair process. Although reusing the unmodified polygons would be possible, it would require us to keep the original polygons in memory (which would significantly increase the memory consumption) and to keep track of which labels have been modified. And as Table 4 shows, the reconstruction is very efficient, as it only takes around 3–4 % of the total time for the 16tiles and Mexico datasets.
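The percentages quoted from Table 4 can be re-derived from its timings. The short Python sketch below does this; the dictionary layout and helper names are illustrative only, and because the published timings are rounded down to whole seconds, the computed shares are approximate.

```python
def to_seconds(mss):
    """Convert an 'm:ss' string to seconds."""
    minutes, seconds = mss.split(":")
    return int(minutes) * 60 + int(seconds)

# Timings as given in Table 4 (m:ss, rounded down to the nearest second).
steps = {
    "E41N27":  {"triangulate": "0:17", "reconstruct": "0:01", "total": "0:19"},
    "4tiles":  {"triangulate": "0:15", "reconstruct": "0:01", "total": "0:17"},
    "16tiles": {"triangulate": "4:00", "reconstruct": "0:11", "total": "4:47"},
    "Mexico":  {"triangulate": "3:02", "reconstruct": "0:07", "total": "3:31"},
}

def share(dataset, step):
    """Percentage of the total time spent in one step."""
    t = steps[dataset]
    return 100.0 * to_seconds(t[step]) / to_seconds(t["total"])

for name in steps:
    print(f"{name}: triangulate {int(share(name, 'triangulate'))} %, "
          f"reconstruct {share(name, 'reconstruct'):.1f} %")
```

Truncated to whole percentages, the triangulation shares come out as 89, 88, 83 and 86 %, and the reconstruction shares for 16tiles and Mexico are about 3.8 % and 3.3 %, in line with the figures given in the text.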


5 Discussion and Conclusions

We have presented a new method for repairing polygonal area partitions, ensuring that the output partition is valid, i.e. all individual polygons conform to the ISO 19107 [ISO, 2003] and Simple Features [OGC, 2006] specifications, and no gaps nor overlaps are present between any pair of polygons. The novelty of our method lies in the fact that repair is performed according to user-defined criteria, but then takes place without any human intervention. Automatic repair is becoming an increasingly important topic due to data integration, e.g. data collected for the different themes of the European INSPIRE Initiative will finally have to fit together, and data could eventually be matched fully automatically by dedicated web service components [INSPIRE, 2009]. Our approach could be at the heart of such a web service.

The proposed approach excels in automated repair at the cost of increased memory usage compared to a pure graph-based approach; this difference is mainly caused by the unconstrained edges introduced by the triangulation. While it would be possible to use our repair rules together with a (primal/dual) graph-based approach, these additional edges in the triangulation give fine-grained control over the repair operations and ensure that the graph is connected, which facilitates the reconstruction of polygons.

We have implemented our algorithm over a numerically robust triangulator (CGAL), and since repair operations are expressed solely in terms of re-labelling of triangles (no geometric computation is involved), the approach is also fully robust. Since during our experiments most of the time was used to compute the constrained triangulation, another library could also be tested to improve the implementation.
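To illustrate why validation and repair reduce to bookkeeping on labels, here is a minimal Python sketch. It assumes that each triangle of the constrained triangulation carries the set of polygon IDs covering it, and that triangles outside the partition have been excluded or given a dedicated label beforehand. The triangle identifiers, the dictionary layout and the function names are hypothetical and do not mirror the CGAL or pprepair data structures.

```python
def classify(triangle_labels):
    """Split triangles into gaps (no label) and overlaps (several labels)."""
    gaps = [t for t, labels in triangle_labels.items() if len(labels) == 0]
    overlaps = [t for t, labels in triangle_labels.items() if len(labels) > 1]
    return gaps, overlaps

def is_valid_partition(triangle_labels):
    """Valid when every triangle carries exactly one label."""
    gaps, overlaps = classify(triangle_labels)
    return not gaps and not overlaps

def relabel(triangle_labels, triangle, label):
    """Repair one triangle by assigning it a single label (no geometry involved)."""
    triangle_labels[triangle] = {label}

labels = {"t0": {"A"}, "t1": set(), "t2": {"A", "B"}, "t3": {"B"}}
print(classify(labels))            # (['t1'], ['t2'])
relabel(labels, "t1", "A")         # fill the gap
relabel(labels, "t2", "B")         # resolve the overlap
print(is_valid_partition(labels))  # True
```

Because the repair step only rewrites entries of the label map, no new coordinates are computed, which is the property the robustness argument relies on.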

For the future, we plan to:

• Improve the scalability of the approach, and process and repair datasets with more than 10 million polygons. It is known that, using divide-and-conquer techniques, triangulation algorithms can handle big datasets [Amenta et al., 2003; Blandford et al., 2005]. We will investigate whether it is possible to automatically repair each divided part individually and 'glue' the repaired parts together.

• Investigate snap rounding [Hobby, 1999; de Berg et al., 2007] as a pre-processing step, or embedded directly in the triangulation, to guarantee that repaired planar partitions have no vertices that are closer than a certain ε threshold. However, snap rounding may change the topology of the input, but the output will nevertheless be a valid partition (as the topology will be repaired). Apart from topological changes, snap rounding can also lead to the removal of polygon parts that are too small to be preserved, based on the chosen ε.

• Add more advanced repair operations to our repair toolkit, e.g. repair could take place based on splitting a collection of triangles.

• Extend our work to include the third dimension, to validate and repair 3D city models using a constrained tetrahedralisation [Si, 2008]. Notice that the tetrahedralisation of a given polyhedron does not always exist, and thus extra (Steiner) points might need to be added. The main concepts of our approach, (re)labelling and reconstruction, extend naturally to 3D. However, appropriate repair operations for 3D city models would need to be defined, and some application-specific constraints (e.g. right angles at corners) are not trivial to implement in our current approach.
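As a rough illustration of how rounding to an ε grid can remove polygon parts that are too small, the following Python sketch rounds the vertices of a single ring and drops the ring if it collapses. This is plain coordinate rounding, a deliberate simplification and not the snap rounding algorithms of Hobby [1999] or de Berg et al. [2007], which also deal with segments passing close to other vertices. The function name and the ring representation are assumptions.

```python
def round_ring(ring, eps):
    """Round ring vertices to a grid of spacing eps.

    Returns the rounded ring (without a repeated closing vertex), or None if
    the ring collapses to zero area. Self-intersections are not detected.
    """
    cells = []
    for x, y in ring:
        c = (round(x / eps), round(y / eps))  # integer grid indices
        if not cells or cells[-1] != c:       # drop repeated vertices
            cells.append(c)
    if len(cells) > 1 and cells[0] == cells[-1]:
        cells.pop()                           # ring is implicitly closed
    # Twice the signed area (shoelace formula), exact in integer arithmetic.
    area2 = sum(x1 * y2 - x2 * y1
                for (x1, y1), (x2, y2) in zip(cells, cells[1:] + cells[:1]))
    if len(cells) < 3 or area2 == 0:
        return None
    return [(cx * eps, cy * eps) for cx, cy in cells]

square = [(0.1, 0.1), (10.2, 0.0), (10.0, 9.9), (0.0, 10.1)]
sliver = [(0.0, 0.0), (5.0, 0.2), (10.0, 0.0)]
print(round_ring(square, 1.0))  # [(0.0, 0.0), (10.0, 0.0), (10.0, 10.0), (0.0, 10.0)]
print(round_ring(sliver, 1.0))  # None: the sliver is thinner than the grid
```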

References

Nina Amenta, Sunghee Choi, and Günter Rote. Incremental constructions con BRIO. In Proceedings of the 19th Annual Symposium on Computational Geometry, pages 211–219. ACM Press, 2003.

Wael M. Badawy and Walid G. Aref. On local heuristics to speed up polygon-polygon intersection tests. In Proceedings of the 7th ACM International Symposium on Advances in Geographic Information Systems, pages 97–102. ACM, 1999.

M. Bader and R. Weibel. Detecting and resolving size and proximity conflicts in the generalization of polygonal maps. In Proceedings of the 18th International Cartographic Conference, Stockholm, pages 1525–1532, 1997.

Daniel K. Blandford, Guy E. Blelloch, David E. Cardoze, and Clemens Kadow. Compact representations of simplicial meshes in two and three dimensions. International Journal of Computational Geometry and Applications, 15(1):3–24, 2005.

CGAL. CGAL 3.8 User and Reference Manual. CGAL Editorial Board, 2011.

Kenneth L. Clarkson, Kurt Mehlhorn, and Raimund Seidel. Four results on randomized incremental constructions. In Alain Finkel and Matthias Jantzen, editors, Proceedings of the 9th Annual Symposium on Theoretical Aspects of Computer Science, volume 577 of Lecture Notes in Computer Science, pages 461–474. Springer Berlin / Heidelberg, 1992.

Martin Davis. Java conflation suite. Technical report, Vivid Solutions, 2003. Available at http://www.vividsolutions.com/jcs.

Mark de Berg, Dan Halperin, and Mark Overmars. An intersection-sensitive algorithm for snap rounding. Computational Geometry, 36(3):159–165, 2007. ISSN 0925-7721.

Mark de Berg, Marc van Kreveld, Mark Overmars, and Otfried Schwarzkopf. Computational Geometry: Algorithms and Applications. Springer-Verlag, 3rd edition, 2008.

ESRI. Shapefile technical description. White paper, ESRI, July 1998.

Michael A. Facello. Implementation of a randomized algorithm for Delaunay and regular triangulations in three dimensions. Computer Aided Geometric Design, 12(4):349–370, 1995.

GTS. GTS Library Reference Manual, 2006. URL http://gts.sourceforge.net/reference/book1.html.

Leonidas Guibas and Jorge Stolfi. Primitives for the manipulation of general subdivisions and the computation of Voronoi. ACM Transactions on Graphics, 4(2):74–123, 1985.

John D. Hobby. Practical segment intersection with finite precision output. Computational Geometry, 13(4):199–214, 1999. ISSN 0925-7721.

Christoph M. Hoffmann, John E. Hopcroft, and Michael S. Karasick. Towards implementing robust geometric computations. In Proceedings of the 4th Annual Symposium on Computational Geometry, pages 106–117. ACM, 1988.

INSPIRE. D2.5: Generic conceptual model, version 3.2. Technical report, Drafting Team "Data Specifications", 2009. URL http://inspire.jrc.ec.europa.eu/documents/Data_Specifications/D2.5_v3.2.pdf.

ISO. 19107: Geographic information – spatial schema. March 2003.

David Kirkpatrick, Jack Snoeyink, and Bettina Speckmann. Kinetic collision detection for simple polygons. International Journal of Computational Geometry and Applications, 12(1–2):3–27, 2002.

Robert Laurini and Françoise Milleret-Raffort. Topological reorganization of inconsistent geographical databases: A step towards their certification. Computers & Graphics, 18(6):803–813, December 1994.

Hugo Ledoux and Martijn Meijers. Validation of planar partitions using constrained triangulations. In Proceedings Joint International Conference on Theory, Data Handling and Modelling in GeoSpatial Information Science, pages 51–55, Hong Kong, May 2010.

Hugo Ledoux, Ken Arroyo Ohori, and Martijn Meijers. Automatically repairing invalid polygons with a constrained triangulation. In Proceedings of the 15th AGILE International Conference on Geographic Information Science, 2012.

Yuanxin Liu and Jack Snoeyink. Faraway point: A sentinel point for Delaunay computation. International Journal of Computational Geometry and Applications, 18(4):343–355, November 2006.

Avraham Margalit and Gary D. Knott. An algorithm for computing the union, intersection or difference of two polygons. Computers & Graphics, 13(2):167–183, 1989.

Ernst P. Mücke, Isaac Saias, and Binhai Zhu. Fast randomized point location without preprocessing in two- and three-dimensional Delaunay triangulations. Computational Geometry – Theory and Applications, 12:63–83, 1999.

OGC. OpenGIS Implementation Specification for Geographic Information - Simple Feature Access - Part 1: Common Architecture, 1.2.0 edition, October 2006.

Lutz Plümer and Gerhard Gröger. Achieving integrity in geographic information systems – maps and nested maps. GeoInformatica, 1(4):345–367, 1997.

M. Rivero and F.R. Feito. Boolean operations on general planar polygons. Computers & Graphics, 24(6):881–896, December 2000.

Stefan Schirra. Precision and Robustness in Geometric Computations. In Algorithmic Foundations of Geographic Information Systems, Lecture Notes in Computer Science, chapter 9, pages 255–287. Springer Berlin / Heidelberg, 1997.

Michael Ian Shamos and Dan Hoey. Geometric intersection problems. In FOCS, pages 208–215. IEEE, 1976.

Jonathan Richard Shewchuk. Delaunay Refinement Mesh Generation. PhD thesis, School of Computer Science, Carnegie Mellon University, Pittsburgh, USA, 1997a.

Jonathan Richard Shewchuk. Adaptive precision floating-point arithmetic and fast robust geometric predicates. Discrete & Computational Geometry, 18:305–363, 1997b.

Hang Si. Three Dimensional Boundary Conforming Delaunay Mesh Generation. PhD thesis, Technische Universität Berlin, 2008.

C. Dana Tomlin. Map algebra: One perspective. Landscape and Urban Planning, 30(1–2):3–12, October 1994.

Peter van Oosterom, Wilko Quak, and Theo Tijssen. About invalid, valid and clean polygons. In Peter F. Fisher, editor, Developments in Spatial Data Handling – 11th International Symposium on Spatial Data Handling, pages 1–16. Springer, 2004.

Jan W. van Roessel. A new approach to plane-sweep overlay: Topological structuring and line-segment classification. Cartography and Geographic Information Science, 18(1):49–67, January 1991.

Shuxin Yuan and Chuang Tao. Development of conflation components. Geoinformatics and Socioinformatics, pages 1–13, June 1999.


Hang Si Three Dimensional Boundary Con-forming Delaunay Mesh Generation PhDthesis Technische Universitaumlt Berlin2008

C Dana Tomlin Map algebra One perspec-tive Landscape and Urban Planning 30(1ndash2)3ndash12 October 1994

Peter van OosteromWilko Quak and TheoTijssen About invalid valid and cleanpolygons In Peter F Fisher editor Devel-opments in Spatial Data Handlingmdash11th In-ternational Symposium on Spatial DataHan-dling pages 1ndash16 Springer 2004

Jan W van Roessel A new approach toplane-sweep overlay Topological struc-turing and line-segment classificationCartography and Geographic InformationScience 18(1)49ndash67 January 1991

Shuxin Yuan and Chuang Tao Develop-ment of conflation components Geoin-formatics and Socioinformatics pages 1ndash13June 1999

17


Table 2: Properties of the datasets used for the experiments.

           polygons        pts   pts largest polygon   avg pts per polygon
E41N27       14 969    496 303                26 740                  33.7
4tiles        4 984    365 702                16 961                  74.7
16tiles      63 868  6 622 133                95 112                 103.7
Mexico       26 866  4 181 354               117 736                 155.6

this, a few performance tests were made with our implementation and with three planar partition repair tools that perform this process using snapping and splitting. The testing methodology for each tool is as follows:

ArcGIS: In ArcCatalog, a multiple feature dataset is created in a Geodatabase, set with XY resolution values equal to the snapping threshold. The features are imported into it, and the merge and dissolve operations are used to merge adjacent polygons with the same ID. Topology is then generated to check that the planar partition is valid. Everything is finally exported to a single Shapefile. The individual parts of the process are timed and the total is recorded. Memory usage is calculated as the difference between the memory used by the just-loaded ArcCatalog application and its maximum memory usage throughout the process.

FME: A reader is created for each input file, and these readers serve as input to a Snapper transformer. Features with the same IDs are then dissolved, and finally they are output into a new Shapefile writer. The topology generator is used to be able to tell whether the result is a valid planar partition. Results are timed and the maximum memory allocation of the fme.exe process is recorded.

GRASS: Input files are imported with v.in.ogr, with all polygon cleaning operations performed and snapping set to the correct values. Boundaries between features with the same IDs are then dissolved using v.dissolve. Files are then exported with v.out.ogr. Times reported by GRASS are added together to give the total, while the memory usage of the v.in.ogr.exe, v.dissolve.exe and v.out.ogr.exe processes is monitored and their maximum is recorded.

pprepair: Files are read and put into the triangulation, the triangulation is labelled, and the repair is performed with the longest boundary first (see Table 1), with ambiguous cases resolved by a random choice. Polygons are then extracted from the triangulation and output to a single Shapefile. The entire process is timed and the maximum amount of memory used is recorded.
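The "longest boundary" criterion with a random tie-break, as used for pprepair above, can be illustrated with a minimal sketch. It assumes a hypothetical data layout in which an invalid region is described by the edges on its outline, each given as a pair (label of the neighbouring polygon, edge length); the function name and this layout are illustrative assumptions and not pprepair's actual data structures.

```python
import random
from collections import defaultdict


def pick_label_longest_boundary(boundary, rng=None):
    """Choose the label sharing the longest total boundary with a region.

    boundary: iterable of (neighbour_label, edge_length) pairs (hypothetical
    layout). Edges whose neighbour has no label (None) are ignored.
    Returns None when no labelled neighbour exists.
    """
    rng = rng or random.Random()
    totals = defaultdict(float)
    for label, length in boundary:
        if label is not None:
            totals[label] += length
    if not totals:
        return None
    longest = max(totals.values())
    # Several labels may tie for the longest boundary: choose one at random.
    candidates = sorted(l for l, t in totals.items() if t == longest)
    return rng.choice(candidates)


# A region bordered by polygon "A" along 3.0 units and by "B" along 1.5 units.
region = [("A", 2.0), ("B", 1.5), ("A", 1.0), (None, 10.0)]
print(pick_label_longest_boundary(region))
```

Passing a seeded `random.Random` makes the tie-break reproducible, which may be convenient when repeating timing runs on the same input.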

Notice that we are somewhat comparing apples with oranges here, since our implementation is able to repair more cases than other methods, keeps topological consistency, and does not require finding out the appropriate threshold value (if it exists). More importantly, it is able to directly state whether the result is a planar partition, unlike the three other solutions (with these, snapping is performed, but that does not guarantee that the output is a valid planar partition). The results of the experiments are shown in Table 3.

We have made the experiments in order to have an idea of the processing time and the memory usage involved. Only cases that are acceptably solved by snapping and splitting have been considered, since there is no other comparable automated topological repair tool among those studied that would also work for more complex cases, such as those that require a snapping threshold too high to be practical (e.g. horizontal conflation of independently generated datasets [Yuan and Tao, 1999; Davis, 2003]).


Table 3: Planar partition repair comparison using large datasets.

              pprepair            ArcGIS              FME               GRASS
           memory    time     memory    time     memory    time     memory    time
E41N27     145 MB     19s     145 MB    1m3s     158 MB     31s      59 MB    3m9s
4tiles     116 MB     17s     113 MB     37s     105 MB     31s      49 MB     53s
16tiles   1.45 GB   4m47s    crashes       –     636 MB  15m48s    crashes       –
Mexico    1.01 GB   3m31s     216 MB     >1d     264 MB   2m45s     408 MB  11m38s

All tests have been run 5 times on a machine just booted, and the results averaged to account for the small variations that occurred, except in the cases where the execution was cancelled after one full day due to time limitations or where it caused the program to crash. The hardware is a 2.66 GHz Core 2 Duo MacBook Pro with 4 GB of RAM. ArcGIS 9.3, FME 2010 SP1 and GRASS 6.4 were run in Windows 7, while pprepair was run in Mac OS X 10.7.1.

As Table 3 shows, our approach uses somewhat more memory than other solutions. This is explained by the extra (unconstrained) edges that are added to the input edges when triangulating them. It should however be noticed that both ArcGIS and GRASS crashed with the biggest dataset, 16tiles. Our implementation is the fastest of the four tested on all datasets except Mexico, being for instance around three times faster than FME for the biggest dataset. Only for the Mexico dataset is our implementation slower than FME. This is (probably) explained by the fact that its polygons already form a valid planar partition, and therefore very few snapping operations have been performed by FME: the planar graph of the input was basically built and then the polygons saved back to disk. We believe that ArcGIS and GRASS struggled with this dataset because it contains several very large polygons (with more than 10 000 vertices). Our implementation took 3m31s, but as Table 4 shows, it took CGAL 3m02s to simply triangulate the input edges. This table also demonstrates the efficiency of our approach: for the four datasets, around 85% of the time of our approach was used, by the CGAL library, to read the input from disk and triangulate the input edges.

Table 4: Timed steps of the planar partition repair procedure (m:ss), rounded down to the nearest second. The percentage is the triangulation time over the total time.

                E41N27   4tiles   16tiles   Mexico
Triangulate       0:17     0:15      4:00     3:02
Label             0:01     0:01      0:27     0:16
Repair            0:00     0:00      0:00     0:00
Reconstruct       0:01     0:01      0:11     0:07
Output            0:00     0:00      0:10     0:06
Total             0:19     0:17      4:47     3:31
% triangulate      89%      88%       83%      86%
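The percentages in Table 4 can be rechecked with a few lines of arithmetic. The sketch below assumes that the table entries are read as minutes:seconds (consistent with the 3m31s and 3m02s quoted in the text) and that the percentage is truncated rather than rounded; both are assumptions made for this illustration. Since the published times are themselves rounded down, the result is only an approximation of whatever unrounded values were originally used.

```python
def to_seconds(mss):
    """Convert an 'm:ss' string to a number of seconds."""
    minutes, seconds = mss.split(":")
    return int(minutes) * 60 + int(seconds)


# Triangulation and total times per dataset, read from Table 4 as m:ss.
triangulate = {"E41N27": "0:17", "4tiles": "0:15", "16tiles": "4:00", "Mexico": "3:02"}
total = {"E41N27": "0:19", "4tiles": "0:17", "16tiles": "4:47", "Mexico": "3:31"}

# Share of the total time spent triangulating, truncated to a whole percent.
share = {
    name: 100 * to_seconds(triangulate[name]) // to_seconds(total[name])
    for name in total
}

for name, pct in share.items():
    print(f"{name}: {pct}%")
```

Under these assumptions the four shares are 89, 88, 83 and 86 percent, which average to 86.5 and agree with the "around 85%" stated in the text.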

During the implementation, we took several engineering decisions to optimise our program. One of them was to favour disk space over computation time, as we wanted to be able to process large datasets. This is why we always reconstruct polygons, even if they are not modified by the repair process. Although skipping the reconstruction of unmodified polygons would be possible, it would require us to keep the original polygons in memory (which would significantly increase the memory consumption) and to keep track of which labels have been modified. And as Table 4 shows, the reconstruction is very efficient, as it only takes around 3–4% of the total time for the 16tiles and Mexico datasets.


5 Discussion and Conclusions

We have presented a new method for repairing polygonal area partitions, ensuring that the output partition is valid, i.e. all individual polygons conform to the ISO 19107 [ISO, 2003] and Simple Features [OGC, 2006] specifications, and no gaps or overlaps are present between any pair of polygons. The novelty of our method lies in the fact that repair is performed according to user-defined criteria, but then takes place without any human intervention. Automatic repair is becoming an increasingly important topic due to data integration: e.g. data collected for the different themes of the European INSPIRE Initiative will finally have to fit together, and data could eventually be matched fully automatically by dedicated web service components [INSPIRE, 2009]. Our approach could be at the heart of such a web service.
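The validity condition stated above (no gaps and no overlaps) translates into a very simple check once every triangle carries the labels of the polygons covering it. The following is a minimal sketch under assumed conventions: triangles are identified by integers, each maps to a set of polygon labels, and triangles lying outside the extent of the partition are either excluded or carry a dedicated label. The names and the dictionary layout are hypothetical and do not reflect the data structures of our implementation.

```python
def find_problems(triangle_labels):
    """Return (gaps, overlaps) as sorted lists of triangle ids.

    triangle_labels: dict mapping a triangle id to the set of polygon
    labels covering that triangle (hypothetical layout).
    A triangle with no label is a gap; one with several is an overlap.
    """
    gaps = sorted(t for t, labels in triangle_labels.items() if len(labels) == 0)
    overlaps = sorted(t for t, labels in triangle_labels.items() if len(labels) > 1)
    return gaps, overlaps


def is_valid_partition(triangle_labels):
    """A labelled triangulation is valid when each triangle has exactly one label."""
    gaps, overlaps = find_problems(triangle_labels)
    return not gaps and not overlaps


# Triangle 1 is covered by two polygons, triangle 2 by none.
before = {0: {"A"}, 1: {"A", "B"}, 2: set(), 3: {"B"}}
print(find_problems(before))

# Repair only changes labels; no coordinates are touched.
after = {0: {"A"}, 1: {"A"}, 2: {"B"}, 3: {"B"}}
print(is_valid_partition(after))
```

The second dictionary shows why a repaired result can be checked with the same test: relabelling leaves exactly one label per triangle.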

The proposed approach excels in automated repair, at the cost of increased memory usage compared to a pure graph-based approach; this difference is mainly caused by the unconstrained edges introduced by the triangulation. While it would be possible to use our repair rules together with a (primal/dual) graph-based approach, these additional edges in the triangulation give fine-grained control over the repair operations and ensure that the graph is connected, which facilitates the reconstruction of polygons.

We have implemented our algorithm over a numerically robust triangulator (CGAL), and since repair operations are expressed solely in terms of re-labelling of triangles (no geometric computation is involved), the approach is also fully robust. Since most of the time during our experiments was used to compute the constrained triangulation, another library could also be tested to improve the implementation.

For the future, we plan to:

• Improve the scalability of the approach, and process and repair datasets with more than 10 million polygons. It is known that, using divide-and-conquer techniques, triangulation algorithms can handle big datasets [Amenta et al., 2003; Blandford et al., 2005]. We will investigate whether it is possible to automatically repair each divided part individually and 'glue' the repaired parts together.

• Investigate snap rounding [Hobby, 1999; de Berg et al., 2007] as a preprocessing step, or embedded directly in the triangulation, to guarantee that repaired planar partitions have no vertices that are closer than a certain ε threshold. Snap rounding may change the topology of the input, but the output will nevertheless be a valid partition (as the topology will be repaired). Apart from topological changes, snap rounding can also lead to the removal of polygon parts that are too small to be preserved, based on the chosen ε.

• Add more advanced repair operations to our repair toolkit, e.g. repair could take place based on splitting a collection of triangles.

• Extend our work to include the third dimension, to validate and repair 3D city models using a constrained tetrahedralisation [Si, 2008]. Notice that the tetrahedralisation of a given polyhedron does not always exist, and thus extra (Steiner) points might need to be added. The main concepts of our approach, (re)labelling and reconstruction, extend naturally to 3D. However, appropriate repair operations for 3D city models would need to be defined, and some application-specific constraints (e.g. right angles at corners) are not trivial to implement in our current approach.
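To make the effect of an ε threshold in the snap rounding item above more concrete, the sketch below rounds the vertices of a single ring to a grid of spacing ε. This is plain coordinate rounding, a much cruder operation than snap rounding proper (it does not handle segments passing close to other vertices), and the function name and ring representation are assumptions made for this illustration. It shows the two consequences mentioned in the list: distinct output vertices are never closer than ε, and parts smaller than the grid collapse and disappear.

```python
def round_ring(ring, eps):
    """Round a ring's vertices to a grid of spacing eps.

    ring: list of (x, y) tuples without a repeated closing vertex.
    Returns the ring in integer grid coordinates (multiply by eps to get
    back to map units), or None if fewer than 3 distinct consecutive
    vertices remain. Collinearity of the survivors is not checked.
    """
    snapped = []
    for x, y in ring:
        p = (round(x / eps), round(y / eps))
        # Consecutive vertices falling in the same grid cell are merged.
        if not snapped or snapped[-1] != p:
            snapped.append(p)
    # The ring is cyclic, so the last vertex may coincide with the first.
    if len(snapped) > 1 and snapped[0] == snapped[-1]:
        snapped.pop()
    return snapped if len(snapped) >= 3 else None


square = [(0.1, 0.1), (2.9, 0.2), (3.1, 2.8), (0.2, 3.1)]
sliver = [(0.0, 0.0), (0.2, 0.1), (0.1, 0.3)]
print(round_ring(square, 1.0))
print(round_ring(sliver, 1.0))
```

With ε = 1.0, the square keeps its four corners on the grid while the sliver, being smaller than one grid cell, is removed entirely.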

References

Nina Amenta, Sunghee Choi, and Günter Rote. Incremental constructions con BRIO. In Proceedings of the 19th Annual Symposium on Computational Geometry, pages 211–219. ACM Press, 2003.

Wael M. Badawy and Walid G. Aref. On local heuristics to speed up polygon-polygon intersection tests. In Proceedings of the 7th ACM International Symposium on Advances in Geographic Information Systems, pages 97–102. ACM, 1999.

M. Bader and R. Weibel. Detecting and resolving size and proximity conflicts in the generalization of polygonal maps. In Proceedings of the 18th International Cartographic Conference, Stockholm, pages 1525–1532, 1997.

Daniel K. Blandford, Guy E. Blelloch, David E. Cardoze, and Clemens Kadow. Compact representations of simplicial meshes in two and three dimensions. International Journal of Computational Geometry and Applications, 15(1):3–24, 2005.

CGAL. CGAL 3.8 User and Reference Manual. CGAL Editorial Board, 2011.

Kenneth L. Clarkson, Kurt Mehlhorn, and Raimund Seidel. Four results on randomized incremental constructions. In Alain Finkel and Matthias Jantzen, editors, Proceedings of the 9th Annual Symposium on Theoretical Aspects of Computer Science, volume 577 of Lecture Notes in Computer Science, pages 461–474. Springer Berlin / Heidelberg, 1992.

Martin Davis. Java conflation suite. Technical report, Vivid Solutions, 2003. Available at http://www.vividsolutions.com/jcs.

Mark de Berg, Dan Halperin, and Mark Overmars. An intersection-sensitive algorithm for snap rounding. Computational Geometry, 36(3):159–165, 2007. ISSN 0925-7721.

Mark de Berg, Marc van Kreveld, Mark Overmars, and Otfried Schwarzkopf. Computational Geometry: Algorithms and Applications. Springer-Verlag, 3 edition, 2008.

ESRI. Shapefile technical description. White paper, ESRI, July 1998.

Michael A. Facello. Implementation of a randomized algorithm for Delaunay and regular triangulations in three dimensions. Computer Aided Geometric Design, 12(4):349–370, 1995.

GTS. GTS Library Reference Manual, 2006. URL http://gts.sourceforge.net/reference/book1.html.

Leonidas Guibas and Jorge Stolfi. Primitives for the manipulation of general subdivisions and the computation of Voronoi. ACM Transactions on Graphics, 4(2):74–123, 1985.

John D. Hobby. Practical segment intersection with finite precision output. Computational Geometry, 13(4):199–214, 1999. ISSN 0925-7721.

Christoph M. Hoffmann, John E. Hopcroft, and Michael S. Karasick. Towards implementing robust geometric computations. In Proceedings of the 4th Annual Symposium on Computational Geometry, pages 106–117. ACM, 1988.

INSPIRE. D2.5: Generic conceptual model, version 3.2. Technical report, Drafting Team “Data Specifications”, 2009. URL http://inspire.jrc.ec.europa.eu/documents/Data_Specifications/D2.5_v3.2.pdf.

ISO. 19107: Geographic information – Spatial schema, March 2003.

David Kirkpatrick, Jack Snoeyink, and Bettina Speckmann. Kinetic collision detection for simple polygons. International Journal of Computational Geometry and Applications, 12(1–2):3–27, 2002.

Robert Laurini and Françoise Milleret-Raffort. Topological reorganization of inconsistent geographical databases: A step towards their certification. Computers & Graphics, 18(6):803–813, December 1994.

Hugo Ledoux and Martijn Meijers. Validation of planar partitions using constrained triangulations. In Proceedings Joint International Conference on Theory, Data Handling and Modelling in GeoSpatial Information Science, pages 51–55, Hong Kong, May 2010.

Hugo Ledoux, Ken Arroyo Ohori, and Martijn Meijers. Automatically repairing invalid polygons with a constrained triangulation. In Proceedings of the 15th AGILE International Conference on Geographic Information Science, 2012.

Yuanxin Liu and Jack Snoeyink. Faraway point: A sentinel point for Delaunay computation. International Journal of Computational Geometry and Applications, 18(4):343–355, November 2006.

Avraham Margalit and Gary D. Knott. An algorithm for computing the union, intersection or difference of two polygons. Computers & Graphics, 13(2):167–183, 1989.

Ernst P. Mücke, Isaac Saias, and Binhai Zhu. Fast randomized point location without preprocessing in two- and three-dimensional Delaunay triangulations. Computational Geometry: Theory and Applications, 12:63–83, 1999.

OGC. OpenGIS Implementation Specification for Geographic Information - Simple Feature Access - Part 1: Common Architecture. 1.2.0 edition, October 2006.

Lutz Plümer and Gerhard Gröger. Achieving integrity in geographic information systems: maps and nested maps. GeoInformatica, 1(4):345–367, 1997.

M. Rivero and F.R. Feito. Boolean operations on general planar polygons. Computers & Graphics, 24(6):881–896, December 2000.

Stefan Schirra. Precision and Robustness in Geometric Computations, volume Algorithmic Foundations of Geographic Information Systems of Lecture Notes in Computer Science, chapter 9, pages 255–287. Springer Berlin / Heidelberg, 1997.

Michael Ian Shamos and Dan Hoey. Geometric intersection problems. In FOCS, pages 208–215. IEEE, 1976.

Jonathan Richard Shewchuk. Delaunay Refinement Mesh Generation. PhD thesis, School of Computer Science, Carnegie Mellon University, Pittsburgh, USA, 1997a.

Jonathan Richard Shewchuk. Adaptive precision floating-point arithmetic and fast robust geometric predicates. Discrete & Computational Geometry, 18:305–363, 1997b.

Hang Si. Three Dimensional Boundary Conforming Delaunay Mesh Generation. PhD thesis, Technische Universität Berlin, 2008.

C. Dana Tomlin. Map algebra: One perspective. Landscape and Urban Planning, 30(1–2):3–12, October 1994.

Peter van Oosterom, Wilko Quak, and Theo Tijssen. About invalid, valid and clean polygons. In Peter F. Fisher, editor, Developments in Spatial Data Handling: 11th International Symposium on Spatial Data Handling, pages 1–16. Springer, 2004.

Jan W. van Roessel. A new approach to plane-sweep overlay: Topological structuring and line-segment classification. Cartography and Geographic Information Science, 18(1):49–67, January 1991.

Shuxin Yuan and Chuang Tao. Development of conflation components. Geoinformatics and Socioinformatics, pages 1–13, June 1999.

  • Introduction
  • Related Work
    • Simple Features and validity of simple polygons
    • Validation of a Planar Partition Using a Planar Graph
    • Repair Using Point and Edge Snapping and Splitting
    • Repair Using Topological Information
      • Validation and Automatic Repair Using a Constrained Triangulation
        • Triangulation of a polygon and constrained triangulation
        • Labelling the triangles of a Planar Partition
        • Validation
        • Repair Operations
        • Extraction of Polygons From a Triangulation
          • Implementation and Experiments
            • Implementation
            • Experiments with real-world planar partition datasets
              • Discussion and Conclusions
Page 14: ValidationandAutomaticRepairofPlanar ... · mightbesnappedtogether,creatingre-peatedvertices,disjointregions,orvar-ioustopologicalproblems. 4.None of the vertices of a polygon shouldbecloserthanthisthreshold

Table 3 Planar partition repair comparison using large datasets

pprepair ArcGIS FME GRASS

memory time memory time memory time memory time

E41N27 145 MB 19s 145 MB 1m3s 158 MB 31s 59 MB 3m9s4tiles 116 MB 17s 113 MB 37s 105 MB 31s 49 MB 53s16tiles 145 GB 4m47s crashes ndash 636 MB 15m48s crashes ndashMexico 101 GB 3m31s 216 MB gt1d 264 MB 2m45s 408 MB 11m38s

All tests have been run 5 times in a ma-chine just booted and the results averagedto account for the small variations that oc-curred except in the case where the execu-tion was cancelled after one full day due totime limitations or when it causes the pro-gram to crash The hardware is a 266 GHzCore 2DuoMacBookProwith 4GBofRAMArcGIS 93 FME 2010 SP1 and GRASS 64were run in Windows 7 while pprepair wasrun in Mac OS X 1071

As Table 3 shows our approach uses some-what more memory than other solutionsThis is explained by the extra (uncon-strained) edges that are added to the inputover when triangulating them It shouldhowever be noticed that both ArcGIS andGRASS crashed with the biggest dataset16tiles Our implementation is the fastestof the four tested being for instance aroundthree times faster than FME for the biggestdatasets Only for the Mexico dataset isour implementation slower than FME Thisis (probably) explained by the fact that itspolygons already form a valid planar parti-tion and therefore very few snapping op-erations have been performed by FME theplanar graph of the input was basicallybuilt and then the polygons saved back todisk We believe that ArcGIS and GRASSstruggled with the dataset because it con-tains several very large polygons (withmorethan 10 000 vertices) Our implementa-tion took 3m31s but as Table 4 shows ittook CGAL 3m02s to simply triangulate theinput edges This table also demonstratesthe efficiency of our approach for the fourdatasets around 85 of the time of our ap-proach was usedmdashby the CGAL librarymdashtoread the input fromdisk and triangulate the

Table 4 Timed steps of the planar partitionrepair procedure rounded down tothe nearest second The percentageis the triangulation time over thetotal time

E41N27 4tiles 16tiles Mexico

Triangulate 017 015 400 302Label 001 001 027 016Repair 000 000 000 000Reconstruct 001 001 011 007Output 000 000 010 006

Total 019 017 447 331triangulate 89 88 83 86

input edges

During the implementation we took sev-eral engineering decisions to optimise ourprogram One of them was to favour diskspace over computation time as we wantedto be able to process large datasets This iswhy we always reconstruct polygons even ifthey not modified by the repair process Al-though possible it would require us to keepin memory the original polygons (whichwould significantly increase the memoryconsumption) and to keep track of whichlabels have been modified And as Table 4shows the reconstruction is very efficient asit only takes around 3ndash4 of the total timefor the 16tiles and Mexico datasets

14

5 Discussion andConclusions

We have presented a new method for re-pairing polygonal area partitions ensur-ing that the output partition is valid ieall individual polygons are conforming tothe ISO 19107 [ISO 2003] and Simple Fea-tures [OGC 2006] specifications and nogaps nor overlaps are present between anypair of polygons Thenovelty of ourmethodlies in the fact that repair is performed ac-cording to user defined criteria but thentakes place without any human interven-tion Automatic repair is becoming increas-ingly an important topic due to data in-tegration eg data collected for the differ-ent themes of the European INSPIRE Ini-tiative will finally have to fit together anddata could eventually bematched fully auto-matically by dedicated web service compo-nents [INSPIRE 2009] Our approach couldbe at the heart of such a web service

The proposed approach excels in auto-mated repair at the cost of increased mem-ory usage compared to a pure graph-basedapproachmdashthis difference is mainly causedby the unconstrained edges introduced bythe triangulation While it would be possi-ble to use our repair rules together with a(primaldual) graph-based approach theseadditional edges in the triangulation givefine-grained control over the repair oper-ations and ensures that the graph is con-nected which facilitates the reconstructionof polygons

We have implemented our algorithm overa numerically robust triangulator (CGAL)and since repair operations are expressedsolely in terms of re-labelling of triangles(no geometric computation is involved) theapproach is also fully robust Since duringour experiments most of the time was usedto compute the constrained triangulationanother library could also be tested to im-prove the implementation

For the future we plan to

bull Improve the scalability of the approachand process and repair datasets withmore than 10 million polygons It is

known that using divide-and-conquertechniques triangulation algorithmscan handle big datasets [Amenta et al2003 Blandford et al 2005] We willinvestigate whether it is possible toautomatically repair each divided partindividually and lsquogluersquo the repairedparts together

bull Investigate snap rounding [Hobby1999 de Berg et al 2007] as a pre-processing stepmdashor embeddeddirectlyin the triangulation mdash to guaranteethat repaired planar partitions have novertices that are closer than a certain120576 threshold However snap roundingmay change the topology of the inputbut the output will nevertheless be avalid partition (as the topology willbe repaired) Apart from topologicalchanges snap rounding can also leadto removed polygon parts that are toosmall to be preserved based on thechosen 120576

bull Add more advanced repair operationsto our repair toolkit eg repair couldtake place based on splitting a collec-tion of triangles

bull Extend our work to include the thirddimension to validate and repair 3Dcity models using a constrained tetra-hedralisation [Si 2008] Notice thatthe tetrahedralisation of a given poly-hedron does not always exist and thusextra (Steiner) points might need tobe added The main concepts of ourapproach (re)labelling and reconstruc-tion extend naturally to 3D How-ever appropriate repair operations for3D city models would need to be de-fined and some application-specificconstraints (eg right angles at corners)are not trivial to implement in our cur-rent approach

ReferencesNina Amenta Sunghee Choi and Guumln-ter Rote Incremental constructions conBRIO In Proceedings of the 19th An-nual Symposium on Computational Geome-try pages 211ndash219 ACM Press 2003

15

WaelM Badawy andWalidG Aref On localheuristics to speed up polygon-polygonintersection tests In Proceedings of the 7thACM International Symposium on Advancesin Geographic Information Systems pages97ndash102 ACM 1999

M Bader and R Weibel Detecting andresolving size and proximity conflicts inthe generalization of polygonal maps InProceedings of the 18th International Carto-graphic Conference Stockholm pages 1525ndash1532 1997

Daniel K Blandford Guy E BlellochDavid E Cardoze and Clemens KadowCompact representations of simplicialmeshes in two and three dimensionsInternational Journal of ComputationalGeometry and Applications 15(1)3ndash242005

CGAL CGAL 38 User and Reference ManualCGAL Editorial Board 2011

Kenneth L Clarkson Kurt Mehlhorn andRaimund Seidel Four results on random-ized incremental constructions In AlainFinkel and Matthias Jantzen editors Pro-ceedings of the 9th Annual Symposium onTheoretical Aspects of Computer Science vol-ume 577 of Lecture Notes in Computer Sci-ence pages 461ndash474 SpringerBerlin Hei-delberg 1992

Martin Davis Java conflation suite Tech-nical report Vivid Solutions 2003 Avail-able at httpwwwvividsolutionscomjcs

Mark de Berg Dan Halperin and MarkOvermars An intersection-sensitive al-gorithm for snap rounding Computa-tional Geometry 36(3)159 ndash 165 2007 ISSN0925-7721

Mark de Berg Marc van Kreveld MarkOvermars and Otfried SchwarzkopfComputational Geometry Algorithms andApplications Springer-Verlag 3 edition2008

ESRI Shapefile technical descriptionWhite paper ESRI July 1998

Michael A Facello Implementation of arandomized algorithm for Delaunay and

regular triangulations in three dimen-sions Computer Aided Geometric Design 12(4)349ndash370 1995

GTS GTS Library Reference Manual 2006URL httpgtssourceforgenetreferencebook1html

LeonidasGuibas and Jorge Stolfi Primitivesfor the manipulation of general subdivi-sions and the computation of VoronoiACM Transactions on Graphics 4(2)74ndash1231985

John D Hobby Practical segment intersec-tion with finite precision output Com-putational Geometry 13(4)199 ndash 214 1999ISSN 0925-7721

Christoph M Hoffmann John E Hopcroftand Michael S Karasick Towards imple-menting robust geometric computationsIn Proceedings of the 4th Annual Symposiumon Computational Geometry pages 106ndash117ACM 1988

INSPIRE D25 Generic conceptual modelversion 32 Technical report DraftingTeam ldquoData Specificationsrdquo 2009 URLhttpinspirejrceceuropaeudocumentsData_SpecificationsD25_v32pdf

ISO 19107 Geographic information ndash spa-tial schema March 2003

David Kirkpatrick Jack Snoeyink and Bet-tina Speckmann Kinetic collision detec-tion for simple polygons InternationalJournal of Computational Geometry and Ap-plications 12(1ndash2)3ndash27 2002

Robert Laurini and Franccediloise Milleret-Raffort Topological reorganization ofinconsistent geographical databases Astep towards their certification Comput-ers amp Graphics 18(6)803ndash813 December1994


5 Discussion and Conclusions

We have presented a new method for repairing polygonal area partitions, ensuring that the output partition is valid, i.e. all individual polygons conform to the ISO 19107 [ISO, 2003] and Simple Features [OGC, 2006] specifications, and no gaps or overlaps are present between any pair of polygons. The novelty of our method lies in the fact that repair is performed according to user-defined criteria, but then takes place without any human intervention. Automatic repair is becoming an increasingly important topic due to data integration, e.g. data collected for the different themes of the European INSPIRE Initiative will finally have to fit together, and data could eventually be matched fully automatically by dedicated web service components [INSPIRE, 2009]. Our approach could be at the heart of such a web service.

The proposed approach excels in automated repair at the cost of increased memory usage compared to a pure graph-based approach; this difference is mainly caused by the unconstrained edges introduced by the triangulation. While it would be possible to use our repair rules together with a (primal/dual) graph-based approach, these additional edges in the triangulation give fine-grained control over the repair operations and ensure that the graph is connected, which facilitates the reconstruction of polygons.

We have implemented our algorithm over a numerically robust triangulator (CGAL), and since repair operations are expressed solely in terms of re-labelling of triangles (no geometric computation is involved), the approach is also fully robust. Since most of the time during our experiments was spent computing the constrained triangulation, another library could also be tested to improve the implementation.
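The idea that repair reduces to re-labelling can be illustrated with a minimal sketch. It is not the pprepair implementation: the data layout (`labels` as one set of polygon ids per triangle, `neighbours` as adjacency lists) and the repair rule (take the most common label among already-valid neighbours) are assumptions chosen for brevity. A triangle with no label is treated as a gap and a triangle with several labels as an overlap.

```python
from collections import Counter


def invalid_triangles(labels):
    """Return indices of triangles that do not carry exactly one label."""
    return [i for i, tags in enumerate(labels) if len(tags) != 1]


def repair_by_relabelling(labels, neighbours):
    """Relabel gaps and overlaps from their valid neighbours.

    labels: list of sets of polygon ids, one set per triangle (hypothetical layout).
    neighbours: list of lists holding the indices of adjacent triangles.
    Returns a new list of label sets; the input is left unchanged.
    """
    result = [set(tags) for tags in labels]
    pending = invalid_triangles(result)
    while pending:
        updates = {}
        for i in pending:
            # Count the labels of adjacent triangles that are currently valid.
            votes = Counter(
                next(iter(result[j]))
                for j in neighbours[i]
                if len(result[j]) == 1
            )
            if votes:
                # Most frequent label wins; ties go to the smallest label.
                updates[i] = min(votes, key=lambda lab: (-votes[lab], lab))
        if not updates:
            break  # an invalid region without any valid neighbour stays as-is
        for i, lab in updates.items():
            result[i] = {lab}
        pending = invalid_triangles(result)
    return result


# Four triangles in a row: valid A, a gap, an A/B overlap, valid B.
labels = [{"A"}, set(), {"A", "B"}, {"B"}]
neighbours = [[1], [0, 2], [1, 3], [2]]
repaired = repair_by_relabelling(labels, neighbours)
```

Only labels change in this sketch; no coordinate is computed or modified, which is the property the robustness argument above relies on.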

For the future, we plan to:

• Improve the scalability of the approach, and process and repair datasets with more than 10 million polygons. It is known that, using divide-and-conquer techniques, triangulation algorithms can handle big datasets [Amenta et al., 2003; Blandford et al., 2005]. We will investigate whether it is possible to automatically repair each divided part individually and ‘glue’ the repaired parts together.

• Investigate snap rounding [Hobby, 1999; de Berg et al., 2007] as a preprocessing step (or embedded directly in the triangulation) to guarantee that repaired planar partitions have no vertices that are closer than a certain ε threshold. Snap rounding may change the topology of the input, but the output will nevertheless be a valid partition (as the topology will be repaired). Apart from topological changes, snap rounding can also lead to the removal of polygon parts that are too small to be preserved, based on the chosen ε.

• Add more advanced repair operations to our repair toolkit, e.g. repair could take place based on splitting a collection of triangles.

• Extend our work to include the third dimension, to validate and repair 3D city models using a constrained tetrahedralisation [Si, 2008]. Notice that the tetrahedralisation of a given polyhedron does not always exist, and thus extra (Steiner) points might need to be added. The main concepts of our approach, (re)labelling and reconstruction, extend naturally to 3D. However, appropriate repair operations for 3D city models would need to be defined, and some application-specific constraints (e.g. right angles at corners) are not trivial to implement in our current approach.
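The ε condition mentioned in the snap rounding item can be checked with a short sketch. This is not snap rounding itself, only a test of whether any two distinct vertices lie closer together than ε. The function name `close_vertex_pairs` and the uniform-grid approach are assumptions made for illustration.

```python
import math
from collections import defaultdict


def close_vertex_pairs(points, eps):
    """Return index pairs (i, j), i < j, of distinct points closer than eps.

    Points are hashed into square cells of side eps, so two points closer
    than eps always fall in the same cell or in adjacent cells.
    Exact duplicates (distance 0) are skipped, since polygons in a planar
    partition legitimately share vertices.
    """
    grid = defaultdict(list)
    for idx, (x, y) in enumerate(points):
        grid[(math.floor(x / eps), math.floor(y / eps))].append(idx)

    pairs = []
    for (cx, cy), members in grid.items():
        for dx in (-1, 0, 1):
            for dy in (-1, 0, 1):
                # .get avoids creating empty cells while iterating
                for j in grid.get((cx + dx, cy + dy), ()):
                    for i in members:
                        if i < j:
                            (xi, yi), (xj, yj) = points[i], points[j]
                            d = math.hypot(xi - xj, yi - yj)
                            if 0.0 < d < eps:
                                pairs.append((i, j))
    return sorted(pairs)


points = [(0.0, 0.0), (0.05, 0.0), (1.0, 1.0), (0.0, 0.0)]
violations = close_vertex_pairs(points, eps=0.1)
```

An empty result would indicate that the threshold holds for the given vertex set; any reported pair marks a place where snap rounding, or another preprocessing step, would have to intervene.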

References

Nina Amenta, Sunghee Choi, and Günter Rote. Incremental constructions con BRIO. In Proceedings of the 19th Annual Symposium on Computational Geometry, pages 211–219. ACM Press, 2003.

Wael M. Badawy and Walid G. Aref. On local heuristics to speed up polygon-polygon intersection tests. In Proceedings of the 7th ACM International Symposium on Advances in Geographic Information Systems, pages 97–102. ACM, 1999.

M. Bader and R. Weibel. Detecting and resolving size and proximity conflicts in the generalization of polygonal maps. In Proceedings of the 18th International Cartographic Conference, Stockholm, pages 1525–1532, 1997.

Daniel K. Blandford, Guy E. Blelloch, David E. Cardoze, and Clemens Kadow. Compact representations of simplicial meshes in two and three dimensions. International Journal of Computational Geometry and Applications, 15(1):3–24, 2005.

CGAL. CGAL 3.8 User and Reference Manual. CGAL Editorial Board, 2011.

Kenneth L. Clarkson, Kurt Mehlhorn, and Raimund Seidel. Four results on randomized incremental constructions. In Alain Finkel and Matthias Jantzen, editors, Proceedings of the 9th Annual Symposium on Theoretical Aspects of Computer Science, volume 577 of Lecture Notes in Computer Science, pages 461–474. Springer Berlin Heidelberg, 1992.

Martin Davis. Java conflation suite. Technical report, Vivid Solutions, 2003. Available at http://www.vividsolutions.com/jcs.

Mark de Berg, Dan Halperin, and Mark Overmars. An intersection-sensitive algorithm for snap rounding. Computational Geometry, 36(3):159–165, 2007. ISSN 0925-7721.

Mark de Berg, Marc van Kreveld, Mark Overmars, and Otfried Schwarzkopf. Computational Geometry: Algorithms and Applications. Springer-Verlag, 3rd edition, 2008.

ESRI. Shapefile technical description. White paper, ESRI, July 1998.

Michael A. Facello. Implementation of a randomized algorithm for Delaunay and regular triangulations in three dimensions. Computer Aided Geometric Design, 12(4):349–370, 1995.

GTS. GTS Library Reference Manual, 2006. URL http://gts.sourceforge.net/reference/book1.html.

Leonidas Guibas and Jorge Stolfi. Primitives for the manipulation of general subdivisions and the computation of Voronoi. ACM Transactions on Graphics, 4(2):74–123, 1985.

John D. Hobby. Practical segment intersection with finite precision output. Computational Geometry, 13(4):199–214, 1999. ISSN 0925-7721.

Christoph M. Hoffmann, John E. Hopcroft, and Michael S. Karasick. Towards implementing robust geometric computations. In Proceedings of the 4th Annual Symposium on Computational Geometry, pages 106–117. ACM, 1988.

INSPIRE. D2.5: Generic conceptual model, version 3.2. Technical report, Drafting Team “Data Specifications”, 2009. URL http://inspire.jrc.ec.europa.eu/documents/Data_Specifications/D2.5_v3.2.pdf.

ISO. 19107: Geographic information – spatial schema, March 2003.

David Kirkpatrick, Jack Snoeyink, and Bettina Speckmann. Kinetic collision detection for simple polygons. International Journal of Computational Geometry and Applications, 12(1–2):3–27, 2002.

Robert Laurini and Françoise Milleret-Raffort. Topological reorganization of inconsistent geographical databases: A step towards their certification. Computers & Graphics, 18(6):803–813, December 1994.

Hugo Ledoux and Martijn Meijers. Validation of planar partitions using constrained triangulations. In Proceedings Joint International Conference on Theory, Data Handling and Modelling in GeoSpatial Information Science, pages 51–55, Hong Kong, May 2010.

Hugo Ledoux, Ken Arroyo Ohori, and Martijn Meijers. Automatically repairing invalid polygons with a constrained triangulation. In Proceedings of the 15th AGILE International Conference on Geographic Information Science, 2012.

Yuanxin Liu and Jack Snoeyink. Faraway point: A sentinel point for Delaunay computation. International Journal of Computational Geometry and Applications, 18(4):343–355, November 2006.

Avraham Margalit and Gary D. Knott. An algorithm for computing the union, intersection or difference of two polygons. Computers & Graphics, 13(2):167–183, 1989.

Ernst P. Mücke, Isaac Saias, and Binhai Zhu. Fast randomized point location without preprocessing in two- and three-dimensional Delaunay triangulations. Computational Geometry: Theory and Applications, 12:63–83, 1999.

OGC. OpenGIS Implementation Specification for Geographic Information - Simple Feature Access - Part 1: Common Architecture. 1.2.0 edition, October 2006.

Lutz Plümer and Gerhard Gröger. Achieving integrity in geographic information systems: maps and nested maps. GeoInformatica, 1(4):345–367, 1997.

M. Rivero and F. R. Feito. Boolean operations on general planar polygons. Computers & Graphics, 24(6):881–896, December 2000.

Stefan Schirra. Precision and robustness in geometric computations. In Algorithmic Foundations of Geographic Information Systems, Lecture Notes in Computer Science, chapter 9, pages 255–287. Springer Berlin Heidelberg, 1997.

Michael Ian Shamos and Dan Hoey. Geometric intersection problems. In FOCS, pages 208–215. IEEE, 1976.

Jonathan Richard Shewchuk. Delaunay Refinement Mesh Generation. PhD thesis, School of Computer Science, Carnegie Mellon University, Pittsburgh, USA, 1997a.

Jonathan Richard Shewchuk. Adaptive precision floating-point arithmetic and fast robust geometric predicates. Discrete & Computational Geometry, 18:305–363, 1997b.

Hang Si. Three Dimensional Boundary Conforming Delaunay Mesh Generation. PhD thesis, Technische Universität Berlin, 2008.

C. Dana Tomlin. Map algebra: One perspective. Landscape and Urban Planning, 30(1–2):3–12, October 1994.

Peter van Oosterom, Wilko Quak, and Theo Tijssen. About invalid, valid and clean polygons. In Peter F. Fisher, editor, Developments in Spatial Data Handling: 11th International Symposium on Spatial Data Handling, pages 1–16. Springer, 2004.

Jan W. van Roessel. A new approach to plane-sweep overlay: Topological structuring and line-segment classification. Cartography and Geographic Information Science, 18(1):49–67, January 1991.

Shuxin Yuan and Chuang Tao. Development of conflation components. Geoinformatics and Socioinformatics, pages 1–13, June 1999.

