

Computers & Geosciences 36 (2010) 1115–1122

Contents lists available at ScienceDirect

journal homepage: www.elsevier.com/locate/cageo

A new method for matching objects in two different geospatial datasets based on the geographic context

Jung Ok Kim a, Kiyun Yu a,*, Joon Heo b, Won Hee Lee c

a Department of Civil & Environmental Engineering, Seoul National University, 599 Gwanak-ro, Gwanak-gu, Seoul 151-742, Korea
b School of Civil & Environmental Engineering, Yonsei University, 262 Seongsan-ro, Seodaemun-gu, Seoul 120-749, Korea
c Department of Civil Engineering, Chosun University, 375 Seosuk-dong, Dong-gu, Gwangju 501-759, Korea

Article info

Article history:
Received 30 April 2009
Received in revised form 8 March 2010
Accepted 20 April 2010

Keywords:
Geographic context
Voronoi diagram
Object matching
Spatial similarity
Triangulation

0098-3004/$ - see front matter © 2010 Elsevier Ltd. All rights reserved.
doi:10.1016/j.cageo.2010.04.003

* Corresponding author. Tel.: +82 2 880 1355; fax: +82 2 873 2684.
E-mail addresses: [email protected] (J.O. Kim), [email protected] (K. Yu), jheo@yonsei.ac.kr (J. Heo), [email protected] (W.H. Lee).

Abstract

Although several methods of handling object matching problems across different datasets have been developed, there is still a need to design new approaches to address the diverse matching applications. Such cases include those where the coordinate differences in datasets are significant, where the shapes of the same objects are dissimilar, or even where the shapes are too similar for different objects. This is especially true, as many large portals worldwide are opening their spatial databases to public access by providing an open application programming interface (API). With this understanding, we propose in this paper a new method for matching objects in different datasets based on geographic context similarity measures. The proposed method employs and combines a set of concepts such as buffer growing, Voronoi diagrams, triangulation, and geometric measurements. This approach is simple in its algorithm but powerful in resolving situations when two datasets have significant coordinate discrepancies. In addition, the concept is highly effective regardless of the shapes of objects. After testing the method for the two major digital datasets in Korea, we found that the matching success rate reached 99.4%.

© 2010 Elsevier Ltd. All rights reserved.

1. Introduction

The integration of heterogeneous spatial data, which includes their conflating, sharing, and linking, is a very important research topic related to geographic information systems. New information not available from a single data source is available through the integration of spatial data. Integration usually involves matching the common spatial objects in different datasets.

There have been various research projects on this topic. Yuan and Tao (1999) obtain centroids of polygon objects and calculate the distances between these for matching. Beeri et al. (2005) compare distances and develop a location-based database to join algorithms for point datasets. Walter and Fritsch (1999) identify and eliminate unlikely matching pairs using relational parameters such as topological information and feature-based parameters such as line angles. Gosseln and Sester (2004) apply several similarity measures, such as the degree of overlap or the Hausdorff distance (Rucklidge, 2004), to detect corresponding objects. In order to compare the shape similarity of individual objects, a turning function is used to describe the amount of change between two shapes (Arkin et al., 1991; Frank and Ester, 2006; Longin and Lakamper, 2000).

In the cases cited, the matching of common objects in different geospatial datasets is for the most part obtained by using geometric methods, because spatial objects have geographic coordinates and shape. Geometric methods imply a comparison of the distance, angle, and shape contrast between objects. In addition, overlay analysis using a buffer tool, map alignment (Saalfeld, 1988), or rubber-sheeting (Doytsher, 2000) between two datasets is also used.

Geometric methods are good at dealing with some cases but not others. In some circumstances, too much mismatching or under-matching in the result requires some human intervention. To be more specific, in Korea, large-scale apartment complexes are very common and they are composed of a number of spatial objects of the same shape and size. In such cases, matching attempted by geometric methods using, for example, overlay and shape similarity analyses is inaccurate. The problem is most pronounced when the datasets have coordinate discrepancies of hundreds of meters, which occur in many digital spatial databases because of the adoption of different datums and plane coordinate systems.

There have been recent studies on object matching using attribute or semantic information in a geographic context. These use spatial relationships between spatial objects (Cueto et al., 2000; Samal et al., 2004). Among these, one notable research paper by Samal et al. (2004) models geographic context using a proximity or star graph and a mechanism to compute the similarity of two geographic contexts. The total vector offset of the common objects in both maps was calculated by overlaying two proximity graphs. The total vector offset provided by these objects being matched is seen to be small. However, this proximity graph method has a limitation: it selects all the objects within a certain radius as targets for the graph, but only some of them are used for comparison and thus unnecessary computation is required. Notwithstanding this limitation, the proximity graph approach has paved a new way to deal with the matching of objects based upon their geographic context.

[Figure: two panels, Map 1 and Map 2, each showing objects A and B with their neighboring objects.]

Fig. 1. Geographic context of objects A and B in two maps. Their contexts are represented as graphs.

Notwithstanding the progress already made, there is still a strong need to develop other methods under the umbrella of the same approach. This is especially important considering the application potential of object matching. For example, in Korea, many large portals have recently released their spatial databases through an open API. As more spatial databases become open to public access, more diverse object matching methods will be required to cover various applications in the matching of heterogeneous datasets. It is with this understanding that we propose in this paper a new object matching method. We introduce a simple but powerful method to analyze the geographic context of the two objects to be compared. By measuring the context similarity of two objects based on Voronoi diagrams (O'Rourke, 1998) and triangulation geometry, the method we propose effectively matches two objects in heterogeneous datasets.

This paper is organized as follows: Section 2 contains a detailed explanation of the proposed matching algorithm. Section 3 presents our experimental results. Our conclusions are given in Section 4.

2. Proposed algorithm

2.1. Matching based on geographic context

We propose a new matching method for common objects using their geographic context in different datasets. The geographic context refers to the spatial relationship between neighboring objects (Samal et al., 2004). For example, as shown in Fig. 1, the geographic context of object A is similar in both maps, provided that the selected neighbor objects are the same in the two maps. This is also true for object B. Here, even though objects A and B are hard to tell apart by their shapes, they can be discriminated in matching by analyzing their geographic relations. From this point of view, the proposed method tries to measure the geographic context similarity of objects. In order to achieve this, a set of traditional concepts such as buffer growing, Voronoi diagrams, and triangulations are adopted and combined. A flowchart showing the process of the proposed method is shown in Fig. 2. This is followed by a detailed explanation.

In the first step, the centroid of an input polygon object is calculated in the reference dataset and a buffer zone is created. Using this centroid and the corresponding coordinates, a buffer zone of the same radius is created in the target dataset. For the second step, a set of landmarks is found by comparing the attributes of objects within the buffer zones in both datasets. A set of centroids for the selected landmark objects is then derived to give a point dataset for the reference and target datasets. Thirdly, a Voronoi diagram is created for each dataset using the derived landmark point dataset. In this step, the core Voronoi cell is found by overlaying the input polygon object over the reference Voronoi diagram. Here, the input polygon object belongs to the core Voronoi cell. Based on this information, the candidate Voronoi cell is identified using the landmark in the core Voronoi cell, where the core and candidate cells share the same landmark. The candidate objects for matching in the target dataset are then defined by overlaying the candidate Voronoi cell over the target dataset. In the fourth step, a reference triangulation is constructed based on the input polygon object and corresponding neighbor landmarks in the reference dataset. Similarly, a series of candidate triangulations are made based on the candidate objects and corresponding neighbor landmarks in the target dataset. Finally, the geographic context similarities are calculated based on a comparison of triangulations, which leads to the identification of the matching pair. In the sections below, we clarify these steps.

2.2. Selection of landmarks

A comparison using the semantic information of datasets is also referred to as an attribute method. Provided both objects have a common attribute field and that the semantics of data values in that field are the same for both the objects, the objects may be matched. In this paper, the matched object pairs are characterized as landmarks. Two datasets are analyzed to define the common attribute field. In Fig. 3, the attribute field "Name" in dataset A has a semantic correspondence with "BD_NM" in dataset B, such that they constitute a common attribute field. Once the common attribute field is defined, the next step is to compare the data value of each object in dataset A with that of each object in dataset B. Only those objects that have correct semantic matching are chosen as landmarks, so partially matched objects are not selected. In addition, only one-to-one relationships are selected, thus excluding landmarks with one-to-many relationships.

The selection process is an important part of the proposed method. Here, we describe an approach to determine landmarks automatically in different geospatial datasets. As shown in Fig. 4, the automatic mechanism for selecting landmarks consists of three steps: blocking the datasets, selecting the landmarks, and verifying the landmark table.

[Figure: flowchart. Reference dataset: calculate centroid of input polygon object; create buffer zone. Target dataset: create buffer zone using the centroid information. Both: extract common landmarks by comparing name attribute; calculate centroid of landmarks; build point dataset for landmarks. Voronoi diagram: Voronoi cell with input polygon object (reference); select candidate objects using the cell information (target). Create reference triangulation and save ID, area, perimeter of each triangle; create candidate triangulation and save ID, area, perimeter of each triangle. Compare geographic context similarity. Object matching.]

Fig. 2. Flowchart of proposed method.

Dataset A

| ID | Shape | Name | Floor | … |
|---|---|---|---|---|
| 11 | Polygon | ABC | 10 | |
| 12 | Polygon | DEF | 2 | |
| 13 | Polygon | GHI | 6 | |
| 14 | Polygon | JKL | 3 | |
| 15 | Polygon | AB | 2 | |

Dataset B

| ID | Shape | BD_NM | RD_SN | … |
|---|---|---|---|---|
| 21 | Point | DEF | 457001 | |
| 22 | Point | ABC | 457002 | |
| 23 | Point | GHI | 457010 | |
| 24 | Point | DEF | 457011 | |
| 25 | Point | ABD | 457021 | |

Semantic correspondence: landmark

Fig. 3. Selection of landmarks from two datasets. Landmarks are matching pairs that have same name attribute in two geospatial datasets.

2.2.1. Step 1: Blocking the datasets

The proposed automatic selection process starts with blocking the two datasets; i.e., the reference and the target datasets. The blocking step (Kang et al., 2007; Bilenko et al., 2006) partitions the two datasets so that two conjugate regions are derived (Fig. 4, STEP 1). To derive the conjugate regions, an input object with a location and a preset radius is defined and applied to the reference dataset. The corresponding location of the input object on the target dataset is then selected, followed by application of the preset radius to create a buffer zone. This buffer zone is the conjugate region on the target dataset. Once these regions are derived, the datasets within these regions constitute two sets of candidate landmarks, one from the reference dataset and the other from the target dataset.
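The blocking step can be sketched as a simple radius query. The following is a minimal illustration, not the authors' implementation: the record layout, the sample coordinates, and the function name `block` are hypothetical, and each object is reduced to its centroid.

```python
import math

def block(objects, center, radius):
    """Return the objects whose centroid lies within `radius` of `center`."""
    cx, cy = center
    return [o for o in objects
            if math.hypot(o["x"] - cx, o["y"] - cy) <= radius]

# Hypothetical sample records (centroid coordinates in meters).
reference = [{"id": 11, "name": "ABC", "x": 100.0, "y": 100.0},
             {"id": 12, "name": "DEF", "x": 900.0, "y": 900.0}]
target = [{"id": 22, "name": "ABC", "x": 250.0, "y": 180.0},
          {"id": 24, "name": "DEF", "x": 1100.0, "y": 950.0}]

input_centroid = (120.0, 90.0)   # centroid of the input polygon object
radius = 300.0                   # preset buffer radius

# The same location and radius are applied to both datasets,
# giving the two conjugate regions (sets of candidate landmarks).
ref_candidates = block(reference, input_centroid, radius)
tar_candidates = block(target, input_centroid, radius)
```

Applying the same centroid and radius to both datasets mirrors the description above; a production version would more likely use a spatial index than a linear scan.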

2.2.2. Step 2: Selecting the landmarks

A landmark can be a building or any spatial object that is easily noticed. It is likely to have the same name in different geospatial datasets. Therefore, in the second step, the landmarks are selected by comparing the name attribute of spatial objects within the two sets of candidate landmarks (Fig. 4, STEP 2, upper figure). As the name attribute is a string, string-matching algorithms (for example, Duda et al., 2001; Cormen et al., 2001; Charras and Lecroq, 2004) are used to compare the name attributes. If a name of the reference candidate landmark is equal to that of any target candidate landmark, the objects are determined as a matching pair. The detailed matching conditions of two names are exact string matches and a one-to-one relation. If matching pairs satisfying these conditions are found, the matching results are stored with their identification number and location information in the landmark table (Fig. 4, STEP 2, lower figure).

[Figure: STEP 1, blocking of the reference dataset and of the target dataset. STEP 2, compare attributes (NAME) and create the landmarks table (REF_ID, TAR_ID). STEP 3, verify using the fields INP., NEAR, DIST., ANG., and VERI., and update the landmarks table.]

Fig. 4. Schematic overview of automatic selection approach.
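The two matching conditions (exact string match and one-to-one relation) can be illustrated with a short sketch. The function name `select_landmarks` and the tuple layout are hypothetical; the sample rows are illustrative values modeled on Fig. 3.

```python
from collections import Counter

def select_landmarks(ref_objects, tar_objects):
    """Pair objects whose names match exactly and occur once in each dataset."""
    ref_counts = Counter(name for _, name in ref_objects)
    tar_counts = Counter(name for _, name in tar_objects)
    tar_by_name = {name: tid for tid, name in tar_objects}
    table = []
    for rid, name in ref_objects:
        # Exact match only; names occurring more than once on either side
        # (one-to-many relations) are skipped.
        if ref_counts[name] == 1 and tar_counts.get(name, 0) == 1:
            table.append((rid, tar_by_name[name], name))
    return table

# (ID, name) pairs; illustrative values.
dataset_a = [(11, "ABC"), (12, "DEF"), (13, "GHI"), (14, "JKL"), (15, "AB")]
dataset_b = [(21, "DEF"), (22, "ABC"), (23, "GHI"), (24, "DEF"), (25, "ABD")]

landmarks = select_landmarks(dataset_a, dataset_b)
# landmarks == [(11, 22, "ABC"), (13, 23, "GHI")]
```

Here "DEF" is rejected because it appears twice in dataset B, and "AB" is rejected because it only partially matches "ABD".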

2.2.3. Step 3: Verifying the landmark table

The final step is to verify the location accuracy of the landmarks selected in the previous step. This step is necessary in the proposed automatic selection process because only a correct landmark is used to derive the Voronoi diagram, triangulation, and the geographic context similarity, which play an important role in matching objects. They are described in Sections 2.3 and 2.4. Thus, the landmark has to be based on corresponding object pairs that have the same or similar locations as well as the same name. In order to verify each pair of landmarks in the landmark table, we propose a modified fingerprint recognition approach (van Wamelen et al., 2004). The proposed method uses the relative location, such as distance and angle, between a landmark and its neighbor landmark (Fig. 4, STEP 3, upper figure), and can be applied as follows.

01: Search the $k(r)$ nearest neighbor landmarks of a reference landmark $r$ in the landmark table.

02: Find the corresponding target landmark $t$ of landmark $r$ and the corresponding $k(t)$ landmarks of the $k(r)$ landmarks in the landmark table.

03: Compute the angle $\alpha_{reference}$ and distance $\beta_{reference}$ between landmark $r$ and landmark $k(r)$.

04: Compute the angle $\alpha_{target}$ and distance $\beta_{target}$ between landmark $t$ and landmark $k(t)$.

05: $\Delta\alpha = |\alpha_{reference} - \alpha_{target}| < T_\alpha$, where $\Delta\alpha$ is the angle difference between the reference and target; $\alpha$ is the angle between a straight line connecting a landmark and its neighbor and the x-axis, and $T_\alpha$ is a threshold.

06: $\Delta\beta = |\beta_{reference} - \beta_{target}| < T_\beta$, where $\Delta\beta$ is the distance difference between the reference and the target; $\beta$ is the Euclidean distance between a landmark and its neighbor, and $T_\beta$ is a threshold.

Here, if a landmark r and landmark t satisfy all the criteria, they will be confirmed as a landmark, while others will be rejected and excluded. The landmark table is then updated (Fig. 4, STEP 3, lower figure).
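Steps 01 to 06 can be sketched as follows. This is an illustration under stated assumptions rather than the authors' code: the number of neighbors `k`, the thresholds `t_alpha` (degrees) and `t_beta` (meters), the wrap-around handling of angles, and all sample coordinates are hypothetical.

```python
import math

def angle_and_distance(p, q):
    """Angle (degrees, measured from the x-axis) and Euclidean distance from p to q."""
    dx, dy = q[0] - p[0], q[1] - p[1]
    return math.degrees(math.atan2(dy, dx)), math.hypot(dx, dy)

def angle_difference(a, b):
    """Absolute angle difference, wrapped to [0, 180] (an added assumption)."""
    d = abs(a - b) % 360.0
    return min(d, 360.0 - d)

def verify(ref_pts, tar_pts, r, k=2, t_alpha=5.0, t_beta=20.0):
    """Check landmark r against its k nearest neighbor landmarks (steps 01-06)."""
    others = [n for n in ref_pts if n != r]
    others.sort(key=lambda n: math.dist(ref_pts[r], ref_pts[n]))
    for n in others[:k]:
        a_ref, b_ref = angle_and_distance(ref_pts[r], ref_pts[n])
        a_tar, b_tar = angle_and_distance(tar_pts[r], tar_pts[n])
        if angle_difference(a_ref, a_tar) >= t_alpha or abs(b_ref - b_tar) >= t_beta:
            return False
    return True

# Landmark pairs share a key; the target is shifted by (200, 150),
# except D, which is deliberately misplaced in the target dataset.
ref_pts = {"A": (0.0, 0.0), "B": (100.0, 0.0), "C": (0.0, 100.0), "D": (1000.0, 1000.0)}
tar_pts = {"A": (200.0, 150.0), "B": (300.0, 150.0), "C": (200.0, 250.0), "D": (1500.0, 1000.0)}

confirmed = [r for r in ref_pts if verify(ref_pts, tar_pts, r)]
# confirmed == ["A", "B", "C"]
```

Because only relative angles and distances are compared, a uniform coordinate shift between the datasets does not affect the check, which is the property the verification relies on.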

2.3. Voronoi diagrams and triangulations

In the literature (for example, Samal et al., 2004), the geographic context is usually represented by a proximity graph that defines objects as points connected by lines. The graph is evaluated using parameters such as distance, angle, and direction. Rather than using such a proximity graph, in this paper, the Voronoi diagram, triangulation, and the corresponding geometric measurements are used.

There are two advantages in using the Voronoi diagram. Firstly, this allows the definition of a set of contiguous polygons. As shown in Fig. 5(a), polygons having landmarks from L1 to L7 are the contiguous polygons. By defining these features, we can select several critical landmarks. In other words, it is possible to filter out those landmarks having a minor influence when analyzing the geographic context of the input polygon object and the surrounding landmarks. Contiguous polygons commonly figure in computational geometry, together with algorithmic considerations (for example, O'Rourke, 1998). Secondly, the Voronoi diagram allows the definition of a limited number of candidate objects in the target dataset. In Fig. 5(b), the objects labeled from C1 to C7 are the candidate objects in the target dataset. Identification of the candidate objects occurs as several subprocesses. A Voronoi diagram consists of a set of Voronoi cells that contain corresponding landmarks. A cell with the input polygon object is called the core cell. By using the landmark in the core Voronoi cell, the corresponding candidate Voronoi cell in the target dataset is selected. The two Voronoi cells share the same landmark. Once the candidate cell is defined, the candidate objects that are completely or partially within it are chosen. The selection of candidate objects excludes all other objects in the target dataset from comparison. This helps to reduce the search space in the target dataset.

[Figure: three panels showing critical landmarks L1 to L7 around the input polygon object, candidate objects C1 to C7, and the triangulation.]

Fig. 5. Identification of candidate objects and derivation of triangulation with critical landmarks: (a) critical neighbor landmarks (left), (b) candidate objects (middle), and (c) triangulation (right).

The process for defining candidate objects in the target dataset for matching is as follows.

01: Define $VD(\mathrm{Ref}) = VC^r_1 \cup VC^r_2 \cup VC^r_3 \cup \dots \cup VC^r_n$ and $VD(\mathrm{Tar}) = VC^t_1 \cup VC^t_2 \cup VC^t_3 \cup \dots \cup VC^t_n$, with $\{landmark_1, landmark_2, \dots, landmark_n\} \in VD(\mathrm{Ref})$ and $\{landmark_1, landmark_2, \dots, landmark_n\} \in VD(\mathrm{Tar})$.

02: Overlay(centroid of input polygon object, $VD(\mathrm{Ref})$) $\rightarrow$ core $VC^r_i$, where $landmark_c \in$ core $VC^r_i$.

03: Find $landmark_c = landmark_t$, where $landmark_t \in$ candidate $VC^t_j$.

04: Overlay(target dataset, candidate $VC^t_j$) $\rightarrow$ candidate objects for matching.

In this procedure, $VD(\mathrm{Ref})$ is the reference Voronoi diagram, $VD(\mathrm{Tar})$ is the target Voronoi diagram, $VC^r$ represents reference Voronoi cells, $VC^t$ represents target Voronoi cells, $landmark_c$ is the landmark in the core Voronoi cell (core $VC^r$), and $landmark_t$ is the landmark in the candidate Voronoi cell (candidate $VC^t$).
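The four steps can be approximated without constructing the Voronoi diagrams explicitly, because a point lies in the Voronoi cell of its nearest site. The sketch below relies on that property and on a simplifying assumption that is not part of the method as described: each target object is represented by its centroid, so objects only partially inside the candidate cell may be missed. All names and coordinates are hypothetical.

```python
import math

def nearest_landmark(point, landmarks):
    """Name of the landmark whose Voronoi cell contains `point`."""
    return min(landmarks, key=lambda name: math.dist(point, landmarks[name]))

def candidate_objects(input_centroid, ref_landmarks, tar_landmarks, tar_objects):
    # Step 02: the core cell is the cell of the landmark nearest to the input centroid.
    core = nearest_landmark(input_centroid, ref_landmarks)
    # Steps 03-04: the candidate cell shares that landmark; keep target
    # objects whose centroid falls inside it.
    return core, [oid for oid, c in tar_objects.items()
                  if nearest_landmark(c, tar_landmarks) == core]

ref_landmarks = {"L1": (0.0, 0.0), "L2": (100.0, 0.0), "L3": (0.0, 100.0)}
tar_landmarks = {"L1": (200.0, 150.0), "L2": (300.0, 150.0), "L3": (200.0, 250.0)}
tar_objects = {"C1": (215.0, 165.0), "C2": (240.0, 160.0),
               "C3": (290.0, 155.0), "C4": (205.0, 240.0)}

core, candidates = candidate_objects((20.0, 10.0), ref_landmarks,
                                     tar_landmarks, tar_objects)
# core == "L1", candidates == ["C1", "C2"]
```

A polygon overlay against explicit Voronoi cells, as the procedure describes, would also capture objects that are only partially within the candidate cell.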

Once the Voronoi diagrams in both the datasets are delineated, the next step is to derive the triangulations. As shown in Fig. 5(c), an object in a Voronoi cell and a total of n surrounding landmarks make a total of n triangles in a triangulation. In the reference Voronoi diagram, a reference triangulation is made around the input polygon object, whereas a number of candidate triangulations are made in the target Voronoi diagram, as there are a number of candidate objects. Using the area and perimeter of the triangles in both the reference and candidate triangulations, the ratios of the areas and perimeters are then calculated and compared.
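One way to obtain n triangles from an object and its n surrounding landmarks is a fan: order the landmarks by angle around the object and join the object to each pair of consecutive landmarks. The ordering rule is an assumption made for this sketch, and the function names are hypothetical.

```python
import math

def triangle_area(a, b, c):
    """Area from the cross product of two edge vectors."""
    return abs((b[0] - a[0]) * (c[1] - a[1]) - (c[0] - a[0]) * (b[1] - a[1])) / 2.0

def triangle_perimeter(a, b, c):
    return math.dist(a, b) + math.dist(b, c) + math.dist(c, a)

def fan_triangulation(center, landmarks):
    """Build n triangles from an object centroid and n surrounding landmarks."""
    ordered = sorted(landmarks,
                     key=lambda p: math.atan2(p[1] - center[1], p[0] - center[0]))
    triangles = []
    for i, p in enumerate(ordered):
        q = ordered[(i + 1) % len(ordered)]   # next landmark, wrapping around
        triangles.append({"id": i + 1,
                          "area": triangle_area(center, p, q),
                          "perimeter": triangle_perimeter(center, p, q)})
    return triangles

triangles = fan_triangulation((0.0, 0.0),
                              [(1.0, 0.0), (0.0, 1.0), (-1.0, 0.0), (0.0, -1.0)])
# Four triangles, each with area 0.5 and perimeter 2 + sqrt(2).
```

Each triangle stores an identifier, area, and perimeter, matching the attributes saved for the reference and candidate triangulations.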

2.4. Calculating geographic context similarity

In this section, a method to analyze the geographic context similarity using the areas and perimeters of the reference and candidate triangulations is introduced. Fig. 6 shows a reference triangulation $RT = \{rt_1, rt_2, \dots, rt_n\}$, where $rt_n$ is a single triangle composing $RT$ and $n$ is the number of triangles. An $rt_n$ in the reference triangulation has the identification number, area, and perimeter as attributes: $rt_n = \{Ref\_ID, Ref\_area_n, Ref\_perimeter_n\}$. Likewise, there is a candidate triangulation $CT_i = \{ct^i_1, ct^i_2, \dots, ct^i_n\}$, where $i$ indicates the $i$th candidate triangulation, $ct_n$ is a single triangle composing $CT_i$, and $n$ is the number of triangles of $CT_i$. The $i$th $ct_n$ in the candidate triangulation also has attributes $ct^i_n = \{Can\_ID, Can\_area^i_n, Can\_perimeter^i_n\}$.

If two differently sized triangles have the same shape, the ratio of their areas is equal to the square of the ratio of their perimeters. Thus, for triangles $x$ and $y$, the area ratio has the following relationship with the perimeter ratio:

$$\frac{area_x}{area_y} = \left(\frac{perimeter_x}{perimeter_y}\right)^2. \qquad (1)$$

If the two triangles have the same shape then the difference $E$ between the area ratio and squared perimeter ratio equals zero.

$$E = \left|\,\text{ratio of area} - (\text{ratio of perimeter})^2\,\right|. \qquad (2)$$

As the dissimilarity between the two triangles increases, $E$ also increases. Using such a relationship, we deduced the geographic context similarity of the two triangulations by calculating the total $E$ using all the $rt$ in the reference triangulation and all the $ct$ in the candidate triangulation:

$$\text{Total } E = \sum_{i=1}^{n} E_i, \qquad (3)$$

where $n$ indicates the total number of triangles in a triangulation. The total $E$ is calculated for the reference triangulation and all of the candidate triangulations. The minimum total $E$ indicates the candidate triangulation most similar to the reference triangulation, which in turn indicates the matched objects.
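Eqs. (2) and (3) translate directly into a few lines of code. In the sketch below, each triangle is an (area, perimeter) pair, the ratios are taken as reference over candidate (an assumption, since the direction is not stated), and the triangles of the two triangulations are assumed to be listed in corresponding order. The reference values are the first three rows of the attribute table in Fig. 6; the candidate values are hypothetical.

```python
def triangle_dissimilarity(ref, cand):
    """Eq. (2): |area ratio - (perimeter ratio)^2| for one triangle pair."""
    area_ratio = ref[0] / cand[0]
    perimeter_ratio = ref[1] / cand[1]
    return abs(area_ratio - perimeter_ratio ** 2)

def total_e(ref_tris, cand_tris):
    """Eq. (3): sum of E over corresponding triangles."""
    return sum(triangle_dissimilarity(r, c) for r, c in zip(ref_tris, cand_tris))

def best_candidate(ref_tris, candidates):
    """Return the candidate with the minimum total E, plus all totals."""
    totals = {cid: total_e(ref_tris, tris) for cid, tris in candidates.items()}
    return min(totals, key=totals.get), totals

# (area, perimeter) pairs from the Fig. 6 attribute table.
reference = [(5057.82, 337.61), (4915.07, 334.12), (5064.78, 327.02)]

candidates = {
    # C1: the reference scaled by 1.1 (same shapes, so total E is close to 0).
    "C1": [(a * 1.21, p * 1.1) for a, p in reference],
    # C2: hypothetical triangles with different shapes.
    "C2": [(4000.0, 400.0), (6000.0, 300.0), (3000.0, 350.0)],
}

match, totals = best_candidate(reference, candidates)
# match == "C1"
```

The scaled copy illustrates why the measure tolerates scale differences: multiplying all lengths by a constant changes the area ratio and the squared perimeter ratio by the same factor.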

3. Experiment and results analysis

3.1. Datasets used

The datasets used in this paper are the Digital Topographic Map (Topo Map) Version 2.0 and KLIS-rn (Korea Land Information System-road name) in Korea. The Topo Map is a national topographic base map that has a multilayered data structure with point, line, polygon object, and corresponding attribute tables. The KLIS-rn is produced to accommodate the new address system based on street name and building number to assign an address to each building object. (The traditional address system was based on the parcel numbers of a cadastral survey.) Some details of the datasets used are given in Table 1.

[Figure: reference triangulation RT with triangles rt1 to rt7 (upper left); candidate triangulations CT1 to CT6, ..., CTi, each with triangles ct1 to ct7 (right); attribute table of the reference triangulation (lower left), reproduced below.]

| Ref_ID | Ref_area | Ref_perimeter |
|---|---|---|
| 1 | 5057.82 | 337.61 |
| 2 | 4915.07 | 334.12 |
| 3 | 5064.78 | 327.02 |
| 4 | 2293.87 | 233.98 |
| 5 | 1485.78 | 185.66 |
| 6 | 857.86 | 166.90 |
| 7 | 4059.77 | 352.41 |

Fig. 6. Examples of calculation of geographic context similarity. Reference triangulation (upper left), its attribute table (lower left), and candidate triangulations (right).

Table 1
Specifications of test datasets used.

| | Reference dataset | Target dataset |
|---|---|---|
| Name | KLIS-rn | Topo Map (V2.0) |
| Provider | Ministry of Public Administration and Security | Korea National Geographic Information Institute (NGI) |
| Format | Shape | NGI |
| Scale | 1:5000 | 1:1000 and 1:5000 |
| Layer code of interest | Building | B001 (building) |
| Attributes | building number, road number, name, floor, zip code, etc. | name, category, type, usage, annotation, unique feature identifier (UFID) |

3.2. Analysis

The KLIS-rn is based on the Bessel system, whereas the Topo Map is based on the Geodetic Reference System 80 (GRS80). Consequently, there are coordinate differences for the same object with a maximum value of up to 300 m (Cho et al., 2008). Therefore, in this paper the minimum buffer radius for the landmarks search was set to 300 m. In the case of insufficient landmarks being available for extraction from both the datasets, the buffer radius was increased. Such an approach is usually called buffer growing. A sufficient number of landmarks allow the designation of a set of contiguous polygons, with none missing from around the input polygon object. In these experiments, the buffer radius was increased up to 100 m.
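Buffer growing can be sketched as a loop that enlarges the search radius until enough landmarks are found. Only the 300 m starting radius comes from the text; the step size, the upper limit, and the stopping rule based on a minimum landmark count are hypothetical stand-ins for the criterion of a complete ring of contiguous polygons.

```python
import math

def grow_buffer(center, landmarks, start_radius=300.0, step=50.0,
                max_radius=600.0, min_count=3):
    """Enlarge the buffer until at least `min_count` landmarks fall inside."""
    radius = start_radius
    while True:
        inside = [name for name, p in landmarks.items()
                  if math.dist(center, p) <= radius]
        if len(inside) >= min_count or radius >= max_radius:
            return radius, inside
        radius = min(radius + step, max_radius)

# Hypothetical landmark centroids at 100, 250, 320, and 390 m from the center.
landmarks = {"L1": (100.0, 0.0), "L2": (0.0, 250.0),
             "L3": (320.0, 0.0), "L4": (0.0, -390.0)}

radius, found = grow_buffer((0.0, 0.0), landmarks)
# radius == 350.0, found == ["L1", "L2", "L3"]
```

The upper limit guards against unbounded growth in areas with few named objects.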

Once the landmarks are extracted, Voronoi diagrams are created for both the datasets. Thereafter, the core Voronoi cell in the reference diagram and the candidate cell in the target diagram are selected. From the candidate Voronoi cell, a number of candidate objects are detected. For the input polygon object, a reference triangulation is then produced in the reference dataset. In the target dataset, a candidate triangulation is produced for each of the candidate objects. From these triangulations, a series of computations for geographic context similarities are processed. Thus, for the input polygon object, each of the candidate objects is compared, and the corresponding geographic context similarity value is computed. A total of 346 input polygon objects were selected and tested. Fig. 7 shows several input polygon objects (from A to E in successive rows) and corresponding candidate objects (from 1 to 7 in successive columns) along with their geographic context similarity values.

As shown in Fig. 7, the minimum value along each row indicates the matched object. For example, in the case of input polygon A, the minimum value along the first row is 0.095 and object number 5 is the one matched. Test results for the 346 input polygons show that the matching rate is a satisfactory 99.4%. Only two out of 346 cases are mismatched (Table 2a). Of the 346 cases, 30 have a one-to-many relationship. This implies that an object in the target dataset is broken into many objects in the reference dataset. In addition, among the 344 matched cases, 15 have different object shapes and 13 cases have very similar object shapes (Table 2b). This implies that the method works well regardless of object shape similarity. In addition, to see whether the proposed method works for different scales of datasets, a 1:1000 scale target dataset was used and tested. The two mismatched objects previously mentioned are from this 1:1000 scale target dataset (Table 2c).
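As a worked check of the row-minimum rule and the reported rate, the values of Fig. 7 can be processed as follows. The assignment of rows to input objects A to E follows the order in which the values appear and is an assumption for rows other than A, which the text confirms.

```python
# Geographic context similarity values (rows: input objects,
# columns: candidate objects 1 to 7), taken from Fig. 7.
similarity = {
    "A": [0.779, 0.709, 2.021, 0.791, 0.095, 0.712, 4.768],
    "B": [0.814, 0.200, 2.394, 1.229, 1.125, 1.366, 5.088],
    "C": [0.920, 0.867, 2.730, 0.061, 0.739, 1.692, 2.946],
    "D": [0.703, 0.819, 1.412, 1.239, 0.551, 0.140, 5.481],
    "E": [1.480, 1.426, 0.119, 3.123, 2.402, 2.575, 10.340],
}

# The matched candidate is the column with the minimum value in each row.
best_candidate = {obj: row.index(min(row)) + 1 for obj, row in similarity.items()}
# best_candidate == {"A": 5, "B": 2, "C": 4, "D": 6, "E": 3}

# Matching rate from Table 2a: 344 correct matches out of 346 test objects.
match_rate = round(344 / 346 * 100, 1)
# match_rate == 99.4
```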

For the two mismatched cases, we analyzed the value of geographic context similarity. As shown in Fig. 8, in both cases, the minimum geographic context similarity value has a relatively small difference from the next minimum value. To see the impact of an increasing number of landmarks, we added several such objects in both cases and the results show that their differences were reduced and the rank was reversed. Correct matching is thus indicated. From this test, it is concluded that if the difference between a minimum and the next minimum value is within a threshold, it is desirable to increase the number of landmarks. This implies that the identification of an appropriate threshold value is necessary.
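The proposed safeguard amounts to a margin test between the two smallest similarity values. The sketch below is one possible form of that test. The paper does not give a threshold, so the value 0.05 used in the example is purely hypothetical, as is the function name.

```python
def needs_more_landmarks(values, threshold):
    """True when the best and second-best similarity values are too close.

    `threshold` is the largest gap still treated as ambiguous. No
    value is given in the paper, so the caller must supply one.
    """
    if len(values) < 2:
        return False  # a single candidate cannot be ambiguous
    smallest, second = sorted(values)[:2]
    return (second - smallest) < threshold


# The two values listed for input object B in Fig. 8 (critical landmarks only).
ambiguous = needs_more_landmarks([0.108, 0.091], threshold=0.05)

# Row A of Fig. 7, where the minimum is well separated.
clear = needs_more_landmarks(
    [0.779, 0.709, 2.021, 0.791, 0.095, 0.712, 4.768], threshold=0.05)
```

When the test returns true, the match would be recomputed after adding landmarks rather than accepted directly.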


Input     Candidate objects
object    1       2       3       4       5       6       7
A         0.779   0.709   2.021   0.791   0.095   0.712   4.768
B         0.814   0.200   2.394   1.229   1.125   1.366   5.088
C         0.920   0.867   2.730   0.061   0.739   1.692   2.946
D         0.703   0.819   1.412   1.239   0.551   0.140   5.481
E         1.480   1.426   0.119   3.123   2.402   2.575   10.340

Legend: best fit (min. value); correct match.

Fig. 7. Geographic context similarity between two datasets.

Table 2. Results of matching test datasets.

(a) Matching cardinality

Input polygon object    Target object    Number of test objects    Number of correct matches
1                       1                316                       314
N                       1                30                        30
Total number of test objects             346                       344

(b) Shape

Reference data / Target data             Number of tests    Number of correct matches
Different shapes for the same object     15                 15
Same shape for the different objects     13                 13

(c) Scale

Reference dataset    Target dataset    Number of test objects    Number of correct matches
1:5000               1:5000            240                       240
1:5000               1:1000            106                       104

[Figure: geographic context similarity values for candidate objects I and J.
Input object A: critical landmarks used 0.362, 0.490; added landmark 0.281.
Input object B: critical landmarks used 0.108, 0.091; added landmark 0.174.
Other values shown: 0.498, 0.485.
Legend: best fit (min. value); correct match.]

Fig. 8. Mismatched case and calculation of geographic context similarity by adding a landmark.

J.O. Kim et al. / Computers & Geosciences 36 (2010) 1115–1122 1121

4. Conclusion

In this paper, we proposed a new object matching method based on geographic context similarity measures. For this, a set of traditional concepts such as buffer growing, Voronoi diagrams, triangulation, and geometry measures have been adopted and combined. The proposed method was tested for the Digital Topographic Map and KLIS-rn datasets in Korea, in which the former was the target dataset and the latter was the reference dataset. Once a set of common landmarks was extracted from these two datasets, Voronoi diagrams from the reference and target dataset were generated. From the landmark in the core Voronoi cell, the candidate Voronoi cell was identified, and thus a set of candidate objects in the target dataset were defined. This process was followed by delineation of reference and candidate triangulations for computation to examine the geographic context similarities between the triangulations.


From the test results, it was found that two out of 346 selected objects were mismatched, which corresponds to a success rate of 99.4%. The algorithm worked well regardless of the object shapes. Furthermore, the algorithm was also effective for different scales of datasets. The proposed method is very simple in concept but sufficiently powerful to match objects from different datasets that have significant coordinate discrepancies. Through the adoption of the Voronoi diagram, in particular, the method requires a very limited number of landmarks and a small object search region.

Although the method is believed to be very useful, there are issues for further investigation concerning its practical applications. One matter is the selection of landmarks. In this paper, a set of common landmarks was selected manually from within two datasets, but this is not acceptable in practical applications. A new approach is required to automate the landmark selection process. A further issue is how the proposed approach compares with the proximity graph method of Samal et al. (2004). We need to identify the cases in which the proposed method is particularly useful in contrast to the proximity graph method. Having such knowledge will guide users to apply either method appropriately, so that they are used in a complementary manner. These two research topics are being addressed jointly and the results will be presented in the near future.

Acknowledgement

This research was supported by a grant (07KLSGC04) from Cutting-edge Urban Development of the Korean Land Spatialization Research Project funded by the Ministry of Construction & Transportation of the Korean government. In addition, the authors appreciate the support of the Integrated Research Institute of Construction and Environmental Engineering at Seoul National University, Korea.

References

Arkin, E.M., Chew, L.P., Huttenlocher, D.P., Kedem, K., Mitchell, J.S.B., 1991. An efficiently computable metric for comparing polygonal shapes. Institute of Electrical and Electronics Engineers Transactions on Pattern Analysis and Machine Intelligence 13 (3), 209–216.

Beeri, C., Doytsher, Y., Kanza, Y., Safra, E., Sagiv, Y., 2005. Finding corresponding objects when integrating several geo-spatial datasets. In: Proceedings of the 13th Association for Computing Machinery International Workshop on Geographic Information Systems, Bremen, Germany, pp. 87–96.

Bilenko, M., Kamath, B., Mooney, R.J., 2006. Adaptive blocking: learning to scale up record linkage. In: Proceedings of the sixth Institute of Electrical and Electronics Engineers International Conference on Data Mining, Hong Kong, pp. 87–96.

Charras, C., Lecroq, T., 2004. Handbook of Exact String Matching Algorithms. King’s College Publications, London, UK 256pp.

Cho, J.K., Choi, Y.S., Kwon, J.H., Lee, B.M., 2008. A study on the accuracy analysis of the world geodetic system transformation for GIS base map and database. Journal of Korean Society for Geospatial Information System 16 (3), 79–85 [in Korean].

Cormen, T.H., Leiserson, C.E., Rivest, R.L., Stein, C., 2001. Introduction to Algorithms 2nd edn. The MIT Press, Cambridge, MA, USA 1184pp.

Cueto, K., Samal, A., Seth, S., 2000. Context-based similarity for GIS feature matching. In: Proceedings of the International Conference on Geographic Information Science 2000, Savannah, Georgia, USA, pp. 288–290.

Doytsher, Y., 2000. A rubber sheeting algorithm for non-rectangular maps. Computers & Geosciences 26 (9–10), 1001–1010.

Duda, R.O., Hart, P.E., Stork, D.G., 2001. Pattern Classification 2nd edn. A Wiley-Interscience Publication, New York, NY 654pp.

Frank, R., Ester, M., 2006. A quantitative similarity measure for maps. In: Proceedings of 12th International Symposium on Spatial Data Handling, Vienna, Austria, pp. 435–450.

Gosseln, Gv, Sester, M., 2004. Integration of geoscientific datasets and the German digital map using a matching approach. In: Proceedings of the XXth International Society for Photogrammetry and Remote Sensing Congress, Comm. IV, Istanbul, Turkey, pp. 1249–1254.

Kang, H., Sehgal, V., Getoor, L., 2007. GeoDDupe: a novel interface for interactive entity resolution in geospatial data. In: Proceedings of 11th International Conference Information Visualization, Zurich, Switzerland, pp. 489–496.

Longin, J.L., Lakamper, R., 2000. Shape similarity measure based on correspondence of visual parts. Institute of Electrical and Electronics Engineers Transactions on Pattern Analysis and Machine Intelligence 22 (10), 1185–1190.

O’Rourke, J., 1998. Computational Geometry in C 2nd edn. Cambridge University Press, New York, NY 376pp.

Rucklidge, W.J., 2004. Efficiently locating objects using the Hausdorff distance. International Journal of Computer Vision 24 (3), 251–270.

Saalfeld, A., 1988. Conflation: automated map compilation. International Journal of Geographical Information Systems 2 (3), 217–228.

Samal, A., Seth, S., Cueto, K., 2004. A feature-based approach to conflation of geospatial source. International Journal of Geographical Information Science 18 (5), 459–489.

van Wamelen, P.B., Li, Z., Iyengar, S.S., 2004. A fast expected time algorithm for the 2-D point pattern matching problem. Pattern Recognition 37 (8), 1699–1711.

Walter, V., Fritsch, D., 1999. Matching spatial datasets: a statistical approach. International Journal of Geographical Information Science 13 (5), 445–473.

Yuan, S., Tao, C., 1999. Development of conflation components. In: Proceedings of the International Conference on Geoinformatics and Socioinformatics, Ann Arbor, Michigan, USA, pp. 1–13.