Advanced Information Systems Laboratory http://iaaa.cps.unizar.es Department of Computer Science and Systems Engineering 1st Workshop of COST Action C21: "Ontologies for Urban Development: Interfacing Urban Information Systems" Building an Address Gazetteer on top of Urban Network Ontology J.Nogueras-Iso, F.J.López, J.Lacasta, F.J.Zarazaga-Soria, P.R.Muro-Medrano Geneva, 6-7 November 2006


Advanced Information Systems Laboratory

http://iaaa.cps.unizar.es

Department of Computer Science and Systems Engineering

1st Workshop of COST Action C21:

"Ontologies for Urban Development: Interfacing Urban Information Systems"

Building an Address Gazetteer on top of an Urban Network Ontology

J.Nogueras-Iso, F.J.López, J.Lacasta, F.J.Zarazaga-Soria, P.R.Muro-Medrano

Geneva, 6-7 November 2006


Outline

1. Introduction
2. A typical use-case: IDEZar
3. Ontology building using a manual mapping
4. Ontology building using an automated approach
5. Conclusions


1. Introduction

The increasing relevance of geographic information for decision-making and resource management in diverse areas promoted the creation of Spatial Data Infrastructures (SDI)

SDI: a coordinated approach to technology, policies, standards, and human resources necessary for the effective acquisition, management, distribution and utilization of GI at different organization levels and involving both public and private institutions

Gazetteer Service
• A typical component of an SDI
• Directory of instances of a class or classes of features containing some information regarding position
• Looks up geographic feature locations based on geographic identifiers


Address Gazetteer Service

In SDIs for local administrations such as a city council, address gazetteer services represent one of the most important services that the councils must offer to their citizens

An Address Gazetteer Service
• Specialized on Urban Network Features (addresses)
• The councils are responsible for the management of urban networks, and these networks are used as reference information for other services at national level such as cadaster or census services


Creation of the contents of a gazetteer

• It usually requires combining multiple repositories
  - The same feature (concept) is stored in different repositories, each of them contributing a different piece of attribute information
• Typical problems of heterogeneity
  - Different data models (roles, granularity), encoding
• Our proposal to deal with heterogeneity in this context:
  - Build an urban network ontology upon existing feature type taxonomies


2. A typical use-case: IDEZar

• The IDEZar Project is the result of a collaboration agreement signed in March 2004 between the City Council and the University of Zaragoza
  - Zaragoza is a medium-sized city (some 650,000 inhabitants) in the northeast of Spain (capital of Aragón), growing fast in extension and population. The municipality is about 1,000 km² and includes several towns
• Objective: development of a local SDI for Zaragoza
  - To facilitate, increase and coordinate the use of spatial data by the Council
  - To develop applications for the citizens and to provide them with access to public sector information


IDEZar Service Architecture
http://www.zaragoza.es/idezar/

[Figure: IDEZar service architecture. Services shown, grouped by the SDI that provides them:]
• IDEZar (Local SDI)
  - <<WCAS>> Catalog
  - <<WMS>> Urban-Thematic: public services (libraries, police stations...), private services (pharmacies, parkings...)
  - <<WMS>> Base: street maps
  - <<WMS>> Environment-Thematic: Agenda 21, protected areas...
  - <<Gazetteer>> Street names
  - <<Route planner>> Arriving at Zaragoza
• IDEE (National SDI)
  - <<Gazetteer>> IDEE-Nomenclátor: toponyms
  - <<WMS>> IDEE-Base: base map up to 1:25000 of Spain
• IDEAr (Aragón – Regional SDI)
  - <<WMS>> Base: orthoimages
• GeoPortal: Street Map and Gazetteer


Address related repositories
• Multiple repositories
  - Not very different models: Feature = name + type + additional info (location, range, …)
  - But different taxonomies for urban network feature types
  - Not specially synchronized

[Figure: address-related repositories and the information exchanged among them.
Nodes (with the feature type taxonomy each one uses): Zaragoza City Council; Informatics Office (AYTO); National Statistics Institute (TVIAN); National Cadaster Office (SIGLA); Tax Office (SIGLA); Urban Planning Office (AYTO, SIGLA); IDEZar (AYTO); Statistics Office (TVIAN).
Flow labels: Electoral Census; Inhabitant Census; Addresses; Property Census; Amends (streets, addresses); Site development updates; Town planning updates; Addresses updates; Street names; Addresses; Maps; Street types; Street names; Addresses; Maps; Addresses ranges.]

Excerpts of the three feature type taxonomies:

TVIAN  VARIANT  NAME
AUTO   AUTO     AUTOPISTA
AVDA   AV       AVINGUDA
AVDA   AVDA     AVENIDA
AVDA   AVGDA    AVINGUDA
AVDA   ETDEA    ETORBIDEA
AVDA   HIRIB    HIRIBIDE
AVIA   AUTOV    AUTOVIA
AVIA   AVIA     AUTOVIA
BARDA  AUZOT    AUZOTEGI
BARDA  BARDA    BARRIADA

SIGLA  NAME
AG     AGREGADO
AL     ALDEA, ALMAEDA, ALMAI
AR     AREA, ARRABAL, ARCO
AU     AUTOPISTA
AV     AVENIDA
AY     ARROYO
BJ     BAJADA
BO     BARRIO

AYTO   NAME
AN     ANDADORES
AV     AVENIDAS
CL     CALLES
CLP    CL/PEATONAL
CLTP   TRAMOPEATONAL
CR     CARRERAS
CT     CARRETERAS


Address related repositories: Statistics Office repository
• Inhabitant/poll census, exchanges from/to National Statistics Institute
• TVIAN (Tipo de Vía Normalizada): standardized network feature types of the National Statistics Institute


Address related repositories: Cadaster Office repository
• Land/Tax management, exchanges from/to National Cadaster Office
• SIGLA: network feature types of the Cadaster Office


Address related repositories: Informatics Office repository
• Central repository used for the assignment of new street names
• AYTO: network feature types of the council


Gazetteer content creation
• Why do we need to combine all 3 repositories?
  - Not all features are in the 3 repositories
  - Attribute information is distributed in the different repositories

[Figure: feature classes of the source repositories and of the IDEZAR gazetteer]
• Statistics Office: STATISTICS_FEATURE (+TVIAN_CODE, +NAME, +RANGE)
• Cadaster Office: CADASTER_FEATURE (+SIGLA_CODE, +NAME, +LOCATION)
• Informatics Office: COUNCIL_FEATURE (+AYTO_CODE, +NAME)
• IDEZAR: GAZETTEER_FEATURE (+STREET_CODE, +NAME, +LOCATION, +RANGE)
• Annotations: "Addresses not covered by statistics office"; "Addresses not covered by informatics office"
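The combination of attributes from the three feature classes can be sketched as follows. This is only an illustration under stated assumptions: the function name `merge_feature`, the dictionary-based records, the rule for choosing the name, and the way `STREET_CODE` is supplied are all hypothetical, since the slides do not describe how the actual merge is implemented.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class GazetteerFeature:
    street_code: str                     # STREET_CODE
    name: str                            # NAME
    location: Optional[str] = None       # LOCATION, only in CADASTER_FEATURE
    address_range: Optional[str] = None  # RANGE, only in STATISTICS_FEATURE


def merge_feature(street_code, council=None, cadaster=None, statistics=None):
    """Combine the records of one matched feature; any source may be absent."""
    present = [rec for rec in (council, cadaster, statistics) if rec is not None]
    if not present:
        raise ValueError("at least one source record is required")
    return GazetteerFeature(
        street_code=street_code,
        # Assumption: the first available source provides the name.
        name=present[0]["NAME"],
        location=cadaster["LOCATION"] if cadaster is not None else None,
        address_range=statistics["RANGE"] if statistics is not None else None,
    )
```

A feature missing from the Statistics Office repository simply keeps an empty range, which mirrors the "addresses not covered" annotations in the figure.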


Gazetteer content creation II
• Problems found while combining
  - Matching cannot be based uniquely on feature names: 2 features may differ in typology but not in name (Spain square vs Spain avenue)
  - Which is the most appropriate feature type taxonomy for the gazetteer contents?
• Solution proposed: define an urban network ontology
  - An ontology defines explicitly the concepts and relations between these concepts in a domain
  - This ontology will provide a unified model of the feature types that can be found in this domain
  - Making the necessary mappings to the particular taxonomies used in the different council offices or external organizations
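The name-collision problem can be made concrete with a small sketch. The lookup table `CONCEPT_OF` and both functions are hypothetical; they only illustrate why a comparison needs the unified feature type in addition to the name. The square and avenue codes are the ones shown in the slides.

```python
# Hypothetical unified concepts for a few feature type codes.
CONCEPT_OF = {
    ("AYTO", "PL"): "square",
    ("SIGLA", "PZ"): "square",
    ("AYTO", "AV"): "avenue",
    ("SIGLA", "AV"): "avenue",
}


def same_name(a, b):
    """Name-only comparison: not enough to identify a feature."""
    return a["name"].casefold() == b["name"].casefold()


def same_feature(a, b):
    """Names must match and both codes must map to the same unified concept."""
    concept_a = CONCEPT_OF.get((a["taxonomy"], a["code"]))
    concept_b = CONCEPT_OF.get((b["taxonomy"], b["code"]))
    return same_name(a, b) and concept_a is not None and concept_a == concept_b


council_square = {"name": "ESPAÑA", "taxonomy": "AYTO", "code": "PL"}
cadaster_avenue = {"name": "ESPAÑA", "taxonomy": "SIGLA", "code": "AV"}
cadaster_square = {"name": "ESPAÑA", "taxonomy": "SIGLA", "code": "PZ"}
```

Here `same_name` accepts the square/avenue pair, while `same_feature` rejects it and still accepts the two squares even though their codes (PL and PZ) differ.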


How to build up the ontology
• The construction of ontologies upon existing vocabularies is a classical and widely used approach
• The underlying problem (ontology alignment)
  - How to find the relationships that hold between the entities represented in different taxonomies
• Two approaches for the ontology construction
  - Manual mapping approach
  - Automated approach

[Figure: the TVIAN, AYTO and SIGLA taxonomies]


3. Manual Mapping approach
• Matching of terms (names + acronyms) between the different taxonomies
  - Difficulties: lack of semantic descriptions
• Categories of matches
  - Exact match
  - Partial match: one concept is broader or narrower
  - No match
  - Provisional match: taxonomy errors (homonyms) imply erroneous matches

[Figure: examples of matches between the acronyms and concepts of SIGLA (Cadaster) and AYTO (City Council), drawn on top of the TVIAN, AYTO and SIGLA taxonomies. Labels shown: “PZ” and “PL” (SQUARE); “CN” and “CM” (RESIDENTIAL DEVELOPMENT; COUNTRY HOUSE (SOUTH OF SPAIN); MINOR ROAD); “CL”, “CLP”, “CLTP” and “AN” (STREET; PEDESTRIAN STREET; PEDESTRIAN STREET SEGMENT)]
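A manually built mapping table with these match categories could be recorded as in the sketch below. The `MatchKind` and `Mapping` types are hypothetical names introduced for illustration, not the authors' data model; the two sample entries reuse code pairs that the slides themselves mention (PL/PZ for square, CL/CLP for street and pedestrian street).

```python
from dataclasses import dataclass
from enum import Enum


class MatchKind(Enum):
    EXACT = "exact match"
    BROADER = "partial match: source is broader"
    NARROWER = "partial match: source is narrower"
    PROVISIONAL = "provisional match"
    NONE = "no match"


# Reversing a partial match swaps broader and narrower.
_INVERSE = {
    MatchKind.BROADER: MatchKind.NARROWER,
    MatchKind.NARROWER: MatchKind.BROADER,
}


@dataclass(frozen=True)
class Mapping:
    source: tuple  # (taxonomy, code)
    target: tuple  # (taxonomy, code)
    kind: MatchKind

    def inverse(self):
        """Return the same mapping read from target to source."""
        return Mapping(self.target, self.source, _INVERSE.get(self.kind, self.kind))


MAPPINGS = [
    Mapping(("AYTO", "PL"), ("SIGLA", "PZ"), MatchKind.EXACT),
    Mapping(("SIGLA", "CL"), ("AYTO", "CLP"), MatchKind.BROADER),
]
```

Keeping the category explicit makes provisional matches easy to list and review later, which matters when homonyms have produced erroneous matches.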


A more flexible approach
• Previous approach
  - Too time-consuming and hard to scale
• Improvement
  - Use a well-established, shared common core ontology and make mappings between the distinct sources and this common core
• New experiment: use of the URBISOC thesaurus
  - A thesaurus focused on Spanish terminology for town planning
  - Developed by the CINDOC/CSIC institute (Centre for Scientific Information and Documentation / Spanish National Research Council)

[Figure: TVIAN, AYTO and SIGLA mapped to URBISOC]


A more flexible approach II
• Use of the Towntology ontology editor
  - Focused on ontology construction
  - Storage of concepts with several definitions that are in a process of selection and characterization
  - Although it improves scalability, it is still time-consuming and error prone


4. Ontology building using an automated approach
• Why?
  - Manual mappings are time-consuming
  - Some mappings may not be successful because content creators have not assigned the correct feature type
• Technique proposed
  - Formal Concept Analysis (1980, Wille & Ganter …)
  - It enables the extraction of a hierarchy of concepts from the feature instances contained in the source repositories

[Figure: TVIAN, AYTO and SIGLA; label: generated]


Basics of FCA
• Definition of formal contexts: a triple $(G, M, I)$
  - $G$: objects
  - $M$: attributes
  - $I$: binary relation between $G$ and $M$ (incidence matrix)
• It is possible to extract formal concepts
  - Given $A \subseteq G$ and $B \subseteq M$, a pair $(A, B)$ is a formal concept if and only if the set of all attributes shared by the objects in $A$ is identical with $B$
  - $A$ is also the set of all the objects which have in common with each other the attributes in $B$
• Additionally it is possible to establish a subconcept-superconcept relation:
  $(A_1, B_1) \leq (A_2, B_2) \Leftrightarrow A_1 \subseteq A_2 \; (\Leftrightarrow B_2 \subseteq B_1)$
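These definitions translate almost directly into code. The sketch below is a minimal illustration on a toy context; the object identifiers `f1` to `f3` are invented, and only the attribute names follow the `SIGLA_`/`AYTO_` code style used later in the slides.

```python
def common_attributes(objects, context, all_attributes):
    """Attributes shared by every object in `objects`."""
    shared = set(all_attributes)
    for g in objects:
        shared &= context[g]
    return shared


def common_objects(attributes, context):
    """Objects that have every attribute in `attributes`."""
    return {g for g, attrs in context.items() if attributes <= attrs}


def is_formal_concept(A, B, context, all_attributes):
    """(A, B) is a formal concept iff each set determines the other."""
    return (common_attributes(A, context, all_attributes) == B
            and common_objects(B, context) == A)


def is_subconcept(c1, c2):
    """(A1, B1) <= (A2, B2) iff A1 is a subset of A2."""
    return c1[0] <= c2[0]


# Toy formal context: object -> set of attributes (hypothetical data).
M = {"SIGLA_CL", "AYTO_CL", "AYTO_CLP"}
CONTEXT = {
    "f1": {"SIGLA_CL", "AYTO_CL"},
    "f2": {"SIGLA_CL", "AYTO_CLP"},
    "f3": {"SIGLA_CL"},
}

street = ({"f1", "f2", "f3"}, {"SIGLA_CL"})
traffic_street = ({"f1"}, {"SIGLA_CL", "AYTO_CL"})
```

In this toy context both pairs are formal concepts and `traffic_street` is a subconcept of `street`; its intent is correspondingly larger, as the equivalence with $B_2 \subseteq B_1$ requires.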


Applying FCA
• How to obtain a unique repository of instances, i.e. the formal context required by FCA?
  - Traditional datalinking has been applied to the feature instances contained in the different databases, based on the analysis of the lexical and spatial similarities of feature attributes
  - Transform the datalinking matrix into the incidence matrix
    - Each checked cell (match of source features) generates an object/instance in the incidence matrix
    - The columns correspond with the transformation of urban network feature type codes (e.g., AYTO CODE, SIGLA CODE) into proper attributes with boolean values
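The transformation described above might look like the following sketch. It assumes the datalinking result is available as a list of matched identifier pairs plus a type-code lookup per repository; the function name and these input shapes are assumptions, not the authors' implementation.

```python
def build_incidence(links, council_type, cadaster_type):
    """Turn matched feature pairs into a boolean incidence matrix.

    links         -- iterable of (council_id, cadaster_id) pairs, i.e. the
                     checked cells of the datalinking matrix
    council_type  -- council_id -> AYTO code
    cadaster_type -- cadaster_id -> SIGLA code
    """
    objects, rows = [], []
    for council_id, cadaster_id in links:
        objects.append((council_id, cadaster_id))  # one object per match
        rows.append({
            f"AYTO_{council_type[council_id]}",
            f"SIGLA_{cadaster_type[cadaster_id]}",
        })
    # One boolean column per feature type code seen in the matches.
    attributes = sorted(set().union(*rows)) if rows else []
    matrix = [[attribute in row for attribute in attributes] for row in rows]
    return objects, attributes, matrix


objects, attributes, matrix = build_incidence(
    links=[("c1", "k1"), ("c2", "k2")],
    council_type={"c1": "PL", "c2": "CLP"},
    cadaster_type={"k1": "PZ", "k2": "CL"},
)
```

Each row then carries exactly two true cells, one AYTO code and one SIGLA code, which is what lets FCA relate the two taxonomies through shared instances.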


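[Figure: the datalinking matrix is transformed into the incidence matrix by replacing each matched feature by its type code. Source sizes: 2718 features with 18 AYTO codes; 4318 features with 35 SIGLA codes]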


Applying FCA
• Obtain the concept lattice
  - NEXT CLOSED SET algorithm (Ganter 87)

[Figure: Incidence matrix → FCA → Concept lattice (only attributes are shown). Node labels:]
• supremum (least common superconcept)
• AYTO_PL, SIGLA_PZ (square)
• SIGLA_AV (avenue)
• AYTO_AV, SIGLA_AV (traffic allowed avenue)
• SIGLA_AV, AYTO_AVP (pedestrian avenue)
• SIGLA_CL (street)
• SIGLA_CL, AYTO_CL (traffic allowed street)
• SIGLA_CL, AYTO_CLP (pedestrianized street)
• SIGLA_CL, AYTO_AN (carfree designed street)
• infimum (greatest common subconcept)
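For readers unfamiliar with the Next Closed Set (Next Closure) algorithm, a compact sketch is given below. It enumerates concept intents in lectic order over a fixed attribute ordering. The function name and the dictionary-based context are choices made for this illustration, and no attempt is made at the efficiency a real data set would require.

```python
def next_closure_concepts(context, attributes):
    """Enumerate all formal concepts as (extent, intent) pairs.

    context    -- dict: object -> set of attributes
    attributes -- list fixing the attribute order used by the algorithm
    """
    def closure(B):
        # Objects having all attributes of B, then everything they share.
        extent = [g for g, attrs in context.items() if B <= attrs]
        intent = set(attributes)
        for g in extent:
            intent &= context[g]
        return frozenset(extent), frozenset(intent)

    extent, intent = closure(set())
    concepts = [(extent, intent)]
    while len(intent) < len(attributes):
        # Try attributes from last to first, as Next Closure prescribes.
        for i in reversed(range(len(attributes))):
            m = attributes[i]
            if m in intent:
                continue
            candidate = {a for a in attributes[:i] if a in intent} | {m}
            new_extent, new_intent = closure(candidate)
            # Accept only if no attribute before m was newly added.
            if all(a in intent for a in attributes[:i] if a in new_intent):
                extent, intent = new_extent, new_intent
                concepts.append((extent, intent))
                break
    return concepts


# Toy context with invented object identifiers.
toy_context = {
    "f1": {"SIGLA_CL", "AYTO_CL"},
    "f2": {"SIGLA_CL", "AYTO_CLP"},
    "f3": {"SIGLA_CL"},
}
toy_concepts = next_closure_concepts(
    toy_context, ["SIGLA_CL", "AYTO_CL", "AYTO_CLP"]
)
```

On this toy context the enumeration yields four concepts: street (all three objects), the two narrower street concepts, and the infimum with an empty extent.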


Results
• Experiment: combining the COUNCIL_FEATURE and CADASTER_FEATURE databases
  - A concept lattice of 36 concepts from the original 53 concepts
  - Identification of equivalent concepts in both taxonomies, e.g., square (PL in AYTO and PZ in SIGLA)
  - And also subconcept-superconcept relations. E.g., identification of street as a broader concept in SIGLA (CL), which has narrower concepts in AYTO:
    - traffic-allowed streets (CL)
    - pedestrianized streets (CLP)
    - or carfree-designed streets (AN)
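Relations of this kind can be read off by comparing, for each code, the set of instances that carry it. The sketch below is a simplified stand-in for inspecting the lattice, not the authors' procedure: it assumes an idealized, error-free context, and the function name and instance identifiers are hypothetical.

```python
def code_relations(context):
    """Derive equivalent and narrower/broader code pairs from a context.

    context -- dict: object -> set of type-code attributes
    Returns (equivalent, narrower), where narrower holds (narrow, broad) pairs.
    """
    extents = {}
    for g, attrs in context.items():
        for a in attrs:
            extents.setdefault(a, set()).add(g)

    equivalent, narrower = [], []
    codes = sorted(extents)
    for a in codes:
        for b in codes:
            if a < b and extents[a] == extents[b]:
                equivalent.append((a, b))   # same instances: same concept
            elif a != b and extents[a] < extents[b]:
                narrower.append((a, b))     # a's instances are a proper subset
    return equivalent, narrower


# Invented instances, arranged to reproduce the relations named above.
sample = {
    "f1": {"AYTO_PL", "SIGLA_PZ"},
    "f2": {"AYTO_CL", "SIGLA_CL"},
    "f3": {"AYTO_CLP", "SIGLA_CL"},
    "f4": {"AYTO_AN", "SIGLA_CL"},
}
equivalent, narrower = code_relations(sample)
```

With real repositories, wrongly typed features would blur these exact subset tests, so some tolerance or manual review of the generated draft would presumably still be needed.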


5. Conclusions
• The FCA approach seems to be more flexible
  - Dynamic building of the ontology (at least, a draft)
  - We don't need to define the concepts, we just need to observe the data that exists
  - We have created a domain-specific ontology that facilitates the interoperability (synchronization, update and merge) of the separate repositories
• Future lines
  - Improve the efficiency of the method
  - Enrich the generated concepts with commonalities found in other feature attributes of the instances (e.g., geometry, perimeter, area)
  - Apply to other domains
    - Hydrology: NMA vs Water Agency repositories


Advanced Information Systems Laboratory

http://iaaa.cps.unizar.es