Advanced Information Systems Laboratory
http://iaaa.cps.unizar.es
Department of Computer Science and Systems Engineering
1st Workshop of COST Action C21:
"Ontologies for Urban Development: Interfacing Urban Information Systems"
Building an Address Gazetteer on top of an Urban Network Ontology
J. Nogueras-Iso, F.J. López, J. Lacasta, F.J. Zarazaga-Soria, P.R. Muro-Medrano
Geneva, 6-7 November 2006
Outline
1. Introduction
2. A typical use-case: IDEZar
3. Ontology building using a manual mapping
4. Ontology building using an automated approach
5. Conclusions
1. Introduction
• The increasing relevance of geographic information (GI) for decision-making and resource management in diverse areas has promoted the creation of Spatial Data Infrastructures (SDI)
• SDI: a coordinated approach to the technology, policies, standards, and human resources necessary for the effective acquisition, management, distribution and use of GI at different organizational levels, involving both public and private institutions
• Gazetteer Service
  • A typical component of an SDI
  • A directory of instances of a class or classes of features, containing some information regarding position
  • Looks up geographic feature locations based on geographic identifiers
Address Gazetteer Service
• In SDIs for local administrations such as a city council, address gazetteer services are among the most important services that councils must offer to their citizens
• An Address Gazetteer Service
  • Specializes in urban network features (addresses)
  • Councils are responsible for managing urban networks, and these networks serve as reference information for other services at the national level, such as cadaster or census services
Creation of the contents of a gazetteer
• It usually requires combining multiple repositories
  • The same feature (concept) is stored in different repositories, each contributing a different piece of attribute information
• Typical problems of heterogeneity
  • Different data models (roles, granularity), encoding
• Our proposal to deal with heterogeneity in this context:
  • Build an urban network ontology upon existing feature type taxonomies
2. A typical use-case: IDEZar
• The IDEZar Project is the result of a collaboration agreement signed in March 2004 between the City Council and the University of Zaragoza
  • Zaragoza is a medium-sized city (some 650,000 inhabitants) in the northeast of Spain (capital of Aragón), growing fast in extension and population. The municipality covers about 1,000 km² and includes several towns
• Objective: development of a local SDI for Zaragoza
  • To facilitate, increase and coordinate the use of spatial data by the Council
  • To develop applications for citizens and to provide them with access to public sector information
IDEZar Service Architecture
http://www.zaragoza.es/idezar/
[Figure: services grouped by SDI level, together with the GeoPortal]
• IDEZar (Local SDI)
  • <<WCAS>> Catalog
  • <<WMS>> Urban-Thematic: public services (libraries, police stations...), private services (pharmacies, parkings...)
  • <<WMS>> Base: street maps
  • <<WMS>> Environment-Thematic: Agenda 21, protected areas...
  • <<Gazetteer>> Street names
  • <<Route planner>> Arriving at Zaragoza
• IDEE (National SDI)
  • <<Gazetteer>> IDEE-Nomenclátor: toponyms
  • <<WMS>> IDEE-Base: base map up to 1:25000 of Spain
• IDEAr (Aragón, Regional SDI)
  • <<WMS>> Base: orthoimages
• GeoPortal: Street Map and Gazetteer
Address related repositories
• Multiple repositories
  • Not very different models: Feature = name + type + additional info (location, range, …)
  • But different taxonomies for urban network feature types
  • Not specially synchronized
[Figure: address-related repositories and the taxonomy each one uses]
• Zaragoza City Council
  • Informatics Office: AYTO
  • Statistics Office: TVIAN
  • Tax Office: SIGLA
  • Urban Planning Office: AYTO, SIGLA
  • IDEZar: AYTO
• External organizations
  • National Statistics Institute: TVIAN
  • National Cadaster Office: SIGLA
• Data exchanged between them: electoral census, inhabitant census, property census, addresses, address ranges, street names, street types, maps, amends (streets, addresses), site development updates, town planning updates, address updates

TVIAN   VARIANT   NAME
AUTO    AUTO      AUTOPISTA
AVDA    AV        AVINGUDA
AVDA    AVDA      AVENIDA
AVDA    AVGDA     AVINGUDA
AVDA    ETDEA     ETORBIDEA
AVDA    HIRIB     HIRIBIDE
AVIA    AUTOV     AUTOVIA
AVIA    AVIA      AUTOVIA
BARDA   AUZOT     AUZOTEGI
BARDA   BARDA     BARRIADA

SIGLA   NAME
AG      AGREGADO
AL      ALDEA, ALMAEDA, ALMAI
AR      AREA, ARRABAL, ARCO
AU      AUTOPISTA
AV      AVENIDA
AY      ARROYO
BJ      BAJADA
BO      BARRIO

AYTO    NAME
AN      ANDADORES
AV      AVENIDAS
CL      CALLES
CLP     CL/PEATONAL
CLTP    TRAMO PEATONAL
CR      CARRERAS
CT      CARRETERAS
Address related repositories: Statistics Office repository
• Inhabitant/poll census, exchanges from/to the National Statistics Institute
• TVIAN (Tipo de Vía Normalizada): standardized network feature types of the National Statistics Institute
Address related repositories: Cadaster Office repository
• Land/Tax management, exchanges from/to the National Cadaster Office
• SIGLA: network feature types of the Cadaster Office
Address related repositories: Informatics Office repository
• Central repository used for assigning new street names
• AYTO: network feature types of the council
Gazetteer content creation
• Why do we need to combine all three repositories?
  • Not all features are in the three repositories
  • Attribute information is distributed across the different repositories
[Figure: the IDEZar GAZETTEER_FEATURE is built from the three source feature types. Annotations mark addresses not covered by the Statistics Office and addresses not covered by the Informatics Office.]
• STATISTICS_FEATURE (Statistics Office): TVIAN_CODE, NAME, RANGE
• CADASTER_FEATURE (Cadaster Office): SIGLA_CODE, NAME, LOCATION
• COUNCIL_FEATURE (Informatics Office): AYTO_CODE, NAME
• GAZETTEER_FEATURE (IDEZar): STREET_CODE, NAME, LOCATION, RANGE
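The combination of the three feature types can be sketched in Python as follows. This is only an illustrative sketch: the four record types mirror the feature types named on the slide, but the `merge_features` helper, the assumption that the source records have already been linked to each other, the rule for choosing the name, and all sample values are hypothetical.

```python
from dataclasses import dataclass
from typing import Optional, Tuple


@dataclass
class StatisticsFeature:      # Statistics Office record
    tvian_code: str
    name: str
    range: str                # address range, kept as free text in this sketch


@dataclass
class CadasterFeature:        # Cadaster Office record
    sigla_code: str
    name: str
    location: Tuple[float, float]


@dataclass
class CouncilFeature:         # Informatics Office record
    ayto_code: str
    name: str


@dataclass
class GazetteerFeature:       # combined IDEZar record
    street_code: str
    name: str
    location: Optional[Tuple[float, float]]
    range: Optional[str]


def merge_features(street_code, council=None, cadaster=None, statistics=None):
    """Combine already-linked source records (hypothetical helper).

    Any source may be missing, since not all features are in all repositories.
    """
    sources = [s for s in (council, cadaster, statistics) if s is not None]
    if not sources:
        raise ValueError("at least one source record is required")
    return GazetteerFeature(
        street_code=street_code,
        name=sources[0].name,  # assumption: prefer the council name if present
        location=cadaster.location if cadaster is not None else None,
        range=statistics.range if statistics is not None else None,
    )


# Made-up sample values, for illustration only.
feature = merge_features(
    "0001",
    council=CouncilFeature("PL", "ESPAÑA"),
    cadaster=CadasterFeature("PZ", "ESPAÑA", (0.0, 0.0)),
)
```

In this example the feature is missing from the Statistics Office repository, so the combined record has a location but no address range.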
Gazetteer content creation II
• Problems found while combining
  • Matching cannot be based solely on feature names: two features may differ in typology but not in name (Spain square vs Spain avenue)
  • Which is the most appropriate feature type taxonomy for the gazetteer contents?
• Solution proposed: define an urban network ontology
  • An ontology explicitly defines the concepts in a domain and the relations between them
  • This ontology will provide a unified model of the feature types that can be found in this domain
  • It provides the necessary mappings to the particular taxonomies used in the different council offices or external organizations
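A minimal sketch of why the feature type has to take part in the matching: two records are treated as the same feature only when both the name and the unified concept agree. The code-to-concept pairs (PL and PZ for square, AV for avenue) come from the slides; the `CONCEPT_OF` table, the record layout and the `same_feature` helper are assumptions made for illustration.

```python
# Mapping from (taxonomy, code) to a unified concept. The pairs are taken
# from the slides; the concept labels and this table layout are assumptions.
CONCEPT_OF = {
    ("AYTO", "PL"): "square",
    ("SIGLA", "PZ"): "square",
    ("AYTO", "AV"): "avenue",
    ("SIGLA", "AV"): "avenue",
}


def concept(record):
    """Return the unified concept of a (taxonomy, code, name) record, or None."""
    taxonomy, code, _name = record
    return CONCEPT_OF.get((taxonomy, code))


def same_feature(a, b):
    """Match on normalized name AND unified concept, not on name alone."""
    same_name = a[2].strip().upper() == b[2].strip().upper()
    same_type = concept(a) is not None and concept(a) == concept(b)
    return same_name and same_type


council_square = ("AYTO", "PL", "España")
cadaster_square = ("SIGLA", "PZ", "ESPAÑA")
cadaster_avenue = ("SIGLA", "AV", "ESPAÑA")

print(same_feature(council_square, cadaster_square))  # same name, same concept
print(same_feature(council_square, cadaster_avenue))  # same name, other concept
```

A name-only comparison would merge the square and the avenue; going through the unified concept keeps them apart even though the two offices use different codes for a square.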
How to build up the ontology
• Constructing ontologies upon existing vocabularies is a classical and widely used approach
• The underlying problem (ontology alignment)
  • How to find the relationships that hold between the entities represented in different taxonomies
• Two approaches for the ontology construction
  • Manual mapping approach
  • Automated approach
[Figure: the three source taxonomies TVIAN, AYTO and SIGLA]
3. Manual Mapping approach
• Matching of terms (names + acronyms) between the different taxonomies (TVIAN, AYTO, SIGLA)
  • Difficulties: lack of semantic descriptions
• Categories of matches
  • Exact match
  • Partial match: one concept is broader or narrower
  • No match
  • Provisional match: taxonomy errors (homonyms) imply erroneous matches
[Figure: example mappings between the acronyms and concepts of SIGLA (Cadaster) and AYTO (City Council). Acronyms shown: "PZ", "PL", "CN", "CM", "CL", "CLP", "CLTP", "AN". Concepts shown: SQUARE, RESIDENTIAL DEVELOPMENT, MINOR ROAD, COUNTRY HOUSE (SOUTH OF SPAIN), STREET, PEDESTRIAN STREET, PEDESTRIAN STREET SEGMENT.]
A more flexible approach
• Previous approach
  • Too time-consuming and with little scalability
• Improvement
  • Use a well-established shared common core ontology and make mappings between the distinct sources and this common core
• New experiment: use of the URBISOC thesaurus
  • A thesaurus focused on Spanish terminology for Town Planning
  • Developed by the CINDOC/CSIC institute (Centre for Scientific Information and Documentation / Spanish National Research Council)
[Figure: TVIAN, AYTO and SIGLA are each mapped to URBISOC]
A more flexible approach II
• Use of the Towntology ontology editor
  • Focused on ontology construction
  • Storage of concepts with several definitions that are in a process of selection and characterization
• Although it improves scalability, it is still time-consuming and error-prone
4. Ontology building using an automated approach
• Why?
  • Manual mappings are time-consuming
  • Some mappings may not be successful because content creators have not assigned the correct feature type
• Technique proposed
  • Formal Concept Analysis (1980, Wille & Ganter, …)
  • It enables the extraction of a hierarchy of concepts from the feature instances contained in the source repositories
[Figure: the ontology is generated from the TVIAN, AYTO and SIGLA repositories]
Basics of FCA
• Definition of formal contexts: a triple $(G, M, I)$
  • $G$: objects
  • $M$: attributes
  • $I \subseteq G \times M$: binary relation between $G$ and $M$ (incidence matrix)
• It is possible to extract formal concepts
  • Given $A \subseteq G$ and $B \subseteq M$, a pair $(A, B)$ is a formal concept if and only if the set of all attributes shared by the objects in $A$ is identical with $B$, and $A$ is also the set of all the objects that have all the attributes in $B$ in common
• Additionally, it is possible to establish a subconcept-superconcept relation
  • $(A_1, B_1) \le (A_2, B_2) \iff A_1 \subseteq A_2 \ (\iff B_2 \subseteq B_1)$
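The definitions above can be made concrete with a small Python sketch. The attribute names follow the `AYTO_`/`SIGLA_` code style used later in the slides, but the three objects `f1`, `f2`, `f3` and their attribute sets form a made-up toy context, and the function names are illustrative choices rather than part of any FCA library.

```python
# Toy formal context (G, M, I): each object maps to the set of attributes it has.
# The objects and their attribute sets are hypothetical.
context = {
    "f1": {"AYTO_CL", "SIGLA_CL"},
    "f2": {"AYTO_CLP", "SIGLA_CL"},
    "f3": {"AYTO_PL", "SIGLA_PZ"},
}
M = set().union(*context.values())  # all attributes


def common_attributes(objects):
    """Attributes shared by every object in `objects` (all of M if empty)."""
    shared = set(M)
    for g in objects:
        shared &= context[g]
    return shared


def common_objects(attributes):
    """Objects that have every attribute in `attributes`."""
    return {g for g, attrs in context.items() if set(attributes) <= attrs}


def is_formal_concept(A, B):
    """(A, B) is a formal concept iff each set determines the other."""
    return common_attributes(A) == set(B) and common_objects(B) == set(A)


def is_subconcept(c1, c2):
    """(A1, B1) <= (A2, B2) iff A1 is a subset of A2."""
    return set(c1[0]) <= set(c2[0])


street = ({"f1", "f2"}, {"SIGLA_CL"})
traffic_street = ({"f1"}, {"AYTO_CL", "SIGLA_CL"})
```

Here `street` and `traffic_street` are both formal concepts, and `traffic_street` is a subconcept of `street`: it has fewer objects and more attributes, which is the duality expressed by the order relation.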
Applying FCA
• How to obtain a unique repository of instances, i.e. the formal context required by FCA?
  • Traditional datalinking has been applied to the feature instances contained in the different databases, based on the analysis of the lexical and spatial similarities of feature attributes
• Transform the datalinking matrix into the incidence matrix
  • Each checked cell (match of source features) generates an object/instance in the incidence matrix
  • The columns correspond to the transformation of urban network feature type codes (e.g., AYTO CODE, SIGLA CODE) into proper attributes with boolean values
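The transformation can be sketched as follows, assuming the datalinking step has already produced a list of matched (council feature, cadaster feature) pairs. The feature identifiers, the sample codes assigned to them and the `incidence_from_links` helper are hypothetical; only the idea of one object per checked cell and one boolean column per type code comes from the slide.

```python
def incidence_from_links(links, council_codes, cadaster_codes):
    """Build a boolean incidence matrix from datalinking matches.

    links: iterable of (council_id, cadaster_id) pairs, one per checked cell.
    council_codes / cadaster_codes: feature id -> feature type code.
    Returns (attributes, matrix), where matrix maps each object to a list
    of booleans aligned with `attributes`.
    """
    objects = {}
    for council_id, cadaster_id in links:
        # Each match becomes one object whose attributes are its type codes.
        objects[(council_id, cadaster_id)] = {
            "AYTO_" + council_codes[council_id],
            "SIGLA_" + cadaster_codes[cadaster_id],
        }
    attributes = sorted(set().union(*objects.values())) if objects else []
    matrix = {
        obj: [attr in attrs for attr in attributes]
        for obj, attrs in objects.items()
    }
    return attributes, matrix


# Made-up identifiers and matches, for illustration only.
council_codes = {"c1": "CL", "c2": "CLP", "c3": "PL"}
cadaster_codes = {"k1": "CL", "k2": "CL", "k3": "PZ"}
links = [("c1", "k1"), ("c2", "k2"), ("c3", "k3")]

attributes, matrix = incidence_from_links(links, council_codes, cadaster_codes)
for obj, row in matrix.items():
    print(obj, row)
```

Each row of `matrix` is one object of the formal context and each entry of `attributes` is one column of the incidence matrix.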
[Figure: the datalinking matrix is transformed into the incidence matrix by replacing each feature with its type code. Sizes shown: 2718 features with 18 AYTO codes, and 4318 features with 35 SIGLA codes.]
Applying FCA
• Obtain the concept lattice: NEXT CLOSED SET algorithm (Ganter 87)
[Figure: FCA turns the incidence matrix into a concept lattice (only attributes are shown). The lattice runs from the supremum (least common superconcept) down to the infimum (greatest common subconcept). Nodes shown:]
• AYTO_PL, SIGLA_PZ (square)
• SIGLA_AV (avenue)
• AYTO_AV, SIGLA_AV (traffic allowed avenue)
• SIGLA_AV, AYTO_AVP (pedestrian avenue)
• SIGLA_CL (street)
• SIGLA_CL, AYTO_CL (traffic allowed street)
• SIGLA_CL, AYTO_CLP (pedestrianized street)
• SIGLA_CL, AYTO_AN (carfree designed street)
• …
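A compact sketch of the Next Closure idea is given below: it enumerates every closed attribute set (concept intent) of a context in lectic order. It follows the textbook formulation of Ganter's algorithm rather than the authors' actual implementation, which the slides do not show. The toy context and all function names are assumptions.

```python
# Toy context: object -> set of attributes (hypothetical instances).
context = {
    "f1": {"AYTO_CL", "SIGLA_CL"},
    "f2": {"AYTO_CLP", "SIGLA_CL"},
    "f3": {"AYTO_PL", "SIGLA_PZ"},
}
M = sorted(set().union(*context.values()))   # fixed linear order on attributes
POS = {m: i for i, m in enumerate(M)}


def closure(B):
    """Attributes shared by all objects that have every attribute in B."""
    result = set(M)
    for attrs in context.values():
        if B <= attrs:
            result &= attrs
    return result


def next_closure(A):
    """Return the closed set following A in lectic order, or None if A is last."""
    A = set(A)
    for m in reversed(M):
        if m in A:
            A.discard(m)
        else:
            B = closure(A | {m})
            # Accept B only if it adds no attribute smaller than m.
            if all(POS[x] >= POS[m] for x in B - A):
                return B
    return None


def all_intents():
    """Enumerate all concept intents, starting from the closure of the empty set."""
    intents = []
    A = closure(set())
    while A is not None:
        intents.append(frozenset(A))
        A = next_closure(A)
    return intents


intents = all_intents()
for intent in intents:
    print(sorted(intent))
```

For this toy context the enumeration yields six intents: the empty set (top concept), `{SIGLA_CL}`, the three object intents, and the full attribute set (bottom concept). Ordering them by set inclusion gives the lattice, with `{SIGLA_CL}` sitting above both `{AYTO_CL, SIGLA_CL}` and `{AYTO_CLP, SIGLA_CL}`.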
Results
• Experiment: combining the COUNCIL_FEATURE and CADASTER_FEATURE databases
  • A concept lattice of 36 concepts from the original 53 concepts
  • Identification of equivalent concepts in both taxonomies, e.g., square (PL in AYTO and PZ in SIGLA)
  • And also subconcept-superconcept relations, e.g., identification of street as a broader concept in SIGLA (CL), which has narrower concepts in AYTO: traffic-allowed streets (CL), pedestrianized streets (CLP), or carfree-designed streets (AN)
5. Conclusions
• The FCA approach seems to be more flexible
  • Dynamic building of the ontology (at least, a draft)
  • We don't need to define the concepts, we just need to observe the data that exists
  • We have created a domain-specific ontology that facilitates the interoperability (synchronization, update and merge) of the separate repositories
• Future lines
  • Improve the efficiency of the method
  • Enrich the generated concepts with commonalities found in other feature attributes of the instances (e.g., geometry, perimeter, area)
  • Apply to other domains
    • Hydrology: NMA vs Water Agency repositories