21.09.06 Krzysztof Janowicz
Towards a Similarity-Based Identity Assumption Service for Historical Places: Establishing Meaningful Links
Krzysztof Janowicz; Muenster Semantic Interoperability Lab (MUSIL)
Krzysztof Janowicz Similarity-Based Identity Assumption Service for Historical Places 2
Outline
• Motivation
• Scenario
• Annotation
• Theory
• Further Work
Image from: http://de.wikipedia.org/wiki/HMS_Victory (Bleiglass, 1998)
Motivation
• For the cultural heritage community
• Incomplete and vague knowledge
• Interchange between external sources is necessary to answer complex scientific questions & to clean up local knowledge
• Local versus global identifiers
→ Accessible service-based infrastructure!
Motivation
• For semantic similarity research
• Application of similarity in a real-world domain
• Similarity as part of the identity assumption puzzle
• Combination of similarity and classical reasoning
• Using a stable upper-level ontology (CIDOC CRM)
→ Theory of similarity assumptions for historical places
Motivation
• For an identity assumption service
• To run queries against multiple sources, one has to make sure that they refer to the same real-world phenomena; a common language alone is not enough!
• Non-unique place names (even within the same area)
• Place names refer to cities, rivers, valleys, mountains, …
• Misinterpreted place names (e.g. 'Al Wahat' Oasis)
• Names also refer to varying geopolitical units (e.g. nomads) or prominent (artificial) landmarks (e.g. telegraph stations)
• Outdated place or even country names (e.g. USSR)
→ Gazetteers can only partially solve these problems
(From discussions with Dr. Karl-Heinz Lampe; ZFMK)
Battle of Trafalgar - Scenario
• Took place at Cape Trafalgar (Province Cádiz) in 1805
• British victory under the command of Horatio Nelson
• HMS Victory was Nelson's flagship
• Nelson was shot during the battle and died afterwards
Should be easy to annotate!?
Spatial relation between the naval battleground and the terrestrial cape, Province Cádiz, ...?
Place names: Cabo Trafalgar, Taraf al-Gharb, رأس الطرف الأغر
Also in a historical source from the French perspective?
Image from: http://en.wikipedia.org/wiki/Horatio_Nelson (painted by Nicholas Pocock)
Vice-Admiral Horatio Nelson, 1st Viscount Nelson?
HMS Victory: Which one?!
Temporal relations?
Image from: http://en.wikipedia.org/wiki/Image:Trafalgar_aufstellung.jpg
Annotation of Historical Knowledge
• CIDOC Conceptual Reference Model (CRM) as an upper-level ontology for the cultural heritage domain
• specifies an abstract, interrelated vocabulary instead of concrete definitions (e.g. for kinds of exhibits) → heterogeneous domain!
• describes historical knowledge by relations between places, events, actors and objects
• RDF(S)-based representation
• ISO Standard (ISO/PRF 21127)
Annotation Examples (RDF Triples)
• P89F.falls_within(E53.Place(Cape Trafalgar), E53.Place(Province Cádiz))
Subject-Predicate-Object:
The place Cape Trafalgar falls within a place called Province Cádiz
• P8F.took_place_at(E7.Activity(Battle of Trafalgar), E53.Place(Cape Trafalgar))
• P117F.occurs_during(E7.Activity(Battle of Trafalgar), E5.Event(Trafalgar Campaign))
• P14F.carried_out_by(E7.Activity(Battle of Trafalgar), E21.Person(Nelson))
• P2F.has_type (E53.Place(Andalusia), E55.Type(regions))
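The annotation examples can be held in memory as plain subject-predicate-object tuples. The following is a minimal Python sketch, not an RDF or CIDOC CRM library API: the `Triple` type and the helpers `describe` and `to_sentence` are hypothetical names, and the entity labels stand in for real identifiers, which is a simplification given how ambiguous names are.

```python
from typing import List, NamedTuple


class Triple(NamedTuple):
    """One CIDOC CRM statement in subject-predicate-object order."""
    subject: str
    predicate: str
    obj: str


# The annotation examples above; labels stand in for identifiers (simplification).
TRIPLES: List[Triple] = [
    Triple("E53.Place(Cape Trafalgar)", "P89F.falls_within", "E53.Place(Province Cádiz)"),
    Triple("E7.Activity(Battle of Trafalgar)", "P8F.took_place_at", "E53.Place(Cape Trafalgar)"),
    Triple("E7.Activity(Battle of Trafalgar)", "P117F.occurs_during", "E5.Event(Trafalgar Campaign)"),
    Triple("E7.Activity(Battle of Trafalgar)", "P14F.carried_out_by", "E21.Person(Nelson)"),
    Triple("E53.Place(Andalusia)", "P2F.has_type", "E55.Type(regions)"),
]


def describe(entity: str, triples: List[Triple]) -> List[Triple]:
    """Return the triples in which `entity` is the subject."""
    return [t for t in triples if t.subject == entity]


def to_sentence(t: Triple) -> str:
    """Verbalize a triple, e.g. 'P89F.falls_within' becomes 'falls within'."""
    relation = t.predicate.split(".", 1)[1].replace("_", " ")
    return f"{t.subject} {relation} {t.obj}"


for t in describe("E7.Activity(Battle of Trafalgar)", TRIPLES):
    print(to_sentence(t))
```

Reading the first triple back yields the sentence from the slide: Cape Trafalgar falls within Province Cádiz.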
Theory
• In practice, semi-automatic disambiguation via gazetteers and other global authorities (e.g. for historical figures) is often difficult, expensive and error-prone
(especially for subordinate geopolitical units, events, actors, …)
→ Use the links established via the CIDOC CRM annotation between places, actors, objects and events as additional reference points!
Theory
[Figure: Geographic reality (Mike Goodchild) → Geoinformation = <x, z>, interpreted via Spatiotemporal Reference Systems and Semantic Reference Systems; CIDOC CRM + Reasoning + Similarity]
• Use thematic information as support for spatiotemporal reference
Theory: Framework
Comparing Place Descriptions
1. Extract new triples out of existing ones → Spatiotemporal & Subsumption Reasoning
2. Compute the overlap between source and target triples → Semantic Similarity Measurement
3. Compare the remaining labels & identifiers → Syntactic Identifier Matching
4. Estimate how likely the compared places are to correspond → Identity Assumption
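The four steps can be chained as a pipeline. The Python sketch below is hypothetical: each stage is passed in as a function, the stand-ins (`no_inference`, a Jaccard overlap of triple sets, exact label comparison) are deliberately trivial placeholders rather than the measures the framework has in mind, and averaging the semantic and syntactic scores in step 4 is an assumption, since the slide does not say how they are combined.

```python
from typing import Callable, FrozenSet, Tuple

Triple = Tuple[str, str, str]      # (subject, predicate, object)
Description = FrozenSet[Triple]    # all triples describing one place


def identity_score(
    source: Description,
    target: Description,
    source_label: str,
    target_label: str,
    infer: Callable[[Description], Description],
    semantic_sim: Callable[[Description, Description], float],
    syntactic_sim: Callable[[str, str], float],
) -> float:
    """Chain the four framework steps; returns a score in [0, 1]."""
    # 1. Spatiotemporal & subsumption reasoning: make implicit triples explicit.
    src, tgt = infer(source), infer(target)
    # 2. Semantic similarity: overlap between the enriched descriptions.
    semantic = semantic_sim(src, tgt)
    # 3. Syntactic matching of the remaining labels.
    syntactic = syntactic_sim(source_label, target_label)
    # 4. Identity assumption. ASSUMPTION: a plain average of both scores.
    return (semantic + syntactic) / 2


# Deliberately trivial, hypothetical stand-ins for the stages.
def no_inference(d: Description) -> Description:
    return d


def jaccard(a: Description, b: Description) -> float:
    """Share of triples common to both descriptions."""
    union = a | b
    return len(a & b) / len(union) if union else 1.0


def same_label(a: str, b: str) -> float:
    """1.0 for labels equal up to case, else 0.0."""
    return 1.0 if a.casefold() == b.casefold() else 0.0
```

For example, two descriptions that share one of two distinct triples and carry the labels "Cape Trafalgar" and "cape trafalgar" score (0.5 + 1.0) / 2 = 0.75 with these stand-ins.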
Theory: Reasoning
• Entities are described by sets of RDF triples
• Inference rules generate new triples → make local knowledge explicit! → more comparable information about entities
• Example: spatial & temporal inference rules
• Be careful: names are ambiguous!
HMS XYZ (1804) vs. HMS XYZ (1805): the same ship?
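Spatial and temporal inference rules can be applied by forward chaining until no new triple appears. The Python sketch below rests on assumptions: the rule set (transitivity of falls_within and occurs_during, plus "an activity that took place at A also took place at every place A falls within") is chosen for illustration rather than taken from the CRM specification, and the entity keys are hypothetical identifiers, reflecting the warning that names are ambiguous.

```python
from typing import List, Set, Tuple

Triple = Tuple[str, str, str]  # (subject, predicate, object)

# Rules (p1, p2, result): (a p1 b) and (b p2 c) imply (a result c).
# ASSUMPTION: an illustrative rule set, not taken from the CRM specification.
RULES: List[Tuple[str, str, str]] = [
    ("P89F.falls_within", "P89F.falls_within", "P89F.falls_within"),        # spatial
    ("P117F.occurs_during", "P117F.occurs_during", "P117F.occurs_during"),  # temporal
    ("P8F.took_place_at", "P89F.falls_within", "P8F.took_place_at"),        # spatial
]


def infer(triples: Set[Triple]) -> Set[Triple]:
    """Forward-chain RULES until no new triple appears (a fixpoint)."""
    closure = set(triples)
    while True:
        new = {
            (a, out, c)
            for (a, p1, b) in closure
            for (b2, p2, c) in closure
            if b2 == b
            for (r1, r2, out) in RULES
            if (p1, p2) == (r1, r2)
        } - closure
        if not new:
            return closure
        closure |= new


# Keys are hypothetical identifiers, not names: chaining on a bare name such as
# "HMS XYZ" could merge facts about two different ships.
FACTS: Set[Triple] = {
    ("place:cape_trafalgar", "P89F.falls_within", "place:province_cadiz"),
    ("place:province_cadiz", "P89F.falls_within", "place:andalusia"),
    ("event:battle_of_trafalgar", "P8F.took_place_at", "place:cape_trafalgar"),
    ("event:battle_of_trafalgar", "P117F.occurs_during", "event:trafalgar_campaign"),
    ("event:trafalgar_campaign", "P117F.occurs_during", "event:napoleonic_wars"),
}

for triple in sorted(infer(FACTS) - FACTS):
    print(triple)
```

Here the five annotated facts grow to nine, e.g. the battle now also took place within Andalusia and occurred during the Napoleonic Wars, which gives the similarity step more to compare.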
Theory: Similarity
[Figure: network of historical facts around Cape Trafalgar (nodes: Nelson, Nelson's Death, Battle of Trafalgar, Napoleonic Wars, Cape Trafalgar, Province Cádiz; relations: performed, died in, falls within). Source: Cape Trafalgar falls within Province Cádiz. Target: Cape Trafalgar overlaps with Province Cádiz. Similarity: sim_p * sim_s, with Province Cádiz = Province Cádiz.]
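The figure compares the source triple (Cape Trafalgar falls within Province Cádiz) with the target triple (Cape Trafalgar overlaps with Province Cádiz) by multiplying similarities. The Python sketch below reads this, as an assumption, as predicate similarity times object similarity; the value 0.5 for "falls within" vs. "overlaps with" is hypothetical, and objects are matched exactly here instead of recursively.

```python
from typing import Dict, FrozenSet, Tuple

Triple = Tuple[str, str, str]  # (subject, predicate, object)

# Hypothetical predicate similarities; a real service would derive them from
# spatial, temporal or thematic measures (e.g. neighborhood graphs).
PREDICATE_SIM: Dict[FrozenSet[str], float] = {
    frozenset({"falls within", "overlaps with"}): 0.5,
}


def sim_predicate(p: str, q: str) -> float:
    """1 for equal predicates, a tabulated value for related ones, else 0."""
    if p == q:
        return 1.0
    return PREDICATE_SIM.get(frozenset({p, q}), 0.0)


def sim_object(a: str, b: str) -> float:
    """Placeholder: exact match only (recursion on objects is omitted here)."""
    return 1.0 if a == b else 0.0


def sim_triple(s: Triple, t: Triple) -> float:
    """Similarity of two triples about the compared place: predicate x object."""
    return sim_predicate(s[1], t[1]) * sim_object(s[2], t[2])


source = ("Cape Trafalgar", "falls within", "Province Cádiz")
target = ("Cape Trafalgar", "overlaps with", "Province Cádiz")
print(sim_triple(source, target))
```

With identical objects, the triple score reduces to the predicate similarity, 0.5 in this made-up table.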
Theory: Network Approach to Similarity
1. For all tuples of the source entity: find equal or similar tuples within the target entity description
2. Define meaningful notions of similarity for the given predicates (relations)
• Spatial
• Temporal
• Thematic
3. Define a meaningful notion of similarity for all objects that are not themselves subjects of other triples (e.g. ADL Feature Types)
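The three steps suggest a recursive comparison. In the Python sketch below (an assumption about how the steps could fit together, not the author's implementation), each source tuple is matched with its best target tuple, the relation similarity is multiplied by the similarity of the two objects, objects that are subjects of further triples are compared recursively, and all others fall back to a base measure. The graphs, `pred_sim` and `exact` are hypothetical, and `depth` bounds the recursion in case the network contains cycles.

```python
from typing import Callable, Dict, List, Tuple

Edge = Tuple[str, str]           # (predicate, object)
Graph = Dict[str, List[Edge]]    # subject -> outgoing edges


def network_sim(a: str, b: str, src: Graph, tgt: Graph,
                pred_sim: Callable[[str, str], float],
                leaf_sim: Callable[[str, str], float],
                depth: int = 3) -> float:
    """Similarity of entity `a` (source graph) and `b` (target graph)."""
    edges_a, edges_b = src.get(a, []), tgt.get(b, [])
    # Step 3: if either side has no outgoing triples (or the depth limit is
    # reached), fall back to a base similarity, e.g. for feature types.
    if not edges_a or not edges_b or depth == 0:
        return leaf_sim(a, b)
    total = 0.0
    # Step 1: for every source tuple, take the best-matching target tuple.
    for p, o in edges_a:
        best = 0.0
        for q, o2 in edges_b:
            sp = pred_sim(p, q)  # Step 2: similarity of the relations
            if sp > 0.0:
                sub = network_sim(o, o2, src, tgt, pred_sim, leaf_sim, depth - 1)
                best = max(best, sp * sub)
        total += best
    return total / len(edges_a)


SOURCE: Graph = {
    "Battle of Trafalgar": [("took place at", "Cape Trafalgar"), ("carried out by", "Nelson")],
    "Cape Trafalgar": [("falls within", "Province Cádiz")],
}
TARGET: Graph = {
    "Battle of Trafalgar": [("took place at", "Cape Trafalgar"), ("carried out by", "Nelson")],
    "Cape Trafalgar": [("overlaps with", "Province Cádiz")],
}


def pred_sim(p: str, q: str) -> float:
    """Hypothetical relation similarities."""
    if p == q:
        return 1.0
    return 0.5 if {p, q} == {"falls within", "overlaps with"} else 0.0


def exact(a: str, b: str) -> float:
    """Base similarity for leaf objects: exact equality."""
    return 1.0 if a == b else 0.0


print(network_sim("Battle of Trafalgar", "Battle of Trafalgar", SOURCE, TARGET, pred_sim, exact))
```

Because the score averages over the source tuples only, it is asymmetric: extra facts in the target do not lower it, which follows step 1's direction from source to target.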
Theory: Neighborhoods & Hierarchies
• Different similarity measures for neighborhoods & hierarchies (Egenhofer & Al-Taha 1992)
[Figure: temporal, spatial and thematic neighborhoods & hierarchies]
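Conceptual neighborhoods suggest one way to score relations: the fewer steps between two relations in a neighborhood graph, the more similar they are. The Python sketch below applies the same shortest-path idea to a hierarchy (edge counting, a standard taxonomy measure swapped in here). Both graphs are illustrative assumptions: the topological one is a simplified neighborhood graph of the eight region relations (the exact edge set differs between neighborhood definitions), and the feature-type tree is hypothetical.

```python
from collections import deque
from typing import Dict, Iterable, Set, Tuple

Graph = Dict[str, Set[str]]


def make_graph(edges: Iterable[Tuple[str, str]]) -> Graph:
    """Build an undirected adjacency map."""
    g: Graph = {}
    for a, b in edges:
        g.setdefault(a, set()).add(b)
        g.setdefault(b, set()).add(a)
    return g


def hops(g: Graph, start: str) -> Dict[str, int]:
    """Breadth-first search: number of edges from `start` to every node."""
    dist = {start: 0}
    queue = deque([start])
    while queue:
        node = queue.popleft()
        for nxt in g[node]:
            if nxt not in dist:
                dist[nxt] = dist[node] + 1
                queue.append(nxt)
    return dist


def graph_sim(g: Graph, a: str, b: str) -> float:
    """1 - distance / diameter (assumes a connected graph)."""
    diameter = max(max(hops(g, n).values()) for n in g)
    return 1.0 - hops(g, a)[b] / diameter


# Neighborhood: simplified conceptual neighborhood graph (edge set assumed).
TOPOLOGY = make_graph([
    ("disjoint", "meet"), ("meet", "overlap"),
    ("overlap", "coveredBy"), ("overlap", "covers"), ("overlap", "equal"),
    ("coveredBy", "inside"), ("covers", "contains"),
    ("coveredBy", "equal"), ("covers", "equal"),
])

# Hierarchy: a hypothetical feature-type tree, compared by edge counting.
FEATURE_TYPES = make_graph([
    ("features", "administrative areas"), ("features", "physiographic features"),
    ("administrative areas", "countries"), ("administrative areas", "provinces"),
    ("physiographic features", "capes"), ("physiographic features", "mountains"),
])
```

With these graphs, `overlap` vs. `inside` scores 0.5 (two steps in a graph of diameter 4), and `provinces` vs. `countries` also scores 0.5, while `capes` vs. `provinces` scores 0.0.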
Theory: Syntactic Matching
• After recursively applying (semantic) similarity measurements, only labels, vague appellations and identifiers are left
→ Requires syntactic matching / measuring
Example (Getty Thesaurus): Cape Trafalgar and Wrexham, with the adjacent IDs 7008751 and 7008750
(found at: www.gwjokes.com)
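The Getty example shows why labels and identifiers need different treatment: labels vary in spelling, while near-identical identifiers may denote unrelated places. The Python sketch below is an assumption about how that split could look, using the standard library's `difflib.SequenceMatcher` for fuzzy label matching and exact equality for identifiers; the function names are hypothetical.

```python
import difflib
import unicodedata


def normalize(label: str) -> str:
    """Case-fold and strip diacritics so that 'Cádiz' and 'cadiz' coincide."""
    decomposed = unicodedata.normalize("NFKD", label.casefold())
    return "".join(ch for ch in decomposed if not unicodedata.combining(ch))


def label_sim(a: str, b: str) -> float:
    """Fuzzy similarity in [0, 1] for labels and vague appellations."""
    return difflib.SequenceMatcher(None, normalize(a), normalize(b)).ratio()


def identifier_match(a: str, b: str) -> float:
    """Identifiers are opaque: only exact equality counts.

    The adjacent Getty IDs 7008751 and 7008750 differ in one character,
    yet belong to two unrelated places.
    """
    return 1.0 if a.strip() == b.strip() else 0.0


print(label_sim("Province Cádiz", "province cadiz"))
print(label_sim("7008751", "7008750"), identifier_match("7008751", "7008750"))
```

Fuzzy label scores remain a weak signal on their own, which is why they only come into play after the semantic comparison has run.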
Theory: Identity Assumptions
• Two place descriptions probably refer to the same (real-world) place if they are linked via equal or similar relations to equal or similar events, actors, objects, …
• Similar position within a network of historical facts
• Stepwise application of new restrictions to the set of compared historical places
→ The number of compared tuples is a critical issue!
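Because the number of compared tuples is critical, cheap restrictions can prune candidates before a costly network similarity runs. The Python sketch below assumes a hypothetical `Place` record with a year span, a temporal-overlap filter as the only restriction, and a similarity function and threshold supplied by the caller; none of these come from the slides.

```python
from dataclasses import dataclass
from typing import Callable, List, Sequence, Tuple


@dataclass(frozen=True)
class Place:
    """A place description reduced to what this sketch needs (hypothetical)."""
    ident: str
    label: str
    start: int  # first year covered by the description
    end: int    # last year covered by the description


Restriction = Callable[[Place, Place], bool]


def overlaps_in_time(a: Place, b: Place) -> bool:
    """Cheap filter: the two time spans share at least one year."""
    return a.start <= b.end and b.start <= a.end


def restrict(source: Place, targets: Sequence[Place],
             restrictions: Sequence[Restriction]) -> List[Place]:
    """Apply restrictions one after another to shrink the candidate set."""
    remaining = list(targets)
    for keep in restrictions:
        remaining = [t for t in remaining if keep(source, t)]
    return remaining


def assume_identity(source: Place, targets: Sequence[Place],
                    restrictions: Sequence[Restriction],
                    similarity: Callable[[Place, Place], float],
                    threshold: float) -> List[Tuple[Place, float]]:
    """Score only the surviving candidates; keep those above the threshold."""
    scored = [(t, similarity(source, t))
              for t in restrict(source, targets, restrictions)]
    hits = [(t, s) for t, s in scored if s >= threshold]
    return sorted(hits, key=lambda pair: pair[1], reverse=True)
```

Ordering the restrictions from cheapest to most selective keeps the expensive similarity calls to a minimum; the returned list holds identity assumptions ranked by score, not certainties.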
Further Work & Evidence
• Similarity is only one part of the puzzle!
• Other parts: trust, contradictions & consistency, ...
• Which inference rules may lead to difficulties?
• How to handle complementary knowledge?
• Connections to Time Map and ECAI
• Evidence! Battle of Trafalgar scenario?
→ Develop an identity assumption pilot
• Combination of similarity measurement with itineraries
• Based on real-world data from ZFMK, Bonn (biodiversity museum)
Questions
• Thank you!
• Special thanks to
• Martin Doerr, Foundation for Research and Technology - Hellas (FORTH), Institute of Computer Science, Heraklion, Crete, Greece
• Karl-Heinz Lampe, Zoologisches Forschungsmuseum Alexander Koenig (ZFMK), Bonn, Germany
• Any questions?
'Real-World' Place?
From: http://de.wikipedia.org/wiki/Bild:Atlantis_map_kircher.gif
Gazetteer Feature Types
[Figure: feature types assigned to Andalucía in ADLG and the Getty Thesaurus]