©2004, Philippe Cudré-Mauroux
Exploiting Localized Metadata in Decentralized Settings
Microsoft Research Asia 09.29.04
Philippe Cudré-Mauroux
Distributed Information Systems Laboratory (LSIR)
Swiss Federal Institute of Technology, Lausanne (EPFL)
Outline
I. Problem definition
  – Goal
  – Hurdles
  – Proposed solution
II. Sharing annotated pictures
  – Structured metadata standards
  – Architecture
  – Demo
III. System dynamics
IV. Conclusions
I. Problem Definition
• Goal: exploiting local (structured) metadata to organize information (e.g., pictures) globally
• Challenges: existing systems do not directly aggregate localized metadata because of three major hurdles:
  – Local ontologies
  – Dangling links
  – Metadata scarceness
Hurdles
<rdf:RDF xmlns="http://web.resource.org/cc/"
         xmlns:dc="http://purl.org/dc/elements/1.1/"
         xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#">
  <Work rdf:about="http://example.org/gnomophone.mp3">
    <MonTitre>Compilers in the Key of C</MonTitre>
    <dc:description>A lovely classical work </dc:description>
    <dc:creator>
      <Agent> #yoyoAgent </Agent>
    </dc:creator>
    <dc:date> </dc:date>
    …
• Local ontology: the locally defined <MonTitre> element
• Dangling link: the local reference #yoyoAgent
• Metadata scarceness: the empty <dc:date> element
Local Ontologies
• State-of-the-art annotation software / standards provide metadata w.r.t. ontologies (schemas, taxonomies…)
• Profusion of distinct ontologies
  – Even for specific pieces of information (e.g., images)
• Ontologies are almost always extendable
=> Semantic heterogeneity
  – A single ontology cannot be used to retrieve all relevant individuals
  – Cf. Peer Data Management
Dangling Links
• Local metadata often refer to local individuals
• Such references are irrelevant globally
[Figure: the picture "Party 04" has a date (10.05.04) and a DJ property pointing to the local individual "My Cousin", whose name is "John Joe"]
Metadata Scarceness
• Today, most software includes some semi-automatic annotation facilities
• However, most metadata still require human attention
  – The scarcest resource
=> Metadata scarceness
Proposed Solution (high-level view)
• Local ontologies
  – Alignment of ontologies
    • (in a scalable manner)
  – Semantic gossiping (query expansion)
• Dangling links
  – Metadata contextualization (scoping)
  – Alignment of individuals
• Metadata scarceness
  – Clustering of individuals (similarity measure)
  – Propagation of metadata
II. Sharing Annotated Pictures
• Problem:
  – Wide adoption of digital cameras => profusion of digital pictures
    • Several GBs of personal pictures is nowadays the norm
  – How can we share these pictures in a meaningful way, i.e., such that users can find the pictures they are looking for?
• One possible avenue:
  – Leveraging the new structured metadata tools / standards
    • Cf. aforementioned hurdles
    • Pioneering work
Providing Structured Annotations to Pictures
• Several emergent tools / standards providing
  – Structured metadata (XML, Photoshop Album)
  – Ontological metadata (RDF, Adobe XMP)
  – Type-based metadata (Microsoft WinFS)
• The bottom line: model theory, description logic
  – A terminology
  – Some assertions
Structured Metadata
• Ex.: Photoshop Album
• Hierarchy of tags
• Stored in a relational, proprietary, local database
• Non-exportable
Ontological Metadata (1)
• Ex.: Extensible Metadata Platform (XMP)
• Subset of RDF/S
• Metadata might be embedded into the file
• Supported by a wide range of Adobe applications
  – Adobe® Acrobat®
  – Adobe FrameMaker®
  – Adobe GoLive®
  – Adobe Illustrator®
  – Adobe InCopy®
  – Adobe InDesign®
  – Adobe LiveMotion™
  – Adobe Photoshop®
  – Adobe Document Server
  – Adobe Graphics Server
  – Version Cue™
Type-Based Metadata (1)
• New file system for Longhorn (NTFS+++)
• No more hierarchies (i.e., folders) but metadata
• Items – Attributes – Relationships – Schemas – Sub-Schemas (extensions)
  – Déjà vu?
Comparison of three standards
[Table: WinFS, XMP, and PSA compared on local ontologies, extensibility, dangling links, local database, and metadata embedding]
PicShare: A Middleware for sharing pictures
10,000-foot component view:
[Figure: PicShare components: Metadata Extractor (PSP, XMP, WinFS), Features Handler (60 moments), Information Tracker, and a (distributed) hashtable with Insert / Retrieve operations]
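The Insert / Retrieve interface of the hashtable can be illustrated with a minimal sketch. Everything here is an assumption made for illustration: the class name `LocalHashtable`, the choice of keying on a (property, value) pair, and the use of SHA-1 are hypothetical, and an in-memory dictionary stands in for the actual distributed hashtable.

```python
import hashlib
from collections import defaultdict


class LocalHashtable:
    """In-memory stand-in for the (distributed) hashtable; not a real DHT."""

    def __init__(self):
        # hashed key -> set of image identifiers annotated with that pair
        self._buckets = defaultdict(set)

    @staticmethod
    def key(prop, value):
        # Assumed keying scheme: hash the "property=value" string.
        return hashlib.sha1(f"{prop}={value}".encode("utf-8")).hexdigest()

    def insert(self, prop, value, image_id):
        self._buckets[self.key(prop, value)].add(image_id)

    def retrieve(self, prop, value):
        # Return a copy so callers cannot mutate the stored bucket.
        return set(self._buckets.get(self.key(prop, value), set()))


table = LocalHashtable()
table.insert("date", "10.05.04", "party04.jpg")
table.insert("date", "10.05.04", "party05.jpg")
matches = table.retrieve("date", "10.05.04")
```

Exact-match keys like these only support lookups on identical values, which is one reason a similarity-preserving encoding would be needed for feature vectors.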
Extracting Metadata
• Local process
• One extractor per format
• Extraction of individuals (scoping)
  – First step to handle dangling links
• Aggregation of different metadata types
  – One TBOX per schema / peer
    • Terminological axioms
  – One ABOX per image
    • Assertions about individuals
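The split between one TBOX per schema / peer and one ABOX per image could be represented along the following lines. This is a sketch under stated assumptions: the class and field names, the restriction of terminological axioms to subproperty pairs, and the sample identifiers (`peerA`, `party04.jpg`) are all hypothetical and not taken from the PicShare prototype.

```python
from __future__ import annotations

from dataclasses import dataclass, field


@dataclass
class TBox:
    """Terminological axioms for one schema on one peer (assumed structure)."""
    peer: str
    schema: str
    # (sub_property, super_property) pairs
    subproperty_of: set[tuple[str, str]] = field(default_factory=set)


@dataclass
class ABox:
    """Assertions about the individuals attached to one image."""
    image_id: str
    # (subject, property, value) triples
    assertions: list[tuple[str, str, str]] = field(default_factory=list)

    def values(self, prop: str) -> list[str]:
        """All values asserted for the given property."""
        return [v for (_, p, v) in self.assertions if p == prop]


tbox = TBox(peer="peerA", schema="cc")
tbox.subproperty_of.add(("MonTitre", "dc:title"))

abox = ABox("party04.jpg", [
    ("party04.jpg", "date", "10.05.04"),
    ("party04.jpg", "DJ", "peerA#MyCousin"),
])
```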
Finding Correspondences
• Finding mapping candidates
• 3 levels:
  – Feature space
  – Extensional space
  – Ontological space
• Many different heuristics
  • Cf. previous presentation
• Large-scale application
  • Local methods only!
[Figure: A and B related across the three spaces]
Local Ontologies
• Classical ontology alignment problem
  – For now:
    • Edit distance on property names (T)
    • Comparison of extensions (E)
    • User feedback (U)
• Semantic gossiping (query expansion)
  – When searching for a property value, propagate the search to similar predicates
  – Also, propagate to subproperties
• Keeping track of ontology mappings (Kullback-Leibler distance)
$$D(P \,\|\, Q) = \sum_{a \in A} p(a) \log\left(\frac{p(a)}{q(a)}\right)$$
where $p(a) = \frac{1}{n}$ with $n = \#(\text{attributes})$, and where $q(a) = p(a)$ if attribute $a$ is mapped, $1$ otherwise.
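The edit-distance heuristic on property names (T) and the query-expansion step of semantic gossiping can be sketched together. This is only an illustration: the Levenshtein distance is one common choice of edit distance, and the function names, the case-insensitive comparison, and the threshold `max_dist` are assumptions rather than the prototype's actual settings.

```python
def edit_distance(a, b):
    """Levenshtein distance via the classic two-row dynamic program."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        curr = [i]
        for j, cb in enumerate(b, start=1):
            curr.append(min(
                prev[j] + 1,               # deletion
                curr[j - 1] + 1,           # insertion
                prev[j - 1] + (ca != cb),  # substitution (free if equal)
            ))
        prev = curr
    return prev[-1]


def expand_query(prop, known_props, max_dist=2):
    """Properties a query on `prop` would also be propagated to (assumed rule)."""
    p = prop.lower()
    return [k for k in known_props
            if k != prop and edit_distance(p, k.lower()) <= max_dist]


targets = expand_query("title", ["titre", "creator", "Title", "date"])
```

Under these assumptions a search on `title` is also sent to `titre` and `Title`, while `creator` and `date` are too far away to qualify.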
Dangling Links
• Extract individuals along with metadata
  – Metadata scoping
• Individual alignment
  – Based on their structural correspondences (T,S)
  – Semantic gossiping (?)
    • When searching for an individual (value), propagate the search to similar individuals
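Metadata scoping can be illustrated by qualifying each local reference with the identity of the peer that produced it, so that the reference stays meaningful outside that peer. The naming scheme below (peer identifier, then `#`, then the local name) and the `peer://alice` identifier are hypothetical choices for the sake of the example.

```python
def scope(peer_id, local_ref):
    """Qualify a local reference with its owning peer (assumed naming scheme)."""
    if "://" in local_ref:
        # Already a global URI: leave it untouched.
        return local_ref
    return peer_id + "#" + local_ref.lstrip("#")


scoped = scope("peer://alice", "#yoyoAgent")
unchanged = scope("peer://alice", "http://example.org/gnomophone.mp3")
```

Once scoped, two peers that both use a local name such as `#yoyoAgent` no longer collide, and aligning their individuals becomes an explicit step rather than an accident of naming.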
Metadata Scarceness
• Keep track of metadata scarceness
  – Entropy of an image
• Propagate metadata
  – For similar images
    • Low-level similarity (S)
    • Structural similarity (S,T)
    • User-based similarity (U)
Entropy of an image $I$:
$$H(I) = -\sum_{i} p(A_i) \log\big(p(A_i)\big)$$
where $p(A_i) = 0$ if attribute $A_i$ is present and $\frac{1}{n}$ otherwise ($n$: number of attributes).
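A small worked example of the image entropy, taking $p(A_i) = 0$ for a present attribute and $1/n$ for a missing one. The sketch assumes the usual negative sign of Shannon entropy, the natural logarithm, and the convention that terms with $p = 0$ contribute nothing; the function name and attribute names are illustrative.

```python
import math


def image_entropy(present):
    """Entropy of an image given a mapping attribute -> is it filled in?"""
    n = len(present)
    if n == 0:
        return 0.0
    h = 0.0
    for is_present in present.values():
        p = 0.0 if is_present else 1.0 / n
        if p > 0.0:  # a term with p = 0 is taken to contribute nothing
            h -= p * math.log(p)
    return h


# Two of four attributes are missing: H = 2 * (1/4) * log(4) = log(2)
h = image_entropy({"title": True, "description": True,
                   "creator": False, "date": False})
```

With $m$ missing attributes out of $n$ the value is $(m/n)\log n$, so a fully annotated image has entropy 0 and the entropy grows with each missing attribute.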
Some Shortcomings
• Need for another encoding of the feature vectors
  – Now: matching is O(n)
  – Need for a low-dimensionality FV encoding
  – Prefix-routing compliant
• Size of hashtable increases linearly with the number of pictures
  – Tradeoff: metadata propagation vs. clustering / matching
• Some images / schemas stay isolated
  – Different w.r.t. feature vectors
  – No (few) metadata
III. System Dynamics (ongoing)
• Self-organizing system modeling:
  – States: all ontology / individual alignments
  – Attractors: similarities (T,S,E,U)
  – Noise-driven variations: user interactions
• Analysis of overall entropy
  – Entropy diminishes with user interactions
  – Fosters semantic interoperability
    • Choose images with high entropy first!
• Query propagation (Kullback-Leibler)
  – Semantic network traversal
  – Worst-case scenario (entropic gain)
  – High-level view: graph-theoretic problem (cf. cudre04)
IV. Conclusions
• Contribution:
  – Problem definition
  – Initial heuristics
  – Prototype
  – A possible analytical model
• What's next
  – Evaluation
    • Any input?
    • Comparison with other approaches?
  – Paper writing
• Future work
  – Improved heuristics
  – Full analytical study
    • Incl. objectivity vs. subjectivity of the peers