
©2004, Philippe Cudré-Mauroux

Exploiting Localized Metadata in Decentralized Settings

Microsoft Research Asia 09.29.04

Philippe Cudré-Mauroux

Distributed Information Systems Laboratory (LSIR)Swiss Federal Institute of Technology, Lausanne (EPFL)


Outline

I. Problem definition
  – Goal
  – Hurdles
  – Proposed solution

II. Sharing annotated pictures
  – Structured metadata standards
  – Architecture
  – Demo

III. System dynamics

IV. Conclusions


I. Problem Definition

• Goal: exploiting local (structured) metadata to organize information (e.g., pictures) globally

• Challenges: existing systems do not directly aggregate localized metadata because of three major hurdles:
  – Local ontologies
  – Dangling links
  – Metadata scarceness


Hurdles

<rdf:RDF xmlns="http://web.resource.org/cc/"
         xmlns:dc="http://purl.org/dc/elements/1.1/"
         xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#">
  <Work rdf:about="http://example.org/gnomophone.mp3">
    <MonTitre>Compilers in the Key of C</MonTitre>
    <dc:description>A lovely classical work</dc:description>
    <dc:creator>
      <Agent> #yoyoAgent </Agent>
    </dc:creator>
    <dc:date> </dc:date>
    …

Callouts on the example:
• Local Ontology: the locally defined <MonTitre> element
• Dangling Link: the reference to the local agent #yoyoAgent
• Metadata Scarceness: the empty <dc:date> element


Local Ontologies

• State-of-the-art annotation software and standards provide metadata with respect to ontologies (or schemas, taxonomies, …)

• Profusion of distinct ontologies
  – Even for specific kinds of information (e.g., images)

• Ontologies are almost always extensible

=> Semantic heterogeneity
  – A single ontology cannot be used to retrieve all relevant individuals
  – Cf. Peer Data Management


Dangling Links

• Local metadata often refer to local individuals
• Such references are irrelevant globally

[Figure: example metadata graph with the nodes "Party 04", "10.05.04", "My Cousin", and "John Joe", linked by the properties date, DJ, and name.]


Metadata Scarceness

• Today, most software includes some semi-automatic annotation facilities

• However, most metadata still require human attention
  – The scarcest resource

=> Metadata Scarceness


Proposed Solution (high-level view)

• Local ontologies
  – Alignment of ontologies (in a scalable manner)
  – Semantic Gossiping (query expansion)

• Dangling links
  – Metadata contextualization (scoping)
  – Alignment of individuals

• Metadata scarceness
  – Clustering of individuals (similarity measure)
  – Propagation of metadata


II. Sharing Annotated Pictures

• Problem:
  – Wide adoption of digital cameras => profusion of digital pictures
    • Several GB of personal pictures is now the norm
  – How can we share these pictures in a meaningful way, i.e., such that people can find the pictures they are looking for?

• One possible avenue:
  – Leveraging the new structured metadata tools and standards
    • Cf. aforementioned hurdles
    • Pioneering work


Providing Structured Annotations to Pictures

• Several emerging tools and standards provide
  – Structured metadata (XML, Photoshop Album)
  – Ontological metadata (RDF, Adobe XMP)
  – Type-based metadata (Microsoft WinFS)

• The bottom line: model theory, description logic
  – A terminology
  – Some assertions


Structured Metadata

• Ex.: Photoshop Album
• Hierarchy of tags
• Stored in a relational, proprietary, local database
• Non-exportable


Ontological Metadata (1)

• Ex.: Extensible Metadata Platform (XMP)
• Subset of RDF/S
• Metadata might be embedded into the file
• Supported by a wide range of Adobe applications
  – Adobe® Acrobat®
  – Adobe FrameMaker®
  – Adobe GoLive®
  – Adobe Illustrator®
  – Adobe InCopy®
  – Adobe InDesign®
  – Adobe LiveMotion™
  – Adobe Photoshop®
  – Adobe Document Server
  – Adobe Graphics Server
  – Version Cue™


Ontological Metadata (2)

• Ex.: Photoshop XMP schema


Type-Based Metadata (1)

• New file system for Longhorn (NTFS+++)
• No more hierarchies (i.e., folders) but metadata
• Items – Attributes – Relationships – Schemas – Sub-Schemas (extensions)
  – Déjà vu?


Type-Based Metadata (2)

• Ex.: image schema in WinFS


Comparison of three standards

[Table: WinFS, XMP, and PSA (Photoshop Album) compared on five criteria: Local Ontologies, Extensibility, Dangling Links, Local Database, Metadata Embedding.]


PicShare: A Middleware for sharing pictures

10,000-foot component view:

[Figure: PicShare component diagram. Labels: PSP, XMP, WinFS; Metadata Extractor; Features Handler (60 moments); Information Tracker; (Distributed) Hashtable with Insert and Retrieve.]


Extracting Metadata

• Local process
• One extractor per format
• Extraction of individuals (scoping)
  – First step to handle dangling links

• Aggregation of different metadata types
  – One TBox per schema / peer
    • Terminological axioms
  – One ABox per image
    • Assertions about individuals
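
One way to picture the extraction step is the minimal Python sketch below. It is an illustration under stated assumptions, not the PicShare implementation: the class names (`ScopedName`, `TBox`, `ABox`), the `extract` function, the peer and schema names, and the convention that a value starting with `#` refers to a local individual are all hypothetical. The idea shown is scoping: every local identifier is qualified by the peer that defined it, so that it stays meaningful once shared.

```python
from dataclasses import dataclass, field


@dataclass(frozen=True)
class ScopedName:
    """A local identifier qualified by the peer that defined it (assumed scheme)."""
    peer: str
    local: str

    def __str__(self) -> str:
        return f"{self.peer}#{self.local}"


@dataclass
class TBox:
    """Terminological axioms for one schema on one peer."""
    peer: str
    schema: str
    # Assumed axiom type for this sketch: property -> parent property.
    subproperty_of: dict = field(default_factory=dict)


@dataclass
class ABox:
    """Assertions about the individuals appearing in one image's metadata."""
    image: ScopedName
    assertions: list = field(default_factory=list)


def extract(peer: str, image_file: str, raw: dict) -> ABox:
    """Build an ABox from a flat property -> value mapping.

    Values starting with '#' are treated as references to local
    individuals and are scoped to the extracting peer (an assumption).
    """
    abox = ABox(image=ScopedName(peer, image_file))
    for prop, value in raw.items():
        if isinstance(value, str) and value.startswith("#"):
            value = ScopedName(peer, value[1:])
        abox.assertions.append((abox.image, prop, value))
    return abox


# Hypothetical peer, schema, and file names, for illustration only.
tbox = TBox(peer="peerA", schema="localAlbum")
abox = extract("peerA", "party04.jpg", {"date": "10.05.04", "DJ": "#MyCousin"})
for subject, prop, value in abox.assertions:
    print(subject, prop, value)
```

With scoped names, two peers that both use a local label such as "MyCousin" produce distinct individuals, which can later be aligned explicitly instead of being confused.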


Finding Correspondences

• Finding mapping candidates
• 3 levels:
  – Many different heuristics
  – Cf. previous presentation

• Large-scale application
• Local methods only!

[Figure: A and B compared in three spaces: Feature Space, Extensional Space, Ontological Space.]


Local Ontologies

• Classical ontology alignment problem
  – For now:
    • Edit distance on property names (T)
    • Comparison of extensions (E)
    • User feedback (U)

• Semantic gossiping (query expansion)
  – When searching for a property value, propagate the search to similar predicates
  – Also, propagate to subproperties

• Keeping track of ontology mappings (Kullback-Leibler distance)

\[
D(P \,\|\, Q) = \sum_{a \in A} p(a) \log\left(\frac{p(a)}{q(a)}\right)
\]

where \(p(a) = \frac{1}{n}\) with \(n = \#(\text{attributes})\), and where

\[
q(a) =
\begin{cases}
p(a) & \text{if attribute } a \text{ is mapped} \\
1 & \text{otherwise}
\end{cases}
\]
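
The name-based heuristic (T) and the query expansion step can be sketched as follows. This is a minimal illustration, not the prototype's code: the normalization of the edit distance into a similarity score, the 0.7 threshold, the one-level subproperty lookup, and all function names are assumptions made for the example.

```python
def edit_distance(a: str, b: str) -> int:
    """Levenshtein distance computed row by row with dynamic programming."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        cur = [i]
        for j, cb in enumerate(b, start=1):
            cur.append(min(
                prev[j] + 1,               # delete ca
                cur[j - 1] + 1,            # insert cb
                prev[j - 1] + (ca != cb),  # substitute (free if equal)
            ))
        prev = cur
    return prev[-1]


def name_similarity(a: str, b: str) -> float:
    """Case-insensitive similarity in [0, 1]; 1.0 means identical names."""
    a, b = a.lower(), b.lower()
    longest = max(len(a), len(b))
    return 1.0 if longest == 0 else 1.0 - edit_distance(a, b) / longest


def expand_query(prop, known_properties, subproperties, threshold=0.7):
    """Return the set of properties a search on `prop` is propagated to.

    `subproperties` maps a property to its direct subproperties;
    `threshold` is an arbitrary cut-off chosen for this sketch.
    """
    expanded = {prop}
    expanded.update(
        p for p in known_properties if name_similarity(prop, p) >= threshold
    )
    for p in list(expanded):
        expanded.update(subproperties.get(p, ()))
    return expanded


targets = expand_query(
    "title",
    ["Title", "titel", "creator", "MonTitre"],
    {"title": ["subtitle"]},
)
print(sorted(targets))
```

With this threshold, a purely name-based match does not reach `MonTitre` from `title`, which is one reason to combine it with the extensional (E) and user-feedback (U) heuristics.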


Dangling Links

• Extract individuals along with metadata
  – Metadata scoping

• Individual alignment
  – Based on their structural correspondences (T,S)
  – Semantic Gossiping (?)
    • When searching for an individual (value), propagate the search to similar individuals


Metadata Scarceness

• Keep track of metadata scarceness
  – Entropy of an image:

\[
H(I) = -\sum_{i} p(A_i) \log\big(p(A_i)\big)
\]

where

\[
p(A_i) =
\begin{cases}
0 & \text{if attribute } A_i \text{ is present} \\
\frac{1}{n} & \text{otherwise}
\end{cases}
\qquad (n: \#\text{attributes})
\]

• Propagate metadata
  – For similar images
    • Low-level similarity (S)
    • Structural similarity (S,T)
    • User-based similarity (U)
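
The entropy measure for an image can be computed as in the sketch below. It assumes the natural logarithm (the slide does not state a base) and the usual convention that a term with probability 0 contributes 0; the function names and the sample attribute names are hypothetical. Under these assumptions, an image missing k of its n schema attributes has entropy (k/n)·log(n).

```python
import math


def image_entropy(present_attributes, schema_attributes) -> float:
    """Entropy of an image's annotation.

    Each attribute that is present contributes 0; each missing one
    contributes -(1/n) * log(1/n), where n is the number of attributes.
    """
    n = len(schema_attributes)
    if n == 0:
        return 0.0
    p = 1.0 / n
    missing = [a for a in schema_attributes if a not in present_attributes]
    return -sum(p * math.log(p) for _ in missing)


def rank_by_scarceness(images, schema_attributes):
    """Order image names from least annotated (highest entropy) to most."""
    return sorted(
        images,
        key=lambda name: image_entropy(images[name], schema_attributes),
        reverse=True,
    )


schema = ["title", "date", "creator", "description"]
images = {
    "a.jpg": {"title", "date", "creator", "description"},
    "b.jpg": {"title", "date"},
    "c.jpg": set(),
}
for name in rank_by_scarceness(images, schema):
    print(name, round(image_entropy(images[name], schema), 4))
```

Ranking by this value gives a simple way to decide which images would benefit most from propagated metadata or user attention.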


Demo


Some Shortcomings

• Need for another encoding of the feature vectors
  – Now: matching is O(n)
  – Need for a low-dimensionality FV encoding
  – Prefix-routing compliant

• Size of hashtable increases linearly with the number of pictures
  – Tradeoff: metadata propagation vs. clustering / matching

• Some images / schemas stay isolated
  – Different w.r.t. feature vectors
  – No (few) metadata


III. System Dynamics (ongoing)

• Modeling as a self-organizing system:
  – States: all ontology / individual alignments
  – Attractors: similarities (T,S,E,U)
  – Noise-driven variations: user interactions

• Analysis of overall entropy
  – Entropy diminishes with user interactions
  – Fosters semantic interoperability
    • Choose images with high entropy first!

• Query propagation (Kullback-Leibler)
  – Semantic network traversal
  – Worst-case scenario (entropic gain)
  – High-level view: graph-theoretic problem (cf. cudre04)


IV. Conclusions

• Contribution:
  – Problem definition
  – Initial heuristics
  – Prototype
  – A possible analytical model

• What's next
  – Evaluation
    • Any input?
    • Comparison with other approaches?
  – Paper writing

• Future work
  – Improved heuristics
  – Full analytical study
    • Incl. objectivity vs. subjectivity of the peers
