Ewa Deelman, www.isi.edu/~deelman
Virtual Metadata Catalogs: Augmenting Existing Metadata Catalogs with Semantic Representations
Yolanda Gil, Varun Ratnakar, and Ewa Deelman
USC Information Sciences Institute
[Figure: the scientific analysis cycle. Stages: Data and/or Analysis Discovery; Analysis Definition and Mapping; Execution; Data, Metadata and Provenance Management. Artifacts: Raw Data and Metadata; Analysis Description; Derived Data and Metadata; Provenance.]
[Figure: four separate instances, each with its own Raw Data and Metadata, Derived Data and Metadata, Provenance, and Analysis Description.]
Problem: How to find the data in this environment?
How to specify the characteristics of the data (the metadata attributes)?
How to manage distributed provenance records?
Today, clients must manually work out the meaning of the attributes, identify which ones are relevant to their query, and formulate the queries themselves
Approach
Expose the information in the catalogs at a semantic level
  Expose the attributes
Provide access to the content based on a user-selected ontology
Support semantic queries to the catalog contents
The Virtual Metadata Catalog
Augments the existing metadata catalogs with semantic representations
  Does not modify the original sources
Maps virtual metadata attributes to the original attributes
  Represents virtual attributes declaratively
  Represents constraints as relations between attributes
Expands and translates a query to the Virtual Catalog into the original attributes
Explored the approach with temporal attributes
[Figure: architecture. Components: Shared ontologies & vocabularies; Virtual Metadata Catalog Service (Virtual Metadata, Mappings); Metadata Catalog Service (Metadata Catalog, Metadata Attributes); Query Mediator. Flows: queries with virtual metadata attributes; queries with metadata attributes; queries to a specific metadata catalog; queries to multiple catalogs; queries to other metadata catalogs.]
Metadata Catalog Service (MCS)
Models logical files, logical collections and views
Provides a standard set of attributes for each object type
Supports the dynamic definition of attributes of type string, integer, float, date
Provides a command-line interface and a Java API
Used in the Pegasus portal, SCEC, myLEAD
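The data model on this slide (logical objects plus attributes that can be defined at run time with one of four types) can be illustrated with a small in-memory sketch. This is not the MCS command-line interface or Java API; `LogicalObject`, `AttributeRegistry`, and the mapping of the four type names to Python types are hypothetical names chosen for illustration only.

```python
from dataclasses import dataclass, field
from datetime import date

# The four attribute types named on the slide, mapped to Python types
# (the mapping itself is an assumption made for this sketch).
ATTRIBUTE_TYPES = {"string": str, "integer": int, "float": float, "date": date}


@dataclass
class LogicalObject:
    """A logical file, collection, or view carrying user-defined attributes."""
    name: str
    kind: str  # "file", "collection", or "view"
    attributes: dict = field(default_factory=dict)


class AttributeRegistry:
    """Hypothetical stand-in for a catalog that accepts new attributes at run time."""

    def __init__(self):
        self._types = {}

    def define(self, attr_name, type_name):
        """Register a new attribute with one of the supported types."""
        if type_name not in ATTRIBUTE_TYPES:
            raise ValueError(f"unsupported attribute type: {type_name}")
        self._types[attr_name] = ATTRIBUTE_TYPES[type_name]

    def set(self, obj, attr_name, value):
        """Attach a value to an object after checking it against the declared type."""
        expected = self._types.get(attr_name)
        if expected is None:
            raise KeyError(f"attribute not defined: {attr_name}")
        # bool is a subclass of int in Python, so reject it explicitly.
        if isinstance(value, bool) or not isinstance(value, expected):
            raise TypeError(f"{attr_name} expects {expected.__name__}")
        obj.attributes[attr_name] = value


registry = AttributeRegistry()
registry.define("startDate", "date")
run_output = LogicalObject(name="run42.out", kind="file")
registry.set(run_output, "startDate", date(2004, 1, 1))
```

The point of the sketch is that attribute names such as `startDate` are only strings with a type; nothing in the catalog records what they mean, which is the gap the virtual catalog is meant to fill.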
Execution time
Wall clock time or CPU time?
Could be specified as:
  begin-execution-time, end-execution-time
  begin-execution-time, duration
Diversity of these attributes can be represented at a semantic level
  Define the virtual attributes in OWL
Gratuitous OWL Slide
<owl:Class rdf:ID="IntervalThing">
  <rdfs:subClassOf rdf:resource="#TemporalThing"/>
  <rdfs:subClassOf>
    <owl:Restriction>
      <owl:onProperty rdf:resource="#from"/>
      <owl:maxCardinality rdf:datatype="&xsd;nonNegativeInteger">1</owl:maxCardinality>
    </owl:Restriction>
  </rdfs:subClassOf>
  <rdfs:subClassOf>
    <owl:Restriction>
      <owl:onProperty rdf:resource="#to"/>
      <owl:maxCardinality rdf:datatype="&xsd;nonNegativeInteger">1</owl:maxCardinality>
    </owl:Restriction>
  </rdfs:subClassOf>
</owl:Class>

<owl:ObjectProperty rdf:ID="from">
  <rdfs:domain rdf:resource="#TemporalThing"/>
  <rdfs:range rdf:resource="#InstantThing"/>
  <rdf:type rdf:resource="&owl;FunctionalProperty"/>
</owl:ObjectProperty>

<owl:DatatypeProperty rdf:ID="duration">
  <rdfs:domain rdf:resource="#TemporalThing"/>
  <rdfs:range rdf:resource="&xsd;duration"/>
</owl:DatatypeProperty>
Add rules to represent more expressive relations and constraints
[r1: (?x rdf:type tme:IntervalThing),
     (?x tme:from ?a),
     (?x tme:duration ?t2),
     (?a tme:at ?t1),
     sum(?t1, ?t2, ?t3),
     makeTemp(?v)
  -> (?v rdf:type tme:InstantThing),
     (?v tme:at ?t3),
     (?x tme:to ?v)]
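Rule r1 says: if an interval has a start instant and a duration, derive its end instant as their sum. A minimal Python sketch of that inference follows. It works on plain dictionaries rather than RDF triples, and `parse_duration` and `derive_to` are hypothetical helper names; the duration parser handles only the day/time subset of `xsd:duration` (no years or months).

```python
import re
from datetime import datetime, timedelta

# Matches the day/time subset of xsd:duration, e.g. "PT30S" or "P1DT2H".
_DURATION = re.compile(
    r"^P(?:(?P<days>\d+)D)?"
    r"(?:T(?:(?P<hours>\d+)H)?(?:(?P<minutes>\d+)M)?"
    r"(?:(?P<seconds>\d+(?:\.\d+)?)S)?)?$"
)


def parse_duration(text):
    """Convert a duration string such as 'PT30S' into a timedelta."""
    match = _DURATION.match(text)
    parts = {}
    if match:
        parts = {k: float(v) for k, v in match.groupdict().items() if v is not None}
    if not parts:
        raise ValueError(f"unsupported duration: {text!r}")
    return timedelta(**parts)


def derive_to(interval):
    """Mirror rule r1: given 'from' and 'duration', add the derived 'to'.

    An existing 'to' is left untouched (a choice made for this sketch,
    in line with the maxCardinality of 1 on the 'to' property).
    """
    if "to" in interval or not {"from", "duration"} <= interval.keys():
        return dict(interval)
    start = datetime.fromisoformat(interval["from"])
    end = start + parse_duration(interval["duration"])
    return {**interval, "to": end.isoformat()}


derived = derive_to({"from": "2004-01-01T10:00:00", "duration": "PT30S"})
# derived["to"] is "2004-01-01T10:00:30"
```

In the rule itself the same step is carried out by the reasoner over triples; the sketch only shows the arithmetic that `sum(?t1, ?t2, ?t3)` stands for.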
[Figure: Virtual Metadata Catalog. A Query enters the Virtual Metadata Catalog, which combines a Reasoner, the Generic Catalog Ontology (file, view, collection), Virtual Metadata Attributes and Mappings, and Distributed domain ontologies. Query Mapping produces an MCS Query for the Metadata Catalog Service (MCS), which holds the Metadata Catalog and Metadata Attributes and returns the Answer.]
Query Mapping
[Figure: an OWL Query is mapped to an MCS Query by a Generic OWL + Rules Reasoner that uses the Distributed domain ontologies and the Virtual Metadata Attributes and Mappings; the Metadata Catalog Service (MCS) returns the answer.]
Input: OWL Query with virtual metadata attribute value pairs
  “from 2004-01-01T10:00:00” and “duration PT30S”
Output: MCS Query with MCS attribute value pairs
  “startDate” and “endDate”
Load OWL ontologies and rules referenced in the query
1. Generate query constituents
  Generate new attributes: “from 2004-01-01T10:00:00”, “to 2004-01-01T10:00:30Z”
2. Convert to MCS Attribute names
  Convert “from” to “StartDate”
  Convert “to” to “Enddate”
3. Construct query formula
  Convert from XML schema to simple strings expected by MCS
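Steps 2 and 3 above can be sketched in a few lines of Python, taking the output of step 1 (the reasoner) as given. The attribute spellings in `NAME_MAP`, the `'YYYY-MM-DD HH:MM:SS'` value format, and the `AND`-joined formula syntax are all assumptions made for illustration; the slides do not specify what strings MCS expects.

```python
from datetime import datetime

# Hypothetical mapping from virtual attribute names to catalog attribute names.
NAME_MAP = {"from": "startDate", "to": "endDate"}


def to_simple_string(xsd_datetime):
    """Turn an XML Schema dateTime into a plain 'YYYY-MM-DD HH:MM:SS' string."""
    # Drop a trailing 'Z' so the value parses on older Python versions too.
    parsed = datetime.fromisoformat(xsd_datetime.rstrip("Z"))
    return parsed.strftime("%Y-%m-%d %H:%M:%S")


def map_query(constituents):
    """Map virtual attribute/value pairs (the result of step 1) to a query string."""
    # Step 2: keep only attributes the catalog knows, under the catalog's names.
    renamed = {NAME_MAP[k]: v for k, v in constituents.items() if k in NAME_MAP}
    # Step 3: convert the values and join the pairs into one conjunctive formula.
    return " AND ".join(
        f"{name} = '{to_simple_string(value)}'" for name, value in renamed.items()
    )


query = map_query({
    "from": "2004-01-01T10:00:00",
    "duration": "PT30S",           # no catalog counterpart, so it is dropped
    "to": "2004-01-01T10:00:30Z",  # derived by the reasoner in step 1
})
# query == "startDate = '2004-01-01 10:00:00' AND endDate = '2004-01-01 10:00:30'"
```

Keeping the name mapping as data rather than code reflects the declarative mappings described earlier: supporting another catalog would mean supplying a different table, not rewriting the translation logic.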
Evaluation
Developed a prototype system
Performed queries across data sets from different domains
  Climate modeling, earthquake science, workflow execution
Supported temporal queries using the OWL time ontology as the target ontology
Discussion
Currently supports only the query functionality
Need to expand to
  Data publication
  Structuring data into collections
  Supporting multiple catalogs
Need to support richer catalog structures
Conclusions
Example of building semantic services on top of legacy catalogs
Provided customized views at the semantic level
  Views are customized to a particular user-selected ontology
Need to expand the functionality to publication
Need to better address the handling of alternative data formats
  Syntactic versus semantic transformations
  Transformations between date/time formats, coordinate systems, etc.
  Are these transformations workflows?