When the Keywords fall short
Consider the following query: Give me pictures of Niagara Falls taken from the Falls Avenue in Niagara Falls, Canada with a tele-lens.
The query involves:
- the subject that is photographed
- the vantage point of the photographer
- the “tool” which is used
This is what we are after: [image not preserved]
This is what we get: [image not preserved]
Or this: [image not preserved]
So, what do we need to improve on that?
- A tool for specifying (rich) metadata (not just keywords)
- The metadata should include
  - different taxonomies/ontologies of classes, and resources that belong to those classes, e.g. a geographical ontology, an ontology of lenses, etc.
  - properties within the classes of a certain ontology, but also among different “neighboring” ontologies
- To ensure scalability, the rule “anything can say anything about anything” applies
Resource Description Framework (RDF)
- A foundation for processing metadata
- Issued by the W3C (www.w3.org)
- While relying on XML for serialization, RDF focuses on semantics
  - the formal semantics of RDF is defined; see RDF Model Theory
- The aim is to make the Web “machine-understandable”, not only machine-readable
- An open-world framework that allows anyone to make simple assertions about anything
  - it neither guarantees nor assumes consistency of these assertions
- Distributed by its nature, as is the WWW
- Fully extensible
RDF in XML serialization
“Ora Lassila” is the creator of the resource http://www.w3.org/Home/Lassila.

<?xml version="1.0"?>
<rdf:RDF xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#"
         xmlns:s="http://description.org/schema/">
  <rdf:Description about="http://www.w3.org/Home/Lassila">
    <s:Creator>Ora Lassila</s:Creator>
  </rdf:Description>
</rdf:RDF>
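As a rough illustration of how a program might recover the single statement from this serialization, here is a minimal sketch using only Python's standard xml.etree.ElementTree. The helper name statements_from_rdfxml is hypothetical, and the code handles only this flat pattern (rdf:Description elements with literal-valued child elements), not the full RDF/XML grammar.

```python
import xml.etree.ElementTree as ET

RDF_NS = "http://www.w3.org/1999/02/22-rdf-syntax-ns#"

# The document from the slide, reproduced verbatim.
DOC = """<?xml version="1.0"?>
<rdf:RDF xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#"
         xmlns:s="http://description.org/schema/">
  <rdf:Description about="http://www.w3.org/Home/Lassila">
    <s:Creator>Ora Lassila</s:Creator>
  </rdf:Description>
</rdf:RDF>"""


def statements_from_rdfxml(text):
    """Return (subject, predicate, literal) tuples (hypothetical helper)."""
    root = ET.fromstring(text)
    statements = []
    for desc in root.findall(f"{{{RDF_NS}}}Description"):
        # Accept both the unqualified 'about' used on the slide and rdf:about.
        subject = desc.get("about") or desc.get(f"{{{RDF_NS}}}about")
        for prop in desc:
            # ElementTree tags look like '{namespace}local'; join them
            # back into one predicate URI.
            namespace, _, local = prop.tag[1:].partition("}")
            statements.append((subject, namespace + local, (prop.text or "").strip()))
    return statements


for statement in statements_from_rdfxml(DOC):
    print(statement)
```

The predicate comes out as the namespace URI concatenated with the element name, which is how the s:Creator element maps to a property identifier.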
RDF Data Model
(A kind of) directed labeled graph
- edges represent properties
- nodes represent resources (ellipses) and literals (rectangles); nodes are shared
- there may be multiple edges between two nodes
- there may be multiple edges with the same label pointing to different nodes
- the graph can contain cycles
- the graph is not necessarily connected
[Figure: example graph. Nodes: Pic1, Person1, Person2 and the literal “28”. Edge labels: filmFrame, age, createdBy, depicting (twice). Each edge reads subject, predicate, object.]
RDF Data Model as Triplets
Definition: An RDF model $M$ is a finite set of triplets (also called statements) of the form
$M \subseteq R \times U \times (R \cup L)$
where
- $U$ is the set of (references to) resource identifiers, or URIs
- $L$ is the set of literals (string-like elements), which denote the actual content (data)
- $B$ is the set of blank nodes (nodes that don't have a label)
- $R = U \cup B$ is the set of all resources (both blank and labeled)
Each triplet reads [subject, predicate, object].
RDF Data Model as Triplets
Example:
[Pic1, filmFrame, “28”]
[Pic1, depicting, Person1]
[Pic1, createdBy, Person1]
[Pic1, depicting, Person2]
[Person1, age, “28”]
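To make the definition concrete, the example model can be written down as a set of Python tuples and checked against the shape R x U x (R ∪ L). This is only a sketch: representing URIs as plain strings and wrapping literals and blank nodes in small classes (Literal, Blank) are assumptions made for the illustration, and is_valid_triplet is a hypothetical helper.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Literal:
    """An element of L (assumed representation)."""
    value: str


@dataclass(frozen=True)
class Blank:
    """An element of B (assumed representation)."""
    ident: str


def is_valid_triplet(triplet):
    """Check that a triplet lies in R x U x (R ∪ L); URIs are plain strings."""
    subject, predicate, obj = triplet
    subject_ok = isinstance(subject, (str, Blank))       # R = U ∪ B
    predicate_ok = isinstance(predicate, str)            # U
    object_ok = isinstance(obj, (str, Blank, Literal))   # R ∪ L
    return subject_ok and predicate_ok and object_ok


# The example model from the slide.
M = {
    ("Pic1", "filmFrame", Literal("28")),
    ("Pic1", "depicting", "Person1"),
    ("Pic1", "createdBy", "Person1"),
    ("Pic1", "depicting", "Person2"),
    ("Person1", "age", Literal("28")),
}

print(all(is_valid_triplet(t) for t in M))
# A literal may not appear in subject position:
print(is_valid_triplet((Literal("28"), "age", "Person1")))
```

Using a set mirrors the definition directly: a model is a set of statements, so repeating a triplet does not change it.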
RDF Data Model
Implications:
- properties always have labels (URIs), i.e., there are no “blank” properties
- literal nodes cannot have properties (they can stand only as property values)
- it is possible to make statements about properties, but beware: a statement about one property propagates to all properties with the same label in the model. In fact, all properties with the same label are treated as one resource!
RDF Data Model as Triplets
Example: adding a statement about the property depicting
[Pic1, filmFrame, “28”]
[Pic1, depicting, Person1]
[Pic1, createdBy, Person1]
[Pic1, depicting, Person2]
[Person1, age, “28”]
[depicting, how, “nicely”]
The new statement applies to both depicting edges of Pic1 (to Person1 and to Person2), not to just one of them.
Any ideas?
RDF Data Model as Triplets
Solution: route each depiction through its own intermediate node (b1, b2)
[Pic1, depicting, b1]
[b1, how, “nicely”]
[b1, whom, Person1]
[Pic1, createdBy, Person1]
[Pic1, depicting, b2]
[b2, whom, Person2]
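The effect of the intermediate nodes can be checked with a few lines of Python. This is a sketch under simplifying assumptions: triplets are plain tuples of strings, the intermediate nodes are written "_:b1" and "_:b2", the literal “nicely” is a plain string, and the function name depictions is hypothetical.

```python
# The solution model from the slide, as plain string tuples.
M = {
    ("Pic1", "depicting", "_:b1"),
    ("_:b1", "how", "nicely"),
    ("_:b1", "whom", "Person1"),
    ("Pic1", "createdBy", "Person1"),
    ("Pic1", "depicting", "_:b2"),
    ("_:b2", "whom", "Person2"),
}


def depictions(model, picture):
    """Map each depicted resource to its 'how' value, or None if absent."""
    result = {}
    for subject, predicate, node in model:
        if subject != picture or predicate != "depicting":
            continue
        # Follow the intermediate node to the depicted resource and
        # to the optional qualification attached to this depiction only.
        whom = [o for s, p, o in model if s == node and p == "whom"]
        how = [o for s, p, o in model if s == node and p == "how"]
        for person in whom:
            result[person] = how[0] if how else None
    return result


print(depictions(M, "Pic1"))
```

Because “nicely” hangs off b1 rather than off the property depicting, it qualifies only the depiction of Person1.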
RDF built-in resources and properties
- rdf:Property: a resource that represents (a type of) all properties
- rdf:type: a property which says that a resource is an “instance” of another resource, e.g. [john, rdf:type, Person]
Definition: A model $M$ is RDF-closed iff $\forall\, [x, y, z] \in M:\ [y, \texttt{rdf:type}, \texttt{rdf:Property}] \in M$
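Read as a rule, the definition says: for every statement, also record that its predicate is a property. A minimal sketch of computing such a closure follows, assuming triplets are plain tuples of strings; the name rdf_close is hypothetical.

```python
def rdf_close(model):
    """Add [p, rdf:type, rdf:Property] for every predicate p until stable."""
    closed = set(model)
    while True:
        new = {(p, "rdf:type", "rdf:Property") for _, p, _ in closed} - closed
        if not new:
            return closed
        closed |= new


M = {("Pic1", "depicting", "Person1")}
for triplet in sorted(rdf_close(M)):
    print(triplet)
```

The loop is needed because the added statements themselves use rdf:type as a predicate, so rdf:type must in turn be declared a property.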
Adding Gadgets: Containers
- rdf:Bag: a set with duplicates, or an unordered list
- rdf:Seq: a sequence, or an ordered list
- rdf:Alt: a list of “equivalent” alternatives
- rdf:li: a container membership property [container, lists, X]
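[Figure: two graphs. In the first, an approved edge connects Document and Committee, where Committee has rdf:type rdf:Bag and rdf:li edges to John, Jessie and Joseph. In the second, approved edges connect Document directly with John, Jessie and Joseph.]
The same?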
Adding Gadgets: Reification
Reification, or statements about statements
- rdf:Statement
- rdf:subject
- rdf:object
- rdf:predicate
Beware: you can make assertions about the statement resource (i.e. not really about the property)
[Figure: a statement resource S with rdf:type rdf:Statement, rdf:subject Pic1, rdf:predicate depicting, rdf:object Person1, and how “nicely”.]
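Reification can be sketched as a function that turns one triplet into four triplets describing it. The code below assumes triplets are plain tuples of strings; the function name reify and the statement identifier "S" are illustrative choices.

```python
def reify(triplet, statement_id):
    """Return the four triplets that describe `triplet` as a resource."""
    subject, predicate, obj = triplet
    return {
        (statement_id, "rdf:type", "rdf:Statement"),
        (statement_id, "rdf:subject", subject),
        (statement_id, "rdf:predicate", predicate),
        (statement_id, "rdf:object", obj),
    }


model = reify(("Pic1", "depicting", "Person1"), "S")
# The assertion is attached to the statement resource S,
# not to the property 'depicting'.
model.add(("S", "how", "nicely"))

for triplet in sorted(model):
    print(triplet)
```

Note that nothing in these triplets says whether the described statement itself holds; how S is interpreted is left to the application.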
RDFSchema: adding more structure
- A modeling language on top of RDF (expressed in RDF)
- Introduces the following modeling primitives:
  - classes: rdfs:Resource, rdfs:Class, rdf:Property, rdfs:Literal
  - properties: rdf:type, rdfs:subClassOf, rdfs:subPropertyOf, rdfs:domain, rdfs:range
RDFSchema: Classes
- resources may be divided into groups called classes
Definition: The extent of a class is the set of all resources that are linked to that class by the rdf:type property.
- two classes with the same extent are not necessarily the same, i.e. they might have different properties
- classes can be subsumed by other classes with the transitive rdfs:subClassOf property
- a class can be an instance of itself (!)
  - this blurs the clean distinction between schemas and their instances known from databases
  - in some cases it actually reflects the real world, e.g. “Tulip” can be both a class (a subclass of, say, “Flower”) and also its own instance when we are not interested in the particularities of the instance, e.g. the picture depicts a tulip.
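The definition of a class extent translates almost literally into code. In this sketch triplets are plain tuples of strings, the function name extent is hypothetical, and the instance tulip42 is invented for the illustration; the Tulip example follows the slide.

```python
def extent(model, cls):
    """All resources linked to `cls` by rdf:type (no inference applied)."""
    return {s for s, p, o in model if p == "rdf:type" and o == cls}


M = {
    ("Tulip", "rdfs:subClassOf", "Flower"),
    ("Tulip", "rdf:type", "Tulip"),     # a class that is its own instance
    ("tulip42", "rdf:type", "Tulip"),   # hypothetical individual tulip
}

print(sorted(extent(M, "Tulip")))
print(sorted(extent(M, "Flower")))
```

Only explicitly stated rdf:type links are counted here, so the extent of Flower stays empty until subclass inference is applied to the model.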
RDFSchema: Properties
- properties are linked to classes by specifying their rdfs:domain and rdfs:range, i.e., they are defined externally with respect to classes (they are first-class citizens)
Definition: The domain $D$ (range $R$) of a property $p$ is the intersection of the extents of all classes $c$ indicated by [p, rdfs:domain, c] ([p, rdfs:range, c]).
Definition: The extent of a property $p$ is the set $E$ of all triplets [x, p, y], where $x \in D$ and $y \in R$.
- If a property has more than one rdfs:domain (rdfs:range), subjects (objects) using this property are instances of all classes stated in the rdfs:domain (rdfs:range).
- properties can (also) be subsumed with the transitive rdfs:subPropertyOf property
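The intersection reading of multiple rdfs:domain (rdfs:range) statements can be sketched as follows. Triplets are plain tuples of strings; the function names and the classes Picture and Artwork are assumptions made for the illustration, and returning None for a property without any such statement is a design choice of this sketch, not something the definition prescribes.

```python
TYPE = "rdf:type"


def class_extent(model, cls):
    """All resources linked to `cls` by rdf:type."""
    return {s for s, p, o in model if p == TYPE and o == cls}


def constrained_set(model, prop, constraint):
    """Intersect the extents of all classes c with [prop, constraint, c].

    `constraint` is "rdfs:domain" or "rdfs:range". Returns None when the
    property carries no such statement (i.e. it is unconstrained).
    """
    classes = [o for s, p, o in model if s == prop and p == constraint]
    if not classes:
        return None
    result = class_extent(model, classes[0])
    for cls in classes[1:]:
        result &= class_extent(model, cls)
    return result


M = {
    ("createdBy", "rdfs:domain", "Picture"),
    ("createdBy", "rdfs:domain", "Artwork"),
    ("Pic1", TYPE, "Picture"),
    ("Pic1", TYPE, "Artwork"),
    ("Pic2", TYPE, "Picture"),
}

domain = constrained_set(M, "createdBy", "rdfs:domain")
range_ = constrained_set(M, "createdBy", "rdfs:range")
print(domain, range_)
```

Pic2 is left out of the domain because it is an instance of only one of the two stated classes.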
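RDFSchema: RDFS-Closure
Definition: A model $M$ is RDFS-closed if the following holds:
1. $[x,y,z] \in M \Rightarrow [y, \texttt{rdf:type}, \texttt{rdf:Property}] \in M$
2. $[x,y,z], [y, \texttt{rdfs:domain}, u] \in M \Rightarrow [x, \texttt{rdf:type}, u] \in M$
3. $[x,y,z], [y, \texttt{rdfs:range}, u] \in M \Rightarrow [z, \texttt{rdf:type}, u] \in M$
4. $[x,y,z] \in M \Rightarrow [x, \texttt{rdf:type}, \texttt{rdfs:Resource}] \in M$
5. $[x,y,z] \in M$ where $z \in U \Rightarrow [z, \texttt{rdf:type}, \texttt{rdfs:Resource}] \in M$
6. $[x, \texttt{rdfs:subPropertyOf}, y], [y, \texttt{rdfs:subPropertyOf}, z] \in M \Rightarrow [x, \texttt{rdfs:subPropertyOf}, z] \in M$
7. $[x,y,z], [y, \texttt{rdfs:subPropertyOf}, u] \in M \Rightarrow [x,u,z] \in M$
8. $[x, \texttt{rdf:type}, \texttt{rdfs:Class}] \in M \Rightarrow [x, \texttt{rdfs:subClassOf}, \texttt{rdfs:Resource}] \in M$
9. $[x, \texttt{rdfs:subClassOf}, y], [y, \texttt{rdfs:subClassOf}, z] \in M \Rightarrow [x, \texttt{rdfs:subClassOf}, z] \in M$
10. $[x, \texttt{rdfs:subClassOf}, y], [a, \texttt{rdf:type}, x] \in M \Rightarrow [a, \texttt{rdf:type}, y] \in M$
These can also be considered inference rules, i.e. they describe how to derive an RDFS-closed graph.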
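Treating the ten conditions as inference rules suggests a simple fixpoint computation: apply every rule to the current model, add whatever is new, and repeat until nothing changes. The sketch below is a naive version of that idea, not an optimized reasoner. Its assumptions: URIs are plain strings, blank nodes are strings starting with "_:", literals are wrapped in a small Literal class, and rule 3 is skipped for literal objects so that a literal never ends up in subject position. The classes Photographer and Person and the property relatedTo are invented for the example.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Literal:
    value: str


TYPE, PROP = "rdf:type", "rdf:Property"
RES, CLS = "rdfs:Resource", "rdfs:Class"
DOMAIN, RANGE = "rdfs:domain", "rdfs:range"
SUBP, SUBC = "rdfs:subPropertyOf", "rdfs:subClassOf"


def is_uri(node):
    return isinstance(node, str) and not node.startswith("_:")


def step(m):
    """Apply each of the ten rules once to every (pair of) triplet(s)."""
    new = set()
    for x, y, z in m:
        new.add((y, TYPE, PROP))                      # rule 1
        new.add((x, TYPE, RES))                       # rule 4
        if is_uri(z):
            new.add((z, TYPE, RES))                   # rule 5
        if y == TYPE and z == CLS:
            new.add((x, SUBC, RES))                   # rule 8
        for a, b, c in m:
            if a == y and b == DOMAIN:
                new.add((x, TYPE, c))                 # rule 2
            if a == y and b == RANGE and not isinstance(z, Literal):
                new.add((z, TYPE, c))                 # rule 3 (literals skipped)
            if a == y and b == SUBP:
                new.add((x, c, z))                    # rule 7
            if y == SUBP and b == SUBP and a == z:
                new.add((x, SUBP, c))                 # rule 6
            if y == SUBC and b == SUBC and a == z:
                new.add((x, SUBC, c))                 # rule 9
            if y == SUBC and b == TYPE and c == x:
                new.add((a, TYPE, z))                 # rule 10
    return new


def rdfs_close(model):
    """Iterate the rules until no new triplet can be derived."""
    closed = set(model)
    while True:
        new = step(closed) - closed
        if not new:
            return closed
        closed |= new


M = {
    ("Photographer", SUBC, "Person"),
    ("createdBy", RANGE, "Photographer"),
    ("createdBy", SUBP, "relatedTo"),
    ("Pic1", "createdBy", "Person1"),
}
closed = rdfs_close(M)
print(("Person1", TYPE, "Person") in closed)
```

The iteration terminates because the rules never invent new nodes, so only finitely many triplets can be formed from the nodes already present.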
RDFSchema: Example
RDF(S) Applications
- (Web) metadata
  - CC/PP: Composite Capabilities/Preferences Profiles
  - P3P: Platform for Privacy Preferences
- Data integration (programmable mediators)
- Foundation for higher ontology languages
  - OIL, OWL
- RDF(S) query languages
  - the Web needs not only the metadata but also a means to reason about them
  - RQL, RDQL, etc.
RDF(S) Pitfalls
- Unrefined datatypes
  - the XML data type system can be adopted
- Reification needs an external interpretation
  - it is application dependent, i.e. there might be differences in interpretations
- More expressive power?
  - there is a price to pay in terms of performance and scalability
  - sometimes not needed
- There is not much RDF metadata available on the Web
  - this is not a shortcoming of RDF(S)
  - triplets can be harvested from XLink, but are mainly generated from the DB backends
RDF(S) Pitfalls 2: The Designation Problem
- Correspondence between URIs and real-world objects
  - how do we ensure that two (distributed) sites, when referring to the same real-world object, are using the same URI?
  - if they don't, how do we reconcile (not express) two different URIs pointing to one object?
- Given reconciliation? By whom? A “Web URI Institution”?
- Value-based decision?
  - two URIs having the same value do not have to be the same real-life object...
[Figure: Source 1 and Source 2; labels shown: John, 1234, John.]
The End of Part 1
Let’s have a break!