1
SomeWhere in the Semantic Web
Marie-Christine Rousset
Joint work with Philippe Adjiman, Philippe Chatalic, François Goasdoué, Laurent Simon
2
The Semantic Web today
Ontology centered
– Methodologies, formal languages, platforms and standards for building (domain) ontologies
– Very few query languages for searching data on the Web
Expressivity is favoured over efficiency or even feasibility of machine processing
– OWL Full is undecidable
– OWL Lite is ExpTime-complete
Cannot scale up to the Web
– a « semantic » Google does not exist
3
A more data centric vision of the SW
Semantic Web viewed as a huge semantic and distributed data management system
SomeWhere (focus of this talk)
– a peer to peer infrastructure
– based on simple personalized ontologies and mappings distributed at Web scale
4
P2P Data Management Systems
Logical network of peers (≠ physical network)
– each peer is characterized by its physical address (IP), a description of the stored resources, and its neighbors in the network
– the neighbors of a peer are the peers to which it can transmit messages (queries, ...)
Some structures of logical networks
– fixed (Chord, Hypercube)
– guided by the semantics: SON, Edutella, Piazza, DRAGO, SomeWhere
5
SomeWhere logical networks
The topology is not fixed
Guided by mappings
– a peer joins by declaring mappings between its ontology and the ontologies of some peers that it knows
– a peer leaves by removing the mappings with its acquaintances in the network
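A minimal sketch of a topology induced by mappings, assuming mappings are kept as opaque labels. The names `Mapping`, `LogicalNetwork`, `join`, `leave` and `neighbours` are hypothetical and only illustrate the join/leave behaviour described above; they are not the SomeWhere API.

```python
from dataclasses import dataclass, field


@dataclass(frozen=True)
class Mapping:
    """A mapping declared by `owner` towards a peer it knows (`other`)."""
    owner: str
    other: str
    label: str  # the logical statement is kept opaque in this sketch


@dataclass
class LogicalNetwork:
    mappings: set = field(default_factory=set)

    def join(self, peer, declared):
        """A peer joins by declaring mappings with peers it already knows."""
        for other, label in declared:
            self.mappings.add(Mapping(peer, other, label))

    def leave(self, peer):
        """A peer leaves by removing the mappings with its acquaintances."""
        self.mappings = {m for m in self.mappings
                         if peer not in (m.owner, m.other)}

    def neighbours(self, peer):
        """Acquaintances are derived from the mappings: no fixed topology."""
        found = set()
        for m in self.mappings:
            if m.owner == peer:
                found.add(m.other)
            elif m.other == peer:
                found.add(m.owner)
        return found


net = LogicalNetwork()
net.join("P1", [])                       # first peer, nothing to map to yet
net.join("P2", [("P1", "some mapping")])  # P2 knows P1 and maps to it
print(net.neighbours("P1"))
net.leave("P2")
print(net.neighbours("P1"))
```

The neighbour relation is recomputed from the mappings rather than stored, which mirrors the idea that the network structure is a by-product of the declared semantics.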
6
SomeWhere in a nutshell
Simple data model based on a propositional language of classes
– for defining ontologies, mappings, and queries
– a sublanguage of OWL DL (W3C)
Scales up to one thousand peers
– logical network: « small world »
7
SomeWhere Data Model
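Schema + Data
[Figure: class hierarchy on three levels (A0; A1, A2; A3, A4, A5), with the extensional classes St_A3 and St_A5 attached to stored data.]
Ontology: hierarchy of intentional classes
Storage description: extensional classes
More complex inclusion statement: St_A1 A1 ¬A2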
8
SomeWhere Data Model
[Figure: peer A with classes A1, A2, A3 and peer B with classes B1, B2, B3; queries Q, Q1, Q2.]
Mappings:
A1 (A2 A3) B1 ¬B3
A1 B3
B3 A1
Queries:
Logical combination of class literals: A1 ¬A3
9
Semantics
Standard logical semantics
– one single domain of interpretation
– a distributed set of formulas interpreted in the same way as if they were not distributed
Distributed semantics of DDL or DFOL
– looser
– based on distributed interpretations defined w.r.t. a collection of domains of interpretation
Our assumption:
– the objects have a unique URI
– objects stored at different peers and having the same URI are interpreted as being the same
10
Data model: example
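[Figure: example network with peers labelled P1 and P2, showing three ontologies and their storage descriptions.
Musique: classes Rock, Pop, Classique, Français, US, Tchaikovsky; extensional classes St_Français, St_US, St_pop, St_Tchaikovsky.
Radio: classes Mouv, Nostalgie; extensional classes St_Mouv, St_Nostalgie.
Music: classes Pop_Rock, Classical, Ru, It, Tchai; extensional classes St_Pop_Rock, St_Ru, St_Tchai.
Mappings relate Mouv with Rock; Rock and Pop with Pop_Rock; Tchai with Tchaikovsky (in both directions).]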
11
Query answering : illustration
[Figure: the example network of the data model example (ontologies Musique, Radio and Music, peers labelled P1 and P2), annotated with the propagation of a query.]
Query: Pop_Rock ?
Atomic queries propagated through the mappings: Pop ?, Rock ?, Mouv ?
Answers returned for the atomic queries: St_Pop, St_Français, St_US, St_Mouv
Rewritings:
St_Pop_Rock
St_Français ∧ St_Pop
St_US ∧ St_Pop
St_Mouv ∧ St_Pop
12
Query answering in SomeWhere
Decomposition of queries/recombination of answers
– only atomic queries are transmitted to peers: a complex query is split into atomic queries
– each solicited peer processes a given atomic query q and incrementally sends back intentional answers for it: (conjunctions of) extensional classes that are rewritings of q
– intentional answers of different atomic queries resulting from the split of a complex query must be recombined: intentional answers can combine extensional classes of different peers
Computation of proper prime implicates in distributed clausal propositional theories
– ontologies and mappings are encoded as clauses
– proper prime implicates of the negation of a conjunctive query Q are the negation of the conjunctive rewritings of Q
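The recombination step can be sketched as follows, assuming each intentional answer is represented as a set of extensional class names read as a conjunction. The function name `recombine` is hypothetical, and the answers used in the example are those shown for Pop ? and Rock ? in the music illustration; this is a centralized sketch, not the incremental distributed procedure.

```python
from itertools import product


def recombine(answers_per_atom):
    """Combine the rewritings of each atomic query into rewritings of their conjunction.

    answers_per_atom: one list per atomic query; each answer is a frozenset
    of extensional classes, read as a conjunction.
    """
    # One answer is picked for every atom, and the picked conjunctions are merged.
    combined = {frozenset().union(*choice)
                for choice in product(*answers_per_atom)}
    # A conjunction that strictly contains another one is redundant.
    return {c for c in combined if not any(other < c for other in combined)}


pop_answers = [frozenset({"St_Pop"})]
rock_answers = [frozenset({"St_Français"}),
                frozenset({"St_US"}),
                frozenset({"St_Mouv"})]

for rewriting in sorted(recombine([pop_answers, rock_answers]), key=sorted):
    print(" ∧ ".join(sorted(rewriting)))
```

If one atomic query has no answer, the product is empty and so is the result, which matches the conjunctive reading of the query.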
13
Query answering algorithm
Message based algorithm
– same algorithm on each peer
– query, answer, and termination messages
Properties
– soundness
– completeness
– termination (even for cyclic networks)
[Figure: a network of six peers sharing bookmarks.
P1: MyBookmarks, DVD: Animation, Action, Suspense
P2: MyBookmarks, Actors: BruceWillis, BenStiller, JuliaRoberts
P3: MyBookmarks, Movies: Comedy, Thriller, Cartoons, Adult
P4: MyBookmarks, DVD: Humor, Romance
P5: MyBookmarks, Genre: Drama, Fantastic, Gore
P6: MyBookmarks, IMDB: Drama Comedy; TVShows: Friends, Alias
Mappings relate Comedy with Humor (in both directions); Action and Suspense with Thriller; Animation with Cartoons; Drama with Thriller; DramaComedy with Drama; Friends with Humor; BruceWillis with Action; BenStiller with Comedy.]
Class extensions
[Figure: the same six-peer bookmark network, with the extensions of the classes shown at each peer.]
Q1: Thriller ?
Q2: NOT Adult ?
Q3: Thriller AND Comedy ?
Thriller ?
Rewritings:
P3:Thriller
P1:Action ∧ P1:Suspense
P5:Drama
P6:DramaComedy
P2:BruceWillis ∧ P1:Suspense
Rewritings of Thriller: evaluation
Local: P3:Thriller
Navigational: P1:Action ∧ P1:Suspense
Navigational: P5:Drama
Navigational: P6:DramaComedy
Integration: P2:BruceWillis ∧ P1:Suspense
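The rewritings of Thriller can be reproduced with a small, centralized sketch of the prime implicate computation. It assumes the mappings are the implications suggested by the rewritings listed above (Action ∧ Suspense ⇒ Thriller, Drama ⇒ Thriller, DramaComedy ⇒ Drama, BruceWillis ⇒ Action); the helper names are hypothetical, and the actual SomeWhere algorithm is distributed and message based, which this sketch does not attempt to model.

```python
def implies(body, head):
    """Clause for (b1 ∧ ... ∧ bn) ⇒ head; a literal is a (name, is_positive) pair."""
    return frozenset({(b, False) for b in body} | {(head, True)})


def prime_implicates(clauses):
    """Saturate by resolution, then keep only the subsumption-minimal clauses."""
    known = set(clauses)
    changed = True
    while changed:
        changed = False
        for c1 in list(known):
            for c2 in list(known):
                for name, positive in c1:
                    if (name, not positive) not in c2:
                        continue
                    r = (c1 - {(name, positive)}) | (c2 - {(name, not positive)})
                    tautology = any((n, not p) in r for n, p in r)
                    if not tautology and not any(k <= r for k in known):
                        known.add(frozenset(r))
                        changed = True
    return {c for c in known if not any(o < c for o in known)}


def rewritings(theory, query):
    """Conjunctive rewritings = negations of the proper prime implicates of ¬query."""
    neg_query = frozenset({(query, False)})
    base = prime_implicates(theory)
    with_query = prime_implicates(set(theory) | {neg_query})
    # "Proper": not already a consequence of the theory alone.
    proper = {c for c in with_query if not any(b <= c for b in base)}
    # Negating ¬A ∨ ¬B gives the conjunction A ∧ B.
    return {frozenset(n for n, _ in c)
            for c in proper if all(not p for _, p in c)}


theory = {
    implies(["P1:Action", "P1:Suspense"], "P3:Thriller"),  # assumed mapping
    implies(["P5:Drama"], "P3:Thriller"),                  # assumed mapping
    implies(["P6:DramaComedy"], "P5:Drama"),               # assumed mapping
    implies(["P2:BruceWillis"], "P1:Action"),              # assumed mapping
}

for r in sorted(rewritings(theory, "P3:Thriller"), key=sorted):
    print(" ∧ ".join(sorted(r)))
```

Under these assumptions the sketch yields the five rewritings listed on the slide. Saturation by resolution is exponential in general, so this only illustrates the logical characterization on a toy theory.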
20
SomeWhere infrastructure
Deployment configurations:
– 1 machine, N peers
– N machines, K peers per machine
– N machines, 1 peer per machine
22
Scalability experiments
on randomly generated networks
– 1000 peers
– small world topology: close to the topology of the Web
peers
– ontologies: random clauses of length 2
– mappings: random clauses of length 2 or 3
23
Varying topologies
1000 peers, 10 neighbours/peer
Model of Watts and Strogatz
P: probability of redirecting an edge
– P = 0.01: ring
– P = 0.1: small world
– P = 1: random graph
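A minimal sketch of the Watts and Strogatz construction, assuming the usual formulation: a ring where each node is linked to its k nearest neighbours, then each edge is redirected with probability p. The function name `watts_strogatz` and the seed parameter are choices made for this example, not part of the experimental setup described in the talk.

```python
import random


def watts_strogatz(n, k, p, seed=None):
    """Return an adjacency dict for a ring of n nodes with k neighbours each
    (k even), where every edge is redirected with probability p."""
    rng = random.Random(seed)
    adj = {i: set() for i in range(n)}
    # Regular ring lattice: k/2 neighbours on each side.
    for i in range(n):
        for j in range(1, k // 2 + 1):
            adj[i].add((i + j) % n)
            adj[(i + j) % n].add(i)
    # Redirect the far end of each lattice edge with probability p.
    for j in range(1, k // 2 + 1):
        for i in range(n):
            old = (i + j) % n
            if old in adj[i] and rng.random() < p:
                candidates = [v for v in range(n)
                              if v != i and v not in adj[i]]
                if candidates:
                    new = rng.choice(candidates)
                    adj[i].discard(old)
                    adj[old].discard(i)
                    adj[i].add(new)
                    adj[new].add(i)
    return adj


# Parameters taken from the slide: 1000 peers, 10 neighbours per peer.
network = watts_strogatz(1000, 10, 0.1, seed=0)
print(sum(len(v) for v in network.values()) // 2, "edges")
```

Redirecting an edge never changes the number of edges, so the three topologies compared in the experiments differ only in how the same number of links is distributed.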
24
Scalability results
Varying parameters
– number of mappings between peers
– complexity of mappings: ratio of clauses of length 3 (0%, 20%, 100%)
– timeout: 30 s/query
Depth of query processing
– small depth (less than 7) even on the hard cases
Time to produce a number of answers
– in 90% of cases, the first answer is produced within 2 seconds
– easy cases (simple mappings): few answers per query (5 on average); very fast (less than 0.1 s) to compute all the answers, without timeouts
– hard cases (complex mappings and more mappings per edge): around 1000 answers per query (but more than 30% of queries not complete because of timeouts); quite fast to obtain them (less than 20 s)
25
Ongoing work (1)
Extending the data model to RDF(S)
– W3C recommendation for describing web resources
– classes and (binary) relations between objects
– each object is identified by a URI
Propositional encoding
– of the schema
– of the (atomic) queries
Query answering:
– propositional rewriting of each atom based on the encoding (variables are removed)
– composition of the rewritings by adding corresponding variables to each rewritten atom
26
RDF data model
Triple: <resource, property, value>
Relational: property(resource, value)
Graphical: resource --property--> value
Example:
http://www.louvre.fr --MuseumName--> "Le Louvre"
http://www.louvre.fr --Located--> http://www.paris.fr
http://www.paris.fr --CityName--> "Paris"
27
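RDFS
[Figure: RDFS schema. Classes: CulturalPlace, Museum, ArcheologyMuseum, ModernMuseum, City, Work, Artist. Properties between classes: Located, Contains, MadeBy. Literal-valued properties: MuseumName, CityName, WorkName, ArtistName. Three Is-a edges organize the classes CulturalPlace, Museum, ArcheologyMuseum and ModernMuseum.]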
29
Query rewriting
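Propositionalisation of RDFS statements
– class inclusion C1 ⊑ C2: C1_dom ⇒ C2_dom, C1_range ⇒ C2_range
– property inclusion P1 ⊑ P2: P1_rel ⇒ P2_rel
– domain typing of P by C: P_rel ⇒ C_dom
– range typing of P by C: P_rel ⇒ C_range
Query rewriting using SomeWhere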
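A minimal sketch of the propositionalisation step, assuming each RDFS statement is turned into implications between propositional variables suffixed with dom, range or rel. The tuple encoding of statements and the function name `propositionalise` are hypothetical; the inclusion of Painting in Work is an assumption read off the illustration of the rewriting of P2.Work.

```python
def propositionalise(statement):
    """Return the list of (premise, conclusion) implications encoding one RDFS statement.

    Statements are tuples (a format chosen for this sketch):
      ("subclass", C1, C2), ("subproperty", P1, P2),
      ("domain", P, C), ("range", P, C)
    """
    kind = statement[0]
    if kind == "subclass":
        _, c1, c2 = statement
        return [(f"{c1}_dom", f"{c2}_dom"), (f"{c1}_range", f"{c2}_range")]
    if kind == "subproperty":
        _, p1, p2 = statement
        return [(f"{p1}_rel", f"{p2}_rel")]
    if kind == "domain":
        _, p, c = statement
        return [(f"{p}_rel", f"{c}_dom")]
    if kind == "range":
        _, p, c = statement
        return [(f"{p}_rel", f"{c}_range")]
    raise ValueError(f"unknown statement kind: {kind}")


schema = [
    ("subclass", "P2.Painting", "P2.Work"),  # assumed from the illustration
    ("domain", "Located", "Museum"),         # assumed from the RDFS diagram
    ("range", "Located", "City"),            # assumed from the RDFS diagram
]
for statement in schema:
    for premise, conclusion in propositionalise(statement):
        print(f"{premise} => {conclusion}")
```

Each implication can then be written as a clause of length 2, which is the form handled by the propositional query rewriting.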
30
illustration
Q(X,Y): P2.Work(X) ∧ P2.refersTo(X,Y)
Propositional atoms: P2.Work_dom, P2.Work_range, P2.refersTo_rel
SomeWhere rewriting of P2.Work_dom: P2.Painting_dom, …
SomeWhere rewriting of P2.Work_range: P1.Paints_rel, …
SomeWhere rewriting of P2.refersTo_rel: P1.belongsTo_rel, …
Rewritten atoms with their variables: P2.Painting(X), P1.Paints(Z,X), P1.belongsTo(X,Y)
R1(X,Y): P2.Painting(X) ∧ P1.belongsTo(X,Y)
31
illustration
Q(X,Y): P2.Work(X) ∧ P2.refersTo(X,Y)
Rewritten atoms with their variables: P2.Painting(X), P1.Paints(Z,X), P1.belongsTo(X,Y)
R2(X,Y): P1.Paints(Z,X) ∧ P1.belongsTo(X,Y)
32
Ongoing work (2)
Handling inconsistencies
– how to define them? Unsatisfiability (no model) => inconsistency, but it is not a necessary condition
– how to check consistency? At each join of a new peer
– how to deal with inconsistency? Correct it or reason with it?
[Figure: classes A, A’, B, B’ and two statements relating B’ with A’ and A with B.]
for each A, there exists a model in which A is non empty: S | A
33
illustration
[Figure: three peers. P1: class 2005 with subclasses AIPubli and BDPubli. P2: class Article with subclasses Theory and Expe. P3: class Publi with subclasses Conf and Journal, which are disjoint (>--<).
Mappings: m0: AIPubli ⊑ Theory; m1: 2005 ⊑ Conf; m2: Theory ⊑ Journal.]
Inconsistencies are caused by mappings.
Path m1: AIPubli is a subclass of Conf.
Path m0 -> m2: AIPubli is a subclass of Journal.
Conf and Journal are disjoint, therefore AIPubli is necessarily empty.
34
P2P detection of inconsistencies
Clauses at P3: ¬Conf v Publi; ¬Journal v Publi; ¬Journal v ¬Conf
Clauses at P1: ¬AIPubli v 2005; ¬BDPubli v 2005
Clauses at P2: ¬Theory v Article; ¬Expe v Article
Mappings: m0: ¬AIPubli v Theory; m1: ¬2005 v Conf; m2: ¬Theory v Journal
Propagation of m1: { ¬AIPubli v Conf; ¬AIPubli v Publi; ¬AIPubli v ¬Journal; ¬BDPubli v Conf; ¬BDPubli v Publi; ¬BDPubli v ¬Journal }.
No production of a unit clause: no inconsistency.
Propagation of m2: { ¬Theory v Journal; ¬AIPubli v Journal; …; ¬AIPubli; …; ¬AIPubli v ¬Conf }.
Production of a unit clause: inconsistency. {m1, m2} is a NoGood stored at P3.
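The unit clause test can be checked on the clauses of this example with a small centralized sketch. It assumes m0 is already in place when m1 and m2 are added, and it saturates the whole clause set by resolution in one place, whereas the approach described here propagates clauses between peers; the helper names are hypothetical.

```python
def parse(text):
    """Turn '¬A v B' into a frozenset of (name, is_positive) literals."""
    literals = set()
    for token in text.split(" v "):
        token = token.strip()
        if token.startswith("¬"):
            literals.add((token[1:], False))
        else:
            literals.add((token, True))
    return frozenset(literals)


def saturate(clauses):
    """Close a clause set under resolution, skipping tautologies and subsumed clauses."""
    known = set(clauses)
    changed = True
    while changed:
        changed = False
        for c1 in list(known):
            for c2 in list(known):
                for name, positive in c1:
                    if (name, not positive) not in c2:
                        continue
                    r = (c1 - {(name, positive)}) | (c2 - {(name, not positive)})
                    tautology = any((n, not p) in r for n, p in r)
                    if not tautology and not any(k <= r for k in known):
                        known.add(frozenset(r))
                        changed = True
    return known


def unit_clauses(clauses):
    """A unit clause ¬A means that class A is necessarily empty."""
    return {c for c in saturate(clauses) if len(c) == 1}


local = [parse(t) for t in (
    "¬Conf v Publi", "¬Journal v Publi", "¬Journal v ¬Conf",  # P3
    "¬AIPubli v 2005", "¬BDPubli v 2005",                     # P1
    "¬Theory v Article", "¬Expe v Article",                   # P2
)]
m0 = parse("¬AIPubli v Theory")
m1 = parse("¬2005 v Conf")
m2 = parse("¬Theory v Journal")

print(unit_clauses(local + [m0, m1]))      # no unit clause: no inconsistency
print(unit_clauses(local + [m0, m1, m2]))  # the unit clause ¬AIPubli appears
```

With m2 added, AIPubli is entailed to be both a Conf and a Journal, and the disjointness clause ¬Journal v ¬Conf yields the unit clause ¬AIPubli.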
36
P2P well-founded reasoning
Principle:
– avoid the inconsistencies when constructing answers
Semantics of « well-founded » answer:
– obtained from a consistent subset of formulas
Algorithm:
– for each answer, build its set of mapping supports and return the set of NoGoods encountered during the reasoning
– discard the mapping supports including a NoGood
– return the answers having a non-empty set of mapping supports
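The filtering part of the algorithm can be sketched as follows, assuming a mapping support is a set of mapping names and a NoGood is a set of mappings that are inconsistent together. The function name `well_founded_answers` and the sample answers are hypothetical; only the NoGood {m1, m2} comes from the example.

```python
def well_founded_answers(supports_by_answer, nogoods):
    """Keep the answers that have at least one mapping support free of NoGoods.

    supports_by_answer: dict mapping an answer to a list of mapping supports
    nogoods: iterable of sets of mappings known to be jointly inconsistent
    """
    kept = {}
    for answer, supports in supports_by_answer.items():
        # A support is discarded as soon as it includes a whole NoGood.
        consistent = [s for s in supports
                      if not any(ng <= s for ng in nogoods)]
        if consistent:
            kept[answer] = consistent
    return kept


nogoods = [frozenset({"m1", "m2"})]
supports_by_answer = {           # hypothetical answers and supports
    "answer_1": [frozenset({"m0", "m1", "m2"})],
    "answer_2": [frozenset({"m1", "m2"}), frozenset({"m0"})],
    "answer_3": [frozenset({"m1"})],
}

print(sorted(well_founded_answers(supports_by_answer, nogoods)))
```

An answer survives as long as one of its derivations avoids every NoGood, so mappings involved in an inconsistency can still contribute to answers when used separately.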