Proceedings of the Third Latin American Web Congress (LA-WEB’05) 0-7695-2471-0/05 $20.00 © 2005 IEEE

[IEEE Third Latin American Web Congress (LA-WEB'2005) - Buenos Aires, Argentina (31-02 Oct. 2005)] Third Latin American Web Congress (LA-WEB'2005) - Interpretations between RDF and


Interpretations between RDF and the Logical Data Model

Sergio Muñoz
Facultad de Ingeniería
Universidad Católica de la SSMA Concepción
[email protected]

Claudio Gutierrez
Department of Computer Science
Universidad de Chile
[email protected]

Abstract

In 1984 Kuper and Vardi presented a logical data model (LDM) that generalizes the relational, hierarchical and network database models. In this model, a database is a schema represented by a directed multigraph whose leaves represent data. In 1998, the W3C proposed the Resource Description Framework (RDF) as a data model for representing the network of metadata on the Web.

In this short paper we show that LDM and RDF are mutually interpretable, presenting the result for RDF without vocabulary from RDFS.

1. Introduction

In 1998 the W3C issued a recommendation of a metadata model and language to serve as the basis for an infrastructure of machine-readable semantics for the data on the Web, the Resource Description Framework (RDF) [6]. The activity around RDF increased over time, and RDF is today a recommendation (standard) of the W3C [7, 8] and a widely accepted format for metadata on the Web.

In the RDF model the universe to be modeled is a set of resources (identified by uniform resource identifiers, URIs). The language to describe them is a set of properties, technically binary predicates. Descriptions are statements over resources in the subject-predicate-object structure, where the object can be a string, and both subject and object can also be anonymous objects, known as blank nodes. In addition, the RDF specification includes a built-in vocabulary with a normative semantics (RDFS). This vocabulary deals with inheritance of classes and properties, as well as typing, among other features [9]. Good references are [10] and [11].

From a database point of view, the RDF data model closely resembles the graph data models that flourished in parallel with object-oriented languages two decades ago [1]. One of the first and most influential of these graph data models was the Logical Data Model (LDM) [3, 4], a proposal of Kuper and Vardi that generalizes the relational, hierarchical and network database models. The LDM data model consists of a schema and instances. A schema is essentially a directed multigraph whose nodes point to the corresponding type of data depending on their own type: union, product, powerset or simple data, the last being the leaves of the schema graph. An instance is a map (valuation) that assigns names and actual data to each of the elements of the schema. The model turns out to be very flexible and extensible. Among the features of the LDM model are its solid logical and database foundation and its formal relationship to the relational model. In fact, the LDM model includes a logic and an algebra, over which Kuper and Vardi build a language for specifying constraints and a query language.

In this short paper we present interpretations of RDF in LDM and vice versa. This basic result is relevant in two directions: (1) it allows the logical machinery of the well-studied database model LDM to be translated to RDF; and (2) it gives a new interpretation of RDF in terms of classical databases. For the interpretation of RDF in LDM, we present here the result for the fragment of RDF without RDFS vocabulary.

2. Preliminaries

The RDF model. We assume the reader is familiar with the RDF model, and present here only some basic notation. For details about the RDF model see [6, 8, 9, 5].

Assume there is an infinite set $U$ (RDF URI references); an infinite set $B = \{N_j : j \in \mathbb{N}\}$ (blank nodes); and an infinite set $L$ (RDF literals). A triple $(v_1, v_2, v_3) \in (U \cup B) \times U \times (U \cup B \cup L)$ is called an RDF triple. In such a triple, $v_1$ is called the subject, $v_2$ the predicate and $v_3$ the object. We denote by $UBL$ the union of the sets $U$, $B$ and $L$.

An RDF graph is a set of RDF triples. A subgraph is a subset of an RDF graph. The universe of an RDF graph $G$, universe($G$), is the set of elements of $UBL$ that occur in the triples of $G$, and UB the ...

Two RDF graphs are isomorphic if one can be obtained from the other by a consistent renaming of blank nodes.
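As a minimal sketch of these notions, an RDF graph can be modeled in Python as a set of 3-tuples. The `_:` prefix for blank nodes, the `ex:` identifiers and the function names are assumptions made for illustration only. The isomorphism test simply tries every renaming of blank nodes, so it is practical only for very small graphs.

```python
from itertools import permutations

# Assumption: terms are strings, and blank nodes are the ones starting with "_:".
def is_blank(term):
    return isinstance(term, str) and term.startswith("_:")

def universe(graph):
    """All terms that occur in the triples of the graph."""
    return {term for triple in graph for term in triple}

def isomorphic(g1, g2):
    """Brute force: look for a bijective renaming of blank nodes mapping g1 onto g2."""
    b1 = sorted(t for t in universe(g1) if is_blank(t))
    b2 = sorted(t for t in universe(g2) if is_blank(t))
    if len(g1) != len(g2) or len(b1) != len(b2):
        return False
    for perm in permutations(b2):
        rename = dict(zip(b1, perm))
        renamed = {tuple(rename.get(t, t) for t in triple) for triple in g1}
        if renamed == g2:
            return True
    return False

g1 = {("_:a", "ex:knows", "_:b"), ("_:b", "ex:name", "Ana")}
g2 = {("_:x", "ex:knows", "_:y"), ("_:y", "ex:name", "Ana")}
g3 = {("_:x", "ex:knows", "_:y"), ("_:x", "ex:name", "Ana")}
print(isomorphic(g1, g2))  # True
print(isomorphic(g1, g3))  # False
```

Only blank nodes are renamed; URIs and literals must match exactly, which is why `g3` is not isomorphic to `g1`.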

The Logical Data Model. The Logical Data Model (LDM) is described in detail in [4]. In what follows we give the formalization needed to state our results.

An LDM schema is a tuple $S = \langle V, E, <, \tau \rangle$, where $(V, E)$ is a directed multigraph, $<$ is a total order on the edges of $E$, and $\tau$ is a typing function from $V$ to the set of types $\{\square, \otimes, \circledast, \uplus\}$ (data, product, powerset and union, respectively) that satisfies the following conditions:

1. If $\tau(v) = \square$ (data node), then there are no outgoing edges from $v$.

2. If $\tau(v) = \circledast$ (powerset node), then $v$ has exactly one outgoing edge; we use $\tau(v) = (\circledast, u)$ as an abbreviation for “$\tau(v) = \circledast$ and its child is $u$”.

3. If $\tau(v) = \uplus$ (union node), then the outgoing edges from $v$ point to different nodes; we use $\tau(v) = (\uplus, v_1, \ldots, v_n)$ as an abbreviation for “$\tau(v) = \uplus$ and there are exactly $n$ outgoing edges from $v$ and they point to the distinct nodes $v_1, \ldots, v_n$”.

4. If $\tau(v) = \otimes$ (product node), then $v$ may have several outgoing edges pointing to not necessarily different nodes; we use $\tau(v) = (\otimes, v_1, \ldots, v_n)$ as an abbreviation for “$\tau(v) = \otimes$ and there are exactly $n$ outgoing edges $e_1, \ldots, e_n$ from $v$ pointing to the nodes $v_1, \ldots, v_n$, with the order on the edges given by $<$”.
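The four conditions can be checked mechanically. The sketch below assumes a schema is given as a dictionary from nodes to type tags plus an ordered list of edges, where the list order stands in for the total order on edges. The tag names and the function `check_schema` are illustrative and not part of LDM.

```python
# Illustrative tags for the four LDM node types.
DATA, PRODUCT, POWER, UNION = "data", "prod", "power", "union"

def check_schema(types, edges):
    """types: node -> type tag; edges: ordered list of (source, target) pairs.
    Returns the list of violated conditions (empty if the schema is valid)."""
    errors = []
    for v, t in types.items():
        children = [target for (source, target) in edges if source == v]
        if t == DATA and children:
            errors.append(f"{v}: data node has outgoing edges")
        elif t == POWER and len(children) != 1:
            errors.append(f"{v}: powerset node needs exactly one outgoing edge")
        elif t == UNION and len(set(children)) != len(children):
            errors.append(f"{v}: union node has two edges to the same child")
        # Product nodes may repeat children, so there is nothing to check.
    return errors

types = {"person": PRODUCT, "name": DATA, "friends": POWER}
edges = [("person", "name"), ("person", "friends"), ("friends", "person")]
print(check_schema(types, edges))  # []
print(check_schema({"a": DATA, "b": DATA}, [("a", "b")]))
```

The first schema is cyclic (a person has a set of friends, who are persons), which the conditions allow.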

The order $<$ on $E$ matters only for nodes of type $\otimes$ (product nodes). We will use $\pi_i(\mathit{val}(l))$ to denote the $i$-th coordinate of the value of $l$, if $l$ is a name in a product node. We assume a fixed infinite set $N$ of names and a fixed infinite set $D$ of data values. An instance $\mathcal{I}$ of a schema $S$ with data in $D$ and names in $N$ is a pair $\langle I, \mathit{val} \rangle$ that satisfies the following:

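1. $I$ is a map that associates subsets of $N$ to the nodes of $V$ (the “instances” of the node). It is required that $I(u)$ and $I(w)$ be disjoint if $u \neq w$ in $V$.

2. $\mathit{val}$ is a mapping with domain $\bigcup_{v \in V} I(v)$ satisfying:

(a) If $\tau(v) = \square$ (data node) and $l \in I(v)$, then $\mathit{val}(l)$ is a datum in $D$.

(b) If $\tau(v) = (\otimes, v_1, \ldots, v_n)$ (product node) and $l \in I(v)$, then $\mathit{val}(l) = (l_1, \ldots, l_n)$, where $l_i \in I(v_i)$ for each $1 \leq i \leq n$.

(c) If $\tau(v) = (\circledast, w)$ (powerset node) and $l \in I(v)$, then $\mathit{val}(l)$ is a finite subset of $I(w)$.

(d) If $\tau(v) = (\uplus, v_1, \ldots, v_n)$ (union node) and $l \in I(v)$, then $\mathit{val}(l) \in \bigcup \{I(v_i) : 1 \leq i \leq n\}$.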
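The conditions on an instance can be sketched as a small checker. The data layout below (dictionaries for the typing function, the ordered children of each node, the map from nodes to names and the valuation) and the name `check_instance` are assumptions chosen for illustration. The schema itself is assumed to be valid already.

```python
DATA, PRODUCT, POWER, UNION = "data", "prod", "power", "union"

def check_instance(types, children, I, val, D):
    """types: node -> tag; children: node -> ordered list of child nodes;
    I: node -> set of names; val: name -> value; D: set of data values."""
    nodes = list(types)
    # Condition 1: the name sets of distinct nodes are disjoint.
    for i, u in enumerate(nodes):
        for w in nodes[i + 1:]:
            if I[u] & I[w]:
                return False
    # Condition 2: the value of each name matches the type of its node.
    for v in nodes:
        kids = children.get(v, [])
        for l in I[v]:
            x = val[l]
            if types[v] == DATA:
                ok = x in D
            elif types[v] == PRODUCT:
                ok = (isinstance(x, tuple) and len(x) == len(kids)
                      and all(li in I[vi] for li, vi in zip(x, kids)))
            elif types[v] == POWER:
                ok = isinstance(x, frozenset) and x <= I[kids[0]]
            else:  # UNION
                ok = any(x in I[vi] for vi in kids)
            if not ok:
                return False
    return True

types = {"pair": PRODUCT, "num": DATA, "nums": POWER}
children = {"pair": ["num", "nums"], "nums": ["num"]}
I = {"pair": {"p1"}, "num": {"n1", "n2"}, "nums": {"s1"}}
val = {"n1": 1, "n2": 2, "s1": frozenset({"n1", "n2"}), "p1": ("n1", "s1")}
print(check_instance(types, children, I, val, D={1, 2}))  # True
```

Note that values of non-data nodes are built from names, never from data directly: the pair `p1` refers to the name `n1`, whose value is the datum `1`.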

Two instances $\mathcal{I}_1$ and $\mathcal{I}_2$ are isomorphic if one is obtained from the other by a renaming of names, given by a map between names that is consistent with the schema. Data do not change under isomorphisms.

An extension of a schema is another schema obtained from the former by adding extra nodes and edges, where new edges connect two new nodes, or go from new to old nodes. An extension $\mathcal{I}'$ of an instance $\mathcal{I}$ is an instance over an extension of the schema of $\mathcal{I}$, such that names and values of $\mathcal{I}$ do not change. Note that an extension does not alter the data.

3. Interpreting LDM in RDF

In this section we define the mapping from LDM to RDF. The idea is to define a vocabulary corresponding to the types of nodes of LDM and the relationships expressed by the edges of the schema graph. The aim of this mapping is to translate schemas and instances together as an RDF graph, using at most the reserved vocabulary of classes, subclasses and subproperties. We have to simulate structures that appear in LDM, such as tuples and sets, allowing subsequent translations to an expanded vocabulary.

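Definition 1 (Mapping of a schema). Let $S = \langle V, E, <, \tau \rangle$ be an LDM schema. Then the translation of $S$ into the RDF graph $R(S)$ is defined by:

• The classes $\mathtt{data}$, $\mathtt{prod}$, $\mathtt{union}$, $\mathtt{power}$, and $\mathtt{edge}$, plus one class $R(u)$ for each $u \in V$.

• A predicate $\mathtt{less}$, the predicate $\mathtt{val}$, and one predicate $R(e)$ for each edge $e \in E$.

• For every $u \in V$ the RDF graph $R(S)$ contains $(R(u), \mathtt{sc}, x)$, where $x$ is the class $\mathtt{data}$, $\mathtt{prod}$, $\mathtt{union}$, or $\mathtt{power}$, if $\tau(u)$ is $\square$ (data), $\otimes$ (product), $\uplus$ (union) or $\circledast$ (powerset), respectively.

• For every edge $e$ of the form $(u, v)$, $R(S)$ contains: $(R(u), R(e), R(v))$, $(R(e), \mathtt{type}, \mathtt{edge})$, $(\mathtt{less}, \mathtt{dom}, \mathtt{edge})$, $(\mathtt{less}, \mathtt{range}, \mathtt{edge})$, and $(R(e), \mathtt{less}, R(e'))$ for every $e, e' \in E$ with $e < e'$.¹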
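The schema mapping can be read as a procedure that emits triples. The following sketch assumes edges are given as an ordered list of `(edge_id, source, target)` entries, with list order standing in for the total order on edges. The string form `R(...)` for translated resources is a hypothetical naming scheme, and vocabulary terms are plain strings.

```python
def translate_schema(types, edges):
    """types: node -> one of "data", "prod", "union", "power";
    edges: ordered list of (edge_id, source, target).
    Returns R(S) as a set of (subject, predicate, object) triples."""
    def R(x):
        return f"R({x})"  # hypothetical name of the translated resource

    triples = set()
    for u, t in types.items():
        triples.add((R(u), "sc", t))  # each node is a subclass of its type class
    for i, (e, u, v) in enumerate(edges):
        triples.add((R(u), R(e), R(v)))
        triples.add((R(e), "type", "edge"))
        triples.add(("less", "dom", "edge"))
        triples.add(("less", "range", "edge"))
        for (e2, _, _) in edges[i + 1:]:
            triples.add((R(e), "less", R(e2)))  # every pair with e < e2
    return triples

schema_types = {"pair": "prod", "num": "data"}
schema_edges = [("e1", "pair", "num"), ("e2", "pair", "num")]
for triple in sorted(translate_schema(schema_types, schema_edges)):
    print(triple)
```

Because a triple is emitted for every pair of edges in order, the `less` relation comes out transitively closed without any extra step.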

¹ Note that this implies that $\mathtt{less}$ is closed under transitivity in $R(S)$.

Definition 2 (Mapping of an instance). Let $S = \langle V, E, <, \tau \rangle$ be an LDM schema and let $\mathcal{I} = (I, \mathit{val})$ be an instance of $S$ with $N$ as the fixed set of object names and $D$ as the fixed set of data values. Then the translation of $\mathcal{I}$ to the RDF graph $R(\mathcal{I})$ is defined as follows:

• $R(\mathcal{I})$ contains all triples of $R(S)$.

• $R(\mathcal{I})$ contains $(R(l), \mathtt{type}, R(u))$ for every $u \in V$ and $l \in I(u)$.

• For each name $l \in N$ there is a different blank node $R(l)$, and for each $d \in D$ there is a literal $R(d)$.

• For each edge $e \in E$ of the form $(u, v)$, each $l \in I(u)$, and each $l' \in I(v)$, the triple $(R(l), R(e), R(l'))$ is in $R(\mathcal{I})$ if one of the following cases holds:

– $\tau(u) = \circledast$ (powerset node) and $l' \in \mathit{val}(l)$.

– $\tau(u) = \uplus$ (union node) and $l' = \mathit{val}(l)$.

– $\tau(u) = \otimes$ (product node), $e$ is the $i$-th edge in the set of outgoing edges of $u$, and $l'$ is the $i$-th component of the tuple $\mathit{val}(l)$.

• For each $u \in V$, $l \in I(u)$ and $d \in D$ with $d = \mathit{val}(l)$, the triple $(R(l), \mathtt{val}, R(d))$ is in $R(\mathcal{I})$.
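A sketch of the instance mapping follows, under the same illustrative conventions as a list-based schema encoding: edges are an ordered list of `(edge_id, source, target)`, names become blank nodes written `_:name`, and a data value is rendered as a literal through `repr`. All of these choices, and the function name, are assumptions. The sketch also assumes names and data values are disjoint, so only names of data nodes receive a `val` triple.

```python
def translate_instance(types, edges, I, val, schema_triples=frozenset()):
    """Returns R(I) as a set of triples; schema_triples is R(S), if available."""
    def R(x):
        return f"R({x})"

    def blank(name):
        return f"_:{name}"

    triples = set(schema_triples)  # R(I) contains all triples of R(S)
    for u, names in I.items():
        out = [(e, v) for (e, src, v) in edges if src == u]  # ordered outgoing edges
        for l in names:
            triples.add((blank(l), "type", R(u)))
            if types[u] == "data":
                triples.add((blank(l), "val", repr(val[l])))
            for i, (e, v) in enumerate(out):
                for l2 in I[v]:
                    if ((types[u] == "power" and l2 in val[l])
                            or (types[u] == "union" and l2 == val[l])
                            or (types[u] == "prod" and l2 == val[l][i])):
                        triples.add((blank(l), R(e), blank(l2)))
    return triples

inst_types = {"pair": "prod", "num": "data", "nums": "power"}
inst_edges = [("e1", "pair", "num"), ("e2", "pair", "nums"), ("e3", "nums", "num")]
inst_I = {"pair": {"p1"}, "num": {"n1", "n2"}, "nums": {"s1"}}
inst_val = {"n1": 1, "n2": 2, "s1": frozenset({"n1", "n2"}), "p1": ("n1", "s1")}
for triple in sorted(translate_instance(inst_types, inst_edges, inst_I, inst_val)):
    print(triple)
```

Renaming `p1`, `n1`, `n2` and `s1` consistently only renames blank nodes in the output, which is the intuition behind the isomorphism result stated next.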

Theorem 1. If $\mathcal{I}_1$ and $\mathcal{I}_2$ are two isomorphic instances, then their translations $R(\mathcal{I}_1)$ and $R(\mathcal{I}_2)$ are isomorphic RDF graphs.

4. Interpreting RDF in LDM

Recall that we interpret RDF graphs that use no vocabulary with built-in semantics (i.e., no RDFS vocabulary).

We remark on two main features of this translation: first, blank nodes of RDF are interpreted as names (addresses) in LDM, preserving the existential character of blank nodes and the independence of their names (denotation) in RDF; second, the translation of a simple RDF graph is given by a specific kind of instance over a specific kind of schema, allowing the re-creation of RDF graphs from such instances.

Let $U$, $B$ and $L$ be as described for RDF in the Preliminaries. The key to this translation is a special sub-schema of nodes used as the base for constructing the LDM schema over which we define the instances that will be the translation of the RDF graph. Such a sub-schema, which we will call a predicate sub-schema, is defined as follows (see Figure 1):

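1. $\tau(v_1) = (\otimes, v_2, v_3, v_4)$ (product node).

2. $\tau(v_2) = (\uplus, v_3, v_6)$ (union node).

3. $\tau(v_4) = (\uplus, v_3, v_5, v_6)$ (union node).

4. $\tau(v_3) = \tau(v_5) = \square$ (data nodes).

5. $\tau(v_6) = (\uplus, w_1, \ldots, w_n)$ (union node), for some positive integer $n$.

6. For every $i \in \{1, \ldots, n\}$, $\tau(w_i) = \square$ (data node).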
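The six clauses can be written down as a small constructor. The dictionary layout, the type tags and the function name below are illustrative assumptions. The comments give one reading of each node's role, based on the later requirements that the data in $v_3$ are URIs, the data in $v_5$ are literals, and the $w_i$ carry the blank nodes.

```python
def predicate_subschema(n):
    """Typing function of a predicate sub-schema with n leaves w1..wn.
    Returns node -> (type tag, ordered list of children)."""
    if n < 1:
        raise ValueError("n must be a positive integer")
    ws = [f"w{i}" for i in range(1, n + 1)]
    schema = {
        "v1": ("prod", ["v2", "v3", "v4"]),   # (subject, predicate, object)
        "v2": ("union", ["v3", "v6"]),         # subject: URI or blank node
        "v4": ("union", ["v3", "v5", "v6"]),   # object: URI, literal or blank node
        "v3": ("data", []),                    # URIs
        "v5": ("data", []),                    # literals
        "v6": ("union", ws),                   # union of the blank-node leaves
    }
    for w in ws:
        schema[w] = ("data", [])
    return schema

for node, (tag, kids) in predicate_subschema(2).items():
    print(node, tag, kids)
```

The shape mirrors the definition of an RDF triple: the product node pairs a subject drawn from URIs or blank nodes with a URI predicate and an object drawn from URIs, literals or blank nodes.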

A p-schema for LDM will be any extension of an arbitrary schema which is made using only predicate sub-schemas. In order to define the translation of a simple RDF graph as an instance, we fix the domain as the set $D = U \cup L \cup \{\mathit{null}\}$, where $\mathit{null}$ does not belong to $UBL$. We also fix the set of names $N = B \cup N_0$, where $N_0$ is an infinite set that is disjoint from $D$ and from $B$. We use $\langle v_1, \ldots, v_6, w_1, \ldots, w_n \rangle$ to denote p-schemas.

Figure 1. Sub-schema $\langle v_1, \ldots, v_6, w_1, \ldots, w_n \rangle$. The nodes with label U denote union nodes, and the nodes with label x denote product nodes.

A p-instance $\mathcal{I}$ is an instance over a p-schema $S$ where, for each predicate sub-schema $\langle v_1, \ldots, v_6, w_1, \ldots, w_n \rangle$, all the data in $v_3$ belong to $U$ and all the data in $v_5$ belong to $L$. The unique datum in $w_i$ is $\mathit{null}$, for every $i \in \{1, \ldots, n\}$. The only nodes which may have elements of $B$ as names are the $w_i$'s. There is a unique $p \in U$ related to the second coordinate of any value in $v_1$, that is, $\{p\} = \{\mathit{val}(\pi_2(\mathit{val}(l))) : l \in I(v_1)\}$. Such $p$ is called the head predicate of the predicate sub-schema.

For every RDF graph $G$ and every $p \in U$, denote by $\pi_p(G)$ the set of triples of $G$ which have predicate $p$.

Definition 3 (Instance of an RDF Graph). Let $G$ be a simple non-empty RDF graph. The translation of $G$ into LDM is a p-instance $\mathcal{G}$ that includes:

• Two data nodes $g_U$, $g_L$, where $g_U$ has as data all URIs occurring in $G$, and $g_L$ all literals appearing in $G$.

• One data node $w_j$ for each blank node $N_j$ of $B$ that appears in $G$, if there is one; otherwise, assume a single data node $w$ that has some arbitrary name with value $\mathit{null}$. A union node $g_B$ which is the union of all $w_j$ nodes.²

• A $\langle v_p, \mathit{subj}, g_U, \mathit{obj}, g_L, g_B, w_1, \ldots, w_k \rangle$ schema for each $p \in U$ with $\pi_p(G) \neq \emptyset$, where $p$ is its head predicate, and where the $w_i$ are the nodes described in the previous item.
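The ingredients this definition draws from $G$ can be computed directly. The sketch below does not build the full p-instance; it only extracts the data for $g_U$ and $g_L$, the blank nodes that give rise to the $w_j$, and the predicates $p$ with non-empty $\pi_p(G)$. The term encoding (blank nodes start with `_:`, literals start with a double quote, everything else is a URI) and the function names are assumptions for illustration.

```python
def is_blank(term):
    return term.startswith("_:")

def is_literal(term):
    return term.startswith('"')

def pi(graph, p):
    """The set of triples of the graph that have predicate p."""
    return {t for t in graph if t[1] == p}

def ingredients(graph):
    """Data of g_U and g_L, the blank nodes, and the head predicates of the graph."""
    terms = {term for triple in graph for term in triple}
    g_U = {t for t in terms if not is_blank(t) and not is_literal(t)}
    g_L = {t for t in terms if is_literal(t)}
    blanks = sorted(t for t in terms if is_blank(t))
    heads = sorted({p for (_, p, _) in graph})
    return g_U, g_L, blanks, heads

G = {("_:a", "ex:knows", "ex:bob"), ("ex:bob", "ex:name", '"Bob"')}
g_U, g_L, blanks, heads = ingredients(G)
print(sorted(g_U))   # ['ex:bob', 'ex:knows', 'ex:name']
print(sorted(g_L))   # ['"Bob"']
print(blanks)        # ['_:a']
print(heads)         # ['ex:knows', 'ex:name']
```

Each entry of `heads` would receive its own predicate sub-schema, all of them sharing the same $g_U$, $g_L$ and $g_B$ nodes.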

Lemma 1. The instance of a simple RDF graph is well defined up to isomorphisms that fix names in $B$.

Theorem 2. If $G_1$ and $G_2$ are two isomorphic simple RDF graphs, then their translations $\mathcal{G}_1$ and $\mathcal{G}_2$ are isomorphic instances.

Theorem 3. For any p-instance $\mathcal{I}$ there exists a simple RDF graph $G$ such that $\mathcal{I}$ and the translation $\mathcal{G}$ of $G$ are isomorphic.

² This means that the values in $\mathcal{G}(g_B)$ are the blank nodes in $w$.


5. Conclusions

Establishing relations between the RDF model and graph database models is a topic that, to the best of our knowledge, has not been addressed formally. We can mention the work of Decker [2], comparing RDF with the relational and semistructured database models, and the recent work of Angles and Gutierrez [1], which concentrates on query languages.

In this paper we build interpretations between RDF and LDM, and future work includes translating the features of LDM to RDF. Some comments on this preliminary work are in order. Interpreting RDF into LDM has subtleties. RDF has a logic intrinsic to the data model, whereas the logic of LDM is given by an external language. One would like to embed RDF directly into LDM without using the logic language for constraints in LDM, as we did in this paper for simple RDF. This approach is not possible in the general case due to the semantics of several predicates in RDF Schema, like subclass and subproperty. Nevertheless, we claim that, using the external language of LDM, this extension can be done without problems.

Acknowledgments. This work was supported by the Center for Web Research (CIW), Millennium Nucleus, Mideplan P04-067-F. Sergio Muñoz also acknowledges Proyecto Interno DIN 2005-5, UCSC.

References

[1] R. Angles, C. Gutierrez, Querying RDF Data from a Graph Database Perspective, 2nd European Semantic Web Conference (ESWC2005), May 2005, Heraklion, Greece. Lecture Notes in Computer Science, Volume 3532 / 2005, pp. 346-360.

[2] S. Decker, Semantic Web and Databases: Relationships and some Open Problems, International Semantic Web Working Symposium (SWWS), 2001.

[3] G. M. Kuper, M. Y. Vardi, A New Approach to Database Logic, Proceedings of the 3rd ACM SIGACT-SIGMOD Symposium on Principles of Database Systems, 1984.

[4] G. M. Kuper, M. Y. Vardi, The Logical Data Model, ACM Transactions on Database Systems, Volume 18, No. 3, September 1993, pp. 379-413.

[5] C. Gutierrez, C. Hurtado, A. O. Mendelzon, Foundations of Semantic Web Databases, Proceedings ACM Symposium on Principles of Database Systems (PODS), Paris, France, June 2004, pp. 95-106.

[6] Resource Description Framework (RDF) Model and Syntax Specification, Edit. O. Lassila, R. Swick, Working draft, W3C, 1998.

[7] RDF/XML Syntax Specification (Revised), W3C Recommendation 10 February 2004, Edit. D. Beckett.

[8] RDF Semantics, W3C Recommendation 10 February 2004, Edit. P. Hayes.

[9] RDF Vocabulary Description Language 1.0: RDF Schema, W3C Recommendation 10 February 2004, Edit. D. Brickley, R. V. Guha.

[10] RDF Concepts and Abstract Syntax, W3C Recommendation 10 February 2004, Edit. G. Klyne, J. J. Carroll.

[11] RDF Primer, W3C Recommendation 10 February 2004, Edit. F. Manola, E. Miller.
