RDF Databases By: Chris Halaschek

Outline

- Motivation / Requirements
- Storage Issues
- Sesame
  - General Introduction
  - Architecture
  - Scalability
- RQL Introduction
- Demo
- Future Directions

Motivation

- Having metadata available is not enough
- We need tools to process, transform, and reason with the information
- We need a way to store the metadata and interact with it

Requirements

- Scalability
- Good performance
- A useful query language

Storage Issues

How should the data be stored?

- In a relational database, as tables
  - Querying requires many joins, which is costly
- As triples in a native graph structure
  - Querying requires graph traversals, which need efficient algorithms
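
To make the trade-off concrete, here is a minimal sketch assuming a single hypothetical `triples(subj, pred, obj)` table in SQLite (not Sesame's actual schema). Following a two-step path in the table needs one self-join per extra step, while an in-memory adjacency index answers the same path by direct lookups. The sample data and function names are illustrative only.

```python
import sqlite3

# Hypothetical sample data; resource names are shortened for readability.
TRIPLES = [
    ("picasso", "paints", "guernica"),
    ("guernica", "technique", "oil on canvas"),
    ("rembrandt", "paints", "night_watch"),
    ("night_watch", "technique", "oil on canvas"),
]

def relational_path(conn, p1, p2):
    """Follow p1 then p2 with a self-join on the triples table."""
    sql = """
        SELECT t1.subj, t1.obj, t2.obj
        FROM triples AS t1
        JOIN triples AS t2 ON t1.obj = t2.subj  -- one join per path step
        WHERE t1.pred = ? AND t2.pred = ?
        ORDER BY t1.subj
    """
    return conn.execute(sql, (p1, p2)).fetchall()

def graph_path(adjacency, p1, p2):
    """Follow p1 then p2 by traversing an in-memory adjacency index."""
    results = []
    for subj in sorted(adjacency):
        for mid in adjacency[subj].get(p1, []):
            for end in adjacency.get(mid, {}).get(p2, []):
                results.append((subj, mid, end))
    return results

# Relational storage: every statement is a row in one table.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE triples (subj TEXT, pred TEXT, obj TEXT)")
conn.executemany("INSERT INTO triples VALUES (?, ?, ?)", TRIPLES)

# Graph storage: subject -> property -> list of objects.
adjacency = {}
for s, p, o in TRIPLES:
    adjacency.setdefault(s, {}).setdefault(p, []).append(o)
```

Both `relational_path(conn, "paints", "technique")` and `graph_path(adjacency, "paints", "technique")` return the same two (painter, painting, technique) rows; a three-step path would add a second self-join on the relational side but only one more dictionary lookup on the graph side.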

Sesame - Introduction

Open source RDF Schema-based repository and querying facility

Developed as a research prototype by Aidministrator Nederland bv

NLnet Foundation sponsors its further development as open source software

Sesame - Introduction

Can read RDF data in the XML-serialized RDF and N-Triples formats

Can export the contents of a Sesame repository in the XML-serialized RDF, N-Triples, and N3 formats
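
N-Triples is the simplest of these formats: one statement per line, terminated by a period. The sketch below handles only a small assumed subset (absolute URIs and plain literals without escapes, language tags, datatypes, or blank nodes), so it illustrates the line structure rather than implementing the full format.

```python
import re

# Assumed subset: <subject> <predicate> (<object> | "literal") .
LINE = re.compile(r'^<([^>]*)>\s+<([^>]*)>\s+(<[^>]*>|"[^"]*")\s*\.\s*$')

def parse_line(line):
    """Turn one N-Triples line into an (S, P, O) tuple."""
    m = LINE.match(line.strip())
    if m is None:
        raise ValueError(f"unsupported or malformed line: {line!r}")
    subj, pred, obj = m.groups()
    # The object keeps its <...> or "..." delimiters so URIs and literals stay distinct.
    return subj, pred, obj

def serialize(subj, pred, obj):
    """Write an (S, P, O) tuple back out as one N-Triples line."""
    return f"<{subj}> <{pred}> {obj} ."
```

For example, `parse_line('<http://example.org/picasso> <http://example.org/fname> "Pablo" .')` yields the subject URI, the predicate URI, and the quoted literal, and `serialize` turns that tuple back into the same line.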

Sesame – Architecture

Repository

Many storage options, thanks to the Repository Abstraction Layer (RAL):

- DBMSs (relational, object-relational, etc.)
- Existing RDF stores
- RDF files
- RDF network services

Repository Abstraction Layer (RAL)

- An interface that translates RDF-specific method calls into calls to a specific DBMS
- Defined by an RDF API
  - The developers created their own set of interfaces rather than adopting or extending the existing RDF API proposal
  - The existing API targeted a main-memory model
  - Theirs offers specific operations that support RDF Schema semantics (e.g., subsumption reasoning)
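
The idea of a storage-independent interface with schema-aware operations can be sketched as follows. This is a hypothetical Python interface (`StatementStore`, `get_statements`, and `is_subclass_of` are illustrative names, not Sesame's actual API): clients program against the abstract class, and a DBMS-backed subclass could replace the in-memory one without changing them.

```python
from abc import ABC, abstractmethod

RDFS_SUBCLASS = "rdfs:subClassOf"

class StatementStore(ABC):
    """Hypothetical RAL-style interface; not Sesame's real API."""

    @abstractmethod
    def add_statement(self, subj, pred, obj):
        """Store one (S, P, O) statement."""

    @abstractmethod
    def get_statements(self, subj=None, pred=None, obj=None):
        """Return statements matching the pattern; None acts as a wildcard."""

    def is_subclass_of(self, sub, sup):
        """Schema-aware operation: follow rdfs:subClassOf transitively."""
        seen, stack = set(), [sub]
        while stack:
            cls = stack.pop()
            if cls == sup:
                return True
            if cls in seen:
                continue  # guard against cycles in the class hierarchy
            seen.add(cls)
            stack.extend(o for _, _, o in self.get_statements(cls, RDFS_SUBCLASS, None))
        return False

class InMemoryStore(StatementStore):
    """One possible backend; a DBMS-backed class could implement the same methods."""

    def __init__(self):
        self._statements = set()

    def add_statement(self, subj, pred, obj):
        self._statements.add((subj, pred, obj))

    def get_statements(self, subj=None, pred=None, obj=None):
        return [
            (s, p, o) for (s, p, o) in self._statements
            if (subj is None or s == subj)
            and (pred is None or p == pred)
            and (obj is None or o == obj)
        ]
```

Because `is_subclass_of` is written only in terms of `get_statements`, every backend inherits the subsumption check for free; a real DBMS backend might override it with a more efficient query.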

RAL Continued

- Several of Sesame’s functional modules are clients of the RAL
- Problem: data must be read from the repository, which decreases performance
- Solution: selectively cache data in memory
  - For small repositories, all data can be cached
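
Selective caching can be sketched as a wrapper that sits in front of a slower store and keeps recent query results in memory. This is an illustrative design, not Sesame's: `ListBackend` stands in for a disk- or DBMS-backed store, eviction is least-recently-used, and any write simply clears the cache so reads never return stale results.

```python
from collections import OrderedDict

class ListBackend:
    """Hypothetical stand-in for a slow, DBMS-backed store."""

    def __init__(self):
        self.rows = []

    def add_statement(self, subj, pred, obj):
        self.rows.append((subj, pred, obj))

    def get_statements(self, subj=None, pred=None, obj=None):
        return [r for r in self.rows
                if (subj is None or r[0] == subj)
                and (pred is None or r[1] == pred)
                and (obj is None or r[2] == obj)]

class CachingStore:
    """Caches read results in memory, evicting the least recently used."""

    def __init__(self, backend, max_entries=1024):
        self._backend = backend
        self._cache = OrderedDict()
        self._max_entries = max_entries
        self.backend_reads = 0  # reads that actually reached the backend

    def get_statements(self, subj=None, pred=None, obj=None):
        key = (subj, pred, obj)
        if key in self._cache:
            self._cache.move_to_end(key)  # mark as recently used
            return self._cache[key]
        self.backend_reads += 1
        result = tuple(self._backend.get_statements(subj, pred, obj))
        self._cache[key] = result
        if len(self._cache) > self._max_entries:
            self._cache.popitem(last=False)  # evict the oldest entry
        return result

    def add_statement(self, subj, pred, obj):
        self._backend.add_statement(subj, pred, obj)
        self._cache.clear()  # invalidate everything on write
```

For a small repository, setting `max_entries` high enough means every query pattern stays cached after its first read, which approximates the "cache all data" case.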

Functional Modules

These modules interact with the RAL:

- RQL query module
  - Evaluates RQL queries
- RDF administration module
  - Allows uploading RDF data and schema information, as well as deleting information
- RDF export module
  - Allows extraction of schema and/or data from the repository

RQL Query Module

- Proposed RQL:
  - Developed within the European IST project C-Web
  - Follow-up project by ICS at FORTH, in Greece
  - Adopts the syntax of OQL
- Sesame’s implementation of RQL is slightly different from the proposed RQL:
  - Better compliance with W3C specifications
  - Support for optional domain and range restrictions
- Queries are translated into sets of calls to the RAL
- Note: Sesame also supports RDQL, which is based on SquishQL

RQL Query Module

Admin Module

- Main functions:
  - Add RDF data/schema information
  - Clear the repository
- Retrieves information from an RDF(S) source and parses it using the SiRPAC RDF parser
- The parser delivers the information to the admin module in statement form: (S, P, O)
- The module checks the statements for consistency and then inserts the data
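
The slides do not say which consistency checks Sesame performs, so the sketch below assumes two simple ones for illustration: a literal may not appear as subject or predicate (an RDF model rule), and duplicate statements are skipped. Statements arrive as (S, P, O) tuples, with literals marked by a leading double quote; `upload` and `is_literal` are hypothetical names.

```python
def is_literal(term):
    """Assumed convention: literals are written with a leading double quote."""
    return term.startswith('"')

def upload(statements, store):
    """Check each (S, P, O) statement, then insert the valid ones into store.

    Returns the rejected statements, each paired with a reason.
    """
    rejected = []
    for stmt in statements:
        subj, pred, obj = stmt
        if is_literal(subj) or is_literal(pred):
            rejected.append((stmt, "literal in subject or predicate position"))
        elif stmt in store:
            rejected.append((stmt, "duplicate statement"))
        else:
            store.add(stmt)
    return rejected
```

Here `store` is just a Python `set`; in Sesame the insert would go through the RAL instead.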

RDF Export Module

Exports the contents of a repository formatted in XML-serialized RDF

Supplies a basis for using Sesame in combination with other RDF tools

Communication with Sesame

- Multiple protocol options for various contexts: HTTP, RMI, SOAP
- These protocol handlers act as intermediaries between the functional modules and their clients

Sesame – Architecture

Sesame - Scalability

Performance tests:

- Uploaded and queried the collection of nouns from WordNet (400,000 RDF statements)
- Performed on a Sun UltraSPARC 5 with 256 MB RAM
- Used Java Servlets running on a web server to communicate over HTTP
- PostgreSQL version 7.1.2 repository

Scalability Continued

- Uploading the nouns took 94 minutes (about 71 statements per second)
- Querying was much slower than expected
  - Due to storage distributed over multiple tables
  - Retrieving data required many joins
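
The upload figures are consistent with each other: 94 minutes is 5,640 seconds, so

$$\frac{400{,}000\ \text{statements}}{94 \times 60\ \text{s}} = \frac{400{,}000\ \text{statements}}{5{,}640\ \text{s}} \approx 70.9\ \text{statements/s},$$

which rounds to the quoted 71 statements per second.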

Sesame’s Future

- Migration of Sesame to alternative repositories to boost performance
- DAML+OIL support

RQL Introduction

Museum schema example

RQL - Syntax

A query is typically built from three clauses:

- select: projection over the query results
- from: binds variables to specific locations in the graph model
- where (optional): constraints on the values of the variables in the from clause

RQL - Example

select X, @P

from {X} @P {Y}

where Y like "Pablo"

- X and Y are bound to nodes
- @P is bound to a connecting edge
  - The @ prefix signifies that the variable is bound to properties
  - The $ prefix signifies classes

http://sesame.aidministrator.nl/sesame/actionFrameset.jsp?repository=museum
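
The three clauses of the example query map onto a loop, a filter, and a projection. The sketch below evaluates `select X, @P from {X} @P {Y} where Y like "Pablo"` over a small hypothetical triple list; it assumes `like` behaves as a case-sensitive glob match with `*` wildcards (approximated with Python's `fnmatch`, which also treats `?` and `[...]` specially), so with the pattern "Pablo" it is an exact string match.

```python
from fnmatch import fnmatchcase

# Hypothetical museum-style data as (subject, property, object) triples.
TRIPLES = [
    ("cult:picasso", "cult:fname", "Pablo"),
    ("cult:picasso", "cult:lname", "Picasso"),
    ("cult:picasso", "cult:paints", "cult:guernica"),
    ("cult:rembrandt", "cult:fname", "Rembrandt"),
]

def select_like(triples, pattern):
    """Evaluate: select X, @P from {X} @P {Y} where Y like pattern."""
    return [
        (x, p)                        # select: project X and @P
        for (x, p, y) in triples      # from: bind X, @P, Y to every edge
        if fnmatchcase(y, pattern)    # where: string match on Y (assumed semantics)
    ]
```

`select_like(TRIPLES, "Pablo")` returns only `("cult:picasso", "cult:fname")`, which tells us through which property the value was reached.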

RQL - Namespaces

- In RDF, nodes and edges are identified by URIs
  - These can be very long
- Namespace abbreviation mechanism:
  - Add an extra "using namespace" clause, e.g. cult = http://www.icom.com/schema.rdf#
  - Then simply type cult:paints
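
The abbreviation is a plain string substitution: the prefix before the colon is looked up and replaced by its namespace URI. A minimal sketch, where `expand` and the `NAMESPACES` table are hypothetical helpers rather than part of RQL:

```python
# Mirrors the clause: using namespace cult = http://www.icom.com/schema.rdf#
NAMESPACES = {"cult": "http://www.icom.com/schema.rdf#"}

def expand(qname, namespaces):
    """Expand a prefix:local abbreviation into a full URI."""
    prefix, sep, local = qname.partition(":")
    # A full URI such as "http://..." has prefix "http", which is not declared,
    # so it is rejected here rather than silently mangled.
    if not sep or prefix not in namespaces:
        raise KeyError(f"unknown namespace prefix in {qname!r}")
    return namespaces[prefix] + local
```

`expand("cult:paints", NAMESPACES)` gives `http://www.icom.com/schema.rdf#paints`.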

RQL – Path Expressions

Specify a linear path through the graph

select PAINTER, PAINTING, TECH

from {PAINTER} cult:paints {PAINTING}. cult:technique {TECH}

using namespace cult = http://www.icom.com/schema.rdf#

http://sesame.aidministrator.nl/sesame/actionFrameset.jsp?repository=museum
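
In the path expression, the "." continues the path from the previous node: PAINTING is both the object of cult:paints and the subject of cult:technique. That is an equi-join on the shared node, shown here as a naive nested loop over hypothetical data (the `ex:` resources are made up; a real engine would use indexes instead of scanning).

```python
# Hypothetical data; cult:paints and cult:technique follow the slide's query.
TRIPLES = [
    ("ex:picasso", "cult:paints", "ex:guernica"),
    ("ex:guernica", "cult:technique", "oil on canvas"),
    ("ex:picasso", "cult:paints", "ex:sketch1"),  # no technique recorded
]

def path_query(triples, first, second):
    """Evaluate {A} first {B}. second {C} and return (A, B, C) rows."""
    results = []
    for painter, p1, painting in triples:
        if p1 != first:
            continue
        # The shared node: PAINTING must be the subject of the next edge.
        for subj, p2, tech in triples:
            if p2 == second and subj == painting:
                results.append((painter, painting, tech))
    return results
```

`path_query(TRIPLES, "cult:paints", "cult:technique")` returns only the Guernica row: a painting with no cult:technique edge drops out, because every step of the path must match.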

RQL – Querying Schema

Retrieving the class of a resource:

select X, $X, Y

from {X : $X} cult:paints {Y}

using namespace cult = http://www.icom.com/schema.rdf#

Variable $X is matched to the class of the resource value of X

http://sesame.aidministrator.nl/sesame/actionFrameset.jsp?repository=museum
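
Binding $X can be pictured as one extra join against rdf:type statements. The sketch below makes a simplifying assumption: it only looks at explicitly asserted rdf:type triples and does not infer membership in superclasses, which a schema-aware engine such as Sesame may also report. The data and function name are hypothetical.

```python
# Hypothetical data: one type statement per resource.
TRIPLES = [
    ("ex:picasso", "rdf:type", "cult:Cubist"),
    ("ex:picasso", "cult:paints", "ex:guernica"),
    ("ex:guernica", "rdf:type", "cult:Painting"),
]

def class_of_painters(triples):
    """Evaluate {X : $X} cult:paints {Y} and return (X, $X, Y) rows."""
    results = []
    for x, p, y in triples:
        if p != "cult:paints":
            continue
        # $X ranges over the asserted classes of X (no subclass inference here).
        for s, tp, cls in triples:
            if s == x and tp == "rdf:type":
                results.append((x, cls, y))
    return results
```

`class_of_painters(TRIPLES)` returns `("ex:picasso", "cult:Cubist", "ex:guernica")`; the rdf:type statement about the painting itself plays no role, because only X is typed in the query.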

RQL – Querying Schema

Constraining resources to a schema class:

select X, Y

from {X : cult:Cubist} cult:paints {Y}

using namespace cult = http://www.icom.com/schema.rdf#

RQL – Standard Functions

- Class (also Property)
- subClassOf (also subPropertyOf)
- typeOf
- In all of the above, use ^ to restrict results to direct descendants (e.g. subClassOf^(cult:Painter))
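
The difference between subClassOf and subClassOf^ is the difference between the transitive closure of the subclass relation and its direct edges only. A sketch over a hypothetical class hierarchy (the class names beyond cult:Painter and cult:Cubist are made up, and this sketch does not count a class as its own subclass):

```python
# Hypothetical (subclass, superclass) edges.
SUBCLASS_EDGES = [
    ("cult:Cubist", "cult:Painter"),
    ("cult:Impressionist", "cult:Painter"),
    ("cult:Painter", "cult:Artist"),
    ("cult:Sculptor", "cult:Artist"),
]

def direct_subclasses(edges, cls):
    """Like subClassOf^(cls): immediate children only."""
    return {sub for sub, sup in edges if sup == cls}

def all_subclasses(edges, cls):
    """Like subClassOf(cls): children, grandchildren, and so on."""
    result, frontier = set(), [cls]
    while frontier:
        current = frontier.pop()
        for child in direct_subclasses(edges, current):
            if child not in result:  # also protects against cycles
                result.add(child)
                frontier.append(child)
    return result
```

For cult:Artist, the direct version yields {cult:Painter, cult:Sculptor}, while the transitive version adds cult:Cubist and cult:Impressionist.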

RQL – subClassOf

Example:

select X, @P, Y

from {X} @P {Y}

where X in subClassOf^( cult:Painter )

using namespace cult = http://www.icom.com/schema.rdf#

RQL – Advanced Queries

- Set operators: union, intersection, difference
- Logical operators
- Domain and range constraints
- Comprehensive list: http://sesame.aidministrator.nl/publications/rql-tutorial.html
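
Set operators combine the result sets of two subqueries. The sketch below only illustrates the semantics with Python sets, not RQL syntax; the data and the cult:sculpts property are hypothetical.

```python
# Hypothetical data.
TRIPLES = [
    ("ex:picasso", "cult:paints", "ex:guernica"),
    ("ex:picasso", "cult:sculpts", "ex:goat"),
    ("ex:rembrandt", "cult:paints", "ex:night_watch"),
    ("ex:rodin", "cult:sculpts", "ex:thinker"),
]

def subjects_with(triples, prop):
    """Subquery: all resources that have at least one edge labeled prop."""
    return {s for s, p, _ in triples if p == prop}

painted = subjects_with(TRIPLES, "cult:paints")
sculpted = subjects_with(TRIPLES, "cult:sculpts")

either = painted | sculpted        # union
both = painted & sculpted          # intersection
only_painted = painted - sculpted  # difference
```

Here `both` is {ex:picasso}, `only_painted` is {ex:rembrandt}, and `either` contains all three artists.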

Future of RDF Databases

- A standard query language
- Improved storage structures
  - Native graph model

References / Links

Sesame: http://sesame.aidministrator.nl/

NLnet Foundation: http://www.nlnet.nl/

Original Specifications of RQL: http://139.91.183.30:9090/RDF/RQL