RDF Databases

By: Chris Halaschek

Outline

- Motivation / Requirements
- Storage Issues
- Sesame
  - General Introduction
  - Architecture
  - Scalability
- RQL
  - Introduction
  - Demo
- Future Directions

Motivation

- Having metadata available is not enough
- Need tools to process, transform, and reason with the information
- Need a way to store the metadata and interact with it

Requirements

- Scalable
- Good performance
- Useful query language

Storage Issues

- How to store the data?
  - In relational database as tables
    - Querying requires many joins…costly
  - Triples
  - Native graph structure
    - Querying requires graph traversals…need efficient algorithms
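To make the join cost concrete, here is a minimal sketch that stores triples in a single relational table and follows a two-step path through the graph. The table layout, the sample data, and the use of SQLite are assumptions for illustration only; the slides do not say how any particular store lays out its tables.

```python
import sqlite3

# Hypothetical layout: one row per (subject, predicate, object) triple.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE triples (s TEXT, p TEXT, o TEXT)")
conn.executemany(
    "INSERT INTO triples VALUES (?, ?, ?)",
    [
        ("picasso", "paints", "guernica"),
        ("guernica", "technique", "oil on canvas"),
        ("rodin", "sculpts", "thinker"),
    ],
)

# A two-step path (painter -> painting -> technique) needs one self-join.
# Each further step in the path adds another join on the same table.
rows = conn.execute(
    """
    SELECT t1.s, t1.o, t2.o
    FROM triples AS t1
    JOIN triples AS t2 ON t2.s = t1.o
    WHERE t1.p = 'paints' AND t2.p = 'technique'
    """
).fetchall()
print(rows)
```

A native graph structure would answer the same question by walking edges from each painter node instead of joining the table against itself.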

Sesame - Introduction

Open source RDF Schema-based repository and querying facility

Developed as a research prototype by Aidministrator Nederland bv

NLnet Foundation sponsors its further development as open source software

Sesame - Introduction

Can handle RDF data in XML-serialized RDF and N-Triples format

Can extract the contents of a Sesame repository in XML-serialized RDF, N-Triples, and N3 format

Sesame – Architecture

Repository

- Many options due to Repository Abstraction Layer (RAL)
  - DBMS – relational, object-relational, etc.
  - Existing RDF stores
  - RDF files
  - RDF network services

Repository Abstraction Layer (RAL)

- Interface that translates RDF-specific methods to a specific DBMS
- Defined by an RDF API
  - Created their own set of interfaces rather than adopt or extend the existing RDF API proposal
  - Existing API targeted main memory model
  - Theirs offers specific operations that support RDF Schema semantics (i.e. subsumption reasoning)
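The idea of an abstraction layer with schema-aware operations can be sketched roughly as follows. All class and method names here (RepositoryLayer, InMemoryRepository, add_statement, get_statements, get_subclasses) are hypothetical and are not Sesame's actual interfaces; the in-memory backend only stands in for a DBMS-specific implementation.

```python
from abc import ABC, abstractmethod
from typing import Iterator, Optional, Set, Tuple

Statement = Tuple[str, str, str]
SUBCLASS_OF = "rdfs:subClassOf"


class RepositoryLayer(ABC):
    """Hypothetical storage-neutral interface; not Sesame's real API."""

    @abstractmethod
    def add_statement(self, s: str, p: str, o: str) -> None:
        ...

    @abstractmethod
    def get_statements(
        self,
        s: Optional[str] = None,
        p: Optional[str] = None,
        o: Optional[str] = None,
    ) -> Iterator[Statement]:
        ...

    def get_subclasses(self, cls: str) -> Set[str]:
        """Schema-aware operation: direct and indirect subclasses of cls."""
        found: Set[str] = set()
        frontier = [cls]
        while frontier:
            current = frontier.pop()
            for sub, _, _ in self.get_statements(p=SUBCLASS_OF, o=current):
                if sub not in found:
                    found.add(sub)
                    frontier.append(sub)
        return found


class InMemoryRepository(RepositoryLayer):
    """Stand-in backend; a DBMS backend would implement the same two methods."""

    def __init__(self) -> None:
        self._statements: Set[Statement] = set()

    def add_statement(self, s: str, p: str, o: str) -> None:
        self._statements.add((s, p, o))

    def get_statements(self, s=None, p=None, o=None) -> Iterator[Statement]:
        # None acts as a wildcard for that position of the statement.
        for st in self._statements:
            if (
                (s is None or st[0] == s)
                and (p is None or st[1] == p)
                and (o is None or st[2] == o)
            ):
                yield st


repo = InMemoryRepository()
repo.add_statement("Cubist", SUBCLASS_OF, "Painter")
repo.add_statement("Painter", SUBCLASS_OF, "Artist")
print(sorted(repo.get_subclasses("Artist")))
```

Clients written against the abstract class do not need to change when the backend is swapped, which is the point of the layer.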

RAL Continued

- Several of Sesame’s functional modules are clients of the RAL
- Problem: must read from repository – performance decrease
- Solution – selectively caching data in memory
  - For small repositories, all data can be cached
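One simple way to picture the caching idea is a read-through cache that keeps only the data that has actually been requested. The sketch below is an assumption about how such a cache could look, not a description of Sesame's caching code; CachingReader, slow_fetch, and STORE are invented names, and cache invalidation on updates is deliberately left out.

```python
from typing import Callable, Dict, List, Tuple

Statement = Tuple[str, str, str]


class CachingReader:
    """Hypothetical read-through cache in front of a slow statement lookup."""

    def __init__(self, fetch: Callable[[str], List[Statement]]) -> None:
        self._fetch = fetch
        self._cache: Dict[str, List[Statement]] = {}
        self.backend_reads = 0  # counts how often the repository is hit

    def statements_about(self, subject: str) -> List[Statement]:
        # Only subjects that are actually requested end up in memory.
        if subject not in self._cache:
            self.backend_reads += 1
            self._cache[subject] = self._fetch(subject)
        return self._cache[subject]


# Stand-in for the repository behind the abstraction layer.
STORE: List[Statement] = [
    ("picasso", "paints", "guernica"),
    ("rodin", "sculpts", "thinker"),
]


def slow_fetch(subject: str) -> List[Statement]:
    return [st for st in STORE if st[0] == subject]


reader = CachingReader(slow_fetch)
first = reader.statements_about("picasso")
second = reader.statements_about("picasso")  # served from memory
print(reader.backend_reads)
```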

Functional Modules

- Interact with RAL
- RQL query module
  - Evaluates RQL queries
- RDF administration module
  - Allows uploading RDF data and schema information, as well as deleting information
- RDF export module
  - Allows extraction of schema and/or data from repository

RQL Query Module

- Proposed RQL:
  - Developed within the European IST project C-Web
  - Follow-up project by ICS at FORTH, in Greece
  - Adopts the syntax of OQL
- Sesame’s implementation of RQL is slightly different from the proposed RQL
  - Better compliance with W3C specifications
  - Support for optional domain and range restrictions
- Queries are translated into sets of calls to the RAL
- Note: Also supports RDQL – based on SquishQL

Admin Module

- Main functions:
  - Add RDF data/schema information
  - Clear repository
- Retrieves information from an RDF(S) source and parses it using the SiRPAC RDF parser
- Parser delivers information to admin module in statement form – (S,P,O)
- Module checks statements for consistency and then inserts data

RDF Export Module

Exports the contents of a repository formatted in XML-serialized RDF

Supplies a basis for using Sesame in combination with other RDF tools

Communication with Sesame

- Multiple options for various contexts
  - HTTP
  - RMI
  - SOAP

Intermediaries between the functional modules and their clients

Sesame - Scalability

- Performance tests
  - Uploaded and queried collection of nouns from WordNet – 400,000 RDF statements
  - Performed on Sun UltraSPARC 5, 256 MB RAM
  - Used Java Servlets running on web server to communicate over HTTP
  - PostgreSQL version 7.1.2 repository

Scalability Continued

- Uploading nouns
  - 94 minutes
  - 71 statements per second
- Querying was much slower than expected
  - Due to distributed storage over multiple tables
  - Retrieving data required doing many joins
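The upload rate follows directly from the two figures reported for the test (400,000 statements in 94 minutes). A quick check, with variable names chosen here for illustration:

```python
statements = 400_000        # RDF statements in the WordNet noun collection
upload_seconds = 94 * 60    # 94 minutes expressed in seconds

rate = statements / upload_seconds
print(f"{rate:.1f} statements per second")  # prints "70.9 statements per second"
```

Rounded to the nearest whole number this gives the 71 statements per second quoted on the slide.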

Sesame’s Future

Migration of Sesame to alternate repositories to boost performance

DAML+OIL support

RQL Introduction

Museum schema example

RQL - Syntax

- Query typically built upon three clauses
  - Select
    - Projection over query results
  - From
    - Bind variables to specific locations in graph model
  - Where
    - Optional – constraint on values of variables in the from clause

RQL - Example

select X, @P

from {X} @P {Y}

where Y like "Pablo"

- X and Y are bound to nodes
- @P is bound to a connecting edge
  - @ prefix signifies the variable is bound to properties
  - $ prefix signifies classes

http://sesame.aidministrator.nl/sesame/actionFrameset.jsp?repository=museum
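What the example query asks for can be mimicked over a small in-memory list of triples. This is only a sketch of the query's meaning, not of how Sesame evaluates RQL: the sample triples are invented, and treating `like` as wildcard matching with `*` (so that "Pablo" without wildcards matches only that exact value) is an assumption.

```python
from fnmatch import fnmatchcase

# Invented sample data: (node X, edge P, node Y).
triples = [
    ("picasso", "first_name", "Pablo"),
    ("picasso", "last_name", "Picasso"),
    ("picasso", "paints", "guernica"),
]


def like(value: str, pattern: str) -> bool:
    # Assumed semantics: case-sensitive match where * is a wildcard.
    return fnmatchcase(value, pattern)


# from {X} @P {Y}      -> iterate over every edge in the graph
# where Y like "Pablo" -> keep edges whose target value matches
# select X, @P         -> project the source node and the edge
result = [(x, p) for (x, p, y) in triples if like(y, "Pablo")]
print(result)
```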

RQL - Namespaces

- In RDF, nodes and edges are identified by URIs
  - Can be very long
- Namespace abbreviation mechanism
  - Extra clause: using namespace cult = http://www.icom.com/schema.rdf#
  - Simply type: cult:paints
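The abbreviation mechanism amounts to replacing a declared prefix with its full URI. A minimal sketch, where the `expand` helper and the `NAMESPACES` table are hypothetical names and only the `cult` mapping comes from the slide:

```python
# Prefix table as it would be declared by the "using namespace" clause.
NAMESPACES = {"cult": "http://www.icom.com/schema.rdf#"}


def expand(qname: str, namespaces=NAMESPACES) -> str:
    """Expand prefix:local into a full URI; leave other strings unchanged."""
    prefix, sep, local = qname.partition(":")
    if sep and prefix in namespaces:
        return namespaces[prefix] + local
    return qname


print(expand("cult:paints"))  # prints "http://www.icom.com/schema.rdf#paints"
```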

RQL – Path Expressions

Specify a linear path through the graph

select PAINTER, PAINTING, TECH

from {PAINTER} cult:paints {PAINTING}. cult:technique {TECH}

using namespace cult = http://www.icom.com/schema.rdf#

http://sesame.aidministrator.nl/sesame/actionFrameset.jsp?repository=museum
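The path expression above chains two edges: the object of cult:paints becomes the subject of cult:technique. The following sketch shows that chaining over invented sample triples; the `path` helper is a hypothetical name and says nothing about how Sesame evaluates such queries.

```python
triples = [
    ("picasso", "cult:paints", "guernica"),
    ("guernica", "cult:technique", "oil on canvas"),
    ("rembrandt", "cult:paints", "nightwatch"),  # no technique recorded
]


def path(triples, first, second):
    """Return (start, middle, end) for every start -first-> middle -second-> end."""
    # Index the second edge by its subject so each middle node is a lookup.
    by_subject = {}
    for s, p, o in triples:
        if p == second:
            by_subject.setdefault(s, []).append(o)
    return [
        (s, o, end)
        for s, p, o in triples
        if p == first
        for end in by_subject.get(o, [])
    ]


rows = path(triples, "cult:paints", "cult:technique")
print(rows)
```

A painting without a technique edge drops out of the result, since the path must match in full.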

RQL – Querying Schema

Retrieving the class of a resource

select X, $X, Y

from {X : $X} cult:paints {Y}

using namespace cult = http://www.icom.com/schema.rdf#

Variable $X is matched to the class of the resource value of X

http://sesame.aidministrator.nl/sesame/actionFrameset.jsp?repository=museum

RQL – Querying Schema

Constraining resources to a schema

select X, Y

from {X : cult:Cubist } cult:paints {Y}

using namespace cult = http://www.icom.com/schema.rdf#

RQL – Standard Functions

- Class (also Property)
- subClassOf (also subPropertyOf)
- typeOf
- In all of the above, use ^ for only direct descendants (e.g. subClassOf^( cult:Painter ) )

RQL – subClassOf

Example:

select X, @P, Y

from {X} @P {Y}

where X in subClassOf^( cult:Painter )

using namespace cult = http://www.icom.com/schema.rdf#
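The difference between subClassOf and subClassOf^ is the difference between the full set of descendants and the direct ones only. A small sketch over an invented class hierarchy: cult:Painter and cult:Cubist appear in the slides, while the other class names and both helper functions are made up for illustration.

```python
# (subclass, superclass) pairs; hypothetical hierarchy.
SUBCLASS_EDGES = [
    ("cult:Painter", "cult:Artist"),
    ("cult:Cubist", "cult:Painter"),
    ("cult:Flemish", "cult:Painter"),
    ("cult:AnalyticCubist", "cult:Cubist"),
]


def direct_subclasses(cls):
    """Analogue of subClassOf^: only classes declared directly below cls."""
    return {child for child, parent in SUBCLASS_EDGES if parent == cls}


def all_subclasses(cls):
    """Analogue of subClassOf: direct and indirect descendants of cls."""
    found = set()
    stack = [cls]
    while stack:
        for child in direct_subclasses(stack.pop()):
            if child not in found:
                found.add(child)
                stack.append(child)
    return found


print(sorted(direct_subclasses("cult:Painter")))
print(sorted(all_subclasses("cult:Painter")))
```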

RQL – Advanced Queries

- Set Operators
  - Union, Intersection, Difference
- Logical Operators
- Domain and Range Constraints
- Comprehensive List: http://sesame.aidministrator.nl/publications/rql-tutorial.html

Future of RDF Databases

- Standard query language
- Improved storage structures
  - Native graph model

References / Links

Sesame: http://sesame.aidministrator.nl/

NLnet Foundation: http://www.nlnet.nl/

Original Specifications of RQL: http://139.91.183.30:9090/RDF/RQL
