Accessing Existing Distributed Science Archives as RDF Models

Alasdair J G Gray1

Norman Gray2

Iadh Ounis1

1Computing Science, University of Glasgow2Physics and Astronomy, University of Leicester

Outline

• Motivating science problems• A data integration approach• RDF and SPARQL• Extracting scientific data• Performance results• Conclusions

Searching for Brown Dwarfs

• Data sets:– Near Infrared, 2MASS/UK Infrared Deep Sky

Survey– Optical, APMCAT/Sloan Digital Sky Survey

• Complex colour/motion selection criteria• Similar problems– Halo White Dwarfs

Deep Field Surveys

• Observations in multiple wavelengths– Radio to X-Ray

• Searching for new objects– Galaxies, stars, etc

• Requires correlations across many catalogues– ISO– Hubble– SCUBA– etc

The Problem

Locate and combine relevant data

• Heterogeneous publishers– Archive centres– Research labs

• Heterogeneous data– Relational– XML– Files

Virtual Observatory

Generic Data Integration Approach

• Heterogeneous sources– Autonomous – Local schemas

• Homogeneous view– Mediated global schema

• Mapping– LAV: local-as-view– GAV: global-as-view

Global Schema

Query1 Queryn

Wrapper1

Wrapperk

Wrapperi

Mappings

P2P Data Integration Approach

• Heterogeneous sources– Autonomous – Local schemas

• Heterogeneous views– Multiple schemas

• Mapping– Between pairs of schema– Network of links

• Require common integration data model

Schema1

Wrapper1

Wrapperk

Wrapperi

Schemaj

Mappings

Query1 Queryn

Resource Description Framework (RDF)

• W3C standard• Designed as a metadata

data model• Make statements about

resources• Contains semantic

details• Ideal for linking

distributed data

#foundIn

The Sun#name

#MilkyWay

Milky Way#name

The Galaxy#name

IAU:Starrdf:type

IAU:BarredSpiral

rdf:type

Reasoning

• Infers knowledge from RDF(S) statements– Sub classing

• OWL: extends what can be expressed– Inverse predicates– Equivalence– etc

IAU:Starrdf:type

The Sun#name

#MilkyWay

Milky Way#name

#foundInThe Galaxy#name

IAU:BarredSpiral

rdf:type

IAU:Galaxyrdfs:subClassOf

rdf:type

#contains

SPARQL

• Declarative query language– Select returned data• Graph or tuples• Attributes to return

– Describe structure of desired results– Filter data

• W3C standard• Syntactically similar to SQL

Find the name of the galaxy which contains a star with the name The Sun

SELECT ?galNameWHERE {?gal a IAU:Galaxy ; #name ?galName .?star a IAU:Star ; #name ?starName ; #foundIn ?gal .FILTER REGEX(?starName, “The Sun”)

Querying RDF with SPARQL

IAU:Starrdf:type

The Sun#name

#MilkyWay

Milky Way#name

IAU:BarredSpiral

rdf:type

Find the name of the galaxy which contains a star with the name The Sun

Querying RDF with SPARQL

IAU:Starrdf:type

The Sun#name

#MilkyWay

Milky Way#name

IAU:BarredSpiral

rdf:type

?galName

The Galaxy

Milky Way

Integrating Using RDF

• Data resources– Expose schema and data

as RDF– Need a SPARQL endpoint

• Allows multiple – Access models– Storage models

• Easy to relate data from multiple sources

Relational DB

RDF / Relational

Conversion

XML DB

RDF / XML Conversion

Common Model (RDF)

Mappings

SPARQLquery

Accessing Relational Sources as RDF

Data Dump• Data stored as RDF

– Original relational source is replicated

– Data can become stale• Native SPARQL query

support• Existing RDF stores

– Jena– Seasame

On-the-fly Translation• Data stored as relations• Native SQL support

– Highly optimised access methods

• SPARQL queries must be translated

• Existing translation systems– D2RQ/D2R Server– SquirrelRDF

System Hypothesis

It is viable to perform on-the-fly conversions from existing science archives to RDF to facilitate data access from a data model that a scientist is familiar with

Relational DB

RDF / Relational

Conversion

XML DB

RDF / XML Conversion

Common Model (RDF)

Mappings

SPARQLquery

Test Data

• SuperCOSMOS Science Archive (SSA)– Data extracted from scans of Schmidt plates– Stored in a relational database– About 4TB of data, detailing 6.4 billion objects– Fairly typical of astronomical data archives

• Schema designed using 20 real queries• Personal version contains – Data for a specific region of the sky– About 0.1% of the data, ~500MB

Analysis of Test Data

• About 500MB in size• Organised in 14 Relations– Number of attributes: 2 – 152• 4 relations with more than 20 attributes

– Number of rows: 3 – 585,560– Two views• Complex selection criteria in view

Real Science Queries

Query 5Find the positions and (B,R,I) magnitudes of all star-like objects within delta mag of 0.2 of the colours of a quasar of redshift 2.5 < z < 3.5

SELECT TOP 30 ra, dec, sCorMagB, sCorMagR2, sCorMagI

FROM ReliableStarsWHERE (sCorMagB-sCorMagR2 BETWEEN 0.05 AND 0.80) AND (sCorMagR2-sCorMagI BETWEEN -0.17 AND 0.64)

Analysis of Test QueriesQuery Feature Query Numbers

Arithmetic in body 1-5, 7, 9, 12, 13, 15-20

Arithmetic in head 7-9, 12, 13

Ordering 1-8, 10-17, 19, 20

Joins (including self-joins) 12-17, 19

Range functions (e.g. Between, ABS) 2, 3, 5, 8, 12, 13, 15, 17-20

Aggregate functions (including Group By) 7-9, 18

Math functions (e.g. power, log, root) 4, 9, 16

Trigonometry functions 8, 12

Negated sub-query 18, 20

Type casting (e.g. Radians to degrees) 7, 8, 12

Server functions 10, 11

Expressivity of SPARQL

Features• Select-project-join• Arithmetic in body• Conjunction and disjunction• Ordering• String matching• External function calls

(extension mechanism)

Limitations• Range shorthands• Arithmetic in head• Math functions• Trigonometry functions• Sub queries• Aggregate functions• Casting

Analysis of Test QueriesQuery Feature Query Numbers

Arithmetic in body 1-5, 7, 9, 12, 13, 15-20

Arithmetic in head 7-9, 12, 13

Ordering 1-8, 10-17, 19, 20

Joins (including self-joins) 12-17, 19

Range functions (e.g. Between, ABS) 2, 3, 5, 8, 12, 13, 15, 17-20

Aggregate functions (including Group By) 7-9, 18

Math functions (e.g. power, log, root) 4, 9, 16

Trigonometry functions 8, 12

Negated sub-query 18, 20

Type casting (e.g. radians to degrees) 7, 8, 12

Server functions 10, 11

Expressible queries: 1, 2, 3, 5, 6, 14, 15, 17, 19

Experimental Setup

• Machine– Intel Core 2 Duo 2.4GHz– 2GB RAM– Windows XP– Java 1.5

• Software– MySQL 5.0.51a– D2RQ 0.5.1– SquirrelRDF 0.1

• Only 4 queries completed within 2 hours

Performance Results

A New Approach

• Exploit query engines and data structure of underlying data sources

• Aid user query generation by explaining source data model in terms of known data model

• Data extracted in native model Relational

DB XML DB

SQL XQuery

ExplicatorMappingsModels

Explain

Conclusions

• SPARQL: Not expressive enough for science• Query Converters: Poor performance• Proposed new approach– RDF to understand data models– Native query engines for data extraction

RDF Relational

Ragged data Structured data

Small to medium data volumes

Large data volumes

Reasoning over the data Extracting specific data

Accessing Existing Distributed Science Archives as RDF Models

Documents

RDF Resource Description Framework - Informatica - EN...RDF Resource Description Framework Manuel Fiorelli fiorelli@info.uniroma2.it Important dates for RDF 1999 –RDF is adopted

Practical RDF Chapter 10. Querying RDF: RDF as Data

Assessing the performance of RDF Engines: Discussing RDF Benchmarks

Accessing Exisng Distributed Science Archives as RDF Modelsexplicator.dcs.gla.ac.uk/Publications/ahm2008Presentation.pdf · delta mag of 0.2 of the colours of a quasar of redshi 2.5

RDF: Alternative Fuel tailor-made from Waste · RDF: Alternative Fuel tailor-made from Waste •What is RDF? •Why to develop RDF-routes? •What is the potential of RDF for the

Of 33 lecture 3: xml and xml schema. of 33 XML, RDF, RDF Schema overview XML – simple introduction and XML Schema RDF – basics, language RDF Schema –

The Semantic Web — RDF, RDF Schema, and OWL (Part 1) · Mitchell W. Smith — The Semantic Web – RDF, RDF Schema and OWL (Part 1 of 2) Page 3 RDF Introduction "The Semantic Web

RDF - Resource Description Framework and RDF Schema

RDF VISUALIZER - Home | CIDOC CRMcidoc-crm.org/sites/default/files/RDF visualiser.pdf · RDF VISUALIZER 3/4/2017 – 38th CRM-SIG meeting ... Start from an RDF resource ... Provides

Metadata for RDF Statements: The RDF-star Approach

Rdf And Rdf Schema For Ontology Specification

ACCESSING THE HOLDINGS OF THE WORLD BANK GROUP ARCHIVESpubdocs.worldbank.org/en/971781439909784232/... · The World Bank Group Archives The World Bank Group (WBG) Archives stores

WS-DAI RDF(S) Specification Discussion · matchmaker requester SPARQL Access RDF(S) DataAccess Service SPARQL Access RDF(S) DataAccess Service SPARQL Access RDF(S) DataAccess Service

RDF-3X: a RISC-style Engine for RDF,

Accessing RDF(S) data resources in service-based Grid ...The DAIS RDF(S) activity has its origins in several presentations made at the 3rd GGF Semantic Grid Workshop and the DAIS for

Chapter 3 RDF Syntax. RDF Overview RDF Syntax -- the XML encoding RDF Syntax – variations including N3 RDF Schema (RDFS) Semantics of RDF and RDFS – Axiomatic

RDF-Anwendungen: Photo RDF

Why was Asiatic Society of Calcutta stopped from accessing the Russian archives?

RDF as a Universal Healthcare Exchange Languagedbooth.org/2014/rdf-as-univ/rdf-as-univ-slides.pdf · 10 Yosemite Manifesto on RDF as a Universal Healthcare Exchange Language 1. RDF

Web Semântica e Processamento de Linguagem … · node.js RDF-Ext is available on npm, to install it run: npm install rdf-ext In the code, import RDF-Ext: var rdf = require( 'rdf-ext');