33
RDF: 1999 RDFS: 2004 SKOS first version: 2008 SPARQL 1.0: 2008 SPARQL 1.1: 2013 N3 first version: 2008 DAML started: 2000 OWL: 2004 2009 OWL 2 2012 GeoSPARQL JSON-LD: 2014 A Little Bit History Data Engineerin Programming

RDF: 1999 RDFS: 2004 SKOS first version: 2008 SPARQL 1.0: 2008 SPARQL 1.1: 2013 N3 first version: 2008 DAML started: 2000 OWL: 2004 2009 OWL 2 2012 GeoSPARQL

Embed Size (px)

Citation preview

RDF: 1999RDFS: 2004

SKOS first version: 2008

SPARQL 1.0: 2008

SPARQL 1.1: 2013

N3 first version: 2008

DAML started: 2000

OWL: 20042009 OWL 2

2012 GeoSPARQL

JSON-LD: 2014

A Little Bit History

Data Engineering

Programming

Programming the Semantic Web

Linyun FuJune 3, 2014

Adapted from Steffen Staab’s keynote at ESWC 2014

This presentation contains program code, engineering, speculation and worse.It may be viewed as offensive by some viewers.

Semantic Web ProgrammingLinked Data

SPARQL endpoint

RDF file Hypothesis:Semantic Web data is wonderful, but programming with Semantic Web has changed too little since 2000 and still is a mess.

We promise flexibility, but code still hard to

maintain.

Ontology

Linked Data

SPARQL endpoint

RDF file

„Inside“Data Mgmt

„Outside“

Data Mgmt

Eclipse

Visual Studio

....

Ontology

These is a mismatch between data engineering and programming approaches

„Inside“Data Mgmt

„Outside“

Data Mgmt

Eclipse

Visual Studio

....

„Outside“ Understanding

by the developer „Search + Code“ „Browse + Code“

Triple-object mapping Code generation Process of federation

LD vs endpoint Models of federation

Abstraction layers that facilitate a developer‘s

life

searching, finding, reusing all the complex data strcutres that we have

Programming with Data: What does it cost?

Ctotal = t*Ctool + d*t*Clearn + s*Cdeu + s*Cmap + n*Ccode

Ctool : Costs for building t many tools, shared; almost free

Clearn: Costs for learning how to use technology per developer dCdeu: Costs for data engineering/understanding s sources

Cmap : Costs for mapping data structure for s sources to objects

Ccode : Actual costs for accessing/manipulating data n times

„Inside“Both „Outside“

SWOT of Semantic Web Programming

Ctotal = t*Ctool + d*t*Clearn + s*Cdeu + s*Cmap + n*Ccode

threat

opportunity

opportunity

weakness

Not a strength, yet!

strength/weakness

Strong in flexibilitySomewhat weak in

performance

„Outside“

as good as RelDB is not good enough!

Intermediate conclusionMinimize costs for setup:Clearn: Costs for learning how to use technology per developer dCdeu: Costs for data engineering/understanding s sources

Cmap : Costs for mapping data structure for s sources to objects

Minimize costs for core programming:Ccode : Actual costs for accessing/manipulating data n times

Costs for learning and understanding constitute a threat!

Need to be overcome!

We need flexible code to match flexible data structures

• i.e., Domain-specific languages for Semantic Web Programming

• XML programming example• Why Jena is not good enough

XML programming example: LINQ to XML

XElement contacts = new XElement("Contacts", new XElement("Contact", new XElement("Name", "Patrick Hines"), new XElement("Phone", "206-555-0144"), new XElement("Address", new XElement("Street1", "123 Main St"), new XElement("City", "Mercer Island"), new XElement("State", "WA"), new XElement("Postal", "68042") ) ) );

The Jena Approach

Task: List all records for each music artist

The Jamendo ontology

Accessing Artists Using Apache Jena

From artists to songs

Observations• SPARQL queries are strings• Results are strings• Requires good understanding of the data source

RDF Typing is lost

Related Work on RDF AccessStatic Typing Errors detected before

execution Misspelling discovered by

compiler! Anectode: 2nd place

because of misspelt var. Static types are form of

documentation Less knowledge about data

source required

• Better IDE integration / autocompletion

Code generation• Sommer• Winter• OntoMDEDynamic Typing E.g. ActiveRDF

(Oren et al 2007)) “convention over

configuration” dynamic

metaprogramming allows for slick code

Criticism

SEMANTIC WEB PROGRAMMING:

LITEQ – OUR OUTSIDE APPROACH

Programming against Jamendo

c1

Node Path Query Language Using Autocompletion

Exploration of classes

Exploration of relations

Querying for instances

Node Path Query Language Using Autocompletion

Exploration of classesExploration of relations

Node Path Query Language: Query Formulation

Exploration of classesExploration of relationsQuerying for instances

Type set of mo:MusicArtist

No definition or declaration needed

Node Path Query Language for Code Development

Exploration of classesExploration of relationsQuerying for instancesDeveloping code with queries

All translated into SPARQL queries at• Development time• Type inference at compile time

(but also as part of IDE)• Querying again at run time

One language to bind them all

Node Path Query Language for Code Development

Exploration of classesExploration of relationsQuerying for instancesDeveloping code with queriesDeveloping code with new classes

All translated into SPARQL queries at• Development time• Run time update• Persistence!

NPQL

NPQL (Node Path Query Language)• Intensional Queries

Describing RDF classes and properties for reuse in IDE and in host language metaprogramming

• Extensional Queries Class instances and property instances

• Compilation to SPARQL for reuse of existing endpoints

Ongoing discussion about details of NPQL

LITEQ

NPQL (Node Path Query Language)• Intensional Queries• Extensional Queries• Compilation to SPARQL

LITEQ (Language Integrated Types, Extensions and Queries) • Implementation of NPQL as F# Type Provider in Visual Studio• Autocompletion using NPQL queries• Automatic typing

of extensional query resultsby intensional queries

Cost savings

Ctotal = t*Ctool + d*t*Clearn + s*Cdeu + s*Cmap + n*Ccode

Ctool : open source

Clearn: not free – though autocompletion reduces cognitive load

Cdeu: not free – understanding the RDF schema from your IDE

Cmap : 0

Ccode : a lot less than for dotNet RDF (Apache Jena?!!)little bit more than for a fictitious perfect object model

Halstead metrics for different tasks:

Conventional Semantic Web programming approaches waste up to 50% of your efforts!

Speculation 1

Ctotal

n (as in n*Ccode )

SemWeb coding efforts

RelDB/XML coding efforts

Applies to small n

Diff

. cos

ts fo

r le

arni

ng

Speculation 1: Using ontologies and RDF schemata, we can develop more efficiently using

the right tools!

Halstead metrics for different tasks:

If someone gives me a perfect RDF-to-OO mapping for freethen I will not care about whether it is RDF or RelDB

underneath!

Speculation 2

Ctotal

n (as in n*Ccode )

SemWeb coding efforts

RelDB/XML coding efforts

„Perfect“ object model shields

developer from database

ideosyncracies

Diff

. cos

ts fo

r se

tup

Speculation 2: For large programmes, our tools need to offer better support to reduce setup

costs!

CONCLUSION

Semantic Web: Make Developers More Productive

Linked Data

SPARQL endpoint

RDF file

„Inside“Data Mgmt

„Outside“

Data Mgmt

Eclipse

Visual Studio

....

Ontology

What I Think

• New programming languages vs. code generation + existing PLs– Mismatches between RDF and OO: Oren et al 2007,

Saathoff et al 2009– Only some ontologies can be perfectly mapped to OO

classes– A killer PL to come

• “Data programmability”– Model– Format– …?

References• C. Saathoff, S. Scheglmann, S. Schenk. Winter: Mapping

RDF to POJOs revisited.• E. Oren, R. Delbru, S. Gerke, A. Haller, S. Decker: ActiveRDF:

object-oriented semantic web programming. WWW 2007: 817-824

• S. Scheglmann, A. Scherp, S. Staab. Declarative Representation of Programming Access to Ontologies. In: 9th Extended Semantic Web Conference (ESWC2012), Heraklion, Greece, May 27-31, 2012.

• W. Cook, A. Ibrahim. Integrating Programming Languages & Databases: What’s the Problem?? http://citeseerx.ist.psu.edu/viewdoc/download?doi=10.1.1.66.7169&rep=rep1&type=pdf