Upload
philomena-jenkins
View
229
Download
0
Tags:
Embed Size (px)
Citation preview
RDF: 1999RDFS: 2004
SKOS first version: 2008
SPARQL 1.0: 2008
SPARQL 1.1: 2013
N3 first version: 2008
DAML started: 2000
OWL: 20042009 OWL 2
2012 GeoSPARQL
JSON-LD: 2014
A Little Bit History
Data Engineering
Programming
Programming the Semantic Web
Linyun FuJune 3, 2014
Adapted from Steffen Staab’s keynote at ESWC 2014
This presentation contains program code, engineering, speculation and worse.It may be viewed as offensive by some viewers.
Semantic Web ProgrammingLinked Data
SPARQL endpoint
RDF file Hypothesis:Semantic Web data is wonderful, but programming with Semantic Web has changed too little since 2000 and still is a mess.
We promise flexibility, but code still hard to
maintain.
Ontology
Linked Data
SPARQL endpoint
RDF file
„Inside“Data Mgmt
„Outside“
Data Mgmt
Eclipse
Visual Studio
....
Ontology
These is a mismatch between data engineering and programming approaches
„Inside“Data Mgmt
„Outside“
Data Mgmt
Eclipse
Visual Studio
....
„Outside“ Understanding
by the developer „Search + Code“ „Browse + Code“
Triple-object mapping Code generation Process of federation
LD vs endpoint Models of federation
Abstraction layers that facilitate a developer‘s
life
searching, finding, reusing all the complex data strcutres that we have
Programming with Data: What does it cost?
Ctotal = t*Ctool + d*t*Clearn + s*Cdeu + s*Cmap + n*Ccode
Ctool : Costs for building t many tools, shared; almost free
Clearn: Costs for learning how to use technology per developer dCdeu: Costs for data engineering/understanding s sources
Cmap : Costs for mapping data structure for s sources to objects
Ccode : Actual costs for accessing/manipulating data n times
„Inside“Both „Outside“
SWOT of Semantic Web Programming
Ctotal = t*Ctool + d*t*Clearn + s*Cdeu + s*Cmap + n*Ccode
threat
opportunity
opportunity
weakness
Not a strength, yet!
strength/weakness
Strong in flexibilitySomewhat weak in
performance
„Outside“
as good as RelDB is not good enough!
Intermediate conclusionMinimize costs for setup:Clearn: Costs for learning how to use technology per developer dCdeu: Costs for data engineering/understanding s sources
Cmap : Costs for mapping data structure for s sources to objects
Minimize costs for core programming:Ccode : Actual costs for accessing/manipulating data n times
Costs for learning and understanding constitute a threat!
Need to be overcome!
We need flexible code to match flexible data structures
• i.e., Domain-specific languages for Semantic Web Programming
• XML programming example• Why Jena is not good enough
XML programming example: LINQ to XML
XElement contacts = new XElement("Contacts", new XElement("Contact", new XElement("Name", "Patrick Hines"), new XElement("Phone", "206-555-0144"), new XElement("Address", new XElement("Street1", "123 Main St"), new XElement("City", "Mercer Island"), new XElement("State", "WA"), new XElement("Postal", "68042") ) ) );
From artists to songs
Observations• SPARQL queries are strings• Results are strings• Requires good understanding of the data source
RDF Typing is lost
Related Work on RDF AccessStatic Typing Errors detected before
execution Misspelling discovered by
compiler! Anectode: 2nd place
because of misspelt var. Static types are form of
documentation Less knowledge about data
source required
• Better IDE integration / autocompletion
Code generation• Sommer• Winter• OntoMDEDynamic Typing E.g. ActiveRDF
(Oren et al 2007)) “convention over
configuration” dynamic
metaprogramming allows for slick code
Criticism
Node Path Query Language Using Autocompletion
Exploration of classes
Exploration of relations
Querying for instances
Node Path Query Language: Query Formulation
Exploration of classesExploration of relationsQuerying for instances
Type set of mo:MusicArtist
No definition or declaration needed
Node Path Query Language for Code Development
Exploration of classesExploration of relationsQuerying for instancesDeveloping code with queries
All translated into SPARQL queries at• Development time• Type inference at compile time
(but also as part of IDE)• Querying again at run time
One language to bind them all
Node Path Query Language for Code Development
Exploration of classesExploration of relationsQuerying for instancesDeveloping code with queriesDeveloping code with new classes
All translated into SPARQL queries at• Development time• Run time update• Persistence!
NPQL
NPQL (Node Path Query Language)• Intensional Queries
Describing RDF classes and properties for reuse in IDE and in host language metaprogramming
• Extensional Queries Class instances and property instances
• Compilation to SPARQL for reuse of existing endpoints
Ongoing discussion about details of NPQL
LITEQ
NPQL (Node Path Query Language)• Intensional Queries• Extensional Queries• Compilation to SPARQL
LITEQ (Language Integrated Types, Extensions and Queries) • Implementation of NPQL as F# Type Provider in Visual Studio• Autocompletion using NPQL queries• Automatic typing
of extensional query resultsby intensional queries
Cost savings
Ctotal = t*Ctool + d*t*Clearn + s*Cdeu + s*Cmap + n*Ccode
Ctool : open source
Clearn: not free – though autocompletion reduces cognitive load
Cdeu: not free – understanding the RDF schema from your IDE
Cmap : 0
Ccode : a lot less than for dotNet RDF (Apache Jena?!!)little bit more than for a fictitious perfect object model
Halstead metrics for different tasks:
Conventional Semantic Web programming approaches waste up to 50% of your efforts!
Speculation 1
Ctotal
n (as in n*Ccode )
SemWeb coding efforts
RelDB/XML coding efforts
Applies to small n
Diff
. cos
ts fo
r le
arni
ng
Speculation 1: Using ontologies and RDF schemata, we can develop more efficiently using
the right tools!
Halstead metrics for different tasks:
If someone gives me a perfect RDF-to-OO mapping for freethen I will not care about whether it is RDF or RelDB
underneath!
Speculation 2
Ctotal
n (as in n*Ccode )
SemWeb coding efforts
RelDB/XML coding efforts
„Perfect“ object model shields
developer from database
ideosyncracies
Diff
. cos
ts fo
r se
tup
Speculation 2: For large programmes, our tools need to offer better support to reduce setup
costs!
Semantic Web: Make Developers More Productive
Linked Data
SPARQL endpoint
RDF file
„Inside“Data Mgmt
„Outside“
Data Mgmt
Eclipse
Visual Studio
....
Ontology
What I Think
• New programming languages vs. code generation + existing PLs– Mismatches between RDF and OO: Oren et al 2007,
Saathoff et al 2009– Only some ontologies can be perfectly mapped to OO
classes– A killer PL to come
• “Data programmability”– Model– Format– …?
References• C. Saathoff, S. Scheglmann, S. Schenk. Winter: Mapping
RDF to POJOs revisited.• E. Oren, R. Delbru, S. Gerke, A. Haller, S. Decker: ActiveRDF:
object-oriented semantic web programming. WWW 2007: 817-824
• S. Scheglmann, A. Scherp, S. Staab. Declarative Representation of Programming Access to Ontologies. In: 9th Extended Semantic Web Conference (ESWC2012), Heraklion, Greece, May 27-31, 2012.
• W. Cook, A. Ibrahim. Integrating Programming Languages & Databases: What’s the Problem?? http://citeseerx.ist.psu.edu/viewdoc/download?doi=10.1.1.66.7169&rep=rep1&type=pdf