Back to the Future – Should SQL Surrender to SPARQL?
Rainer Manthey
© 2015 Prof. Dr. Rainer Manthey SOFSEM 2015 1
How to Communicate with Databases?
From: http://www.intsolgrp.com/
Communicating with Google: Our Everyday Experience
Our request: A line of symbols
Google's answer: 139 million links
(... if Google is/has/uses a database?!)
Asking a Relational Database: More Complex, More Goal-Directed
From: technet.microsoft.com
Our request: An SQL query
The DB's answer: A table with data rows
Reminder of Basic Terminology: DBS = DBMS + n*DB
• DBMS: Database Management System
• DB: Database
• DBS: Database System
Basics (2): Query Language and Query Manager
[Diagram: a DBS consisting of a DBMS and its DBs; labels: Query (= declarative program), Query Language Interpreter]
Relational Databases and SQL Systems: A Multi-Billion Dollar Market
• 1970: Proposal of the Relational Model of Data (RM) by Edgar F. Codd
• 1974: Design of SQL by Chamberlin/Boyce started
• 1979: First commercial SQL DBMS (Oracle 2)
• 1986: First SQL standard
RM/SQL: A success story of more than 30 years . . . up till now?
SQL: End of an Era?
NoSQL takes the database market by storm
Is it the end of the line for SQL?
Are SQL Databases Dead?
The relational model is dead, SQL is dead, and I don’t feel so good myself
History Repeats Itself: Sensible and NonsenSQL Aspects of the NoSQL Hoopla
From: http://crossfitlittleton.net
The Semantic Web Dream: SPARQL‘s Vision and Goal
“I have a dream for the Web [in which computers] become capable of analyzing all the data on the Web – the content, links, and transactions between people and computers.
A "Semantic Web", which makes this possible, has yet to emerge, but when it does, the day-to-day mechanisms of trade, bureaucracy and our daily lives will be handled by machines talking to machines. The "intelligent agents" people have touted for ages will finally materialize.”
© 2014 by LyonLabs, LLC and Barrett Lyon
From: "Weaving the Web" (1999), Sir Timothy Berners-Lee
Restriction in this Talk: No Distributed Data Management!
SPARQL:
• Designed for managing data over "the semantic web"
• Navigation in distributed data (re)sources is a big issue
• IRIs are used intensively as identifiers for such (re)sources
• At the same time able to manage data without a web
In this presentation: All web-related aspects of SPARQL are ignored, as SQL was not made for this context.
On Syntax: Triples vs. Tuples
Query language:   SQL    SPARQL
(based on)
Data model:       RM     RDF
• Goal of this contribution: Compare SQL and SPARQL with respect to their data management capabilities only!
• Therefore: First look at the underlying data models of the two languages!
  • RM: Tables of rows and columns (or: relations as sets of tuples)
  • RDF: Datasets consisting of triples
RDF: The (Only?) Data Model for the Semantic Web
“RDF (Resource Description Framework) is one of the three foundational Semantic Web technologies, the other two being SPARQL and OWL.
In particular, RDF is the data model of the Semantic Web. That means that all data in Semantic Web technologies is represented as RDF.
If you store Semantic Web data, it's in RDF. If you query Semantic Web data (typically using SPARQL), it's RDF data. If you send Semantic Web data to your friend, it's RDF.”
http://www.cambridgesemantics.com/semantic-university/rdf-101
RDF Data: Graphs or Triples?
From: http://www.openarchives.org/ore/1.0/primer
[Figure: RDF data shown in graph representation (nodes: resources and literals) and in triple representation]
RDF Datasets Are Relations
Some quite obvious observations:
• Every RDF triple can be perceived as a relational tuple.
• Every RDF dataset can be perceived as a relational table.
• Every RDF dataset has the same attributes: S, P, O
⇒ We could accommodate every RDF database in an RM database!
(If we wanted to do so!)
Relational Tables Represented in RDF
T as an RM table (A: primary key attribute):

A    B    C    D    E
. . .
1    a    23   Jim  4.5
. . .

T as an RDF dataset:

S    P    O
. . .
1    B    a
1    C    23
1    D    Jim
1    E    4.5
. . .

• An n-ary tuple is turned into n-1 triples
• Attributes are turned into predicate values, i.e., meta-data into data
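The conversion on this slide can be sketched in a few lines of Python. This is only an illustration under stated assumptions: a tuple is modelled as a dict from attribute name to value, and the helper names `tuple_to_triples` and `triples_to_tuple` are hypothetical, not taken from the talk.

```python
def tuple_to_triples(row, key):
    """Turn one relational tuple (dict: attribute -> value) into (S, P, O) triples.

    S is the value of the unary primary key, P is the attribute name,
    O is the attribute value. The key attribute itself yields no triple.
    """
    subject = row[key]
    return [(subject, attr, value) for attr, value in row.items() if attr != key]


def triples_to_tuple(triples, key):
    """Inverse direction: collect all triples about one subject into a tuple."""
    subjects = {s for s, _, _ in triples}
    if len(subjects) != 1:
        raise ValueError("expected triples about exactly one subject")
    row = {key: subjects.pop()}
    for _, attr, value in triples:
        row[attr] = value
    return row


# The example tuple of table T from the slide, with A as primary key.
row = {"A": 1, "B": "a", "C": 23, "D": "Jim", "E": 4.5}
triples = tuple_to_triples(row, key="A")
print(triples)  # [(1, 'B', 'a'), (1, 'C', 23), (1, 'D', 'Jim'), (1, 'E', 4.5)]
```

A 5-ary tuple yields 4 triples, and the attribute names B to E now appear as ordinary data in the P position.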
Tuples as Graphs, RM DBs as Graph Databases
Tuple in serialized notation (table T):

A    B    C    D    E
. . .
1    a    23   Jim  4.5
. . .

Tuple in graphical notation: node 1 with edges labelled B, C, D, E leading to the nodes a, 23, Jim, 4.5.

• Tables in RM can represent graphs as easily as RDF datasets.
• No need to introduce a new data model for "graph-structured data".
• RM databases are graph databases.
RM vs. RDF: Brief Summary
• Triples are special tuples.
  • Uniform length 3, representing SPO statements
  • S, O: "Things", P: "Relationships"
• Tuples can be turned into sets of triples (systematically):
  • Provided they have a unary primary key!
  • Attributes are turned into data: become queryable!
• Datasets are special tables.
• Tables can be turned into datasets.
RM and RDF are (in principle) equally expressive.
SQL Basics (1)
Table T:

A    B    C    D    E
. . .
7    a    123  eg   2.1      (x)
. . .

SQL query:

SELECT B, E
FROM T
WHERE A = 7

in full syntax:

SELECT x.B, x.E
FROM T AS x
WHERE x.A = 7

• x: tuple variable
• Attributes: Functions
  • Written in postfix notation, e.g., x.A
  • Applied to each tuple in turn
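Both forms of the query can be tried with Python's built-in `sqlite3` module. This is a minimal sketch: the column types and the second row are assumptions added for illustration, only the row with A = 7 comes from the slide.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Column types are an assumption; the slide only shows example values.
conn.execute("CREATE TABLE T (A INTEGER PRIMARY KEY, B TEXT, C INTEGER, D TEXT, E REAL)")
conn.executemany(
    "INSERT INTO T VALUES (?, ?, ?, ?, ?)",
    [
        (7, "a", 123, "eg", 2.1),  # the row shown on the slide
        (8, "b", 7, "xy", 3.4),    # made-up row that must not be selected
    ],
)

# Short form: attribute names without an explicit tuple variable.
short_form = conn.execute("SELECT B, E FROM T WHERE A = 7").fetchall()

# Full form: x is a tuple variable ranging over the rows of T.
full_form = conn.execute("SELECT x.B, x.E FROM T AS x WHERE x.A = 7").fetchall()

print(short_form)  # [('a', 2.1)]
print(full_form)   # [('a', 2.1)]
```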
SQL Basics (2)
Table T:

A    B    C    D    E
. . .
.    a    123  .    .        (x)
. . .
123  .    .    .    3.4      (y)
. . .

SELECT x.B, y.E
FROM T AS x, T AS y
WHERE x.C = y.A

In SQL: Tuples from different tables (or copies of a table) are linked by explicit comparisons of attributes from both tuples.
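The self-join can be sketched with `sqlite3` next to an explicit nested loop that mimics the two tuple variables. The values in the cells that the slide leaves open are made up for this illustration.

```python
import sqlite3

# Two rows of T; only B = 'a', C = 123 (row x) and A = 123, E = 3.4 (row y)
# come from the slide, all other values are placeholders.
rows = [
    (7, "a", 123, "eg", 2.1),
    (123, "b", 5, "xy", 3.4),
]

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE T (A INTEGER, B TEXT, C INTEGER, D TEXT, E REAL)")
conn.executemany("INSERT INTO T VALUES (?, ?, ?, ?, ?)", rows)

# x and y range independently over T; the WHERE clause links them explicitly.
joined = conn.execute(
    "SELECT x.B, y.E FROM T AS x, T AS y WHERE x.C = y.A"
).fetchall()

# The same logic as a nested loop over two tuple variables (index 1 = B, 2 = C, 0 = A, 4 = E).
nested = [(x[1], y[4]) for x in rows for y in rows if x[2] == y[0]]

print(joined)  # [('a', 3.4)]
```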
SPARQL Basics
SPARQL query:

SELECT ?x ?y
FROM T
WHERE { ?x 2 ?z .
        ?z ?y a . }

Table T:

S    P    O
. . .
.    2    123      (?x = S value, ?z = 123)
. . .
123  .    a        (?z = 123, ?y = P value)
. . .

• ?x, ?y, ?z: triple component variables
• Each triple represented by a single (triple) pattern in the WHERE part
• Positional syntax, not attributes as selectors
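How two triple patterns with a shared variable select data can be sketched without any SPARQL engine. The matcher below is a deliberately naive illustration, not how a real SPARQL processor works; the function names, the dataset values `"s1"`, `"s2"`, `"p1"`, and the convention that strings starting with `?` are variables are all assumptions of this sketch.

```python
def match_pattern(pattern, triple, binding):
    """Try to extend `binding` so that `pattern` matches `triple`.

    Terms written as strings starting with '?' are variables,
    everything else is a constant that must be equal to the triple component.
    Returns the extended binding, or None if there is no match.
    """
    new_binding = dict(binding)
    for term, value in zip(pattern, triple):
        if isinstance(term, str) and term.startswith("?"):
            # Same variable in different positions: values must be identical.
            if new_binding.setdefault(term, value) != value:
                return None
        elif term != value:
            return None
    return new_binding


def evaluate(patterns, dataset):
    """Conjunctive evaluation: every pattern must match some triple."""
    bindings = [{}]
    for pattern in patterns:
        extended = []
        for binding in bindings:
            for triple in dataset:
                result = match_pattern(pattern, triple, binding)
                if result is not None:
                    extended.append(result)
        bindings = extended
    return bindings


# Made-up dataset T in (S, P, O) form, shaped like the two rows on the slide.
T = [("s1", 2, 123), (123, "p1", "a"), ("s2", 3, 123)]

# WHERE { ?x 2 ?z . ?z ?y a . }
solutions = evaluate([("?x", 2, "?z"), ("?z", "?y", "a")], T)

# SELECT ?x ?y
answers = [(s["?x"], s["?y"]) for s in solutions]
print(answers)  # [('s1', 'p1')]
```

The join condition never appears as an explicit comparison: it is implied by using `?z` in both patterns.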
Common Query Processing Paradigm
[Diagram labels: Query; FROM, WHERE; SELECT]
• Common principle: Sets of data elements (triples, tuples) as both input and output
• Difference:
  • In SQL: Both input and output are tables; the output can always be used as further input, so algebraic composition is possible
  • In SPARQL: Output does not necessarily consist of triples, thus no composition possible
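The SQL half of this point can be illustrated with `sqlite3`: the result of one SELECT block is again a table and can be placed in the FROM part of another. Table contents and column names are made up for this sketch.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE T (A INTEGER, B TEXT, C INTEGER)")
conn.executemany(
    "INSERT INTO T VALUES (?, ?, ?)",
    [(1, "a", 10), (2, "b", 20), (3, "a", 30)],  # made-up rows
)

# First query: its output is a table with columns A and C.
inner = "SELECT A, C FROM T WHERE B = 'a'"

# Second query: uses the output of the first one as its input table r.
composed = conn.execute(
    f"SELECT r.A FROM ({inner}) AS r WHERE r.C > 15"
).fetchall()

print(composed)  # [(3,)]
```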
Datalog: SQL‘s (Relatively) Unknown Brother
• SQL is (was) not the only relational query language, e.g.:
  • Theoretical languages: calculus-based (TRC, DRC), relational algebra (RA)
  • Early languages: QUEL, Query-by-Example
• Nearly as old as SQL (developed in the 1970s/80s): Datalog (Database + PROLOG)
  • Syntactically: Like pure PROLOG (facts and rules, goals as queries)
  • Semantically: Like SQL (set-oriented evaluation, no backtracking)
• In style:
  • Datalog: Minimalistic, purely symbolic (mathematical)
  • SQL: Verbose, rich in variants, English keywords (user-friendly?)
• In science: Quite successful for understanding complex problems (e.g., recursion)
• Commercially: Completely "irrelevant", no Datalog DBMS product ever
• Datalog was never standardized: Free for scientific experiments
• Datalog is (at least) as expressive as SQL, if equipped with the same built-ins.
SQL and Datalog: Two Different Linguistic Approaches to Querying

Datalog rule:

p(X,Y) ← t(X,2,Z), t(Z,Y,a).

• Based on DRC: Domain Relational Calculus
• Variables represent individual tuple components.
• No attributes necessary!
• Strictly symbolic style

SQL view:

CREATE VIEW p AS
SELECT x.A, y.B
FROM t AS x, t AS y
WHERE x.B = 2 AND y.C = a AND x.C = y.A

• Based on TRC: Tuple Relational Calculus
• Variables represent entire tuples.
• Tuple components accessed via attributes!
• Keyword-based style ("verbose")
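That the rule and the view define the same relation p can be checked on a small example. In this sketch the Datalog rule is read as a Python set comprehension over component variables, and the view is run through `sqlite3`. The facts are made up, and writing the constant a as the string `'a'` is an assumption of this illustration.

```python
import sqlite3

# Made-up facts t(A, B, C).
facts = {(1, 2, 5), (5, 7, "a"), (3, 2, 6), (6, 9, "b")}

# Datalog reading of  p(X,Y) <- t(X,2,Z), t(Z,Y,a).
# Variables stand for individual components; the shared Z is the join.
p_datalog = {
    (x, y)
    for (x, b1, z) in facts if b1 == 2
    for (z2, y, c2) in facts if z2 == z and c2 == "a"
}

# SQL reading: x and y stand for entire tuples, components accessed via attributes.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE t (A, B, C)")  # untyped columns, an assumption
conn.executemany("INSERT INTO t VALUES (?, ?, ?)", facts)
conn.execute(
    "CREATE VIEW p AS "
    "SELECT x.A, y.B FROM t AS x, t AS y "
    "WHERE x.B = 2 AND y.C = 'a' AND x.C = y.A"
)
p_sql = set(conn.execute("SELECT * FROM p").fetchall())

print(p_datalog)  # {(1, 7)}
print(p_sql)      # {(1, 7)}
```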
SPARQL and Datalog: Two Real „Brother“ Languages
SPARQL query:

SELECT ?x ?y
FROM t
WHERE { ?x 2 ?z . ?z ?y a . FILTER ?y > ?z }

Datalog query:

{ (X,Y) : t(X, 2, Z), t(Z, Y, a), Y > Z } ?

• Obviously very similar basic principle!
• In both languages: Variables represent components of tuples/triples
• Literals in Datalog = Triple patterns in SPARQL
• More than one literal/triple pattern: connected conjunctively (AND)
• Identity conditions expressed indirectly in literals/triple patterns
  • Constant values appearing in a suitable position
  • Identity of values in different positions: same variable
SPARQL and SQL: Quite Unrelated (Except on the Surface)
SPARQL query:

SELECT ?x ?y
FROM t
WHERE { ?x 2 ?z . ?z ?y a . FILTER ?y > ?z }

Datalog query:

{ (X,Y) : t(X, 2, Z), t(Z, Y, a), Y > Z } ?

In comparison: SQL is very different in "philosophy" and style from both of these!

SQL query:

SELECT t1.A, t2.B
FROM T AS t1, T AS t2
WHERE t1.B = 2 AND t1.C = t2.A AND
      t2.C = a AND t2.B > t2.A
SQL and SPARQL: A Brief Summary of Additional Complex Features
• SQL: More complex queries constructed by . . .
  • Combining SELECT-FROM-WHERE blocks using set operators UNION, INTERSECT, EXCEPT (MINUS in some dialects)
  • Nesting SFW blocks (using the EXISTS quantifier in WHERE conditions)
  • Explicit propositional operators AND, OR, NOT
  • Aggregate functions (e.g., COUNT, AVG) and GROUP BY
  • Ordering of query results: ORDER BY
• SPARQL:
  • UNION operator available for merging patterns in WHERE parts
  • No other set operators, no combination of several queries
  • EXISTS operator since SPARQL 1.1 for nested patterns in FILTER
  • Boolean operators only in special situations
  • Aggregation as in SQL since SPARQL 1.1
  • ORDER BY as in SQL
SPARQL has been enhanced stepwise with other SQL keywords.
Back to the Future?
• Successful science fiction movie from 1985
• Crazy inventor tries to travel through time using a futuristic high-tech car
• Reaches the past (1955), aiming at the future
• 30 years back (like SPARQL to SQL)
As far as data management is concerned, SPARQL seems to be a step back in time.
SQL and (even more) Datalog are too close, but this is hidden by idiosyncratic new syntax details and by IRIs everywhere.
Conclusion: A Tale of Two Languages
Should SQL Surrender to SPARQL?
• As far as database management is concerned: Certainly not!
  • Both RM and RDF are very close in style and equally expressive.
  • SPARQL cannot really claim any advantage with respect to graph databases.
  • The two languages have more commonalities than differences.
  • Superiority on the SPARQL side is not really visible.
  • Surprising: SPARQL is much closer to Datalog than to SQL!
• As far as "serving the web" is concerned: No competition by SQL (yet)!
Some (more) personal opinions:
• SPARQL's style ("look and feel") is consistent in some aspects, but appears to me quite ugly and overblown otherwise.
• The documentation of SPARQL & Co by W3C is hard to "digest".
• The "propaganda" for SPARQL by the "Semantic Web Movement" is making fair comparisons hard.
• Whether the SQL vendors will again be able to "swallow" a competitor this time remains to be seen . . . I have my doubts.