Web Science & Technologies
University of Koblenz ▪ Landau, Germany
Information-Rich Programming in F#
with Semantic Data
Steffen [email protected] 2
WeST
Linked Open Data Cloud
"… the Web of Linked Data consisting of more than 30 billion RDF triples from hundreds of data sources …"
Gerhard Weikum, "Where’s the Data in the Big Data Wave?", SIGMOD Blog, 6.3.2013, http://wp.sigmod.org/
Example RDF Graph
Native graph
OR
R2RML: RDB to RDF Mapping Language (W3C Recommendation)
Agenda
SchemEX
• Construction of schema-based index
• Schema induction
LiteQ – Language integrated types, extensions and queries for RDF graphs
• Exploring, programming, typing
Evaluation of LITEQ (NPQL) against SPARQL
• Understandability
• Ease of use
Exploring a data source
Using a data source
Programming against an unknown data source
Example application
• Goal: an application that helps to collect dog license fees
• Send email reminders to dog owners
• Data is given as an RDF graph
Programmer's Task 1: Schema Exploration
Schema exploration & identification of important RDF types
• Find RDF types representing dogs and persons
Naive Approach Task 1: Schema Exploration
Schema exploration & identification of important RDF types
• Find RDF types representing dogs and persons
Tooling for naive approach: SPARQL query formulation
Programmer's Task 2: Code Type Creation
Code type creation in host language
• Convert the identified dog and person RDF types to code types in the host language
// Base type first, so that the derived types below can refer to it.
// Member bodies are elided (…) as on the slide.
type exCreature(uri) =
    class
        member this.hasName : string = …
        member this.hasAge : int = …
    end
type exPerson(uri) =
    class
        inherit exCreature(uri)
    end
type exDog(uri) =
    class
        inherit exCreature(uri)
        member this.hasOwner : exPerson = …
        member this.TaxNo : int = …
    end
Programmer's Task 3: Data querying
Data querying
• Write a query that returns all dog owners
Naive Approach Task 3: Data querying
Data querying
• Write a query that returns all dog owners
Tooling for naive approach: SPARQL query formulation
Naive Approach Task 4: Object manipulation
Create the objects, manipulate them & make them persistent
• Develop functionality around the query to send the reminders
// SPARQL query for all dog owners (prefix declarations omitted, as on the slide)
let queryString = """
    SELECT ?owner WHERE {
        ?dog rdf:type ex:Dog .
        ?dog ex:hasOwner ?owner
    }"""
// Wrap each returned owner URI in a Person object and send the reminder
dbConnection.evaluate(queryString)
|> Seq.iter (fun uri ->
    let p = new Person(uri)
    sendReminderEmail(p))
Graph Traversal with NPQL: Subtype Navigation (>)
NPQL: rdf:Resource > ex:Creature
Graph Traversal with NPQL: Property Navigation (.)
NPQL: ex:Dog . ex:hasOwner
Extensional Semantics: Task 3 – Querying for Owners
• Select ex:Dog
• Walk through ex:hasOwner to ex:Person
• Use the extension to retrieve all persons who own dogs (in the example graph: ex:Bob)
NPQL, as the query is built up:
rdf:Resource > ex:Dog
rdf:Resource > ex:Creature > ex:Dog
rdf:Resource > ex:Creature > ex:Dog . ex:hasOwner
rdf:Resource > ex:Creature > ex:Dog . ex:hasOwner -> Extension
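The traversal above can be mimicked over a tiny in-memory graph. This is a minimal sketch, not the LITEQ implementation: the triple list, the helper names (subtypeStep, instancesOf, extensionVia) and the individuals ex:Hasso and ex:Alice are assumptions made for illustration; only ex:Bob appears on the slide.

```fsharp
// A triple is (subject, predicate, object); all URIs are plain strings here.
type Triple = string * string * string

// Hypothetical example graph (assumption); only ex:Bob is taken from the slide.
let graph : Triple list =
    [ "ex:Creature", "rdfs:subClassOf", "rdf:Resource"
      "ex:Dog",      "rdfs:subClassOf", "ex:Creature"
      "ex:Person",   "rdfs:subClassOf", "ex:Creature"
      "ex:Hasso",    "rdf:type",        "ex:Dog"
      "ex:Bob",      "rdf:type",        "ex:Person"
      "ex:Alice",    "rdf:type",        "ex:Person"
      "ex:Hasso",    "ex:hasOwner",     "ex:Bob" ]

// Subtype navigation (>): move to `target` only if it is a direct subclass of `current`.
let subtypeStep (target: string) (current: string) : string =
    if graph |> List.contains (target, "rdfs:subClassOf", current) then target
    else failwithf "%s is not a direct subclass of %s" target current

// Instances of a class (direct rdf:type statements only, no inference).
let instancesOf (cls: string) : string list =
    graph |> List.choose (fun (s, p, o) -> if p = "rdf:type" && o = cls then Some s else None)

// Property navigation (.) followed by Extension:
// all objects reached from instances of `cls` via `prop`.
let extensionVia (prop: string) (cls: string) : string list =
    let subjects = instancesOf cls |> Set.ofList
    graph
    |> List.choose (fun (s, p, o) -> if p = prop && subjects.Contains s then Some o else None)
    |> List.distinct

// rdf:Resource > ex:Creature > ex:Dog . ex:hasOwner -> Extension
let owners =
    "rdf:Resource"
    |> subtypeStep "ex:Creature"
    |> subtypeStep "ex:Dog"
    |> extensionVia "ex:hasOwner"

printfn "%A" owners
```

In this toy graph ex:Alice is a person but owns no dog, so only ex:Bob is returned.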
Intensional Semantics: Task 2 – Creating Person Code Type
• Select the ex:Person node
• Use “Intension” to get a code type based on the RDF type
NPQL: rdf:Resource > ex:Creature > ex:Dog.hasOwner -> Intension
// Member bodies are elided (…) as on the slide.
type exCreature(uri) =
    class
        member this.hasName : string = …
        member this.hasAge : int = …
    end
type exPerson(uri) =
    class
        inherit exCreature(uri)
    end
Autocompletion Semantics: Task 1 – Exploration
Suggestions during query writing
• Instances based on extensional semantics
• Types & properties based on intensional semantics
NPQL: rdf:Resource > ex:Creature >
Suggestions: ex:Person, ex:Dog
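How such suggestions could be derived from schema triples is sketched below. This is an illustration under assumptions, not the way LITEQ computes its completions: the triple list, the use of rdfs:subClassOf and rdfs:domain, and the function names suggestSubtypes and suggestProperties are all hypothetical.

```fsharp
// Hypothetical schema triples (subject, predicate, object); an assumption for illustration.
let schemaTriples =
    [ "ex:Creature", "rdfs:subClassOf", "rdf:Resource"
      "ex:Dog",      "rdfs:subClassOf", "ex:Creature"
      "ex:Person",   "rdfs:subClassOf", "ex:Creature"
      "ex:hasOwner", "rdfs:domain",     "ex:Dog"
      "ex:hasName",  "rdfs:domain",     "ex:Creature" ]

// After typing "cls >": suggest the direct subclasses of cls.
let suggestSubtypes (cls: string) : string list =
    schemaTriples
    |> List.choose (fun (s, p, o) -> if p = "rdfs:subClassOf" && o = cls then Some s else None)
    |> List.sort

// After typing "cls .": suggest the properties whose declared domain is cls.
// Properties inherited from superclasses are not included in this sketch.
let suggestProperties (cls: string) : string list =
    schemaTriples
    |> List.choose (fun (s, p, o) -> if p = "rdfs:domain" && o = cls then Some s else None)
    |> List.sort

printfn "%A" (suggestSubtypes "ex:Creature")
printfn "%A" (suggestProperties "ex:Dog")
```

For "rdf:Resource > ex:Creature >" the first call yields ex:Dog and ex:Person, matching the suggestions shown on the slide.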
Extensional Semantics: LA Conjunctive Queries
Left-associative conjunctive query with projection
NPQL: ex:Dog <- ex:hasOwner
Host Language Extension: Task 4 – Create Objects
Create the objects, manipulation & persistence
• Develop the functionality around the query that will send the reminder using LITEQ in F#
Preliminary implementation in F#: http://west.uni-koblenz.de/Research/systems/liteq
Live demo of LITEQ in Visual Studio/F#
Related Work
Task                                | LINQ | XML Type Provider | Freebase Type Provider            | LITEQ current version      | LITEQ concept
1 Schema exploration                | -    | (✔) per doc       | (✔) only trees                    | ✔                          | ✔
2 Code type creation                | -    | (✔) erased types? | (✔) erased types                  | (✔) erased types           | ✔ full hierarchy
3 Data querying                     | ✔    | -                 | ((✔)) very limited expressiveness | (✔) limited expressiveness | ✔ no full SPARQL
4 Object manipulation & persistence | (✔)  | ✔                 | -                                 | ✔ no new object creation   | ✔
Future work wrt LITEQ
• Current implementation is a prototype
• Current implementation uses erased types
  • At runtime, no type hierarchy is present
• Switch to generated types in the future
  • Higher expressiveness in the host language by exploiting the type hierarchy
• Optimizations of the LITEQ implementation necessary
  • Lazy evaluation
  • Distinguish between design time and runtime: not all types created at design time are needed at runtime
• Formalize the query language and investigate its expressiveness
Challenge: Joint Type Inference
Data modeling world
• Description Logics
• RDF
• UML class diagrams
Program modeling world
• ML type inference
Agenda
SchemEX
• Where do I find relevant data?
• Efficient construction of a schema-level index
LiteQ – Language integrated types, extensions and queries for RDF graphs
• Exploring, programming, typing
Evaluation of LITEQ (NPQL) vs. SPARQL
• Understandability
• Ease of use
Preliminary Evaluation of LITEQ/NPQL
Focused on NPQL
• Reason: test subjects lacked the knowledge of F# and functional programming needed to evaluate LITEQ in full
• Comparing NPQL against SPARQL
Main hypothesis of the evaluation
• NPQL with autocompletion allows for effective query writing in a more efficient manner than SPARQL
Thus: some of the advantages of LITEQ cannot show up in the evaluation!
Evaluation Subjects
Evaluation with 11 participants
• 1 subject was eliminated a posteriori from the analysis because he could not deal with SPARQL at all
• 10 subjects remaining for analysis
Participants
• Undergraduate students
• PhD students
• PostDocs
Evaluation - Setup
1. Pre-questionnaire
2. Training in RDF, SPARQL & NPQL
3. Experimental tasks to be solved by subjects
4. Post-questionnaire
Phase 1: Pre-Questionnaire – Knowledge & skills
• Programming: all “Intermediate” or above
• Object-orientation: 8 “Intermediate” or above
• Functional programming:
  • 4 “Intermediate” or above (Lisp, Haskell, F# (once))
  • 4 “none”
• .NET:
  • 1 “Expert”
  • 2 “Beginner”
  • 7 “none”
• SPARQL:
  • 3 “Intermediate” or above [SPARQL experts]
  • 7 below “Intermediate” [SPARQL novices]
Phase 2: Training in RDF, SPARQL, NPQL
Training in RDF & SPARQL
• Presentation of RDF & SPARQL (20 minutes)
• Practical exercise writing SPARQL queries in the Web interface (5 minutes)
Training in NPQL
• Practical exercise writing NPQL queries in Visual Studio (5 minutes)
Phase 3: Solving experimental tasks by subjects
9 different experimental tasks to solve
• Half of the tasks in NPQL using Visual Studio
• Other half using SPARQL and a web interface
Task types
• Navigation and exploration of a data source (Task 1)
• Retrieving and answering questions about the data (Task 3)
• 2 tasks were not solvable in NPQL
  • Investigating how users deal with limits of the language
Evaluation measure
• Duration to complete each task
Phase 4: Post-Questionnaire
“Do you want to explore a data source in your IDE?”
• 4 “yes”
• 3 “no, prefer separation of steps”
• 3 “no preference”
“NPQL is easier to use than SPARQL”
• 7 “agree” or above
Other
• Better support when writing queries in SPARQL
• Better response times for interactive working with NPQL
My conclusion: though LITEQ is still in a pre-alpha state, its advantages became visible in the preliminary user evaluation
Agenda
SchemEX
• Construction of schema-based index
• Schema induction
LiteQ – Language integrated types, extensions and queries for RDF graphs
• Exploring, programming, typing
Evaluation of LITEQ (NPQL) against SPARQL
• Understandability
• Ease of use
Searching the LOD cloud
[Figure: query graph pattern in which ?x has the types foaf:Document and swrc:InProceedings and is linked via dc:creator to an entity of type fb:Computer_Scientist]
SELECT ?x
WHERE {
  ?x rdf:type foaf:Document .
  ?x rdf:type swrc:InProceedings .
  ?x dc:creator ?y .
  ?y rdf:type fb:Computer_Scientist
}
Searching the LOD cloud
Index: where is data matching this query?
• ACM
• DBLP
Schema-level index
Schema information on LOD
• Explicit: assigning class types (Entity rdf:type Class)
• Implicit: modelling attributes (Entity Property Entity 2)
Typecluster
Entities with the same set of types
[Figure: type cluster TCj over the classes C1, C2, ..., Cn, pointing to the data sources DS1, DS2, ..., DSm]
Typecluster: Example
tc2309
• Types: foaf:Document, swrc:InProceedings
• Data sources: DBLP, ACM
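Computing type clusters from crawled data can be sketched as follows. This is a minimal brute-force illustration, not the SchemEX implementation: the Quad record, the entity URIs (ex:paper1, ex:paper2, ex:alice) and the assignment of quads to DBLP and ACM are assumptions.

```fsharp
// A quad is a triple plus the data source it was crawled from.
type Quad = { S: string; P: string; O: string; Source: string }

// Hypothetical input (assumption); the entity URIs are made up for illustration.
let quads =
    [ { S = "ex:paper1"; P = "rdf:type"; O = "foaf:Document";         Source = "DBLP" }
      { S = "ex:paper1"; P = "rdf:type"; O = "swrc:InProceedings";    Source = "DBLP" }
      { S = "ex:paper2"; P = "rdf:type"; O = "swrc:InProceedings";    Source = "ACM" }
      { S = "ex:paper2"; P = "rdf:type"; O = "foaf:Document";         Source = "ACM" }
      { S = "ex:alice";  P = "rdf:type"; O = "fb:Computer_Scientist"; Source = "DBLP" } ]

let typeQuads = quads |> List.filter (fun q -> q.P = "rdf:type")

// Type set of each entity: all classes assigned to it via rdf:type.
let typeSets : Map<string, Set<string>> =
    typeQuads
    |> List.groupBy (fun q -> q.S)
    |> List.map (fun (s, qs) -> (s, (qs |> List.map (fun q -> q.O) |> Set.ofList)))
    |> Map.ofList

// Type cluster: a type set mapped to the data sources that contain
// entities with exactly that set of types.
let typeClusters : Map<Set<string>, Set<string>> =
    typeQuads
    |> List.groupBy (fun q -> typeSets.[q.S])
    |> List.map (fun (types, qs) -> (types, (qs |> List.map (fun q -> q.Source) |> Set.ofList)))
    |> Map.ofList

for KeyValue (types, sources) in typeClusters do
    printfn "%A -> %A" (Set.toList types) (Set.toList sources)
```

Both papers carry the same two types, so they fall into one cluster that points to DBLP and ACM, like tc2309 in the example.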
Bi-Simulation
Entities are equivalent if they refer with the same attributes to equivalent entities
Restriction: 1-Bi-Simulation
[Figure: bi-simulation class BSi over the properties P1, P2, ..., Pn, pointing to the data sources DS1, DS2, ..., DSm]
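With the restriction to one step, the equivalence reduces to comparing sets of outgoing properties. The sketch below illustrates this under assumptions: the triples and the names propertySets and bisimClasses are hypothetical, and rdf:type statements are left out because they are covered by the type clusters.

```fsharp
// Hypothetical triples (subject, property, object); an assumption for illustration.
let triples =
    [ "ex:paper1", "dc:creator", "ex:alice"
      "ex:paper1", "dc:title",   "Paper A"
      "ex:paper2", "dc:creator", "ex:bob"
      "ex:paper2", "dc:title",   "Paper B"
      "ex:alice",  "foaf:knows", "ex:bob" ]

// Looking only one step ahead, two entities are equivalent if they use
// the same set of outgoing properties, whatever the targets are.
let propertySets : (string * Set<string>) list =
    triples
    |> List.groupBy (fun (s, _, _) -> s)
    |> List.map (fun (s, ts) -> (s, (ts |> List.map (fun (_, p, _) -> p) |> Set.ofList)))

// Bi-simulation classes: property set -> entities sharing that set.
let bisimClasses : Map<Set<string>, string list> =
    propertySets
    |> List.groupBy snd
    |> List.map (fun (props, members) -> (props, List.map fst members))
    |> Map.ofList

for KeyValue (props, members) in bisimClasses do
    printfn "%A -> %A" (Set.toList props) members
```

ex:paper1 and ex:paper2 end up in the same class although their creators differ, because only the property labels are compared.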
SchemEX: Combination TC and Bi-Simulation
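Partition of TC based on 1-Bi-Simulation with restrictions on the destination TC
[Figure: schema level with type clusters TCj (classes C1, C2, ..., Cn) and TCk (classes C45, C2, ..., Cn'), bi-simulation BSi (properties P1, P2, ..., Pn) and equivalence classes EQC, EQCj; payload level with data sources DS1, DS2, ..., DSm]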
SchemEX: Example
[Figure: type cluster tc2309 (foaf:Document, swrc:InProceedings) linked via bi-simulation bs2608 (dc:creator) to type cluster tc2101 (fb:Computer_Scientist); equivalence class eqc707 points to the data sources DBLP, ...]
SELECT ?x
WHERE {
  ?x rdf:type foaf:Document .
  ?x rdf:type swrc:InProceedings .
  ?x dc:creator ?y .
  ?y rdf:type fb:Computer_Scientist
}
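One way to picture the combined index is a map from an equivalence-class key to data sources. The sketch below is an assumption-laden illustration, not the SchemEX data structure: the records Quad and EqcKey, the entity URIs and the exact-match lookup are hypothetical simplifications.

```fsharp
// A quad is a triple plus the data source it was crawled from.
type Quad = { S: string; P: string; O: string; Source: string }

// Hypothetical crawl result (assumption).
let quads =
    [ { S = "ex:paper1"; P = "rdf:type";   O = "foaf:Document";         Source = "DBLP" }
      { S = "ex:paper1"; P = "rdf:type";   O = "swrc:InProceedings";    Source = "DBLP" }
      { S = "ex:paper1"; P = "dc:creator"; O = "ex:alice";              Source = "DBLP" }
      { S = "ex:alice";  P = "rdf:type";   O = "fb:Computer_Scientist"; Source = "DBLP" }
      { S = "ex:paper2"; P = "rdf:type";   O = "foaf:Document";         Source = "ACM" }
      { S = "ex:paper2"; P = "rdf:type";   O = "swrc:InProceedings";    Source = "ACM" } ]

// Type set (type cluster) of an entity.
let typesOf (e: string) : Set<string> =
    quads
    |> List.filter (fun q -> q.S = e && q.P = "rdf:type")
    |> List.map (fun q -> q.O)
    |> Set.ofList

// Equivalence class key: the entity's own type cluster plus, for every
// outgoing property, the type cluster of the destination entity.
type EqcKey = { Types: Set<string>; Links: Set<string * Set<string>> }

let keyOf (e: string) : EqcKey =
    let links =
        quads
        |> List.filter (fun q -> q.S = e && q.P <> "rdf:type")
        |> List.map (fun q -> (q.P, typesOf q.O))
        |> Set.ofList
    { Types = typesOf e; Links = links }

// Schema (key) -> payload (data sources).
let index : Map<EqcKey, Set<string>> =
    quads
    |> List.groupBy (fun q -> keyOf q.S)
    |> List.map (fun (k, qs) -> (k, (qs |> List.map (fun q -> q.Source) |> Set.ofList)))
    |> Map.ofList

// The SPARQL query from the slide, expressed as a key (exact match only).
let queryKey =
    { Types = set [ "foaf:Document"; "swrc:InProceedings" ]
      Links = set [ ("dc:creator", set [ "fb:Computer_Scientist" ]) ] }

let sources = index |> Map.tryFind queryKey
printfn "%A" sources
```

In this toy data only ex:paper1 has a creator typed as fb:Computer_Scientist, so the lookup points to DBLP; ex:paper2 shares the type cluster but lands in a different equivalence class.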
SchemEX: Computation
Precise computation: brute force
[Figure: SchemEX structure of type clusters, bi-simulations and equivalence classes (schema) over the data sources (payload)]
Smarter approach?
Stream-based Computation of SchemEX
LOD crawler: stream of N-Quads (triple + data source)
… Q16, Q15, Q14, Q13, Q12, Q11, Q10, Q9, Q8, Q7, Q6, Q5, Q4, Q3, Q2, Q1
[Figure: FiFo cache of numbered entities taken from the stream, linked to their classes C1, C2, C3]
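The idea of a bounded FiFo cache over the quad stream can be sketched as below. This is a simplified illustration under assumptions, not the SchemEX code: it only builds type clusters, and the names buildIndex, Entry and the sample stream are hypothetical. It also shows why the result is an approximation: statements about an entity that arrive after its eviction are indexed separately.

```fsharp
open System.Collections.Generic

type Quad = { S: string; P: string; O: string; Source: string }
type Entry = { Types: Set<string>; Sources: Set<string> }

/// Streams quads through a FiFo cache holding at most `capacity` entities.
/// When the cache is full, the oldest entity is evicted and its type set
/// is flushed to the index (type cluster -> data sources).
let buildIndex (capacity: int) (stream: seq<Quad>) =
    let order = Queue<string>()                        // arrival order of cached entities
    let cache = Dictionary<string, Entry>()            // entity -> information seen so far
    let index = Dictionary<Set<string>, Set<string>>() // type cluster -> data sources

    let flush (entity: string) =
        let entry = cache.[entity]
        cache.Remove(entity) |> ignore
        match index.TryGetValue(entry.Types) with
        | true, known -> index.[entry.Types] <- Set.union known entry.Sources
        | _ -> index.[entry.Types] <- entry.Sources

    for q in stream do
        if q.P = "rdf:type" then
            if not (cache.ContainsKey(q.S)) then
                if order.Count >= capacity then flush (order.Dequeue())
                order.Enqueue(q.S)
                cache.[q.S] <- { Types = Set.empty; Sources = Set.empty }
            let entry = cache.[q.S]
            cache.[q.S] <- { Types = Set.add q.O entry.Types
                             Sources = Set.add q.Source entry.Sources }

    // End of stream: flush whatever is still cached.
    while order.Count > 0 do
        flush (order.Dequeue())
    index

// Hypothetical stream (assumption); class and source names follow the figures.
let stream =
    [ { S = "ex:e1"; P = "rdf:type"; O = "C1"; Source = "DS1" }
      { S = "ex:e2"; P = "rdf:type"; O = "C2"; Source = "DS1" }
      { S = "ex:e1"; P = "rdf:type"; O = "C2"; Source = "DS1" }
      { S = "ex:e3"; P = "rdf:type"; O = "C3"; Source = "DS2" }
      { S = "ex:e1"; P = "rdf:type"; O = "C3"; Source = "DS2" } ]

let exact  = buildIndex 10 stream  // cache holds everything: same result as brute force
let approx = buildIndex 2 stream   // ex:e1 is evicted before its last quad arrives
```

With capacity 10 the cluster {C1, C2, C3} is found for ex:e1; with capacity 2 the entity is evicted early and contributes {C1, C2} and, later, {C3} instead.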
Quality of Approximated Index
Stream-based computation vs. brute force
Data set of 11 million triples
SchemEX @ BTC 2011
SchemEX
• Allows complex queries (star, chain)
• Scalable computation
• High quality
Index over BTC 2011 data
• 2.17 billion triples
• Index: 55 million triples
Commodity hardware
• VM: 1 core, 4 GB RAM
• Throughput: 39,500 triples / second
• Computation of full index: 15 h
1st place at BTC 2011
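The reported figures can be cross-checked with a few lines of arithmetic. The sketch uses only the numbers from the slide; the variable names are arbitrary.

```fsharp
let totalTriples = 2.17e9   // triples in the BTC 2011 data
let indexTriples = 55.0e6   // triples in the resulting index
let throughput   = 39500.0  // triples per second

// Time to stream all triples once at the reported throughput.
let hours = totalTriples / throughput / 3600.0

// Size of the index relative to the indexed data.
let indexShare = indexTriples / totalTriples

printfn "Estimated computation time: %.1f h" hours
printfn "Index size relative to data: %.1f %%" (100.0 * indexShare)
```

The estimate comes out at roughly 15.3 hours, in line with the reported 15 h, and the index is about 2.5 % of the size of the data.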
Future work wrt SchemEX
Further exploration of
• schema induction
• query federation
Federation vs. link-traversal-based query execution
• Granularity of query execution
  • Too fine-grained: URI dereferencing
  • Too expressive: SPARQL
  • Sweet spot -> NPQL??
Future
1. Searching for distributed data
2. Understanding distributed data
3. Intelligent queries on distributed data
4. Programming with distributed data
   • Type reuse
   • Type induction
Thank you for your attention!