Wissenstechnologie WS 08/09
Michael Granitzer
IWM TU Graz & Know-Center
Lecture 6: Triple Stores, Sparql, Semantic Retrieval
http://kmi.tugraz.at http://www.know-center.at
This work is licensed under the Creative Commons Attribution 2.0 Austria License. To view a copy of this license, visit http://creativecommons.org/licenses/by/2.0/at/.
Today
Ontology Modelling & SW Frameworks
Triple Stores
• Basic RDBMS scheme
• Property tables & vertical partitioning
• Performance comparisons
SPARQL
• Definition
• Simple & complex queries
• Some examples on endpoints
Information Retrieval vs. Semantic Retrieval
• Basics of IR
• "Semantic" retrieval
• Practical examples (Freebase, Cugil etc.)
WS 08/09
http://kmi.tugraz.at
Wissenstechnologie @ kmi.tugraz.at
Ontology Modelling & SW Frameworks
Triple Stores
SPARQL
Information Retrieval vs. Semantic Retrieval
Ontology Modelling vs. OOP
Similar to design in object-oriented programming
Classes, objects and members
OOP: capture the operational properties
public interface Course {
    public void enroll();
}
Ontology modelling: capture the structural properties
owl:Student owl:participates owl:Course
Ontology Modelling vs. RDBMS
Similar to designing a database system
Higher expressiveness in OWL
Agreement on the domain, not only referential integrity
Not focused on special indexing structures or on querying only
Ontologies should be application independent
Consistency checks
Semantic integration via ontologies
(Figure: data sources to be integrated: product database, file system, employee database, text database, ...)
Ontology Modelling: Goals
Goals
– Share a common understanding among people or software
– Enable reuse of knowledge
– Make domain assumptions explicit
– Separate domain knowledge from operational knowledge
– Analyze domain knowledge
Main application areas
– Semantic harmonization of heterogeneous data sources
– Structuring the content of a portal
– Enhancing search and retrieval
…
Ontology Modelling: Aspects to Model
Defining classes in the ontology
Arranging classes in a taxonomy
Defining slots/properties for classes and their values
Defining logical constraints on classes/properties
Assigning instances
Ontology Modelling: Three Simple Rules
1. "There is no one correct way to model a domain – there are always viable alternatives. The best solution almost always depends on the application that you have in mind and the extensions that you anticipate."
2. "Concepts in the ontology should be close to objects (physical or logical) and relationships in your domain of interest. These are most likely to be nouns (objects) or verbs (relationships) in sentences that describe your domain."
3. "Ontology development is necessarily an iterative process."
Ontology Modelling: Noy and McGuinness' 7 Steps
1. Determine the domain and scope of the ontology
2. Consider reusing existing ontologies
3. Enumerate important terms in the ontology
4. Define the classes and the class hierarchy
5. Define the properties (slots) of classes
6. Define the facets of the slots
7. Create/import instances
Semantic Web Frameworks: Motivation
Protege as modelling GUI
For "Semantic Web applications" we also want to
– Automatically import/map instances
– Manage a large number of triples
– Combine different schemas
– Query for specific triples
– Harmonize different metadata schemas
Database requirements for graphs
Reasoning
Semantic Web Frameworks: Overview
Three "major" Java-based open source frameworks
– Jena
– Sesame
– Protege Java API
Functionality
– Java API for managing OWL, RDF and RDFS (optionally DAML+OIL)
– Import/export of different formats
– Persistence via their own data store, different databases and file system backends
– Querying, graph manipulation and restricted reasoning capabilities
– Web API
Semantic Web Frameworks: Jena Architecture
(Figure: Jena architecture; recoverable labels: SPARQL, RDF/XML)
Source: Jena: Implementing the Semantic Web Recommendations, 2003, http://www.hpl.hp.com/techreports/2003/HPL-2003-146.html
Semantic Web Frameworks: Main Differences
Jena
– Reference implementation
– Not directly focused on web access and scalability
Protege
– Modelling GUI
Sesame
– Focused on remote access and scalability
– Flexible layer architecture for different storage backends
Others: Virtuoso, 3Store, Kowari, OpenAnzo
Triple Stores: Overview
Basic data model is RDF (i.e. OWL, RDFS)
RDF forms a directed graph
How do we manage large graphs?
In memory: adjacency matrix
On secondary storage
– Special indices
– Use relational database management systems
Triple Stores: "Normalized" Table Model of RDF
Subject | Predicate | Object
http://book.at/isbn123 | author | http://fussball.de/G. Müller
http://book.at/isbn123 | price | €15
http://book.at/isbn123 | Title | Ein Leben für die Tore
http://fussball.de/G. Müller | Name | Gerd Müller
(Figure: the same triples drawn as a graph. The node http://book.at/isbn123 has the edges author to http://fussball.de/G. Müller, price to €15 and title to Ein Leben für die Tore; the node http://fussball.de/G. Müller has the edge name to Gerd Müller.)
Triple Stores: Query in an Unoptimized RDBMS
Query: titles of books by the person with the name Gerd Müller?
SELECT r3.o AS Title
FROM rdf r1, rdf r2, rdf r3
WHERE r1.s = r2.o AND r2.s = r3.s
  AND r1.o = 'Gerd Müller' AND r1.p = 'Name'
  AND r2.p = 'author' AND r3.p = 'Title'
Subject (s) | Predicate (p) | Object (o)
http://book.at/isbn123 | author | http://fussball.de/G. Müller
http://book.at/isbn123 | price | €15
http://book.at/isbn123 | Title | Ein Leben für die Tore
http://fussball.de/G. Müller | Name | Gerd Müller
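The three-way self-join can be tried out directly. The following is a minimal sketch, assuming an in-memory SQLite database and a single table rdf(s, p, o) holding the four example triples; the function name titles_by_gerd_mueller is only an illustration.

```python
import sqlite3

# The four example triples, stored in one "normalized" table rdf(s, p, o).
TRIPLES = [
    ("http://book.at/isbn123", "author", "http://fussball.de/G. Müller"),
    ("http://book.at/isbn123", "price", "€15"),
    ("http://book.at/isbn123", "Title", "Ein Leben für die Tore"),
    ("http://fussball.de/G. Müller", "Name", "Gerd Müller"),
]

# r1 finds the person by name, r2 finds books authored by that person,
# r3 fetches the titles of those books: two self-joins for a simple question.
QUERY = """
SELECT r3.o AS Title
FROM rdf r1, rdf r2, rdf r3
WHERE r1.s = r2.o AND r2.s = r3.s
  AND r1.o = 'Gerd Müller' AND r1.p = 'Name'
  AND r2.p = 'author' AND r3.p = 'Title'
"""


def titles_by_gerd_mueller():
    conn = sqlite3.connect(":memory:")  # assumption: throwaway in-memory database
    conn.execute("CREATE TABLE rdf (s TEXT, p TEXT, o TEXT)")
    conn.executemany("INSERT INTO rdf VALUES (?, ?, ?)", TRIPLES)
    rows = conn.execute(QUERY).fetchall()
    conn.close()
    return [row[0] for row in rows]


titles = titles_by_gerd_mueller()
print(titles)
```

Every additional triple pattern in the question adds another alias of the same table, which is why such queries grow into large self-joins.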
Triple Stores: The Sesame Mapping as Example
(Figure: the Sesame mapping)
See Hak Soo Kim, Hyun Seok Cha, Jungsun Kim, Jin Hyun Son, Development of the Efficient OWL Document Management System for the Embedded Applications, Springer 2005, http://www.springerlink.com/content/8mfxeh0glq5xj00m/
Triple Stores: Indexing Techniques
Use specialised indices for graphs
Bitmap indices in Virtuoso: http://virtuoso.openlinksw.com/wiki/main/Main/VOSBitmapIndexing
Index different combinations of the S,P,O Table
P,S,O
O,P,S
O,S,P
S,O,P
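The idea of indexing several orderings of the S,P,O table can be illustrated with nested dictionaries. This is only a sketch under simplifying assumptions: real stores use B-trees or bitmap indices on disk, and the helper build_index is a hypothetical name.

```python
from collections import defaultdict

TRIPLES = [
    ("http://book.at/isbn123", "author", "http://fussball.de/G. Müller"),
    ("http://book.at/isbn123", "price", "€15"),
    ("http://book.at/isbn123", "Title", "Ein Leben für die Tore"),
    ("http://fussball.de/G. Müller", "Name", "Gerd Müller"),
]

POSITION = {"s": 0, "p": 1, "o": 2}


def build_index(triples, order):
    """Build a two-level index for one ordering, e.g. 'pso' gives index[p][s] = {o, ...}."""
    index = defaultdict(lambda: defaultdict(set))
    for triple in triples:
        first, second, third = (triple[POSITION[key]] for key in order)
        index[first][second].add(third)
    return index


pso = build_index(TRIPLES, "pso")
ops = build_index(TRIPLES, "ops")

# Predicate and subject known: one lookup in the P,S,O index.
authors_of_book = pso["author"]["http://book.at/isbn123"]
# Object and predicate known: one lookup in the O,P,S index.
subjects_named = sorted(ops["Gerd Müller"]["Name"])
print(authors_of_book, subjects_named)
```

Each ordering answers a different access pattern without a table scan, at the price of storing the triples several times.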
Triple Stores: A First Analysis
Normalised view on a graph: one large table
Generic and flexible, but
– Large self-joins for rather simple queries; RDBMS are usually not optimized for this
– Large memory overhead in query processing due to self-joins
– Requires a lot of index lookups and/or full table scans
– Large storage overhead
In general: flexibility vs. performance
How to improve?
Triple Stores: Further Improvements
Property tables: flattened representation by finding sets of properties which are used together
Subject-Property Matrix Materialized Join Views (SPMJVs) from Oracle
Chong, E. I., Das, S., Eadon, G., and Srinivasan, J. 2005. An efficient SQL-based RDF querying scheme. In Proceedings of the 31st International Conference on Very Large Data Bases, ACM
Abadi, D. J., Marcus, A., Madden, S. R., and Hollenbach, K. 2007. Scalable semantic web data management using vertical partitioning. In Proceedings of the 33rd international Conference on Very Large Data Bases (Vienna, Austria, September 23 - 27, 2007). Very Large Data Bases. VLDB Endowment, 411-422.
Triple Stores: Property Tables
++: Faster querying within a property table because subject-subject self-joins are reduced
--: Requires intelligent selection of the properties in the table
– More property columns lead to more null values in the table and therefore to a larger space overhead
– Fewer property columns lead to more property tables and more joins across them
--: Multi-valued properties are hard to manage (e.g. a book has several authors)
Subject | Title | Author | Year
ID1 | "Intro to RDF" | Granitzer | 2006
ID1 | "Intro to RDF" | Tochtermann | 2006
Triple Stores: Vertical Partitioning
Partition the database according to properties: one table per property
Abadi, D. J., Marcus, A., Madden, S. R., and Hollenbach, K. 2007. Scalable semantic web data management using vertical partitioning. In Proceedings of the 33rd International Conference on Very Large Data Bases (Vienna, Austria, September 23 - 27, 2007). VLDB Endowment, 411-422.
Tables are sorted by subject, which allows fast merge joins
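A small sketch of the idea, assuming in-memory lists instead of a column store: triples are split into one (subject, object) table per property, each table is sorted by subject, and two tables are combined with a linear merge join. The function names and the second book (ID2) are hypothetical.

```python
TRIPLES = [
    ("ID1", "title", "Intro to RDF"),
    ("ID1", "author", "Granitzer"),
    ("ID1", "author", "Tochtermann"),
    ("ID1", "year", "2006"),
    ("ID2", "title", "Intro to OWL"),  # hypothetical book without an author
]


def vertical_partition(triples):
    """One table of (subject, object) rows per property, sorted by subject."""
    tables = {}
    for s, p, o in triples:
        tables.setdefault(p, []).append((s, o))
    for rows in tables.values():
        rows.sort()
    return tables


def merge_join(left, right):
    """Join two subject-sorted tables in one pass; returns (s, left_o, right_o) rows."""
    result = []
    i = j = 0
    while i < len(left) and j < len(right):
        if left[i][0] < right[j][0]:
            i += 1
        elif left[i][0] > right[j][0]:
            j += 1
        else:
            subject = left[i][0]
            i_end, j_end = i, j
            # Find the run of rows with the same subject on both sides.
            while i_end < len(left) and left[i_end][0] == subject:
                i_end += 1
            while j_end < len(right) and right[j_end][0] == subject:
                j_end += 1
            for _, left_o in left[i:i_end]:
                for _, right_o in right[j:j_end]:
                    result.append((subject, left_o, right_o))
            i, j = i_end, j_end
    return result


tables = vertical_partition(TRIPLES)
title_author = merge_join(tables["title"], tables["author"])
print(title_author)
```

Multi-valued properties such as the two authors of ID1 simply become two rows in the author table, so no null values or duplicated titles are stored.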
Triple Stores: Vertical Partitioning
++: Use of simple, fast merge joins
++: Multi-valued attributes are supported
++: No a-priori clustering decision is necessary
++: Smaller tables; only those properties accessed have to be read from disk
--: Inserts may be slower due to access to multiple tables
--: Queries over multiple properties span multiple tables
Triple Stores: Performance of Open Source Solutions
Portwin & Parvatikar (2006), Scaling Jena in a Commercial Environment: The Ingenta MetaStore Project
Lehigh data set (university domain)
~200 million triples, 11 million OWL statements, 4.3 million documents
Kowari: 1 billion triples, loads 20k triples/s for the Wikipedia data set
Unoptimized
– Simple queries take milliseconds
– With inference, queries take several seconds to minutes, depending on the complexity
Optimization for inference: for RDFS entailment, expand the graph by making implicit edges explicit
– More storage but faster access
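Making implicit edges explicit can be sketched as forward chaining over two RDFS rules (transitivity of rdfs:subClassOf and inheritance of rdf:type along rdfs:subClassOf). This is an illustration only, not how any particular store implements it; the ex: resources are hypothetical and only these two rules are covered.

```python
RDF_TYPE = "rdf:type"
SUBCLASS_OF = "rdfs:subClassOf"


def materialize(triples):
    """Repeatedly apply the two rules until no new triple can be derived."""
    closed = set(triples)
    while True:
        derived = set()
        for s, p, o in closed:
            if p != SUBCLASS_OF:
                continue
            for s2, p2, o2 in closed:
                # (s subClassOf o), (o subClassOf o2) => (s subClassOf o2)
                if p2 == SUBCLASS_OF and s2 == o:
                    derived.add((s, SUBCLASS_OF, o2))
                # (s2 type s), (s subClassOf o) => (s2 type o)
                if p2 == RDF_TYPE and o2 == s:
                    derived.add((s2, RDF_TYPE, o))
        if derived <= closed:
            return closed
        closed |= derived


explicit = {
    ("ex:ApacheServer", SUBCLASS_OF, "ex:WebServer"),
    ("ex:WebServer", SUBCLASS_OF, "ex:Software"),
    ("ex:httpd1", RDF_TYPE, "ex:ApacheServer"),
}
closed = materialize(explicit)
print(len(explicit), len(closed))
```

After materialization, a query for all instances of ex:Software is a plain lookup, but the store holds twice as many triples in this tiny example.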
Triple Stores: Performance of Oracle
BioMed literature database (UniProt data set)
80 million triples
~5 GB RDF/XML data (~2.5 GB triples; 1.7 GB mapping; 4.8 GB indices)
Queries take milliseconds to seconds
Subject-property matrix materialized views provide an optimization potential of roughly 30%
Chong, E. I., Das, S., Eadon, G., and Srinivasan, J. 2005. An efficient SQL-based RDF querying scheme. In Proceedings of the 31st international Conference on Very Large Data Bases, ACM
Triple Stores: Performance Summary
http://esw.w3.org/topic/LargeTripleStores
Problem: the available performance numbers are hard to compare
Trade-off: generic vs. performance
Optimization potential is available
Currently not as fast as specialised RDBMS, but more flexible
SPARQL: SPARQL Protocol and RDF Query Language
Different languages, similar to SQL in RDBMS
– SeRQL, RDQL, SPARQL
– SPARQL is currently a proposed recommendation of the W3C
But what does querying a graph mean?
Basically
– Specify a sub-graph with variable nodes
– Find all patterns in the graph matching the sub-graph
(Figure: query sub-graph with a variable node ?x that has an author edge to "Gerd Müller" and a title edge to a second variable node ?y)
SELECT ?x ?y WHERE { ?x <author> "Gerd Müller" .
                     ?x <title> ?y . }
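Matching a sub-graph with variable nodes against a graph can be sketched in a few lines. This is a naive nested-loop matcher for illustration, not how a SPARQL engine is implemented; the graph follows the simplified figure (the author edge points directly at the name), and the second book is hypothetical.

```python
def match(pattern, triple, binding):
    """Return an extended copy of binding if pattern matches triple, else None."""
    extended = dict(binding)
    for term, value in zip(pattern, triple):
        if term.startswith("?"):  # variable node
            if extended.setdefault(term, value) != value:
                return None  # variable is already bound to a different value
        elif term != value:  # constant node or edge must match exactly
            return None
    return extended


def query(patterns, graph):
    """Find all variable bindings under which every pattern occurs in the graph."""
    bindings = [{}]
    for pattern in patterns:
        next_bindings = []
        for binding in bindings:
            for triple in graph:
                extended = match(pattern, triple, binding)
                if extended is not None:
                    next_bindings.append(extended)
        bindings = next_bindings
    return bindings


GRAPH = [
    ("http://book.at/isbn123", "author", "Gerd Müller"),
    ("http://book.at/isbn123", "title", "Ein Leben für die Tore"),
    ("http://book.at/isbn456", "author", "Another Author"),  # hypothetical
    ("http://book.at/isbn456", "title", "Another Book"),  # hypothetical
]
PATTERNS = [("?x", "author", "Gerd Müller"), ("?x", "title", "?y")]

results = query(PATTERNS, GRAPH)
print(results)
```

The shared variable ?x is what joins the two triple patterns; no table or schema has to be named in the query.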
SPARQL: Example
Data:
<http://example.org/book/book1> <http://purl.org/dc/elements/1.1/title> "SPARQL Tutorial" .
Query:
SELECT ?title WHERE { <http://example.org/book/book1> <http://purl.org/dc/elements/1.1/title> ?title . }
Result:
title
"SPARQL Tutorial"
SPARQL: Example
Data:
@prefix foaf: <http://xmlns.com/foaf/0.1/> .
_:a foaf:name "Johnny Lee Outlaw" .
_:a foaf:mbox <mailto:[email protected]> .
_:b foaf:name "Peter Goodguy" .
_:b foaf:mbox <mailto:peter@example.org> .
Query:
PREFIX foaf: <http://xmlns.com/foaf/0.1/>
SELECT ?name ?mbox
WHERE { ?x foaf:name ?name .
        ?x foaf:mbox ?mbox }
Result:
name | mbox
"Johnny Lee Outlaw" | <mailto:[email protected]>
"Peter Goodguy" | <mailto:peter@example.org>
SPARQL: Simple Query Elements
Determine the namespace: PREFIX
Determine the return format
– SELECT: table output format similar to SQL results
– CONSTRUCT: constructs a graph as return value
– ASK: returns only true/false depending on whether a result exists
– DESCRIBE: returns possible properties/resources for a particular query; used for browsing
Specify the selection criteria with the WHERE clause
– Specify a non-recursive sub-pattern with triples and placeholders (? or $)
Perform grouping and filter operations
Modifiers: ORDER BY, LIMIT, OFFSET, DISTINCT
SPARQL: Blank Nodes
The ID of a blank node is unique within one query and indicates only the existence of a blank node, not its absolute value
Blank nodes are identified by an automatically generated URI
Consider the results of a query
Result 1 (Subject, Value): _:a "zum"; _:b "Beispiel"
Result 2 (Subject, Value): _:x "zum"; _:y "Beispiel"
Result 3 (Subject, Value): _:z "zum"; _:z "Beispiel"
Result 1 ≡ Result 2 ≠ Result 3
Blank nodes may be renamed and are structural elements only
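That blank node labels are interchangeable as long as the structure is preserved can be checked with a small sketch. It assumes both results list their rows in the same order and looks for a one-to-one renaming of labels; the function name equivalent is hypothetical.

```python
def equivalent(rows_a, rows_b):
    """True if rows_b equals rows_a up to a one-to-one renaming of blank node labels."""
    if len(rows_a) != len(rows_b):
        return False
    forward, backward = {}, {}
    for (label_a, value_a), (label_b, value_b) in zip(rows_a, rows_b):
        if value_a != value_b:
            return False
        # The renaming must be consistent in both directions (a bijection).
        if forward.setdefault(label_a, label_b) != label_b:
            return False
        if backward.setdefault(label_b, label_a) != label_a:
            return False
    return True


result_1 = [("_:a", "zum"), ("_:b", "Beispiel")]
result_2 = [("_:x", "zum"), ("_:y", "Beispiel")]
result_3 = [("_:z", "zum"), ("_:z", "Beispiel")]

same_12 = equivalent(result_1, result_2)
same_13 = equivalent(result_1, result_3)
print(same_12, same_13)
```

Result 3 differs because it claims that both values belong to the same node, which is a structural statement and not just a different label.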
SPARQL: Complex Queries
Combination of groups of simple graph expressions in the WHERE clause
OPTIONAL clause: the sub-graph pattern may not exist
Example for querying book titles from Springer
– If an author exists, it is listed; if not, the title is returned without an author
SELECT ?title ?author
WHERE
{ ?buch ex:pulishedFrom <http://springer.com/Verlag> .
  ?buch ex:Title ?title .
  OPTIONAL { ?buch ex:Autor ?author } .
}
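OPTIONAL behaves like a left outer join on variable bindings: rows from the required pattern are kept even when the optional pattern has no match. A minimal sketch, assuming bindings are plain dictionaries; the book identifiers, titles and author name are hypothetical.

```python
def left_join(required, optional, key):
    """Keep every required row; merge in matching optional rows where they exist."""
    joined = []
    for row in required:
        matches = [extra for extra in optional if extra[key] == row[key]]
        if matches:
            joined.extend({**row, **extra} for extra in matches)
        else:
            joined.append(dict(row))  # no author: the variable stays unbound
    return joined


# Bindings of the required patterns (publisher and title), hypothetical data.
titles = [
    {"buch": "ex:book1", "title": "Semantic Web"},
    {"buch": "ex:book2", "title": "Proceedings"},
]
# Bindings of the OPTIONAL pattern (?buch ex:Autor ?author).
authors = [{"buch": "ex:book1", "author": "A. Author"}]

rows = left_join(titles, authors, "buch")
print(rows)
```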
SPARQL: Complex Queries
Specifying alternative sub-graph patterns: UNION
Logical OR or union of two separate queries
SELECT ?title ?author
WHERE
{ ?buch ex:pulishedFrom <http://springer.com/Verlag> .
  ?buch ex:Title ?title .
  { ?buch ex:Autor ?author . } UNION
  { ?buch ex:Creator ?author . }
}
"Select all books with a title published by Springer which have an author or a creator assigned"
Note: ?author in the different groups are independent of each other
SPARQL: Complex Queries
Considering special datatypes: FILTER and XML datatypes
Specify the data type of a literal
SELECT ?title ?author
WHERE
{ ?buch ex:pulishedFrom <http://springer.com/Verlag> .
  ?buch ex:Title ?title .
  ?buch ex:publishedIn "1998"^^xsd:integer
}
FILTER specifies boolean expressions for filtering results
E.g. specify the data type range using FILTER (see Chapter 7 in Semantic Web Grundlagen)
SELECT ?title ?author
WHERE
{ ?buch ex:pulishedFrom <http://springer.com/Verlag> .
  ?buch ex:publishedIn ?year .
  FILTER(?year > 2000)
}
SPARQL: Real World Examples in DBpedia
PREFIX p: <http://dbpedia.org/property/>
PREFIX rdf: <http://www.w3.org/1999/02/22-rdf-syntax-ns#>
SELECT * WHERE {
  ?album p:artist ?band .
  ?album rdf:type <http://dbpedia.org/class/yago/Album106591815> .
  OPTIONAL { ?album p:cover ?cover } .
  OPTIONAL { ?album p:name ?name } .
  OPTIONAL { ?album p:released ?dateofrelease } .
} ORDER BY DESC(?name) LIMIT 20 OFFSET 19

PREFIX p: <http://dbpedia.org/property/>
PREFIX rdf: <http://www.w3.org/1999/02/22-rdf-syntax-ns#>
CONSTRUCT {
  ?album p:itIsDone ?dateofrelease .
  ?band p:isBand "true" .
} WHERE {
  ?album p:artist ?band .
  ?album rdf:type <http://dbpedia.org/class/yago/Album106591815> .
  OPTIONAL { ?album p:cover ?cover } .
  OPTIONAL { ?album p:name ?name } .
  OPTIONAL { ?album p:released ?dateofrelease } .
} ORDER BY ?name LIMIT 20 OFFSET 19

PREFIX p: <http://dbpedia.org/property/>
PREFIX rdf: <http://www.w3.org/1999/02/22-rdf-syntax-ns#>
DESCRIBE ?album
WHERE {
  ?album p:artist <http://dbpedia.org/resource/The_Allman_Brothers_Band> .
  ?album rdf:type <http://dbpedia.org/class/yago/Album106591815> .
}
SPARQL: Summary
Similar to SQL
Allows easier expression of joins without knowing the underlying database schema
Allows returning not only tables but also more complex output formats such as graphs
Datatypes of a variable are not always clear
http://www.w3.org/TR/rdf-sparql-query/
http://thefigtrees.net/lee/sw/sparql-faq
Hitzler, Krötzsch, Rudolph, Sure, Semantic Web – Grundlagen, Chapter 7
Semantic vs. Information Retrieval: Overview
Central question: What is semantic retrieval?
Define information retrieval
Where is semantics missing?
How can we use Semantic Web technology to increase semantics?
Semantic vs. Information Retrieval: Definition of IR
Salton (1968): "Information retrieval is a field concerned with the structure, analysis, organization, storage, searching, and retrieval of information."
"Information retrieval (IR) is finding material (usually documents) of an unstructured nature (usually text) that satisfies an information need from within large collections (usually stored on computers)." Christopher D. Manning, Prabhakar Raghavan and Hinrich Schütze, Introduction to Information Retrieval, Cambridge University Press, 2008
Main focus of IR is how to deal with uncertainty and incomplete information
– Representation of documents is ambiguous
– Query formulation is ambiguous and usually incomplete
– "Unstructured" information
– Usually the perfect answer, as far as a perfect answer exists, cannot be delivered
Semantic vs. Information Retrieval: IR vs. Data Retrieval (van Rijsbergen 1979)
Aspect | Data retrieval | Information retrieval
Matching | Exact match | Partial (best) match
Inference | Deduction | Induction
Model | Deterministic | Probabilistic
Classification | Monothetic | Polythetic
Query language | Artificial | Natural
Query specification | Complete | Incomplete
Items wanted | Matching | Relevant
Error response | Sensitive | Insensitive
Semantic vs. Information Retrieval: IR vs. Data Retrieval (van Rijsbergen 1979)
"What is the gross domestic product of Austria?"
SELECT GDP FROM GDP_table WHERE country_name = 'Austria'
€ 270.8 bn
However,
– Not all information is available in databases
– Queries are hard to formulate for average users as well as for non-domain experts
– The more complex the domain and the information need, the harder it is to formulate a correct query
Semantic vs. Information Retrieval: Basic Retrieval Workflow
(Figure: retrieval workflow with the components Documents D, Document Representation Dr, Retrieval Model M, Ranking Function R, Query Q and Information Need IN)
See also Baeza-Yates & Ribeiro-Neto (1999), "Modern Information Retrieval"
Semantic vs. Information Retrieval: The Vector Space Model
Document representation Dr: documents are represented as a bag of words (i.e. an unordered collection of words)
Query Q: the query is a set of keywords
Retrieval model M:
– Sets of words are converted to vectors d and q
– Different heuristics are used to calculate the importance of a word
Ranking function R:
– Cosine similarity: calculate the angle between d and q
d1 := "Boy plays chess"
d2 := "Boy plays bridge"
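The ranking by cosine similarity can be sketched for the two example documents. The sketch assumes raw term counts as word importance (one of many possible heuristics; weighting schemes such as TF-IDF are common alternatives), and the query "chess" is a hypothetical example.

```python
import math
from collections import Counter


def vectorize(text):
    """Bag of words: map each lower-cased word to its number of occurrences."""
    return Counter(text.lower().split())


def cosine(a, b):
    """Cosine of the angle between two sparse term vectors."""
    dot = sum(a[term] * b[term] for term in a.keys() & b.keys())
    norm_a = math.sqrt(sum(weight * weight for weight in a.values()))
    norm_b = math.sqrt(sum(weight * weight for weight in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


d1 = vectorize("Boy plays chess")
d2 = vectorize("Boy plays bridge")
q = vectorize("chess")  # hypothetical keyword query

sim_d1_d2 = cosine(d1, d2)  # two of three words are shared
sim_q_d1 = cosine(q, d1)
sim_q_d2 = cosine(q, d2)  # no shared word, so the ranking value is zero
print(sim_d1_d2, sim_q_d1, sim_q_d2)
```

The zero score for d2 shows the purely syntactic nature of the model: a document only scores if it shares literal words with the query.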
Semantic vs. Information Retrieval: An Analysis of the Vector Space Model
Query and documents are represented in terms of their words
Importance of words depends on their occurrence
Syntactic matching between documents and queries
– No synonyms are considered (e.g. money == cash)
– No homonyms are considered (e.g. Apache Web Server)
– No meronyms are considered (e.g. a tire is part of a car)
– No relationships between terms are considered
Semantic vs. Information Retrieval: So where can we include semantics?
Increase the semantics of the document representation Dr and the query Q
– Add metadata (e.g. tags, Dublin Core etc.)
– Use more sophisticated preprocessing (e.g. language models, word sense disambiguation)
– Allow users to express information needs in more detail or estimate the context of a user (e.g. specify metadata, profiling)
– Formal representation of Dr and Q using Semantic Web languages like OWL; see Tran, Bloehdorn, Cimiano, Haase (2007), "Expressive Resource Description for Ontology-Based Information Retrieval"
However, even with a perfect formal representation we still need to transform natural language queries into this model for the average user
– Requires a special user interface, which is not possible for the generic case
– Natural language understanding is currently unsolved
Semantic vs. Information Retrieval: So where can we include semantics?
Document representation
(Figure: C:\myDocument.doc is linked via ex:containsConcept to http://en.wikipedia.org/wiki/ApacheWebServer, which has rdf:type ex:Concept and the ex:term values "Apache", "Apache Server" and "Apache Web Server")
Query
SELECT * WHERE { ?x ex:containsConcept <http://en.wikipedia.org/wiki/ApacheWebServer> }
Semantic vs. Information Retrieval: So where can we include semantics?
Iterative refinement of the information need:
Keyword query: "Apache"
Candidate concepts:
– http://en.wikipedia.org/wiki/ApacheWebServer
– http://en.wikipedia.org/wiki/ApacheHelicopter
– http://en.wikipedia.org/wiki/AmericanNatives
SELECT * WHERE { ?x ex:containsConcept <http://en.wikipedia.org/wiki/ApacheWebServer> }
(Figure: C:\myDocument.doc is linked via ex:containsConcept to http://en.wikipedia.org/wiki/ApacheWebServer, which has the ex:term values "Apache", "Apache Server" and "Apache Web Server"; C:\myDocument2.doc is linked via ex:containsConcept to http://en.wikipedia.org/wiki/ApacheTribes)
Semantic vs. Information RetrievalSo where can we include semantic? Semantic RetrievalSo where can we include semantic?
Increase the capability of the retrieval model M and the ranking function R
Latent Semantic Indexing/Concept Indexing
– Automatically determine the concepts contained in a document set
Include a-priori knowledge (e.g. thesaurus, WordNet)
Learn ranking functions based on a user's feedback (e.g. via machine learning)
Use formal knowledge in the form of ontologies and reasoning capabilities
Semantic vs. Information Retrieval: So where can we include semantics?
Example
Vector Space Model
– D={Apache=0.8, http=0.5, server=0.3}
– Q={Jetty=0.8, java=0.7, web=0.4}
– Ranking Value=0.0
Introduce a „better“ retrieval model by using a domain ontology:
Figure: ex:ApacheServer and ex:JettyServer are each linked via ex:isA to ex:WebServer; ex:ApacheServer has the ex:term labels „Apache“ and „Apache Web Server“, ex:JettyServer has „Jetty“ and „Jetty Server“.
„Apache“ and „Jetty“ can be related to each other using the domain ontology
Ranking Value > 0
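The effect of the domain ontology on the ranking value can be reproduced with a small sketch. Several things here are assumptions and not taken from the slides: cosine similarity stands in for the ranking function, and the expansion rule (add the ex:isA parent concept with a damped weight PARENT_WEIGHT) is just one simple way to use the ontology.

```python
import math

# Document and query vectors from the slide.
D = {"Apache": 0.8, "http": 0.5, "server": 0.3}
Q = {"Jetty": 0.8, "java": 0.7, "web": 0.4}

# Tiny domain ontology: term -> concept (ex:term) and concept -> parent (ex:isA).
TERM_TO_CONCEPT = {"Apache": "ex:ApacheServer", "Jetty": "ex:JettyServer"}
IS_A = {"ex:ApacheServer": "ex:WebServer", "ex:JettyServer": "ex:WebServer"}
PARENT_WEIGHT = 0.5  # assumed damping factor for inherited concepts


def cosine(a, b):
    """Cosine similarity between two sparse vectors stored as dicts."""
    dot = sum(w * b.get(t, 0.0) for t, w in a.items())
    norm = (math.sqrt(sum(w * w for w in a.values()))
            * math.sqrt(sum(w * w for w in b.values())))
    return dot / norm if norm else 0.0


def expand(vector):
    """Add the parent concept of every known term as an extra dimension."""
    expanded = dict(vector)
    for term, weight in vector.items():
        parent = IS_A.get(TERM_TO_CONCEPT.get(term))
        if parent is not None:
            expanded[parent] = max(expanded.get(parent, 0.0),
                                   weight * PARENT_WEIGHT)
    return expanded


print(cosine(D, Q))                  # no shared terms
print(cosine(expand(D), expand(Q)))  # both now share ex:WebServer
```

Without expansion the vectors share no dimension, so the score is zero; after expansion both contain ex:WebServer and the score becomes positive.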
Semantic vs. Information Retrieval: So where can we include semantics?
Improve the presentation of results
Clustering of search results
Display different facets of the result set
Different representation of results
Display facts instead of documents
Supports the user in refining their information need
Search as an iterative approach
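Faceted presentation of a result set can be sketched as counting metadata values and filtering on the value the user selects. The result records, field names and helper functions below are hypothetical and only serve as an illustration.

```python
from collections import Counter

# Hypothetical result set: each hit carries structured metadata next to its title.
RESULTS = [
    {"title": "Apache HTTP Server", "type": "Software", "language": "C"},
    {"title": "Jetty", "type": "Software", "language": "Java"},
    {"title": "Apache Tomcat", "type": "Software", "language": "Java"},
    {"title": "AH-64 Apache", "type": "Aircraft", "language": None},
]


def facets(results, field):
    """Count how many hits fall under each value of the given field."""
    return Counter(r[field] for r in results if r.get(field) is not None)


def refine(results, field, value):
    """Keep only the hits whose field has the value the user clicked on."""
    return [r for r in results if r.get(field) == value]


print(facets(RESULTS, "type"))
print([r["title"] for r in refine(RESULTS, "language", "Java")])
```

Each click on a facet value narrows the result set, which is what makes the search iterative.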
Semantic vs. Information Retrieval: Dimensions of Semantics
Semantically structured vs. unstructured document and query representation
Semantic expressiveness increases with increased structure
Queries are hard to formalize. Support for the average user is required
Labour-intensive creation of the document representation
Extension of the retrieval model
Runtime complexity of reasoning in the case of semantically structured Dr and Q
Scalability (also an issue for more complex statistical methods)
Semantic vs. Information Retrieval: Dimensions of Semantics
Semantic enhancement of result presentation
„Low hanging fruit“
Does not require a formalized knowledge base
Formalized knowledge vs. statistical approaches
Bottom Up vs. Top Down
Statistical approaches can provide sophisticated retrieval models that do not require formalized, modelled knowledge
Statistical approaches depend on the fact that all required information is within the data set and can be extracted
Statistical models are usually not shareable between systems
A knowledge base may not model all facts contained in the data set (e.g. WordNet knows nothing about the Apache Web Server)
Semantic vs. Information Retrieval: Example Freebase, www.freebase.com
Open database of the world's information
Contribution by the community
Linked with other free resources like Wikipedia
Web API
Own query language: Metaweb Query Language
Regarding semantic search
Structural document representation
Keyword queries in combination with intelligent interfaces for information need refinement
Fact-based representation
Semantic vs. Information Retrieval: Example Cuil, www.cuil.com
Internet search engine with 120 billion pages
Not based on the popularity of sites, just on content and topics of content
Document representation and query are unstructured
Extended retrieval model in the background, based on categorical knowledge and statistical methods (clustering)
http://www.news.com.au/technology/story/0,25642,24089734-5014239,00.html
Enhanced interface
Semantic vs. Information Retrieval: Example Yahoo! SearchMonkey
Use structured data to improve presentation of search results
http://developer.yahoo.com/searchmonkey/#
Summary
Triple Store
Genericity/Flexibility vs. Performance/Space
Usually RDBMS with a large triple table
Alternative: special graph indexing structures
SPARQL
Simple query for RDF by providing a graph pattern
Recommended by the W3C
Semantic Search
Document & query representation
Retrieval model & Interfaces
Semantic vs. Statistical
It is the next step, but not such a big one
Next Week
Guest lectures (attendance mandatory) from experts
Werner Klieber: Semantic Web Services (30')
Aatif Latif: Open Linked Data (30')
Fleur Jeanquartier: Bringing the Semantic Web closer to the User (30')
That's it for today…
Thanks for your attention
Questions/comments?
License
This work is licensed under the Creative Commons Attribution 2.0 Austria License. To view a copy of this license, visit http://creativecommons.org/licenses/by/2.0/at/.
Contributors:
Mathias Lux
Peter Scheir
Klaus Tochtermann
Michael Granitzer