
Page 1: Wissenstechnologie Vi 08 09

Wissenstechnologie WS 08/09

Michael Granitzer

IWM TU Graz & Know-Center

Lecture 6: Triple Stores, SPARQL, Semantic Retrieval

http://kmi.tugraz.at http://www.know-center.at

This work is licensed under the Creative Commons Attribution 2.0 Austria License. To view a copy of this license, visit http://creativecommons.org/licenses/by/2.0/at/.

Page 2: Wissenstechnologie Vi 08 09

Today

Ontology Modelling & SW Frameworks

Triple Stores
  • Basic RDBMS scheme
  • Property tables & vertical partitioning
  • Performance comparisons

SPARQL
  • Definition
  • Simple & complex queries
  • Some examples on endpoints

Information Retrieval vs. Semantic Retrieval
  • Basics of IR
  • "Semantic" retrieval
  • Practical examples (Freebase, Cugil etc.)

WS 08/09

http://kmi.tugraz.at

Wissenstechnologie @ kmi.tugraz.at

Page 3: Wissenstechnologie Vi 08 09

Ontology Modelling & SW Frameworks

Triple Stores

SPARQL

Information Retrieval vs. Semantic Retrieval

Ontology Modelling vs. OOP

Similar to design in Object Oriented Programming

Classes, objects and members

Capture the operational properties

public interface Course {
    public void enroll();
}

Ontology Modelling: Capture the structural properties

Diagram labels: owl:Course, owl:participates, owl:Student

Page 4: Wissenstechnologie Vi 08 09

Ontology Modelling vs. RDBMS

Similar to designing a database system

Higher expressiveness in OWL

Agreement on the domain, not only referential integrity

Not focused on special indexing structures or on querying only

Ontologies should be application independent

Consistency checks

Semantic integration via ontologies

Diagram labels: Product Database, File System, Employee Database, Text Database, ...

Page 5: Wissenstechnologie Vi 08 09

Ontology Modelling: Goals

Goals
  • Share common understanding among people or software
  • Enable reuse of knowledge
  • Make domain assumptions explicit
  • Separate domain knowledge from operational knowledge
  • Analyze domain knowledge

Main Application Areas
  • Semantic harmonization of heterogeneous data sources
  • Structuring the content of a portal
  • Enhance search and retrieval

Page 6: Wissenstechnologie Vi 08 09

Ontology Modelling: Aspects to model

  • Defining classes in the ontology
  • Arranging classes in a taxonomy
  • Defining slots/properties for classes and their values
  • Defining logical constraints on classes/properties
  • Assigning instances

Page 7: Wissenstechnologie Vi 08 09

Ontology Modelling: Three simple rules

1. "There is no one correct way to model a domain; there are always viable alternatives. The best solution almost always depends on the application that you have in mind and the extensions that you anticipate."

2. "Concepts in the ontology should be close to objects (physical or logical) and relationships in your domain of interest. These are most likely to be nouns (objects) or verbs (relationships) in sentences that describe your domain."

3. "Ontology development is necessarily an iterative process."

Page 8: Wissenstechnologie Vi 08 09

Ontology Modelling: Noy's and McGuinness' 7 Steps

1. Determine the domain and scope of the ontology

2. Consider reusing existing ontologies

3. Enumerate important terms in the ontology

4. Define the classes and the class hierarchy

5. Define the properties (slots) of classes

6. Define the facets of the slots

7. Create/import instances

Page 9: Wissenstechnologie Vi 08 09

Semantic Web Frameworks: Motivation

Protege as modelling GUI

For "Semantic Web applications" we also want to
  • Automatically import/map instances
  • Manage a large number of triples
  • Combine different schemas
  • Query for specific triples
  • Harmonize different metadata schemas

Database requirements for graphs

Reasoning

Page 10: Wissenstechnologie Vi 08 09

Semantic Web Frameworks: Overview

Three "major" Java-based open source frameworks
  • Jena
  • Sesame
  • Protege Java API

Functionality
  • Java API for managing OWL, RDF and RDFS (optionally DAML+OIL)
  • Import/export of different formats
  • Persistence via own data store, different database and file system backends
  • Querying, graph manipulation and restricted reasoning capabilities
  • Web API

Page 11: Wissenstechnologie Vi 08 09

Semantic Web Frameworks: Jena Architecture

Figure (not reproduced): Jena architecture diagram; surviving labels: SPARQL, RDF/XML

Source: Jena: Implementing the Semantic Web Recommendations, 2003, http://www.hpl.hp.com/techreports/2003/HPL-2003-146.html

Page 12: Wissenstechnologie Vi 08 09

Semantic Web Frameworks: Main Differences

Jena
  • Reference implementation
  • Not directly focused on web access and scalability

Protege
  • Modelling GUI

Sesame
  • Focused on remote access and scalability
  • Flexible layer architecture for different storage backends

Others: Virtuoso, 3Store, Kowari, OpenAnzo

Page 13: Wissenstechnologie Vi 08 09

Triple Stores: Overview

Basic data model is RDF (i.e. OWL, RDFS)

RDF forms a directed graph

How do we manage large graphs?
  • In memory: adjacency matrix
  • On secondary storage
    – Special indices
    – Use relational database management systems

Page 14: Wissenstechnologie Vi 08 09

Triple Stores: "Normalized" Table Model of RDF

Subject                        Predicate   Object
http://book.at/isbn123         author      http://fussball.de/G. Müller
http://book.at/isbn123         price       €15
http://book.at/isbn123         Title       Ein Leben für die Tore
http://fussball.de/G. Müller   Name        Gerd Müller

Graph view of the same data: the node http://book.at/isbn123 has the outgoing edges author (to http://fussball.de/G. Müller), price (to €15) and title (to Ein Leben für die Tore); the node http://fussball.de/G. Müller has the outgoing edge name (to Gerd Müller).

Page 15: Wissenstechnologie Vi 08 09

Triple Stores: Query in an unoptimized RDBMS

Query: Titles of books by the person with the name Gerd Müller?

SELECT r3.o AS Title
FROM rdf r1, rdf r2, rdf r3
WHERE r1.s = r2.o AND r2.s = r3.s
  AND r1.o = 'Gerd Müller' AND r1.p = 'Name'
  AND r2.p = 'author' AND r3.p = 'Title'

Subject (s)                    Predicate (p)   Object (o)
http://book.at/isbn123         author          http://fussball.de/G. Müller
http://book.at/isbn123         price           €15
http://book.at/isbn123         Title           Ein Leben für die Tore
http://fussball.de/G. Müller   Name            Gerd Müller
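The three-way self-join on this slide can be tried out with a minimal sketch, assuming an in-memory SQLite database and a single table named rdf with columns s, p and o. The table name follows the slide; the use of SQLite is an assumption made only for illustration.

```python
import sqlite3

# Assumption: one generic triple table "rdf" (s, p, o), held in memory.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE rdf (s TEXT, p TEXT, o TEXT)")
triples = [
    ("http://book.at/isbn123", "author", "http://fussball.de/G. Müller"),
    ("http://book.at/isbn123", "price", "€15"),
    ("http://book.at/isbn123", "Title", "Ein Leben für die Tore"),
    ("http://fussball.de/G. Müller", "Name", "Gerd Müller"),
]
conn.executemany("INSERT INTO rdf VALUES (?, ?, ?)", triples)

# r1 finds the person by name, r2 finds books with that author,
# r3 fetches the title of each such book: two self-joins for one simple question.
query = """
SELECT r3.o AS Title
FROM rdf r1, rdf r2, rdf r3
WHERE r1.s = r2.o AND r2.s = r3.s
  AND r1.o = 'Gerd Müller' AND r1.p = 'Name'
  AND r2.p = 'author' AND r3.p = 'Title'
"""
titles = [row[0] for row in conn.execute(query)]
print(titles)
```

Each additional triple pattern in the question adds another alias of the same table, which is the cost the following slides try to reduce.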

Page 16: Wissenstechnologie Vi 08 09

Triple Stores: The Sesame Mapping as example

Figure (not reproduced): the Sesame mapping

See Hak Soo Kim, Hyun Seok Cha, Jungsun Kim, Jin Hyun Son, Development of the Efficient OWL Document Management System for the Embedded Applications, Springer 2005, http://www.springerlink.com/content/8mfxeh0glq5xj00m/

Page 17: Wissenstechnologie Vi 08 09

Triple Stores: Indexing Techniques

Use specialised indices for graphs

Bitmap indices in Virtuoso: http://virtuoso.openlinksw.com/wiki/main/Main/VOSBitmapIndexing

Index different combinations of the S,P,O table
  • P,S,O
  • O,P,S
  • O,S,P
  • S,O,P
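To illustrate why several column orders are indexed, here is a small sketch that builds nested dictionary indices over the same triples. The function build_index and the in-memory dictionaries are assumptions for illustration; a real store would use on-disk B-trees or bitmap indices instead.

```python
from collections import defaultdict

def build_index(triples, order):
    """Build a two-level index for one column order such as "spo" or "pos"."""
    column = {"s": 0, "p": 1, "o": 2}
    index = defaultdict(lambda: defaultdict(set))
    for triple in triples:
        first, second, third = (triple[column[k]] for k in order)
        index[first][second].add(third)
    return index

triples = [
    ("http://book.at/isbn123", "author", "http://fussball.de/G. Müller"),
    ("http://book.at/isbn123", "price", "€15"),
    ("http://book.at/isbn123", "Title", "Ein Leben für die Tore"),
    ("http://fussball.de/G. Müller", "Name", "Gerd Müller"),
]

spo = build_index(triples, "spo")      # all properties of a known subject
pos_index = build_index(triples, "pos")  # subjects for a known predicate/object pair

# Known predicate and object, unknown subject: answered by the P,O,S order.
subjects = pos_index["Name"]["Gerd Müller"]
# Known subject, unknown predicates: answered by the S,P,O order.
predicates = sorted(spo["http://book.at/isbn123"])
print(subjects, predicates)
```

Which orders are worth materializing depends on which positions of a triple pattern are usually bound in the query workload.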

Page 18: Wissenstechnologie Vi 08 09

Triple Stores: A first analysis

Normalised view on a graph: one large table

Generic and flexible, but
  • Large self-joins for rather simple queries; RDBMS are usually not optimized for this
  • Large memory overhead in query processing due to self-joins
  • Requires a lot of index lookups and/or full table scans
  • Large storage overhead

In general: flexibility vs. performance

How to improve?

Page 19: Wissenstechnologie Vi 08 09

Triple Stores: Further improvements

Property tables: flattened representation obtained by finding sets of properties which are used together

Subject-Property Matrix Materialized Join Views (SPMJVs) from Oracle
Chong, E. I., Das, S., Eadon, G., and Srinivasan, J. 2005. An efficient SQL-based RDF querying scheme. In Proceedings of the 31st International Conference on Very Large Data Bases, ACM

Abadi, D. J., Marcus, A., Madden, S. R., and Hollenbach, K. 2007. Scalable semantic web data management using vertical partitioning. In Proceedings of the 33rd International Conference on Very Large Data Bases (Vienna, Austria, September 23 - 27, 2007). VLDB Endowment, 411-422.

Page 20: Wissenstechnologie Vi 08 09

Triple Stores: Property Tables

++: Faster querying within a property table because subject-subject self-joins are reduced

--: Requires intelligent selection of the properties in the table
  • More property columns lead to more null values in the table and therefore to a larger space overhead
  • Fewer property columns lead to more property tables and more joins across them

--: Multi-valued properties are hard to manage (e.g. a book has several authors)

Subject   Title            Author        Year
ID1       "Intro to RDF"   Granitzer     2006
ID1       "Intro to RDF"   Tochtermann   2006
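A minimal sketch of a property table, assuming SQLite and a table named book with the columns from the slide. The row for ID2 is hypothetical and was added only to show a null value.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE book (subject TEXT, title TEXT, author TEXT, year INTEGER)"
)
rows = [
    ("ID1", "Intro to RDF", "Granitzer", 2006),
    ("ID1", "Intro to RDF", "Tochtermann", 2006),  # second author duplicates the row
    ("ID2", "Untitled draft", None, None),         # hypothetical subject without author/year
]
conn.executemany("INSERT INTO book VALUES (?, ?, ?, ?)", rows)

# Title and year of a given author's books: one table scan, no self-join.
hits = conn.execute(
    "SELECT title, year FROM book WHERE author = 'Granitzer'"
).fetchall()

# The two drawbacks named on the slide show up directly in the data.
rows_for_id1 = conn.execute(
    "SELECT COUNT(*) FROM book WHERE subject = 'ID1'"
).fetchone()[0]
null_authors = conn.execute(
    "SELECT COUNT(*) FROM book WHERE author IS NULL"
).fetchone()[0]
print(hits, rows_for_id1, null_authors)
```

The multi-valued author forces either duplicated rows, as here, or a separate table for that property.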

Page 21: Wissenstechnologie Vi 08 09

Triple Stores: Vertical Partitioning

Partition the database according to properties: one table per property

Abadi, D. J., Marcus, A., Madden, S. R., and Hollenbach, K. 2007. Scalable semantic web data management using vertical partitioning. In Proceedings of the 33rd International Conference on Very Large Data Bases (Vienna, Austria, September 23 - 27, 2007). VLDB Endowment, 411-422.

Tables are sorted by subject, which allows fast merge joins

Page 22: Wissenstechnologie Vi 08 09

Triple Stores: Vertical Partitioning

++: Use of simple, fast merge joins

++: Multi-valued attributes are supported

++: No a-priori clustering decision is necessary

++: Smaller tables; only the properties that are accessed have to be read from disk

--: Inserts may be slower due to access to multiple tables

--: Queries over multiple properties span multiple tables
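The merge join over subject-sorted property tables can be sketched as follows. Each property table is modelled here as a Python list of (subject, value) pairs; the function merge_join and the ID2 row are assumptions for illustration, not code from the cited paper.

```python
def merge_join(left, right):
    """Join two lists of (subject, value) pairs that are sorted by subject."""
    out = []
    i = j = 0
    while i < len(left) and j < len(right):
        ls, rs = left[i][0], right[j][0]
        if ls < rs:
            i += 1
        elif ls > rs:
            j += 1
        else:
            # Find the run of rows with this subject on both sides.
            i_end = i
            while i_end < len(left) and left[i_end][0] == ls:
                i_end += 1
            j_end = j
            while j_end < len(right) and right[j_end][0] == ls:
                j_end += 1
            # Multi-valued properties simply produce several output rows.
            for _, lv in left[i:i_end]:
                for _, rv in right[j:j_end]:
                    out.append((ls, lv, rv))
            i, j = i_end, j_end
    return out

# One two-column table per property, each sorted by subject.
title = sorted([("ID1", "Intro to RDF"), ("ID2", "Hypothetical Title")])
author = sorted([("ID1", "Granitzer"), ("ID1", "Tochtermann")])

joined = merge_join(title, author)
print(joined)
```

Both inputs are read once from front to back, which is why keeping the tables sorted by subject pays off.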

Page 23: Wissenstechnologie Vi 08 09

Triple Stores: Performance of Open Source Solutions

Portwin & Parvatikar (2006), Scaling Jena in a Commercial Environment: The Ingenta MetaStore Project
  • Lehigh dataset (university domain)
  • ~200 million triples, 11 million OWL statements, 4.3 million documents

Kowari: 1 billion triples; loads 20k triples/s for the Wikipedia data set

Unoptimized
  • Simple queries take milliseconds
  • With inference, queries take several seconds to minutes depending on the complexity

Optimization for inference: for RDFS entailment, expand the graph by making implicit edges explicit
  • More storage but faster access
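Making implicit edges explicit can be sketched as a fixpoint computation. The example below applies only two RDFS rules (transitivity of rdfs:subClassOf and type inheritance along rdfs:subClassOf); the class names and the function materialize are hypothetical, and a full RDFS reasoner covers more rules than this.

```python
SUBCLASS = "rdfs:subClassOf"
TYPE = "rdf:type"

def materialize(triples):
    """Add implied subclass and type triples until nothing new appears."""
    closed = set(triples)
    changed = True
    while changed:
        new = set()
        subclass_edges = [(s, o) for s, p, o in closed if p == SUBCLASS]
        for sub, sup in subclass_edges:
            for s, p, o in closed:
                if p == SUBCLASS and s == sup:
                    new.add((sub, SUBCLASS, o))   # A ⊑ B, B ⊑ C  =>  A ⊑ C
                if p == TYPE and o == sub:
                    new.add((s, TYPE, sup))       # x : A, A ⊑ B  =>  x : B
        changed = not new <= closed
        closed |= new
    return closed

graph = {
    ("ex:Student", SUBCLASS, "ex:Person"),
    ("ex:Person", SUBCLASS, "ex:Agent"),
    ("ex:anna", TYPE, "ex:Student"),
}
expanded = materialize(graph)
print(len(graph), len(expanded))
```

After materialization, the question "is ex:anna an ex:Agent?" is a plain lookup, at the price of storing three extra triples for three original ones.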

Page 24: Wissenstechnologie Vi 08 09

Triple Stores: Performance of Oracle

BioMed literature database (UniProt data set)
  • 80 million triples
  • ~5 GB RDF/XML data (~2.5 GB triples; 1.7 GB mapping; 4.8 GB indices)

Queries take milliseconds to seconds

Subject-property matrix materialized views provide an optimization potential of roughly 30%

Chong, E. I., Das, S., Eadon, G., and Srinivasan, J. 2005. An efficient SQL-based RDF querying scheme. In Proceedings of the 31st International Conference on Very Large Data Bases, ACM

Page 25: Wissenstechnologie Vi 08 09

Triple Stores: Performance Summary

http://esw.w3.org/topic/LargeTripleStores

Problem: comparison among the available performance numbers

Trade-off: generic vs. performance

Optimization potential is available

Currently not as fast as a specialised RDBMS, but more flexible

Page 26: Wissenstechnologie Vi 08 09

SPARQL: SPARQL Protocol and RDF Query Language

Different languages, similar to SQL in RDBMS
  • SeRQL, RDQL, SPARQL
  • SPARQL is currently a proposed recommendation of the W3C

But what does querying a graph mean?

Basically
  • Specify a sub-graph with variable nodes
  • Find all patterns in the graph matching the sub-graph

Pattern (diagram): a variable node ?x with an author edge to "Gerd Müller" and a title edge to a second variable node ?y

SELECT ?x ?y WHERE {
  ?x <author> "Gerd Müller" .
  ?x <title> ?y .
}
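Sub-graph matching with variable nodes can be sketched as a small backtracking matcher. Triples and patterns are plain tuples and variables are strings starting with "?"; the function match and the book2 data are assumptions for illustration and say nothing about how a particular SPARQL engine works internally.

```python
def match(patterns, triples, binding=None):
    """Return every variable binding under which all patterns occur in triples."""
    binding = binding or {}
    if not patterns:
        return [binding]
    first, rest = patterns[0], patterns[1:]
    results = []
    for triple in triples:
        candidate = dict(binding)
        ok = True
        for term, value in zip(first, triple):
            if term.startswith("?"):
                # Bind the variable, or check it against an earlier binding.
                if candidate.setdefault(term, value) != value:
                    ok = False
                    break
            elif term != value:
                ok = False
                break
        if ok:
            results.extend(match(rest, triples, candidate))
    return results

triples = [
    ("book1", "author", "Gerd Müller"),
    ("book1", "title", "Ein Leben für die Tore"),
    ("book2", "author", "Another Author"),   # hypothetical extra data
    ("book2", "title", "Another Title"),
]
pattern = [("?x", "author", "Gerd Müller"), ("?x", "title", "?y")]
solutions = match(pattern, triples)
print(solutions)
```

The shared variable ?x plays the role that the join condition r2.s = r3.s plays in the SQL formulation.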

Page 27: Wissenstechnologie Vi 08 09

SPARQL: Example

Data:

<http://example.org/book/book1> <http://purl.org/dc/elements/1.1/title> "SPARQL Tutorial" .

Query:

SELECT ?title WHERE { <http://example.org/book/book1> <http://purl.org/dc/elements/1.1/title> ?title . }

Result:

title
"SPARQL Tutorial"

Page 28: Wissenstechnologie Vi 08 09

SPARQL: Example

Data:

@prefix foaf: <http://xmlns.com/foaf/0.1/> .

_:a foaf:name "Johnny Lee Outlaw" .
_:a foaf:mbox <mailto:[email protected]> .
_:b foaf:name "Peter Goodguy" .
_:b foaf:mbox <mailto:peter@example.org> .

Query:

PREFIX foaf: <http://xmlns.com/foaf/0.1/>
SELECT ?name ?mbox
WHERE {
  ?x foaf:name ?name .
  ?x foaf:mbox ?mbox
}

Result:

name                  mbox
"Johnny Lee Outlaw"   <mailto:[email protected]>
"Peter Goodguy"       <mailto:peter@example.org>

Page 29: Wissenstechnologie Vi 08 09

SPARQL: Simple Query Elements

Determine the namespace: PREFIX

Determine the return format
  • SELECT: table output format similar to SQL results
  • CONSTRUCT: constructs a graph as return value
  • ASK: returns only true/false depending on whether a result exists or not
  • DESCRIBE: returns possible properties/resources for a particular query; used for browsing

Specify the selection criteria with the WHERE clause
  • Specify a non-recursive sub-pattern with triples and placeholders (? or $)
  • Perform grouping and filter operations

Modifiers: ORDER BY, LIMIT, OFFSET, DISTINCT
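The difference between the return formats can be sketched on top of a list of variable bindings, as a WHERE clause would produce them. The helper names ask, construct and apply_modifiers and the sample bindings are hypothetical; they only mimic the behaviour described on the slide.

```python
def ask(bindings):
    """ASK: true if at least one solution exists."""
    return len(bindings) > 0

def construct(template, bindings):
    """CONSTRUCT: instantiate the template triples once per solution."""
    graph = set()
    for b in bindings:
        for triple in template:
            graph.add(tuple(b.get(term, term) for term in triple))
    return graph

def apply_modifiers(bindings, order_by, limit=None, offset=0):
    """ORDER BY, then OFFSET, then LIMIT, applied to a SELECT result."""
    ordered = sorted(bindings, key=lambda b: b[order_by])
    end = None if limit is None else offset + limit
    return ordered[offset:end]

# Hypothetical solutions of a WHERE clause.
bindings = [
    {"?album": "a2", "?band": "x"},
    {"?album": "a1", "?band": "x"},
]

exists = ask(bindings)
graph = construct([("?band", "ex:isBand", "true")], bindings)
page = apply_modifiers(bindings, "?album", limit=1, offset=1)
print(exists, graph, page)
```

Because a graph is a set of triples, the two solutions above collapse into a single constructed triple, whereas SELECT keeps both rows.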

Page 30: Wissenstechnologie Vi 08 09

SPARQL: Blank Nodes

The ID of a blank node is unique only within one query result; it indicates the existence of a blank node, not its absolute value

Blank nodes are identified by an automatically generated URI

Consider the results of a query:

Result A             Result B             Result C
Subject  Value       Subject  Value       Subject  Value
_:a      "zum"       _:x      "zum"       _:z      "zum"
_:b      "Beispiel"  _:y      "Beispiel"  _:z      "Beispiel"

Result A ≡ Result B, but Result B ≠ Result C

Blank nodes may be renamed and are structural elements only
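The equivalence of results under renaming of blank nodes can be checked with a short sketch. It assumes, as a simplification, that both results list their rows in the same order; the function same_up_to_renaming is hypothetical.

```python
def same_up_to_renaming(rows_a, rows_b):
    """Compare (blank_label, value) rows under a one-to-one relabelling of blank nodes."""
    if len(rows_a) != len(rows_b):
        return False
    forward, backward = {}, {}
    for (label_a, value_a), (label_b, value_b) in zip(rows_a, rows_b):
        if value_a != value_b:
            return False
        # The relabelling must be consistent in both directions.
        if forward.setdefault(label_a, label_b) != label_b:
            return False
        if backward.setdefault(label_b, label_a) != label_a:
            return False
    return True

result_a = [("_:a", "zum"), ("_:b", "Beispiel")]
result_b = [("_:x", "zum"), ("_:y", "Beispiel")]
result_c = [("_:z", "zum"), ("_:z", "Beispiel")]

a_equals_b = same_up_to_renaming(result_a, result_b)
a_equals_c = same_up_to_renaming(result_a, result_c)
print(a_equals_b, a_equals_c)
```

The third result is different because it states that both values belong to the same node, which no renaming of two distinct nodes can produce.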

Page 31: Wissenstechnologie Vi 08 09

SPARQL: Complex Queries

Combination of groups of simple graph expressions in the WHERE clause

OPTIONAL clause: the sub-graph pattern may not exist

Example: querying book titles from Springer
  • If an author exists, it is listed; if not, the title is returned without an author

SELECT ?title ?author
WHERE {
  ?buch ex:pulishedFrom <http://springer.com/Verlag> .
  ?buch ex:Title ?title .
  OPTIONAL { ?buch ex:Autor ?author } .
}
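The effect of OPTIONAL corresponds to a left outer join over variable bindings, which the following sketch illustrates. The binding lists and the function left_join are hypothetical stand-ins for the solutions of the required and the optional pattern.

```python
def left_join(required, optional):
    """Extend each required binding with compatible optional ones; keep it if none match."""
    results = []
    for r in required:
        extended = [
            {**r, **o}
            for o in optional
            if all(r[k] == o[k] for k in r.keys() & o.keys())
        ]
        results.extend(extended if extended else [r])
    return results

# Solutions of the two required triple patterns (hypothetical data).
required = [
    {"?buch": "b1", "?title": "First Title"},
    {"?buch": "b2", "?title": "Second Title"},
]
# Solutions of the pattern inside OPTIONAL: only b1 has an author.
optional = [{"?buch": "b1", "?author": "A. Author"}]

rows = left_join(required, optional)
print(rows)
```

The second book stays in the result with ?author left unbound, which a plain join would have dropped.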

Page 32: Wissenstechnologie Vi 08 09

SPARQL: Complex Queries

Specifying alternative sub-graph patterns: UNION

Logical OR, or the union of two separate queries

SELECT ?title ?author
WHERE {
  ?buch ex:pulishedFrom <http://springer.com/Verlag> .
  ?buch ex:Title ?title .
  { ?buch ex:Autor ?author . } UNION
  { ?buch ex:Creator ?author . }
}

"Select all books with a title published by Springer which have an author or a creator assigned"

Note: the ?author variables in the different groups are independent of each other
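UNION can be sketched as concatenating the solutions of the alternative patterns before joining them with the rest of the query. The functions join and union and all binding values below are hypothetical.

```python
def join(left, right):
    """Natural join of two lists of variable bindings."""
    return [
        {**l, **r}
        for l in left
        for r in right
        if all(l[k] == r[k] for k in l.keys() & r.keys())
    ]

def union(left, right):
    """UNION keeps the solutions of both alternatives."""
    return left + right

titles = [
    {"?buch": "b1", "?title": "T1"},
    {"?buch": "b2", "?title": "T2"},
    {"?buch": "b3", "?title": "T3"},   # has neither ex:Autor nor ex:Creator
]
by_autor = [{"?buch": "b1", "?author": "A1"}]    # solutions of { ?buch ex:Autor ?author }
by_creator = [{"?buch": "b2", "?author": "C1"}]  # solutions of { ?buch ex:Creator ?author }

rows = join(titles, union(by_autor, by_creator))
print(rows)
```

Unlike OPTIONAL, a book matching neither alternative disappears from the result.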

Page 33: Wissenstechnologie Vi 08 09

SPARQL: Complex Queries

Considering special datatypes: FILTER and XML datatypes

Specify the data type of a literal

SELECT ?title ?author
WHERE {
  ?buch ex:pulishedFrom <http://springer.com/Verlag> .
  ?buch ex:Title ?title .
  ?buch ex:publishedIn "1998"^^xsd:integer
}

FILTER specifies boolean expressions for filtering results

E.g. specify the data type range using FILTER (see Chapter 7 in Semantic Web Grundlagen)

SELECT ?title ?author
WHERE {
  ?buch ex:pulishedFrom <http://springer.com/Verlag> .
  ?buch ex:publishedIn ?year .
  FILTER(?year > 2000)
}
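FILTER can be sketched as a predicate applied to each solution. The bindings below are hypothetical; the third one carries the year as a string to show that a comparison with a mismatched datatype removes the solution instead of raising an error, which is how this sketch models a SPARQL type error.

```python
def sparql_filter(bindings, predicate):
    """Keep only the solutions for which the boolean expression holds."""
    return [b for b in bindings if predicate(b)]

def year_after_2000(binding):
    year = binding.get("?year")
    # Non-integer or unbound values count as a failed comparison.
    return isinstance(year, int) and year > 2000

rows = [
    {"?buch": "b1", "?year": 1998},
    {"?buch": "b2", "?year": 2004},
    {"?buch": "b3", "?year": "2004"},   # plain string, not an integer literal
]
recent = sparql_filter(rows, year_after_2000)
print(recent)
```

This is one reason why the datatype of a literal matters when the data is written, not only when it is queried.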

Page 34: Wissenstechnologie Vi 08 09

SPARQL: Real World Examples in DBpedia

PREFIX p: <http://dbpedia.org/property/>
PREFIX rdf: <http://www.w3.org/1999/02/22-rdf-syntax-ns#>
SELECT * WHERE {
  ?album p:artist ?band .
  ?album rdf:type <http://dbpedia.org/class/yago/Album106591815> .
  OPTIONAL { ?album p:cover ?cover } .
  OPTIONAL { ?album p:name ?name } .
  OPTIONAL { ?album p:released ?dateofrelease } .
} ORDER BY DESC(?name) LIMIT 20 OFFSET 19

PREFIX p: <http://dbpedia.org/property/>
PREFIX rdf: <http://www.w3.org/1999/02/22-rdf-syntax-ns#>
CONSTRUCT {
  ?album p:itIsDone ?dateofrelease .
  ?band p:isBand "true" .
} WHERE {
  ?album p:artist ?band .
  ?album rdf:type <http://dbpedia.org/class/yago/Album106591815> .
  OPTIONAL { ?album p:cover ?cover } .
  OPTIONAL { ?album p:name ?name } .
  OPTIONAL { ?album p:released ?dateofrelease } .
} ORDER BY ?name LIMIT 20 OFFSET 19

PREFIX p: <http://dbpedia.org/property/>
PREFIX rdf: <http://www.w3.org/1999/02/22-rdf-syntax-ns#>
DESCRIBE ?album WHERE {
  ?album p:artist <http://dbpedia.org/resource/The_Allman_Brothers_Band> .
  ?album rdf:type <http://dbpedia.org/class/yago/Album106591815> .
}

Page 35: Wissenstechnologie Vi 08 09

SPARQL: Summary

Similar to SQL

Allows easier expression of joins without knowing the underlying database schema

Can return not only tables, but also more complex output formats such as graphs

The datatype of a variable is not always clear

http://www.w3.org/TR/rdf-sparql-query/

http://thefigtrees.net/lee/sw/sparql-faq

Hitzler, Krötzsch, Rudolph, Sure: Semantic Web - Grundlagen, Chapter 7

Page 36: Wissenstechnologie Vi 08 09

Semantic vs. Information Retrieval: Overview

Central question: What is semantic retrieval?
  • Define information retrieval
  • Where are semantics missing?
  • How can we use Semantic Web technologies to increase semantics?

Page 37: Wissenstechnologie Vi 08 09

Semantic vs. Information Retrieval: Definition of IR

Salton (1968): "Information retrieval is a field concerned with the structure, analysis, organization, storage, searching, and retrieval of information."

"Information retrieval (IR) is finding material (usually documents) of an unstructured nature (usually text) that satisfies an information need from within large collections (usually stored on computers)." Christopher D. Manning, Prabhakar Raghavan and Hinrich Schütze, Introduction to Information Retrieval, Cambridge University Press, 2008

The main focus of IR is how to deal with uncertainty and incomplete information
  • Representation of documents is ambiguous
  • Query formulation is ambiguous and usually incomplete
  • "Unstructured" information
  • Usually the perfect answer, insofar as one exists, cannot be delivered

Page 38: Wissenstechnologie Vi 08 09

Semantic vs. Information Retrieval: IR vs. Data Retrieval (from van Rijsbergen 1979)

                      Data retrieval   Information retrieval
Matching              Exact match      Partial (best) match
Inference             Deduction        Induction
Model                 Deterministic    Probabilistic
Classification        Monothetic       Polythetic
Query language        Artificial       Natural
Query specification   Complete         Incomplete
Items wanted          Matching         Relevant
Error response        Sensitive        Insensitive

Page 39: Wissenstechnologie Vi 08 09

Semantic vs. Information Retrieval: IR vs. Data Retrieval (from van Rijsbergen 1979)

"What is the gross domestic product of Austria?"

Page 40: Wissenstechnologie Vi 08 09

Semantic vs. Information Retrieval: IR vs. Data Retrieval (from van Rijsbergen 1979)

"What is the gross domestic product of Austria?"

SELECT GDP FROM GDP_table WHERE country_name = 'Austria'

€ 270.8 bn

However,
  • Not all information is available in databases
  • Queries are hard to formulate for average users as well as for non-domain experts
  • The more complex the domain and the information need, the harder it is to formulate a correct query

Page 41: Wissenstechnologie Vi 08 09

Semantic vs. Information Retrieval: Basic Retrieval Workflow

Figure (not reproduced): workflow diagram with the components Documents D, Document Representation Dr, Retrieval Model M, Ranking Function R, Query Q and Information Need IN

See also Baeza-Yates & Ribeiro-Neto (1999), "Modern Information Retrieval"

Page 42: Wissenstechnologie Vi 08 09

Semantic vs. Information Retrieval: The Vector Space Model

Document Representation Dr: documents are represented as bag-of-words (i.e. a set of words)

Query Q: the query is a set of keywords

Retrieval Model M:
  • Sets of words are converted to vectors d and q
  • Different heuristics are used to calculate the importance of a word

Ranking Function R:
  • Cosine similarity: calculate the angle between d and q

d1 := "Boy plays chess"

d2 := "Boy plays bridge"
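The ranking function can be sketched for the two example documents. Raw term counts are used as word weights here, which is an assumption; the slide leaves the weighting heuristic open, and the query "chess" is likewise made up for illustration.

```python
import math
from collections import Counter

def cosine(text_a, text_b):
    """Cosine of the angle between the term-count vectors of two texts."""
    va = Counter(text_a.lower().split())
    vb = Counter(text_b.lower().split())
    dot = sum(va[term] * vb[term] for term in va)
    norm_a = math.sqrt(sum(c * c for c in va.values()))
    norm_b = math.sqrt(sum(c * c for c in vb.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)

d1 = "Boy plays chess"
d2 = "Boy plays bridge"
q = "chess"   # hypothetical keyword query

sim_d1_d2 = cosine(d1, d2)
score_d1 = cosine(q, d1)
score_d2 = cosine(q, d2)
print(round(sim_d1_d2, 3), round(score_d1, 3), round(score_d2, 3))
```

The two documents share two of three words and are therefore fairly similar, while the query matches only d1 because matching is purely on identical word forms.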

Page 43: Wissenstechnologie Vi 08 09

Semantic vs. Information Retrieval: An analysis of the vector space model

Query and documents are represented in terms of their words

The importance of words depends on their occurrence

Syntactic matching between documents and queries
  • No synonyms are considered (e.g. Money == Cash)
  • No homonyms are considered (e.g. Apache Web Server)
  • No meronyms are considered (e.g. a tire is part of a car)
  • No relationships between terms are considered

Page 44: Wissenstechnologie Vi 08 09

Information Retrieval vs. Semantic Retrieval: So where can we include semantics?

Increase the semantics of the document representation Dr and the query Q

Add metadata (e.g. tags, Dublin Core etc.)

Use more sophisticated preprocessing (e.g. language models, word sense disambiguation)

Allow users to express information needs in more detail or estimate the context of a user (e.g. specify metadata, profiling)

Formal representation of Dr and Q using Semantic Web languages like OWL; see Tran, Bloehdorn, Cimiano, Haase (2007), „Expressive Resource Description for Ontology-Based Information Retrieval“

However, even with a perfect formal representation we still need to transform natural language queries into this model for the average user

Requires a special user interface – not possible for the generic case

Natural Language Understanding – currently unsolved

Page 45: Wissenstechnologie Vi 08 09

Information Retrieval vs. Semantic Retrieval: So where can we include semantics?

Document representation

C:\myDocument.doc ex:containsConcept http://en.wikipedia.org/wiki/ApacheWebServer

http://en.wikipedia.org/wiki/ApacheWebServer rdf:type ex:Concept

http://en.wikipedia.org/wiki/ApacheWebServer ex:term „Apache“, „Apache Server“, „Apache Web Server“

Query

SELECT * WHERE { ?x ex:containsConcept <http://en.wikipedia.org/wiki/ApacheWebServer> }
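How such a graph pattern selects documents can be sketched without a triple store. The snippet below is only an illustration under assumptions: it keeps the slide's statements in a plain Python list, and the helper `match` is a hypothetical stand-in for a SPARQL engine, with `None` playing the role of a variable such as ?x.

```python
APACHE = "http://en.wikipedia.org/wiki/ApacheWebServer"

# (subject, predicate, object) statements taken from the slide's graph
TRIPLES = [
    (r"C:\myDocument.doc", "ex:containsConcept", APACHE),
    (APACHE, "rdf:type", "ex:Concept"),
    (APACHE, "ex:term", "Apache"),
    (APACHE, "ex:term", "Apache Server"),
    (APACHE, "ex:term", "Apache Web Server"),
]


def match(triples, subject=None, predicate=None, obj=None):
    """Return all triples that fit the pattern; None acts as a variable."""
    return [
        (s, p, o)
        for (s, p, o) in triples
        if (subject is None or s == subject)
        and (predicate is None or p == predicate)
        and (obj is None or o == obj)
    ]


# Counterpart of the pattern { ?x ex:containsConcept <.../ApacheWebServer> }
documents = [s for (s, _, _) in match(TRIPLES, predicate="ex:containsConcept", obj=APACHE)]
print(documents)
```

The document is found through the concept it contains, independent of which of the three surface terms actually occurs in its text.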

Page 46: Wissenstechnologie Vi 08 09

Information Retrieval vs. Semantic Retrieval: So where can we include semantics?

Iterative refinement of the information need:

Keyword Query: „Apache“

http://en.wikipedia.org/wiki/ApacheWebServer
http://en.wikipedia.org/wiki/ApacheHelicopter
http://en.wikipedia.org/wiki/AmericanNatives

SELECT * WHERE { ?x ex:containsConcept <http://en.wikipedia.org/wiki/ApacheWebServer> }

C:\myDocument.doc ex:containsConcept http://en.wikipedia.org/wiki/ApacheWebServer

C:\myDocument2.doc ex:containsConcept http://en.wikipedia.org/wiki/ApacheTribes

http://en.wikipedia.org/wiki/ApacheWebServer ex:term „Apache“, „Apache Server“, „Apache Web Server“
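The refinement loop on this slide can be sketched in two steps: look up the concepts a keyword may denote, then retrieve documents for the concept the user picks. The dictionaries below are hypothetical in-memory stand-ins for the ex:term and ex:containsConcept statements; a real system would query a triple store instead.

```python
WIKI = "http://en.wikipedia.org/wiki/"

# Hypothetical ex:term index: surface term -> concepts it may denote
TERM_INDEX = {
    "apache": [
        WIKI + "ApacheWebServer",
        WIKI + "ApacheHelicopter",
        WIKI + "AmericanNatives",
    ],
}

# ex:containsConcept statements for the two documents on the slide
CONTAINS_CONCEPT = {
    r"C:\myDocument.doc": {WIKI + "ApacheWebServer"},
    r"C:\myDocument2.doc": {WIKI + "ApacheTribes"},
}


def candidate_concepts(keyword):
    """Step 1: offer every concept the keyword could refer to."""
    return TERM_INDEX.get(keyword.lower(), [])


def documents_for(concept):
    """Step 2: retrieve the documents annotated with the chosen concept."""
    return sorted(doc for doc, concepts in CONTAINS_CONCEPT.items() if concept in concepts)


candidates = candidate_concepts("Apache")
chosen = candidates[0]  # stands in for the user's selection in the interface
print(documents_for(chosen))
```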

Page 47: Wissenstechnologie Vi 08 09

Information Retrieval vs. Semantic Retrieval: So where can we include semantics?

Increase the capability of the retrieval model M and ranking function R

Latent Semantic Indexing/Concept Indexing: automatically determine the concepts contained in a document set

Include a-priori knowledge (e.g. Thesaurus, WordNet)

Learn ranking functions based on a user's feedback (e.g. via machine learning)

Use formal knowledge in form of ontologies and reasoning capabilities
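As one illustration of including a-priori knowledge, a query can be expanded with synonyms before matching. This is a minimal sketch: the `THESAURUS` dictionary is a hypothetical stand-in for a resource such as WordNet, and the document and query are made-up examples built on the Money/Cash synonym pair.

```python
# Hypothetical mini-thesaurus; a real system might draw on WordNet instead.
THESAURUS = {
    "money": {"cash"},
    "cash": {"money"},
}


def expand_query(terms, thesaurus):
    """Add all known synonyms of the query terms to the query."""
    expanded = set(terms)
    for term in terms:
        expanded |= thesaurus.get(term, set())
    return expanded


def matches(document_terms, query_terms):
    """Syntactic match: at least one term occurs in both."""
    return bool(set(document_terms) & set(query_terms))


document = "she paid in cash".split()
query = ["money"]

print(matches(document, query))                           # plain keyword match
print(matches(document, expand_query(query, THESAURUS)))  # match after expansion
```

The retrieval model itself stays unchanged here; only the query is enriched, which is why this kind of extension is comparatively cheap to add.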

Page 48: Wissenstechnologie Vi 08 09

Information Retrieval vs. Semantic Retrieval: So where can we include semantics?

Example

Vector Space Model

– D={Apache=0.8, http=0.5, server=0.3}
– Q={Jetty=0.8, java=0.7, web=0.4}
– Ranking Value=0.0

Introduce a „better“ retrieval model by using a domain ontology:

ex:ApacheServer ex:isA ex:WebServer; ex:term „Apache“, „Apache Web Server“

ex:JettyServer ex:isA ex:WebServer; ex:term „Jetty“, „Jetty Server“

„Apache“ and „Jetty“ can be related to each other using the domain ontology

Ranking Value > 0
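The effect of the domain ontology on the ranking value can be sketched as follows. The expansion strategy is an assumption made for illustration: each vector gets one extra dimension for the super-concept of a known term, reusing that term's weight. The lecture does not specify how the ontology enters the ranking function.

```python
import math

# ex:term and ex:isA statements from the example ontology
TERM_TO_CONCEPT = {"apache": "ex:ApacheServer", "jetty": "ex:JettyServer"}
IS_A = {"ex:ApacheServer": "ex:WebServer", "ex:JettyServer": "ex:WebServer"}


def cosine(a, b):
    """Cosine similarity of two sparse weighted vectors."""
    dot = sum(w * b[t] for t, w in a.items() if t in b)
    norm_a = math.sqrt(sum(w * w for w in a.values()))
    norm_b = math.sqrt(sum(w * w for w in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def add_concepts(vector):
    """Copy the vector and add one dimension per super-concept of a known term.

    Assumption: the super-concept inherits the weight of the term.
    """
    enriched = dict(vector)
    for term, weight in vector.items():
        parent = IS_A.get(TERM_TO_CONCEPT.get(term))
        if parent is not None:
            enriched[parent] = max(enriched.get(parent, 0.0), weight)
    return enriched


D = {"apache": 0.8, "http": 0.5, "server": 0.3}
Q = {"jetty": 0.8, "java": 0.7, "web": 0.4}

print(cosine(D, Q))                              # no shared term
print(cosine(add_concepts(D), add_concepts(Q)))  # shared dimension ex:WebServer
```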

Page 49: Wissenstechnologie Vi 08 09

Information Retrieval vs. Semantic Retrieval: So where can we include semantics?

Improve the presentation of results

Clustering of search results

Display different facets of the result set

Different representation of results

Display facts instead of documents

Supports users in refining their information need

Search as an iterative approach
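Facets can be derived from whatever metadata the result set carries. The sketch below assumes a hypothetical result list with made-up `type` and `language` fields; it counts the values per facet and narrows the result set once the user picks one.

```python
from collections import Counter

# Hypothetical result set; titles and metadata fields are made up for illustration.
results = [
    {"title": "Apache HTTP Server docs", "type": "manual", "language": "en"},
    {"title": "Jetty tutorial", "type": "tutorial", "language": "en"},
    {"title": "Apache Konfiguration", "type": "manual", "language": "de"},
]


def facet_counts(results, field):
    """How many results fall under each value of the given metadata field."""
    return Counter(r[field] for r in results if field in r)


def refine(results, field, value):
    """Keep only the results the user selected via a facet value."""
    return [r for r in results if r.get(field) == value]


print(facet_counts(results, "type"))
print(refine(results, "language", "de"))
```

Each refinement yields a smaller result set with its own facet counts, which is what makes the search iterative.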

Page 50: Wissenstechnologie Vi 08 09

Information Retrieval vs. Semantic Retrieval: So where can we include semantics?

Page 51: Wissenstechnologie Vi 08 09

Information Retrieval vs. Semantic Retrieval: Dimensions of Semantics

Semantically structured vs. unstructured document and query representation

Semantic expressiveness increases with increased structure

Queries are hard to formalize. Support for the average user is required

Labour-intensive creation of the document representation

Extension of the retrieval model

Runtime complexity of reasoning in case of semantically structured Dr and Q

Scalability (also an issue for more complex statistical methods)

Page 52: Wissenstechnologie Vi 08 09

Information Retrieval vs. Semantic Retrieval: Dimensions of Semantics

Semantic enhancement of result presentation

„Low hanging fruit“

Does not require a formalized knowledge base

Formalized knowledge vs. statistical approaches

Bottom Up vs. Top Down

Statistical approaches can provide sophisticated retrieval models, which do not require formalized, modelled knowledge

Statistical approaches depend on the fact that all required information is within the data set and can be extracted

Statistical models are usually not shareable between systems

Knowledge base may not model all facts contained in the data set (e.g. WordNet knows nothing about the Apache Web Server)

Page 53: Wissenstechnologie Vi 08 09

Information Retrieval vs. Semantic Retrieval: Example Freebase, www.freebase.com

Open database of the world's information

Contribution by the community

Linked with other free resources like Wikipedia

Web API

Own query language: Metaweb Query Language (MQL)

Regarding semantic search

Structural document representation

Keyword queries in combination with intelligent interfaces for information need refinement

Fact-based representation

Page 54: Wissenstechnologie Vi 08 09

Information Retrieval vs. Semantic Retrieval: Example Freebase

Page 55: Wissenstechnologie Vi 08 09

Information Retrieval vs. Semantic Retrieval: Example Cuil, www.cuil.com

Internet search engine with 120 billion pages

Not based on popularity of sites, just on content and topics of content

Document representation and query are unstructured

Extended retrieval model in the background based on categorical knowledge and statistical methods (clustering)
http://www.news.com.au/technology/story/0,25642,24089734-5014239,00.html

Enhanced interface

Page 56: Wissenstechnologie Vi 08 09

Information Retrieval vs. Semantic Retrieval: Example Yahoo! Search Monkey

Use structured data to improve presentation of search results

http://developer.yahoo.com/searchmonkey/#

Page 57: Wissenstechnologie Vi 08 09

Summary

Triple Store

Generic/Flexibility vs. Performance/Space

Usually RDBMS with a large triple table

Alternative: special graph indexing structures

SPARQL

Simple query for RDF by providing a graph pattern

Recommended by the W3C

Semantic Search

Document & query representation

Retrieval model & interfaces

Semantic vs. statistical approaches

It is the next step, but not such a big one

Page 58: Wissenstechnologie Vi 08 09

Next Week

Guest lectures (attendance mandatory) from experts

Werner Klieber: Semantic Web Services (30 min)

Aatif Latif: Open Linked Data (30 min)

Fleur Jeanquartier: Bringing the Semantic Web closer to the User (30 min)

Page 59: Wissenstechnologie Vi 08 09

That‘s it for today…

Thanks for your attention

Questions/comments?

[email protected]

Page 60: Wissenstechnologie Vi 08 09

License

This work is licensed under the Creative Commons Attribution 2.0 Austria License. To view a copy of this license, visit http://creativecommons.org/licenses/by/2.0/at/.

Contributors:

Mathias Lux

Peter Scheir

Klaus Tochtermann

Michael Granitzer
