Towards Semantic Web engineering Multichannel publishing 3/12/2009 Olli Alm


Outline

Part 1:
• Semantic Web
• Ontology
• RDF languages
• Querying and reasoning SW data

Part 2:
• Modelling SW data
• SW data processing
• Case examples
• Summary


Part 1


Semantic Web

• The vision: WWW with intelligent machines (Tim Berners-Lee)

• In practice: a set of languages and techniques for knowledge processing, modelling and representation

• W3C activity group: standards, specifications, recommendations, tools (www.w3.org)


Semantic Web

“The Semantic Web is about two things. It is about common formats for integration and combination of data drawn from diverse sources, where on the original Web mainly concentrated on the interchange of documents. It is also about language for recording how the data relates to real world objects. That allows a person, or a machine, to start off in one database, and then move through an unending set of databases which are connected not by wires but by being about the same thing.” (from the W3C SW activity statement)

1) common formats for the integration of data

2) a language for recording how the data relates to real-world objects


Semantic Web

The layer cake of the Semantic Web technologies


Semantic Web

• MVC & the XML movement on the web: separate the data model from its representation

• The Semantic Web: a “unified” data model for representing (real world) data, to be utilized in any representation

What if we could…
1. …represent any kind of (real world) data?
2. …represent data in a unified way?
3. …just take and reuse open data in our application?
4. …integrate data easily from diverse sources?


Semantic Web

The Semantic Web:

• A branch of Artificial Intelligence?

• Symbolic AI: old ideas in a new form?

• Machine intelligence: symbolic representation of the facts

“Symbolic AI (or Classical AI) is the branch of artificial intelligence research that concerns itself with attempting to explicitly represent human knowledge in a declarative form (i.e. facts and rules).”


Semantic Web

The Semantic Web:

• Explicit representation: an ontology

• Not just explicit representation, in addition: shared


Semantic Web

The Semantic Web: shared conceptualization? (the linked data project)


Semantic Web

The Semantic Web: shared conceptualization

• everything is connected

• everything is referable (URIs)

• distributed set of statements (ontologies) as a basis of our world model

• ontology language(s):
1. a tool for identifying resources
2. a tool for stating facts about resources (= statements)
3. a tool for sharing and integrating statements
4. a tool for reasoning over the data
   - e.g. acquiring new statements with deductive reasoning

• in the SW world, the term “ontology languages” refers to RDF-based languages such as RDFS, OWL (and OWL2).


Ontology

“OWL Full can be viewed as an extension of RDF, while OWL Lite and OWL DL can be viewed as extensions of a restricted view of RDF. Every OWL (Lite, DL, Full) document is an RDF document, and every RDF document is an OWL Full document, but only some RDF documents will be a legal OWL Lite or OWL DL document.”


RDF

An example of RDF data (in XML serialization): person info

<foaf:Person rdf:about="#me"
    xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#"
    xmlns:foaf="http://xmlns.com/foaf/0.1/">
  <foaf:name>Dan Brickley</foaf:name>
  <foaf:homepage rdf:resource="http://danbri.org/" />
  <foaf:img rdf:resource="/images/me.jpg" />
</foaf:Person>


RDF

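An example of RDF data (in Turtle / TTL serialization): person info

@prefix rdf: <http://www.w3.org/1999/02/22-rdf-syntax-ns#> .
@prefix foaf: <http://xmlns.com/foaf/0.1/> .

<http://mynamespace.fi#me> rdf:type foaf:Person ;
    foaf:name "Dan Brickley" ;
    foaf:homepage <http://danbri.org/> ;
    foaf:img <http://danbri.org/images/me.jpg> .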


RDF

An example of RDF data (in Turtle / TTL serialization): web page info

@prefix rdf: <http://www.w3.org/1999/02/22-rdf-syntax-ns#> .
@prefix dc: <http://purl.org/dc/elements/1.1/> .
@prefix exterms: <http://www.example.org/terms/> .

<http://www.example.org/index.html>
    exterms:creation-date "August 16, 1999" ;
    dc:language "en" ;
    dc:creator <http://www.example.org/staffid/85740> .


RDF

An example of RDF-data (graph representation) -web page info

The graph-like nature of RDF:
- resources / objects are nodes
- properties / attributes are edges*

* properties are also resources (at the metalevel) and can be represented as nodes in the graph (why is that?)


RDF

RDF (Resource Description Framework) is…
- a statement language (logics)
- a statement = a triple

A triple has three parts: 1) subject, 2) predicate and 3) object

Example from the Friend-Of-A-Friend schema (FOAF):

<http://mynamespace.fi#me> rdf:type foaf:Person ;
    foaf:name "Dan Brickley" ;
    foaf:homepage <http://danbri.org/> ;
    foaf:img <http://danbri.org/images/me.jpg> .

(subject: <http://mynamespace.fi#me>; predicates: rdf:type, foaf:name, foaf:homepage, foaf:img; objects: the value that follows each predicate)

Triple says: “me is a (type of) person”
Triple says: “me is called ‘Dan Brickley’”
Triple says: “me has a homepage danbri.org”

A set of triples forms a graph that interlinks resources with each other! (here: 4 triples, all with subject #me)
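A minimal sketch of the same idea in plain Python, assuming triples are stored as 3-tuples and prefixed names are kept as plain strings (a real RDF library would expand them to full URIs). The helper name `objects` is made up for this illustration.

```python
from collections import defaultdict

ME = "http://mynamespace.fi#me"

# The four triples from the slide, one tuple per statement.
triples = [
    (ME, "rdf:type", "foaf:Person"),
    (ME, "foaf:name", "Dan Brickley"),
    (ME, "foaf:homepage", "http://danbri.org/"),
    (ME, "foaf:img", "http://danbri.org/images/me.jpg"),
]

# Build the graph: each subject node maps to its outgoing
# (predicate, object) edges.
graph = defaultdict(list)
for subject, predicate, obj in triples:
    graph[subject].append((predicate, obj))


def objects(subject, predicate):
    """Return all objects reachable from `subject` along `predicate` edges."""
    return [o for p, o in graph.get(subject, []) if p == predicate]


print(objects(ME, "foaf:name"))
```

Every new triple adds one edge; if its object is itself a subject of other triples, the two resources become linked in the graph.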


RDF

URI
• in RDF, everything has a unique identifier, a URI
• Uniform Resource Identifier
• a URI looks like a URL but need not be a working link: it is not always clickable
• in SW, URLs can be, and are, used as URIs

(don’t confuse URIs with URNs, IRIs or PURLs)

<http://mynamespace.fi#me> rdf:type foaf:Person ;
    foaf:name "Dan Brickley" ;
    foaf:homepage <http://danbri.org/> ;
    foaf:img <http://danbri.org/images/me.jpg> .

Dan Brickley is identified by http://mynamespace.fi#me

foaf:name is an abbreviation for the URI http://xmlns.com/foaf/0.1/name (a property defined in the FOAF namespace http://xmlns.com/foaf/0.1/)


RDF

URI
• For consistency, URIs should not change often (or at all)
• (should the URI change if the “identity” or “essence” of the resource changes?)
• A URI identifies an object, but that doesn’t mean that different URIs refer to different resources:

in the Web Ontology Language (OWL), we can state that two different URIs refer to the same object:

<rdf:Description rdf:about="#William_Jefferson_Clinton">
  <owl:sameAs rdf:resource="#BillClinton"/>
</rdf:Description>

(the opposite is also possible: we can state that two resources are distinct from each other)


RDFS

RDFS (Resource Description Framework Schema)
• Divides the world into universals (classes) and particulars (individuals / instances): TYPING

E.g. “Lassie is a dog” =

@prefix sws: <http://www.metropolia.fi/~ollial/2009/11#> .
sws:lassie rdf:type sws:dog ;
    foaf:name "Lassie" .

Classes have subclasses:

sws:dog rdf:type rdfs:Class ;
    rdfs:subClassOf sws:animal .

(Transitive) reasoning in RDFS:
1) Lassie is a dog
2) Dog is a kind of animal
⇒ Lassie is an animal


OWL

OWL (Web Ontology Language)

Extends RDFS to express
• relations between classes and between instances

• property types: literal vs. object
  literal property: foaf:name = “Olli”
  object property: foaf:knows http://someone/somewhere

• Subtyping of properties ⇒ reasoning (e.g. functional, transitive)
• Computability / complexity levels for the model
• Three sublanguages: OWL Full, OWL DL, OWL Lite


Reasoning in Symbolic AI

The theory behind ontology languages is (more or less) based on the assumptions that:

1) Logic is expressive (like a natural language): we can model our domain / world by defining a set of statements that hold (in our world). (The state of affairs is the main concern; objects are secondary.)

2) Language corresponds to the world: if we use a strong and expressive language, we can model real-world phenomena deeply and consistently, and assume that our model corresponds to the world.

3) Reason out the information: we can now deduce new information (about the world, in the form of statements) by running inference over the set of statements.


Reasoning in Ontologies / open world

In addition to logic-as-a-language correspondence theories, the logic behind ontologies follows open-world semantics:
• Our model may not contain all the relevant information
• If something is stated, it is true, BUT
• If something is not described, the machine doesn’t know the answer!

An example:

The statement in the ontology: “Lassie is a dog”

A) The question: “Is Lassie a dog?”
Closed world semantics: TRUE
Open world semantics: TRUE

B) The question: “Is Lassie a cat?”
Closed world semantics: FALSE
Open world semantics: Don’t know
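The difference can be sketched in a few lines of Python. This is only an illustration under simplifying assumptions: facts are plain tuples, no inference is performed, and the function names are invented here. `None` stands for “don’t know”.

```python
# The only statement in our toy ontology.
facts = {("Lassie", "is_a", "dog")}


def ask_closed_world(statement):
    """Closed world: anything that is not stated is taken to be false."""
    return statement in facts


def ask_open_world(statement):
    """Open world: anything that is not stated is unknown (None), not false."""
    return True if statement in facts else None


print(ask_closed_world(("Lassie", "is_a", "cat")))
print(ask_open_world(("Lassie", "is_a", "cat")))
```

A real open-world reasoner can still answer FALSE when the ontology says enough, for example if it states that the classes dog and cat are disjoint; this sketch leaves that out.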


Practical reasoning in Ontologies

1) We load our data (e.g. an RDF/XML file) into the reasoning machine (e.g. Jena).

2) We switch the inference engine on and define its level (e.g. reason out the transitive closures).

3) Now we can query the model and also get the statements generated by the reasoner.

The data (1): “Lassie is a dog”, “Dog is a mammal”, “Mammal is an animal”

Transitive closure inference (2):
- reason out the is-a relations; if there are related instances, add the new facts for those instances.

The deduced data (3): “Lassie is a dog”, “Dog is a mammal”, “Mammal is an animal”, “Lassie is a mammal”, “Lassie is an animal”
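Step 2 can be sketched as naive forward chaining in plain Python. This is not how Jena implements its reasoners; it is a hypothetical `materialize` function that assumes the data is split into instance-of pairs and subclass-of pairs and repeats two rules until nothing new appears.

```python
def materialize(type_of, subclass_of):
    """Forward-chain over is-a statements until no new facts appear.

    type_of:     set of (instance, class) pairs
    subclass_of: set of (subclass, superclass) pairs
    """
    types = set(type_of)
    subs = set(subclass_of)
    changed = True
    while changed:
        changed = False
        # Class level: A is-a B and B is-a C gives A is-a C.
        new_subs = {(a, d) for (a, b) in subs for (c, d) in subs if b == c}
        # Instance level: x is-a A and A is-a B gives x is-a B.
        new_types = {(x, b) for (x, a) in types for (a2, b) in subs if a == a2}
        if not new_subs <= subs or not new_types <= types:
            subs |= new_subs
            types |= new_types
            changed = True
    return types, subs


types, subs = materialize(
    {("Lassie", "dog")},
    {("dog", "mammal"), ("mammal", "animal")},
)
print(sorted(types))
```

Besides the two new facts about Lassie, this sketch also derives the class-level fact that dog is a kind of animal, which the slide does not list separately.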


Practical reasoning in Ontologies

OWL: reasoning with properties

Transitive properties: P(x,y) AND P(y,z) ⇒ P(x,z)
An example: locatedIn(Punavuori, Helsinki) AND locatedIn(Helsinki, Uusimaa) ⇒ locatedIn(Punavuori, Uusimaa)

Symmetric properties: P(x,y) ⇒ P(y,x)
An example: isFriendOf(Olli, Matti) ⇒ isFriendOf(Matti, Olli)

Functional properties: P(x,y) AND P(x,z) ⇒ y = z (~every subject has at most one value for P)
An example: hasFather(Olli, Frank) AND hasFather(Olli, Paul) ⇒ Frank = Paul

This means: we can define certain “implication patterns” in our model and utilize them for processing data. Instead of having only the “static” data, new data is generated based on the “implications”.
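The three implication patterns can be sketched in plain Python as follows. The function name and the (property, subject, object) tuple layout are assumptions made for this example, not an OWL reasoner API; functional properties yield equalities rather than new property statements.

```python
def apply_property_rules(facts, transitive=(), symmetric=(), functional=()):
    """Return (facts closed under the rules, set of inferred equalities).

    facts are (property, subject, object) tuples.
    """
    facts = set(facts)
    while True:
        new = set()
        for (p, x, y) in facts:
            if p in symmetric:
                new.add((p, y, x))          # P(x,y) => P(y,x)
            if p in transitive:
                for (q, y2, z) in facts:
                    if q == p and y2 == y:
                        new.add((p, x, z))  # P(x,y), P(y,z) => P(x,z)
        if new <= facts:
            break                           # fixpoint: nothing new was derived
        facts |= new

    same_as = set()
    for (p, x, y) in facts:
        if p in functional:
            for (q, x2, z) in facts:
                if q == p and x2 == x and y != z:
                    same_as.add(frozenset((y, z)))  # P(x,y), P(x,z) => y = z
    return facts, same_as


closed, same = apply_property_rules(
    {
        ("locatedIn", "Punavuori", "Helsinki"),
        ("locatedIn", "Helsinki", "Uusimaa"),
        ("isFriendOf", "Olli", "Matti"),
        ("hasFather", "Olli", "Frank"),
        ("hasFather", "Olli", "Paul"),
    },
    transitive={"locatedIn"},
    symmetric={"isFriendOf"},
    functional={"hasFather"},
)
print(same)
```

Note that the functional rule does not reject the two fathers as an error; under open-world semantics it concludes that Frank and Paul are the same individual.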


Reasoning and processing data

In addition to inferencing in the model, we can process the data in more traditional ways:

• Build a procedural program for processing the data

• Use a specific rule language for processing

• Query the data with a specific RDF query language, e.g. SPARQL (others: RQL, RUL, RDQL, …)

• The best solution depends on the nature of the problem: e.g. reasoning with the inference engine is usually an expensive solution (= takes a lot of time)


SPARQL query language

SPARQL: W3C recommendation

• Current de facto query language for RDF
• Plays much the same role for RDF as SQL does for relational databases:
  SELECT, WHERE, ORDER BY (why is FROM missing?)

PREFIX foaf: <http://xmlns.com/foaf/0.1/>
SELECT ?name ?mbox
WHERE {
  ?x foaf:name ?name .
  ?x foaf:mbox ?mbox
}


PREFIX dc: <http://purl.org/dc/elements/1.1/>
PREFIX ns: <http://example.org/ns#>
SELECT ?title ?price
WHERE {
  ?x ns:price ?price .
  FILTER (?price < 30.5)
  ?x dc:title ?title .
}
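To show roughly what a query engine does with the WHERE clause, here is a toy matcher in plain Python. It is a sketch, not SPARQL: the functions `match` and `select` and the sample book data are invented for this illustration, terms starting with "?" are treated as variables, and FILTER is modelled as a Python predicate.

```python
def match(pattern, triple, binding):
    """Try to extend `binding` so that `pattern` matches `triple`; None on failure."""
    result = dict(binding)
    for term, value in zip(pattern, triple):
        if isinstance(term, str) and term.startswith("?"):
            # Variable: bind it, or check it agrees with an earlier binding.
            if result.setdefault(term, value) != value:
                return None
        elif term != value:
            return None
    return result


def select(variables, patterns, triples, keep=lambda b: True):
    """Join all triple patterns, apply the filter, project the variables."""
    bindings = [{}]
    for pattern in patterns:
        extended = []
        for b in bindings:
            for t in triples:
                m = match(pattern, t, b)
                if m is not None:
                    extended.append(m)
        bindings = extended
    return [tuple(b[v] for v in variables) for b in bindings if keep(b)]


# Made-up sample data.
triples = [
    ("book1", "dc:title", "SPARQL Tutorial"),
    ("book1", "ns:price", 42),
    ("book2", "dc:title", "The Semantic Web"),
    ("book2", "ns:price", 23),
]

rows = select(
    ["?title", "?price"],
    [("?x", "ns:price", "?price"), ("?x", "dc:title", "?title")],
    triples,
    keep=lambda b: b["?price"] < 30.5,
)
print(rows)
```

The shared variable ?x is what joins the two patterns: both must match triples about the same resource.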


SPARQL query language

SPARQL: why?

• Clear representation for data queries (instead of coding by hand)
• Good query engine implementation ⇒ fast data retrieval?
• Implemented in many development libraries

What can’t you do with SPARQL?

• Update data? (extension: SPARQL Update)
• Do recursive queries:
  “get all the superclasses of the dog”

  (procedural example)
  x = dog
  while (x has a superclass) {
      add superclass to resultset
      x = superclass
  }
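The procedural example can be written as runnable Python. This sketch assumes the class hierarchy is available as a dictionary from each class to its direct superclasses; unlike the pseudocode, it also handles classes with more than one superclass. The class "pet" is added here only to show that case.

```python
def all_superclasses(cls, subclass_of):
    """Collect every direct and indirect superclass of `cls`.

    subclass_of maps a class to the list of its direct superclasses.
    """
    seen = set()
    stack = [cls]
    while stack:
        current = stack.pop()
        for parent in subclass_of.get(current, []):
            if parent not in seen:   # also guards against cycles
                seen.add(parent)
                stack.append(parent)
    return seen


hierarchy = {
    "dog": ["mammal", "pet"],
    "mammal": ["animal"],
    "pet": ["animal"],
}
print(sorted(all_superclasses("dog", hierarchy)))
```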


Part 2


Data modeling for the Semantic Web

When modelling things in ontologies, we can use an “object-oriented” approach:

• Try to define the domain

• Model objects that exist in the domain and the relations between the objects

In the modelling task, we are defining

• The metadata schema as usual (~database schema / objects of the domain)

• In addition, we should also define the ‘domain ontologies’ or ‘domain vocabularies’ we are using


Data modeling for the Semantic Web

metadata schema

• Defines the primary objects (classes) to model: books, cars, persons, …

• Defines the properties for objects: title, author, edition, number of pages, ISBN, genre, …

• Properties have either literal values or object values
  • Literal / DatatypeProperty: name, title, street address, isbn, hasGenre(?)
  • Object property: hasFriend, isLocated, hasAuthor, hasGenre(?)

• For “similar” objects, you can use inheritance (subclassing!)
  • woman is a person, person is an agent, agent is an entity…


Data modeling for the Semantic Web

metadata schema: defining properties for a class (in RDFS / OWL)

# class definition
myNS:book rdf:type owl:Class .

# property definitions
myNS:title rdf:type owl:DatatypeProperty ;
    rdfs:domain myNS:book ;
    rdfs:range xsd:string .

myNS:isbn rdf:type owl:DatatypeProperty ;
    rdfs:domain myNS:book ;
    rdfs:range xsd:string .

myNS:author rdf:type owl:ObjectProperty ;
    rdfs:domain myNS:book ;
    rdfs:range myNS:author .


Data modeling for the Semantic Web

metadata schema: defining properties for a class (in RDFS / OWL)

rdfs:domain ⇒ the objects that have this property

rdfs:range ⇒ the suitable values for the property

• Ontology languages are “schemaless” in the sense that you can assign any property to any object. (open world assumption)

• Reasoning on the rdfs:domain:

Stated:

myNS:hasTail rdf:type owl:ObjectProperty ;
    rdfs:domain myNS:donkey .

myNS:matti rdf:type myNS:person ;
    myNS:hasTail myNS:tail001 .

Inferred:

myNS:matti rdf:type myNS:person ;
    rdf:type myNS:donkey ;
    myNS:hasTail myNS:tail001 .

“if it has a tail, it is a donkey!”
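The domain rule can be sketched in plain Python. The function name and the dictionary of domains are assumptions for this illustration; the point is that rdfs:domain is used to infer a type, not to validate the data.

```python
def infer_types_from_domain(triples, domains):
    """Apply the rdfs:domain rule: using a property implies a type.

    triples: set of (subject, predicate, object) tuples
    domains: maps a property to the class stated as its rdfs:domain
    """
    inferred = set(triples)
    for subject, predicate, _ in triples:
        if predicate in domains:
            inferred.add((subject, "rdf:type", domains[predicate]))
    return inferred


stated = {
    ("myNS:matti", "rdf:type", "myNS:person"),
    ("myNS:matti", "myNS:hasTail", "myNS:tail001"),
}
domains = {"myNS:hasTail": "myNS:donkey"}

result = infer_types_from_domain(stated, domains)
print(sorted(result))
```

Matti is not rejected for having a tail; he simply becomes a donkey in addition to being a person, which is why domains should be chosen with care.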


Data modeling for the Semantic Web

Domain vocabularies: reusing domain knowledge

• In our schema, we can refer to “external” ontologies that define some domain of discourse.

• The idea:
  • you don’t have to reinvent the wheel
  • saves time and money
  • easy data integration (connected data)
  • (and you can always extend the domain vocabulary)

• In practice:
  1) refer to / fetch / download the ontology
  2) point your schema properties (property range) to its values
  3) use the domain vocabulary to describe your resources


Data modeling for the Semantic Web

Domain vocabularies: reusing domain knowledge

• Case study: ONKI ontology service: www.yso.fi

• User interface, web services for utilizing domain vocabularies


Data modeling for the Semantic Web

Domain vocabularies: reusing domain knowledge

Example domains:
• Classification schemes
• Geographical information (place + coordinate + relations)
• YSO (General Finnish Upper Ontology, Yleinen Suomalainen Ontologia)
• DBpedia (information extracted from Wikipedia)
• Author databases (Getty ULAN)


Data modeling for the Semantic Web

Domain vocabularies: reusing domain knowledge

In addition to domain vocabularies, the reuse of schema definitions is also encouraged!

Why?
⇒ it allows data integration based on the properties
⇒ existing metadata schemas may provide well-thought-out, mature solutions for modelling

Example schemas:
• Dublin Core, simple DC
• SKOS (for thesauri and concept scheme modelling)
• FOAF (Friend-of-a-Friend: social connections)


Processing the Ontology data

• Although RDF data may initially be distributed, it (usually) has to be stored in one place for reasoning / processing.
  ⇒ ontology repositories, usually built on an RDBMS (triple stores: a few big tables with columns for subject, predicate and object)
  ⇒ repositories are usually quite slow compared to a plain RDBMS (WHY?)

• RDF data (graph data) is strongly interconnected, so the whole model has to be in memory or in a DB for processing.
  ⇒ e.g. streaming / SAX-like processing is usually not possible

• Many Semantic Web applications are concerned with processing or analyzing 1) subsumption hierarchies OR 2) connections between resources
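One plausible answer to the “WHY?” can be seen in a toy triple store built on SQLite. This is a sketch under assumptions: a single `triples` table with three text columns and made-up FOAF data; real triple stores use more elaborate layouts and indexes. The point is that each additional triple pattern in a query becomes another self-join on the same big table.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE triples (subject TEXT, predicate TEXT, object TEXT)")
conn.executemany(
    "INSERT INTO triples VALUES (?, ?, ?)",
    [
        ("ex:alice", "foaf:name", "Alice"),
        ("ex:alice", "foaf:mbox", "mailto:alice@example.org"),
        ("ex:bob", "foaf:name", "Bob"),
    ],
)

# "Name and mailbox of the same resource": two triple patterns,
# so the triples table has to be joined with itself once.
rows = conn.execute(
    """
    SELECT n.object, m.object
    FROM triples AS n
    JOIN triples AS m ON m.subject = n.subject
    WHERE n.predicate = 'foaf:name' AND m.predicate = 'foaf:mbox'
    """
).fetchall()
print(rows)
```

In a conventional schema the same question would be a single-row lookup in a person table with name and mbox columns, with no join at all.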


SW application domains

• Data integration based on the ontologies (e.g. Linked Data)

• Multifaceted, hierarchical search (e.g. Museum Finland / Museosuomi)

• Modelling and analyzing networked data (e.g. FOAF, Linked Data)

• Trust issues: who stated what? Who agreed? In which namespace? Based on what?

• Anything starting with ‘Semantic’: Semantic search, Semantic wiki, Semantic annotation, Semantic desktop, Semantic portal, Semantic repository, …


Ontology building

Protégé ontology editor


Linked data project


Museosuomi / Museum Finland


Kulttuurisampo / FoaF

http://www.kulttuurisampo.fi/ff.shtml


Semantic Web engineering

• Tools, models and languages for managing and processing distributed data

• Not just data: emphasis for modelling “real world knowledge”

• Reuse schemas, content and domain vocabularies

• Identify everything (URIs), make resources referable

• Networked, hierarchical, interlinked data

• Data processing with inference, rules, query language or procedural programming

• Open data?

• RDF: good for modelling / designing complex domains


Semantic Web engineering (problems)

Ontologies may complicate things
• Versioning: when the domain ontology is modified, how should the data that uses it react?
• Who is responsible for maintaining the ontology (= expensive)?
• Complex data model ⇒ scrappy, low-quality data
• You can model the same things in simpler models, e.g. in SQL
• Who needs URIs anyway?
• Triple stores are usually slow; the level of abstraction in the data index is low (= triple)

There isn’t (really) such a thing as a Semantic Web application development framework.


Further material

W3C Semantic Web Activity: http://www.w3.org/2001/sw/
RDFS spec: http://www.w3.org/TR/rdf-schema/
OWL spec: http://www.w3.org/TR/owl-ref/
OWL2: http://www.w3.org/TR/2009/REC-owl2-overview-20091027/
Wikipedia: semantic web
Linked data: http://linkeddata.org/
SKOS: http://www.w3.org/2004/02/skos/

Jena: http://jena.sourceforge.net/ (for Java)
RDFLib: http://rdflib.net/ (for Python)

Reasoning:
http://owl.man.ac.uk/2003/why/latest/#2
http://www.w3.org/TR/2009/REC-owl2-primer-20091027/#Modeling_Knowledge:_Basic_Notions (in OWL2)


Thank you. Questions?