Exploiting Large-Scale Semantics on the Web Prof. Enrico Motta Director, Knowledge Media Institute The Open University Milton Keynes, UK


Structure of the Talk

• Quick Recap: What is the Semantic Web?

• State of the art: 1st Generation SW Applications
  – Emphasis on ontology-driven data aggregation
  – Limited with respect to their ability to exploit large-scale, heterogeneous semantic markup

• Exploiting large-scale semantics
  – A blueprint for the next generation of SW applications
  – Key research issues to tackle
    • Need for new methods suitable for the new scenarios defined by NG-SW applications
  – NG-SW approach can also be used 'self-reflectively' to tackle key SW tasks


Quick Recap: What is the Semantic Web?


The Semantic Web

A large-scale, heterogeneous collection of formal, machine-processable, ontology-based statements (semantic metadata) about web resources and other entities in the world, expressed in an XML-based syntax


[Figure: Ontology, Metadata (sets of <RDF triple> statements) and UoD]


[Figure: ontology fragment. Classes: Person, Organization, Organization-Unit, String. Relations: hasAffiliation, worksInOrgUnit, hasJobTitle, partOf]

<akt:Person rdf:about="akt:EnricoMotta">
  <rdfs:label>Enrico Motta</rdfs:label>
  <akt:hasAffiliation rdf:resource="akt:TheOpenUniversity"/>
  <akt:hasJobTitle>kmi director</akt:hasJobTitle>
  <akt:worksInOrgUnit rdf:resource="akt:KnowledgeMediaInstitute"/>
  <akt:hasGivenName>enrico</akt:hasGivenName>
  <akt:hasFamilyName>motta</akt:hasFamilyName>
  <akt:worksInProject rdf:resource="akt:Neon"/>
  <akt:worksInProject rdf:resource="akt:X-Media"/>
  <akt:hasPrettyName>Enrico Motta</akt:hasPrettyName>
  <akt:hasPostalAddress rdf:resource="akt:KmiPostalAddress"/>
  <akt:hasEmailAddress>[email protected]</akt:hasEmailAddress>
  <akt:hasHomePage rdf:resource="http://kmi.open.ac.uk/people/motta/"/>
</akt:Person>
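To make the "statements about entities" reading concrete, the sketch below flattens a shortened copy of the markup above into (subject, predicate, object) triples using only the Python standard library. It is an illustration under assumptions: the `AKT` namespace URI is a placeholder (the slide does not declare one), the `rdf:RDF` wrapper is added so the fragment parses, and `triples` is a hypothetical helper that handles only this simple nesting, not the full RDF/XML syntax.

```python
import xml.etree.ElementTree as ET

RDF = "http://www.w3.org/1999/02/22-rdf-syntax-ns#"
RDFS = "http://www.w3.org/2000/01/rdf-schema#"
AKT = "http://example.org/akt#"  # placeholder namespace, assumed for illustration

# Shortened copy of the slide's markup, wrapped in rdf:RDF so it is well-formed.
doc = f"""<rdf:RDF xmlns:rdf="{RDF}" xmlns:rdfs="{RDFS}" xmlns:akt="{AKT}">
  <akt:Person rdf:about="akt:EnricoMotta">
    <rdfs:label>Enrico Motta</rdfs:label>
    <akt:hasAffiliation rdf:resource="akt:TheOpenUniversity"/>
    <akt:hasJobTitle>kmi director</akt:hasJobTitle>
    <akt:worksInOrgUnit rdf:resource="akt:KnowledgeMediaInstitute"/>
  </akt:Person>
</rdf:RDF>"""


def local_name(tag):
    """Strip the '{namespace}' prefix that ElementTree puts on tags."""
    return tag.rsplit("}", 1)[-1]


def triples(xml_text):
    """Return (subject, predicate, object) tuples for each typed node."""
    root = ET.fromstring(xml_text)
    out = []
    for node in root:
        subject = node.get(f"{{{RDF}}}about")
        # The element name of the node is its class.
        out.append((subject, "type", local_name(node.tag)))
        for prop in node:
            # Object is either a referenced resource or a literal value.
            obj = prop.get(f"{{{RDF}}}resource")
            if obj is None:
                obj = prop.text
            out.append((subject, local_name(prop.tag), obj))
    return out


for triple in triples(doc):
    print(triple)
```

Each printed tuple corresponds to one statement in the markup; the ontology is what gives names such as `hasAffiliation` and `Person` their meaning.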


SW = A Conceptual Layer over the web


SW is Heterogeneous!


Generating semantic markup

[Figure: generated semantic markup, shown as sets of <RDF triple> statements]


Key aspects of the SW

• Size (= Huge)
  – Sem. markup (eventually to reach) the same order of magnitude as the web

• Conceptual Heterogeneity (= Big)
  – Sem. markup based on many different ontologies

• Rate of change (= Very High)
  – Data generated all the time from human and artificial agents…

• Provenance (= Very Heterogeneous)
  – …Hence provenance itself is extremely heterogeneous

• Trust (= very variable and subjective)
  – A side-effect of heterogeneous provenance

• Data Quality (= very variable)
  – No guarantee of correctness

• Intelligence (= by-product of size and heterogeneity)
  – Rather than a by-product of sophisticated problem solving


Compare with traditional KBS

• Size (= Small or Medium)
  – KBS normally small to medium size

• Conceptual Heterogeneity (= Not an issue)
  – KBS normally based on a single conceptual model

• Rate of change (= Very Low)
  – Change rate under developers' control (hence, low)

• Provenance (= Not an issue)
  – KBS are normally created ad hoc for an application by a centralised team of developers

• Trust (= not a major issue)
  – Centralisation of the development process implies no significant trust issues

• Data Quality (= not a major issue)
  – Again, centralisation guarantees data quality across the board

• Intelligence (= by-product of complex, task-centric reasoning)
  – E.g., sophisticated diagnostic, planning systems…


The Semantic Web today

1st Generation SW Applications


<rdf:Description rdf:about="http://www.ecs.soton.ac.uk/info/#person-01269">
  <ns0:family-name>Gibbins</ns0:family-name>
  <ns0:full-name>Nicholas Gibbins</ns0:full-name>
  <ns0:given-name>Nicholas</ns0:given-name>
  <ns0:has-email-address>[email protected]</ns0:has-email-address>
  <ns0:has-affiliation-to-unit rdf:resource="http://194.66.183.26/WEBSITE/GOW/ViewDepartment.aspx?Department=750"/>
</rdf:Description>
</rdf:RDF>

[Figure labels: CS Dept Data, Bibliographic Data, AKT Reference Ontology, RDF Data]


Features of 1st generation SW Applications

• Typically use a single ontology
  – Usually providing a homogeneous view over heterogeneous data sources
  – Limited use of existing SW data

• Closed to semantic resources

• Limited interactivity
  – In contrast with typical web 2.0 applications

Hence: current SW applications are far more similar to traditional KBS (closed semantic systems) than to 'real' SW applications (open semantic systems)


1895 2006

It is still early days…


Next Generation SW Applications: Exploiting Large-Scale Semantics


Next generation SW applications

NG SW Application

• Able to exploit the SW at large
  – Hence: Multi-Ontology
  – Hence: Open to Semantic Resources
  – Hence: Open to User Interaction

• Ideally also able to exploit non-SW data
  – E.g., folksonomies
  – Hence: embedding powerful information extraction engines


Two systems we have built

Magpie AquaLog


Magpie Components

[Figure: Magpie architecture. Components: Web Page, Ontology-based Proxy Server, Ontology cache (Lexicon), Problem Domain & Resources, Magpie Hub, Jabber Server, Semantic Log, Enriched Web Page]

Sample Semantic Log entries:

(found-item 3275578832 localhost #u"http://localhost/people/motta/" john-domingue john-domingue)(found-item 3275578832 localhost


AquaLog: Ontology-Driven Question Answering

[Figure: AquaLog pipeline. NL SENTENCE INPUT → Linguistic Analysis → QUERY TRIPLES → Mapping Engine → RESULT TRIPLES → NL Generation → ANSWER]

Example: "Which is the capital of Spain?" → (?, capital, Spain) → <Spain, has-capital-city, Madrid> → Madrid
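The question-to-answer flow on this slide can be mimicked with a toy sketch. This is not AquaLog's implementation: the regular expression, the two-triple knowledge base `KB`, and the function names are all assumptions made for illustration, and the "mapping" step is reduced to a naive token match between the user's term and the ontology relation name.

```python
import re

# Toy knowledge base of (subject, relation, object) triples, invented for illustration.
KB = [
    ("Spain", "has-capital-city", "Madrid"),
    ("France", "has-capital-city", "Paris"),
]


def linguistic_analysis(question):
    """Turn 'Which is the <relation> of <entity>?' into a query triple (?, relation, entity)."""
    match = re.match(r"(?:which|what) is the (\w+) of (\w+)\?", question, re.IGNORECASE)
    if match is None:
        raise ValueError("unsupported question form")
    return ("?", match.group(1).lower(), match.group(2))


def mapping_engine(query, kb):
    """Map the query triple onto KB triples.

    The user's relation term must appear as a token of the ontology
    relation name, e.g. 'capital' in 'has-capital-city'.
    """
    _, relation, entity = query
    return [
        triple
        for triple in kb
        if triple[0].lower() == entity.lower() and relation in triple[1].split("-")
    ]


def answer(question, kb=KB):
    """Run the whole pipeline and return the objects of the matching triples."""
    query = linguistic_analysis(question)
    return [obj for _, _, obj in mapping_engine(query, kb)]


print(linguistic_analysis("Which is the capital of Spain?"))
print(answer("Which is the capital of Spain?"))
```

The point of the sketch is the separation of concerns: the first step knows nothing about the ontology, and only the mapping step bridges user terminology ("capital") and ontology terminology ("has-capital-city").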


PowerMagpie: Semantic browsing on the 'open' SW

Need for mechanisms for automatically identifying semantic markup relevant to the current page, user, browsing session, etc.


PowerAqua: QA on the 'open' semantic web

Need for mechanisms for automatically locating ontologies relevant to the current query, mapping user terminology to ontologies, integrating info from different ontologies, etc.


Key Research Tasks for Enabling Next Generation SW Applications


Dynamic Ontology Selection

• Both PowerAqua and PowerMagpie heavily rely on ontology selection to locate possibly relevant knowledge in response to
  – User queries (PowerAqua)
  – Accessing web pages (PowerMagpie)

• Hence, ontology selection is a crucial task for both systems
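One simple way to picture run-time ontology selection is to rank candidate ontologies by how many of the terms at hand they cover. The sketch below is only that, a picture: it is not the selection method of PowerAqua or PowerMagpie, and the function `select_ontologies`, the `min_coverage` threshold, and the toy ontology contents are assumptions introduced here.

```python
def select_ontologies(terms, ontologies, min_coverage=0.5):
    """Rank ontologies by the fraction of the given terms they contain.

    terms:       iterable of strings taken from a query or a web page
    ontologies:  dict mapping an ontology name to the set of its concept labels
    Returns a list of (name, coverage) pairs, best coverage first.
    """
    wanted = {term.lower() for term in terms}
    ranked = []
    for name, labels in ontologies.items():
        covered = wanted & {label.lower() for label in labels}
        coverage = len(covered) / len(wanted) if wanted else 0.0
        if coverage >= min_coverage:
            ranked.append((name, coverage))
    # Sort by descending coverage, then by name for a stable order.
    return sorted(ranked, key=lambda item: (-item[1], item[0]))


# Toy candidates, invented for illustration.
candidates = {
    "food": {"Ham", "Meat", "Food"},
    "wine": {"Seafood", "Meat"},
    "people": {"Person"},
}

print(select_ontologies(["meat", "seafood"], candidates))
```

Exact label matching is the weakest part of this sketch; a realistic selector would also need to handle synonyms and partial matches.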


Current support for ontology selection


Current support for ontology selection

However, Swoogle only provides limited support for NG-SW applications


Ontology Structuring Relations

extends

inconsistent-with



Additional Limitations of Swoogle

• Limited Query/Search mechanisms
  – Only keyword search; we need more powerful query methods (e.g., ability to pose formal queries)

• Limited range of ontology ranking mechanisms
  – Swoogle only uses a 'popularity-based' one; we need other methods as well

• No support for fast extraction of ontology modules
  – Typically during ontology selection we are only interested in the part of the ontology relevant to our current need
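As a rough illustration of the last point, a "module" can be thought of as the part of a class hierarchy reachable from the terms of interest. The sketch below extracts only the subclass axioms on the path from a set of seed concepts up to the root. It is a deliberately naive notion of module, assumed here for illustration; the function name `extract_module` and the toy hierarchy are not from the talk.

```python
def extract_module(subclass_of, seeds):
    """Collect the (child, parent) axioms reachable upward from the seed concepts.

    subclass_of: dict mapping a concept to the set of its direct parents
    seeds:       concepts relevant to the current need
    """
    module = set()
    seen = set()
    stack = list(seeds)
    while stack:
        concept = stack.pop()
        if concept in seen:
            continue
        seen.add(concept)
        for parent in subclass_of.get(concept, ()):
            module.add((concept, parent))
            stack.append(parent)
    return module


# Toy hierarchy, invented for illustration.
hierarchy = {
    "Ham": {"Meat"},
    "Meat": {"Food"},
    "Salmon": {"Seafood"},
    "Seafood": {"Food"},
}

print(sorted(extract_module(hierarchy, ["Ham"])))
```

For the seed "Ham" the Seafood branch is left out entirely, which is the kind of pruning the slide asks for.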


Key Tasks

• Ontology Selection
  – In the context of identifying the right knowledge

• Ontology Mapping
  – In the context of integrating information coming from different ontologies
  – In the context of mapping query/specs to ontologies

• Ontology Modularization
  – Key for effective use of ontological information in the given scenarios


New task context

• Key point is that NG-SW applications require solutions in a new dynamic context (run-time rather than design-time)
  – Example: Ontology Mapping
    • Current work focuses on design-time mapping of complete ontologies
  – Example: Ontology Selection
    • Current work focuses on user-mediated ontology selection
  – Example: Ontology Modularization
    • Current work by and large assumes that the user is in the loop


More info to be found here:

• Ontology Mapping
  – Lopez, V., Sabou, M., Motta, E. (2006). "Mapping the real semantic web on the fly". International Semantic Web Conference, Georgia, Atlanta.

• Ontology Selection
  – Sabou, M., Lopez, V., Motta, E. (2006). "Ontology Selection for the Real Semantic Web: How to Cover the Queen’s Birthday Dinner?". Proceedings of EKAW 2006, Podebrady, Czech Republic.

• Ontology Modularization
  – D'Aquin, M., Sabou, M., Motta, E. (2006). "Modularization: A key for the dynamic selection of relevant knowledge components". ISWC 2006 Workshop on Ontology Modularization.


Exploiting the SW itself to tackle its heterogeneity

• Interestingly, an NG-SW-based approach can also be used to tackle key SW tasks, such as Ontology Mapping
  – Based on the use of the SW itself as background knowledge


Exploiting Large-Scale Semantics to Tackle Key SW Tasks

Case Study: Ontology Mapping


Ontology Mapping: State of the Art

• State-of-the-art methods rely on a combination of:
  – Label similarity methods
    • e.g., Full_Professor = FullProfessor
  – Structure similarity methods
    • Using taxonomic information or information about domain and range of associated properties

• However, as pointed out by Aleksovski et al. (EKAW, 2006):
  – In many cases there is insufficient lexical overlap
  – In many cases source and target ontology do not have sufficient structure to allow effective structure-based mapping
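The Full_Professor = FullProfessor example can be reproduced with a few lines of label normalisation. The sketch below is a generic illustration of a label similarity check, not the method of any particular mapping system; `normalize_label` and `labels_match` are hypothetical helper names.

```python
import re


def normalize_label(label):
    """Split a concept label into lowercase tokens.

    Handles camelCase boundaries as well as underscore, hyphen
    and whitespace separators.
    """
    # Insert a space at each lowercase/digit -> uppercase boundary.
    spaced = re.sub(r"(?<=[a-z0-9])(?=[A-Z])", " ", label)
    tokens = re.split(r"[\s_\-]+", spaced.strip())
    return tuple(token.lower() for token in tokens if token)


def labels_match(label_a, label_b):
    """Two labels match when their normalised token sequences are identical."""
    return normalize_label(label_a) == normalize_label(label_b)


print(normalize_label("Full_Professor"), normalize_label("FullProfessor"))
print(labels_match("Full_Professor", "FullProfessor"))
print(labels_match("Researcher", "AcademicStaff"))
```

The last call shows the limitation named on the slide: when two related concepts share no lexical material, a label-based check has nothing to work with.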


Use of background knowledge for ontology mapping

[Figure: the unknown relation between A and B (A ? B) is derived through Background Knowledge]


External Source = a Reference Ontology

Aleksovski et al. EKAW'06
• Map candidate terms into concepts from a richly axiomatized domain ontology (anchors)
• Derive a mapping based on the relation of the anchor terms

[Figure: A and B are anchored (=) to A' and B' in the reference ontology; the relation rel between A' and B' gives the relation rel between A and B]

Advantages:
• Handles dissimilar ontologies
• Returns semantic mappings

Disadvantages:
• Assumes that a suitable domain ontology is available
• Approach only suitable for closed domains


External Source = Web

van Hage et al. ISWC'05
• Rely on Google and an online dictionary in the food domain to extract semantic relations between candidate mappings using IR techniques

[Figure: A rel B, derived from the Web + Online Dictionary via IR Methods]

Advantages:
• General purpose

Disadvantages:
• IR Methods introduce noise


External Source = WordNet

Lopez et al. ESWC'05
• Use WordNet to map queries expressed in the user's terminology to a domain ontology to support question answering

[Figure: A rel B, derived from WordNet]

Advantages:
• General purpose

Disadvantages:
• Knowledge sparseness
• Works best with concepts, not so useful with relations
• WordNet is not an ontology!!!


Knowledge-poor ontology mapping

• Actually, isn’t it a bit strange that such complex and knowledge-poor methods are devised, when the SW already provides so much background knowledge?


External Source = SW

Proposal:
• Rely on online ontologies (Semantic Web) to derive mappings
• Ontologies are dynamically discovered and combined

[Figure: A rel B, derived from the Semantic Web]

Advantages:
• General purpose
• Does not introduce noise
• Works with any kind of domain entities (concepts, relations, instances)


Strategy 1 - Definition

Find ontologies that contain equivalent classes to A and B and use their relationship in the ontologies to derive the mapping.

[Figure: A rel B is derived from the Semantic Web, where ontologies O1, O2, …, On contain the pairs (A1', B1'), (A2', B2'), …, (An', Bn')]

For each ontology use these rules:

  A' ≡ B' ⇒ A ≡ B
  A' ⊆ B' ⇒ A ⊆ B
  A' ⊇ B' ⇒ A ⊇ B
  A' ⊥ B' ⇒ A ⊥ B

These rules can be extended to take into account indirect relations between A' and B', e.g., between parents of A' and B':

  A' ⊆ C' ∧ C' ⊥ B' ⇒ A ⊥ B
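The Strategy 1 rules amount to copying the relation found between the anchored classes, plus one extra rule for disjointness inherited from a parent. A minimal sketch follows, under several assumptions that are not in the talk: each ontology is a dict from (class, class) pairs to one of "equiv", "sub", "super", "disjoint"; anchoring A to A' is done by exact name equality; and the function names and toy ontologies are invented.

```python
# Reading a stored relation in the opposite direction.
INVERSE = {"equiv": "equiv", "sub": "super", "super": "sub", "disjoint": "disjoint"}


def direct_relation(onto, a, b):
    """Relation asserted between a and b in one ontology, in either direction."""
    rel = onto.get((a, b))
    if rel is not None:
        return rel
    rel = onto.get((b, a))
    return INVERSE[rel] if rel is not None else None


def indirect_disjointness(onto, a, b):
    """Extended rule: A' sub C' and C' disjoint B' imply A disjoint B."""
    for (x, c), rel in onto.items():
        if x == a and rel == "sub" and direct_relation(onto, c, b) == "disjoint":
            return "disjoint"
    return None


def strategy1(a, b, ontologies):
    """Return {ontology name: derived relation} for every ontology relating a and b."""
    found = {}
    for name, onto in ontologies.items():
        rel = direct_relation(onto, a, b) or indirect_disjointness(onto, a, b)
        if rel is not None:
            found[name] = rel
    return found


# Toy ontologies, invented for illustration.
web = {
    "O1": {("Beef", "Food"): "sub"},
    "O2": {("Ham", "Meat"): "sub", ("Meat", "Seafood"): "disjoint"},
}

print(strategy1("Beef", "Food", web))
print(strategy1("Ham", "Seafood", web))
```

Returning one entry per ontology, rather than a single answer, leaves room for either variant of the strategy: stop at the first entry, or collect and combine them all.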


Strategy 1 - Variants

Quick variant: Stop as soon as a relation is found

[Figure: A and B are matched to A1' and B1' in a single ontology O1 on the Semantic Web]


Strategy 1 - Variants

Precise variant: Derive all possible mappings from all ontologies and combine them into a final mapping.

[Figure: A and B are matched to (A1', B1') in O1 and to (A2', B2') in O2 on the Semantic Web]

Dealing with Contradictions:
• Return all mappings even if contradictory
• Return a mapping only when there is no contradiction
• Return the most frequent mapping (i.e., the mapping derived from most ontologies)
• Return the mappings with 'higher authority' (based on metrics of ontology evaluation or trust)
• Try to combine mappings, e.g.:

  A ⊆ B ∧ B ⊆ A ⇒ A ≡ B
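The policies for dealing with contradictions can be written down as a small dispatch over the relations collected from the different ontologies. The sketch below covers four of the five options and leaves out the 'higher authority' one, since it needs trust or quality scores the talk does not specify. The function name, the policy names, the relation encoding ("equiv", "sub", "super", "disjoint") and the choice to report ties as unresolved are assumptions.

```python
from collections import Counter


def combine_mappings(relations, policy="most_frequent"):
    """Combine the relations derived for one pair (A, B) from several ontologies.

    relations: list of relation names, one per ontology that produced a mapping
    Returns a relation, a list of relations (policy 'all'), or None when the
    chosen policy cannot settle on an answer.
    """
    distinct = set(relations)
    if not distinct:
        return None
    if policy == "all":
        # Return every mapping, even if they contradict each other.
        return sorted(distinct)
    if policy == "no_contradiction":
        return relations[0] if len(distinct) == 1 else None
    if policy == "most_frequent":
        counts = Counter(relations)
        best, best_count = counts.most_common(1)[0]
        tied = [rel for rel, n in counts.items() if n == best_count]
        return best if len(tied) == 1 else None  # a tie stays unresolved
    if policy == "combine":
        # A sub B and B sub A (i.e. A super B) together give A equiv B.
        if distinct == {"sub", "super"}:
            return "equiv"
        return relations[0] if len(distinct) == 1 else None
    raise ValueError(f"unknown policy: {policy}")


print(combine_mappings(["sub", "sub", "disjoint"]))
print(combine_mappings(["sub", "super"], policy="combine"))
```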


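Strategy 1 - Examples

[Figure: two examples of mappings derived from the Semantic Web: (1) Beef vs. Food; (2) Researcher vs. AcademicStaff. Concept labels shown: Beef, RedMeat, Food, MeatOrPoultry, Researcher, AcademicStaff. Ontologies shown: Tap, SR-16, FAO_Agrovoc, ka2.rdf, ISWC, SWRC]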


Strategy 2 - Definition

Principle: If no ontologies are found that contain the two terms then combine information from multiple ontologies to find a mapping.

[Figure: A rel B is derived from the Semantic Web by chaining A' rel C in one ontology with C' rel B' in another]

Details:
(1) Select all ontologies containing A' equiv. with A
(2) For each ontology containing A':
    (a) if A' ⊆ C, find relation between C and B.
    (b) if A' ⊇ C, find relation between C and B.

Rules:
  (r1) A' ⊆ C ∧ C ⊆ B ⇒ A ⊆ B
  (r2) A' ⊆ C ∧ C ≡ B ⇒ A ⊆ B
  (r3) A' ⊆ C ∧ C ⊥ B ⇒ A ⊥ B
  (r4) A' ⊇ C ∧ C ⊇ B ⇒ A ⊇ B
  (r5) A' ⊇ C ∧ C ≡ B ⇒ A ⊇ B
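Strategy 2 can be read as a lookup in a small composition table: the relation between A' and some intermediate class C in one ontology is chained with the relation between C and B in another. The sketch below encodes rules r1 to r5 in that way, reading r2 as concluding A ⊆ B. The data layout (dicts from class pairs to "equiv", "sub", "super", "disjoint"), exact-name anchoring, the function name and the toy ontology names are assumptions made for illustration.

```python
# (relation between A' and C, relation between C and B) -> relation between A and B
RULES = {
    ("sub", "sub"): "sub",            # r1
    ("sub", "equiv"): "sub",          # r2
    ("sub", "disjoint"): "disjoint",  # r3
    ("super", "super"): "super",      # r4
    ("super", "equiv"): "super",      # r5
}


def strategy2(a, b, ontologies):
    """Chain relations across ontologies.

    Returns a list of (derived relation, ontology relating a to c,
    ontology relating c to b) tuples.
    """
    derived = []
    for name_ac, onto_ac in ontologies.items():
        for (x, c), rel_ac in onto_ac.items():
            if x != a:
                continue
            # Look for the intermediate class c related to b in any ontology.
            for name_cb, onto_cb in ontologies.items():
                rel_ab = RULES.get((rel_ac, onto_cb.get((c, b))))
                if rel_ab is not None:
                    derived.append((rel_ab, name_ac, name_cb))
    return derived


# Toy ontologies, invented for illustration.
web = {
    "onto_1": {("Ham", "Meat"): "sub"},
    "onto_2": {("Meat", "Food"): "sub"},
    "onto_3": {("Meat", "Seafood"): "disjoint"},
}

print(strategy2("Ham", "Food", web))
print(strategy2("Ham", "Seafood", web))
```

Keeping the names of the two contributing ontologies in the result makes it possible to apply trust or frequency criteria afterwards, in the same way as for Strategy 1.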


Strategy 2 - Examples

Ex1: Chicken vs. Food
  Chicken ⊆ Poultry (midlevel-onto)
  Poultry ⊆ Food (Tap)
  ⇒ Chicken ⊆ Food (r1)
  (Same results for Duck, Goose, Turkey)

Ex2: Ham vs. Food
  Ham ⊆ Meat (pizza-to-go)
  Meat ⊆ Food (SUMO)
  ⇒ Ham ⊆ Food (r1)

Ex3: Ham vs. Seafood
  Ham ⊆ Meat (pizza-to-go)
  Meat ⊥ Seafood (wine.owl)
  ⇒ Ham ⊥ Seafood (r3)


Conclusions

• Using the SW as background knowledge for ontology mapping has several benefits
  – Suitable for our NG-SW scenario as there is no need for design-time selection of a background ontology
  – Even when design-time selection is feasible, it is useful in those cases where a suitable domain ontology cannot be found
  – Reduces noise by exploiting only ontologies
  – Can be tailored to handle multiple solutions
  – Can be integrated with other approaches, based on lexical and structural analysis
    • Indeed it is not designed to be used standalone, but to enhance existing methods


Reference

• Sabou, M., D'Aquin, M., Motta, E. (2006). "Using the semantic web as background knowledge for ontology mapping". ISWC 2006 Workshop on Ontology Mapping.


So What?

• Time to go beyond 1st generation applications

• 2nd generation SW applications will exploit much more fully the large scale semantic markup provided by the SW

• Many issues to be addressed:
  – Better ontology crawling, indexing, retrieving and ranking support
  – Mapping, selection, and modularization methods appropriate for NG-SW applications
  – Further acceleration needed in the generation of semantic markup


Vision Papers

• Motta, E., Sabou, M. (2006). "Next Generation Semantic Web Applications". 1st Asian Semantic Web Conference, Beijing.

• Motta, E., Sabou, M. (2006). "Language Technologies and the Evolution of the Semantic Web". LREC 2006, Genoa, Italy.

• Motta, E. (2006). "Knowledge Publishing and Access on the Semantic Web: A Socio-Technological Analysis". IEEE Intelligent Systems, Vol. 21, No. 3, pp. 88-90.
