Semantic Web research anno 2006: main streams, popular fallacies, current status, future challenges. Frank van Harmelen, Vrije Universiteit Amsterdam


DESCRIPTION

This keynote at the Cooperative Intelligent Agents Workshop was a good opportunity to give my view on the current state of Semantic Web research: what is it about, what is it not about, what has been achieved, what remains to be done. (Includes the now infamous slide "What's it like to be a machine")


Page 1

Semantic Web research anno 2006:

main streams, popular fallacies, current status, future challenges

Frank van Harmelen, Vrije Universiteit Amsterdam

Page 2

This is NOT a Semantic Web evangelization talk

(I assume you are already converted)

Page 3

This is a “topical” talk:

Webster: “referring to the topics of the day, of temporary interest”

Page 4

Which Semantic Web are we talking about?

Semantic Web research anno 2006:

main streams, popular fallacies, current status, future challenges

Current section: main streams

Page 5

General idea of Semantic Web

Make current web more machine accessible

(currently all the intelligence is in the user)

Motivating use-cases:

• Search engines: concepts, not keywords; semantic narrowing/widening of queries

• Shopbots: semantic interchange, not screenscraping

• E-commerce: negotiation, catalogue mapping, data-integration

• Web Services: need semantic characterisations to find them

• Navigation: by semantic proximity, not hardwired links

.....
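
As a rough illustration of "semantic narrowing/widening of queries", the sketch below expands a query term along a toy concept hierarchy. The hierarchy `BROADER` and the functions `widen` and `narrow` are hypothetical names invented for this example; they are not taken from the talk or from any particular Semantic Web tool.

```python
# Hypothetical toy concept hierarchy: narrower concept -> broader concept.
BROADER = {
    "laptop": "computer",
    "desktop": "computer",
    "computer": "electronics",
    "camera": "electronics",
}


def widen(concept):
    """Return the concept followed by all of its broader concepts."""
    result = [concept]
    while concept in BROADER:
        concept = BROADER[concept]
        result.append(concept)
    return result


def narrow(concept):
    """Return the concept followed by all of its narrower concepts."""
    result = [concept]
    for child, parent in BROADER.items():
        if parent == concept:
            result.extend(narrow(child))
    return result
```

A search engine working on concepts could then match documents about any term in `narrow("computer")` when the user asks for computers, or fall back to the terms in `widen("laptop")` when a query for laptops returns too little.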

Page 6

General idea of Semantic Web (2)

Do this by:

1. Making data and meta-data available on the Web in machine-understandable form (formalised)

2. Structuring the data and meta-data in ontologies

These are non-trivial design decisions. Alternative would be:

Page 7

“machine-understandable form” (What it’s like to be a machine)

[Figure: markup tags <name>, <symptoms>, <drug>, <drugadministration>, <disease>, <treatment>, with the relations IS-A and alleviates, labelled META-DATA]
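
A minimal sketch of what such meta-data buys a machine: once the otherwise opaque tag names are connected by relations like IS-A and alleviates, simple inferences become possible. The slide names the tags and the relations, but which tag relates to which is an assumption made here purely for illustration, as are the names `TRIPLES`, `is_a` and `treatments_for`.

```python
# Assumed statements: the pairing of tags with relations is a guess
# for illustration, not something the slide specifies.
TRIPLES = {
    ("drug", "IS-A", "treatment"),
    ("drugadministration", "IS-A", "treatment"),
    ("drug", "alleviates", "symptoms"),
}


def is_a(term, ancestor, triples=TRIPLES):
    """True if `term` reaches `ancestor` by following IS-A statements.

    Assumes the IS-A statements contain no cycles.
    """
    if term == ancestor:
        return True
    return any(
        is_a(obj, ancestor, triples)
        for subj, pred, obj in triples
        if subj == term and pred == "IS-A"
    )


def treatments_for(target, triples=TRIPLES):
    """Return every treatment that is stated to alleviate `target`."""
    return sorted(
        subj
        for subj, pred, obj in triples
        if pred == "alleviates" and obj == target and is_a(subj, "treatment", triples)
    )
```

Without the relations, `<drug>` and `<treatment>` are just two unrelated strings to the machine; with them, `treatments_for("symptoms")` can be answered mechanically.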

Page 8

Expressed using the W3C stack

Page 9

Which Semantic Web? Version 1:

"Semantic Web as Web of Data" (TBL)

recipe: expose databases on the web, use RDF, integrate

meta-data from: expressing DB schema semantics in machine interpretable ways

enable integration and unexpected re-use
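
The recipe "expose databases, use RDF, integrate" can be sketched in a few lines, assuming plain Python tuples as a stand-in for RDF triples. The namespace `http://example.org/`, the two toy tables, and the helpers `rows_to_triples` and `describe` are all hypothetical; a real deployment would use an RDF library and agreed vocabularies.

```python
BASE = "http://example.org/"  # hypothetical namespace for this sketch


def rows_to_triples(source, key, rows):
    """Turn database rows into (subject, predicate, object) triples.

    The subject is built from the shared key column; each predicate is
    qualified with the name of the source it came from.
    """
    triples = set()
    for row in rows:
        subject = BASE + "item/" + str(row[key])
        for column, value in row.items():
            if column != key:
                triples.add((subject, BASE + source + "#" + column, value))
    return triples


def describe(subject, triples):
    """Collect everything any source says about one subject."""
    return {pred: obj for subj, pred, obj in triples if subj == subject}


# Two independent toy databases that happen to share an identifier.
catalogue = [{"isbn": "123", "title": "Example Book"}]
stock = [{"isbn": "123", "in_stock": 4}]

# Integration is just the union of the two sets of triples.
graph = rows_to_triples("catalogue", "isbn", catalogue) | rows_to_triples("stock", "isbn", stock)
```

Because both sources name the same subject, `describe(BASE + "item/123", graph)` returns the title and the stock level together, even though neither database was designed with the other in mind.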

Page 10

Which Semantic Web? Version 2:

“Enrichment of the current Web”

recipe: annotate, classify, index

meta-data from: automatically producing markup: named-entity recognition, concept extraction, tagging, etc.

enable personalisation, search, browse, ...
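
To make "automatically producing markup" concrete, here is a deliberately naive named-entity annotator based on a lookup list. The `GAZETTEER` entries and the `annotate` function are hypothetical; real named-entity recognition relies on far richer linguistic and statistical evidence than exact string lookup.

```python
import re

# Hypothetical lookup list mapping known names to entity types.
GAZETTEER = {
    "Amsterdam": "City",
    "Vrije Universiteit": "Organisation",
}


def annotate(text, gazetteer=GAZETTEER):
    """Return (start, end, name, type) for every gazetteer name found in text."""
    hits = []
    for name, entity_type in gazetteer.items():
        pattern = r"\b" + re.escape(name) + r"\b"
        for match in re.finditer(pattern, text):
            hits.append((match.start(), match.end(), name, entity_type))
    return sorted(hits)  # ordered by position in the text
```

The resulting spans can be stored as stand-off annotations next to the page, which is what makes the enrichment usable for search and browsing without touching the original content.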

Page 11

Which Semantic Web?

Version 1: “Semantic Web as Web of Data”

Version 2: “Enrichment of the current Web”

Different use-cases, different techniques, different users

Page 12

Four popular fallacies about the Semantic Web

Semantic Web research anno 2006:

main streams, popular fallacies, current status, future challenges

Current section: popular fallacies

Page 13

First: clear up some popular misunderstandings

False statement No. 1:

“Semantic Web people try to enforce meaning from the top”

They only “enforce” a language. They don’t enforce what is said in that language.

Compare: HTML is “enforced” from the top, but content is entirely free.

Page 14

First: clear up some popular misunderstandings

False statement No. 2:

“The Semantic Web people will require everybody to subscribe to a single predefined "meaning" for the terms we use.”

Of course, meaning is fluid, contextual, etc.

Lots of work on (semi-)automatically bridging between different vocabularies.

Page 15

First: clear up some popular misunderstandings

False statement No. 3:

“The Semantic Web will require users to understand the complicated details of formalised knowledge representation.”

All of this is “under the hood”.

Page 16

First: clear up some popular misunderstandings

False statement No. 4:

“The Semantic Web people will require us to manually markup all the existing web-pages.”

Lots of work on automatically producing semantic markup:

named-entity recognition, concept extraction, etc.

Page 17

The current state of Semantic Web

Semantic Web research anno 2006:

main streams, popular fallacies, current status, future challenges

Current section: current status

Page 18

4 hard questions on the Semantic Web:

Q1: “where does the meta-data come from?”
• NL technology is delivering on concept-extraction
• Socially emerging (learning from tagging)

Q2: “where do the meta-data-schema come from?”
• many handcrafted schema
• hierarchy learning remains hard
• relation extraction remains hard

Q3: “what to do with many meta-data schema?”
• ontology mapping/aligning remains VERY hard

Q4: “where’s the ‘Web’ in the Semantic Web?”
• more attention to social aspects (P2P, FOAF)
• non-textual media remains hard
• deal with typical Web requirements

Page 19

Q1: Where do the ontologies come from?

Professional bodies, scientific communities, companies, publishers, ….

Good old fashioned Knowledge Engineering

Convert from DB-schema, UML, etc.

Learning remains very hard…

Page 20

Q1: Where do the ontologies come from?

• handcrafted: music: CDnow (2410/5), MusicMoz (1073/7)
• community efforts: biomedical: SNOMED (200k), GO (15k)
• commercial: Emtree (45k+190k)
• ranging from lightweight (Yahoo) to heavyweight (Cyc)
• ranging from small (METAR) to large (UNSPC)

Page 21

Q2: Where do the annotations come from?

- Automated learning
- shallow natural language analysis
- Concept extraction

Example: Encyclopedia Britannica on “Amsterdam”

[Figure: extracted concepts: amsterdam, trade, antwerp, europe, merchant, city, town, center, netherlands]
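
A very shallow form of concept extraction can be sketched as counting content words. The stopword list, the `extract_concepts` function and the sample sentence are assumptions made for this example (the sentence is invented, not a quotation from the encyclopedia); real systems add lemmatisation, phrase detection and statistical weighting.

```python
import re
from collections import Counter

# Small hypothetical stopword list; real lists are much longer.
STOPWORDS = {"the", "a", "of", "and", "is", "in", "was", "for", "to", "its"}


def extract_concepts(text, top_n=3):
    """Return the `top_n` most frequent non-stopword terms in `text`."""
    words = re.findall(r"[a-z]+", text.lower())
    counts = Counter(word for word in words if word not in STOPWORDS)
    return [word for word, _ in counts.most_common(top_n)]


# Invented sample text, used only to exercise the function.
sample = "Amsterdam is a city of trade. The merchant city of Amsterdam was a trade center."
concepts = extract_concepts(sample)
```

Even this crude counting surfaces terms of the kind shown for the Amsterdam example, which is why lightweight analysis is often good enough to bootstrap annotations.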

Page 22

Q2: Where do the annotations come from?

• lightweight NLP: Dutch language semantic search engine
• exploit existing legacy-data: Amazon Lab equipment
• side-effect from user interaction: MIT Lab photo-annotator
• NOT from manual effort

Page 23

Q3: What to do with many ontologies?

• MeSH: Medical Subject Headings, National Library of Medicine, 22.000 descriptions
• EMTREE: commercial, Elsevier, drugs and diseases, 45.000 terms, 190.000 synonyms
• UMLS: integrates 100 different vocabularies
• SNOMED: 200.000 concepts, College of American Pathologists
• Gene Ontology: 15.000 terms in molecular biology
• NCI Cancer Ontology: 17,000 classes (about 1M definitions)

Page 24

Q3: What to do with many ontologies?

Stitching all this together by hand?

Page 25

Q3: What to do with many ontologies?

Linguistics & structure

Shared vocabulary

Instance-based matching

Shared background knowledge
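
Of the matching strategies listed above, instance-based matching is the easiest to sketch: two concepts from different ontologies are proposed as a match when they classify largely the same instances. The Jaccard overlap measure, the 0.5 threshold, the toy ontologies and the function names below are assumptions chosen for illustration, not the method used in any specific system mentioned in the talk.

```python
def jaccard(a, b):
    """Overlap of two instance sets: |intersection| / |union|."""
    a, b = set(a), set(b)
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def match_concepts(onto_a, onto_b, threshold=0.5):
    """Propose concept pairs whose instance sets overlap at least `threshold`.

    Each ontology is given as a dict: concept name -> set of instances.
    """
    pairs = []
    for name_a, instances_a in onto_a.items():
        for name_b, instances_b in onto_b.items():
            score = jaccard(instances_a, instances_b)
            if score >= threshold:
                pairs.append((name_a, name_b, score))
    return sorted(pairs, key=lambda pair: -pair[2])


# Hypothetical toy ontologies with differently named concepts.
onto_a = {"Medicine": {"aspirin", "ibuprofen", "insulin"}, "Illness": {"flu", "asthma"}}
onto_b = {"Drug": {"aspirin", "ibuprofen"}, "Disease": {"flu", "asthma", "diabetes"}}

mappings = match_concepts(onto_a, onto_b)
```

Here "Medicine" is mapped to "Drug" and "Illness" to "Disease" although the names share no characters, which is exactly the case where purely linguistic matching fails. The approach only works when the two ontologies have been used to classify a common pool of instances.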

Page 27

Where are we now: applications

healthy uptake in some areas:
• knowledge management / intranets
• data-integration
• life-sciences
• convergence with Semantic Grid
• cultural heritage

still very few applications in:
• personalisation
• mobility/context awareness

Most applications for companies, few applications for the public

Page 28

Future directions/challenges

Semantic Web research anno 2006:

main streams, popular fallacies, current status, future challenges

Current section: future challenges

Page 29

Semantic Web as an integrator of many different subfields

• Databases
• Natural Language Processing
• Knowledge Representation
• Machine Learning
• Information Retrieval
• Agents
• HCI
• ….

Page 30

Provocation…

Ontology research is done…

We know how to make, maintain & deploy them

We have tools & methods for editing, storing, inferencing, visualising, etc.

… except for two problems: Learning, Mapping

Natural lang. technology is also done… at least it’s good enough

Page 31

Large open questions

• Ontology learning & mapping
• emerging semantics (social & statistical)
• Semantic Web services: discovery, composition: realistic?
• non-textual media: the semantic gap: text or social?
• Deployment:
  1. data-integration
  2. search
  3. personalisation

Page 32

Changing focus

From: centralised, formalised, complete, precise

To: distributed, heterogeneous, open, P2P, approximate, lightweight

Web 3.0 = Web 2.0 + Semantic Web

Page 33

Predicting the future…

[Figure (slide by Carole Goble): a chart with the axes "Web" and "Semantics", each marked "Not much" and "Lots". Labels on the chart: Artificial Intelligence, Collective Intelligence, RDF, Flexible & extensible Metadata schemas, Semantic Web, Services, Ontology Building, OWL, Knowledge Discovery, SWRL, Decision making, FOAF, RSS, Social bookmarking, NLP, Information linking]