
Creating and Exploiting a Web of Semantic Data Tim Finin, UMBC Earth and Space Science Informatics Workshop 05 August 2009



http://ebiquity.umbc.edu/resource/html/id/272/


Overview

• Introduction
• Semantic Web 101
• Recent Semantic Web trends
• Examples: DBpedia, Wikitology
• Conclusion


The Age of Big Data

• Massive amounts of data are available today
• Advances in many fields are driven by the availability of unstructured data, e.g., text, audio, images
• Increasingly, large amounts of structured and semi-structured data are also online
• Much of this is available in the Semantic Web language RDF, fostering integration and interoperability
• Such structured data is especially important for the sciences


Twenty years ago…

Tim Berners-Lee’s 1989 WWW proposal described a web of relationships among named objects unifying many information management tasks

Capsule history
• Guha’s MCF (~94)
• XML+MCF=>RDF (~96)
• RDF+OO=>RDFS (~99)
• RDFS+KR=>DAML+OIL (00)
• W3C’s SW activity (01)
• W3C’s OWL (03)
• SPARQL, RDFa (08)
• Rules (09)

http://www.w3.org/History/1989/proposal.html


Ten years ago…

• The W3C started developing standards for the Semantic Web
• The vision, technology and use cases are still evolving
• Moving from a web of documents to a web of data


Today

4.5 billion integrated facts published on the Web as RDF Linked Open Data


Tomorrow

Large collections of integrated facts published on the Web for many disciplines and domains


W3C’s Semantic Web Goal

“The Semantic Web is an extension of the current web in which information is given well-defined meaning, better enabling computers and people to work in cooperation.”
-- Berners-Lee, Hendler and Lassila, The Semantic Web, Scientific American, 2001


Contrast with a non-Web approach

The W3C Semantic Web approach is
• Distributed
• Open
• Non-proprietary
• Standards based


How can we share data on the Web?

• POX, Plain Old XML, is one approach, but it has deficiencies
• The Semantic Web languages RDF and OWL offer a simpler and more abstract data model (a graph) that is better for integration
• Its well-defined semantics supports knowledge modeling and inference
• Supported by a stable, funded standards organization, the World Wide Web Consortium


Simple RDF Example

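[Figure: RDF graph]
http://umbc.edu/~finin/talks/idm02/ --dc:Title--> “Intelligent Information Systems on the Web and in the Aether”
http://umbc.edu/~finin/talks/idm02/ --dc:Creator--> [blank node]
[blank node] --bib:name--> “Tim Finin”
[blank node] --bib:email--> “[email protected]”
[blank node] --bib:Aff--> http://umbc.edu/

Note: the creator is a “blank node”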


The RDF Data Model

• An RDF document is an unordered collection of statements, each with a subject, predicate and object
• Such triples can be thought of as labelled arcs in a graph
• Statements describe properties of resources
• A resource is any object that can be referenced or denoted by a URI
• Properties themselves are also resources (URIs)
• Dereferencing a URI produces useful additional information, e.g., a definition or additional facts
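
The data model above can be sketched in a few lines of Python. This is only an illustration, assuming plain tuples stand in for statements and strings stand in for URIs; the helper name `match` and the blank-node label `_:creator` are hypothetical and do not come from any RDF library.

```python
# An RDF graph as an unordered set of (subject, predicate, object) statements.
# URIs are plain strings here; "_:creator" is an illustrative blank-node label.
DOC = "http://umbc.edu/~finin/talks/idm02/"

graph = {
    (DOC, "dc:title", "Intelligent Information Systems on the Web and in the Aether"),
    (DOC, "dc:creator", "_:creator"),
    ("_:creator", "bib:Name", "Tim Finin"),
    ("_:creator", "bib:Aff", "http://umbc.edu/"),
}


def match(graph, s=None, p=None, o=None):
    """Return the statements matching a pattern; None acts as a wildcard."""
    return sorted(
        t for t in graph
        if (s is None or t[0] == s)
        and (p is None or t[1] == p)
        and (o is None or t[2] == o)
    )


# Follow the labelled arcs: document -> creator -> name.
creator = match(graph, s=DOC, p="dc:creator")[0][2]
name = match(graph, s=creator, p="bib:Name")[0][2]
print(name)
```

Because the graph is a set, statement order carries no meaning, which mirrors the "unordered collection" point in the first bullet.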


RDF is the first SW language

RDF is a simple language for graph based representations. The RDF data model can be viewed in three forms:

• XML encoding (good for machine processing):
  <rdf:RDF ……..> <….> <….></rdf:RDF>
• Graph (good for human viewing)
• Triples (good for storage and reasoning):
  stmt(docInst, rdf_type, Document)
  stmt(personInst, rdf_type, Person)
  stmt(inroomInst, rdf_type, InRoom)
  stmt(personInst, holding, docInst)
  stmt(inroomInst, person, personInst)


XML encoding for RDF

<rdf:RDF xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#"
         xmlns:dc="http://purl.org/dc/elements/1.1/"
         xmlns:bib="http://daml.umbc.edu/ontologies/bib/">
  <rdf:Description rdf:about="http://umbc.edu/~finin/talks/idm02/">
    <dc:title>Intelligent Information … and in the Aether</dc:title>
    <dc:creator>
      <rdf:Description>
        <bib:Name>Tim Finin</bib:Name>
        <bib:Email>[email protected]</bib:Email>
        <bib:Aff rdf:resource="http://umbc.edu/" />
      </rdf:Description>
    </dc:creator>
  </rdf:Description>
</rdf:RDF>

[Figure: the RDF graph this encoding describes: http://umbc.edu/~finin/talks/idm02/ with dc:Title and dc:Creator, where the creator is a blank node with bib:name, bib:email and bib:Aff]


N3 is a friendlier encoding

@prefix rdf: <http://www.w3.org/1999/02/22-rdf-syntax-ns#> .
@prefix dc: <http://purl.org/dc/elements/1.1/> .
@prefix bib: <http://daml.umbc.edu/ontologies/bib/> .

<http://umbc.edu/~finin/talks/idm02/>
    dc:title "Intelligent ... and in the Aether" ;
    dc:creator [ bib:Name "Tim Finin" ;
                 bib:Email "[email protected]" ;
                 bib:Aff <http://umbc.edu/> ] .

[Figure: the RDF graph this encoding describes: http://umbc.edu/~finin/talks/idm02/ with dc:Title and dc:Creator, where the creator is a blank node with bib:name, bib:email and bib:Aff]
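
The XML and N3 encodings describe the same statements. As a rough illustration, the sketch below writes part of the example as N-Triples-style lines, one statement per line. The classes `URI`, `BNode` and `Literal` and the function `to_ntriples` are hypothetical names made up for this sketch, not the API of an RDF toolkit, and the escaping handles only backslashes and double quotes.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class URI:
    value: str


@dataclass(frozen=True)
class BNode:
    label: str


@dataclass(frozen=True)
class Literal:
    value: str


def term(t):
    """Render one term in N-Triples-style syntax."""
    if isinstance(t, URI):
        return f"<{t.value}>"
    if isinstance(t, BNode):
        return f"_:{t.label}"
    # Literal: escape backslashes and double quotes only (a simplification).
    escaped = t.value.replace("\\", "\\\\").replace('"', '\\"')
    return f'"{escaped}"'


def to_ntriples(triples):
    """Serialize (subject, predicate, object) tuples, one statement per line."""
    return "\n".join(f"{term(s)} {term(p)} {term(o)} ." for s, p, o in triples)


DC = "http://purl.org/dc/elements/1.1/"
BIB = "http://daml.umbc.edu/ontologies/bib/"
talk = URI("http://umbc.edu/~finin/talks/idm02/")
creator = BNode("creator")  # illustrative label for the blank node

triples = [
    (talk, URI(DC + "creator"), creator),
    (creator, URI(BIB + "Name"), Literal("Tim Finin")),
    (creator, URI(BIB + "Aff"), URI("http://umbc.edu/")),
]
output = to_ntriples(triples)
print(output)
```

Whatever the surface syntax, the underlying graph is the same set of triples.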


RDFS supports simple inferences

• RDF Schema adds vocabulary for classes, properties & constraints
• An RDF ontology plus some RDF statements may imply additional RDF statements (not possible in XML)
• Note that this is part of the data model and not of the accessing or processing code

@prefix rdfs: <http://www.....>.
@prefix : <genesis.n3>.
parent a rdf:Property; rdfs:domain person; rdfs:range person.
mother rdfs:subPropertyOf parent; rdfs:domain woman; rdfs:range person.
eve mother cain.

person a class.
woman subClass person.
mother a property.
eve a person; a woman; parent cain.
cain a person.
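
How such statements come to be implied can be sketched with a small forward-chaining loop. This is a simplified illustration, assuming only three of the RDFS entailment rules (subproperty, domain, range) and using short names in place of full URIs; `rdfs_closure` is a hypothetical helper, not a real reasoner.

```python
def rdfs_closure(triples):
    """Forward-chain three RDFS rules until no new statements appear."""
    inferred = set(triples)
    while True:
        new = set()
        for s, p, o in inferred:
            for s2, p2, o2 in inferred:
                if s2 != p:
                    continue  # (s2 p2 o2) must be a schema statement about p
                if p2 == "rdfs:subPropertyOf":
                    new.add((s, o2, o))           # s p o  =>  s q o
                elif p2 == "rdfs:domain":
                    new.add((s, "rdf:type", o2))  # s p o  =>  s is a C
                elif p2 == "rdfs:range":
                    new.add((o, "rdf:type", o2))  # s p o  =>  o is a C
        if new <= inferred:
            return inferred
        inferred |= new


facts = {
    ("parent", "rdfs:domain", "person"),
    ("parent", "rdfs:range", "person"),
    ("mother", "rdfs:subPropertyOf", "parent"),
    ("mother", "rdfs:domain", "woman"),
    ("mother", "rdfs:range", "person"),
    ("eve", "mother", "cain"),
}

closure = rdfs_closure(facts)
for statement in sorted(closure - facts):
    print(statement)
```

From the single data statement "eve mother cain", the loop derives that eve is a woman and a person, that cain is a person, and that eve is a parent of cain. None of this logic lives in application code; it follows from the schema statements.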


OWL adds further richness

OWL adds richer representational vocabulary, e.g.
– parentOf is the inverse of childOf
– Every person has exactly one mother
– Every person is a man or a woman but not both
– A man is the equivalent of a person with a sex property with value “male”

OWL is based on ‘description logic’, a logic subset with efficient reasoners that are complete
– Good algorithms for reasoning about descriptions
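
As a small illustration of the first item only, the sketch below applies an inverse-property declaration to a set of triples. It assumes short names instead of URIs, and `apply_inverses` is a hypothetical helper; a description logic reasoner does far more than this.

```python
def apply_inverses(triples, inverse_of):
    """Add (o q s) for each (s p o) where p and q are declared inverses."""
    pairs = dict(inverse_of)
    # An inverse declaration works in both directions.
    pairs.update({q: p for p, q in inverse_of.items()})
    return set(triples) | {
        (o, pairs[p], s) for s, p, o in triples if p in pairs
    }


data = {("eve", "parentOf", "cain"), ("abel", "childOf", "eve")}
result = apply_inverses(data, {"parentOf": "childOf"})
for statement in sorted(result - data):
    print(statement)
```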


That was then, this is now

• 1996-2000: focus on RDF and data
• 2000-2007: focus on OWL, developing ontologies, sophisticated reasoning
• 2008-…: Integrating and exploiting large RDF data collections backed by lightweight ontologies


A Linked Data story

• Wikipedia as a source of knowledge
  – Wikis are a great way to collaborate on building up knowledge resources
• Wikipedia as an ontology
  – Every Wikipedia page is a concept or object
• Wikipedia as RDF data
  – Map this ontology into RDF
• DBpedia as the lynchpin for Linked Data
  – Exploit its breadth of coverage to integrate things


Populating Freebase KB


Underlying Powerset’s KB


Mined by TrueKnowledge


Wikipedia as an ontology

• Using Wikipedia as an ontology
  – each article (~3M) is an ontology concept or instance
  – terms linked via category system (~200k), infobox template use, inter-article links, infobox links
  – Article history contains metadata for trust, provenance, etc.
• It’s a consensus ontology with broad coverage
• Created and maintained by a diverse community for free!
• Multilingual
• Very current
• Overall content quality is high


Wikipedia as an ontology

• Uncategorized and miscategorized articles
• Many ‘administrative’ categories (e.g., articles needing revision) and useless ones (e.g., 1949 births)
• Multiple infobox templates for the same class
• Multiple infobox attribute names for the same property
• No datatypes or domains for infobox attribute values
• etc.


DBpedia: Wikipedia in RDF

• A community effort to extract structured information from Wikipedia and publish it as RDF on the Web
• Effort started in 2006 with EU funding
• Data and software open sourced
• DBpedia doesn’t extract information from Wikipedia’s text, but from its structured information, e.g., links, categories, infoboxes


DBpedia: Linked Data lynchpin


http://lookup.dbpedia.org/


DBpedia uses WP structured data

DBpedia extracts structured data from Wikipedia, especially from Infoboxes
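
To give a feel for what extracting from an infobox means, here is a deliberately naive sketch. It is not DBpedia's extraction code: the function `infobox_to_triples`, the `dbpprop:` prefix label, and the sample infobox with its values are all made up for illustration, and real infobox markup (nested templates, links, units) needs much more care.

```python
import re

# Matches lines of the form "| key = value" inside an infobox template.
FIELD = re.compile(r"\s*\|\s*([^=|]+?)\s*=\s*(.*?)\s*$")


def infobox_to_triples(subject, wikitext):
    """Turn non-empty '| key = value' lines into (subject, property, value)."""
    triples = []
    for line in wikitext.splitlines():
        m = FIELD.match(line)
        if m and m.group(2):  # skip fields left blank
            triples.append((subject, "dbpprop:" + m.group(1), m.group(2)))
    return triples


# Hypothetical infobox text, not copied from a real article.
SAMPLE = """{{Infobox settlement
| name = Example City
| state = Example State
| population_total =
}}"""

triples = infobox_to_triples("dbpedia:Example_City", SAMPLE)
for t in triples:
    print(t)
```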


http://dbpedia.org/sparql/

PREFIX rdf: <http://www.w3.org/1999/02/22-rdf-syntax-ns#>
PREFIX dbp: <http://dbpedia.org/resource/>
PREFIX dbpo: <http://dbpedia.org/ontology/>
SELECT DISTINCT ?Property ?Place
WHERE {
  dbp:Barack_Obama ?Property ?Place .
  ?Place rdf:type dbpo:Place .
}
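
A query like this can also be sent programmatically. The sketch below only builds the HTTP request, following the SPARQL protocol convention of passing the query text in a `query` parameter; it does not contact the server. The endpoint URL is the one shown on the slide, whether it still answers is not checked here, and `build_request` is a hypothetical helper name.

```python
from urllib.parse import parse_qs, urlencode, urlsplit
from urllib.request import Request

ENDPOINT = "http://dbpedia.org/sparql/"  # as given on the slide

QUERY = """
PREFIX rdf: <http://www.w3.org/1999/02/22-rdf-syntax-ns#>
PREFIX dbp: <http://dbpedia.org/resource/>
PREFIX dbpo: <http://dbpedia.org/ontology/>
SELECT DISTINCT ?Property ?Place
WHERE {
  dbp:Barack_Obama ?Property ?Place .
  ?Place rdf:type dbpo:Place .
}
"""


def build_request(endpoint, query):
    """Build a GET request carrying the query, asking for JSON results."""
    url = endpoint + "?" + urlencode({"query": query})
    return Request(url, headers={"Accept": "application/sparql-results+json"})


request = build_request(ENDPOINT, QUERY)
print(request.full_url[:60] + "...")
# Sending it would be urllib.request.urlopen(request); that step is left out
# so the sketch runs without network access.
```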


DBpedia: Linked Data lynchpin


Consider Baltimore, MD


Looking at the RDF description

We find assertions equating DBpedia's object for Baltimore with those in other LOD datasets:

dbpedia:Baltimore%2C_Maryland
    owl:sameAs census:us/md/counties/baltimore/baltimore ;
    owl:sameAs cyc:concept/Mx4rvVin-5wpEbGdrcN5Y29ycA ;
    owl:sameAs freebase:guid.9202a8c04000641f800000000004921a ;
    owl:sameAs geonames:4347778/ .

Since owl:sameAs is defined as an equivalence relation, the mapping works both ways
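
Because owl:sameAs is reflexive, symmetric and transitive, a set of such assertions partitions identifiers into groups that all denote the same thing. One common way to compute those groups is a union-find structure, sketched below. The identifiers are the ones listed above; `same_as_classes` is a hypothetical helper, and this is an illustration rather than how any particular triple store does it.

```python
def same_as_classes(pairs):
    """Group identifiers into equivalence classes using union-find."""
    parent = {}

    def find(x):
        parent.setdefault(x, x)
        while parent[x] != x:
            parent[x] = parent[parent[x]]  # path halving keeps trees shallow
            x = parent[x]
        return x

    for a, b in pairs:
        parent[find(a)] = find(b)  # merge the two classes

    classes = {}
    for x in list(parent):
        classes.setdefault(find(x), set()).add(x)
    return list(classes.values())


BALTIMORE = "dbpedia:Baltimore%2C_Maryland"
same_as = [
    (BALTIMORE, "census:us/md/counties/baltimore/baltimore"),
    (BALTIMORE, "cyc:concept/Mx4rvVin-5wpEbGdrcN5Y29ycA"),
    (BALTIMORE, "freebase:guid.9202a8c04000641f800000000004921a"),
    (BALTIMORE, "geonames:4347778/"),
]

classes = same_as_classes(same_as)
print(len(classes), len(classes[0]))
```

All five identifiers land in one class, so a fact stated about the GeoNames identifier can be joined with a fact stated about the Freebase one even though no assertion links those two directly.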


Linked Data Cloud, March 2009


Wikitology

We’ve been exploring a different approach to derive an ontology from Wikipedia through a series of use cases:
– Identifying user context in a collaboration system from documents viewed (2006)
– Improve IR accuracy by adding Wikitology tags to documents (2007)
– ACE: cross document co-reference resolution for named entities in text (2008)
– TAC KBP: Knowledge Base population from text (2009)
– Improve Web search engine by tagging documents and queries (2009)


Wikitology 2.0 (2008)

[Figure: Wikitology 2.0 architecture. Labels: WordNet, Yago, human input & editing, databases, Freebase KB, RDF, text, graphs]


Conclusion

• The Semantic Web is a powerful approach for data interoperability and integration
• The research focus is shifting to a “Web of Data” perspective
• Many research issues remain: uncertainty, provenance, trust, parallel graph algorithms, reasoning over billions of triples, user-friendly tools, etc.
• Just as the Web enhances human intelligence, the Semantic Web will enhance machine intelligence
• The ideas and technology are still evolving


http://ebiquity.umbc.edu/