@
Semantic Web Technologies: A Tutorial
Li Ding
University of Maryland Baltimore County
Joint work with Deborah McGuinness, Tim Finin and Anupam Joshi
Presented at Kodak Research Laboratories, Rochester, New York, 18 July 2006
@
The Web has made people smarter (e.g., del.icio.us, craigslist)
@
But what about machines?
Machines still have only a minimal understanding of text and images.
@
Motivation: machine-friendly data
Natural language: "Li Ding is a person"
XML – represents structures: <person>Li Ding</person>
Semantic Web – represents more semantics: represents structures, enables a common vocabulary, and associates symbols with a logical interpretation for inference
(Figure: each form as seen by a person and as seen by a machine; in the XML case, a machine sees only the tags, e.g. "<> </>")
@
Semantic Web Layers
(Figure: the W3C Semantic Web layer stack, with layers grouped into a Web aspect and a semantic aspect, built on HTTP)
"The Semantic Web is an extension of the current web in which information is given well-defined meaning, better enabling computers and people to work in cooperation." – Berners-Lee, Hendler & Lassila, Scientific American, 2001
Image source: http://en.wikipedia.org/wiki/Image:W3c_semantic_web_stack.jpg
@
The Semantic Web is simple
Each URI denotes a concept.
URIs are connected by triples.
Machines read data as a directed RDF graph.
Don't say "colour", say <http://example.com/2002/std6#col>
Source: Tim Berners-Lee, Putting the Web back into Semantic Web, ISWC 2005 keynote
(Figure: a relational database table mapped to RDF (Resource Description Framework))
@
Example: RDF graph and syntax
Data encoded in RDF/XML syntax (XML and Unicode, with namespace URIs as tags):
<?xml version="1.0" encoding="utf-8"?>
<rdf:RDF xmlns:foaf="http://xmlns.com/foaf/0.1/"
         xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#">
  <foaf:Person>
    <foaf:name>Li Ding</foaf:name>
  </foaf:Person>
</rdf:RDF>
The RDF graph is built from URIs, literals, and blank nodes connected by triples. Here it has two triples (t1, t2) about one blank node:
_:b <http://www.w3.org/1999/02/22-rdf-syntax-ns#type> <http://xmlns.com/foaf/0.1/Person> .
_:b <http://xmlns.com/foaf/0.1/name> "Li Ding" .
The entire graph means: there exists a person whose name is "Li Ding".
Alternative RDF syntaxes: N3 (Notation 3), N-Triples, Turtle
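To make the RDF/XML-to-triples mapping concrete, here is a minimal Python sketch that uses only the standard library's xml.etree.ElementTree. It handles only the "typed node" pattern in the example: one level of property elements with literal values. Real RDF/XML has many more rules, so in practice a full RDF parser would be used. The blank-node label `_:b0` and the function name `typed_nodes_to_triples` are illustrative choices, not part of any RDF API.

```python
import xml.etree.ElementTree as ET

RDF = "http://www.w3.org/1999/02/22-rdf-syntax-ns#"

DOC = """<?xml version="1.0" encoding="utf-8"?>
<rdf:RDF xmlns:foaf="http://xmlns.com/foaf/0.1/"
         xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#">
  <foaf:Person>
    <foaf:name>Li Ding</foaf:name>
  </foaf:Person>
</rdf:RDF>"""


def expand(tag):
    # ElementTree reports qualified names as "{namespace}local"; RDF concatenates them into one URI.
    ns, local = tag[1:].split("}")
    return ns + local


def typed_nodes_to_triples(xml_text):
    """Turn simple typed-node RDF/XML into (subject, predicate, object) triples."""
    root = ET.fromstring(xml_text.encode("utf-8"))
    triples = []
    for i, node in enumerate(root):
        # rdf:about would give a URI subject; without it the node is a blank node.
        subject = node.get(f"{{{RDF}}}about") or f"_:b{i}"
        triples.append((subject, RDF + "type", expand(node.tag)))
        for prop in node:
            triples.append((subject, expand(prop.tag), prop.text))
    return triples


for triple in typed_nodes_to_triples(DOC):
    print(triple)
```

The element name `foaf:Person` becomes an `rdf:type` triple, and each child element becomes one property triple. Together they form the two triples of the graph.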
@
Example: Surfing RDF graphs
Namespaces: rdf: http://www.w3.org/1999/02/22-rdf-syntax-ns#, rdfs: http://www.w3.org/2000/01/rdf-schema#, foaf: http://xmlns.com/foaf/0.1/
G1 (http://cs.umbc.edu/~dingli1/foaf.rdf) describes http://cs.umbc.edu/~dingli1/foaf.rdf#dingli:
  rdf:type foaf:Person
  foaf:name "Li Ding"
  foaf:mbox mailto:[email protected]
This person foaf:knows someone with:
  foaf:firstName "Tim"
  foaf:surname "Finin"
  foaf:mbox mailto:[email protected]
  rdfs:seeAlso http://cs.umbc.edu/~finin/foaf.rdf
Surf to another instance: follow rdfs:seeAlso to G2 (http://cs.umbc.edu/~finin/foaf.rdf).
Surf to definition: dereference a term such as foaf:Person or foaf:mbox to reach G3 (http://xmlns.com/foaf/0.1/). There, foaf:Person is an rdfs:Class linked by rdfs:subClassOf to wordNet:Agent, and foaf:mbox is an rdf:Property with an rdfs:domain.
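"Surfing to another instance" amounts to a crawl that follows rdfs:seeAlso links from document to document. The sketch below makes some assumptions. The `WEB` dictionary is a hypothetical stand-in for HTTP fetches. Terms are abbreviated with prefixes rather than full URIs. The blank-node label `_:tim` is made up. G2's contents are not shown on the slide, so it is left empty.

```python
from collections import deque

# Hypothetical stand-in for the Web: document URL -> triples parsed from it.
# A real crawler would fetch each URL over HTTP and parse the RDF it returns.
WEB = {
    "http://cs.umbc.edu/~dingli1/foaf.rdf": [
        ("#dingli", "rdf:type", "foaf:Person"),
        ("#dingli", "foaf:name", "Li Ding"),
        ("#dingli", "foaf:knows", "_:tim"),
        ("_:tim", "foaf:firstName", "Tim"),
        ("_:tim", "foaf:surname", "Finin"),
        ("_:tim", "rdfs:seeAlso", "http://cs.umbc.edu/~finin/foaf.rdf"),
    ],
    "http://cs.umbc.edu/~finin/foaf.rdf": [],  # contents not shown on the slide
}


def surf(start, max_docs=10):
    """Breadth-first crawl that follows rdfs:seeAlso links between RDF documents."""
    visited, queue, order = set(), deque([start]), []
    while queue and len(order) < max_docs:
        url = queue.popleft()
        if url in visited or url not in WEB:
            continue
        visited.add(url)
        order.append(url)
        for _, predicate, obj in WEB[url]:
            if predicate == "rdfs:seeAlso":
                queue.append(obj)
    return order


print(surf("http://cs.umbc.edu/~dingli1/foaf.rdf"))
```

"Surfing to a definition" works the same way, except the link followed is a term's namespace URI rather than rdfs:seeAlso.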
@
Example: Serving humans and machines
The original RDF/XML is served to machines; the HTML for humans is generated by applying XSLT to the RDF/XML.
@
Ontology Spectrum
(Figure: a spectrum from simple taxonomies to expressive ontologies. From simple to expressive: Catalog/ID, Terms/glossary, Thesauri ("narrower term" relation), Informal is-a, Formal is-a, Formal instance, Frames (properties), Value restriction, Disjointness, inverse, part-of…, General logical constraints. Examples placed along the spectrum: DB Schema, WordNet, UMLS, RDF, RDFS, OO, DAML, OWL, IEEE SUO, CYC.)
Source: Originally by Deborah L. McGuinness (KSL, Stanford), modified by Tim Finin
@
Ontology Languages: RDFS and OWL
RDFS
Set theory – rdfs:Class
Relation – rdf:Property, rdfs:domain, rdfs:range
Hierarchy – rdfs:subClassOf, rdfs:subPropertyOf
Built-in datatypes – xsd:string, xsd:dateTime
OWL (Description Logic)
Class, Thing, Nothing; DatatypeProperty, ObjectProperty, AnnotationProperty, …
Class axioms – oneOf, disjointWith, unionOf, complementOf, intersectionOf, …; Restriction, onProperty, cardinality, hasValue, …
Property axioms – inverseOf, TransitiveProperty, SymmetricProperty, FunctionalProperty, InverseFunctionalProperty
Equality – equivalentClass, sameAs, differentFrom, …
Ontology annotation – Ontology, imports, versionInfo
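To show what the RDFS constructs buy in practice, here is a minimal forward-chaining sketch. It implements only three RDFS entailment rules:
- rdfs:domain typing
- type propagation along rdfs:subClassOf
- rdfs:subClassOf transitivity

Terms are written as prefixed strings. The two schema statements are illustrative, not quoted from the published FOAF schema. The naive pairwise loop is meant for readability, not speed.

```python
def rdfs_closure(triples):
    """Apply three RDFS entailment rules until no new triples appear:
    rdfs2 (domain), rdfs9 (type propagation), rdfs11 (subClassOf transitivity)."""
    graph = set(triples)
    while True:
        new = set()
        for s, p, o in graph:
            for s2, p2, o2 in graph:
                # rdfs2: (p rdfs:domain C) and (s p o)  =>  (s rdf:type C)
                if p2 == "rdfs:domain" and s2 == p:
                    new.add((s, "rdf:type", o2))
                # rdfs9: (s rdf:type C) and (C rdfs:subClassOf D)  =>  (s rdf:type D)
                if p == "rdf:type" and p2 == "rdfs:subClassOf" and s2 == o:
                    new.add((s, "rdf:type", o2))
                # rdfs11: (A subClassOf B) and (B subClassOf C)  =>  (A subClassOf C)
                if p == "rdfs:subClassOf" and p2 == "rdfs:subClassOf" and s2 == o:
                    new.add((s, "rdfs:subClassOf", o2))
        if new <= graph:
            return graph
        graph |= new


facts = [
    ("foaf:name", "rdfs:domain", "foaf:Person"),         # illustrative schema statement
    ("foaf:Person", "rdfs:subClassOf", "foaf:Agent"),    # illustrative schema statement
    ("#dingli", "foaf:name", "Li Ding"),                 # instance data
]
closure = rdfs_closure(facts)
print(sorted(t for t in closure if t[1] == "rdf:type"))
```

From a single foaf:name statement, the reasoner concludes that #dingli is both a foaf:Person and a foaf:Agent.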
@
Example: Inference using ontologies
Source: Semantic Web tutorial (AAAI 2005) by Deborah L. McGuinness
(Figure: #Deborah, #Louise and #Joe connected by hasBrother, hasSibling, hasParent, hasChild and hasUncle)
Ontology: hasBrother rdfs:subPropertyOf hasSibling; hasChild owl:inverseOf hasParent
SWRL rule: (x hasParent y) ∧ (y hasBrother z) ⇒ (x hasUncle z)
Ontology languages (RDFS, OWL) have formal foundations that let us infer additional (implicit) statements. RDFS provides basic ones, e.g. sub-class, sub-property, and domain; OWL adds many more axioms, e.g. inverse property and equality.
SWRL (Semantic Web Rule Language) offers a general-purpose solution: it supports rule representation, but it also requires inference support beyond RDFS and OWL.
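Here is a hand-coded sketch of this inference: the two ontology axioms and the SWRL rule, applied until nothing new follows. It is not a general OWL or SWRL reasoner. The family facts are hypothetical, since the slide shows only the names; here, Louise is assumed to be Deborah's parent and Joe's sister.

```python
def infer_family(facts):
    """Forward-chain the slide's two axioms and one SWRL rule until nothing new is derived."""
    kb = set(facts)
    while True:
        new = set()
        for s, p, o in kb:
            if p == "hasBrother":   # hasBrother rdfs:subPropertyOf hasSibling
                new.add((s, "hasSibling", o))
            if p == "hasChild":     # hasChild owl:inverseOf hasParent
                new.add((o, "hasParent", s))
            if p == "hasParent":    # the inverse also holds in the other direction
                new.add((o, "hasChild", s))
        # SWRL: hasParent(?x, ?y) and hasBrother(?y, ?z) -> hasUncle(?x, ?z)
        for x, p1, y in kb:
            if p1 == "hasParent":
                for y2, p2, z in kb:
                    if p2 == "hasBrother" and y2 == y:
                        new.add((x, "hasUncle", z))
        if new <= kb:
            return kb
        kb |= new


# Hypothetical family facts; only the names come from the slide.
facts = {("#Louise", "hasChild", "#Deborah"), ("#Louise", "hasBrother", "#Joe")}
kb = infer_family(facts)
print(sorted(t for t in kb if t not in facts))
```

The hasUncle fact needs two steps:
1. The inverse axiom turns hasChild into hasParent.
2. Only then can the SWRL rule fire.

That dependency is why the loop runs to a fixpoint instead of a single pass.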
@
More languages and more ontologies
Languages (require special inference engines)
[Trust/Uncertainty] BayesOWL
[Proof] PML (Proof Markup Language)
[Query/Data Access] SPARQL Query Language for RDF
[Rule] SWRL (Semantic Web Rule Language)
[Policy] Rei: A Policy Specification Language
[Service] OWL-S by DAML (1.2 preview available)
[Service] SAWSDL (Semantic Annotations for WSDL)
[Thesauri] SKOS (Simple Knowledge Organization System)
Ontologies (only need RDFS and/or OWL inference)
Upper ontologies – OpenCyc, WordNet, OntoSem, SUO
Specialized common ontologies – FOAF, Dublin Core, RSS
Domain ontologies – bibtex, biology, and many more…
Li Ding, Pranam Kolari, Zhongli Ding, and Sasikanth Avancha, "Using Ontologies in the Semantic Web: A Survey", in Ontologies in the Context of Information Systems (book chapter), 2005. http://ebiquity.umbc.edu/paper/html/id/257/
@
Semantic Web Tools
(Figure: tools for managing ontologies and instance data: create, extend, publish, integrate, update, inference, browse)
Online registries: DAML Ontology Library, SchemaWeb
Editors and APIs: Protégé, Swoop, Jena (SPARQL)
Triple stores: KAON, Kowari, Sesame, OWLIM, 3store, Instance Store, Redland, TAP RDF store, YARS, IBM IODT, RDFLib, RDF Gateway, Allegro, Oracle 10
Reasoners: Pellet (DL), Racer (DL), FaCT++ (DL), Jena, JTP, F-OWL, Euler, CWM
Search engines: Swoogle, Semantic Web Search
Mapping tools: ONION, PROMPT, OntoMapper, Glue, OntoMerge, OntoMorph
Browsers: Tabulator, IsaViz, Piggy Bank, Arago, Horus, mSpace, Magpie
Source 1: http://ebiquity.umbc.edu/paper/html/id/257/Using-Ontologies-in-the-Semantic-Web-A-Survey
Source 2: http://www.wiwiss.fu-berlin.de/suhl/bizer/toolkits/
@
Semantic Web data sources
Text editor: I write RDF/XML manually.
Semantic Web editors: Protégé, Swoop
Information extraction (consumer side): NLP (hard), e.g. SemNews; heuristic scraping (regular expressions), e.g. Semagix Freedom
Wrapped database content (publisher side): blogs and social network websites, e.g. livejournal.com; academic websites, e.g. http://www.mindswap.org/, http://ebiquity.umbc.edu
Generated by software: Creative Commons licenses embedded in HTML; metadata embedded in JPEG and PDF (XMP); agent communication messages
…
@
The Scale of the Semantic Web
Statistics based on Semantic Web data indexed by Swoogle:
Year | Terms (million) | Documents (million) | Individuals (million) | Triples (million) | Bytes (billion)
2004 | 0.15 | 0.33 | 7.3 | 48 | 4.3
2006 | 1.9 | 1.6 | 16 | 276 | 47
2008 | 10 | 100 | 1000 | 20,000 | 3000
Estimated number of documents based on Google queries:
Estimate | Docs | Corresponding Google query
Optimistic | 10^9 | rdf OR inurl:rss OR inurl:foaf -filetype:html
Conservative | 10^5 | rdf filetype:rdf
@
Where does the data come from?
"com" contributes the largest share of websites (71%) and of pure SWDs (39%). This is because industry has adopted virtual hosting technology as well as ontologies such as RSS and FOAF.
Most SWOs come from "org" (46%, e.g. www.w3.org) and "edu" (14%, e.g. spire.umbc.edu). This reflects the deep interest of academia and non-profit organizations in developing ontologies.
Note: top-level domain statistics have also been used to characterize the Web (Henzinger and Lawrence 2004).
SWD: Semantic Web document. SWO: Semantic Web ontology. Pure SWD: an SWD that is not embedded in another document.
@
Source websites of SWD
(Figure: two log-log plots of y, the number of websites hosting at least m SWDs, against m, the number of SWDs.
Jan 2005 – Mar 2006: y = 6236.7 m^(-0.6629), R² = 0.9622.
Jan 2005 – Aug 2005: y = 6598.8 m^(-0.7305), R² = 0.9649.)
Invariant found! The number of websites hosting at least m SWDs follows a power-law distribution, similar to the Web.
Head: virtual hosting. Tail: crawling strategy.
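Power-law trendlines such as y = 6236.7 m^(-0.6629) are commonly obtained by ordinary least squares on log-log axes, where a power law becomes a straight line: log y = log a + b log m. The slide does not say how its fits were computed, so treat this as a sketch of the usual method. The input points are synthetic: they are generated from the slide's own formula, not taken from the Swoogle data. The fit should therefore recover a ≈ 6236.7 and b ≈ -0.6629 with R² ≈ 1.

```python
import math


def fit_power_law(xs, ys):
    """Fit y = a * x**b by least squares on log-log axes: log y = log a + b * log x."""
    lx = [math.log(x) for x in xs]
    ly = [math.log(y) for y in ys]
    n = len(lx)
    mean_x, mean_y = sum(lx) / n, sum(ly) / n
    sxx = sum((u - mean_x) ** 2 for u in lx)
    sxy = sum((u - mean_x) * (v - mean_y) for u, v in zip(lx, ly))
    b = sxy / sxx
    log_a = mean_y - b * mean_x
    # R^2 of the straight-line fit, measured in log space.
    ss_res = sum((v - (log_a + b * u)) ** 2 for u, v in zip(lx, ly))
    ss_tot = sum((v - mean_y) ** 2 for v in ly)
    return math.exp(log_a), b, 1 - ss_res / ss_tot


# Synthetic points generated from the slide's first trendline (not the Swoogle data).
ms = [1, 10, 100, 1000, 10000]
ys = [6236.7 * m ** -0.6629 for m in ms]
a, b, r2 = fit_power_law(ms, ys)
print(f"y = {a:.1f} m^({b:.4f}), R^2 = {r2:.4f}")
```

On real, noisy counts, R² falls below 1. The slide's values of about 0.96 indicate a close but imperfect straight line on the log-log plot.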
@
Size of SWD
Embedded SWDs are small: 69% have 3 triples, and 96% have fewer than 10 triples.
Pure SWDs: 60% have 5 to 1000 triples. RSS has a special size of 130 triples: 17 triples for the channel and 7 triples for each of the 15 items.
SWOs: biased by PML; the small ones come from the RDF test cases; the largest has 1M triples.
(Figure: number of SWDs and number of SWOs by number of triples)
@
Age of SWD, measured by the last-modified time of each SWD
Pure SWDs (PSWD) follow an exponential distribution. SWOs show a flat tail: is interest in ontology development decreasing?
(Figure: number of pure SWDs and of SWOs (PML filtered) by last-modified date, 7/20/1995 to 7/2/2006, on a log scale from 1 to 1,000,000. An exponential trendline is fitted for pure SWDs: y = 2×10^-48 e^(0.0032x).)
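The exponential trendline y = 2×10^-48 e^(0.0032x) is easier to read as a doubling time, ln(2)/0.0032. This assumes x is measured in days, as it is when a spreadsheet trendline is fitted over a date axis; the slide does not state the unit. Under that assumption, the number of pure SWDs per last-modified date doubles roughly every 217 days, or about seven months. The tiny coefficient 2×10^-48 only reflects where day numbering starts and does not affect the doubling time.

```python
import math


def doubling_time(rate):
    """For y = c * exp(rate * x), y doubles each time x increases by ln(2) / rate."""
    return math.log(2) / rate


# Rate taken from the slide's trendline; x is assumed to be measured in days.
days = doubling_time(0.0032)
print(f"doubling time ~ {days:.0f} days (~ {days / 30.44:.1f} months)")
```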
@
How are Semantic Web terms used?
All usage distributions follow a power-law distribution, and only a few SWTs are well populated. 371 SWTs have more than 100 class instances, and 1,208 have more than 100 property instances.
@
Swoogle Rank (citation based)
Computed using Swoogle metadata by May 2006
(Figure: citation links among the ten top-ranked namespaces:
http://www.w3.org/2000/01/rdf-schema
http://www.w3.org/1999/02/22-rdf-syntax-ns
http://xmlns.com/foaf/0.1/index.rdf
http://purl.org/dc/elements/1.1
http://purl.org/rss/1.0
http://www.w3.org/2002/07/owl
http://purl.org/dc/terms
http://web.resource.org/cc
http://www.w3.org/2001/vcard-rdf/3.0
http://www.hackcraft.net/bookrdf/vocab/0_1/
Edges are weighted by citation flow. Each node is labeled with its indegree (from 16,380 to 1,077,768) and its mean inflow (from 0.036 to 0.217).)
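The slide does not spell out Swoogle's ranking formula. To illustrate what "citation based" ranking means, here is plain PageRank-style power iteration, labeled as a generic technique rather than Swoogle's own algorithm. The citation graph between namespaces is hypothetical; its edges are made up for illustration. A namespace ranks highly when it is cited by many namespaces, or by highly ranked ones.

```python
def citation_rank(links, damping=0.85, iterations=50):
    """Generic PageRank-style power iteration; links maps each node to the nodes it cites."""
    nodes = set(links) | {t for targets in links.values() for t in targets}
    n = len(nodes)
    score = {v: 1.0 / n for v in nodes}
    for _ in range(iterations):
        nxt = {v: (1 - damping) / n for v in nodes}
        for v in nodes:
            targets = links.get(v, [])
            # A node that cites nothing spreads its score evenly over all nodes.
            receivers = targets if targets else list(nodes)
            share = damping * score[v] / len(receivers)
            for t in receivers:
                nxt[t] += share
        score = nxt
    return score


# Hypothetical citation graph between namespaces (edges made up for illustration).
links = {
    "foaf": ["rdf", "rdfs", "owl"],
    "rss": ["rdf", "dc"],
    "cc": ["rdf"],
    "owl": ["rdf", "rdfs"],
    "rdfs": ["rdf"],
    "rdf": ["rdfs"],
    "dc": [],
}
scores = citation_rank(links)
for ns, s in sorted(scores.items(), key=lambda kv: -kv[1]):
    print(f"{ns:5s} {s:.3f}")
```

Scores always sum to 1. In this toy graph, "rdf" ranks first because almost every other namespace cites it.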
@
TAGA: Travel Agent Game in Agentcities
Motivation: market dynamics; auction theory (TAC); Semantic Web; agent collaboration (FIPA & Agentcities)
Technologies: FIPA (JADE, April Agent Platform); Semantic Web (RDF, OWL); Web (SOAP, WSDL, DAML-S); Internet (Java Web Start)
Features: open market framework; auction services; OWL message content; OWL ontologies; global agent community
Agents: travel agents, auction service agent, customer agent, bulletin board agent, market oversight agent, and Web services
(Figure: message flows among the agents: Request, CFP, Proposal, Bid, Direct Buy, Report Direct Buy Transactions, Report Auction Transactions, Report Travel Package, Report Contract)
Ontologies (http://taga.umbc.edu/ontologies/):
  travel.owl – travel concepts
  fipaowl.owl – FIPA content language
  auction.owl – auction services
  tagaql.owl – query language
FIPA platform infrastructure services, including directory facilitators enhanced to use OWL-S for service discovery
OWL is used for representation and reasoning, for service descriptions, as a content language, and for protocol descriptions.
http://taga.umbc.edu (offline now)
@
Semantic Content Publishing
Data are stored in a MySQL database, and PHP generates both HTML and OWL.
HTML pages link to the corresponding OWL (e.g. FOAF) documents, so no more web scraping is needed.
Example from the ebiquity group website (http://ebiquity.umbc.edu/):
  HTML: http://ebiquity.umbc.edu/person/html/Li/Ding/
  FOAF: http://ebiquity.umbc.edu/person/foaf/Li/Ding/foaf.rdf
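The publishing pattern is simple: one database row, two renderings. The HTML page points to the machine-readable FOAF, so agents need not scrape the HTML. The sketch below makes these assumptions:
- Python stands in for the site's PHP.
- The row is hypothetical.
- The `<link rel="meta" type="application/rdf+xml">` element follows the common FOAF autodiscovery convention. The slide only says that the HTML links to the OWL, not how.

```python
from xml.sax.saxutils import escape, quoteattr

BASE = "http://ebiquity.umbc.edu/person"


def foaf_url(row):
    return f"{BASE}/foaf/{row['first']}/{row['last']}/foaf.rdf"


def render_html(row):
    """Human view: an HTML page whose head points to the machine-readable FOAF."""
    name = escape(f"{row['first']} {row['last']}")
    href = quoteattr(foaf_url(row))  # quoteattr adds the surrounding quotes
    return (f"<html><head><title>{name}</title>"
            f'<link rel="meta" type="application/rdf+xml" title="FOAF" href={href} />'
            f"</head><body><h1>{name}</h1></body></html>")


def render_foaf(row):
    """Machine view: the same row serialized as FOAF in RDF/XML."""
    first, last = escape(row["first"]), escape(row["last"])
    return ('<rdf:RDF xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#" '
            'xmlns:foaf="http://xmlns.com/foaf/0.1/">'
            f"<foaf:Person><foaf:name>{first} {last}</foaf:name>"
            f"<foaf:firstName>{first}</foaf:firstName><foaf:surname>{last}</foaf:surname>"
            "</foaf:Person></rdf:RDF>")


row = {"first": "Li", "last": "Ding"}  # hypothetical database row
print(render_html(row))
print(render_foaf(row))
```

Both views come from the same row, so they cannot drift apart the way a hand-maintained RDF file and a separate HTML page can.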
@
Rei Policy Language
Rei is a declarative policy language for describing policies over actions.
It reasons over domain-dependent information.
It is currently represented in OWL plus logical variables.
Based on deontic concepts: permission, prohibition, obligation, dispensation
Models speech acts: delegation, revocation, request, cancel
Meta policies: priority, modality preference
Policy engineering tools: reasoner, IDE for Rei policies in Eclipse
http://rei.umbc.edu/
@
Example: enforcing a privacy policy
The speaker doesn't want others to know the specific room he's in, but is willing for others to know he's on campus.
He defines the following privacy policy: share my location with a granularity >= "State".
The broker answers:
isLocated(US) => Yes!
isLocated(Maryland) => Yes!
isLocated(UMBC) => Uncertain.
isLocated(ITE-RM210) => Uncertain.
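The broker's behavior can be mimicked in a few lines. Rei policies themselves are written in OWL; this is not Rei but plain Python under assumed data. It assumes a containment hierarchy (room, campus, state, country) and a granularity rank per place. Queries about regions finer than "State" get "Uncertain", even when the truthful answer is Yes, so no room- or campus-level detail leaks.

```python
# Hypothetical containment hierarchy and the granularity of each place.
PARENT = {"ITE-RM210": "UMBC", "UMBC": "Maryland", "Maryland": "US"}
LEVEL = {"ITE-RM210": "Room", "UMBC": "Campus", "Maryland": "State", "US": "Country"}
RANK = {"Room": 0, "Campus": 1, "State": 2, "Country": 3}


def contains(region, place):
    """True if place is region or lies inside it."""
    while place is not None:
        if place == region:
            return True
        place = PARENT.get(place)
    return False


def is_located(actual, region, min_granularity="State"):
    """Answer isLocated(region) without revealing anything finer than min_granularity."""
    if RANK[LEVEL[region]] < RANK[min_granularity]:
        return "Uncertain"  # a truthful answer would reveal too much
    return "Yes" if contains(region, actual) else "No"


for region in ["US", "Maryland", "UMBC", "ITE-RM210"]:
    print(f"isLocated({region}) => {is_located('ITE-RM210', region)}")
```

Answering "Uncertain" for fine-grained regions regardless of the truth is deliberate. If the broker answered "No" for wrong rooms and "Uncertain" only for the right one, the pattern of answers would itself reveal the room.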
@
CoBrA: Context Broker Architecture
(Figure components: ontology, agents, service, inference, policy)
http://cobra.umbc.edu/
@
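Web-scale Semantic Web data access
(Figure: interaction among an agent, a data access service, and the Web)
1. The agent asks ("person"). The service searches URIrefs in its SW vocabulary and replies inform ("foaf:Person").
2. The agent composes the query "?x rdf:type foaf:Person" and asks the service. The service searches URLs in its SWD index and replies inform (doc URLs).
3. The agent fetches the documents, populates a local RDF database, and queries it.
The service builds its vocabulary and SWD index by indexing RDF data from the Web.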
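The agent-side workflow can be sketched end to end with in-memory stand-ins. These are all hypothetical:
- `VOCABULARY` stands in for the service's vocabulary search.
- `SWD_INDEX` stands in for its document index.
- `WEB` stands in for HTTP fetches.
- The example.org URLs are placeholders.
- A triple pattern with one variable plays the role of the query.

```python
# Hypothetical stand-ins: a vocabulary index, a document index, and a tiny "Web".
VOCABULARY = {"person": "foaf:Person"}
SWD_INDEX = {"foaf:Person": ["http://example.org/a.rdf", "http://example.org/b.rdf"]}
WEB = {
    "http://example.org/a.rdf": [("ex:alice", "rdf:type", "foaf:Person")],
    "http://example.org/b.rdf": [("ex:bob", "rdf:type", "foaf:Person"),
                                 ("ex:bob", "foaf:name", "Bob")],
}


def search_vocabulary(keyword):
    """Service: map a keyword to a Semantic Web term (ask "person" -> inform "foaf:Person")."""
    return VOCABULARY[keyword]


def search_urls(pattern):
    """Service: return documents indexed under the pattern's class."""
    return SWD_INDEX.get(pattern[2], [])


def find_instances(keyword):
    """Agent: search vocabulary, compose query, fetch docs, query a local store."""
    term = search_vocabulary(keyword)
    pattern = ("?x", "rdf:type", term)        # compose the query
    local_store = set()
    for url in search_urls(pattern):          # ask for doc URLs
        local_store.update(WEB[url])          # "fetch" and load each document
    return sorted(s for s, p, o in local_store
                  if p == pattern[1] and o == pattern[2])


print(find_instances("person"))
```

The split of work matters. The service only answers cheap index lookups. The agent does the fetching and querying, so the service never has to hold or query all the data itself.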
@
Swoogle Semantic Web Search Engine
Harvests Semantic Web data from the Web
Provides search and navigation services for machines (via REST + RDF/XML): digests of documents, terms, and namespaces, and the links among them
Also serves human users
Status: running since summer 2004; 1.6M RDF documents, 300M RDF triples, 10K ontologies
http://swoogle.umbc.edu/
@
Ontology Dictionary
From a web of documents to a web of data: aggregate term definitions from multiple sources, and add inductively learned definitions.
(Figure: the definition of foaf:Person assembled from two ontologies (Onto 1, Onto 2) and an instance document (SWD3).
The ontologies contribute: foaf:Person rdf:type owl:Class; foaf:Person rdfs:subClassOf foaf:Agent; foaf:name rdfs:domain foaf:Person.
SWD3 describes an instance of foaf:Person with foaf:name "Tim Finin" and dc:title "Dr.". From it, foaf:name and dc:title are learned to have wob:hasInstanceDomain foaf:Person.)
http://swoogle.umbc.edu/2005/modules.php?name=Ontology_Dictionary
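The "inductively learned" part can be read as: for each property, record the classes whose instances actually use it. The slide does not give Swoogle's exact definition of wob:hasInstanceDomain, so this is an assumed, simplified reading. The instance data imitates SWD3 with a hypothetical blank node `_:p`.

```python
from collections import defaultdict


def learn_instance_domains(triples):
    """For each property, collect the classes of the subjects that use it."""
    types = defaultdict(set)
    for s, p, o in triples:
        if p == "rdf:type":
            types[s].add(o)
    domains = defaultdict(set)
    for s, p, _ in triples:
        if p != "rdf:type":
            domains[p] |= types.get(s, set())
    return {p: sorted(classes) for p, classes in domains.items()}


# Hypothetical instance data in the spirit of SWD3.
swd3 = [
    ("_:p", "rdf:type", "foaf:Person"),
    ("_:p", "foaf:name", "Tim Finin"),
    ("_:p", "dc:title", "Dr."),
]
for prop, classes in learn_instance_domains(swd3).items():
    for c in classes:
        print(prop, "wob:hasInstanceDomain", c)
```

This shows why learned definitions complement declared ones. No ontology declares dc:title's domain to be foaf:Person, yet the data shows it being used that way.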
@
Semantic Web Challenge – Winners
2003: CS AKTive Space (CAS) is an integrated Semantic Web application that provides a way to explore the UK Computer Science research domain across multiple dimensions for multiple stakeholders, from funding agencies to individual researchers.
2004: Flink is likely to be unique as a crossover between a social experiment and a semantic application.
2005: CONFOTO is a browsing and annotation service for conference photos.
http://challenge.semanticweb.org/
@
Triple Shop: SPARQL dataset finder
http://sparql.cs.umbc.edu/tripleshop2/
Example question: Who knows Anupam Joshi? Show me their names, email addresses, and pictures.
1. Compose a SPARQL query without a FROM clause.
2. Parse the SPARQL query, search Swoogle for related URLs, and compose a dataset.
3. Run the SPARQL query on the dataset.
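Step 2 ends with a dataset that can be written back into the query as FROM clauses. In SPARQL these must appear after the SELECT clause and before WHERE. The sketch below makes these assumptions:
- The query is one plausible rendering of the example question using standard FOAF properties.
- The two example.org URLs stand in for whatever a Swoogle search would return.
- `add_dataset` is a hypothetical helper, not Triple Shop's code.

```python
import re


def add_dataset(query, urls):
    """Insert one FROM clause per document URL just before the query's WHERE clause."""
    match = re.search(r"\bWHERE\b|\{", query, flags=re.IGNORECASE)
    if match is None:
        raise ValueError("no WHERE clause found")
    froms = "".join(f"FROM <{url}>\n" for url in urls)
    return query[:match.start()] + froms + query[match.start():]


QUERY = """PREFIX foaf: <http://xmlns.com/foaf/0.1/>
SELECT ?name ?mbox ?pic
WHERE {
  ?person foaf:knows ?aj .
  ?aj foaf:name "Anupam Joshi" .
  ?person foaf:name ?name ; foaf:mbox ?mbox ; foaf:depiction ?pic .
}"""

# Hypothetical URLs standing in for documents a Swoogle search might return.
urls = ["http://example.org/foaf1.rdf", "http://example.org/foaf2.rdf"]
print(add_dataset(QUERY, urls))
```

Because the dataset is attached to the query, any SPARQL engine that loads FROM graphs can run step 3 without knowing where the URLs came from.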
@
Integrating Social Networks
Data:
  FOAF network (knows; RDF, RDF/XML)
  DBLP coauthor network (database, HTML)
  Golbeck's trust network (trust)
Computation: entity mapping, tie strength, trust aggregation
Reputation systems: Google PageRank, CiteSeer rank
(Figure: the three networks around T. Finin, A. Joshi, L. Ding, H. Chen, P. Kolari, F. Perich, L. Kagal, J. Golbeck, J. Hendler, A. Sheth, M. P. Singh, and Y. Peng. They are connected by knows, co-author, and sameName links, and nodes are marked as hub, source, sink, or island.)
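Entity mapping and tie strength can be sketched together. First, name variants are mapped to one entity (a string-based stand-in for sameName). Then the networks' edges are merged. Tie strength here is simply the number of networks that assert a link; the slide lists "tie strength" without defining it, so this is an assumption. The edges are hypothetical; only the names come from the slide.

```python
from collections import Counter


def integrate(networks, same_name):
    """Merge several edge lists after mapping name variants to one entity.

    Tie strength is taken to be the number of networks asserting a link (an assumption)."""
    def canon(name):
        return same_name.get(name, name)

    strength = Counter()
    for edges in networks.values():
        pairs = set()
        for a, b in edges:
            pair = tuple(sorted((canon(a), canon(b))))
            if pair[0] != pair[1]:
                pairs.add(pair)
        strength.update(pairs)  # each network counts a tie at most once
    return strength


# Hypothetical edges; only the names come from the slide.
networks = {
    "foaf_knows": [("T. Finin", "A. Joshi"), ("L. Ding", "T. Finin"), ("L. Ding", "H. Chen")],
    "dblp_coauthor": [("Tim Finin", "Anupam Joshi"), ("Li Ding", "Tim Finin")],
}
same_name = {"Tim Finin": "T. Finin", "Anupam Joshi": "A. Joshi", "Li Ding": "L. Ding"}
ties = integrate(networks, same_name)
for pair, s in ties.most_common():
    print(pair, s)
```

Without the mapping step, "Tim Finin" and "T. Finin" would be two people, and every tie confirmed by both networks would be split in half.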
@
Inference Web Infrastructure
[Inference Web] Framework for explaining question answering tasks by abstracting, storing, exchanging, combining, annotating, filtering, segmenting, comparing, and rendering proofs and proof fragments provided by question answerers.
(Figure: the IW toolkit, built on the Proof Markup Language (PML) and covering justification, provenance, and trust.
Question answerers producing PML:
  CWM (TAMI) with N3
  JTP (DAML/NIMD) with KIF
  SPARK (CALO) with SPARK-L
  UIMA (NIMD/Exp Agg) with text analytics
  SDS (DAML/SNRC) with OWL-S/BPEL
Tools:
  IWBase (provenance registration)
  IWSearch (search-engine-based publishing)
  IWBrowser (expert-friendly visualization)
  IW Explainer/Abstractor (end-user-friendly visualization)
  IWTrust (trust computation))
@
PML: Proof Markup Language
(Figure: the PML vocabulary.
Question and query: a Question (foo:question1: "what is Tony's Specialty") and its Query (foo:query1: "(type TonysSpecialty ?x)") are linked to NodeSets (foo:ns1, foo:ns2, each with hasConclusion …) through hasAnswer, isQueryFor, fromQuery, and fromAnswer.
Justification trace: a NodeSet isConsequentOf an InferenceStep. The InferenceStep has hasAntecedent, hasInferenceEngine, hasRule, hasVariableMapping (a Mapping), and hasSourceUsage (a SourceUsage with hasSource and usageTime …).
Registry: IWBase holds the InferenceEngine, InferenceRule, Language (hasLanguage), and Source entries.)
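A justification trace is a tree of NodeSets and InferenceSteps. Provenance questions such as "which sources and engines did this answer depend on?" become a recursive walk over it. This is a simplified in-memory model: the class and field names mirror the PML terms above but are not the PML schema itself. The engine, rule, and source values are hypothetical placeholders.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class InferenceStep:
    engine: str                                                  # hasInferenceEngine
    rule: str                                                    # hasRule
    antecedents: List["NodeSet"] = field(default_factory=list)  # hasAntecedent
    sources: List[str] = field(default_factory=list)            # hasSourceUsage -> hasSource


@dataclass
class NodeSet:
    conclusion: str                                                # hasConclusion
    steps: List[InferenceStep] = field(default_factory=list)      # steps it isConsequentOf


def provenance(ns):
    """Collect every source and inference engine used anywhere in a justification trace."""
    sources, engines = set(), set()
    for step in ns.steps:
        engines.add(step.engine)
        sources.update(step.sources)
        for ante in step.antecedents:
            s, e = provenance(ante)
            sources |= s
            engines |= e
    return sources, engines


# Hypothetical two-level trace: ns1 is derived from ns2, which was read from a source.
ns2 = NodeSet("conclusion-2", [InferenceStep("JTP", "rule-told", sources=["source-A"])])
ns1 = NodeSet("conclusion-1", [InferenceStep("JTP", "rule-1", antecedents=[ns2])])
print(provenance(ns1))
```

Each NodeSet may be justified by several alternative InferenceSteps. The walk therefore collects the union over all of them, which is the conservative answer to "what might this conclusion depend on?".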
@
IWBrowser – Justification and Provenance
@
Tracking Provenance via RDF Molecules
(Figure: an RDF graph G about http://www.cs.umbc.edu/~dingli1. It contains foaf:name "Li Ding", foaf:knows, and a person with foaf:name "Tim Finin" and foaf:mbox mailto:[email protected], as triples t1–t4. G is decomposed into RDF molecules (t1 t2, t3 t4). Each sub-graph is matched against web pages, discovered by Swoogle, that contain one or more of the graph's molecules.)
Ding, L.; Finin, T.; Peng, Y.; Pinheiro da Silva, P.; McGuinness, D.L. Tracking RDF Graph Provenance using RDF Molecules. Proceedings of the Fourth International Semantic Web Conference (poster), November 2005. http://www-ksl.stanford.edu/KSL_Abstracts/KSL-05-06.html
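Blank nodes are what make decomposition non-trivial. A triple mentioning a blank node cannot be traced on its own, because the blank node has no global name. This sketch implements only the simplest idea: triples linked through shared blank nodes stay together in one molecule, using union-find. The cited paper describes further refinements that this sketch does not attempt. The mbox address is a hypothetical placeholder.

```python
def molecules(triples):
    """Group triples that share blank nodes; a triple with no blank node stands alone."""
    parent = list(range(len(triples)))

    def find(i):
        while parent[i] != i:
            parent[i] = parent[parent[i]]  # path halving
            i = parent[i]
        return i

    first_use = {}  # blank node -> index of the first triple mentioning it
    for i, (s, _, o) in enumerate(triples):
        for term in (s, o):
            if term.startswith("_:"):
                if term in first_use:
                    parent[find(i)] = find(first_use[term])
                else:
                    first_use[term] = i
    groups = {}
    for i, t in enumerate(triples):
        groups.setdefault(find(i), []).append(t)
    return list(groups.values())


ME = "<http://www.cs.umbc.edu/~dingli1>"
g = [
    (ME, "foaf:name", '"Li Ding"'),
    (ME, "foaf:knows", "_:b"),
    ("_:b", "foaf:name", '"Tim Finin"'),
    ("_:b", "foaf:mbox", "<mailto:someone@example.org>"),  # hypothetical address
]
for m in molecules(g):
    print(m)
```

Provenance can then be checked molecule by molecule: a web page "contains" a molecule if some blank-node renaming maps the molecule's triples into the page's triples.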
@
Conclusion: the Semantic Web
is simple but powerful.
is standardized by W3C: RDF, RDFS, OWL.
Current focuses:
  Query – SPARQL
  Rules – SWRL, RIF
  Web services – OWL-S, WSDL-S, SAWSDL
  Best practices and deployment
but cannot do everything. Open questions: business model and industry adoption? Privacy?
@
Recommended Readings
Tutorials
  Semantic Web Road Map (since 1998), Tim Berners-Lee
  The Semantic Web, Scientific American, May 2001, Tim Berners-Lee, James Hendler and Ora Lassila
  Ontology Development 101: A Guide to Creating Your First Ontology, 2001, Natalya F. Noy and Deborah L. McGuinness
  Semantic Web Tutorials, http://www.w3.org/2001/sw/BestPractices/Tutorials
Starting points
  W3C Semantic Web Activity, http://www.w3.org/2001/sw/
  W3C Semantic Web Interest Group, http://www.w3.org/2001/sw/interest/
  W3C Semantic Web News, http://www.w3.org/2001/sw/news
  Planet RDF (aggregated blogs), http://planetrdf.com/
  Dave Beckett's Resource Description Framework (RDF) Resource Guide
  Swoogle Semantic Web Search Engine, http://swoogle.umbc.edu
  Semantic Web reference card, http://ebiquity.umbc.edu/resource/html/id/94/
Conferences and Journals
  International Semantic Web Conference (ISWC)
  European Semantic Web Conference (ESWC)
  Semantic Technology Conference (SemTech)
  Journal of Web Semantics
@
Ongoing: W3C's Semantic Web Activity
RDF Data Access Working Group: RDQL… => SPARQL
Rule Interchange Format Working Group: RuleML => SWRL => RIF
Best Practices Working Group:
  vocabulary management, e.g. WordNet
  thesauri – SKOS (Simple Knowledge Organization System)
  image annotation
  DOAP (Description of a Project)
  many tutorials and demos
Semantic Annotations for Web Services Description Language Working Group: OWL-S and WSDL-S; WSDL 2.0