Lecture slides about the basics of Linked Data
Linked (Open) Data
INFO 4302 - April 18, 2011
Bernhard Haslhofer - Cornell University
Who am I?
• Postdoc at Cornell Information Science
• Research areas:
  • linked data
  • user-contributed data (annotations)
  • (meta-)data interoperability
• Contact: [email protected]
Today we talk about...
http://www.youtube.com/watch?v=5Cb3ik6zP2I
• Movies, actors and other real-world entities
• How to make data about these entities available on the Web (Linked Data)
• Enabling technologies, best practices, and useful tools that help us do so
• Other Linked Data projects (BBC, LoC)
Web Architecture Recap
The World Wide Web (WWW)
• Internet != WWW != Google != Facebook
• Fundamental technologies:
  • URI - a simple and generic syntax for identifiers
  • HTML - a markup language without formal schema binding
  • HTTP - a simple protocol to access and manipulate resources and resource representations in a distributed environment
• World Wide Web Consortium (W3C, http://www.w3.org)
URIs
• Identification of resources via Uniform Resource Identifiers (URIs)
• Generic Syntax:
© Prof. Dr. Wolfgang Klas und Dr. Bernhard Haslhofer, WS 2009/10 - Multimediale Systeme 27 Semantic Multimedia (I): RDF 7-11
The generic syntax consists of a hierarchical sequence of components: scheme, authority, path, query, and fragment.
URI = scheme ":" hier-part [ "?" query ] [ "#" fragment ]
The scheme and the path are required, though the path may be empty.
Example URIs with components:
         foo://example.com:8042/over/there?name=ferret#nose
         \_/   \______________/\_________/ \_________/ \__/
          |           |            |            |        |
       scheme     authority       path        query   fragment
          |   _____________________|__
         / \ /                        \
         urn:example:animal:ferret:nose
Each component is defined in more detail; e.g., the authority may contain userinfo, host, and port, and the path may be empty, absolute, or rootless.
(Figure: URLs and URNs shown as two kinds of URI)
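As an illustration of the generic syntax, Python's standard `urllib.parse.urlsplit` can pull the components out of the two example URIs. This is a sketch only: `urlsplit` follows the generic syntax closely but not perfectly, it calls the authority `netloc`, and the `components` helper is hypothetical.

```python
from urllib.parse import urlsplit


def components(uri):
    """Split a URI into the five generic components (hypothetical helper)."""
    parts = urlsplit(uri)
    return {
        "scheme": parts.scheme,
        "authority": parts.netloc,  # urllib names the authority "netloc"
        "path": parts.path,
        "query": parts.query,
        "fragment": parts.fragment,
    }


url_like = components("foo://example.com:8042/over/there?name=ferret#nose")
urn_like = components("urn:example:animal:ferret:nose")
print(url_like)
print(urn_like)  # no authority; everything after "urn:" is the path

# The authority can be split further into host and port.
authority = urlsplit("foo://example.com:8042/over/there")
print(authority.hostname, authority.port)  # example.com 8042
```

The URN has no `//`, so it has an empty authority and a rootless path, which matches the lower half of the diagram.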
URIs / Resources
• Information Resource
  • web pages, images, product catalogs, etc.
  • all their essential characteristics can be conveyed in a message
  • e.g., http://www.flickr.com/user2/photos/image.jpg
• Non-Information Resource
  • other things such as dogs, people, this classroom, concepts
  • their essence is not information
  • e.g., http://www.example.com/ontology/meter
HTTP
• A stateless request-response protocol in the client-server computing model
• HTTP methods: GET, POST, PUT, DELETE, ...
• Agents use a URI to access the referenced resource; this is called dereferencing the URI
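To make dereferencing concrete, the sketch below builds the request message a client would send for an HTTP URI: a single GET whose request target is the path plus query, with the host carried in a `Host` header. The helper `build_get_request` is hypothetical and assumes the authority carries no user info; real clients such as curl or browsers add further headers.

```python
from urllib.parse import urlsplit


def build_get_request(uri, accept="*/*"):
    """Return the raw HTTP/1.1 GET message a client sends to dereference `uri`.

    Hypothetical helper; assumes the authority has no user info.
    The fragment (#...) is never sent: the client resolves it locally.
    """
    parts = urlsplit(uri)
    if parts.scheme not in ("http", "https"):
        raise ValueError("only http(s) URIs can be dereferenced over HTTP")
    target = parts.path or "/"
    if parts.query:
        target += "?" + parts.query
    lines = [
        f"GET {target} HTTP/1.1",
        f"Host: {parts.netloc}",
        f"Accept: {accept}",
        "Connection: close",  # close the connection after this single exchange
        "",  # an empty line ends the header section
        "",
    ]
    return "\r\n".join(lines)


print(build_get_request("http://dbpedia.org/resource/The_Shining_(film)"))
```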
HTTP Content Negotiation
• A URI is not (necessarily) a filename
• Content negotiation (conneg) = making multiple representations of a resource available via the same URI
(Figure: the URI http://example.com/The_Shining identifies one resource with three representations: plain text (text/plain), HTML in English (text/html), and HTML in Japanese (text/html))
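On the server side, content negotiation comes down to picking one of the available representations based on the client's `Accept` header. The following is a simplified sketch under stated assumptions: it honours `q` values and `type/*` or `*/*` wildcards, but ignores media-type parameters and the more-specific-range-wins rules of HTTP; `negotiate` and the list of available types are illustrative.

```python
def parse_accept(header):
    """Parse an Accept header into (media_range, q) pairs, highest q first."""
    ranges = []
    for part in header.split(","):
        fields = [f.strip() for f in part.split(";")]
        media_range, q = fields[0].lower(), 1.0
        for param in fields[1:]:
            name, _, value = param.partition("=")
            if name.strip().lower() == "q":
                try:
                    q = float(value)
                except ValueError:
                    q = 0.0
        if media_range:
            ranges.append((media_range, q))
    # sorted() is stable, so equally weighted ranges keep the client's order
    return sorted(ranges, key=lambda r: -r[1])


def matches(media_range, media_type):
    """True if a media range such as text/* or */* covers a concrete media type."""
    if media_range == "*/*":
        return True
    range_main, _, range_sub = media_range.partition("/")
    type_main, _, type_sub = media_type.partition("/")
    return range_main == type_main and range_sub in ("*", type_sub)


def negotiate(accept, available):
    """Pick the first available representation the client accepts (None if none)."""
    for media_range, q in parse_accept(accept):
        if q <= 0:
            continue  # q=0 means "not acceptable"
        for media_type in available:
            if matches(media_range, media_type):
                return media_type
    return None


representations = ["text/html", "text/plain"]
print(negotiate("text/html;q=0.9, text/plain", representations))  # text/plain
print(negotiate("application/rdf+xml", representations))  # None
```

When nothing matches, a server might answer `406 Not Acceptable` or fall back to a default representation; which one is a policy decision.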
(X)HTML(5)
• A resource representation data format...
• ... for presentation markup
  • rendered by user agents (typically browsers)
  • focus on readability
  • less formal, user-friendly syntax and semantics
Web Services
• Application-to-application communication based on the Web architecture
  • simple and open standards (HTTP, XML, JSON, ...)
  • send data from Application A to Application B through the Web
  • usually define an API
(Figure: Application A and Application B communicating through the Web)
Linked Data
Why Linked Data?
• There is lots of information on the Web
• ...valuable information that can be (re-)used
• Problem:
  • information is usually expressed in the form of HTML documents
• the underlying raw data are locked in closed data silos (mostly DBMS)
(c) http://www.flickr.com/photos/docsearls/5500714140
Why Linked Data?
• The Web is successful because it provides
  • uniform encoding (HTML)
  • uniform addressing (URI)
  • uniform transportation (HTTP)
  for the exchange of documents.
• Why not apply the same mechanism to the underlying data?
What is Linked Data?
• A method to build a Web of Data
• An architectural style and a set of standards
What is Linked Data?
• A set of four principles:
  • use URIs as names for things
  • use HTTP URIs so that people can look up those names
  • when someone looks up a URI, provide useful information, using the standards (RDF, SPARQL)
  • include links to other URIs, so that they can discover more things
Enabling Technologies
Uniform Resource Identifiers (URI)
• Name and identify things (resources)
• Dereferenceable HTTP URIs
http://dbpedia.org/resource/The_Shining_(film)
http://rdf.freebase.com/ns/m/04fjzv
http://data.linkedmdb.org/resource/film/2014
Resource Description Framework (RDF)
• A model for representing data on the Web
• Several statements (triples) form a graph
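(Figure: an RDF graph describing the film and one of its stars, with statements such as:)
<http://dbpedia.org/resource/The_Shining_(film)> rdfs:label "The Shining (film)"
<http://dbpedia.org/resource/The_Shining_(film)> rdf:type <http://dbpedia.org/ontology/Film>
<http://dbpedia.org/resource/The_Shining_(film)> dbpprop:starring <http://dbpedia.org/resource/Jack_Nicholson>
<http://dbpedia.org/resource/Jack_Nicholson> rdf:type <http://xmlns.com/foaf/0.1/Person>
<http://dbpedia.org/resource/Jack_Nicholson> dbpedia-owl:birthDate "1937-04-22"
<http://dbpedia.org/resource/Jack_Nicholson> foaf:name "Jack Nicholson"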
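To see why a set of statements forms a graph, the film statements can be held as plain Python tuples and navigated by matching patterns. This sketch uses no RDF library; as a simplifying assumption, prefixed names are kept as plain strings and literals are marked by surrounding double quotes.

```python
# A graph is a set of (subject, predicate, object) triples.
FILM = "http://dbpedia.org/resource/The_Shining_(film)"
JACK = "http://dbpedia.org/resource/Jack_Nicholson"

graph = {
    (FILM, "rdfs:label", '"The Shining (film)"'),
    (FILM, "rdf:type", "http://dbpedia.org/ontology/Film"),
    (FILM, "dbpprop:starring", JACK),
    (JACK, "rdf:type", "http://xmlns.com/foaf/0.1/Person"),
    (JACK, "dbpedia-owl:birthDate", '"1937-04-22"'),
    (JACK, "foaf:name", '"Jack Nicholson"'),
}


def match(graph, s=None, p=None, o=None):
    """Return the triples matching a pattern; None acts as a wildcard."""
    return {
        (ts, tp, to)
        for (ts, tp, to) in graph
        if (s is None or ts == s) and (p is None or tp == p) and (o is None or to == o)
    }


# Follow an edge across the graph: who stars in the film, and what is their name?
for _, _, actor in match(graph, FILM, "dbpprop:starring"):
    for _, _, name in match(graph, actor, "foaf:name"):
        print(name)  # "Jack Nicholson"
```

Because the object of one statement (Jack Nicholson's URI) is the subject of others, the statements connect into a graph rather than a flat table.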
RDF serialization formats (RDF/XML, N3, Turtle, N-Triples, etc.)
• Data formats for RDF resource representations
• Used to transfer RDF data between applications
N3/Turtle example:
@prefix rdf: <http://www.w3.org/1999/02/22-rdf-syntax-ns#> .
@prefix dbpedia-owl: <http://dbpedia.org/ontology/> .
@prefix dbpprop: <http://dbpedia.org/property/> .
@prefix ns9: <http://dbpedia.org/datatype/> .

<http://dbpedia.org/resource/The_Shining_%28film%29>
    rdf:type dbpedia-owl:Work , dbpedia-owl:Film ;
    dbpprop:runtime "146.0"^^ns9:minute .
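Serialization formats differ only in syntax, not in the underlying triples. As a sketch (hand-rolled, no RDF library assumed; a real application would use an RDF API such as RDFLib), the Turtle statements above can be written out as N-Triples, where every triple sits on its own line with full IRIs and no prefixes. The tuple encoding of objects is an assumption of this example.

```python
RDF = "http://www.w3.org/1999/02/22-rdf-syntax-ns#"
DBO = "http://dbpedia.org/ontology/"
DBP = "http://dbpedia.org/property/"
DT = "http://dbpedia.org/datatype/"

film = "http://dbpedia.org/resource/The_Shining_%28film%29"

# Objects are ("iri", value) or ("literal", lexical_form, datatype_iri_or_None).
triples = [
    (film, RDF + "type", ("iri", DBO + "Work")),
    (film, RDF + "type", ("iri", DBO + "Film")),
    (film, DBP + "runtime", ("literal", "146.0", DT + "minute")),
]


def escape(lexical):
    """Escape a literal's lexical form for N-Triples."""
    return (lexical.replace("\\", "\\\\").replace('"', '\\"')
                   .replace("\n", "\\n").replace("\r", "\\r"))


def to_ntriples(triples):
    """Serialize triples as N-Triples, one '<s> <p> o .' line per triple."""
    lines = []
    for s, p, o in triples:
        if o[0] == "iri":
            obj = f"<{o[1]}>"
        else:
            _, lexical, datatype = o
            obj = f'"{escape(lexical)}"'
            if datatype:
                obj += f"^^<{datatype}>"
        lines.append(f"<{s}> <{p}> {obj} .")
    return "\n".join(lines) + "\n"


print(to_ntriples(triples), end="")
```

The Turtle `,` and `;` abbreviations disappear in N-Triples: the shared subject is simply repeated on each line, which makes the format easy to stream and to process line by line.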
RDF Vocabulary Description Language (RDFS)
• A language for describing the syntax and semantics of vocabularies in a machine-understandable way
(Figure: <http://dbpedia.org/ontology/Film> rdfs:subClassOf <http://dbpedia.org/ontology/Work>)
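The practical effect of `rdfs:subClassOf` is inference: anything typed as a Film is also a Work. A minimal sketch of the corresponding RDFS entailment rules (rdfs9 for type propagation, with subclass links followed transitively as rdfs11 allows) over plain string triples follows; it is not a complete RDFS reasoner, and the prefixed names are used as plain strings.

```python
from collections import defaultdict

RDF_TYPE = "rdf:type"
SUBCLASS_OF = "rdfs:subClassOf"


def superclasses_of(cls, direct):
    """All classes reachable from `cls` by following rdfs:subClassOf transitively."""
    seen, stack = set(), [cls]
    while stack:
        for parent in direct[stack.pop()]:
            if parent not in seen:
                seen.add(parent)
                stack.append(parent)
    return seen


def infer_types(triples):
    """Add the rdf:type triples entailed by rdfs:subClassOf (rule rdfs9)."""
    direct = defaultdict(set)
    for s, p, o in triples:
        if p == SUBCLASS_OF:
            direct[s].add(o)
    inferred = set(triples)
    for s, p, o in triples:
        if p == RDF_TYPE:
            for cls in superclasses_of(o, direct):
                inferred.add((s, RDF_TYPE, cls))
    return inferred


FILM = "http://dbpedia.org/resource/The_Shining_(film)"
data = {
    (FILM, RDF_TYPE, "dbpedia-owl:Film"),
    ("dbpedia-owl:Film", SUBCLASS_OF, "dbpedia-owl:Work"),
}
print((FILM, RDF_TYPE, "dbpedia-owl:Work") in infer_types(data))  # True
```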
OWL - Web Ontology Language
• A more expressive (formal) language for defining the syntax and semantics of vocabularies
• Addresses shortcomings of RDFS but introduces considerable complexity
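(Figure: definition of the starring property)
<http://dbpedia.org/ontology/starring> rdf:type <http://www.w3.org/2002/07/owl#ObjectProperty>
<http://dbpedia.org/ontology/starring> rdfs:domain <http://dbpedia.org/ontology/Work>
<http://dbpedia.org/ontology/starring> rdfs:range <http://dbpedia.org/ontology/Person>
<http://dbpedia.org/ontology/starring> rdfs:label "starring"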
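The domain and range declarations in this property definition also license inferences: whatever has a `starring` value is a Work, and whatever is starred is a Person. The sketch below applies only those two RDFS rules (rdfs2 for domain, rdfs3 for range) once; it does not reason with any OWL-specific constructs. The `dbpedia:` prefix (assumed to abbreviate `http://dbpedia.org/resource/`) and the data triple are hypothetical.

```python
from collections import defaultdict

STARRING = "dbpedia-owl:starring"


def infer_from_domain_range(triples):
    """One pass of RDFS rules rdfs2 (rdfs:domain) and rdfs3 (rdfs:range)."""
    domains, ranges = defaultdict(set), defaultdict(set)
    for s, p, o in triples:
        if p == "rdfs:domain":
            domains[s].add(o)
        elif p == "rdfs:range":
            ranges[s].add(o)
    inferred = set(triples)
    for s, p, o in triples:
        for cls in domains.get(p, ()):
            inferred.add((s, "rdf:type", cls))  # the subject belongs to the domain
        for cls in ranges.get(p, ()):
            inferred.add((o, "rdf:type", cls))  # the object belongs to the range
    return inferred


schema_and_data = {
    (STARRING, "rdf:type", "owl:ObjectProperty"),
    (STARRING, "rdfs:domain", "dbpedia-owl:Work"),
    (STARRING, "rdfs:range", "dbpedia-owl:Person"),
    # Hypothetical data triple using the property:
    ("dbpedia:The_Shining_(film)", STARRING, "dbpedia:Jack_Nicholson"),
}
for triple in sorted(infer_from_domain_range(schema_and_data) - schema_and_data):
    print(triple)
```

Note that `rdfs:domain` and `rdfs:range` are not constraints that reject data; they only add type statements, which is a common source of surprise.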
Simple Knowledge Organization System (SKOS)
• A language for describing controlled vocabularies (taxonomies, thesauri, classification schemes)
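(Figure: SKOS categories of the film)
<http://dbpedia.org/resource/The_Shining_(film)> skos:subject <http://dbpedia.org/resource/Category:1980s_horror_films>
<http://dbpedia.org/resource/Category:1980s_horror_films> rdf:type <http://www.w3.org/2004/02/skos/core#Concept>
<http://dbpedia.org/resource/Category:1980s_horror_films> skos:broader <http://dbpedia.org/resource/Category:1980s_films>
<http://dbpedia.org/resource/Category:1980s_films> rdf:type <http://www.w3.org/2004/02/skos/core#Concept>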
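A typical use of `skos:broader` is collecting every category a resource falls under, directly or through broader categories. The sketch below follows `skos:broader` links repeatedly; since SKOS does not declare `skos:broader` itself transitive, what it computes corresponds to `skos:broaderTransitive`. The `dbpedia:` prefix abbreviation and the `ex:Category:Films` triple are hypothetical, added only to show a two-step chain.

```python
from collections import defaultdict


def broader_transitive(triples, concept):
    """All concepts reachable from `concept` by following skos:broader links."""
    broader = defaultdict(set)
    for s, p, o in triples:
        if p == "skos:broader":
            broader[s].add(o)
    seen, stack = set(), [concept]
    while stack:
        for parent in broader[stack.pop()]:
            if parent not in seen:
                seen.add(parent)
                stack.append(parent)
    return seen


def categories_of(triples, resource):
    """Direct skos:subject categories of `resource` plus all their broader categories."""
    direct = {o for s, p, o in triples if s == resource and p == "skos:subject"}
    result = set(direct)
    for category in direct:
        result |= broader_transitive(triples, category)
    return result


triples = {
    ("dbpedia:The_Shining_(film)", "skos:subject", "dbpedia:Category:1980s_horror_films"),
    ("dbpedia:Category:1980s_horror_films", "rdf:type", "skos:Concept"),
    ("dbpedia:Category:1980s_horror_films", "skos:broader", "dbpedia:Category:1980s_films"),
    ("dbpedia:Category:1980s_films", "rdf:type", "skos:Concept"),
    # Hypothetical extra link, only to show a two-step chain:
    ("dbpedia:Category:1980s_films", "skos:broader", "ex:Category:Films"),
}
print(sorted(categories_of(triples, "dbpedia:The_Shining_(film)")))
```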
Links between Resources
• OWL defines properties for linking resources
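(Figure: links between resources in different data sets)
<http://dbpedia.org/resource/The_Shining_(film)> owl:sameAs <http://rdf.freebase.com/ns/m/04fjzv>
<http://dbpedia.org/resource/The_Shining_(film)> owl:sameAs <http://data.linkedmdb.org/resource/film/2014>
<http://dbpedia.org/resource/The_Shining_(film)> dbpprop:starring <http://dbpedia.org/resource/Jack_Nicholson>
<http://dbpedia.org/resource/Jack_Nicholson> owl:sameAs <http://data.nytimes.com/N5761411277431266513>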
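Because `owl:sameAs` is symmetric and transitive, a consumer merging several data sets usually wants the groups of URIs that denote the same thing. One common way to compute them, swapped in here as an illustration rather than anything the slides prescribe, is a union-find (disjoint-set) structure; the `same_as_groups` helper is hypothetical.

```python
def same_as_groups(triples):
    """Group URIs connected by owl:sameAs links using union-find."""
    parent = {}

    def find(x):
        parent.setdefault(x, x)
        while parent[x] != x:
            parent[x] = parent[parent[x]]  # path halving keeps the trees shallow
            x = parent[x]
        return x

    def union(a, b):
        root_a, root_b = find(a), find(b)
        if root_a != root_b:
            parent[root_b] = root_a

    for s, p, o in triples:
        if p == "owl:sameAs":
            union(s, o)

    groups = {}
    for uri in parent:
        groups.setdefault(find(uri), set()).add(uri)
    return list(groups.values())


links = [
    ("http://dbpedia.org/resource/The_Shining_(film)", "owl:sameAs", "http://rdf.freebase.com/ns/m/04fjzv"),
    ("http://dbpedia.org/resource/The_Shining_(film)", "owl:sameAs", "http://data.linkedmdb.org/resource/film/2014"),
    ("http://dbpedia.org/resource/The_Shining_(film)", "dbpprop:starring", "http://dbpedia.org/resource/Jack_Nicholson"),
    ("http://dbpedia.org/resource/Jack_Nicholson", "owl:sameAs", "http://data.nytimes.com/N5761411277431266513"),
]
for group in same_as_groups(links):
    print(sorted(group))
```

The `dbpprop:starring` link is deliberately not merged: it relates two different things, whereas `owl:sameAs` asserts identity.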
SPARQL
• A query language and protocol for accessing RDF data on the Web
PREFIX skos: <http://www.w3.org/2004/02/skos/core#>
SELECT DISTINCT ?x
WHERE { ?x skos:subject <http://dbpedia.org/resource/Category:1980s_horror_films> }
LIMIT 10
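To show what the query engine does with this query, here is a toy evaluation of a single triple pattern with `DISTINCT` and `LIMIT` over in-memory triples. It is a sketch, not a SPARQL engine: it handles one pattern only, and since SPARQL leaves result order undefined without `ORDER BY`, the insertion order used here is incidental. The `ex:Film_A` and `ex:Film_B` triples are hypothetical.

```python
def match_pattern(triples, pattern):
    """Yield variable bindings for one triple pattern; terms starting with '?' are variables."""
    for triple in triples:
        binding = {}
        for term, value in zip(pattern, triple):
            if term.startswith("?"):
                if binding.setdefault(term, value) != value:
                    break  # the same variable would need two different values
            elif term != value:
                break
        else:
            yield binding


def select_distinct(triples, pattern, var, limit):
    """Rough analogue of: SELECT DISTINCT ?var WHERE { pattern } LIMIT limit."""
    results = []
    for binding in match_pattern(triples, pattern):
        value = binding[var]
        if value not in results:
            results.append(value)
            if len(results) == limit:
                break
    return results


HORROR = "http://dbpedia.org/resource/Category:1980s_horror_films"
triples = [
    ("http://dbpedia.org/resource/The_Shining_(film)", "skos:subject", HORROR),
    ("ex:Film_A", "skos:subject", HORROR),  # hypothetical
    ("ex:Film_B", "skos:subject", "http://dbpedia.org/resource/Category:1980s_films"),  # hypothetical
]
print(select_distinct(triples, ("?x", "skos:subject", HORROR), "?x", limit=10))
```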
Vocabulary / Data Publishing Best Practices
Publishing Vocabularies
• Hash-based URIs, e.g., http://example.com/example1#ClassA
  • suited to grouping the descriptions of a moderate number of related terms into one RDF document
  • an agent can retrieve all terms with a single request
• Slash-based URIs, e.g., http://example.com/example1/ClassB
  • suited to splitting the terms of a large vocabulary into one document per term
  • no need to download one massive document
• Serve either human-readable or machine-readable content from the vocabulary URI, depending on what is requested (content negotiation).
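The trade-off between hash- and slash-based URIs follows from one HTTP detail: a client strips the fragment before sending a request. The sketch below, using the standard `urllib.parse.urldefrag`, groups term URIs by the document an agent would actually fetch. The `documents_to_fetch` helper and the two `property` terms are hypothetical.

```python
from collections import defaultdict
from urllib.parse import urldefrag


def documents_to_fetch(term_uris):
    """Group vocabulary term URIs by the document an HTTP client actually requests.

    The fragment is stripped before the request is sent, so all hash URIs with
    the same base share one document; each slash URI is requested on its own.
    """
    documents = defaultdict(list)
    for uri in term_uris:
        document_url, _fragment = urldefrag(uri)
        documents[document_url].append(uri)
    return dict(documents)


terms = [
    "http://example.com/example1#ClassA",
    "http://example.com/example1#propertyA",  # hypothetical term
    "http://example.com/example1/ClassB",
    "http://example.com/example1/propertyB",  # hypothetical term
]
for document_url, described in documents_to_fetch(terms).items():
    print(document_url, "->", described)
```

The two hash terms cost one request for one shared document, while each slash term needs its own request; that is the "single request" versus "no massive download" trade-off.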
Publishing Data
• Distinguish between non-information and information resources
• Sample non-information resource: http://dbpedia.org/resource/The_Shining_(film)
• Sample information resources:
  • http://dbpedia.org/page/The_Shining_(film) - HTML
  • http://dbpedia.org/data/The_Shining_(film) - RDF
Publishing Data
GET http://dbpedia.org/resource/The_Shining_(film)
Accept: application/rdf+xml

303 See Other
Location: http://dbpedia.org/data/The_Shining_(film)

GET http://dbpedia.org/data/The_Shining_(film)
Accept: application/rdf+xml

200 OK
...
<?xml version="1.0" encoding="utf-8"?><rdf:RDF ...
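The server side of this exchange can be sketched as a small routing function: a request for a non-information resource gets a `303 See Other` pointing at the HTML page or the RDF document, depending on the `Accept` header. The `respond` function is hypothetical, assumes the `/resource/`, `/page/`, `/data/` layout shown above, and uses a deliberately naive `Accept` check; it is not how DBpedia is actually implemented.

```python
from urllib.parse import urlsplit, urlunsplit


def respond(uri, accept):
    """Return (status, headers) for a GET on `uri` (hypothetical DBpedia-like server).

    Assumes the URI layout shown above: /resource/ names things,
    /page/ serves HTML documents and /data/ serves RDF documents.
    """
    parts = urlsplit(uri)
    if not parts.path.startswith("/resource/"):
        return 200, {}  # an information resource: serve it directly
    name = parts.path[len("/resource/"):]
    # Naive check; a real server would parse the Accept header properly.
    wants_rdf = "application/rdf+xml" in accept
    target_path = ("/data/" if wants_rdf else "/page/") + name
    location = urlunsplit((parts.scheme, parts.netloc, target_path, "", ""))
    # 303 See Other: the thing itself cannot be sent, but this document describes it.
    # Vary tells caches that the answer depends on the Accept header.
    return 303, {"Location": location, "Vary": "Accept"}


print(respond("http://dbpedia.org/resource/The_Shining_(film)", "application/rdf+xml"))
print(respond("http://dbpedia.org/resource/The_Shining_(film)", "text/html"))
```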
The Linking Open Data Community Project
Linking? Open? Data Project?
• Open Data: a philosophy, practice, or policy that data are freely available to everyone, without restrictions from copyright, patents, etc.
• Linked Data: method / best practices for exposing, sharing, and connecting data using URIs and RDF
• Linking Open Data: a W3C community project with the goal to extend the Web with a data commons by publishing various open data sets as RDF on the Web and by setting links between data items from different sources
Useful Tools
RDF APIs
• Java
• Jena Semantic Web Framework (http://openjena.org/)
• Sesame RDF API (http://www.openrdf.org/)
• PHP
• ARC (http://arc.semsol.org/)
• Ruby
• RDF.rb: Linked Data for Ruby (http://rdf.rubyforge.org/)
• Python
• RDFLib (http://www.rdflib.net/)
• C
• Redland RDF Libraries (http://librdf.org/)
RDF Stores
• OpenLink Virtuoso (http://virtuoso.openlinksw.com/dataspace/dav/wiki/Main/)
• 4Store (http://4store.org/)
• AllegroGraph (http://www.franz.com/agraph/allegrograph/)
• Oracle 11g (http://www.oracle.com/technetwork/database/options/semantic-tech/index.html)
• ...and many more: http://www.w3.org/2001/sw/wiki/Tools
RDF / Linked Data Wrappers
• D2RQ - SPARQL / Linked Data for relational databases (http://www4.wiwiss.fu-berlin.de/bizer/d2rq/)
• OAI2LOD Server - expose any OAI-PMH source as Linked Data
• TripFS - filesystem as Linked Data
• TripCel - XLS spreadsheets as Linked Data
• ...
Linked Data debugging
Start up your console / terminal:
- native on Linux / Mac OS X
- Windows: http://www.cygwin.com/
Dereference resources with cURL (http://curl.haxx.se/)
curl -I -H "Accept: application/rdf+xml" http://dbpedia.org/resource/The_Shining_%28film%29
curl -H "Accept: application/rdf+xml" http://dbpedia.org/data/The_Shining_%28film%29
Linked Data debugging
Install the Raptor RDF Syntax Library (http://librdf.org/raptor/)
- Mac: brew install raptor
Use the rapper utility to dereference URIs
rapper http://dbpedia.org/resource/The_Shining_%28film%29
rapper -o rdfxml http://dbpedia.org/resource/The_Shining_%28film%29
Readings
Required Reading
• T. Heath, C. Bizer. Linked Data: Evolving the Web into a Global Data Space, Chapters 1-5
http://linkeddatabook.com/editions/1.0/
Recommended Readings
• Linked Data Web Site: http://linkeddata.org
• Linked Data / Semantic Web Introduction: http://www.linkeddatatools.com/semantic-web-basics
• Tim Berners-Lee. Linked Data Design Issues: http://www.w3.org/DesignIssues/LinkedData.html
• Best Practice Recipes for Publishing RDF Vocabularies: http://www.w3.org/TR/swbp-vocab-pub/
• How to Publish Linked Data on the Web: http://www4.wiwiss.fu-berlin.de/bizer/pub/LinkedDataTutorial/