Open Data - Principles and Techniques - VU Web Engineering / TU Wien May 15 th 2014 - Bernhard Haslhofer -

Page 1: Open Data - Principles and Techniques

Open Data - Principles and Techniques

VU Web Engineering / TU Wien, May 15th 2014

Bernhard Haslhofer

Page 2: Open Data - Principles and Techniques

About me

• Data Scientist @ AIT - Austrian Institute of Technology

• Previously
  – Lecturer & Researcher @ Cornell University, NY, USA
  – Univ. Ass. @ University of Vienna
  – …

Page 3: Open Data - Principles and Techniques

About me

• Research Interests

  – Web-based information systems
    • Structured Web Data
    • Knowledge Graphs
    • Data quality issues
    • …
  – Large-scale data analytics
    • Machine learning
    • Network analysis
    • Information retrieval

Page 4: Open Data - Principles and Techniques

My plan for today…

• Open Data – Principles and Examples

• Technique #1: Linked (Open) Data

• Technique #2: Microdata

• Open Data Activities in Austria

• Questions / Discussion

Page 5: Open Data - Principles and Techniques

Open Data – Principles

“Open data is data that can be freely used, reused and redistributed by anyone - subject only, at most, to the requirement to attribute and sharealike.”

Open Data Handbook, 2012, Open Knowledge Foundation
http://opendatahandbook.org/

Page 6: Open Data - Principles and Techniques

P#1: Availability and Access

Data must be available as a whole and at no more than a reasonable reproduction cost, preferably by downloading over the internet.

Data must also be available in a convenient and modifiable form.

http://opendefinition.org/

Page 7: Open Data - Principles and Techniques

P#2: Reuse and Redistribution

Data must be provided under terms that permit reuse and redistribution including the intermixing with other datasets.

http://opendefinition.org/

Page 8: Open Data - Principles and Techniques

P#3: Universal Participation

Everyone must be able to use, reuse and redistribute (no discrimination).

No ‘non-commercial’ restrictions.

http://opendefinition.org/

Page 9: Open Data - Principles and Techniques

Questions

• Do the open data principles sound familiar (to CS students / software engineers)?

• Any known “open data” examples?

Page 10: Open Data - Principles and Techniques

Open Data Licensing

Page 11: Open Data - Principles and Techniques

Public Domain Dedication

Page 12: Open Data - Principles and Techniques

Open Data Movement

Source: http://www.flickr.com/photos/jamescridland/613445810/sizes/l/in/photostream/

Page 13: Open Data - Principles and Techniques

Open Government Data

Page 15: Open Data - Principles and Techniques

“Decades ago, the US Government made both weather data and the GPS System freely available. Since that time, American entrepreneurs and innovators have utilised these resources to create navigation systems, location-based applications, …”

Page 17: Open Data - Principles and Techniques

Open Government Data

Page 19: Open Data - Principles and Techniques

Open Government Data

Developers Entrepreneurs

Startups

Apps / Services

Page 20: Open Data - Principles and Techniques

(Open) Data Journalism

Page 21: Open Data - Principles and Techniques

(Open) Data Journalism

Page 22: Open Data - Principles and Techniques

(Open) Data Journalism

http://datajournalismhandbook.org/

Page 23: Open Data - Principles and Techniques

Open Data in Science

Page 24: Open Data - Principles and Techniques

Open Data in Science / Open Access

Page 25: Open Data - Principles and Techniques

How can we publish and access structured data on the Web?

Page 26: Open Data - Principles and Techniques

My plan for today…

• Open Data – Principles and Examples

• Technique #1: Linked (Open) Data

• Technique #2: Microdata

• Open Data Activities in Austria

• Questions / Discussion

Page 27: Open Data - Principles and Techniques

Linked Data

“A method of publishing structured data so that it can be interlinked and become more useful.

It builds upon standard Web technologies such as HTTP, RDF and URIs, but rather than using them to serve web pages for human readers, it extends them to share information in a way that can be read automatically by computers.

This enables data from different sources to be connected and queried”

[Bizer, Heath, Berners-Lee 2009]

Page 28: Open Data - Principles and Techniques

Linked Open Data

Open Data + Linked Data = Linked Open Data

Page 29: Open Data - Principles and Techniques

Why Linked Data?

Page 32: Open Data - Principles and Techniques

Web Architecture

Page 33: Open Data - Principles and Techniques

Web Architecture

• A set of simple standards
  – Uniform global addressing (URI)
  – Uniform document encoding (HTML)
  – Uniform transportation (HTTP)
• Hyperlinks connecting documents
• Works pretty well for accessing and exchanging documents

Page 34: Open Data - Principles and Techniques

How can we publish and access structured data on the Web?

Page 35: Open Data - Principles and Techniques

Web Services and Web APIs

Source: http://www.blogperfume.com/new-27-circular-social-media-icons-in-3-sizes/

Page 36: Open Data - Principles and Techniques

Web Services and Web APIs

• Each Web API has a proprietary interface
• Data sources must be known in advance
• Information entities (papers, authors, subjects, etc.) are often not linked

Page 37: Open Data - Principles and Techniques

Social Networking Sites as Walled Gardens by David Simonds

Page 38: Open Data - Principles and Techniques

Linked Data Vision

• Publish and link structured data on the Web
• Create a single globally connected data space based on the Web Architecture

Page 39: Open Data - Principles and Techniques

Web of Linked Data

• A set of simple standards
  – Uniform global addressing (URI)
  – Uniform data model (RDF)
  – Uniform transportation (HTTP)
• RDF links connecting entities
• Forms a global data space and facilitates accessing and exchanging data

Page 40: Open Data - Principles and Techniques

What is Linked Data?

• A method to build a Web of Data
• Architectural style, set of standards

Page 41: Open Data - Principles and Techniques

Linking Open Data Project

• A W3C community project with the goal to extend the Web with a data commons by publishing various open data sets as RDF on the Web and by setting links between data items from different sources

Page 47: Open Data - Principles and Techniques

~$ curl -I -H "Accept: text/turtle" http://dbpedia.org/resource/The_Shining_\(film\)
~$ curl -H "Accept: text/turtle" http://dbpedia.org/data/The_Shining_\(film\).ttl

~$ sudo apt-get install raptor2-utils   (Linux)
~$ brew install raptor                  (Mac OS X)
~$ rapper http://dbpedia.org/resource/The_Shining_\(film\)
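The curl calls above rely on HTTP content negotiation: the client names the RDF syntax it wants in the Accept header. A minimal Python sketch of the same idea follows. The helper name build_rdf_request and the small media-type table are illustrative assumptions, not part of the lecture, and the request is only built here, not sent.

```python
from urllib.request import Request

# Media types for two common RDF serializations.
RDF_MEDIA_TYPES = {
    "turtle": "text/turtle",
    "rdfxml": "application/rdf+xml",
}


def build_rdf_request(uri: str, syntax: str = "turtle", head_only: bool = False) -> Request:
    """Build (but do not send) an HTTP request asking for an RDF representation.

    build_rdf_request is a hypothetical helper used for illustration only.
    """
    return Request(
        uri,
        headers={"Accept": RDF_MEDIA_TYPES[syntax]},
        # HEAD mirrors `curl -I`: fetch only the response headers.
        method="HEAD" if head_only else "GET",
    )


request = build_rdf_request(
    "http://dbpedia.org/resource/The_Shining_(film)", head_only=True
)
print(request.get_method(), request.full_url)
print("Accept:", request.get_header("Accept"))
```

Passing the request to urllib.request.urlopen would actually send it. Note that urlopen follows redirects on its own, whereas `curl -I` shows the redirect response itself.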

Page 48: Open Data - Principles and Techniques

LINKED DATA TECHNOLOGIES

Page 49: Open Data - Principles and Techniques

RDF

• A data model for representing data on the Web
• Several statements (triples) form a graph
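To make "triples form a graph" concrete, here is a minimal sketch in plain Python. The three statements are illustrative and hand-written for this example; they are not copied from DBpedia, and the property names are assumptions.

```python
from collections import defaultdict

DBR = "http://dbpedia.org/resource/"
DBO = "http://dbpedia.org/ontology/"

# Each statement is a (subject, predicate, object) triple.
# Illustrative data only; the exact DBpedia statements may differ.
TRIPLES = [
    (DBR + "The_Shining_(film)", DBO + "director", DBR + "Stanley_Kubrick"),
    (DBR + "The_Shining_(film)", DBO + "starring", DBR + "Jack_Nicholson"),
    (DBR + "Stanley_Kubrick", DBO + "birthPlace", DBR + "New_York_City"),
]


def build_graph(triples):
    """Index triples by subject: subjects and objects are nodes, predicates label edges."""
    graph = defaultdict(list)
    for subject, predicate, obj in triples:
        graph[subject].append((predicate, obj))
    return graph


def reachable(graph, start):
    """Return every node that can be reached from start by following edges."""
    seen, stack = set(), [start]
    while stack:
        node = stack.pop()
        for _, target in graph.get(node, []):
            if target not in seen:
                seen.add(target)
                stack.append(target)
    return seen


graph = build_graph(TRIPLES)
shining = DBR + "The_Shining_(film)"
for node in sorted(reachable(graph, shining)):
    print(node)
```

Because the object of one triple is the subject of another, the separate statements join into one connected graph.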

Page 50: Open Data - Principles and Techniques

RDF/XML, N3, Turtle, etc.

• Data formats for RDF resource representations

• Used to transfer RDF data between apps
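As a sketch of what such a format looks like, the snippet below writes triples in N-Triples, a line-based RDF syntax that falls under the "etc." above. The Literal wrapper and the function names are assumptions made for this example, and the escaping is simplified.

```python
from typing import NamedTuple


class Literal(NamedTuple):
    """Marks an object as a plain string literal rather than a URI (illustrative type)."""
    value: str


def format_term(term):
    """Render a URI as <...> and a literal as a quoted string (simplified escaping)."""
    if isinstance(term, Literal):
        escaped = (
            term.value.replace("\\", "\\\\").replace('"', '\\"').replace("\n", "\\n")
        )
        return f'"{escaped}"'
    return f"<{term}>"


def to_ntriples(triples):
    """Serialize (subject, predicate, object) tuples, one statement per line."""
    return "\n".join(
        f"{format_term(s)} {format_term(p)} {format_term(o)} ." for s, p, o in triples
    )


triples = [
    (
        "http://dbpedia.org/resource/The_Shining_(film)",
        "http://www.w3.org/2000/01/rdf-schema#label",
        Literal("The Shining"),
    ),
    (
        "http://dbpedia.org/resource/The_Shining_(film)",
        "http://dbpedia.org/ontology/director",
        "http://dbpedia.org/resource/Stanley_Kubrick",
    ),
]
print(to_ntriples(triples))
```

The same triples could be written as Turtle or RDF/XML; only the syntax changes, not the underlying graph.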

Page 51: Open Data - Principles and Techniques

RDFS

• A language for describing the syntax and semantics of schemas/vocabularies in a machine-understandable way

http://dbpedia.org/ontology/Film  rdfs:subClassOf  http://dbpedia.org/ontology/Work
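What a subclass statement buys you can be sketched in a few lines: anything typed as a Film is also a Work. The snippet below is a simplified illustration that assumes each class has at most one superclass (RDFS itself allows several). The function names are hypothetical.

```python
FILM = "http://dbpedia.org/ontology/Film"
WORK = "http://dbpedia.org/ontology/Work"

# rdfs:subClassOf statements, stored as child -> parent.
# Simplifying assumption: at most one parent per class.
SUBCLASS_OF = {FILM: WORK}


def superclasses(cls, subclass_of):
    """Follow rdfs:subClassOf links upwards and collect every ancestor class."""
    ancestors, seen = [], {cls}
    while cls in subclass_of:
        cls = subclass_of[cls]
        if cls in seen:  # guard against cyclic declarations
            break
        seen.add(cls)
        ancestors.append(cls)
    return ancestors


def infer_types(declared_type, subclass_of):
    """Return the declared type plus all types implied by the class hierarchy."""
    return [declared_type] + superclasses(declared_type, subclass_of)


print(infer_types(FILM, SUBCLASS_OF))
```

A query for all resources of type Work can therefore also return films, even if no film is explicitly typed as a Work.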

Page 52: Open Data - Principles and Techniques

OWL

• A more expressive (formal) language for defining the syntax and semantics of schemas/vocabularies
• Solves RDFS shortcomings but introduces quite some complexity

Page 53: Open Data - Principles and Techniques

SKOS

• A language for describing controlled vocabularies (taxonomies, thesauri, classification schemes)

Page 54: Open Data - Principles and Techniques

SPARQL

• A query language and protocol for accessing RDF data on the Web

PREFIX dcterms: <http://purl.org/dc/terms/>

SELECT DISTINCT ?x WHERE {
  ?x dcterms:subject <http://dbpedia.org/resource/Category:1980s_horror_films> .
}
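SPARQL is also a protocol: a query can be sent to an endpoint as the `query` parameter of an HTTP GET request. The sketch below only builds such a URL. It assumes DBpedia's public endpoint at http://dbpedia.org/sparql, and the helper name build_query_url is illustrative.

```python
from urllib.parse import parse_qs, urlencode, urlsplit

# Assumption: DBpedia's public SPARQL endpoint.
ENDPOINT = "http://dbpedia.org/sparql"

QUERY = """PREFIX dcterms: <http://purl.org/dc/terms/>

SELECT DISTINCT ?x WHERE {
  ?x dcterms:subject <http://dbpedia.org/resource/Category:1980s_horror_films> .
}"""


def build_query_url(endpoint, query):
    """Encode the query text as the 'query' parameter of a GET request URL."""
    return endpoint + "?" + urlencode({"query": query})


url = build_query_url(ENDPOINT, QUERY)
print(url)

# Decoding the URL again yields the original query text.
decoded = parse_qs(urlsplit(url).query)["query"][0]
print(decoded == QUERY)
```

When the request is sent, an Accept header such as application/sparql-results+json asks the endpoint for the results in JSON.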

Page 55: Open Data - Principles and Techniques

Database Systems Analogy...

Purpose                    | Relational Database Management Systems (RDBMS) | Linked Data Technologies
Query                      | ?                                              | ?
Schema Definition Language | ?                                              | ?
Data Representation        | ?                                              | ?
Identifiers                | ?                                              | ?

Page 56: Open Data - Principles and Techniques

Database Systems Analogy...

Purpose                    | Relational Database Management Systems (RDBMS) | Linked Data Technologies
Query                      | SQL                                            | SPARQL
Schema Definition Language | SQL DDL                                        | RDFS / OWL
Data Representation        | Relational Model / Tables                      | RDF / Graph
Identifiers                | Primary Keys (numeric sequences)               | URI
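The analogy can be made concrete by turning one relational row into triples: the primary key becomes a URI, the table becomes a class, and each column becomes a property. This is a rough sketch only. The base URI http://example.org/ and the URI patterns are hypothetical, and every value is kept as a plain string.

```python
# Hypothetical namespace for the generated URIs.
BASE = "http://example.org/"
RDF_TYPE = "http://www.w3.org/1999/02/22-rdf-syntax-ns#type"


def row_to_triples(table, primary_key, row):
    """Map one table row to triples: key -> subject URI, table -> class, column -> property."""
    subject = f"{BASE}{table}/{row[primary_key]}"
    triples = [(subject, RDF_TYPE, f"{BASE}schema/{table}")]
    for column, value in row.items():
        if column == primary_key or value is None:
            continue  # the key is already in the URI; NULLs produce no statement
        triples.append((subject, f"{BASE}schema/{table}#{column}", str(value)))
    return triples


film_row = {"id": 7, "title": "The Shining", "year": 1980}
for triple in row_to_triples("film", "id", film_row):
    print(triple)
```

Unlike a numeric key, the resulting URI is meaningful outside the database, so other datasets can link to it.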

Page 57: Open Data - Principles and Techniques

DBPedia Query Demo

SELECT ?person (COUNT(DISTINCT ?spouse) AS ?spouses) WHERE {
  ?person a yago:AmericanFilmActors .
  ?person dbpprop:spouse ?spouse .
}
GROUP BY ?person
ORDER BY DESC(?spouses)
LIMIT 100
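An endpoint can return the result of such a query in the SPARQL JSON results format, with the variable names under "head" and one binding per row under "results". The sketch below parses a small hand-written sample. The URIs and counts in it are placeholders, not real query output, and the function name rows is illustrative.

```python
import json

# Hand-written placeholder response in the SPARQL JSON results layout.
# The values are made up for illustration; they are not DBpedia output.
SAMPLE = """
{
  "head": {"vars": ["person", "spouses"]},
  "results": {"bindings": [
    {"person": {"type": "uri", "value": "http://example.org/actor/A"},
     "spouses": {"type": "literal", "value": "3"}},
    {"person": {"type": "uri", "value": "http://example.org/actor/B"},
     "spouses": {"type": "literal", "value": "2"}}
  ]}
}
"""


def rows(results_json):
    """Flatten the bindings into one dict per result row (variable name -> value)."""
    doc = json.loads(results_json)
    variables = doc["head"]["vars"]
    return [
        {var: binding[var]["value"] for var in variables if var in binding}
        for binding in doc["results"]["bindings"]
    ]


for row in rows(SAMPLE):
    print(row["person"], row["spouses"])
```

Values arrive as strings, so a count has to be converted with int() before it is used numerically.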

Page 58: Open Data - Principles and Techniques

LINKED DATA EXAMPLES

Page 67: Open Data - Principles and Techniques

Google Knowledge Graph

• Enables search for things (people, places) that Google knows about

• Rooted in public sources such as Freebase, Wikipedia, CIA World Factbook, etc.
  – augmented to 500M objects, 3.5B facts and relationships

• Next generation search (semantic index)

Page 70: Open Data - Principles and Techniques

My plan for today…

• Open Data – Principles and Examples

• Technique #1: Linked (Open) Data

• Technique #2: Microdata

• Open Data Activities in Austria

• Questions / Discussion

Page 71: Open Data - Principles and Techniques

Rich Snippets / Microdata

Page 72: Open Data - Principles and Techniques

Microdata (HTML5)

• An HTML5 specification used to nest structured data within existing content on Web pages.

• Search engines and browsers can extract and process Microdata and provide a richer browsing experience for users.

Page 73: Open Data - Principles and Techniques

Microdata Example

<div itemscope itemtype="http://schema.org/Person">
  <span itemprop="name">Bernhard Haslhofer</span>,
  <span itemprop="nickname">behas</span>.
  <div itemprop="address" itemscope itemtype="http://schema.org/PostalAddress">
    <span itemprop="streetAddress">301 College Avenue</span>
    <span itemprop="addressLocality">Ithaca</span>
    <span itemprop="addressCountry">United States</span>
  </div>
</div>
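How a consumer might read this markup can be sketched with Python's standard html.parser module. This is a deliberately simplified extractor, not the full Microdata algorithm. It assumes well-formed markup without void elements such as <br>, takes property values from text content only, and keeps a single value per property. The class name MicrodataParser is illustrative.

```python
from html.parser import HTMLParser

HTML = """
<div itemscope itemtype="http://schema.org/Person">
  <span itemprop="name">Bernhard Haslhofer</span>,
  <span itemprop="nickname">behas</span>.
  <div itemprop="address" itemscope itemtype="http://schema.org/PostalAddress">
    <span itemprop="streetAddress">301 College Avenue</span>
    <span itemprop="addressLocality">Ithaca</span>
    <span itemprop="addressCountry">United States</span>
  </div>
</div>
"""


class MicrodataParser(HTMLParser):
    """Collects items (itemscope) and their text properties (itemprop)."""

    def __init__(self):
        super().__init__()
        self.depth = 0          # current element nesting depth
        self.root_items = []    # top-level items found in the document
        self.scopes = []        # stack of (depth, item) for open itemscope elements
        self.text_prop = None   # (depth, name, item, chunks) while reading a property

    def handle_starttag(self, tag, attrs):
        self.depth += 1
        attrs = dict(attrs)
        prop = attrs.get("itemprop")
        if "itemscope" in attrs:
            item = {"type": attrs.get("itemtype"), "properties": {}}
            if prop and self.scopes:
                # A nested item is the value of a property of the enclosing item.
                self.scopes[-1][1]["properties"][prop] = item
            else:
                self.root_items.append(item)
            self.scopes.append((self.depth, item))
        elif prop and self.scopes:
            self.text_prop = (self.depth, prop, self.scopes[-1][1], [])

    def handle_data(self, data):
        if self.text_prop:
            self.text_prop[3].append(data)

    def handle_endtag(self, tag):
        if self.text_prop and self.text_prop[0] == self.depth:
            _, name, item, chunks = self.text_prop
            item["properties"][name] = "".join(chunks).strip()
            self.text_prop = None
        if self.scopes and self.scopes[-1][0] == self.depth:
            self.scopes.pop()   # the itemscope element has been closed
        self.depth -= 1


parser = MicrodataParser()
parser.feed(HTML)
parser.close()
person = parser.root_items[0]
print(person["type"])
print(person["properties"]["name"])
print(person["properties"]["address"]["properties"]["addressLocality"])
```

The nested address becomes an item of its own, so the extracted structure mirrors the nesting of the itemscope elements.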

Page 74: Open Data - Principles and Techniques

Schema.org

Page 76: Open Data - Principles and Techniques

schema.org / Microdata example

<h1>Pirates of the Caribbean: On Stranger Tides (2011)</h1>
Jack Sparrow and Barbossa embark on a quest to find the elusive fountain of youth, only to discover that Blackbeard and his daughter are after it too.

Director: Rob Marshall
Writers: Ted Elliott, Terry Rossio, and 7 more credits
Stars: Johnny Depp, Penelope Cruz, Ian McShane
8/10 stars from 200 users. Reviews: 50.

Page 77: Open Data - Principles and Techniques

schema.org / Microdata example

Page 78: Open Data - Principles and Techniques

schema.org

• Defines
  – a number of types (e.g., person), organized in an inheritance hierarchy
  – a number of properties (e.g., name)

• Extension mechanisms to extend the schemas

• OWL representation: http://schema.org/docs/schemaorg.owl

• http://schema.rdfs.org/index.html
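The inheritance hierarchy means that a type can use the properties of all its ancestors, for example Person inherits name from Thing. The sketch below models a tiny hand-picked excerpt of schema.org. It is far from the full vocabulary, and the function name allowed_properties is illustrative.

```python
# Small hand-picked excerpt of the schema.org type hierarchy (child -> parent).
PARENT = {
    "Thing": None,
    "Person": "Thing",
    "Intangible": "Thing",
    "StructuredValue": "Intangible",
    "ContactPoint": "StructuredValue",
    "PostalAddress": "ContactPoint",
}

# Properties declared directly on a type (excerpt only).
OWN_PROPERTIES = {
    "Thing": {"name", "description", "url"},
    "Person": {"address", "birthDate"},
    "PostalAddress": {"streetAddress", "addressLocality", "addressCountry"},
}


def allowed_properties(type_name):
    """Collect the properties of a type and of every ancestor up to Thing."""
    properties = set()
    while type_name is not None:
        properties |= OWN_PROPERTIES.get(type_name, set())
        type_name = PARENT[type_name]
    return properties


print(sorted(allowed_properties("Person")))
print(sorted(allowed_properties("PostalAddress")))
```

A validator could use such a lookup to flag itemprop names that are not defined for the item's type.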

Page 79: Open Data - Principles and Techniques

Open Graph Protocol

Page 83: Open Data - Principles and Techniques

My plan for today…

• Open Data – Principles and Examples

• Technique #1: Linked (Open) Data

• Technique #2: Microdata

• Open Data Activities in Austria

• Questions / Discussion

Page 85: Open Data - Principles and Techniques

Open Government Data

Page 86: Open Data - Principles and Techniques

Open Government Data

Page 88: Open Data - Principles and Techniques

Open Government Data Apps

Page 89: Open Data - Principles and Techniques

My plan for today…

• Open Data – The idea

• Implementation #1: Linked Open Data

• Implementation #2: Machine-readable HTML tags

• Open Data Activities in Austria

• Questions / Discussion

Page 90: Open Data - Principles and Techniques

Readings

• Tom Heath and Christian Bizer (2011) Linked Data: Evolving the Web into a Global Data Space (1st edition). Synthesis Lectures on the Semantic Web: Theory and Technology, 1:1, 1-136. Morgan & Claypool.

• Jason Ronallo: HTML5 Microdata and Schema.org
  http://journal.code4lib.org/articles/6400