Guest Lecture for VU Social Science students
Linked Data Principles and Examples
Victor de Boer, 25-11-2014
With slides from Knud Hinnerk Moller, Kasper Brandt, Christophe Gueret
Victor de Boer
Researcher at Netherlands Institute for Sound and Vision
Assistant professor at Web and Media Group VU
Semantic Web, Linked Data
Cultural Heritage
Digital History
Linked Data for Development
http://info.cern.ch/Proposal.html
Tim Berners-Lee (The inventor of the Web)
Web of Documents (WWW): Linked Documents
From text to data > increased semantics
More and more structured data available online
• Governments
• Social web data
• Medical data
• Museums
• Research data
?
Moverum.com
Web of Documents vs Web of Data
• People are often not interested in documents; they are interested in things (information)
– Humans are very good at reading (web) documents and distilling information
• Computers are very good at calculating, combining and filtering information, but they are very bad at reading documents
– We need to help machines understand web data
– Write it down in a way that they can understand
LINKED DATA!!
Web of Documents (WWW): Linked Documents
Web of Data: Linked Data
Without Linked Data
Slide stolen from Christophe Gueret
With Linked Data
Slide stolen from Christophe Gueret
http://info.cern.ch/Proposal.html
Tim Berners-Lee (The inventor of the Web), and the Semantic Web
What is Linked Open Data?
Intermezzo
Open Data is about licenses to allow reuse
Linked Data is about technology for interoperability
★ Available on the web (whatever format), but with an open license
★★ Available as machine-readable structured data (e.g. Excel instead of an image scan of a table)
★★★ As (2), plus a non-proprietary format (e.g. CSV instead of Excel)
★★★★ All the above, plus: use open standards from W3C (RDF and SPARQL) to identify things, so that people can point at your stuff
★★★★★ All the above, plus: link your data to other people's data to provide context
www.w3.org/designissues/linkeddata.html
Linked Data five star system (TBL)
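The five-star scheme can be read as a cumulative checklist: each star only counts if all earlier ones are earned. The sketch below is an illustration only; the function name `star_rating` and its boolean parameters are hypothetical and not part of any W3C tool.

```python
def star_rating(open_license, machine_readable, non_proprietary,
                uses_w3c_standards, linked_to_other_data):
    """Return 0-5 stars; each star requires all the previous ones."""
    checks = [open_license, machine_readable, non_proprietary,
              uses_w3c_standards, linked_to_other_data]
    stars = 0
    for passed in checks:
        if not passed:
            break  # a missing requirement stops the count
        stars += 1
    return stars


# An openly licensed CSV file: open, machine-readable, non-proprietary.
csv_stars = star_rating(True, True, True, False, False)
# An openly licensed scanned image of a table.
scan_stars = star_rating(True, False, False, False, False)
```

Under this reading, data without an open license earns no stars at all, however well it is structured.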
http://lod-cloud.net/
Examples of Linked Data
• Academia, Research
• Community
• Libraries, Museums, Cultural Heritage
• Government and public institutions (Open Data)
• Media
• Business
OpenPhacts explorer
http://www.openphacts.org/
Google knowledge graph
www.huffingtonpost.com
How does all this work?
• Data, not documents
• Structured data
• Graph (networked) data!
• W3C Web standards stack
– URIs, HTTP, RDF, RDFa, RDFS, OWL, SPARQL, etc.
Four rules of Linked Data
1. Use URIs as names for things
2. Use HTTP URIs so that people can look up those names.
3. When someone looks up a URI, provide useful information, using the standards (RDF)
4. Include links to other URIs, so that they can discover more things.
http://www.w3.org/DesignIssues/LinkedData.html
Semantic Web standard for writing down data, information
(Subject, Relation, Object)
<Painting001, has_location, Amsterdam>
Resource Description Framework (RDF)
Painting001 --has_location--> Amsterdam
[Graph figure: SJTU, Shanghai, Beijing and the People’s Republic of China as nodes, connected by name, located in, population and capital edges]
SJTU name "Shanghai Jiao Tong University"
SJTU located in Shanghai
Shanghai name "上海"
Shanghai population "23,019,148"
Shanghai located in People’s Republic of China
People’s Republic of China capital Beijing
Beijing located in People’s Republic of China
Beijing population "20,693,000"
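To make the graph idea concrete, the triples above can be held in a plain list and queried by pattern. This is a minimal sketch in plain Python rather than a real RDF library; the helper names `match` and `located_in_transitively` are made up for illustration.

```python
# Triples copied from the slide: (subject, relation, object).
TRIPLES = [
    ("SJTU", "name", "Shanghai Jiao Tong University"),
    ("SJTU", "located in", "Shanghai"),
    ("Shanghai", "name", "上海"),
    ("Shanghai", "population", "23,019,148"),
    ("Shanghai", "located in", "People's Republic of China"),
    ("People's Republic of China", "capital", "Beijing"),
    ("Beijing", "located in", "People's Republic of China"),
    ("Beijing", "population", "20,693,000"),
]


def match(triples, s=None, p=None, o=None):
    """Return all triples matching the pattern; None acts as a wildcard."""
    return [t for t in triples
            if (s is None or t[0] == s)
            and (p is None or t[1] == p)
            and (o is None or t[2] == o)]


def located_in_transitively(triples, start):
    """Follow 'located in' edges from start and collect every place reached."""
    reached, frontier = [], [start]
    while frontier:
        node = frontier.pop()
        for _, _, place in match(triples, s=node, p="located in"):
            if place not in reached:
                reached.append(place)
                frontier.append(place)
    return reached


places = located_in_transitively(TRIPLES, "SJTU")
```

Following edges like this is what "graph thinking" amounts to: SJTU is never stated to be in China, but the path through Shanghai makes it discoverable.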
• Graph
• Triple
Graph Thinking
Use HTTP URIs for Things
• A Uniform Resource Identifier (URI) is a string of characters used to identify a resource
• http://rijksmuseum.nl/data/schilderij1
• I can go there (dereference) and then I get information about it
– HTML page for humans
– RDF data for machines
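Serving HTML to people and RDF to machines from the same URI is commonly done with HTTP content negotiation: the client states what it wants in the Accept header. The sketch below only prepares the requests and sends nothing; whether the example URI from the slide resolves, and which formats its server offers, are assumptions.

```python
from urllib.request import Request

# Media types: text/html for people, text/turtle (an RDF syntax) for machines.
ACCEPT_HTML = "text/html"
ACCEPT_RDF = "text/turtle"


def build_lookup(uri, for_machine):
    """Prepare (but do not send) an HTTP GET for a URI with an Accept header."""
    accept = ACCEPT_RDF if for_machine else ACCEPT_HTML
    return Request(uri, headers={"Accept": accept})


uri = "http://rijksmuseum.nl/data/schilderij1"  # example URI from the slide
human_request = build_lookup(uri, for_machine=False)
machine_request = build_lookup(uri, for_machine=True)
# Passing either object to urllib.request.urlopen() would perform the lookup.
```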
Links
• Link your data to other data
– By establishing RDF triples that point to other people’s data
– By reusing other people’s URIs
Example: Link to Geonames
IDS: document 0002 Country:”Gambia”
Geonames:Gambia
Region: Africa
population : 1593256
N 13° 30' 0'' W 15° 30' 0''
Reuse things: Vocabularies
• FOAF (Friend of a Friend): People, Organisations, Social Networks
• Dublin Core (Bibliographic): publications, authors, media, etc.
• schema.org (Google, Yahoo!, Bing, Yandex): cross-domain, what search engines are interested in (people, events, products, locations)
• Good Relations: business, products, etc.
rijks:Painting001 --http://purl.org/dc/terms/spatial--> Amsterdam
Reuse things: Datasets
• GeoNames: Geographical data
• DBPedia: RDF version of Wikipedia (also in Dutch)
• GTAA (Gemeenschappelijke Thesaurus Audiovisuele Archieven): Persons, topics, AV-terms
• VIAF: Persons
rijks:Painting001 --http://purl.org/dc/terms/spatial--> http://sws.geonames.org/2759794/
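Written out with full URIs, the link to GeoNames is a single line of N-Triples. The sketch below shows one way to produce it. The slide does not give the namespace behind the `rijks:` prefix, so `http://example.org/rijks/` is a placeholder, and the helpers `expand` and `to_ntriples` are hypothetical names.

```python
# Assumption: the real namespace behind "rijks:" is not given on the slide,
# so a placeholder is used here.
PREFIXES = {"rijks": "http://example.org/rijks/"}


def expand(term):
    """Expand a prefixed name like rijks:Painting001 to a full URI."""
    if term.startswith("http://") or term.startswith("https://"):
        return term  # already a full URI
    prefix, local = term.split(":", 1)
    return PREFIXES[prefix] + local


def to_ntriples(subject, predicate, obj):
    """Serialize one URI-only triple as a line of N-Triples."""
    return "<{}> <{}> <{}> .".format(
        expand(subject), expand(predicate), expand(obj))


line = to_ntriples(
    "rijks:Painting001",
    "http://purl.org/dc/terms/spatial",
    "http://sws.geonames.org/2759794/",
)
```

Because the object is someone else's URI, anyone who already knows that GeoNames entry can connect their data to this painting without further agreement.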
Examples
Dutch Ships and Sailors Linked Data Cloud
Victor de Boer, Matthias van Rossum, Jur Leinenga, Rik Hoekstra
With input from Andrea Bravo Balado and Robin Ponstein
Netherlands Institute for Sound and Vision / VU University Amsterdam
ISWC2014
The Problem: ((Maritime) historical) data is not integrated
25+ Maritime datasets; Heterogeneous
The solution
Well, Linked Data obviously!
But why Linked Data
• Heterogeneous models, one data format
– Link what can be linked
– Keep specificity of original data
– Allow integration at project level (and beyond)
• Links to other sources: re-use knowledge
• Extensible
• Allow multiple levels of semantic enrichment / normalization
– Provenance
KB Delpher
Dutch-Asiatic Shipping (DAS): Voyages (Huygens ING)
“VOC Opvarenden”: Mustering and payroll information (DANS Easy)
Dutch Ships and Sailors
Modeling in collaboration with historians (1)
[Diagram: RDF model of one muster-roll record; labels as shown in the figure]
dss:Record, mdb:Aanmonstering: mdb:aanmonstering-del_gem-1879-101
dss:Record, mdb:PersoonsContract: mdb:persoonscontract-del_gem-1879-101-16858-Pieter_Hoekstra
dss:Schip, mdb:Schip: mdb:schip-del_gem-1879-101-Isadora
dss:ship, mdb:ship
"1870-1894"
"Isadora": rdfs:label, dss:shipname, mdb:scheepsnaam
dss:ShipType, mdb:ScheepsType: mdb:schoener
dss:shiptype, mdb:scheepstype
"32": dcterms:identifier, mdb:inventarisnummer
mdb:has_KB_article: <http://resolver.kb.nl/resolve?urn=ddd:010063756:mpeg21:a0045:ocr>
mdb:schip-del_gem-1879-137-Isadora: owl:sameAs
dss:has_aanmonstering
mdb:has_person
foaf:Person, dss:Person, mdb:Person: mdb:persoon-del_gem-1879-101-16858
dss:rank, mdb:rank
dss:Rank, mdb:Rang: mdb:matroos
mdb:maandgage
"Pieter": foaf:firstname, mdb:voornaam
"Hoekstra": foaf:lastname, mdb:achternaam
Jur Leinenga (Huygens ING): Muster-rolls Northern Provinces 1803-1937
Modeling in collaboration with historians (2)
[Diagram: RDF model of one gzmvoc:Telling record; labels as shown in the figure]
dss:Record, gzmvoc:Telling: gzmvoc:telling-1046-De_Berkel
__bnode_1
gzmvoc:aziatischeBemanning
dss:Ship, gzmvoc:Schip: gzmvoc:schip-1046-De_Berkel
dss:has_ship, gzmvoc:schip
"1046"
"Schip"
"De Berkel": rdfs:label, dss:scheepsnaam, gzmvoc:scheepsnaam
dss:ShipType, gzmvoc:Scheepstype: gzmvoc:type-Ship
dss:has_shiptype, gzmvoc:has_shiptype
gzmvoc:scheepstype
"21"
"Moorsemattroosen"
dss:azRegistratieKop
gzmvoc:azAantalMatrozen
gzmvoc:telling
gzmvoc:heeft DAS heenreis
dss:Record, das:Voyage: das:voyage-1918_61
Matthias van Rossum (VU-hist): Payroll information for European vs Asiatic Sailors (17th / 18th C)
Modelling principles
• Model each dataset as directly as possible
– Only “syntactical” transformation to RDF
– No normalization
• Reusability
• Transparency, trust
• Normalize and link in second stage
– Store in separate RDF Named Graphs
mdb:Schip1 --mdb:scheepsType--> mdb:Kof
das:ShipX --das:typeOfShip--> das:Kofship
mdb:scheepsType --rdfs:subPropertyOf--> dss:has_shipType
das:typeOfShip --rdfs:subPropertyOf--> dss:has_shipType
Link properties and classes to interoperability layer
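The effect of the interoperability layer can be sketched in a few lines: each dataset keeps its own property, and a query on the shared property also returns statements made with its sub-properties. This is a simplified illustration, not the project's implementation; it handles only one level of rdfs:subPropertyOf, and the function name `query` is hypothetical.

```python
# Dataset-level triples, kept exactly as modelled in each source.
DATA = [
    ("mdb:Schip1", "mdb:scheepsType", "mdb:Kof"),
    ("das:ShipX", "das:typeOfShip", "das:Kofship"),
]

# Interoperability layer: dataset properties mapped to a shared property.
SUB_PROPERTY_OF = {
    "mdb:scheepsType": "dss:has_shipType",
    "das:typeOfShip": "dss:has_shipType",
}


def query(prop):
    """Return (subject, object) pairs for prop, including direct sub-properties."""
    wanted = {prop} | {sub for sub, sup in SUB_PROPERTY_OF.items() if sup == prop}
    return [(s, o) for s, p, o in DATA if p in wanted]


ship_types = query("dss:has_shipType")
mdb_only = query("mdb:scheepsType")
```

Asking for the dataset-specific property still returns only that dataset's statements, which is how the original specificity is preserved.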
mdb:Schip1 --mdb:scheepsType--> mdb:Kof
das:ShipX --das:typeOfShip--> das:Kofship
Aat:Kof
Aat:Platbodems
skos:exactMatch (three links in the figure)
Vocabulary Links
Links to DBPedia (Ship types, places, ranks)
Links to Getty AAT (Ship types, ranks)
Links to GeoNames (Places)
Linking to Historical newspapers
• Automatically detect links between ships and historical newspaper articles (delpher.nl)
– Based on ship name, time intervals, captain’s names, ship type, named entities, keywords, background knowledge
• 179,120 links
- Andrea Bravo Balado
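As a rough idea of how such automatic linking might work, the sketch below scores a candidate newspaper article against a ship record using three of the features listed above (ship name, captain's name, time interval). It is not the method used in the project. The weights, the `active` years and the article year are invented for illustration, and the record is only loosely modelled on the "Transit" example that follows.

```python
def link_score(ship, article_text, article_year):
    """Score how well an article matches a ship record, from 0 to 10.

    The features follow the slide; the weights are made up.
    """
    text = article_text.lower()
    score = 0
    if ship["name"].lower() in text:
        score += 5  # ship name mentioned
    if ship["captain"].lower() in text:
        score += 3  # captain's name mentioned
    first_year, last_year = ship["active"]
    if first_year <= article_year <= last_year:
        score += 2  # article falls within the ship's known time interval
    return score


# Hypothetical record; the years are assumptions, not data from the project.
ship = {"name": "Transit", "captain": "Schaap", "active": (1855, 1860)}
snippet = "het hier behoorende schoonerschip Transit, kapitein Schaap"
score = link_score(ship, snippet, 1859)
```

A real system would also need to cope with OCR errors in the newspaper text, which exact substring matching like this does not.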
Example
[HARLINGEN, 24 October.] . «et gestrande
Zweedsche schip , waarvan wij ons vorig no.
melding maakten , is door de 'eepboot van
hier afgebragt en hier binnengede u BiJ die
gelegenheid werd ons medegeeeid, dat nog
vier vaartuigen op Terschelling aren
gestrand. Tevens is het berigt ontvan°e > dat
het hier behoorende schoonerschip
Transit, kapitein Schaap, in de Noordzee is
gezonken, nadat het achterschip was
weggeslagen ; een ligtmatroos verloor
daarbij het leven. Mede zijn hier drie
vreemde schepen met meer en minder
zware averij binnengeloopen.
Spoiler alert! It sank in the North Sea.
mdb:Aanmonstering_1859-55
mdb:Transit
Provenance
• Sets of triples have provenance information
– Who made it (people/software?)
– Based on what source
– Content confidence
• Matches historical science requirements
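One way to attach provenance to sets of triples is to store each triple with the name of the graph it belongs to and describe each graph once. The sketch below illustrates that idea in plain Python. The graph names, the provenance fields and their values are all hypothetical, and the exactMatch link is an assumed example.

```python
# Quads: (subject, predicate, object, named graph).
QUADS = [
    ("mdb:Schip1", "mdb:scheepsType", "mdb:Kof", "graph:mdb-original"),
    ("mdb:Kof", "skos:exactMatch", "Aat:Kof", "graph:mdb-links"),
]

# Provenance recorded once per graph; all field values are illustrative.
PROVENANCE = {
    "graph:mdb-original": {
        "made_by": "conversion script", "source": "MDB", "confidence": 1.0},
    "graph:mdb-links": {
        "made_by": "linking tool", "source": "MDB + AAT", "confidence": 0.8},
}


def triples_with_min_confidence(threshold):
    """Keep only triples whose graph has at least the given confidence."""
    return [(s, p, o) for s, p, o, g in QUADS
            if PROVENANCE[g]["confidence"] >= threshold]


trusted = triples_with_min_confidence(0.9)
```

A historian can then choose to work only with the directly converted source data, or to include the automatically generated links as well.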
[Diagram: the Dutch Ships and Sailors data cloud]
Datasets: DAS, GZMVOC, MDB, VOCOPVBegunstigden, VOCOPVSoldijboeken, VOCOPVOpvarenden
Vocabularies: PROV, AAT, foaf
Link types: owl:sameAs, dss:hasKBLink, rdfs:subClassOf, rdfs:subPropertyOf, dss:DAS link, skos:exactMatch
Data analysis and visualisation
Current work: linking original scans
Networked heritage
Concept: Jan Sluijters (painter), DBpedia
Related items
Links
• Styles (Expressionism, Cubism, Fauvism)
• Period (contemporaries)
LinkedTV: Example of contextualization
LinkedTV – SmartTV
12 February 2013
Cultural heritage scenario, Tussen Kunst & Kitsch
Thanks to an agreement with AVRO!
DIVE INTO THE EVENT-BASED BROWSING OF LINKED HISTORICAL MEDIA
VICTOR DE BOER, JOHAN OOMEN, OANA INEL, LORA AROYO, ELCO VAN STAVEREN, WERNER HELMICH AND DENNIS DE BEURS
DIGITAL HUMANITIES RESEARCHERS
Media researcher Lars Arve Røssland of the University of Bergen. (Photo: Andreas R. Graven)
EXPLORATIVE SEARCH
Erp, M. van; Oomen, J.; Segers, R.; Akker, C. van de; Aroyo, L.; Jacobs, G.; Legêne, S.; Meij, L. van der; Ossenbruggen, J.R. van; Schreiber, G. Automatic Heritage Metadata Enrichment with Historic Events. Museums and the Web 2011. http://www.museumsandtheweb.com/mw2011/papers/automatic_heritage_metadata_enrichment_with_hi
https://www.flickr.com/photos/drainrat/14779289998/
DATA: OPENIMAGES.EU
Open videos Netherlands Institute for Sound and Vision
3000, mostly news broadcasts
DATA: DELPHER.NL
Scans of Radio bulletins (hand annotated)
• 1937 – 1984
• 1.5 million, OCR'ed and with named entities recognized (NER)
ENTITY EXTRACTION
CROWDTRUTH.ORG
ENTITY EXTRACTION
EVENTS CROWDSOURCING AND LINKING TO CONCEPTS THROUGH CROWDTRUTH.ORG
SEGMENTATION & KEYFRAMES
LINKING EVENTS AND CONCEPTS TO KEYFRAMES
SIMPLE EVENT MODEL (SEM), OPENANNOTATION (OA) AND SKOS
DIVE:MEDIAOBJECT
SEM:EVENT
SEM:PLACE
SEM:TIME
SEM:ACTOR
SKOS:CONCEPT
OA:ANNOTATION
• LINKS TO EUROPEANA (MULTILINGUAL)
• LINKS TO DBPEDIA
DIGITAL SUBMARINE UI
https://www.flickr.com/photos/benjcarson/245171885
INFINITY OF EXPLORATION
https://www.flickr.com/photos/mibuchat/2774251415
Linked Data 4 Development
Linked Data for International Aid Transparency Initiative
M.Sc. thesis by Kasper Brandt
Victor de Boer
“IATI is a voluntary, multi-stakeholder initiative that seeks to improve the transparency of aid in order to increase its effectiveness in tackling poverty.”
Linking datasets and Applications
User questions
1. In total, how much does a given country receive in aid?
2. A comparative index of aid versus the Human Development Index.
3. What is the geographic location of a project? How much aid went to a given province, constituency or village?
o Is the aid spent in places where the need is highest? Is it well distributed across the country?
o Can we attribute sub-national breakdowns for aid so we can see how much goes to different parts of recipient countries?
4. How does violent conflict in recipient countries affect aid activities?
5. How does aid spending as registered in the IATI standard compare to World Bank indicators?
IATI 2 LOD application
http://iati2lod.appspot.com/applications
Information sharing in rural developing areas
Need for information sharing in rural developing areas
• Agricultural, Health, Education, Market prices…
Sharing (heterogeneous) knowledge is essential
• LD is well-suited because of:
– Language-agnostic
– Interface-agnostic
– De-centralised authoring
• Slicing
– Re-usability
• Local
• Global
Based on Sbc4d.com
Local market data
Communiqué
GSM/Voice interface
Web Interface Text-To-Speech
Community radio
RadioMarché
Sahel Eco operative
Buyers
EcoMash
[M.Sc. thesis by Henk Kroon]
Linked Data for Development (LD4D)
Web applications
<VoiceXML> to SPARQL*
Voice browser
Tel: +31208080855
Skype: +990009369996162208
RadioMarché Linked market data
‘Allo, Linked Data?
DBpedia
GeoNames
Agrovoc
Low-powered hardware and Mesh networking
ENTITY REGISTRY SYSTEM (ERS)
• Fully decentralised Linked Data publication platform
• Works under any kind of connectivity context
• Tracks individual edits back to their authors
• Simple and versatile
• Open Source https://github.com/ers-devs
• Low resource demands
... and open for contributions so don't hesitate to fork it!
Rapid-prototyping knowledge sharing platform
(aka “The Box”)
With the mainstream
Dev. countries can leapfrog directly into the information age,
jumping many phases of immature technologies
Img: flickr/n3v3rv0id
Linked Data is mainstream computer science research.
Test hypotheses in domains/environments
Take Home
• Linked Data is a set of technologies and principles for formalizing data and information to make it usable for computers
– Based on triples and URIs
– Data takes the form of graphs
– We can link data from heterogeneous sources
– Reuse
• It mirrors the Web of Documents, Social Web
– But behind the scenes
• Networks are very powerful and flexible for representing and sharing information