Existing data management approaches assume control over schema, data and data generation, which is not the case in open, de-centralised environments such as the Web. The lack of control means that social processes are necessary to generate 'ordo ab chao', and hence a new life cycle model is necessary. Based on our experience in Linked Data publishing and consumption over the past years, we have identified the parties involved and the fundamental phases, which provide for a multitude of so-called Linked Data life cycles. If you want to hear me speak to the slides, you might want to check out the following videos on YouTube: Part 1: http://www.youtube.com/watch?v=AFJSMKv5s3s Part 2: http://www.youtube.com/watch?v=G6YJSZdXOsc Part 3: http://www.youtube.com/watch?v=OagzNpDEPJg
Linked Data life cycles
Dr. Michael Hausenblas, Linked Data Research Centre
DERI, NUI Galway
July 2011
What is a dataspace?
• Heterogeneous data sources
• Distributed environment - proximity
• Find and consume data
• Update data
What is a DSSP and why does it matter?
• DSSP == Dataspace Support Platform
• Participants & relationships
• Services
  – Catalog & Browse
  – Search & Query
  – Index
  – Discovery
• Linked Data ecosystem is an open & standards-based real-world DSSP
Data management solutions
Based on [Franklin:SIGMOD05]
Linked Data principles*
1. Use URIs to identify the “things” in your data
2. Use HTTP URIs so people and machines can look them up (on the Web)
3. When a URI is looked up, return a description of the thing
4. Include links to related things
* http://www.w3.org/DesignIssues/LinkedData.html
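Principles 2 and 3 can be illustrated with a minimal Python sketch that prepares, without sending, an HTTP lookup for a thing's URI and asks for an RDF description via the Accept header. The URI under example.org and the helper name `build_lookup_request` are assumptions made for illustration; they are not part of the talk.

```python
from urllib.request import Request


def build_lookup_request(thing_uri: str) -> Request:
    """Prepare (but do not send) an HTTP GET asking for an RDF description."""
    # Principle 2: the identifier is an HTTP URI, so it can be looked up.
    if not thing_uri.startswith(("http://", "https://")):
        raise ValueError("Linked Data identifiers should be HTTP URIs")
    # Principle 3: ask the server for a machine-readable description,
    # preferring Turtle and falling back to RDF/XML.
    accept = "text/turtle, application/rdf+xml;q=0.9"
    return Request(thing_uri, headers={"Accept": accept})


# Hypothetical identifier, used only for illustration.
req = build_lookup_request("http://example.org/id/galway")
print(req.get_method(), req.full_url)
print("Accept:", req.get_header("Accept"))
```

Passing `req` to `urllib.request.urlopen` would perform the actual lookup; whether a given server honours the Accept header depends on how the data was published.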
Linked Open Data cloud stats
[Figures: triples distribution, links distribution. Source: http://lod-cloud.net/state/]
The Challenge
• Classical data management approaches assume complete control over schema, data, and data generation
• The Web is distributed and open, and hence lacks control
• Requires a new model of life cycles
Linked Data life cycles
opendata.ie
LOD cloud
Neologism
DataCube
prefix.cc
Google Refine
RDB2RDF
VoID
DCAT
Sindice
CKAN
LATC 24/7
duke
Sig.ma
school explorer
data-gov.ie
Linked Data life cycles: data awareness
‘database hugging disorder’
http://thinkquarterly.co.uk/01-data/a-data-state-of-mind/
Hans Rosling
TimBL’s 5-star plan for open data*
★ Make your data available on the Web under an open license
★★ Make it available as structured data (Excel sheet instead of image scan of a table)
★★★ Use a non-proprietary format (CSV file instead of an Excel sheet)
★★★★ Use Linked Data format (URIs to identify things, RDF to represent data)
★★★★★ Link your data to other people’s data to provide context
* http://lab.linkeddata.deri.ie/2010/star-scheme-by-example/
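The scheme can be read as a cumulative checklist. The following Python sketch rates a dataset under that reading, which is an assumption: a star only counts if every earlier star has been earned. The `Dataset` fields and the `star_rating` function are hypothetical names introduced for illustration.

```python
from dataclasses import dataclass


@dataclass
class Dataset:
    on_web_open_license: bool      # 1 star
    structured: bool               # 2 stars, e.g. an Excel sheet
    non_proprietary_format: bool   # 3 stars, e.g. a CSV file
    uses_uris_and_rdf: bool        # 4 stars
    links_to_other_data: bool      # 5 stars


def star_rating(d: Dataset) -> int:
    """Count stars cumulatively, stopping at the first unmet criterion."""
    criteria = [
        d.on_web_open_license,
        d.structured,
        d.non_proprietary_format,
        d.uses_uris_and_rdf,
        d.links_to_other_data,
    ]
    stars = 0
    for met in criteria:
        if not met:
            break
        stars += 1
    return stars


csv_file = Dataset(True, True, True, False, False)
excel_sheet = Dataset(True, True, False, False, False)
print(star_rating(csv_file))     # 3
print(star_rating(excel_sheet))  # 2
```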
Linked Data life cycles: modeling
http://linked-statistics.org/datacube/
http://schema.rdfs.org
Linked Data life cycles: publishing
Publishing
http://lab.linkeddata.deri.ie/2010/grefine-rdf-extension/
Linked Data life cycles: discovery
Discovery
• Model for dataset description: VoID vocabulary
• Users in industry and governments
• Published as W3C Note: http://www.w3.org/TR/void
• Significant uptake in research
Describing Datasets
• General dataset metadata
• Access metadata
• Structural metadata
• Describing linksets
• Deployment and discovery of voiD files
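To make the five kinds of metadata listed above concrete, here is a minimal Python sketch that emits a VoID description in Turtle and computes the well-known location of a voiD file. The `void:` and `dcterms:` terms come from the published vocabularies; all example.org URIs, the triple count and the two function names are assumptions for illustration. The title is assumed to contain no quote characters, since no escaping is done.

```python
from urllib.parse import urljoin

PREFIXES = [
    "@prefix void: <http://rdfs.org/ns/void#> .",
    "@prefix dcterms: <http://purl.org/dc/terms/> .",
    "@prefix owl: <http://www.w3.org/2002/07/owl#> .",
    "",
]


def void_description(dataset, title, endpoint, dump, triples, linked_dataset):
    """Return a Turtle string describing one dataset and one linkset."""
    lines = PREFIXES + [
        f"<{dataset}> a void:Dataset ;",
        f'    dcterms:title "{title}" ;',             # general metadata
        f"    void:sparqlEndpoint <{endpoint}> ;",    # access metadata
        f"    void:dataDump <{dump}> ;",              # access metadata
        f"    void:triples {triples} .",              # structural metadata
        "",
        f"<{dataset}-links> a void:Linkset ;",        # hypothetical linkset URI
        f"    void:target <{dataset}>, <{linked_dataset}> ;",
        "    void:linkPredicate owl:sameAs .",
    ]
    return "\n".join(lines)


def well_known_void(site_url):
    """Discovery: location of a voiD file under the site's well-known path."""
    return urljoin(site_url, "/.well-known/void")


ttl = void_description(
    dataset="http://example.org/dataset/schools",
    title="Example schools dataset",
    endpoint="http://example.org/sparql",
    dump="http://example.org/dumps/schools.nt",
    triples=1200,
    linked_dataset="http://example.org/dataset/places",
)
print(ttl)
print(well_known_void("http://example.org/dataset/schools"))
```

Building the Turtle by string formatting keeps the sketch free of third-party dependencies; an RDF library would be the safer choice for real data.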
Linked Data life cycles: integration
Why go for the 5th star?
Central Contractor Registration (CCR)
Geonames
http://webofdata.wordpress.com/2011/05/22/why-we-link/
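As a rough illustration of what earning the fifth star involves, the Python sketch below links place entries from one dataset to a gazetteer by comparing normalised labels and emits owl:sameAs statements as N-Triples. The records and example.org URIs are invented for illustration and do not come from CCR or Geonames; exact label matching is a deliberately naive stand-in for a real link-discovery tool.

```python
OWL_SAME_AS = "http://www.w3.org/2002/07/owl#sameAs"


def normalise(label):
    """Lower-case a label and collapse surrounding and repeated whitespace."""
    return " ".join(label.lower().split())


def link_by_label(source, target):
    """Both arguments map URI -> label; return owl:sameAs N-Triples lines."""
    index = {normalise(label): uri for uri, label in target.items()}
    links = []
    for uri, label in source.items():
        match = index.get(normalise(label))
        if match is not None:
            links.append(f"<{uri}> <{OWL_SAME_AS}> <{match}> .")
    return links


# Hypothetical records, for illustration only.
source_places = {
    "http://example.org/registry/place/1": "  Galway",
    "http://example.org/registry/place/2": "Springfield",
}
target_places = {
    "http://example.org/gazetteer/42": "galway",
}

links = link_by_label(source_places, target_places)
for line in links:
    print(line)
```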
Pay-as-you-go integration
Fix Overall Data Integration Effort
http://latc-project.eu/
Linked Data life cycles: use cases
Use case: eGov Ireland
• Fingal County Council
  – Raising awareness re open data and demonstrating its value
  – ODC2011 submission: http://planning-apps.opendata.ie
• Local Government Management Agency (former LGCSB)
  – Advancing access to Open Data for Local Authorities
  – LD pilot for Management Service Indicators across Local Authorities
• Central Statistics Office, dissemination group
  – Boot-strapping data-gov.ie with statistical data
  – school explorer pilot
• Enterprise Ireland: National Cross Industry Working Group on Open Data
School explorer
Challenges
• Schema mapping, matching, alignment [Hausenblas:DBKDA10]
• Write-enable the LD world [Berners-Lee:DERITR09]
• Authentication and authorisation in a distributed setup: http://www.w3.org/2005/Incubator/webid/
• REST-alignment of Linked Data [Wilde:WEWST09]
• Dataset dynamics [Umbrich:LDOW10]
References
[Franklin:SIGMOD05] M. J. Franklin, A. Y. Halevy, and D. Maier. From databases to dataspaces: a new abstraction for information management. SIGMOD Record, 34(4):27–33, 2005.
[Berners-Lee:DERITR09] T. Berners-Lee, R. Cyganiak, M. Hausenblas, J. Presbrey, O. Seneviratne, and O. Ureche. On Integration Issues of Site-Specific APIs into the Web of Data. DERI Technical Report, 2009.
[Hausenblas:DBKDA10] M. Hausenblas and M. Karnstedt. Understanding Linked Open Data as a Web-Scale Database. Second International Conference on Advances in Databases, Knowledge, and Data Applications, 2010.
[Wilde:WEWST09] E. Wilde and M. Hausenblas. RESTful SPARQL? You Name It! Aligning SPARQL with REST and Resource Orientation. Fourth Workshop on Emerging Web Services Technology Workshop at European Conference on Web Services, Eindhoven, The Netherlands, 2009.
[Umbrich:LDOW10] J. Umbrich, M. Hausenblas, A. Hogan, A. Polleres, and S. Decker. Towards Dataset Dynamics: Change Frequency of Linked Open Data Sources. Third International Workshop on Linked Data on the Web at 19th International World Wide Web Conference, Raleigh, North Carolina, USA, 2010.
See also ...
• The Linked Open Data cloud: http://lod-cloud.net
• Linked Data core specifications: http://linkeddata-specs.info
• Enabling cross-boundary access to data sources: http://enable-cors.org
• Linked Open Data 5-star deployment scheme: http://lab.linkeddata.deri.ie/2010/star-scheme-by-example/