FOSDEM 5/02/2011 1
With the help of the Datalift team
and the support of the French National Research Agency
Datalift: A Catalyser for the Web of Data
François Scharffe
LIRMM/CNRS/University of Montpellier
francois.scharffe@lirmm.fr @lechatpito
The data revolution is on its way!
As Open Data meets the Semantic Web
The promises of linked-data
Richer Applications
Linked Data Lite | the Web on Steroids 1.0 (iPhone)
Richer applications
BBC Programmes
More precise search and QA
Making your data 5 stars
http://www.w3.org/DesignIssues/LinkedData.html
So, how do we lift data?
How to publish data on the Web as linked data?
● Basic principles, Tim Berners-Lee [2006] (Design Issues)
– Use URIs to identify things (not only documents)
– Use HTTP URIs
– When dereferencing URIs, return a description of the resource
– Include links to other resources on the Web
Welcome aboard the data lift
Published and interlinked data on the Web
Applications
Interconnection
Publication infrastructure
Data conversion
Vocabulary selection
Raw data
Datalift
Datasets publication
R&D to automate the publication process
Tool suite to help publish data
Training, tutorials, data publication camps
SemWebPro 18/01/2011 11
1st floor - Selection
My friends' vocabularies …
Ø What is a (good) vocabulary for linked data?
§ Usability criteria
Simplicity, visibility, sustainability, integration, coherence …
Ø Different types of vocabularies
§ metadata, reference, domain, generalist …
§ The pillars of Linked Data: Dublin Core, FOAF, SKOS
Ø Good and less good practices
§ E.g. BBC Programmes vs legislation.gov.uk
§ Vocabulary of a Friend: networked vocabularies
Ø Linguistic problems
§ 99% of existing vocabularies are in English
§ Terminological approach: which vocabularies for « Event », « Organization »?
Did you say « vocabulary »?
… And why not « ontology »?
§ Or « schema » or « metadata schema »?
§ Or « model » (of data? of the world?)
Ø All these terms are used and justifiable
They are all « vocabularies »
§ They define types of objects (or classes) and the properties (or attributes) attached to these objects.
§ Types and attributes are logically defined and named using natural language
§ A (semantic) vocabulary is an explicit formalization of concepts existing in natural language
Vocabularies for linked data
Ø Are meant to describe resources in RDF
Ø Are based on one of the standard W3C languages
§ RDF Schema (RDFS)
• For vocabularies without too much logical complexity
§ OWL
• For more complex ontological constructs
§ These two languages are compatible (almost)
Ø They can be composed « ad libitum »
§ One can reuse just a few elements of a vocabulary
§ The original semantics must be respected
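Composing vocabularies can be sketched in a few lines. The example below describes one person with FOAF terms and one document with Dublin Core terms, then prints N-Triples. The `example.org` resources, the name and the title are made up for illustration; only the FOAF, Dublin Core and RDF namespaces are real.

```python
# Minimal sketch: reuse a few terms from two vocabularies in one description.
FOAF = "http://xmlns.com/foaf/0.1/"
DCT = "http://purl.org/dc/terms/"
RDF = "http://www.w3.org/1999/02/22-rdf-syntax-ns#"


def uri(value: str) -> str:
    """Format a URI as an N-Triples term."""
    return f"<{value}>"


def lit(text: str) -> str:
    """Format a plain string literal, escaping backslashes and quotes."""
    escaped = text.replace("\\", "\\\\").replace('"', '\\"')
    return f'"{escaped}"'


# Hypothetical resources, not taken from the slides.
author = "http://example.org/people/alice"
doc = "http://example.org/docs/report"

triples = [
    (uri(author), uri(RDF + "type"), uri(FOAF + "Person")),
    (uri(author), uri(FOAF + "name"), lit("Alice")),
    (uri(doc), uri(DCT + "title"), lit("Annual report")),
    (uri(doc), uri(DCT + "creator"), uri(author)),
]

ntriples = "\n".join(f"{s} {p} {o} ." for s, p, o in triples)
print(ntriples)
```

Each term keeps the meaning given by its own vocabulary: `foaf:name` still names an agent and `dcterms:creator` still points at the maker of the document.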
What makes a good vocabulary?
Ø A good vocabulary is a used vocabulary
§ Data published on CKAN give an idea of vocabulary usage
§ Example: list of datasets using FOAF http://xmlns.com/foaf/0.1/
Ø Other usability criteria
§ Simplicity and readability in natural language
§ Documentation of elements (definitions in natural language)
§ Visibility and sustainability of the publication
§ Flexibility and extensibility
§ Semantic integration (with other vocabularies)
§ Social integration (with the user community)
A vocabulary is also a community
Ø Bad (but common) practice
● Building an isolated vocabulary
– For example as a research project
– Without basing it on any existing vocabulary
§ Publishing it (or not) and then forgetting about it
§ Not caring about its users
Ø A good vocabulary has an organic life
§ Users and use cases
§ Revisions and extensions
§ Like a « natural » vocabulary
Types of vocabularies
Ø Metadata vocabularies
§ Used to annotate other vocabularies
• Dublin Core, Vann, cc REL, Status
Ø Reference vocabularies
§ Provide « common » classes and properties
• FOAF, Event, Time, Org Ontology
Ø Domain vocabularies
§ Specific to a domain of knowledge
• Geonames, Music Ontology, WildLife Ontology
Ø « general » vocabularies
§ Describe « everything » at an arbitrary detail level
• DBpedia Ontology, Cyc Ontology, SUMO
Vocabulary of a Friend
Ø http://www.mondeca.com/foaf/voaf
Ø A simple vocabulary...
Ø To represent interconnections between vocabularies
Ø A single entry point to the vocabularies and datasets of the Linked Data Cloud
Ø Ongoing work in Datalift
2nd floor - Conversion
URL Design and URL Pattern
Ø Good practices for linked data
§ Resource: http://dbpedia.org/resource/Paris
§ Document: http://dbpedia.org/page/Paris
§ Data: http://dbpedia.org/data/Paris
Ø … served using content negotiation
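A rough sketch of the content negotiation step, assuming the resource/page/data pattern above: the function reads an HTTP `Accept` header and returns the URL a server could redirect to. The media type lists and the fallback to the HTML page are simplifying assumptions, and real servers handle more cases (wildcards, file extensions).

```python
# Simplified content negotiation over the resource / page / data pattern.
RDF_TYPES = {"application/rdf+xml", "text/turtle"}
HTML_TYPES = {"text/html", "application/xhtml+xml"}


def negotiate(resource_uri: str, accept: str) -> str:
    """Pick the document or data URL for a resource URI from an Accept header."""
    base, _, name = resource_uri.partition("/resource/")
    preferences = []
    for part in accept.split(","):
        fields = [f.strip() for f in part.split(";")]
        quality = 1.0  # default weight when no q parameter is given
        for field in fields[1:]:
            if field.startswith("q="):
                quality = float(field[2:])
        preferences.append((quality, fields[0]))
    # Highest q first; the sort is stable, so ties keep header order.
    preferences.sort(key=lambda pref: pref[0], reverse=True)
    for _, media_type in preferences:
        if media_type in RDF_TYPES:
            return f"{base}/data/{name}"
        if media_type in HTML_TYPES:
            return f"{base}/page/{name}"
    return f"{base}/page/{name}"  # assumed default: the human-readable page


print(negotiate("http://dbpedia.org/resource/Paris", "text/turtle"))
```

A browser sending `text/html` ends up on the page URL, while an RDF client asking for `application/rdf+xml` or `text/turtle` ends up on the data URL.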
URI Pattern in REST
Ø REST (Representational State Transfer) services manipulate resources, and URLs are mainly used to address these resources
Ø A base URI:
§ http://www.example.com/bookstore/
Ø A resource has a unique URL: (retrieve, update, create, delete)
§ http://www.example.com/bookstore/books/ISBN123
Ø Notion of collection: (list, replace, create, delete)
§ http://www.example.com/bookstore/books
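The pattern above can be sketched as a small lookup from URL shape and HTTP method to operation. The mapping of methods to operations follows a common REST convention and is an assumption, not something stated on the slide; `describe` is a hypothetical helper.

```python
# Sketch: classify a bookstore URL as collection or resource, then map the
# HTTP method to an operation (conventional mapping, assumed here).
BASE = "http://www.example.com/bookstore/"

OPERATIONS = {
    ("collection", "GET"): "list",
    ("collection", "PUT"): "replace",
    ("collection", "POST"): "create",
    ("collection", "DELETE"): "delete",
    ("resource", "GET"): "retrieve",
    ("resource", "PUT"): "update or create",
    ("resource", "DELETE"): "delete",
}


def describe(method: str, url: str) -> tuple:
    """Return (kind, operation) for a request under the base URI."""
    if not url.startswith(BASE):
        raise ValueError("URL is outside the bookstore base URI")
    segments = url[len(BASE):].strip("/").split("/")
    kind = "collection" if len(segments) == 1 else "resource"
    return kind, OPERATIONS.get((kind, method.upper()), "not allowed")


print(describe("GET", BASE + "books"))
print(describe("PUT", BASE + "books/ISBN123"))
```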
Tools for conversion to RDF
Ø How is the raw data to be converted?
§ Relational database?
§ (Semi-)structured formats?
§ Programmatic access (API)?
Ø There are solutions for all cases
D2RQ Map
Triplify: Relational data to JSON/RDF
Ø Extract a folder into your web app: http://sourceforge.net/projects/triplify/
Ø Modify a config file:
§ SQL query … URI pattern
§ For PHP lovers!
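The idea of pairing an SQL query with a URI pattern can be illustrated in Python. This is not Triplify's API or config format (Triplify is a PHP tool); the table, the book title, the vocabulary namespace and the `rows_to_triples` helper are all hypothetical.

```python
# Sketch of "SQL query + URI pattern -> triples" on an in-memory database.
import sqlite3


def rows_to_triples(conn, query, uri_pattern, predicate_base, key="id"):
    """Turn each result row into triples; the key column feeds the subject URI."""
    cursor = conn.execute(query)
    columns = [description[0] for description in cursor.description]
    triples = []
    for row in cursor:
        record = dict(zip(columns, row))
        subject = uri_pattern.format(**record)
        for column, value in record.items():
            if column != key and value is not None:
                triples.append((subject, predicate_base + column, str(value)))
    return triples


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE books (id TEXT, title TEXT, year INTEGER)")
conn.execute("INSERT INTO books VALUES ('ISBN123', 'Linked Data Basics', 2011)")

triples = rows_to_triples(
    conn,
    "SELECT id, title, year FROM books",
    "http://www.example.com/bookstore/books/{id}",
    "http://www.example.com/vocab#",  # made-up vocabulary namespace
)
for triple in triples:
    print(triple)
```

In practice the column names would be mapped to terms of an existing vocabulary rather than to a local namespace.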
Working on spreadsheets
Google acquired Freebase
http://code.google.com/p/google-refine/
RDF extension for Google Refine
Ø A graphical extension for Google Refine that exports the cleaned data as RDF
http://lab.linkeddata.deri.ie/2010/grefine-rdf-extension/
Name | Job Title | Grade | Organization | Annual pay rate - including taxable benefits and allowances | Notes
Stephan Wilcke | Chief Executive Officer | | Asset Protection Agency | £150,000 - £154,999 |
Jens Bech | Chief Risk Officer | | Asset Protection Agency | £165,000 - £169,999 | No pension
Ion Dagtoglou | Chief Investment Officer | | Asset Protection Agency | £165,000 - £169,999 | No pension
Brian Scammell | Chief Credit Officer | | Asset Protection Agency | £130,000 - £134,999 | 4 days per week
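As a rough illustration of what exporting such a sheet to RDF involves, the sketch below reads two of the rows as CSV, splits the pay band into numbers and emits triples. It does not use Google Refine or its RDF extension; the `example.org` URIs, the property names and the helper functions are assumptions.

```python
# Sketch: spreadsheet rows -> cleaned values -> triples.
import csv
import io
import re

DATA = """Name,Job Title,Organization,Pay band,Notes
Stephan Wilcke,Chief Executive Officer,Asset Protection Agency,"£150,000 - £154,999",
Jens Bech,Chief Risk Officer,Asset Protection Agency,"£165,000 - £169,999",No pension
"""

BASE = "http://example.org/staff/"   # hypothetical URI base
VOCAB = "http://example.org/vocab#"  # hypothetical vocabulary
FOAF_NAME = "http://xmlns.com/foaf/0.1/name"


def parse_band(band: str) -> tuple:
    """Turn '£150,000 - £154,999' into (150000, 154999)."""
    low, high = (int(re.sub(r"[^0-9]", "", part)) for part in band.split(" - "))
    return low, high


def row_to_triples(row: dict) -> list:
    subject = BASE + row["Name"].lower().replace(" ", "-")
    low, high = parse_band(row["Pay band"])
    triples = [
        (subject, FOAF_NAME, row["Name"]),
        (subject, VOCAB + "jobTitle", row["Job Title"]),
        (subject, VOCAB + "payMin", low),
        (subject, VOCAB + "payMax", high),
    ]
    if row["Notes"]:  # empty cells produce no triple
        triples.append((subject, VOCAB + "note", row["Notes"]))
    return triples


triples = [t for row in csv.DictReader(io.StringIO(DATA)) for t in row_to_triples(row)]
for triple in triples:
    print(triple)
```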
Google Refine and RDF
3rd floor - Publication
Publication components
SPARQL endpoint
REST
RDF storage
Data loading
Inference engine
Querying
Browsing
A few products: Virtuoso, Sesame, Mulgara, 4store, OWLIM, AllegroGraph, Bigdata, Jena
Named graphs
[Figure: graph with nodes numbered 1 to 16]
Ø Delete on a graph
Ø SPARQL queries define graphs
Ø RDF graphs are bags of triples, everything is mixed
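A toy illustration of why named graphs help: triples are stored per graph name, so a whole graph can be dropped without touching the rest, and a query can target one graph or all of them. `QuadStore` and the sample data are hypothetical and do not reflect the API of any product listed above.

```python
# Toy quad store: graph name -> set of (subject, predicate, object) triples.
from collections import defaultdict


class QuadStore:
    def __init__(self):
        self.graphs = defaultdict(set)

    def add(self, graph, s, p, o):
        self.graphs[graph].add((s, p, o))

    def delete_graph(self, graph):
        """Remove every triple of one named graph in a single step."""
        self.graphs.pop(graph, None)

    def match(self, s=None, p=None, o=None, graph=None):
        """Return (graph, s, p, o) tuples; None acts as a wildcard."""
        names = [graph] if graph is not None else list(self.graphs)
        results = []
        for name in names:
            for triple in self.graphs.get(name, ()):
                if all(want is None or want == got
                       for want, got in zip((s, p, o), triple)):
                    results.append((name,) + triple)
        return results


store = QuadStore()
store.add("http://example.org/graph/2010", "ex:Paris", "ex:population", "2243833")
store.add("http://example.org/graph/2011", "ex:Paris", "ex:population", "2249975")
print(len(store.match(p="ex:population")))   # both graphs are searched
store.delete_graph("http://example.org/graph/2010")
print(store.match(p="ex:population"))        # only the 2011 graph remains
```

Without the graph name, both population values would sit in the same bag of triples with no way to tell which load they came from.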
Inference
Ø Generating triples from other triples
Ø Deduction mechanism
§ All men are mortal, Socrates is a man, therefore Socrates is mortal
Ø Avoids having to state everything explicitly and gives meaning to defined hierarchies
Ø Constraints: cardinality, NFPs, ...
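The Socrates deduction can be sketched as forward chaining over two RDFS-style rules: class membership is inherited along `rdfs:subClassOf`, and `rdfs:subClassOf` is transitive. This is a minimal sketch with prefixed names as plain strings; real inference engines cover many more rules.

```python
# Minimal forward chaining: generate new triples from existing ones
# until nothing new can be derived.
RDF_TYPE = "rdf:type"
SUBCLASS = "rdfs:subClassOf"


def infer(triples):
    """Return the input triples plus everything derivable from the two rules."""
    closure = set(triples)
    while True:
        derived = set()
        for cls, p, parent in closure:
            if p != SUBCLASS:
                continue
            for s2, p2, o2 in closure:
                if o2 != cls:
                    continue
                if p2 == RDF_TYPE:    # instance of cls is an instance of parent
                    derived.add((s2, RDF_TYPE, parent))
                elif p2 == SUBCLASS:  # subclass of cls is a subclass of parent
                    derived.add((s2, SUBCLASS, parent))
        if derived <= closure:        # fixed point reached
            return closure
        closure |= derived


facts = {
    ("ex:Socrates", RDF_TYPE, "ex:Man"),
    ("ex:Man", SUBCLASS, "ex:Mortal"),
}
print(("ex:Socrates", RDF_TYPE, "ex:Mortal") in infer(facts))
```

Only the two stated facts are stored; the third triple is generated on demand, which is what makes defining a class hierarchy worthwhile.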
[Figure: graph with nodes numbered 1 to 16]
Analysis of RDF stores: the QSOS method
Ø Qualification and Selection of Open Source Software
§ An open source project about open source solutions
§ http://www.qsos.org
Ø QSOS goals
§ Qualify software
§ Compare solutions after defining requirements and weighting the criteria
§ Select the product best suited to a given need
Ø QSOS provides
§ An objective, formalized method
§ A repository of available studies
§ Tools that make the method easier to carry out
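Comparing solutions with weighted criteria comes down to a weighted average. The sketch below assumes each criterion is scored from 0 to 2; the criteria, weights, scores and store names are invented for illustration and are not results of any QSOS study.

```python
# Sketch: weighted comparison of candidate RDF stores (all values made up).
def weighted_score(scores: dict, weights: dict) -> float:
    """Weighted average of criterion scores; weights express the requirements."""
    total_weight = sum(weights.values())
    return sum(scores[criterion] * weight
               for criterion, weight in weights.items()) / total_weight


weights = {"maturity": 3, "sparql_support": 2, "documentation": 1}

candidates = {
    "store_a": {"maturity": 2, "sparql_support": 1, "documentation": 1},
    "store_b": {"maturity": 1, "sparql_support": 2, "documentation": 0},
}

ranking = sorted(candidates,
                 key=lambda name: weighted_score(candidates[name], weights),
                 reverse=True)
best = ranking[0]
print(best, weighted_score(candidates[best], weights))
```

Changing the weights changes the ranking, which is why the requirements have to be defined before the comparison.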
4th floor - Interconnection
Linked data and interconnections
Ø Without links there is no Web, only data silos
Ø Links can be part of the dataset design (reference datasets)
Ø Links can be found after publication: equivalence links between resources
How to interlink your data?
Tools
Ø RKB-CRS: a coreference resolution service for the RKB knowledge base
Ø LD-mapper: a linkage tool for datasets described using the Music Ontology
Ø ODD Linker: a linkage tool based on SQL
Ø RDF-AI: multi-purpose data linkage and fusion
Ø Silk and Silk LSL: linkage tool and linkage specification language
Ø KnoFuss architecture: dataset linkage and fusion
Example Silk specification
<Silk>
  <Prefix id="rdfs" namespace="http://www.w3.org/2000/01/rdf-schema#" />
  <Prefix id="dbpedia" namespace="http://dbpedia.org/ontology/" />
  <Prefix id="gn" namespace="http://www.geonames.org/ontology#" />

  <DataSource id="dbpedia">
    <EndpointURI>http://demo_sparql_server1/sparql</EndpointURI>
    <Graph>http://dbpedia.org</Graph>
  </DataSource>

  <DataSource id="geonames">
    <EndpointURI>http://demo_sparql_server2/sparql</EndpointURI>
    <Graph>http://sws.geonames.org/</Graph>
  </DataSource>

  <Thresholds accept="0.9" verify="0.7" />
  <Output acceptedLinks="accepted_links.n3" verifyLinks="verify_links.n3" mode="truncate" />

  <Interlink id="cities">
    <LinkType>owl:sameAs</LinkType>
    <SourceDataset dataSource="dbpedia" var="a">
      <RestrictTo> ?a rdf:type dbpedia:City </RestrictTo>
    </SourceDataset>
    <TargetDataset dataSource="geonames" var="b">
      <RestrictTo> ?b rdf:type gn:P </RestrictTo>
    </TargetDataset>
    <LinkCondition>
      <AVG>
        <Compare metric="jaroSimilarity">
          <Param name="str1" path="?a/rdfs:label" />
          <Param name="str2" path="?b/gn:name" />
        </Compare>
        <Compare metric="numSimilarity">
          <Param name="num1" path="?a/dbpedia:populationTotal" />
          <Param name="num2" path="?b/gn:population" />
        </Compare>
      </AVG>
    </LinkCondition>
  </Interlink>
</Silk>
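The link condition in this specification averages a string similarity on names and a numeric similarity on populations, then compares the result with the accept (0.9) and verify (0.7) thresholds. The Python sketch below mimics that logic only. It is not Silk: `difflib.SequenceMatcher` stands in for the Jaro metric, the min/max ratio is an assumed definition of numeric similarity, and the city records are made-up values.

```python
# Sketch of "average two similarities, then apply two thresholds".
from difflib import SequenceMatcher

ACCEPT, VERIFY = 0.9, 0.7  # thresholds taken from the specification


def string_similarity(a: str, b: str) -> float:
    """Stand-in for jaroSimilarity (assumption: a different metric is used here)."""
    return SequenceMatcher(None, a.lower(), b.lower()).ratio()


def num_similarity(x: float, y: float) -> float:
    """Assumed numeric similarity: ratio of the smaller to the larger value."""
    if x == y:
        return 1.0
    if x <= 0 or y <= 0:
        return 0.0
    return min(x, y) / max(x, y)


def classify(source: dict, target: dict) -> str:
    score = (string_similarity(source["label"], target["name"])
             + num_similarity(source["population"], target["population"])) / 2
    if score >= ACCEPT:
        return "accept"   # would go to the accepted links file
    if score >= VERIFY:
        return "verify"   # would go to the file for manual checking
    return "reject"


# Hypothetical records, not real DBpedia or GeoNames data.
dbpedia_city = {"label": "Montpellier", "population": 252998}
geonames_a = {"name": "Montpellier", "population": 248252}
geonames_b = {"name": "Montreal", "population": 1600000}

print(classify(dbpedia_city, geonames_a))
print(classify(dbpedia_city, geonames_b))
```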
Where to find links?
Towards automated interconnection services
Ø The linkage specification could be simplified
§ Using alignments between vocabularies
§ Detection of discriminating properties
§ Indicating comparison methods by attaching metadata to ontologies
Ø Work in progress in Datalift
5th floor - Applications
VisiNav
Sig.ma
NosDéputés.fr
A few examples from the US
http://data-gov.tw.rpi.edu/demo/USForeignAid/demo-1554.html
Mashups … Mashups … Mashups …
That's it!
● Datalift.org
● We're looking for a Datageek!