Upload
dohuong
View
231
Download
1
Embed Size (px)
Citation preview
Interlinking the web of data: challenges and solutionsWork in collaboration with François Scharffe and others
Jérôme Euzenat
June 8, 2011
Linked dataMethods for data interlinking
General framework for data interlinkingConclusions
Outline
Linked data
Methods for data interlinking
General framework for data interlinking
Conclusions
Jérôme Euzenat Interlinking the web of data: challenges and solutions 2 / 38
Linked dataMethods for data interlinking
General framework for data interlinkingConclusions
Outline
Linked data
Methods for data interlinking
General framework for data interlinking
Conclusions
Jérôme Euzenat Interlinking the web of data: challenges and solutions 3 / 38
Linked dataMethods for data interlinking
General framework for data interlinkingConclusions
Linked (open) data cloud
Jérôme Euzenat Interlinking the web of data: challenges and solutions 4 / 38
Linked dataMethods for data interlinking
General framework for data interlinkingConclusions
Publishing data on the web
Four publication principles1. Use URIs for identifying resources2. Use dereferencable URIs3. When a URI is dereferenced, a description of the identified resource is
returned4. Published datasets must be interconnected to other datasets
If possible using semantic web technologies: URI, HTTP, RDF, OWL
Jérôme Euzenat Interlinking the web of data: challenges and solutions 5 / 38
Linked dataMethods for data interlinking
General framework for data interlinkingConclusions
What for?
I If you are an authority publishing data (gov), this allows you to put yourdata in the context other authority data and the others to reuse yourdata;
I If you are a link producer, i.e., someone who knows how to add value bycross-linking data, you have standard references to work with;
I If you are an application developer, e.g., seevl.net, you can takeadvantage of this data always on the web.
and
I the data is constantly up-to-date;I the data can be freely linked;I Data is browsable.
Jérôme Euzenat Interlinking the web of data: challenges and solutions 6 / 38
Linked dataMethods for data interlinking
General framework for data interlinkingConclusions
Example: INSEE dataset
Région table:code nom chef-lieu11 Île-de-France 7505621 Champagne-Ardenne 5110822 Picardie 80021
Sous-région table:
région département11 7511 7711 7811 9111 9211 93
Jérôme Euzenat Interlinking the web of data: challenges and solutions 7 / 38
Linked dataMethods for data interlinking
General framework for data interlinkingConclusions
Example: Administrative ontology
Territoire_FR
Pays
Region
Departement
Arrondissement
Commune
codenom
chef-lieusubdivision
integer
string
Jérôme Euzenat Interlinking the web of data: challenges and solutions 8 / 38
Linked dataMethods for data interlinking
General framework for data interlinkingConclusions
Example: RDF
REG_11
Region
rdf:typ
e
11
i:code
Île-de-Francei:nom
PAYS_FR
Paysrdf:typ
e
FRi:co
de_iso
Francei:nom
i:sub
DEP_75DEP_77DEP_78
Departement Commune
COM_75056
i:chef-lieu
rdf:typerdf:typerdf:type
rdf:type
i:subi:su
bi:sub
Parisi:nom
i:nom
Jérôme Euzenat Interlinking the web of data: challenges and solutions 9 / 38
Linked dataMethods for data interlinking
General framework for data interlinkingConclusions
Example: Linking INSEE and NUTS
NUTS: Nomenclature of territorial units for statistics
#INSEE INSEE name NUTS Level #NUTS1 Pays 0 34
1 14226 Région 2 344
100 Département 3 1488342 Arrondissement4036 Canton 4
52422 Commune 5
Å vs. Saint-Rémy-en-Bouzemont-Saint-Genest-et-Issonor Montbonnot Saint-Martin
Jérôme Euzenat Interlinking the web of data: challenges and solutions 10 / 38
Linked dataMethods for data interlinking
General framework for data interlinkingConclusions
Example: Linking INSEE and NUTS
NUTS: Nomenclature of territorial units for statistics
#INSEE INSEE name NUTS Level #NUTS1 Pays 0 34
1 14226 Région 2 344
100 Département 3 1488342 Arrondissement4036 Canton 4
52422 Commune 5
Å vs. Saint-Rémy-en-Bouzemont-Saint-Genest-et-Issonor Montbonnot Saint-Martin
Jérôme Euzenat Interlinking the web of data: challenges and solutions 10 / 38
Linked dataMethods for data interlinking
General framework for data interlinkingConclusions
Example: Linking INSEE and NUTS
Territoire_FR
Pays
Region
Departement
Commune
PAYS_FR
REG_11
DEP_75
DEP_77
DEP_78
COM_75056
Region
Country
NUTSRegion
LAURegion
FR1
UK1
FR10
FR101
FR102
FR103
owl:sameAs
owl:sameAs
owl:sameAs
owl:sameAs
owl:sameAs
Jérôme Euzenat Interlinking the web of data: challenges and solutions 11 / 38
Linked dataMethods for data interlinking
General framework for data interlinkingConclusions
Outline
Linked data
Methods for data interlinking
General framework for data interlinking
Conclusions
Jérôme Euzenat Interlinking the web of data: challenges and solutions 12 / 38
Linked dataMethods for data interlinking
General framework for data interlinkingConclusions
The data interlinking problem
URI1 URI2
o o ′
Data interlinking
owl:sameAs
Jérôme Euzenat Interlinking the web of data: challenges and solutions 13 / 38
Linked dataMethods for data interlinking
General framework for data interlinkingConclusions
The data interlinking problem
URI1 URI2
o o ′
Data interlinking
owl:sameAs
Jérôme Euzenat Interlinking the web of data: challenges and solutions 13 / 38
Linked dataMethods for data interlinking
General framework for data interlinkingConclusions
The data interlinking problem
URI1 URI2
o o ′
Data interlinking
owl:sameAs
Jérôme Euzenat Interlinking the web of data: challenges and solutions 13 / 38
Linked dataMethods for data interlinking
General framework for data interlinkingConclusions
The data interlinking problem
URI1 URI2
o o ′
Data interlinking
owl:sameAs
Jérôme Euzenat Interlinking the web of data: challenges and solutions 13 / 38
Linked dataMethods for data interlinking
General framework for data interlinkingConclusions
Manual resource matching
URI1 URI2
Manual observation
owl:sameAs
This does not scale.
Jérôme Euzenat Interlinking the web of data: challenges and solutions 14 / 38
Linked dataMethods for data interlinking
General framework for data interlinkingConclusions
Manual resource matching
URI1 URI2
Manual observation
owl:sameAs
This does not scale.
Jérôme Euzenat Interlinking the web of data: challenges and solutions 14 / 38
Linked dataMethods for data interlinking
General framework for data interlinkingConclusions
URI matching
URI1 URI2
URI transformation
owl:sameAs
http://dbpedia.org/resource/Johann_Sebastian_Bach owl:sameAshttp://www.lastfm.fr/music/Johann+Sebastian+Bach
http://rdf.insee.fr/geo/regions-2011.rdf#REG_11 ?http://ec.europa.eu/eurostat/ramon/rdfdata/nuts2008/FR10
Jérôme Euzenat Interlinking the web of data: challenges and solutions 15 / 38
Linked dataMethods for data interlinking
General framework for data interlinkingConclusions
URI matching
URI1 URI2
URI transformation
owl:sameAs
http://dbpedia.org/resource/Johann_Sebastian_Bach owl:sameAshttp://www.lastfm.fr/music/Johann+Sebastian+Bach
http://rdf.insee.fr/geo/regions-2011.rdf#REG_11 ?http://ec.europa.eu/eurostat/ramon/rdfdata/nuts2008/FR10
Jérôme Euzenat Interlinking the web of data: challenges and solutions 15 / 38
Linked dataMethods for data interlinking
General framework for data interlinkingConclusions
URI matching
URI1 URI2
URI transformation
owl:sameAs
http://dbpedia.org/resource/Johann_Sebastian_Bach owl:sameAshttp://www.lastfm.fr/music/Johann+Sebastian+Bach
http://rdf.insee.fr/geo/regions-2011.rdf#REG_11 ?http://ec.europa.eu/eurostat/ramon/rdfdata/nuts2008/FR10
Jérôme Euzenat Interlinking the web of data: challenges and solutions 15 / 38
Linked dataMethods for data interlinking
General framework for data interlinkingConclusions
Data matching through a commonontology
o
URI1 URI2
Resource matchingof datasets
described by thesame ontology
owl:sameAs
Jérôme Euzenat Interlinking the web of data: challenges and solutions 16 / 38
Linked dataMethods for data interlinking
General framework for data interlinkingConclusions
Data matching through a commonontology
o
URI1 URI2
Resource matchingof datasets
described by thesame ontology
owl:sameAs
Jérôme Euzenat Interlinking the web of data: challenges and solutions 16 / 38
Linked dataMethods for data interlinking
General framework for data interlinkingConclusions
Data matching through a commonontology
o
URI1 URI2
Resource matchingof datasets
described by thesame ontology
owl:sameAs
Jérôme Euzenat Interlinking the web of data: challenges and solutions 16 / 38
Linked dataMethods for data interlinking
General framework for data interlinkingConclusions
Matching with a common ontology
+ Focus the search: only match instances of the same class;– Not sufficient: it remains to identify corresponding entities
+ If keys are defined (OWL 2), this is done;+ At least we know which properties to compare;– Inferring secondary keys may be useful;– Correcting discrepancies: record linkage.
Jérôme Euzenat Interlinking the web of data: challenges and solutions 17 / 38
Linked dataMethods for data interlinking
General framework for data interlinkingConclusions
Record linkage
Name Johann
Date 1665
Place München
NameJohannes
Date1667
PlaceMonaco di Bavaria
Having a common ontology does not solve the problem.
Jérôme Euzenat Interlinking the web of data: challenges and solutions 18 / 38
Linked dataMethods for data interlinking
General framework for data interlinkingConclusions
Data matching with different ontologies(implicit alignment)
o o ′
URI1 URI2
Resource matchingof datasetsdescribed by
different ontologies
owl:sameAs
Jérôme Euzenat Interlinking the web of data: challenges and solutions 19 / 38
Linked dataMethods for data interlinking
General framework for data interlinkingConclusions
Data matching with different ontologies(implicit alignment)
o o ′
URI1 URI2
Resource matchingof datasetsdescribed by
different ontologies
owl:sameAs
Jérôme Euzenat Interlinking the web of data: challenges and solutions 19 / 38
Linked dataMethods for data interlinking
General framework for data interlinkingConclusions
Data matching with different ontologies(implicit alignment)
o o ′
URI1 URI2
Resource matchingof datasetsdescribed by
different ontologies
owl:sameAs
Jérôme Euzenat Interlinking the web of data: challenges and solutions 19 / 38
Linked dataMethods for data interlinking
General framework for data interlinkingConclusions
Independent key computation
I Different span requires different key (France is not a key for INSEE);I Differences in schema and depths makes difference in what is a key
("Paris" is both a department name (DEP_75) and a municipalityname (COM_75056) for INSEE while the region name may be a key forNUTS)
I Keys are often meaningless: they are data-independent anddatabase-dependent, hence they cannot be used for matching entities(REG_11 vs. FR_10).
I rdf:type and insee:nom are keys for INSEE (Region);I nuts:level and nuts:name are keys for NUTS (NUTSRegion);I insee:nom corresponds to nuts:name; there exists a correspondence
between rdf:type in INSEE and nuts:level.
Jérôme Euzenat Interlinking the web of data: challenges and solutions 20 / 38
Linked dataMethods for data interlinking
General framework for data interlinkingConclusions
Independent key computation
I Different span requires different key (France is not a key for INSEE);I Differences in schema and depths makes difference in what is a key
("Paris" is both a department name (DEP_75) and a municipalityname (COM_75056) for INSEE while the region name may be a key forNUTS)
I Keys are often meaningless: they are data-independent anddatabase-dependent, hence they cannot be used for matching entities(REG_11 vs. FR_10).
I rdf:type and insee:nom are keys for INSEE (Region);I nuts:level and nuts:name are keys for NUTS (NUTSRegion);I insee:nom corresponds to nuts:name; there exists a correspondence
between rdf:type in INSEE and nuts:level.
Jérôme Euzenat Interlinking the web of data: challenges and solutions 20 / 38
Linked dataMethods for data interlinking
General framework for data interlinkingConclusions
Data matching with different ontologies(explicit alignment)
o o ′
URI1 URI2
A
Resource matchingof datasetsdescribed by
different ontologies
owl:sameAs
Jérôme Euzenat Interlinking the web of data: challenges and solutions 21 / 38
Linked dataMethods for data interlinking
General framework for data interlinkingConclusions
Data matching with different ontologies(explicit alignment)
o o ′
URI1 URI2
A
Resource matchingof datasetsdescribed by
different ontologies
owl:sameAs
Jérôme Euzenat Interlinking the web of data: challenges and solutions 21 / 38
Linked dataMethods for data interlinking
General framework for data interlinkingConclusions
Data matching with different ontologies(explicit alignment)
o o ′
URI1 URI2
A
Resource matchingof datasetsdescribed by
different ontologies
owl:sameAs
Jérôme Euzenat Interlinking the web of data: challenges and solutions 21 / 38
Linked dataMethods for data interlinking
General framework for data interlinkingConclusions
Data matching with different ontologies(explicit alignment)
o o ′
URI1 URI2
A
Resource matchingof datasetsdescribed by
different ontologies
owl:sameAs
Jérôme Euzenat Interlinking the web of data: challenges and solutions 21 / 38
Linked dataMethods for data interlinking
General framework for data interlinkingConclusions
INSEE and NUTS: ontology alignment
Territoire_FR
Pays
Region
Departement
Arrondissement
Commune
codenom
chef-lieusubdivision
integer
string
Region
Country
NUTSRegion
LAURegion
name
level
codehasSubRegion
=
≤
≤≤
≤
=
Jérôme Euzenat Interlinking the web of data: challenges and solutions 22 / 38
Linked dataMethods for data interlinking
General framework for data interlinkingConclusions
INSEE and NUTS: ontology alignment
Territoire_FR
Pays
Region
Departement
Arrondissement
Commune
codenom
chef-lieusubdivision
integer
string
Region
Country
NUTSRegion
LAURegion
name
level
codehasSubRegion
=
≤
≤≤
≤
=
Jérôme Euzenat Interlinking the web of data: challenges and solutions 22 / 38
Linked dataMethods for data interlinking
General framework for data interlinkingConclusions
Ontology alignment
+ Ontology alignment restore the benefits of common ontology: it helpsfocussing the search;
– It is not exact science! (but alignments may be available);+ Ontology alignment and data linking reinforce each others.
Jérôme Euzenat Interlinking the web of data: challenges and solutions 23 / 38
Linked dataMethods for data interlinking
General framework for data interlinkingConclusions
A simple algorithm
I Find matching concepts [concept matching];I For each of them, determine matching properties based on the similarity
between their values in both datasets [property matching];I From them find property combinations identifying corresponding entities
[key extraction];I Link corresponding entities [link generation].
For instance, nom/RegionINSEE ⊆ name/NUTSRegionNUTS and moreoverthey are unambiguous.
Jérôme Euzenat Interlinking the web of data: challenges and solutions 24 / 38
Linked dataMethods for data interlinking
General framework for data interlinkingConclusions
Simple alignments are not sufficient
Territoire_FR
Region
Departement
Commune
nom
DEP_75
nom
COM_75056
nom
Region
NUTSRegion
name
FR101name
Paris
=
=
=
≤≤
≤
=
=
=
Jérôme Euzenat Interlinking the web of data: challenges and solutions 25 / 38
Linked dataMethods for data interlinking
General framework for data interlinkingConclusions
Simple alignments are not sufficient
Territoire_FR
Region
Departement
Commune
nom
DEP_75
nom
COM_75056
nom
Region
NUTSRegion
name
FR101name
Paris
=
=
=
≤≤
≤
=
=
=
Jérôme Euzenat Interlinking the web of data: challenges and solutions 25 / 38
Linked dataMethods for data interlinking
General framework for data interlinkingConclusions
Simple alignments are not sufficient
Territoire_FR
Region
Departement
Commune
nom
DEP_75
nom
COM_75056
nom
Region
NUTSRegion
name
FR101name
Paris
=
=
=
≤≤
≤
=
=
=
Jérôme Euzenat Interlinking the web of data: challenges and solutions 25 / 38
Linked dataMethods for data interlinking
General framework for data interlinkingConclusions
Expressive alignments are necessary
Region
NUTSRegion
level
hasParentRegion
2 =
FR1=
=
subdivision hasSubRegion=
nom name=
Jérôme Euzenat Interlinking the web of data: challenges and solutions 26 / 38
Linked dataMethods for data interlinking
General framework for data interlinkingConclusions
Expressive alignments are necessary
Region
NUTSRegion
level
hasParentRegion
2 =
FR1=
=
subdivision hasSubRegion=
nom name=
Jérôme Euzenat Interlinking the web of data: challenges and solutions 26 / 38
Linked dataMethods for data interlinking
General framework for data interlinkingConclusions
Expressive alignments are necessary
Region
NUTSRegion
level
hasParentRegion
2 =
FR1=
=
subdivision hasSubRegion=
nom name=
Jérôme Euzenat Interlinking the web of data: challenges and solutions 26 / 38
Linked dataMethods for data interlinking
General framework for data interlinkingConclusions
Query generation
SELECT ?rPREFIX insee: <http://rdf.insee.fr/ontologie-geo-2006.rdf#>PREFIX rdf: <http://www.w3.org/1999/02/22-rdf-syntax-ns#>FROM <http://rdf.insee.fr/geo/regions-2011.rdf>WHERE {
?r rdf:type insee:Region .}
SELECT ?nPREFIX nuts: <http://ec.europa.eu/eurostat/ramon/ontologies/geographic.rdf#>PREFIX rdf: <http://www.w3.org/1999/02/22-rdf-syntax-ns#>FROM <http://ec.europa.eu/eurostat/ramon/rdfdata/nuts2008/>WHERE {
?n rdf:type nuts:NUTSRegion .?n nuts:level 2ˆˆxsd:int .?n nuts:hasParentRegion nuts:FR1 .
}
Jérôme Euzenat Interlinking the web of data: challenges and solutions 27 / 38
Linked dataMethods for data interlinking
General framework for data interlinkingConclusions
Query generation
CONSTRUCT { ?r owl:sameAs ?n . }PREFIX insee: <http://rdf.insee.fr/ontologie-geo-2006.rdf#>PREFIX nuts: <http://ec.europa.eu/eurostat/ramon/ontologies/geographic.rdf#>PREFIX rdf: <http://www.w3.org/1999/02/22-rdf-syntax-ns#>FROM <http://rdf.insee.fr/geo/regions-2011.rdf>FROM <http://ec.europa.eu/eurostat/ramon/rdfdata/nuts2008/>WHERE {
?r rdf:type insee:Region .?r insee:nom ?l .?n rdf:type nuts:NUTSRegion .?n nuts:name ?l .?n nuts:level 2ˆˆxsd:int .?n nuts:hasParentRegion nuts:FR1 .
}
Jérôme Euzenat Interlinking the web of data: challenges and solutions 28 / 38
Linked dataMethods for data interlinking
General framework for data interlinkingConclusions
Outline
Linked data
Methods for data interlinking
General framework for data interlinking
Conclusions
Jérôme Euzenat Interlinking the web of data: challenges and solutions 29 / 38
Linked dataMethods for data interlinking
General framework for data interlinkingConclusions
What does this mean?
I Ontology alignments are schema-level expression of correspondences;I They are useful for focussing the search;I Expressive alignments are necessary;I They can be turned into SPARQL-based link generators.
but it is also necessary to express instance level constraints:I for converting data (e.g., mph vs. m/s);I for expressing matching constraint on data (e.g., similarity).
Jérôme Euzenat Interlinking the web of data: challenges and solutions 30 / 38
Linked dataMethods for data interlinking
General framework for data interlinkingConclusions
General framework
o o ′
URI1 URI2
Ontology matching
A
Data interlinking
owl:sameAs
Jérôme Euzenat Interlinking the web of data: challenges and solutions 31 / 38
Linked dataMethods for data interlinking
General framework for data interlinkingConclusions
General framework
o o ′
URI1 URI2
Ontology matching
A
Data interlinking
owl:sameAs
Jérôme Euzenat Interlinking the web of data: challenges and solutions 31 / 38
Linked dataMethods for data interlinking
General framework for data interlinkingConclusions
General framework
o o ′
URI1 URI2
Ontology matching
A
Data interlinking
owl:sameAs
Jérôme Euzenat Interlinking the web of data: challenges and solutions 31 / 38
Linked dataMethods for data interlinking
General framework for data interlinkingConclusions
General framework
o o ′
URI1 URI2
Ontology matching
A
Data interlinking
owl:sameAs
Jérôme Euzenat Interlinking the web of data: challenges and solutions 31 / 38
Linked dataMethods for data interlinking
General framework for data interlinkingConclusions
General framework
o o ′
URI1 URI2
Ontology matching
A
Data interlinking
owl:sameAs
Jérôme Euzenat Interlinking the web of data: challenges and solutions 31 / 38
Linked dataMethods for data interlinking
General framework for data interlinkingConclusions
Six interlinking tools
RKB-CRS Co-reference resolution for the RKB knowledge base.LD-mapper Linking tool for the Music Ontology.ODD Linker SQL-based linker.
RDF-AI Dataset linker and merger.Silk Linking script engine based on explicit linking description.
KnoFuss Alignment based linker.
and others: ObjectCoRef, LN2R, LIMES. . .
http://melinda.inrialpes.fr
Jérôme Euzenat Interlinking the web of data: challenges and solutions 32 / 38
Linked dataMethods for data interlinking
General framework for data interlinkingConclusions
Six interlinking tools
o o ′
URI1 URI2
owl:sameAs
Data interlinkingODD-Linker, RKB-CRS, LD-Mapper
ASilk, RDF-AI
Ontology matchingKnoFuss
Jérôme Euzenat Interlinking the web of data: challenges and solutions 33 / 38
Linked dataMethods for data interlinking
General framework for data interlinkingConclusions
Six interlinking tools
o o ′
URI1 URI2
owl:sameAs
Data interlinkingODD-Linker, RKB-CRS, LD-Mapper
ASilk, RDF-AI
Ontology matchingKnoFuss
Jérôme Euzenat Interlinking the web of data: challenges and solutions 33 / 38
Linked dataMethods for data interlinking
General framework for data interlinkingConclusions
Six interlinking tools
o o ′
URI1 URI2
owl:sameAs
Data interlinkingODD-Linker, RKB-CRS, LD-Mapper
ASilk, RDF-AI
Ontology matchingKnoFuss
Jérôme Euzenat Interlinking the web of data: challenges and solutions 33 / 38
Linked dataMethods for data interlinking
General framework for data interlinkingConclusions
Six interlinking tools
o o ′
URI1 URI2
owl:sameAs
Data interlinkingODD-Linker, RKB-CRS, LD-Mapper
ASilk, RDF-AI
Ontology matchingKnoFuss
Jérôme Euzenat Interlinking the web of data: challenges and solutions 33 / 38
Linked dataMethods for data interlinking
General framework for data interlinkingConclusions
Directions
I Combination between ontology matching and data interlinking;I Explicit use of alignments or interlinking scripts;I Data-driven algorithms (database key computation).I Domain-specific techniques (geographic data).
Jérôme Euzenat Interlinking the web of data: challenges and solutions 34 / 38
Linked dataMethods for data interlinking
General framework for data interlinkingConclusions
Outline
Linked data
Methods for data interlinking
General framework for data interlinking
Conclusions
Jérôme Euzenat Interlinking the web of data: challenges and solutions 35 / 38
Linked dataMethods for data interlinking
General framework for data interlinkingConclusions
Conclusions
I A large part of linked data added value is in links;I They may not be easy to find;I Many techniques are available for automating interlinking;I Having a general framework may help integrating them.
Jérôme Euzenat Interlinking the web of data: challenges and solutions 36 / 38