Towards Easy Matching Between Statistical Linked Data:

Dimension Patterns

Hideto Sato and Wen Wen

2013/10/22 1

First International Workshop on Semantic Statistics (SemStats 2013)

22 October 2013, Sydney

Introduction

• For matching statistical data from different sources, upper concepts and schema-level links are important.
• Three problems:
  (1) Only a small number of upper concepts are available.
  (2) Certain patterns of dimension description prevent some schema-level links.
  (3) Usage of external codes is hard to find at the schema level.
• This paper focuses on (2) and (3), and proposes patterns of dimension description to improve them.

Trial Matching

• Italian Immigration Statistics ⇒ the numbers of immigrants to Italy by birth country by year
• World Bank Statistics ⇒ the total population by country by year
• Integrated Statistics: percentage of immigrants to Italy by country by year
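The integration step amounts to a join on (country, year) followed by a division. A minimal Python sketch, assuming both sources have already been mapped to the same country key; the dictionary layout and every figure below are invented placeholders, not values from either dataset:

```python
# Hypothetical observations keyed by (country code, year); all figures are invented.
immigrants_to_italy = {("AL", 2010): 500, ("AL", 2011): 600}
total_population = {("AL", 2010): 2_000_000, ("AL", 2011): 3_000_000}


def percentage_of_immigrants(immigrants, population):
    """Join the two series on (country, year) and return percentages."""
    result = {}
    for key, count in immigrants.items():
        # Keep only keys present in both sources, and avoid dividing by zero.
        if key in population and population[key] > 0:
            result[key] = 100.0 * count / population[key]
    return result


integrated = percentage_of_immigrants(immigrants_to_italy, total_population)
```

The join only works once the country codes of the two sources are known to denote the same things, which is the matching problem the rest of the talk addresses.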

[Figure: the two datasets drawn side by side in five layers (DataSet, DataStructureDefinition, DimensionProperty, Code Class, Code).
Italian Immigration Statistics: istat:dataset-DCIS_POPSTRCIT, istat:dsd-DCIS_POPSTRCIT, istat:dimension-paesi, istat:code-range-paesi, istat:code-paesi, istat:code-paesi-al.
World Bank Statistics: dataset:world-development-indicators, d-indicators:structure, sdmx-dimension:refArea or sdmx-dimension:visArea, sdmx-code:Area, classification:country, classification:country/AL.
Other resources: qb:DataSet, country-dimension, country-code-class, http://sws.geonames.org/783754/, http://sws.geonames.org/ontology#Feature/.
Link labels: qb:structure, qb:component, qb:dimension, qb:codeList, skos:hasTopConcept, rdfs:range, rdf:type, rdfs:subPropertyOf, rdfs:subClassOf, skos:exactMatch, owl:sameAs.]

(1) What role does the dimension play?

• Place of residence
• Place of birth

(2) What type of code does the dimension use?

• Countries
• Domestic Administrative Areas
• River Basins, and so on.

(3) What common codes are available, preferably at the schema level?

• Geonames
• DBPedia

Matching Data from Different Sources

The following questions are important for each dimension. As for an area dimension:

For Dimension Properties: What role does the dimension play?
• Place of Birth
• Place of Residence

For Code Class (Range of Dimension): What type of code does the dimension use?
• Countries
• Domestic Administrative Areas
• River Basins

For Code Values: What common codes are available?
• Geonames
• DBPedia

On the slide, these questions are annotated with two labels: Upper Concepts and Schema-Level Description.

QB and Upper Concepts

QB: The RDF Data Cube Vocabulary

QB provides a bridge to upper concepts by referring to the SDMX-RDF vocabulary.

Upper Concepts and SDMX-RDF

Upper concept            Upper resource in SDMX-RDF

Dimension Property
  Place of Birth         sdmx-dimension:visArea
  Place of Residence     sdmx-dimension:refArea

Code Class (Range of Dimension)
  Area                   sdmx-code:Area
  Country                (not defined)
  Domestic Area          (not defined)
  River Basin            (not defined)

(sdmx-dimension:visArea has been removed in the current version of SDMX-RDF.)

Dimension Description in QB

[Figure: layers Data Structure Definition, Dimension Property, Code Class, Code List, Code; columns Local and Upper.
Local resources: eg:refArea (local dimension property), eg:UnitaryAuthority (local code class), eg:areaCodeList (local code list), eg:cardiff_00pt (local code).
Upper resources: sdmx-dimension:refArea (upper abstract dimension property), sdmx-code:Area (upper abstract code class).
Link labels: qb:dimension, rdfs:subPropertyOf, rdfs:range, rdfs:subClassOf, qb:codeList, skos:hasTopConcept | qb:hierarchyRoot, rdf:type.]
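The dimension description can be written down as a handful of triples and walked from the data structure definition upward. The following is a rough Python sketch, not the authors' tooling: the subject and object of each link are an assumed reading of the figure labels, eg:dsd is a hypothetical name for the data structure definition, and the direct qb:dimension link from the DSD follows the slide's simplified drawing.

```python
# Triples are (subject, predicate, object) strings.
# The edges are an assumed reading of the figure; eg:dsd is a hypothetical name.
TRIPLES = {
    ("eg:dsd", "qb:dimension", "eg:refArea"),
    ("eg:refArea", "rdfs:subPropertyOf", "sdmx-dimension:refArea"),
    ("eg:refArea", "rdfs:range", "eg:UnitaryAuthority"),
    ("eg:refArea", "qb:codeList", "eg:areaCodeList"),
    ("eg:UnitaryAuthority", "rdfs:subClassOf", "sdmx-code:Area"),
    ("eg:areaCodeList", "skos:hasTopConcept", "eg:cardiff_00pt"),
    ("eg:cardiff_00pt", "rdf:type", "eg:UnitaryAuthority"),
}


def objects(triples, subject, predicate):
    """Return the sorted objects of all triples with this subject and predicate."""
    return sorted(o for s, p, o in triples if s == subject and p == predicate)


def describe_dimensions(triples, dsd):
    """Collect the schema-level links of every dimension attached to a DSD."""
    description = {}
    for dim in objects(triples, dsd, "qb:dimension"):
        code_classes = objects(triples, dim, "rdfs:range")
        description[dim] = {
            "upper_properties": objects(triples, dim, "rdfs:subPropertyOf"),
            "code_classes": code_classes,
            "upper_code_classes": sorted(
                upper
                for code_class in code_classes
                for upper in objects(triples, code_class, "rdfs:subClassOf")
            ),
            "code_lists": objects(triples, dim, "qb:codeList"),
        }
    return description


description = describe_dimensions(TRIPLES, "eg:dsd")
```

With all four links present, a matcher can answer both the role question (via the upper property) and the code-type question (via the upper code class) without looking at any observation.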

Anti-Patterns

• Two anti-patterns prevent describing schema-level links properly.
  – Direct use of an abstract upper resource
  – Direct use of an external code class
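Both anti-patterns can be checked mechanically at the schema level. The sketch below is one possible reading, not the authors' method: it assumes that upper resources are recognizable by the sdmx-dimension: and sdmx-code: prefixes, that local resources use the eg: prefix, and that anything else is external. The DSD and property names in the example are hypothetical.

```python
# Prefixes treated as "upper" and "local" are assumptions made for this sketch.
UPPER_PREFIXES = ("sdmx-dimension:", "sdmx-code:")
LOCAL_PREFIXES = ("eg:",)


def find_anti_patterns(triples, dsd):
    """Return sorted (dimension, finding) pairs for one data structure definition."""
    findings = set()
    dimensions = [o for s, p, o in triples if s == dsd and p == "qb:dimension"]
    for dim in dimensions:
        if dim.startswith(UPPER_PREFIXES):
            # The DSD points at the abstract upper property itself.
            findings.add((dim, "direct use of an upper resource"))
        for s, p, rng in triples:
            if s == dim and p == "rdfs:range" and not rng.startswith(
                LOCAL_PREFIXES + UPPER_PREFIXES
            ):
                # The range is neither local nor upper, so it is treated as external.
                findings.add((dim, "direct use of an external code class"))
    return sorted(findings)


# eg:dsdA, eg:dsdB, eg:dsdC and their dimension properties are hypothetical names.
EXAMPLE = {
    ("eg:dsdA", "qb:dimension", "sdmx-dimension:refArea"),
    ("eg:dsdB", "qb:dimension", "eg:refAreaB"),
    ("eg:refAreaB", "rdfs:range", "<http://www.geonames.org/ontology#Feature>"),
    ("eg:dsdC", "qb:dimension", "eg:refAreaC"),
    ("eg:refAreaC", "rdfs:subPropertyOf", "sdmx-dimension:refArea"),
    ("eg:refAreaC", "rdfs:range", "eg:UnitaryAuthority"),
}
```

In this example eg:dsdA and eg:dsdB each trigger one finding, while eg:dsdC, which uses a local property with a local range, triggers none.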

Anti-Pattern: Direct Use of an Upper Resource

[Figure: layers Data Structure Definition, Dimension Property, Code Class, Code List, Code; columns Local and Upper.
Resources: sdmx-dimension:refArea (upper abstract dimension property), sdmx-code:Area (upper abstract code class), eg:UnitaryAuthority (local code class), eg:areaCodeList (local code list), eg:cardiff_00pt (local code). No local dimension property appears.
Link labels: qb:dimension, rdfs:range, rdfs:subClassOf, qb:codeList, skos:hasTopConcept | qb:hierarchyRoot, rdf:type. One link is marked "?".]

The Pattern for Using a Local Code Class

[Figure: layers Data Structure Definition, Dimension Property, Code Class, Code List, Code; columns Local and Upper.
Local resources: eg:refArea (local dimension property), eg:UnitaryAuthority (local code class), eg:areaCodeList (local code list), eg:cardiff_00pt (local code).
Upper resources: sdmx-dimension:refArea (upper abstract dimension property), sdmx-code:Area (upper abstract code class).
Link labels: qb:dimension, rdfs:subPropertyOf, rdfs:range, rdfs:subClassOf, qb:codeList, skos:hasTopConcept | qb:hierarchyRoot, rdf:type.]

Anti-Pattern: Direct Use of an External Code Class

[Figure: layers Data Structure Definition, Dimension Property, Code Class, Code List, Code; columns Local, Upper, and External.
Local resources: eg:refArea (local dimension property), eg:areaCodeList (local code list).
Upper resources: sdmx-dimension:refArea (upper abstract dimension property), sdmx-code:Area (upper abstract code class).
External resources: <http://www.geonames.org/ontology#Feature> (external code class), <http://sws.geonames.org/2653822/> (external code).
Link labels: qb:dimension, rdfs:subPropertyOf, rdfs:range, rdfs:subClassOf, qb:codeList, qb:hierarchyRoot, rdf:type. One link is marked "?".]

The Pattern for Using an External Code Class

[Figure: layers Data Structure Definition, Dimension Property, Code Class, Code List, Code; columns Local, Upper, and External.
Local resources: eg:refArea (local dimension property), eg:UnitaryAuthority (local code class adapter), eg:areaCodeList (local code list).
Upper resources: sdmx-dimension:refArea (upper abstract dimension property), sdmx-code:Area (upper abstract code class).
External resources: <http://www.geonames.org/ontology#Feature> (external code class), <http://sws.geonames.org/2653822/> (external code).
Link labels: qb:dimension, rdfs:subPropertyOf, rdfs:range, rdfs:subClassOf, owl:equivalentClass, qb:codeList, qb:hierarchyRoot, rdf:type.]
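The adapter class gives the dimension a local range that can carry both the link to the upper class and the link to the external class. A small Python sketch of how a matcher might read those links, assuming the adapter is the subject of both rdfs:subClassOf and owl:equivalentClass (an interpretation of the figure labels, not something the slide text states explicitly):

```python
FEATURE = "<http://www.geonames.org/ontology#Feature>"

# Assumed edges for the adapter pattern, read off the figure labels.
ADAPTER_TRIPLES = {
    ("eg:refArea", "rdfs:subPropertyOf", "sdmx-dimension:refArea"),
    ("eg:refArea", "rdfs:range", "eg:UnitaryAuthority"),
    ("eg:UnitaryAuthority", "rdfs:subClassOf", "sdmx-code:Area"),
    ("eg:UnitaryAuthority", "owl:equivalentClass", FEATURE),
    ("<http://sws.geonames.org/2653822/>", "rdf:type", FEATURE),
}


def code_class_links(triples, dimension):
    """Follow the dimension's range to its upper and external code classes."""
    adapters = {o for s, p, o in triples if s == dimension and p == "rdfs:range"}
    upper = {o for s, p, o in triples if s in adapters and p == "rdfs:subClassOf"}
    external = {
        o for s, p, o in triples if s in adapters and p == "owl:equivalentClass"
    }
    return {
        "adapter": sorted(adapters),
        "upper": sorted(upper),
        "external": sorted(external),
    }


links = code_class_links(ADAPTER_TRIPLES, "eg:refArea")
```

Because the adapter is local, the publisher can state both links without asserting anything about the external class itself.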

Alternate Code Class

When using both local and external code classes, it is difficult to find whether an external code class is employed or not.

We need a schema-level description for an alternate code class.

Using Local and External Code Classes

[Figure: layers Data Structure Definition, Dimension Property, Code Class, Code List, Code; columns Local and External.
Local resources: eg:refArea (local dimension property), eg:UnitaryAuthority (local code class), eg:areaCodeList (local code list), eg:cardiff_00pt (local code).
External resources: <http://www.geonames.org/ontology#Feature> (external code class), <http://sws.geonames.org/2653822/> (external code).
Link labels: qb:dimension, rdfs:range, qb:codeList, skos:hasTopConcept | qb:hierarchyRoot, rdf:type, skos:exactMatch | owl:sameAs. One link is marked "?".]

Proposal of an additional link (ext:altClass)

[Figure: layers Data Structure Definition, Dimension Property, Code Class, Code List, Code; columns Local and External.
Local resources: eg:refArea (local dimension property), eg:UnitaryAuthority (local code class), eg:areaCodeList (local code list), eg:cardiff_00pt (local code).
External resources: <http://www.geonames.org/ontology#Feature> (external code class), <http://sws.geonames.org/2653822/> (external code).
Link labels: qb:dimension, rdfs:range, qb:codeList, skos:hasTopConcept | qb:hierarchyRoot, rdf:type, skos:exactMatch | owl:sameAs, ext:altClass.]
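With an ext:altClass link in place, two datasets can be compared for shared external code classes without scanning their code values. The Python sketch below is illustrative only. The extracted slide text does not show which resource carries the link, so the function accepts it on either the dimension property or its range class; the other: resources stand for a hypothetical second dataset.

```python
FEATURE = "<http://www.geonames.org/ontology#Feature>"

# other:area, other:Region, other:time and other:Year are hypothetical resources.
ALT_TRIPLES = {
    ("eg:refArea", "rdfs:range", "eg:UnitaryAuthority"),
    ("eg:UnitaryAuthority", "ext:altClass", FEATURE),
    ("other:area", "rdfs:range", "other:Region"),
    ("other:area", "ext:altClass", FEATURE),
    ("other:time", "rdfs:range", "other:Year"),
}


def alternate_code_classes(triples, dimension):
    """Collect ext:altClass targets stated on the dimension or on its range class."""
    subjects = {dimension} | {
        o for s, p, o in triples if s == dimension and p == "rdfs:range"
    }
    return {o for s, p, o in triples if s in subjects and p == "ext:altClass"}


def shared_external_classes(triples, dim_a, dim_b):
    """Return the external code classes that both dimensions declare."""
    return alternate_code_classes(triples, dim_a) & alternate_code_classes(
        triples, dim_b
    )
```

A non-empty result for two dimensions suggests their codes can be aligned through the shared external class, for example via the skos:exactMatch or owl:sameAs links on individual codes.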

From Our Survey

                                       Area Dimension   Time Dimension
Direct Use of an Upper Resource        3/12             3/12
Direct Use of an External Code Class   2/12             8/12
Use of Alternate Code Classes          10/12            1/12

The counts are DSDs (Data Structure Definitions) found in the endpoints listed at http://www.w3.org/2011/gld/wiki/Data_Cube_Implementations.

Conclusion

• We introduced dimension patterns for describing schema-level links, including references to upper resources and alternate class links.
• These will draw out QB's power of description to its full extent.
• However, only a few upper resources are available now. Therefore, the parts of the patterns concerning upper concepts are preparatory for the future.
• We think that enriching upper resources suitable for statistical data is an urgent task.
