Towards Easy Matching Between Statistical Linked Data: Dimension Patterns
Hideto Sato and Wen Wen
First International Workshop on Semantic Statistics (SemStats 2013), 22 October 2013, Sydney



Introduction

• For matching statistical data from different sources, upper concepts and schema-level links are important.

• Three problems:
  (1) Only a small number of upper concepts are available.
  (2) Certain patterns of dimension description prevent some schema-level links.
  (3) Usage of external codes is hard to find at the schema level.

• This paper focuses on (2) and (3), and proposes patterns of dimension description to improve them.


Trial Matching

• Italian Immigration Statistics
  ⇒ the numbers of immigrants to Italy by birth country by year

• World Bank Statistics
  ⇒ the total population by country by year

• Integrated Statistics
  Percentage of immigrants to Italy by country by year
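The integration step can be sketched as a join on (country, year). This is only an illustration: the figures below are made up, the function name is hypothetical, and the sketch assumes the hard part, namely that both sources already use the same country codes, has been solved. Establishing that is what the rest of the talk is about.

```python
# Hypothetical sample figures keyed by (country code, year); not real statistics.
immigrants_to_italy = {("AL", 2011): 500, ("AL", 2012): 600, ("XK", 2012): 50}
total_population = {("AL", 2011): 2_000_000, ("AL", 2012): 2_400_000}


def percentage_of_immigrants(immigrants, population):
    """Join two tables on (country, year) and return the percentage per key."""
    result = {}
    for key, count in immigrants.items():
        total = population.get(key)
        if total:  # skip keys missing from the second source, and zero totals
            result[key] = 100.0 * count / total
    return result


integrated = percentage_of_immigrants(immigrants_to_italy, total_population)
for (country, year), pct in sorted(integrated.items()):
    print(country, year, f"{pct:.3f}%")
```

Keys present in only one source (here the hypothetical "XK" row) drop out of the result, which is why agreement on codes matters before any arithmetic is done.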


[Diagram: schema-level descriptions of the two data sets, drawn side by side (Italian Immigration Statistics | World Bank Statistics) in five layers: DataSet, DataStructureDefinition, DimensionProperty, Code Class, Code. The edge layout was lost in extraction; the recoverable labels are listed below.]

Nodes with the istat: prefix: istat:dataset-DCIS_POPSTRCIT, istat:dsd-DCIS_POPSTRCIT, istat:dimension-paesi, istat:code-range-paesi, istat:code-paesi, istat:code-paesi-al
Nodes from the World Bank data: dataset:world-development-indicators, d-indicators:structure, classification:country, classification:country/AL
Other nodes: qb:DataSet, sdmx-dimension:refArea or sdmx-dimension:visArea, sdmx-code:Area, country-dimension, country-code-class, http://sws.geonames.org/783754/, http://sws.geonames.org/ontology#Feature/
Edge labels: qb:structure, qb:component, qb:dimension, qb:codeList, rdfs:range, rdfs:subPropertyOf, rdfs:subClassOf, rdf:type, skos:hasTopConcept, skos:exactMatch, owl:sameAs


(1) What role does the dimension play?

• place of residence
• place of birth


(2) What type of code does the dimension use?

• Countries
• Domestic Administrative Areas
• River Basins, and so on.


(3) What common codes are available?

• Geonames
• DBPedia

preferably at the schema level


Matching Data from Different Sources

The following questions are important for each dimension. As for an area dimension:

For Dimension Properties: What role does the dimension play?
• Place of Birth
• Place of Residence

For Code Class (Range of Dimension): What type of code does the dimension use?
• Countries
• Domestic Administrative Areas
• River Basins

For Code Values: What common codes are available?
• Geonames
• DBPedia


Matching Data from Different Sources

The following questions are important for each dimension. As for an area dimension:

For Dimension Properties: What role does the dimension play?
• Place of Birth
• Place of Residence

For Code Class (Range of Dimension): What type of code does the dimension use?
• Countries
• Domestic Administrative Areas

For Code Values: What common codes are available?
• Geonames
• DBPedia

[Slide labels attached to these questions: Upper Concepts; Schema-Level Description]


QB and Upper Concepts

QB: The RDF Data Cube Vocabulary

QB provides a bridge to upper concepts by referring to the SDMX-RDF vocabulary.


Upper Concepts and SDMX-RDF

Category                          Upper concept         Upper resource in SDMX-RDF
Dimension Property                Place of Birth        sdmx-dimension:visArea
Dimension Property                Place of Residence    sdmx-dimension:refArea
Code Class (Range of Dimension)   Area                  sdmx-code:Area
Code Class (Range of Dimension)   Country               (not defined)
Code Class (Range of Dimension)   Domestic Area         (not defined)
Code Class (Range of Dimension)   River Basin           (not defined)

(sdmx-dimension:visArea has been removed in the current version of SDMX-RDF.)
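As a small illustration, the table can be held as a lookup from upper concept to SDMX-RDF resource, which makes the gaps easy to list. The variable and function names are hypothetical; `None` stands for the "(not defined)" entries.

```python
# Upper concept -> upper resource in SDMX-RDF, copied from the table above.
# None marks the concepts the table lists as "(not defined)".
UPPER_DIMENSION_PROPERTIES = {
    "Place of Birth": "sdmx-dimension:visArea",
    "Place of Residence": "sdmx-dimension:refArea",
}
UPPER_CODE_CLASSES = {
    "Area": "sdmx-code:Area",
    "Country": None,
    "Domestic Area": None,
    "River Basin": None,
}


def undefined_concepts(mapping):
    """Return the upper concepts that have no SDMX-RDF resource to link to."""
    return sorted(concept for concept, resource in mapping.items() if resource is None)


print(undefined_concepts(UPPER_CODE_CLASSES))
```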


Dimension Description in QB

[Diagram with two columns, Local and Upper, and rows for Data Structure Definition, Dimension Property, Code Class, Code List and Code. Links reconstructed from the labels:]

Data Structure Definition   qb:dimension                          eg:refArea (local: dimensionProperty)
eg:refArea                  rdfs:subPropertyOf                    sdmx-dimension:refArea (upper: abstract DimensionProperty)
eg:refArea                  rdfs:range                            eg:UnitaryAuthority (local: CodeClass)
eg:refArea                  qb:codeList                           eg:areaCodeList (local: codeList)
eg:UnitaryAuthority         rdfs:subClassOf                       sdmx-code:Area (upper: AbstractCodeClass)
eg:areaCodeList             skos:hasTopConcept | qb:hierarchyRoot eg:cardiff_00pt (local: code)
eg:cardiff_00pt             rdf:type                              eg:UnitaryAuthority
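The diagram can be read as a handful of triples. The sketch below stores them as plain tuples and walks from a data structure definition to the upper resources of each dimension. It is a simplification under stated assumptions: `eg:dsd` is a made-up name for the unnamed data structure definition, qb:dimension is attached to it directly (QB itself goes through a qb:component node), and no RDF library or inference is involved.

```python
# Triples as (subject, predicate, object) strings.
# eg:dsd is a hypothetical name; the slide leaves the DSD unnamed.
TRIPLES = {
    ("eg:dsd", "qb:dimension", "eg:refArea"),
    ("eg:refArea", "rdfs:subPropertyOf", "sdmx-dimension:refArea"),
    ("eg:refArea", "rdfs:range", "eg:UnitaryAuthority"),
    ("eg:refArea", "qb:codeList", "eg:areaCodeList"),
    ("eg:UnitaryAuthority", "rdfs:subClassOf", "sdmx-code:Area"),
    ("eg:areaCodeList", "skos:hasTopConcept", "eg:cardiff_00pt"),
    ("eg:cardiff_00pt", "rdf:type", "eg:UnitaryAuthority"),
}


def objects(triples, subject, predicate):
    """All objects of the given subject and predicate, in sorted order."""
    return sorted(o for s, p, o in triples if s == subject and p == predicate)


def describe_dimensions(triples, dsd):
    """For each dimension of a DSD, collect its upper property and upper code class."""
    description = {}
    for dim in objects(triples, dsd, "qb:dimension"):
        ranges = objects(triples, dim, "rdfs:range")
        description[dim] = {
            "upper_properties": objects(triples, dim, "rdfs:subPropertyOf"),
            "upper_code_classes": sorted(
                upper
                for code_class in ranges
                for upper in objects(triples, code_class, "rdfs:subClassOf")
            ),
        }
    return description


print(describe_dimensions(TRIPLES, "eg:dsd"))
```

Because every link is stated on local resources, both upper concepts are reachable from the schema alone, without looking at any observation.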


Anti-Patterns

• Two anti-patterns prevent describing schema-level links properly.
  – Direct use of an abstract upper resource
  – Direct use of an external code class


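Anti-Pattern: Direct Use of an Upper Resource

[Diagram with two columns, Local and Upper. There is no local dimension property; the data structure definition points at the upper property directly. Links reconstructed from the labels:]

Data Structure Definition   qb:dimension                          sdmx-dimension:refArea (upper: abstract DimensionProperty)
eg:UnitaryAuthority         rdfs:subClassOf                       sdmx-code:Area (upper: AbstractCodeClass)
eg:areaCodeList             skos:hasTopConcept | qb:hierarchyRoot eg:cardiff_00pt (local: code)
eg:cardiff_00pt             rdf:type                              eg:UnitaryAuthority (local: CodeClass)

[The diagram also carries rdfs:range and qb:codeList labels and a "?" mark; the extracted text does not show which link the mark is attached to.]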
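A check for this anti-pattern might look like the following sketch. It assumes, purely for illustration, that upper properties can be recognised by the `sdmx-dimension:` prefix and that qb:dimension hangs directly off a hypothetically named `eg:dsd`. One plausible reading of the "?" is that a shared upper property has no place to state a data-set-specific rdfs:range or qb:codeList; the slides do not spell this out.

```python
# Assumption: upper (abstract) dimension properties are recognised by prefix.
UPPER_PREFIXES = ("sdmx-dimension:",)

# eg:dsd is a hypothetical name for the data structure definition.
ANTI_PATTERN = {
    ("eg:dsd", "qb:dimension", "sdmx-dimension:refArea"),
}
RECOMMENDED = {
    ("eg:dsd", "qb:dimension", "eg:refArea"),
    ("eg:refArea", "rdfs:subPropertyOf", "sdmx-dimension:refArea"),
}


def directly_used_upper_properties(triples, dsd):
    """Upper properties that the DSD uses as dimensions without a local sub-property."""
    return sorted(
        o
        for s, p, o in triples
        if s == dsd and p == "qb:dimension" and o.startswith(UPPER_PREFIXES)
    )


print(directly_used_upper_properties(ANTI_PATTERN, "eg:dsd"))
print(directly_used_upper_properties(RECOMMENDED, "eg:dsd"))
```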


The Pattern for Using a Local Code Class

[Diagram with two columns, Local and Upper. Links reconstructed from the labels:]

Data Structure Definition   qb:dimension                          eg:refArea (local: dimensionProperty)
eg:refArea                  rdfs:subPropertyOf                    sdmx-dimension:refArea (upper: abstract DimensionProperty)
eg:refArea                  rdfs:range                            eg:UnitaryAuthority (local: CodeClass)
eg:refArea                  qb:codeList                           eg:areaCodeList (local: codeList)
eg:UnitaryAuthority         rdfs:subClassOf                       sdmx-code:Area (upper: AbstractCodeClass)
eg:areaCodeList             skos:hasTopConcept | qb:hierarchyRoot eg:cardiff_00pt (local: code)
eg:cardiff_00pt             rdf:type                              eg:UnitaryAuthority


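Anti-Pattern: Direct Use of an External Code Class

[Diagram with three columns, Local, Upper and External. Links reconstructed from the labels:]

Data Structure Definition             qb:dimension         eg:refArea (local: dimensionProperty)
eg:refArea                            rdfs:subPropertyOf   sdmx-dimension:refArea (upper: abstract DimensionProperty)
eg:refArea                            rdfs:range           <http://www.geonames.org/ontology#Feature> (external: CodeClass)
eg:refArea                            qb:codeList          eg:areaCodeList (local: codeList)
eg:areaCodeList                       qb:hierarchyRoot     <http://sws.geonames.org/2653822/> (external: code)
<http://sws.geonames.org/2653822/>    rdf:type             <http://www.geonames.org/ontology#Feature>

[The diagram also shows an rdfs:subClassOf label, the upper class sdmx-code:Area (upper: AbstractCodeClass) and a "?" mark; presumably the mark questions an rdfs:subClassOf link from the external class to sdmx-code:Area.]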
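This anti-pattern can be spotted by looking for a dimension whose rdfs:range is not a local class. The sketch below assumes that local resources share the `eg:` prefix, which is a convention of the example rather than anything QB requires.

```python
# Assumption: local resources are recognised by this prefix.
LOCAL_PREFIX = "eg:"
GN_FEATURE = "http://www.geonames.org/ontology#Feature"

ANTI_PATTERN = {
    ("eg:refArea", "rdfs:subPropertyOf", "sdmx-dimension:refArea"),
    ("eg:refArea", "rdfs:range", GN_FEATURE),
}
LOCAL_CLASS_PATTERN = {
    ("eg:refArea", "rdfs:subPropertyOf", "sdmx-dimension:refArea"),
    ("eg:refArea", "rdfs:range", "eg:UnitaryAuthority"),
}


def external_ranges(triples, dimension):
    """Code classes used directly as rdfs:range although they are not local."""
    return sorted(
        o
        for s, p, o in triples
        if s == dimension and p == "rdfs:range" and not o.startswith(LOCAL_PREFIX)
    )


print(external_ranges(ANTI_PATTERN, "eg:refArea"))
print(external_ranges(LOCAL_CLASS_PATTERN, "eg:refArea"))
```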


The Pattern for Using an External Code Class

[Diagram with three columns, Local, Upper and External. A local adapter class sits between the dimension property and the external class. Links reconstructed from the labels:]

Data Structure Definition             qb:dimension          eg:refArea (local: dimensionProperty)
eg:refArea                            rdfs:subPropertyOf    sdmx-dimension:refArea (upper: abstract DimensionProperty)
eg:refArea                            rdfs:range            eg:UnitaryAuthority (local: CodeClassAdapter)
eg:refArea                            qb:codeList           eg:areaCodeList (local: codeList)
eg:UnitaryAuthority                   rdfs:subClassOf       sdmx-code:Area (upper: AbstractCodeClass)
eg:UnitaryAuthority                   owl:equivalentClass   <http://www.geonames.org/ontology#Feature> (external: CodeClass)
eg:areaCodeList                       qb:hierarchyRoot      <http://sws.geonames.org/2653822/> (external: code)
<http://sws.geonames.org/2653822/>    rdf:type              <http://www.geonames.org/ontology#Feature>
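With the adapter in place, both the upper class and the external class can be read off the schema by following rdfs:range once. A minimal sketch with plain tuples and no reasoning; the function name is hypothetical.

```python
GN_FEATURE = "http://www.geonames.org/ontology#Feature"

TRIPLES = {
    ("eg:refArea", "rdfs:range", "eg:UnitaryAuthority"),
    ("eg:UnitaryAuthority", "rdfs:subClassOf", "sdmx-code:Area"),
    ("eg:UnitaryAuthority", "owl:equivalentClass", GN_FEATURE),
    ("http://sws.geonames.org/2653822/", "rdf:type", GN_FEATURE),
}


def code_class_links(triples, dimension):
    """Follow rdfs:range to the adapter class, then read its two schema-level links."""
    adapters = {o for s, p, o in triples if s == dimension and p == "rdfs:range"}
    upper = sorted(o for s, p, o in triples if s in adapters and p == "rdfs:subClassOf")
    external = sorted(
        o for s, p, o in triples if s in adapters and p == "owl:equivalentClass"
    )
    return {"upper": upper, "external": external}


print(code_class_links(TRIPLES, "eg:refArea"))
```

The statements about the upper class are made on the local adapter, so nothing has to be asserted about the external class itself.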


Alternate Code Class

When using both local and external code classes, it is difficult to find whether an external code class is employed or not.

We need a schema-level description for an alternate code class.


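Using Local and External Code Classes

[Diagram with two columns, Local and External. Links reconstructed from the labels:]

Data Structure Definition             qb:dimension                          eg:refArea (local: dimensionProperty)
eg:refArea                            rdfs:range                            eg:UnitaryAuthority (local: CodeClass)
eg:refArea                            qb:codeList                           eg:areaCodeList (local: codeList)
eg:areaCodeList                       skos:hasTopConcept | qb:hierarchyRoot eg:cardiff_00pt (local: code)
eg:cardiff_00pt                       rdf:type                              eg:UnitaryAuthority
eg:cardiff_00pt                       skos:exactMatch | owl:sameAs          <http://sws.geonames.org/2653822/> (external: code)
<http://sws.geonames.org/2653822/>    rdf:type                              <http://www.geonames.org/ontology#Feature> (external: CodeClass)

[A "?" marks the missing schema-level link, apparently between the local code class and the external code class.]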


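Proposal of an additional link (ext:altClass)

[Diagram with two columns, Local and External. Links reconstructed from the labels:]

Data Structure Definition             qb:dimension                          eg:refArea (local: dimensionProperty)
eg:refArea                            rdfs:range                            eg:UnitaryAuthority (local: CodeClass)
eg:refArea                            qb:codeList                           eg:areaCodeList (local: codeList)
eg:areaCodeList                       skos:hasTopConcept | qb:hierarchyRoot eg:cardiff_00pt (local: code)
eg:cardiff_00pt                       rdf:type                              eg:UnitaryAuthority
eg:cardiff_00pt                       skos:exactMatch | owl:sameAs          <http://sws.geonames.org/2653822/> (external: code)
<http://sws.geonames.org/2653822/>    rdf:type                              <http://www.geonames.org/ontology#Feature> (external: CodeClass)

[The new ext:altClass link connects the local side to the external code class; its exact endpoints are not recoverable from the extracted text.]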
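The benefit of such a link is that two data sets can be compared at the schema level, before any code values are inspected. The sketch below assumes that ext:altClass runs from the local code class to the external code class; that direction, the second data set (`other:` prefix) and the function names are all illustrative assumptions.

```python
GN_FEATURE = "http://www.geonames.org/ontology#Feature"

LOCAL = {
    ("eg:refArea", "rdfs:range", "eg:UnitaryAuthority"),
    # Assumed direction: local code class -> alternate (external) code class.
    ("eg:UnitaryAuthority", "ext:altClass", GN_FEATURE),
    ("eg:cardiff_00pt", "rdf:type", "eg:UnitaryAuthority"),
    ("eg:cardiff_00pt", "skos:exactMatch", "http://sws.geonames.org/2653822/"),
}
# A second, hypothetical data set that declares the same alternate class.
OTHER = {
    ("other:area", "rdfs:range", "other:Region"),
    ("other:Region", "ext:altClass", GN_FEATURE),
}


def alternate_classes(triples, dimension):
    """Alternate code classes declared for the range of a dimension."""
    ranges = {o for s, p, o in triples if s == dimension and p == "rdfs:range"}
    return {o for s, p, o in triples if s in ranges and p == "ext:altClass"}


def shared_alternate_classes(triples_a, dim_a, triples_b, dim_b):
    """External code classes through which two dimensions could be matched."""
    return sorted(
        alternate_classes(triples_a, dim_a) & alternate_classes(triples_b, dim_b)
    )


print(shared_alternate_classes(LOCAL, "eg:refArea", OTHER, "other:area"))
```

Without the ext:altClass triples the same question can only be answered by scanning the codes for skos:exactMatch or owl:sameAs links.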


From Our Survey

                                        Area Dimension   Time Dimension
Direct Use of an Upper Resource         3/12             3/12
Direct Use of an External Code Class    2/12             8/12
Use of Alternate Code Classes           10/12            1/12

The counts are DSDs (Data Structure Definitions) found in the endpoints listed at http://www.w3.org/2011/gld/wiki/Data_Cube_Implementations.


Conclusion

• We introduced dimension patterns for describing schema-level links, including references to upper resources and alternate class links.

• These will extract QB's power of description to its full extent.

• However, only a few upper resources are available now. Therefore, the part of the patterns concerning upper concepts is preparatory for the future.

• We think that it is an urgent task to enrich upper resources suitable for statistical data.