Dealing with Data Diversity in a Smart City Data Hub

DESCRIPTION

Keynote speech at the Semantics for Smarter Cities workshop SSC 2014 at ISWC 2014

Dealing with Data Diversity in a Smart City Data Hub

Mathieu d'Aquin - @mdaquin

slideshare.net/mdaquin

Knowledge Media Institute, The Open University

where a penguin is a dataset

Diversity

Why should we care about diversity?

Because diversity is good, and what makes data diverse is not the same as what makes it more or less relevant

Why should we care about diversity?

Because it is hard to manage

How many species of penguins/animals/things?

How many biologists to classify them?

and that's purely static... unlike species, new data appear all the time...

Why should we care about diversity?

The Eskimo language has 255 different words for "visiting linguist"

Because we might have a lot of it, or what we need to manage is very granular

Data diversity in a Smart City

Example of the MK:Smart project in Milton Keynes, UK (mksmart.org)

Data diversity in a Smart City: Partners in the MK:Smart project

Data diversity in a Smart City: Areas of the MK:Smart project

Data diversity in a Smart City: MK Data Hub - Where diversity is handled

A concrete example: Wifi-based presence sensors

10-12 sensors can cover a reasonably large enclosed area (here, the refectory of the Open University).

A concrete example: Wifi-based presence sensors

Use triangulation to find the location of wifi-enabled devices.
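
As an illustration of the idea above, here is a minimal sketch of locating a device from three sensors. It assumes each sensor can turn signal strength into a rough distance estimate, which makes this distance-based trilateration rather than angle-based triangulation. The function name, coordinates and distances are hypothetical and are not taken from the MK:Smart deployment.

```python
def locate(sensors, distances):
    """Estimate (x, y) of a device from three sensor positions and distances.

    Each sensor defines a circle around itself. Subtracting the first circle
    equation from the other two leaves two linear equations in x and y,
    which are solved here with Cramer's rule.
    """
    (x1, y1), (x2, y2), (x3, y3) = sensors
    d1, d2, d3 = distances

    a1, b1 = 2 * (x2 - x1), 2 * (y2 - y1)
    c1 = d1**2 - d2**2 + x2**2 - x1**2 + y2**2 - y1**2
    a2, b2 = 2 * (x3 - x1), 2 * (y3 - y1)
    c2 = d1**2 - d3**2 + x3**2 - x1**2 + y3**2 - y1**2

    det = a1 * b2 - a2 * b1
    if abs(det) < 1e-12:
        raise ValueError("sensors are collinear; position is ambiguous")
    return ((c1 * b2 - c2 * b1) / det, (a1 * c2 - a2 * c1) / det)


# Hypothetical layout: three sensors in the corners of a 10 m x 10 m room.
sensors = [(0.0, 0.0), (10.0, 0.0), (0.0, 10.0)]
# Hypothetical distance estimates (metres) for one device.
distances = [5.0, 65 ** 0.5, 45 ** 0.5]
x, y = locate(sensors, distances)
print(round(x, 2), round(y, 2))
```

With real readings the distance estimates are noisy, so a deployment with 10-12 sensors would more plausibly use a least-squares fit over all of them rather than exactly three.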

A concrete example: Wifi-based presence sensors

Basic statistical analysis to extract patterns of usage of the facility
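
The kind of analysis meant here could be as simple as counting distinct devices per hour of the day. The sketch below assumes observations arrive as (ISO timestamp, device identifier) pairs; that record shape, the function names and the sample data are all hypothetical.

```python
from collections import defaultdict
from datetime import datetime
from statistics import mean


def hourly_occupancy(observations):
    """Count the distinct devices seen in each hour of the day."""
    devices_by_hour = defaultdict(set)
    for timestamp, device_id in observations:
        hour = datetime.fromisoformat(timestamp).hour
        devices_by_hour[hour].add(device_id)
    return {hour: len(devices) for hour, devices in sorted(devices_by_hour.items())}


def peak_hour(occupancy):
    """Return the hour with the highest number of distinct devices."""
    return max(occupancy, key=occupancy.get)


# Made-up observations for illustration only.
observations = [
    ("2014-10-01T09:05:00", "dev-a"),
    ("2014-10-01T09:40:00", "dev-b"),
    ("2014-10-01T12:10:00", "dev-a"),
    ("2014-10-01T12:15:00", "dev-b"),
    ("2014-10-01T12:30:00", "dev-c"),
    ("2014-10-01T12:45:00", "dev-a"),  # same device again, counted once
    ("2014-10-01T15:00:00", "dev-c"),
]

occupancy = hourly_occupancy(observations)
print(occupancy)
print("peak hour:", peak_hour(occupancy))
print("mean devices per observed hour:", mean(occupancy.values()))
```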

A concrete example: Diversity

How do we usually deal with this?

For data heterogeneity we use alignments, mappings, links, etc.

Example: The LinkedUp Catalogue of datasets for education includes mappings between the vocabularies of different datasets (data.linkededucation.org/linkedup/catalogue/)

What about diversity at the policy level?

VoID and DC to represent datasets, PROV-O for basic provenance.
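
To make this concrete, the sketch below builds a small dataset description as N-Triples using plain Python strings. The vocabulary terms (void:Dataset, dcterms:title, dcterms:license, prov:wasDerivedFrom) are real, but reading "DC" as DCMI Metadata Terms is an assumption, and the example.org namespace, dataset names and helper functions are hypothetical rather than taken from the MK Data Hub.

```python
VOID = "http://rdfs.org/ns/void#"
DCT = "http://purl.org/dc/terms/"
PROV = "http://www.w3.org/ns/prov#"
RDF_TYPE = "http://www.w3.org/1999/02/22-rdf-syntax-ns#type"
EX = "http://example.org/dataset/"  # hypothetical namespace for illustration


def iri(value):
    """Format an IRI as an N-Triples term."""
    return f"<{value}>"


def literal(value):
    """Format a plain string literal, escaping backslashes and quotes."""
    escaped = value.replace("\\", "\\\\").replace('"', '\\"')
    return f'"{escaped}"'


def describe_dataset(name, title, license_iri, derived_from=None):
    """Return triples describing a dataset and, optionally, its source."""
    subject = iri(EX + name)
    triples = [
        (subject, iri(RDF_TYPE), iri(VOID + "Dataset")),
        (subject, iri(DCT + "title"), literal(title)),
        (subject, iri(DCT + "license"), iri(license_iri)),
    ]
    if derived_from is not None:
        # Basic provenance: this dataset was derived from another one.
        triples.append((subject, iri(PROV + "wasDerivedFrom"), iri(EX + derived_from)))
    return triples


def to_ntriples(triples):
    """Serialise triples as N-Triples, one statement per line."""
    return "\n".join(f"{s} {p} {o} ." for s, p, o in triples)


print(to_ntriples(describe_dataset(
    "refectory-usage",
    "Refectory usage patterns",
    "http://creativecommons.org/licenses/by/4.0/",
    derived_from="wifi-presence-raw",
)))
```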

More structured representation

ODRL for the structured representation of policies and rights

With the tools to deal with it

And the processes

Requires an appropriate representation of dataflows

Reasoning on the way policy-information propagates

http://purl.org/datanode/ns/

An ontology of relationships between data artifacts (DataNodes).

DataNode

Captures the essence of dataflows rather than the process, as a basis for meta-information propagation.

Propagating meta-information across dataflows

Examples of rules:

Duties such as attribution propagate over relations of derivation, but not necessarily over others.

Permissions such as the right to redistribute, however, do not propagate over relations of derivation, except in specific cases (e.g. copies).

Prohibitions such as preventing commercial exploitation propagate over derivations.
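
The three example rules can be sketched as a small propagation table applied over a dataflow until nothing changes. This is only an illustration: the relation names ("derivation", "copy", "describes") are hypothetical labels rather than DataNode terms, treating a copy as a special case of derivation is an assumption, and the dataset names and policy actions are made up.

```python
# For each kind of policy statement, the relations it propagates over.
# A copy is treated as a special case of derivation (assumption).
PROPAGATES = {
    "duty": {"derivation", "copy"},
    "permission": {"copy"},
    "prohibition": {"derivation", "copy"},
}


def propagate(policies, dataflow):
    """Propagate policy statements along a dataflow.

    policies: dict mapping a data node to a set of (kind, action) pairs.
    dataflow: list of (target, relation, source) edges.
    Returns a new dict with inherited statements added, iterating to a
    fixpoint so that chains of relations are followed.
    """
    result = {node: set(items) for node, items in policies.items()}
    changed = True
    while changed:
        changed = False
        for target, relation, source in dataflow:
            inherited = {
                (kind, action)
                for kind, action in result.get(source, set())
                if relation in PROPAGATES[kind]
            }
            current = result.setdefault(target, set())
            if not inherited <= current:
                current |= inherited
                changed = True
    return result


policies = {
    "raw-readings": {
        ("duty", "attribute"),
        ("permission", "distribute"),
        ("prohibition", "commercialize"),
    }
}
dataflow = [
    ("usage-stats", "derivation", "raw-readings"),
    ("mirror", "copy", "raw-readings"),
    ("catalogue-entry", "describes", "raw-readings"),
]

for node, statements in sorted(propagate(policies, dataflow).items()):
    print(node, sorted(statements))
```

In this sketch the derived statistics inherit the attribution duty and the commercial-use prohibition but not the permission to redistribute, the copy inherits all three, and the catalogue entry inherits nothing.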

A lot of the semantics for Smart Cities work focuses on data heterogeneity.

There is a need to look at data diversity at the meta-information level (here we focus on policy-related information).

How to manage, catalogue, keep track of and manipulate a large number of datasets with diverse rights, access, validity and scope?

How do we help users/developers in exploring and exploiting this diversity...

Discussion/future

Master of Datasets

Need for a clear, semantic (i.e. ontological) foundation for describing and defining data artefacts.

DataNode is a step towards defining their relationships. Vocabularies such as ODRL and VoID focus on specific aspects.

More is needed to formally represent the fundamental descriptors of data (scope, validity, policy, ...)

Thanks!

Mathieu d'Aquin Alessandro Adamou Enrico Daga

Shuangyan Liu Keerthi Thomas Enrico Motta
