On Metadata for Open Data Yannis Charalabidis 25.04.2012

DESCRIPTION

On an enlarged metadata set for open data classification, allowing for automated processing and linking

Page 1: On metadata for Open Data

On Metadata for Open Data

Yannis Charalabidis

25.04.2012

Page 2: On metadata for Open Data

Introduction

In the next slides we will show what level of metadata handling can be expected from a second-generation open data system.

Page 3: On metadata for Open Data

Imagine you are in front of the ENGAGE system and you have the URI of a dataset, somewhere in the cloud, copied as a string to the clipboard.

And begin …

Page 4: On metadata for Open Data

Prescreening: User only gives URI of the dataset

Enter (paste) the URI of your dataset

_

Page 5: On metadata for Open Data

(Then, for 30 seconds, you see this screen, changing)

Progress of ENGAGE Resource Prescreening: ( 45% ) of jobs completed

Managed to:

• Identify xls file

• Autofill, provisionally: Title

• Autofill, provisionally: Creator

• Create unique ENGAGE URI

• Calculate keywords

• Autofill, provisionally: keywords

• ……
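The first prescreening job on this screen ("Identify xls file") and the progress figure can be sketched as follows. This is only an illustration, not the ENGAGE implementation: the function names, the extension-based detection and the short `KNOWN_FORMATS` set are assumptions made for the example.

```python
from pathlib import PurePosixPath
from urllib.parse import urlparse

# Hypothetical subset of the format codelist (the slides mention about 50 types).
KNOWN_FORMATS = {"xls", "xml", "pdf"}


def identify_format(uri: str) -> str:
    """Guess the dataset format from the file extension in the URI path."""
    path = urlparse(uri).path  # drops the query string and fragment
    extension = PurePosixPath(path).suffix.lstrip(".").lower()
    return extension if extension in KNOWN_FORMATS else "unknown"


def progress(jobs_done: int, jobs_total: int) -> int:
    """Percentage of prescreening jobs completed, rounded down."""
    return jobs_done * 100 // jobs_total


print(identify_format("http://example.org/data/budget.XLS?year=2011"))
print(progress(9, 20))
```

A real prescreening step would probably also inspect the content type or the file itself; the extension check is just the simplest possible first guess.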

Page 6: On metadata for Open Data

(When the import finishes, the report)

Report

ENGAGE managed to automatically, provisionally fill in ( 21 ) of 43 metadata attributes for your dataset.

Your current validity is at ( 45% )

For your dataset to be inserted in the database, you need to continue filling in ( 5 ) mandatory attributes. Your dataset will then be inserted with validity ( 55% )

If all ( 17 ) non-mandatory attributes are filled in, validity will be maximum, at 70% / limit of the insertion phase.

Please select next action: Continue / Park / Cancel
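The counts in the report above (21 filled, 5 mandatory missing, 17 non-mandatory missing, 43 in total) could be derived as in the sketch below. The `Attribute` class and `import_report` function are hypothetical names for illustration. The validity percentages are not computed here, because the slides do not give the formula behind them.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Attribute:
    name: str
    mandatory: bool
    value: Optional[str] = None  # None means "not filled in yet"


def import_report(attributes: List[Attribute]) -> dict:
    """Count filled and missing attributes after prescreening."""
    filled = sum(1 for a in attributes if a.value is not None)
    missing_mandatory = sum(
        1 for a in attributes if a.mandatory and a.value is None
    )
    missing_optional = sum(
        1 for a in attributes if not a.mandatory and a.value is None
    )
    return {
        "total": len(attributes),
        "filled": filled,
        "missing_mandatory": missing_mandatory,
        "missing_optional": missing_optional,
        # The dataset can be inserted once no mandatory attribute is empty.
        "insertable": missing_mandatory == 0,
    }
```

With 21 filled attributes, 5 empty mandatory ones and 17 empty non-mandatory ones, the report gives a total of 43 and marks the dataset as not yet insertable.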

Page 7: On metadata for Open Data

After import …

… and then, we enter the metadata insertion page with pre-filled data, etc.

When we finish, we get a similar final report.

And now, the ENGAGE metadata set that makes all of this possible:

Page 8: On metadata for Open Data

But before that, some semantics:

Attribute characteristics – notation:

(M) : attribute is Mandatory (cannot be empty)

(*) : attribute takes values from a controlled list of terms (codelist), or tree (DAG of terms), or table

(+) : attribute takes values from an extendible list or tree. User may extend the list during insertion

(a) : an auto-filling list (as suggestion) or otherwise automatically calculated attribute

(m) : attribute accepts multiple values

(v) : attribute entry can be verified through a type-checking algorithm

(( x )) : x is possible, but as an option

no tag : attribute is a simple string entry

---------- for the future -------------

(c0), (c1), (c2), (c3) : the importance of the attribute in the completeness calculation (c3 is the highest, most important)

(q0), (q1), (q2), (q3) : the importance of the attribute in the data quality calculation (q3 is the highest, most important)
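To make the notation concrete, here is a minimal sketch of how a tag string such as `(M)(*) ((a)) (m)` might be read by a program. The function name `parse_flags` and the `MEANING` table are assumptions for illustration; only the tags themselves come from the slide.

```python
import re

# Meaning of each tag, as listed on the notation slide.
MEANING = {
    "M": "mandatory",
    "*": "controlled list, tree or table",
    "+": "extendible list or tree",
    "a": "auto-filled or automatically calculated",
    "m": "multiple values",
    "v": "verifiable by type checking",
}

_OPTIONAL = re.compile(r"\(\(\s*([^()\s])\s*\)\)")  # matches (( x ))
_PLAIN = re.compile(r"\(([^()\s])\)")               # matches (x)


def parse_flags(notation: str):
    """Split a notation string into (plain tags, optional tags)."""
    optional = set(_OPTIONAL.findall(notation))
    # Remove the optional tags first so they are not counted as plain ones.
    remainder = _OPTIONAL.sub(" ", notation)
    plain = set(_PLAIN.findall(remainder))
    return plain, optional


plain, optional = parse_flags("(M)(*) ((a)) (m)")
for tag in sorted(plain):
    print(tag, "->", MEANING.get(tag, "not defined on the notation slide"))
```

An attribute with no tags at all yields two empty sets, which corresponds to "no tag: simple string entry".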

Page 9: On metadata for Open Data

A. The core attributes

| Metadata Attribute | Type of Attribute | Type of codelist | Size of codelist (nodes) | Existing codelists |
|---|---|---|---|---|
| Title. Automatic: extracted from the dataset headline of the URI/dataset provided | (M) ((a)) String | - | - | - |
| Publisher. PUB admin tree (100 per country, extendible) | (M)(*)(+) Pointer to Tree | Tree of Strings | 100 X country | Greece (ENG) |
| Creator. PUB admin tree (100 per country, extendible). Prompt: same as the publisher | (M)(*)(+) Pointer to Tree | Tree of PS entities | 100 X country | Greece (ENG) |
| Code. Automatic: ENGAGE automatic classification system (date, country, PSector, type, etc) or ENGAGE URI | (M)(*)(a) String | - | - | - |
| User. The user who uploads that. Automatic filling from table of users / login | (*)(a) Pointer to Table | Table of Users | - | - |

Page 10: On metadata for Open Data

B. The outer core attributes

| Metadata Attribute | Type of Attribute | Type of codelist | Size of codelist (nodes) | Existing codelists |
|---|---|---|---|---|
| Subject. Text describing the resource in one sentence. It can be stored in a list and reused | (M)(*)(+) Pointer to List | List of strings | All resource subjects | NO |
| Type. List of types: dataset, linkable dataset, visualization, textual information, executable binary, unknown | (M)(*)(m) Pointer to List | List of strings | 10 | ENG |
| Format. xls xml odata … jpd pdf … (appr. 50 format types) | (M)(*)(+) Pointer to List | List of strings | 50 | ENG |
| Language. ISO simplified (5 < 20 (EU) < ISO (3000)). Automatic: extract from language settings (when XLS / ISO) | (M)(*) ((a)) (m) Pointer to List | List of strings | 200 | ISO List (ENG) |
| Country. 5 ENGAGE countries < rest of 27 EU < other countries. ISO country list | (M)(*)(m) Pointer to List | List of strings | 200 | ISO List (ENG) |
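The difference between a plain controlled list (*) and an extendible one (*)(+) in the attributes above can be sketched as below. The `Codelist` class is a hypothetical illustration, and the format terms are only a small excerpt of the roughly 50 mentioned on the slide.

```python
class Codelist:
    """A controlled list of terms that may or may not be extendible (+)."""

    def __init__(self, terms, extendible=False):
        self.terms = list(terms)
        self.extendible = extendible

    def accept(self, value: str) -> bool:
        """Return True if value is, or has just become, a valid term."""
        if value in self.terms:
            return True
        if self.extendible:
            # (+): the user may extend the list during insertion.
            self.terms.append(value)
            return True
        return False


# Type is (M)(*)(m): controlled, not extendible.
type_list = Codelist(
    ["dataset", "linkable dataset", "visualization",
     "textual information", "executable binary", "unknown"]
)
# Format is (M)(*)(+): controlled and extendible.
format_list = Codelist(["xls", "xml", "pdf"], extendible=True)

print(type_list.accept("spreadsheet"))   # rejected: list is closed
print(format_list.accept("csv"))         # accepted and added to the list
```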

Page 11: On metadata for Open Data

C. The Public Sector Context

| Metadata Attribute | Type of Attribute | Type of codelist | Size of codelist (nodes) | Existing codelists |
|---|---|---|---|---|
| Public Sector Domain. Tree of sectors (20: finance, health, social security, etc). Automatic: can be calculated from Creator, if all public sector entities have a domain | (*)(m)(+) Pointer to Tree | Tree of strings | 20 | ENG, GR |
| Relative Public Service. List of public services (i2010 20 basic services, plus "Other reward service", "Other permission service", "Other registry entry service", "Other personal documents service") | (*)(m)(+) Pointer to List | List of strings | 24 | ENG, GR |
| Relative Information System. List of EU and national main information systems (50+50*country) | (*)(m)(+) Pointer to List | List of strings | 200 | GR |
| Legal Framework. Main EU directives on open data (10), main national laws and decrees on open data (10 X country) | (*)(m)(+) | Table of Legal Elements | 100 | GR |
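The automatic calculation of Public Sector Domain from Creator could look roughly like the sketch below. The entity names and the `ENTITIES` structure are invented for the example, and inheriting a domain from the parent entity in the tree is an assumption; the slide only says that the domain can be calculated if all public sector entities have one.

```python
from typing import Dict, Optional, Tuple

# Hypothetical fragment of the public sector entity tree:
# entity name -> (parent entity or None, domain or None)
ENTITIES: Dict[str, Tuple[Optional[str], Optional[str]]] = {
    "Ministry of Health": (None, "health"),
    "National Hospital Registry": ("Ministry of Health", None),
    "Ministry of Finance": (None, "finance"),
    "Unclassified Agency": (None, None),
}


def domain_from_creator(creator: str, entities=ENTITIES) -> Optional[str]:
    """Walk up the entity tree until an entity with a domain is found."""
    current: Optional[str] = creator
    visited = set()
    while current is not None and current in entities and current not in visited:
        visited.add(current)  # guards against accidental cycles in the data
        parent, domain = entities[current]
        if domain is not None:
            return domain
        current = parent
    return None  # no domain known: the user has to fill it in


print(domain_from_creator("National Hospital Registry"))
```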

Page 12: On metadata for Open Data

D. The Scientific Context

| Metadata Attribute | Type of Attribute | Type of codelist | Size of codelist (nodes) | Existing codelists |
|---|---|---|---|---|
| Scientific Sector. ENGAGE Tree of Scientific Domains | (*)(m) Pointer to Tree | Tree of strings | 100 | Science |
| Scientific Usage of Resource. ENGAGE tree of scientific types/usages: events data (nature or man-made), financial data, health data, etc (20) | (*)(m)(+) Pointer to Tree | Tree of strings | 20 | Science |
| Intended Audience. List of possible audiences: citizens, enterprises, researchers, public sector managers, public sector officers, policy makers, members of National Parliament, MEPs, NGOs etc | (*)(m)(+) Pointer to List | Tree of strings | 20 | ENGAGE |
| Keywords. Initial list made / proposed by ENGAGE System with countries, PSector Domain, Science Domain, Usage. Also get from linked areas / domains / types etc | (*)(m)(+)(a) Pointer to List | List of strings | 200 | - |
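The Keywords row describes an initial list proposed from the countries, public sector domain, science domain and usage already entered. A minimal sketch of such a proposal step is shown below; the function name, the dictionary layout and the sample values are assumptions, and the real system may weight or filter the suggestions differently.

```python
# Attributes that feed the keyword proposal, in the order used for suggestions.
SOURCE_ATTRIBUTES = (
    "Country",
    "Public Sector Domain",
    "Scientific Sector",
    "Scientific Usage of Resource",
)


def suggest_keywords(metadata: dict, sources=SOURCE_ATTRIBUTES) -> list:
    """Collect values of the source attributes, dropping duplicates."""
    seen = set()
    suggestions = []
    for attribute in sources:
        for value in metadata.get(attribute, []):
            key = value.lower()  # compare case-insensitively
            if key not in seen:
                seen.add(key)
                suggestions.append(value)
    return suggestions


example = {
    "Country": ["Greece"],
    "Public Sector Domain": ["health"],
    "Scientific Sector": ["Medicine"],
    "Scientific Usage of Resource": ["health data", "Health"],
}
print(suggest_keywords(example))
```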

Page 13: On metadata for Open Data

E. URLs – URIs – Links

| Metadata Attribute | Type of Attribute | Type of codelist | Size of codelist (nodes) | Existing codelists |
|---|---|---|---|---|
| Type of Source Link. URL / URI / DOI / WS / RSS / ENGAGE / other | (*)(+) Pointer to List | List of Strings | 10 | ENG |
| Source Link (URL). String or ENGAGE URL (*)(a). Automatic: put the URL of ENGAGE site | (*) (+) ((a)) Pointer to List | List of Strings | Codelist is the full list of URIs in ENGAGE | Yes |
| Type of Resource Link. URL / URI / DOI / WS / RSS / ENGAGE / other | (*)(+) Pointer to List | List of Strings | 10 | ENG |
| Resource Link. String or ENGAGE (a). Automatic: lists the link it already has | (*) (+) ((a)) Pointer to List | List of Strings | Codelist is the full list of URIs in ENGAGE | Yes |
| Relevant Resources. List of existing URIs in the system. Automatic: calculates from matching domain+type+ | (*)(m)(+)(a) | List of Strings | Codelist is the full list of URIs in ENGAGE | Yes |
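The automatic proposal of Relevant Resources "from matching domain+type" might be sketched as follows. The catalogue layout, the `engage:` URIs and the rule "at least one shared domain and at least one shared type" are assumptions for illustration; the slide does not spell out the matching rule in full.

```python
def relevant_resources(target_uri: str, catalogue: dict) -> list:
    """Return URIs of other resources sharing a domain and a type."""
    target = catalogue[target_uri]
    matches = []
    for uri, meta in catalogue.items():
        if uri == target_uri:
            continue  # a resource is not relevant to itself
        shares_domain = bool(target["domain"] & meta["domain"])
        shares_type = bool(target["type"] & meta["type"])
        if shares_domain and shares_type:
            matches.append(uri)
    return sorted(matches)


# Hypothetical catalogue: URI -> sets of domain and type values.
catalogue = {
    "engage:1": {"domain": {"health"}, "type": {"dataset"}},
    "engage:2": {"domain": {"health", "finance"}, "type": {"dataset"}},
    "engage:3": {"domain": {"health"}, "type": {"visualization"}},
    "engage:4": {"domain": {"finance"}, "type": {"dataset"}},
}
print(relevant_resources("engage:1", catalogue))
```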

Page 14: On metadata for Open Data

F. Linked Data

| Metadata Attribute | Type of Attribute | Type of codelist | Size of codelist (nodes) | Existing codelists |
|---|---|---|---|---|
| Linking status. Linkable, linked, non-linked, non-linkable, unknown | (*) Pointer to List | List of Strings | 5 | YES |
| Linked Data Set. URI of a linked dataset. Details of link: | (*)(m)(+)(a)(d) Pointer to List | List of URIs | No limit | - |
| Linking Type (PK match) | Pointer to List | List of Strings | 1 | - |
| Matching column of this resource | String | - | - | - |
| Matching column of linked resource | String | - | - | - |
| Columns of this resource, to be included | (m) String | - | - | - |
| Columns of linked resource, to be included | (m) String | - | - | - |
| Visualisations. Links to visualisations of current resource | (*)(m)(+)(a)(d) Pointer to List | List of URIs | No limit | - |
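The link details above (a PK match, one matching column on each side, and the columns to include from each resource) are enough to describe a simple key-based join. The sketch below shows one way this could work on rows held as dictionaries; the function name and sample data are hypothetical, and rows without a match are simply dropped, which is an assumption.

```python
def link_rows(this_rows, linked_rows, this_key, linked_key,
              this_columns, linked_columns):
    """Join two resources on matching key columns (PK match)."""
    # Index the linked resource by its matching column for fast lookup.
    index = {row[linked_key]: row for row in linked_rows}
    result = []
    for row in this_rows:
        match = index.get(row[this_key])
        if match is None:
            continue  # no partner row in the linked resource
        merged = {column: row[column] for column in this_columns}
        # If a column name exists on both sides, the linked value wins.
        merged.update({column: match[column] for column in linked_columns})
        result.append(merged)
    return result


budgets = [
    {"code": "A1", "budget": 100},
    {"code": "B2", "budget": 50},
]
populations = [
    {"id": "A1", "population": 1000},
]
print(link_rows(budgets, populations, "code", "id",
                ["code", "budget"], ["population"]))
```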

Page 15: On metadata for Open Data

G. Dates and Status

| Metadata Attribute | Type of Attribute | Type of codelist | Size of codelist (nodes) | Existing codelists |
|---|---|---|---|---|
| Consideration Started on | (v) DATE | - | - | - |
| Initial Approval / Planning Started on | (v) DATE | - | - | - |
| Planned to be valid on | (v) DATE | - | - | - |
| Validity Started on | (v) DATE | - | - | - |
| Validity to finish on | (v) DATE | - | - | - |
| Rejected on | (v) DATE | - | - | - |
| Substituted on | (v) DATE | - | - | - |
| Status. Considered, planned, valid, valid and linked, rejected, outdated, substituted. Automatic: calculation through DATES | (*) (a) Pointer to List | List of Strings | 8 | ENG |
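The automatic calculation of Status from the date attributes could be sketched as below. The slide only states that the status is calculated through the dates, so the precedence of the rules here (rejected first, then substituted, outdated, valid, planned, considered) and the `linked` parameter are assumptions made for the example.

```python
from datetime import date
from typing import Dict, Optional


def status(dates: Dict[str, Optional[date]], today: date,
           linked: bool = False) -> Optional[str]:
    """Derive a status value from the date attributes (assumed rules)."""

    def reached(name: str) -> bool:
        value = dates.get(name)
        return value is not None and value <= today

    if reached("Rejected on"):
        return "rejected"
    if reached("Substituted on"):
        return "substituted"
    if reached("Validity to finish on"):
        return "outdated"
    if reached("Validity Started on"):
        return "valid and linked" if linked else "valid"
    if reached("Initial Approval / Planning Started on"):
        return "planned"
    if reached("Consideration Started on"):
        return "considered"
    return None  # no date reached yet: status cannot be derived


example_dates = {
    "Consideration Started on": date(2011, 1, 10),
    "Initial Approval / Planning Started on": date(2011, 3, 1),
    "Validity Started on": date(2011, 6, 1),
    "Validity to finish on": date(2013, 6, 1),
}
print(status(example_dates, today=date(2012, 4, 25)))
```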

Page 16: On metadata for Open Data

H. Rating

| Metadata Attribute | Type of Attribute | Type of codelist | Size of codelist (nodes) | Existing codelists |
|---|---|---|---|---|
| Metadata Completeness. Automatic: calculated by filled / empty non mandatory items | Number (1-100) | - | - | - |
| Metadata Quality. Automatic: calculated by specific filled / empty non mandatory items | Number (1-100) | - | - | - |
| Citizen Rating. As reported / calculated by relative users | Number (1-100) | - | - | - |
| Researcher Rating. As reported / calculated by relative users | Number (1-100) | - | - | - |
| Business Rating. As reported / calculated by relative users | Number (1-100) | - | - | - |
| Number of Downloads. As reported by the ENGAGE System | Number | - | - | - |
| Density of Downloads. As number per total period of validity to date | Number % | - | - | - |
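Two of the automatic ratings above lend themselves to a short worked example. The sketch assumes that completeness is the plain share of filled non-mandatory attributes (without the future c0 to c3 weights) and that download density means downloads per day of validity; both readings, and the function names, are assumptions rather than the ENGAGE definitions.

```python
from datetime import date


def completeness(optional_values: dict) -> int:
    """Percentage of non-mandatory attributes that are filled in."""
    if not optional_values:
        return 100  # nothing optional to fill in
    filled = sum(
        1 for value in optional_values.values()
        if value is not None and value != "" and value != []
    )
    return round(100 * filled / len(optional_values))


def download_density(downloads: int, validity_start: date, today: date) -> float:
    """Downloads per day over the period of validity to date."""
    days = (today - validity_start).days
    return downloads / days if days > 0 else 0.0


print(completeness({"Keywords": ["health"], "Legal Framework": None,
                    "Intended Audience": "", "Scientific Sector": "Medicine"}))
print(download_density(300, date(2012, 1, 1), date(2012, 1, 31)))
```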

Page 17: On metadata for Open Data

An Infrastructure for Open, Linked Governmental Data Provision towards Research Communities and Citizens

Proposal Evaluation Hearing

Brussels 23/2/2011

Not to forget: metadata codelists were there since the Hearing … !

Page 18: On metadata for Open Data

Q6: Which types of metadata will you select?

• Exploit work already done by the consortium (DELFT, NTUA, AEGEAN, STFC) in public sector metadata schemas

• Multi-facet design: take into consideration the fact that the data may be used in different contexts, such as research, policy making or by citizens

• Take into consideration the fact that data sources may provide wildly differing metadata; go towards metadata standardisation for Open Data, a major contribution of ENGAGE

• Two-phase metadata design within the ENGAGE workplan (Task C1.2: Data and knowledge representation annotation and linking methods). The initial proposal, based on Dublin Core, the UK eGovernment Metadata Schema and eGMS+, is as follows:

Metadata ENGAGE Set

| Identifier | Title | Creator |
| Publisher | Country | Source |
| Type (*) | Format (*) | Language (*) |
| Sector (*) | Subject (*) | Keywords (*) |
| Relative Public Service (*) | Relative Information System | URL / URI / DOI |
| Validity Date (from – to) | Audience (*) | Legal Framework |
| Status (*) | Relevant Resources | Linked Data Sets (*) |

(*) Indicates Controlled Lists / Taxonomies