
Proposed Linked Data Migration Framework for Singapore Government Datasets


Critical Inquiry Presentation on 'Designing a Linked Data Migrational Framework for Singapore Government Datasets'


Page 1: Proposed Linked Data Migration Framework for Singapore Government Datasets

Designing a Linked Data Migrational Framework for Singapore Government Data Sets

• Sesagiri Raamkumar Aravind
• Thangavelu Muthu Kumaar
• Kaleeswaran Sudarsan

MSc (KM) Critical Inquiry in Knowledge Management

Page 2: Proposed Linked Data Migration Framework for Singapore Government Datasets

AGENDA
• Basics of Linked Data
• data.gov.sg
• Purpose of this project
• Migrational Framework
• Eight Steps
• Use Cases
• Conclusion

Page 3: Proposed Linked Data Migration Framework for Singapore Government Datasets

OPPORTUNITY OF LINKING DATA ACROSS VARIOUS DOMAINS AND TYPES

Domains
• Governments
• Enterprises
• Libraries & Museums
• Social Media Data (Blogs, Facebook)
• Business
• Entertainment

Types of Data
• Factual Data
• Transactional Data
• Textual Data
• Spatial Data
• Multimedia
• Files & Database

Page 4: Proposed Linked Data Migration Framework for Singapore Government Datasets

Mr. Lee Kuan Yew: an exploration

Mr. Brendan Luyt’s associated publication search

(Traditional Approach) vs. (Linked Data Approach)

Others...

Page 5: Proposed Linked Data Migration Framework for Singapore Government Datasets

Linked Open Data Cloud (Web of Data)

Page 6: Proposed Linked Data Migration Framework for Singapore Government Datasets

Linked Open Data Cloud (Web of Data)

Page 7: Proposed Linked Data Migration Framework for Singapore Government Datasets

Data.gov.sg

• IDA Singapore launched the Data.gov.sg portal and mGov@SG public services in June 2011
• Data.gov.sg provides 5000+ public data sets from 50 government agencies
• Purpose: research and building applications using the data

Page 8: Proposed Linked Data Migration Framework for Singapore Government Datasets

SG Government Data Eco System

SG DATA (DGS Eco System) covers three kinds of data: TEXTUAL, SPATIAL and API.

TEXTUAL (unstructured data)
• Sources: Agency Websites, Singstat publications
• Formats: XLS, HTML, PDF

SPATIAL (structured data)
• THEMES (legend: C - Community, Cul - Culture, E - Environment, Emp - Employment, Edu - Education, H - Health, F - Family, R - Recreation, S - Sports):
ABC Water Proj (R), BFA Buildings (C), Green Building (E), Breast Screen (H), Cervical Screen (H), Healthier Dining (H), Quit Centers (H), Infocomm Access (C), Silver Infocomm (C), Wireless Hotspots (R), Child care (F), Disability (F), Elder care (F), Family (F), Family Friendly Estab (F), Student Care (F), Comm Mediation Center (C), After Death Facilities (E), Funeral Parlours (E), Dengue Cluster (H), Hawker Center (E), NEA Offices (E), Recycling Bins (E), Waste Disposal Site (E), Waste Treatment (E), Heritage sites (Cul), Monuments (Cul), Museums (Cul), Libraries (Cul), Streets and Places (Cul), CD Councils (C), Community Clubs (C), Constituency offices (C), Other facilities (C), Other Pan networks (C), PA headquarters (C), Residents Committee (C), Water Venture (C), National Parks (R), Skyrise greenery (E), Sports clubs (S), CET Centers (Emp), WDA Service points (Emp), Kindergartens (Edu)
• OPERATIONS: Get Token, Address Search, Agency Data Search, Static Map, Get Layer Info, Mashup, Get Related Data, Get Directions, Public Transportation, Reverse Geocode

API (structured data)
• CATEGORIES:
  • Map-related APIs from various agencies
  • Traffic-related APIs from Land Transport Authority
  • Tourism-related APIs from the Singapore Tourism Board
  • Environment-related APIs from the National Environment Agency
  • Library-related data feeds & web services from National Library Board

MINISTRIES
• Ministry of Community Development, Youth & Sports
• Ministry of Education
• Ministry of Foreign Affairs
• Ministry of Health
• Ministry of Law - Community Mediation Unit
• Ministry of Manpower
• Ministry of Transport

STATUTORY BOARDS
• Accountant-General's Department
• Accounting and Corporate Regulatory Authority
• Agency For Science, Technology & Research
• Attorney-General’s Chambers
• Building & Construction Authority
• Central Narcotics Bureau
• Central Provident Fund Board
• Civil Aviation Authority of Singapore
• Department of Statistics
• Economic Development Board
• Energy Market Authority
• Health Sciences Authority
• Housing & Development Board
• Immigration & Checkpoints Authority
• Infocomm Development Authority of Singapore
• Inland Revenue Authority of Singapore
• Institute of Technical Education
• Intellectual Property Office of Singapore
• JTC Corporation
• Judiciary, Subordinate Courts
• Judiciary, Supreme Court
• Land Transport Authority
• Majlis Ugama Islam Singapura
• Maritime & Port Authority of Singapore
• Media Development Authority
• Monetary Authority of Singapore
• Nanyang Polytechnic
• National Environment Agency
• National Heritage Board
• National Library Board
• National Parks Board
• Ngee Ann Polytechnic
• People's Association
• Public Service Division
• Public Transport Council
• Public Utilities Board
• Republic Polytechnic
• Sentosa Development Corporation
• Singapore Civil Defence Force
• Singapore Customs
• Singapore Land Authority
• Singapore Police Force
• Singapore Polytechnic
• Singapore Sports Council
• Singapore Workforce Development Agency
• Spring Singapore
• Temasek Polytechnic
• Urban Redevelopment Authority

Page 9: Proposed Linked Data Migration Framework for Singapore Government Datasets

Drawbacks of Existing Data Ecosystem

• Siloed architecture

• Absence of vocabulary standardization (common language)

• Multiple data consumption end points

• Steep learning curve for developers during the application development process

• Absence of interlinking between data sets

Linked Data addresses the drawbacks identified above at multiple levels:

• Data Storage: can support distributed storage

• Data Representation: common format (RDF) for both data and metadata

• Data Consumption: via a single endpoint (SPARQL)

• Data Interlinking: use of ontologies (vocabularies)

IDA can use Linked Data on top of its traditional systems instead of going for a complete overhaul

Page 10: Proposed Linked Data Migration Framework for Singapore Government Datasets

http://wheredoesmymoneygo.org/bubbletree-map.html#/~/grand-total--2010-

http://www.sgdi.gov.sg/

http://labs.data.gov.uk/gov-structure/departments/

UK Linked Data Implementation

Page 11: Proposed Linked Data Migration Framework for Singapore Government Datasets

Linked Data Representation Format

RDF: Subject - Predicate - Object

Example: Jurong belongs to the West Zone

Subject: http://data.gov.sg/resource/area/Jurong_West

Predicate: http://data.gov.sg/ontology/property/has_zone

Object: http://data.gov.sg/resource/zone/West

http://www.w3.org/2003/01/geo/wgs84_pos#lat http://www.w3.org/2003/01/geo/wgs84_pos#long

12.55550.21222
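The subject-predicate-object statement above can be written down as a small sketch. This is a minimal illustration in plain Python (no RDF library), assuming the three URIs shown on the slide; the `Triple` type and the `to_ntriples` helper are hypothetical names introduced here only for illustration, and the output format is the standard N-Triples line syntax.

```python
from typing import NamedTuple


class Triple(NamedTuple):
    """One RDF statement. `obj` is a URI unless `is_literal` is True."""
    subject: str
    predicate: str
    obj: str
    is_literal: bool = False


def to_ntriples(triple: Triple) -> str:
    """Serialize a single triple as one N-Triples line."""
    if triple.is_literal:
        # Escape backslashes and double quotes inside plain literals.
        escaped = triple.obj.replace("\\", "\\\\").replace('"', '\\"')
        obj = f'"{escaped}"'
    else:
        obj = f"<{triple.obj}>"
    return f"<{triple.subject}> <{triple.predicate}> {obj} ."


# "Jurong belongs to the West Zone", using the URIs from the slide.
jurong = Triple(
    "http://data.gov.sg/resource/area/Jurong_West",
    "http://data.gov.sg/ontology/property/has_zone",
    "http://data.gov.sg/resource/zone/West",
)
print(to_ntriples(jurong))
```

Each statement becomes one self-describing line, which is why the same format can carry both data and metadata.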

Page 12: Proposed Linked Data Migration Framework for Singapore Government Datasets

Why are we doing this project?

To prescribe a migrational framework for Linked Data for data.gov.sg (DGS) data sets:

• First-hand view of the required migration activities

• Issues anticipated at each step

• Evaluation & recommendation of Linked Data tools

To help IDA in understanding the benefits of Linked Data

Page 13: Proposed Linked Data Migration Framework for Singapore Government Datasets

Framework Formulation Process

• Based on a study of Linked Data migration research papers and cookbooks published by the World Wide Web Consortium (W3C)

• Analysis of Linked Data implementations in the UK, US and Brazil

• Evaluation of Linked Data tools with Singapore data sets for recommendation in each step of the framework

• Consideration of probable issues that could be faced during implementation

Page 14: Proposed Linked Data Migration Framework for Singapore Government Datasets

Datasets Used for Framework Evaluation

• URA Sites for Sales dataset (Urban Planning)

• DOS Population and Household Characteristics dataset (Population Demographics)
  • Age Pyramid of Resident Population
  • Old Age Support Ratio

Page 15: Proposed Linked Data Migration Framework for Singapore Government Datasets

Proposed Linked Data Migrational Framework for DGS

Step 1: Specification Identification & Analysis
• Personnel: Govt Agencies and IDA
• Input: Objectives, Specifications, Project Duration
• Process: Dataset Prioritization, Dataset License Setting, Implementation Mode Selection
• Output: Roadmap, Architecture Overview
• Resource Allocation: 10

Step 2: Object Modeling
• Personnel: Govt Agencies Domain Matter Experts
• Input: Relational Model, Dataset Overview
• Process: Drawing Objects in Whiteboard
• Output: Conceptual View
• Resource Allocation: 15

Step 3: Ontology Modeling (Re-use, Create)
• Personnel: Ontology Modelers
• Input: Conceptual View, Public Vocabularies
• Process: Re-use of Existing Vocabularies, Creation of New Vocabularies
• Output: OWL, RDFS, RDF Vocabulary files
• Resource Allocation: 15

Step 4: URI Naming
• Personnel: IDA and Web Architects
• Input: Resources, Class and Properties
• Process: Visualization of URI mining process
• Output: URI Administration, URI Lifecycle
• Resource Allocation: 5

Step 5: RDF Creation (S2R, D2R, A2R)
• Personnel: Developers
• Input: ER Model, Spreadsheets, DBMS, API
• Process: Conversion to RDF triples using Mapping files
• Output: RDF Triples
• Resource Allocation: 20

Step 6: External Linking
• Personnel: Developers and Domain Experts
• Input: Government and external data sets
• Process: Linking based on Similarity Algorithms
• Output: Outbound Links
• Resource Allocation: 5

Step 7: Datasets Publication
• Personnel: Developers
• Input: RDF Triples, Ontologies
• Process: SPARQL, API, Data Insertion, VOID Modeling, Data Retrieval, API to SPARQL conversion
• Output: VOID Triples, JSON data
• Resource Allocation: 15

Step 8: Discovery & Exploitation
• Personnel: Web Architects
• Input: Actual Data, Existing Apps
• Process: Gamification, Crowdsourcing, Catalog Registration, External Reference
• Output: New Apps
• Resource Allocation: 15

Page 16: Proposed Linked Data Migration Framework for Singapore Government Datasets

Specification Identification

Addressing security concerns with licenses:
• The Open Database License (ODbL)
• Open Data Commons Attribution License
• The Creative Commons Licenses

Implementation modes:
• Linked Data only (just URI linking): 1st level. Ideal for testing the URI lifecycle. Decision on URI administration: centralized (DGS) vs. decentralized (agency).
• Linked Data + RDF: 2nd level. Complete realization of Linked Data and Semantic Web standards. The decision to use this mode can be taken after evaluation of the POC.
• Linked Data for files only (URIs for files): optional. To improve the discovery of files in DGS through semantic annotation.

Analysis
• Understand data.gov.sg database specifications (relational model & ER model)
• Seven issues identified at data storage and consumption level

Page 17: Proposed Linked Data Migration Framework for Singapore Government Datasets

Object Modeling

This is modeling without usage context. It requires normalization of the database model to 3NF.

Issues
• Possibility of applying high abstraction and high granularity to objects

Key Learning
• Ease in identifying the use of common objects across data sets
• Facilitates brainstorming of relationships between objects

Page 18: Proposed Linked Data Migration Framework for Singapore Government Datasets

Ontology Modeling

Takes the output conceptual diagram from Object Modeling as input.

Key Impetus
• Re-use of popular vocabularies (table below)
• Use of the StdTrip methodology for arriving at ontologies for relational databases

Predicate/Vocabularies: Purpose
• rdfs:label and skos:prefLabel: Naming things
• Geonames: Model spatial data
• VoID Description: Describe RDF schema or vocabulary
• vCard: Describing address
• RDF, RDFS: Model simple data

Use Case Problem Statement
Consider an industrial entrepreneur intending to buy a site from the Urban Redevelopment Authority (URA)

Issues
• Conflicting vocabulary in data.gov.sg and OneMap
• Different levels of granularity in datasets (ex: Location in URA ‘Site for Sales’ dataset)

Page 19: Proposed Linked Data Migration Framework for Singapore Government Datasets


Ontology Modeling

Date fields, location fields and fields related to measurements in DGS have scope for vocabulary re-use

Vocabulary for the identified data sets (developed using Protege) with screenshots

List of vocabularies required for LOGD implementation

List of tools used for ontology modeling

Output • Allocation Percentage • Personnel Involved

Page 20: Proposed Linked Data Migration Framework for Singapore Government Datasets

URI Naming

Assigning a Uniform Resource Identifier (URI) to every resource is analogous to assigning an IP address to every computer.

Identified URI Administration Modes
1.) Maintained centrally in the DGS platform (resultant URIs will start with http://data.gov.sg/) - RECOMMENDED
2.) Maintained by individual agencies (resultant URIs will start with http://ura.gov.sg or http://sla.gov.sg)
3.) Maintained externally by third party platforms such as Kasabi (resultant URIs will start with http://data.kasabi.org)

TBox (ontology terms) and ABox (instances)
• http://data.gov.sg/ontology/Ministry/ : http://data.gov.sg/ministry/MOH
• http://data.gov.sg/ontology/Agency/ : http://data.gov.sg/agency/SLA
• http://data.gov.sg/ontology/SiteLocation : http://data.gov.sg/location/pioneer_road_north
• http://data.gov.sg/ontology/Race : http://data.gov.sg/race/chinese

• Dataset ID: URAstaticfile001
• Dataset: http://data.gov.sg/dataset/URAstaticfile001/
• Class: http://data.gov.sg/terms/class/URAstaticfile001/sitesforsale
• Property: http://data.gov.sg/terms/property/URAstaticfile001/time
• Row 1: http://data.gov.sg/dataset/URAstaticfile001/1
• Row 1 - A generic column: http://data.gov.sg/dataset/URAstaticfile001/1/columnName

Dataset URIs
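The dataset URI pattern above can be sketched as a handful of naming helpers. This is only an illustration under the centrally administered mode (URIs starting with http://data.gov.sg); the function names are hypothetical, and percent-encoding of path segments is an added assumption, not something the slides prescribe.

```python
from urllib.parse import quote

BASE = "http://data.gov.sg"  # centrally administered mode from the slide


def _segment(value) -> str:
    """Percent-encode one path segment so spaces or slashes cannot break the URI."""
    return quote(str(value), safe="")


def dataset_uri(dataset_id: str) -> str:
    return f"{BASE}/dataset/{_segment(dataset_id)}/"


def class_uri(dataset_id: str, class_name: str) -> str:
    return f"{BASE}/terms/class/{_segment(dataset_id)}/{_segment(class_name)}"


def property_uri(dataset_id: str, property_name: str) -> str:
    return f"{BASE}/terms/property/{_segment(dataset_id)}/{_segment(property_name)}"


def row_uri(dataset_id: str, row: int) -> str:
    return f"{BASE}/dataset/{_segment(dataset_id)}/{row}"


def cell_uri(dataset_id: str, row: int, column: str) -> str:
    return f"{row_uri(dataset_id, row)}/{_segment(column)}"


print(dataset_uri("URAstaticfile001"))
print(class_uri("URAstaticfile001", "sitesforsale"))
print(cell_uri("URAstaticfile001", 1, "columnName"))
```

Keeping every URI behind one set of helpers is one way to reduce the risk that different Linked Data tools produce inconsistent names.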

Issues
• Usage of different Linked Data tools can hamper URI naming
• Possibility of dead links

Page 21: Proposed Linked Data Migration Framework for Singapore Government Datasets

RDF Creation

Evaluated three tools, one for each mode of conversion: Google Refine, RDF Views and RDF Sponger

Issues
• Absence of notification about API outages can cause the system to return null or invalid results
• Google Refine doesn’t create URIs for each row in the static file
• Changes to data.gov.sg tables or API output that are made without corresponding changes in the mapping files will affect RDF conversion
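Conversion with a mapping file can be sketched as follows. This is a simplified stand-in, not how Google Refine, RDF Views or RDF Sponger work internally: the mapping is a plain dictionary from column name to property URI, the `location` property and the sample row are hypothetical, and the row URI pattern follows the dataset URI scheme from the URI Naming step. The sketch mints a URI for every row and fails loudly when a mapped column has disappeared from the source.

```python
import csv
import io

# Hypothetical mapping "file": column name -> property URI.
MAPPING = {
    "location": "http://data.gov.sg/terms/property/URAstaticfile001/location",
    "time": "http://data.gov.sg/terms/property/URAstaticfile001/time",
}


def rows_to_triples(csv_text: str, dataset_id: str, mapping: dict) -> list:
    """Convert CSV rows to (subject, predicate, literal) triples."""
    triples = []
    reader = csv.DictReader(io.StringIO(csv_text))
    for row_number, row in enumerate(reader, start=1):
        # Every row gets its own URI.
        subject = f"http://data.gov.sg/dataset/{dataset_id}/{row_number}"
        for column, prop in mapping.items():
            if column not in row:
                # The source changed without a matching mapping update.
                raise KeyError(f"column {column!r} missing from source data")
            value = row[column]
            if value:  # skip empty cells
                triples.append((subject, prop, value))
    return triples


sample = "location,time\nPioneer Road North,2011\n"  # made-up sample row
for triple in rows_to_triples(sample, "URAstaticfile001", MAPPING):
    print(triple)
```

Raising an error on a missing column surfaces mapping drift at conversion time instead of silently producing incomplete triples.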


Page 22: Proposed Linked Data Migration Framework for Singapore Government Datasets

External Linking

External Linking is connecting with other data sets in the web of data

Data.gov.sg can link to: WorldBank, CIA World Factbook, DBpedia, FAO, Geonames, Supreme Court, Flickr

<http://data.gov.sg/location/bugis> owl:sameAs <http://dbpedia.org/resource/Bugis> .
<http://data.gov.sg/race/malay> owl:sameAs <http://dbpedia.org/resource/Malay_race> .

Issues
• Outbound links made to data sets outside of IDA’s purview can be risky
• Dead links are a real possibility when resource URIs change or during system downtime
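Similarity-based linking can be sketched with a simple string comparison. This is an illustration only, not the algorithm used by SILK or LIMES: it compares the last path segment of each URI using Python's `difflib.SequenceMatcher` and proposes an `owl:sameAs` link when the best score reaches a threshold. The helper names, the threshold value and the candidate URIs are assumptions for the example.

```python
from difflib import SequenceMatcher

OWL_SAME_AS = "http://www.w3.org/2002/07/owl#sameAs"


def local_name(uri: str) -> str:
    """Last path segment of a URI, normalized for comparison."""
    return uri.rstrip("/").rsplit("/", 1)[-1].replace("_", " ").lower()


def propose_links(local_uris, external_uris, threshold=0.9):
    """Propose owl:sameAs links for the best match above the threshold."""
    links = []
    for local in local_uris:
        best_score, best_match = 0.0, None
        for external in external_uris:
            score = SequenceMatcher(None, local_name(local), local_name(external)).ratio()
            if score > best_score:
                best_score, best_match = score, external
        if best_match is not None and best_score >= threshold:
            links.append((local, OWL_SAME_AS, best_match))
    return links


local = ["http://data.gov.sg/location/bugis", "http://data.gov.sg/race/malay"]
external = ["http://dbpedia.org/resource/Bugis", "http://dbpedia.org/resource/Malay_race"]

for link in propose_links(local, external):
    print(link)
```

With a strict threshold only the exact name match is proposed; a looser threshold also pairs `malay` with `Malay_race`, which is the kind of candidate a domain expert would review before publishing the link.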


Page 23: Proposed Linked Data Migration Framework for Singapore Government Datasets

Datasets Publication

Linked Data API call: http://data.gov.sg/lda/childcare/north

The LDA-SPARQL mapping file converts the call into a SPARQL query:

    SELECT ?cc
    WHERE {
      ?cc dgs:haszone dgs:north .
      ?cc dgs:facilitytype dgs:childcare .
    }
    LIMIT 100

The triple store returns RDF triples:

http://data.gov.sg/facility/cc/name1
http://data.gov.sg/facility/cc/name2
http://data.gov.sg/facility/cc/name3
...
http://data.gov.sg/facility/cc/name100

Conversion from RDF to JSON

JSON Output:

Entry: name1
Entry: name2
...
Entry: name100
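The API-to-SPARQL conversion and the RDF-to-JSON step can be sketched in a few lines. This is a hedged illustration, not the Linked Data API implementation: the `dgs:` namespace URI is an assumption (the slides do not give it), the function names are hypothetical, and the triple store itself is left out, so the second function simply takes the resource URIs a store would return.

```python
import json

# Assumed namespace for the dgs: prefix; the slides do not define it.
PREFIX = "PREFIX dgs: <http://data.gov.sg/terms/>"


def lda_path_to_sparql(path: str, limit: int = 100) -> str:
    """Map an API path such as 'childcare/north' to a SPARQL query."""
    facility, zone = path.strip("/").split("/")
    for part in (facility, zone):
        if not part.isalnum():
            # Reject anything that could inject query syntax.
            raise ValueError(f"unexpected path segment: {part!r}")
    return (
        f"{PREFIX}\n"
        "SELECT ?cc\n"
        "WHERE {\n"
        f"  ?cc dgs:haszone dgs:{zone} .\n"
        f"  ?cc dgs:facilitytype dgs:{facility} .\n"
        "}\n"
        f"LIMIT {limit}"
    )


def bindings_to_json(resource_uris) -> str:
    """Turn the resource URIs returned by the store into a JSON document."""
    entries = [{"entry": uri.rsplit("/", 1)[-1], "uri": uri} for uri in resource_uris]
    return json.dumps(entries, indent=2)


print(lda_path_to_sparql("childcare/north"))
print(bindings_to_json(["http://data.gov.sg/facility/cc/name1"]))
```

Application developers call a plain URL and receive JSON, while the SPARQL query stays hidden behind the mapping.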

Issues
• Difficulty for application developers: SPARQL 1.0 does not support sub-queries, views, stored procedures etc. (sub-queries were introduced in SPARQL 1.1)
• Inferencing is not possible with the Linked Data API
• Security implementation with 3rd party Linked Data hosting platforms

Datasets Publication components: Triple Store, Metadata Publication, Linked Data API, Linked Data Hosting

Recommendations
• Linked Data hosting platforms are best suited for open license datasets (ex: Singstat publications)
• Use of APIs for updating RDF triples instead of SPARQL Update documents
• Use of VoID generators for creating statistics triples
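What a VoID generator produces can be sketched by counting over a set of triples. This is a minimal illustration, not a specific generator tool: it uses the standard VoID statistics properties (`void:triples`, `void:distinctSubjects`, `void:properties`, `void:distinctObjects`), while the function name and the tiny input are hypothetical.

```python
VOID = "http://rdfs.org/ns/void#"
RDF_TYPE = "http://www.w3.org/1999/02/22-rdf-syntax-ns#type"


def void_statistics(dataset_uri: str, triples: list) -> list:
    """Build VoID statistics triples describing a list of (s, p, o) triples."""
    subjects = {s for s, _, _ in triples}
    predicates = {p for _, p, _ in triples}
    objects = {o for _, _, o in triples}
    return [
        (dataset_uri, RDF_TYPE, VOID + "Dataset"),
        (dataset_uri, VOID + "triples", len(triples)),
        (dataset_uri, VOID + "distinctSubjects", len(subjects)),
        (dataset_uri, VOID + "properties", len(predicates)),
        (dataset_uri, VOID + "distinctObjects", len(objects)),
    ]


# Hypothetical input: three triples about two subjects.
data = [("s1", "p1", "o1"), ("s1", "p2", "o1"), ("s2", "p1", "o2")]
for statistic in void_statistics("http://data.gov.sg/dataset/URAstaticfile001/", data):
    print(statistic)
```

Publishing these counts alongside the dataset lets consumers judge its size and shape before querying it.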


Page 24: Proposed Linked Data Migration Framework for Singapore Government Datasets

Discovery & Exploitation

Key Theme
1.) Internal discovery within Singapore for local citizens
2.) External discovery for attracting usage of Singapore government data in international economic & political research and global issues (water scarcity, carbon footprint etc.)

• Internal discovery can be improved by offering different end points (SPARQL, API, Apps, RDF dumps), running awareness programs on the availability of these data sets, and employing crowdsourcing and gamification techniques to enhance their visibility and utility

• External discovery is optional if IDA wishes to limit the DGS system to Singapore's purview. It can be initiated by registering the datasets in open government dataset portals (potential candidates are datasets with an open license)

Page 25: Proposed Linked Data Migration Framework for Singapore Government Datasets

Interlinked Datasets Post-Migration

• Original data provided by URA

• Similarly, location-based data from the OneMap API is retrieved for Pasir Ris

• Possible because of the re-use of the common resource URI Pasir Ris across data sets

Page 26: Proposed Linked Data Migration Framework for Singapore Government Datasets

Other Interesting Use Cases

Definitely not Science Fiction!

Q & A Engine that works on top of government linked data. Inspired by www.trueknowledge.com

Page 27: Proposed Linked Data Migration Framework for Singapore Government Datasets

Sense-Making

Question: Which recent year had a growth rate close to 50% for the majority of Singapore-based SMEs?

Step 1: Spot the resources in this query
DBpedia Spotlight does just that! (Semantic Information Extraction)

Which recent year had a growth rate close to 50% for majority of Singapore based SME

Step 2: Identify the relationships between the resources

• SME is an instance of the Organization class
• Organization class comes under Singapore country
• Growth rate is a property of the Sales class
• Year is a class by itself
• Majority is a subset of the Group class

Step 3: Use an NLP technique, Syntactic Analysis (Stanford Parser) followed by Focus Extraction, for understanding the question

A syntactic parse tree is generated, followed by the Access Pattern

Step 4: Look for RDF triples that meet the criteria

2010 is returned as the result!

Page 28: Proposed Linked Data Migration Framework for Singapore Government Datasets

Summary

Four in-person discussion sessions with IDA, NIIT and SLA

Analysis of Five data.gov.sg system specifications

Evaluation of Four existing Migration Frameworks

Prototyping with Six core Linked Data Tools

• Dataset Publication: Virtuoso Universal Server, Linked Data API

• External Linking: SILK, LIMES

• RDF Creation: Google Refine, RDF Views, RDF Sponger

• URI Naming: Pubby

• Ontology Modeling: Protégé

• Object Modeling: Concept Map

Page 29: Proposed Linked Data Migration Framework for Singapore Government Datasets

Summary
• Applicability of the framework to Singapore Government Data
• Issues identified in existing Data Eco System
• Recommended tools and best practices for each step
• Launchpad for SG Linked Data implementation

Final Thoughts…
• ROI is not a key metric for Linked Data implementation
• Benefits of moving to Linked Data are intangible and may not be immediately realizable
• Volume of work is huge compared to traditional systems