NYC Data Web (static version) - A Semantic, Open Public Data Exchange for NYC

DESCRIPTION

An Open Public Data Exchange for New York City, submitted to the NYCBigApps 2010 Challenge.

Joel Natividad, TCG

Thursday, June 9, 2011, SemTech 2011

NYC DataWeb: A Platform for Integrating Public Data into NYC.gov

About Me

• TCG Software

• Software Services arm of “The Chatterjee Group”

• Several portfolio companies in Life Sciences, Telecom, Aviation, Energy, Real Estate, & Info Technology

• Headquartered in NYC

• Delivery Centers in Bangalore, Kolkata & Mumbai

• I look after TCG's Knowledge Engineering practice

Background

Main Goals

• Stimulate development of apps that improve access to information and government transparency; and

• Encourage innovation & the creation of new IP with commercial potential

CROWDSOURCING

• Wisdom of the Crowd

• Self-selecting, motivated developers

• Bang for the Buck

• Ignites Entrepreneurship

CROWDSOURCING

• Challenge: Improve Recommendation Algorithm by 10%

• Dataset:

• 100 million ratings (training set)

• Half a million Users

• 18 thousand movies

• Prize: One million US dollars
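The 10% target in this challenge was measured as a reduction in root-mean-square error (RMSE) relative to the existing Cinematch algorithm. The sketch below shows that arithmetic on a handful of made-up ratings; the function names and the sample numbers are illustrative assumptions, not contest data.

```python
import math


def rmse(predicted, actual):
    """Root-mean-square error between predicted and actual ratings."""
    if len(predicted) != len(actual) or not actual:
        raise ValueError("need two non-empty lists of equal length")
    squared = sum((p - a) ** 2 for p, a in zip(predicted, actual))
    return math.sqrt(squared / len(actual))


def improvement_pct(baseline_rmse, candidate_rmse):
    """Percentage by which the candidate lowers the baseline RMSE."""
    return 100.0 * (baseline_rmse - candidate_rmse) / baseline_rmse


# Made-up ratings, purely for illustration.
actual = [4, 3, 5, 1]
baseline = [3, 3, 4, 2]            # stand-in for the incumbent algorithm
candidate = [3.5, 3, 4.5, 1.5]     # stand-in for a contestant's predictions

gain = improvement_pct(rmse(baseline, actual), rmse(candidate, actual))
print(f"improvement: {gain:.1f}%")
```

In this toy case the candidate halves every error, so the improvement is far above the 10% bar; on the real dataset each fraction of a percent was hard-won.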

STATISTICS

• Just 6 days into the contest, Cinematch was bested by 1%

• 20,000 Teams, 150 countries

• Entrants:

• Bell Labs

• Opera Solutions

• Well-renowned universities

CROWDSOURCING

• Washington DC CTO - Vivek Kundra

• First Federal CIO - Vivek Kundra

• Open Government Initiative

• Recovery.gov

• Data.gov

• USAspending.gov

• IT Dashboard

• Performance.gov

• Fedspace

• Citizen Services Dashboard

Life Support: budget slashed from $34 million to $8 million

Open Data in NYC

Council Member Gale Brewer

$500 million!!!

Why $500 million?!?!

“Integrated” Inter-Agency System

Data Integration Alphabet Soup

SOA, EAI, ORB, SOAP, RPC, XML, XSLT, JMS, EJB, MOM, MDA, BPM, BPEL, POJO

Principles

• Cost Effective (NOT $500 million)

• Easy to Use (Developers/Publishers/Citizens)

• Based on Open Standards

• Low Adoption Curve

• Help Accelerate Open Data Innovation

• Useable Data Now!

bionic hand

The Next Web of Open Linked Data, February 2009

Useable Data Now

• “Beautiful” Website

• Useable by Developers/Publishers/Citizens

• Based on Open Standards

• Low Adoption Curve

• Help Accelerate Open Data Innovation

• Useable Data Now!

What NYCBigApps Developers Were Doing

Siloed Data

ETL Processes

• Spent an inordinate amount of time interpreting data

• Massaged Data was then staged locally

• Developers kept reinventing the wheel

• Limited Data mashups

• Applications disconnected from NYCDatamine
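The pattern in these bullets can be sketched as a small extract/transform/load script. Everything here is hypothetical: the cryptic column headers, the hand-written renaming table, and the table name are invented for illustration and do not come from any real NYC Datamine file.

```python
import csv
import io
import sqlite3

# Hypothetical raw export with cryptic headers (not a real dataset).
RAW_CSV = """BORO_CD,FAC_NM,INSP_DT
1,Example Cafe,2010-11-09
3,Sample Deli,2010-12-01
"""

# The "decipher" step: a renaming table each developer worked out alone.
COLUMN_NAMES = {
    "BORO_CD": "borough_code",
    "FAC_NM": "facility_name",
    "INSP_DT": "inspection_date",
}


def extract(raw_text):
    """Parse the downloaded CSV text into a list of dict rows."""
    return list(csv.DictReader(io.StringIO(raw_text)))


def transform(rows):
    """Rename cryptic source columns to readable ones."""
    return [{COLUMN_NAMES[k]: v for k, v in row.items()} for row in rows]


def load(rows, conn):
    """Stage the massaged rows in a local database."""
    conn.execute(
        "CREATE TABLE inspections "
        "(borough_code TEXT, facility_name TEXT, inspection_date TEXT)"
    )
    conn.executemany(
        "INSERT INTO inspections "
        "VALUES (:borough_code, :facility_name, :inspection_date)",
        rows,
    )


conn = sqlite3.connect(":memory:")
load(transform(extract(RAW_CSV)), conn)
staged = conn.execute(
    "SELECT facility_name FROM inspections ORDER BY facility_name"
).fetchall()
print(staged)
```

Every team that wanted the same dataset had to repeat the renaming table and the local staging step, and the staged copy went stale as soon as the source changed.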

Download & Decipher

There must be a Better Way

How it Started

• Oct 12, 2010 - NYCBigApps 2.0 announced

• Nov 9, 2010 - NYCBigApps 2.0 kickoff meeting

• Late Nov 2010 - spoke with Revelytix/Spry about collaborating

• Early Dec 2010 - started work on NYCDataWeb

• Jan 26, 2011 ~4:30 pm - submitted entry

What We Did

[Architecture diagram. Labels: Query & Results; Siloed Data; Mapping Ontology, Metadata Ontology, Domain Ontology; Planner, Optimizer, Re-Writer, Rules, Indexes, Cache; Definitions]
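One way to read the diagram labels above: a query is written once against domain-level terms, and a mapping layer rewrites it into each silo's own column names. The sketch below is a toy illustration of that idea only. The mapping table, source names, and both functions are assumptions made up for this example; it is not how the Revelytix or Spry components actually work.

```python
# Hypothetical "mapping ontology": domain concept -> column name per source.
MAPPING = {
    "borough": {"restaurants": "BORO", "schools": "boro_name"},
    "name": {"restaurants": "DBA", "schools": "school_nm"},
}


def rewrite(domain_fields, source):
    """Translate domain-level field names into one source's column names."""
    missing = [f for f in domain_fields if source not in MAPPING.get(f, {})]
    if missing:
        raise KeyError(f"no mapping for {missing} in source {source!r}")
    return [MAPPING[f][source] for f in domain_fields]


def plan(domain_fields):
    """Return rewritten column lists for every source that covers all fields."""
    if not domain_fields:
        return {}
    sources = set.intersection(*(set(MAPPING[f]) for f in domain_fields))
    return {s: rewrite(domain_fields, s) for s in sorted(sources)}


# One domain-level request, answered per silo.
print(plan(["name", "borough"]))
```

The point of the design is that application code only ever mentions `name` and `borough`; knowledge of each silo's naming lives in one shared mapping rather than in every developer's scripts.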

“Beautiful” Website

Three dashboards were built

• NYC Agile Analytics (Spry)

• NYCreation (SMW+) - visualized SPARQL query results

• NYCmantics (SMW+) - NYC datamine explorer
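For readers who have not seen a SPARQL query, here is a minimal sketch of how a dashboard might package one as an HTTP GET request using the standard `query` parameter of the SPARQL Protocol. The endpoint URL is a placeholder (the talk does not give one), and the query is deliberately generic; no request is actually sent.

```python
from urllib.parse import parse_qs, urlencode, urlsplit

# Placeholder endpoint; an assumption for illustration only.
ENDPOINT = "http://example.org/sparql"

# A generic query: return any ten triples from the store.
QUERY = """
SELECT ?s ?p ?o
WHERE { ?s ?p ?o }
LIMIT 10
""".strip()


def build_request_url(endpoint, query):
    """Encode a SPARQL query into a GET URL via the 'query' parameter."""
    return endpoint + "?" + urlencode({"query": query})


url = build_request_url(ENDPOINT, QUERY)

# Decoding the URL again recovers the original query text.
decoded = parse_qs(urlsplit(url).query)["query"][0]
print(url)
```

A visualization layer would fetch this URL, ask for a results format such as JSON, and chart the returned bindings.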

What’s Next?

Semantic Gap

Developers ?!? 3.0

JumpStart Semantics

The Computer for the rest of us.

Semantics for the rest of us.

Semantics for the REST of us.

Phase 2: Aug 2011 (Powered by NYCDataWeb)

• Hide Complexity (Simplicity = Adoption)

• Incorporate the whole NYC datamine

• Make it easier for Publishers

• Make it easier for Developers

• Make it easier for Citizens

• Open-source collaboration with vendors & other institutions

• Incorporate the best of Socrata and data.gov

• Improved Visualizations

• Position NYCDataWeb as the accelerated data mashup platform

Phase 3: Nov 2011 (NYCBigApps 2011)

• DataWeb Deployment Framework SMW bundle

• More Data Sources (Federator - Spinner)

• Linked Open Data

• Make it easier STILL for Publishers, Developers and Citizens

• Enable widespread adoption of NYCDataWeb (NYCDataWeb bootcamp)

NYC Information Web

The Broader Vision

[Diagram. Labels: Query & Results; Domain Ontology; Ontology; RDF; Web Pages; Sensors; Partners; Agency Data; Other Triplestores]
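The idea behind this picture is that once every source publishes RDF triples, data from different publishers can be merged and queried by following shared identifiers. The sketch below shows that with plain Python sets; the `ex:` identifiers, the two sample graphs, and the helper functions are all invented for illustration and stand in for a real triplestore.

```python
# Each triple is (subject, predicate, object). All identifiers are made up.
AGENCY_TRIPLES = {
    ("ex:school42", "ex:locatedIn", "ex:Brooklyn"),
}
PARTNER_TRIPLES = {
    ("ex:Brooklyn", "ex:partOf", "ex:NYC"),
}


def merge(*graphs):
    """Combine graphs from several publishers; RDF merging is set union."""
    return set().union(*graphs)


def match(graph, s=None, p=None, o=None):
    """Return triples matching a pattern; None acts as a wildcard."""
    return sorted(
        t for t in graph
        if (s is None or t[0] == s)
        and (p is None or t[1] == p)
        and (o is None or t[2] == o)
    )


graph = merge(AGENCY_TRIPLES, PARTNER_TRIPLES)

# Follow a link that crosses publisher boundaries.
borough = match(graph, s="ex:school42", p="ex:locatedIn")[0][2]
city = match(graph, s=borough, p="ex:partOf")[0][2]
print(borough, city)
```

Neither source alone can say which city the school is in; the answer only appears after the merge, because both sources use the same identifier for the borough.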

Phase 4: Post NYCBigApps 2011

• Multiple solutions powered by NYCDataWeb

• <Your city/community/company here> DataWeb

• Help foster a viable ecosystem of Linked Data

• ... keep standing on the shoulders of giants

Semantic Web

Hans Rosling shows the best stats you've ever seen

February 2006

PUBLIC

We need your help & feedback

A Platform for Integrating Public Data into NYC.gov

Find out more at http://knoodl.com/ui/groups/NYC_Homepage

CREDITS

• Lego Faceparty picture by RichardAM (http://www.richard-am.net/)

• Lego Inauguration Pictures from various Flickr Users (sluggobear, Atwater, Dan Hontz)

• Lego Luke looses his Hand by Flickr user wwwayazdotcom

• Tim Berners-Lee highlight from TED (http://www.ted.com/talks/tim_berners_lee_on_the_next_web.html)

• Hans Rosling highlight from TED (http://www.ted.com/talks/hans_rosling_shows_the_best_stats_you_ve_ever_seen.html)

• FlowerPowerpont2.pptx provided by Anna Rosling Rönnlund of gapminder

• “Star Wars Gangsta Rap” highlight, SizzlechestXXX (http://www.youtube.com/watch?v=Ij4w7ChpuaM)

• Various screenshots provided by Revelytix, Spry Inc. and TCG Software Services