A presentation by 3 Round Stones to the US EPA on the new Linked Open Data Management System, including Linked Open Data on 4M facilities (from FRS), 25 years of the Toxics Release Inventory (TRI), chemical substances (SRS), and Resource Conservation and Recovery Act (RCRA) content. This represents one of the largest Open Data projects published by a federal government agency using Open Source Software (OSS), Open Web Standards and government Open Data.
Bernadette Hyland, CEO & co-founder
David Wood, CTO & co-founder
1400 Key Blvd, Ste 100
Arlington VA 22209
Tel. +1-877-290-2127
[email protected] @BernHyland
[email protected] @prototypo
[email protected] @3RoundStones
Extend Your Reach.
Better Data. Smarter Decisions
Resource Conservation and Recovery Act information published as
Linked Open Data
Presented: 20-Nov-2014
With everything else happening in the world, why does
Linked Open Data matter anyway?
Taxpayers spend billions of dollars for our government to
collect data
We, the people, expect government
to treat information as an asset. Information must comply with
regulations (Information Quality Act, Section 508, PII protection) and
should be: public, accessible, described, reusable, complete, timely,
sustainable over election cycles
Credits:
WV Chemical spill: http://www.nytimes.com/2014/01/11/us/west-virginia-chemical-spill.html
Hurricane Sandy: http://www.nytimes.com/2012/10/28/us/hurricane-sandy-on-collision-course-with-winter-storm.html
Ebola: http://www.nytimes.com/interactive/2014/07/31/world/africa/ebola-virus-outbreak-qa.html
Linked Open Data: a fast way to combine, visualize &
share data from government & the public Web
Vital to first responders, scientists, policy makers, journalists &
the general public
US Federal Government is listening … “Open Data” per OMB Memorandum M-13-13
• Public
• Accessible
• Described
• Reusable
• Complete
• Timely
• Managed Post-Release
• Project Open Data
• OMB & OSTP online tools, best practices & schema to help agencies implement M-13-13. See Project Open Data
• On May 9, 2014, the Digital Accountability & Transparency Act (DATA Act) became Public Law 113-101
The goal of treating Information as an asset is
not new …
“Linked Data was part of my initial vision for the Web and is an important part of the Web’s future. The Web
took off as a web of hyperlinked documents which were exciting to read, but which could not be
effectively used as data.”
- Tim Berners-Lee
We all know the ground truth of data on the Web
Lots of [government] open data without labels or context
What is needed is … data that describes itself
Linked Open Data is called “self-describing” data
Linked Data is “A method of publishing structured data so that it can be interlinked &
become more useful. … Extends Web pages to share information in a way that can be
read automatically by computers.”
- Sir Tim Berners-Lee
Linked Data on the Web
[Diagram: a small graph of self-describing data]
• my data: collected by a collector; has a measurement
• collector: a Person; first name Michael; last name Hausenblas
• measurement: a measurement; date 2011-01-01; value 0; units of measure degrees Centigrade; ...
• collected at: Galway Airport
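As a rough illustration of what "self-describing" means here, the measurement on this slide can be written as subject-predicate-object statements. This is only a sketch: every `ex:` name is a hypothetical identifier invented for the example, and attaching "collected at" to the measurement (rather than to the dataset) is an assumption, since the slide text does not settle it.

```python
# Sketch: the slide's measurement as (subject, predicate, object) triples.
# All "ex:" identifiers are hypothetical; only the values come from the slide.
triples = [
    ("ex:myData", "ex:collectedBy", "ex:collector"),
    ("ex:collector", "rdf:type", "foaf:Person"),
    ("ex:collector", "foaf:firstName", "Michael"),
    ("ex:collector", "foaf:lastName", "Hausenblas"),
    ("ex:myData", "ex:measurement", "ex:m1"),
    ("ex:m1", "ex:date", "2011-01-01"),
    ("ex:m1", "ex:value", "0"),
    ("ex:m1", "ex:unitsOfMeasure", "degrees Centigrade"),
    # Assumption: the location describes the measurement itself.
    ("ex:m1", "ex:collectedAt", "Galway Airport"),
]


def describe(subject, graph):
    """Return every (predicate, object) pair stated about a subject."""
    return [(p, o) for s, p, o in graph if s == subject]


# Everything a consumer needs to interpret the value travels with it.
for predicate, obj in describe("ex:m1", triples):
    print(predicate, "->", obj)
```

The point of the sketch is that the value 0 is never published alone: its date, units and location are stated alongside it, so a consumer does not have to guess.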
Quick update on US Government
Open Data Project
• On 3rd iteration of a data catalog (now using CKAN)
• >500k datasets from 200+ USG authorities
• Sustained executive support for data.gov via OMB & OSTP - Project Open Data
• GSA team engaging with Open Data / OSS / standards community
• Health, Energy, Law, Education & Public Safety specific communities in place
• Agencies are [beginning] to name Chief Data Officers
But we still have a lot to do …
RCRA = Resource
Conservation and Recovery
Act
A search for “EPA RCRA” displays
the first RCRA dataset in 6th position :-(
This dataset is just one piece of a complex set
of data needed to understand solid
waste reporting
First 5 results are for
Facilities Registry
Service …
For example, The Right-to-Know Network is a
consumer of EPA open data from
data.gov
They’ve built some nice
visualizations!
But the Toxics Release Inventory (TRI) is
complicated data. The RTK Network
would have benefited from more context
had it been available from the EPA…
RTK Network provides access
to machine readable content (as XML) but … it lacks context
This data does not use shared vocabularies
:-( No units of
measure, no definition of codes
The power of Open
Apps created in days using Open Government Data + Open Source
+ Open Web Standards … On the cloud
Linked Data Management System
For government open data publishing
Funded by
Landing page for new EPA Open Data site
Search for facilities in your neighborhood…
Click through to an individual facility
Site allows people to view by map or by table
This app shows nuclear power plants regulated by EPA
Apps using data from multiple EPA programs +
Open Data
Key to data sources:
1 Open Street Maps (OSS)
2 Raw data available for developers (RDF/XML)
3 EPA Resource Conservation and Recovery Act (RCRA)
4 & 5 EPA Facilities (FRS)
6 EPA Toxics Release Inventory (TRI)
Pollution graphs created in < 1 week using Open Source Software & EPA Linked Data
Pollution reports from multiple EPA programs available for a facility, not previously possible
Shared vocabularies, e.g. Places, Geographic, Dublin Core, Geo, FOAF, ORG and vCard, are the “lingua franca” of data interoperability
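One way to see why shared vocabularies help: a short prefixed name such as foaf:name expands to the same global URI for every publisher who uses it. The sketch below uses the published namespace URIs for several of the vocabularies named on this slide; the `expand` helper is a hypothetical name for this example, not part of any EPA or Callimachus API.

```python
# Published namespace URIs for some of the vocabularies named on the slide.
NAMESPACES = {
    "dcterms": "http://purl.org/dc/terms/",
    "geo": "http://www.w3.org/2003/01/geo/wgs84_pos#",
    "foaf": "http://xmlns.com/foaf/0.1/",
    "org": "http://www.w3.org/ns/org#",
    "vcard": "http://www.w3.org/2006/vcard/ns#",
}


def expand(prefixed_name):
    """Expand a prefixed name such as 'foaf:name' into a full URI.

    Hypothetical helper for illustration only.
    """
    prefix, sep, local = prefixed_name.partition(":")
    if not sep or prefix not in NAMESPACES:
        raise KeyError(f"unknown vocabulary prefix in {prefixed_name!r}")
    return NAMESPACES[prefix] + local


print(expand("foaf:name"))
print(expand("vcard:hasAddress"))
```

Two datasets that both use vcard:hasAddress are, after expansion, using the identical term, which is what lets data from separate programs be combined without a custom mapping.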
V4 Handler Module
V5 Handler Module
• HHANDLER5
• HBASIC
• HNAICS5
• LU_NAICS
• HSTATE_ACTIVITY5
• LU_STATE_ACTIVITY
• HOWNER_OPERATOR5
• HUNIVERSAL_WASTE5
• LU_UNIVERSAL_WASTE
• HWASTE_CODE5
• LU_WASTE_CODE
• HCERTIFICATION5
• HOTHER_PERMIT5
New Linked Data Model
[Diagram: RCRA handler data modeled as Linked Data. Labels recovered from the diagram; "a" marks a node's class.]
• Nodes and classes: (RCRA Facility ID) a (Appropriate RCRA Class), subClassOf Handler; (address) a vcard:Address, vcard:Postal; (land type) a rcra:LandType; (non-notifier code) a rcra:NonNotifierCode; (RCRA Activity) a rcra:Activity; (source type) a rcra:ReportType or rcra:ActivityType; (accessibility) a rcra:AccessibilityCode; (State or Region) a frs:State or frs:Region; (universal waste type) a rcra:UniversalWasteType; (waste type) a rcra:WasteType; (certification) a rcra:Certification; (point of contact) a foaf:Person; (other permit) a rcra:OtherPermit
• Other nodes: (FRS Facilities ID), (FRS State), (FRS Region), (Owner/Operator), (State Activity Type), (NAICS code)
• Links between nodes: owl:sameAs, frs:state, frs:region, foaf:based_near, vcard:hasAddress, rcra:isNonNotifer, rcra:landType, rcra:hasActivity, rcra:reportType or rcra:activityType, rcra:inaccessibleDueTo, rcra:has_naics, rcra:state_activity, rcra:has_current_owner, rcra:has_past_owner, rcra:has_current_operator, rcra:has_past_operator, rcra:hasRegulator, rcra:reportsUniversalWasteType, rcra:reportsWasteType, rcra:hasCertification, rcra:hasPOC, rcra:hasOtherPermit
• Address fields: street-address, locality, region, postal-code, country-name, county-name
• Descriptive and literal properties: rdfs:label, rdfs:comment, foaf:name, foaf:title, rcra:receivedReportOn (xsd:Date), rcra:reportedInCycle, rcra:naics_cycle, rcra:hasPermitNumber; rcra:active_status, rcra:isActive, rcra:accumulated, rcra:generated (xsd:Boolean); rcra:hasCertificationSequence and rcra:certifiedOn (xsd:Integer appears beside them)
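To make the model above concrete, here is a minimal sketch of two statements about a handler, written out as N-Triples lines. In the diagram owl:sameAs appears to connect a (RCRA Facility ID) to an (FRS Facilities ID); that reading, the `example.org` base URIs, the sample IDs and the `handler_triples` helper are all assumptions for illustration. The owl: and rdfs: URIs are the standard W3C ones.

```python
OWL_SAME_AS = "http://www.w3.org/2002/07/owl#sameAs"
RDFS_LABEL = "http://www.w3.org/2000/01/rdf-schema#label"

# Hypothetical base URIs; the real EPA identifiers are not given in the slides.
RCRA_BASE = "http://example.org/rcra/facility/"
FRS_BASE = "http://example.org/frs/facility/"


def escape_literal(text):
    """Escape backslashes and double quotes for an N-Triples string literal."""
    return text.replace("\\", "\\\\").replace('"', '\\"')


def handler_triples(rcra_id, frs_id, label):
    """Return N-Triples lines labelling a handler and linking it to its FRS record."""
    subject = f"<{RCRA_BASE}{rcra_id}>"
    return [
        f'{subject} <{RDFS_LABEL}> "{escape_literal(label)}" .',
        f"{subject} <{OWL_SAME_AS}> <{FRS_BASE}{frs_id}> .",
    ]


# Made-up IDs and name, used only to show the shape of the output.
for line in handler_triples("RCRA-EXAMPLE-1", "FRS-EXAMPLE-1", "Example Plant"):
    print(line)
```

The owl:sameAs statement is what would let pollution reports from several programs be gathered for one facility: a consumer that follows the link can treat the RCRA and FRS records as the same real-world site.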
Issues: Incomplete Human-readable descriptions
LU_UNIVERSAL_WASTE
• Still missing California code descriptions (3)
• Should be entered in V5 Handler Module
LU_STATE_ACTIVITY
• 118 of 224 codes missing descriptions (53%)
• Alabama’s recovered from https://rcrainfo.epa.gov/rcrainfo/help/dataentry/rpt_lu_state_activity.pdf
• Should be entered in V5 Handler Module
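The 53% figure is 118 / 224 rounded to the nearest whole percent. A check like the one below could be run against any lookup table to find codes without a human-readable description; the function names and the sample rows are hypothetical, and only the 118 and 224 counts come from the slide.

```python
def missing_descriptions(lookup):
    """Return the codes whose description is absent, empty or only whitespace."""
    return sorted(code for code, desc in lookup.items() if not (desc or "").strip())


def percent_missing(missing, total):
    """Share of codes lacking a description, rounded to a whole percent."""
    return round(100 * missing / total)


# Counts reported for LU_STATE_ACTIVITY on the slide.
print(percent_missing(118, 224))

# Made-up rows showing the kinds of gaps the check catches.
sample = {"A1": "Example activity", "B2": "", "C3": None, "D4": "   "}
print(missing_descriptions(sample))
```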
LU_WASTE_CODE
• Most descriptions excluded, e.g.
• “from br conversion”
• “Description”
• “?”
• Some descriptions cleaned, e.g.
• “from br conversion [UN1255 is the UN-NA code for petroleum naphtha]” → “petroleum naphtha”
• “WASTE PCBs” → “Waste PCBs”
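The exclusions and clean-ups listed above could be approximated with a few rules. This is a sketch that reproduces only the examples on the slide, not the process actually used: the placeholder list, the bracket pattern, the first-word capitalization rule and the `clean_description` name are all assumptions.

```python
import re

# Values treated as "no description" (taken from the slide's examples).
PLACEHOLDERS = {"from br conversion", "description", "?"}

# Assumed pattern: a bracketed note of the form "[... is the UN-NA code for NAME]".
BRACKET_NOTE = re.compile(r"\[.*\bis the UN-NA code for (?P<name>[^\]]+)\]")


def clean_description(raw):
    """Return a cleaned description, or None if the value should be excluded."""
    text = raw.strip()
    note = BRACKET_NOTE.search(text)
    if note:
        # Keep only the substance name from the bracketed note.
        return note.group("name").strip()
    if not text or text.lower() in PLACEHOLDERS:
        return None
    words = text.split()
    # Naive rule: soften a shouted first word, leave the rest untouched.
    if words[0].isupper():
        words[0] = words[0].capitalize()
    return " ".join(words)


for raw in ["?", "WASTE PCBs",
            "from br conversion [UN1255 is the UN-NA code for petroleum naphtha]"]:
    print(repr(raw), "->", repr(clean_description(raw)))
```

The first-word rule is deliberately narrow; a description that begins with an acronym would be altered incorrectly, so real data would need a reviewed exception list.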
Q&A Next steps for RCRA as
Linked Data
WeatherHealth: A mobile app for chronic asthma/COPD
patients with weather alerts
Funded by
[Diagram: User; data sources: NOAA, US EPA AirNow, DBpedia, National Library of Medicine, US EPA SunWise]
Orgpedia: An open organizational data project
on public & private companies
Funded by
Callimachus apps allow for crowdsourcing
How did we handle data publishing & application
development for US EPA, Sentara Healthcare &
Orgpedia?
The leading Web application server for Linked Data
Fanatically standards compliant **
Used to create data-driven applications that combine data across silos
** http://www.w3.org/2013/data/
[Diagram labels: <HTML>; Enterprise Data (read/write); Documents (point to, include)]
Our customers use Callimachus to:
Create responsive apps with many different data sources &
types of data
CONTENT MANAGEMENT SYSTEM
LINKED DATA MANAGEMENT SYSTEM
UNSTRUCTURED TEXT
STRUCTURED DATA
Callimachus Enterprise customers are creating data-driven applications with data from leading
graph databases:
Do not recreate the wheel!
Summary
• Billions of dollars are spent by taxpayers for government to collect useful information - e.g., geospatial data, population, healthcare, medicine & clinical trials, environment, energy, law, education …
• Data consumers must help government to fulfill its goal to treat “information as an asset” by participating & giving feedback
• Steady forward progress has been made; however, take care not to re-create the wheel!
• Use Open Data, Open Source, Web standards & published best practices whenever possible
• More work to be done …
Additional Resources
• “Open by Default” presentation by Dr. David Wood to Virginia Commonwealth officials 10/7/2014, see http://www.slideshare.net/3roundstones/open-by-default-39976290
– Open Data is the idea that “certain data should be freely available to everyone to use and republish as they wish, without restrictions from copyright, patents or other mechanisms of control”. Open Data follows similar “open” concepts that have proven to be valuable in the information economy such as Open Standards, Open Source Software, Open Content and has been followed more recently by variations on the theme such as Open Science and Open Government.
– Linked Data Developer website, see http://linkeddatadeveloper.com/
– Linked Data: Structured Data on the Web, see http://books.google.com/books/about/Linked_Data.html?id=rA8-mQEACAAJ
– Add Linked Data to HTML with RDFa.info, see http://semanticweb.com/new-resource-for-web-developers-announced-add-linked-data-to-html_b28813
– See also RDFa website on GitHub, see https://github.com/rdfa/rdfa-website