This presentation provides a jargon-free overview of Linked Open Data. The US EPA is using Linked Data for US Government data publication. The Linked Data approach makes it easier to combine data from multiple sources and reduces costs.
Linked Data: Structured Data on the Web
(the jargon-free version) US EPA Linked Data
Bernadette Hyland, CEO
[email protected] @BernHyland
General: [email protected]
@3RoundStones Main +1-877-290-2127
Agenda
• Intros ...
• What is the need?
• Jargon-free overview of Linked Open Data
• Trends in data management
• Government data publication
• EPA is moving towards Linked Data
Demand for environmental data
• High demand for improved information platforms to publish, share and visualize integrated data
• e.g., chemicals, pollution, air quality, regulated facilities
• Goal: Increase data quality & comparability to facilitate access & re-use
Data Sharing & Management Snafu in 3 short acts: https://www.youtube.com/watch?feature=player_embedded&v=N2zK3sAtr-4
RDF is a lingua franca for data exchange
• Linked Data is about publishing and consuming data using international data standards
• Based on a 20+ year old idea
• A system of linked information systems
Governments
Goals: Governmental transparency and/or improved internal efficiencies (data warehouses)
“We’re moving from managing documents to managing discrete pieces of open data and content which can be tagged, shared, secured, mashed up and presented in the way that is most useful for the consumer of that information.”
-- Report on Digital Government: Building a 21st Century Platform to Better Serve the American People
What is driving us?
Global requirements
• Comprehensively link legislation & regulations for more effective government
• Explain context, source, version & publication date with the data itself
• We need global standards for metadata
US EPA publishes lots of CSV files ...
[Chart: Digital Information Produced. 1.8 ZB in 2012, 35 ZB in 2020; 40% annual growth in data produced versus 5% annual growth in IT spending]
[Chart: Daily (2013) volumes of online ad impressions, emails and tweets, on an axis running from 1 trillion to 5 trillion; values shown: 294B, 230M, 4.8T]
The United States in 2012
• 314 million total population
• 90 million software end users
• 55 million users of spreadsheets/databases
• 13 million “end user programmers”
• 3 million professional programmers
“Most programs today are written not by professional software developers, but by people with expertise in other domains working towards goals for which they need computational support.”
Readable by people
Data in the Physical World
Machine readable
Readable by motivated people
Someone else (we don’t know)
Schemas/Vocabularies
Which Copy?
Today’s Data on the Web
Lack of Context
Required Context
my data
• collected by: collector, a Person (first name: Michael; last name: Hausenblas)
• measurement, a measurement (date: 2011-01-01; value: 0; units of measure: degrees Centigrade; ...)
• collected at: Galway Airport
Linked Data on the Web
my data
• collected by: collector, a Person (first name: Michael; last name: Hausenblas)
• measurement, a measurement (date: 2011-01-01; value: 0; units of measure: degrees Centigrade; ...)
• collected at: Galway Airport
Summary of Problems
• How can we archive our data in an open manner?
• How can we record data context?
• How can we record data provenance?
• How can we know whether our data is up to date?
• How can we share our data with others?
Linked Data is a way to answer these questions
Linked Data
• Provides an international standard mechanism to put reusable data on the World Wide Web
• Provides a single data model with multiple formats
• Provides context, provenance and access
• Allows for both human and machine reuse
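A minimal sketch of the "single data model with multiple formats" point, reusing the weather measurement from the earlier slide. Everything under http://example.org/ is a made-up placeholder, and the names URI, TRIPLES, to_ntriples and to_json are hypothetical helpers written for this illustration, not part of any EPA system or RDF library. The "my data" and "measurement" nodes are collapsed into one subject for brevity, and json.dumps stands in for proper N-Triples string escaping.

```python
import json

EX = "http://example.org/"  # hypothetical namespace, for illustration only
RDF_TYPE = "http://www.w3.org/1999/02/22-rdf-syntax-ns#type"  # what "a" abbreviates


class URI(str):
    """Marks a string as a URI so it can be told apart from a literal value."""


# One data model: a list of (subject, predicate, object) triples.
TRIPLES = [
    (URI(EX + "data/1"), URI(RDF_TYPE), URI(EX + "vocab/Measurement")),
    (URI(EX + "data/1"), URI(EX + "vocab/collectedBy"), URI(EX + "people/michael")),
    (URI(EX + "data/1"), URI(EX + "vocab/collectedAt"), URI(EX + "places/galway-airport")),
    (URI(EX + "data/1"), URI(EX + "vocab/date"), "2011-01-01"),
    (URI(EX + "data/1"), URI(EX + "vocab/value"), "0"),
    (URI(EX + "data/1"), URI(EX + "vocab/unitsOfMeasure"), "degrees Centigrade"),
    (URI(EX + "people/michael"), URI(RDF_TYPE), URI(EX + "vocab/Person")),
    (URI(EX + "people/michael"), URI(EX + "vocab/firstName"), "Michael"),
    (URI(EX + "people/michael"), URI(EX + "vocab/lastName"), "Hausenblas"),
]


def to_ntriples(triples):
    """Format 1: one line per triple, in the style of N-Triples."""
    lines = []
    for s, p, o in triples:
        # URIs go in angle brackets; literals become quoted strings.
        obj = f"<{o}>" if isinstance(o, URI) else json.dumps(o)
        lines.append(f"<{s}> <{p}> {obj} .")
    return "\n".join(lines)


def to_json(triples):
    """Format 2: the same triples grouped by subject as plain JSON.

    This is an ad hoc layout, not a standards-conformant RDF serialization.
    """
    grouped = {}
    for s, p, o in triples:
        value = {"@id": str(o)} if isinstance(o, URI) else o
        grouped.setdefault(str(s), {}).setdefault(str(p), []).append(value)
    return json.dumps(grouped, indent=2, sort_keys=True)


if __name__ == "__main__":
    print(to_ntriples(TRIPLES))
    print(to_json(TRIPLES))
```

Both functions read the same list of triples, so the context (who collected the value, where and when) travels with the measurement regardless of the output format.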
Linked Data Principles
• Name data files and elements with URIs
• Use HTTP URIs so people can resolve them on the Web
• Provide useful information at those URIs, using the standards (RDF, SPARQL)
• Include links to other URIs so people can discover more information.
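The four principles above can be sketched as simple checks over a handful of triples. This is an illustration under stated assumptions: the http://example.org/ URIs are placeholders, the DBpedia URI for Galway Airport is shown only as an example of an outbound link and has not been verified, and is_http_uri, names_are_http_uris, external_links and match are hypothetical helper names. The match function is a loose in-memory stand-in for a single triple pattern of the kind SPARQL uses; it is not SPARQL.

```python
from urllib.parse import urlparse

EX = "http://example.org/"  # hypothetical namespace, for illustration only
SAME_AS = "http://www.w3.org/2002/07/owl#sameAs"

TRIPLES = [
    (EX + "data/1", EX + "vocab/collectedAt", EX + "places/galway-airport"),
    (EX + "data/1", EX + "vocab/value", "0"),
    (EX + "data/1", EX + "vocab/unitsOfMeasure", "degrees Centigrade"),
    # Outbound link to another dataset (illustrative target URI).
    (EX + "places/galway-airport", SAME_AS, "http://dbpedia.org/resource/Galway_Airport"),
]


def is_http_uri(name):
    """Principles 1 and 2: the name is a URI that can be resolved over HTTP."""
    parts = urlparse(name)
    return parts.scheme in ("http", "https") and bool(parts.netloc)


def names_are_http_uris(triples):
    """Every subject and predicate is named with an HTTP URI."""
    return all(is_http_uri(s) and is_http_uri(p) for s, p, _ in triples)


def external_links(triples, home_host):
    """Principle 4: object URIs that point at a host other than our own."""
    return sorted(
        {o for _, _, o in triples if is_http_uri(o) and urlparse(o).netloc != home_host}
    )


def match(triples, s=None, p=None, o=None):
    """Principle 3, loosely: return triples matching a pattern; None is a wildcard."""
    return [
        t for t in triples
        if (s is None or t[0] == s)
        and (p is None or t[1] == p)
        and (o is None or t[2] == o)
    ]


if __name__ == "__main__":
    print(names_are_http_uris(TRIPLES))
    print(external_links(TRIPLES, "example.org"))
    print(match(TRIPLES, s=EX + "data/1", p=EX + "vocab/value"))
```

A real deployment would serve RDF at each of these URIs and answer queries through a SPARQL endpoint; the checks here only show what the principles ask of the names and links themselves.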
US EPA Linked Data
• Cloud-based Linked Data provision
• 2.9M Facilities (FRS)
• 100K substances (SRS)
• 25 years of toxic pollution reports (TRI)
• 3 years of chemical usage reports (CDR)
• Considering: Hazardous & non-hazardous waste management (RCRA) & GHG data
• FISMA compliant
• Millions of pages driven by < 20 Web templates
• Launch Spring 2014
From Wikipedia
From EPA
Open Street Map
HOW IT IS DONE TODAY ...
Audience for EPA Data
• Middle school student doing a science project
• Concerned citizen worried about local pollution
• Environmental Science PhD from EPA
• Doctor from NIH writing a research paper
How much mercury did Hanson Permanente Cement release in 2004?
Envirofacts
Finding Hanson Permanente
Finding Mercury Released in 2004
Compliance Report
Potential Audience
• Middle school student doing a science project
• Concerned citizen worried about local pollution
• Environmental Science PhD from EPA
• Doctor from NIH writing a research paper
✔
XX
X
Linked Data
Finding Hanson Permanente
Finding Mercury Released in 2004
TRI Report
Data Reuse
Potential Audience
• Middle school student doing a science project ✔
• Concerned citizen worried about local pollution ✔
• Environmental Science PhD from EPA ✔
• Doctor from NIH writing a research paper ✔
Increasing the audience of US EPA data consumers
NOAA EPA AirNow EPA Sunwise
Wikipedia NLM
• Empower users to create their own views of data to satisfy different applications
• Build a community around the data in which users help each other to curate and connect as needed
• Skip the supermodel - Leave data in the multiple “best of breed” systems; wrap and expose on the Web of Data
Increase re-use by publishing Linked Data
http://3roundstones.com/linking-government-data/
http://www.manning.com/dwood
Credits
Population density image (public domain): http://en.wikipedia.org/wiki/File:USA-2000-population-density.gif
2012 population estimate (CC-BY-SA): http://en.wikipedia.org/wiki/Demographics_of_the_United_States
Programmer estimates: Scaffidi, C.; Shaw, M.; Myers, B., "Estimating the numbers of end users and end user programmers," 2005 IEEE Symposium on Visual Languages and Human-Centric Computing, pp. 207-214, 20-24 Sept. 2005. doi: 10.1109/VLHCC.2005.34
End user programmer quote: Andrew J. Ko, Robin Abraham, Laura Beckwith, Alan Blackwell, Margaret Burnett, Martin Erwig, Chris Scaffidi, Joseph Lawrance, Henry Lieberman, Brad Myers, Mary Beth Rosson, Gregg Rothermel, Mary Shaw, and Susan Wiedenbeck. 2011. The state of the art in end-user software engineering. ACM Comput. Surv. 43, 3, Article 21.
Bag of chips idea: Open, Linked Data for a Global Community, Tim Berners-Lee, W3C, Gov2.0 Expo, Washington DC, May 25-27 2010. https://www.youtube.com/watch?v=1E7lV5_0M38
Social media icons: Courtesy of http://designreviver.com/freebies/6-free-new-social-icons-digg-twitter-stumble-rss-delicious-reddit/
Corporate and product logos, CAMC credit card image and book covers © their respective owners and used under Fair Use for educational purposes.
This work is Copyright © 2011 3 Round Stones Inc. It is licensed under the Creative Commons Attribution 3.0 Unported License.
Full details at: http://creativecommons.org/licenses/by/3.0/
You are free:
to Share — to copy, distribute and transmit the work
to Remix — to adapt the work
Under the following conditions:
Attribution. You must attribute the work in the manner specified by the author or licensor (but not in any way that suggests that they endorse you or your use of the work).
Share Alike. If you alter, transform, or build upon this work, you may distribute the resulting work only under the same or similar license to this one.