
Linked Data Overview - structured data on the web for US EPA 20140203


DESCRIPTION

This presentation provides a jargon-free overview of Linked Open Data, which the US EPA is using to publish US Government data. The Linked Data approach makes it easier to combine data from multiple sources and lowers costs.



Linked Data: Structured Data on the Web

(the jargon-free version) US EPA Linked Data

Bernadette Hyland, CEO

[email protected] @BernHyland

General: [email protected]

@3RoundStones Main +1-877-290-2127


Agenda

• Intros ...

• What is the need?

• Jargon-free overview of Linked Open Data

• Trends in data management

• Government data publication

• EPA is moving towards Linked Data


Demand for environmental data

• High demand for improved information platforms to publish, share and visualize integrated data

• e.g., chemicals, pollution, air quality, regulated facilities

• Goal: Increase data quality & comparability to facilitate access & re-use


Data Sharing & Management Snafu in 3 short acts: https://www.youtube.com/watch?feature=player_embedded&v=N2zK3sAtr-4


RDF is a lingua franca for data exchange


• Linked Data is about publishing and consuming data using international data standards

• Based on a 20+ year old idea

• A system of linked information systems


Governments

Goals: Governmental transparency and/or improved internal efficiencies (data warehouses)


“We’re moving from managing documents to managing discrete pieces of open data and content which can be tagged, shared, secured, mashed up and presented in the way that is most useful for the consumer of that information.”

-- Report on Digital Government: Building a 21st Century Platform to Better Serve the American People

What is driving us?


Global requirements

• Comprehensively link legislation & regulations for more effective government

• Explain context, source, version & publication date with the data itself

• We need global standards for metadata


US EPA publishes lots of CSV files ...
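
One way to picture the step from CSV files to Linked Data is to turn each row into subject-predicate-object statements. The following is a minimal sketch, not EPA's actual pipeline: the column names, the `example.org` URIs and the sample rows are all invented for illustration.

```python
import csv
import io

# Hypothetical CSV export; the columns and values are illustrative only.
SAMPLE_CSV = """facility_id,name,state
F001,Example Cement Plant,CA
F002,Example Paper Mill,OR
"""

# Hypothetical namespaces, not real EPA URIs.
BASE = "http://example.org/facility/"
VOCAB = "http://example.org/vocab#"


def csv_to_triples(text, id_column):
    """Turn each CSV row into (subject, predicate, object) triples."""
    triples = []
    for row in csv.DictReader(io.StringIO(text)):
        subject = BASE + row[id_column]
        for column, value in row.items():
            if column == id_column:
                continue  # the identifier becomes the subject URI
            triples.append((subject, VOCAB + column, value))
    return triples


triples = csv_to_triples(SAMPLE_CSV, "facility_id")
for triple in triples:
    print(triple)
```

Each row keeps its values, but the identifier column becomes a URI that other datasets can link to.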


40% annual growth in data produced

5% annual growth in IT spending

Digital Information Produced: 1.8 ZB (2012), 35 ZB (2020)

[Chart: Daily (2013) volumes of online ad impressions, emails and tweets; axis from 1 trillion to 5 trillion; data labels 294B, 230M and 4.8T]


314 million Total population

90 million software end users

55 million users of spreadsheets/databases

13 million “end user programmers”

3 million professional programmers

The United States in 2012


“Most programs today are written not by professional software developers, but by people with expertise in other domains working towards goals for which they need computational support.”


Readable by people

Data in the Physical World


Machine readable

Readable by motivated people


Someone else (we don’t know)

Schemas/Vocabularies


[email protected]

Which Copy?


Today’s Data on the Web


Lack of Context


Required Context


[Diagram: "my data" is linked by "collector" / "collected by" to a Person (first name Michael, last name Hausenblas) and by "measurement" to a measurement (date 2011-01-01, value 0, units of measure degrees Centigrade, collected at Galway Airport).]


Linked Data on the Web

[Diagram: the "my data" graph again: a collector who is a Person (first name Michael, last name Hausenblas) and a measurement (date 2011-01-01, value 0, units of measure degrees Centigrade, collected at Galway Airport).]
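
The graph on this slide can be written down as a handful of subject-predicate-object statements. A minimal sketch in Python follows. The `example.org` URIs and property names are made up for illustration and are not the identifiers used in the original diagram, and attaching "collected at" to the measurement is one reading of the drawing.

```python
# Hypothetical namespace; the slide does not give real URIs.
EX = "http://example.org/"

# Each statement is a (subject, predicate, object) triple.
# Objects that start with "http://" are links; the rest are plain values.
graph = [
    (EX + "my-data", EX + "collectedBy", EX + "people/michael"),
    (EX + "people/michael", EX + "type", EX + "Person"),
    (EX + "people/michael", EX + "firstName", "Michael"),
    (EX + "people/michael", EX + "lastName", "Hausenblas"),
    (EX + "my-data", EX + "measurement", EX + "measurements/1"),
    (EX + "measurements/1", EX + "date", "2011-01-01"),
    (EX + "measurements/1", EX + "value", "0"),
    (EX + "measurements/1", EX + "unitsOfMeasure", "degrees Centigrade"),
    (EX + "measurements/1", EX + "collectedAt", EX + "places/galway-airport"),
]


def describe(subject):
    """Return every predicate/object pair recorded for one subject."""
    return {p: o for s, p, o in graph if s == subject}


measurement = describe(EX + "measurements/1")
print(measurement[EX + "value"], measurement[EX + "unitsOfMeasure"])
```

Because the collector, the measurement and the place each have their own URI, other datasets can point at them without copying the values.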


Summary of Problems

• How can we archive our data in an open manner?

• How can we record data context?

• How can we record data provenance?

• How can we know whether our data is up to date?

• How can we share our data with others?


Linked Data is a way to answer these questions


Linked Data

• Provides an international standard mechanism to put reusable data on the World Wide Web

• Provides a single data model with multiple formats

• Provides context, provenance and access

• Allows for both human and machine reuse
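
The "single data model with multiple formats" point can be sketched by writing the same statements out in two ways. This is an illustrative sketch only: the triples and `example.org` URIs are placeholders, the first serializer follows the general shape of N-Triples lines, and the second produces plain JSON grouped by subject (not JSON-LD).

```python
import json

# Hypothetical triples; the example.org URIs are placeholders.
triples = [
    ("http://example.org/facility/F001",
     "http://example.org/vocab#name",
     "Example Cement Plant"),
    ("http://example.org/facility/F001",
     "http://example.org/vocab#locatedIn",
     "http://example.org/state/CA"),
]


def is_uri(value):
    """Treat anything that looks like an HTTP(S) URI as a link."""
    return value.startswith("http://") or value.startswith("https://")


def to_ntriples(data):
    """Serialize as N-Triples-style lines: <s> <p> <o> . or <s> <p> "literal" ."""
    lines = []
    for s, p, o in data:
        if is_uri(o):
            obj = "<" + o + ">"
        else:
            obj = '"' + o.replace("\\", "\\\\").replace('"', '\\"') + '"'
        lines.append("<" + s + "> <" + p + "> " + obj + " .")
    return "\n".join(lines)


def to_json(data):
    """Serialize the same statements as plain JSON grouped by subject."""
    grouped = {}
    for s, p, o in data:
        grouped.setdefault(s, {}).setdefault(p, []).append(o)
    return json.dumps(grouped, indent=2, sort_keys=True)


print(to_ntriples(triples))
print(to_json(triples))
```

Both outputs carry the same statements; only the syntax differs, which is why a consumer can pick whichever format suits its tools.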


Linked Data Principles

• Name data files and elements with URIs

• Use HTTP URIs so people can resolve them on the Web

• Provide useful information at those URIs, using the standards (RDF, SPARQL)

• Include links to other URIs so people can discover more information.
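
The four principles above can be sketched without touching the network. In the sketch below a dictionary stands in for the Web, `resolve` stands in for an HTTP GET, and every URI and property name is hypothetical. It shows the last principle in particular: starting from one URI and discovering more by following links.

```python
# A tiny in-memory stand-in for the Web: each HTTP URI "resolves" to a
# description. All URIs and properties here are hypothetical.
WEB = {
    "http://example.org/facility/F001": {
        "name": "Example Cement Plant",
        "reports": "http://example.org/report/R2004",
    },
    "http://example.org/report/R2004": {
        "year": "2004",
        "substance": "http://example.org/substance/mercury",
    },
    "http://example.org/substance/mercury": {
        "label": "Mercury",
    },
}


def resolve(uri):
    """Stand-in for an HTTP GET that returns data about a URI."""
    return WEB.get(uri, {})


def follow_links(start, limit=10):
    """Visit a URI, then every URI its description links to."""
    seen, queue = [], [start]
    while queue and len(seen) < limit:
        uri = queue.pop(0)
        if uri in seen:
            continue
        seen.append(uri)
        for value in resolve(uri).values():
            if value.startswith("http://"):
                queue.append(value)
    return seen


visited = follow_links("http://example.org/facility/F001")
print(visited)
```

Starting from the facility alone, the walk reaches the report and the substance, which is the discovery behaviour the last bullet describes.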


US EPA Linked Data

• Cloud-based Linked Data provision

• 2.9M Facilities (FRS)

• 100K substances (SRS)

• 25 years of toxic pollution reports (TRI)

• 3 years of chemical usage reports (CDR)

• Considering: Hazardous & non-hazardous waste management (RCRA) & GHG data

• FISMA compliant

• Millions of pages driven by < 20 Web templates

• Launch Spring 2014


From Wikipedia

From EPA

Open Street Map


HOW IT IS DONE TODAY ...


Audience for EPA Data

• Middle school student doing a science project

• Concerned citizen worried about local pollution

• Environmental Science PhD from EPA

• Doctor from NIH writing a research paper


How much mercury did Hanson Permanente Cement release in 2004?
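
Once the data is published as linked statements, a question like this becomes a pattern match over triples. SPARQL is the standard query language for that; the sketch below imitates the idea in plain Python. The URIs and property names are hypothetical, and the amounts are placeholder strings rather than figures from any real TRI report.

```python
EX = "http://example.org/"  # hypothetical namespace, not EPA's

# Hypothetical statements. The amounts are placeholder strings, not real
# figures from any TRI report.
triples = [
    (EX + "facility/hanson-permanente", EX + "name", "Hanson Permanente Cement"),
    (EX + "release/1", EX + "facility", EX + "facility/hanson-permanente"),
    (EX + "release/1", EX + "substance", EX + "substance/mercury"),
    (EX + "release/1", EX + "year", "2004"),
    (EX + "release/1", EX + "amount", "<amount from the TRI report>"),
    (EX + "release/2", EX + "facility", EX + "facility/hanson-permanente"),
    (EX + "release/2", EX + "substance", EX + "substance/mercury"),
    (EX + "release/2", EX + "year", "2005"),
    (EX + "release/2", EX + "amount", "<amount from another report>"),
]


def match(s=None, p=None, o=None):
    """Return triples matching a pattern; None acts as a wildcard."""
    return [t for t in triples
            if s in (None, t[0]) and p in (None, t[1]) and o in (None, t[2])]


def subjects(p, o):
    """Return the set of subjects that have the given predicate and object."""
    return {t[0] for t in match(p=p, o=o)}


# Step 1: find the facility by name. Step 2: find its mercury release for 2004.
facility = match(p=EX + "name", o="Hanson Permanente Cement")[0][0]
releases = (subjects(EX + "facility", facility)
            & subjects(EX + "substance", EX + "substance/mercury")
            & subjects(EX + "year", "2004"))
amounts = [t[2] for r in sorted(releases) for t in match(s=r, p=EX + "amount")]
print(amounts)
```

The two steps mirror how a person would ask the question: first find the facility, then find the matching release record.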


Envirofacts


Finding Hanson Permanente


Finding Mercury Released in 2004


Compliance Report


Potential Audience

• Middle school student doing a science project

• Concerned citizen worried about local pollution

• Environmental Science PhD from EPA

• Doctor from NIH writing a research paper

[On the slide, X marks cross out three of the four audiences listed above.]


Linked Data


Finding Hanson Permanente


Finding Mercury Released in 2004


TRI Report


Data Reuse


Potential Audience

• Middle school student doing a science project

• Concerned citizen worried about local pollution

• Environmental Science PhD from EPA

• Doctor from NIH writing a research paper


Increasing the audience of US EPA data consumers


NOAA, EPA AirNow, EPA Sunwise, Wikipedia, NLM


• Empower users to create their own views of data to satisfy different applications

• Build a community around the data in which users help each other to curate and connect as needed

• Skip the supermodel: leave data in the multiple “best of breed” systems, then wrap and expose it on the Web of Data

Increase re-use by publishing Linked Data
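
Leaving data in separate systems still allows a combined view when the systems share URIs. The following is a minimal sketch of that idea. The two sources, the place URI and all values are invented for illustration and do not come from any agency.

```python
# Hypothetical data from two independent publishers that happen to use the
# same URI for the same place. None of this is real agency data.
PLACE = "http://example.org/place/sample-city"

source_a = [(PLACE, "http://example.org/vocab#airQualityIndex", "42")]
source_b = [(PLACE, "http://example.org/vocab#uvIndex", "7"),
            (PLACE, "http://example.org/vocab#label", "Sample City")]


def merge(*sources):
    """Combine triples from several sources, dropping exact duplicates."""
    merged = []
    for source in sources:
        for triple in source:
            if triple not in merged:
                merged.append(triple)
    return merged


def view(triples, subject):
    """Build one user-defined view: everything known about a subject."""
    return {p.split("#")[-1]: o for s, p, o in triples if s == subject}


combined = view(merge(source_a, source_b), PLACE)
print(combined)
```

Neither source had to change its own storage. The shared URI is the only agreement needed to build the combined view.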


http://3roundstones.com/linking-government-data/

http://www.manning.com/dwood


Credits

Population density image (public domain): http://en.wikipedia.org/wiki/File:USA-2000-population-density.gif

2012 population estimate (CC-BY-SA): http://en.wikipedia.org/wiki/Demographics_of_the_United_States

Programmer estimates: Scaffidi, C.; Shaw, M.; Myers, B., "Estimating the numbers of end users and end user programmers," 2005 IEEE Symposium on Visual Languages and Human-Centric Computing, pp. 207-214, 20-24 Sept. 2005. doi: 10.1109/VLHCC.2005.34

End user programmer quote: Andrew J. Ko, Robin Abraham, Laura Beckwith, Alan Blackwell, Margaret Burnett, Martin Erwig, Chris Scaffidi, Joseph Lawrance, Henry Lieberman, Brad Myers, Mary Beth Rosson, Gregg Rothermel, Mary Shaw, and Susan Wiedenbeck. 2011. The state of the art in end-user software engineering. ACM Comput. Surv. 43, 3, Article 21.

Bag of chips idea: Open, Linked Data for a Global Community, Tim Berners-Lee, W3C, Gov2.0 Expo, Washington DC, May 25-27 2010. https://www.youtube.com/watch?v=1E7lV5_0M38

Social media icons: Courtesy of http://designreviver.com/freebies/6-free-new-social-icons-digg-twitter-stumble-rss-delicious-reddit/

Corporate and product logos, CAMC credit card image and book covers © their respective owners and used under Fair Use for educational purposes


This work is Copyright © 2011 3 Round Stones Inc. It is licensed under the Creative Commons Attribution 3.0 Unported License.

Full details at: http://creativecommons.org/licenses/by/3.0/

You are free:

to Share — to copy, distribute and transmit the work

to Remix — to adapt the work

Under the following conditions:

Attribution. You must attribute the work in the manner specified by the author or licensor (but not in any way that suggests that they endorse you or your use of the work).

Share Alike. If you alter, transform, or build upon this work, you may distribute the resulting work only under the same or similar license to this one.