DESCRIPTION
Government open data strategies aimed at wider access and re-use by entrepreneurs, publishers and the wider US healthcare delivery industry. Presented at the OMG Standards Community technical workshop on semantics, held in Reston, VA on 20 March 2013. Presentation by Bernadette Hyland, CEO of 3 Round Stones, Inc. and co-chair of the W3C Government Linked Data Working Group.
The Power of Linked Data for Government and Healthcare Information Integration
By Bernadette Hyland
CEO 3 Round Stones, co-chair W3C Gov’t Linked Data WG
This presentation on http://slideshare.net/3roundstones
OMG Technical Meeting Special Event, Reston VA
20-Mar-2013
1Wednesday, March 20, 13
Agenda
• Government data publication on the Web
• Update on EPA Linked Data Service
• Healthcare Delivery Industry’s Appetite
• Update on W3C Government Linked Data Working Group
3 Round Stones produces the leading platform for the publication of reusable data on the Web. Our commercially supported Open Source platform is used by the Fortune 2000 and US Government agencies to collect, publish and reuse data, both on the public Internet and behind institutional firewalls.
http://www.manning.com/dwood/
http://3roundstones.com/linking-government-data/
http://3roundstones.com/linking-enterprise-data/
US EPA Linked Data
• Cloud-based Linked Data provision of 3 core programs:
• 2.9M Facilities
• 100K substances
• 25 years of toxic pollution reports
• FISMA compliant
• 16 Callimachus templates
• Official launch April 2013
US GPO
• Cloud-based Linked Data provision of persistent URLs for US Government documents:
• 100k+ documents
• Used by 1,240 Federal Depository Libraries and public
• In 3rd year of operation
• Deemed an “Essential service” supporting US Congress
Big Data
Simple data
Complex data
Legacy data
Open Government Data
“We’re moving from managing documents to managing discrete pieces of open data and content which can be tagged, shared, secured, mashed up and presented in the way that is most useful for the consumer of that information.”
-- Report on Digital Government: Building a 21st Century Platform to Better Serve the American People
Growing chorus ...
Governments
Goals: Governmental transparency and/or improved internal efficiencies (data warehouses)
Open data + open standards + open platforms
Highly scalable computing on the Cloud
Open Web Standards
5 Star Data (Linked Data), whenever possible
Leverage Open Source tools where practical
Use a non-proprietary format
• Open Web data exchange formats
• RDF instead of CSV
• Benefits
• Accessibility, Interoperability & Re-use
• Reduces the risks of
• “Super model” data warehouse approach
• Budget & schedule overruns
• Confidential info leakage
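As a rough illustration of "RDF instead of CSV", the sketch below turns CSV rows into N-Triples lines using only the Python standard library. The base URI, the vocabulary namespace, the column names and the sample row are all assumptions made up for this example; they are not taken from any EPA or GPO dataset, and column names are assumed to be simple enough to use directly in a URI.

```python
import csv
import io

# Hypothetical namespaces, for illustration only.
BASE = "http://example.org/facility/"
VOCAB = "http://example.org/vocab#"


def escape_literal(value: str) -> str:
    """Escape a string so it can be written as an N-Triples literal."""
    return (
        value.replace("\\", "\\\\")
        .replace('"', '\\"')
        .replace("\n", "\\n")
        .replace("\r", "\\r")
    )


def csv_to_ntriples(csv_text: str, id_column: str) -> list[str]:
    """Emit one triple per non-empty cell, keyed by the row's identifier."""
    triples = []
    for row in csv.DictReader(io.StringIO(csv_text)):
        # The identifier column becomes the subject URI of every triple.
        subject = f"<{BASE}{row[id_column]}>"
        for column, value in row.items():
            if column == id_column or not value:
                continue
            triples.append(
                f'{subject} <{VOCAB}{column}> "{escape_literal(value)}" .'
            )
    return triples


# Made-up sample row; not real data.
SAMPLE = "id,name,state\n42,Example Plant,VA\n"
for line in csv_to_ntriples(SAMPLE, "id"):
    print(line)
```

Unlike the CSV row, each emitted statement carries its own subject and property identifiers, so it can be combined with statements from other publishers without agreeing on a shared table layout first.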
Universal Identifiers
• It’s the foundation of the Web
• Others can reference things
• Two references with the same URI refer to the same thing
• Quick, easy and scalable
• People keep coming back for more!!
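A minimal sketch of why shared URIs matter: when two sources use the same URI for a thing, their statements can be merged with a plain set union, with no join keys to negotiate. The URIs, property names and values below are invented for illustration and do not come from any real dataset.

```python
from collections import defaultdict

# Statements are (subject, predicate, object) tuples.
# Everything below is made up for illustration.
FACILITY = "http://example.org/id/facility/123"

source_a = {
    (FACILITY, "http://example.org/vocab#name", "Example Cement Plant"),
}
source_b = {
    (FACILITY, "http://example.org/vocab#locatedIn", "http://example.org/id/city/9"),
    (FACILITY, "http://example.org/vocab#name", "Example Cement Plant"),
}


def merge(*graphs):
    """Merge graphs by set union; identical statements collapse into one."""
    merged = set()
    for graph in graphs:
        merged |= graph
    return merged


def describe(graph, subject):
    """Collect everything any source said about one URI."""
    description = defaultdict(set)
    for s, p, o in graph:
        if s == subject:
            description[p].add(o)
    return dict(description)


combined = merge(source_a, source_b)
print(len(combined))
print(describe(combined, FACILITY))
```

The duplicate name statement appears once in the merged set, and the location statement from the second source attaches to the same facility because both sources used the same identifier.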
HELPING DEFINE THE PROCESS
Identify, Model, Name, Describe, Convert, Publish, Maintain
A Path to Success
• Start with the basics
• Well curated datasets with relevant data
• Integrate related datasets (e.g., EPA chemical substances, toxic releases & facilities)
• Reach out to developers early
• Emphasize the internal agency benefit
• Address data quality ...
• Multiple approaches including crowdsourcing
Social responsibility of government publishers
• Must specify a license for use
• Publish frequency of data updates
• Ensure data is as accurate as possible
• Recognize responsibility to maintain data
• Document & follow a persistence strategy
• Respond to reports of problematic data
Callimachus
http://callimachusproject.org
http://3roundstones.com
[Figure labels: Content Management System (unstructured text); Linked Data Management System, Callimachus (structured data)]
Guidance for developers
From Wikipedia
From EPA
Open Street Map
We’ve Seen This Before
User
NOAA
US EPA AirNow
DBpedia
National Library of Medicine
US EPA SunWise
How much mercury did Elisa’s local cement plant release in 2004?
Linked Data Approach
Finding Hanson Permanente
Finding Mercury Released in 2004
TRI Report
Data Reuse
Potential Audience
• Middle school student doing a science project ✔
• Concerned citizen worried about local pollution ✔
• Environmental Science PhD from EPA ✔
• Doctor from NIH writing a research paper ✔
Active PURLs for Clinical Study Aggregation
David Wood1 and Tom [email protected], [email protected]
The problem: No coordinated view of clinical study information. Information is distributed across departments, subsidiaries and government data sources.
The solution: Gather, convert, aggregate and format for display
• User resolves a single URI to an Active PURL
• Multiple targets queried independently
• HTTP-accessible endpoints capable of returning XML or textual content
• Convert XML or textual results to RDF
• Render RDF to HTML via template
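The gather, convert, aggregate and render pipeline can be sketched roughly as follows. This is not the Callimachus implementation: the function names, the XML-to-triples mapping, the stand-in sources and the example identifier are all assumptions for illustration, and the choice to skip a failing target rather than abort is a design decision of this sketch only.

```python
import html
import xml.etree.ElementTree as ET


def xml_to_triples(subject, xml_text):
    """Map each child element of the XML root to one (s, p, o) statement."""
    root = ET.fromstring(xml_text)
    return {(subject, child.tag, (child.text or "").strip()) for child in root}


def resolve_active_purl(purl, targets):
    """Query every target independently and merge results into one graph.

    `targets` maps a source name to a zero-argument callable returning XML
    text. A failing target is recorded and skipped so that one broken
    source does not prevent the others from being shown.
    """
    graph, failures = set(), []
    for name, fetch in targets.items():
        try:
            graph |= xml_to_triples(purl, fetch())
        except Exception as error:  # e.g. timeout, auth error, malformed XML
            failures.append((name, str(error)))
    return graph, failures


def render_html(purl, graph):
    """Render the merged graph as a simple HTML definition list."""
    rows = "".join(
        f"<dt>{html.escape(p)}</dt><dd>{html.escape(o)}</dd>"
        for _, p, o in sorted(graph)
    )
    return f"<h1>{html.escape(purl)}</h1><dl>{rows}</dl>"


# Stand-ins for HTTP endpoints; the payloads are made up, not real study data.
def internal_source():
    return "<study><title>Example study</title></study>"


def external_source():
    return "<study><status>example</status></study>"


def broken_source():
    raise TimeoutError("endpoint did not respond")


PURL = "http://example.org/study/0001"  # hypothetical identifier
graph, failures = resolve_active_purl(
    PURL,
    {
        "internal": internal_source,
        "external": external_source,
        "broken": broken_source,
    },
)
print(render_html(PURL, graph))
print(failures)
```

In a real deployment each target would be an HTTP request with its own timeout, and the merged graph would be held only for the duration of the request.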
3 Round Stones and AstraZeneca created a system to allow coordinated views of distributed clinical trial information. The system extended the Callimachus Project, an Open Source management system for Linked Data. Persistent URLs, or PURLs, were used to provide globally unique and resolvable identifiers for each clinical study. The PURL concept was extended to enable PURLs to have multiple targets and for the results of each target to undergo arbitrary transformation. PURLs which have such capabilities are called Active PURLs. Information sources relevant to clinical studies were identified, regardless of whether their location was internal or external to the pharmaceutical company's network. Active PURLs were used to resolve data sources having HTTP endpoints capable of returning XML or textual results. Each information source is dynamically transformed into Resource Description Framework (RDF) formats and all sources' results are then merged into a single, temporary graph of RDF data. Information is rendered to end users as coordinated HTML descriptions of each clinical trial using the Callimachus template engine. Machine-readable versions of the data are also available.
How semantic technologies help
Linked Data techniques can help both to address the availability of clinical trial information and to provide a means of building effective information systems using it. Linked Data techniques allow for "cooperation without coordination". Publishers of data provide context for use by third parties in other portions of a distributed enterprise. Users of Linked Data can combine information from multiple sources. Subsequent publication can create a virtuous circle of positive feedback, allowing researchers, informaticists and support staff to collaboratively and distributively build a reusable knowledge base.
Challenges
Distributed queries have many known limitations, such as the introduction of multiple single points of failure in any given PURL resolution. HTTP timeouts, authentication or authorization errors, or other network failures can slow or stop a pipeline from returning correctly. Similarly, distributed queries can result in variable query-time performance due to complex network and endpoint performance variances. Proactive caching and cache management strategies can improve runtime performance and protect end users from the limitations inherent in a distributed query architecture. Caching of intermediate results from endpoints has not yet been implemented.
Next steps
We intend to continue to address
References
1. Callimachus Project,
User experience
1. Users resolve a URL that provides a unique identifier for a clinical study, drug, chemical or other concept managed by this system. The user may be presented with the URL on HTML pages, search for it via full-text techniques or discover it via semantic search.
2. Users are presented with a dynamically generated Web page representing aggregated clinical study information. Users are isolated from the complex and distributed information environment.
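The poster notes that caching of intermediate endpoint results had not yet been implemented. One possible shape for such a cache is sketched below: a time-to-live wrapper around a single target that falls back to the last good result when the endpoint fails. The class name, its parameters and the simulated flaky endpoint are hypothetical and are not part of Callimachus.

```python
import time


class CachedTarget:
    """Wrap a fetch callable with a time-to-live cache and stale fallback."""

    def __init__(self, fetch, ttl_seconds, clock=time.monotonic):
        self._fetch = fetch
        self._ttl = ttl_seconds
        self._clock = clock  # injectable so the example can control time
        self._value = None
        self._stored_at = None

    def get(self):
        now = self._clock()
        fresh = self._stored_at is not None and now - self._stored_at < self._ttl
        if fresh:
            return self._value
        try:
            self._value = self._fetch()
            self._stored_at = now
        except Exception:
            # Endpoint failed: serve the last good result if there is one,
            # otherwise let the caller see the error.
            if self._stored_at is None:
                raise
        return self._value


# Simulated endpoint that fails on its second call only.
calls = []


def flaky_fetch():
    calls.append(1)
    if len(calls) == 2:
        raise TimeoutError("endpoint did not respond")
    return f"result {len(calls)}"


fake_now = [0.0]
target = CachedTarget(flaky_fetch, ttl_seconds=60, clock=lambda: fake_now[0])

first = target.get()   # cache is empty, so the endpoint is called
second = target.get()  # still within the TTL, served from cache
fake_now[0] = 120.0    # move the fake clock past the TTL
third = target.get()   # refresh fails, the stale value is served
fourth = target.get()  # refresh is retried and succeeds
print(first, second, third, fourth, len(calls))
```

Serving stale data trades freshness for availability; whether that is acceptable for clinical study information is a policy question this sketch does not settle.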
http://slideshare.com/3roundstones
Twitter : @BernHyland Email. [email protected]
Thank you for participating!!
Credits
David Newman, Gartner: “Innovation Insight: Linked Data Drives Innovation Through Information-Sharing Network Effects” Published: 15 December 2011
David Wood, ed. Linking Government Data, Springer (2011) http://3roundstones.com/linking-government-data/
US Executive Branch
Digital Government Strategy: Building a 21st Century Platform to Better Serve the American People, http://www.whitehouse.gov/sites/default/files/omb/egov/digital-government/digital-government.html
W3C Linked Data Cookbook http://www.w3.org/2011/gld/wiki/Linked_Data_Cookbook
All other photos and images © 2010-2012 3 Round Stones, Inc. and released under a CC-by-sa license
This work is Copyright © 2011-2012 3 Round Stones Inc.
It is licensed under the Creative Commons Attribution 3.0 Unported License
Full details at: http://creativecommons.org/licenses/by/3.0/
You are free:
to Share — to copy, distribute and transmit the work
to Remix — to adapt the work
Under the following conditions:
Attribution. You must attribute the work in the manner specified by the author or licensor (but not in any way that suggests that they endorse you or your use of the work).
Share Alike. If you alter, transform, or build upon this work, you may distribute the resulting work only under the same or similar license to this one.