Bernadette Hyland
CEO & co-founder
11911 Freedom Drive, Suite 850
Reston, VA 20190
Tel. +1-571-331-3758
[email protected] @BernHyland
[email protected] @3RoundStones
Extend Your Reach.
Linked Data for Smarter Decisions.
Follow-up information prepared for Robin Thottungal, Chief Data Scientist / Director of Analytics
US Environmental Protection Agency - Feb 26, 2016
Today’s reality at EPA
»Tens of thousands of sources
»Many formats - JSON, XML, CSV, PDF, PPT, SHP, SHX, text, binary…
»Thousands of data silos
»No single source of truth
»Varied interpretations
»Brittle interfaces - lack of interoperability
Image Credit: Smart Data Collective
Wide Variety of Data at EPA
Image Credit: MarkLogic, see http://www.marklogic.com/resources/marklogic-semantics-datasheet/resource_download/datasheets/
Credit: Frederick Giasson, Data Scientist & Software Developer, http://fgiasson.com/blog/index.php/2014/07/23/big-structures-where-the-semantic-web-meets-artificial-intelligence/
Potential at EPA …
• Findable data
• Accessible data
• Interoperable data
• Re-usable data
• Shared context
• Data Platforms (HDFS, NoSQL)
Linked Data is helping to extend & augment EPA’s significant investment in enterprise relational technologies
How?
By leveraging NoSQL Data Platforms that rigorously adhere to international data interoperability standards. *
* Relevant international data exchange standards are published by the W3C, OGC, IEEE
Image Credit: MarkLogic
Graph databases, as a subset of NoSQL databases, are the most efficient way to look at the relationships between data items, patterns of relationships and interactions.
Image Credit: Cray, see http://www.cray.com/blog/graph-databases-101/
Graph Databases 101
Hadoop Integration
»While over 90% of the world’s data has been created in the last two years, EPA has a tremendous variety of data that requires the “right tool for the job”
»Historic data (“short, wide, complex data”) vs.
»Granular sensor & GIS data (“long skinny data”)
»Core mission-based systems with robust historic data, includes:
»Toxics Release Inventory (TRI)
»Facilities Registry (FRS)
»RCRA Handler
»EPA’s enterprise information architecture should include a data platform that leverages Hadoop: HDFS and MapReduce, and accommodates EPA’s robust data landscape.
»Must support modern, open source tools for application development, visualizations, crowdsourcing, and deployment on the Web
One option - MarkLogic Integrates Hadoop Ecosystem & EPA’s Robust Data Landscape
Image Credit: http://www.marklogic.com/what-is-marklogic/features/hadoop-integration/
EPA Robust Data Ecosystem is adaptable using a Linked Data Approach
» Makes data integration faster and easier
    » By using a global addressing scheme, HTTP URIs.
» Uses semantics to “glue” together data faster.
    » Common semantic definitions link traditional relational models.
    » No more out-of-date documentation when standard vocabularies are used.
» Robust search and discovery by leveraging the semantic graph.
» Scales to the Web!
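
To make the "global addressing scheme" point concrete, here is a minimal sketch in plain Python. It assumes two data silos that happen to describe the same facility under the same HTTP URI. The base URI, the `reportedRelease` property and the records are hypothetical, invented only for illustration, and are not taken from EPA systems.

```python
# Hypothetical base URI and records, invented purely for illustration.
BASE = "http://example.org/facility/"
RDFS_LABEL = "http://www.w3.org/2000/01/rdf-schema#label"

# Two "silos" that describe the same facility under the same HTTP URI.
# Each statement is a (subject, predicate, object) triple.
silo_a = [
    (BASE + "1001", RDFS_LABEL, "Example Plant"),
]
silo_b = [
    (BASE + "1001", "http://example.org/vocab#reportedRelease", "ammonia"),
]


def merge(*graphs):
    """Union the triples from several sources; exact duplicates collapse."""
    merged = set()
    for graph in graphs:
        merged.update(graph)
    return merged


def describe(graph, subject):
    """Return every (predicate, object) pair known about one subject URI."""
    return sorted((p, o) for s, p, o in graph if s == subject)


graph = merge(silo_a, silo_b)
facts = describe(graph, BASE + "1001")
for predicate, obj in facts:
    print(predicate, "->", obj)
```

Because both silos use the same URI for the facility, combining them is a set union rather than a custom join. That is the sense in which shared identifiers act as the "glue".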
All modern data platforms deployed at EPA should
»Support options for data modeling - Linked Data (JSON-LD, RDF), SQL (JSON, XML)
»Native store and query of documents, blobs and structured data.
»Standards-based query interface across documents and data, e.g., Full support for SPARQL 1.1
»Offer enterprise functionality including high availability & disaster recovery, scalability & elasticity, ACID transactions
»Be deployable on a FedRAMP certified cloud provider with certified controls for security, high availability, disaster recovery
»Scale to billions of statements, triples, etc.
»Store unstructured data across clusters like Hadoop, making it easy to move data partitions.
»Much but not all of EPA’s data is well suited for a Linked Data approach.
»Linked Data is based on a 20+ year old idea: a system of linked information systems
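
As an illustration of the JSON-LD modeling option listed above, the following sketch builds a small JSON-LD document using only the Python standard library. The facility and dataset URIs are hypothetical placeholders. The `rdfs` and `dcterms` namespace URIs are the published ones for RDF Schema and Dublin Core Terms.

```python
import json

# A minimal JSON-LD document. The @context maps short prefixes to
# vocabulary namespaces so that plain JSON keys become full URIs.
doc = {
    "@context": {
        "rdfs": "http://www.w3.org/2000/01/rdf-schema#",
        "dcterms": "http://purl.org/dc/terms/",
    },
    # Hypothetical identifiers, used only for illustration.
    "@id": "http://example.org/facility/1001",
    "rdfs:label": "Example Plant",
    "dcterms:source": {"@id": "http://example.org/dataset/tri"},
}

# JSON-LD is ordinary JSON, so any JSON tooling can store or exchange it.
serialized = json.dumps(doc, indent=2)
round_tripped = json.loads(serialized)
print(serialized)
```

The document remains valid JSON for existing tools, while a JSON-LD-aware platform can read the same bytes as RDF statements.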
Linked Data: Structured data on the Web
David Wood, Marsha Zaidman, Luke Ruth, with Michael Hausenblas
Foreword by Tim Berners-Lee
Manning
Goals: Governmental transparency and/or improved internal efficiencies
Governments Worldwide are using a Linked Data Approach
Linked Data Platform is in QA now! https://usepa.3roundstones.net Anticipated to move to production in 2016.
shared innovation™
Search for facilities where we live. Unlike many EPA Web portals, linked data is human AND machine readable data. No screen scraping is required. Encourages re-use (discourages data silos)
Click to drill down to pollution reports that combine data from 5 previously unconnected data silos.
EPA collects granular pollution data. Linked Data opens up the data to a much wider audience in a human readable format.
Previously, only people who employed complex screen scraping techniques could get at this data. Now, EPA open data is available using an international data standard, with one click!
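
One common way a Linked Data server offers "human AND machine readable" views of the same resource is HTTP content negotiation. The sketch below only constructs the requests and does not send them. The resource URL is hypothetical, and whether a given deployment honors these particular media types is an assumption.

```python
from urllib.request import Request

# Hypothetical resource URL, used only for illustration.
RESOURCE = "http://example.org/facility/1001"


def build_request(url, media_type):
    """Build (but do not send) a GET request asking for one representation."""
    return Request(url, headers={"Accept": media_type})


# A browser would typically ask for HTML; a script can ask for RDF instead.
human_request = build_request(RESOURCE, "text/html")
machine_request = build_request(RESOURCE, "text/turtle")

print(human_request.get_header("Accept"))
print(machine_request.get_header("Accept"))
```

Passing either request to `urllib.request.urlopen` would perform the fetch. That step is omitted here because the URL is a placeholder.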
Shared vocabularies, e.g. Places, Geographic, Dublin Core, Geo, FOAF, ORG and vCard, are the “lingua franca” of data interoperability
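
A short sketch of what "shared vocabulary" means in practice: compact names such as `foaf:name` expand to globally agreed URIs. The namespace URIs below are the published ones for Dublin Core Terms, the W3C Basic Geo vocabulary, FOAF, ORG and vCard. The `expand` helper is a hypothetical illustration, not part of any library.

```python
# Published namespace URIs for several of the vocabularies named above.
PREFIXES = {
    "dcterms": "http://purl.org/dc/terms/",
    "geo": "http://www.w3.org/2003/01/geo/wgs84_pos#",
    "foaf": "http://xmlns.com/foaf/0.1/",
    "org": "http://www.w3.org/ns/org#",
    "vcard": "http://www.w3.org/2006/vcard/ns#",
}


def expand(compact_name):
    """Expand a prefixed name such as 'foaf:name' into a full URI."""
    prefix, sep, local = compact_name.partition(":")
    if not sep or prefix not in PREFIXES:
        raise KeyError(f"unknown prefix in {compact_name!r}")
    return PREFIXES[prefix] + local


for name in ("foaf:name", "geo:lat", "dcterms:title"):
    print(name, "->", expand(name))
```

Two datasets that both use `geo:lat` are talking about the same property, so no per-dataset mapping document is needed to combine them.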
Case Study
Using EPA Linked Data to assist chronic asthma/COPD patients with timely weather alerts
Funded by
3 Round Stones provides commercial application
support on the cloud or behind the enterprise firewall using
@3RoundStones http://3RoundStones.com
[Figure: Callimachus shown relative to a CONTENT MANAGEMENT SYSTEM and a LINKED DATA MANAGEMENT SYSTEM; labels: UNSTRUCTURED TEXT, TEXT, STRUCTURED DATA, DATA]
Callimachus Enterprise customers are creating data-driven applications with data from leading graph databases:
Callimachus is a scalable Web application server for publishing and consuming open data
Who uses it?
• Government, international publishers, healthcare / life sciences
What pain does Callimachus address?
• Integration of data silos where a graph approach is needed
• Rapid creation of visualizations, dashboards (mashups) & info graphics
• Less expensive solution to a data warehouse
Example apps?
• Collaborative knowledge management
• Publishing workflow
• Drug discovery / clinical trials
• Predictive Analytics
Callimachus is fanatical about data interoperability & portability
Supports:
• HTML5, XHTML5, CSS3, JavaScript
• XQuery, XProc, XPath, XSLT
• SPARQL 1.1 Query, Update, Federated Query, Service Description, Property Paths, Graph Store HTTP Protocol
• RDF/XML, RDF/Turtle, JSON-LD, SPARQL XML, SPARQL JSON
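
For readers unfamiliar with SPARQL 1.1 Query over HTTP, the sketch below shows how a client might encode a query for a GET request under the SPARQL Protocol, which passes the query text in a `query` parameter. The endpoint URL is a hypothetical placeholder, and the query itself is only an illustration. Nothing here is specific to Callimachus, and no request is sent.

```python
from urllib.parse import parse_qs, urlencode, urlsplit

# Hypothetical endpoint, used only for illustration.
ENDPOINT = "https://example.org/sparql"

# An illustrative query: list up to ten resources that have a label.
QUERY = """
PREFIX rdfs: <http://www.w3.org/2000/01/rdf-schema#>
SELECT ?facility ?label
WHERE { ?facility rdfs:label ?label }
LIMIT 10
"""

# Standard media type for SPARQL JSON results.
HEADERS = {"Accept": "application/sparql-results+json"}


def build_query_url(endpoint, query):
    """Encode a SPARQL query as the 'query' parameter of a GET URL."""
    return endpoint + "?" + urlencode({"query": query.strip()})


url = build_query_url(ENDPOINT, QUERY)
decoded = parse_qs(urlsplit(url).query)["query"][0]
print(url)
```

Because the protocol and result formats are standardized, the same client code can in principle be pointed at any conforming endpoint by changing `ENDPOINT`.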
[Figure: deployment architecture. Clients: Contractor (3 Round Stones, Inc.), Public, Registered developer, using a Web Browser or an Application, Script or automated client. Access points: SPARQL endpoint, REST API, Resource URIs. Server: Linked Data management system located at a Tier 1 Cloud Provider (FISMA compliant), backed by an RDF Database.]
“Big Data Is Important, but Open Data Is More Valuable”
As change agents, enterprise architects can help their organizations become richer through strategies such as open data.
David Newman, VP Research, Gartner
Open Source: Community supported
Enterprise License: Commercial support
in-browser development, deployment, backups
Linked Data publication
User profiles, social sharing
Document, app management
OpenAnnotation support
External datasources
Shared deployments
Realms (virtual hosts)
Enterprise management
Cloud deployments
Callimachus
Callimachus™, the Callimachus logo, Callimachus Enterprise™, the Callimachus Enterprise logo and tagline, are trademarks of 3 Round Stones, Inc. and are registered in the United States and abroad. Copyright © 2011-2016 3 Round Stones, Inc. All rights reserved.
Callimachus Enterprise