1

Enterprise Data Architecture and Implementation:

Federated, Faceted, Semantic Search of Both EPA Metadata and Data with Governance

Brand Niemann
Senior Enterprise Architect

EPA Enterprise Architecture Team
March 26, 2008, Updated April 4, 2008

2

Brief History

• March 3-11, 2008, Enterprise Data Architecture Discussions and Activities.
– Kevin Kirby, David Prompovitch, Michael Alford, and Brand Niemann.
• March 12, 2008, Enterprise Data Architecture Program, Kevin Kirby, Overview Presentation for CIO Biweekly.
– Strategy for Program Growth (see next slide).
• March 13, 2008, Enterprise Data Architecture Briefing, Kevin Kirby, Enterprise Architecture Working Group Session.
– Essentially repeat of March 12th with suggestions (see slide 4).
• March 13, 2008, Data Architecture Subcommittee Meeting, Brand Niemann, Informal Presentation.
– Vision & Implementation (see slides 5-8).
– Web 2.0 (see slides 9-10).
• March 16-20, 2008, The DAMA International Symposium & Wilshire Meta-Data Conference, Kevin Kirby Attending.
– At least nine presentations on Web 2.0, Wikis, etc. for Metadata and Data Management, etc.
• March 24, 2008, EPA Data Architecture: Overview of Metadata Strategy – Summary of Issues for Data Advisory Council, Kevin Kirby, Enterprise Architecture Team Call.
– Metadata Framework for Discovery & Evaluation and Conceptual Federated Search Architecture.

3

Vision and Implementation

Data Architecture Component | Specific Artifact | Example

DRM 2.0 (1)
Description | Metadata (4) | Spreadsheet *
Context | Taxonomy/Ontology (5) | Web 2.5 Wiki
Sharing | Data (4) | Spreadsheet *

DRM 3.0 (2)
Semantics (3) | RDF/SPARQL (6) | Middleware
SOA | Services (7) | Web 2.5 Wiki (8), Web 3.0 Wiki (9)

Footnotes: See slide 4.

Our initial objective is to see if this Web 2.0 Wiki can be useful in bringing about collaboration across the Metadata Management Functions Matrix, Teams-Tasks Matrix, and Data Architecture Documents.

A longer range goal would be to see if this Web 2.0 Wiki could be used as an Enterprise Metadata Management and Application Development Tool (e.g. data and metadata mashups).

Note: Web 2.0 does DRM 2.0 and Web 3.0 does DRM 2.0/3.0!

4

Footnotes

• (1) FEA DRM 2.0 and Report to Congress (2005).
• (2) February 6, 2007, and February 5, 2008.
• (3) Combines Description and Context from DRM 2.0. See (2).
• (4) The data and metadata are combined together (see Brand Niemann).
• (5) Information Architecture (topics and subtopics) and Data Architecture (data tables and data elements) are integrated. See Web 2.0 Wiki Pilot: Information Classifications.
• (6) This specification defines the syntax and semantics of the SPARQL query language for RDF. SPARQL can be used to express queries across diverse data sources.
• (7) EPA Data Architecture Enterprise Metadata.
• (8) Video on data reuse in mashups that will revolutionize EPA data architecture, data management, and data reuse applications!
• * Note: This also works with relational databases.
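
Footnote (6) describes SPARQL as a query language for RDF. As a rough illustration only, the core idea of matching a pattern against subject-predicate-object statements can be sketched in plain Python. The sample statements, the `ex:` names, and the `match` helper are all hypothetical; this is not SPARQL itself and it uses no RDF library.

```python
from typing import Iterator, List, Optional, Tuple

Triple = Tuple[str, str, str]

# Hypothetical sample statements (subject, predicate, object); not real EPA data.
TRIPLES: List[Triple] = [
    ("ex:dataset1", "ex:hasTopic", "ex:Air"),
    ("ex:dataset1", "ex:hasTable", "ex:table1"),
    ("ex:dataset2", "ex:hasTopic", "ex:Water"),
    ("ex:table1", "ex:hasElement", "ex:FacilityName"),
]


def match(
    triples: List[Triple],
    s: Optional[str] = None,
    p: Optional[str] = None,
    o: Optional[str] = None,
) -> Iterator[Triple]:
    """Yield triples that fit the pattern; None is a wildcard, like a query variable."""
    for t in triples:
        if (s is None or t[0] == s) and (p is None or t[1] == p) and (o is None or t[2] == o):
            yield t


# Loosely analogous to: SELECT ?d ?t WHERE { ?d ex:hasTopic ?t }
topics = {subject: obj for subject, _, obj in match(TRIPLES, p="ex:hasTopic")}
print(topics)
```

A real deployment would presumably use an RDF store and a SPARQL engine; the sketch only shows why one pattern can be asked of statements that came from different sources.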

5

Vision and Implementation

http://epametadata.wik.is/ (password required to see)

6

Vision and Implementation

The EPA Data Architecture Metadata Community of Interest (CoI) is working to integrate the following metadata sources for information sharing and integration across the enterprise and the world.

Type | Standard | Current Repository | Future Repository/Collaboration Tool
Web | Dublin Core | WebCMS | Web 2.0 Wiki (1)
Data Elements | ISO 11179 | Environmental Data Registry (EDR) | Web 2.0 Wiki
EPA Applications and Databases | Like Dublin Core | Registry of EPA Applications and Databases (READ) | Web 2.0 Wiki
Science Portals | Like Dublin Core | Environmental Information Management System (EIMS) | Web 2.0 Wiki
Geospatial | Geospatial Metadata Standard | GeoData Gateway | Web 2.0 Wiki
Indicators | Peer Review Process | Report on the Environment 2008 | Web 2.0 Wiki

(1) Web 2.0 Wiki pages are XML-based and have RSS Feeds!
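
Note (1) on this slide says the wiki pages are XML-based and have RSS feeds, and the table lists Dublin Core as the Web metadata standard. A minimal sketch of how another application might read such a feed follows. The feed text, URLs, and field values are invented for illustration and do not come from the EPA wiki; only the Dublin Core namespace URI is the standard one.

```python
import xml.etree.ElementTree as ET

# Hypothetical RSS 2.0 snippet with Dublin Core elements; not an actual EPA wiki feed.
FEED = """<rss version="2.0" xmlns:dc="http://purl.org/dc/elements/1.1/">
  <channel>
    <title>Example metadata wiki</title>
    <item>
      <title>Data Elements</title>
      <link>http://example.org/wiki/Data_Elements</link>
      <dc:subject>ISO 11179</dc:subject>
    </item>
    <item>
      <title>Geospatial</title>
      <link>http://example.org/wiki/Geospatial</link>
      <dc:subject>Geospatial Metadata Standard</dc:subject>
    </item>
  </channel>
</rss>"""

# Prefix-to-URI map used when looking up namespaced elements.
DC = {"dc": "http://purl.org/dc/elements/1.1/"}


def read_items(feed_xml):
    """Return one dict per <item>, with its title, link, and dc:subject."""
    root = ET.fromstring(feed_xml)
    items = []
    for item in root.iter("item"):
        items.append({
            "title": item.findtext("title"),
            "link": item.findtext("link"),
            "subject": item.findtext("dc:subject", namespaces=DC),
        })
    return items


for entry in read_items(FEED):
    print(entry["title"], "|", entry["subject"])
```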

7

Web 2.0

Source: Mills Davis, Four Stages of the Web at http://project10x.com/about.php

8

Web 2.0

• Some basic functionalities:
– Author like Word
– Edit/comment on every page
– Some level of security for every page
– Tagging
– Versioning
– Watchlist
– RSS/XML between applications
– Search
– etc.

9

Overview of Metadata Strategy

• Direction from July 2007 Meeting:
– “Enable to share” means enabling EPA to share data within programs, across programs, with partners, and with the public.
• Purpose and General Approach: Phase 1 (through April 14, 2008):
– Objects include: DBMS Data Sets, Unstructured Data (e-mail, docs), and Multimedia, etc.
• Proposed Metadata Framework for Data “Objects”:
– Coverage is Incomplete. Slide 10.
• Federated Registries with a Common Front End Search Tool:
– Conceptual Architecture Using Faceted Search. Slide 11.
• Governance Artifacts to Implement this Framework:
– A National Data Policy Modeled after NGD.

10

Metadata Framework for Discovery & Evaluation

[Diagram labels: User, Data Object, Registry]

Data Taxonomies: EPA Mission Areas; Science; Administrative & Financial; E-LoB Mission Areas

MetaData Categories: Descriptive/Business; Administrative/Transactional; Structural; Technical; Security/Sensitivity; Quality; Rights Management; Preservation; Location & Access

Categories of metadata help the user assess the value of the data set.

Standard taxonomies aid discovery. These might be specific to broad categories like “Admin./Financial”. EPA Data Classification is a start.

Levels of metadata exist within an RDBMS set, especially for evaluating quality and security issues.

11

Conceptual Federated Search Architecture

[Diagram: the User queries a Faceted Search Engine (Endeca, Intelligenx), which searches the following repositories]

• EIMS: Science Data Objects
• Informatica MD Repository: RDBMS Data Sets managed by ETL
• ?: Non-ETL RDBMS
• GDG: Geospatial
• Documentum: Records and some documents

Major gap is for RDBMS Data Sets not managed by Informatica
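
The architecture on this slide, one search front end over several repositories with facets for narrowing results, can be sketched roughly as follows. The in-memory `REGISTRIES` dictionary, its records, and the function names are hypothetical stand-ins; they say nothing about how Endeca, Intelligenx, or the named EPA systems actually expose their contents.

```python
from collections import Counter

# Hypothetical in-memory stand-ins for some repositories on the slide; records are invented.
REGISTRIES = {
    "EIMS": [
        {"title": "Air quality model outputs", "type": "Science Data Object"},
    ],
    "GDG": [
        {"title": "Air monitoring station locations", "type": "Geospatial"},
        {"title": "Watershed boundaries", "type": "Geospatial"},
    ],
    "Documentum": [
        {"title": "Air permit records", "type": "Record"},
    ],
}


def federated_search(term, registries=REGISTRIES):
    """Send the same query to every registry and tag each hit with its source."""
    term = term.lower()
    hits = []
    for source, records in registries.items():
        for record in records:
            if term in record["title"].lower():
                hits.append({**record, "source": source})
    return hits


def facet_counts(hits, facet):
    """Count hits per facet value, as a faceted UI would show beside the results."""
    return Counter(hit[facet] for hit in hits)


def refine(hits, facet, value):
    """Narrow the result set to one facet value."""
    return [hit for hit in hits if hit[facet] == value]


hits = federated_search("air")
print(facet_counts(hits, "source"))
print(refine(hits, "type", "Geospatial"))
```

In this sketch the gap noted on the slide would correspond to a repository that simply has no entry in `REGISTRIES`, so its data sets can never appear in the results.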

12

Demonstrations

• Federated
• Faceted
• Semantic Search
• Data
• Metadata
• Governance
• DRM 2.0 Compliance
• Information Architecture and Data Architecture
• DRM 3.0/Web 3.0
• Discovery (Centrifuge) (TRI data pilot slides coming)

13

Federated

See Multiple Nodes on the Same or Different Web Servers.

14

Faceted

See Hierarchy of Topics, Subtopics, etc. That Can be Searched.

15

Semantic Search

See Query Within Context and With Various Semantic Operators.

16

Data

Screen-scrape This Table and Copy It to Excel and the Structure is Preserved.
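
The same structure-preserving copy can be approximated in code. The sketch below reads a wiki-style HTML table and writes it as CSV, which Excel can open. The sample table and its values are invented, and the `TableExtractor` class is a hypothetical helper that handles only simple, non-nested tables.

```python
import csv
import io
from html.parser import HTMLParser


class TableExtractor(HTMLParser):
    """Collect the text of each <td>/<th> cell, row by row."""

    def __init__(self):
        super().__init__()
        self.rows = []
        self._row = None   # cells of the row being read, or None outside <tr>
        self._cell = None  # text pieces of the cell being read, or None outside a cell

    def handle_starttag(self, tag, attrs):
        if tag == "tr":
            self._row = []
        elif tag in ("td", "th") and self._row is not None:
            self._cell = []

    def handle_data(self, data):
        if self._cell is not None:
            self._cell.append(data)

    def handle_endtag(self, tag):
        if tag in ("td", "th") and self._cell is not None:
            self._row.append("".join(self._cell).strip())
            self._cell = None
        elif tag == "tr" and self._row is not None:
            self.rows.append(self._row)
            self._row = None


def table_to_csv(html):
    """Convert a simple HTML table to CSV text, keeping rows and columns intact."""
    parser = TableExtractor()
    parser.feed(html)
    parser.close()
    out = io.StringIO()
    csv.writer(out, lineterminator="\n").writerows(parser.rows)
    return out.getvalue()


# Hypothetical sample table; the values are invented.
HTML = """<table>
<tr><th>Facility</th><th>Chemical</th></tr>
<tr><td>Plant A</td><td>Lead</td></tr>
<tr><td>Plant B</td><td>Mercury</td></tr>
</table>"""

print(table_to_csv(HTML))
```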

17

Metadata

This is the Highest Quality-Peer Reviewed Metadata the Agency Has Produced.

18

Taxonomy

This Taxonomy Was Produced by Subject Matter Experts and Peer Reviewed.

19

Governance

The Words Governance and Provenance Have Both Been Used.

20

DRM 2.0 Compliance

The Three Requirements for Information Sharing Have Been Satisfied!

21

Information Architecture and Data Architecture

• Level 1 Top-level Topics
• Level 2 Next-level Subtopics
• Level 3 Data Tables
• Level 4 Data Elements
• See: Getting to Web Semantics for Government Spreadsheets Pilot (RDF/SPARQL)
– http://semanticommunity.wik.is/People/Brand_Niemann/2008_Semantic_Technology_Conference
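
One way to picture the four levels is as a nested structure whose parent-child links can be flattened into statements of the kind RDF records. The sketch below assumes an invented topic, subtopic, table, and set of elements, and the relation names (`hasSubtopic`, `hasTable`, `hasElement`) are hypothetical rather than taken from the pilot.

```python
# Hypothetical four-level classification; all names are invented for illustration.
HIERARCHY = {
    "Air": {                          # Level 1: top-level topic
        "Outdoor Air": {              # Level 2: next-level subtopic
            "Emissions Table": [      # Level 3: data table
                "Facility",           # Level 4: data elements
                "Chemical",
                "Year",
            ],
        },
    },
}


def to_triples(hierarchy):
    """Flatten the hierarchy into (parent, relation, child) statements."""
    triples = []
    for topic, subtopics in hierarchy.items():
        for subtopic, tables in subtopics.items():
            triples.append((topic, "hasSubtopic", subtopic))
            for table, elements in tables.items():
                triples.append((subtopic, "hasTable", table))
                for element in elements:
                    triples.append((table, "hasElement", element))
    return triples


for triple in to_triples(HIERARCHY):
    print(triple)
```

Because the information architecture (levels 1 and 2) and the data architecture (levels 3 and 4) end up in one list of statements, a single query can walk from a topic down to a data element.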

22

DRM 3.0/Web 3.0

Source: Mills Davis, Four Stages of the Web at http://project10x.com/about.php

23

Discovery (Centrifuge)

Centrifuge Systems is a leading provider of next generation business intelligence software that helps organizations discover insights, patterns and relationships hidden in their data. The unique Centrifuge approach allows users to ask open ended questions of their data by interacting with visual representations of the data directly.

Traditional business intelligence solutions require users to define what they want to see in advance and present the results in static dashboards. With Centrifuge, users determine what is of interest “on the fly”, then manipulate the displays directly in a highly interactive fashion. The experience is refreshingly easy-to-use and the resulting insights can be extraordinary.

Centrifuge is used in some of the most demanding applications in the world, including law enforcement, counter-terrorism and homeland defense, to help analysts move from data to discovery.

24

Centrifuge Server

Centrifuge provides an interactive visualization layer on data such as the Toxics Release Inventory-Made Easy for the Web (TRI-ME WEB). Data can be viewed through a desktop client or a web browser. Here is sample data through a web browser. Centrifuge Server, as a next generation Information visualization system, meets the following requirements:

• Ground-Breaking Interactive Visualization in a Browser
• A 100% browser-based thin client
• Collaborative Analysis
• Modern SOA Architecture
• Geospatial Integration with Google™ Earth
• Pluggable, Componentized and Extensible
• Easy to Use

25

Table View

This view represents a sample table from the TRI-ME WEB dataset downloaded from the EPA website.

26

Relationship Graph

This relational view represents a bundled graph of MD 2006 data showing Companies linked to Chemicals. This graph shows that two of the primary collections, PBT and TRI, have multiple companies between them. The chemicals have been bundled (grouped) by their chemical classifications.

27

Relationship Graph Spinoff

A subset for specific chemicals of interest can be created (spun off). In this case, PBT chemicals are shown bundled and connected to the companies associated with them.
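
The bundling and spinoff described on the relationship graph slides can be sketched as two small operations on company-chemical links. The links, the assignment of chemicals to the PBT and TRI groups, and the function names below are hypothetical; this is not Centrifuge's API and not actual TRI data.

```python
from collections import defaultdict

# Hypothetical (company, chemical) links and chemical groupings; not actual TRI data.
LINKS = [
    ("Company A", "Lead"),
    ("Company A", "Toluene"),
    ("Company B", "Mercury"),
    ("Company B", "Lead"),
]
CLASSIFICATION = {"Lead": "PBT", "Mercury": "PBT", "Toluene": "TRI"}


def bundle(links, classification):
    """Group chemicals by classification and link each bundle to its companies."""
    graph = defaultdict(set)
    for company, chemical in links:
        graph[classification[chemical]].add(company)
    return {group: sorted(companies) for group, companies in graph.items()}


def spinoff(links, classification, group):
    """Keep only the links whose chemical falls in the chosen bundle."""
    return [(company, chemical) for company, chemical in links
            if classification[chemical] == group]


print(bundle(LINKS, CLASSIFICATION))
print(spinoff(LINKS, CLASSIFICATION, "PBT"))
```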

28

Table Spinoff

The spinoff concept applies to all views, for example, the table view shown on this page.

29

Relationship Graph of Table Spinoff

This relational view represents a graph of the previous table spinoff.

30

Quantitative (Charts) View

This quantitative view represents a simple distribution of the number of times a chemical is referenced across all companies and facilities in MD.
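
A distribution of this kind is a count of references per chemical. A minimal sketch, assuming invented (facility, chemical) rows rather than the actual Maryland data, with a plain-text bar standing in for the chart:

```python
from collections import Counter

# Hypothetical (facility, chemical) report rows; not actual TRI data.
ROWS = [
    ("Facility 1", "Lead"),
    ("Facility 2", "Lead"),
    ("Facility 2", "Toluene"),
    ("Facility 3", "Lead"),
]

# Number of times each chemical is referenced across all facilities.
references = Counter(chemical for _, chemical in ROWS)

# Print the most-referenced chemicals first, one bar per chemical.
for chemical, count in references.most_common():
    print(f"{chemical:<10} {'#' * count} ({count})")
```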

31

Timeline (Temporal) View

This temporal view of sample data represents how time-based data can be viewed. For example, this could represent toxic release events if that data were available and time-stamped.

32

Geospatial View

This geospatial view represents a spatial distribution of facilities across Maryland.

33

Detailed Geospatial View

This geospatial view represents the locations of toxic chemicals in Baltimore, Maryland.

34

DRM 3.0/Web 3.0

http://richard.cyganiak.de/2007/10/lod/