Baseline Findings: EPA Enterprise Data Architecture / Data Management Metadata
Mike Fleckenstein, Practice Leader, MDM, Project Performance Corp.
mfleckenstein@ppc.com, 571-527-6453


Types of Data

Transactional data
  o Measurements at a point in time
  o Dollars earned or units sold
  o Used for trend analysis

Reference data
  o Entities by which transactions are measured
  o E.g. ‘Country’, ‘Prefix’ and ‘Industry’
  o Often stored inconsistently and redundantly within an organization

Master data
  o Single version of the truth
  o Key corporate reference entities such as ‘Customer’, ‘Location’ and ‘Product’

Metadata
  o Describes objects by connecting them to the subjects they are about
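The four kinds of data above can be sketched side by side in code. This is a minimal illustration under assumed, hypothetical names (`COUNTRIES`, `Customer`, `Sale`, `SALE_METADATA`, `revenue_by_country`) and made-up values: reference data is a small controlled code list, master data is the single agreed record per customer, transactional data is a series of point-in-time sales, and metadata describes the fields of those records.

```python
from dataclasses import dataclass

# Reference data (hypothetical codes): a small, controlled list used to classify records.
COUNTRIES = {"US": "United States", "CA": "Canada"}

@dataclass(frozen=True)
class Customer:
    """Master data: the single agreed record for one customer."""
    customer_id: str
    name: str
    country_code: str  # expected to be a key of COUNTRIES

@dataclass(frozen=True)
class Sale:
    """Transactional data: one measurement taken at a point in time."""
    customer_id: str
    sold_on: str       # ISO 8601 date
    amount_usd: float

# Metadata: describes the fields of the Sale records (wording is illustrative).
SALE_METADATA = {
    "customer_id": "Key of the customer master record",
    "sold_on": "Date of the sale (ISO 8601)",
    "amount_usd": "Sale amount in US dollars",
}

customers = {"C1": Customer("C1", "Acme Corp", "US")}
sales = [Sale("C1", "2008-09-01", 120.0), Sale("C1", "2008-09-15", 80.0)]

def revenue_by_country(sales, customers):
    """Join transactions to master data, then label them via reference data."""
    totals = {}
    for sale in sales:
        country = COUNTRIES[customers[sale.customer_id].country_code]
        totals[country] = totals.get(country, 0.0) + sale.amount_usd
    return totals

print(revenue_by_country(sales, customers))  # {'United States': 200.0}
```

Keeping country codes in one reference table, rather than retyping them in each record, is one way to avoid the inconsistent and redundant storage of reference data noted above.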


Types of Metadata

Technical - data sources; access protocols (ODBC, JDBC, SQL*Net, etc.); physical schema (database, table and column definitions, etc.); logical data models (ER models, object models, etc.)

Example: IT staff supporting financial reporting know that the financial data mart resides on machine "XPT001"; the data mart is refreshed at "12 a.m. every Saturday night"; data is sourced from "Hyperion GL"; and period data is captured in the "AP" column.

Business - contextual data about the information retrieved; taxonomies that define business organizations and product hierarchies; and controlled vocabularies or reference data used to define business terms, such as a medical dictionary or financial terminology.

Example: people in the finance department know that performance reports come "once a month"; "GPR" stands for "Global Performance Report"; "AP7" means "Accounting Period Number 7"; and the accounting period starts in "February." These descriptions are business metadata.
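A minimal sketch of how technical and business metadata might be recorded side by side. For illustration it combines both examples on one hypothetical dataset record; the class, field and method names (`TechnicalMetadata`, `BusinessMetadata`, `DatasetMetadata`, `expand`) are assumptions, while the values come from the examples.

```python
from dataclasses import dataclass, field

@dataclass
class TechnicalMetadata:
    """What IT staff know: where the data lives and how it is loaded."""
    host: str
    refresh_schedule: str
    source_system: str
    period_column: str

@dataclass
class BusinessMetadata:
    """What the business knows: meaning, terminology and timing."""
    report_frequency: str
    accounting_period_start: str
    glossary: dict = field(default_factory=dict)  # abbreviation -> meaning

@dataclass
class DatasetMetadata:
    name: str
    technical: TechnicalMetadata
    business: BusinessMetadata

    def expand(self, term: str) -> str:
        """Return the business meaning of a term, or the term itself if unknown."""
        return self.business.glossary.get(term, term)

mart = DatasetMetadata(
    name="financial data mart",
    technical=TechnicalMetadata(
        host="XPT001",
        refresh_schedule="12 a.m. every Saturday night",
        source_system="Hyperion GL",
        period_column="AP",
    ),
    business=BusinessMetadata(
        report_frequency="once a month",
        accounting_period_start="February",
        glossary={"GPR": "Global Performance Report",
                  "AP7": "Accounting Period Number 7"},
    ),
)

print(mart.expand("GPR"))  # Global Performance Report
```

Separating the two parts makes it easy to see who owns each fact: IT maintains `technical`, the business maintains `business`.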


Metadata & Related Terms

o Metadata describes objects, and one of the ways in which it does that is by connecting objects to the subjects they are about

o Controlled vocabulary is a closed list of subjects that can be used for classification

o Taxonomy is a subject-based classification that arranges the terms in the controlled vocabulary into a hierarchy

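o Thesauri take taxonomies and extend them to make them better able to describe the world, by not only allowing subjects to be arranged in a hierarchy but also allowing other statements to be made about the subjects (e.g. related terms, scope notes, preferred terms)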
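To make the idea of a closed list concrete, here is a minimal sketch assuming a hypothetical in-memory vocabulary (`VOCABULARY`) and a plain dictionary for a resource's metadata: tagging a resource with a subject outside the controlled vocabulary is rejected.

```python
# Hypothetical controlled vocabulary: a closed list of allowed subjects.
VOCABULARY = frozenset({
    "United States", "New York State", "New York City",
    "Manhattan", "Brooklyn", "Queens", "Staten Island", "Bronx",
})

def tag(resource: dict, subject: str) -> dict:
    """Attach a subject to a resource's metadata, accepting only vocabulary terms."""
    if subject not in VOCABULARY:
        raise ValueError(f"{subject!r} is not in the controlled vocabulary")
    resource.setdefault("subjects", []).append(subject)
    return resource

paper = {"title": "Navigating Your Way Around New York City"}
tag(paper, "New York City")
print(paper["subjects"])  # ['New York City']
```

A free-text tag such as "NYC" would be refused here; a thesaurus, by contrast, could record "NYC" as a non-preferred term pointing to the preferred one.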


Taxonomy

Metadata can be organized using a taxonomy

Helps an audience find information more easily

Figure: blue lines show metadata about a paper; black lines show a subject-based taxonomy.
  o Taxonomy: United States > New York State > New York City > Manhattan, Brooklyn, Queens, Staten Island, Bronx
  o Paper metadata:
      Title: Navigating Your Way Around New York City
      Author: John Doe & Molly Pepper
      Publication Date: September, 2008
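The figure can be sketched as data. In this minimal example the taxonomy is stored as child-to-parent links (`PARENT`), and the paper's metadata points at one subject. The assumption that the paper is tagged with "New York City" is ours, since the figure does not say which node the metadata connects to; the function names are hypothetical. Searching on a broader term also finds papers filed under narrower ones, which is how the taxonomy helps an audience find information.

```python
# Subject-based taxonomy from the figure, stored as child -> parent links.
PARENT = {
    "New York State": "United States",
    "New York City": "New York State",
    **{b: "New York City"
       for b in ["Manhattan", "Brooklyn", "Queens", "Staten Island", "Bronx"]},
}

def path(term):
    """Return the chain of terms from the root down to `term`."""
    chain = [term]
    while chain[-1] in PARENT:
        chain.append(PARENT[chain[-1]])
    return list(reversed(chain))

def is_under(term, ancestor):
    """True if `ancestor` is `term` itself or any broader term above it."""
    return ancestor in path(term)

# Paper metadata; the "subject" value is an assumption for illustration.
papers = [{
    "title": "Navigating Your Way Around New York City",
    "author": "John Doe & Molly Pepper",
    "date": "2008-09",
    "subject": "New York City",
}]

def find(subject):
    """Titles of papers tagged with `subject` or any narrower term."""
    return [p["title"] for p in papers if is_under(p["subject"], subject)]

print(path("Brooklyn"))
print(find("New York State"))
```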


Taxonomy Core Characteristics

o Simple terminology
o Looser, flatter and more intuitive than traditional taxonomies
  o E.g. eight top levels, each three levels deep
o Usability in favor of detail
o Fewer ‘clicks’
o Must be easy to alter
o Don’t overanalyze with too many ‘what ifs’
o Unified understanding
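One way to keep a taxonomy looser and flatter is to check its shape automatically. The sketch below reads the slide's example as at most eight top-level categories, each no more than three levels deep; those limits are parameters, and the nested-dictionary representation and function names (`depth`, `check_shape`) are assumptions.

```python
def depth(tree: dict) -> int:
    """Number of levels below this point in a nested-dict taxonomy ({} is a leaf)."""
    return 0 if not tree else 1 + max(depth(child) for child in tree.values())

def check_shape(tree: dict, max_top: int = 8, max_depth: int = 3) -> list:
    """Return human-readable problems; an empty list means the shape is acceptable."""
    problems = []
    if len(tree) > max_top:
        problems.append(f"{len(tree)} top-level categories (limit {max_top})")
    for name, subtree in tree.items():
        levels = 1 + depth(subtree)  # count the top-level category itself
        if levels > max_depth:
            problems.append(f"{name!r} is {levels} levels deep (limit {max_depth})")
    return problems

too_deep = {"Places": {"United States": {"New York State": {"New York City": {}}}}}
print(check_shape(too_deep))  # ["'Places' is 4 levels deep (limit 3)"]
```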


Taxonomy Categorization Schemes

Methods (ordered from hardest to easiest), with definitions and examples:

o Facet-based: Information categorized into multiple taxonomies, or “stackonomies,” based on unique but pervasive characteristics such as topic, function, etc.
  Examples: wines by region (France > Alsace); wines by type (White > Chardonnay); wines by price

o Subject-oriented: Information categorized by subject or topic. Instantive: each child category is an instance of the parent category. Partitive: each child category is a part of the parent category.
  Examples: water pollution, soil pollution, air pollution…

o Functional: Information categorized by the process to which it relates.
  Examples: employment, staffing, training

o Organizational: Information categorized by corporate departments or business entities.
  Examples: Human Resources, Marketing, Accounting, Research…

o Document type: Information categorized by the type of document.
  Examples: presentations, expense reports, press releases…
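Facet-based classification can be sketched with the wine example: each item carries several independent facets, two of them hierarchical (region, type) and one derived from a value (price). The wine names, prices, price bands and function names below are hypothetical. A hierarchical facet matches when the selected path is a prefix of the item's path, so selecting "White" also matches "White > Chardonnay".

```python
WINES = [  # hypothetical catalog
    {"name": "Wine A", "region": ("France", "Alsace"), "type": ("White", "Riesling"), "price": 15},
    {"name": "Wine B", "region": ("France", "Burgundy"), "type": ("White", "Chardonnay"), "price": 35},
    {"name": "Wine C", "region": ("France", "Bordeaux"), "type": ("Red", "Merlot"), "price": 22},
]

def price_band(price):
    """Derived facet: bucket a numeric price into a band."""
    return "Under $20" if price < 20 else "$20 and over"

def matches(value, selected):
    """A hierarchical facet matches if the selected path is a prefix of the item's path."""
    return tuple(value[:len(selected)]) == tuple(selected)

def facet_filter(items, region=(), wine_type=(), band=None):
    """Intersect the selections made on each facet independently."""
    return [w["name"] for w in items
            if matches(w["region"], region)
            and matches(w["type"], wine_type)
            and (band is None or price_band(w["price"]) == band)]

print(facet_filter(WINES, region=("France",), wine_type=("White",)))  # ['Wine A', 'Wine B']
print(facet_filter(WINES, band="$20 and over"))                      # ['Wine B', 'Wine C']
```

Because each facet is its own small taxonomy, users can combine them in any order, which is also why facet-based schemes take the most effort to design.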


Thesauri (e.g. ISO 2788)

BT (Broader Term) - refers to the term above this one in the hierarchy

SN (Scope Note) - a string attached to the term explaining its meaning

USE - refers to another term that is preferred to this term

TT (Top Term) - refers to the topmost ancestor

RT (Related Term) - refers to a term that is related to this term without being a synonym

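Figure: the New York taxonomy (United States > New York State > New York City > Manhattan, Brooklyn, Queens, Staten Island, Bronx), with the paper's metadata (Title: Navigating Your Way Around New York City; Author: John Doe & Molly Pepper; Publication Date: September, 2008) and thesaurus labels attached to the term "New York City":
  o BT: New York State
  o TT: United States
  o USE: NYC
  o RT: Boroughs
  o SN: "The largest city in New York State and in the United States"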
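The thesaurus relationships can be sketched as records with optional BT, SN, USE and RT fields; TT is not stored but computed by following BT links to the root. This is a minimal illustration with hypothetical names (`Term`, `THESAURUS`, `preferred`, `top_term`), not an ISO 2788 implementation. The figure does not make the direction of the USE link clear, so the sketch assumes "New York City" is the preferred term and "NYC" points to it.

```python
from dataclasses import dataclass, field
from typing import Optional

@dataclass
class Term:
    label: str
    bt: Optional[str] = None    # BT: broader term, one level up
    sn: str = ""                # SN: scope note explaining the meaning
    use: Optional[str] = None   # USE: preferred term, if this term is non-preferred
    rt: list = field(default_factory=list)  # RT: related, non-synonymous terms

THESAURUS = {t.label: t for t in [
    Term("United States"),
    Term("New York State", bt="United States"),
    Term("New York City", bt="New York State",
         sn="The largest city in New York State and in the United States",
         rt=["Boroughs"]),
    Term("NYC", use="New York City"),  # assumed direction of the USE link
]}

def preferred(label: str) -> str:
    """Follow USE references until reaching a preferred term."""
    seen = set()
    while THESAURUS[label].use is not None:
        if label in seen:
            raise ValueError(f"USE cycle at {label!r}")
        seen.add(label)
        label = THESAURUS[label].use
    return label

def top_term(label: str) -> str:
    """TT: follow BT links from the preferred term up to the topmost ancestor."""
    term = THESAURUS[preferred(label)]
    while term.bt is not None:
        term = THESAURUS[term.bt]
    return term.label

print(preferred("NYC"), "/", top_term("NYC"))  # New York City / United States
```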


Metadata Maturity Model

I. Ad Hoc

II. Discovered

III. Managed

IV. Integrated

V. Optimized

METADATA MANAGEMENT: the organization of technical and business metadata with the goal of advancing the sharing, retrieval and understanding of enterprise information assets.

WITH NO METADATA MANAGEMENT
  o Information is lost or hidden
  o Data integration is costly
  o Cannot support everyday business
  o Information is difficult to find
  o Partial & dated information
  o Loss of trust in data


Metadata Maturity Model – Phase I


PEOPLE
  o Small group of rogue metadata warriors
  o Knowledge is in people’s heads
  o Sharing of metadata is ad hoc

PROCESS
  o Changes are locally acquired, made and consumed
  o Sharing through conversations with ‘incumbents’
  o Infrequent changes

TECHNOLOGY
  o Spreadsheets and unstructured tools
  o Application-specific metadata components


Metadata Maturity Model – Phase II


PEOPLE
  o Management awareness
  o Sporadic adding to various repositories
  o ‘Talk’ about importance of sharing metadata

PROCESS
  o Limited sharing of metadata
  o Local or semi-local repositories
  o Local attempts at managing metadata
  o Exploration of core metadata and metadata tools

TECHNOLOGY
  o Modeling tools
  o Application-specific metadata components
  o Some metadata management tools
  o Mix


Metadata Maturity Model – Phase III


PEOPLE
  o Data stewards
  o Data governance body
  o Management understands importance of administering metadata

PROCESS
  o Governance process is created and enforced
  o Workflows
  o Communication with ‘outside’ departments
  o Beginnings of real-time integration

TECHNOLOGY
  o Metadata management tools with governance process
  o Workflow engine
  o Business rule engine
  o Data integration tools


Metadata Maturity Model – Phase IV


PEOPLE
  o Constantly seeking optimization
  o Metadata administrators – centralized validation

PROCESS
  o Enterprise-level standards
  o Taxonomy, ontologies, etc.
  o Authoritative data sources for entities

TECHNOLOGY
  o Collaboration tools
  o Enterprise data modeling tool
  o Vocabulary and taxonomy management tool


Metadata Maturity Model – Phase V


PEOPLE
  o Start managing metadata as part of business
  o Critical, ubiquitous, invisible part of the organization

PROCESS
  o Automated real-time integration
  o Domain ontologies & topic maps
  o Seamless integration at low cost

TECHNOLOGY
  o Ontology management
  o Reasoning technology
  o Data mediation


Data Governance Components

Data Stewards
  o Principal – ‘Guardians’ of data
  o Business – Help define data and stewardship standards

Data Architects
  o Part of EA; understand EA
  o Broker requests for new data and data changes
  o Responsible for enterprise-wide taxonomy

Data Advisory Committee (DAC)
  o Strategic
  o Managers & execs
  o Broad representation

Infrastructure Team
  o Responsible for physical architecture and data provision
  o DBAs & developers
  o Systems & network administrators


DOI Data Governance Framework

Roles, Responsibilities, and Relationships

o Executive Sponsor: Appoints Principals, ensures adequate funding, and delegates decision-making authority in areas of data requirements, standardization, and quality for a business subject area.

o Principal Data Stewards: Coordinate the creation or review of proposed data standards with all Business Data Stewards and Bureau Data Architects for their respective business subject area. Maintain current proposed DOI data standards for their respective business line. Submit data standards to the DOI Data Architect for the formal review process. Resolve review comments and conflicting data issues. Champion the use of the official DOI data standards.

o Business Data Steward: Data stewards at the Bureau/Office level. Coordinates implementation of new data standards with the SME and DBA in systems supporting a business line. Ensures data security and data quality requirements for each data standard.

o DOI Data Architect: Maintains and publishes the DOI Data Reference Model (DRM) in coordination with Principal Data Stewards and Bureau Data Architects. Promotes the Data Program.

o Bureau Data Architect: Assists the DOI Data Architect in implementing the Data Program among Bureaus in coordination with Principal Data Stewards and Business Data Stewards. Maintains Bureau-unique data standards in coordination with Business Data Stewards.

o Subject Matter Expert (SME): Analyzes and defines data requirements.

o Database Administrator (DBA): Provides access and designs/develops interfaces and connections.

Other elements in the diagram: DAC (Data Advisory Committee), DOI Data Architecture Guidance Body, DOI E-Gov Team, Business Line, Data Panel, and coordination with External Standards Bodies and other Communities of Interest (COIs).

The DOI data governance process facilitates collaboration and consensus among all business lines and communities of interest.

Diagram legend: yellow = IT perspective; green = business perspective; gray = mixed perspective.


Value vs. Cost of Metadata

Figure: cost and usefulness of metadata across maturity levels (None, Ad-Hoc, Discovered, Managed, Integrated, Optimized). Callouts:
  o ROI point: start of governance, right of Phase III
  o High awareness but no governance
  o Sharp rise in cost for unmanaged metadata


The Dublin Core Standard

o Created in 1995 to aid Internet searches
o The most common metadata standard
o Primarily for 'document-like objects' (DLOs)
o Example: 'Author = Ronald Snijder'
o Qualifier: 'Author (type=personalName) = Ronald Snijder'
o Each element can be repeated (e.g. 'Author (type=personalName) = Sergeant Pepper')
o Every metadata description should describe just one information resource
o 15 elements
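The repeatability and qualifier points above can be sketched as a small record type. This is a minimal illustration with hypothetical names (`DCDescription`, `add`, `get`, the `urn:example:doc-1` identifier). It uses the element name `creator`, since the Dublin Core element set has no 'Author' element and `creator` is the element the slide's example corresponds to, and it checks element names against the 15 elements of the Dublin Core Element Set.

```python
from collections import defaultdict

DCMES = {"title", "creator", "subject", "description", "publisher", "contributor",
         "date", "type", "format", "identifier", "source", "language",
         "relation", "coverage", "rights"}

class DCDescription:
    """Metadata about exactly one information resource."""

    def __init__(self, resource_id):
        self.resource_id = resource_id
        self._values = defaultdict(list)  # element -> [(qualifier, value), ...]

    def add(self, element, value, qualifier=None):
        if element not in DCMES:
            raise ValueError(f"{element!r} is not a Dublin Core element")
        self._values[element].append((qualifier, value))  # elements may repeat
        return self

    def get(self, element, qualifier=None):
        """Values for an element, optionally restricted to one qualifier."""
        return [v for q, v in self._values.get(element, [])
                if qualifier is None or q == qualifier]

doc = (DCDescription("urn:example:doc-1")
       .add("creator", "Ronald Snijder", qualifier="personalName")
       .add("creator", "Sergeant Pepper", qualifier="personalName"))
print(doc.get("creator"))  # ['Ronald Snijder', 'Sergeant Pepper']
```

Tying each description to a single `resource_id` is one way to honor the rule that every description covers just one information resource.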


Dublin Core Element Set (http://dublincore.org/documents/dces/)

DC element - Definition (Reference)

1. Title - Name given to the resource
2. Creator - Entity primarily responsible for the resource content
3. Subject - Topic of the content
4. Description - Account of the content
5. Publisher - Entity responsible for making the resource available
6. Contributor - Entity responsible for making contributions
7. Date - Date associated with an event in the life cycle of the resource (http://www.iso.org/iso/date_and_time_format)
8. Type - Nature or genre of the content (http://dublincore.org/documents/dcmi-type-vocabulary/)
9. Format - Physical or digital manifestation (http://en.wikipedia.org/wiki/Internet_media_type)
10. Identifier - Unambiguous reference to the resource within a given context
11. Source - Reference to a resource from which the present resource is derived
12. Language - Language of the intellectual content (http://www.ietf.org/)
13. Relation - Reference to a related resource
14. Coverage - Extent or scope of the content
15. Rights - Rights held in and over the resource

o Many standards bodies exist
o Not syntax specific
o DC content is modifiable
o Intrinsicality
o Each element is optional
o Extensions can be used & registered
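Because the element set is not syntax specific, the same description can be written in any serialization. As one example, the sketch below uses Python's standard `xml.etree.ElementTree` with the Dublin Core element namespace `http://purl.org/dc/elements/1.1/`. The `metadata` wrapper element, the function name `dc_record` and the sample values (including the language code `en`) are assumptions for illustration. Omitted elements are simply skipped, since every element is optional, and list values produce repeated elements.

```python
import xml.etree.ElementTree as ET

DC_NS = "http://purl.org/dc/elements/1.1/"
DCMES = ("title", "creator", "subject", "description", "publisher", "contributor",
         "date", "type", "format", "identifier", "source", "language",
         "relation", "coverage", "rights")

ET.register_namespace("dc", DC_NS)  # serialize elements with the conventional dc: prefix

def dc_record(fields):
    """Serialize {element: [values]} as simple Dublin Core XML."""
    unknown = set(fields) - set(DCMES)
    if unknown:
        raise ValueError(f"not Dublin Core elements: {sorted(unknown)}")
    root = ET.Element("metadata")  # hypothetical wrapper element
    for name in DCMES:
        for value in fields.get(name, []):  # optional and repeatable
            ET.SubElement(root, f"{{{DC_NS}}}{name}").text = value
    return ET.tostring(root, encoding="unicode")

record = dc_record({
    "title": ["Navigating Your Way Around New York City"],
    "creator": ["John Doe", "Molly Pepper"],
    "date": ["2008-09"],   # ISO 8601 form of "September, 2008"
    "language": ["en"],    # assumed value
})
print(record)
```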


Dublin Core Framework & Extensions

Layers shown in the diagram:
  o Dublin Core: Dublin Core adopted as standard
  o CLF: mandatory set of Common Look and Feel elements
  o Records Management: metadata extensions for managing information through its lifecycle
  o Domain-specific metadata: domain-specific metadata extensions (e.g. geospatial)
  o Portal Content Management: extensions for clusters and gateways