Baseline Findings
EPA Enterprise Data Architecture / Data Management Metadata
Mike Fleckenstein, Practice Leader, MDM, Project Performance Corp.
Types of Data
Transactional data
o Measurements at a point in time
o Dollars earned or units sold
o Used for trend analysis
Reference data
o Entity by which transactions are measured
o E.g. ‘Country’, ‘Prefix’ and ‘Industry’
o Often inconsistently and redundantly stored within an organization
Master data
o Single version of the truth
o Key corporate reference entities like ‘Customer’, ‘Location’ and ‘Product’
Metadata
o Describes objects by connecting objects to the subjects they are about
Types of Metadata
Technical: data sources, access protocols (ODBC, JDBC, SQL*NET, etc.), physical schema (database definition, table definition, column definition, etc.), logical data sources (ER models, object models, etc.)
Example: people within IT supporting financial reporting know that the financial data mart resides on machine "XPT001"; the data mart is refreshed at "12 a.m. every Saturday night"; data is sourced from "Hyperion GL"; and period data is captured in "AP column."
Business: contextual data about the information retrieved; taxonomies that define business organizations and product hierarchies; controlled vocabularies or reference data used to define business terms, such as a medical dictionary or financial terminology.
Example: people in the finance department know that performance reports come "once a month"; "GPR" stands for "Global Performance Report"; "AP7" means "Accounting Period Number 7"; and the accounting period starts in "February." These descriptions are business metadata.
Metadata & Related Terms
o Metadata describes objects, and one of the ways in which it does that is by connecting objects to the subjects they are about
o A controlled vocabulary is a closed list of subjects that can be used for classification
o A taxonomy is a subject-based classification that arranges the terms in the controlled vocabulary into a hierarchy
o Thesauri take taxonomies and extend them to make them better able to describe the world by not only allowing subjects to be arranged in a hierarchy, but also allowing other statements to be made about the subjects
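A minimal sketch of classification against a controlled vocabulary, in Python. The vocabulary contents, the function name `classify` and the document title are hypothetical, chosen only to illustrate the "closed list" idea above.

```python
# Hypothetical closed list of subjects (a controlled vocabulary).
CONTROLLED_VOCABULARY = frozenset(
    {"water pollution", "soil pollution", "air pollution"}
)


def classify(document_title, subjects):
    """Attach subjects to a document, rejecting terms outside the closed list."""
    unknown = sorted(set(subjects) - CONTROLLED_VOCABULARY)
    if unknown:
        raise ValueError(f"not in controlled vocabulary: {unknown}")
    return {"title": document_title, "subjects": sorted(subjects)}


# The title is made up; the subject comes from the closed list.
record = classify("River survey", ["water pollution"])
```

Because the list is closed, a term such as "noise pollution" is rejected rather than silently added, which is what keeps classification consistent.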
Taxonomy
Metadata can be organized using a taxonomy
Helps an audience find information more easily
Figure: a paper linked to a subject-based taxonomy.
Blue lines: metadata about the paper
o Title: Navigating Your Way Around New York City
o Author: John Doe & Molly Pepper
o Publication Date: September, 2008
Black lines: subject-based taxonomy
United States > New York State > New York City > Manhattan, Brooklyn, Queens, Staten Island, Bronx
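The New York City example can be sketched in Python as a child-to-parent map plus a metadata record. This is an illustration only: the `subject` link from the paper to "New York City" and the helper names `path_to_root` and `is_about` are assumptions, not part of the slides.

```python
# Child -> parent links for the subject-based taxonomy in the example.
PARENT = {
    "New York State": "United States",
    "New York City": "New York State",
    "Manhattan": "New York City",
    "Brooklyn": "New York City",
    "Queens": "New York City",
    "Staten Island": "New York City",
    "Bronx": "New York City",
}

# Metadata about the paper; the "subject" link is an assumed connection.
paper = {
    "title": "Navigating Your Way Around New York City",
    "author": "John Doe & Molly Pepper",
    "publication_date": "September, 2008",
    "subject": "New York City",
}


def path_to_root(term):
    """Return the term followed by each of its ancestors, up to the top."""
    path = [term]
    while path[-1] in PARENT:
        path.append(PARENT[path[-1]])
    return path


def is_about(document, term):
    """True if the document's subject is the term or a descendant of it."""
    return term in path_to_root(document["subject"])
```

With this structure a search for "United States" also finds the paper, because its subject sits below that term in the hierarchy.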
Taxonomy Core Characteristics
o Simple terminology
o Looser, flatter and more intuitive than traditional taxonomies
o E.g. eight top levels, three levels deep each
o Usability in favor of detail
o Fewer ‘clicks’
o Must be easy to alter
o Don’t overanalyze with too many ‘what ifs’
o United understanding
Taxonomy Categorization Schemes
Methods, listed from hardest to easiest:
Facet-based
o Definition: Information categorized into multiple taxonomies or “stackonomies” based on unique but pervasive characteristics including topic, function, etc.
o Examples: wines by region (France > Alsace); wines by type (White > Chardonnay); wines by price
Subject-oriented
o Definition: Information categorized by subject or topic. Instantive: each child category is an instance of the parent category. Partitive: each child category is a part of the parent category.
o Examples: water pollution, soil pollution, air pollution…
Functional
o Definition: Information categorized by the process to which it relates
o Examples: employment, staffing, training
Organizational
o Definition: Information categorized by corporate departments or business entities
o Examples: Human Resources, Marketing, Accounting, Research…
Document Type
o Definition: Information categorized by the type of document
o Examples: presentations, expense reports, press releases…
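Facet-based categorization can be sketched in Python as several independent taxonomies attached to the same item. The wine names, price bands and the function `select` are hypothetical; only the facets (region, type, price) come from the example above.

```python
# Each wine carries one path per facet; all entries are made-up sample data.
WINES = [
    {"name": "Wine A", "region": ("France", "Alsace"),
     "wine_type": ("White", "Riesling"), "price": ("Under 20",)},
    {"name": "Wine B", "region": ("France", "Burgundy"),
     "wine_type": ("White", "Chardonnay"), "price": ("20 and over",)},
    {"name": "Wine C", "region": ("United States", "California"),
     "wine_type": ("White", "Chardonnay"), "price": ("Under 20",)},
]


def select(items, **facets):
    """Return names of items whose facet paths start with the given prefixes."""
    def matches(item):
        return all(
            item[facet][:len(prefix)] == tuple(prefix)
            for facet, prefix in facets.items()
        )
    return [item["name"] for item in items if matches(item)]


french = select(WINES, region=("France",))
cheap_chardonnay = select(
    WINES, wine_type=("White", "Chardonnay"), price=("Under 20",)
)
```

Each facet can be browsed alone or combined with others, which is why this method is flexible for users but harder to build than a single hierarchy.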
Thesauri (e.g. ISO 2788)
BT (Broader Term) - refers to the term above this one in the hierarchy
SN (Scope Note) - a string attached to the term explaining its meaning
USE - refers to another term that is preferred to this term
TT (Top Term) - refers to the topmost ancestor
RT (Related Term) - refers to a term related to this term, without being a synonym
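Figure: the New York City example annotated with thesaurus relations.
o Paper metadata: Title: Navigating Your Way Around New York City; Author: John Doe & Molly Pepper; Publication Date: September, 2008
o Taxonomy: United States > New York State > New York City > Manhattan, Brooklyn, Queens, Staten Island, Bronx
o Relations marked on the taxonomy: BT, TT, USE (NYC), RT (Boroughs)
o SN: The largest city in New York State and in the United States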
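The thesaurus relations can be sketched in Python as follows. This is an illustration under assumptions: the class `Term`, the helpers `preferred` and `top_term`, and the reading that "NYC" is the non-preferred term pointing to "New York City" are not stated in the slides.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class Term:
    name: str
    broader: Optional[str] = None                      # BT
    scope_note: str = ""                               # SN
    related: List[str] = field(default_factory=list)   # RT


THESAURUS = {
    "United States": Term("United States"),
    "New York State": Term("New York State", broader="United States"),
    "New York City": Term(
        "New York City",
        broader="New York State",
        scope_note="The largest city in New York State and in the United States",
        related=["Boroughs"],
    ),
}

# USE: non-preferred term -> preferred term (direction is an assumption).
USE = {"NYC": "New York City"}


def preferred(name):
    """Resolve a non-preferred term to its preferred form."""
    return USE.get(name, name)


def top_term(name):
    """TT: follow BT links upward until a term has no broader term."""
    term = THESAURUS[preferred(name)]
    while term.broader is not None:
        term = THESAURUS[term.broader]
    return term.name
```

Here TT is derived by walking BT links rather than stored, so the two relations cannot drift apart.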
Metadata Maturity Model
I. Ad Hoc
II. Discovered
III. Managed
IV. Integrated
V. Optimized
METADATA MANAGEMENT
The organization of technical and business metadata with the goal of advancing the sharing, retrieval and understanding of enterprise information assets.
WITH NO METADATA MGMT
o Information is lost or hidden
o Data integration is costly
o Cannot support everyday business
o Information is difficult to find
o Partial & dated information
o Loss of trust in data
Metadata Maturity Model – Phase I
PEOPLE
o Small group of rogue metadata warriors
o Knowledge is in people’s heads
o Sharing of metadata is ad hoc
PROCESS
o Changes are locally acquired, made and consumed
o Sharing through conversations with ‘incumbents’
o Infrequent changes
TECHNOLOGY
o Spreadsheets and unstructured tools
o Application-specific metadata components
Metadata Maturity Model – Phase II
PEOPLE
o Management awareness
o Sporadic adding to various repositories
o ‘Talk’ about importance of sharing metadata
PROCESS
o Limited sharing of metadata
o Local or semi-local repositories
o Local attempts at managing metadata
o Exploration of core metadata and metadata tools
TECHNOLOGY
o Modeling tools
o Application-specific metadata components
o Some metadata management tools
o Mix
Metadata Maturity Model – Phase III
PEOPLE
o Data stewards
o Data governance body
o Management understands importance of administering metadata
PROCESS
o Governance process is created and enforced
o Workflows
o Communication with ‘outside’ departments
o Beginnings of real-time integration
TECHNOLOGY
o Metadata management tools with governance process
o Workflow engine
o Business rule engine
o Data integration tools
Metadata Maturity Model – Phase IV
PEOPLE
o Constantly seeking optimization
o Metadata administrators: centralized validation
PROCESS
o Enterprise-level standards
o Taxonomy, ontologies, etc.
o Authoritative data sources for entities
TECHNOLOGY
o Collaboration tools
o Enterprise data modeling tool
o Vocabulary and taxonomy management tool
Metadata Maturity Model – Phase V
PEOPLE
o Start managing metadata as part of business
o Critical, ubiquitous, invisible part of the organization
PROCESS
o Automated real-time integration
o Domain ontologies & topic maps
o Seamless integration at low cost
TECHNOLOGY
o Ontology management
o Reasoning technology
o Data mediation
Data Governance Components
Data Stewards
o Principal: ‘Guardians’ of data
o Business: Help define data and stewardship standards
Data Architects
o Part of EA; understand EA
o Broker requests for new data and data changes
o Responsible for enterprise-wide taxonomy
Data Advisory Committee (DAC)
o Strategic
o Managers & execs
o Broad representation
Infrastructure Team
o Responsible for physical architecture and data provision
o DBAs & developers
o Systems & network administrators
DOI Data Governance Framework
Roles, Responsibilities, and Relationships
Executive Sponsor: Appoints Principals, ensures adequate funding, delegates decision-making authority in areas of data requirements, standardization, and quality for a business subject area.
Principal Data Stewards: Coordinate the creation or review of proposed data standards with all business data stewards & Bureau Data Architects for their respective business subject area. Maintain current proposed DOI Data Standards for their respective business line. Submit Data Standards to the DOI Data Architect for the formal review process. Resolve review comments and conflicting data issues. Champion the use of the official DOI data standards.
Business Data Steward: Data stewards at the Bureau/Office level. Coordinates implementation of new data standards with SME and DBA in systems supporting a business line. Ensures data security & data quality requirements for each data standard.
DOI Data Architect: Maintains & publishes the DOI Data Reference Model (DRM) in coordination with Principal Data Stewards and Bureau Data Architects. Promotes the Data Program.
Bureau Data Architect: Assists the DOI Data Architect in the implementation of the Data Program among Bureaus in coordination with Principal Data Stewards and Business Data Stewards. Maintains Bureau-unique data standards in coordination with business data stewards.
Database Administrator (DBA): Provides access and designs/develops interfaces and connections.
Subject Matter Expert (SME): Analyzes and defines data requirements.
Other bodies shown: DAC (Data Advisory Committee), DOI Data Architecture Guidance Body, DOI E-Gov Team, Business Line Data Panel.
Coordination with External Standards Bodies and other Communities of Interest (COIs)
The DOI data governance process facilitates collaboration and consensus among all business lines and communities of interest
Color key: Yellow = IT perspective; Green = Business perspective; Gray = a mixed perspective
Value vs. Cost of Metadata
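Figure: cost and usefulness of metadata across the maturity phases.
o Axes: Cost; Usefulness
o Phases: None, Ad-Hoc, Discovered, Managed, Integrated, Optimized
o Annotation: ROI point; start of governance; right of Phase III
o Annotation: High awareness but no governance
o Annotation: Sharp rise in cost for unmanaged metadata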
The Dublin Core Standard
o Created in 1995 to aid internet searches
o Most common standard
o Primarily for 'document-like objects' (DLOs)
o Example: 'Author = Ronald Snijder'
o Qualifier: 'Author (type=personalName) = Ronald Snijder'
o Each element can be repeated (e.g. 'Author (type=personalName) = Sergeant Pepper')
o Every metadata description should describe just one information resource
o 15 Elements
Dublin Core Element Set (http://dublincore.org/documents/dces/)
DC Element: Definition (Reference)
1. Title: Name given to the resource
2. Creator: Entity primarily responsible for resource content
3. Subject: Topic of the content
4. Description: Account of the content
5. Publisher: Entity responsible for making the resource available
6. Contributor: Entity responsible for making contributions
7. Date: Date associated with an event in the life cycle of the resource (http://www.iso.org/iso/date_and_time_format)
8. Type: Nature or genre of the content (http://dublincore.org/documents/dcmi-type-vocabulary/)
9. Format: Physical or digital manifestation (http://en.wikipedia.org/wiki/Internet_media_type)
10. Identifier: Unambiguous reference to the resource within a given context
11. Source: Reference to a resource from which the present resource is derived
12. Language: Language of the intellectual content (http://www.ietf.org/)
13. Relation: Reference to a related resource
14. Coverage: Extent or scope of the content
15. Rights: Rights held in and over the resource
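A minimal sketch of a Dublin Core description serialized as XML with Python's standard library. The wrapper element `record`, the function name `dublin_core_record`, and the sample values for date and language are assumptions for illustration; the namespace URI is the one used for the 15-element set.

```python
import xml.etree.ElementTree as ET

DC_NS = "http://purl.org/dc/elements/1.1/"
ET.register_namespace("dc", DC_NS)

# The 15 elements of the Dublin Core Element Set, in lowercase.
DC_ELEMENTS = (
    "title", "creator", "subject", "description", "publisher",
    "contributor", "date", "type", "format", "identifier",
    "source", "language", "relation", "coverage", "rights",
)


def dublin_core_record(fields):
    """Build one description from (element, value) pairs.

    Every element is optional and may be repeated, so a list of pairs
    is used instead of a dict.
    """
    record = ET.Element("record")  # hypothetical wrapper element
    for element, value in fields:
        if element not in DC_ELEMENTS:
            raise ValueError(f"not a Dublin Core element: {element}")
        child = ET.SubElement(record, f"{{{DC_NS}}}{element}")
        child.text = value
    return record


record = dublin_core_record([
    ("title", "Navigating Your Way Around New York City"),
    ("creator", "John Doe"),
    ("creator", "Molly Pepper"),   # repeated element
    ("date", "2008-09"),           # assumed year-month encoding
    ("language", "en"),            # assumed value
])
xml_text = ET.tostring(record, encoding="unicode")
```

The record describes exactly one resource, omits the elements it does not need, and repeats `creator` for the second author.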
Many Standards bodies exist
Not syntax specific
DC content is modifiable
Intrinsicality
Each element is optional
Extensions can be used & registered
Dublin Core Framework & Extensions
o Dublin Core: Dublin Core adopted as standard
o CLF: Mandatory set of Common Look and Feel elements
o Records Management: Metadata extensions for managing information through its lifecycle
o Domain specific metadata: Domain specific metadata extensions (e.g. geospatial)
o Portal Content Management: Extensions for clusters and gateways